NSA Collects as Much Data as Is Stored in the Entire Library of Congress Every Six Hours [???]

May 23rd, 2011

I try to avoid the black/grey/white propaganda narratives surrounding the alleged Bin Laden assassination. The whole thing is so absurd that it’s just impossible for me to take it seriously at all. However, in my routine monitoring of stories about the NSA, I came across the claim that the agency is processing an amount of data that’s equivalent to what’s stored in the Library of Congress every six hours.

Ok, so I tried to figure out how much data a Library-of-Congress-every-six-hours represents.

It turns out that the claim is completely meaningless! It tells us nothing about how much data the NSA is intercepting/archiving/processing because there are no digital versions of vast portions of the full Library of Congress collection.

Read this: How ‘Big’ Is the Library of Congress?

So it begs the question, just how “big” is the Library of Congress, in terms of our content, but especially if one tried to equate it to the digital realm?

I won’t go into any of the specific claims that are being made, but they’re easy to find out there in the ether, and suffice it to say that the Library would stand behind very few if any of them. There are certain things we can quantify, but far more that are purely speculative.

For instance, we can as of this moment say that the approximate amount of our collections that are digitized and freely and publicly available on the Internet is about 74 terabytes. We can also say that we have about 15.3 million digital items online.

Some may be tempted to extrapolate that those digital items represent a precise percentage of the nearly 142 million items in the Library’s physical collections, and then estimate some kind of digital corollary. But comparing digital and physical items is apples and oranges, at best. A simple example of that fallacy would be represented by a single photograph online depicting several physical objects.

Another source of digital estimates is likely based on the number of books and printed items in our collections, which is currently about 32 million. One could attempt to establish the average length of those items (pages, words, characters, etc.) and extrapolate the digital equivalent of those 32 million physical items.

Assuming one could do that with any degree of accuracy — and that’s a big assumption — it overlooks the fact that those 32 million books represent only about one-quarter of the entire physical collections. The rest are in the form of manuscripts, prints, photographs, maps, globes, moving images, sound recordings, sheet music, oral histories, etc. So how does that other three-quarters of the Library equate digitally? Can one automatically assume the digital resolution at which all maps or photographs, for instance, would be scanned? Those are major wildcards indeed.

And then there are our motion pictures, videos and sound recordings alone — around 6 million items stored at our new Packard Campus for Audio-Visual Conservation in Culpeper, Va. What is their digital equivalent? Most people who record television programs onto a computer or DVR know that a hard drive with hundreds of megabytes or even a terabyte or more can quickly fill up.

So, there you have it: A nonsensical claim, effortlessly woven into the tapestry of other nonsensical claims, to go with your morning coffee.

Via: The Baltimore Sun:

Parachini of RAND said the rule of thumb has been that every six hours, NSA collects an amount of information equivalent to the store of knowledge housed at the Library of Congress.

“The volume of data they’re pulling in is huge,” he said. “One criticism we might make of our [intelligence] community is that we’re collection-obsessed — we pull in everything — and we don’t spend enough time or money to try and understand what do we have and how can we act upon it.”

One Response to “NSA Collects as Much Data as Is Stored in the Entire Library of Congress Every Six Hours [???]”

  1. JWSmythe says:

    Why people feel the need to measure size with nonstandard measures is beyond me. The “size of a bus” could be anything from a 1967 VW Microbus at 14′ x 5.7′ (L/W), to a GMC RTS (city bus) at 40′ x 8.5′ (L/W), to the Chinese “Red Dragon” bus, at 82′ long.

    At least with some of these measures, there is some sort of consistency. A bus (usually) stays the size it was when it was manufactured, and other objects fall within typical ranges (the size of a breadbox, grapefruit, cantaloupe, watermelon, or “that’s no moon, it’s a space station”).

    The Library of Congress though. That makes it tricky. It’s a library. Not only do things come and go, but they are spread between various archives, and they are constantly receiving new submissions.

    Luckily, some of the folks who like using arbitrary units give us some sort of conversion. One LoC (Library of Congress) is equal to 10TB. Unless you’re archive.org, then it’s 20TB. Other people had said 1TB, back when our hard drives were just breaking the 1GB limit. It seems that 1 LoC == 10TB is fairly standard.

    I’m not very impressed. We just tore down an entire row of cabinets full of arrays that held approx 30 TB. I replaced it with two desktop-type servers that hold 36 TB, happily running on my desk until we’re ready to put them into production. And back to the questionable measure: those 36 TB work out to 3.6 LoC, or 0.9 of a day’s NSA collection. 🙂
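The arithmetic behind those two figures can be checked with a short Python sketch. It assumes the comment's own conversion of 1 LoC = 10 TB and the article's rule of thumb of one LoC every six hours; neither number is verified, and the function names are made up for illustration.

```python
# Assumptions taken from the text, not verified figures:
TB_PER_LOC = 10      # the comment's "fairly standard" conversion: 1 LoC == 10 TB
HOURS_PER_LOC = 6    # the article's claim: one LoC collected every six hours


def tb_to_loc(terabytes):
    """Convert terabytes to 'Libraries of Congress' under the 10 TB assumption."""
    return terabytes / TB_PER_LOC


def nsa_tb_per_day():
    """Terabytes collected per day if one LoC arrives every six hours."""
    return TB_PER_LOC * 24 / HOURS_PER_LOC


def tb_to_nsa_days(terabytes):
    """How many days of claimed NSA collection a given capacity would hold."""
    return terabytes / nsa_tb_per_day()


if __name__ == "__main__":
    capacity_tb = 36  # the two servers mentioned in the comment
    print(f"{capacity_tb} TB = {tb_to_loc(capacity_tb):.1f} LoC")
    print(f"Claimed collection rate = {nsa_tb_per_day():.0f} TB/day")
    print(f"{capacity_tb} TB = {tb_to_nsa_days(capacity_tb):.1f} days of collection")
```

Under these assumptions the claimed rate is only 40 TB per day. Swap in archive.org's 20 TB figure for `TB_PER_LOC` and every result changes, which is the weakness of the unit.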

    I refuse to take seriously any journalist who uses such units of measurement. It’s the journalist’s responsibility to ask for clarification: “So, how much information does the Library of Congress hold?” or “How does that convert into a standard unit of measurement?” Even if they quote it, it should be clarified in the article.

    I’d be willing to bet that their claims on phone call tracking are somewhat or mostly bogus. From the article:

    “They’re listening for words, phrases, sentences that make no sense — ‘The angry red fox jumps over the moon at dawn.'” In addition to coded conversations, the computers also listen for obvious red flags like “bomb,” “plot” and “jihad.”

    I do use these phrases frequently on phone calls. I talk to my writers and editors (my side gig) about current events. Sure, there may have been a bomb threat, a plane crash, or a jihad. I was in a major international airport when there was a mall shooting a little while ago. I called one of my people to make sure it ran. So in an airport, talking about shootings and bombs.

    They must red flag every call I make. I’m always speaking in code too. Not to protect the information I’m conveying, but for the work I do. My day job is doing senior IT work. We have servers named by different conventions. Some are alphanumeric designations. Some are the names of cities, countries, colors, etc. Some are just arbitrarily short names for applications. A phone call may sound something like this:

    “John is on the project in Egypt”

    “Have him kill it quick. I need him on Libya fast, we have a priority 1 message for an issue there”.

    Were those assassination orders? Nope, just telling John to finish the project he’s working on, and fix a high priority trouble ticket for the server “Libya”.

    Since I haven’t been visited, there is no black van parked outside, and no signs of spooks anywhere, I’d be willing to bet my name has a big green sticker on the front that says “Harmless. Ignore him.” I know I’m not the only one who does this. If they investigated every supposed red flag, they’d have to hire half the people in the country to investigate the other half.

    Nope, this sounds like the 1960s Echelon claims. The gov’t listens to everything, and knows what you had for breakfast. These sorts of claims are great. They’ll convince stupid people that they’re safe (the gov’t will protect me). They convince stupid criminals not to do things (the gov’t will catch me). And that leaves the people who think, who realize the gov’t can’t monitor everyone all the time, and the smart criminals who will make every effort they can to elude traces. The gov’t isn’t what’s portrayed on TV, just like crime scene investigators don’t have all the cool tech that CSI:*, NCIS, and countless other programs show. Still, people believe the tech is out there. No, they can’t take 1px from a satellite image and read the license plate from the wrong side of the car, any more than the FSS can read an RFID chip in your pocket from Moscow (unless, of course, you happen to be in Moscow).
