I spend a lot of time worrying about the data. No data in particular (I’m not a sysadmin, thank goodness) but the totality of human knowledge. Fritz and I have spent lots of time arguing about the best solution to this problem (while, of course, not actually doing anything about it). I’ve always been annoyed at organizations like ‘Long Now’ that are preparing for a coming stone age rather than basing their preservation techniques on existing and future technology. I think that’s because I want the data preserved for my civilization, not so much for the coming race of murderous turtles or whatever it is that’s next in line.

So, while lost in the Presidio yesterday I stopped by the Internet Archive to see what they were up to. I had the vague idea that I would get to ‘see’ the Wayback Machine, but of course it’s not actually housed in their offices but rather in some (or maybe several) anonymous data centers elsewhere in the city. Instead, I met Casey Nelson, who dropped everything (including his lunch) to chat with me despite the fact that I arrived unannounced and outside their visiting hours.

office-building-sign.jpg

The Presidio is an old military base, and the buildings are subject to historical preservation. So all of these crazy high-tech foundations are housed in 150-year-old officer housing.

office-building.JPG

I’m excited to learn that the Internet Archive has expanded its mission from backing up the internet to backing up everything. It turns out that, compared to the internet, all the rest of human knowledge is trivially small. So they’ve set up various interfaces for people to upload and catalog data. Most excitingly, they’ve been shipping scanning stations (like the one below) to libraries all over the US. Libraries are paying the Archive to store backups of their pre-Steamboat-Willie books, and the Archive is in turn making them available online to everyone.

scanning-station.JPG

They’re also planning to build book-binding machines and station them throughout the US as well, at which point anyone will be able to print themselves a copy of any book, at any time, for a nominal charge.

printing-station.JPG

Unlike Project Gutenberg, they are storing images of each book’s pages as well as the content. So the online books are searchable, but things like fonts, illustrations, illuminations and such are all preserved and reproducible as well.

The data goes onto racks called ‘petaboxes’ like this one:

petabox.JPG

(A petabox does not, in fact, contain an entire petabyte of data. Fifty of them do, and they have a lot more than 50.)
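For the curious, here is a back-of-the-envelope sketch of that arithmetic. The only figure taken from the visit is that 50 boxes make a petabyte. The decimal conversion (1 PB = 1000 TB) and the function names are my own assumptions, not anything the Archive publishes.

```python
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000
# From the post: 50 petaboxes together hold one petabyte.
BOXES_PER_PETABYTE = 50


def terabytes_per_box(boxes_per_petabyte: int = BOXES_PER_PETABYTE) -> float:
    """Capacity of a single box, in terabytes."""
    return TB_PER_PB / boxes_per_petabyte


def total_petabytes(box_count: int, boxes_per_petabyte: int = BOXES_PER_PETABYTE) -> float:
    """Total capacity of a given number of boxes, in petabytes."""
    return box_count / boxes_per_petabyte


if __name__ == "__main__":
    print(terabytes_per_box())   # capacity of one box, in TB
    print(total_petabytes(100))  # hypothetical fleet of 100 boxes, in PB
```

Under those assumptions each box works out to roughly 20 terabytes. The 100-box fleet is purely illustrative, since all I know is that they have "a lot more than 50".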

The data is mirrored in several places so that it won’t all be lost if San Francisco sinks into the Pacific. So, there you have it: the sum total of human knowledge (excepting all of 20th-century literature, which the Disney corporation has ensured will be lost to history), backed up. Someone’s going to have a lot of papercuts to show for it.
