Archaeologists looking back on our time will be confounded by two problems of our own making: a Cultural Black Hole and a Digital Black Hole.
We live in amazing times. Technology gives us pocket-sized computers which allow us to communicate with others thousands of miles away, as well as record events in audio and video. We can easily store and transfer these and other data in vast quantities and over large distances.
This is a transitional period: physical media are no longer necessary for the delivery of images, video, audio, or text, but we still rely on them, mostly for reasons of commerce and habit.
But just because we can do things does not necessarily mean we should or that these things are for the best.
These should be the best documented times in history. Just about everyone carries at least one camera on their person, and we are capable of recording the most minute and mundane details of our everyday lives. We have far better records of how people dress, speak, and behave now than of any other time.
Unfortunately, there is no guarantee that any of it will exist a century from now.
The methods humans use to store information have become progressively less durable over time. The Rosetta Stone has survived (at least a fragment of it has) for over two millennia. Made, used, and stored properly, tapestries and paper can survive centuries. Modern magnetic media has a life expectancy measured in years.
That is to say, the more complicated we make our storage technology, the more fragile it becomes.
My day job involves digital video encoding, and nearly every day I am reminded of the problems involved in data storage. There are several aspects to the problem.
Remember 8 Track tapes? Microcassettes? Minidiscs? Philips DCC? 1/4″ reel-to-reel? Do you still have a cassette tape player? LP player? Some of these formats never took off, and some are just obsolete. Lack of format interoperability is the biggest enemy of data longevity.
It’s easy to pick on 8 track tape, but while most of us don’t own 8 track players, they are still available on eBay, so if you had an 8 track tape you wanted to play, there is a chance of making it work. That chance will diminish over time as the existing players break and are thrown away.
The 8 track format is less than 50 years old. What about cylinder phonograph recordings, a format that is 130 years old? I’m willing to bet you’ve never seen a player. They do turn up for sale from time to time, but they are expensive and rare.
Formats go obsolete, and if information is not transferred to a newer format before that point, the information is lost. And it doesn’t really matter what format you transfer it to – all will eventually suffer the same fate of obsolescence.
I’ve started an audio format metaphor here, but it applies to video and data formats as well.
MP3 files seem ubiquitous now, but so did Sun uLaw files at one point. If you got an ADPCM or ATRAC file, would you know what to do with it? iTunes probably won’t play it. Granted, provided you can find technical specs for the format, it is possible to write a program to decode it, but for most people, that is not an option. Format compatibility is another aspect of the problem.
Even if we can decode the file, the physical media on which it is stored may not be readable. Magnetic tape is particularly susceptible to the ravages of time and the elements. Spinning hard drives, being more complicated, suffer from additional problems.
We don’t really know the reliable life span of a hard drive because drive capacities have increased so quickly that most people replace drives before they fail. Any data that is not migrated off the older drives, however, would eventually be lost.
There is, of course, technology to help recover from failed hard disk drives, such as RAID redundancy. However, no technology is error-proof, so even Google, which relies heavily on hard disk drives, was concerned enough about their longevity to do a study.
It is estimated that over 90% of all new information produced in the world is being stored on magnetic media, most of it on hard disk drives. Despite their importance, there is relatively little published work on the failure patterns of disk drives, and the key factors that affect their lifetime. Most available data are either based on extrapolation from accelerated aging experiments or from modestly sized field studies. Moreover, larger population studies rarely have the infrastructure in place to collect health signals from components in operation, which is critical information for detailed failure analysis.
They found that failure rates were less dependent on temperature and use patterns than is generally expected, but that failure rates increase significantly (more than four-fold) after two years of life. They do not actually put a statistical life span on hard drives, and the number of drives that fail at 3-5 years is small, but it is still a factor.
Two years is a very short time. One goal of long-term data storage should be that one can put some data (say, your music or photo collections) into a storage device, turn it off, stick it in an environmentally controlled place, and expect to be able to retrieve it at any point in the distant future. But usage level proved to be less of a factor in failure rates than expected, which suggests that a drive sitting unused on a shelf ages much like one in service, so it may not be reasonable to expect the hard drive to turn back on and be readable. Add to this the fact that future computers may not (and probably will not) support the interface (remember SCSI drives?), even if the drive is in perfect working condition.
An active plan of constant data migration is necessary to safeguard any important data. Infrequently-used hard drives must be periodically powered up and tested. All hard drives must have their data moved to newer drives eventually.
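One way to make the "powered up and tested" step concrete is to record a checksum for every file when the data is first stored, then recompute and compare on each test and after each migration. The following is only a minimal sketch of that idea, not a complete archiving tool; the function names `sha256_of`, `build_manifest`, and `verify` are hypothetical, and SHA-256 is simply one reasonable choice of checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

The manifest would be built once on the old drive, stored alongside the data, and checked again with `verify` on the new drive after each migration. An empty list means every file arrived intact.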
My earliest music was created on an old Power Computing PowerCenter 120 Macintosh clone, purchased in 1996 or 1997. I had an external SCSI CD-R drive, with which I backed up all my data. My CD masters were disk images, which I burned as a file to a data disc. Audio CDs do not make a particularly good master format, as CRC errors on perfectly new discs are common. Unfortunately, I chose to write the disk images in the .gi (Global Image) format, which no program I have today can open. I had to re-make my early CD masters from audio CD burns I had made back in the day, and I came across discs which simply would not read without significant errors.
How does one safeguard against data loss? Multiple copies, preferably in different places. Leo Laporte is fond of saying if it isn’t in at least two places, it doesn’t exist.
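The "at least two places" rule is easy to check mechanically once each storage location has a list of its files and their checksums. The sketch below assumes such a list already exists for every location, as a mapping from relative path to content digest; the function name `under_replicated` and the location names in the usage example are hypothetical.

```python
from collections import Counter


def under_replicated(copies: dict[str, dict[str, str]], minimum: int = 2) -> list[str]:
    """Find files that do not have enough identical copies.

    `copies` maps a location name to {relative path: content digest}.
    A file counts as safe only if at least `minimum` locations hold a
    copy with the same digest, so a corrupted copy does not count.
    """
    digests_by_path: dict[str, Counter] = {}
    for manifest in copies.values():
        for rel_path, digest in manifest.items():
            digests_by_path.setdefault(rel_path, Counter())[digest] += 1
    return sorted(
        rel_path
        for rel_path, counts in digests_by_path.items()
        if max(counts.values()) < minimum
    )
```

For example, with `{"desk drive": {"a.wav": "d1", "b.wav": "d2"}, "offsite": {"a.wav": "d1"}}` the function reports `b.wav`, which exists in only one place.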
While I am the active custodian of my data, I feel confident that I can manage the problem. But what happens when I can no longer do so? I cannot really expect that my data will much outlive me. And the same goes for everyone else’s data – all their digital photos, videos, iTunes downloads – all will cease to be.
There are people thinking about this problem and trying to develop reliable long-term digital storage solutions. One of the more promising technologies is a Kodak project called Laser Optical Tape Storage, or LOTS. LOTS is kind of a hybrid between tape and compact disc technologies and, implemented correctly, could be far more stable than either. This addresses the longevity problem, but not necessarily the interoperability or format issues.
We need to start thinking in terms of much longer time frames. We need to worry about what happens to our (important) data after the human race is gone. How will aliens learn about our time period? If their only reference is deteriorated magnetic tapes, they may not be able to determine anything about our current culture. They may figure that somehow we disappeared at the time of the last paper picture prints and LP records.
For our records to remain readable that far into the future, we need a storage medium and interface that do not go obsolete. LOTS is promising because it is a visible medium – with a magnifying glass, one could see the holes in the recording layer. There would be no guessing as to the data carrier as there would be with magnetic tape or hard disk drives.
We should not assume that it is known how to read the data, either. At the head of each archive storage device should be a guide to decoding the data: a sort of Rosetta Stone that any reasonably intelligent being should be able to figure out. This way, the means to read the data may be manufactured if necessary.
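At the file level, the same idea might look like an archive that opens with a plain-text guide saying how long the guide is, how long the data is, and how to interpret it. This is purely an illustrative sketch under my own assumptions: the layout, the wording of the guide, and the names `GUIDE_TEMPLATE`, `pack`, and `unpack` are all invented here and do not correspond to any existing archive format.

```python
# Hypothetical layout: an ASCII guide, then the raw data bytes.
# The length fields are zero-padded to a fixed width so the guide's
# own length can be computed before it is written into the guide.
GUIDE_TEMPLATE = (
    "THIS FILE BEGINS WITH A PLAIN-TEXT GUIDE.\n"
    "GUIDE LENGTH IN BYTES: {guide_len:08d}\n"
    "DATA LENGTH IN BYTES: {data_len:012d}\n"
    "HOW TO DECODE THE DATA:\n{notes}\n"
    "END OF GUIDE. DATA FOLLOWS.\n"
)


def pack(data: bytes, notes: str) -> bytes:
    """Prefix the data with a readable guide (notes must be ASCII)."""
    draft = GUIDE_TEMPLATE.format(guide_len=0, data_len=len(data), notes=notes)
    guide_len = len(draft.encode("ascii"))
    guide = GUIDE_TEMPLATE.format(
        guide_len=guide_len, data_len=len(data), notes=notes
    )
    return guide.encode("ascii") + data


def unpack(archive: bytes) -> tuple[str, bytes]:
    """Split an archive back into its guide text and its data bytes."""
    # The guide length sits on the second line, after the colon.
    second_line = archive.split(b"\n", 2)[1]
    guide_len = int(second_line.split(b":")[1].decode("ascii"))
    guide = archive[:guide_len].decode("ascii")
    return guide, archive[guide_len:]
```

Someone opening such a file in any text viewer would see the instructions before the data. A real guide would of course have to explain far more, starting with how the characters themselves are encoded.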
Stewart Brand’s Long Now Foundation is naturally thinking about such things. When that 10,000-year clock chimes, they want to be able to use some very old files, I guess. Their approach is to create a file format definition repository which could be used to decode then-ancient files. There are not many details on the web site, but it appears they are focusing on the data and are perhaps missing the media. But it is a start, and with luck future archaeologists will have their work to thank for being able to peer into our Digital Black Hole.