In 1986, to mark the 900th anniversary of Domesday Book, the BBC created a computer-based, multimedia version of the historical record, storing its data on two fancy interactive video discs at a cost of approximately £2.5 million. But when was the last time you used a video disc? By the late 1990s, computers could no longer read the outdated format. Though the original Domesday Book had survived for nearly a millennium, its modern version barely made it a decade and a half.
Though a team of heroes at Leeds University was able to resurrect the lost data in 2002, the Domesday Project is a great case study in the problems of digital preservation. How could Britain’s earliest public record so easily outlive its digital facsimile? And how can you safeguard your own digital information against such an expensive, catastrophic (and mortifying) event?
If you think about the amount of information from our daily lives that is captured in digital form—all those emails, endless tweets, zillions of web pages—the sheer volume is astounding. It’s far too much data for most organizations to store in physical form. But that means much of it is at risk of being lost in the endless churn of digital updates. Want to see what your high school classmate ate for breakfast yesterday? No problem! What about what the internet looked like in 1994? Nope.
In 2015, internet pioneer and then-Google exec Vint Cerf warned that if we don’t find a way to preserve our digital data in the long term, the 21st century could become an “internet black hole.”
“Future generations will wonder about us, but they will have very great difficulty knowing about us,” Cerf said.
What is true for us is also true for corporate entities. While everyday citizens are at risk of losing cherished memories, organizations face corporate amnesia: a break in the continuity of organizational memory, in the very legacy of the place itself. (Head explodes.) The big challenges most organizations face are the very reasons the digital black hole exists in the first place: how much stuff we have (data volume), how to know what we have (data formats and metadata), and how we figure out what we’ve lost (data gaps).
Data Volume
It doesn’t matter what anything is if you don’t have room to store it. Think about the last time you audited your Downloads folder to clear space. How did you tackle that? Did you sort by file size or age, and did you use the same criteria every time to choose what to keep and what to delete? Did you transfer some files into cloud storage, and if so, which ones? If you needed to find one of those files again, could you? It’s overwhelming to think about and, frankly, exhausting.
The same questions that you ask yourself when you need to make room on your hard drive also apply to digital preservation on an organizational scale. Researchers have estimated that by 2007, 94% of our technological memory was stored in digital form. Since then, the cost and complexity associated with long-term preservation have only increased as more and more sources of data have emerged.
One example of how hard it can be to digitally store and structure an avalanche of data is the U.S. Library of Congress’ ill-fated Twitter Archive.
Launched in 2010, the project aimed to archive every single public post on Twitter since the platform’s launch in 2006. Despite having more than 200 years’ worth of wisdom and resources at its disposal, the library encountered challenges when it came to making that data available to researchers: “It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data,” it said.
In short, it’s much easier to make a post than it is to archive one. There’s also the issue of volume: Between 2007 and 2017, the number of tweets posted each day grew from 5,000 to 500 million. (500 million. Each day.) Since 2018, the Library of Congress has acquired tweets only “on a selective basis.”
Besides sheer quantity, another predominant issue in long-term digital preservation is the amount of storage occupied by duplicate files saved in multiple places. As anyone who’s ever worked on a project with multiple versions can tell you, it’s hard enough to keep track of which file is the most recent draft when you’re actively working on it, let alone years or decades in the future. Which brings us to metadata.
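To make the duplicates problem concrete, here’s a minimal Python sketch of one common approach: hash each file’s contents and group the files whose hashes match. The function names (`sha256_of`, `find_duplicates`) are made up for illustration, and the sketch assumes everything lives in a single local folder tree. It isn’t drawn from any particular preservation tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Keep only the hashes that more than one file shares.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Matching content tells you which copies are identical, but not which one matters or why it was saved. That’s a job for metadata.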
Digital Preservation and Metadata
At only about 25 years old, digital preservation is a new-ish field. The American Library Association only developed definitions of digital preservation in 2007.
That means there aren’t a lot of precedents when it comes to best practices and tested strategies to ensure the longevity of digital formats. Combine that with the fact that digital materials tend to have shorter lifespans than their analog equivalents—newer generations of software phase out support for older formats, whereas human hands and eyeballs work as well as ever on an old physical blueprint or hardcover book—and it’s clear that digital preservation requires more active intervention throughout a material’s life cycle, starting at a much earlier stage.
For example, it’s easier to apply thorough and accurate metadata to digital content from the outset than to go back and add it after the fact.
Metadata is one of the most important and complex aspects of digital preservation. The supporting data associated with a digital file makes it discoverable, serves as a timeline and road map, verifies its authenticity, and helps clarify its context. Bad or lacking metadata is probably the most common and pervasive culprit in banishing digital data to the “digital black hole.” Without sufficient metadata, long-term preservation of digital material may not be beneficial—or even possible at all.
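As a rough illustration of capturing metadata at the outset, the Python sketch below records a few descriptive and technical details about a file in a JSON “sidecar” saved next to it. The field names and the `.metadata.json` suffix are assumptions chosen for this example, not a published metadata standard, so treat it as a starting point rather than a schema.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def describe(path: Path, creator: str, description: str) -> dict:
    """Build a small metadata record for one file (hypothetical field names)."""
    data = path.read_bytes()  # fine for a sketch; very large files should be read in chunks
    stat = path.stat()
    return {
        "filename": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        # Guessed from the file extension only, so it can be wrong.
        "media_type": mimetypes.guess_type(path.name)[0] or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),
        "creator": creator,
        "description": description,
    }


def write_sidecar(path: Path, record: dict) -> Path:
    """Save the record as JSON alongside the original file and return its path."""
    sidecar = path.with_name(path.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

The technical fields (size, timestamp, checksum) can be gathered automatically at any time, but the creator and description usually can’t, which is why it pays to capture them while someone still remembers.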
Digital preservation isn’t just a matter of acquiring material. You also need a strategy to provide ongoing access to that material, continually assess its integrity, and ensure that the software and hardware necessary to access and read it remain available and operational. Essentially, you need to know how you’ll find it when you need it.
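One way to assess integrity on an ongoing basis is a fixity check: record a checksum for every file when it enters the collection, then recompute and compare on a schedule. The sketch below is a minimal Python version under simple assumptions (local files small enough to read into memory, and a manifest held as a plain dictionary). `build_manifest` and `audit` are hypothetical names used only for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare the files on disk with a previously stored manifest."""
    current = build_manifest(root)
    report = {}
    for name, expected in manifest.items():
        if name not in current:
            report[name] = "missing"
        elif current[name] != expected:
            report[name] = "changed"
        else:
            report[name] = "ok"
    return report
```

In practice the manifest would be stored somewhere separate from the files it describes, so a single failure can’t quietly take out both.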
It might seem counterintuitive that a process involving digital files would require more frequent attention and action than one involving physical artifacts. But the real danger is in the assumption that digital preservation is an automated, hands-off process. It’s quite the opposite.
Closing the Gap
How do we know what’s at risk and what we’ve already lost to the internet black hole?
Data-gathering initiatives and digital forensics tools like BitCurator and Digital Record Object Identification (DROID) help archivists discover digital data and recover deleted, encrypted, or damaged file information. BitCurator provides tools and techniques to extract technical and preservation metadata as well as to package digital materials for archival storage. DROID is a software tool developed by The National Archives of the U.K. that profiles a wide array of file formats and indicates file versions, age, and size, and when they were last changed.
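Format identification tools generally look at a file’s contents rather than trusting its extension. The toy Python sketch below checks the first bytes of a file against a handful of well-known signatures. It only illustrates the idea, using a hypothetical `SIGNATURES` table and `identify` function. It is not how DROID or BitCurator are implemented, and real tools cover far more formats.

```python
from pathlib import Path

# A few widely documented leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, XLSX, EPUB)"),
]


def identify(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"
```

A PDF that someone renamed to `notes.txt` would still be reported as a PDF here, which is the kind of mismatch that turns up constantly in old shared drives.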
Once the material is discovered (or recovered), the next step is to ensure long-term preservation to make sure you don’t have to discover it all over again. Like digital forensics, long-term preservation requires a specialized set of tools. The Digital POWRR Project, sponsored by the Institute of Museum and Library Services, works to make digital preservation more accessible to a wider range of professionals. In 2013, the group compiled a useful tool grid, listing and comparing commercial and open-source digital preservation tools.
In addition to these tools, the Consultative Committee for Space Data Systems (CCSDS), an international forum of space agencies that includes NASA, developed a Reference Model for an Open Archival Information System (OAIS) that put into place a broad framework for the long-term preservation of digital information and assets. Digital archives and organizations interested in upgrading or implementing digital preservation systems use this reference model to keep their projects on track.
Fill the Gaps
Since digital preservation is a constantly evolving field, it can feel like one long game of catch-up to keep things from slipping through the cracks. Fortunately, there are a bunch of strategies and tools you can use to help fill in the blanks and ensure continuity.
First and foremost, there’s the cavalry: archival specialists who are trained in the organization and processing of born-digital content. Digital forensics and digital preservation strategies rely on these experts for a reason!
Discovery campaigns can also help to fill in gaps where digital material is lost or lacking. Discovery platforms enable any user within an organization to nominate digital material to a secure collection. The organizational history squirreled away on desktops, personal drives and file cabinets across a company’s entire footprint is usually astonishing. Opening up the digital preservation process to collaboration is a great way to boost engagement and “discover” your organization’s authentic content.
Oral history projects are another way to unlock heritage and fill in gaps in the digital record. Gathering, preserving, and interpreting the voices and memories of significant personalities helps capture the culture and character of an organization by remembering important people, communities, milestones, victories, crises, and turning points. After all, if you’re looking for authentic, contemporaneous narratives to help flesh out your organization’s history, who better to ask than the folks who were there?
These methods help to capture “near history” that would otherwise slip over the event horizon into narrative oblivion. They’ll leverage important company history to inform future business objectives and, bonus, resonate with diverse stakeholder groups. And they will ensure that when some future someone wants to understand what your company was doing today, there’ll actually be a record of that to find. So what’s the best way to keep your organization’s data out of the digital black hole so it doesn’t end up like the ill-fated Domesday Project? Contact us today for help figuring that out.