How to properly archive your digital files. Yes, it can be a bitch.


Will you be able to open today’s Word docs in 20 years? Probably not, unless you take some necessary steps to give those digital files an extra-long shelf life.

 

The above graphic is not exactly on point, but it’s a memory from my time working on the Slave Societies Digital Archive, which preserves the most extensive serial records for the history of Africans brought to the New World.

 

Andrew Lambert

Legal Technology Reporter

A member of the Project Counsel Media team

 

17 July 2024 (Los Angeles, CA) – When I started covering the legal technology market about 10 years ago, I had the benefit of coming into the industry with degrees in computer science and computer engineering. Plus I had 5 years at Lockheed Martin working on its contract with the U.S. National Archives. Lockheed had built a system to help prevent the loss of most digital information.

So I knew computer systems and information systems, metadata (data about data), magnetic storage media, the file architecture of optical media such as CD-Rs, and so on.

When I began to focus specifically on legal technology, commercial law, and corporate commercial systems, our boss and founder Greg Bufithis told me: “Start reading Craig Ball. He is the Master Sensei in this industry and has written a boatload of stuff. Not just forensics and how attorneys must build chronologies of key events in a case, but all aspects of the preservation of data.”

Having Craig Ball’s knowledge, plus my own work over the last 10 years, helped me navigate through a recent workshop held by the Society of American Archivists on eDiscovery and the preservation of “dead data”.

The problem obviously is that everything we build, whether it is a highway, tunnel, ship, airplane, or an AI program, is designed using computers. And now we are struggling to prevent the loss of so much digital information created in the not-so-distant past.

As one of the presenters at the workshop noted, the original proposal for the World Wide Web, written by Tim Berners-Lee in 1989, is an important piece of internet history. It also can’t be opened on modern computers. John Graham-Cumming, a British software engineer and writer, attempted to open the Word document containing the proposal. Modern versions of Microsoft Word and Apple’s Pages both utterly failed to open the file, as he explained. The open-source word processor LibreOffice worked, albeit with messy formatting. Graham-Cumming ultimately found a PDF exported by CERN in 1998, which was the only way he was able to see the document as it existed in 1989.

Yes, it’s worrying that such an important piece of history, in such a common file format, could be almost completely lost to the passage of time and software updates. Anyone with a collection of old digital documents, photos, and videos might be wondering if the same thing will happen to their files. It turns out that is the sort of question digital archivists deal with all the time.

And it is a bit wacky. Twenty years, in the digital realm, is “ancient”. And so thousands of digital preservation services have sprung up across universities and commercial centers. The biggest demand? Recovering digital files from old computers and “old” storage media. These labs can deal with almost any old media: floppy drives, CDs, older computers. They can pull files and data off those types of media and move them into preservation systems while ensuring nothing is lost. Well, most of the time.

But getting the files off the drive is just the first step: then you have to open them, and leave them in a state that will be openable for decades to come. It’s a job that’s given many of these vendors/services a reason to think about strategies for keeping documents around as long as possible. 

And if you aren’t a professional archivist, what do you do? Suggestions:

Use Open Formats

The Tim Berners-Lee Word document mentioned earlier in this article could no longer be opened by Microsoft Word because the Word software had changed over time. This is part of the challenge of archiving digital files. With physical stuff, the less you look at it the longer it lasts. But with digital stuff, we’re constantly fighting obsolescence. As the file moves through time, it’s losing information.

Updates to software like Microsoft Word mean that files that opened fine in the 1980s don’t open in the 2020s. Part of the problem: Microsoft, and only Microsoft, controls the file format and knows exactly how it works. For this reason, presenters encouraged people to export files in an open file format, especially files they want to keep accessible for the long term.

For documents they recommended PDF/A, an open standard built on top of Adobe’s PDF format that includes everything the file needs in order to be opened, including the fonts used in the document. Microsoft Office, LibreOffice, and Adobe Acrobat all support exporting to PDF/A, so it’s relatively easy to make such a file. The recommendation was to save a PDF/A copy of any document you want to keep.

This principle can apply to all of your documents. An Excel spreadsheet that opens fine now might not open in 20 years, but if you export that spreadsheet as a CSV file, which is essentially just a text document that other spreadsheet applications can understand, the data should stay readable for decades to come. Keep in mind that CSV stores only the cell values, not formulas or formatting.
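To make the “just a text document” point concrete, here is a minimal Python sketch using only the standard library. The sample rows and the function names (`rows_to_csv_text`, `csv_text_to_rows`) are made up for illustration, and the sketch assumes the spreadsheet values are already in memory rather than read from a real Excel file.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize a list of rows (lists of strings) to plain CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Hypothetical sample data standing in for a small spreadsheet.
    sample = [
        ["date", "item", "amount"],
        ["2024-07-17", "Backup drive", "89.99"],
        ["2024-07-18", "Cloud storage, annual", "99.00"],
    ]
    text = rows_to_csv_text(sample)
    print(text)  # human-readable text, one row per line
    assert csv_text_to_rows(text) == sample
```

Because the result is ordinary text, any text editor can open it even if no spreadsheet program is around, which is the whole appeal for long-term storage.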

Basically, if a file on your computer can only be opened by a specific piece of software, and that software is controlled by a single company, you should probably export it to an open format. It’s the only way to future-proof it.
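If you want a quick inventory of what needs exporting, something like the following Python sketch could help. It walks a folder and flags files by extension. The extension list and the suggested targets are my own illustrative assumptions based on the suggestions in this article, not an official or complete list, and the function name `find_export_candidates` is hypothetical.

```python
import tempfile
from pathlib import Path

# Illustrative mapping only: these extensions and targets are assumptions
# that follow the article's suggestions, not a complete or official list.
SUGGESTED_EXPORTS = {
    ".doc": "PDF/A",
    ".docx": "PDF/A",
    ".xls": "CSV",
    ".xlsx": "CSV",
}


def find_export_candidates(folder):
    """Return (path, suggested_format) pairs for files worth exporting."""
    candidates = []
    for path in sorted(Path(folder).rglob("*")):
        suggestion = SUGGESTED_EXPORTS.get(path.suffix.lower())
        if path.is_file() and suggestion:
            candidates.append((path, suggestion))
    return candidates


if __name__ == "__main__":
    # Demo on a throwaway folder with placeholder files.
    with tempfile.TemporaryDirectory() as root:
        for name in ("will.doc", "budget.XLSX", "notes.txt"):
            (Path(root) / name).write_text("placeholder")
        for path, suggestion in find_export_candidates(root):
            print(f"{path.name}: consider exporting to {suggestion}")
```

The script only reports candidates. The actual export still happens in Word, Excel, LibreOffice, or whatever program can open the file today.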

Keep Photos and Videos Fresh

There’s a lot less to worry about when it comes to photos because we’ve been using the same file formats – JPEGs, PNGs, and TIFF files – for a long time. All of those file types are open formats you can open with a wide variety of software.

But that doesn’t mean all of your photos are future-proof. For one thing, if you tend to edit photos a lot, you might lose quality over time. JPEGs aren’t bad; it’s just that every time you edit and save one, the compression throws out a little bit more information. This effect is called generation loss. One or two edits won’t be detectable, but keep this in mind: if you’re going to edit a photo a lot, keep the original untouched and make each edit on a fresh copy.

And keep in mind that some photos, especially RAW files from your camera, might be in a proprietary format. You have to be careful because many cameras default to their own version of RAW, which is highly proprietary. The suggestion is to convert such photos to an open format called Digital Negative (DNG), which is a safer format to use for preserving RAW files.

Videos also aren’t too much of a problem – most video files are encoded using open file formats at this point. But as with photos, do not attempt to edit a video file multiple times. Just edit a copy instead. That’s the great thing about digital: you can make a million copies. Our 🎥 team over at Luminative Media are masters at video/film preservation.

Back Up Absolutely Everything

Converting all of your critical files to open formats affords you absolutely zero benefits if those files are lost. That’s why backing up your files is even more important. Ideally you would have three copies of every file, with one of those copies kept off-site. We use the automatic backup services Backblaze and CrashPlan, both good tools for the job, and we recommend combining Backblaze with a local backup. The specific backup system you use doesn’t matter nearly as much as having some kind of backup strategy in place.
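For anyone who would rather script a local copy than rely only on a service, here is a rough Python sketch of the “several copies” idea. It copies one file into each destination folder and compares SHA-256 checksums so a silently corrupted copy gets noticed. The function names (`sha256_of`, `back_up`) and the folder names are hypothetical, and a real off-site copy would live on another machine or a cloud service rather than in a neighboring folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination_dirs):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Demo in a throwaway folder; the two destinations stand in for
    # an external drive and a folder synced off-site (both assumptions).
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        original = root / "thesis.pdf"
        original.write_bytes(b"pretend this is a PDF/A file")
        copies = back_up(original, [root / "external_drive", root / "offsite_sync"])
        print(len(copies) + 1, "verified copies, counting the original")
```

Re-running the checksum comparison every so often is a cheap way to catch a copy that has gone bad before you actually need it.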

 
