Rachel MacGregor is Digital Preservation Officer at the Modern Records Centre, University of Warwick.
This blog post is a version of the presentation I gave at the DPC Briefing Day on Preservation Planning and Technology Watch in February 2021.
One of the most exciting (or is that off-putting?) things about digital preservation is that things don’t stay still – new formats and software are appearing all the time. The sheer variety of formats out there, with more continually being developed, can make the task of “keeping up” with technological advances seem overwhelming. There’s a massive temptation to leave the problem to someone else – either to the provider of your preservation software or perhaps to your successors.
DO NOT GIVE IN TO THAT TEMPTATION!
Why? Because this is not a responsible approach – it’s not digital preservation at all. When we advocate for digital preservation, either within our organisations or to the wider world, we talk about how it isn’t just “backing up”. It’s precisely this kind of work that we mean: keeping an eye on software developments and file formats. It’s an enormous topic, so where do we even start? This is the point where you might well feel like giving up!
Make a start by getting to know your collections: find out what you have. I guarantee there will be things in there that you weren’t expecting or didn’t know about. If you are lucky, you will have a preservation system with reporting functionality, but it’s unlikely that you have ingested (and identified) everything you hold.
How you set your priorities will depend on your set-up and should be linked to your collecting policy or organisational aims. Here at the University of Warwick we collect “archives relating to modern British social, political and economic history, with special concentration on the national history of industrial relations, industrial politics and labour history”. We also maintain our institutional archive. What this boils down to is that we have a large number of office files: lots of Microsoft formats, especially Word but also Excel, PowerPoint, Publisher and Access, and a deluge of PDFs.
HEIC - rhymes with leek...
What 2020 brought for us was contemporary collecting in the form of a couple of projects relating to Covid-19, both of which included submissions from the public. It’s in this context that I came across a HEIC file, which I guess is to be expected because a lot of the submissions were created using phones. I had come across the format once before, around the time it was launched, when I uploaded an image from my iPhone and then couldn’t get it to open on my Windows desktop PC. I mentally filed it under “one to go back to” and thought no more about it.
On the plus side, DROID does identify my file as HEIF (of which HEIC is a subtype), although its entry is still very much a work in progress. It doesn’t distinguish the different versions of HEIF files, and I don’t know what the implications of this are, especially for long-term rendering. So I turned to my other favourite go-to source on file formats: the Library of Congress Sustainability of Digital Formats pages. These are quite technical and there are parts I can’t follow that well. That’s where the temptation to leave these issues to a nebulous “someone else” starts to creep in. “Those folk at the Library of Congress are doing a great job,” I think, “I’ll leave them to deal with this.”
My main focus was on what the Library of Congress had to say about whether HEIC is a preferred format for preservation. In this case it definitely isn’t, which is hardly surprising given its recent arrival on the scene. So should I leave it as it is? Maybe, although it still doesn’t render well, which means we might wish to normalize it. However, normalizing to JPEG seems ridiculous given that HEIC is being touted as a replacement for that format.
The Library of Congress says: “The Library of Congress has a handful of images in the HEIC format (a subtype of the HEIF format) in its digital storage system. These are likely to have been acquired in the course of archiving websites.” So who is going to come across this format in their collections? Most likely organisations that engage in community or contemporary collecting. That isn’t typically us, but it’s one to watch, since it’s how we obtained these files.
It occurs to me that, as part of a community effort, it’s up to publicly funded institutions like ours to put in the time and effort that others can ill afford: increasing our understanding of formats, sharing that knowledge and trying to keep one step ahead. Perhaps you could contribute to the Library of Congress Format Description pages.
If you are feeling even more enthusiastic and have access to a suitable corpus of files, you might even be able to help improve the PRONOM entry.
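Because DROID doesn’t yet distinguish HEIF variants, a rough first look at a corpus can come from the file header itself. HEIF files are built on the ISO Base Media File Format, whose file-type (`ftyp`) box declares a major brand such as `heic` or `mif1` plus a list of compatible brands. The Python sketch below reads those brands. It is an illustration under stated assumptions, not a format identification tool: it assumes the `ftyp` box is the first box in the file, it knows only a handful of the brands defined in ISO/IEC 23008-12, and the function names (`read_ftyp`, `describe_brands`) are hypothetical. It does not validate the image and is no substitute for DROID or a PRONOM signature.

```python
import struct
import sys
from typing import List, Optional, Tuple

# Illustrative subset of brands defined for HEIF / HEIC in ISO/IEC 23008-12.
# Not exhaustive: e.g. AVIF files declare "avif" but usually also list "mif1".
HEIF_BRANDS = {
    "heic": "HEIC still image (HEVC-coded)",
    "heix": "HEIC still image (HEVC-coded, extended profiles)",
    "hevc": "HEIC image sequence (HEVC-coded)",
    "hevx": "HEIC image sequence (HEVC-coded, extended profiles)",
    "mif1": "HEIF still image (generic brand)",
    "msf1": "HEIF image sequence (generic brand)",
}


def read_ftyp(data: bytes) -> Optional[Tuple[str, List[str]]]:
    """Return (major_brand, compatible_brands) if data starts with an ftyp box."""
    if len(data) < 16:
        return None
    # Box header: 32-bit big-endian size, then the 4-character box type.
    size, box_type = struct.unpack(">I4s", data[:8])
    # Rejects size 0/1 (to-end-of-file / 64-bit sizes) and truncated input.
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", errors="replace")
    # data[12:16] is minor_version; compatible brands follow as 4-byte codes.
    compatible = [
        data[i:i + 4].decode("ascii", errors="replace")
        for i in range(16, size - 3, 4)
    ]
    return major, compatible


def describe_brands(data: bytes) -> str:
    """Summarise which HEIF variant, if any, the file header declares."""
    ftyp = read_ftyp(data)
    if ftyp is None:
        return "no leading ftyp box: probably not an ISO BMFF/HEIF file"
    major, compatible = ftyp
    # Check the major brand first, then the compatible brands in file order.
    for brand in [major] + compatible:
        if brand in HEIF_BRANDS:
            return f"{HEIF_BRANDS[brand]}; major brand {major!r}, compatible {compatible}"
    return f"ISO BMFF file without a known HEIF brand; major brand {major!r}"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            # The ftyp box is normally tiny, so the first 4 KiB is plenty.
            print(path, "->", describe_brands(f.read(4096)))
```

Run against a folder of submissions (for example `python heif_brands.py *.HEIC`, where the script name is hypothetical), it prints one line per file. The declared brand won’t tell you whether a file will render, but it is the kind of detail that could usefully inform a contribution to PRONOM or the Library of Congress pages.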
My message is – do not be passive and assume that someone else is dealing with it. If nothing else, living through a pandemic has reminded us of the importance of individual contributions to the greater good. The more of us who can stay at home, the smaller the chance of virus transmission. In the same way, each small act counts, and you can help build up a community pool of knowledge that really benefits everyone.