Adam Harwood is Research Data and Digital Preservation Technologist at the University of Sussex
For two and a half years we have been talking about doing digital preservation here in Special Collections at the University of Sussex. We've also been convincing various people that we should be doing digital preservation. We've been holding meetings, giving presentations, obtaining resources, evaluating products, reading blogs, figuring out integrations, working through possible workflows, attending conferences and workshops, collaborating with researchers and trying to figure out a forensics process. And learning - we've been doing a hell of a lot of learning. Two and a half years of being a digital preservation professional and I've not done a single bit of any actual digital preservation. Sure, I've played with a few pieces of software, calculated checksums, generated METS files and BagIt folders, but only in the name of learning.
After all this background work, it has finally come to pass that we are now doing digital preservation. We started a pilot of Arkivum Perpetua in August and we're currently putting it through its paces. This turn of events is a bit like how a new start-up business might go: put together a business plan for what you want to do, secure funding, employ people, move into premises and so on. You might be doing this for months until you finally get down to the business that your start-up is all about.
Now that I've got started I'm using a whole new skill set, and for once I feel like a digital archivist; the start-up is open for business.
The pilot is for a year. Advocating the need for digital preservation to key stakeholders and finding more use cases for the system are key project goals. I suspect that this kind of advocacy work will never end.
Working so closely with a digital preservation system has been less about the functionality of the system and the preservation processes (although these are obviously very important) and more about focussing on our records and what this system has allowed us to learn about them. What kind of file formats do we have? What normalisation processes are we going to carry out on them? And perhaps most importantly, how are we going to organise our digital archive? An AIP in the Perpetua system (which is an integration of a few open source systems including Archivematica and AtoM) doesn't look like the same material does on our own university shared drives. Archivematica applies a unique identifier to the SIP and every digital object, and this is reflected in the folder structure of the AIP. Each AIP is stored separately, so what happens when we upload multiple SIPs for one collection? Does it matter how our AIPs are organised as long as we can search and find?
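The first of those questions, what kind of file formats we have, can be roughed out on a shared drive before anything is ingested. The sketch below is only an illustration and not part of Perpetua or Archivematica: it counts files by extension, which is a crude proxy for the signature-based format identification a preservation system carries out. The function name `survey_extensions` is made up for this example.

```python
import sys
from collections import Counter
from pathlib import Path


def survey_extensions(root):
    """Count files under root by lowercased extension ('' means no extension).

    Hypothetical helper for illustration: an extension is only a hint,
    not a reliable identification of the file format.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Usage: python survey.py /path/to/collection
    for extension, total in survey_extensions(sys.argv[1]).most_common():
        print(f"{extension or '(none)'}\t{total}")
```

Even a rough count like this is enough to start a conversation about which formats will need a normalisation rule.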
Something that I find a little unnerving is that most digital preservation actions can be fully automated. The archivist need only upload a digital archive to a folder and it will end up in the digital storage and be available for cataloguing in AtoM. No worrying about format identification, normalisation, virus checking and creating an AIP - this is all done automatically. Is this a good thing? The archive world has been crying out for a black box to deal with the digital preservation problem for a long time and now we can just feed our records into a machine that will periodically verify checksums and monitor file formats. Will this make the archivist lazy - only having to intervene when prompted?
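To make the black box a little less opaque, here is roughly what a periodic checksum check amounts to. This is a minimal sketch using Python's standard library, assuming SHA-256 and a simple in-memory manifest. It is not how Perpetua or Archivematica implement it, and the function names are invented for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root, manifest):
    """Compare files under root with a stored manifest.

    Returns a dict of problems: 'missing' if a recorded file is gone,
    'changed' if its checksum no longer matches. Empty means all is well.
    """
    current = build_manifest(root)
    problems = {}
    for name, expected in manifest.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != expected:
            problems[name] = "changed"
    return problems
```

The manifest is built once when the files arrive and `check_fixity` is run on a schedule afterwards. The archivist only hears about it when the result is not empty, which is exactly the "intervene when prompted" situation.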
My feeling is that once we have set up rules in Archivematica for how certain file formats should be treated and what microservices we want it to run (Archivematica is really a compilation of separate open source tools that it can run in sequence) then we can automate the process and only check in once in a while for maintenance. This doesn't mean that our job is done and that I personally won't need to do digital preservation anymore. There will always be new file formats, and old file formats that we don't know what to do with. There will always be a microservice that we will have to devise a process for. It strikes me that the work of a digital archivist also encompasses the role of digital conservator - understanding how to care for the objects in the digital archive and having to do some conservation work once in a while, like overseeing a potentially problematic file format migration.
Then there will always be advocacy, horizon scanning, learning new techniques, securing resources, going to workshops and finding more users for the digital archive that we've built. We could possibly make it available to Sussex researchers or maybe even local community archives.
That kind of work is doing digital preservation as well.