Sarah Mason is Systems Archivist for Artefactual Systems Inc. and is based in the UK.
World Digital Preservation Day is a great chance for the digital preservation community to celebrate achievements, to reach out to those outside and bring them into the community; it is also a chance to discuss what challenges we face and what opportunities are out there to help us move forward. So in the face of challenges that involve funding, staffing, and managerial or IT buy-in, how do we preserve the ever-increasing volume and complexity of digital materials?
One way that we can face these kinds of challenges is by collaborating as part of a multidisciplinary team. Bringing together a diverse range of expertise, the team at Artefactual Systems comprises analysts (who represent domain specialists from archivists to librarians), developers, and systems administrators. Together, we can make use of different viewpoints and specialisations to collaborate on digital preservation solutions--in different geographic locations! We understand that in this field no one person can know it all; sometimes it takes many voices to address issues in a balanced way.
An example recently came up in a weekly team meeting--there is even a GitHub issue for it now. The issue is that very large video files are produced when Archivematica normalizes for preservation. The rule contained in Archivematica’s Format Policy Registry (FPR) creates FFV1-encoded Matroska video files with LPCM audio as a preservation derivative for nearly all ingested video files. Compared to a highly compressed original, these files can be massive and lead to storage and processing issues. What solutions do you think will balance preservation best practice with storage and system concerns?
So what solution does a multidisciplinary team come up with? I took the issue back to an analyst, a developer, and a systems administrator to find out. These are just the highlights, but I highly recommend you read their full responses here.
----
Analyst: Ashley Blewer (AV Preservation Specialist), USA
I think the best possible choice a preservationist can make for the storage of their video assets is FFV1-encoded Matroska video files with LPCM audio because:
- the formats are open and can always be understood/deciphered/transcoded/accessed far into the future;
- it uses a lossless compression algorithm that saves on storage but can be reverted to an uncompressed data stream;
- there are frame-level and section-level checksums so errors can be narrowed down to specific frames or parts of the file instead of the entire thing; and
- robust metadata can be embedded in and attached to the file itself.
However, I think by the time files are considered for normalization by Archivematica the decision to normalize to a great video format is, frankly, too late. Normalization in Archivematica means storing the original file as well as a preservation copy. For video files that already have data loss (either from their digital birth or from being transcoded into a smaller file), it is too late to implement best practices because the data that has been lost cannot be retrieved. Encoding a lossy file into FFV1/LPCM/MKV will result in a very large but still poor-quality file.
For more technical details and preservation opinions, I wrote a blog post on why some video files get very big and pointers for what to do with them.
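To put rough numbers on why the preservation copy grows so much, here is a back-of-the-envelope sketch. A lossless encoder works on the fully decoded frames, so its output size tracks the raw picture data rate rather than the bit rate of the compressed original. The frame size, frame rate, source bit rate, and lossless compression ratio below are all assumed values chosen for illustration; real ratios depend heavily on the content.

```python
def uncompressed_video_bitrate(width, height, bits_per_pixel, fps):
    """Return the raw (decoded) video data rate in bits per second."""
    return width * height * bits_per_pixel * fps


def gigabytes_per_hour(bits_per_second):
    """Convert a bit rate into decimal gigabytes per hour of footage."""
    return bits_per_second * 3600 / 8 / 1e9


# Assumed example: 1920x1080, 8-bit 4:2:0 video (12 bits per pixel), 25 fps.
raw_rate = uncompressed_video_bitrate(1920, 1080, 12, 25)

# Assumed figures, for illustration only.
lossy_source_rate = 5_000_000   # a 5 Mbit/s highly compressed original
lossless_ratio = 2.0            # hypothetical lossless compression ratio

lossless_rate = raw_rate / lossless_ratio

print(f"lossy original:  {gigabytes_per_hour(lossy_source_rate):.2f} GB/hour")
print(f"lossless copy:   {gigabytes_per_hour(lossless_rate):.2f} GB/hour")
```

Under these assumptions the lossless copy is many times larger than the original, yet it contains no picture information that the lossy original did not already have.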
Developer: Douglas Cerna (Software Developer), El Salvador
My thought process addressing an issue like this:
- I start by finding out what part of the workflow represents the normalization microservice and the job I'm interested in. In this case "Normalize for preservation" or "Normalize for preservation and access"
- Then I check what client script is in charge of the job and what parameters are sent to it
- normalize_v1.0 seems like the one calling the FPR commands for preservation purposes
- I like to set breakpoints (pauses on execution) in the client script and run transfers through the UI and explore what the environment looks like when a video file is about to be “preserved”
- The breakpoint exploration would lead me to the command in the database I’m interested in: “Transcoding to mkv with ffmpeg”
- Once I notice ffmpeg is doing the work I’d ask Ashley, who’s our A/V expert. She was a huge help when I replaced the old Flash video player in AtoM.
- My first instinct would be to detect whether the original file is in a specific format or already compressed and normalize conditionally, but this would probably create some maintenance burden for us. We’d need to periodically revise the formats we let pass ("according to today’s 'best practices' this file format is considered safe").
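The conditional approach described above could look something like the following minimal sketch. It is not Archivematica code: the function names and the pass-through list are hypothetical, and the codec names are placeholders rather than a recommendation. It assumes the video codec of each file has already been identified by some characterization step.

```python
# Hypothetical list of codecs a site has decided to keep as-is.
# This is the part that would need periodic review as practice changes.
PASS_THROUGH_CODECS = {"ffv1", "h264"}


def should_normalize(codec_name, pass_through=PASS_THROUGH_CODECS):
    """Return True when the codec is not on the pass-through list."""
    return codec_name.strip().lower() not in pass_through


def plan_normalization(files):
    """Split (filename, codec) pairs into files to normalize and files to keep."""
    to_normalize, to_keep = [], []
    for name, codec in files:
        if should_normalize(codec):
            to_normalize.append(name)
        else:
            to_keep.append(name)
    return to_normalize, to_keep


# Example with made-up file names and codecs.
normalize, keep = plan_normalization([
    ("interview.mp4", "h264"),
    ("capture.avi", "rawvideo"),
    ("master.mkv", "FFV1"),
])
print("normalize:", normalize)
print("keep as-is:", keep)
```

Keeping the list in one place makes the maintenance burden visible: every change in what counts as "safe" is a change to that single set.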
System Administrator(s): Santi Rodríguez Collazo and Miguel Angel Medinilla Luque (Production Support Engineers, Dev Ops), Spain
Not normalizing video files might mean huge savings in disk space for most users, and budget restrictions are a higher risk to preservation efforts than we usually acknowledge, so a change like this will have a lot of benefits for digital preservation budgets. It will also have several ramifications for our day-to-day work. We'll need to update the tools we use for deploying Archivematica to reflect this kind of change, so users can configure their instances at will. It also means that we can save some disk space by re-ingesting those AIPs and normalizing them again to use different formats.
Aside from this change, there are other ways to minimize the risk of disk-full issues when dealing with large video files. It’s a good idea to separate the system directories from Archivematica directories. Setting system requirements is an important task before deploying the system: always make sure that the CPU, memory and disk resources are enough for large ingests. Another important task is to optimize the system disk, because it will be used heavily given the large size of the files and the processing involved when recording the videos.
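As one small illustration of checking disk resources before a large ingest, the sketch below compares free space on a volume against the expected size of the output plus a safety margin. The function names and the 1.5x safety factor are assumptions made for this example; `shutil.disk_usage` is part of the Python standard library.

```python
import shutil


def has_headroom(free_bytes, expected_bytes, safety_factor=1.5):
    """Return True if free space covers the expected output plus a margin."""
    return free_bytes >= expected_bytes * safety_factor


def volume_has_headroom(path, expected_bytes, safety_factor=1.5):
    """Check the volume that holds `path` against the expected output size."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return has_headroom(usage.free, expected_bytes, safety_factor)


# Example: is there room for roughly 200 GB of derivatives on the
# volume holding the current directory?
expected = 200 * 10**9
print(volume_has_headroom(".", expected))
```

Running a check like this separately for the system volume and the processing volume fits the advice above about keeping those directories apart.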
----
Together, their opinions and expertise are considered by our product owner, who is responsible for the roadmap of Archivematica. Her role is to consider the issue from the perspective of the open-source project and make sure our recommendations and opinions fit the needs of the project and community.
----
Product owner: Sarah Romkey (Archivematica Program Manager), Canada
From the perspective of Archivematica as an open-source project, it’s important that we examine problems rather than just accepting the “status quo.” Digital preservation as a practice, and Archivematica, are now old enough that thinking has changed; what made sense as a best practice a decade ago has modern-day implications that we must consider.
As product owner, I try to approach problems looking through two lenses:
- The Archivematica community: is the proposed change one that would make sense to most of our community members? Do they need to weigh in first?
- The Archivematica release cycle: what resources do we need to implement this, how does it fit among other priorities, and where are we in the release cycle? (New changes too close to the release date upend a lot of our release work!)
The way Archivematica implements preservation planning (through the highly configurable FPR) makes it easy for users to turn rules on/off, alter commands, etc. This is helpful, but it obscures the back-end work involved in making a change--turning off a rule is easy, but to make it “stick” for future releases, developers have to write a migration and analysts need to test that it works as expected.
On this particular issue we’re hoping to get some feedback from the community that will help guide the decision, because while it is a change that is “un-doable” by the user, we don’t want to catch users off guard.
----
Many people may have considered this question before and, figuring there was a good reason that they were unaware of, not asked it aloud. We try to make space for any questions and open the discussion to everyone, and this example has shown that bringing together different viewpoints allows us to change and improve practices. In my opinion, a collaborative environment in which all questions are welcome, expertise across a variety of areas is highly valued, and the development of staff is considered a core responsibility is the foundation of a successful digital preservation team.