Hrafn Malmquist

Last updated on 29 July 2019

Disclaimer: I must state that the following blog-post is written in a personal capacity, airing opinions that are my own and are not intended to endorse a particular piece of software. They should not be considered official on behalf of my current employer, The University of Edinburgh.

Last month, in June 2019, I attended the fourteenth Open Repositories (OR) conference held in Hamburg, organised by Hamburg University. Hamburg is a beautiful city, and the conference coincided with Hamburg University’s centenary.

It is one of the biggest conferences of its kind in the world and had a packed four-day schedule. It was the first OR I attended, and I delivered a presentation: “Automating OAIS compliant digital preservation using Archivematica and DSpace”. A bit more about that later. I saw many interesting talks, from both an ideological and a technical perspective (I am a developer, although I do have a background in library and information science). I’ll now proceed to tell you a bit about my experience at the conference.

DSpace

Like many I am excited about the next major DSpace release, version 7. Quite a large chunk of the OR program was dedicated to DSpace 7. On Monday there were 13 occurrences of the word DSpace in the program, on Tuesday there were 10, 14 on Wednesday and 2 on Thursday.

I should note that we use DSpace extensively at The University of Edinburgh. Version 7 promises exactly what I wish for in a repository, “a single, modern user interface and REST API and integrates current technological standards and best practices. This new UI combines with the existing core backend of DSpace 6, resulting in a lean, responsive, next-generation repository.” One could even say this is a blueprint for the current generation of repositories, because pairing a front-end framework such as Angular (or similar) with a REST API that adheres to HATEOAS principles (or similar) is fast becoming the industry standard.

No more dual front-end technologies, JSP and XML, that require cumbersome maintenance and added, unwanted complexity for developers. A complete decoupling of the front-end with a more predictable REST API will bring much more flexibility to developers. I am certain we will see interesting, exciting and hitherto unseen hybrid software products integrating with the DSpace backend.

I know the good people at DuraSpace, Atmire and 4Science, who lead its development, would welcome your contributions towards the development of DSpace 7, be it in the form of coding, writing documentation or testing. I am going to try to do my bit.

Longleaf

A presentation I found particularly interesting was about the Longleaf software project, currently in development by The University of North Carolina at Chapel Hill's University Libraries. Not many institutions have the capacity for in-house, purpose-built digital preservation software. From some rudimentary Googling, it looks like the University of North Carolina partnered with other nearby universities back in 2006 to gain economies of scale in higher education publication, and this partnership is now branching out to related services, such as digital preservation.

Longleaf is “a repository-agnostic command-line utility that enables users to configure and apply preservation processes such as monitoring and replication at the file level”. Its appeal lies in its simplicity: with a non-existent dependency list for Linux, it’s very much back to basics.
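
To give a feel for what file-level monitoring can mean in practice, here is a minimal sketch of a fixity check in Python. It is not how Longleaf works internally (I have not looked at its code), and the function names and the manifest layout are made up for illustration. The idea is simply to record a checksum for every file once, and later compare the files on disk against that record.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its digest.

    The manifest layout is an assumption made for this sketch.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded earlier, but no longer on disk.
        "missing": sorted(set(manifest) - set(current)),
        # Still on disk, but the content no longer matches the record.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # On disk now, but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In use, you would call `build_manifest` once when the files are ingested, store the result somewhere safe, and run `check_fixity` on a schedule. Anything listed under "missing" or "changed" is a file that needs restoring from another copy.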

In the spirit of open software and open knowledge, solutions like Longleaf for digital preservation are exactly what the world needs. Resisting the monopolies of higher education publication giants is, in my opinion, a general, civic duty.

Archivematica

The presentation I made was about another open-source piece of digital preservation software, Archivematica, developed by a Canadian company, Artefactual, using the bounty business model. Archivematica packages files for preservation or dissemination using a suite of third-party tools, following the OAIS reference model and utilizing industry standard technologies: METS, PREMIS and BagIt.

About five years ago the University of Michigan’s Bentley Historical Library received a $355,000 two-year grant from the Andrew W. Mellon Foundation to integrate ArchivesSpace, Archivematica, and DSpace (UMBHL project hereafter). You can read about the project in a very informative journal article, Bridging Technologies to Efficiently Arrange and Describe Digital Archives: the Bentley Historical Library’s ArchivesSpace-Archivematica-DSpace Workflow Integration Project. The Bentley Historical Library also has a great blog touching on various things preservation where they documented the process of contributing to Archivematica.

The UMBHL project was ambitious and introduced valuable functionality: the appraisal tab, which allows for content appraisal (although I understand that is conventionally done manually using Windows Explorer), and integration with ArchivesSpace and DSpace. Getting these different pieces of software to play ball is an important step that should not be underestimated.

Because there was only a solitary stakeholder in the UMBHL project, some properties of the new features are customised and could be more generalised. DSpace as the transfer destination is only supported for the AIP (and not the DIP), and even then it is split into two separate packages, the content on the one hand and the generated metadata on the other. This process was designed to link with the University of Michigan’s customised institutional DSpace 5, Deep Blue. Because of changes in REST API functionality between DSpace versions 5 and 6, it does not support DSpace version 6.

At The University of Edinburgh we wanted to extend the functionality already developed by the UMBHL project by allowing the DIP also to be deposited and by supporting the then current version of DSpace, version 6. We also made designating the destination DSpace collection more configurable: it can now be passed with the transfer in a metadata file (see documentation). This was included in the Archivematica 1.8 and Storage Service 0.13 release in November 2018. The imminent Archivematica 1.10 and Storage Service 0.15 release will include a bugfix and improvements.

We’re still testing our Archivematica-DSpace-ArchivesSpace integration. We will run pilot cases on various content, such as the papers from the University of Edinburgh's Court meetings, high-quality versions of thousands of digitized theses (compressed versions available at the Edinburgh Research Archive) as well as content produced by The European Ethnological Research Centre. Ideally in the future, all relevant metadata will be embedded in the files themselves, enabling more automated preservation processes and relieving the need for “sidecar files”, but that’s still some way off.

What are we guarding against?

Let’s take a step back from all the jargon and think a bit about our mission. Memory institutions serve a dual purpose: to ensure that both the historical records created by modern institutions and whatever it is we call culture are preserved.

It seems to me that data loss is much more likely to occur because of human error than because of technological faults or the obsolescence of either physical media or file formats. The mantra “lots of copies keep stuff safe” applies and will continue to apply for the foreseeable future. (The US-based National Digital Stewardship Alliance’s (NDSA) guidelines on assessing digital preservation status, which are currently under review, take into account the storage medium and the geographical proximity of copies.) This is the same idea behind legal deposit of printed material. You need golden copies at different geographical locations, because the world is a dangerous place.
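
To make the “lots of copies” idea a little more concrete, here is a rough Python sketch that summarises how well spread a set of copies is. It is only an illustration and not an implementation of the NDSA guidelines: the `Copy` record, its fields and the summary keys are all my own assumptions. It counts how many copies agree on a checksum and how many distinct locations and storage media those agreeing copies cover.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a file (a hypothetical record for this sketch)."""
    location: str  # free-text site label
    medium: str    # free-text storage medium label, e.g. "disk" or "tape"
    sha256: str    # checksum recorded for this copy


def summarise_copies(copies: list[Copy]) -> dict[str, object]:
    """Summarise how many copies agree and how widely they are spread."""
    if not copies:
        raise ValueError("at least one copy is required")
    # Treat the most common checksum as the reference value. On a tie,
    # Counter.most_common returns the checksum that was seen first.
    reference, _ = Counter(c.sha256 for c in copies).most_common(1)[0]
    agreeing = [c for c in copies if c.sha256 == reference]
    return {
        "agreeing_copies": len(agreeing),
        "distinct_locations": len({c.location for c in agreeing}),
        "distinct_media": len({c.medium for c in agreeing}),
        # Locations holding a copy whose checksum differs from the reference.
        "divergent_locations": sorted(
            c.location for c in copies if c.sha256 != reference
        ),
    }
```

A summary with only one distinct location, or with anything listed under "divergent_locations", is a prompt for a human to go and take a look, which brings us back to where the real risk lies.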

Consider the legal deposit system that evolved in Poland. Now, Poland is wedged between the great European superpowers Germany and Russia (no Brexit dig here). The practice of legal deposit started in early modernity, but today, in addition to the two copies that the National Library of Poland and the Jagiellonian Library receive, there are 15 other libraries that receive legal deposits to be stored for no less than 50 years: Maria Curie-Skłodowska University Library, University of Łódź Library, Nicolaus Copernicus University Library, Adam Mickiewicz University Library, Warsaw University Library, University of Wrocław Library, Silesian Library, City of Warsaw Library, Pomeranian Library in Szczecin, University of Gdańsk Library, Catholic University of Lublin Library, University of Opole Library and Podlaskie Library in Białystok. (source Wikipedia)

Just three years ago the question “What is file format obsolescence and does it really exist?” was being earnestly asked. In theory file format obsolescence is an issue. For instance, Jenny Mitcham has done interesting work on migrating from the obsolete WordStar file format (see this blog) for the Marks and Gran Archive at the University of York’s Borthwick Institute for Archives. These proprietary formats can perhaps be seen as analogous to the acidic paper once used in publication, the phasing out of which is often associated with the American chemist and conservator William Barrow.

The late 17th-early 18th century Icelandic manuscript collector Árni Magnússon, generally credited with single-handedly ensuring the preservation of tons of unique Icelandic and Nordic medieval manuscripts, once made the philosophical observation that:

“As the world turns, some men introduce errors and circulate them, while others afterwards try to correct those same errors. This keeps the whole lot pre-occupied.”

My point here is that the main technical threats of media and format obsolescence, while not entirely irrelevant, do not seem to pose an immediate threat to long-term digital preservation. The risk posed by human incompetence, material degradation and unforeseen disasters seems much more real. Someone once coined the phrase “The personal is political”. Well, if so, then “The archival is political”.

