Last week I was happy to be able to tune into the Jisc-sponsored DPC briefing day on software preservation, ‘Insert Coin to Continue: A briefing day on software preservation’.
Did you know that DPC briefing days are recorded, and that DPC members can watch them live? This is quite a task, especially when working with a range of different venues, but it is so important for our international members and for those institutions that do not have a large travel budget.
I would urge DPC members to put relevant DPC briefing days in their diaries regardless of whether they can attend in person. Tuning in to the livestream is *almost* as good as being there. I found it hugely beneficial to be able to watch the presentations, listen to the discussion and connect with other attendees via the Twitter hashtag (bring your own popcorn).
I don’t claim to be an expert in software preservation, so it was helpful to start the day with a clear and comprehensive introduction to the problem by Neil Chue Hong of the Software Sustainability Institute, and then to hear about some of the great progress that is being made in this field.
Surprisingly, what I learned from the day was as much about how software preservation is similar to any other type of digital preservation as about how it is different. That was a comfort in many ways. Software is complex and hard for many of us working in the field of digital preservation to understand, and yet, if we can think about it in terms that we already understand, then half the battle is won!
Neil Chue Hong talked about the primary importance of understanding what the thing is that you are trying to preserve. This is of course the case with the preservation of any type of digital object – it is difficult to make preservation decisions until you know which elements of the item you would like to be able to provide future access to. In the case of software specifically, this could be the source code itself, the intent of the software, the experience of interacting with it, or an understanding of the technical environment within which a particular individual was working. Why you are trying to preserve the software is an important question to answer before you can start to consider how.
This theme was echoed by James Newman from Bath Spa University, specifically in the context of video games. He talked about a game’s need to be played and experienced. Preservation should not just enable a game to be played in the future, but should also capture how users interact with it now.
Neil Chue Hong talked about the rapidly evolving landscape of software. As in most areas of digital preservation, we are faced with a moving target. The way people develop and package software changes over time. Examples for research software specifically are the use of Docker (for containerisation) and Jupyter notebooks (which allow you to store your software code alongside visualisations and narrative text). As ever, the job of guardians of digital culture (such as ourselves) is to play catch up. We must continue to adjust our focus as the target and the technology keep moving.
James Newman mentioned that software should be preserved with all its flaws. Software usually isn’t perfect - there are quirks and bugs and things that go wrong. Indeed, for some games there are entire cultures built around revealing some of those idiosyncrasies. James demonstrated how you could jump through walls in Super Mario Bros. and move into a ‘minus world’ of the game that really shouldn’t exist at all. Whilst enjoying this hugely entertaining proof that it really is true, I reflected that this again is not a new thing for those of us working in digital preservation. Though in some contexts we might try to ‘correct’ files that do not validate or appear broken, in other contexts preserving an authentic version of a digital object ‘warts and all’ is the preferred approach.
Natasa Milic-Frayling of Intact Digital Ltd and UNESCO PERSIST spoke about ‘digital continuity’ rather than ‘digital preservation’. Digital continuity is all about continuing to enable use (which is of course also the main aim of digital preservation). Choice of language and communication is so important in advocacy for digital preservation, particularly when working across different sectors.
The urgent need to act now came out strongly in more than one presentation. Roberto Di Cosmo from Software Heritage claimed we are at a turning point and in danger of losing our digital heritage. Klaus Rechert from the University of Freiburg almost (but not quite) wheeled out the #digitaldarkage klaxon and directed us to the Google Cemetery as an example of how rapidly software dies.
Copyright is not an entirely alien subject to most of us working in the field of digital preservation and of course is a key consideration when preserving software. Brandon Butler of the Software Preservation Network and University of Virginia gave a US perspective on copyright law and the stifling ‘permissions culture’ in which archivists and librarians feel unable to act without permission from the copyright holder. With software, as with many other complex objects that we manage, rights can be challenging to unpick, and can be particularly hard to trace once a product is no longer considered profitable.
Of course, preserving software is difficult. Klaus Rechert referred to software objects being blurry and difficult to define. There is often no CD box you can put on a shelf. It is a complex ecosystem. He argued that we may need more than just emulation and migration to handle software preservation; we may need to look for a new way of dealing with this problem.
It was however encouraging to see examples of some of the great work going on in this area, for example:
- Emulation as a Service, starting with work at the University of Freiburg, but now in collaboration with The Software Preservation Network and Yale University. And this reminds me of the brilliant blog post on this topic from Euan Cochrane.
- The hugely impressive Software Heritage archive (possibly the largest source code archive in the world).
- A paper from Roberto Di Cosmo on the important subject of identifiers for digital objects.
- A paper on game play preservation and the Game Inspector from James Newman.
- Software preservation services from Intact Digital.
- The work of the Software Preservation Network (which was mentioned several times over the course of the day).
I like to come away from events such as this with a real sense of how we can move forward in practical terms. What I would say on the subject of software preservation is this:
- Software is complicated but in many ways similar to other digital content that we are used to managing over time.
- In the first instance it is important to collect software – we can’t preserve it in the future if we haven’t got a copy of it. So even if you don’t know too much about software preservation, don’t be put off collecting it (preferably with some documentation and metadata about what it does and the environment that it needs to run).
- As with any other type of digital content, we need to understand why we are preserving it and what exactly we are trying to preserve before we can consider how we might go about doing this.
- There is some really good work going on in this field already. Don’t reinvent the wheel: look at what the experts are doing and think about how they can help you.
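To make the point about collecting software with some documentation and metadata a little more concrete, here is a minimal sketch in Python of what a basic record might look like. It is not something that was presented on the day, and the function names, field names and the `.metadata.json` sidecar convention are all my own assumptions for illustration rather than any recognised standard.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def describe_software(path, purpose, environment):
    """Build a small metadata record for one collected software file.

    The field names here are illustrative assumptions, not a standard schema.
    """
    data = Path(path).read_bytes()
    return {
        "filename": Path(path).name,
        "size_bytes": len(data),
        # A checksum lets us check later that the copy has not changed.
        "sha256": hashlib.sha256(data).hexdigest(),
        # What the software does, in plain words.
        "purpose": purpose,
        # What it needs in order to run (operating system, dependencies, etc.).
        "environment": environment,
        "date_collected": date.today().isoformat(),
    }


def write_sidecar(path, record):
    """Save the record next to the file as <name>.metadata.json (hypothetical convention)."""
    sidecar = Path(str(path) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

You would call `describe_software` with the path to the file you have collected, a short description of what it does and a dictionary describing the environment it needs, then pass the result to `write_sidecar`. Even a record as simple as this captures the 'what does it do' and 'what does it need to run' questions at the point of collection, when the answers are easiest to find.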
I would urge you to have a look at the slides and the recordings of the presentations on the event page if you want to find out more.
All this information…and from the comfort of my very own desk!