I spent most of the latter half of November locked in a life-or-death battle with Microsoft Excel, arranging, sorting and making sense of entries to the BitList. I seem to have survived this ordeal-by-spreadsheet, even if myriad little boxes containing doubtful formulae dance restlessly through the half-darkness of my closed eyes.
The BitList is a simple enough proposition: a list of digital content and types that are called out for special attention because the digital preservation community has particular concerns over them. It’s a list created by open invitation to the whole digital preservation community and validated by an international expert panel drawn from the DPC membership. The impact is immediate but, because the DPC will maintain the list over the longer term, we can also move items up and down it and so track progress over time. The BitList therefore means we can identify honest-to-goodness problems and we can celebrate when those problems are resolved.
Two Tales of Digital Preservation
A few months ago, Sara Thomson tried to explain digital preservation to an immigration officer. That’s a brave thing to attempt, and if you’re reading this blog you can probably recall times when someone has asked you what you do. I remember being asked a few years ago how I would recover my data if a nuclear bomb were dropped on my home, my office and the off-site storage. In those circumstances I doubt data recovery would be my most pressing problem.
When you mention digital preservation at parties you normally get one of two reactions, based on one of the two prevalent narratives. On one hand there is the gloomy tale of the digital dark age: a coming time when the bits die and all that is good goes with them. Here, just for fun, are two people who should know better missing the point entirely (archivists in particular should look away). Regular readers know my view on that. It’s a good story and an eye-catching one if you want headlines, but mostly I think it’s overstated and, because it’s overstated, it’s ultimately self-defeating. It also crowds out the good news we should be telling about the many clever things that the digital preservation community has done. Here, for balance, is some of that from November 2016, and there will be more good news like this at the end of 2018 (watch this space). As a footnote, the digital-dark-age narrative is typically, and perhaps coincidentally, supported by strange but equally unconvincing claims made for some new super-duper long-term storage technology that is going to solve all our worries: as if a technical solution could ever be sufficient for a socio-technical challenge. You doubt me? Here’s the sort of thing I deal with all the time, also from a press office that really ought to know better.
So that’s the ‘Digital Dark Age’ version of the digital preservation narrative. The other narrative looks like this:
That’s not a mistake in your browser. It’s the kind of coverage that digital preservation receives when we’re not sounding the Digital Dark Age hooter. Silence. Blank. Empty. I suppose if I were being honest, the space would otherwise be filled with some new hateful invective against immigrants or the elderly or the poor. It’s almost depressing to note that we’re so insignificant that we’ve not even been deployed as a false-flag distraction from the latest misanthropic-misogynistic-abuse-of-power scandal. You can see why it’s so very tempting to pitch digital preservation as a catastrophe. Either way, the subtle and complex difficulties are lost, and the nuanced solutions making slow but impressive progress towards their resolution are invisible. Meanwhile, coming soon to a bitstream near you: the apocalypse.
I won’t hack away at this problem again because I’ve done it a few times and readers will be as bored reading it as I am writing it: not all our bits are sacred and not all our solutions are perfect. The analogue world is a messy and distracting place and the digital world is too. Our messages are complicated and partial and fluctuating, but that’s no excuse for disengagement.
Therefore, the BitList: an attempt by us to create a new set of stories around digital preservation that are recognisable, actionable and informed; an attempt at a narrative that can adapt in line with our own progress and which can track real and emerging challenges; a narrative that simultaneously celebrates success, pinpoints real concern, and, when necessary, sets off a great big, urgent digital preservation siren.
Overcoming Atelophobia (aka the fear of imperfection)
The BitList began life as a conversation at the DPC advocacy sub-committee, where we recognised that our advocacy within the digital preservation community – especially the biennial Digital Preservation Awards – was very effective, but our efforts at taking our successes to the wider public were just not connecting. This wider awareness raising is important because if we can strike the right tone in the public imagination then members will find their own local advocacy easier. Since the awards are held every two years, there’s room for another, more public-facing activity in the intervening years.
We wanted to try something a little different but with a reasonable chance of success, so we looked around for good examples of how to raise awareness in a targeted way that was nonetheless readily understood. The model of the IUCN ‘Red List’ has been in our minds for a while. This list provides a simple-to-understand classification of wildlife species, from ‘extinct’ to ‘least concern’ through various classes of endangerment. The classification is useful because it concentrates resources and attention; but it’s also useful because it’s dynamic, so species can move up and down the classification depending on the actions (or inactions) that follow.
But biodiversity is relatively well documented. It’s (theoretically) possible to assess an entire genus from top to bottom. That’s not so simple in the digital domain. There are some corpora and registries for the digital universe, but it’s hard to establish a taxonomy that doesn’t pre-judge the nature of the threats that digital objects face. For example, it would have been relatively easy to use a tool like PRONOM to map the digital universe and invite DPC members to rank the different file formats according to their risk of obsolescence. But that would assume file formats were the beginning and end of preservation risk, and it would create a narrative that privileges technical problems over organizational ones. There’s lots of technically robust digital content at risk because no one is properly tasked with its care, and there’s a lot of exotic and problematic data that survives in rude health because of the herculean efforts of committed preservationists. There are all sorts of reasons why content faces sudden and unexpected risks.
We had always imagined that the BitList would involve a bit of community engagement, so we turned to the community to help us solve two problems: to make proposals about what should go on the list, and to make sense of those proposals once received. The nomination process was kept deliberately simple to encourage as many nominations as possible.
Nominations for the BitList were open for the month of September, which handily enough encompassed the Archives and Records Association Conference in Manchester, PASIG in Oxford, RDA in Montreal, and iPres in Kyoto. The majority of nominations were submitted online but the conferences gave us the opportunity to stage a number of practical stunts with postcards and an ersatz post-box to encourage participation. We ended the month with just under 100 properly framed nominations.
We completed the first review of the pile in October, when a panel of DPC members ranked and commented on the nominations. The lightweight nomination process meant that jurors had to bring quite a lot of insight and commentary to the entries, and they were also invited to introduce a few late additions that in their view were important but had been overlooked. The jury were also tasked with weighting the criteria used to rank entries: for example, the jury wanted to make sure that we took account not just of the likelihood that material would be lost but also of the urgency of action and the consequences of loss. The nomination process also produced a fair degree of duplication, overlap and nesting, so the second phase of assessment consolidated and simplified the 100 or so entries down to around 20 distinct items.
I've Got a Little List
The list was published on 30th November as the DPC’s own contribution to International Digital Preservation Day and we’re very grateful to colleagues at the University of Glasgow who helped us announce the results live by webcast. You can dive into the recording and read the list for yourself: but in this post I want to share five reflections on the process and the outcome.
Firstly, and reassuringly, there are some familiar items on the list. It’s hardly a scoop if I reveal that portable magnetic media or proprietary software are high on the list of people’s concerns.
Perhaps less obvious, though familiar to anyone working in digital preservation, is the number of times human deficiencies are the root cause of our worries. The space once occupied by the risks of obsolescence or media rot is now crowded with ill-managed rights, out-of-control political interests, failing markets and simple human frailties. As we have become more confident in addressing technical challenges, so we need to turn a spotlight on human behaviours, whether in the political suppression of unwelcome truths or the obfuscation of historical evidence. Corporate failure is rising quickly as a worry. Over and above our technological accomplishments we need policy and regulation, which in turn means people – governments, organizations, companies, law makers – need to take responsibility for preservation. ‘We inadvertently deleted the records’ is no longer an excuse.
It’s also evident that the gap between digital and analogue has all but disappeared. The digital preservation community may seem to be muttering to itself about obscure worries and arcane digital hazards, but there are real-world consequences when these are ignored. Digital preservation is a life-and-death matter. Put differently, not everything on the BitList matters to everyone equally, but at least one thing on it will matter to everyone: so digital preservation matters to all.
I can’t tell you how often I have now been asked ‘what’s at the top of the list?’ As usual my answer is ‘it depends’. We’ve worked hard to avoid a BuzzFeed-style ‘top ten’. Instead we’ve associated broad groups of content with broad categories of risk and suggested a timeline for action for each. Where good practice is evident the risk reduces; in the presence of aggravating conditions the risk increases. We reserved the category of greatest concern, Practically Extinct, for material that presents significant technical challenges and has no obvious archival home. At first glance there are only two items in that group: Pre-WWW View Data Services (also known as Teletext) and Pre-WWW Videotex Data Services (such as Bulletin Boards). That is quite a measured analysis, and it might seem to understate the overall tendency towards digital loss. But there are ten items in the next category down, Critically Endangered. In the presence of aggravating conditions these items become Practically Extinct, and when you realise that things like ‘Politically Sensitive Material’ or ‘Unpublished Research Outputs’ (ie research data) are in this category, then we’re not far from sounding the alarm over a really large proportion of the digital estate. Each day presents new examples of how public figures (this month it’s Toby Young) use the delete key to distance themselves from their own unwanted narratives. This kind of material is critically endangered and all too often is deleted before preservation is enacted.
Finally, and perhaps not surprisingly, there was a lot more than we were able to process. The nomination process was deliberately open and the result was predictably anarchic and at times contradictory. We processed as many nominations as we could but we also had to meet a deadline: so a large remnant of material is now simply listed as ‘of concern’ – by which we mean legitimate concerns have been expressed but we’ve not had the time or the expertise to complete an assessment.
I was quite delighted to have completed the list and I hope you will find something useful or challenging in there.
Making Weather
We’ve been surprised and delighted by the immediate impact. Early in the process Sarah Middleton wisely suggested that we might want to prepare ourselves with some media training. She was right: as we were launching the list I quickly found myself in a soundproof studio in Glasgow talking about digital preservation on the BBC World Service. It’s hard to know the impact of media like that, but if the DPC can present the challenges and needs of the digital preservation community to a large audience, then there’s more hope that DPC members can make progress with their local, ongoing advocacy.
Just when I thought we had finished the list, I realised that it was going to become an ongoing job. We’ve had some very useful follow-up conversations with colleagues questioning or addressing items on the list, and they are worth sharing.
The best example (so far) is definitely the Teletext services that we had categorised as ‘Practically Extinct’. We used the term ‘Practically Extinct’ advisedly, because we wanted to invite correction and draw attention to those places where such content may yet be saved. Although we had asked around pretty thoroughly, it became obvious almost immediately that there was more to the Teletext example than we had realised, and we quickly came across a couple of initiatives where the data had been saved. We knew that a lot of the data could have been captured by the video recording services that record all broadcast output as it goes on air, but it was unclear whether there was any practical mechanism to recover or access it. This became more interesting when we were put in touch with an under-funded research project that had been trying to migrate the Teletext component out of the broadcast video recordings and into a searchable digital interface.
I won’t attempt to describe this as I will get the technical details wrong. But although I had worried that we might have offended those behind the project, they were on the contrary delighted that we had shone a light on this complex and overlooked issue. We’ve been told about a working prototype which manages this task, and we’ve challenged them to come good with a working, scalable demonstration that would allow us to change the initial categorisation. And this, it seems to me, is what success looks like for a project like the BitList.
BitList 2.0?
I think we should be maintaining the BitList, but at this point we’re open to ideas as to how that might be made to work. There are three things I’d like to propose:
- I’ll be honest and say that getting the list off the ground was a big effort, and we hope that the second iteration will be easier and simpler. So version two should involve fewer spreadsheets and perhaps more distribution of the effort.
- I’ll also confess that we were worried about over-cooking the publicity in 2018 because we were not simply creating the list but also the processes that produced it. I think we could make a bigger splash the second time around, because by then we’ll know better what we’re aiming at and can more confidently lean on the global digital preservation community to talk it up.
- I think the current list is our point of departure: we should be reviewing current entries and revising some of them. We also need to assess the large body of material listed in the ‘of concern’ category which merits closer inspection.
But at this point everything else is up for grabs. We’d love to hear your comments and criticisms: how the list has been useful (or not), the things we’ve missed, and the things we’ve simplified beyond recognition.
In 2017 I took to referring to the BitList as ‘BitListBeta’, so as to set my sights on a viable product without worrying too much if it looked a bit rough around the edges. I assured myself, and indeed the DPC Board, that a finished product would follow, based on the experience and wisdom gained. I realise now that it will always be Beta: only ever as good as the last iteration and always adapting towards some future point. So, with some awareness of irony, in making the case for permanence we’ve added to the sum total of things that face obsolescence. Such is the fate of all digital resources.
Sic transit gloria digitalis
I am grateful to Sarah Middleton who commented on a draft of this post before publication.