In October 2022 I was privileged to join colleagues virtually for an event organized by UNISA in Pretoria, South Africa. The event marked International Open Access Week 2022 and had the title 'Open for Climate Justice'. This blog is a version of the paper that I presented. (Added 1/11/2022: The slides are available from UnisaIR at https://hdl.handle.net/10500/29530)
Thank you very much for the invitation to join you today to share some thoughts about the relationship between digital preservation and climate justice. Ansie’s invitation was very timely, not just because it’s International Open Access Week. This time last year I was invited to share some thoughts on this very topic on the fringes of the COP26 summit here in Glasgow, so it’s an opportunity to share how my thinking has progressed since then. Also, the environment was a main theme of the iPres conference in September, so it’s an opportunity to reflect on some of the emerging thinking presented there.
I’m going to cover a lot of ground in the next 30 minutes or so but it will all be published later today on the DPC blog so you can listen along or make notes as you please.
I aim to develop eight ideas.
Firstly, I need to define the digital preservation problem then make explicit the link to the main theme of climate justice.
Then I want to talk a little about openness in climate science. In my head this links to core themes of authenticity which are central to the mission of most archives; but you won’t fail to notice a wider issue about accountability and the challenges to climate justice that arise from vested interests. A spoiler: openness is going to emerge as a requirement.
I will then turn my attention back to more familiar themes in digital preservation, the relationship between preservation and disposal, and consider the opportunities that digital preservation creates to manage and reduce the amount of data we retain.
Digital preservation is more than storage. We can track energy consumption right across digital preservation workflows, and that has implications for how we might structure preservation.
We also need to recognise that energy consumption is not the only way in which digital technologies impact the environment. The virtual world is physical. The manufacture and disposal of computing equipment has a real and unsustainable environmental cost. I will explore some of these issues.
Towards the end I will take a brief detour into the history of digital preservation. This might seem indulgent, but it is not irrelevant to the climate crisis and demands for climate justice. Changes which will disrupt the digital economy will also disrupt our understanding of digital preservation.
Finally, I want to reflect on the DPC and how we’re beginning to make changes in our own work.
Digital Preservation 101
What’s the problem? Here is a list of the technical challenges that typically result in data loss. I am not going to dwell on it except to say two things: firstly, it’s a long list, and certainly there are things missing that don’t fit easily on the slide. I’ll leave you to reflect on which ones you’ve experienced and what the impact was. Also notice that some of these are emergent risks, so just when you think you’ve fixed it the problem changes. It’s fashionable therefore to think of digital preservation as a ‘wicked’ problem.
That leads to a brief definition of digital preservation. It’s “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” The words in bold are important. It’s a process not an event; it fits within a managed framework; it’s about access which means more than just backup or storage; and it’s for as long as necessary – not forever and certainly not everything. Digital lifecycles are short so lots of agencies which are not archives in the traditional sense have a digital preservation problem.
What’s in scope? An awful lot is in scope for digital preservation: born-digital and digitized; structured and unstructured; ephemeral and essential. Really any digital object where the lifecycle and use case is longer than the lifecycle of the infrastructure on which it was created.
The exercise I give students at this point in my introduction is to identify the things they have already lost and why they lost them; then think what digital objects they want to have available in 10 years and apply the learning from the loss to develop an elementary preservation pathway. I’ll leave you to do that on your own.
Climate Justice and Digital Preservation. Really?
You’ve asked me to talk about climate justice so perhaps you’re wondering how this relates. In two ways.
Firstly, from a wide angle, we don’t do digital preservation for the sake of the bits and bytes: the files will not thank us. For all our talk about formats and metadata and authenticity, digital preservation is about helping to achieve real world outcomes through successive generations of technology. Any long-lived purpose that relies on digital interactions is going to need digital preservation.
That connects digital preservation to the theme of climate justice more explicitly. Climate change is a temporal process, sometimes over very long time scales.
I am speaking today just a few miles from Loch Lomond which was the furthest extent of glacial advance in the last ice age known as the ‘Lomond Stadial’. So, perhaps somewhat against the trend you might expect, sea levels have been falling in Scotland for about the last 10,000 years as the earth rebounds from the weight of all that ice. You'll understand the concern as to whether that will continue. Whether it’s desertification, glacial retreat, sea level change, water quality or migration patterns: all of these need to be understood as processes through time. Even seemingly one-off events like extreme weather phenomena need to be contextualized to understand if their frequency and severity and distribution are changing.
We need long lived data, even if just to get a basic measurement on whether and how the environment is changing. It means climate models and predictions will be tested and refined over many years, perhaps decades. That means we need ongoing access to a vital, dependable and growing body of scientific output. Digital preservation is a core requirement of climate science, and by extension of climate justice also.
Framing
There are unique challenges associated with preserving long-range scientific data about climate change.
It stands to reason we would want to preserve such data because its value and usefulness grows through time. But universities here in the UK have faced quite serious ‘denial of service’ attacks because they have published findings or gathered data which might be considered hostile to vested interests, especially the carbon lobby. There have been concerted if obscure efforts to make research institutions think twice before engaging in climate science.
That’s a significant worry, and something we need to discuss openly at events like International Open Access Week. I wonder if this is a global issue? In any case, climate science is not devoid of bad actors and misdirection. Transparency is essential and I am not sure we always get that.
Here’s a short story to illustrate.
I spoke on the fringes of COP26 last year about archives and climate change. Later in the same workshop a consultant posted what seemed a very useful and actionable insight, claiming that ‘If each adult in the UK sent one less email per day that would prevent 16,433 tonnes of CO2 per year; equivalent to 81,152 flights from London to Madrid or 3,334 diesel cars off the road.’ The inevitable slogan: ‘think before you thank’.
That’s very eye catching, and who wouldn’t want fewer emails?
But it sounded a bit simplistic to me. The established wisdom is that the carbon footprint of any digital activity is based on the source of the energy and the carbon locked into devices rather than the nature or size of the objects involved.
How did this eye-catching but dubious ‘think before you thank’ slogan find its way into a workshop about archives and climate science?
Here’s what I have worked out. The claim was quoted from an IT consultant, who in turn had quoted the statistic from a press release about research that had been commissioned and promoted by a company called OVO. OVO is one of the largest energy firms in the UK. Despite its claimed green credentials, critics report that OVO’s power generation comes with a higher than average carbon footprint when compared to industry norms.
So what’s going on here? Why did the discussion get diverted into a fruitless discussion about deleting email? Quite ostentatiously, the slogan ‘think before you thank’ passes responsibility for carbon reduction onto the user. It makes no commitment or comment about what the energy generators should be doing. It’s not that the statistic or the research is bogus: I honestly cannot pass judgement, but it evidently sends us in the wrong direction.
In the media this is called framing. That’s what happened at our workshop. We were framed.
Here’s another example. In January this year OVO advised customers to reduce their energy bills by cuddling pets for warmth, “challenging the kids to a hula hoop competition”, “doing star jumps”, and “cleaning the house”. Incidentally, that blog post was hastily deleted: how’s that for transparency? At the same time, OVO reportedly made a profit of £600 million.
I can think of a better way to reduce energy bills. On the face of it, OVO could subsidise those bills by half a billion pounds and still be £100 million in profit.
It doesn't have to be like this.
CSC in Finland has a carbon negative data centre at their Kajaani facility which is powered entirely by a hydroelectric plant and provides energy and heating to the local community as well as data services to the higher education and research sector. CSC is a not-for-profit in public ownership, and it does not employ expensive public relations firms to engineer eye-catching slogans.
There are two points here.
Firstly, this is what the debate is like. A nuanced discussion about the carbon costs of data management was derailed by the third-hand repetition of an unconvincing but eye-catching claim based on research promoted by an energy firm heavily invested in carbon. Anja, I’d love to know if this is also your experience. A little bit of transparency goes a long way. You might even think that, if energy firms spent a bit more on infrastructure and a bit less on shareholder dividends and salaries (not to mention misinformation) we'd all be better placed.
Secondly, there is a very strong intellectual case for preserving scientific data about the environment. It’s a complicated task, made even more complicated by the need to understand and be open about the entire lifecycle journey of the data from inception. In fact, if the data is not open, and the methods and origins not entirely transparent, then there’s every reason to be suspicious.
Climate justice needs climate honesty. Digital preservation can help with that.
Storage
You’ve come here for a talk about digital preservation and that’s a topic I am better able to address so I should return to my main theme.
On the face of it, digital preservation might sound like keeping lots of data for a long time and that sounds energy intensive. That could be true, but only if you did it badly. Here’s a paradox: digital preservation enables deletion.
Good digital preservation is also about keeping control over the digital estate and thereby creating permission to dispose – what to get rid of and when to get rid of it. This is the opposite of unmanaged proliferation.
One of the most frequently asked questions in digital preservation is how many copies should we keep? A rule of thumb would suggest three copies, on the basis that you need to poll two files to establish if the third is corrupt. There are other ways to do this which could allow you to have only two copies, and you might well need more copies for high-value or high-risk assets. The more copies you have the greater the overhead to maintain them so, taken to extremes, too many copies can mean greater risk not less.
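To make the three-copy rule of thumb concrete, here is a minimal sketch of the polling idea: compute a checksum for each copy and treat the odd one out as suspect. The function names, the site labels and the sample content are all hypothetical, and SHA-256 is my assumption rather than a requirement; a real repository would read files from storage rather than hold bytes in memory.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a copy's content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest disagrees with the majority.

    Raises ValueError when no digest is held by a strict majority of copies,
    because in that case polling alone cannot say which copy is intact.
    """
    digests = {name: sha256_of(data) for name, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority: cannot tell which copy is intact")
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical example: three copies held at three sites, one damaged.
copies = {
    "site_a": b"climate record 1998",
    "site_b": b"climate record 1998",
    "site_c": b"climate record 1998 (bit flipped)",
}
corrupt = find_corrupt_copies(copies)
```

With only two copies that disagree, the function raises an error because neither can outvote the other. That is why a two-copy approach needs some other reference point, for example a checksum recorded at the time of ingest.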
On the face of it, digital preservation implies three times more data than we started with. That could quickly get out of hand. But let’s remember that we’ve already selected the most important parts of the digital estate for preservation: those parts which we don’t want to leave to the mercy of bitrot, format obsolescence or corporate disruption. The three-copy-mantra applies to a small percentage of the collection. That selection process gives us permission to dispose of the rest.
One might be tempted to think of storage as energy intensive. It doesn’t have to be. The type of storage and the source of the energy are more important than the scale of the data.
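A rough worked example may help here. The sketch below multiplies data volume by the energy the storage medium draws and by the carbon intensity of the electricity supply. Every number in it is an invented placeholder chosen only to show the shape of the calculation, not a measurement of any real service, and the function name is my own.

```python
def annual_storage_emissions_kg(terabytes, kwh_per_tb_year, grid_kg_co2_per_kwh):
    """Estimate yearly operational emissions for stored data, in kg of CO2.

    terabytes:            volume of data held
    kwh_per_tb_year:      energy drawn per terabyte per year by the storage type
    grid_kg_co2_per_kwh:  carbon intensity of the electricity supply
    """
    return terabytes * kwh_per_tb_year * grid_kg_co2_per_kwh


# Illustrative placeholders only (assumptions, not real figures):
# a smaller collection on always-on disk, supplied by a carbon-heavy grid
coal_disk_kg = annual_storage_emissions_kg(
    terabytes=100, kwh_per_tb_year=50, grid_kg_co2_per_kwh=0.8
)
# ten times as much data on mostly idle tape, supplied by a low-carbon grid
hydro_tape_kg = annual_storage_emissions_kg(
    terabytes=1000, kwh_per_tb_year=5, grid_kg_co2_per_kwh=0.02
)
```

Under these made-up inputs the larger collection produces far lower emissions than the smaller one, which is the point: the two multipliers for storage type and energy source can outweigh the volume term. The sketch covers operational energy only and ignores the carbon embodied in the equipment.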
There are some good, and some very bad, practices in the management of data centres and cloud computing. There are economies of scale: a small number of large data centres will generally consume less energy than a large number of small ones. This is one of the claimed benefits of cloud computing. A data centre consumes the same amount of energy as around 5000 homes, not to mention the local environmental costs. Let’s not think that the cloud is some ethereal and intangible summer’s day. I’ll let Steven take this up as it’s his specialism.
We have to be a bit sceptical about the claims which cloud service providers make about their electricity supply but let’s look at some examples.
For example, data centres operated by Google and Apple in North Carolina take energy directly from the state’s power grid, which in turn draws around 50% of its supply from coal and 39% from nuclear. Now if you are already in North Carolina, you would still achieve carbon reductions by using a cloud data centre over a proliferation of local networked drives in a series of small data centres. But for reasons already mentioned, it would be ecological madness to use a service in North Carolina if you were in Finland and had access to a service like the CSC data centre in Kajaani.
More than storage
Digital preservation is more than storage, so the carbon footprint of digital preservation is not just storage either. Every touchpoint in a digital preservation workflow requires energy – ingest, migration or access for example. These workflows are dependent on use cases so it’s not wise to make sweeping statements.
You could reasonably argue that file integrity is a low priority and that files don’t need to be monitored continuously. But there are some high-value or high-risk environments where the chain of custody is critical and continuous monitoring is essential. There may be highly complex data sets with lots of small parts: do you need to check the integrity of all the parts? Similarly, some data sets, like audio-visual collections, tend to be predictable but large. Computing across a large or complicated data set is going to require processor time and the larger the data set the more energy that will require. Can we reduce the data?
Migration is another example, making intensive use of processor time and therefore consuming energy. Should we migrate and standardize files on receipt at the repository, or do we migrate only when the need arises? There are arguments both ways depending on the specific use case.
In a similar way, instant access means spinning disks and if it’s global access then it’s likely to require data cached in numerous locations around the world. That’s very intensive for energy consumption as against tape or offline disk storage. Offline storage comes with slower access so is less good for the user but healthier for the planet.
It’s commonplace to describe digital preservation as the management of risks. It’s time we included environmental costs within the risk assessment.
A small but important literature has emerged in the last two years with practical guidance and worked examples of how to measure and therefore reduce the carbon footprint in digital preservation. For example, Alex Kinnaman and Alan Munshower have measured the carbon cost of different elements in the digital library programmes at Virginia Tech University Libraries, which has had implications for both collecting and digitizing (iPres 2022 forthcoming). At a different scale, Tamara van Zwol and colleagues have described emerging efforts to calculate the environmental impact of digital preservation in the Netherlands (iPres 2022 forthcoming). Both have picked up from work by Keith Pendergrass and colleagues who proposed a range of questions and possible answers, reading the digital preservation requirements and expectations of cultural heritage organisations against current scientific work on climate change and the environmental impact of ICT. This literature is of relevance also to digital preservation service providers: Matthew Addis has reflected on how emerging concerns about the environment have had an impact at Arkivum. There’s also a clear move to codify these issues more explicitly as good practice in benchmarking tools like the DPC RAM. This signals rapid progress after years of very little.
This leads me to question some of the assumptions that lurk beneath the surface in digital preservation. We have a habit of treating objects as either preserved or not preserved, and of treating repositories as ‘trusted’ or not. We need some middle ground. Imagine a sort of graceful or planned decay: files which are checked occasionally, formats that are not migrated unless someone really needs them, access which is slower but more sustainable; imagine a repository which runs calculated risks and has a published threshold of acceptable and orderly loss.
My argument is that digital preservation strategies provide an informed and sustainable basis for reducing your data footprint, and moreover they should be better able to adapt to the wider costs of carbon consumption too. This stack of ideas makes sense in its own terms and it is shot through with ethical considerations at every stage: who decides what to keep? who decides what could be left at risk? what users should be asked to wait? who is served first? It's not yet clear to me that digital preservation has a sufficient framework to make these decisions justly and transparently. Serious digital preservation practitioners ignore ethics at their peril.
Climate justice can learn from digital preservation, especially through the preservation of climate data; and digital preservation needs to think about climate justice too.
More than energy
So far I have mostly discussed energy consumption. I can’t not mention questions about the extraction of rare elements and e-waste. I am straying far from my comfort zone but the title of this event ‘Open for Climate Justice’ encourages me to be ambitious.
Digital services can claim to be ‘carbon neutral’ on an operational basis, but that excludes the physical components upon which these services depend. They don’t generally include the concrete and steel that are needed in the construction of a data centre, nor the copper, lithium and the rare earth elements on which modern electronics depend. The environmental impact of their extraction is well documented but complicated global supply chains mean that the glossy products we consume are seldom linked to the pollution caused by extraction.
There are significant geopolitical issues here about the ownership and processing of ‘rare earth’ elements which are essential in computing. In the 1990s Deng Xiaoping noted that ‘The Middle East has its oil; China has its rare earths.’ In so doing he anticipated the ban which China has since imposed on exports of rare earth elements. You might also be aware that some of the world’s largest deposits of lithium are in the Donetsk region. There’s a reasonable perspective that Russia’s attack on Ukraine earlier this year was the world’s first war after oil.
If anything, disposal is worse. The Forum on Waste Electrical and Electronic Equipment estimates that 5.3 billion mobile phones will be discarded this year. One third of the mobile phones in Europe are no longer in use. The United Nations estimates that less than 18% of e-waste is recycled annually. It’s hard to see how we will make the UN target of recycling 30% of e-waste by 2023.
So, whether from the perspective of extraction or disposal, the technology stack seems unsustainable. Something has to give.
What might this mean for digital preservation? Some of the most striking developments over the last few years have been in DNA storage, which was well represented at iPres this year, emerging also as a serious competitor in the Digital Preservation Awards. There’s a white paper from the DNA Storage Alliance which makes interesting reading.
You’ll understand that my job exposes me to a lot of excited engineers telling me that they’ve got a new technology that will last forever. I am a bit weary of engineers promising eternal life.
Perhaps DNA storage is different. It’s cheap to store massive volumes of data; it’s stable over a very long time; it’s easy to replicate; it can be stored at room temperature; and it doesn’t produce anything like the same volumes of e-waste. It also disrupts the data storage industry and established wisdom about preservation. So perhaps there is reason to be hopeful. I don’t want to underestimate the challenges to this new technology, nor do I fully understand yet the different kinds of risks that may be involved: digital preservation is a relatively conservative community and can be quite demanding of new entrants into the market, but this seems like a significant change.
You might well accuse me of having unreasonable aspirations in my desire to reduce the overall environmental impact of technology. I don’t just want a data centre which is ‘carbon neutral’ during its lifetime: I want one that is sustainably sourced and which can be composted afterwards. But we also have to understand the seriousness of the crisis that is coming, that is already here. Everything is at risk, and everything has to be up for discussion. It’s hard to imagine anything more important. We should tolerate no shortage of ambition.
Digital Preservation Origins
It may seem extravagant to be talking about digital futures when the environmental present seems so challenging. It occurs to me that digital preservation is entangled with bigger economic and social challenges which we ignore at our peril. If we succeed in addressing some of the underlying economic issues then we will also transform the digital preservation ones.
Digital preservation emerged in the mid-1990s in response to the widespread move from analogue to digital, brought about by the home computer and the Internet. Although we spend a lot of time resolving issues of technology and meaning-making, the origins and causes of digital preservation are a response to that digital shift. Digital preservation arises as a response to the accelerating cycles of innovation, adoption and disruption which have characterized information technology in the last fifty years. Market forces lock us into short lifecycles of technology and speculative business models. It’s the historical context in which we take obsolescence for granted, witnessing the unsought creation and unapologetic deletion of core infrastructures at the whim of profit-seeking. It’s fashionable to refer to this as late capitalism. I have previously described digital preservation as an insurgency against the deeply embedded economic forces which sit behind technology. Digital preservation is an obsolescence rebellion against non-renewable consumption.
There are assumptions about the continuation of these unsustainable economic outcomes embedded into our professional practice and our institutional plans. The climate crisis challenges these. The last fifty years of boom-and-bust technological innovation are not likely to be the historical norm. A pivot to sustainable long-term business models in the technology sector will alter how we approach preservation.
The better long-term answer will be to render obsolescence obsolete.
Sustainable Development and the DPC
Finally, to the DPC. I will freely admit our own work on climate justice has been insufficient. I spoke to a session about this at iPres in 2010, but it was an empty room: no one seemed very interested. It didn’t occur to us to include carbon costs into the cost models and risk management tools we have developed over the years. That’s changing quite rapidly as we saw at iPres this year in Glasgow.
As for the DPC, environmental impacts are now embedded within our Rapid Assessment Model for organizational benchmarking. DPC members are being invited to include policies about energy consumption within their digital preservation activities. The DPC’s new strategic plan explicitly commits us to the Sustainable Development Goals, and we’re going to set these not simply as aspirations or values as they were before, but as auditable goals for which we will be accountable.
The term ‘Anthropocene’ has been proposed as a new geological epoch defined by the nuclear age.
The question arises, how will future generations remember the Anthropocene, if at all? Will there be more than just a fossil record? We have to hope that there is, and we should not limit our actions to hope alone.
Digital preservation has a role to safeguard that memory but it’s not for the sake of the data. The digits won’t thank us. Our ultimate purposes are the healthier, wealthier, safer, fairer, more transparent outcomes upon which climate justice depends.