Matthew Addis is the Chief Technology Officer at Arkivum.
Concerns are frequently raised that the way humanity generates and uses data is simply not environmentally sustainable and is a significant contributor to greenhouse gas emissions and climate change. Archives and other organisations that hold and provide access to digital content are justifiably concerned about the part they play and how they can be more environmentally sustainable.
The argument will often go a bit like this:
- if you do anything with data – store it, process it, access it, use it – then you inevitably use Information Communication Technology (ICT);
- ICT is a significant part of global energy usage, and if ICT were a country, its energy usage would be behind only that of the US and China;
- this energy usage results in lots of greenhouse gas emissions, much of which comes from energy-hungry data centres and data networks powered by non-renewable sources such as coal, and currently totals more than that of the whole airline industry;
- the rate at which humanity is generating, storing and using data is increasing exponentially;
- and therefore, we are inevitably headed down the road of using ever more energy and having an ever-growing and detrimental impact on the environment.
Worst case scenarios include headline grabbing figures such as communications technologies contributing up to 23% of greenhouse emissions by 2030.
Actually, a great deal of care is needed with what seems to be a simple and straightforward argument like this - things aren't necessarily as bad as they look - and more on that below. Nevertheless, environmental sustainability of data-centric activities, including digital preservation, is undeniably something we should all be concerned about.
What is the environmental cost of preserving digital content?
Memory institutions and other long-term holders of digital content use ICT when doing digital preservation, when archiving digital content, and when providing access to their digital holdings. Doing anything with digital content involves ICT in some form or another, and there is always an environmental cost. Organisations are understandably concerned about the environmental sustainability of what they do. This is fuelled (yes, any excuse for a pun) by often inaccurate or pejorative media coverage and doomsaying about how global data centre operators and cloud providers are destroying the environment.
But are things really as bad as they seem? What practical steps can be put in place to help make digital archives more environmentally sustainable? How can environmental sustainability be aligned with other priorities and challenges that archives face such as limited budgets and staff?
The DPC recently hosted a great webinar on environmentally sustainable preservation, which involved the authors of a report published last year on this topic, and there's a write-up of the webinar by Jenny Mitcham here. The underlying report contains a very detailed and well-researched analysis, including some proposed 'paradigm shifts' for the way we do digital preservation, the way we consider permanence, and how we should provide access to digital content in our archives.
The report includes a detailed piece on the environmental impact of ICT and rightly takes the approach of considering the whole lifecycle of ICT equipment as well as the environmental costs of running that equipment in data centres. The environmental costs of extracting and processing the raw materials that go into computer hardware, along with its eventual disposal and, ideally, recycling, should not be underestimated. This is sometimes called the 'embodied footprint' of ICT and is in addition to the footprint of the 'use' stage of operating the ICT, e.g. greenhouse gas emissions as a result of electricity consumption by data centres. For consumer devices, e.g. mobile phones, the 'embodied footprint' can be equal to or greater than the 'use' footprint, but for ICT equipment used in data centres the proportion is lower.
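To make the distinction between the 'embodied' and 'use' footprints concrete, here is a toy calculation in Python. All of the numbers are invented purely for illustration; they are not figures from the report.

```python
# Toy model of lifetime carbon footprint: embodied + use.
# All figures below are illustrative placeholders, not real data.

def lifetime_footprint_kg(embodied_kg, annual_use_kg, years_in_service):
    """Total footprint = one-off embodied cost + accumulated use cost."""
    return embodied_kg + annual_use_kg * years_in_service

# Hypothetical consumer phone: a short service life means the
# embodied cost dominates the total.
phone = lifetime_footprint_kg(embodied_kg=60, annual_use_kg=10, years_in_service=3)

# Hypothetical data centre server: a long, heavily utilised life means
# the 'use' stage dwarfs the embodied cost.
server = lifetime_footprint_kg(embodied_kg=1000, annual_use_kg=1500, years_in_service=6)

print(phone)   # 90 -> embodied (60) is two thirds of the total
print(server)  # 10000 -> use (9000) dwarfs embodied (1000)
```

The point of the sketch is only that the same formula produces very different splits depending on how long and how intensively the hardware is used.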
The report also cites a commonly used statistic that ICT accounted for 7% of global electricity production in 2012, with a trend of this rising to 12% by 2017. But this isn't the case – energy consumption for ICT is currently estimated at 1-2% of global energy production. Data centres are not the villains of the piece either. A recent report published in the journal Science aims to set the record straight and shows that from 2008 to 2018 global energy use by data centres was largely flat – it hasn't spiralled out of control as some predictions previously suggested it would. During 2008-2018, data centre storage capacity increased 25 times, but that has been offset by improved data centre energy efficiency (lower PUE), increased storage densities and increased compute power per watt of electricity. The result is the ability to store and process a lot more data at little or no additional energy cost.
I presented a common and naïve argument at the start of this blog which implied a direct and proportional link between the increased amount of data that we create and store and an increase in the energy consumed by data centres. This is false, or at least it has been for the last decade! Of course, there is no guarantee it will remain so.
Is the cloud the answer?
The report from Pendergrass et al. suggests that there is much still to be done in the use of clean energy by data centres and that some are located where electricity is cheapest rather than cleanest. This is true, but again huge changes are underway. There are now numerous examples of the global 'energy hogs' using renewable energy and cooling, and locating data centres specifically so they can operate in an environmentally friendly way. This includes:
- Running their data centres in cold climates such as Scandinavia or Iceland
- Using renewable energy such as solar or hydroelectric power
- Operating at extreme levels of efficiency through 'hyperscale' techniques.
Microsoft even announced earlier this year that it will become carbon negative by 2030.
Cloud providers are becoming increasingly driven to reduce the environmental impact of their operations (often because it also reduces their financial costs and improves their bottom line) and they are increasingly transparent about what they do. You can see this in the Greenpeace ClickClean report that assigns green ratings to major internet companies. Some large data centre operators such as Google are already coming top of the class with A grades.
I presented another common and naïve argument at the start of this blog which implied that the environmental impact and greenhouse gas emissions of ICT are directly proportional to energy consumption. This is also false. Global data centre and network operators such as the big cloud providers all use a vast amount of electricity, but it does not follow that this results in a similarly vast carbon footprint.
The environmental impact of the 'production' of ICT (the embodied footprint) and the 'use' of that ICT are also connected, and this is often overlooked. The more efficiently ICT is used, e.g. in modern hyperscale data centres, the less ICT hardware is needed to store and process a given amount of data. Likewise, the scale at which large cloud providers operate also means environmental efficiencies in the supply chain for the ICT kit that they consume, e.g. in its manufacture, distribution and eventual disposal. This is unlikely to be possible for smaller-scale data centre providers or when operating kit 'in house' within archives. There is a 'double whammy' here – using hyperscale cloud providers can be friendly both in terms of using renewable energy and in terms of using less physical hardware. Surveys suggest that up to 30% of servers sit idle, which is especially true in on-premise environments – cloud providers can't afford to let this happen in their environments.
I might be sounding like a cloud evangelist at this point. I'm certainly not suggesting that cloud data centres are a complete solution to the environmental sustainability of digital preservation. However, I do think it is important for archives to think critically about the environmental sustainability of in-house ICT operations compared to what the major cloud providers are currently achieving. The cloud is just 'someone else's computer', and archives (or their IT departments) should look hard at whether they can do a better job than Google, Microsoft, Apple or the other Greenpeace 'A rated' hyperscalers.
There is no panacea here, and there are risks. Even zero-carbon data centres are not free from environmental impact, especially in areas like the manufacture and disposal of the equipment they use. There is a risk that Jevons Paradox kicks in and archives start consuming more resources simply because they believe it is environmentally free to do so. If there is zero environmental impact, then why not simply keep everything? But we are still some way off zero environmental impact for digital preservation and archiving, and it behoves us all to look critically at what we do in the meantime.
What can we do to make digital preservation more environmentally friendly?
Much of this blog has been based on thoughts that I’ve had following a recent webinar as part of the DPC Digital Preservation Futures series where I was asked by the DPC how Arkivum can help content holding organisations with environmental sustainability.
My answer back then was roughly two-fold.
The first part centred on efficient use of cloud services in order to minimise energy-hungry computing and storage of digital content. Basically, I argued that use of hyperscale cloud can help with both the 'production' and 'use' impacts as outlined above.
The second part was more along the lines of the 'paradigm shifts' that Pendergrass and colleagues present in their report. However, I put things a bit more bluntly and in order of merit. Basically, I said that the best way to run an environmentally sustainable digital archive is not to store anything in it – and if you really do have to store content, then don't allow anyone to access it! This would appear to fly in the face of digital preservation, which is all about keeping stuff and enabling people to access and use it. Not so.
An extended, slightly nuanced, but still quite blunt version of my suggestions is below.
- Don't store content if you don't have to. By far the best way to reduce the environmental impact of digital preservation is not to preserve content in the first place. The less you preserve, the smaller the impact. No digital preservation is free from environmental impact – not even squirrelling content away in a dark archive. Create a simple checklist when considering adding content to an archive: 'do I already have it, or something like it?', 'does someone else already have it in their archive?', 'does it have enough value to justify the cost of keeping it?'.
- If you do keep content, then delete it as soon as you can! This is basically point 1 again, but done at a later date. Use retention schedules to ensure content gets reviewed on a regular basis. If you don't need to store it any more, then get rid of it. When content is up for review, ask whether it is actually being used and whether you really need to keep it. Regulated industries have been doing this for years, for example as part of records management – it's not as if processes and systems don't exist for doing this. In many respects, 'idle content' is as bad as the 'idle servers' that I mentioned above.
- If you do keep content, then think carefully about how and where you keep it. Think critically about cloud vs. on-premise, especially the possible environmental benefits of using multi-tenanted hyperscale public cloud. But look carefully at the cloud provider, e.g. using the ClickClean report. Cloud providers operate their data centres in many different geographical locations and there is often the ability to select and control exactly which locations your data will be stored in.
- Avoid unnecessary preservation activities, e.g. don't generate extra preservation derivatives (normalisation) unless you really need to. Don't create an excessive number of copies of your data or fixity check them unnecessarily. On the DPC webinar I called this 'Preventing Pointless Preservation'. Every preservation action has a carbon cost to some degree because everything involves using ICT resources. Consider 'Minimum Viable Preservation' and look at Minimal Effort Ingest approaches. Dial back your NDSA preservation levels or DPC RAM service capabilities for different content types or content uses where sensible. Procrastination can be the environment's friend: don't do today what you can put off until tomorrow!
- Don't provide instant online access to every bit of content in your archive. Content that doesn't need to be accessed quickly can be pushed down onto 'archival storage', which is much more energy efficient and also uses less IT kit. It is also a lot cheaper. The environmental cost of network access to online content is significant – it might be free to put content on social media channels, but that doesn't mean there is no carbon cost when people access it.
- Consider creating low-footprint versions of your content for access. For example, use normalisation to create lower-fidelity access derivatives for images, audio and video. These go online, which leaves the 'master versions' free to be pushed down onto lower-impact archival storage. Smaller access versions take less storage space, need less computing power to serve up, and use less network bandwidth. The AV community, e.g. broadcast archives, has been doing this for years and has a model of multiple tiers of content where infrequently accessed high-res preservation and archive masters are only stored on deep archive storage tiers. The same approach can be applied to other content types too.
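The placement ideas in the last two points, pushing rarely accessed content and preservation masters down to low-energy archival storage while serving users from access copies, can be sketched as a simple data placement policy. This is an illustrative sketch only: the tier names, thresholds and rules are my own assumptions, not any particular product's behaviour.

```python
from datetime import date

# Illustrative placement policy: choose a storage tier from how recently
# an item was accessed and whether it is a master or an access copy.
# Tiers in decreasing order of energy cost: online, nearline, deep-archive.

def place(last_access: date, is_master: bool, today: date) -> str:
    idle_days = (today - last_access).days
    if is_master:
        # Preservation masters go straight to low-energy archival storage;
        # users are served from smaller access derivatives instead.
        return "deep-archive"
    if idle_days > 365:
        return "deep-archive"
    if idle_days > 30:
        return "nearline"
    return "online"

today = date(2020, 6, 1)
print(place(date(2020, 5, 20), is_master=False, today=today))  # online
print(place(date(2020, 3, 1), is_master=False, today=today))   # nearline
print(place(date(2019, 1, 1), is_master=False, today=today))   # deep-archive
print(place(date(2020, 5, 20), is_master=True, today=today))   # deep-archive
```

In practice this kind of rule is usually expressed as a lifecycle or data placement policy in the storage provider's own configuration language rather than in application code, but the logic is the same.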
A paradigm shift or an evolution of what we already do?
Very roughly speaking, points 1 and 2 above come under the Pendergrass et al. paradigm shift of 'appraisal', 3 and 4 come under their paradigm shift of 'permanence', and 5 and 6 come under their paradigm shift of 'availability'. But I'm not sure I'd consider any of the above real paradigm shifts, although they will often need cultural change. These sound like big changes because the measures above are pitched in terms of minimising digital preservation and access in order to reduce environmental impact. But often there are other good reasons for doing these things anyway, and many archives will already be doing some of the above because of financial or staffing imperatives.

This is an important point: many environmentally sustainable approaches to digital preservation have positive monetary benefits too, which means the environmental agenda can be aligned with the day-to-day financial sustainability and cost reduction that many archives already have to face. Many of the points above reduce the amount of data being stored, the amount of processing that needs to be done when ingesting content, and the amount of ongoing work needed to keep content accessible and usable. All of these reduce the financial costs of digital preservation. This alignment is likely to help enormously with the business case that archives may need to make to their stakeholders for changing the way they operate. It also makes it easier to quantify and measure change, e.g. when looking to shift practices and culture at an individual or organisational level.
Where does Arkivum fit into the picture?
The combination of financial and environmental benefits of doing a better job on environmentally sustainable digital preservation applies to Arkivum as much as it does to our customers. We already use many of the approaches above, and our solution enables customers to run their digital preservation activities in a more environmentally sustainable way. This is not the place for a sales pitch, but a few of the things we do include:
- deployment with a choice of public cloud infrastructure (AWS, Microsoft and Google);
- multi-tenanting customers to increase efficiency and avoid 'idle servers';
- supporting tiered and archive storage using data placement policies;
- holding a reduced number of copies of data where guaranteed long-term data integrity is less of a concern;
- user-specified retention schedules and policies for deletion;
- automatic deduplication to avoid storing the same content multiple times;
- configurable preservation policies that can be turned off as well as on;
- the ability to generate and host access versions of content that are managed separately from preservation and archive masters.
This helps us minimise our own costs and environmental footprint, and enables our customers to optimise their approach.
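One of the measures mentioned above, automatic deduplication, can be as simple as keying stored files by a hash of their bytes, so identical content is only ever stored once. This is a minimal sketch of the general technique, not Arkivum's actual implementation; the class and method names are invented for the example.

```python
import hashlib

# Minimal content-addressed store: identical byte streams are stored once,
# so depositing the same file twice costs no extra storage.
class DedupStore:
    def __init__(self):
        self._blobs = {}    # sha256 hex digest -> payload bytes
        self._catalog = {}  # item name -> digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # store the payload only once
        self._catalog[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._catalog[name]]

    def unique_blobs(self) -> int:
        return len(self._blobs)

store = DedupStore()
store.put("report-v1.pdf", b"same bytes")
store.put("report-copy.pdf", b"same bytes")    # deduplicated against v1
store.put("report-v2.pdf", b"different bytes")
print(store.unique_blobs())  # 2
```

Real systems deduplicate at the chunk level rather than whole files and have to handle deletion and integrity checking, but the storage (and therefore energy) saving comes from the same idea.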
And therein lies my final point. Digital preservation and archiving are increasingly done through extended and complex supply chains that couple together archive organisations, preservation vendors, cloud providers, network infrastructures and equipment manufacturers. We all have our part to play in environmental sustainability and we all need to think critically about the approaches we take. There is no room for complacency. There is still a pressing need for education, advocacy, and identifying and effecting change – and the Pendergrass report does a fantastic job in that area. In doing so, we all need to be mindful of the rapid changes taking place, especially in the IT industry and in the way that digital preservation is done through supply chains. We should use that to our advantage as we develop better and more environmentally sustainable approaches to digital preservation.