Shibboleth
I am asked, from time to time, how to persuade management that digital preservation matters. It’s a puzzling question in context and content. For a start, I am not sure I have ever persuaded anyone of anything. I have been on hand when people persuaded themselves but that’s not the same thing. It’s like finding the fire brigade at the scene of every major fire and assuming they are to blame. Moreover, I am not sure it’s possible to offer a global shibboleth for digital preservation that will work for all time zones and all sectors. I’m not saying it’s not possible to make the case but, in this particular conversation, contexts matter. A lot of local truths don’t make a universal one.
In any case, decisions are not made that way. Persuasion isn’t an event: it’s a process. DPC offers pick-and-mix support to the set piece dramaturgies of advocacy. Depending on your tastes we can help with business cases, elevator pitches, management briefings, executive breakfasts, unflattering comparisons with your competitors, the noose of non-compliance, the klaxon of digital annihilation. It’s all just a mouse click away. But real progress is ongoing, erratic and gradual. So my best advice to those wanting to persuade their bosses of the importance of digital preservation is to stop hanging around on the road to Damascus waiting to pounce. It’s going to take a while and maybe require more than divine inspiration. Work at it. Be funny, be sneaky, be clever, be whatever you think you need to be in the context where you work. But most of all, be persistent.
I’ve been thinking about this topic quite a lot recently, partly because of a series of encounters with senior management in DPC members, and because on four occasions in the last year I have been a keynote speaker at industry events where I have been explicitly asked to speak about advocacy. It’s also timely because we’re having a fun time reviewing our strategic plan which says an awful lot about the need for advocacy. So here is where my thinking has got to.
Risk
You can try the risk angle. I quite like the risk angle. As DPC’ers know to their frustration, the risk register is one of my favourite corporate documents. The gloom speaks to my Scottishness. It also works as a way to persuade your boss that he or she owns a significant information risk and that there will be consequences if they don’t act. It’s the closest most of us will ever come to having ‘blame’ as the mime type of an email attachment. And you can encourage your boss to get funky with the risk register, translating the forbidding jargon of digital preservation into the higher form of occlusion known as management speak.
There’s not much I can say about the risk angle that hasn’t been said already. I am a huge fan of the SPOT model for Risk Assessment which is wonderfully useful and surprisingly under-used. But if that doesn’t work for you then we also wrote at length about Risk and Change Management in the Digital Preservation Handbook, and there was a tremendous outpouring of thinking about Business Continuity Management and Risk Assessment for Digital Preservation in the TIMBUS project.
In an uncharacteristic moment of sunshine, I would simply add that we need to talk not just about the risks but also about the opportunities which digital preservation affords, of which more in a moment.
Enter the Social Entrepreneur
Here is another idea which I’ve been trying for a while, which complements the risk angle well. If you can’t persuade your manager that digital preservation will save them from grief, you can show them how it will make them richer.
I am a fan of an accountancy framework called the triple bottom line. It’s a framework that many businesses use to measure their performance in the creation of value broadly defined. Whereas a traditional accounting process would lead to a ‘bottom line’ of financial value alone, the ‘triple bottom line’ measures a company’s performance on social and environmental value too.
One can instantly see why businesses, large and small, might want to show how they are profitable not just for their shareholders but for their communities and their ecologies. How many companies have seemed lucrative, but have suffered untold reputational damage from the ecological or social catastrophes which the pursuit of profit has inadvertently encompassed? Maybe it started with the slave trade, or perhaps it is older than that. The question of ‘what value your profit’ has never not been controversial and remains so today, even in these times of corporate social responsibility, employment rights and the living wage. Names like Sports Direct, Exxon, Union Carbide, BP (formerly the Anglo-Iranian Oil Company just in case you thought that Deepwater Horizon was a low point) have suffered diminished financial value because of apparent failures in their corporate values. If they had measured their value not simply by cash profit each year then perhaps things might have turned out differently not just for their neighbours and staff but for their shareholders too. And conversely those companies that invest with a social and ecological conscience are valued not just by their shareholders but by all their stakeholders. Enter the social entrepreneur.
Just a reminder that this is a DPC blog: I am getting to a point about digital preservation shortly. In fact it’s here: what happens when you translate triple bottom line accounting into a discussion on digital preservation?
Greener
If you can’t impress management directly with digital preservation and they don’t engage with risks, then there’s a reasonable case to be made that digital preservation is greener.
There has been surprisingly little interaction between the digital preservation community and the green ICT agenda. A panel session inspired by Neil Grindley at iPres in 2010 is the only action I can remember, encouraged by Diane McDonald’s ‘Greening Information Management’ report of 2010. This noted how data centres and allied data storage technologies contribute to the overall energy consumption, and therefore carbon footprint, of the higher and further education sector. So, the question arises as to whether retaining data for extended periods of time adds to the total. In short does digital preservation increase or reduce our data storage and energy consumption needs? As usual the answer is, it depends. There are a range of issues here that are well outside the competence of the digital preservation community and are unlikely to appear within a preservation plan. But there are broadly three considerations that the digital preservation community can influence directly: the type of storage used; the scale of the facility required; and the amounts of data in question.
By and large, spinning disks consume more energy than offline storage. Moving data to tape, or powering down the disk drives will certainly slow down playback and is likely to be problematic for materials that need to be available instantaneously and continuously. So there is a balance between the responsiveness of the repository and the amount of carbon we judge wise to consume. It’s really just an additional design consideration in the long-established principle of tiered storage.
There are also economies of scale to be achieved: a small number of large data centres will generally consume less energy than a large number of small ones. This is one of the claimed benefits of cloud computing, though it’s fair to warn that not all commentators agree. Once again pointing to upstream energy generation issues, the location of a data centre matters hugely. Data centres operated by Google and Apple in North Carolina take energy directly from the state’s power grid which in turn draws around 50% of its supply from coal and 39% from nuclear. Now if you are already in North Carolina, you would still achieve carbon reductions by using a cloud data centre over a proliferation of local networked drives. But if you were in Finland it would be ecological madness to use a service in North Carolina. The CSC data centre in Kajaani, established with access to its own hydro-electric plant, means that Finnish researchers can store and access their data with virtually zero carbon emissions. From my desk in the sun-kissed sub-tropics of Glasgow I can only marvel at how CSC have turned an annual average temperature of less than 2°C into something not far short of a boast. So if you want greener data storage, it makes sense to seek economies of scale and to probe service providers’ claims about their own green credentials.
All of this is to some extent tinkering, and much of it is common sense which the IT department may already have in hand. But digital preservation creates one specific opportunity to be green: by reducing the quantities of data that an organisation holds. There’s a strange and lingering misconception that Digital Preservation is about saving everything. Let’s clear this mess up immediately. It’s about identifying the parts that matter, privileging them, and getting rid of the rest.
In the context of burgeoning digital resources, a determined effort to identify, document and retain data of enduring value means that the right data is available to the right people at the right time in the right format: it brings efficiencies of scale and scope to corporations, agencies and individuals. It enables planned disposal and deletion. Digital preservation enables the consolidation of legacy systems: without it, agencies are forced to maintain and repair a profusion of redundant systems which add cost and reduce effectiveness.
Just in case you think I am off on a tangent, that last paragraph is from the DPC’s strategic plan. Disposal is baked into digital preservation. It takes different forms in different contexts I grant you, and for sure appraisal, selection and disposal may happen before anything is transferred to the repository. But deletion and preservation need each other. Disposal without selection and preservation is mostly reckless; preservation without prior appraisal and some disposal is almost never achievable.
Let’s get back to the green agenda. Here are three ideas that align digital preservation more directly with greener ICT.
We know the value of checksums to ensure the authenticity of documents over time and that two files with the same checksum are almost certainly identical (check the small print on this claim: it is possible to falsify a checksum so you might want to apply two different algorithms). That means, with almost no additional work, digital preservation tools create an opportunity for de-duplication, one of the basic tenets of green ICT. And in a large organisation (such as a large government department), when a large uncompressed file (imagine it was a PowerPoint) is saved many times by many different employees (such as by being emailed to all staff), and when rampant duplications mean that one file consumes 10% of the corporate network storage (it’s a real example but just not one I am allowed to publish), then perhaps we can agree that this is overkill? We could maybe save one of them?
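To make the point concrete, here is a minimal sketch of checksum-based de-duplication, assuming the files sit on an ordinary file system. The function names (file_digests, find_duplicates) and the choice of SHA-256 and BLAKE2b as the two algorithms are illustrative assumptions, not a description of any particular preservation tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return (SHA-256, BLAKE2b) hex digests for one file, read in chunks."""
    sha = hashlib.sha256()
    blake = hashlib.blake2b()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
            blake.update(chunk)
    return sha.hexdigest(), blake.hexdigest()


def find_duplicates(root):
    """Group files under root whose two checksums both match.

    Hypothetical helper: it only reports candidate duplicates and
    deletes nothing. Disposal remains a decision for people.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digests(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Requiring both digests to match is the belt-and-braces approach mentioned in the small print above. Each group returned is a set of files that could, in principle, be reduced to a single copy.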
Compression is sometimes perceived as a cardinal sin among preservationistas, so it might seem strange that I should be recommending it to you here. I am fully signed up to the arguments against lossy compression and can also understand how certain types of compression amplify the impact of bit rot. But, like every other aspect of digital preservation, context matters. A little compression is better than no data at all: compression may indeed increase the risks of bit rot but it also makes replication easier. It’s why we need to move from speculative theories of data loss to testable hypotheses, reproducible experiments and quantifiable risks. It’s why we need a series of testbeds and laboratories to help guide our decisions. I don’t have a problem telling you to investigate compression, and to reduce your carbon footprint accordingly. Just make sure you have a clear understanding of the risks, which in turn means making sure you have a clear understanding of the long-term use case. It sounds awfully like a reason to develop a digital preservation strategy and to understand your designated community.
De-duplication and compression mean we can be tough on archives: but a digital preservation strategy means we can be tough on the causes of archives too. Imagine a large multi-national physics experiment with a colossal research facility buried under a mountain. Now imagine the hundreds of petabytes that such an experiment generates. No one will thank you for wanting to store and back up all that data but after such a substantial investment we can presumably agree that some of the data has potential to be re-used. And so by offering a simple guarantee that the most important elements will be secured for the long term, and by making a serious commitment to reproducibility, and by having well established and understood disposal and retention schedules, we give researchers permission to create as much data as they like. You can’t have one without the other.
(It's) not (all) about the money (money money)
The triple bottom line encourages us to think about value additional to financial gain: but it doesn’t exonerate us from thinking about the financial aspects of digital preservation. In comparison to green digital preservation, there’s an awful lot published on the costs and benefits of digital preservation so this can only be a summary. (If you want to get into this topic properly, I recommend your entrée is the whole of the 4C project blog roll and most of its deliverables, especially the Economic Sustainability Model.)
What is the financial case for digital preservation? I have previously argued that, in a financial world where houses are no longer as safe as houses, trying to persuade investors of the value of data is a strange kind of folly. Data has little intrinsic worth: it’s the opportunity that matters. That matters because while we might get bullied into telling our managers how much we cost them, we shouldn’t talk about cost unless we’re also sure about benefits. Cost models are alluring but by definition they take you to the wrong place.
Let’s start by noting that most organisations are experiencing digital preservation for the first time and even those well versed in digital preservation face significant new pressures as data volumes expand. The practical experience for many new-entrants is that digital preservation is an unfunded mandate. It’s just something else you are being asked to do on fixed or dwindling budgets. A new, big and growing responsibility has simply thudded onto your desk but the Board hasn’t ever asked if you can do it, let alone voted you the resource to deal with it properly. And because it’s new (or expanded) then you need to develop or expand your infrastructure to cope. In most other walks of life we’d understand that the costs of building something are not the same as running it: that one-off capital costs are different from on-going revenue costs. It’s easy, but perilous to muddle these and too many digital preservation case studies depend on revenue functions delivered on capital budgets. It is the curse of projects that they are often the only way to make progress but are often more expensive than the properly funded service. Ask yourself, are you estimating revenue costs based on a capital project? If the answer is yes, then you may be looking in the wrong end of your telescope.
Secondly, most digital preservation teams are trying to solve a problem now and a problem for the future. There’s a data heap in every home and every office and every laboratory in the land. Unsorted, undocumented, unloved but growing. So, like some modern digital workflow rendering of an ancient moral fable, is it better to reduce the size of the heap or to try to stop it growing? Sorting obsolete data of dubious provenance and limited documentation is incredibly time consuming which is why we need to start thinking about preservation much earlier in document lifecycles. That makes digital preservation more affordable. But, and this is perhaps where I depart from a lot of digitization or records management advice, if we only consider the digital lifecycle then we miss an opportunity to remodel the business process that generates the data in the first place. Perhaps if we insert a tiny bit of preservation thinking into the design of the business process, then preservation becomes an intrinsically achievable goal from the outset, not something we need to tack on at the end. And asking if business processes are sustainable in digital terms places us into a useful dialogue about the digital transformation that is in turn generating the content we are being asked to manage. Of course, business processes are themselves subject to the constraints of organization and infrastructure lifecycles. If organisations like DPC can insert continuity into the underlying technologies or organisational vision, then we can start to make obsolescence obsolete. That’s a longer-term goal. For now we should simply note that if an organisation wants to make sure digital preservation is expensive, it should just ignore it for a few years.
Even if we don’t call it digital preservation we need to do something sooner or later about the quantities of data our organisations retain. As David Rosenthal has artfully reminded us over and over, the per-byte cost of data storage has fallen impressively each year from about 1960 following Kryder’s law, but the rate at which it has fallen has slowed significantly since 2010. To quote: ‘if the industry projections pan out … by 2020 disk costs per byte will be between 130 and 300 times higher than they would have been had Kryder's Law continued’. And all the while our ability to generate data is expanding. You can take your pick, but here’s one widely quoted source saying that the digital universe has been predicted to grow by a factor of 300 between 2005 and 2020 (that’s a factor, not a percentage). Granted, there’s a mismatch between the capability to generate and necessity to store. But whatever way you look at it, these trends cannot but result in very large increases in data storage costs: it’s only a matter of time.
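For a sense of what a factor of 300 means year on year, here is a small sketch of the arithmetic. It assumes, purely for illustration, that growth compounds at a constant annual rate across the 15 years from 2005 to 2020; the function name is hypothetical.

```python
def implied_annual_growth(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over years.

    Assumes smooth compound growth, which real data volumes need not follow.
    """
    return total_factor ** (1.0 / years)


growth = implied_annual_growth(300, 2020 - 2005)
print(f"Implied growth: about {(growth - 1) * 100:.0f}% per year")
```

On that assumption, a 300-fold increase works out at roughly 46% growth every year, which is the kind of figure to set beside whatever annual change you expect in your own storage budget.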
And here’s the problem. Our budgets are barely expanding at all. If at all. One DPC member – a national agency with a global reputation – reported to me privately that their future projections were in the region of a 6% drop per annum in the near term. They were secretly pleased it was not worse. Now it may be possible to offset some of these costs with cheaper storage or to move effort around. But as economists warn over the turmoil of mounting inflation (albeit a UK phenomenon) it’s clear something is going to have to give. A digital preservation strategy, empowering strategic retention and deletion, is not simply an environmental benefit but a hard financial necessity.
So, if you know that data storage costs are likely to get out of hand, would you rather plan it around the digits that you are required to keep or the whimsy of storage vendor pricing?
But we’re not just in the business of driving down costs. Digital preservation can stimulate growth. I won’t steal their thunder, but watch this space for a robust and thorough analysis, coming soon from Neil Beagrie, John Houghton and others, of the economic impact of three major data centres in the UK. It will give some very interesting figures about the return on investment generated by the Archaeology Data Service, the Economic and Social Data Service and the British Atmospheric Data Centre. I will let Neil and colleagues give the figures but I can at least attempt to put them in context. The UK government is planning to spend £42.6Bn on a railway line into London from Manchester and they expect this will return to the economy two pounds for every pound spent. It’s quite controversial. The ESDS, ADS and BADC cost a lot less and, pound for pound, deliver a lot more. And there’s no protest movement trying to shut them down.
Digital preservation for all, for good
A museum collection or art gallery may contain priceless articles and it’s true that many objects are irreplaceable: but we can insure them based on crude market valuations. Archives and libraries may be treasure troves of learning and literature but I know of at least one major University which raised a socking great mortgage against its library to pay for a campus expansion. The ‘treasure’ is not just metaphorical. And the value of data? I doubt I could ever raise a mortgage but I could probably build a business model depending on what you do with it.
We talk a lot about data loss, but I think we miss the point. Let's go back to the very beginning. We have invested massively in the digital transformation not to benefit the data but to solve real world problems with real impacts on real people and real lives. Digital technologies help us detect crimes, cure diseases, and prosecute wars; they connect products with customers, problem owners with problem solvers, and friends and families with each other. They put tremendous cultural riches at our finger tips whenever and wherever we want them. They enrich and entertain in ways which our parents could never have imagined. Which we could scarcely have imagined. This is not an abstract question of bits and bytes. It’s about real lives in the real world.
These technologies depend on a complex interaction of software and hardware with people, but software and hardware and people have the inevitable habit of changing. Thus the attainment of real world goals is contingent upon a fragile configuration which we know will change. Change is not a bug. Without it there would be no digital technology. And strictly - repeat this only very carefully after grasping the full nuance of the argument that follows - data loss is not a problem either. But the inescapable and entirely predictable loss of opportunity matters a great deal.
And here, we find ourselves back in the triple bottom line. Investing not simply for financial gain but adding value to the social and cultural contexts we inhabit.
Following this analysis, digital preservation is not about data or files or bytes. It’s not about access: if access is all we want then we palpably lack ambition. Nor even is our concern with managing risks: as if mitigation were teleological. Our goal is to have an impact on the real world, now and in the future. You might come to it intrinsically as transparency, wellbeing or business continuity, or we may be forced into it as legal protection, regulation and compliance. Drawn to it or dragged to it, digital preservation is oriented towards healthier, wealthier, safer, smarter, greener, more creative, and more transparent citizens, companies and communities.
The answer to ‘why preserve’ won’t be found among the bits and bytes: it will be found among the strategic ambitions our companies and agencies espouse. Thinking about how digital preservation plays a role in the creation, accumulation or destruction of social value challenges the digital preservation community to look beyond our comfortable assumptions. As if it wasn’t already hard enough, we are suddenly confronted by the ethics and norms of the agencies and communities we serve; and with some small duty to ensure they point the right way. And it means quickly and emphatically turning on its head our self-created and self-defeating narratives of gloom. Digital preservation is not about data loss, it’s about coming good on the digital promise. It’s not about the digital dark age, it’s about a better digital future.
A foot or three in the door
A recent innovation, the quadruple bottom line, adds sustainable innovation to the mix. So as well as asking what value is being generated now, it asks whether that value can be generated on an ongoing basis and whether it will be available to future generations or whether, conversely, we are inadvertently and unreasonably plundering our children’s inheritance. I haven’t developed that idea further here, partly because I don’t see much uptake of that approach, and partly because the term evidently has different meanings and an argument about semantics would be sterile. But also, I don’t think we need to sweat that one. Sustainable innovation is what this whole blog post has been about.
There’s no single solution to the question ‘why to do digital preservation’ and there’s probably not even a higher-level reference model or framework that is appropriate in every setting. There’s no doubt however that we cannot make the case on our own terms. We need to make the case in terms that our audience understand and that means aligning our practices and aspirations with the prevailing assumptions of management. All too often the economic arguments are privileged because the economic perspective is the discourse of the privileged, a ruthless and unconscious bias that is indifferent to ethic or consequence.
By moving the argument away from crude assessment of financial impacts to a triple bottom line that is sensitive to social and environmental purpose we can achieve three things: we respect the currents of management; we create space for values and outcomes; and we treble the chances that someone will listen. Perhaps you could call it a foot in the door: as the old management axiom puts it, a foot in the door is worth two on the desk. Perhaps you could extend the metaphor and say it puts three feet in the door.
Comments
I would also take the opportunity to hazard a similarity between "digital preservation" and "PRINCE2": in order to have a chance that someone will listen, we have to "tailor" the digital preservation design "to suit the environment".
Following this comparison, I believe we have to "exploit" the obsolescence risks, sitting next to the IT managers and modelling services and preservation infrastructure together.