Paul Stokes is Product Manager for Jisc
Losing your data may well cost you a LOT more than you thought. How much more? That's a difficult question to answer, but one that Jisc and the Digital Preservation Coalition are looking to answer…
Oh no!
The data's gone!
The disk failed / I pressed the wrong button / there was a bug in the backup software / we were hacked / there was a ransomware attack / [insert disaster of choice here].
That’s terrible.
10 gigabytes / 500 gigabytes / 10 terabytes / 1 petabyte / [insert size of loss here] of data gone without hope of ever getting it back. What's it going to cost us to recover; to get back to where we were?
Quick… What did it cost us to create the data in the first place?
Well, we spent ten years gathering it, using grant money we no longer have access to, from people who are no longer available, using resources we no longer have. Sooooo. Rough back of the envelope calculation (+--++++x ++= ). The data cost us this much…
Ah. That's a big number. So that's what we've lost? That's the sum total of our financial exposure?
Actually, no.
That’s just the beginning.
Quite apart from the fact that the cost of creating data is a very poor proxy for the actual value of the data (valuing data is HARD), and quite apart from the fact that it may not even be possible to recreate the data anyway, there’s much more.
So much more.
Let's start with the simple stuff. The root cause analysis (you ARE going to do a root cause analysis, aren't you?). What caused the loss in the first place? Hmmmmm. Gather a group of the finest (expensive) brains and have them spend anything from a few hours to a few days or longer figuring it out.
So now you know why it happened.
Next you pay to fix the problem (replace that server for instance) AND you pay for the mitigations you're going to put in place to stop it happening again. Yet more expense. If your root cause analysis team are up to scratch, they will probably uncover other dodgy stuff at the same time—more single points of failure. So now you pay to fix those as well and put in place yet more mitigations[1].
Now let's consider the people who were about to use all this data[2]. All of a sudden they can't, and they're being paid to do nothing… Okay, I know that's a little simplistic. They'll either be redeployed or made redundant. Either way, they're not achieving what they were being paid to do, so it's a cost (as well as being a personal tragedy for those who've lost their jobs). And what if that data underpins the use of all the day-to-day systems? Too bad. It's downtime for everyone.
And whilst we're on the subject of downtime and job losses… One widely cited statistic is that "94% of companies suffering from a catastrophic data loss do not survive – 43% never reopen and 51% close within two years"[3]. That data loss could cost EVERYONE their job. The statistic is a little long in the tooth (c. 1987), but in the interim companies have become more and more reliant on data, so it wouldn't be surprising if the situation were worse today.
There's more…
Although you may have tried very, very hard to keep the loss under wraps, in some cases (particularly when it comes to data that is subject to some form of statutory control and public scrutiny) it may be impossible to do so. The world knows about your humiliation. Your reputation takes a massive hit. Yet another cost that is challenging to calculate. How many customers do you lose as a result of loss of trust? How many contracts do you lose? How do the banks react (that operating overdraft is suddenly attracting a higher rate of interest and a lower ceiling)? How much are you going to have to spend on spinning a positive message to try and regain that lost status?
Insurance… surely you can insure against this sort of thing? Well, that's the funny thing about insurance. You have to insure before the disaster has taken place. And after the disaster? Sure, we'll insure you… for this hugely inflated premium (and by the way, you wouldn't have been covered for that last event as you didn't have the appropriate mitigations in place to start with).
I mentioned statutory control a little earlier… What if the data lost was kept as part of a statutory requirement? Or underpinned a regular statutory reporting requirement? "Sorry, but we lost it" doesn't really cut the mustard with regulators. Fines all around.
What about the intellectual property that was tied up in the lost data? I don't even know where to begin with this one…
So you can see that, although the initial cost of a data loss may at first glance appear to be substantial in its own right, the true cost (taking into account the knock-on effects) could be enormous (and very, very difficult to quantify).
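To make the arithmetic concrete, here is a rough sketch that tallies the knock-on costs described above on top of the creation cost. Every figure is a made-up placeholder, not data from any real incident, and the category names are simply labels for the items discussed in this post.

```python
# Hypothetical figures only (in pounds); none come from a real incident.
creation_cost = 500_000  # the back-of-the-envelope cost of creating the data

# Illustrative knock-on costs, one entry per item discussed above.
knock_on_costs = {
    "root_cause_analysis": 20_000,
    "fixes_and_mitigations": 80_000,
    "staff_downtime": 150_000,
    "reputational_damage": 200_000,
    "higher_insurance_premiums": 30_000,
    "regulatory_fines": 100_000,
}


def total_cost_of_loss(creation: int, knock_on: dict) -> int:
    """Add the creation cost to the sum of all knock-on costs."""
    return creation + sum(knock_on.values())


total = total_cost_of_loss(creation_cost, knock_on_costs)
print(f"Creation cost alone: £{creation_cost:,}")
print(f"Estimated true cost: £{total:,}")
print(f"Multiple of creation cost: {total / creation_cost:.2f}x")
```

In practice most of these entries are exactly the numbers that are hard to pin down, which is why real figures from real incidents matter.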
So, what was the point of me highlighting all of this pain and expense?
It's this.
Digital Preservation is about mitigating risk.
Mitigations cost money.
It is hard to justify spending that money—creating a credible and proportionate business case—without a firm grasp of the magnitude of the sums of money involved (the value of what's at risk and the cost of protecting it) and the likelihood of loss. There has been some work relating to the likelihood of loss and the cost of preserving data but, as outlined above, understanding the true magnitude of any (potential) loss is significantly more difficult.
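One common way to frame such a business case is annualised expected loss: the likelihood of a loss in a given year multiplied by its magnitude, compared against the yearly cost of the mitigation. The sketch below assumes that simple model, which is an illustration rather than a method this post prescribes. The function names and all numbers are hypothetical.

```python
def expected_annual_loss(probability_per_year: float, magnitude: float) -> float:
    """Expected loss per year = likelihood of loss x size of loss."""
    if not 0.0 <= probability_per_year <= 1.0:
        raise ValueError("probability_per_year must be between 0 and 1")
    return probability_per_year * magnitude


def mitigation_is_justified(
    p_before: float,
    p_after: float,
    magnitude: float,
    mitigation_cost_per_year: float,
) -> bool:
    """True if the reduction in expected loss exceeds the mitigation's cost."""
    risk_reduction = (
        expected_annual_loss(p_before, magnitude)
        - expected_annual_loss(p_after, magnitude)
    )
    return risk_reduction > mitigation_cost_per_year


# Hypothetical example: a 5% yearly chance of a £1,000,000 loss, reduced to 1%
# by a preservation system costing £25,000 a year.
print(mitigation_is_justified(0.05, 0.01, 1_000_000, 25_000))
```

The weak point of any such calculation is the magnitude: if it only counts the cost of creating the data, the mitigation will look far less worthwhile than it really is.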
As I mentioned earlier, it is difficult to value data. It's even more challenging to put a figure on the cost of the knock-on effects of data loss. Often the sums involved only become apparent after a disaster has happened.
And no one wants to be in the position of suffering their own data loss to drive a business case.
However, we know that destructive data disasters have already happened to some unfortunate organisations and individuals. Given the commercial sensitivity of any such loss, it is likely that the disasters we know about are just the tip of the iceberg—no one likes to share such information publicly if they can avoid it.
Assuming that the organisations that have suffered data loss have survived the incident, it is safe to also assume that they will have undertaken a root cause analysis and possibly quantified the magnitude of the loss. This means that there is without doubt information in existence that can shed some light on the true cost of data loss. The problem is, it is challenging to persuade those who have this information to share it, even for the greater good.
And that's where Jisc and the Digital Preservation Coalition come in: two trusted, independent organisations with the expertise to collate and anonymise this type of information, and who can use it to produce something that could help others to make their case and avoid the pain and financial hit of avoidable data loss.
We're asking organisations and individuals who have suffered loss to share anonymously some key information that will benefit the whole of the digital preservation community (and beyond). This information will be used to undertake some quantitative and qualitative research around the cost of data loss and then produce an anonymised report for the benefit of the community.
Please help.
(and please pass on the link to this survey to any individual or organisation you feel could help as well)
[1] One of those mitigations is, of course, going to be a properly specified and fit for purpose preservation strategy and system.
[2] Someone should be in the frame here labelled as a "User", even if the actual somewhen is still a little vague. No? If the data was never going to be used, then why were you keeping it?
[3] It's difficult to trace this one back to the original source, but it appears to be from a journal article: Christensen, S. R., et al., "Financial and Functional Impacts of Computer Outages on Businesses", Center for Research on Information Systems, The University of Texas at Arlington.