The value equation surrounding data retention has changed forever. In the past, it was easy to accumulate endless stockpiles of data under the (at least plausible) assumption that one day it would be searchable and valuable – remember the hype surrounding “big data” from just a few years ago.
Even the “data is the new oil” analogy reflects that optimism. Yet if you read the full quote, commonly attributed to UK mathematician Clive Humby, it stresses the need for refinement…
Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so data must be broken down, analyzed for it to have value.
Several factors are behind the changed calculus. First, data is proliferating at such a rate that the signal-to-noise ratio has become unmanageable. According to a recent Forrester survey, the terabyte age is quickly morphing into the petabyte age, with most of the growth in unstructured data.
Next, information is going stale at an ever-increasing rate, with its utility often measured in hours and days rather than weeks and months. Without refinement in the form of analytics, even relatively current data has virtually no business value.
This deluge of largely unused information is further complicated by our seemingly unending desire to replicate data so that nothing is ever lost. As Gartner analyst Nader Henein notes in “Practical Privacy — Managing Data Retention and Backups” (Practical Privacy):
Three decades of innovation in the archiving, backup and recovery industry have made it nearly impossible to truly “forget.”
The risk of over-retaining information is now tangible and significant. Henein goes on to describe how the new wave of global privacy regimes is affecting the disposition of information…
Data protection laws such as the GDPR and the California Consumer Privacy Act establish requirements for data deletion both to reduce risk to individual privacy and to satisfy subject rights requests (SRRs).
In sum, the new information retention calculation must center on the “realized” value of information rather than the perceived value of data. As a case in point, most unstructured data at rest simply isn’t being leveraged for any business purpose.
A recent study found that a third of data stores have not been touched for three years.
– Practical Privacy
Finally, data breaches are now a real statistical likelihood. According to the IBM/Ponemon survey (“2018 Cost of Data Breach Study: Global Overview”), the average global probability of a material breach (per entity) is almost 30% within the next 24 months. Every breach inevitably exposes information, and that exposure is all the more inexcusable when the information isn’t being put to use:
A breach impacting the existing user base is bad enough, but a breach impacting both existing and historic records unendingly exposes personal data — data that generates no revenue for the organization while proportionally increasing the potential for fines and regulatory action.
– Practical Privacy
These drivers have led The Sedona Conference Working Group 1 (WG1) to publish a new commentary: “Principles and Commentary on Defensible Disposition.” The document is currently a draft open for public comment, but this initial effort advances the disposition discussion significantly. One of its more impactful notions is that data “disposition” doesn’t need to be “defensible” in every scenario:
… the phrase “defensible disposition” suggests that organizations have a duty to defend their information disposition actions. While it is true that organizations must make “reasonable and good faith efforts to retain information that is relevant to claims or defenses,” that duty to preserve information is not triggered until there is a “reasonably anticipated or pending litigation” or other legal demands for records.
The goal of this post isn’t to replicate the discussion contained in the Sedona paper’s well-articulated principles. Instead, it’s to note that the discussion needs to move quickly toward one in which information isn’t hoarded. The new formulation needs to consider the realized value of any information, particularly unstructured data, because data breaches and privacy regimes have made untapped information far more likely to produce liability than organizational value.