As part of a burgeoning information governance (IG) project, one of the most helpful things any organization can undertake is an audit of its unstructured data (its content), and if it has serious IG ambition it really needs to acquire the technology and skills to run and interpret its own audits. In short, when you truly understand what you’ve got, a whole range of possibilities opens up, and experience suggests that the most obvious is to get rid of some junk. It’s a natural human reaction to feel that way and it makes good sense to do it too.

In my line of work, I utter the above statement endlessly in many different forms and, despite much practice and a fast-maturing IG market, I still find myself having to work hard to convince some organizations that it’s true. I guess the real problem is that the bean counters want (need?) to see some numbers that make ROI crystal clear (see my other article) and unfortunately people find that hard to do. The issue often lies in the lack of clarity around not only the cost of storage but also how storage is charged and the intangibles that make up the resulting project (uncertainty of outcome, staff effort, stakeholder management, disposal/retention policies, managing legal hold etc). Let’s address these points head on:

  • Uncertainty of Outcome. Uncertainty of clean up outcome comes from two sources. The first is that there is always a chance that your content is clean (or clean enough). The second is an organization not being able to decide what clean content looks like. Ultimately, history determines the first point; new organizations, small organizations or ones where turnover and change are very low are really the only ones likely to have clean content – you should be able to make a judgement on that. When it comes to a definition of ‘clean’, the good news is that the organization decides, and so the outcome is really only uncertain if the organization doesn’t have the will to make decisions and policies for content disposal. In short, ‘are they up for it’?
  • Disposal/Retention Policies. It confuses me that customers can cite retention policies as a reason for making clean up difficult. For me it is quite the opposite. Retention policies require disposal and when you understand that principle applies to all content, physical and electronic, in whatever state (forget the pointless ‘what is a record?’ discussion), most organizations should conclude that content discovery and clean up is a great first step to a modern retention implementation.
  • Legal Hold. This really shouldn’t be a big deal. What I mean by that is, if legal hold is an issue, you need a means to handle it and ensure potentially responsive files are collected/protected, and you should not proceed until that is in place. All you need to do then is make the clean up process aware of those factors and crack on.
  • Staff Effort. With a definition of ‘clean’ in place, a lot of clean up is possible without huge staff effort. Also, with properly engaged management, the main issue of ‘over-review’ (staff checking far more than they need to because they are concerned about the risks of clean up) should not be a problem. That is, if the clean up is supported by management, defensible and done in good faith, those executing it should not be concerned or feel the need to explicitly eyeball every file.

With those points addressed, the question becomes one of cost benefit and ROI. First and foremost you need to understand the cost recovery relevant to the organization and apply the right model. In short, if you have a pile of sunk capital costs in storage infrastructure, seeking operating cost reduction through clean up can lead to disappointment. Also, if your service arrangement does not reduce charges as volumes reduce, you’ll get no return there either. In such cases, you’re looking for return at milestone points by avoiding part of the contracted or capital commitment you need to make (avoiding purchase of 20TB of enterprise storage might save more than USD 100k on its own). For operating cost reduction, the variables are complex and varied and so I’ll make some simplifications and assume cost saving is possible based upon a volumetric charge.

It’s surprising how hard per-TB operational costs and charges can be to pin down, but let’s be clear that the true figure is MUCH MUCH more than the cost of storage hardware and that it must account for blended costs across storage tiers including backup, cloud and similar. I see wide ranging estimates, but across a global customer base and many different industries it all seems to boil down to a range of USD 5k to 10k (GBP 3.5k to 6.5k) per TB annually. Regardless of the real number, the point I want to make is that this estimate provides a budget from the clean up that allows organizations to acquire and embed the necessary technology as a key capability for much broader, softer and more valuable IG benefits. Doing the maths suggests that modest content volumes (say 20TB) achieving a 30% clean up rate (6TB removed, costed towards the lower end of that range) would provide a USD 35k/GBP 22k year 1 budget for the procurement. As volumes progress beyond 100TB the budget becomes much more significant and can support a compelling standalone business case.
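For anyone who wants to rerun that arithmetic with their own numbers, here is a rough sketch in Python. It is only an illustration under the simplifications above: the function name and parameters are my own invention, the USD 5k to 10k per TB range is the estimate quoted in this article rather than a measured figure, and it assumes savings scale linearly with the volume removed. With 20TB and a 30% clean up rate it gives a range of USD 30k to 60k, which brackets the figure quoted above.

```python
def cleanup_budget(volume_tb, cleanup_pct, cost_per_tb_low, cost_per_tb_high):
    """Estimate a year 1 saving range from content clean up.

    Hypothetical helper: assumes a purely volumetric charge, so the saving
    is simply (TB removed) x (annual blended cost per TB).
    """
    tb_removed = volume_tb * cleanup_pct / 100
    low_saving = tb_removed * cost_per_tb_low
    high_saving = tb_removed * cost_per_tb_high
    return tb_removed, low_saving, high_saving


if __name__ == "__main__":
    # Assumed inputs: the article's USD 5k-10k per TB annual estimate
    # and a 30% clean up rate applied to two illustrative volumes.
    for volume in (20, 100):
        removed, low, high = cleanup_budget(volume, 30, 5_000, 10_000)
        print(f"{volume}TB at 30%: {removed:.0f}TB removed, "
              f"USD {low:,.0f} to {high:,.0f} per year")
```

If your charges do not fall with volume (the sunk capital or fixed contract cases described earlier), this model does not apply and you should look at avoided purchases instead.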

So, my point is that content clean up is best considered part of a broader approach to IG. In many ways it is a key first step because it can actually be connected to real savings (be they in-year operational or longer term avoidance) and therefore provide a budget to acquire the technologies, processes and skills that will be needed to implement many other parts of an IG programme or strategy. Gartner, Forrester and the Information Governance Initiative do those justice in a number of their papers – suffice to say that risk reduction, migration cost avoidance, operational efficiency etc all figure heavily – and getting started with content clean up can often make good sense.

