database or sludgebase?

Getting rid of the datacentre

It’s time to move our thinking forwards by getting rid of the datacentre concept as quickly as possible. We need to start building next-generation informationcentres. If we don’t, we will find ourselves choking on a surfeit of useless data while gasping desperately for the oxygen of vital information.
[Image: “sludge return” by defmo on Flickr]
strange attractors

While reading about chaos theory, it occurred to me that data and datacentres act as strange attractors for each other. It’s surprising how quickly a datacentre, once created, attracts ever-growing heaps of data, much of which may not have been envisaged when the datacentre was first conceived, and much of which is not really needed at all.

never mind the quality, feel the width

The public sector seems particularly prone to the thrill of the ever-larger dataset (I know it deals with big numbers, but I also know those datasets could be better designed), so size is not always treated as a compelling design constraint. A number of factors, including sloppy application design, piecemeal architecture and unnecessary data capture, all contribute to the inexorably rising volume of data “under management”.

Of course, human nature usually responds to an abundance of cheap resources by consuming ever-larger amounts of them. To paraphrase an old saying: never mind the quality, feel the width of my database.

paying the cost of falling prices

As the price of data technology (aka IT) continues to fall in real terms, i.e. on a bang-for-buck basis, we have fallen into a seemingly bottomless spiral of using more and more [and ever cheaper] data storage, instead of making sure that we hold only what is genuinely necessary to conduct business and satisfy our retention obligations. When storage is relatively cheap and kit is relatively cheap, it’s very hard to resist the temptation of “cheap” data.

there’s no such thing as cheap data

But when you consider the lifetime costs of capture and retention, there’s no such thing as cheap data, because data needs administration, and administration isn’t getting cheaper. Easier, perhaps, but not cheaper.

gigabytes, terabytes and trilobytes

Left to its own devices (sic), a database is quite capable of turning into a sludgebase, wherein once-useful data lurks, steadily deteriorating into many layers of depreciating value. Without regular attention, maintaining a sludged-up database becomes less an administrative task and more an archaeological challenge of geological proportions. Data mining and “drill-down” reports may well be slick, routine operations in a well-administered datacentre, but they are a nightmare where databases have been allowed to fossilise and ossify; there, the technology becomes a fast track to palaeontology.

how could things be different?

Well, there are a number of obvious things we can do to avoid being overwhelmed by data:

  1. Stop gathering so much data: many applications seem to gather data (especially personal data) for the sake of IT. We were arguably much better at reducing data volumes when storage and processing power cost more.
  2. Avoid or eliminate [non-transactional] duplicate data.
  3. Archive only compliance-critical data.
  4. Dispose of transactional data more regularly.
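To make rules 2 to 4 a little more concrete, here is a minimal Python sketch of a housekeeping pass. It assumes a hypothetical Record type carrying transactional and compliance_critical flags, and a single retention window (retention_days, 365 by default); none of these names come from any real system, and your own retention obligations will certainly be more nuanced. Rule 1 is left out because it concerns what you capture in the first place, not what you clean up afterwards.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    key: str                   # hypothetical business identifier
    payload: str               # the stored value itself
    created: date              # capture date
    transactional: bool        # True for transaction rows, False for reference data
    compliance_critical: bool  # True if a (hypothetical) retention obligation applies


def housekeep(records, today, retention_days=365):
    """Split records into (keep, archive, dispose) under a hypothetical policy."""
    cutoff = today - timedelta(days=retention_days)
    keep, archive, dispose = [], [], []
    seen = set()
    for r in records:
        if not r.transactional:
            # Rule 2: a non-transactional record with the same key and payload
            # as one already seen is a duplicate, so it goes.
            signature = (r.key, r.payload)
            if signature in seen:
                dispose.append(r)
            else:
                seen.add(signature)
                keep.append(r)
        elif r.created >= cutoff:
            # Transactional data still inside the retention window stays live.
            keep.append(r)
        elif r.compliance_critical:
            # Rule 3: past the window, archive only compliance-critical data.
            archive.append(r)
        else:
            # Rule 4: everything else past the window is disposed of.
            dispose.append(r)
    return keep, archive, dispose


# Usage with made-up records.
today = date(2024, 1, 1)
records = [
    Record("cust-1", "Alice", date(2020, 1, 1), False, False),
    Record("cust-1", "Alice", date(2021, 1, 1), False, False),  # duplicate
    Record("txn-1", "10.00", date(2023, 6, 1), True, False),    # recent
    Record("txn-2", "20.00", date(2019, 6, 1), True, True),     # old, must be kept
    Record("txn-3", "30.00", date(2019, 6, 1), True, False),    # old, not needed
]
keep, archive, dispose = housekeep(records, today)
print([r.key for r in keep])     # ['cust-1', 'txn-1']
print([r.key for r in archive])  # ['txn-2']
print([r.key for r in dispose])  # ['cust-1', 'txn-3']
```

The point of the sketch is that disposal is the default for anything past its useful life: data only survives the window if someone can say why it must.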

Let go of the datacentre concept and start delivering real value

Good housekeeping will only take us so far, though. If we really want to progress, we should let go of the datacentre concept, because its focus on data is sub-optimal. We should be focused much higher up the intellectual scale, i.e. on information and knowledge. As long as we continue to treat data as king, we will struggle to deliver real value for our stakeholders.

Have you ever stopped to consider why we still see so many reports of unexpected costs and disappointing outcomes from IT investment?

I have. And I believe that the IT-centric paradigm, with its data-based (sic) preoccupation, is preventing us from achieving truly effective information systems that deliver real value. The consequence is the Trillion Dollar Bonfire: a massive, global issue.

We must recognise this important formula: Information ≠ Data

I have written before about the dangers of using the terms data and information interchangeably. Our enterprises are given data (even if they pay for its capture, storage, transportation and manipulation), but they have to create information by attributing meaning to data within its context.

So, if our enterprises need information, why do we still have datacentres?

We should scrap the [last-century] datacentre concept and start talking instead about informationcentres. That is a far more logical concept, and it would focus our attention squarely on a much more important business asset.

I will write more about informationcentres in a future posting. In the meantime, I would love to hear from anyone who has already scrapped their datacentre and moved on to a more appropriate construct for 21st-century systems.


  • http://knowledgemodel.blogspot.com/ David Flint

    “enterprises are given data”.
    Not at all. Customers and suppliers give us information. We carefully reduce this to data by discarding the context in which it’s supplied or, worse, assuming that we already know the context. We store it and process it and print it.

    THEN we worry about what it means. It’s no surprise that we get things wrong.

    • http://www.colin-beveridge.com Colin Beveridge

      A really excellent point, thanks David. Perhaps this reinforces the principle that one man’s data is another man’s information? And vice versa, of course.

  • http://knowledgemodel.blogspot.com/ David Flint

    Yes indeed. The technician thinks it’s data and the user thinks it’s information.

    Data and information are not different THINGS but different VIEWS. I’ve developed a more sophisticated model by applying semiotics. It’s on my blog at http://knowledgemodel.blogspot.com/2009/06/information-is-data-in-context.html and
    http://knowledgemodel.blogspot.com/2009/07/creating-message.html.

    I’ve tried to be comprehensive and rigorous so it’s a bit dense.