Who Moved My Data?

If you’re looking for a cozy change management parable in the mold of (or on) “Who Moved My Cheese”—Spencer Johnson’s 1998 top-selling business book—this is not the blog for you.

Who would have thought, back in the early 1980s, when we started copying and moving business data to the then new-fangled contraptions called PCs, that we were writing the first notes of a Requiem for Privacy? Then, of course, PCs were big in mass and slim in storage, and mostly chained to office desks behind locked doors, so it didn’t seem like we were compromising the security or reliability of the data from our mainframes. Then, we knew that the true Master resided on that tape or this DASD (Translation for millennials: disk drive), and there it would be forever safe.

But now we carry mainframe-sized data stored on our smartphones, extracted, copied and modulated to answer multiple masters. Now that every Jawbone and jet engine is recording detailed data on the Internet of Things, we have finally realized that the privacy and security genie is well and truly out of the bottle. There are many facets of the problem, from information provenance to data protection. But one aspect that is getting increasing attention from regulators and governments is data residency: where data physically exists and which rules and regulations it must meet there.

Leaving aside the almost intractable issues of mobile, distributed data copies and untrustworthy sources on the Internet of Things, data residency poses significant challenges today even for transaction data, which is the basic, legal foundation of every business. In mainframe days of yore, we knew the legal jurisdiction and geographical location of our transaction data: it was in this database, owned by that application, and it lived in this data center. And yet, even then, it wasn’t quite that simple: there was a backup tape somewhere else and a data warehouse copy in another data center. But, at least, IT knew how many copies existed and could read the JCL that said where they lived.

Fast forward to today and data increasingly resides “in the cloud”. Many business people are unaware that the cloud actually touches the ground, depositing not-quite-random bits and replicated pieces of business data in a widespread net of data centers known only to the algorithms of the cloud file system. It is done for good reasons, of course: resilience, update performance, speed of access from all over the world, and so on. But, with governments and regulators applying different, overlapping and even contradictory regulations on the storage and use of personal information, throwing your hands up in the air and saying “I know nothing… about where it’s stored” will not prove to be an acceptable defense.

The bottom line is that these highly distributed, decentralized, cloud-enabled databases will have to be extensively tagged with metadata about the provenance and physical locations (both current and historical) of the stored data. Furthermore, the metadata will have to be maintained continuously throughout the lifecycle of the corresponding data. For database vendors, this implies additional overhead and complexity in their tools, based on new architectural approaches that consider data residency issues from the start. For database implementers in business, it demands deeper analysis of data usage and higher costs, especially in cross-border use cases.
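To make that a little more concrete, here is a minimal sketch in Python of what such residency metadata might look like for a single dataset. Everything in it is an assumption for illustration: the names `Placement` and `ResidencyRecord`, the data center identifiers, and the use of country codes for jurisdictions are hypothetical and not taken from any real database product. The one point it tries to show is that a copy is closed off with an end date rather than deleted, so the historical locations survive alongside the current ones.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set


@dataclass
class Placement:
    """One physical copy of the data and the period during which it existed."""
    data_center: str                   # hypothetical ID, e.g. "dc-frankfurt-1"
    jurisdiction: str                  # assumed here to be a country code
    since: datetime
    until: Optional[datetime] = None   # None means the copy still exists


@dataclass
class ResidencyRecord:
    """Provenance plus current and historical locations for one dataset."""
    dataset: str
    source: str                        # provenance: the owning application
    placements: List[Placement] = field(default_factory=list)

    def record_copy(self, data_center: str, jurisdiction: str,
                    when: datetime) -> None:
        """Note that a new copy of the data now lives in a data center."""
        self.placements.append(Placement(data_center, jurisdiction, when))

    def retire_copy(self, data_center: str, when: datetime) -> None:
        """Close a placement instead of deleting it, so history is kept."""
        for p in self.placements:
            if p.data_center == data_center and p.until is None:
                p.until = when

    def current_jurisdictions(self) -> Set[str]:
        """Jurisdictions where a copy exists right now."""
        return {p.jurisdiction for p in self.placements if p.until is None}

    def all_jurisdictions(self) -> Set[str]:
        """Every jurisdiction the data has ever been stored in."""
        return {p.jurisdiction for p in self.placements}


# Illustrative usage with made-up data centers and dates.
t0 = datetime(2015, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2016, 1, 1, tzinfo=timezone.utc)

orders = ResidencyRecord(dataset="orders", source="billing-app")
orders.record_copy("dc-frankfurt-1", "DE", t0)
orders.record_copy("dc-virginia-2", "US", t0)
orders.retire_copy("dc-virginia-2", t1)

print(sorted(orders.current_jurisdictions()))  # ['DE']
print(sorted(orders.all_jurisdictions()))      # ['DE', 'US']
```

Even this toy version hints at the overhead: every replication, migration and backup has to update the record, and a regulator’s question about where the data has ever been can only be answered if no step was missed.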

Come to think of it, there probably is a parable to be told here. Unfortunately, unlike in the case of “Who Moved My Cheese”, it is unlikely that the answers can be easily written on walls. “Be Ready To Change Quickly And Enjoy It Again… They Keep Moving The Data” doesn’t really solve anything!
