Structured vs. Unstructured Data
Moth: They have been at a great feast of languages, and stol'n the scraps.
Costard: O, they have liv'd long on the alms-basket of words.
Shakespeare, Love's Labor's Lost
At least 80% of the data in the world is labeled unstructured, although most definitions of that word are perhaps kindly described as unstructured thinking. Much of the excitement about Big Data, especially social media data, stems from that one number. The thought is that such a large volume of data must hold a vast trove of business value… if only it could be discovered and mined.
While there is undoubtedly value to be found in this alphabet soup of data, a couple of key questions are seldom asked. First, how does that value compare with the value available in the 20% of data that is structured? Second, how does the cost of mining the huge volumes of unstructured data compare with that of more traditional BI on the more limited and better understood set of structured data?
What is often overlooked in the rush to Big Data is that our structured business data is data that we have designed and constructed with purpose and intention. Like Shakespeare’s words, structured data has meaning and context, quality and care inherent within it. In contrast, social media might be compared to an “alms-basket of words” dropped casually and perhaps deviously by its creators. How can we know how far it is a fair and statistically valid representation of the real world? “The Parable of Google Flu: Traps in Big Data Analysis” shows how Big Data can also mislead.
In many companies, there remains a wealth of untapped value in their existing, smaller, structured data. It can most likely be liberated at a lower overall cost than the Shakespearean pound of flesh that the Big Data ecosystem will eventually demand to apply context, meaning, quality and all the other little words we take for granted in traditional data management.