With the advent of Big Data, it is estimated that 70%-80% of all data collected and stored by an enterprise is in an unstructured form. Various technologies and methods exist to automate the analysis of unstructured data such as text.
Yet regardless of these advances, some Customer Experience Management, Marketing and Customer Service professionals continue to use the accuracy argument to deny their employers significant operational and financial benefits. They argue that results produced by text-analysis software are substantially less accurate than results produced by humans, and that it is therefore best to ignore these vast repositories of knowledge, and the immense cost of storing them, until the technologies mature.
It is ironic that the people most attached to “accuracy” usually struggle to define clearly what it means to them in this context, or how to measure it.
“In the fields of science, engineering, industry, and statistics, the accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity’s actual (true) value. The precision of a measurement system, also called reproducibility or repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.”
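The distinction can be made concrete with a small numerical sketch. The measurement sets below are hypothetical, invented purely to illustrate that a system can be precise without being accurate, and vice versa:

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Accuracy: how close the mean of the measurements is to the true value.
    Precision: how tightly repeated measurements cluster (standard deviation)."""
    mean = statistics.mean(measurements)
    accuracy_error = abs(mean - true_value)            # smaller = more accurate
    precision_spread = statistics.stdev(measurements)  # smaller = more precise
    return accuracy_error, precision_spread

# Precise but inaccurate: tightly clustered, far from the true value of 10.0
biased = [12.0, 12.1, 11.9, 12.0, 12.0]
# Accurate but imprecise: centered on 10.0, widely scattered
noisy = [8.0, 12.0, 9.5, 10.5, 10.0]

print(accuracy_and_precision(biased, 10.0))  # large error, small spread
print(accuracy_and_precision(noisy, 10.0))   # small error, large spread
```

A system that is “precise but inaccurate” will reproduce the same wrong answer every time, which is exactly why the two properties have to be argued, and measured, separately.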
Given the ambiguous nature of unstructured data, the difficulty of a formal definition is easy to understand. At its core, we are dealing with an interpretation, by a human or by a machine, of what another human said or wrote. A single individual will interpret the same text differently depending on a multitude of conditions, such as the time of day, the context in which the text is framed, or the interpreter’s state of mind at that moment. Moreover, no single individual can possibly handle the volumes of data available, and with each additional interpreter joining the task, the reproducibility of the interpretation results declines rapidly.
Speed and cost are obvious arguments for automated processing, but a machine also offers a better answer to the “accuracy” problem in big, unstructured data analysis. A machine’s interpretation of a single piece of text may not agree with a human reviewer’s at a given moment, but the averaged result over a large data set consistently produces measurements within 10% of a human tester’s results*.
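A toy simulation can show why aggregate measurements converge even when per-item judgments disagree. The error rates below are invented for illustration and have no connection to the internal tests the footnote mentions:

```python
import random

random.seed(42)

# Hypothetical model: each document has a true sentiment; the human and the
# machine each mislabel it independently, the machine more often.
N = 100_000
TRUE_POSITIVE_RATE = 0.60
HUMAN_ERROR = 0.05    # human flips a label 5% of the time
MACHINE_ERROR = 0.15  # machine flips a label 15% of the time

def label(truth, error_rate):
    """Return the true label, flipped with probability error_rate."""
    return (not truth) if random.random() < error_rate else truth

docs = [random.random() < TRUE_POSITIVE_RATE for _ in range(N)]
human = [label(d, HUMAN_ERROR) for d in docs]
machine = [label(d, MACHINE_ERROR) for d in docs]

# Per-document agreement is far from perfect...
agreement = sum(h == m for h, m in zip(human, machine)) / N
# ...but the aggregate measurements land close together.
human_rate = sum(human) / N
machine_rate = sum(machine) / N
print(f"per-doc agreement: {agreement:.2%}")
print(f"human positive rate: {human_rate:.2%}, machine: {machine_rate:.2%}")
```

In this sketch the two labelers disagree on roughly one document in five, yet their overall positive-sentiment rates differ by only a few percent: independent per-item errors largely cancel out in the aggregate.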
The real debate is not whether automated analysis of unstructured data is “accurate” enough. It is whether an enterprise can afford to ignore its vast data reserves in the Age of the Social Consumer.
* This figure is based on internal tests that we conduct at least three times per year.