August 3, 2016

Bridge to Big Data | Extraction, Mining & More

Over the past month or so, I’ve spent quite a bit of time talking with a variety of people about bridging the gap between content stored as documents in a repository and big data analytics, and a common theme emerged in most of those conversations.

To fully understand the theme, I believe it’s important to understand the major premise – namely, that valuable data is stored as content within document repositories. Report Analytics is a new name for a relatively mature technology. It’s all about extracting data from the information contained in that content, whether it be statements, invoices, or reports. For purposes of this brief discussion, we’ll refer to all of this content as reports.

I use the different terms “information” and “data” purposely. Reports deliver data that is presented so as to be read by a human. To a certain extent, relevant information is provided solely by the fact that it is eye-readable. The position of the data on the page and its relationship to other data on the page is indicative of a hierarchical relationship. In other words, where the data appears usually indicates its relative importance, its sort order, and its “belonging” to other data on the page. This positioning and referential integrity provide information derived from the data.
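
To make the idea of positional “belonging” concrete, here is a minimal Python sketch. It assumes a hypothetical plain-text report in which a customer header sits at the left margin and the invoice lines indented beneath it belong to that customer. The layout, the field names, and the `extract_rows` helper are all invented for illustration; real reports, and the tools that process them, will differ.

```python
# Hypothetical report text; the layout and field names are invented for illustration.
REPORT = """\
CUSTOMER: 1001  ACME SUPPLY
  INV-001   2016-07-01     250.00
  INV-002   2016-07-15      75.50
CUSTOMER: 1002  BAKER LTD
  INV-003   2016-07-20     120.00
"""


def extract_rows(report_text):
    """Flatten position-dependent report lines into self-describing rows."""
    rows = []
    customer_id = customer_name = None
    for line in report_text.splitlines():
        if line.startswith("CUSTOMER:"):
            # A header at the left margin starts a new group.
            _, customer_id, customer_name = line.strip().split(None, 2)
        elif line.startswith("  ") and line.strip():
            # An indented detail line belongs to the most recent header.
            invoice, date, amount = line.split()
            rows.append({
                "customer_id": customer_id,
                "customer_name": customer_name,
                "invoice": invoice,
                "date": date,
                "amount": float(amount),
            })
    return rows


rows = extract_rows(REPORT)
for row in rows:
    print(row)
```

Each resulting row carries its customer explicitly, so the relationship that a human reader infers from the page layout becomes data that can be sorted, filtered, or joined with other sources.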

Of course, the very fact that the structure of the report is necessarily static limits the use of the information in the report to the specific purpose that report was originally designed for. Even though the report may (and most likely does) contain data that can be used for other purposes and to answer other business questions, it is unusable in the originally designed format for anything other than its original purpose.

These statements, invoices, and reports, stored in an Enterprise Report Management system like IBM’s Content Manager OnDemand (CMOD), serve the very important original purpose of providing a legal archive and record of transactions that can be used to answer legal and regulatory questions as well as, perhaps more importantly, customer queries. But these stored documents can provide so much more.

Back to the common theme. Although most of the people I spoke with had knowledge of data mining, none really understood the bridge to big data and the power of extracting data from the information contained in reports. This is data mining taken to the next level: extracting data from the information contained in reports, invoices, statements, etc., and then transforming, repurposing, and combining that data with information from external sources. The result is critical knowledge that can be used to gain additional insight, analysis, and intelligence.

Insight into internal processes, manufacturing timelines, quality control, customer purchasing patterns, customer satisfaction levels, and a host of other valuable information is readily available in the reports, invoices, Explanations of Benefits, statements, etc. that are produced every day throughout business and industry. Timely acquisition of this verified, substantiated data can help you increase revenue, reduce costs, and make better business decisions.

In many conversations, I had the gratifying experience of creating an “Aha!” moment. Recognizing that the reports stored in an ECM system can be used for data acquisition and not just data and information distribution is, admittedly, somewhat of a paradigm shift for many. But, once realized, it’s a paradigm that opens the door to reduced costs and valuable insights.

Crawford Technologies provides software to quickly and easily extract pertinent data from huge volumes of invoices, statements, and reports, enabling you to turn your old content system into a new Big Data resource. It’s all there – it’s just a matter of creating the bridge.