What is Structured Data, and why do I want it?

By Daniel Warren-Smith

Data and information are the lifeblood of any business.  Emails.  Paper archives.  ERP transaction data.  Board papers.  Financial reports.  Presentations.  Analysis.  Invoices.  Quality records.  Internal audits.  HR records.  Customer files.  Purchase orders.  CRM notes.  Intellectual property.

All of these sources of information were useful at the time of their creation.  Depending on your strategy for incoming documents, you may have automatically extracted the necessary data and kicked off subsequent business processes, or this may have been manual.  Over time, with growth in your data storage (volume doubles every 18-24 months), if the data isn’t structured, it becomes a fatberg, clogging the arteries of your business.  By contrast, when data is structured, it is easily organised, and sets up your business for automation.

Data in business exists in three forms: structured, semi-structured and unstructured.

Structured Data is clearly identified by type of record, has key metadata captured for each data element, and exists in a searchable format.  The metadata attached to each data element includes standard information such as creator, date and time of creation, location and record type, as well as data specific to the type of record.  This additional metadata might include customer name, employee number, account number, or vendor number.  When the data element is recorded in a database (eg a purchase order), the database system automatically creates the metadata.  But what if the data element is an email?  Or a customer credit application?  Or a box of paper invoices?  Some data is innately structured as part of its creation, but other data needs additional work to create the structure.  Examples of structured data include a transaction record in a database; a PDF customer file in searchable text format, with metadata tags of customer name, customer number and date; and a PDF remittance advice in searchable text format which includes customer name, amount, date and invoice reference.
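To make the idea concrete, here is a minimal sketch of how a structured record could be represented: a record type, the standard metadata, and a set of record-specific fields.  The class name, field names, file paths and sample values are all hypothetical, chosen for illustration only, and are not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StructuredRecord:
    """One data element plus the metadata that makes it structured (hypothetical model)."""
    record_type: str       # e.g. "remittance_advice" or "customer_file"
    creator: str
    created_at: datetime   # date and time of creation
    location: str          # where the searchable file is stored
    extra: dict = field(default_factory=dict)  # metadata specific to the record type


def find_records(records, record_type, **criteria):
    """Return records of one type whose record-specific metadata matches every criterion."""
    return [
        r for r in records
        if r.record_type == record_type
        and all(r.extra.get(key) == value for key, value in criteria.items())
    ]


# Illustrative sample data only.
records = [
    StructuredRecord(
        "remittance_advice", "ap.clerk", datetime(2021, 3, 1, 9, 30),
        "/finance/remittances/0001.pdf",
        {"customer_name": "Acme Pty Ltd", "amount": 1250.00, "invoice_ref": "INV-1001"},
    ),
    StructuredRecord(
        "customer_file", "sales.admin", datetime(2021, 3, 2, 14, 0),
        "/customers/acme/file.pdf",
        {"customer_name": "Acme Pty Ltd", "customer_number": "C-0042"},
    ),
]

matches = find_records(records, "remittance_advice", customer_name="Acme Pty Ltd")
print([m.extra["invoice_ref"] for m in matches])
```

Because every record carries its type and metadata, grouping or filtering by customer becomes a simple lookup rather than a manual review of each document.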

Semi-structured data is data that can be searched but not in any systematic way.  Think of a folder full of word processed documents – desktop computing tools can search the contents of all documents for a string of text, but you cannot group the documents (for example by customer name), nor can any business intelligence be derived from the document set.  You may have some metadata on the records (eg destruction date, or departmental owner), but no way of analysing the recordset.  AI tools such as Ephesoft Transact, or the Document Automation capability of RPA software, can be applied to analyse semi-structured documents.  Examples of semi-structured data include file folders containing PDF invoices, or archived emails.

Unstructured data is sometimes called “Dark Data”, not because it has any nefarious intent, but because it exists in the dark!  Think of a box of paper.  You believe it contains records relevant to the business, you may have some idea of the age, you may even know the department it came from.  But you don’t really have any idea of what’s in the box.  The same applies to a group of non-searchable PDFs stored within SharePoint.  The only way to understand or make any use of the content is to manually review each document, which is obviously time-consuming.  Examples of unstructured data include paper records, scanned PDFs of handwritten documents, or non-searchable PDF scans of files.

Why do I want structured data?

Structured Data enables automation. When incoming documents are structured, subsequent processes can be easily automated. Data entry into operational or customer relationship management systems can be automated from the metadata extracted from each record, automated process flows can be initiated, and document filing and storage can be done automatically.  This has the dual benefit of reducing processing cost and reducing cycle times.  Document capture software such as Ephesoft Transact can ensure that document types are recognised, and data fields automatically extracted.  Instant feedback can be provided to external parties regarding missing or illegible documents, and the automation removes bottlenecks that occur when masses of documents are waiting to be processed. Lower cost, and better customer satisfaction? Almost sounds like Process Nirvana!

Data with Context enables your team to be more productive. Data with context facilitates search.  In 2012, McKinsey reported that 19% of the typical work day is spent searching for documents.  A 2018 IDC study put this figure at 2.5 hours per day, or 30% of a knowledge worker’s day.  When data is stored centrally, and enriched with contextual metadata, it can be easily searched. Whether the records are stored on network file servers, or in Enterprise Content Management systems, the right document can be located very quickly.  A conservative 50% improvement in time spent searching for documents translates to over $5,000 per annum per knowledge worker – for any mid to large size organisation, this translates to hundreds of thousands of dollars of productivity annually.

Data with context enables analysis. When there is context attached to your data, it can be analysed by big data tools.  For example, if you have 30,000 trade customers, it can be very time-consuming to review all customer credit limits if this data is only stored in paper or PDF format. If the credit limit is part of the metadata captured when filing the item, then the data can be easily analysed.  Similarly, it can be very time-consuming to review 1,000 employee contract records for specific clauses, for example notice periods.  If this data is held in a separate field, then the analysis is easy! The right profile of metadata for record sets enables insight to be drawn from the data which can be used to support business decision making.
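As a rough illustration of the credit limit example, the sketch below assumes each customer file has been filed with a credit_limit metadata field.  The field names, the review threshold and the sample figures are hypothetical; a real review would run over the full customer set with whatever analysis tool the business already uses.

```python
from statistics import median

# Hypothetical metadata captured when each customer file was filed.
customers = [
    {"customer_number": "C-0001", "customer_name": "Alpha Traders", "credit_limit": 5000},
    {"customer_number": "C-0002", "customer_name": "Beta Supplies", "credit_limit": 25000},
    {"customer_number": "C-0003", "customer_name": "Gamma Builders", "credit_limit": 12000},
    {"customer_number": "C-0004", "customer_name": "Delta Hardware", "credit_limit": 40000},
]


def over_limit(customer_records, threshold):
    """Return customers whose credit limit exceeds the threshold, largest limit first."""
    flagged = [c for c in customer_records if c["credit_limit"] > threshold]
    return sorted(flagged, key=lambda c: c["credit_limit"], reverse=True)


total_exposure = sum(c["credit_limit"] for c in customers)
median_limit = median(c["credit_limit"] for c in customers)
to_review = over_limit(customers, threshold=20000)

print(f"Total credit exposure: {total_exposure}")
print(f"Median credit limit: {median_limit}")
for c in to_review:
    print(c["customer_number"], c["customer_name"], c["credit_limit"])
```

The same few lines work whether there are four customers or 30,000, which is the point: once the value sits in a field, the review is a query rather than a reading exercise.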

Structured Data enables you to make decisions on storage.  Can I destroy this data?  Should I destroy it?  In general, best practice is to keep information for the minimum amount of time you are required to by law.  The exception to this is that you should not destroy data that is required to run the business (obviously!), or data that provides you a competitive advantage (such as the 11 herbs and spices recipe!).  Disposing of data reduces your storage cost, and also reduces your liability in the event of a Discovery Order.  The volume of data doubles at least every two years.  Managing this ballooning storage environment can be problematic without proper disposal programmes. When record classification is captured as a mandatory metadata element at the time of creation, then retention periods can be set as the item is placed into storage, and disposal can be easily managed across that entire recordset.
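A minimal sketch of how record classification can drive disposal is shown below.  The retention periods in the table are placeholders, not legal guidance, and the field names and the keep_indefinitely flag (standing in for business-critical or competitive-advantage data) are assumptions made for illustration.

```python
from datetime import date

# Hypothetical retention schedule in years, keyed by record classification.
# Actual periods depend on the legislation that applies to your business.
RETENTION_YEARS = {"invoice": 7, "employee_contract": 7, "marketing_flyer": 1}


def add_years(d, years):
    """Add whole years to a date, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_date(record_type, created):
    """Earliest date a record of this classification may be disposed of."""
    return add_years(created, RETENTION_YEARS[record_type])


def due_for_disposal(records, today):
    """Return records past their retention period, skipping anything flagged to keep."""
    return [
        r for r in records
        if not r.get("keep_indefinitely", False)
        and disposal_date(r["record_type"], r["created"]) <= today
    ]


# Illustrative sample records only.
records = [
    {"id": "R1", "record_type": "invoice", "created": date(2012, 6, 30)},
    {"id": "R2", "record_type": "invoice", "created": date(2020, 1, 15)},
    {"id": "R3", "record_type": "marketing_flyer", "created": date(2015, 5, 1),
     "keep_indefinitely": True},
]

due = due_for_disposal(records, today=date(2021, 1, 1))
print([r["id"] for r in due])
```

Because the classification is mandatory at creation, every record can be given a disposal date on the way into storage, and the whole recordset can be reviewed in one pass.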

Technology solutions

Adding structure and context to inbound documents was once problematic.  Document templating was more black art than science, understood by few.  However, the machine learning and artificial intelligence capability of Ephesoft Transact means that adding context to inbound data is much easier to manage across the whole enterprise.

If you would like to know more about how you can transform your organisation by structuring incoming data and automating processes using Robotic Process Automation solutions, contact us at info@24pc.com.au, or schedule a call using the link below.

Leveraging 18 years’ experience in Information Management, Daniel Warren-Smith founded consulting firm 24PC in 2020 to help organisations optimise, digitise and automate document-centric business processes using best-of-breed software across Capture, Robotic Process Automation, Workflow and Enterprise Content Management platforms.