Why Unstructured Data Visibility Matters

By Krishna Subramanian

Most enterprises are flying blind with their unstructured data. They don't know what they have, who is using it, why it's growing so fast, or how to be more efficient in managing it.

IT leaders need insight into their unstructured data. Without it, they are hindered in their ability to cut significant costs on data storage. As it is, most enterprises are spending more than 30% of their IT budget on data storage, backups, and disaster recovery, according to a 2022 survey on unstructured-data management.

Beyond high spending, which can climb further if you don't optimise cloud-storage placement, there is also the question of monetising data. Unstructured data often holds significant untapped business value, yet most organisations use only a small percentage of the data they produce and store. A recent Accenture study revealed that 68% of companies don't realise tangible and measurable value from their data.

Since unstructured data comprises the lion's share of all data in the world, you need to know what data you have, who needs access to it, how much of it is active, where it is stored, and its value to the organisation. You need visibility.

Attaining this visibility isn't easy, of course; in our complex world of hybrid clouds, unstructured data is strewn across corporate and colocation data centres, edge systems, and various cloud services. Moving data into a central repository would be an expensive and likely impossible proposition because of the distributed nature of data and data creation in the modern world.

Since unstructured data (including images, video, and documents) can reach billions of files of various types and sizes, organisations need a systematic approach to analysing and classifying it. Creating a searchable index of all the organisation's data across silos, from on-premises to edge to cloud, is an important first step towards visibility.

Getting Started with Fundamentals

You can address data-visibility issues in your organisation by developing a plan and process to assess and track your unstructured data. There are several fundamentals about your data that you'll want to start tracking, including:

Volume of data in storage
Growth rate of data over time
Age of data
Access patterns, such as time of last access
Location of data
File types and file sizes
Top data owners and types of data they are storing
Costs of data storage, backup, and disaster recovery today and in the future
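As a rough illustration of how several of these fundamentals could be gathered for a single file share, the sketch below walks a directory tree and totals volume, file types, owners, and file age. It is a minimal example under stated assumptions, not a description of any particular product: the names TreeSummary and summarise_tree are hypothetical, owners are reported as numeric user IDs (which are only meaningful on POSIX systems), and a real deployment spanning billions of files across silos would need a scalable index rather than a single in-memory pass.

```python
import os
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TreeSummary:
    """Aggregate metrics for one directory tree (hypothetical structure)."""
    total_files: int = 0
    total_bytes: int = 0
    bytes_by_extension: Counter = field(default_factory=Counter)
    bytes_by_owner: Counter = field(default_factory=Counter)
    oldest_mtime: Optional[float] = None  # epoch seconds of the oldest file
    newest_mtime: Optional[float] = None  # epoch seconds of the newest file


def summarise_tree(root: str) -> TreeSummary:
    """Walk `root` and collect volume, type, owner, and age metrics."""
    summary = TreeSummary()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            summary.total_files += 1
            summary.total_bytes += st.st_size
            summary.bytes_by_extension[ext] += st.st_size
            summary.bytes_by_owner[st.st_uid] += st.st_size
            if summary.oldest_mtime is None or st.st_mtime < summary.oldest_mtime:
                summary.oldest_mtime = st.st_mtime
            if summary.newest_mtime is None or st.st_mtime > summary.newest_mtime:
                summary.newest_mtime = st.st_mtime
    return summary
```

Running summarise_tree on the same share at regular intervals and comparing total_bytes between runs gives a simple growth-rate figure, and bytes_by_owner.most_common() surfaces the top data owners.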

Here's why these data points are important:

Data-usage metrics: Without the ability to see which files/shares/directories are being used regularly and which haven't been touched for a year or more, it's hard to do anything other than keep all your data on your expensive, high-performing storage. If, however, you can see how much of your data is rarely accessed (or "cold"), then you can manage it at a much lower cost by migrating or tiering it to cheaper storage, such as cloud object storage (Amazon S3 or Azure Blob Storage, for instance). Additionally, in organisations with chargeback models in place, department managers need to know data-growth metrics and who the top data owners are so that those individuals are included in data-management conversations.
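One way to put a number on "cold" data is to split file records by time of last access and measure the share of total capacity that falls past a threshold. The sketch below assumes the records have already been collected; FileRecord, split_hot_cold, and cold_fraction are hypothetical names, and the one-year default simply mirrors the "a year or more" example above. Note also that last-access timestamps are not always reliable, since some file systems are mounted with access-time updates reduced or disabled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileRecord:
    """One indexed file (hypothetical record layout)."""
    path: str
    size_bytes: int
    last_access: float  # epoch seconds


def split_hot_cold(
    records: Iterable[FileRecord], now: float, cold_after_days: int = 365
) -> Tuple[List[FileRecord], List[FileRecord]]:
    """Return (hot, cold); a file is cold if last accessed before the cutoff."""
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    hot: List[FileRecord] = []
    cold: List[FileRecord] = []
    for record in records:
        (cold if record.last_access < cutoff else hot).append(record)
    return hot, cold


def cold_fraction(
    records: Iterable[FileRecord], now: float, cold_after_days: int = 365
) -> float:
    """Share of total bytes (0.0 to 1.0) held by cold files."""
    records = list(records)  # allow generators; we iterate more than once
    total = sum(r.size_bytes for r in records)
    if total == 0:
        return 0.0
    _hot, cold = split_hot_cold(records, now, cold_after_days)
    return sum(r.size_bytes for r in cold) / total
```

The threshold is a policy decision rather than a technical constant; some data sets may warrant a shorter or longer window depending on how they are used.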

Sensitive data: Organisations sometimes need to delete data altogether for legal reasons—for instance, ex-employee data or ex-customer financial data. The ability to easily search customer and individual names connected to files delivers a huge advantage here. Granular search capabilities (such as by file extension or metadata) let the user locate intellectual property or financial data that might have been copied or moved to a location without appropriate security protections or access rules applied.

Financial metrics: As part of a data-operations (DataOps) and financial-operations (FinOps) strategy, IT leaders should understand the costs of storing data on current technologies and be able to project costs for moving to a different storage platform. From there, they can determine if it would be cost-effective to, say,

Move less-active data to the cloud
Move on-premises data to network-attached storage (NAS)
Delete some portion of data archives

Armed with knowledge of their data assets, IT teams can set policies to transparently tier data to the most cost-effective storage based on each data set's use cases and priorities. IT leaders can then cut storage and data-management costs while accommodating rapid data growth.
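The cost comparison described above comes down to simple arithmetic once the cold share of the data is known. The following sketch is illustrative only: the function names are hypothetical, the per-gigabyte rates are inputs you would supply from your own contracts rather than real vendor prices, and the model deliberately ignores retrieval, egress, backup, and migration costs, all of which can change the answer.

```python
def monthly_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Storage cost per month for `size_gb` at a flat per-GB rate."""
    return size_gb * rate_per_gb_month


def tiering_saving(
    total_gb: float,
    cold_fraction: float,
    primary_rate: float,
    archive_rate: float,
) -> float:
    """Monthly saving from moving the cold share to a cheaper tier.

    Rates are assumptions supplied by the caller; retrieval and egress
    fees are NOT modelled here.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_gb = total_gb * cold_fraction
    before = monthly_cost(total_gb, primary_rate)
    after = monthly_cost(total_gb - cold_gb, primary_rate) + monthly_cost(
        cold_gb, archive_rate
    )
    return before - after


def projected_size(current_gb: float, annual_growth: float, years: int) -> float:
    """Compound-growth projection, e.g. annual_growth=0.3 for 30% per year."""
    return current_gb * (1.0 + annual_growth) ** years
```

Combining projected_size with tiering_saving gives a first-pass view of how costs would develop with and without tiering, which can then be refined with the fees this sketch leaves out.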

Data Refinement

Once you have started an unstructured-data assessment through indexing and analytics, consider further refinement. Tagging data with additional context, such as demographics, descriptive details (for instance, "image of eyes"), or project names, widens the search parameters available to users and supports better data-management decisions. (Look for an unstructured-data-management solution that supports automated tagging by policy and can retain tags for data wherever it moves.)
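Tagging by policy can be pictured as a set of rules that map file-path patterns to tags, applied automatically as files are indexed. The sketch below is a simplified illustration, not how any specific product implements it: TagRule, tags_for, and find_by_tag are hypothetical names, the tag strings and paths are invented for the example, and matching is done on path patterns only, whereas real tools may also inspect file content or metadata.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class TagRule:
    """Apply `tags` to every path matching the shell-style `pattern`."""
    pattern: str
    tags: FrozenSet[str]


def tags_for(path: str, rules: Iterable[TagRule]) -> Set[str]:
    """Union of tags from all rules whose pattern matches `path`."""
    result: Set[str] = set()
    for rule in rules:
        # fnmatchcase is case-sensitive on every platform; "*" also
        # matches path separators, so "*/x/*" matches at any depth.
        if fnmatchcase(path, rule.pattern):
            result |= rule.tags
    return result


def find_by_tag(index: Dict[str, Set[str]], tag: str) -> List[str]:
    """Paths in a {path: tags} index that carry `tag`, sorted."""
    return sorted(path for path, tags in index.items() if tag in tags)


# Hypothetical policy: tag by project directory and by file type.
EXAMPLE_RULES = [
    TagRule("*/retina-study/*", frozenset({"project:retina-study"})),
    TagRule("*.png", frozenset({"type:image"})),
]
```

Because the tags live in the index alongside each path, they can travel with the record when the underlying file is tiered or migrated, provided the index is updated with the new location.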

Moreover, systematically classified, well-managed, easily searchable data is vital for fuelling the latest generation of affordable, powerful artificial-intelligence (AI) and machine-learning (ML) applications. New AI/ML tools can jump-start an organisation's innovation cycles, deliver noticeable productivity gains, and/or optimise anomaly detection to dramatically reduce security/compliance risks.

As data becomes ever more central to business decisions, product development, and customer strategy, knowledge about that data is increasingly valuable to people across the organisation. The CIO needs to understand high-level implications of cloud storage and data growth. Researchers want to know what data is available for future projects. Legal and security teams need to ensure data is protected and discoverable if needed for auditing or investigations.

Yet visibility alone isn't enough. To get ROI from unstructured-data management, this data knowledge must be integrated into workflow processes. It should be simple to move from insight to action—migrating, tiering, copying, and deleting data, along with ongoing data-lifecycle management—to meet user, application, and departmental needs.

Krishna Subramanian is co-founder, president, and COO of Komprise. This article initially appeared HERE
