Destroying Dark and Unstructured Data

By Chris Boyes

Over the course of regular business activities, organizations collect large amounts of data and all too often fail to use it. This unnecessary data not only adds to maintenance costs but also exposes the organization to greater risk.

Much of this data eventually becomes what is called dark data; by some estimates, 90% of all big data is dark data. This article explains what dark and unstructured data are and covers the best methods for destroying the data you no longer need.

When you collect large amounts of technical information, you are bound to end up with some data that has become either valueless or obsolete. Such data is called dark data.

Gartner originally coined the term and defines it as: "the information assets that organizations collect, process, and store during regular business activities, but generally fail to use for other purposes".

Organizations aim to get rid of dark data because it is stored in the same way as useful data. Dark data therefore carries all the costs of useful data but provides no real value.

Unstructured data is data without a predefined structure or format. Structured data, by contrast, has a clear structure: each piece of information is placed into categories and differentiated using parameters, so you can easily find and pull data from a structured database. Unstructured data mixes different types of data together, which makes specific information hard to find. This inaccessibility can lead to data being lost and ultimately turning into dark data.
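To make the difference concrete, here is a minimal Python sketch that turns one unstructured log line into a structured record with named fields. The log format, the pattern, and the function name structure_log_line are assumptions made up for this illustration; real logs will need their own pattern.

```python
import re
from typing import Optional

# Hypothetical log format, assumed for illustration:
#   "<YYYY-MM-DD> <HH:MM:SS> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def structure_log_line(line: str) -> Optional[dict]:
    """Return the line as a dict of named fields, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


record = structure_log_line("2021-03-04 10:15:00 ERROR disk quota exceeded")
print(record)
```

Once a line is in this form, it can be filtered by date or level like any other structured data. Lines that do not match return None and stay in the unstructured pile for manual review.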

The following are some common examples of dark and unstructured data:

  • Email conversations and attachments
  • Knowledge base articles
  • Log files
  • Previous employee information
  • Raw survey data
  • Old versions of documents
  • Archived web content
  • Completed marketing campaigns
  • Hidden files used by various applications


What are the risks of dark and unstructured data?

Dark and unstructured data is collected in much the same way as its more organized and useful counterparts, and it is just as vulnerable to attacks from outside entities. Though the data is often called useless, someone with other intentions may find it useful. In fact, hackers treat dark and unstructured data as a major resource for exploiting companies.

Furthermore, Gartner predicts that through 2021, "more than 80% of organizations will fail to develop a consolidated data security policy across silos, leading to potential noncompliance, security breaches and financial liabilities."

As time passes, dark data continues to grow. The growing volume of dark and unstructured data leads to higher storage and maintenance costs and, most importantly, a higher risk of a security breach. Unlike the companies that hold it, hackers may find this data quite useful for causing disruption.

The first step to dealing with dark and unstructured data is to find it. Once you have identified and converted your dark and unstructured data into a more formal and organized format, you can start the process of safely disposing of it.

Finding dark and unstructured data

To start the process of destroying unneeded dark and unstructured data, you must first analyze the data and convert it into a useful form. In other words, you must structure your data. To find all that data, start by looking in the places where you might have stored it. These could be physical storage devices such as hard disks, pen drives, and CD-ROMs, or online locations such as cloud storage and servers.
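On a file server or local disk, one simple way to begin the search is to list files that nobody has modified for a long time. The Python sketch below does that with the standard library only. The function name find_stale_files and the age threshold are assumptions for illustration, and an old modification time is only a hint that a file may be dark data, not proof.

```python
import os
import time
from typing import Iterator, Optional, Tuple


def find_stale_files(
    root: str, max_age_days: float, now: Optional[float] = None
) -> Iterator[Tuple[str, int, float]]:
    """Yield (path, size_in_bytes, age_in_days) for files under `root`
    whose last modification is older than `max_age_days`."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - info.st_mtime) / 86400  # seconds per day
            if age_days > max_age_days:
                yield path, info.st_size, age_days
```

For example, list(find_stale_files("/srv/shared", 365)) would return every file under a hypothetical /srv/shared directory that has not been modified in more than a year, along with its size, which gives a first estimate of how much storage the candidates occupy.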

After you have located your data, you need to classify and structure your dark and unstructured data. This is important because you do not want to wipe all of your data, even if it's dark or unstructured. Your dark and unstructured data may have information that can be analyzed to get better business insights.

Because unstructured data can contain important business notes that can be used to improve different parts of your business, it's very important to identify and separate data that has no value for the business from data that does.
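The separation step can be sketched as a small rule-based triage. Everything in the following Python example is hypothetical: the FileRecord fields, the one-year threshold, and the three labels stand in for whatever retention policy your organization actually follows.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    age_days: float
    contains_personal_data: bool


# Hypothetical policy threshold, assumed for illustration only.
REVIEW_AFTER_DAYS = 365


def triage(record: FileRecord) -> str:
    """Label a file as 'keep', 'review', or 'destroy-candidate'."""
    if record.age_days <= REVIEW_AFTER_DAYS:
        return "keep"  # still recent; treat as in active use
    if record.contains_personal_data:
        return "destroy-candidate"  # old and risky to keep around
    return "review"  # old, but may still hold business insight


print(triage(FileRecord("hr/former_employee.csv", 900, True)))
```

The "review" label matters here: old data is not deleted automatically, because it may still be worth analyzing before a person decides it has no value.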

It's also good practice to audit your old databases regularly so that dark data does not pile up. Many tools, both free and paid, can help streamline this process and make it faster and more hassle-free.

Storage devices are made to retain data, so simply deleting files may not be enough. With modern forensic tools, deleted data can often be recovered from storage devices such as hard disks and pen drives. To make sure their data doesn't fall into the wrong hands, organizations prefer data sanitization.

Data sanitization is the process of deliberately, permanently, and irreversibly removing or destroying the data stored on a memory device. The key word here is irreversibly: once a device has been sanitized, there is no way of getting that data back, not even with the most advanced forensic methods. There are a number of ways to wipe your storage devices clean. The most common are physical destruction, encryption, and overwriting.

Physical Destruction

Physically destroying the storage device by shredding, drilling, or incinerating it is a common industry practice and is generally accepted as a secure method of data destruction.

Physical destruction makes recovering data virtually impossible, but it comes at the cost of not being able to reuse or sell the devices. The method is also harmful to the environment, and if the drive is not destroyed thoroughly, data can remain on its fragments.


Encryption, also known as cryptographic erasure, is one of the newest methods of data sanitization. The main idea is to use encryption software to encrypt, or lock, the data with a key. The key is then destroyed or deleted, rendering the data inaccessible.

The biggest drawback of cryptographic erasure is that the data still remains on the drive. Though the data is locked and inaccessible, the fact that it is still present makes this method unsuitable for destroying sensitive data.
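The idea can be illustrated with a toy Python sketch. It is not real disk encryption: it uses a one-time random key and XOR from the standard library as a stand-in for the cipher an actual product would use, and the class name CryptoErasableStore is invented for this example. It shows both halves of the argument: once the key is gone the data cannot be read, yet the encrypted bytes are still there.

```python
import secrets


class CryptoErasableStore:
    """Toy model of cryptographic erasure (XOR with a random one-time key)."""

    def __init__(self, plaintext: bytes):
        # The key is as long as the data and is the only way to read it back.
        self._key = bytearray(secrets.token_bytes(len(plaintext)))
        self.ciphertext = bytes(p ^ k for p, k in zip(plaintext, self._key))

    def read(self) -> bytes:
        if self._key is None:
            raise RuntimeError("key destroyed; data can no longer be decrypted")
        return bytes(c ^ k for c, k in zip(self.ciphertext, self._key))

    def destroy_key(self) -> None:
        if self._key is not None:
            for i in range(len(self._key)):
                self._key[i] = 0  # best-effort wipe of the key bytes
            self._key = None


store = CryptoErasableStore(b"account 4242")
print(store.read())           # readable while the key exists
store.destroy_key()
print(len(store.ciphertext))  # the encrypted bytes are still stored
```

In a real deployment the key lives in the drive controller or a key manager, and the strength of the method depends on that key never having been copied elsewhere.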


Overwriting, also known as data erasure, is a trusted method of data sanitization. Software is used to overwrite the data on the storage media, completely destroying it in the process. It is considered the fastest and cheapest method of data sanitization. Overwriting is also the most eco-friendly process and, when properly applied, minimizes the risk of human error.
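A minimal Python sketch of the overwrite-and-verify idea for a single file follows. The function names and the pass count are assumptions for illustration, not a recognized erasure standard. File-level overwriting is also weaker than what dedicated erasure software does to a whole drive: on SSDs, journaling or copy-on-write file systems, and backed-up volumes, old copies of the data can survive elsewhere on the media.

```python
import os

CHUNK_SIZE = 1024 * 1024  # work in 1 MiB chunks to bound memory use


def _write_pass(f, size, make_chunk):
    """Overwrite the first `size` bytes of open file `f` with make_chunk(n) data."""
    f.seek(0)
    remaining = size
    while remaining > 0:
        n = min(CHUNK_SIZE, remaining)
        f.write(make_chunk(n))
        remaining -= n
    f.flush()
    os.fsync(f.fileno())  # ask the OS to push the writes to the device


def overwrite_file(path, random_passes=1):
    """Overwrite a file in place with random data, then zeros, and verify.

    Returns True if every byte reads back as zero after the final pass.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(random_passes):
            _write_pass(f, size, os.urandom)
        _write_pass(f, size, bytes)  # bytes(n) is n zero bytes
        f.seek(0)
        remaining = size
        while remaining > 0:
            chunk = f.read(min(CHUNK_SIZE, remaining))
            if not chunk or any(chunk):
                return False  # short read or a non-zero byte survived
            remaining -= len(chunk)
    return True
```

After overwrite_file(path) returns True, the file can be removed with os.remove(path). The verification reads back through the operating system, so it confirms what the file system reports rather than what is physically on the platters or flash cells.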

A Good Data Erasure Software

The software is the key here since it does most of the work. To ensure complete sanitization, look for the following features:

  • The ability to select a specific standard, based on unique needs and different parameters.
  • A feedback mechanism to verify that the data sanitization process was successful and all data is now gone.
  • An automated audit trail to confirm complete drive erasure.
  • Detailed reporting that can be used as a quality assurance tool.

With so many choices, it is important that you pick the method best suited for your needs. Keep in mind the following when picking a data destruction method:

Economic value: What's the economic value of the data? If it's worth a lot, it might attract unwanted attention, so the safest method should be chosen (degaussing or data erasure).

Resale value / reusability: Storage equipment can be expensive, therefore you might decide to sell your storage devices or reuse them after safely wiping the drive clean. Data erasure is a good method if this is the case.

Traceability: If you need to trace the entire history of your data sanitization process, the overwriting method is the remaining choice, since it creates a log of serial numbers, the security standard used, the date of sanitization, and verification of complete erasure.
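As a rough illustration of what such a log entry might hold, the Python sketch below builds one record per sanitized device and appends it to a log file as a line of JSON. The record layout, the function names, and the example values are assumptions for this article; commercial erasure tools define their own report formats.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ErasureRecord:
    serial_number: str  # serial number of the sanitized device
    standard: str       # security standard that was applied
    sanitized_at: str   # date and time of sanitization, ISO 8601, UTC
    verified: bool      # whether complete erasure was verified


def make_record(
    serial_number: str,
    standard: str,
    verified: bool,
    when: Optional[datetime] = None,
) -> ErasureRecord:
    """Build a record, timestamped now (UTC) unless `when` is given."""
    when = when or datetime.now(timezone.utc)
    return ErasureRecord(serial_number, standard, when.isoformat(), verified)


def append_to_log(record: ErasureRecord, log_path: str) -> None:
    """Append the record to the audit log as one JSON object per line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


# Example values are placeholders, not output from a real erasure run.
print(make_record("SN-EXAMPLE-001", "NIST SP 800-88 Clear", True))
```

Because each line is a self-contained JSON object, the log can be appended to indefinitely and later filtered by serial number or date when an auditor asks for proof of erasure.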

Chris Boyes is a data solutions specialist with Clarabyte.