The Tale of the Exploding Archive

By Michael Lappin

Recently, Microsoft had to solve a common archive migration problem for a Global 300 pharmaceutical company. The company was using a legacy archive platform that wasn’t meeting its needs. So its email management team made a great decision to move the data from the legacy email archive to Microsoft Office 365.

When they looked at what Microsoft was offering—unlimited personal archives plus the ease of use and reliability of Microsoft—the decision was obvious. But the real puzzle was figuring out how to get upwards of a billion email messages, about 100 TB, out of a single journal archive into 80,000 individual Office 365 mailboxes.


Distributing mail to 80,000 mailboxes is a Herculean task. Photo: Beate Meier

The Problem
Most heavily regulated companies, such as those in the pharmaceuticals industry, keep their email in a journal archive. A journal is a single repository of every email sent or received. It offers a convenient way to secure, manage, and search large quantities of email, which is why many organizations use a journal for eDiscovery and routine compliance auditing operations.

The catch with journal archives is that they are “single instanced,” meaning if a message is sent to multiple recipients, the archive only stores one copy of that email and maintains a list of everyone who received it. But the Office 365 platform has no concept of a journal, only individual mailbox archives.

You can see where this is going, right? It means if you’re migrating from a journal archive into Office 365, you have to “explode” the journal and copy those single-instanced messages into multiple individual archives. It also means searching that data for eDiscovery must span multiple archives, rather than the single journal mailbox.

Moving from a journal archive to Office 365 mailboxes involves a lot of explosions. Photo: VirtualWolf

So Microsoft faced two challenges. First, most migration tools don’t know how to recognize the ownership of each email in a journal archive, so they can’t explode the messages into individual mailboxes. Second, there was the end-user experience. You see, journal archives are viewed and accessed only by eDiscovery users, not everyday end-users. So how do you explode the emails from the journal into each user’s personal archive without confusing that user?

Microsoft called the only migration partner that could handle such a scenario: Nuix.

The Solution
Nuix approached the problem differently. Our technology doesn’t use the legacy archive’s proprietary application programming interface (API) but reads the journal format directly. In a single motion, our technology was able to stream the data out of the old archive, recognise the owners of each item (often several people, including the sender and multiple recipients), and copy the message into all the relevant individual Office 365 mailbox archives.
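
To make the “explosion” step concrete, here is a minimal Python sketch. It is not Nuix’s implementation, only an illustration of the idea described above. It assumes a hypothetical JournalMessage record that holds one single-instanced message with its sender and recipients, and hypothetical owners_of and explode helpers that work out which mailboxes should receive a copy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalMessage:
    """Hypothetical single-instanced journal entry (illustrative only)."""
    message_id: str
    sender: str
    recipients: tuple = ()


def owners_of(message):
    """Return each distinct owner once: the sender plus every recipient."""
    seen = []
    for address in (message.sender, *message.recipients):
        normalized = address.strip().lower()
        if normalized not in seen:
            seen.append(normalized)
    return seen


def explode(journal):
    """Map each owner's mailbox to the IDs of the messages it should receive."""
    mailboxes = defaultdict(list)
    for message in journal:
        for owner in owners_of(message):
            mailboxes[owner].append(message.message_id)
    return dict(mailboxes)


# Two stored messages become five mailbox copies.
journal = [
    JournalMessage("m1", "ana@example.com", ("bo@example.com", "cy@example.com")),
    JournalMessage("m2", "bo@example.com", ("ana@example.com",)),
]
exploded = explode(journal)
```

In this toy example the journal stores two messages, but the exploded result holds five copies across three mailboxes, which is why a single-instanced archive grows when it is migrated this way.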

Nuix technology placed all of these messages into queues on Microsoft Azure cloud-based virtual servers for ingestion into those 80,000+ individual archives.

Because these messages were only needed for compliance purposes, we moved them into each user’s “Recoverable Items” folder rather than the inbox or personal archive. The Recoverable Items folder is not accessible to the user, but the company’s legal team can search the emails or place them on legal hold. So users were not burdened by having to manage their old email, but the compliance and risk teams had a defensible way to perform searches and audits without a traditional journal repository.

The Takeaway
This case underscores that Nuix is the only vendor that can solve two major problems many customers face when migrating to Office 365 from a legacy journal-based email archive:

  1. “Exploding” huge journal repositories into many users’ mailbox archives.
  2. Placing the data in a location where the legal and compliance teams can access it without confusing end-users.

By the way, using 10 Microsoft Azure servers, Nuix can maintain an ingestion rate into Office 365 of 50+ gigabytes per hour. That is more than a terabyte per day!
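
The arithmetic behind that figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming decimal units (1 TB = 1,000 GB), round-the-clock operation, and the 50 GB per hour floor quoted above. The last line applies the same rate to the roughly 100 TB archive mentioned earlier, ignoring the extra volume created when single-instanced messages are copied into several mailboxes.

```python
GB_PER_HOUR = 50     # ingestion rate quoted in the article (a floor: "50+")
HOURS_PER_DAY = 24   # assumption: ingestion runs around the clock
GB_PER_TB = 1000     # assumption: decimal units
ARCHIVE_TB = 100     # approximate journal size quoted in the article

gb_per_day = GB_PER_HOUR * HOURS_PER_DAY
tb_per_day = gb_per_day / GB_PER_TB
days_for_archive = ARCHIVE_TB / tb_per_day

print(f"{tb_per_day:.1f} TB per day")                             # 1.2 TB per day
print(f"about {days_for_archive:.0f} days for {ARCHIVE_TB} TB")   # about 83 days for 100 TB
```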

By redefining “explosion” as a positive, Nuix helped this multinational pharma walk away a happy, eDiscovery-ready customer.

Michael Lappin is Director of Archiving Technology at Nuix