Are SharePoint BLOBs dragging you down?

By Gary Van Buhler

I might be a little old-fashioned, but I came up through an Enterprise Content Management (ECM) and Transactional Content Management (TCM) community where you never put BLOBs directly in your database. By BLOBs, I am referring to Binary Large Objects (not the 1988 thriller movie), which are images or other large pieces of data stored as a single object. I was taught that the database should hold pointers to BLOB content stored on a separate, inexpensive storage system such as SAN or NAS drives. However, when I entered the world of SharePoint, I realised that everyone was storing BLOBs in their databases.

To BLOB or not to BLOB? That is the question (in SharePoint).

You can certainly store BLOBs outside of SharePoint using EBS (External BLOB Storage), RBS (Remote BLOB Storage), or third-party solutions, with pointers kept in your content database. My natural tendency is to recommend these solutions since they reduce the size of the database and ease the load on the database server. The result is often a performance improvement without upgrading hardware.

However, I’ve been advised often that it’s OK to put BLOBs in SharePoint and look to hardware improvements to account for any performance effects. I’ve also realized that the potential downsides of external BLOBs are closely related to SharePoint issues that we frequently consult on. For example, RBS has the potential to slow down backup-and-restore, SharePoint version upgrades, or migrations between environments. (For more info, read this great series of articles by Dan Holmes – http://sharepointpromag.com/sharepoint/blob-or-not-blob).
Yet, do these considerations lead to the conclusion that storing BLOBs in SharePoint is actually best practice?

ECM versus TCM 

The best-practice answer is, “it depends.” Specifically, the correct solution here depends on the balance between an organisation’s ECM and TCM needs. After 25 years in this industry, I see a clear tension between these two areas that is often ignored.

ECM solutions built for collaboration and living documents are not well suited for TCM. On the flip side, solutions designed for transactional proficiency – such as the static document processing of BLOBs – are simply not good collaborative environments (although many will claim to be). Simply put, content management focused on the end result of business operations is different than content management focused on the creation of content. The latter is intended to make knowledge workers more efficient and effective. The former should drive and record business actions and decisions. These are different goals.

How do you strike a balance? 

The first step toward best practices for BLOBs is understanding one’s TCM needs and avoiding the tendency to lump these requirements in with larger ECM and SharePoint goals. Remember, TCM is a system of record for managing process-related documents (read: high volumes of static documents, most often images or PDFs).

Examples of TCM targets include invoice processing, application processing, employee onboarding, insurance claims, loan origination, patient charts, and the processing of permits. If an organisation doesn’t engage in these or similar activities, then its limited TCM activity and BLOB storage present few problems when kept in SharePoint.

However, for others, SharePoint may show real performance issues when transaction-based imaging solutions run alongside collaborative processes and a large number of BLOBs are stored in the database.
Let’s do a little math around the classic example I’ve encountered among customers – scanning large numbers of documents into the SharePoint repository (the database).

  • 1GB of disk space will hold about 20,000 images.
  • 1TB of disk space will hold around 20,000,000 images.
  • 1,600 pages per month dumps about 1 GB of data per year into your SharePoint database.

After 5 years you have pumped roughly 100,000 pages and about 5 GB of BLOBs directly into your SharePoint database.
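
The arithmetic can be checked with a short Python sketch. It assumes only the 20,000 images-per-GB figure quoted in the list; the function names and the 50,000 pages-per-month comparison are hypothetical, added for illustration. Under that assumption, 1,600 pages a month works out to roughly 1 GB a year and just under 5 GB after five years.

```python
# Assumption from the figures above: 1 GB holds about 20,000 scanned images,
# which implies roughly 50 KB per image.
IMAGES_PER_GB = 20_000


def pages_scanned(pages_per_month: int, years: int) -> int:
    """Total pages scanned over the given number of years."""
    return pages_per_month * 12 * years


def blob_growth_gb(pages_per_month: int, years: int) -> float:
    """Estimated GB of image BLOBs added over the given number of years."""
    return pages_scanned(pages_per_month, years) / IMAGES_PER_GB


if __name__ == "__main__":
    # 1,600 pages/month is the article's example; 50,000 is a hypothetical
    # higher-volume operation added for comparison.
    for monthly in (1_600, 50_000):
        pages = pages_scanned(monthly, 5)
        size = blob_growth_gb(monthly, 5)
        print(f"{monthly:,} pages/month -> {pages:,} pages, {size:.1f} GB in 5 years")
```

The same calculation scales linearly, so a hypothetical operation scanning 50,000 pages a month would add about 150 GB over the same five years.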

That’s a big chunk of data for a database server to handle! Note that many organisations have drastically higher page-per-month volumes, further exacerbating this problem. Such an organisation could spend money to upgrade the server hardware to handle the load (expensive hardware at that). However, this seems like kicking the can down the road. Why not get those BLOBs out of the database and improve performance for SharePoint’s best qualities - collaboration and living document content management?

Database size and additional considerations

When RBS is enabled, the database size won’t grow at the same magnitude with each file that’s added. However, Microsoft’s recommendations for database sizes in SharePoint will include the size of the database and the BLOB store, so you’re not escaping their size recommendation. 
From a simple performance perspective, if the files are going to be read frequently but rarely revised, RBS can improve performance. If files are going to be revised frequently, then RBS may decrease performance.

With your files stored outside of SQL, you remove SQL overhead and memory usage when requesting the files. This allows SQL to process other important tasks and queries. Also, accessing files from a file system is generally faster than pulling them out of a SQL table.

Last, but not least, TCM cost should be addressed. Luckily for buyers, ECM and TCM solutions no longer need to break the bank. Although SharePoint itself is a good ECM starter, many organisations need a more robust solution. Thankfully, recent players are driving disruptive pricing in this market, and TCM in particular has become quite commoditized over the past several years.

It’s now possible to run a very low-cost TCM solution alongside SharePoint (in which you have already made an investment) to minimise the effects of transaction-based BLOBs on SharePoint’s collaborative activities, while still providing transparent access to all the content through a unified SharePoint search environment.

A real world example
Let’s examine how this might work for a purchase-to-pay process.

A user can:

  • Pull up a purchase requisition form from a library in SharePoint (which could maintain several revisions of the form) and fill it out.
  • Send the form around for approval in a SharePoint Workflow (or third-party workflow) and on to accounting to create a purchase order (the generated purchase order could be saved/printed to SharePoint or the TCM for long-term storage and access).
  • After the ordered product is delivered, scan shipping documents into the TCM at the point of acceptance, and scan the trailing paper invoices (yes, many, many organizations are still working with paper invoices!) for processing through the TCM Accounts Payable process (providing automated two- or three-way matching).

When all is said and done, the living editable documents are in SharePoint and the static BLOB images are in the TCM. If a customer service rep needed to view all the documents, they would search using a SharePoint enterprise search tool that would search both SharePoint and the TCM based on metadata and bring back a document list without worrying about which repository each document comes from.
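
The idea of one metadata search spanning both repositories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use the SharePoint search API or any TCM product's API, and the `Document` class, the `search_repository` and `federated_search` functions, the `po_number` field, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    repository: str                      # e.g. "SharePoint" or "TCM"
    metadata: dict = field(default_factory=dict)


def search_repository(documents, **criteria):
    """Return documents whose metadata matches every given criterion."""
    return [
        doc for doc in documents
        if all(doc.metadata.get(key) == value for key, value in criteria.items())
    ]


def federated_search(repositories, **criteria):
    """Run the same metadata query against each repository and merge the hits."""
    results = []
    for documents in repositories:
        results.extend(search_repository(documents, **criteria))
    # One combined list, sorted by title, regardless of where each item lives.
    return sorted(results, key=lambda doc: doc.title)


# Hypothetical sample data: living documents in SharePoint, static images in the TCM.
sharepoint_docs = [
    Document("Purchase requisition", "SharePoint", {"po_number": "PO-1001"}),
    Document("Purchase order", "SharePoint", {"po_number": "PO-1001"}),
    Document("Purchase requisition", "SharePoint", {"po_number": "PO-2002"}),
]
tcm_docs = [
    Document("Shipping receipt", "TCM", {"po_number": "PO-1001"}),
    Document("Vendor invoice", "TCM", {"po_number": "PO-1001"}),
]

if __name__ == "__main__":
    for doc in federated_search([sharepoint_docs, tcm_docs], po_number="PO-1001"):
        print(f"{doc.title} ({doc.repository})")
```

In this sketch the customer service rep queries by purchase order number and gets back the requisition, purchase order, shipping receipt, and invoice in one list, with the source repository reduced to a label on each result.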

Despite the “norm,” I am a strong proponent of taking all transaction-based activities out of SharePoint and running them where they make sense.
There are many other great examples where TCM processes could run side-by-side with SharePoint processes and take the burden off the SharePoint database server. 

Gary Van Buhler is an experienced business consultant, currently VP, Business Development, Software Solutions at Total Solutions, a SharePoint consulting and development firm.