Top 7 SharePoint document scanning best practices

Are you planning to scan and digitise all your paper documents and manage them electronically in SharePoint? Then you are in the right place. SharePoint is a mature Enterprise Content Management (ECM) platform that allows you to manage the full lifecycle of content in an organisation. However, to get the maximum value from its rich ECM capabilities, you need to get the content into SharePoint the right way. If you are coming from a paper-based environment, this requires seamless integration of your scanning solution with SharePoint, delivering an uninterrupted flow of metadata-rich content into the platform. In this post I am going to share 7 best practices for planning and executing a SharePoint document scanning project.
1. Define a metadata extraction strategy
As an enterprise content management system, SharePoint allows you to organise, share, manage and find documents across the enterprise. However, your ability to leverage the ECM features in SharePoint depends on how well you give context to unstructured content. In SharePoint, that context is defined through metadata. Manual metadata entry (manual indexing) is inconsistent, cumbersome and error-prone. This is where you might want to look at automated metadata extraction (automated indexing) for all documents entering SharePoint during the document scanning process.
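As a rough illustration of automated indexing, the sketch below pulls a few index fields out of OCR text using regular expressions. The field names (`InvoiceNumber`, `InvoiceDate`, `VendorName`) and the patterns are hypothetical and chosen only for this example; a commercial capture product would normally rely on its own extraction templates rather than hand-written patterns.

```python
import re
from typing import Dict, Optional

# Hypothetical index fields and patterns, for illustration only.
FIELD_PATTERNS = {
    "InvoiceNumber": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "InvoiceDate": re.compile(r"Date\s*:\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE),
    "VendorName": re.compile(r"Vendor\s*:\s*(.+)", re.IGNORECASE),
}


def extract_metadata(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return one value per index field, or None when no match is found."""
    metadata: Dict[str, Optional[str]] = {}
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        metadata[field_name] = match.group(1).strip() if match else None
    return metadata


sample = "Vendor: Acme Pty Ltd\nInvoice No: INV-1042\nDate: 03/07/2013\n"
print(extract_metadata(sample))
```

Fields that come back as `None` are natural candidates for a manual validation step before the document is released into SharePoint.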
2. Define content search criteria first and work backwards
One of the major goals of going paperless is a seamless content search experience for your users. The ability to find and discover documents and information from any place, at any time, using any device is fundamental to meeting today’s business demands. Providing an advanced search experience requires proper planning of the index fields used to find and discover content efficiently. Low-fidelity general search can be enabled by releasing searchable .pdf documents as the scanned output into SharePoint, while high-precision search requires upfront planning of managed properties, search schema, taxonomy and search result refinement criteria. Once you have a clear idea of how your end users want to find information, you will have a much deeper knowledge of which metadata fields to associate with the documents at the point of scanning and data extraction.
3. Get your SharePoint content type design right
Content types are the fundamental artefacts in SharePoint that allow you to organise content, e.g. Vendor Contracts, HR Contracts, Policy Documents etc. Each content type can have its own metadata fields, workflows and information management policies, with retention schedules defined as they apply to specific groups of documents. When you analyse the documents of a department you may find hundreds of content types, which is challenging to manage. It is therefore important to design a set of base-level and child-level content types as part of your information architecture in SharePoint. Once the content type design is done, it can be mapped to document scanning and indexing profiles for an uninterrupted flow of information into SharePoint straight from your scanning software.
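To make the base-level and child-level idea concrete, here is a minimal sketch that models a content type hierarchy in plain Python and derives the index fields each scanning profile would need to capture. It does not call any SharePoint API; the `ContentType` class, the field names and the `scan_profiles` mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    name: str
    fields: List[str] = field(default_factory=list)
    parent: Optional["ContentType"] = None

    def all_fields(self) -> List[str]:
        """Fields inherited from the parent chain, followed by this type's own."""
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + [f for f in self.fields if f not in inherited]


# Hypothetical hierarchy: one base type, two child types.
contract = ContentType("Contract", ["ContractNumber", "StartDate", "EndDate"])
vendor_contract = ContentType("Vendor Contract", ["VendorName"], parent=contract)
hr_contract = ContentType("HR Contract", ["EmployeeId"], parent=contract)

# Each scanning profile targets one content type and captures all of its fields.
scan_profiles = {ct.name: ct.all_fields() for ct in (vendor_contract, hr_contract)}
print(scan_profiles)
```

Keeping shared fields on the base type means a change such as adding a new date field is made once and flows through to every child profile.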
4. Centralise the routing and business rules
SharePoint has a built-in document routing and workflow engine that allows you to distribute documents across the organisation based on the metadata properties associated with them. This can be enabled via the Content Organizer feature in SharePoint with the help of drop-off libraries. It is important to manage centrally the business rules that govern distribution across the organisation. Your scanning software should release documents into drop-off libraries along with the right metadata, so that SharePoint can make the routing and distribution decisions from that point onwards. Clear separation of document data validation and business rule validation is very important for a maintainable enterprise solution. Data validation can be done within your scanning software at the point of capturing the data off the document, whereas rule validation must be done centrally in SharePoint upon release.
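The separation between capture-side data validation and central rule evaluation can be sketched as two independent functions. This is a simplified model, not the Content Organizer API: the `RoutingRule` class, the priority ordering (lower number first), the library paths and the `/dropoff` fallback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, str]


@dataclass
class RoutingRule:
    name: str
    priority: int                          # assumption: lower number is evaluated first
    condition: Callable[[Metadata], bool]
    target: str                            # hypothetical destination library path


def validate_data(metadata: Metadata, required: List[str]) -> List[str]:
    """Capture-side check: names of required fields that are missing or empty."""
    return [name for name in required if not metadata.get(name)]


def route(metadata: Metadata, rules: List[RoutingRule], default: str = "/dropoff") -> str:
    """Central check: target of the first matching rule, in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.condition(metadata):
            return rule.target
    return default  # no rule matched, so the document stays in the drop-off library


RULES = [
    RoutingRule("Vendor contracts", 1,
                lambda m: m.get("ContentType") == "Vendor Contract",
                "/sites/procurement/contracts"),
    RoutingRule("HR contracts", 2,
                lambda m: m.get("ContentType") == "HR Contract",
                "/sites/hr/contracts"),
]

doc = {"ContentType": "HR Contract", "EmployeeId": "E100"}
if not validate_data(doc, ["ContentType", "EmployeeId"]):
    print(route(doc, RULES))
```

Because the scanning side only checks that the data is complete, routing rules can change centrally without any change to the scanning profiles.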
5. Configure SharePoint site collections as DRM sites
SharePoint 2010 and SharePoint 2013 have mature document and records management (DRM) features built into the platform. High-volume scanning sites can be configured as DRM sites to manage the full lifecycle of the scanned content and information. When SharePoint sites are configured as DRM sites, they facilitate information capture, control, storage, discovery and disposition with little or no manual intervention. Some of the most popular features to leverage are in-place records management, the records centre, information management policy automation, legal holds, Document ID, the Managed Metadata Service and the term store.
6. Do your capacity planning and container design right
As an Enterprise Content Management (ECM) system, SharePoint has been designed to hold millions of documents if planned correctly. Unlike a file share environment, it has various levels of containers in which to store and organise digital content: site collections, sites, document libraries, Document Sets, folders and so on. Did you know that some of these containers have default throttling limits? It is therefore important to design a document indexing and cataloguing architecture that can work within the default throttling limits. In a high-volume batch scanning scenario, capacity planning of SharePoint containers should be done by assessing the initial document workload to be scanned and the expected annual growth. This is a fundamental design decision in arriving at a sustainable document scanning solution with SharePoint.
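A back-of-the-envelope version of this assessment is sketched below. It uses the default list view threshold of 5,000 items in SharePoint 2010 and 2013 as the per-container limit, and assumes a flat number of new documents each year. The workload figures are invented for the example, and other limits (such as content database size) would need their own checks.

```python
import math

LIST_VIEW_THRESHOLD = 5000  # default list view threshold in SharePoint 2010/2013


def projected_documents(initial: int, annual_growth: int, years: int) -> int:
    """Total documents after `years`, assuming a flat number of new documents per year."""
    return initial + annual_growth * years


def containers_needed(total_documents: int, per_container: int = LIST_VIEW_THRESHOLD) -> int:
    """Minimum number of folders (or similar containers) so that none exceeds the limit."""
    return math.ceil(total_documents / per_container)


# Invented workload: 200,000 documents in the backlog, 50,000 new per year, 5-year horizon.
total = projected_documents(initial=200_000, annual_growth=50_000, years=5)
print(total, containers_needed(total))  # prints: 450000 90
```

In practice the containers would follow a meaningful partitioning scheme (for example by year or by department) rather than an arbitrary split, but the arithmetic shows how many partitions the scheme has to provide.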
7. Select an OCR scanning software solution that has seamless integration with SharePoint
Today there are a number of OCR scanning solutions that vary in their depth of integration with SharePoint as an ECM system. Some scanning software solutions make better use of rich ECM features such as Content Types, Taxonomy, Document Sets, Drop-off Libraries etc. It is equally important to select a scanning solution that has zero footprint (no installation of software) on the SharePoint environment.
Modern intelligent scanning software solutions can handle enterprise content with minimal or no human intervention. Some of the features to look for when selecting an enterprise OCR scanning and capture solution are:
- Distributed capture and centralised processing architecture
- Capture of documents and content from multiple input channels (paper, fax, email attachments, web, file share, social, mobile etc.)
- Automated document separation and image enhancement
- Auto-classification of documents against the SharePoint taxonomy and term store
- Automated data extraction and validation
- Automated data redaction
- Learn by example (auto learning)
A fine blend of OCR scanning and the Enterprise Content Management (ECM) features in SharePoint allows you to manage the full lifecycle of content and information, from the point of origination, as a profitable asset. We at Data Capture Experts have helped a number of organisations with the design and implementation of high-volume document scanning solutions leveraging the enterprise content management features built into SharePoint. If you need any help or advice in planning or implementing any of the above points, drop us a line and we will be glad to help.
Nalaka Withanage is the founder and CTO at Data Capture Experts, a company specialising in Enterprise Content Management (ECM) solutions with Microsoft SharePoint. Their approach to information lifecycle management uses a proven digital content analysis and transformation framework that allows you to capture, control, store, find, and deliver content and documents related to organisational processes. This framework helps organisations maximise the value of their SharePoint investment to drive process efficiencies and minimise regulatory compliance risks. He is also a Microsoft Certified Professional and an active member of the SharePoint community, and speaks regularly at SharePoint user groups and events in Australia.