PDF: The document format for everything
PDF is one of the most widely used formats worldwide. Numerous companies use it to exchange information between business partners or in-house. PDF offers various possibilities – which is why the format is complex. Sometimes it is necessary to define stringent quality standards in order to guarantee interoperability.
Quality requirements for certain workflows and processes are defined via the PDF standard formats. The standards relieve users of the burden of technical details and help clarify responsibilities in the process chain.
Given the many PDF generation programs and the central importance of PDF, the standards are developed by ISO in order to guarantee reliable PDF processing. A frequently asked question is whether a single standard for "good quality" PDF would not be easier to deal with. The answer, and the reason why there are currently six subset standards, lies in the different application scenarios of the format.
The aim of the ISO definitions is to provide users with a PDF that is functional for their specific purposes. A standard that combined, for example, the requirements for printing with the specifications for archiving or accessibility could easily be defined, but creating valid PDFs would then entail enormous effort. The standards are therefore kept separate, each reflecting the requirements of one application and the effort needed to implement them.
The most recent ISO standard, published in 2017, is PDF itself, which is now available in version 2.0. The standard is almost 1,000 pages long and contains many detailed improvements and some important clarifications. A number of chapters have been completely rewritten to make the specifications clearer and less ambiguous, while the PDF constructs themselves remain largely unchanged.
PDF/X - Standard for the printing industry
PDF files are the "raw material" for almost all professional printing. Printing imposes many requirements that go beyond what is needed for display on a monitor. For this reason, the industry formulated corresponding requirements shortly after its adoption of PDF and developed PDF/X (X stands for eXchange) under the umbrella of ISO. This standard was published as early as 1999. The format is largely established in the printing industry and supported by numerous software solutions. Specifications based on PDF/X, e.g. from the Ghent Workgroup or the Swiss initiative PDFX-ready, define further requirements for various printing processes and products.
PDF/X-4, the current state of the art, takes technical advances into account. (PDF/X-5, which is also available, covers more specialized requirements.) ISO is currently working on PDF/X-6, which incorporates changes introduced by PDF 2.0. These include page-based output intents, black point compensation, halftone origin and spectral measurement data for spot colours (CxF). The latter enables better reproduction of spot colours, for example in a company logo, as well as consistent output on different channels, such as digital printing or printing on different media.
PDF/A for long-term archiving
In addition to PDF/X, PDF/A plays an important role in real-world applications. The standard was developed at the instigation of the manufacturing industry, which required a recognized, robust PDF for archiving its production documents. It was published at the end of 2005 as international standard ISO 19005-1.
Companies and public institutions benefit from PDF/A because digital documents can be archived permanently. While the format was initially used primarily for scanned paper and as a replacement for TIFF in archives, it is now used predominantly for digitally generated documents. The format is widespread, especially in Europe.
Today there are three parts of the PDF/A standard and several conformance levels:
- Level B (Basic) guarantees clear visual reproducibility of the content. It is easier to generate than the other levels, but does not guarantee text extraction or searchability. Therefore, reuse of the content is not necessarily possible. Scanned paper documents can normally be easily converted into PDF/A documents of this conformance level.
- Level U (Unicode) was introduced with PDF/A-2. It extends conformance level B by requiring that all text be represented in Unicode, which guarantees full-text search and similar functions.
- Level A (Accessible) requires additional information on the content structure and the correct reading sequence of the document. Text content must be extractable and the structure must reflect the natural reading sequence. As a rule, this PDF/A level can only be achieved without time-consuming post-processing when converting from original digital documents.
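A PDF/A file declares the part and conformance level it claims in its XMP metadata, using the properties `pdfaid:part` and `pdfaid:conformance`. The following is a minimal sketch of reading that claim, assuming the XMP packet has already been extracted from the PDF as a string. The function names and the sample packet are illustrative only. Note that this reads the claim and nothing more; whether the file really conforms can only be established by a validator.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Levels per part as described in the list above: Level U exists only from PDF/A-2.
ALLOWED_LEVELS = {"1": {"A", "B"}, "2": {"A", "B", "U"}, "3": {"A", "B", "U"}}

# Illustrative XMP packet (hypothetical sample, not taken from a real file).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_pdfa_claim(xmp_xml):
    """Return (part, level) as claimed in the XMP packet, or None if absent."""
    root = ET.fromstring(xmp_xml)
    part_key = f"{{{PDFAID_NS}}}part"
    level_key = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows a property as an attribute or as a child element.
        part = desc.get(part_key) or desc.findtext(part_key)
        if not part:
            continue
        level = desc.get(level_key) or desc.findtext(level_key) or ""
        return part.strip(), level.strip().upper()
    return None


def is_known_combination(part, level):
    """Check that the claimed level exists for the claimed part."""
    return level in ALLOWED_LEVELS.get(part, set())
```

Calling `read_pdfa_claim(SAMPLE_XMP)` on the sample packet yields the pair for PDF/A-2u, and `is_known_combination` rejects combinations such as part 1 with level U.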
PDF/A-3 was published as the third part of the standard in 2012. It allows the embedding of arbitrary files, so that digital folders with more than one file can be conveniently represented in a single PDF. Another example is ZUGFeRD invoices, in which the machine-readable XML data record with the invoice information is embedded in a human-readable PDF/A-3 file.
ISO is currently working on updating the standard to build on PDF 2.0; the result would appear as PDF/A-4. PDF/E will be merged into PDF/A-4.
PDF/E for Engineering
PDF/E was developed for use in engineering and published as a standard in 2008. It integrates the ability for interactive 3D representation, making it particularly suitable as an exchange format for manufacturing specifications. In practice, however, this standard is hardly relevant. The manufacturing industry nevertheless continues to express a need for an archivable basic format that, unlike PDF/A, also allows for 3D models. Work is currently underway to modernize the previous PDF/E format and integrate it into PDF/A-4 as a conformance level.
PDF/VT & PDF/VCR for variable data and transactional printing
PDF/VT (ISO 16612) is the exchange format for variable data and transactional printing and was published in 2010. It is based on PDF/X, provides an alternative to PCL, PPML, AFP, etc. and addresses new trends in printing technology with individualization and digital printing.
PDF/VCR (ISO 16613) for variable data printing in real time is still relatively young. Driven by practical requirements and published in September 2017, it allows variable data to be defined with PDF templates based on PDF/X-4. It is used for special applications that require secure real-time processing. One example is the printing of accompanying letters for credit cards, in which the credit card is read on the fly to determine the print data.
PDF/UA for barrier-free documents
In this ISO standard "UA" stands for universal accessibility. The requirements for a PDF/UA-compliant file define how texts, images, forms and the like must be created so that people with disabilities - and machines - can use them. PDF/UA thus helps to meet legal requirements for unrestricted access to electronic information - for example in public institutions, insurance companies and banks.
One technical basis of PDF/UA is the set of text requirements from PDF/A-2u. In addition, the standard contains requirements for the normally “invisible” structure of a document, i.e. the reading order, the structure in headings, paragraphs, columns and tables, and alternative texts for images. All this information can be encoded in the “tagging structure” of a PDF file. Generating the structure automatically after the fact is extremely time-consuming, so PDF/UA generation usually begins at the original document.
PDF/UA can also be seen as the successor to conformance level "A" from PDF/A. Of all the PDF standards presented here, it places the highest demands on generation. While PDF/A files in conformance level "B" can be generated from almost all PDF files, this is not always possible with PDF/X and only very rarely with PDF/UA.
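The idea of a tagging structure can be illustrated with a deliberately simplified model. The sketch below is not a PDF parser and not a PDF/UA validator: the `Tag` class and both helper functions are hypothetical and only mimic a tag tree with a handful of standard structure types (`H1`, `P`, `Table`, `Figure`). It shows two things the text describes: the reading order follows from walking the tree depth-first, and images without alternative text can be found mechanically.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for one element of a PDF tag tree."""
    kind: str                      # e.g. "Document", "H1", "P", "Table", "Figure"
    alt: Optional[str] = None      # alternative text, relevant for figures
    children: List["Tag"] = field(default_factory=list)


def walk(tag: Tag) -> Iterator[Tag]:
    """Yield tags depth-first, which is the logical reading order."""
    yield tag
    for child in tag.children:
        yield from walk(child)


def figures_missing_alt(root: Tag) -> List[Tag]:
    """Return every Figure tag that has no (or only blank) alternative text."""
    return [t for t in walk(root) if t.kind == "Figure" and not (t.alt or "").strip()]


# Illustrative document: one heading, one paragraph, two figures.
doc = Tag("Document", children=[
    Tag("H1"),
    Tag("P"),
    Tag("Figure", alt="Company logo"),
    Tag("Figure"),  # no alternative text: would need fixing
])
```

With this sample, `figures_missing_alt(doc)` returns the single untagged figure. A real file needs far more than this check, which is why the structure is best created in the authoring application.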
Prognosis: Will PDF standards be combined in the future?
In order to keep the creation of standards-compliant PDFs as simple as possible, the PDF standards mentioned will probably continue to exist separately.
However, the ISO committees are working to ensure that the next versions use exactly the same formulations wherever possible, so that it will become easier for software vendors to support more than one standard. The current standards, too, are written in such a way that such "double-valid" PDF files are readily possible. Some special knowledge is required to generate such “multi standard” PDFs with generation programs such as Microsoft Word, Adobe InDesign and others, but there are products that enable subsequent conversion, even in automated processes.
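Because each standard uses its own identification property in the XMP metadata, one file can claim several standards at once. The sketch below lists the claims for PDF/A (`pdfaid:part`) and PDF/UA (`pdfuaid:part`), assuming the XMP packet is already available as a string. The function name, the lookup table and the sample packet are illustrative; other standards have their own identification properties, which are left out here. As before, a claim is not proof of conformance.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Standard name -> (XMP namespace URI, property that carries the part number).
ID_PROPERTIES = {
    "PDF/A": ("http://www.aiim.org/pdfa/ns/id/", "part"),
    "PDF/UA": ("http://www.aiim.org/pdfua/ns/id/", "part"),
}

# Illustrative packet claiming two standards (hypothetical sample).
DOUBLE_CLAIM_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>A</pdfaid:conformance>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def claimed_standards(xmp_xml):
    """Map each standard claimed in the XMP packet to its declared part."""
    root = ET.fromstring(xmp_xml)
    found = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for name, (ns, prop) in ID_PROPERTIES.items():
            key = f"{{{ns}}}{prop}"
            # A property may be written as an attribute or as a child element.
            value = desc.get(key) or desc.findtext(key)
            if value:
                found[name] = value.strip()
    return found
```

For the sample packet, `claimed_standards(DOUBLE_CLAIM_XMP)` reports both a PDF/A and a PDF/UA claim, which is the situation the text calls "double-valid".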
Dietrich von Seggern is Managing Director at callas software GmbH