Excuse me, I'll have that data in XML size, please

A look inside XML, the Web's new mode of document delivery that could bring new order to the Internet.

By Alan Roebuck

Apart from greater bandwidth and cheaper access, the one thing the Web needs today is structure. The random nature and chaotic organisation that gave the Internet its appeal in its early days are precisely the qualities that now stand most in the way of its adoption as an e-commerce infrastructure. But structure is coming to the Internet, courtesy of the eXtensible Markup Language (XML). This new language is designed to complete the evolution of the Hypertext Markup Language (HTML), which has been the backbone of Internet document design since the origins of the World Wide Web at the beginning of the last decade. XML will not replace HTML; rather, it is designed to complement it until Web applications, browsers and the other elements of the Internet and intranets have evolved to fully support the new mode of document description.

The World Wide Web Consortium (W3C) is the driving force behind the evolution of XML, which is reaching critical mass and breaking into public consciousness as a major e-commerce enabling technology. As a sign of its mainstream importance, Microsoft promises that Windows 2000 is XML-compliant from top to bottom, and the vendor is playing a leading role in the use of XML to ease interoperability between diverse corporate and e-commerce applications.

An eventual replacement for HTML, XML can offer immediate benefits to vendors and users alike.

The development of XML began in 1996 with the formation of the XML Working Group (originally known as the SGML Editorial Review Board) under the umbrella of the W3C. XML is more than the latest trendy TLA (three-letter acronym). To find out one of the reasons XML is so significant, turn to a Web search engine such as AltaVista and type in "John Kennedy." I did, and I received 25,601 returns, which would have taken about a week to trawl through. The first 10 listings included a weirdly diverse range of entries beyond the expected references to the assassinated American president. There was a link to the Robert Burns home page, courtesy of a letter the poet once wrote to someone called John Kennedy; a welcome message from the State Treasurer of Kentucky (John Kennedy Miller); and a ream of genealogical data on John Kennedy Brown (Senior), a "journeyman cooper born in 1857 in Durham County, England".

Fine-tuning my search criteria would have narrowed the field, but it is increasingly difficult to develop expertise in the different query syntaxes used by all of the search engines available on the Web. It is also currently impossible to specify what type of reference you are seeking when searching on a particular term. If I narrow the search down to "Book" AND "John Kennedy," am I asking for a book about John Kennedy or one written by John Kennedy, and which particular John Kennedy am I looking for? Meta tags can give search engines limited information about HTML documents, but a more detailed description of Internet content, one that enables much more finely tuned searching, will depend on the widespread rollout of XML.


The reason is that XML will divorce the description of a document's content from the way that content is presented by a browser or Web application. XML is a subset of Standard Generalized Markup Language (SGML). SGML is used in many large publishing operations, but it is too complex to be understood by a browser. XML is a database-neutral format that can be targeted at any browser and any application. Instead of just presenting data in a way that gives the browser no means of inferring what the data pertains to, XML will provide a detailed breakdown of all elements of the data in a hierarchical form.
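To make the "about" versus "by" distinction concrete, here is a minimal sketch in Python. The catalogue, its tag names (catalogue, book, title, author, subject) and both book titles are invented for illustration; no real vocabulary or search engine is implied.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalogue. The tag names and titles are made up for this sketch.
CATALOGUE = """
<catalogue>
  <book>
    <title>Example Memoir</title>
    <author>John Kennedy</author>
    <subject>Politics</subject>
  </book>
  <book>
    <title>Example Biography</title>
    <author>Jane Example</author>
    <subject>John Kennedy</subject>
  </book>
</catalogue>
"""


def titles_where(root, tag, value):
    """Return the titles of books whose child element <tag> has the given text."""
    return [
        book.findtext("title")
        for book in root.findall("book")
        if book.findtext(tag) == value
    ]


root = ET.fromstring(CATALOGUE)
written_by = titles_where(root, "author", "John Kennedy")
written_about = titles_where(root, "subject", "John Kennedy")

print("By John Kennedy:", written_by)
print("About John Kennedy:", written_about)
```

Because the name sits inside a labelled element, the same text can be matched as an author in one query and as a subject in another, which a plain keyword search cannot do.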

The presentation of an XML document depends on a stylesheet. The standard stylesheet language for XML documents is the Extensible Stylesheet Language (XSL). Other stylesheet languages, such as Cascading Style Sheets, are supported as well. But XML is about more than the way that content is presented on the Web. Its implementation will make it easier for varying applications to interchange and interpret business documentation. The ability of enterprise information to interoperate seamlessly across diverse applications on the intranet and Internet is the secret to effective implementation of ERP and e-commerce.

XML lets applications process information more easily by providing a standard way of labelling data: each piece of information carries a tag that allows it to be found and extracted automatically. Much of the programming work required for e-commerce sites today is devoted to extracting information from applications and making it available in an HTML format, and vice versa. Organisations will find it easier to standardise on XML data within the enterprise intranet, and then implement applications and processes that make use of the metadata, than to do so across the World Wide Web.
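As a rough illustration of tagged data being found and extracted automatically, the Python sketch below reads a purchase order. The order format, including the element and attribute names (order, customer, item, sku, quantity, unitPrice), is an assumption made up for this example rather than any published standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical order document. All names and values are invented for this sketch.
ORDER = """
<order number="1001">
  <customer>Example Pty Ltd</customer>
  <item sku="A-1" quantity="2" unitPrice="9.50"/>
  <item sku="B-7" quantity="1" unitPrice="20.00"/>
</order>
"""


def order_total(xml_text):
    """Add up quantity x unit price over every <item> element in the order."""
    root = ET.fromstring(xml_text)
    return sum(
        int(item.get("quantity")) * float(item.get("unitPrice"))
        for item in root.iter("item")
    )


customer = ET.fromstring(ORDER).findtext("customer")
total = order_total(ORDER)
print(customer, total)
```

The program never has to guess which number on the page is a price: the label says so. A production system would likely use a decimal type for money; floats keep the sketch short.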

For this reason, there will be a need for tools such as Vignette StoryServer that can automatically convert XML to HTML. StoryServer allows you to create an HTML "template" with rules that specify how to format various XML element types for display on the Web. In addition to making information easier to find and process on the Internet and intranets, XML also revs up the conventional hyperlink, which in HTML pages simply transports us to another Internet destination. The XLink standard allows you to choose from a list of multiple destinations. Other kinds of hyperlinks will insert text or images right where you click, instead of forcing you to leave the page.
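The template idea can be sketched in a few lines of Python. This is not StoryServer's interface, which is not described here; the rule table, the story vocabulary (story, headline, byline, para) and the function name render are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rules: XML element type on the left, HTML tag used to display it on the right.
RULES = {"headline": "h1", "byline": "em", "para": "p"}

# A made-up story document for the sketch.
STORY = (
    "<story>"
    "<headline>XML &amp; the Web</headline>"
    "<byline>By A. Writer</byline>"
    "<para>Structure is coming.</para>"
    "</story>"
)


def render(xml_text, rules):
    """Format each child of the root element using the rule for its element type."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        html_tag = rules.get(child.tag)
        if html_tag is None:
            continue  # no rule for this element type, so it is not displayed
        parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "\n".join(parts)


page = render(STORY, RULES)
print(page)
```

Changing the look of every story then means editing the rule table, not the stories themselves.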


XML is now a required technology for database management systems (DBMSs). Oracle 8i, IBM's DB2 and even Microsoft's next release of SQL Server all provide support for translating data into and out of XML-defined formats. On the document management side, many vendors are either working on or have just finished integrating support for XML into their document repositories. FileNET is one notable example, with its recently released Panagon suite using XML to power its move into e-commerce. Content management developers have already embraced XML as the foundation for their applications.
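What "translating data out of" a database into XML might look like can be sketched with Python's built-in SQLite module. This does not reproduce how Oracle, DB2 or SQL Server do it; the table, its columns and the function rows_to_xml are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table):
    """Wrap each row of a table in a <row> element, one child element per column.

    The table name is assumed to come from trusted code, not from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        record = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# A throwaway in-memory database with one hypothetical product.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('A-1', 9.5)")

xml_text = rows_to_xml(conn, "products")
print(xml_text)
```

Going the other way, from XML back into tables, is the same walk in reverse: read each row element and insert its children as column values.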

One of the earliest uses foreseen for the Extensible Markup Language was to replace the Electronic Data Interchange (EDI) protocol, which has been used for many years as a communication standard for supply chain and business-to-business trading.

However, there are still clouds on the horizon that could threaten XML's promise to remove inconsistency between Web-based data formats. At the moment, XML transactions rely on DTDs (document type definitions) to allow browsers or applications to determine the properties of XML data. When two parties' DTDs do not match, it is not always possible for them to share messages. To keep DTD incompatibility from undermining information sharing via XML, vendors have already responded with specific products.
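The mismatch problem is easy to picture when two trading partners describe the same order with different tag names. The Python sketch below translates one hypothetical vocabulary into another; both vocabularies and the mapping are assumptions for illustration, and the code does not validate anything against an actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the supplier's tag names to the buyer's.
TAG_MAP = {"PurchaseOrder": "order", "Buyer": "customer", "Total": "amount"}

# A made-up message in the supplier's vocabulary.
SUPPLIER_MESSAGE = (
    "<PurchaseOrder>"
    "<Buyer>Example Pty Ltd</Buyer>"
    "<Total>39.00</Total>"
    "</PurchaseOrder>"
)


def translate(xml_text, tag_map):
    """Rename every element that appears in the mapping and leave the rest alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


buyer_message = translate(SUPPLIER_MESSAGE, TAG_MAP)
print(buyer_message)
```

A renaming table only works when the two structures differ in names alone. Agreed industry vocabularies aim to remove the need for such point-to-point mappings altogether.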

IBM is developing the Business-to-Business Protocol Framework, or BPF, to help developers create applications using tpaML (Trading Partner Agreements Markup Language), a set of new IBM extensions to XML. The extensions, which IBM has recently submitted to the international standards body OASIS, take XML beyond a simple data transport protocol to include capabilities that enable companies to integrate business processes, workflow, security and other services into a B2B transaction, according to the company. IBM plans to add tpaML support to its software by the middle of this year.


For its part, Microsoft has established the BizTalk initiative (see www.biztalk.org) to provide a central point for organisations to define the arbitrary tags that can be used to represent any data within an XML document. Microsoft is using BizTalk to encourage the development of standards for business-to-business and data interchange, at least for the Windows world. In a move to evolve its Visual Basic programming tool from a PC-centric world to one where the Web is at the heart of every enterprise application, Microsoft has announced that it will release an update in about a year featuring a built-in, drag-and-drop HTML editor that will let developers create Web sites without having to write HTML code. Visual Basic's support for XML is intended to make it easier to conduct online transactions over the Internet and intranets. Because of the XML support, Visual Basic developers will be able to link their Web applications to other programming models, including Enterprise JavaBeans and the Common Object Request Broker Architecture (CORBA).

Another organisation promoting the uptake and standardisation of XML is OASIS (www.xml.org), the Organization for the Advancement of Structured Information Standards. OASIS is a non-profit, international consortium with more than 50 members, including Microsoft, Oracle, and Sun Microsystems. It is attempting to create a set of guidelines for businesses that use XML to send data to each other. The challenge is simple to state: if business documents are to be interchanged easily between different systems, open standards must be developed.

Business Solution: