Enter the Dragon: XML

May/June Edition, 2007: Allowing people to talk to machines was the first step. The next was enabling machines to talk to machines. Peter Webb canvasses the evolution of XML and its importance to real-time e-commerce and SOA.

Readers of last edition’s column will recall that we discussed the two major problems of traditional means of systems integration, which have basically relied on low-level, direct point-to-point communication. The first is that the systems become “closely coupled” – a relatively minor change to one system (for example to its database structure) can cause another system to fail. The second is that as the number of systems increases, the number of potential point-to-point links increases as the square of the number of nodes – get enough servers and the problems become completely unmanageable.
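To put some numbers on that second problem: with n systems there are at most n(n-1)/2 distinct point-to-point links, which grows roughly as the square of n. A quick Python sketch (the function name is mine, purely for illustration):

```python
def point_to_point_links(n: int) -> int:
    """Maximum number of direct links between n systems: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for systems in (5, 20, 100):
    print(systems, "systems ->", point_to_point_links(systems), "potential links")
# 5 systems -> 10 potential links
# 20 systems -> 190 potential links
# 100 systems -> 4950 potential links
```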

As of the mid 1990s, the industry knew it had a problem, and even had a name for the solution – a “Service Oriented Architecture”. All that was lacking was an actual solution.

The answer came from the web development industry, which was trying to address an almost unrelated problem. The issue was the limitations of HTML, which is the language used to describe the layout of pages in web browsers. The HTML language is almost completely a set of instructions for how information should be displayed, and contains almost no information about what the individual components “really” are. For example, if you want all your headings in purple and bold, HTML essentially just allows you to say “display the following in purple and bold”. What you really want to say is “the following are headings, use the standards for headings”, much as you do in Word. This potentially allows the browser to cater for different local issues – like if the web page was on a mobile phone, or perhaps being accessed by a blind person.

Each of these environments would have a different profile for how a heading should be displayed. Of more commercial importance were issues relating to internet commerce. If you have an on-line price list, it would be far better if the browser “knew” that the amounts in the cost column really were costs, and the quantities were quantities, etc. This avoids a lot of programming effort and communication traffic to make the form behave intelligently.

So what they needed was a standard that defined what the different fields on the page were, and their relationship to each other - a general purpose way of describing the information fields in the document, with instructions on how to display them held separately. To do this, they dipped into the standards bag of word processing and publishing in general, and pulled out a somewhat decrepit standard known as “SGML”, whose roots go back to the 1960s. They stripped it down to its bare essentials, renamed it XML, had a first draft out by 1996, and by 1998 had it agreed as the new standard by the W3C (the web standards body).

What’s XML got to do with SOAs?

Now at this point, you may well be wondering what this has got to do with SOAs, but this is the bit where two worlds collide.

Basing the new standard for describing the contents of a web page on a modified form of SGML turned out to be a stroke of genius, despite the fact that the standard was 30 years old and had been effectively dead for the last 25 of them. The part of SGML that became XML provided three huge advantages.

Firstly, XML completely separates the information content from its presentation (display), which is the problem they were actually trying to solve.
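To make that separation concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The price list, its element names and the two rendering functions are all invented for illustration; the point is only that one XML document can feed two quite different presentations without being changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical price list: element names, items and prices are made up.
PRICE_LIST = """
<pricelist>
  <item><name>Widget</name><cost currency="AUD">4.50</cost><quantity>10</quantity></item>
  <item><name>Gadget</name><cost currency="AUD">12.00</cost><quantity>3</quantity></item>
</pricelist>
"""


def as_plain_text(xml_text: str) -> str:
    """One presentation: a plain text listing, e.g. for a small screen."""
    lines = []
    for item in ET.fromstring(xml_text).iter("item"):
        cost = item.find("cost")
        lines.append(
            f"{item.findtext('name')}: {item.findtext('quantity')} @ "
            f"{cost.text} {cost.get('currency')}"
        )
    return "\n".join(lines)


def as_html_rows(xml_text: str) -> str:
    """Another presentation of the same data: HTML table rows."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                item.findtext("name"),
                item.findtext("cost"),
                item.findtext("quantity"),
            )
        )
    return "\n".join(rows)


print(as_plain_text(PRICE_LIST))
print(as_html_rows(PRICE_LIST))
```

Because the cost is labelled as a cost (with its currency) rather than as “the second column”, either renderer can find it without knowing anything about the other one's layout.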

Secondly, it allows the information to be transferred in a hierarchical fashion – the “person’s first name” is part of their “full name” which is part of their “mailing address” etc. This is what’s called a tree structure, and whilst it’s not ideal (in my humble opinion), it is clearly far better than flat file structures such as CSV, which were the only real alternative.

Thirdly, it is self-documenting, in that every field has the field name (“First name”) as well as the field value (“John”) in the record. Every XML file contains a stand-alone definition of the data within it. This makes the files a bit larger, but provides huge convenience if you want to distribute information – a client can tap into a data feed and decode the data structure using nothing except the information in the data feed itself. Cool, huh?
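As a rough illustration of both the tree structure and the self-documenting part, the Python sketch below walks an XML record it has never seen before and lists every field with its position in the tree. The record and its element names are invented for the example, and leaf_paths is just a helper name I made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative only.
RECORD = """
<mailing-address>
  <full-name>
    <first-name>John</first-name>
    <last-name>Smith</last-name>
  </full-name>
  <city>Sydney</city>
</mailing-address>
"""


def leaf_paths(element, prefix=""):
    """Return (path, value) pairs for every leaf field under element."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return [(path, (element.text or "").strip())]
    pairs = []
    for child in children:
        pairs.extend(leaf_paths(child, path))
    return pairs


for path, value in leaf_paths(ET.fromstring(RECORD)):
    print(path, "=", value)
# /mailing-address/full-name/first-name = John
# /mailing-address/full-name/last-name = Smith
# /mailing-address/city = Sydney
```

Nothing in the code knows the field names in advance; they all come out of the data itself, which is exactly what a CSV file without a header row cannot give you.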

And this is really all an XML file actually is – the names of the data elements, their position in the tree structure, and their values. Users can agree on common data element names – as has happened in thousands of special interest groups, ranging from EDI for paying invoices, through to chemical research (for defining chemical structures), and on to publishing. (Yes, there is an XML standard for word-processed documents. In fact there are two, Microsoft’s and the rest. Which do you think will prevail? The first reader to correctly answer this question will receive a complimentary, unlicensed copy of Word Perfect on 5¼ inch floppy).

This allowed a completely different way to use the Internet. Previously, the World Wide Web had been about real life people accessing web pages. Now other computers could easily access web pages, by simply asking for the XML file and ditching the page layout information. Because it was self-documenting – every XML file must describe the data structure and all the field names – you could pull up two XML structures on a screen and directly compare them. Your Finance system may have “employee.surname”, and your HR system may have “person-lastname”, but using a graphical tool you can say which fields correspond, and 15 minutes later you have completed your system integration (perhaps I exaggerate, but you get the idea).
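Under the bonnet, what such a graphical tool produces is little more than a table of which field corresponds to which. A minimal Python sketch of the idea follows; “employee.surname” and “person-lastname” come from the example above, while the first-name fields, the wrapper elements and the function name are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Finance field name -> HR field name. Only the surname pair comes from
# the example in the text; the first-name pair is assumed for illustration.
FIELD_MAP = {
    "employee.surname": "person-lastname",
    "employee.firstname": "person-firstname",
}


def finance_to_hr(finance_xml: str) -> str:
    """Copy mapped fields from a Finance record into an HR-style record."""
    source = ET.fromstring(finance_xml)
    target = ET.Element("person")
    for field in source:
        hr_name = FIELD_MAP.get(field.tag)
        if hr_name is None:
            continue  # no agreed correspondence, so leave the field out
        ET.SubElement(target, hr_name).text = field.text
    return ET.tostring(target, encoding="unicode")


finance_record = (
    "<employee>"
    "<employee.surname>Smith</employee.surname>"
    "<employee.firstname>John</employee.firstname>"
    "</employee>"
)
print(finance_to_hr(finance_record))
# <person><person-lastname>Smith</person-lastname><person-firstname>John</person-firstname></person>
```

Real integrations also have to deal with differing formats, units and missing values, which is where the other 15 minutes (or 15 weeks) goes.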

Here, at last, was part of the answer as to how 10,000 hotels could talk to 20 travel agents with a million customers in real time.

Now, it’s been fashionable for a long time to bag Bill Gates and the Evil Empire generally. God knows, I’ve been running Office 2007 for a week now and want to personally strangle him. But say what you will, Bill Gates was the first major player to pick up on the significance of XML, and the .NET framework from Microsoft is based upon it.

Microsoft was also instrumental in another key development – the use of XML to encode instructions for what to do with the data, along with the data itself. This allows XML files to “invoke” processes on the other computer, such as database searches, which then process the data. It’s very simple – the XML basically allows you to specify which program is invoked (something like www.my-intranet.financesystem) and the exact service required (maybe “getallpayrolldata”, or whatever name the Finance system uses for this function). This became the SOAP standard, which in a bizarre and unprecedented display of unity the whole industry – read Microsoft, Oracle, Sun and IBM – suddenly agreed was a good, common standard.
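For the curious, here is roughly what such a message looks like when built in Python with the standard library. This is a sketch only: the envelope namespace is the SOAP 1.1 one, but the “urn:example:financesystem” namespace, the “department” parameter and both function names are assumptions of mine, and nothing is actually sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
FINANCE_NS = "urn:example:financesystem"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Wrap a named operation and its parameters in a SOAP-style envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{FINANCE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{FINANCE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def requested_operation(request_xml: str) -> str:
    """What the receiving end does first: read which service was asked for."""
    envelope = ET.fromstring(request_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the first child of Body names the operation
    return call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix


request = build_request("getallpayrolldata", {"department": "Sales"})
print(requested_operation(request))  # getallpayrolldata
```

In practice the envelope would be posted to the service's address, and a toolkit would normally generate all of this for you from the service description.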

And with that, the industry had created the standards base for a practical “Service Oriented Architecture”.

Who cares about SOA anyway?

You may recall from the last column that the OASIS Group defined an SOA as “A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations.”

That may be true, but what an SOA really is, is chucking some XML data at a TCP/IP address. This causes some process such as a Finance system to wake up (if it’s not already awake), process the data according to whatever instructions are contained in the header (like getpayrolldata), and send it off to wherever else it has to go – perhaps to a database server, or a print queue, or to a workflow process. No, you don’t HAVE to use XML to provide this functionality; middleware such as CORBA, and the mainframe messaging systems before it, have been doing this for 20 years or more. But here’s the rub: if you want to use or buy an information management system – say an EDMS, ECM, or CRM system – and it doesn’t provide at least SOAP and XML compliance, then while it may meet the OASIS group’s (and other) definitions of an SOA, in practice it isn’t an SOA. A proprietary SOA design which doesn’t use XML is almost certainly not going to fit in easily with other services on your network, and so is probably of little benefit.
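The “wake up and process according to the header” part can be sketched in a few lines of Python. Everything here is hypothetical: the message layout, the handler and the reply format are invented for the example, and the network side (actually listening on a TCP/IP address) is left out so that only the dispatch logic is on show.

```python
import xml.etree.ElementTree as ET

HANDLERS = {}  # operation name -> function that services it


def service(name):
    """Register a function as the handler for a named operation."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@service("getpayrolldata")
def get_payroll_data(body):
    # Placeholder: a real Finance system would query its database here.
    reply = ET.Element("payroll-data")
    reply.set("employee-id", body.findtext("employee-id", default=""))
    return reply


def handle(message_xml: str) -> str:
    """Read the operation from the header and pass the body to its handler."""
    message = ET.fromstring(message_xml)
    operation = message.findtext("header/operation")
    handler = HANDLERS.get(operation)
    if handler is None:
        fault = ET.Element("fault")
        fault.text = f"unknown operation: {operation}"
        return ET.tostring(fault, encoding="unicode")
    body = message.find("body")
    if body is None:
        body = ET.Element("body")  # tolerate a message with no body
    return ET.tostring(handler(body), encoding="unicode")


incoming = (
    "<message><header><operation>getpayrolldata</operation></header>"
    "<body><employee-id>42</employee-id></body></message>"
)
print(handle(incoming))
```

The caller needs to know only the address, the operation name and the shape of the XML; what the Finance system does internally stays its own business.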

Of course, the Information Management system should also be able to “expose” services to other systems (for example, searching the EDMS) and use existing services on your network (for example, for email generation). This is a topic that will be addressed in future issues.

However, in the next issue, we will be examining why browsers are often so popular with CIOs.

Peter Webb is the Principal and in fact only employee of Stollznow Consulting, so the views expressed in this column are those of both him and his employer. Stollznow Consulting provides independent consulting primarily relating to Enterprise Architecture, Information Management and Strategic Planning. Peter can be contacted on 0413 737509 or at pwebb@stollznow.com.au.
