Why JSON is better than XML

By Amandeep Singh

The world’s digital infrastructure is currently characterized by a plethora of data interchange formats. It’s not the least bit surprising that such a multiplicity undergirds things at the moment. The internet is scarcely a generation old, while the “Internet of Things” and “Big Data” more closely resemble regulative ideals than realities. But I nonetheless believe that there are strong, discernible historical tendencies currently at work in this field, tendencies that strongly favour JSON over other formats.

Ten years ago, XML was the primary data interchange format. When it came on the scene, it was a breath of fresh air and a vast improvement over the truly appalling SGML (Standard Generalized Markup Language).

It enabled people to do previously unthinkable things, like exchange Microsoft Office documents across HTTP connections. With all the dissatisfaction surrounding XML, it’s easy to forget just how crucial it was in the evolution of the web in its capacity as a “Swiss Army Knife of the internet.”

But it’s no secret that in the last few years, a bold transformation has been afoot in the world of data interchange. The more lightweight, less bandwidth-intensive JSON (JavaScript Object Notation) has emerged not just as an alternative to XML but as a potential full-blown successor. A variety of historical forces are now converging and conspiring to render XML less and less relevant and to crown JSON as the privileged data format of the global digital architecture of the future. I think that the only question is how near that future is.

I strongly believe that this transformation can be attributed to four broad trends, which I’ll discuss in turn:

  • APIs (application programming interfaces)
  • Big Data
  • The Internet of Things
  • Full-stack JavaScript

1. APIs - Like it or not, today’s web landscape remains heavily siloed in a lot of crucial respects. There’s tons of information out there that you will never, ever be privy to (and this extends beyond things like authentication information that should be secret in principle). But beginning with companies like eBay in the mid-oughts, APIs have come along as a kind of de-siloing force. This has created a scenario in which organizations like Twitter, Facebook, LinkedIn, and millions of others

(a) essentially offer information-based services in exchange for data, and

(b) increasingly have an interest in opening up a wide variety of information to third parties. A lot of that data never sees the light of day (hence the silo metaphor). APIs are a force to be reckoned with, and changes in that space leave a mark on the rest of the web.

There isn’t a lot of hard data on XML vs. JSON usage in APIs, but sources like ProgrammableWeb strongly suggest that XML is still a major player in the world of APIs but that JSON’s star is rising fast. Twitter’s API went JSON-only almost two years ago. Foursquare has followed suit.
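
To make the appeal concrete, here is a minimal Node.js sketch of consuming a JSON API. The endpoint URL and response shape are hypothetical, but the pattern of pulling a payload over HTTPS and handing it straight to JSON.parse is exactly what makes JSON APIs so frictionless for client developers.

```javascript
// Minimal sketch of consuming a (hypothetical) JSON API from Node.js.
const https = require('https');

https.get('https://api.example.com/v1/tweets?screen_name=somebody', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    const tweets = JSON.parse(body);           // one call: text in, native objects out
    tweets.forEach((t) => console.log(t.text)); // assumes the endpoint returns an array
  });
}).on('error', (err) => console.error(err));
```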

2. Big Data - The rise of JSON as a key player in database technologies is another bad portent for XML. As it stands, Big Data does not have a preferred data interchange format per se. But the claim that I’d like to make about Big Data and JSON is a bit more specific. What I’d like to argue is that JSON is emerging as a preferred format in web-centric, so-called “NoSQL” databases. These are databases that are

(a) intended to accommodate massive scalability,

(b) designed to deal with data that often does not seamlessly conform to a columnar/relational model, and

(c) meant to be web-oriented at their very core.

The best-known examples of databases of this sort are MongoDB, CouchDB, and Riak. All three are JSON-based (MongoDB technically stores BSON, a binary encoding of JSON), horizontally scalable, and deeply web-driven.
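
As a quick illustration of what “JSON-based” means in practice, here is a sketch in the MongoDB shell (which is itself JavaScript). The collection and field names are hypothetical; the point is that both the document you store and the query you write are just JSON-style objects.

```javascript
// MongoDB shell (JavaScript) -- hypothetical 'readings' collection
db.readings.insert({
  sensor: "thermostat-12",
  temperature: 21.5,
  tags: ["living-room", "celsius"],
  recordedAt: "2012-06-01T09:30:00Z"   // dates stored as ISO strings by convention
});

// The query itself is also just a JSON-style document
db.readings.find({ sensor: "thermostat-12", temperature: { $gt: 20 } });
```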

Other examples abound: the architecture of Amazon’s DynamoDB is entirely REST/JSON. Neo4j, a graph database that really confounds a lot of our thinking about what databases are all about, has a REST/JSON API with no corresponding XML support. HBase’s REST architecture currently supports XML, but that support is on the way to deprecation.

For some time now, it has been possible, via various means, to feed queries into MySQL and get JSON back (there are plenty of ways to do this, though they generally rely on plugins or helper functions rather than anything built into MySQL itself). The same goes for Postgres and other relational databases. But MySQL, Postgres, and the others were not constructed with JSON as a fundamental building block.

For Postgres, this will soon be changing. In version 9.2, Postgres added support for a JSON data type, which will “allow for hybrid document-relational databases which can store JSON documents, and JSON functions which convert array and row data into JSON” (quoted from this article). Although Postgres has had an XML data type for some time, this change strikes me as a not-so-subtle acknowledgment of the rising importance of JSON.
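
To see what that looks like from application code, here is a minimal sketch using the node-postgres (pg) client against a hypothetical events table with a json column. The connection string, table, and payload are assumptions for illustration, not anything prescribed by Postgres itself.

```javascript
// Minimal node-postgres sketch: storing and reading a json column.
// Connection string and table name are hypothetical.
const { Client } = require('pg');

async function run() {
  const client = new Client({ connectionString: 'postgres://localhost/appdb' });
  await client.connect();

  await client.query(
    'CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload json)'
  );
  await client.query(
    'INSERT INTO events (payload) VALUES ($1)',
    [JSON.stringify({ type: 'click', page: '/pricing', user: 42 })]
  );

  const res = await client.query('SELECT payload FROM events');
  console.log(res.rows[0].payload.page);  // the driver hands the json column back as a JS object
  await client.end();
}

run().catch(console.error);
```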

There are a few databases out there that are XML-based (such as MarkLogic, http://www.marklogic.com/), but there isn’t any movement in that sphere analogous to the rapid adoption we’re seeing of JSON-based storage models.

3. The Internet of Things - Movements in this sphere are more difficult to discern than in the other spheres I’ve mentioned. The Internet of Things remains an idea, albeit a powerful one. It is far too unrealized for anyone to make claims about ideal or even preferred data formats. For now, the internet is basically a whole bunch of computers hooked up to a relative handful of things.

But it deserves mention that JSON has begun to establish a toehold in this realm. There’s a library for using JSON on the Arduino. It is argued in the book “Architecting the Internet of Things” (p. 102) that “JSON is better adapted [than XML] to devices with limited capabilities such as smart things. Furthermore, it can be parsed to JavaScript objects. This makes it an ideal candidate for integration into Web Mashups.” You can construct LED gauges running on JSON. Your next thermostat might run on JSON.
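
To make the “limited capabilities” point concrete, here is a sketch of a hypothetical thermostat reading serialized as JSON, with a roughly equivalent XML encoding shown in a comment for comparison. The field names are made up, but the difference in payload size and parsing effort is representative.

```javascript
// Hypothetical thermostat reading, as a constrained device might emit it.
const reading = { id: "thermostat-12", temp: 21.5, unit: "C", battery: 0.87 };

const json = JSON.stringify(reading);
// {"id":"thermostat-12","temp":21.5,"unit":"C","battery":0.87}

// A roughly equivalent XML encoding, for comparison:
// <reading><id>thermostat-12</id><temp>21.5</temp><unit>C</unit><battery>0.87</battery></reading>

console.log(json.length);            // noticeably fewer bytes over the wire
console.log(JSON.parse(json).temp);  // and parsing lands directly on native values
```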

We haven’t yet reached a point where you can look around at a densely interconnected world of objects and almost feel the JSON coursing through the air. But who knows?

4. Full-stack JavaScript - In addition to the three forces mentioned above, there is one more deserving brief mention: JavaScript is the new hotness, and that probably won’t change anytime soon. node.js has gone mainstream and the community surrounding it is rabidly productive; new client-side JavaScript libraries are coming along every single day; JavaScript is already the lingua franca of the web; and so on. To say that the people involved in this growing branch of the web dev world prefer JSON to XML is more than a wee bit of an understatement.

Sure, there’s an XML parser for node, but it’s largely geared toward dealing with legacy XML-based endpoints. The fact remains that if you’re doing top-to-bottom, full-stack JavaScript, using anything besides JSON is borderline silly. And full-stack JavaScript has already gone mainstream.
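
As a sketch of what “top-to-bottom JSON” looks like in practice, here is a minimal Node.js HTTP endpoint serving JSON that a browser can consume with nothing but JSON.parse. The route and payload are hypothetical.

```javascript
// Minimal Node.js server emitting JSON -- no translation layer anywhere in the stack.
const http = require('http');

http.createServer((req, res) => {
  const payload = { status: 'ok', items: [1, 2, 3] };    // plain JS object...
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));                       // ...serialized in one call
}).listen(3000);

// In the browser (or another Node process), the reverse is just as direct:
//   const data = JSON.parse(responseBody);  // back to native objects, no schema, no parser library
```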

One way or another, the future is bright for JSON

It would be quite surprising if the above-mentioned tendencies had nothing to do with the qualities of JSON itself. Many have argued that JSON is better because it’s less “verbose” than XML, and more readily intelligible to humans than formats like pure binary.

These factors have certainly helped JSON. Some believe JSON’s rise also has to do with the fact that it possesses a very limited set of data types: it’s essentially restricted to null, Booleans, numbers, strings, arrays, and objects (dictionaries). It doesn’t even have a Date data type. JSON is thus not only generally less verbose than XML: it is more parsimonious in its use of data types.
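
Here is a quick sketch of how those few types look in practice, including the workaround that the missing Date type forces. The ISO 8601 string convention shown is just common practice, not anything mandated by JSON itself.

```javascript
const doc = {
  retired: null,                          // null
  active: true,                           // Boolean
  temperature: 21.5,                      // number
  name: "sensor-7",                       // string
  tags: ["indoor", "beta"],               // array
  meta: { firmware: "1.2.0" },            // object (dictionary)
  installedAt: new Date().toISOString()   // no Date type: encode it as a string by convention
};

const wire = JSON.stringify(doc);
console.log(typeof JSON.parse(wire).installedAt);  // "string" -- the Date-ness is gone on the other side
```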

Restricting itself to this small set of universal data types makes JSON deeply and immediately interoperable with pretty much any programming language out there (in fact, the list of languages on JSON’s main page is frankly staggering).

Overall, my claim isn’t really as audacious as it might seem at first. It basically has two components:

(1) in order to have a global digital infrastructure, you need to have pervasive data interchange formats to knit everything together and establish intelligibility across nodes; and

(2) there are good reasons to think that JSON will someday hold a privileged position in our digital architecture. Our expectations and skill sets should register this change and adjust accordingly.

Amandeep Singh is Principal Software Engineer at QASource in Chandigarh, India