I am a software engineer, database developer, web developer, social media user, programming geek, avid reader, sports fan, data geek, and statistics geek. I do not care if something is new and shiny. I want to know if it works, if it is better than what I use now, and whether it makes my job or my life easier. I am also the author of RegularGeek.com and Founder of YackTrack.com, a social media monitoring and tracking tool. Robert is a DZone MVB and is not an employee of DZone and has posted 87 posts at DZone.

The Problem Is Not JSON Or XML, It Is About Data Context

12.03.2010

There is an interesting discussion occurring regarding data transfer in web applications. The discussion has centered on the differences between JSON and XML in JavaScript-heavy sites. It started with Norm Walsh commenting on the decision by Twitter and Foursquare to remove support for XML from their APIs. The basic idea of his post was that if you are using JavaScript, and you are only passing around atomic values or lists and hashes of atomic values, then JSON makes complete sense. He then talks about the difficulty of JSON when you need more context or you have mixed content. Overall, it was a very sensible post. The discussion gained steam because of Norm’s “Meh” reaction, and because talking about “which is the better technology” tends to get people all riled up.

A few days after Norm’s post, two other posts appeared refuting his stance even though in many comments it is considered a non-debate. First, Manu Sporny talked about the move to JSON being more of a paradigm shift to simpler markup. The complaints about SOAP and XML Schemas are obvious, but he complicates the argument by introducing JSON-LD into the conversation. JSON-LD introduces syntax for JSON to denote LinkedData, and there is notation very similar to XML Namespaces, to which Norm replies “Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?” Lastly, James Clark throws his opinion into the mix. His commentary is more about the fact that XML is losing web developers which could be a bad thing.

Now that you have the background story, I wanted to state that people are missing Norm Walsh’s original point. This problem is about context and it is not really being treated that way. People are using JSON for web development because there is almost zero learning curve. It is used because of the increasing trend of JavaScript-heavy sites to drive interactivity and some of the mashup creativity. Because Twitter’s API is readily available, people have created widgets to display their tweets on a web page. For many web developers that means grabbing a JSON representation of some tweets and converting it to HTML. This process barely takes longer than trying to find the correct API documentation.
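To make the "grab some JSON and convert it to HTML" step concrete, here is a minimal sketch. The payload shape (`user` and `text` fields) and the function names are assumptions for illustration only; a real Twitter response carries many more fields and this is not the Twitter API itself.

```javascript
// Hypothetical sample payload, standing in for an API response body.
const sampleJson =
  '[{"user":"alice","text":"JSON & XML both have a place"},' +
  '{"user":"bob","text":"<b>hello</b>"}]';

// Escape text before placing it in markup so tweet content cannot inject HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Parse the JSON text and render each tweet as a list item.
function tweetsToHtml(json) {
  const tweets = JSON.parse(json);
  const items = tweets.map(
    (t) => `<li><strong>${escapeHtml(t.user)}</strong>: ${escapeHtml(t.text)}</li>`
  );
  return `<ul class="tweets">${items.join("")}</ul>`;
}

console.log(tweetsToHtml(sampleJson));
```

The parsing itself is a single `JSON.parse` call, which is a large part of why the learning curve is so low; most of the code is about producing safe markup.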

In this same context, if the APIs are XML based you then need something to parse the XML into an appropriate JavaScript object. You can already tell that this is getting more complicated than simple JSON. To make matters worse, past browsers handled XML differently and sometimes very poorly. Because web developers were depending on the XML support in the browser, the problems of cross-browser support arose again. Obviously, developers do not want to go down that path and JSON is easier anyway.

However, what if you are using PHP or Java on the server? PHP has plenty of XML handling libraries, with SimplePie being a hugely popular RSS feed processing library. If you can make the Twitter API call from your Java server code, there are plenty of libraries for handling XML there as well. So, in that context XML may be a better option.

If you look at the type of problem, this could also change the data format. In the Twitter API example, if you have a simple widget that just displays tweets, then a solution based on JavaScript and JSON makes a lot of sense. What if that widget needed to be more dynamic? For example, let’s say that someone registered on your site will see those tweets displayed differently and there are additional links in the tweet for replying or retweeting. You could write some JavaScript code to just check for registered users and generate different HTML, but this gets unwieldy if the number of different displays grows beyond 2 or 3. If the data is in XML, then you can write a different XSLT script for each display, and each script remains separate from the main widget code. You just need to select the appropriate XSLT based on the user interacting with the site. At this point people are likely going to complain about the use of SOAP for web services and its complexity. Let’s ignore that option: REST has won and SOAP is overly complex. Can we agree on that and move on?
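The selection step can stay very small. The sketch below only picks which stylesheet to apply; the stylesheet file names and the `type` field on the user object are hypothetical, and applying the stylesheet to the XML is left to whatever XSLT processor is available in your environment.

```javascript
// Hypothetical stylesheet file names, one per display variant.
const STYLESHEETS = {
  anonymous: "tweets-readonly.xsl",
  registered: "tweets-with-actions.xsl",
};

// Pick the stylesheet for the current visitor. Unknown or missing
// user types fall back to the read-only view.
function selectStylesheet(user) {
  const type = user && user.type;
  return Object.prototype.hasOwnProperty.call(STYLESHEETS, type)
    ? STYLESHEETS[type]
    : STYLESHEETS.anonymous;
}

console.log(selectStylesheet({ type: "registered" }));
console.log(selectStylesheet(null));
```

Adding a third or fourth display then means adding one entry to the table and one stylesheet, rather than another branch in the widget code.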

As with any programming problem, different requirements and different contexts may call for different technologies. If you get stuck on saying that JSON is better than XML (or the other way around), you lose another tool in your toolbox.

The other context that people are missing is why Twitter and Foursquare chose to support JSON only. This is likely a question of application complexity and analytics. Like any good API provider, Twitter is probably tracking all calls to the API and this includes the data format requested. It is very possible that the demand for XML was fairly low and it did not warrant separate support. In addition to this, there are plenty of JSON processing libraries available for mainstream languages like Java, so there was little risk in dropping support for XML. If there is no support for XML, then their API becomes simpler to support. That means less code to maintain, simpler maintenance of code because there are not multiple representations of one set of data, and fewer questions about the different formats.

So, quit whining about whose data format is better. Each one is better in a different context; otherwise it is highly unlikely that they would have become so popular. The important thing is to learn both formats, and other popular ones that appear, so that you can make an educated decision on which format to use in your situation.

References
Published at DZone with permission of Robert Diana, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Greg Brown replied on Fri, 2010/12/03 - 10:22am

JSON tends to be easier to work with because it maps directly to the standard types and structures used by most programming languages (string, number, boolean, map, and list). XML requires additional types to represent elements, text nodes, etc., making it a bit more cumbersome to work with. JSON also has a much simpler path syntax: "foo.bar[0]" vs. XPath.

However, XML is excellent for defining more complex hierarchies that would be difficult or impossible to represent in JSON. For example, the following XML conveys that I specifically want a "Foo" that has a "bar" property of value "123":

<Foo bar="123"/>

In JSON, the best I can do is:

{ "bar": "123" }

which doesn't tell me anything about the nature of the object that contains "bar".

So I agree that they both have their places.

Robert Csala replied on Fri, 2010/12/03 - 10:41am

Am I mistaken that there are existing JSON libraries for both Java and PHP as well? I've never used them, though..

Greg Brown replied on Fri, 2010/12/03 - 10:52am in response to: Robert Csala

Yes, there are quite a few for both. See the list at the bottom of http://json.org.

Robert Csala replied on Fri, 2010/12/03 - 11:02am in response to: Greg Brown

Thanks, Greg.

Stan Dyck replied on Fri, 2010/12/03 - 12:45pm

Here's the thing you're missing though. HTML *is* XML. If twitter provides me with nice, semantically marked up, well formed html, I don't need to convert or parse anything. I can just jam the output directly into a page as is and style it however I want with a css file. That's the easiest option of all.

You say the html isn't formed the way you want? Well, first I'd challenge you to make sure you really mean that, but if you aren't convinced, then there are already tons of great javascript libraries (I'm partial to JQuery) that know how to manipulate plain old html in whatever fashion you want. No xml parsers or xslt knowledge is required.

Robert Diana replied on Fri, 2010/12/03 - 12:53pm in response to: Greg Brown

Greg, JSON is much simpler than XPath because the purposes are really different. JSON is about objects and XPath is more of a light query language. You actually have a very good example of this difference: finding a Foo with bar="123". In JSON you need to loop through the objects and properties to find a match. With XPath, you basically just query the document and get your answers.
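A rough sketch of what that loop looks like once the JSON has been parsed. The data shape (each object wrapped in a key naming its type) and the helper name are assumptions for illustration; for the XML form, the XPath expression `//Foo[@bar="123"]` would express the same search as a single query.

```javascript
// Hypothetical parsed JSON: each entry is wrapped in a key naming its type.
const items = [
  { Foo: { bar: "123" } },
  { Foo: { bar: "456" } },
  { Baz: { bar: "123" } },
];

// Walk the list and keep entries of the given type whose property matches.
function findByTypeAndProperty(list, type, prop, value) {
  return list.filter(
    (item) => item[type] != null && item[type][prop] === value
  );
}

console.log(findByTypeAndProperty(items, "Foo", "bar", "123"));
```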

Greg Brown replied on Fri, 2010/12/03 - 12:59pm in response to: Stan Dyck

Actually, HTML is not XML. It looks a lot like it, but the rules are nowhere near as stringent. Only XHTML is truly XML, and, though that was a W3C recommendation for HTML 4.0, it has been dropped for HTML 5.

That aside, the purpose of using JSON or XML is to separate data from presentation. I may want to present that data in a form that is completely different from how you want to present it. I may not even want to use HTML - perhaps I want to generate an SVG diagram, or a PDF, or display it in a table control in a rich client app.

You are correct that it is possible to manipulate HTML via JavaScript, but doing so is generally much more difficult than manipulating either XML or JSON, because the rules for HTML are considerably more lax, and because HTML is designed for presentation, not data transfer.

Greg Brown replied on Fri, 2010/12/03 - 1:01pm in response to: Robert Diana

I understand the differences between JSON and XPath.

Robert Diana replied on Fri, 2010/12/03 - 1:04pm in response to: Stan Dyck

Actually, I was avoiding HTML. Also, we need to be careful when dealing with HTML because it is not XML. XHTML-strict is XML, but traditional HTML is not. I believe HTML5 is supposed to be conformant XML, so that the differences will finally go away. As you say, if you get well-formed HTML, you do not need to do anything, but APIs typically are only giving you data, not what the HTML should look like. If you have your own internal APIs you could definitely transfer HTML directly. Personally, I am not a big fan of returning HTML from a web service because that normally means the service returns data that has one purpose only. I would prefer to get the data and then use frameworks like JQuery to build the HTML. Also, HTML is a display markup so you would need to know how the data is formatted in order to use it. Obviously, you need to know the format of JSON and XML data, but these can be "discovered" better than the HTML. Overall, not using HTML for data transfer is more of a personal preference.

Robert Diana replied on Fri, 2010/12/03 - 1:08pm in response to: Greg Brown

Greg, they dropped the XML requirement for HTML5? I have not checked recently, but that is really annoying. HTML is so much easier to work with when it is actually XML conformant.

Robert Diana replied on Fri, 2010/12/03 - 1:10pm in response to: Greg Brown

Sorry about that. Based on your original comment it looked like you were trying to compare them directly. My apologies for making assumptions.

Greg Brown replied on Fri, 2010/12/03 - 1:13pm in response to: Robert Diana

Hm. I heard that HTML 5 was dispensing with XHTML a while back, but now of course I can't find any conclusive information on it. :-) So feel free to ignore that comment.

Stan Dyck replied on Fri, 2010/12/03 - 2:05pm in response to: Greg Brown

It is true that the rules for html are more lax, but if I'm an api provider I am free to produce html that is well formed xml and to guarantee that it is so. Call it xhtml if you wish. Nothing prevents me from presenting the same html differently from the way you do, either via css or more sophisticated javascript means.

How is using jquery (say) to manipulate html more difficult? Remember, my requirement is semantically marked up, well formed html. If a company like twitter is interested in having people use their api, they will make it as easy to use as possible. What I'm saying is that this is the easiest way.

My main point is that (svg and pdf formatting aside) the stuff I get from twitter or whomever is probably going to end up as html. Why not send it to me as html instead of making me convert it from some xml or json format *into* html?

p.s. Your statement that html isn't designed for data transfer is pretty funny considering that (I would guess) the majority of the world's data gets transferred as html every minute of every day.

Stan Dyck replied on Fri, 2010/12/03 - 2:12pm in response to: Robert Diana

Nothing prevents you (meaning the API provider) from:

  • serving up well formed (in the xml spec sense) html
  • marking it up semantically (without caring about the presentation aspects)
My point is that if you do this, it is easier for me to use. The "one purpose" of this html should be to mark up the data. Most people get caught up in the idea that html is for presentation purposes only. That is not the case.

Greg Brown replied on Fri, 2010/12/03 - 2:31pm in response to: Stan Dyck

"Your statement that html isn't designed for data transfer is pretty funny considering that (I would guess) the majority of the world's data gets transferred as html every minute of every day."
You are describing the presentation of data. Most of the actual data is stored in a UI-independent manner in a back end system somewhere (e.g. in a relational database or...yup, you guessed it - XML or JSON, or some other format).

I'm honestly shocked that there is a single developer who would actually argue in favor of combining data and presentation! But to each his/her own, I guess. Fortunately, I believe you are in the minority here.

Attila Király replied on Fri, 2010/12/03 - 2:32pm in response to: Greg Brown

"Only XHTML is truly XML, and, though that was a W3C recommendation for HTML 4.0, it has been dropped for HTML 5."

This is not accurate. The HTML5 standard supports two serialization formats: HTML and XHTML. Both syntaxes are part of the spec, so you can use XML syntax for HTML5. And I think XHTML will rise in the future, because after many years IE is finally supporting the application/xhtml+xml content type too, as of version 9.

HTML syntax
XHTML syntax

Greg Brown replied on Fri, 2010/12/03 - 2:40pm in response to: Stan Dyck

"My main point is that (svg and pdf formatting aside) the stuff I get from twitter or whomever is probably going to end up as html. Why not send it to me as html instead of making me convert it from some xml or json format *into* html."
Because I don't want to have to work around or worry about your formatting, or deal with it when you decide to change that formatting. This isn't an issue for XML or JSON, since they are data formats (whereas, again, HTML is not).

Also, not all consumers of data feeds are simple web pages that want to regurgitate HTML. They could be (and often are) back end systems that want to perform more detailed transformation or analyses on content. Sourcing that content as HTML requires the developers of these systems to jump through way more hoops than they would otherwise have to, since HTML processing is typically fairly difficult in a headless environment.

So I can't follow your argument that HTML is the best choice for sharing data. I'm almost convinced that you're actually making a joke, trying to convince people that you are serious! :-)

Greg Brown replied on Fri, 2010/12/03 - 2:42pm in response to: Attila Király

Yeah - as I mentioned, I thought I heard that a while back, but couldn't find anything to substantiate it this morning. So I retract that.

Stan Dyck replied on Fri, 2010/12/03 - 3:02pm in response to: Greg Brown

You already have to worry about my (when I say "my" here I mean an API provider) formatting and deal with my changes in the JSON or proprietary XML cases. I can change those just as easily as I can change the html that I produce. The difference is that if I always produce html, and you don't modify it, you can at least be sure that *something* will render. If I change my xml or json format under your nose, your use of my api will probably break.

If you don't like my html you don't have to send it to your clients' browsers. You can manipulate it how you want. But then you have to do that with XML or JSON too. In fact you are required to since you can't display them as XML or JSON.

I can't say I'm not crazy, but I'm not really joking, nor do I think I am alone. My guess is that you may not understand what I mean by semantic html. Wikipedia can explain that better than I.

Greg Brown replied on Fri, 2010/12/03 - 3:10pm in response to: Stan Dyck

Yes, I understand the concept of semantic HTML. Unfortunately, not all data fits that model. Semantic HTML is well-suited to document-oriented constructs, particularly human-authored text. But it is not the best choice for, say, representing tabular data. For that, I'd probably choose CSV. For exchange of simple data structures, I'd go with JSON. For complex hierarchical data, XML.

So the idea that "semantic HTML is the best way to represent data" is a bit naive. It may be the best way to represent some types of data, but certainly not all.

Greg Brown replied on Fri, 2010/12/03 - 3:27pm in response to: Stan Dyck

Thinking it through a little further, it occurs to me that "well-formed semantic HTML" is simply a special case of using XML for data exchange. So, in essence, you are actually arguing in favor of using XML. ;-)

Stan Dyck replied on Fri, 2010/12/03 - 4:01pm in response to: Greg Brown

No, you were describing the design of html. The fact is data *is* transferred as html. When you refresh this page, the data stored in some db somewhere will be converted and transferred to your browser as html. Multiply that by a few billion clicks across the Internet. See what I'm saying?

Perhaps an example (bear with me...or not...either way). Say I want to produce an application/website/whatever that displays articles from this site but one that removes silly commentary by people named "Stan". Can't imagine why, but hey, the client is always right.

I don't have this site's database or their nice object model with its awesome separation of presentation. (I don't even have their permission but we'll let that slide). All I've got to work with is the html that is spit out when I go to urls on their site. Under those circumstances, the best I can hope for is

  1. The site's html is clean (i.e., it can be parsed with an xml parser)
  2. It's marked up semantically (lots of class and id attributes with span and div tags around important stuff).

Perhaps each comment is contained in a div with a "comment" class. Each comment has a span tag around the comment's author with "authorname" class attribute (bye bye, Stan!). The article itself might be inside a div tag with an "article" class and a unique id in an id attribute so I can strip out all the ads and non-article clutter around it. Stuff like that.

With that kind of thing in place, I can do wondrous things with the html because IT HAS BEEN RENDERED AS DATA!!! The nice thing too is that it's not really much of a burden for the people producing the html to make these changes (look at the source of this page. They are already most of the way there). They probably should be doing it anyway, especially if they want to do things like publish an API but don't want to take the time to translate their object model to some arbitrary textual format. HTML is a text format. It's expressive. Lots of things know how to deal with it. We should use it!

If you read this far, you are awesome. I hope you had fun. Thanks.

Stan Dyck replied on Fri, 2010/12/03 - 4:19pm in response to: Greg Brown

Yeah, you win! There will be a little something extra in your paycheck this week. But it's better than just using XML because it's not proprietary, or at the least not *as* proprietary and browsers know how to do something with it.

Stan Dyck replied on Fri, 2010/12/03 - 4:28pm in response to: Greg Brown

We're on two sides of the same coin now. Not the best way for all types, but much more than none is what I say.

p.s. I think html has a table element. Most people misuse it, but it is a perfectly functional semantic construct. [ducking to avoid thrown objects].

Greg Brown replied on Fri, 2010/12/03 - 4:38pm in response to: Stan Dyck

I think you may be misusing the term "proprietary". XML is certainly not a proprietary technology, nor is JSON. Perhaps what you mean is "application-specific".

Either way, you run into the same issue with HTML - this site's content is formatted in an application-specific way, just as a hypothetical XML feed containing the same data might be. I would still need to write logic that is specific to this site in order to process it.

So let's look at a simple example. Let's say I want to write a service that provides access to contact information. I can easily envision how that data might be represented in JSON, or even XML. How would you do it in HTML? There's no "firstName", "lastName", etc. tag. You'd have to hijack some other tag and attach additional markup to it so that a caller would know that, rather than <td>, you actually mean "firstName". In JSON or XML, I can just use the purely semantic "firstName". Much more intuitive, much easier to read, and much easier to process.

Greg Brown replied on Fri, 2010/12/03 - 4:38pm in response to: Stan Dyck

The <table> tag is much more verbose than CSV. You can also represent tabular data in JSON or XML, but again it is a bit more verbose.

Stan Dyck replied on Fri, 2010/12/03 - 5:04pm in response to: Greg Brown

Fair enough, application specific then, but you obviously took my meaning. You say, "I would still need to write logic...". I say in the semantic html case "I might not need to write logic..." and in the application-specific xml or json case "I am required to write logic...". The difference between "might not need to" and "am required to" can be huge and is certainly reason to use semantic html as opposed to xml assuming I want to maximize the usefulness of my API.

As for your example, the answer is easy. I would use the hCard microformat. Take a look at how they model your exact example in html. One advantage that accrues magically is that other people know about hCard and can do things with it without any prior knowledge of your API. For example Firefox extensions like Operator. It costs a site developer very little to mark up contact data this way. 

Greg Brown replied on Sat, 2010/12/04 - 10:48am in response to: Stan Dyck

If you are only targeting a browser for read-only presentation of this data, then hCard would work fine. However, if you want to manipulate the data programmatically, it is much easier to write:

var streetAddress = contact.address.streetAddress;

than it is to write an XPath expression to extract that data from semantic XHTML. It is even more difficult to update it. However, in JSON I can simply say:

contact.address.streetAddress = "My new street address";

So again, it depends entirely on use case. The nice thing about JSON or XML is that it is easy to map these formats to XHTML. It isn't quite as easy to go the other way.
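As a rough illustration of that JSON-to-XHTML direction, the sketch below updates a contact object and then renders it with hCard class names (`vcard`, `fn`, `adr`, `street-address`). The contact shape (`firstName`, `lastName`, `address.streetAddress`) and the function names are assumptions for the example, not part of any particular API.

```javascript
// Escape text before placing it in markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a contact object as hCard-style markup.
function contactToHCard(contact) {
  const fullName = `${contact.firstName} ${contact.lastName}`;
  return [
    '<div class="vcard">',
    `  <span class="fn">${escapeHtml(fullName)}</span>`,
    '  <div class="adr">',
    `    <span class="street-address">${escapeHtml(contact.address.streetAddress)}</span>`,
    "  </div>",
    "</div>",
  ].join("\n");
}

// Hypothetical contact, as it might look after parsing a JSON response.
const contact = {
  firstName: "Jane",
  lastName: "Doe",
  address: { streetAddress: "1 Main St" },
};

// Updating the data is a plain property assignment.
contact.address.streetAddress = "My new street address";

console.log(contactToHCard(contact));
```

Going the other way would mean parsing the markup and locating each value by its class name before any update could be made.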

Nicolas Frankel replied on Sun, 2010/12/05 - 3:27pm

Simple and to the point. You read my mind (or my article): it's not only true for XML vs JSON but you can generalize for TechX vs TechY.

When you hold a hammer, everything looks like a nail!

Tim O'farrell replied on Mon, 2010/12/06 - 8:17am in response to: Greg Brown

Why not {"Foo":{"bar":"123"}} ?
