Who lives in a Rectangle under a Tree? Spongebob Squaredata!
RebelGeekz offers up ideas for XML 2.0 that have been running on a background thread for me all day. I've really been trying to see the value, in my world, of what he's proposed in response to Mike Champion's query for XML 2.0 ideas.
Rephrasing what's been proposed as I understand it:
- Some percentage of the information represented in XML is objects; the rest is tabular data.
- The expression of tabular data shouldn't be sequences of nodes but should be delimited lists.
- The “cruel reality” is that data is “square shaped.”
- This somehow yields “relational XML.”
The more I think about what's being said here, the more Good Old Spongebob comes to mind. No, I'm not calling the RebelGeekz a Spongebob. But this doesn't feel like “Relational XML” as I'd want to see it.
First, I don't think there are just two models for data in XML. I agree that there are certainly object-graph and tabular uses for XML. But there are also a couple of others: hierarchical and what I'll call “relational.” If you go with the definition of an object graph as a directed graph of objects, it seems to me that you'll run into problems where one object contains instances of its own type. But we do that kind of stuff all the time: consider the XML representation of an organizational chart, where any <person> might contain a broad-and-deep collection of child elements that are also <person>. That's not a directed graph at all (at least as I learned it ages ago) -- but it is a simple, recursive hierarchy.
We've also learned from the database world that it's not such a good idea to have multiple instances that represent one distinct entity in the same schema (i.e., the goal of normalization is to minimize repeated instances by replacing them with references to a single instance within the schema). So to really be “relational XML” (albeit somewhat primitive by comparison with what an RDBMS can do) all we need is what we already have: ID and IDREF.
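To make the ID/IDREF idea a bit more concrete, here is a minimal sketch in Python. The document, the `id` and `manager` attribute names, and the `resolve_managers` helper are all made up for illustration. Note that `xml.etree.ElementTree` does not validate against a DTD, so the ID/IDREF semantics here are by convention only: each person appears once, and everyone else points at that single instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each <person> appears exactly once and is
# referenced by id, rather than being repeated wherever it is needed.
DOC = """
<company>
  <person id="p1" name="Ann"/>
  <person id="p2" name="Bob" manager="p1"/>
  <person id="p3" name="Cy" manager="p1"/>
</company>
"""

def resolve_managers(xml_text):
    """Map each person's name to their manager's name (None if no manager)."""
    root = ET.fromstring(xml_text)
    # Index every person by its id attribute (the ID side of the pair).
    by_id = {p.get("id"): p for p in root.iter("person")}
    result = {}
    for p in root.iter("person"):
        ref = p.get("manager")  # the IDREF side, by convention
        result[p.get("name")] = by_id[ref].get("name") if ref else None
    return result

MANAGERS = resolve_managers(DOC)
```

The lookup is essentially a join on a key, which is about as "relational" as plain XML gets without leaving the existing tool set.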
So while I agree that there are certainly cases where XML is used to express tabular data, I'd rather term such instances Tabular or Structured XML, where all of the data “lives in rectangles” joined together “under a tree.” Thus: “Spongebob Squaredata.” Not-so-squarely-but-regularly-patterned data might then be called -- continuing the analogy -- Patrick Starfishdata. What about the data that's somewhat regularly shaped but primarily just refers to other data to achieve semantic value? Wow, that's hard because Scoble isn't in the show. Eugene Krabdata maybe? The chaotic structure of XML in document form? Jellyfishdata maybe? But I digress... My point is that viewing XML as being used to represent only object graphs and tabular data isn't accurate, since it leaves out significant use cases. If there is a cruel reality, then, it is that while Spongebob Squaredata might be the star of the show, there are others on the set (or on the cel, I guess, since we're talking about animation after all...) that play a part. Since any of them may need to be transmitted over the wire, though, it doesn't make sense to have a different compaction or compression method for each.
Seems to me that it's this compaction that point number two goes after. I have two problems with it. First, we already have attributes that can achieve some compaction, so why not use them? Secondly, the majority of the schemas that exist today require an explicit qualified name for an entity. With what's suggested here, I'm left guessing that the first <row> of a family is essentially a list of qualified names for the tuples in the subsequent rows. How would you actually indicate that with a schema, though? And how exactly, with schema as we know it today, would you describe what the tabular data is, since the metadata occupies the same space? I think that would be hard, and it would certainly cause a lot of semantic reinvention in the schema space. How you'd even begin to do transformation, query and modification on data in XML like that leaves me dumbfounded. Considering you can already achieve significant compaction today with attributes instead of elements whilst retaining the full power of Schema, XPath, XSLT and XQuery, I really wonder if that's not the best choice going forward.
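As a rough illustration of the attribute argument, the sketch below encodes the same two made-up rows once with child elements and once with attributes, then queries the attribute form with the limited XPath subset that Python's `xml.etree.ElementTree` supports. The row data and the `names_in_city` helper are hypothetical, and the size difference on a toy sample says nothing definitive about real payloads.

```python
import xml.etree.ElementTree as ET

# The same two rows, first as child elements, then as attributes.
ELEMENT_FORM = (
    "<rows>"
    "<row><name>Ann</name><city>Austin</city></row>"
    "<row><name>Bob</name><city>Boston</city></row>"
    "</rows>"
)
ATTRIBUTE_FORM = (
    "<rows>"
    '<row name="Ann" city="Austin"/>'
    '<row name="Bob" city="Boston"/>'
    "</rows>"
)

def names_in_city(xml_text, city):
    """Return the names of rows whose city attribute matches, via an XPath predicate."""
    root = ET.fromstring(xml_text)
    return [r.get("name") for r in root.findall(f".//row[@city='{city}']")]

SAVED_CHARS = len(ELEMENT_FORM) - len(ATTRIBUTE_FORM)
AUSTIN = names_in_city(ATTRIBUTE_FORM, "Austin")
```

The attribute form is shorter, every value still carries an explicit name, and an ordinary path expression still finds it.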
Again, I'm not against compaction or compression of the XML infoset -- that we need. I do think there are better ways to get it, though: ones that don't complicate or compromise the existing tool set for working with XML. I think a better way to accomplish it would be to have some sort of streaming compression encoding option for XML. Let's face it: Microsoft and others have great algorithms and tools for doing that already; they've just not been applied to this problem. Heck, you probably don't even need to go that far... couldn't you just bzip2 the payload? That could really shrink XML down to size without impairing the higher-level tool set's ability to work with it.
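Here is a small sketch of the "just bzip2 the payload" idea, using Python's standard `bz2` module. The payload is synthetic and the `make_payload` helper is made up for illustration, so treat the size reduction as indicative only. The point is that the receiver gets back the exact same bytes and can hand them to an ordinary XML parser.

```python
import bz2
import xml.etree.ElementTree as ET

def make_payload(n):
    """Build a synthetic, repetitive XML document with n rows (illustrative only)."""
    rows = "".join(
        f"<row><id>{i}</id><name>item{i}</name></row>" for i in range(n)
    )
    return f"<rows>{rows}</rows>".encode("utf-8")

PAYLOAD = make_payload(500)

# Sender side: compress the serialized XML before it goes on the wire.
COMPRESSED = bz2.compress(PAYLOAD)

# Receiver side: decompress, then parse with the usual tools, unchanged.
RESTORED = bz2.decompress(COMPRESSED)
ROOT = ET.fromstring(RESTORED)
ROW_COUNT = len(ROOT.findall("row"))
```

For true streaming, the same module also offers incremental `BZ2Compressor` and `BZ2Decompressor` objects, so the whole document need not be held in memory at once.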
Just as SpongeBob is still a Sponge -- Porous, Absorbent and Yellow -- let's not let our desire to make XML more wire-friendly cause it to drop to the deck and flop like a fish. Keep it simple, machine parsable and “toolable” by what we've already invested in.