# Wednesday, December 04, 2002


Dare on XML Schema.

I've finally made time to read Dare's W3C XML Schema Design Patterns: Avoiding Complexity.

Dare wrote his article as a "counterpoint" (though maybe "derivation by extension" is more apt) to Kohsuke Kawaguchi's W3C XML Schema Made Simple.  Kohsuke sums up his view by saying

Consider W3C XML Schema as DTD + datatype + namespace

though you might add "- Notation", since he points out that Notation declarations shouldn't be used because they aren't compatible with DTD Notations.  This is probably decent, if conservative, advice.  Judging from the comment I noted the other day, and from the comments on Kohsuke's article, the most controversial statement in either article was

 Do not try to be a master of XML Schema. It would take months.

which is pretty much the point of both articles: learn what's useful and ignore all the nooks and crannies; they'll just get you into trouble.  This is essentially conceding the argument of the anti-Schema crowd that WXS is too complex and ambiguous, but regardless, people are using WXS by choice or compulsion, and these articles are an attempt to steer users towards the best practices.  And as far as I'm concerned, it's true.  I've tried to wade through Priscilla Walmsley's Definitive XML Schema, but as a friend of mine said, it's "dry as day-old toast".  I feel better served by getting a more succinct guide and filling in the details later, if ever.

Dare loosens Kohsuke's guidelines a bit.  To start, rather than eliminate the use of local declarations, Dare takes the time to explain the elementFormDefault behavior that put Kohsuke off.  It seems like Kohsuke's recommendation could be modified to say "use elementFormDefault='qualified'", which is one of Dare's recommendations, and more useful advice to boot.  I don't see a particular problem with unqualified, except that I prefer the way qualified looks, and it seems like that's Dare's justification too.  The other justification might be that unqualified interferes with default namespace declarations: an unprefixed local element that sits under a default namespace declaration ends up in that namespace, where an unqualified declaration expects it to be in no namespace at all.
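To make the default namespace point concrete, here's a minimal sketch in Python.  It only uses the standard library's ElementTree to show which namespace each child element lands in; it doesn't validate anything against a schema.  The namespace URI and the order/item element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:orders"  # hypothetical target namespace

# Instance that relies on a default namespace declaration:
# every unprefixed element, including <item>, is in NS.
default_ns_doc = f'<order xmlns="{NS}"><item>widget</item></order>'

# Instance that prefixes only the global element:
# <item> is unprefixed with no default namespace, so it is in no namespace.
prefixed_doc = f'<o:order xmlns:o="{NS}"><item>widget</item></o:order>'


def child_names(xml_text):
    """Return the expanded names of the root element's children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


qualified_children = child_names(default_ns_doc)
unqualified_children = child_names(prefixed_doc)

print(qualified_children)    # ['{urn:example:orders}item']
print(unqualified_children)  # ['item']
```

With elementFormDefault="unqualified", a validator expects the local item element in no namespace, which is the second form.  The first form, the one most people would naturally write with a default namespace, only matches a schema that uses "qualified".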

I don't quite get the recommendation on built-in types.  The initial list of recommendations says "Do use restriction and extension of simple types.", but the actual recommendation is to use the built-in simple types.  Dare's recommendation is to use the simple types and consider avoiding the subtypes of string and integer.  I've seen (and written, truth be told) schemas that start building levels on top of the simple types, and really all this achieves is a less readable schema.  The OTA schemas are very much into subclassing simple types, and others I've talked to who've worked with OTA agree that it hurts readability.  The OTA defines types like StringLength32, which may be a valid restriction, but probably not a great first class type - it's true that lots of elements are 32 character strings, but this seems to me to be a micro-optimization in the type system.  It makes sense to declare this type if all the StringLength32 data suddenly became StringLength64, but then you have to carefully consider whether the data's really related to another use of that type and likely to stay in sync.  This seems like a parallel to the Inheritance vs. Aggregation considerations in OO design, where you should consider whether a new type really has an IS-A relationship to another type.  I'd say that it's not necessarily a good idea to declare named simple types, unless that type information is really going to be reused.
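Here's a rough sketch of the coupling I mean.  The schema fragment is my own illustration, not taken from the OTA schemas: only the StringLength32 name comes from OTA, and the City and CouponCode elements are hypothetical.  The Python around it just lists which elements would be affected if the named type changed.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema: two unrelated elements share one named length restriction.
named_style = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:simpleType name="StringLength32">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="City" type="StringLength32"/>
  <xs:element name="CouponCode" type="StringLength32"/>
</xs:schema>
"""


def users_of(schema_text, type_name):
    """List the global elements whose type attribute names type_name."""
    root = ET.fromstring(schema_text)
    return [
        element.get("name")
        for element in root.findall(f"{{{XS}}}element")
        if element.get("type") == type_name
    ]


affected = users_of(named_style, "StringLength32")
print(affected)  # ['City', 'CouponCode']
```

Widening the maxLength facet on StringLength32 widens both City and CouponCode, whether or not those two fields have anything to do with each other.  That's the IS-A question in schema form.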

One other point that Kohsuke made was that when restricting complex types, you have to repeat the entire definition of the base type, and that validators have a difficult time with restriction.  Dare gives some concrete examples of the validation problems, but doesn't really offer much besides "here's the rope, don't hang yourself".  Restriction has its appeal, maybe because it doesn't work like the type systems I'm used to, but given the problems, I'm not sure complex type restriction is worth even a qualified endorsement.
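For what it's worth, here's what the "repeat the entire definition" complaint looks like in practice.  This is a minimal sketch with a made-up Address type; the Python only parses the schema text and compares the element declarations, it doesn't check that the restriction is valid.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical types: DomesticAddress restricts Address by dropping the
# optional country element, but still has to restate street and city.
schema = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="DomesticAddress">
    <xs:complexContent>
      <xs:restriction base="Address">
        <xs:sequence>
          <xs:element name="street" type="xs:string"/>
          <xs:element name="city" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def particle_names(schema_text, type_name):
    """Return the element names declared inside the named complex type."""
    root = ET.fromstring(schema_text)
    for complex_type in root.findall(f"{{{XS}}}complexType"):
        if complex_type.get("name") == type_name:
            return [e.get("name") for e in complex_type.iter(f"{{{XS}}}element")]
    return []


base_particles = particle_names(schema, "Address")
derived_particles = particle_names(schema, "DomesticAddress")
repeated = [name for name in derived_particles if name in base_particles]

print(repeated)  # ['street', 'city']
```

Everything in the derived type is a copy of something in the base, so the two definitions have to be kept in sync by hand, and it's up to the validator to confirm that the copy really is a subset of the original.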

Overall, Dare did a better job of explaining his rationale than Kohsuke.  Kohsuke's guidelines are a bit too conservative, but my trouble with Dare's guidelines comes from features qualified with "use carefully".  It's good to get an explanation of the pitfalls, but I felt like the justifications for when those features should be used were pretty weak.  Maybe this subject needs 2 articles, one for the "safe" parts, and another for the ones that need extra care.

[Gordon Weakliem's Radio Weblog]