Intro

The basic grammar and data structure of an XML document are defined by a DTD (Document Type Definition) or an XML Schema written for a particular use. A valid XML document is a well-formed document (one that follows standard XML syntax) that has also been checked against such a grammar to verify that its content and structure conform.
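The difference between well-formed and valid can be seen with a non-validating parser. The following is a minimal Python sketch, assuming only the standard library: xml.etree.ElementTree raises ParseError for a document that breaks XML syntax, but it never consults a DTD or schema, so it can only answer the well-formedness question. The helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (well-formed), False otherwise.

    ElementTree does not validate, so True says nothing about whether the
    document conforms to a DTD or XML Schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<book><title>Tom Sawyer</title></book>"))  # True
print(is_well_formed("<book><title>Tom Sawyer</book></title>"))  # False: overlapping tags
```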

The grammar used for validation defines the document's basic structure, including the following:

You can make your own XML grammar (because XML is extensible!), but there are also many real-world XML grammars that are task- or market-specific. EGs:

DTD

DTDs, like XML and HTML, have roots in SGML (Standard Generalized Markup Language).

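A DTD is built mainly from these declarations: <!ELEMENT> (an element and its allowed content), <!ATTLIST> (an element's attributes, their types and defaults), <!ENTITY> (named text that can be substituted into the document) and, less often, <!NOTATION>.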

A DOCTYPE declaration is also used in an XML document to reference a DTD. The DOCTYPE declaration identifies the root element and its DTD reference. See also my section on SGML Level Tags for HTML.
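The DOCTYPE declaration can also be read by a program. This is a small Python sketch (read_doctype is a hypothetical helper) using the standard library's low-level expat binding, whose StartDoctypeDeclHandler is called with the root element name, the system identifier, the public identifier and a flag saying whether there is an internal subset. With its default settings expat does not fetch the external DTD, so this only reports the reference.

```python
from xml.parsers import expat


def read_doctype(text: str):
    """Return (root_name, system_id, has_internal_subset) from the DOCTYPE,
    or None if the document has no DOCTYPE declaration."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when expat reaches <!DOCTYPE ...>
        found.append((name, system_id, bool(has_internal_subset)))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)  # True: this is the final (only) chunk
    return found[0] if found else None


doc = '<?xml version="1.0" ?>\n<!DOCTYPE books SYSTEM "books.dtd">\n<books/>'
print(read_doctype(doc))  # ('books', 'books.dtd', False)
```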

EG

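Here is an XML document. Strictly speaking, it is well-formed only once the DefaultNationality entity it references has been declared (for example, in a DTD); with no DTD at all, an undeclared entity reference is a well-formedness error.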

<?xml version="1.0" ?>
<books>
	<book>
		<title>Crime and Punishment</title>
		<author nationality="Russian">
			Fyodor Dostoevsky
		</author>
	</book>
	<book>
		<title>Tom Sawyer</title>
		<author nationality="&DefaultNationality;"
			realname="false"
		>
			Mark Twain
		</author>
	</book>
</books>

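The DTD providing the grammar for the preceding XML document might be something like the following. It is shown here wrapped in a DOCTYPE declaration, as it would be when embedded in the document; a separate DTD file such as books.dtd would contain only the declarations between the square brackets.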

<!DOCTYPE books [
	<!ELEMENT books (book)*>
	<!ELEMENT book (title,author)>
	<!ELEMENT title (#PCDATA)>
	<!ELEMENT author (#PCDATA)>
	<!ATTLIST author nationality CDATA #REQUIRED>
	<!ATTLIST author realname (true | false) "true">
	<!ENTITY DefaultNationality "American">
]>

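The preceding basically says the following: a books element contains zero or more book elements; each book contains exactly one title followed by exactly one author; title and author hold the text of the title and the author's name; every author must have a nationality attribute and may have a realname attribute whose value is either true or false (true if omitted); and the entity DefaultNationality stands for the text American.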
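To make those rules concrete, here is a hypothetical Python function, check_books, that hand-codes the same constraints with xml.etree.ElementTree. It is only an illustration of what the DTD says, not a DTD validator: a real validating parser reads the declarations instead of having them written out by hand.

```python
import xml.etree.ElementTree as ET


def check_books(root):
    """Return a list of violations of the books DTD's rules (hypothetical helper).

    Hand-written to mirror the declarations; this is not a DTD validator.
    """
    errors = []
    if root.tag != "books":
        errors.append(f"root must be <books>, got <{root.tag}>")
    for i, book in enumerate(root, 1):
        # <!ELEMENT books (book)*> : any number of <book> children, nothing else
        if book.tag != "book":
            errors.append(f"child {i}: expected <book>, got <{book.tag}>")
            continue
        # <!ELEMENT book (title,author)> : exactly one title, then one author
        tags = [child.tag for child in book]
        if tags != ["title", "author"]:
            errors.append(f"book {i}: children {tags} != ['title', 'author']")
        author = book.find("author")
        if author is None:
            continue
        # <!ATTLIST author nationality CDATA #REQUIRED>
        if "nationality" not in author.attrib:
            errors.append(f"book {i}: author is missing nationality")
        # <!ATTLIST author realname (true | false) "true"> : default is "true"
        if author.get("realname", "true") not in ("true", "false"):
            errors.append(f"book {i}: realname must be true or false")
    return errors


doc = """<books>
  <book><title>Tom Sawyer</title>
    <author nationality="American" realname="false">Mark Twain</author></book>
  <book><author nationality="Russian">Fyodor Dostoevsky</author></book>
</books>"""
print(check_books(ET.fromstring(doc)))  # the second book has no <title>
```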

The same XML document would be valid if it referenced and conformed to a particular DTD. Here is how the same XML document would reference a DTD.

<?xml version="1.0" ?>
<!DOCTYPE books SYSTEM "books.dtd">
...

Note that instead of referencing an external DTD as above, the DTD could have been embedded directly in the XML document. EG:

<?xml version="1.0" ?>
<!DOCTYPE books [
...
]>
<books>
...
</books>
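Python's standard-library parser (expat, used by xml.etree.ElementTree) shows both sides of this. The sketch below assumes the DTD above embedded as an internal subset, with (#PCDATA) content models for title and author: expat expands the entity declared there and rejects the same body when the entity is undeclared, but it does not enforce the element declarations, because it is a non-validating parser.

```python
import xml.etree.ElementTree as ET

# The books document with the DTD embedded as an internal subset
# (title and author use (#PCDATA) content models so the DTD parses).
DOC_WITH_DTD = """<?xml version="1.0" ?>
<!DOCTYPE books [
  <!ELEMENT books (book)*>
  <!ELEMENT book (title,author)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ATTLIST author nationality CDATA #REQUIRED>
  <!ATTLIST author realname (true | false) "true">
  <!ENTITY DefaultNationality "American">
]>
<books>
  <book>
    <title>Tom Sawyer</title>
    <author nationality="&DefaultNationality;" realname="false">Mark Twain</author>
  </book>
</books>"""

# The same body with no DOCTYPE at all, so &DefaultNationality; is undeclared.
DOC_WITHOUT_DTD = DOC_WITH_DTD.split("]>\n", 1)[1]

# 1. Expat reads the internal subset and expands the declared entity.
root = ET.fromstring(DOC_WITH_DTD)
print(root.find("book/author").get("nationality"))  # American

# 2. Without a declaration, the reference is a well-formedness error.
try:
    ET.fromstring(DOC_WITHOUT_DTD)
except ET.ParseError as err:
    print("not well-formed:", err)

# 3. The parser is non-validating: a <book> with no <title> breaks the
#    (title,author) rule, yet the document still parses.
bad = DOC_WITH_DTD.replace("<title>Tom Sawyer</title>", "")
print(len(ET.fromstring(bad).find("book")))  # 1 (just <author>)
```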

XML Schema

XML Schema (aka XSD, for XML Schema Definition) is newer than DTD technology, which is why DTDs have historically been supported by more parsers. However, XML Schema has been a W3C Recommendation since 2001; it provides richer definitions, supports data typing, is more extensible, and schemas are XML documents themselves. (XML-Data, sometimes listed as another name for it, was really an earlier Microsoft-led proposal that fed into XML Schema.) See also W3C's XML Schema Part 0: Primer.

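schema is the root element of the schema document itself (written xsd:schema in the example below, where the xsd prefix is bound to the http://www.w3.org/2001/XMLSchema namespace).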

EG

Here is an XML document (po.xml). Note that both of these examples come directly from the W3C's XML Schema primer.

<?xml version="1.0" ?>
<purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>
    <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
    </billTo>
    <comment>Hurry, my lawn is going wild!</comment>
    <items>
        <item partNum="872-AA">
            <productName>Lawnmower</productName>
            <quantity>1</quantity>
            <USPrice>148.95</USPrice>
            <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
            <productName>Baby Monitor</productName>
            <quantity>1</quantity>
            <USPrice>39.98</USPrice>
            <shipDate>1999-05-21</shipDate>
        </item>
    </items>
</purchaseOrder>

Here is the example XML Schema (po.xsd) for the preceding XML document.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

 <xsd:annotation>
  <xsd:documentation xml:lang="en">
   Purchase order schema for Example.com.
   Copyright 2000 Example.com. All rights reserved.
  </xsd:documentation>
 </xsd:annotation>

 <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

 <xsd:element name="comment" type="xsd:string"/>

 <xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
   <xsd:element name="shipTo" type="USAddress"/>
   <xsd:element name="billTo" type="USAddress"/>
   <xsd:element ref="comment" minOccurs="0"/>
   <xsd:element name="items"  type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
 </xsd:complexType>

 <xsd:complexType name="USAddress">
  <xsd:sequence>
   <xsd:element name="name"   type="xsd:string"/>
   <xsd:element name="street" type="xsd:string"/>
   <xsd:element name="city"   type="xsd:string"/>
   <xsd:element name="state"  type="xsd:string"/>
   <xsd:element name="zip"    type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN"
     fixed="US"/>
 </xsd:complexType>

 <xsd:complexType name="Items">
  <xsd:sequence>
   <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
    <xsd:complexType>
     <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
       <xsd:simpleType>
        <xsd:restriction base="xsd:positiveInteger">
         <xsd:maxExclusive value="100"/>
        </xsd:restriction>
       </xsd:simpleType>
      </xsd:element>
      <xsd:element name="USPrice"  type="xsd:decimal"/>
      <xsd:element ref="comment"   minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
     </xsd:sequence>
     <xsd:attribute name="partNum" type="SKU" use="required"/>
    </xsd:complexType>
   </xsd:element>
  </xsd:sequence>
 </xsd:complexType>

 <!-- Stock Keeping Unit, a code for identifying products -->
 <xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
 </xsd:simpleType>

</xsd:schema>
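The simple types in po.xsd are where XML Schema's data typing shows. As a rough Python sketch (check_item and is_xsd_date are hypothetical helpers, not part of any schema library), the code below hand-codes a few of them: the SKU pattern, which XSD matches against the whole value (hence re.fullmatch); positiveInteger capped by maxExclusive="100"; the xsd:decimal lexical form; and a simplified xsd:date without time zones. A real schema processor would derive all of this from po.xsd itself.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hand-written approximations of a few simple types declared in po.xsd.
SKU = re.compile(r"\d{3}-[A-Z]{2}")                          # SKU's xsd:pattern
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")   # xsd:decimal form
DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")       # no time zone


def is_xsd_date(text):
    """Simplified xsd:date check: YYYY-MM-DD only, and a real calendar date."""
    m = DATE.fullmatch(text)
    if m is None:
        return False
    try:
        date(*map(int, m.groups()))  # rejects impossible dates like 1999-02-30
    except ValueError:
        return False
    return True


def check_item(item):
    """Return a list of simple-type violations for one <item>."""
    errors = []
    part = item.get("partNum")
    # use="required" type="SKU"; fullmatch because XSD patterns are anchored
    if part is None or not SKU.fullmatch(part):
        errors.append(f"bad partNum: {part!r}")
    # positiveInteger with maxExclusive="100" means 1..99 (leading '+' not handled)
    qty = item.findtext("quantity", "").strip()
    if not (qty.isdecimal() and 0 < int(qty) < 100):
        errors.append(f"bad quantity: {qty!r}")
    price = item.findtext("USPrice", "").strip()
    if not DECIMAL.fullmatch(price):
        errors.append(f"bad USPrice: {price!r}")
    ship = item.findtext("shipDate")  # minOccurs="0": may be absent
    if ship is not None and not is_xsd_date(ship.strip()):
        errors.append(f"bad shipDate: {ship!r}")
    return errors


items = ET.fromstring("""<items>
  <item partNum="872-AA"><productName>Lawnmower</productName>
    <quantity>1</quantity><USPrice>148.95</USPrice></item>
  <item partNum="926-A"><productName>Baby Monitor</productName>
    <quantity>100</quantity><USPrice>39.98</USPrice>
    <shipDate>1999-02-30</shipDate></item>
</items>""")
for item in items.findall("item"):
    print(item.get("partNum"), check_item(item))
```

The second item deliberately breaks three rules (a two-letter SKU suffix is required, 100 is excluded by maxExclusive, and February has no 30th), which is the kind of content-level checking a DTD cannot express.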

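Here is how the original XML document (po.xml) would start if it referenced the XML Schema (po.xsd) above. Because po.xsd declares no targetNamespace, its elements are in no namespace, so the document points at it with the xsi:noNamespaceSchemaLocation hint rather than a default namespace:

<?xml version="1.0" ?>
<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="po.xsd"
               orderDate="1999-10-20">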
...
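xsi:noNamespaceSchemaLocation is only a hint that a schema processor may use to locate the schema. Reading it is simple with ElementTree, which stores a namespaced attribute under a key of the form {namespace-uri}localname; schema_location below is a hypothetical Python helper.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location(text):
    """Return the xsi:noNamespaceSchemaLocation hint on the root, if any."""
    root = ET.fromstring(text)
    # ElementTree keys namespaced attributes as "{namespace-uri}localname"
    return root.get(f"{{{XSI}}}noNamespaceSchemaLocation")


doc = """<purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="po.xsd" orderDate="1999-10-20"/>"""
print(schema_location(doc))  # po.xsd
```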

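The example below uses a different, older schema language: Microsoft's XDR (XML-Data Reduced), identified by the urn:schemas-microsoft-com:xml-data namespace, rather than W3C XML Schema. In XDR, data types are added by declaring the dt namespace (urn:schemas-microsoft-com:datatypes) on the root Schema element and then putting dt:type attributes on the AttributeType and ElementType declarations. Note that when element A has attributes or child elements, their AttributeType or ElementType must have been defined first, and each of those attributes or elements must then be referenced by an attribute or element child of element A's ElementType. The first block is the XML document, which points at its schema with xmlns="x-schema:classSchema.xml"; the second is that schema (classSchema.xml). EG: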

<class xmlns="x-schema:classSchema.xml">
   <student studentID="13429">
      <name>James Smith</name>
      <GPA>3.8</GPA>
   </student>
</class>

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name='studentID' dt:type='string' required='yes'/>
  <ElementType name='name' content='textOnly'/>
  <ElementType name='GPA' content='textOnly' dt:type='float'/>
  <ElementType name='student' content='mixed'>
    <attribute type='studentID'/>
    <element type='name'/>
    <element type='GPA'/>
  </ElementType>
  <ElementType name='class' content='eltOnly'>
    <element type='student'/>
  </ElementType>
</Schema>
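XDR is Microsoft-specific and has been superseded by W3C XML Schema, so rather than relying on an XDR processor, here is a hypothetical Python sketch of what the dt:type annotations mean: collect the declared types from the schema, then try to convert the matching values in the instance. The CONVERTERS table is an assumption covering only the string and float types used here.

```python
import xml.etree.ElementTree as ET

XDR = "urn:schemas-microsoft-com:xml-data"
DT = "urn:schemas-microsoft-com:datatypes"

# Hypothetical converters for the dt:type values used in this example only.
CONVERTERS = {"string": str, "float": float}


def declared_types(schema_text):
    """Map each ElementType/AttributeType name to its dt:type, if it has one."""
    types = {}
    for decl in ET.fromstring(schema_text):
        if decl.tag in (f"{{{XDR}}}ElementType", f"{{{XDR}}}AttributeType"):
            dt_type = decl.get(f"{{{DT}}}type")
            if dt_type:
                types[decl.get("name")] = dt_type
    return types


def check_types(instance_text, types):
    """Try to convert every typed element text and attribute value."""
    errors = []
    for elem in ET.fromstring(instance_text).iter():
        local = elem.tag.split("}")[-1]  # strip the x-schema: namespace
        for name, value in [(local, elem.text)] + list(elem.attrib.items()):
            convert = CONVERTERS.get(types.get(name))
            if convert is None:
                continue  # untyped, or a type this sketch does not cover
            try:
                convert(value)
            except (TypeError, ValueError):
                errors.append(f"{name}={value!r} is not {types[name]}")
    return errors


schema = """<Schema xmlns="urn:schemas-microsoft-com:xml-data"
  xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name='studentID' dt:type='string' required='yes'/>
  <ElementType name='name' content='textOnly'/>
  <ElementType name='GPA' content='textOnly' dt:type='float'/>
</Schema>"""
instance = """<class xmlns="x-schema:classSchema.xml">
  <student studentID="13429"><name>James Smith</name><GPA>A+</GPA></student>
</class>"""
types = declared_types(schema)
print(types)                         # {'studentID': 'string', 'GPA': 'float'}
print(check_types(instance, types))  # ["GPA='A+' is not float"]
```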
