XML Schema Tutorial, Part 1
Defining Elements and Attributes
Overview
First, let's look at what an XML schema is. A schema formally describes what a given XML document can contain, in the same way that a database schema describes the data that can be held in a database (table structure, data types, etc.). An XML schema describes the coarse shape of the XML document, such as what fields an element can contain and which sub-elements it can contain. It can also describe the values that can be placed into any element or attribute.
A Note About Standards
- DTD was the first formalized standard, but is rarely used anymore.
- XDR was an early attempt by Microsoft to provide a more comprehensive standard than DTD. This standard has pretty much been abandoned now in favour of XSD.
- XSD is currently the de facto standard for describing XML documents. There are two versions in use, 1.0 and 1.1, which are largely the same (you have to dig quite deep before you notice the difference). An XSD schema is itself an XML document, and there is even an XSD schema that describes the XSD standard.
- There are also a number of other standards, but their uptake has been patchy at best.
The XSD standard has evolved over a number of years, and is controlled by the W3C. It is extremely comprehensive, and as a result has become rather complex. For this reason it is a good idea to make use of design tools when working with XSDs (see XML Studio, a graphical XSD development tool). Also, when working with XML documents programmatically, XML Data Binding is a much easier way to manipulate your documents (an object-oriented approach; see Liquid XML Data Binding).
The remainder of this tutorial guides you through the basics of the XSD standard, things you should really know even if you are using a design tool like Liquid XML Studio.
Elements
Elements are the main building block of any XML document. They contain the data and determine the structure of the document. An element can be defined within an XML Schema (XSD) as follows:
<xs:element name="x" type="y"/>
An element definition within the XSD must have a name property, which is the name that will appear in the XML document. The type property provides the description of what can be contained within the element when it appears in the XML document. There are a number of predefined types, such as xs:string, xs:integer, xs:boolean and xs:date (see XSD standard for a complete list). You can also create user defined types using the <xs:simpleType> and <xs:complexType> tags, but more on these later.
If we have set the type property for an element in the XSD, then the corresponding value in the XML document must be in the correct format for its given type (failure to do this will cause a validation error). Examples of simple elements and their XML are shown below:
| Sample XSD | Sample XML |
| <xs:element name="Customer_dob" type="xs:date"/> | <Customer_dob>2000-01-12</Customer_dob> |
| <xs:element name="Customer_address" type="xs:string"/> | <Customer_address>99 London Road</Customer_address> |
| <xs:element name="OrderID" type="xs:int"/> | <OrderID>5756</OrderID> |
| <xs:element name="Body" type="xs:string"/> | <Body></Body> (an element of type xs:string can have no content; this is not true of all data types, however) |
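The declarations in the table above are fragments, not complete schema documents. As a minimal sketch, a declaration can be wrapped in an <xs:schema> root element that binds the xs prefix to the W3C XML Schema namespace. The choice of OrderID as the only declaration is purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xs prefix is bound to the W3C XML Schema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A global element declaration: it can be the root of an instance document -->
  <xs:element name="OrderID" type="xs:int"/>

  <!-- Valid instance:   <OrderID>5756</OrderID> -->
  <!-- Invalid instance: <OrderID>abc</OrderID> (not a valid xs:int) -->
</xs:schema>
```

Note that xs:date values take the form YYYY-MM-DD; a value that also carries a time of day needs the xs:dateTime type instead.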
The value the element takes in the XML document can further be affected using the fixed and default properties.
Default means that if no value is specified in the XML document then the application reading the document, typically an XML parser or XML Data Binding Library, should use the default specified in the XSD.
Fixed means the value in the XML document can only have the value specified in the XSD.
For this reason it does not make sense to use both default and fixed in the same element definition (in fact it is invalid to do so).
<xs:element name="Customer_name" type="xs:string" default="unknown"/>
<xs:element name="Customer_location" type="xs:string" fixed="UK"/>
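As a rough illustration of how a validating parser is expected to treat these two properties, the sketch below wraps the declarations in a schema and lists a few instance fragments in comments. The instance values (such as "Jane Smith" and "France") are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- default: used when the element is present but empty -->
  <xs:element name="Customer_name" type="xs:string" default="unknown"/>
  <!--
    <Customer_name/>                       is treated as "unknown"
    <Customer_name>Jane Smith</Customer_name>  keeps its own value
  -->

  <!-- fixed: any content supplied must match the fixed value -->
  <xs:element name="Customer_location" type="xs:string" fixed="UK"/>
  <!--
    <Customer_location>UK</Customer_location>      is valid
    <Customer_location/>                           is valid, treated as "UK"
    <Customer_location>France</Customer_location>  is a validation error
  -->
</xs:schema>
```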
Cardinality
Specifying how many times an element can appear is referred to as cardinality. Cardinality is specified using the attributes minOccurs and maxOccurs, and allows an element to be specified as mandatory, optional, or that it can appear many times. minOccurs can be assigned any non-negative integer value (e.g. 0, 1, 2, 3... etc.), and maxOccurs can be assigned any non-negative integer value or the string constant "unbounded" meaning no maximum.
The default value for both minOccurs and maxOccurs is 1. Therefore, if both the minOccurs and maxOccurs attributes are absent, as in all the previous examples, the element must appear once and once only.
| Sample XSD | Description |
| <xs:element name="Customer_dob" type="xs:date"/> | If we don't specify minOccurs or maxOccurs, the default value of 1 is used for both, so there has to be one and only one occurrence of Customer_dob. |
| <xs:element name="Customer_order" type="xs:integer" minOccurs="0" maxOccurs="unbounded"/> | Here, a customer can have any number of Customer_order elements (even 0). |
| <xs:element name="Customer_hobbies" type="xs:string" minOccurs="2" maxOccurs="10"/> | In this example, the element Customer_hobbies must appear at least twice, but no more than 10 times. |
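In practice, minOccurs and maxOccurs are only permitted on elements declared inside a compositor such as <xs:sequence>, not on global declarations that sit directly under <xs:schema>. The sketch below places the three declarations from the table inside a parent element; the enclosing "Customer" element and the instance values are assumptions for illustration, and the <xs:complexType> wrapper is explained later in this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once (minOccurs and maxOccurs both default to 1) -->
        <xs:element name="Customer_dob" type="xs:date"/>
        <!-- zero or more -->
        <xs:element name="Customer_order" type="xs:integer"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- between 2 and 10 -->
        <xs:element name="Customer_hobbies" type="xs:string"
                    minOccurs="2" maxOccurs="10"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!--
    A valid instance (no orders, two hobbies):
    <Customer>
      <Customer_dob>2000-01-12</Customer_dob>
      <Customer_hobbies>chess</Customer_hobbies>
      <Customer_hobbies>cycling</Customer_hobbies>
    </Customer>
  -->
</xs:schema>
```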
Simple Types
So far we have only touched on a few of the built-in data types: xs:string, xs:integer and xs:date. However, you can also define your own types by modifying existing ones.
Examples of this are:
- Defining an ID, this may be an integer with a maximum value limit.
- A Postcode or Zip code could be restricted to ensure it is the correct length and complies with a regular expression.
- Defining a field to have a maximum length.
Creating your own types is covered more thoroughly in the next section.
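To give a flavour of the three cases listed above, here is a minimal sketch using <xs:simpleType> with <xs:restriction>. The type names (CustomerIdType, ZipCodeType, ShortNameType), the numeric limits and the US-style zip code pattern are all assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- An ID: an integer between 1 and 99999 -->
  <xs:simpleType name="CustomerIdType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="99999"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A zip code: five digits, optionally followed by a hyphen and four digits -->
  <xs:simpleType name="ZipCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A text field of at most 50 characters -->
  <xs:simpleType name="ShortNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A user-defined type is referenced by name, like a built-in one -->
  <xs:element name="CustomerId" type="CustomerIdType"/>
</xs:schema>
```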
Complex Types
A complex type is a container for other element definitions. It allows you to specify which child elements an element can contain, and so provides some structure within your XML documents.
Here are some simple element definitions:
<xs:element name="Customer" type="xs:string"/>
<xs:element name="Customer_dob" type="xs:date"/>
<xs:element name="Customer_address" type="xs:string"/>
<xs:element name="Supplier" type="xs:string"/>
<xs:element name="Supplier_phone" type="xs:integer"/>
<xs:element name="Supplier_address" type="xs:string"/>
We can see that some of these elements should really be represented as child elements: "Customer_dob" and "Customer_address" belong to a parent element "Customer", while "Supplier_phone" and "Supplier_address" belong to a parent element "Supplier". We can therefore rewrite this in a more structured way:
<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Dob" type="xs:date"/>
      <xs:element name="Address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="Supplier">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Phone" type="xs:integer"/>
      <xs:element name="Address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
What's changed?
- We created a definition for an element called "Customer".
- Inside the <xs:element> definition we added a <xs:complexType>. This is a container for other <xs:element> definitions, allowing us to build a simple hierarchy of elements in the resulting XML document.
- Note that the "Customer" and "Supplier" elements do not have a type attribute. Their content is described by the nested <xs:complexType>, which is a new definition built from scratch rather than an extension or restriction of an existing type.
- The <xs:complexType> element contains another new element <xs:sequence>, but more on these in a minute.
- The <xs:sequence> in turn contains the definitions for the two child elements "Dob" and "Address". Note the customer/supplier prefix has been removed as it is implied from its position within the parent element "Customer" or "Supplier".
So in plain English, this is saying we can have an XML document that contains an element <Customer>, which must have two child elements, <Dob> and <Address>, in that order.
Example XML
<Customer>
  <Dob>2000-01-12</Dob>
  <Address>34 thingy street, someplace, sometown, w1w8uu</Address>
</Customer>
<Supplier>
  <Phone>0123987654</Phone>
  <Address>22 whatever place, someplace, sometown, ss1 6gy</Address>
</Supplier>
Compositors
There are three types of compositor: <xs:sequence>, <xs:choice> and <xs:all>. These compositors determine how the child elements contained within them appear in the XML document.
| Compositor | Description |
| Sequence | The child elements in the XML document MUST appear in the order they are declared in the XSD schema. |
| Choice | Only one of the child elements described in the XSD schema can appear in the XML document. |
| All | The child elements described in the XSD schema can appear in the XML document in any order. |
Notes
The compositors <xs:sequence> and <xs:choice> can be nested inside other compositors, and be given their own minOccurs and maxOccurs properties. This allows for quite complex combinations to be formed.
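The sketch below shows one way these compositors might be combined: a <xs:choice> nested inside a <xs:sequence> with its own cardinality, and a separate element that uses <xs:all>. The element names (Contact, Name, Email, Phone, Size, Width, Height) are hypothetical and not part of the running example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Contact">
    <xs:complexType>
      <xs:sequence>
        <!-- Name must come first -->
        <xs:element name="Name" type="xs:string"/>
        <!-- then one to three contact methods, each either Email or Phone -->
        <xs:choice minOccurs="1" maxOccurs="3">
          <xs:element name="Email" type="xs:string"/>
          <xs:element name="Phone" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Size">
    <xs:complexType>
      <!-- Width and Height must both appear, in either order -->
      <xs:all>
        <xs:element name="Width" type="xs:decimal"/>
        <xs:element name="Height" type="xs:decimal"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!--
    Valid instances:
    <Contact><Name>Jo</Name><Phone>0123987654</Phone><Email>jo@example.com</Email></Contact>
    <Size><Height>2.5</Height><Width>10</Width></Size>
  -->
</xs:schema>
```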
Example
The definitions of "Customer->Address" and "Supplier->Address" are currently not very usable, as each address is held in a single field. In the real world it would be better to break this out into a few fields. Let's fix this using the same technique shown above:
<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Dob" type="xs:date"/>
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Line1" type="xs:string"/>
            <xs:element name="Line2" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="Supplier">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Phone" type="xs:integer"/>
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Line1" type="xs:string"/>
            <xs:element name="Line2" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This is much better, but we now have two definitions for address, which are identical.
Re-use
It would make much more sense to have a single definition for "Address", which could then be used by both customer and supplier.
We can do this by defining a complexType independently of an element, and giving it a unique name:
<xs:complexType name="AddressType">
  <xs:sequence>
    <xs:element name="Line1" type="xs:string"/>
    <xs:element name="Line2" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
We have now defined a <xs:complexType> that describes our representation of an address, so let's use it.
Earlier, when we started looking at elements, we said you could define your own types instead of using one of the standard ones (xs:string, xs:integer), and that is exactly what we're now doing.
<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Dob" type="xs:date"/>
      <xs:element name="Address" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="Supplier">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Phone" type="xs:integer"/>
      <xs:element name="Address" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Hopefully, the advantages are obvious. Instead of having to define Address twice (once for Customer and once for Supplier) we now have a single definition. This makes maintenance simpler, i.e. if you decide to add "Line3" or "Postcode" elements to your address you only have to add them in one place.
Example XML
<Customer>
  <Dob>2000-01-12</Dob>
  <Address>
    <Line1>34 thingy street, someplace</Line1>
    <Line2>sometown, w1w8uu</Line2>
  </Address>
</Customer>
<Supplier>
  <Phone>0123987654</Phone>
  <Address>
    <Line1>22 whatever place, someplace</Line1>
    <Line2>sometown, ss1 6gy</Line2>
  </Address>
</Supplier>
Note: Only complex types defined globally (as children of the <xs:schema> element) can have their own name and be re-used throughout the schema. If they are defined inline within an <xs:element>, they cannot have a name (they are anonymous) and cannot be reused elsewhere.
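As a sketch of the maintenance benefit described above, the fragments can be assembled into one schema document in which a change to AddressType reaches both Customer and Supplier. The Postcode element and its minOccurs="0" setting are assumptions added for illustration; making it optional means address instances without a postcode remain valid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global, named type: reusable anywhere in this schema -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Line1" type="xs:string"/>
      <xs:element name="Line2" type="xs:string"/>
      <!-- Hypothetical addition: declared once, picked up by every user of AddressType -->
      <xs:element name="Postcode" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Dob" type="xs:date"/>
        <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Supplier">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Phone" type="xs:integer"/>
        <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```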
Attributes
An attribute provides extra information within an element. Attributes have name and type properties and are defined within an XSD as follows:
<xs:attribute name="x" type="y"/>
An attribute can appear 0 or 1 times within a given element in the XML document. Attributes are either optional or mandatory (by default they are optional). The "use" property in the XSD definition specifies whether the attribute is optional or mandatory.
So the following are equivalent:
<xs:attribute name="ID" type="xs:string"/>
<xs:attribute name="ID" type="xs:string" use="optional"/>
To specify that an attribute must be present, set use="required" (Note: use may also be set to "prohibited", but we'll come to that later).
An attribute is typically specified within the XSD definition for an element, nesting the attribute in the element. Attributes can also be specified globally and then referenced (but more about this later).
| Sample XSD | Sample XML |
| <xs:element name="Order"> <xs:complexType> <xs:attribute name="OrderID" type="xs:int"/> </xs:complexType> </xs:element> | <Order OrderID="5756"/> or <Order/> |
| <xs:element name="Order"> <xs:complexType> <xs:attribute name="OrderID" type="xs:int" use="optional"/> </xs:complexType> </xs:element> | <Order OrderID="5756"/> or <Order/> |
| <xs:element name="Order"> <xs:complexType> <xs:attribute name="OrderID" type="xs:int" use="required"/> </xs:complexType> </xs:element> | <Order OrderID="5756"/> |
The default and fixed attributes can be specified within the XSD attribute specification (in the same way as they are for elements).
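The sketch below combines child elements with attributes on the same element, and shows default and fixed on attributes. The Item element, the Currency and Version attributes, and their values are hypothetical, introduced only for this example. Two details are worth noting: attribute declarations come after the compositor inside <xs:complexType>, and an attribute with a default must be optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Attribute declarations follow the compositor -->
      <xs:attribute name="OrderID" type="xs:int" use="required"/>
      <!-- Absent in the instance: treated as "GBP" -->
      <xs:attribute name="Currency" type="xs:string" default="GBP"/>
      <!-- If present, the value must be exactly "1.0" -->
      <xs:attribute name="Version" type="xs:string" fixed="1.0"/>
    </xs:complexType>
  </xs:element>

  <!--
    Valid instance:
    <Order OrderID="5756">
      <Item>Widget</Item>
    </Order>

    Invalid instances:
    <Order><Item>Widget</Item></Order>                             (OrderID is required)
    <Order OrderID="5756" Version="2.0"><Item>Widget</Item></Order>  (Version is fixed)
  -->
</xs:schema>
```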
Mixed Element Content
So far we have seen how an element can contain data, other elements and attributes. An element can also contain a mixture of text and child elements, known as mixed content. You specify this in the XSD schema by setting the mixed property to "true" on the complex type.
<xs:element name="MarkedUpDesc">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="Bold" type="xs:string"/>
      <xs:element name="Italic" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
A sample XML document could look like this:
<MarkedUpDesc>
This is an <Bold>Example</Bold> of <Italic>Mixed</Italic> Content,
Note that there are elements mixed in with the element's data.
</MarkedUpDesc>
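Because the definition above uses <xs:sequence> with the default cardinality, it requires exactly one <Bold> followed by exactly one <Italic>, with text allowed around them. For freer markup, one possible variation (a sketch, not part of the original example) is a repeating <xs:choice>, which lets the child elements appear any number of times in any order:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="MarkedUpDesc">
    <xs:complexType mixed="true">
      <!-- Zero or more Bold or Italic elements, in any order, with text in between -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Bold" type="xs:string"/>
        <xs:element name="Italic" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!--
    Valid instances:
    <MarkedUpDesc>Plain text only.</MarkedUpDesc>
    <MarkedUpDesc>Some <Italic>mixed</Italic> and <Bold>bold</Bold> and <Italic>more</Italic> text.</MarkedUpDesc>
  -->
</xs:schema>
```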