Liquid Technologies - XML Glossary
    XML Overview

    An XML document is a hierarchical, machine-readable text file that can also be easily understood and edited by a person with a basic text editor.

    Most machine-readable formats are binary in nature, making them small, efficient, and easy for a parser to read and write. On the downside, they are very difficult for a person to understand or modify.

    XML documents are text based, which makes them more verbose but allows a person to easily understand and edit the content. The downside is that they are much larger and slower to parse, although both of these issues have been somewhat negated by the increase in processing power and bandwidth.

    The biggest advantage XML has over other text-based formats is that it is hierarchical, standardized and very well supported.

    What do I mean by hierarchical?

    Basically, it is a tree, which means that you can store complex data. Imagine we are trying to describe the data contained in the following invoice.

    Billing Address
    1630 Revello Drive,
    Sunnydale,
    CA
    USA

    Shipping Address
    742 Evergreen Terrace,
    Springfield
    USA

    Invoice No      1000254
    Date            2012/05/05
    Payment Terms   Net 30

    Item        Quantity    Unit Cost
    Spring      50          £3.99
    Sprocket    2           £5.49
    Washer      100         £0.04
    Total                   £214.48

    The invoice contains 0 to n line items, and the Billing and Shipping addresses share the same structure. We could therefore represent this data as follows.

    Invoice XML
    <?xml version="1.0" encoding="utf-8"?>
    <!-- Created with Liquid XML Studio 2012 Designer Edition -->
    <Invoice>
        <BillingAddress>
            <House>1630</House>
            <Street>Revello Drive</Street>
            <Town>Sunnydale</Town>
            <State>CA</State>
            <Country>USA</Country>
        </BillingAddress>
        <ShippingAddress>
            <House>742</House>
            <Street>Evergreen Terrace</Street>
            <Town>Springfield</Town>
            <Country>USA</Country>
        </ShippingAddress>
        <InvoiceNo>1000254</InvoiceNo> 
        <Date>2012/05/05</Date>
        <PaymentTerms>Net 30</PaymentTerms>   
        <LineItem>
            <Description>Spring</Description>
            <Quantity>50</Quantity>
            <Cost>3.99</Cost>
        </LineItem>
        <LineItem>
            <Description>Sprocket</Description>
            <Quantity>2</Quantity>
            <Cost>5.49</Cost>
        </LineItem>
        <LineItem>
            <Description>Washer</Description>
            <Quantity>100</Quantity>
            <Cost>0.04</Cost>
        </LineItem>
    </Invoice>
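
    As a rough illustration of how a program might walk this tree, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It recomputes the invoice total from the line items. The function name invoice_total and the shortened copy of the document (addresses left out) are assumptions made for the example, not part of the invoice format itself.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Shortened copy of the invoice above; the addresses are omitted for brevity.
INVOICE_XML = """\
<Invoice>
    <InvoiceNo>1000254</InvoiceNo>
    <LineItem>
        <Description>Spring</Description>
        <Quantity>50</Quantity>
        <Cost>3.99</Cost>
    </LineItem>
    <LineItem>
        <Description>Sprocket</Description>
        <Quantity>2</Quantity>
        <Cost>5.49</Cost>
    </LineItem>
    <LineItem>
        <Description>Washer</Description>
        <Quantity>100</Quantity>
        <Cost>0.04</Cost>
    </LineItem>
</Invoice>
"""


def invoice_total(xml_text: str) -> Decimal:
    """Sum Quantity * Cost over every LineItem child of the root element."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("LineItem"):
        quantity = int(item.findtext("Quantity"))
        cost = Decimal(item.findtext("Cost"))  # Decimal avoids float rounding
        total += quantity * cost
    return total


print(invoice_total(INVOICE_XML))  # 214.48
```

    Because the line items are simply repeated children of the Invoice element, the loop works unchanged whether the invoice has zero items or many.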
    

    If we could not store this data as a hierarchy, we would have to create a number of flat files (rather like database tables): one each for the invoice, the line items and the addresses. These would then need IDs and ID references to tie them all together. Although this approach works well in a database scenario, it is awkward when storing or moving relatively small amounts of data.
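
    To make the contrast concrete, the sketch below models the flat-file alternative as three Python lists of rows. All field names and ID values (invoice_id, address_id and so on) are hypothetical, chosen only to show how much bookkeeping the ID references add.

```python
# Hypothetical flat "tables": every row needs an ID, and related rows need
# an ID reference pointing back at the row they belong to.
invoices = [
    {"invoice_id": 1, "invoice_no": "1000254",
     "billing_address_id": 10, "shipping_address_id": 11},
]
addresses = [
    {"address_id": 10, "house": "1630", "street": "Revello Drive",
     "town": "Sunnydale", "state": "CA", "country": "USA"},
    {"address_id": 11, "house": "742", "street": "Evergreen Terrace",
     "town": "Springfield", "state": None, "country": "USA"},
]
line_items = [
    {"invoice_id": 1, "description": "Spring", "quantity": 50, "cost": "3.99"},
    {"invoice_id": 1, "description": "Sprocket", "quantity": 2, "cost": "5.49"},
    {"invoice_id": 1, "description": "Washer", "quantity": 100, "cost": "0.04"},
]


def address_by_id(address_id):
    """Follow an ID reference from an invoice row to its address row."""
    for row in addresses:
        if row["address_id"] == address_id:
            return row
    return None


def line_items_for(invoice_id):
    """Collect the line item rows that reference the given invoice."""
    return [row for row in line_items if row["invoice_id"] == invoice_id]


invoice = invoices[0]
shipping = address_by_id(invoice["shipping_address_id"])
items = line_items_for(invoice["invoice_id"])
print(shipping["town"], len(items))
```

    In the XML version none of these lookups are needed, because the addresses and line items are nested directly inside the invoice they belong to.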

    So you can see XML potentially saves a lot of effort when it comes to storing and distributing structured data.

    XML is an industry standard

    Because XML has been embraced across almost all platforms and languages, you can be confident an XML Parser exists for even the most exotic system.

    Furthermore, because it is now a standard, people are familiar with it, meaning that if they need to read or edit XML data they can rely on existing skills and knowledge.

    XML is a mature technology, and as such there is a wealth of tools and utilities out there that simplify editing, validating and working with XML and XML-related documents.

    The syntax of an XML document

    The building blocks of an XML document are elements. A document must have one and only one root element (this is often referred to as the document element).

    This root/document element can then have any number of child Elements and Attributes.
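
    The following sketch shows one way these building blocks appear to a program, again using Python's standard xml.etree.ElementTree module. The currency attribute is hypothetical and is not part of the invoice shown earlier; it is only there to illustrate an attribute.

```python
import xml.etree.ElementTree as ET

# A tiny document: one root element, child elements, and one attribute.
# The currency attribute is made up for this example.
doc = ET.fromstring(
    "<Invoice>"
    "<InvoiceNo>1000254</InvoiceNo>"
    '<LineItem><Cost currency="GBP">3.99</Cost></LineItem>'
    "</Invoice>"
)

root_tag = doc.tag                          # name of the root/document element
child_tags = [child.tag for child in doc]   # direct children of the root
cost = doc.find("LineItem/Cost")            # a nested element, found by path

print(root_tag)      # Invoice
print(child_tags)    # ['InvoiceNo', 'LineItem']
print(cost.attrib)   # {'currency': 'GBP'}
print(cost.text)     # 3.99
```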

    Details of how the various constructs within an XML document are structured can be explored in the following topics

    The structure of an XML document

    The last section covered the syntax of an XML document, that is, what a basic parser will consider to be a well-structured (normally referred to as "well-formed") XML document.
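
    A well-formedness check can be sketched in a few lines with Python's standard xml.etree.ElementTree module, which raises ParseError when the text is not well formed. The helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Properly nested tags and a single root element.
print(is_well_formed("<Invoice><Date>2012/05/05</Date></Invoice>"))  # True

# The Date element is never closed.
print(is_well_formed("<Invoice><Date>2012/05/05</Invoice>"))         # False

# Two root elements; a document may only have one.
print(is_well_formed("<Invoice/><Invoice/>"))                        # False
```

    Note that this only checks syntax. A well-formed document can still contain data the application does not expect.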

    However, this basic structure does not give any indication of what kind of data can be contained within the XML document.

    If I am writing an application that reads in an XML document, then my application has a very clear idea of what kind of data the document should contain. A significant amount of checking therefore needs to be added to the reader code to ensure that the document actually contains what is expected. This checking code could be greatly reduced if the work could be offloaded to the XML parser. The good news is that you can do just that with the addition of an XML Schema.
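
    To give a feel for the hand-written checking described above, here is a small sketch in Python. The function check_invoice and the particular rules it enforces are assumptions for illustration; a real reader would need many more checks, which is exactly the burden a schema removes.

```python
import xml.etree.ElementTree as ET


def check_invoice(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(xml_text)

    if root.tag != "Invoice":
        problems.append(f"unexpected root element: {root.tag}")
    if root.find("InvoiceNo") is None:
        problems.append("missing InvoiceNo")

    for index, item in enumerate(root.findall("LineItem"), start=1):
        quantity = item.findtext("Quantity")
        if quantity is None or not quantity.strip().isdigit():
            problems.append(f"LineItem {index}: Quantity must be a whole number")

    return problems


good = ("<Invoice><InvoiceNo>1000254</InvoiceNo>"
        "<LineItem><Quantity>50</Quantity></LineItem></Invoice>")
bad = "<Invoice><LineItem><Quantity>lots</Quantity></LineItem></Invoice>"

print(check_invoice(good))
print(check_invoice(bad))
```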

    An XML Schema describes the structure of the XML data (in the same way as a database schema describes the content of a database). There are a number of technologies for describing XML Schemas, but only 2 have been standardized by the W3C.

    Document Type Definition (DTD)

    This is the original technique for describing the structure of an XML document. It is defined within the W3C's XML standard itself, but it has been largely superseded by the W3C's XSD Schema standard.
    The DTD specification is limited, but it is very basic and as such easy to author.

    More information about the standard can be found at

    New development work should not be undertaken with DTDs; use XSDs instead.

    XML Schema Definition (XSD)

    The W3C's XSD specification has now become the de facto standard for describing XML data. It is much more comprehensive than the DTD specification, but with that comes a lot of complexity.
    Without an XSD authoring tool to guide the creation of the document it is easy to make mistakes, and it is still common to find formal XSD specifications that contain a number of errors.
    It is, however, a powerful language capable of describing complex data structures and validation rules.
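
    As an illustration only, the sketch below holds a possible XSD for the LineItem element from the invoice example in a Python string. The type choices (xs:nonNegativeInteger, xs:decimal) are assumptions, not taken from any published schema. Python's standard library does not validate against XSD, so the code merely parses the schema, which is itself an XML document, and lists the elements it declares; actual validation would need a schema-aware parser or a third-party library.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace

# A hypothetical schema for one LineItem; the types are assumptions.
LINE_ITEM_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="LineItem">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Description" type="xs:string"/>
                <xs:element name="Quantity" type="xs:nonNegativeInteger"/>
                <xs:element name="Cost" type="xs:decimal"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
"""

schema = ET.fromstring(LINE_ITEM_XSD)

# Map each declared element name to its declared type. LineItem has no type
# attribute because its type is defined inline, so its value is None.
declared = {
    element.get("name"): element.get("type")
    for element in schema.iter(f"{{{XS}}}element")
}
print(declared)
```

    Even this small fragment shows where the complexity comes from: the schema needs namespaces, nested type definitions and a type name for every element.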
