Describing Complex Structures in XML


When the data we are trying to represent is a simple hierarchy, the layout fits naturally into XML.

 

As XML Data
<Company>
  <Name>Bean Shoot Trading</Name>
  <Employee>
    <Name>Joe Bloggs</Name>
    <Age>43</Age>
    <Address>
      <HouseNo>50</HouseNo>
      <Street>Vanilla Crescent</Street>
      <ZipCode>123456</ZipCode>
    </Address>
  </Employee>
  <Employee>
    <Name>Jane Doe</Name>
    <Age>35</Age>
    <Address>
      <HouseNo>1245</HouseNo>
      <Street>Mornington Crescent</Street>
      <ZipCode>66475</ZipCode>
    </Address>
  </Employee>
</Company>

However, if the data being described contains links or shared data, it cannot always be modeled within a simple hierarchy.

 

In this example, the companies "Abacus Inc" and "Bean Shoot Trading" both link to the employee "Jane Doe". We could duplicate her data when converting to XML, but then we would lose the fact that the two companies share the same employee. It would look as if there were two Jane Does.

Represented as XML (Attempt 1)
<Companies>
  <Company>
    <Name>Bean Shoot Trading</Name>
    <Employee>
      <Name>Joe Bloggs</Name>
      <Age>43</Age>
      <Address>
        <HouseNo>50</HouseNo>
        <Street>Vanilla Crescent</Street>
        <ZipCode>123456</ZipCode>
      </Address>
    </Employee>
    <Employee>
      <Name>Jane Doe</Name>
      <Age>35</Age>
      <Address>
        <HouseNo>1245</HouseNo>
        <Street>Mornington Crescent</Street>
        <ZipCode>66475</ZipCode>
      </Address>
    </Employee>
  </Company>
  <Company>
    <Name>Abacus Inc</Name>
    <Employee>
      <Name>Jane Doe</Name>
      <Age>35</Age>
      <Address>
        <HouseNo>1245</HouseNo>
        <Street>Mornington Crescent</Street>
        <ZipCode>66475</ZipCode>
      </Address>
    </Employee>
    <Employee>
      <Name>Average Joe</Name>
      <Age>26</Age>
      <Address>
        <HouseNo>50</HouseNo>
        <Street>New Street</Street>
        <ZipCode>954674</ZipCode>
      </Address>
    </Employee>
  </Company>
</Companies>
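
To make the loss concrete, here is a minimal Python sketch that loads a trimmed-down version of the Attempt 1 data (only Jane Doe's name and age are kept). The dictionary layout and the variable names are assumptions made for this illustration. Once loaded, the two Jane Doe entries are separate records that merely happen to hold the same values.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the Attempt 1 data: Jane Doe is duplicated per company.
XML = """
<Companies>
  <Company>
    <Name>Bean Shoot Trading</Name>
    <Employee><Name>Jane Doe</Name><Age>35</Age></Employee>
  </Company>
  <Company>
    <Name>Abacus Inc</Name>
    <Employee><Name>Jane Doe</Name><Age>35</Age></Employee>
  </Company>
</Companies>
"""

root = ET.fromstring(XML)

# Each <Employee> element becomes its own independent record.
records = [
    {"name": e.findtext("Name"), "age": int(e.findtext("Age"))}
    for e in root.iter("Employee")
]
first, second = records

same_data = first == second      # the two records hold identical values...
same_record = first is second    # ...but they are separate copies

# Updating one copy leaves the other untouched, so the data can drift apart.
first["age"] = 36
print(same_data, same_record, second["age"])
```

Nothing in the loaded data tells us whether these are two people with the same details or one person employed twice.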

In order to maintain the relationships described in the source data we must conceptually break up our entities and store them separately. We can then reference them when needed.

The next diagram shows the conceptual model that will be serialized.

Represented as XML (Attempt 2)
<People>
  <Person id="103">
    <Name>Joe Bloggs</Name>
    <Age>43</Age>
    <Address>
      <HouseNo>50</HouseNo>
      <Street>Vanilla Crescent</Street>
      <ZipCode>123456</ZipCode>
    </Address>
  </Person>
  <Person id="102">
    <Name>Jane Doe</Name>
    <Age>35</Age>
    <Address>
      <HouseNo>1245</HouseNo>
      <Street>Mornington Crescent</Street>
      <ZipCode>66475</ZipCode>
    </Address>
  </Person>
  <Person id="101">
    <Name>Average Joe</Name>
    <Age>26</Age>
    <Address>
      <HouseNo>50</HouseNo>
      <Street>New Street</Street>
      <ZipCode>954674</ZipCode>
    </Address>
  </Person>
</People>
<Companies>
  <Company>
    <Name>Bean Shoot Trading</Name>
    <Employee personID="103"/>
    <Employee personID="102"/>
  </Company>
  <Company>
    <Name>Abacus Inc</Name>
    <Employee personID="101"/>
    <Employee personID="102"/>
  </Company>
</Companies>

Giving every Person entity a unique ID solves the problem: the data can be re-loaded with the original relationships maintained.

The IDs defined on the Person elements are referenced from the Employee elements via personID. This effectively models the data like a set of database tables using primary and foreign keys. It makes the XML a little more difficult to read, but makes it possible to describe complex relationships that are impossible to model in a simple hierarchy.
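
As a rough illustration of the primary/foreign key idea, the following Python sketch loads a trimmed-down version of the Attempt 2 data and resolves each personID back to a single shared Person object. The `<Data>` wrapper element, the `Person` and `Company` classes and the `load` function are all assumptions made for this example (a well-formed XML document needs exactly one root element, so People and Companies are wrapped here); they are not part of the original data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Trimmed-down copy of the Attempt 2 data (addresses omitted).
# The <Data> wrapper is an assumption: XML needs a single root element.
XML = """
<Data>
  <People>
    <Person id="103"><Name>Joe Bloggs</Name><Age>43</Age></Person>
    <Person id="102"><Name>Jane Doe</Name><Age>35</Age></Person>
    <Person id="101"><Name>Average Joe</Name><Age>26</Age></Person>
  </People>
  <Companies>
    <Company>
      <Name>Bean Shoot Trading</Name>
      <Employee personID="103"/>
      <Employee personID="102"/>
    </Company>
    <Company>
      <Name>Abacus Inc</Name>
      <Employee personID="101"/>
      <Employee personID="102"/>
    </Company>
  </Companies>
</Data>
"""

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Company:
    name: str
    employees: list = field(default_factory=list)

def load(xml_text):
    """Build Company objects whose employees are shared Person objects."""
    root = ET.fromstring(xml_text)
    # "Primary key" table: id attribute -> Person object.
    people = {
        p.get("id"): Person(p.findtext("Name"), int(p.findtext("Age")))
        for p in root.iterfind("People/Person")
    }
    companies = []
    for c in root.iterfind("Companies/Company"):
        company = Company(c.findtext("Name"))
        for e in c.iterfind("Employee"):
            # "Foreign key" lookup: personID -> the shared Person object.
            company.employees.append(people[e.get("personID")])
        companies.append(company)
    return companies

companies = load(XML)
bean_shoot, abacus = companies

# Both companies hold the very same Jane Doe object, not two copies.
jane_a = bean_shoot.employees[1]
jane_b = abacus.employees[1]
print(jane_a is jane_b)
```

In this sketch the `Person` class has no id field: the IDs are used only while loading, to wire the objects together.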

There are no hard and fast rules regarding attributes, but it's common practice to use them only to hold metadata, i.e. data that pertains to the interpretation of the element rather than actual information that would end up in the final data model. The attributes id and personID in this example are good illustrations of this. They hold information that relates to the structure of the data, but are just artifacts of the serialization process; when the model is loaded from XML back into an object model (be that a database or a set of OO classes), these attributes are no longer needed.

XML Schema has mechanisms (for example, xs:key and xs:keyref identity constraints) that allow these IDs and ID references to be checked, ensuring that every reference points to an ID that exists.
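
For illustration only, the same kind of check can be written by hand. The Python sketch below is not a schema validator; it simply collects every Person id and reports any personID with no matching Person, which is the sort of dangling reference a schema could be set up to reject. The `<Data>` wrapper, the function name `find_dangling_refs` and the deliberately broken id "999" are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def find_dangling_refs(xml_text):
    """Return the personID values that do not match any Person id."""
    root = ET.fromstring(xml_text)
    known_ids = {p.get("id") for p in root.iter("Person")}
    return [
        e.get("personID")
        for e in root.iter("Employee")
        if e.get("personID") not in known_ids
    ]

# Hypothetical sample: personID "999" refers to a Person that is not defined.
SAMPLE = """
<Data>
  <People>
    <Person id="102"><Name>Jane Doe</Name></Person>
  </People>
  <Companies>
    <Company>
      <Name>Abacus Inc</Name>
      <Employee personID="102"/>
      <Employee personID="999"/>
    </Company>
  </Companies>
</Data>
"""

dangling = find_dangling_refs(SAMPLE)
print(dangling)
```

An empty list means every Employee points at a Person that exists.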
