An XML element can contain zero or more XML attributes. Attributes are contained within the start tag, and each attribute name must be unique within it.
In practice this means an XML element can look like these examples:
<MyElement myAttribute="attribute value"/>
<MyElement myAttribute="attribute value">...</MyElement>
<MyElement myAttribute='attribute value'>...</MyElement>
<MyElement myAttribute1="attribute value" myAttribute2="attribute value">...</MyElement>
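As a rough illustration of the forms above, the following sketch reads attributes with Python's standard xml.etree.ElementTree module (the choice of Python and of this module is an assumption made for illustration; any conforming parser should behave similarly). It shows that both quote styles yield the same value and that a repeated attribute name is rejected.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are interchangeable delimiters.
double_quoted = ET.fromstring('<MyElement myAttribute="attribute value"/>')
single_quoted = ET.fromstring("<MyElement myAttribute='attribute value'/>")
print(double_quoted.attrib)

# Repeating an attribute name in one start tag is not well-formed XML,
# so the parser is expected to raise an error.
try:
    ET.fromstring('<MyElement myAttribute="a" myAttribute="b"/>')
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```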
Attributes can be associated with a namespace. Namespaces allow data to be partitioned, and they allow a parser to ignore parts of a document it was not designed to deal with.
Namespaces are applied to an attribute using a namespace prefix. The prefix must be defined within the containing element (or any of its containing parent elements back to the document element).
<RootElm xmlns:nsA="MyNamespace">
  <SubElm>
    <MyElement nsA:myAttribute1="attribute value" myAttribute2="attribute value">...</MyElement>
  </SubElm>
</RootElm>
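To see how a parser might expose the prefixed attribute, here is a small sketch using Python's xml.etree.ElementTree (an assumed choice of tool). ElementTree reports a namespaced attribute under the key "{namespace}localname", while the unprefixed attribute is stored under its plain name.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<RootElm xmlns:nsA="MyNamespace"><SubElm>'
    '<MyElement nsA:myAttribute1="attribute value" '
    'myAttribute2="attribute value">...</MyElement>'
    '</SubElm></RootElm>'
)
elem = doc.find("SubElm/MyElement")

# The prefix nsA is resolved to the namespace name declared on RootElm.
qualified = elem.get("{MyNamespace}myAttribute1")
# The unprefixed attribute is looked up by its local name alone.
unqualified = elem.get("myAttribute2")

for name, value in elem.attrib.items():
    print(name, "=", value)
```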
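Attributes that have no namespace prefix are typically interpreted in the context of their containing element, but strictly speaking they are not in any namespace: a default namespace declaration does not apply to attributes. This is a complex area; have a look at the XSD Namespace rules.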
Note: Namespaces are a concept defined in a separate W3C specification, 'Namespaces in XML'. The XML 1.0 and 1.1 specifications allow for namespaces, but do not make use of them explicitly. Schema languages (notably XSD) make extensive use of namespaces.
If you put special characters (", ', &, <) into an attribute's value, the parser will misinterpret the resulting text. For the parser to operate correctly these characters need to be escaped; see Escaping XML Data.
You cannot place a comment inside an attribute value.
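A minimal sketch of escaping an attribute value, assuming Python and its standard xml.sax.saxutils helpers (escape and quoteattr). The sample string and the element name are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw = 'Tom & "Jerry" <cat>'

# escape() handles &, < and > by default; the extra mapping also
# escapes double quotes so the value is safe inside "...".
escaped = escape(raw, {'"': "&quot;"})
xml_text = '<MyElement myAttribute="' + escaped + '"/>'

# Parsing the document gives back the original, unescaped text.
round_tripped = ET.fromstring(xml_text).get("myAttribute")

# quoteattr() escapes the value and also adds the surrounding quotes.
quoted = quoteattr(raw)
print(xml_text)
print("<MyElement myAttribute=" + quoted + "/>")
```

Building the XML with a serializer rather than string concatenation avoids having to escape by hand at all; the concatenation here is only to make the escaping visible.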
When an XML Parser reads an XML Attribute it is supposed to do some processing to normalize the whitespace within the attribute value.
Normalization for all Attributes
When an XML Parser reads an XML Attribute the XML Spec says it should normalize the whitespace it contains (this means replacing all whitespace characters with spaces).
Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
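The effect of this algorithm can be observed with a short sketch, assuming Python's xml.etree.ElementTree (backed by the expat parser) follows the specification here. A literal tab or newline inside the value becomes a space, whereas the same characters written as character references are appended unchanged.

```python
import xml.etree.ElementTree as ET

# Literal white space characters are each replaced by a single space.
literal = ET.fromstring(
    '<MyElement myAttribute="a\tb\nc"/>'
).get("myAttribute")

# Character references are appended as the referenced character,
# so the tab and newline survive normalization.
referenced = ET.fromstring(
    '<MyElement myAttribute="a&#9;b&#10;c"/>'
).get("myAttribute")

print(repr(literal))
print(repr(referenced))
```

This is why white space that must be preserved in an attribute value is usually written as a character reference.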
Additional Normalization rules for DTDs
When the XML document is read in conjunction with a DTD, further rules come into play.
If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
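The extra step for non-CDATA attribute types could be sketched as follows. The function name normalize_non_cdata is hypothetical, and the input is assumed to have already been through the general normalization, so only space (#x20) characters are considered.

```python
import re

def normalize_non_cdata(value: str) -> str:
    """Extra normalization for attributes whose DTD type is not CDATA."""
    # Discard leading and trailing #x20 characters, then replace
    # each run of #x20 characters with a single #x20.
    return re.sub(" +", " ", value.strip(" "))

tokens = normalize_non_cdata("  alpha   beta  ")
print(repr(tokens))
```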
Additional Normalization rules for XSDs
When the XML document is read in conjunction with an XSD schema, a number of additional rules apply.
If the attribute's data type is "xs:string", no further whitespace normalization is performed.
Attributes of most other data types are normalized using the XSD "collapse" rule (an exception is "xs:normalizedString", which uses the "replace" rule).
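XML Schema expresses these rules through the whiteSpace facet, which takes the values preserve, replace, or collapse. The following is a hedged sketch of those three behaviours; the function name xsd_whitespace and the sample value are assumptions for illustration, not part of any library.

```python
import re

def xsd_whitespace(value: str, mode: str) -> str:
    """Apply an XSD whiteSpace facet value to a string."""
    if mode == "preserve":
        # Used by xs:string: the value is left untouched.
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # After replacing, collapse runs of spaces and trim both ends.
        return re.sub(" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace mode: " + mode)

sample = " 4\t2 \n"
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(xsd_whitespace(sample, mode)))
```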
Note: not all XML parsers correctly implement the whitespace normalization rules.
The W3C describes the syntax for an attribute using EBNF as follows.
AttValue     ::= '"' ([^<&"] | Reference)* '"'
               | "'" ([^<&'] | Reference)* "'"
STag         ::= '<' Name (S Attribute)* S? '>'
Attribute    ::= Name Eq AttValue
EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'
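As a rough way to read the AttValue production, it can be approximated with a regular expression. This is only a sketch: the Reference pattern below simplifies the Name production to ASCII characters (an assumption; the real production allows many more), and the helper name is_att_value is hypothetical.

```python
import re

# Simplified Reference: an entity reference (&name;) or a decimal or
# hexadecimal character reference. Name is approximated as ASCII only.
REFERENCE = r"&(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);"

# AttValue: a quoted string with no '<', no bare '&', and no
# unescaped occurrence of its own delimiter.
ATT_VALUE = re.compile(
    rf'"(?:[^<&"]|{REFERENCE})*"' + "|" + rf"'(?:[^<&']|{REFERENCE})*'"
)

def is_att_value(text: str) -> bool:
    """Return True if text (including its quotes) matches AttValue."""
    return ATT_VALUE.fullmatch(text) is not None

for candidate in ['"attribute value"', '"a &amp; b"', '"a & b"', '"a < b"']:
    print(candidate, is_att_value(candidate))
```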