Escaping XML Data

Placing markup characters ('<', '>', ''', '"', '&') in XML data can cause the parser to misinterpret the resulting document. The solution is to escape these characters so that the parser interprets them as data and does not confuse them with markup.

The following is a list of all the built-in replacements:

Char Escape String
< &lt;
> &gt;
" &quot;
' &apos;
& &amp;

These can be used within attribute values and element text. They are not expanded inside comments, processing instructions or CDATA blocks, where they are treated as literal characters.

It is good practice to always escape these characters when they appear in XML data, although this is not always required.

Element and attribute names can NOT contain the characters < > " ' &, escaped or otherwise.
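
As a rough illustration of the replacements listed above, the following Python sketch uses `xml.sax.saxutils.escape` from the standard library. That function escapes &, < and > by default; the `QUOTE_ENTITIES` mapping and the `escape_xml` helper are names assumed here for illustration, added so that both quote characters are escaped as well.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters (hypothetical name).
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(data: str) -> str:
    """Escape all five special characters in a piece of XML data."""
    # escape() replaces & first, so the ampersands introduced by the
    # other replacements are not escaped a second time.
    return escape(data, QUOTE_ENTITIES)


print(escape_xml("Smith&Sons"))
print(escape_xml('She said "You\'re right"'))
```

Escaping should be applied exactly once to the raw data; running already escaped text through the helper again would turn `&lt;` into `&amp;lt;`.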

Attribute Data

When attribute data is enclosed in double quotes " then any double quote " characters within the data must be escaped.
When attribute data is enclosed in single quotes ' then any single quote ' characters within the data must be escaped.
The ampersand & character must be escaped.
The less than < character must be escaped.
The greater than > character does not have to be escaped, but it's good practice to do so.

Data In XML Notes
He said "OK" attributeName="He said &quot;OK&quot;" The double quotes in the data must be escaped.
He said "OK" attributeName='He said "OK"' The double quotes do not need escaping as they are contained within a single quoted attribute.
He said "OK" attributeName='He said &quot;OK&quot;' However there is no harm in always escaping them.
She said "You're right" attributeName="She said &quot;You're right&quot;" This is the minimum escaping required
She said "You're right" attributeName='She said "You&apos;re right"' This is the minimum escaping required
She said "You're right" attributeName="She said &quot;You&apos;re right&quot;" Typically all the data would be escaped though.
Smith&Sons attributeName="Smith&amp;Sons" The & must always be escaped within attribute data.
a>b attributeName="a>b" The > does not have to be escaped
a>b attributeName="a&gt;b" It is good practice to escape > characters.
a<b attributeName="a&lt;b" The < character MUST be escaped
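
The attribute rules above can be sketched with `xml.sax.saxutils.quoteattr` from the Python standard library, which returns the value already wrapped in quotes. The sample strings are taken from the table; note that `quoteattr` picks the quote style itself and always escapes >, so its output is one valid choice among the several shown above rather than the only one.

```python
from xml.sax.saxutils import quoteattr

samples = ['He said "OK"', 'She said "You\'re right"', "Smith&Sons", "a<b"]

for data in samples:
    # quoteattr() uses double quotes by default and switches to single
    # quotes when the data contains only double quotes.
    print("attributeName=" + quoteattr(data))
```

When the data contains both kinds of quote, `quoteattr` keeps the double-quoted form and escapes only the double quotes, which matches the "minimum escaping" row in the table.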


Element Data

The '<' character must be escaped within element text data so that it is not mistaken for the start of the next tag.
The '&' character must always be escaped.
The other replacements (even the closing bracket '>') are optional, but it's good practice to always escape them. The one exception is that '>' must be escaped when it appears in the sequence ]]> within element text.

Data In XML Notes
if (age < 5) <MyElement>if (age &lt; 5)</MyElement> The < char must always be escaped
if (age > 5) <MyElement>if (age > 5)</MyElement> The > char does not have to be escaped
if (age > 5) <MyElement>if (age &gt; 5)</MyElement> However, it is good practice to escape > chars
if (age > 3 && age < 8) <MyElement>if (age &gt; 3 &amp;&amp; age &lt; 8)</MyElement>  
She said "You're right" <MyElement>She said "You're right"</MyElement> The ' and " chars don't need escaping within an element
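
To see the element rules in practice, here is a minimal sketch using Python's `xml.etree.ElementTree`. It serializes raw text (the library escapes it on output), parses it back, and checks which unescaped characters the parser accepts. The `is_well_formed` helper is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

# Build the element from raw data; the serializer escapes the text.
elem = ET.Element("MyElement")
elem.text = "if (age > 3 && age < 8)"
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the escaped form gives back the original data.
parsed = ET.fromstring(serialized)
print(parsed.text)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An unescaped < in element text is rejected; an unescaped > is accepted.
print(is_well_formed("<MyElement>if (age < 5)</MyElement>"))
print(is_well_formed("<MyElement>if (age > 5)</MyElement>"))
```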

CDATA

Data within a CDATA block cannot be escaped. When the XML document is parsed, entity and character references inside a CDATA block are not expanded, so every character within the block is treated as literal character data.

Because no escaping is possible within CDATA, the terminating ]]> sequence cannot be escaped, and therefore CDATA blocks cannot be nested.

Data In XML Notes
if (age < 5) <MyElement><![CDATA[if (age < 5)]]></MyElement>  
if (age > 3 && age < 8) <MyElement><![CDATA[if (age > 3 && age < 8)]]></MyElement>  
]]> ERROR It is not possible to escape the end sequence of the CDATA block, so the string ]]> can not be stored within it.
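
Although a single CDATA block cannot hold the string ]]>, a commonly used workaround is to split the data across two adjacent CDATA blocks so that the ]] and the > end up in different blocks. The Python sketch below shows the idea; `to_cdata` and `round_trip` are hypothetical helper names, and the round trip assumes the parser joins adjacent character data, as `xml.etree.ElementTree` does.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two blocks."""
    # "]]" closes the current block and ">" starts the next one, so the
    # terminator never appears inside a single block.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def round_trip(text: str) -> str:
    """Parse the wrapped text back out of an element."""
    return ET.fromstring("<MyElement>" + to_cdata(text) + "</MyElement>").text


print(to_cdata("if (age > 3 && age < 8)"))
print(to_cdata("a ]]> b"))
print(round_trip("a ]]> b"))
```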

Comments

Data within a comment cannot be escaped. When the XML document is parsed, entity and character references inside a comment are not expanded, so every character within the comment is treated as literal text.

Because no escaping is possible within a comment, the terminating --> cannot be escaped, and therefore comments cannot be nested.

The sequence -- may not appear within a comment, and there is no way to escape it.

Data In XML Notes
Some Comment <!-- Some Comment -->  
The chars --> end a comment <!-- The chars --> end a comment --> This is invalid. The --> inside the text cannot be escaped, so the comment ends there and the rest of the text is left outside it.
The chars -- are also illegal <!-- The chars -- are also illegal --> This is Invalid. The character sequence -- is not allowed in a comment.
if (age > 3 && age < 8) <!-- if (age > 3 && age < 8) --> Valid. The data requires no escaping
<CommentedOutElm>
   data
</CommentedOutElm>
<!-- <CommentedOutElm>
   data
</CommentedOutElm> -->
Valid. The data requires no escaping
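
Since nothing inside a comment can be escaped, text has to be checked before it is written out. The following Python sketch is one way to do that; `is_valid_comment_text` is a hypothetical helper, and it only covers the -- rule discussed above, not the separate restrictions on which control characters XML allows.

```python
def is_valid_comment_text(text: str) -> bool:
    """Return True if text can be placed between <!-- and --> unchanged."""
    # "--" is forbidden anywhere in the text, and a trailing "-" would
    # join the closing "-->" to form "--->", which contains "--".
    return "--" not in text and not text.endswith("-")


for sample in [" Some Comment ",
               " The chars --> end a comment ",
               " if (age > 3 && age < 8) "]:
    print(repr(sample), is_valid_comment_text(sample))
```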

Character References

Character references allow a character's code to be specified within the data instead of the literal character. This can be useful if you cannot type the character (e.g. ©) or if the XML document encoding does not support the character directly.

Character references can be used interchangeably with the escape strings listed above.

Char Escape String Character Reference
< &lt; &#60;
> &gt; &#62;
" &quot; &#34;
' &apos; &#39;
& &amp; &#38;
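
The equivalence in the table can be checked with a short Python sketch using `xml.etree.ElementTree`. It also shows the hexadecimal form of a reference (`&#x3C;`), which the table does not list, and uses Python's `xmlcharrefreplace` error handler to produce a reference for a character that the ASCII encoding cannot represent. The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The escape string and both forms of character reference parse to "<".
elem = ET.fromstring("<MyElement>&lt; &#60; &#x3C;</MyElement>")
print(elem.text)

# A character the target encoding lacks can be written as a reference.
ascii_bytes = "© Smith".encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# The parser expands the reference back to the literal character.
copyright_elem = ET.fromstring(b"<MyElement>" + ascii_bytes + b"</MyElement>")
print(copyright_elem.text)
```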
