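An XML entity associates a name with replacement content; wherever the entity is referenced, the reference is replaced by that content when the XML document is parsed.
There are several types of entities: character references, internal general entities, pre-defined entities, external parsed general entities, external unparsed general entities, and internal and external parameter entities.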
A character reference refers to a specific character in the ISO/IEC 10646 character set (to all intents and purposes ISO/IEC 10646 is Unicode).
The value for the character can be specified either in decimal, using the notation &#82;, or in hex, using the notation &#x52;. In this case both represent the character 'R'.
Character references are typically used to output extended Unicode characters that editors, keyboards or encodings are unable to enter directly, or to escape characters that would otherwise be mistaken for markup.
Character Reference (Decimal) | Character Reference (Hex) | Character |
&#169; | &#xA9; | © |
&#8482; | &#x2122; | ™ |
Character references are expanded within element content and attribute values. They are not expanded within CDATA sections, comments or processing instructions.
The W3C describes the syntax for a character reference in EBNF as follows.
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
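As a rough illustration of production [66], the sketch below decodes a character reference with a regular expression and then checks how a parser treats the same reference in element content, in an attribute value and in a CDATA block. It uses Python's standard `re` and `xml.etree.ElementTree` modules; the helper name `decode_char_ref` is made up for this example, and it does not check that the resulting code point is a legal XML character.

```python
import re
import xml.etree.ElementTree as ET

# Mirrors production [66]: '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def decode_char_ref(ref):
    """Return the character named by a decimal or hex character reference."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a character reference: {ref!r}")
    decimal, hexadecimal = match.groups()
    code_point = int(decimal) if decimal is not None else int(hexadecimal, 16)
    return chr(code_point)


# The same reference in an attribute, in element content and inside CDATA.
element = ET.fromstring('<a b="&#x52;">&#82;<![CDATA[&#82;]]></a>')

print(decode_char_ref("&#82;"), decode_char_ref("&#x52;"))
print(element.get("b"), element.text)
```

The attribute and the plain content both come back as the decoded character, while the text inside the CDATA block keeps the reference characters exactly as written.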
Internal general entities are declared within a DTD as follows; for more information see DTD ENTITY.
<!ENTITY entityName "entityValue">
Internal entities provide a mechanism for defining replacement values: whenever the entityName is referenced in the XML document, the entityValue is substituted in its place. The XML parser parses the substituted entityValue as XML data.
Entity definition within a DTD:
<!ENTITY CompanyName 'Liquid Technologies Ltd' >
This defines an entity called "CompanyName" that, when referenced, is replaced by the value "Liquid Technologies Ltd".
Entities are referenced using the notation &EntityName;
Sample usage within an XML element:
<Vendor>&CompanyName;</Vendor>
The XML parser expands this to:
<Vendor>Liquid Technologies Ltd</Vendor>
Sample usage within an XML attribute:
<Vendor name="&CompanyName;" />
The XML parser expands this to:
<Vendor name="Liquid Technologies Ltd" />
Full parsed internal general entity example:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE product [
  <!ELEMENT product (name, manufacturer)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT manufacturer (#PCDATA)>
  <!ENTITY CompanyName "Liquid Technologies">
]>
<product>
  <name>Liquid XML Studio</name>
  <manufacturer>&CompanyName;</manufacturer>
</product>
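To see the substitution happen, the element and attribute samples above can be fed to a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module, whose underlying expat parser expands internal general entities declared in the document's own DTD. The combined `Vendor` document is assembled here purely for illustration.

```python
import xml.etree.ElementTree as ET

# One document that uses the entity in both an attribute and element content.
DOCUMENT = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE Vendor [
  <!ENTITY CompanyName "Liquid Technologies Ltd">
]>
<Vendor name="&CompanyName;">&CompanyName;</Vendor>"""

vendor = ET.fromstring(DOCUMENT)

# By the time the application sees the data, the references are gone.
print(vendor.get("name"))
print(vendor.text)
```

The application never sees `&CompanyName;`; it only receives the replacement text, which is why a document that uses such entities cannot be read correctly without the declaration.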
The XML standard also defines a set of pre-defined entities.
Char | Escape String |
< | &lt; |
> | &gt; |
" | &quot; |
' | &apos; |
& | &amp; |
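When XML is generated by a program, the pre-defined entities are usually applied by an escaping helper rather than typed by hand. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module; note that `escape` only handles `&`, `<` and `>` unless the quote characters are passed in explicitly. The sample string and the `note` element are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the two quote characters are opt-in.
QUOTE_ESCAPES = {'"': "&quot;", "'": "&apos;"}
QUOTE_UNESCAPES = {"&quot;": '"', "&apos;": "'"}

raw = "Tom & Jerry <\"cat\" & 'mouse'>"
escaped = escape(raw, QUOTE_ESCAPES)

# Embedding the escaped text in markup and parsing it recovers the original.
note = ET.fromstring(f"<note>{escaped}</note>")

print(escaped)
print(note.text == raw, unescape(escaped, QUOTE_UNESCAPES) == raw)
```

Because the five pre-defined entities are built into XML itself, this round trip works without any DTD.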
<!ENTITY entityName SYSTEM "entityUri">
<!ENTITY entityName PUBLIC "publicID" "entityUri">
A parsed external general entity declaration is much the same as an internal general entity declaration, except that the replacement value is read from an external file. As the XML parser reads the data from the referenced entityUri (or retrieves it via the publicID), it parses it as XML data. This can be used to modularize large XML documents; for example, a book could have every chapter stored in a separate file, with external entity references used to pull them all together. External entities can also be used to hold common data, giving a single point of change should the shared data need amending.
Parsed external entity example:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE book [
  <!ELEMENT book (chapter)*>
  <!ELEMENT chapter ANY>
  <!ENTITY BookChapter1 SYSTEM "Chapter1.xml">
]>
<book>&BookChapter1;</book>
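Many parsers do not fetch external entities unless they are told how to. As a sketch of the mechanism, the code below uses Python's low-level `xml.parsers.expat` module and supplies the chapter from an in-memory dictionary instead of the file system. The chapter text, the `FILES` dictionary and the function name `parse_book` are assumptions made for this example.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the file system: entityUri -> file content.
FILES = {"Chapter1.xml": "<chapter>It was a dark and stormy night.</chapter>"}

DOCUMENT = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE book [
  <!ELEMENT book (chapter)*>
  <!ELEMENT chapter ANY>
  <!ENTITY BookChapter1 SYSTEM "Chapter1.xml">
]>
<book>&BookChapter1;</book>"""


def parse_book(document, files):
    """Parse document, resolving external entities from files."""
    tags = []
    text = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: tags.append(name)
    parser.CharacterDataHandler = text.append

    def resolve(context, base, system_id, public_id):
        # Parse the referenced content as XML data at the point of reference.
        # The child parser inherits the handlers set on the parent.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # a non-zero result tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return tags, "".join(text)


tags, text = parse_book(DOCUMENT, FILES)
print(tags, text)
```

Routing every lookup through a resolver of this kind also gives the application a single place to restrict which URIs may be loaded, which matters when the XML comes from an untrusted source.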
<!ENTITY entityName SYSTEM "entityUri" NDATA dataType>
If the ENTITY declaration contains the NDATA keyword, the data read from the entityUri is not parsed; it is treated as data of the type identified by the dataType argument.
The dataType argument is a reference to a NOTATION, which provides the consuming application with information about the type of data that will be found at the entityUri.
This mechanism allows non-XML data to be brought into an XML Document.
Unparsed external entity example:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img src ENTITY #REQUIRED>
  <!ENTITY companyLogo PUBLIC "-//W3C//GIF logo//EN" "http://www.w3.org/logo.gif" NDATA gif>
  <!NOTATION gif PUBLIC "gif viewer">
]>
<img src="companyLogo"/>
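The parser never reads an unparsed entity; it only reports the declaration, and the application decides what to do with the URI and notation. The sketch below shows one way an application might collect that information with Python's `xml.parsers.expat` module and then resolve the `src` attribute itself. The dictionary and handler names are invented for the example.

```python
import xml.parsers.expat as expat

DOCUMENT = """<?xml version="1.0" standalone="no" ?>
<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img src ENTITY #REQUIRED>
  <!ENTITY companyLogo PUBLIC "-//W3C//GIF logo//EN" "http://www.w3.org/logo.gif" NDATA gif>
  <!NOTATION gif PUBLIC "gif viewer">
]>
<img src="companyLogo"/>"""

unparsed = {}   # entity name -> (system id, public id, notation name)
notations = {}  # notation name -> public id
images = []     # (uri, notation name) for every img element seen


def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    unparsed[name] = (system_id, public_id, notation_name)


def on_notation(name, base, system_id, public_id):
    notations[name] = public_id


def on_start(name, attrs):
    # The attribute holds the entity NAME; mapping it to a URI is up to us.
    if name == "img":
        system_id, _public_id, notation_name = unparsed[attrs["src"]]
        images.append((system_id, notation_name))


parser = expat.ParserCreate()
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.NotationDeclHandler = on_notation
parser.StartElementHandler = on_start
parser.Parse(DOCUMENT, True)

print(images, notations)
```

Note that the attribute value is the bare entity name `companyLogo`, without `&` and `;`, since an unparsed entity is named rather than expanded.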
Internal parameter entities are very similar to internal general entities, except that they can only be used within the DTD itself (i.e. they cannot appear within the XML document's elements, attributes or processing instructions).
Parameter entities are defined in a similar way, but the name is prefixed with a %, and they are referenced using the notation %ParameterEntityName;
<!ENTITY % ParameterEntityName "entityValue">
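Sample parameter entity definition and use:
<!ENTITY % copyright '©'>
<!ENTITY CompanyName 'Liquid Technologies Ltd %copyright;' >
Note that a parameter entity reference inside another declaration, as in the second line, is only permitted in the external DTD subset or in an external parameter entity; within the internal subset, parameter entity references may appear only between declarations.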
Parameter entities cannot be used in XML data; here %copyright; is not recognized as a reference and is left as literal text:
<Vendor>Liquid Technologies %copyright;</Vendor>
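What a parser does with a parameter entity reference placed in element content can be checked directly. In this sketch, which uses Python's `xml.etree.ElementTree` with a small internal DTD made up for the purpose, the reference is not expanded and comes through as ordinary text rather than as the copyright symbol.

```python
import xml.etree.ElementTree as ET

# The parameter entity is declared, but it is referenced from element content.
DOCUMENT = """<!DOCTYPE Vendor [
  <!ENTITY % copyright "&#169;">
]>
<Vendor>Liquid Technologies %copyright;</Vendor>"""

vendor = ET.fromstring(DOCUMENT)
print(vendor.text)
```

Outside the DTD the `%` character has no special meaning, so the intended substitution simply never takes place.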
External parameter entities are very similar to external general entities, except that they can only be used within the DTD itself (i.e. they cannot appear within the XML document's elements, attributes or processing instructions).
They are defined in the same way as external general entities, but the name is prefixed with a %
<!ENTITY % ParameterEntityName SYSTEM "uri">
<!ENTITY % ParameterEntityName PUBLIC "PublicID" "uri">
When the XML parser reads the text referenced by the external parameter entity, it must treat it as DTD data and parse it accordingly.
This mechanism allows external DTDs to be included, which makes it possible to modularize DTDs for better manageability or re-use.
Sample parameter entity definition and use:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE invoice [
  <!ENTITY % addressTypes SYSTEM "http://www.liquid-technologies.com/AddressCommon.dtd">
  %addressTypes;
  ...
]>
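The effect of including a DTD module can be sketched with Python's `xml.parsers.expat` module, which only processes parameter entities when asked to via `SetParamEntityParsing`. Nothing is fetched from the network here: the content of AddressCommon.dtd is not shown in the text, so the single `defaultCountry` declaration below, the `DTD_MODULES` dictionary and the function name `parse_invoice` are all assumptions made for the example.

```python
import xml.parsers.expat as expat

DTD_URI = "http://www.liquid-technologies.com/AddressCommon.dtd"

# Hypothetical module content, served from memory instead of the network.
DTD_MODULES = {DTD_URI: '<!ENTITY defaultCountry "United Kingdom">'}

DOCUMENT = f"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE invoice [
  <!ENTITY % addressTypes SYSTEM "{DTD_URI}">
  %addressTypes;
]>
<invoice>&defaultCountry;</invoice>"""


def parse_invoice(document, modules):
    """Parse document, loading external DTD modules from modules."""
    text = []
    parser = expat.ParserCreate()
    # Without this call expat skips parameter entity references entirely.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.CharacterDataHandler = text.append

    def resolve(context, base, system_id, public_id):
        # For a parameter entity the child parser reads the text as DTD data.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(modules[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(document, True)
    return "".join(text)


print(parse_invoice(DOCUMENT, DTD_MODULES))
```

The general entity used in the invoice is declared only in the module, so the document can be read correctly only by a parser that actually loads the external DTD.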
Although entities have their uses, they add a level of complexity to an XML document that is rarely justified, and they require the associated DTD to be present in order to read the document properly.
Using parsed external parameter entity declarations to modularize DTDs is a good use for them. Other than that, it is uncommon to come across circumstances that justify the creation of new entity definitions; the advice is to use the pre-defined ones, but not to make up your own.
On the whole, entities are just simple replacements, but they can be nested, which potentially makes them very complex.
Complex example from the W3C XML Spec:
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
This produces the following:
Step 1
In line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Since the replacement text is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since "zz" is not yet declared.)
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '%zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
Step 2
In line 5, the character reference "&#60;" is expanded immediately and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '%zz;'>
5 <!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
Step 3
In line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed.
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '%zz;'>
5 <!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
6 %zz;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
Step 4
The reference to "zz" is recognized in its turn, and its replacement text ("<!ENTITY tricky "error-prone">") is parsed. The general entity "tricky" has now been declared, with the replacement text "error-prone".
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '%zz;'>
5 <!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
6 <!ENTITY tricky "error-prone" >
7 ]>
8 <test>This sample shows a &tricky; method.</test>
Step 5
In line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the test element is the self-describing (and ungrammatical) string This sample shows a error-prone method.
1 <?xml version='1.1'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '%zz;'>
5 <!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
6 <!ENTITY tricky "error-prone" >
7 ]>
8 <test>This sample shows a error-prone method.</test>
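The walkthrough can be reproduced with a real parser. The sketch below runs the example through Python's `xml.parsers.expat` module with parameter entity processing switched on, and records each entity as it is declared. It writes the character references &#37; and &#60; out in full, as the W3C example does, and declares version 1.0 on the assumption that the parser implements XML 1.0; the function name `expand` is made up for the example.

```python
import xml.parsers.expat as expat

# The W3C example, with the character references written out in full.
DOCUMENT = """<?xml version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '&#37;zz;'>
<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
%xx;
]>
<test>This sample shows a &tricky; method.</test>"""


def expand(document):
    """Return (entity names in declaration order, expanded element text)."""
    declared = []
    pieces = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        declared.append(name)

    parser = expat.ParserCreate()
    # Needed so that %xx; and then %zz; are actually expanded.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.EntityDeclHandler = on_entity_decl
    parser.CharacterDataHandler = pieces.append
    parser.Parse(document, True)
    return declared, "".join(pieces)


declared, text = expand(DOCUMENT)
print(declared)
print(text)
```

The general entity `tricky` never appears as a declaration in the source text; it only comes into existence once the two parameter entities have been expanded, which is what makes nested entities hard to follow.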
The W3C describes the syntax for entity references in EBNF as follows.
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'
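Productions [66] to [69] translate fairly directly into a regular expression, which can be handy for spotting references in raw text before it is parsed. The following is only a sketch: the `NAME` pattern is a simplified, ASCII-only approximation of the real Name production, and the function name `find_references` is invented for the example.

```python
import re

# Simplified Name: ASCII letters, digits, '_', ':', '-' and '.' only.
# The real Name production also admits many non-ASCII characters.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

REFERENCE = re.compile(
    rf"(?P<CharRef>&#[0-9]+;|&#x[0-9a-fA-F]+;)"
    rf"|(?P<EntityRef>&{NAME};)"
    rf"|(?P<PEReference>%{NAME};)"
)


def find_references(text):
    """Return (production name, matched text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in REFERENCE.finditer(text)]


sample = "&CompanyName; &#169; &#xA9; %copyright; &amp; a & b"
for production, reference in find_references(sample):
    print(production, reference)
```

A match for `PEReference` is only meaningful inside a DTD; in document content the same characters are plain text, and a bare `&` that does not start a reference (as in "a & b") is a well-formedness error that a real parser would reject.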