A character reference refers to a specific character in the ISO/IEC 10646 character set (to all intents and purposes ISO/IEC 10646 is Unicode).
The value for the character can be specified either in decimal, using the notation &#82;, or in hexadecimal, using the notation &#x52;. In this case both represent the character 'R'.
Character references are typically used to output extended Unicode characters that editors, keyboards, or encodings cannot enter directly, or to escape characters that would otherwise be mistaken for markup, such as < and &.
Character Reference (Decimal) | Character Reference (Hex) | Character |
&#169; | &#xA9; | © |
&#8482; | &#x2122; | ™ |
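
The decimal and hexadecimal forms are two spellings of the same code point, so either can be derived from the character itself. The following is a minimal Python sketch of that relationship; the helper name `char_refs` is hypothetical and chosen only for illustration.

```python
def char_refs(ch: str) -> tuple[str, str]:
    """Return the (decimal, hex) character references for one character."""
    code_point = ord(ch)  # the character's Unicode / ISO/IEC 10646 code point
    return f"&#{code_point};", f"&#x{code_point:X};"


# Print one row per character: decimal reference, hex reference, character.
for ch in "R©™":
    decimal_ref, hex_ref = char_refs(ch)
    print(decimal_ref, "|", hex_ref, "|", ch)
```

Hex digits may be written in either case, so &#xa9; and &#xA9; are equivalent; the sketch simply picks upper case.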
Character references are expanded within element content and attribute values. They are not expanded within CDATA sections, comments, or processing instructions, where the text is taken literally.
The W3C describes the syntax for a character reference in EBNF as follows.
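
As a rough illustration of where expansion happens, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`note`, `owner`, `a`, `b`) are made up for this example.

```python
import xml.etree.ElementTree as ET

# The same reference appears in an attribute, in element content,
# and inside a CDATA section.
doc = (
    '<note owner="&#169; Example">'
    "<a>&#82;</a>"
    "<b><![CDATA[&#82;]]></b>"
    "</note>"
)

root = ET.fromstring(doc)

print(root.get("owner"))    # attribute value: reference is expanded
print(root.find("a").text)  # element content: reference is expanded
print(root.find("b").text)  # CDATA section: text is kept literally
```

Under this parser the attribute and the `a` element come back with the referenced characters, while the `b` element still contains the five literal characters of the unexpanded reference.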
[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
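
The production above translates almost directly into a regular expression. The following Python sketch is one possible rendering, not a full XML parser: the names `CHAR_REF` and `expand_char_refs` are hypothetical, and it does not check that the resulting code point is a character XML actually permits.

```python
import re

# '&#' [0-9]+ ';'  |  '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def expand_char_refs(text: str) -> str:
    """Replace every character reference in text with its character."""

    def replace(match: re.Match) -> str:
        decimal_digits, hex_digits = match.groups()
        if decimal_digits is not None:
            code_point = int(decimal_digits)   # decimal form
        else:
            code_point = int(hex_digits, 16)   # hexadecimal form
        return chr(code_point)

    return CHAR_REF.sub(replace, text)


print(expand_char_refs("&#82; &#x52; &#169; &#x2122;"))
```

Note that the `x` introducing the hexadecimal form must be lower case, which the pattern reflects; a string such as `&#X52;` is left untouched.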