XML, XSL, two of a family of extensible languages

Physical structures

Character references

A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
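As an illustration of production [66], the following Python sketch expands decimal and hexadecimal character references in a string. The function name expand_char_refs is an assumption made for this example, and the sketch does not check that the code point is a legal XML character, as a real processor must.

```python
import re

# Production [66]: a decimal or a hexadecimal character reference.
CHAR_REF = re.compile(r"&#([0-9]+);|&#x([0-9a-fA-F]+);")


def expand_char_refs(text: str) -> str:
    """Replace each character reference with the character it names."""
    def to_char(match: re.Match) -> str:
        decimal, hexadecimal = match.groups()
        if decimal is not None:
            return chr(int(decimal))
        return chr(int(hexadecimal, 16))

    # re.sub scans the input once; the inserted characters are not rescanned.
    return CHAR_REF.sub(to_char, text)


print(expand_char_refs("&#xC9;ditions &#169; 1947"))
```

Because the substitution is a single pass, a doubly escaped value such as "&#38;#60;" becomes the text "&#60;" rather than "<".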

Entity references

[67]   Reference ::= EntityRef | CharRef
[68]   EntityRef ::= '&' Name ';'
[69] PEReference ::= '%' Name ';'

Examples of character and entity references follow:
The ampersand (&#38;) and less-than (&#60;) signs...

This document was written on &date; by &Authors;.

<!-- declaration of the parameter entity "HTMLsymbol"... -->
<!ENTITY % HTMLsymbol PUBLIC
        "-//W3C//ENTITIES Symbols//EN//HTML"
        "http://www.w3.org/TR/xhtml1/DTD/HTMLsymbolx.ent">
<!-- ... and here it is referenced (in the DTD only!)    -->
%HTMLsymbol;
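The three kinds of reference in productions [66] to [69] can be told apart by their delimiters. The Python sketch below labels each reference it finds with the name of its production. It assumes a simplified, ASCII-only form of Name, and find_references is a hypothetical helper, not part of any XML library.

```python
import re

# Simplified Name (assumption): ASCII letters, digits, '_', ':', '.', '-'.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

REFERENCE = re.compile(
    r"(?P<CharRef>&#[0-9]+;|&#x[0-9a-fA-F]+;)"
    rf"|&(?P<EntityRef>{NAME});"
    rf"|%(?P<PEReference>{NAME});"
)


def find_references(text):
    """Return (production, matched value) pairs in document order."""
    return [(m.lastgroup, m.group(m.lastgroup))
            for m in REFERENCE.finditer(text)]


print(find_references("This document was written on &date; by &Authors;."))
print(find_references("%HTMLsymbol;"))
```

Note that the sketch has no notion of context: a real processor recognises parameter-entity references only inside the DTD, so a percent sign followed by a name in document content is plain text.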

Entity declarations

Entities are declared using the syntax below.

[70] EntityDecl ::= GEDecl | PEDecl
[71]     GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72]     PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73]  EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[74]      PEDef ::= EntityValue | ExternalID

If the entity definition is an EntityValue (production [9]) the defined entity is called an internal entity. There is no separate physical storage object, and the content of the entity is given in the declaration.

An internal entity is a parsed entity. An example follows:
<!ENTITY MathML "Mathematical Markup Language">
<!ENTITY XMLS   "&MathML; and other extensible markup languages">
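To see these two internal entities being expanded, the sketch below places the declarations in an internal DTD subset and parses a small document with Python's standard xml.etree.ElementTree module. The root element name doc is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The two declarations from the text, placed in an internal DTD subset.
DOCUMENT = """<!DOCTYPE doc [
<!ENTITY MathML "Mathematical Markup Language">
<!ENTITY XMLS   "&MathML; and other extensible markup languages">
]>
<doc>&XMLS;</doc>"""

root = ET.fromstring(DOCUMENT)

# The reference to MathML inside XMLS is resolved when XMLS is included.
print(root.text)
```

The nested reference &MathML; is not expanded when XMLS is declared; it is resolved only when &XMLS; is included in the document content.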

If the entity is not internal, it is an external entity, declared using the following syntax.

[75] ExternalID ::= 'SYSTEM' S SystemLiteral |
                    'PUBLIC' S PubidLiteral S SystemLiteral
[76]  NDataDecl ::= S 'NDATA' S Name         

  Examples of external entity declarations:
 <!ENTITY myfile     SYSTEM "/user/goossens/gut99.xml">
 <!ENTITY xhtml      PUBLIC  
          "-//W3C//DTD XHTML 1.0 Transitional//EN"  
          "/user/goossens/xml/dtds/transitional.dtd">
 <!ENTITY myFigure   SYSTEM "../oxford99.eps" NDATA eps> 
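One way to inspect such declarations is to let a parser report them. The sketch below uses the EntityDeclHandler callback of Python's xml.parsers.expat module on declarations modelled on the ones above. The wrapper document and the dictionary layout are assumptions for illustration. Expat is a non-validating parser, so it does not complain that the notation eps is never declared.

```python
import xml.parsers.expat

DOCUMENT = """<!DOCTYPE doc [
<!ENTITY myfile   SYSTEM "/user/goossens/gut99.xml">
<!ENTITY xhtml    PUBLIC
         "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "/user/goossens/xml/dtds/transitional.dtd">
<!ENTITY myFigure SYSTEM "../oxford99.eps" NDATA eps>
]>
<doc/>"""

declarations = {}


def on_entity_decl(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # value is None for external entities; notation_name is None
    # for parsed entities and set for unparsed (NDATA) ones.
    declarations[name] = {
        "system_id": system_id,
        "public_id": public_id,
        "notation": notation_name,
    }


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.Parse(DOCUMENT, True)

for name, info in declarations.items():
    kind = "unparsed" if info["notation"] else "parsed"
    print(name, kind, info["system_id"])
```

The presence of an NDATA part is what distinguishes the unparsed entity myFigure from the two external parsed entities.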

Parsed entities

External parsed entities may each begin with a text declaration.

[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

The text declaration must be provided literally, not by reference to a parsed entity. No text declaration may appear at any position other than the beginning of an external parsed entity.

[78] extParsedEnt ::= TextDecl? content
[79]        extPE ::= TextDecl? extSubsetDecl

[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' |  
                                       "'" EncName "'" )
[81]      EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* 

<?xml encoding='UTF-8'?><!-- default -->
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml encoding='EUC-JP'?> 
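Production [81] translates directly into a regular expression. The following Python sketch checks whether a string is a syntactically legal encoding name; is_enc_name is a hypothetical helper. It checks syntax only and says nothing about whether a processor actually supports the encoding.

```python
import re

# Production [81]: a Latin letter, then letters, digits, '.', '_' or '-'.
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_enc_name(name: str) -> bool:
    """Return True if name matches the EncName production."""
    return ENC_NAME.fullmatch(name) is not None


for candidate in ("UTF-8", "ISO-8859-1", "EUC-JP", "8859-1"):
    print(candidate, is_enc_name(candidate))
```

The last candidate is rejected because an encoding name must begin with a letter.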

XML Processor Treatment of Entities and References

Section 4.4 in the XML Specification explains in detail the contexts in which character references, entity references, and invocations of unparsed entities might appear and the required behaviour of an XML processor in each case.

Section 4.5 explains how the replacement text for internal entities is constructed.

Consider the following (from Section 4.5):
 <!ENTITY % pub    "&#xc9;ditions Gallimard" >
 <!ENTITY   rights "All rights reserved" >
 <!ENTITY   book   "La Peste: Albert Camus,
 &#xA9; 1947 %pub;. &rights;" >
This results in the following replacement text for the entity book:
 La Peste: Albert Camus,
 © 1947 Éditions Gallimard. &rights;
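The construction of this replacement text can be sketched in Python as follows. The sketch assumes a simplified ASCII-only Name and a dictionary of already-built parameter-entity replacement texts. The function replacement_text is hypothetical and is not taken from the specification. Character references and parameter-entity references are expanded, while general-entity references such as &rights; are left untouched.

```python
import re

# Character references and parameter-entity references are expanded when
# the replacement text is built; general-entity references are not.
REF = re.compile(
    r"&#([0-9]+);"
    r"|&#x([0-9a-fA-F]+);"
    r"|%([A-Za-z_:][A-Za-z0-9_:.\-]*);"  # simplified Name (assumption)
)


def replacement_text(literal, parameter_entities):
    """Build the replacement text of an internal entity from its literal value."""
    def expand(match):
        decimal, hexadecimal, pe_name = match.groups()
        if decimal is not None:
            return chr(int(decimal))
        if hexadecimal is not None:
            return chr(int(hexadecimal, 16))
        return parameter_entities[pe_name]

    return REF.sub(expand, literal)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus,\n&#xA9; 1947 %pub;. &rights;",
    {"pub": pub},
)
print(book)
```

The reference &rights; survives in the result; it is expanded later, when a reference to book is included in the document content.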

Predefined Entities

All XML processors must recognise the five entities amp, gt, lt, apos, and quot, whether they are declared or not.

For interoperability, valid documents should nevertheless declare these entities in their DTD before referencing them.
 <!ENTITY lt     "&#38;#60;">
 <!ENTITY gt     "&#62;">
 <!ENTITY amp    "&#38;#38;">
 <!ENTITY apos   "&#39;">
 <!ENTITY quot   "&#34;">

The < and & characters in the declarations of lt and amp are doubly escaped to meet the requirement that entity replacement be well-formed.
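Both points can be tried out with Python's standard xml.etree.ElementTree module. In the sketch below the first document uses the predefined entities without any DTD. The second declares a hypothetical entity named less, with the same doubly escaped value as lt, to show that the value yields a single less-than character when referenced in content.

```python
import xml.etree.ElementTree as ET

# The five predefined entities are recognised with no DTD at all.
plain = ET.fromstring("<doc>&lt;&amp;&gt;&apos;&quot;</doc>")

# "less" is a hypothetical entity using the same double escape as lt.
# Its replacement text is "&#60;", which becomes "<" on inclusion.
escaped = ET.fromstring(
    '<!DOCTYPE doc [<!ENTITY less "&#38;#60;">]>'
    "<doc>a &less; b</doc>"
)

print(plain.text)
print(escaped.text)
```

Had the entity been declared with the single escape "&#60;", its replacement text would be a bare less-than sign, which is not well-formed content when the entity is referenced.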

Notation declarations

[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID |  PublicID) S?
                      '>'
[83]     PublicID ::= 'PUBLIC' S PubidLiteral
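As a sketch of the two forms allowed by production [82], the following Python code declares one notation with a system identifier and one with only a public identifier, and collects both through the NotationDeclHandler callback of xml.parsers.expat. The notation names and identifiers are invented for the example.

```python
import xml.parsers.expat

# Both notations and their identifiers are hypothetical examples.
DOCUMENT = """<!DOCTYPE doc [
<!NOTATION eps SYSTEM "ghostview">
<!NOTATION gif PUBLIC "-//Example//NOTATION GIF//EN">
]>
<doc/>"""

notations = {}


def on_notation_decl(name, base, system_id, public_id):
    # A notation declared with PublicID alone has no system identifier.
    notations[name] = (system_id, public_id)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation_decl
parser.Parse(DOCUMENT, True)

print(notations)
```

Unlike an external entity declaration, a notation declaration may give a public identifier with no system literal, which is why production [83] exists alongside ExternalID.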

The document entity

The document entity serves as the root of the entity tree and a starting-point for an XML processor. The document entity has no name and might well appear on a processor input stream without any identification at all.


Last updated: September 10th 1999