Understanding the XML Specification
The XML specification defines the syntax of XML documents with a grammar
written in Extended Backus-Naur Form (EBNF).
Each rule of the grammar has the form:
symbol ::= expression
-
symbol
- written with an initial capital letter if it is defined by a regular
expression, and with an initial lower-case letter otherwise;
-
expression
- the right-hand side of the rule, written in the syntax shown below
to match strings of one or more characters.
-
#xN
-
where N is a hexadecimal integer; matches the character whose code point
(Unicode or ISO/IEC 10646) has the value N.
-
[a-zA-Z], [#xN-#xN]
-
matches any character with a value in the range(s) indicated
(inclusive).
-
[^a-zA-Z], [^#xN-#xN]
-
matches any character with a value outside the range indicated.
-
[^abc], [^#xN#xN#xN]
-
matches any character with a value not among the characters given.
-
'text' or "text"
-
matches the literal string given inside the single or double quotes.
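As a rough illustration, the character notations above map closely onto regular-expression character classes. The sketch below uses Python's re module; the helper names class_to_regex and literal_to_regex are hypothetical, and it assumes the class contains no characters that are special to Python's regex syntax beyond those shown.

```python
import re

def class_to_regex(ebnf_class: str) -> str:
    """Rewrite each #xN inside an EBNF character class as a regex code point escape."""
    def to_escape(match: re.Match) -> str:
        # Build an eight-digit escape such as \U00000041 for #x41.
        return "\\U%08X" % int(match.group(1), 16)
    return re.sub(r"#x([0-9A-Fa-f]+)", to_escape, ebnf_class)

def literal_to_regex(ebnf_literal: str) -> str:
    """Turn a quoted literal such as 'text' or "text" into an escaped regex."""
    return re.escape(ebnf_literal[1:-1])

# [#x41-#x5A] is the range A-Z written with code points.
upper = re.compile(class_to_regex("[#x41-#x5A]"))
# [^#x30-#x39] is any character other than the digits 0-9.
not_digit = re.compile(class_to_regex("[^#x30-#x39]"))
```

Ranges written with plain characters, such as [a-zA-Z], pass through class_to_regex unchanged because they already have the same meaning in a Python regular expression.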
These symbols may be combined to match more complex patterns as follows,
where A and B represent simple expressions:
-
(expression)
- expression is treated as a unit and may be combined as
described in this list.
-
A?
- matches A or nothing; A is optional.
-
A B
- matches A followed by B.
-
A|B
- matches A or B but not both.
-
A-B
- matches any string that matches A but does not
match B.
-
A+
- matches one or more occurrences of A.
-
A*
- matches zero or more occurrences of A.
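The combining forms in this list can be sketched with small Python helpers that build regular expressions. The helper names (seq, alt, opt, plus, star, matches_difference) are hypothetical. The S, Eq and PITarget rules mentioned in the comments are productions of the XML 1.0 Recommendation, but NAME here is a deliberately simplified stand-in for the real Name production. Because A-B has no direct regex operator, it is checked with two separate matches.

```python
import re

def seq(*parts: str) -> str:
    """A B: each part in order."""
    return "".join(f"(?:{p})" for p in parts)

def alt(*parts: str) -> str:
    """A|B: any one of the parts."""
    return "(?:" + "|".join(f"(?:{p})" for p in parts) + ")"

def opt(a: str) -> str:
    """A?: A or nothing."""
    return f"(?:{a})?"

def plus(a: str) -> str:
    """A+: one or more occurrences of A."""
    return f"(?:{a})+"

def star(a: str) -> str:
    """A*: zero or more occurrences of A."""
    return f"(?:{a})*"

def matches_difference(a: str, b: str, text: str) -> bool:
    """A-B: text matches A in full but does not match B in full."""
    return re.fullmatch(a, text) is not None and re.fullmatch(b, text) is None

# S ::= (#x20 | #x9 | #xD | #xA)+
S = plus(alt(r"\x20", r"\x09", r"\x0D", r"\x0A"))
# Eq ::= S? '=' S?
Eq = seq(opt(S), re.escape("="), opt(S))
# PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
NAME = "[A-Za-z]+"  # assumption: simplified, not the full Name production
XML = seq(alt("X", "x"), alt("M", "m"), alt("L", "l"))
```

With these definitions, matches_difference(NAME, XML, "xsl") is true, while the same call with "XmL" is false because that string also matches the excluded pattern.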
Other notations used in the productions are:
-
/* ... */
- comment.
-
[ wfc: ... ]
-
well-formedness constraint; this identifies by name
a constraint on well-formed documents associated with a production.
-
[ vc: ... ]
-
validity constraint; this identifies by name a
constraint on valid documents associated with a production.
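A tool that reads productions usually needs to separate these annotations from the expression itself. The following is a naive sketch of that step in Python; the function name split_annotations and the sample rule text are illustrative assumptions, and the pattern does not handle quoted literals that happen to contain comment or bracket characters.

```python
import re

# Matches either a /* ... */ comment or a [ wfc: ... ] / [ vc: ... ] annotation.
ANNOTATION = re.compile(
    r"/\*.*?\*/|\[\s*(wfc|vc)\s*:\s*([^\]]*?)\s*\]",
    re.IGNORECASE | re.DOTALL,
)

def split_annotations(production: str):
    """Return (expression text, list of (kind, constraint name)) for one rule."""
    constraints = []

    def collect(match: re.Match) -> str:
        # Group 1 is set only for wfc/vc annotations, not for comments.
        if match.group(1):
            constraints.append((match.group(1).lower(), match.group(2)))
        return " "

    body = ANNOTATION.sub(collect, production)
    # Collapse the whitespace left behind by the removed annotations.
    return " ".join(body.split()), constraints

# Illustrative rule text with one comment and one well-formedness constraint.
rule = "CharRef ::= '&#' [0-9]+ ';' /* decimal form */ [ wfc: Legal Character ]"
body, constraints = split_annotations(rule)
```

Character classes such as [0-9] are left in place because they lack the wfc: or vc: prefix that the pattern requires.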