XML, XSL, two of a family of extensible languages
XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations (XSLT) and XPointer.
The primary purpose of XPath is to address parts of an XML document.
XPath also provides basic facilities for manipulation of strings, numbers and booleans.
The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:
node-set, an unordered collection of nodes without duplicates;
boolean, true or false;
number, a floating-point number;
string, a sequence of Unicode characters.
Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the context is determined for XPath expressions used in XSLT and XPointer respectively. The context consists of:
the context node;
a pair of non-zero positive integers (the context position and the context size);
a set of variable bindings;
a function library;
the set of namespace declarations in scope for the expression.
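The five parts of the context can be pictured as a small record. The following is a minimal sketch in Python, not something XPath itself defines: the class name `XPathContext` and its field names are hypothetical, and representing the context node as an `xml.etree.ElementTree` element is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict
import xml.etree.ElementTree as ET


@dataclass
class XPathContext:
    """Hypothetical container for an XPath evaluation context."""
    node: ET.Element                 # the context node
    position: int                    # the context position (1-based)
    size: int                        # the context size
    variables: Dict[str, Any] = field(default_factory=dict)                 # variable bindings
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)  # function library
    namespaces: Dict[str, str] = field(default_factory=dict)                # prefix -> namespace URI

    def __post_init__(self):
        # Both integers are positive, and the position never exceeds the size.
        if not (1 <= self.position <= self.size):
            raise ValueError("expected 1 <= position <= size")


root = ET.fromstring("<doc><para/><para/></doc>")
# Context for the second of two para children, with one bound variable.
ctx = XPathContext(node=root[1], position=2, size=2, variables={"limit": 5.0})
```

How the fields are filled in is up to the host language (XSLT or XPointer); the sketch only shows what an evaluator would need to carry around.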
Location paths, the most important construct in XPath, can be expressed using a straightforward but rather verbose syntax. There are also a number of syntactic abbreviations that allow common cases to be expressed concisely.
[1] LocationPath ::= RelativeLocationPath
| AbsoluteLocationPath
[2] AbsoluteLocationPath ::= '/' RelativeLocationPath?
| AbbreviatedAbsoluteLocationPath
[3] RelativeLocationPath ::= Step
| RelativeLocationPath '/' Step
| AbbreviatedRelativeLocationPath
A location step consists of three parts: an axis, a node test, and zero or more predicates.
[4] Step ::= AxisSpecifier NodeTest Predicate*
| AbbreviatedStep
[5] AxisSpecifier ::= AxisName '::'
| AbbreviatedAxisSpecifier
The list of allowed axis types is defined in production [6], which follows.
[6] AxisName ::= 'ancestor' | 'ancestor-or-self'
| 'attribute'
| 'child'
| 'descendant' | 'descendant-or-self'
| 'following' | 'following-sibling'
| 'namespace'
| 'parent'
| 'preceding' | 'preceding-sibling'
| 'self'
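To make a few of these axes concrete, here is a rough sketch that computes them over an `xml.etree.ElementTree` tree. It is an illustration under assumptions, not an XPath implementation: the helper names are hypothetical, only element nodes are modelled (ElementTree does not expose attribute, namespace, or text nodes as tree nodes), and the reverse axes are returned nearest node first.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    # ElementTree elements do not know their parent, so record it once.
    return {child: parent for parent in root.iter() for child in parent}


def ancestor_axis(node, parents):
    # Walk upwards; the nearest ancestor comes first (a reverse axis).
    found = []
    while node in parents:
        node = parents[node]
        found.append(node)
    return found


def descendant_axis(node):
    # iter() yields the node itself first, then descendants in document order.
    return [n for n in node.iter() if n is not node]


def following_sibling_axis(node, parents):
    parent = parents.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def preceding_sibling_axis(node, parents):
    parent = parents.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    # Reversed so that the nearest preceding sibling comes first.
    return siblings[:siblings.index(node)][::-1]


root = ET.fromstring("<a><b><c/><d/></b><e/></a>")
parents = build_parent_map(root)
c = root[0][0]

ancestor_tags = [n.tag for n in ancestor_axis(c, parents)]            # ['b', 'a']
descendant_tags = [n.tag for n in descendant_axis(root)]              # ['b', 'c', 'd', 'e']
following_tags = [n.tag for n in following_sibling_axis(c, parents)]  # ['d']
preceding_tags = [n.tag for n in preceding_sibling_axis(root[1], parents)]  # ['b']
```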
Every axis has a principal node type. If an axis can contain elements, then the principal node type is element; otherwise, it is the type of the nodes that the axis can contain. Thus, for the attribute axis the principal node type is attribute, for the namespace axis it is namespace, and for all other axes it is element.
In particular, a node test * is true for any node of the principal node type. For instance, child::* selects all element children of the context node, and attribute::* selects all attributes of the context node.
The node test text() is true for any text node, comment() is true for any comment node, and processing-instruction() is true for any processing instruction. Finally, a node test node() is true for any node of any type.
[7] NodeTest ::= WildcardName | NodeType '(' ')'
| 'processing-instruction' '(' Literal ')'
A predicate filters a node-set with respect to an axis to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated with that node as the context node, with the number of nodes in node-set as the context size, and with the proximity position of the node in the node-set with respect to the axis as the context position; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.
[8] Predicate ::= '[' PredicateExpr ']'
[9] PredicateExpr ::= Expr
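The filtering procedure described above can be sketched as a short loop. This is a simplified illustration, not the behaviour of any particular processor: the function name `apply_predicate` is hypothetical, nodes are stood in for by plain strings listed in document order, and the predicate is assumed to be a Python callable that already returns a boolean.

```python
def apply_predicate(nodes, predicate, reverse_axis=False):
    """Filter `nodes` (given in document order) with `predicate`.

    `predicate(node, position, size)` receives the context node, the
    proximity position with respect to the axis, and the context size.
    """
    size = len(nodes)
    # On a reverse axis, proximity positions count backwards from the end.
    ordered = list(reversed(nodes)) if reverse_axis else list(nodes)
    kept = []
    for position, node in enumerate(ordered, start=1):
        if predicate(node, position, size):
            kept.append(node)
    return kept


nodes = ["a", "b", "c", "d"]  # stand-ins for four nodes in document order

first_forward = apply_predicate(nodes, lambda n, pos, size: pos == 1)
first_reverse = apply_predicate(nodes, lambda n, pos, size: pos == 1, reverse_axis=True)
last_forward = apply_predicate(nodes, lambda n, pos, size: pos == size)
```

With a forward axis, position 1 is the first node in document order (`"a"`); with a reverse axis it is the last one (`"d"`), which is why the axis matters when a predicate tests the position.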
[10] AbbreviatedAbsoluteLocationPath ::= '//'
RelativeLocationPath
[11] AbbreviatedRelativeLocationPath ::= RelativeLocationPath
'//' Step
[12] AbbreviatedStep ::= '.' | '..'
[13] AbbreviatedAxisSpecifier ::= '@'?
Examples follow:
div/para is short for
child::div/child::para
../title is short for
parent::node()/child::title
//para is short for
/descendant-or-self::node()/child::para
.//para is short for
self::node()/descendant-or-self::node()/child::para
para[@type="warning"] is short for
child::para[attribute::type="warning"]
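Several of these abbreviations can be tried with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document below is invented for illustration. ElementTree rejects absolute paths such as //para on an element, so the sketch uses the relative form .//para, and it reaches the parent step through div/../title because ElementTree cannot step above the element on which the search starts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<chapter>"
    "<title>Intro</title>"
    "<div><para type='warning'>w</para><para>p</para></div>"
    "<para>top</para>"
    "</chapter>"
)

# div/para : para children of the div children of the context node
div_paras = [p.text for p in doc.findall("div/para")]

# .//para : para descendants of the context node
all_paras = [p.text for p in doc.findall(".//para")]

# para[@type='warning'] : para children whose type attribute equals "warning"
warnings = [p.text for p in doc.findall("div/para[@type='warning']")]

# div/../title : step to a div child, back to its parent, then to title
titles = [t.text for t in doc.findall("div/../title")]
```

Here `div_paras` is `['w', 'p']`, `all_paras` is `['w', 'p', 'top']`, `warnings` is `['w']`, and `titles` is `['Intro']`.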
The following figure explains in a visual way most of the axes.
Parentheses may be used for grouping.
[14] Expr ::= OrExpr
[15] PrimaryExpr ::= VariableReference
| '(' Expr ')' | Literal
| Number | FunctionCall
An argument is converted to type string as if by calling the string function. An argument is converted to type number as if by calling the number function. An argument is converted to type boolean as if by calling the boolean function. An argument that is not of type node-set cannot be converted to a node-set. It is an error if the number or type of arguments is wrong.
[16] FunctionCall ::= FunctionName '(' ( Argument ( ','
Argument)*)? ')'
[17] Argument ::= Expr
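The argument conversions can be approximated in a few lines. The sketch below is a simplification under stated assumptions: the function names are hypothetical, a node-set is modelled as a Python list holding the string-values of its nodes in document order, and the number-to-string rule ignores the finer formatting details of very large or very small values.

```python
import math
import re

# Optional whitespace, optional minus sign, then digits with an optional fraction.
_NUMBER_RE = re.compile(r"[ \t\r\n]*-?(?:\d+(?:\.\d*)?|\.\d+)[ \t\r\n]*")


def to_string(value):
    if isinstance(value, bool):              # check bool before int/float
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        number = float(value)
        if math.isnan(number):
            return "NaN"
        if math.isinf(number):
            return "Infinity" if number > 0 else "-Infinity"
        if number == int(number):
            return str(int(number))          # integers print without a decimal point
        return repr(number)                  # simplification for non-integers
    if isinstance(value, str):
        return value
    if isinstance(value, list):              # node-set: string-value of the first node
        return value[0] if value else ""
    raise TypeError("unsupported type")


def to_number(value):
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    text = to_string(value)                  # strings and node-sets go through string
    if _NUMBER_RE.fullmatch(text):
        return float(text.strip(" \t\r\n"))
    return float("nan")                      # anything else is not a number


def to_boolean(value):
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))
    if isinstance(value, (str, list)):       # non-empty string or non-empty node-set
        return len(value) > 0
    raise TypeError("unsupported type")
```

Note that `to_boolean("false")` is true under this scheme, because any non-empty string converts to true.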
There are no types of objects that can be converted to node-sets.
[18] UnionExpr ::= PathExpr | UnionExpr '|' PathExpr
[19] PathExpr ::= LocationPath
| FilterExpr
| FilterExpr '/' RelativeLocationPath
| FilterExpr '//' RelativeLocationPath
[20] FilterExpr ::= PrimaryExpr | FilterExpr Predicate
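As an illustration of the | operator in production [18], the following sketch computes the union of two node-sets held as Python lists of `xml.etree.ElementTree` elements. The helper name `union` is hypothetical. A node-set is unordered, so returning the result in document order is only a convenience chosen here.

```python
import xml.etree.ElementTree as ET


def union(root, left, right):
    # Rank every element by its position in document order.
    order = {id(node): index for index, node in enumerate(root.iter())}
    # Key by identity so that the same node is never kept twice.
    unique = {id(node): node for node in list(left) + list(right)}
    return sorted(unique.values(), key=lambda node: order[id(node)])


doc = ET.fromstring("<r><a/><b/><c/></r>")
left = doc.findall("c") + doc.findall("a")
right = doc.findall("a") + doc.findall("b")
merged_tags = [node.tag for node in union(doc, left, right)]  # ['a', 'b', 'c']
```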
[21] OrExpr ::= AndExpr | OrExpr 'or' AndExpr
[22] AndExpr ::= EqualityExpr
| AndExpr 'and' EqualityExpr
[23] EqualityExpr ::= RelationalExpr
| EqualityExpr '=' RelationalExpr
| EqualityExpr '!=' RelationalExpr
[24] RelationalExpr ::= AdditiveExpr
| RelationalExpr '<' AdditiveExpr
| RelationalExpr '>' AdditiveExpr
| RelationalExpr '<=' AdditiveExpr
| RelationalExpr '>=' AdditiveExpr
From the above grammar it follows that the order of precedence is (lowest precedence first): or, then and, then = and !=, then <, <=, > and >=.
The operators are all left associative. For example, 3 > 2 > 1 is equivalent to (3 > 2) > 1, which evaluates to false.
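The 3 > 2 > 1 example can be traced step by step. The helper below is a hypothetical stand-in for the XPath > operator on numbers and booleans only: it converts both operands to numbers before comparing, so a boolean result from the inner comparison becomes 1 or 0.

```python
def xpath_greater_than(left, right):
    # Both operands are converted to numbers; True becomes 1.0, False 0.0.
    return float(left) > float(right)


inner = xpath_greater_than(3, 2)       # True
outer = xpath_greater_than(inner, 1)   # 1.0 > 1.0, so False

# Python itself chains comparisons: 3 > 2 > 1 means (3 > 2) and (2 > 1).
python_chained = 3 > 2 > 1             # True
```

The contrast with Python's chained comparison shows why the result may look surprising to readers used to other languages.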
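Note that when an XPath expression appears in an XML document (for example, in an attribute of an XSLT stylesheet), the < character must be escaped as &lt;. However, it might be easier to invert the inequality and use > instead, which may appear unescaped in an attribute value.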
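The effect of escaping can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `rule` element and its `test` attribute are invented for illustration and are not XSLT.

```python
import xml.etree.ElementTree as ET


def read_test(xml_text):
    # Parse a one-element document and return its test attribute.
    return ET.fromstring(xml_text).get("test")


# The escaped form is turned back into < by the XML parser.
escaped = read_test('<rule test="price &lt; 10"/>')

# The inverted comparison needs no escaping at all.
inverted = read_test('<rule test="10 > price"/>')

# A raw < inside an attribute value is not well-formed XML.
try:
    read_test('<rule test="price < 10"/>')
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```

After parsing, `escaped` holds the expression `price < 10`, which is what the XPath evaluator actually sees.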
[25] AdditiveExpr ::= MultiplicativeExpr
| AdditiveExpr '+' MultiplicativeExpr
| AdditiveExpr '-' MultiplicativeExpr
[26] MultiplicativeExpr ::= UnaryExpr
| MultiplicativeExpr MultiplyOperator
UnaryExpr
| MultiplicativeExpr 'div' UnaryExpr
| MultiplicativeExpr 'mod' UnaryExpr
[27] UnaryExpr ::= UnionExpr
| '-' UnaryExpr
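Since XML allows - in names, the - operator typically needs to be preceded by whitespace. Otherwise, an expression such as foo-bar is read as a single name and evaluates to a node-set containing the child elements named foo-bar, whereas foo - bar evaluates to the difference of the two operands.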
When tokenizing, the longest possible token is always returned.
[28] ExprToken ::= '(' | ')' | '[' | ']' | '.' | '..' |
'@' | ',' | '::'
| WildcardName
| NodeType
| Operator
| FunctionName
| AxisName
| Literal
| Number
| VariableReference
[29] Literal ::= '"' [^"]* '"'
| "'" [^']* "'"
[30] Number ::= Digits ('.' Digits?)?
| '.' Digits
[31] Digits ::= [0-9]+
[32] Operator ::= OperatorName
| MultiplyOperator
| '/' | '//' | '|' | '+' | '-' | '='
| '!=' | '<' | '<=' | '>' | '>='
[33] OperatorName ::= 'and' | 'or' | 'mod' | 'div'
[34] MultiplyOperator ::= '*'
[35] FunctionName ::= QName - NodeType
[36] VariableReference ::= '$' QName
[37] WildcardName ::= '*'
| NCName ':' '*'
| QName
[38] NodeType ::= 'comment'
| 'text'
| 'processing-instruction'
| 'node'
[39] ExprWhitespace ::= S
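A rough tokenizer shows how the longest-token rule and the handling of - in names play out. This is a simplified sketch, not a conforming lexer: names are restricted to ASCII, the regular expression gets longest-match behaviour only by listing longer alternatives before shorter ones, and the context-dependent rules that decide whether * is the multiply operator or whether and, or, div, mod are operator names are not implemented. The names `NCNAME`, `TOKEN_RE`, and `tokenize` are hypothetical.

```python
import re

# ASCII-only approximation of an XML NCName (note that '-' and '.' are allowed).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

TOKEN_RE = re.compile(
    rf"""
      (?P<number>\d+(?:\.\d*)?|\.\d+)               # Number, productions [30] and [31]
    | (?P<literal>"[^"]*"|'[^']*')                  # Literal, production [29]
    | (?P<variable>\${NCNAME}(?::{NCNAME})?)        # VariableReference, production [36]
    | (?P<name>{NCNAME}(?::(?:{NCNAME}|\*))?)       # QName or NCName ':' '*'
    | (?P<symbol>::|//|\.\.|!=|<=|>=|[()\[\].@,/|+\-=<>*])  # longer symbols first
    """,
    re.VERBOSE,
)


def tokenize(expr):
    tokens = []
    pos = 0
    while pos < len(expr):
        if expr[pos].isspace():          # whitespace separates tokens
            pos += 1
            continue
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


single_name = tokenize("foo-bar")        # ['foo-bar']
subtraction = tokenize("foo - bar")      # ['foo', '-', 'bar']
comparison = tokenize("a<=b")            # ['a', '<=', 'b']
```

Without the whitespace, foo-bar comes out as one name token, which is why the - operator usually needs a space in front of it.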
For each function, its type, its argument(s), and the section of the XPath WD (August 1999) where it is defined are given.



