XML, XSL, two of a family of extensible languages

The XPath language

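The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types:

  1. node-set (an unordered collection of nodes without duplicates);
  2. boolean (true or false);
  3. number (a floating-point number);
  4. string (a sequence of characters).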

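Expression evaluation occurs with respect to a context. XSLT and XPointer specify how the context is determined for XPath expressions used in XSLT and XPointer respectively. The context consists of:

  1. a node (the context node);
  2. a pair of non-zero positive integers (the context position and the context size);
  3. a set of variable bindings;
  4. a function library;
  5. the set of namespace declarations in scope for the expression.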

Location paths

Location paths, the most important construct in XPath, can be expressed using a straightforward but rather verbose syntax. There are also a number of syntactic abbreviations that allow common cases to be expressed concisely.

[1]   LocationPath            ::=    RelativeLocationPath
                                   | AbsoluteLocationPath
[2]   AbsoluteLocationPath    ::=    '/' RelativeLocationPath?
                                       | AbbreviatedAbsoluteLocationPath
[3]   RelativeLocationPath    ::=    Step
                                   | RelativeLocationPath '/' Step
                                   | AbbreviatedRelativeLocationPath

A location step consists of three parts:

  1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node;
  2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step;
  3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

[4]   Step             ::=    AxisSpecifier NodeTest Predicate*
                            | AbbreviatedStep
[5]   AxisSpecifier    ::=    AxisName '::'
                            | AbbreviatedAxisSpecifier
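
As a rough illustration of productions [4] and [5], the following Python sketch splits an unabbreviated location step into its axis, node test, and predicates. The helper name split_step and the regular expressions are assumptions made for this example, not part of any XPath library, and the sketch does not cope with nested predicates or with ']' inside a string literal.

```python
import re

# Hypothetical helper: handles only unabbreviated steps of the form
#   axis::nodetest[pred1][pred2]...
STEP_RE = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def split_step(step):
    """Return (axis, node_test, [predicate expressions]) for one step."""
    m = STEP_RE.match(step.strip())
    if m is None:
        raise ValueError("not an unabbreviated location step: %r" % step)
    # Each predicate is the text between one pair of square brackets.
    preds = re.findall(r"\[([^\[\]]*)\]", m.group("preds"))
    return m.group("axis"), m.group("test").strip(), preds

print(split_step("child::para[position()=1]"))
# ('child', 'para', ['position()=1'])
print(split_step("parent::node()"))
# ('parent', 'node()', [])
```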

The list of allowed axis names is defined in production [6], which follows.

 [6]   AxisName    ::=    'ancestor'    | 'ancestor-or-self'
                        | 'attribute'
                        | 'child'
                        | 'descendant'  | 'descendant-or-self'
                        | 'following'   | 'following-sibling'
                        | 'namespace'
                        | 'parent'
                        | 'preceding'   | 'preceding-sibling'
                        | 'self'
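
To make a few of these axes concrete, here is a minimal Python sketch that computes them over an element tree built with the standard xml.etree.ElementTree module. The sample document and the function names are assumptions for this example; only element nodes are modelled, so the attribute and namespace axes are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
root = ET.fromstring("<doc><a><b/><c><d/></c><e/></a><f/></doc>")

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}

def ancestor(node):
    """Ancestors of node, nearest first (reverse document order)."""
    out = []
    while node in parent:
        node = parent[node]
        out.append(node)
    return out

def descendant(node):
    """Descendants of node in document order."""
    return [n for n in node.iter() if n is not node]

def following_sibling(node):
    if node not in parent:
        return []
    sibs = list(parent[node])
    return sibs[sibs.index(node) + 1:]

def preceding_sibling(node):
    """Preceding siblings of node, nearest first."""
    if node not in parent:
        return []
    sibs = list(parent[node])
    return sibs[:sibs.index(node)][::-1]

c = root.find("a/c")
print([n.tag for n in ancestor(c)])            # ['a', 'doc']
print([n.tag for n in descendant(root[0])])    # ['b', 'c', 'd', 'e']
print([n.tag for n in following_sibling(c)])   # ['e']
print([n.tag for n in preceding_sibling(c)])   # ['b']
```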

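Every axis has a principal node type. If an axis can contain elements, then the principal node type is element; otherwise, it is the type of the nodes that the axis can contain. Thus, for the attribute axis the principal node type is attribute, for the namespace axis it is namespace, and for all other axes it is element.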

In particular, a node test * is true for any node of the principal node type. For instance, child::* will select all element children of the context node, and attribute::* will select all attributes of the context node.

The node test text() is true for any text node, comment() is true for any comment node, and processing-instruction() is true for any processing instruction. Finally, a node test node() is true for any node of any type.
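
The node tests can be mimicked with the standard xml.dom.minidom module, which keeps text, comment, and processing-instruction nodes as separate children. The sample document and the NODE_TESTS table are assumptions for this sketch; it covers only the child axis, where the principal node type is element.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical sample document with one child of each interesting kind.
doc = minidom.parseString(
    "<doc>hello<!--note--><?target data?><para/><para/></doc>"
)
children = doc.documentElement.childNodes   # the child axis of <doc>

# Each node test is modelled as a function from a node to a boolean.
NODE_TESTS = {
    "*": lambda n: n.nodeType == Node.ELEMENT_NODE,
    "text()": lambda n: n.nodeType == Node.TEXT_NODE,
    "comment()": lambda n: n.nodeType == Node.COMMENT_NODE,
    "processing-instruction()":
        lambda n: n.nodeType == Node.PROCESSING_INSTRUCTION_NODE,
    "node()": lambda n: True,
}

def child(test):
    """Children of <doc> for which the given node test is true."""
    return [n for n in children if NODE_TESTS[test](n)]

for test in NODE_TESTS:
    print("child::%s selects %d node(s)" % (test, len(child(test))))
```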

[7]   NodeTest    ::=    WildcardName | NodeType '(' ')'
                       | 'processing-instruction' '(' Literal ')'

A predicate filters a node-set with respect to an axis to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated with that node as the context node, with the number of nodes in node-set as the context size, and with the proximity position of the node in the node-set with respect to the axis as the context position; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.

[8]   Predicate        ::=    '[' PredicateExpr ']'
[9]   PredicateExpr    ::=    Expr
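
The filtering rule above can be sketched in a few lines of Python. Here a predicate is modelled as a function of the context node, context position, and context size; the name apply_predicate and the string stand-ins for nodes are assumptions for this example, and the input list is taken to be already ordered by proximity along the axis.

```python
def apply_predicate(nodes, predicate):
    """Keep the nodes for which predicate(node, position, size) is true.

    nodes must already be ordered by proximity along the axis, so that
    the first item has context position 1.
    """
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1)
            if predicate(n, pos, size)]

paras = ["p1", "p2", "p3", "p4"]   # stand-ins for four para elements

# para[position()=last()]
print(apply_predicate(paras, lambda n, pos, size: pos == size))
# ['p4']

# para[position() mod 2 = 1]
odd = apply_predicate(paras, lambda n, pos, size: pos % 2 == 1)
print(odd)
# ['p1', 'p3']

# A second predicate sees the already filtered node-set, so the context
# size is now 2: para[position() mod 2 = 1][position()=last()]
print(apply_predicate(odd, lambda n, pos, size: pos == size))
# ['p3']
```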

[10]   AbbreviatedAbsoluteLocationPath    ::=    '//'
                                                 RelativeLocationPath
[11]   AbbreviatedRelativeLocationPath    ::=    RelativeLocationPath
                                                 '//' Step
[12]   AbbreviatedStep                    ::=    '.' | '..'
[13]   AbbreviatedAxisSpecifier           ::=    '@'?

Examples follow:
div/para is short for
         child::div/child::para
../title is short for 
         parent::node()/child::title
//para   is short for 
         /descendant-or-self::node()/child::para
.//para  is short for 
         self::node()/descendant-or-self::node()/child::para
para[@type="warning"] is short for
         child::para[attribute::type="warning"]
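
Some of these abbreviated paths can be tried directly with Python's standard xml.etree.ElementTree module, which implements a small subset of the abbreviated syntax (it does not accept the unabbreviated axis::test form). The sample document below is an assumption for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<doc><title>T</title>"
    "<div><para type='warning'>w</para><para>x</para></div>"
    "<div><note><para>y</para></note></div></doc>"
)

# div/para: para children of the div children of the context node
print([p.text for p in root.findall("div/para")])     # ['w', 'x']

# .//para: all para descendants of the context node
print([p.text for p in root.findall(".//para")])      # ['w', 'x', 'y']

# para[@type="warning"], evaluated with the first div as context node
div = root.find("div")
print([p.text for p in div.findall("para[@type='warning']")])   # ['w']
```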

A visual representation

The following figure explains in a visual way most of the axes.

Expressions

Parentheses may be used for grouping.
[14]   Expr           ::=    OrExpr
[15]   PrimaryExpr    ::=    VariableReference
                           | '(' Expr ')'     | Literal
                           | Number           | FunctionCall

An argument is converted to type string as if by calling the string function. An argument is converted to type number as if by calling the number function. An argument is converted to type boolean as if by calling the boolean function. An argument that is not of type node-set cannot be converted to a node-set. It is an error if the number or type of arguments is wrong.
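
The conversions between the three non-node-set types might be sketched in Python as follows. The rules used here are those of the XPath 1.0 Recommendation, which is an assumption on my part: the working draft discussed on this page may differ in detail. The function names are hypothetical, and numbers that Python would print in exponent notation are not handled.

```python
import math
import re

# Optional minus sign, digits with optional fraction, surrounding whitespace.
_NUMBER_RE = re.compile(r"\s*-?(?:\d+(?:\.\d*)?|\.\d+)\s*")

def to_boolean(value):
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))   # 0 and NaN are false
    if isinstance(value, str):
        return len(value) > 0        # any non-empty string is true
    raise TypeError("node-sets are not modelled in this sketch")

def to_number(value):
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return float(value) if _NUMBER_RE.fullmatch(value) else math.nan
    raise TypeError("node-sets are not modelled in this sketch")

def to_string(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        x = float(value)
        if math.isnan(x):
            return "NaN"
        if math.isinf(x):
            return "Infinity" if x > 0 else "-Infinity"
        return str(int(x)) if x.is_integer() else repr(x)
    if isinstance(value, str):
        return value
    raise TypeError("node-sets are not modelled in this sketch")

print(to_boolean("false"))   # True: the string is non-empty
print(to_number("abc"))      # nan
print(to_string(3.0))        # 3
```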

[16]   FunctionCall    ::=    FunctionName '(' ( Argument ( ','
                              Argument)*)? ')'
[17]   Argument        ::=    Expr

There are no types of objects that can be converted to node-sets.
[18]   UnionExpr     ::=    PathExpr | UnionExpr '|' PathExpr
[19]   PathExpr      ::=    LocationPath
                          | FilterExpr
                          | FilterExpr '/' RelativeLocationPath
                          | FilterExpr '//' RelativeLocationPath
[20]   FilterExpr    ::=    PrimaryExpr | FilterExpr Predicate

[21]   OrExpr            ::=    AndExpr | OrExpr 'or' AndExpr
[22]   AndExpr           ::=    EqualityExpr
                              | AndExpr 'and' EqualityExpr
[23]   EqualityExpr      ::=    RelationalExpr
                              | EqualityExpr '=' RelationalExpr
                              | EqualityExpr '!=' RelationalExpr
[24]   RelationalExpr    ::=    AdditiveExpr
                              | RelationalExpr '<' AdditiveExpr
                              | RelationalExpr '>' AdditiveExpr
                              | RelationalExpr '<=' AdditiveExpr
                              | RelationalExpr '>=' AdditiveExpr

From the above grammar it follows that the precedence order is (lowest precedence first, highest last):

  1. or
  2. and
  3. =, !=
  4. <=, <, >=, >

The operators are all left associative. For example, 3 > 2 > 1 is equivalent to (3 > 2) > 1, which evaluates to false: 3 > 2 yields true, true is converted to the number 1, and 1 > 1 is false.
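
The left-associative evaluation can be replayed in Python. The helper xpath_gt is an assumption for this sketch; it models only the case where neither operand is a node-set, in which both operands are converted to numbers before comparison. Python itself treats 3 > 2 > 1 as a chained comparison, which gives the opposite answer.

```python
def xpath_gt(a, b):
    """Model of XPath '>' for non-node-set operands: compare as numbers."""
    return float(a) > float(b)      # float(True) is 1.0, float(False) is 0.0

# XPath: 3 > 2 > 1 parses as (3 > 2) > 1
inner = xpath_gt(3, 2)              # True
print(xpath_gt(inner, 1))           # False, because 1 > 1 is false

# Python chains the comparison instead: (3 > 2) and (2 > 1)
print(3 > 2 > 1)                    # True
```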

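Note that when an XPath expression appears in an XML document, for example in an attribute value of an XSLT stylesheet, the < character must be escaped as &lt; (and <= as &lt;=). It may be easier to swap the operands and use > or >= instead.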
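
The effect of the escaping rule can be checked with Python's standard library. In this sketch the element name if and the attribute name test are placeholders, loosely modelled on an XSLT instruction; quoteattr from xml.sax.saxutils does the escaping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

expr = "price < 10"

# An unescaped '<' inside an attribute value is not well-formed XML.
try:
    ET.fromstring('<if test="price < 10"/>')
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)                   # False

# quoteattr escapes the value and adds the surrounding quotes.
attr = quoteattr(expr)
print(attr)                          # "price &lt; 10"

# After parsing, the XPath processor sees the original expression again.
elem = ET.fromstring("<if test=%s/>" % attr)
print(elem.get("test"))              # price < 10

# The inverted form can be written literally, since '>' is allowed as is.
print(ET.fromstring('<if test="10 > price"/>').get("test"))   # 10 > price
```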

[25]   AdditiveExpr          ::=    MultiplicativeExpr
                                  | AdditiveExpr '+' MultiplicativeExpr
                                  | AdditiveExpr '-' MultiplicativeExpr
[26]   MultiplicativeExpr    ::=    UnaryExpr
                                  | MultiplicativeExpr MultiplyOperator
                                    UnaryExpr
                                  | MultiplicativeExpr 'div' UnaryExpr
                                  | MultiplicativeExpr 'mod' UnaryExpr
[27]   UnaryExpr             ::=    UnionExpr
                                  | '-' UnaryExpr

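Since XML allows - in names, the - operator typically needs to be preceded by whitespace. For example, foo-bar is a single name that selects the child elements named foo-bar and so evaluates to a node-set, whereas foo - bar subtracts the number obtained from bar from the number obtained from foo.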

When tokenizing, the longest possible token is always returned.
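
A simplified tokenizer shows how the longest-match rule makes foo-bar a single name. The regular expression below is an assumption for this sketch: it approximates XML names with ASCII-style patterns and ignores the grammar's disambiguation rules (for example, whether * is a wildcard or the multiply operator).

```python
import re

# Within the symbol alternative, two-character tokens are listed first so
# that '<=' is never split into '<' and '='.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+(?:\.\d*)?|\.\d+)
      | (?P<literal>"[^"]*"|'[^']*')
      | (?P<name>[A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?)
      | (?P<symbol>::|//|\.\.|!=|<=|>=|[()\[\].@,/|+\-=<>*$])
    )""", re.VERBOSE)

def tokenize(expr):
    """Split an XPath expression into a list of token strings."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens

print(tokenize("foo-bar"))      # ['foo-bar']
print(tokenize("foo - bar"))    # ['foo', '-', 'bar']
print(tokenize("child::para[position()<=3]"))
# ['child', '::', 'para', '[', 'position', '(', ')', '<=', '3', ']']
```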

[28]   ExprToken            ::=    '(' | ')' | '[' | ']' | '.' | '..' |
                                   '@' | ',' | '::'
                                 | WildcardName
                                 | NodeType
                                 | Operator
                                 | FunctionName
                                 | AxisName
                                 | Literal
                                 | Number
                                 | VariableReference
[29]   Literal              ::=    '"' [^"]* '"'
                                 | "'" [^']* "'"
[30]   Number               ::=    Digits ('.' Digits?)?
                                 | '.' Digits
[31]   Digits               ::=    [0-9]+
[32]   Operator             ::=    OperatorName
                                 | MultiplyOperator
                                 | '/'  | '//' | '|'  | '+' | '-' | '=' 
                                 | '!=' | '<'  | '<=' | '>' | '>='
[33]   OperatorName         ::=    'and' | 'or' | 'mod' | 'div'
[34]   MultiplyOperator     ::=    '*'
[35]   FunctionName         ::=    QName - NodeType
[36]   VariableReference    ::=    '$' QName
[37]   WildcardName         ::=    '*'
                                 | NCName ':' '*'
                                 | QName
[38]   NodeType             ::=    'comment'
                                 | 'text'
                                 | 'processing-instruction'
                                 | 'node'
[39]   ExprWhitespace       ::=    S

XPath core function library (in alphabetical order)

For each function, its return type, its argument(s), and the section of the (August 1999) XPath WD in which it is defined are given.

boolean boolean(object)
boolean function (Section 4.3).
number ceiling(number)
number function (Section 4.4).
string concat(string,string,string*)
string function (Section 4.2).
boolean contains(string,string)
string function (Section 4.2).
number count(node-set)
node-set function (Section 4.1).
boolean false()
boolean function (Section 4.3).
number floor(number)
number function (Section 4.4).
node-set id(object)
node-set function (Section 4.1).
boolean lang(string)
boolean function (Section 4.3).
number last()
node-set function (Section 4.1).
string local-name(node-set?)
node-set function (Section 4.1).
string name(node-set?)
node-set function (Section 4.1).
string namespace-uri(node-set?)
node-set function (Section 4.1).
string normalize(string?)
string function (Section 4.2).
boolean not(boolean)
boolean function (Section 4.3).
number number(object?)
number function (Section 4.4).
number position()
node-set function (Section 4.1).
number round(number)
number function (Section 4.4).
boolean starts-with(string,string)
string function (Section 4.2).
string string(object?)
string function (Section 4.2).
number string-length(string?)
string function (Section 4.2).
string substring(string,number,number?)
string function (Section 4.2).
string substring-after(string,string)
string function (Section 4.2).
string substring-before(string,string)
string function (Section 4.2).
number sum(node-set)
number function (Section 4.4).
string translate(string,string,string)
string function (Section 4.2).
boolean true()
boolean function (Section 4.3).
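
To give a feel for a few of the string functions listed above, here is a minimal Python sketch of their behaviour. The semantics are taken from the XPath 1.0 Recommendation, which is an assumption: the working draft referenced on this page may differ in detail. The Python function names simply mirror the XPath names.

```python
def starts_with(s, prefix):
    return s.startswith(prefix)

def contains(s, sub):
    return sub in s

def substring_before(s, sep):
    """Part of s before the first occurrence of sep, or '' if absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s, sep):
    """Part of s after the first occurrence of sep, or '' if absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def translate(s, src, dst):
    """Replace each character of src by the one at the same index in dst.

    Characters of src with no counterpart in dst are deleted; if a
    character occurs more than once in src, its first occurrence wins.
    """
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:
            table[ch] = dst[i] if i < len(dst) else None   # None = delete
    mapped = (table.get(ch, ch) for ch in s)
    return "".join(ch for ch in mapped if ch is not None)

print(substring_before("1999/04/01", "/"))   # 1999
print(substring_after("1999/04/01", "/"))    # 04/01
print(translate("bar", "abc", "ABC"))        # BAr
print(translate("--aaa--", "abc-", "ABC"))   # AAA
```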

Last updated: September 10th 1999