## Intro

XPath is a W3C standard (XML Path Language (XPath) Version 1.0 [§]) for addressing parts of an XML document. It is an expression language, used by host languages such as XSLT, for selecting nodes in a document and computing values from them.

An XML document can be expressed as a hierarchy, or tree, of nodes. XPath 1.0 distinguishes the following node types:

• The root node
• Elements
• Attributes
• Processing Instructions
• Comments
• Textual Content
• Namespaces

XPath expressions (aka patterns) are inclusive, or greedy: they match everything that applies, so you have to restrict what they grab with predicates.

An XPath expression usually selects nodes with a location path, which consists of one or more location steps separated by / delimiters (EG: customers/customer[sex='F']/name[@title='Dr.']/*). This is analogous to how a file in a directory system may be specified. The analogy even covers relative versus absolute paths. EG: "subdir" is relative to the current location, but "/subdir" is relative to the root directory or root node.

Location paths can be compound, i.e. unioned. EG: LocationPath1 | LocationPath2.
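
The location path from the text can be tried out with a short sketch. It assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath (no | operator, no explicit axes), and a made-up customers document shaped to fit the example path. Because the starting point is the customers element itself, the relative path begins at customer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped to fit the path used in the text.
customers = ET.fromstring("""
<customers>
  <customer><sex>F</sex><name title="Dr."><first>Ann</first><last>Lee</last></name></customer>
  <customer><sex>M</sex><name title="Dr."><first>Bob</first><last>Ray</last></name></customer>
  <customer><sex>F</sex><name title="Ms."><first>Cy</first><last>Orr</last></name></customer>
</customers>
""")

# Three location steps separated by "/": customer, name, and *.
# The predicates in square brackets narrow each step.
path = "customer[sex='F']/name[@title='Dr.']/*"
matches = [(e.tag, e.text) for e in customers.findall(path)]
print(matches)

# ElementTree has no "|" operator. Concatenating two result lists only
# approximates a union: a real union removes duplicates and returns the
# nodes in document order.
firsts_and_lasts = (customers.findall("customer/name/first")
                    + customers.findall("customer/name/last"))
print(len(firsts_and_lasts))
```

Only the first customer is both female and titled Dr., so the path yields her first and last name elements.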

A location step has this syntax: Axis::NodeTest[Predicate].

• Axis. Optional. An axis indicates the direction (up, down, or sideways) in which to look, starting from the context node, for the nodes specified by the NodeTest. Note that files on hard drives are usually only specified going down. Here are the 13 axes:
1. child::. The default axis. A given node may have 0 or more child nodes. The root node and element nodes have children but comments, attributes, processing instructions, and namespaces do not.
• EG: child::x and x are equivalent. Matches all x elements in the current context.
• EG: a/child::b and a/b are equivalent. Matches all b elements that have an a element as a parent or matches all b elements that are children of any a element.
• EG: /. Matches the root node.
• EG: *. Matches any element that is a child of the context node.
• EG: a/*/c. Matches any c elements that are grandchildren of any a elements.
2. parent::. A node has at most 1 parent (the root node has none).
• EG: parent::node() and .. are equivalent.
3. descendant-or-self::. The current node and its descendants.
• EG: a/descendant-or-self::node()/c and a//c are equivalent. Matches all c elements that are descendants of an a element, i.e. all c elements that have an a element as an ancestor.
• EG: //x matches any x elements in the document.
4. ancestor-or-self::. The current node and its ancestors.
5. descendant::. Descendants of the current node.
6. ancestor::. Ancestors of the current node.
• EG: Assume a tree has branches with fruit.
<xsl:template match="fruit">
Fruit <xsl:apply-templates select="name"/> belongs to branch
<xsl:apply-templates select=".."/> of tree
<xsl:apply-templates select="ancestor::tree/branch"/>.
</xsl:template>
7. following::. Any node which appears in the document after the context node, except the context node's descendants, attribute nodes, and namespace nodes.
8. preceding::. Any node which appears in the document before the context node, except ancestors, attribute nodes, and namespace nodes.
9. following-sibling::. Any node which has the same parent as the context node and appears after it in the document. Empty if the context node is an attribute or namespace node.
10. preceding-sibling::. Any node which has the same parent as the context node and appears before it in the document. Empty if the context node is an attribute or namespace node.
11. self::. The current node.
• EG: self::node() and . are equivalent.
• EG: .//x matches any x elements that are descendants of the current context.
• EG: *[position()=1 and self::x] matches any x element that is the first child of its parent.
12. attribute::. Attribute nodes of the current node. Shorthand is @.
• EG: attribute::FName and @FName are equivalent. This matches FName attributes and does not match every element that has an FName attribute.
• EG: *[@name="test"]. Matches any element whose name attribute has a value of test.
13. namespace::. Namespace nodes of the current node.
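• EG: namespace::my selects the namespace node for the prefix my that is in scope on the context node. This is not the same as writing the prefix my: in a name test.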
(Graphic of the axes, from CraneSoftwrights.com.)
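• NodeTest. Required. Specifies which nodes along the axis are selected. There are several options:
• Node Name Test. The name test can take one of 3 forms:
• *. Any node of the axis's principal node type: attributes on the attribute axis, namespaces on the namespace axis, and elements on every other axis.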
• EG: attribute::* and @* are equivalent. Matches all attributes.
• QName. A qualified name of a node, i.e. namespace and node name.
• EG: myNamespace:myElement.
• NCName. The regularly named XML node's name. It does not contain a colon (:). An NCName (non-colonized name) begins with either a letter or an underscore (_) character, followed by any combination of letters, digits, accents, diacritical marks, periods (.), hyphens (-), and underscores (_).
• EG: myElement.
• id(id). An element with the specified id. Strictly speaking, id() is a function call that can begin a path rather than a node test.
• EG: id("mainContent").
• Node Type Test. Available node types:
• comment().
• text().
• processing-instruction().
• processing-instruction("target").
• node(). Any node of any type. On the default child axis this never yields attribute nodes or the root node, because neither is a child of another node.
• Predicate. Optional. Specifies conditions that a node must meet in order to be selected by the XPath expression. A predicate is basically a filter. EGs:
• x[position()=1] and x[1] are equivalent. Matches the first x element.
• x[position()=last()]. Matches the last x element.
• [@ProductID &lt;= "432"]. Filters for elements whose ProductID attribute, converted to a number, is <= 432. The < is written as &lt; because the expression sits inside XML.
• [a][b] is the same as [a and b] when neither predicate depends on position. All elements that have at least 1 <a> and 1 <b> child element. The and form may be faster than the chained form, and may go faster still if the cheaper clause is placed first, but this depends on the XPath processor.
• book[@author="mike"][1] and book[1][@author="mike"] may not produce the same results because multiple predicates are applied in left to right order. The first selects the first of the books written by mike; the second selects the first book, and only if mike wrote it.
• items/item[position()>1] matches any item element that has an items parent and that is not the first item child of its parent.
• item[position() mod 2 = 1] would be true for any item element that is an odd-numbered item child of its parent.
• div[@class="appendix"]//p matches any p element with a div ancestor element that has a class attribute with value appendix.
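
A few of the abbreviated steps and predicates above can be sketched in Python. This assumes the standard xml.etree.ElementTree module, which supports only the abbreviated child, self, parent, and descendant steps plus simple predicates, and a made-up tree/branch/fruit document. The left-to-right predicate rule is modelled with plain lists, because ElementTree's handling of position predicates is not a full XPath 1.0 implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a tree with branches that carry fruit.
tree = ET.fromstring("""
<tree>
  <branch name="low">
    <fruit kind="apple"/>
    <fruit kind="pear"/>
    <fruit kind="plum"/>
  </branch>
  <branch name="high">
    <fruit kind="fig"/>
  </branch>
</tree>
""")

def kinds(elems):
    """Return the kind attribute of each element, for easy printing."""
    return [e.get("kind") for e in elems]

all_fruit = kinds(tree.findall(".//fruit"))                  # descendants of the context
high_fruit = kinds(tree.findall("branch[@name='high']/fruit"))  # attribute predicate
first_fruit = kinds(tree.findall("branch/fruit[1]"))         # first fruit of each branch
last_fruit = kinds(tree.findall("branch/fruit[last()]"))     # last fruit of each branch
fig_branch = tree.findall(".//fruit[@kind='fig']/..")        # ".." steps up to the parent

print(all_fruit, high_fruit, first_fruit, last_fruit, fig_branch[0].get("name"))

# Predicate order, modelled with plain Python lists (hypothetical data).
books = [{"author": "ann"}, {"author": "mike"}, {"author": "mike"}]

# book[@author="mike"][1]: filter by author, then keep the first survivor.
filter_then_first = [b for b in books if b["author"] == "mike"][:1]

# book[1][@author="mike"]: keep the first book, then filter by author.
first_then_filter = [b for b in books[:1] if b["author"] == "mike"]

print(len(filter_then_first), len(first_then_filter))
```

With this data the first predicate order finds one book and the second finds none, because the first book in the list was not written by mike.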

The Microsoft MSXML Parser can access XML via the DOM API and via XPath. EG:

Set nodeList = rootNode.selectNodes("hamburger[@lowfat='yes']/price")
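
For comparison, roughly the same query can be sketched outside MSXML. This assumes Python's standard xml.etree.ElementTree module, whose findall() plays the part of selectNodes for a subset of XPath, and a made-up menu document.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu data; root_node stands in for rootNode above.
root_node = ET.fromstring("""
<menu>
  <hamburger lowfat="yes"><price>4.50</price></hamburger>
  <hamburger lowfat="no"><price>3.25</price></hamburger>
  <hamburger lowfat="yes"><price>5.00</price></hamburger>
</menu>
""")

# Same expression as the MSXML call: price children of low-fat hamburgers.
node_list = root_node.findall("hamburger[@lowfat='yes']/price")
prices = [node.text for node in node_list]
print(prices)
```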


## XPath Reserved Words

Operators and special characters.

| Operator | Description |
| --- | --- |
| / | Child operator; selects immediate children of the left-side collection. When this path operator appears at the start of the pattern, it indicates that children should be selected from the root node. |
| // | Recursive descent; searches for the specified element at any depth. When this path operator appears at the start of the pattern, it indicates recursive descent from the root node. |
| . | Indicates the current context. |
| .. | The parent of the current context node. |
| * | Wildcard; selects all elements regardless of the element name. |
| @ | Attribute; prefix for an attribute name. |
| @* | Attribute wildcard; selects all attributes regardless of name. |
| : | Namespace separator; separates the namespace prefix from the element or attribute name. |
| ( ) | Groups operations to explicitly establish precedence. |
| [ ] | Applies a filter pattern. |
| [ ] | Subscript operator; used for indexing within a collection. |
| + | Performs addition. |
| - | Performs subtraction. |
| div | Performs floating-point division according to IEEE 754. |
| * | Performs multiplication. |
| mod | Returns the remainder from a truncating division. |

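Boolean, comparison, and set symbols. Note that when an XPath expression is embedded in XML (EG: in an attribute of an XSLT style sheet), the < operator must be escaped as &lt;. Escaping > as &gt; is optional but common.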

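| Symbol | Description |
| --- | --- |
| and | Logical and |
| or | Logical or |
| not() | Negation |
| = | Equality |
| != | Not equal |
| < | Less than |
| <= | Less than or equal |
| > | Greater than |
| >= | Greater than or equal |
| \| | Set operation; returns the union of two sets of nodes |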
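
The escaping rule comes from XML rather than from XPath itself, which a short sketch can show. It assumes Python's standard xml.etree.ElementTree parser and a made-up filter element with a test attribute holding the expression.

```python
import xml.etree.ElementTree as ET

# Escaped "<": the parser hands back the plain XPath text.
escaped = ET.fromstring('<filter test="price &lt; 50"/>')
xpath_text = escaped.get("test")

# A raw "<" inside an attribute value is not well-formed XML.
try:
    ET.fromstring('<filter test="price < 50"/>')
    raw_lt_accepted = True
except ET.ParseError:
    raw_lt_accepted = False

# A raw ">" inside an attribute value is accepted by the parser.
raw_gt_text = ET.fromstring('<filter test="price > 50"/>').get("test")

print(xpath_text, raw_lt_accepted, raw_gt_text)
```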

Precedence.

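Listed from highest precedence (binds tightest) to lowest, following the XPath 1.0 grammar:

| Precedence | Operators | Description |
| --- | --- | --- |
| 1 | ( ) | Grouping |
| 2 | [ ] | Filters |
| 3 | / // | Path operations |
| 4 | \| | Union |
| 5 | - (unary) | Negation of a number |
| 6 | * div mod | Multiplication, division, remainder |
| 7 | + - | Addition, subtraction |
| 8 | < <= > >= | Comparisons |
| 9 | = != | Comparisons |
| 10 | and | Boolean and |
| 11 | or | Boolean or |

not() is a function rather than an operator, so its argument is evaluated first, as with any function call.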

## XPath Functions

There are four categories of functions available in XPath: Node-Set, String, Boolean, and Numeric functions. Note the following is largely borrowed from the MS site.

Parameter Legend:

• ns = node-set.
• str = string, num = number, obj = object of any type.
• ? = Optional.
• * = zero or more, comma-delimited.
Node-Set
last() Returns the number of the last node in the currently selected node-set
position() Returns the number of the current node in the selected node-set. Note: [position()=someNumber] as an expression's predicate can be abbreviated as [someNumber]. EG: [position()=3] can be shortened to simply [3].
count(ns) Returns the number of nodes in the node-set passed to the function.
id(obj) Returns the node with the ID-type attribute whose value equals that of obj. EG: id("mainContent").
local-name(ns?) Returns the "local name" of the first node in the argument, or of the context node if the argument is omitted. The local name is the node's name with any namespace prefix omitted.
namespace-uri(ns?) Returns the URI associated with the namespace of the first node in the argument, if there is one.
name(ns?) Returns the "full name" (QName) of the first node in the argument, including its namespace prefix (if any).
String
string(obj?) Converts the argument to a string value, which is then returned from the function. If the argument is omitted, the function operates on the context node.
concat(str, str, str*) Concatenates the various strings passed to it into a single string, which is returned from the function.
starts-with(str, str) Returns true if the first argument starts with the second, otherwise false.
contains(str, str) Returns true if the first argument contains the second, otherwise false.
substring(str, num, num?) Extracts a portion of the first argument, starting with the position supplied by the second argument (the first character is position 1), for a length of however many characters are in the third argument (if there is one). If the third argument is omitted, the function simply returns all characters in the first argument, starting at the position supplied by the second.
substring-before(str, str) Returns the portion of the first argument that precedes the value of the second argument.
substring-after(str, str) Returns the portion of the first argument that follows the value of the second argument.
string-length(str?) Returns the number of characters in the argument. If the argument is omitted, returns the number of characters in the string value of the context node.
normalize-space(str?) Examines the argument and strips out leading and trailing white space in it; also removes extraneous white space within the argument by replacing two or more occurrences of white space with a single space. The value returned by the function is this "stripped" string. If the argument is omitted, the function operates on the string value of the context node.
translate(str, str, str) Returns the first argument, replacing each occurrence of a character that matches one of the characters in the second argument with the character in the corresponding position in the third argument. Can be used for things like changing case.
Boolean
boolean(obj) Used primarily to test whether or not something "exists." If obj is a node-set, the function returns true if and only if the node-set is not empty; if a string, if and only if the string's length is greater than 0; and if a number, if and only if it is non-zero and a valid number. In all other cases it returns false.
not(boolean) Returns true if the argument passed to it is false, or false if the argument passed to it is true.
true() Simply returns the value true.
false() Simply returns the value false.
lang(str) Returns true or false, depending on whether or not the context node has the xml:lang value specified in str.
Numeric
number(obj?) Converts the argument to a number and returns the result. If the argument can't be converted, the function returns the numeric value NaN ("not a number"). If no argument is passed, the function operates on the context node.
sum(ns) Returns the sum of all nodes in the argument. If you want to simply add numeric values that aren't associated with a node-set, use the + sign.
floor(num) Returns the largest integer that is less than or equal to the argument.
ceiling(num) Returns the smallest integer that is greater than or equal to the argument.
round(num) Rounds the argument's value up or down to the nearest integer. 1.5 always rounds up to 2, and -1.5 always rounds up to -1. Accurate rounding is dependent on hardware and operating-system considerations; you should test a wide range of possible values to be sure that this function returns the results you want in your application.
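
The behavior of a few of these functions can be modelled in plain Python. This is a sketch of the semantics described above, not an XPath engine: the xpath_* names are made up for illustration, substring is modelled for integer arguments only, and round for finite numbers only.

```python
import math
import re

def xpath_substring(s, start, length=None):
    """Model of substring() for integer arguments; positions are 1-based."""
    first = max(start, 1)                                   # positions below 1 do not exist
    end = len(s) + 1 if length is None else start + length  # exclusive, 1-based
    return s[first - 1:max(end, first) - 1]

def xpath_translate(s, src, dst):
    """Model of translate(): map characters of src to dst by position."""
    table = {}
    for i, ch in enumerate(src):
        # The first occurrence of a character in src wins; characters with
        # no counterpart in dst are deleted from the result.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)

def xpath_normalize_space(s):
    """Model of normalize-space(): trim, then collapse runs of white space."""
    return " ".join(part for part in re.split(r"[ \t\r\n]+", s) if part)

def xpath_round(x):
    """Model of round() for finite numbers: ties go toward positive infinity."""
    return math.floor(x + 0.5)

lower = "abcdefghijklmnopqrstuvwxyz"
upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(xpath_substring("12345", 2, 3))           # characters at positions 2 to 4
print(xpath_translate("xpath", lower, upper))   # changing case
print(xpath_normalize_space("  a   b \n c "))
print(xpath_round(1.5), xpath_round(-1.5))
```

Note that Python's built-in round() sends ties to the nearest even integer, which is why the sketch uses floor(x + 0.5) instead.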

MSXML provides additional XPath functions that are proprietary to MSXML. To use them, the root element of the XSL style sheet has to declare the msxsl namespace (the xmlns:msxsl line below):

<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
>

## EGs

Here is an example XML file and XPath that might be used on it. Both are largely borrowed from MS.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
<publication>Selected Short Stories of
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</publication>
</author>
<editor>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</editor>
<price>55</price>
</book>
<magazine style="glossy" frequency="monthly">
<price>2.50</price>
<subscription price="24" per="year"/>
</magazine>
<book style="novel" id="myfave">
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
<degree from="Trenton U">B.A.</degree>
<degree from="Harvard">Ph.D.</degree>
<award>Pulitzer</award>
<publication>Still in Trenton</publication>
<publication>Trenton Forever</publication>
</author>
<excerpt>
<p>It was a dark and stormy night.</p>
<p>But then all nights in Trenton seem dark and
stormy to someone who has gone through what
<emph>I</emph> have.</p>
<definition-list>
<term>Trenton</term>
<definition>misery</definition>
</definition-list>
</excerpt>
</book>
<my:book xmlns:my="uri:mynamespace" style="leather" price="29.50">
<my:title>Who's Who in Trenton</my:title>
<my:author>Robert Bob</my:author>
</my:book>
</bookstore>
| Expression | Refers to |
| --- | --- |
| ./author | All <author> elements within the current context. Note that this is equivalent to the expression in the next row. |
| author | All <author> elements within the current context. |
| first-name | All <first-name> elements within the current context. |
| /bookstore | The document element (<bookstore>) of this document. |
| //author | All <author> elements in the document. |
| book[/bookstore/@specialty = @style] | All <book> elements whose style attribute value is equal to the specialty attribute value of the <bookstore> element at the root of the document. |
| author/first-name | All <first-name> elements that are children of an <author> element. |
| bookstore//title | All <title> elements one or more levels deep in the <bookstore> element (arbitrary descendants). Note that this is different from the expression in the next row. |
| bookstore/*/title | All <title> elements that are grandchildren of <bookstore> elements. |
| bookstore//book/excerpt//emph | All <emph> elements anywhere inside <excerpt> children of <book> elements, anywhere inside the <bookstore> element. |
| .//title | All <title> elements one or more levels deep in the current context. Note that this situation is essentially the only one in which the period notation is required. |
| author/* | All elements that are the children of <author> elements. |
| book/*/last-name | All <last-name> elements that are grandchildren of <book> elements. |
| */* | All grandchildren elements of the current context. |
| *[@specialty] | All elements with the specialty attribute. |
| @style | The style attribute of the current context. |
| price/@exchange | The exchange attribute on <price> elements within the current context. |
| price/@exchange/total | Returns an empty node set, because attributes do not contain element children. This expression is allowed by the XML Path Language (XPath) grammar, but is not strictly valid. |
| book[@style] | All <book> elements with style attributes, of the current context. |
| book/@style | The style attribute for all <book> elements of the current context. |
| @* | All attributes of the current element context. |
| ./first-name | All <first-name> elements in the current context node. Note that this is equivalent to the expression in the next row. |
| first-name | All <first-name> elements in the current context node. |
| author[1] | The first <author> element in the current context node. |
| author[first-name][3] | The third <author> element that has a <first-name> child. |
| my:book | The <book> element from the my namespace. |
| my:* | All elements from the my namespace. |
| @my:* | All attributes from the my namespace (this does not include unqualified attributes on elements from the my namespace). |
| book[last()] | The last <book> element of the current context node. |
| book/author[last()] | The last <author> child of each <book> element of the current context node. |
| (book/author)[last()] | The last <author> element from the entire set of <author> children of <book> elements of the current context node. |
| book[excerpt] | All <book> elements that contain at least one <excerpt> element child. |
| book[excerpt]/title | All <title> elements that are children of <book> elements that also contain at least one <excerpt> element child. |
| book[excerpt]/author[degree] | All <author> elements that contain at least one <degree> element child, and that are children of <book> elements that also contain at least one <excerpt> element. |
| book[author/degree] | All <book> elements that contain <author> children that in turn contain at least one <degree> child. |
| author[degree][award] | All <author> elements that contain at least one <degree> element child and at least one <award> element child. |
| author[degree and award] | All <author> elements that contain at least one <degree> element child and at least one <award> element child. |
| author[(degree or award) and publication] | All <author> elements that contain at least one <degree> or <award> child and at least one <publication> child. |
| author[degree and not(publication)] | All <author> elements that contain at least one <degree> element child and that contain no <publication> element children. |
| author[not(degree or award) and publication] | All <author> elements that contain at least one <publication> element child and contain neither <degree> nor <award> element children. |
| author[last-name = "Bob"] | All <author> elements that contain at least one <last-name> element child with the value Bob. |
| author[last-name[1] = "Bob"] | All <author> elements where the first <last-name> child element has the value Bob. Note that this is equivalent to the expression in the next row. |
| author[last-name [position()=1]= "Bob"] | All <author> elements where the first <last-name> child element has the value Bob. |
| degree[@from != "Harvard"] | All <degree> elements where the from attribute is not equal to "Harvard". |
| author[. = "Matthew Bob"] | All <author> elements whose value is Matthew Bob. |
| author[last-name = "Bob" and ../price > 50] | All <author> elements that contain a <last-name> child element whose value is Bob, and a <price> sibling element whose value is greater than 50. |
| book[position() <= 3] | The first three books (1, 2, 3). |
| author[not(last-name = "Bob")] | All <author> elements that do not contain <last-name> child elements with the value Bob. |
| author[first-name = "Bob"] | All <author> elements that have at least one <first-name> child with the value Bob. |
| author[* = "Bob"] | All <author> elements containing any child element whose value is Bob. |
| author[last-name = "Bob" and first-name = "Joe"] | All <author> elements that have a <last-name> child element with the value Bob and a <first-name> child element with the value Joe. |
| price[@intl = "Canada"] | All <price> elements in the context node which have an intl attribute equal to "Canada". |
| degree[position() < 3] | The first two <degree> elements that are children of the context node. |
| p/text()[2] | The second text node in each <p> element in the context node. |
| ancestor::book[1] | The nearest <book> ancestor of the context node. |
| ancestor::book[author][1] | The nearest <book> ancestor of the context node that has an <author> element as its child. |
| ancestor::author[parent::book][1] | The nearest <author> ancestor of the context node that is a child of a <book> element. |
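
Some of the expressions above can be run as a sketch against a trimmed copy of the bookstore document. This assumes Python's standard xml.etree.ElementTree module, which handles only a subset of XPath, so attribute values are read with get() instead of a trailing /@style step, and the my prefix is supplied through a namespace mapping. The context node is the <bookstore> element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document from this section.
store = ET.fromstring("""
<bookstore specialty="novel">
  <book style="autobiography">
    <author><first-name>Joe</first-name><last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award></author>
    <price>12</price>
  </book>
  <book style="textbook">
    <author><first-name>Mary</first-name><last-name>Bob</last-name></author>
    <price>55</price>
  </book>
  <book style="novel" id="myfave">
    <author><first-name>Toni</first-name><last-name>Bob</last-name>
      <degree from="Trenton U">B.A.</degree><degree from="Harvard">Ph.D.</degree>
      <award>Pulitzer</award></author>
    <excerpt><p>It was a dark and stormy night.</p></excerpt>
  </book>
  <my:book xmlns:my="uri:mynamespace" style="leather" price="29.50">
    <my:title>Who's Who in Trenton</my:title>
  </my:book>
</bookstore>
""")
ns = {"my": "uri:mynamespace"}  # prefix-to-URI mapping for my:book

# book/@style, with the attribute step done in Python.
styles = [b.get("style") for b in store.findall("book")]
# book[excerpt]
with_excerpt = [b.get("style") for b in store.findall("book[excerpt]")]
# book[last()]
last_book = store.findall("book[last()]")[0].get("id")
# .//first-name
first_names = [e.text for e in store.findall(".//first-name")]
# book/author[degree]
degree_authors = [a.findtext("first-name") for a in store.findall("book/author[degree]")]
# book/author[last-name = "Bob"]
bob_count = len(store.findall("book/author[last-name='Bob']"))
# my:book
my_styles = [b.get("style") for b in store.findall("my:book", ns)]

print(styles, with_excerpt, last_book, first_names, degree_authors, bob_count, my_styles)
```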
