Intro

This "W3C DOM" section covers the W3C DOM Core (and a smattering of browser proprietary DOMs) that pertains to generic XML documents.

The DOM (Document Object Model) is a tree-based API (Application Programming Interface) to access and modify the content, structure, and style of XML documents. Various programming languages can utilize the DOM. [The alternative to a DOM tree-based API is an event-based API such as SAX.] An application (such as a browser) that implements the DOM (by implementing the DOM interfaces) will allow programmers to use the DOM interfaces within a programming language such as JavaScript, VBScript, etc., to have near total control over all elements in documents. Objects in the DOM have properties, methods, events, sub-objects, and collections. In one sense, DOM is to a document as ADO is to a database.

Ideally the W3C DOM is platform, language, and application (browser) neutral. To facilitate this, the specs for the W3C DOM are defined with OMG IDL, the interface definition language from the CORBA 2.2 specification, which is itself meant to be platform, language, and application (browser) neutral.

The W3C DOM has an architecture divided into modules which cover different domains. The current modules as quoted right from the W3C DOM Activity statement [2004-07-30] are as follows:

  • DOM Core defines a tree-like representation of the document, also referred as the DOM tree, enabling the user to traverse the hierarchy of elements accordingly. Refer to the DOM Range and Traversal modules to manipulate the tree elements and structure defined in the DOM Core.
  • DOM XML extends the Core platform for specific XML 1.0 needs, such as processing instructions, CDATA, and entities.
  • DOM HTML defines a convenient, easy-to-use set of ways to manipulate HTML documents. The initial HTML DOM only describes methods, such as how to access an identifier by name, or a particular link. The HTML DOM is sometimes referred to as DOM Level 0 but has been imported into DOM Level 1.
  • DOM Events defines XML-tree manipulation oriented events with tree mutation and user-oriented events such as mouse, keyboard, and HTML-specific events.
  • DOM Cascading Style Sheets (CSS) defines a set of convenient, easy to use ways to manipulate CSS style sheets or the formatting of documents.
  • DOM Load and Save. Loading an XML document into a DOM tree or saving a DOM tree into an XML document is a fundamental need for the DOM user. This module includes a variety of options controlling load and save operations.
  • DOM Validation defines a set of methods to modify the DOM tree and still make it valid.
  • DOM XPath defines a set of convenient, easy to use functions to query a DOM tree using an XPath 1.0 expression, such as evaluate.

Besides the 2 big DOM extensions (DOM XML and DOM HTML), there are other DOM extensions such as DOM for MathML 2.0, DOM for SMIL Animation, and DOM for SVG 1.0.

The W3C DOM Core represents the document as a hierarchy of 12 Node objects, each of which has more specialized interfaces. Nodes with only a few interfaces will not have a separate page.

  • Node. All the other nodes have the properties and methods of Node.
  • Fundamental Nodes
    1. Document. The entire document, the root node. nodeType = 9. Child nodes: Element (maximum of one), ProcessingInstruction, Comment, DocumentType (maximum of one).
    2. DocumentFragment. A document fragment is owned by the document but has not yet been given a parent node via some method such as insertBefore(), replaceChild(), or appendChild(). A DocumentFragment has no properties or methods of its own. Child nodes: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference.
    3. Element. An element like <p>. Child nodes: Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference.
    4. Attr. An attribute like title="New Stories". Child nodes: Text, EntityReference.
    5. Comment. A comment, i.e. text between <!-- and -->. A Comment has no properties or methods of its own. Child nodes: no children.
    6. Text. The text between tags of an Element (EG: <p>hello</p>) or the value of an Attr (EG: title="old"). Child nodes: no children.
  • Extended Nodes. HTML-only documents don't need these nodes.
    1. DocumentType. Child nodes: no children.
    2. EntityReference. Child nodes: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference.
    3. ProcessingInstruction. Child nodes: no children.
    4. CDATASection. Escapes blocks of text that have characters that would otherwise be considered markup. Everything between <![CDATA[ and ]]>. Child nodes: no children.
    5. Entity. Child nodes: Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference.
    6. Notation. Child nodes: no children.
  • Key Interfaces
    1. NodeList. A collection or array of nodes.
      • Returned by properties and methods such as:
        • node.childNodes (a property, not a method).
        • document.getElementsByTagName(name).
        • document.getElementsByTagNameNS(namespaceURI,localName).
      • Properties and Methods:
        • length. 'The number of nodes in the list. The range of valid child node indices is 0 to length-1 inclusive.'
        • item(n). Returns the nth node.
    2. NamedNodeMap. A collection or array of nodes where each node is named or enumerated.
    3. CharacterData. The CharacterData interface is inherited by Text, Comment, and CDATASection nodes. Properties and methods:
      • data.
      • appendData().
      • deleteData().
      • insertData().
      • replaceData().
      • substringData(). Extracts a range of data from the node.
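
As a rough illustration of the node types listed above, the sketch below maps the numeric nodeType values defined by DOM Level 1 Core to their names, then walks a NodeList using only length and item(n). The names NODE_TYPE_NAMES, nodeTypeName, listChildTypes, and fakeNodeList are hypothetical, and the plain objects standing in for real nodes are an assumption made so the sketch can run outside a browser.

```javascript
// nodeType constants as defined by DOM Level 1 Core (1 through 12).
const NODE_TYPE_NAMES = {
  1: "Element",
  2: "Attr",
  3: "Text",
  4: "CDATASection",
  5: "EntityReference",
  6: "Entity",
  7: "ProcessingInstruction",
  8: "Comment",
  9: "Document",
  10: "DocumentType",
  11: "DocumentFragment",
  12: "Notation"
};

// Hypothetical helper: the name of a node's type, or "Unknown".
function nodeTypeName(node) {
  return NODE_TYPE_NAMES[node.nodeType] || "Unknown";
}

// Hypothetical helper: walk a NodeList using only length and item(n).
function listChildTypes(node) {
  const names = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    names.push(nodeTypeName(node.childNodes.item(i)));
  }
  return names;
}

// Stand-in for a real NodeList (assumption: only length and item are needed).
function fakeNodeList(nodes) {
  return { length: nodes.length, item: (n) => nodes[n] || null };
}

// Stand-in for <p>hello<!-- note --></p>: one Text child, one Comment child.
const sampleP = {
  nodeType: 1,
  childNodes: fakeNodeList([{ nodeType: 3 }, { nodeType: 8 }])
};

console.log(listChildTypes(sampleP));
```

In a browser, the same two helpers should accept a real node, for example listChildTypes(document.body).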

DOM Level 1 methods are namespace ignorant while DOM Level 2 methods are namespace aware. A namespace is declared with an xmlns:prefix attribute on an element (or a plain xmlns attribute for the default namespace), and the prefix is then in scope for that element and its descendants. EG: For <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">, the namespace node refers to the xmlns:xsl="http://www.w3.org/1999/XSL/Transform" part of the element. Namespace nodes may be present in some documents.
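
To make "in scope" concrete, here is a minimal sketch of how a prefix might be resolved: start at an element and climb through parentNode until an xmlns:prefix declaration turns up. The function name resolvePrefix and the plain objects standing in for elements (with attributes held in an ordinary object) are assumptions for illustration, not part of any DOM interface.

```javascript
// Hypothetical helper: find the namespace URI bound to a prefix by
// checking the element and then each ancestor for xmlns:<prefix>.
function resolvePrefix(element, prefix) {
  const attrName = "xmlns:" + prefix;
  for (let node = element; node; node = node.parentNode) {
    if (node.attributes && attrName in node.attributes) {
      return node.attributes[attrName];
    }
  }
  return null; // the prefix is not in scope
}

// Stand-in (assumption) for:
// <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
//   <xsl:template match="/"/>
// </xsl:stylesheet>
const stylesheetEl = {
  parentNode: null,
  attributes: {
    version: "1.0",
    "xmlns:xsl": "http://www.w3.org/1999/XSL/Transform"
  }
};
const templateEl = { parentNode: stylesheetEl, attributes: { match: "/" } };

console.log(resolvePrefix(templateEl, "xsl")); // declared on the parent
console.log(resolvePrefix(templateEl, "svg")); // never declared
```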

Basic Usage

Most DOM work will involve Element nodes, Attr nodes, and Text nodes.

Navigating the DOM

The tree-like or forest-like structure of a document refers to its elements (and not its attributes). The elements of a tree are accessed by "tree-walking" methods. Much DOM navigation can be accomplished using the family-tree concept. It's an asexual species because there is always just one parent.

// Start from the element with id="You"
var myE = document.getElementById("You");

myE.parentNode;
myE.previousSibling;
myE.nextSibling;
// The next 3 lines are equivalent: each returns the 1st child
myE.firstChild;
myE.childNodes[0];
myE.childNodes.item(0);
// 2nd and 3rd children
myE.childNodes.item(1);
myE.childNodes.item(2);
// Last child
myE.lastChild;
// 1st grandchild
myE.firstChild.firstChild;
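
The family-tree properties are enough to visit every node. Below is a minimal sketch of a depth-first walk that uses only firstChild and nextSibling. The names walkTree and makeNode are hypothetical, and makeNode is a stand-in (an assumption) that wires up the links a real DOM implementation maintains automatically, so the sketch can run where no browser document is available.

```javascript
// Hypothetical helper: visit a node and all its descendants in document order.
function walkTree(node, visit) {
  visit(node);
  for (let child = node.firstChild; child; child = child.nextSibling) {
    walkTree(child, visit);
  }
}

// Stand-in node builder (assumption): sets up the family-tree links by hand.
function makeNode(nodeName, children = []) {
  const node = {
    nodeName,
    parentNode: null,
    firstChild: children[0] || null,
    lastChild: children[children.length - 1] || null,
    previousSibling: null,
    nextSibling: null
  };
  children.forEach((child, i) => {
    child.parentNode = node;
    child.previousSibling = children[i - 1] || null;
    child.nextSibling = children[i + 1] || null;
  });
  return node;
}

// Stand-in for <div><p><em/></p><ul><li/><li/></ul></div>.
const treeRoot = makeNode("div", [
  makeNode("p", [makeNode("em")]),
  makeNode("ul", [makeNode("li"), makeNode("li")])
]);

const visited = [];
walkTree(treeRoot, (n) => visited.push(n.nodeName));
console.log(visited.join(" "));
```

With a real document, walkTree(document.documentElement, ...) should behave the same way, although text and comment nodes would be visited as well.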

Instead of tree-walking, there are more direct means of getting to particular elements. EGs:

  • node.ownerDocument allows direct navigation from any node in the DOM to the root node.
  • document.getElementById('myP'). Returns a particular paragraph with id="myP".
  • document.getElementsByTagName('p')[3]. Returns the 4th paragraph element.

Modifying Nodes

Change the Text of an Element:

var myP = document.getElementById('myP');
// Append a new Text node to myP
var newT = document.createTextNode('Text for myP');
myP.appendChild(newT);
// Swap that Text node for another one
var newT2 = document.createTextNode('Text2 for myP');
myP.replaceChild(newT2, newT);
// Or set the value of an existing Text node directly
// (assumes the first child of myP is a Text node)
myP.childNodes[0].nodeValue = "myP now says this";
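
The steps above can be folded into one reusable routine: remove whatever children the element has, then append a single new Text node. This is only a sketch. setElementText is a hypothetical helper, and the StubNode class and stubDocument object are stand-ins (assumptions) that mimic just enough of firstChild, appendChild, removeChild, and createTextNode for the sketch to run without a browser.

```javascript
// Hypothetical helper: make `text` the only content of `element`.
// `doc` is the owning document, e.g. `document` in a browser.
function setElementText(doc, element, text) {
  while (element.firstChild) {
    element.removeChild(element.firstChild);
  }
  element.appendChild(doc.createTextNode(text));
}

// Minimal stand-in for a DOM node (assumption, not a real implementation).
class StubNode {
  constructor(nodeType, nodeValue = null) {
    this.nodeType = nodeType;
    this.nodeValue = nodeValue;
    this.childNodes = [];
  }
  get firstChild() {
    return this.childNodes[0] || null;
  }
  appendChild(child) {
    this.childNodes.push(child);
    return child;
  }
  removeChild(child) {
    const i = this.childNodes.indexOf(child);
    if (i === -1) throw new Error("Not a child of this node");
    this.childNodes.splice(i, 1);
    return child;
  }
}

// Stand-in for the document's Text node factory.
const stubDocument = {
  createTextNode: (data) => new StubNode(3, data)
};

// Stand-in paragraph that starts out with two Text children.
const stubP = new StubNode(1);
stubP.appendChild(stubDocument.createTextNode("Text for myP"));
stubP.appendChild(stubDocument.createTextNode(" and more"));

setElementText(stubDocument, stubP, "myP now says this");
console.log(stubP.childNodes.length, stubP.firstChild.nodeValue);
```

In a browser the call would look like setElementText(document, myP, "myP now says this").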

Fiddle with the Attributes of Elements:

// Add or change an attribute
myP.setAttribute("align", "center");
// Read an attribute
myP.getAttribute('align');
// Remove an attribute
myP.removeAttribute('align');
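
When several attributes change at once, the three calls above can be wrapped in a small loop. The sketch below is one possible shape for that. applyAttributes is a hypothetical helper in which a null value means "remove this attribute", and makeStubElement is a stand-in (an assumption) that stores attributes in a Map so the sketch runs without a browser.

```javascript
// Hypothetical helper: set every attribute named in `attrs`;
// a null value means "remove this attribute".
function applyAttributes(element, attrs) {
  for (const [name, value] of Object.entries(attrs)) {
    if (value === null) {
      element.removeAttribute(name);
    } else {
      element.setAttribute(name, String(value));
    }
  }
}

// Stand-in element (assumption): keeps its attributes in a Map.
function makeStubElement() {
  const store = new Map();
  return {
    setAttribute: (name, value) => { store.set(name, value); },
    getAttribute: (name) => (store.has(name) ? store.get(name) : null),
    hasAttribute: (name) => store.has(name),
    removeAttribute: (name) => { store.delete(name); }
  };
}

const stubPara = makeStubElement();
applyAttributes(stubPara, { align: "center", title: "New Stories" });
applyAttributes(stubPara, { align: null }); // removes align again
console.log(stubPara.hasAttribute("align"), stubPara.getAttribute("title"));
```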

Make, insert, delete, etc. Elements:

// Make a new Element
var newP = document.createElement('p');
// Add the new Element as a child
myP.appendChild(newP);
// Remove the new Element
myP.removeChild(newP);
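
appendChild always adds at the end. To place a new element somewhere else, insertBefore() can be used, and a common idiom builds an "insert after" on top of it. The sketch below shows that idiom. insertAfter is a hypothetical helper, and the StubElement class is a simplified stand-in (an assumption) so the sketch runs without a browser; unlike a real DOM it does not detach a node from a previous parent.

```javascript
// Hypothetical helper: insert newNode right after refNode.
// insertBefore with a null reference node appends at the end,
// so this also works when refNode is the last child.
function insertAfter(newNode, refNode) {
  return refNode.parentNode.insertBefore(newNode, refNode.nextSibling);
}

// Minimal stand-in for an element node (assumption).
class StubElement {
  constructor(tagName) {
    this.tagName = tagName;
    this.parentNode = null;
    this.childNodes = [];
  }
  get nextSibling() {
    if (!this.parentNode) return null;
    const siblings = this.parentNode.childNodes;
    return siblings[siblings.indexOf(this) + 1] || null;
  }
  insertBefore(newChild, refChild) {
    const i = refChild === null
      ? this.childNodes.length
      : this.childNodes.indexOf(refChild);
    if (i === -1) throw new Error("Reference node is not a child");
    this.childNodes.splice(i, 0, newChild);
    newChild.parentNode = this;
    return newChild;
  }
  appendChild(newChild) {
    return this.insertBefore(newChild, null);
  }
}

const stubBody = new StubElement("body");
const firstP = stubBody.appendChild(new StubElement("p"));
const lastP = stubBody.appendChild(new StubElement("p"));

insertAfter(new StubElement("hr"), firstP); // lands between the two <p>
insertAfter(new StubElement("div"), lastP); // lands at the end

console.log(stubBody.childNodes.map((n) => n.tagName).join(" "));
```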


GeorgeHernandez.com. Some rights reserved.