XML files can serve a variety of purposes, including data storage. Before JSON became popular, XML was the preferred format for representing, storing, and transporting structured data.

Even though the popularity of XML has waned in recent years, you may encounter it occasionally, so it’s important to learn how to work with it. Find out how to use the DOM API to read and write XML files with Java.

Requirements for Processing XML in Java

The Java Standard Edition (SE) includes the Java API for XML Processing (JAXP), which is an umbrella term covering most aspects of XML processing. These include:

  • DOM: The Document Object Model includes classes for working with XML objects such as elements, nodes, and attributes. The DOM API loads the complete XML document into memory for processing, so it’s not well suited for large XML files.
  • SAX: The Simple API for XML is an event-driven API for reading XML. It fires events in response to the XML content that it finds as it parses a file. The memory footprint of this method is low, but working with the API is more difficult than working with the DOM.
  • StAX: The Streaming API for XML is a more recent addition than DOM and SAX. It provides high-performance stream filtering, processing, and modification of XML. Like SAX, it avoids loading the whole XML document into memory, but it uses a pull-type architecture, in which your code asks for the next event instead of having events pushed to it, so it’s easier to code with than the SAX API.
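
To make the pull model concrete, here is a minimal sketch that counts elements with StAX. The class name StaxCountSketch and the inline XML are assumptions made for illustration; they are not part of the sample file used in the rest of this article.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this illustration.
class StaxCountSketch {

  // Counts start tags with the given name by pulling events one at a time.
  static int countElements(String xml, String name) {
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      int count = 0;
      while (reader.hasNext()) {
        // The caller asks for the next event; nothing is pushed to it.
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && reader.getLocalName().equals(name)) {
          count++;
        }
      }
      reader.close();
      return count;
    } catch (XMLStreamException e) {
      throw new IllegalStateException("Could not read XML", e);
    }
  }

  public static void main(String[] args) {
    // Inline stand-in for a catalog file (an assumption for this sketch).
    String xml = "<catalog><book id=\"bk101\"/><book id=\"bk102\"/></catalog>";
    System.out.println(countElements(xml, "book")); // prints 2
  }
}
```

Only the current event is held in memory at any time, which is why this style suits large files. The rest of this article uses the DOM instead.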

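To process XML in Java, you'll need to import these packages. The examples below also use classes from java.io and java.util, such as File, Properties, and List.

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;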

Preparing a Sample XML File

To follow the sample code and the concepts behind it, use this sample XML file from Microsoft. Here’s an excerpt:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
...snipped...

Reading the XML File With the DOM API

Let's look at the basic steps required for reading an XML file using the DOM API. Start by creating an instance of DocumentBuilder, which you’ll use to parse the XML document:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

You can now load the whole document into memory, starting from the XML root element. In our example, it is the catalog element.

// XML file to read
File file = new File("<path_to_file>");
Document document = builder.parse(file);
Element catalog = document.getDocumentElement();

And that's it; you now have access to the whole XML document starting from its root element, catalog.
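
The steps above can be put together into a small runnable sketch. It parses an inline XML string instead of a file so that it is self-contained; the class name DomParseSketch and the sample string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class DomParseSketch {

  // Parses XML text into a DOM Document, wrapping the checked exceptions.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
      return builder.parse(new ByteArrayInputStream(bytes));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    // Inline stand-in for the sample file (an assumption for this sketch).
    Document document = parse("<catalog><book id=\"bk101\"/></catalog>");
    Element catalog = document.getDocumentElement();
    System.out.println(catalog.getTagName()); // prints catalog
  }
}
```

With a real file, builder.parse(new File(...)) works the same way. The checked exceptions to handle are ParserConfigurationException, SAXException, and IOException.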

Extracting Information Using the DOM API

Now that you have the XML root element, you can use the DOM API to extract interesting nuggets of information. For instance, get all the book children of the root element and loop over them. Note that getChildNodes() returns all children, including text, comments, etc. For your purpose, you need just the child elements, so you can skip over the others:

NodeList books = catalog.getChildNodes();

for (int i = 0, n = books.getLength() ; i < n ; i++) {
  Node child = books.item(i);

  if ( child.getNodeType() != Node.ELEMENT_NODE )
    continue;

  Element book = (Element)child;
  // work with the book Element here
}

How do you find a specific child element, given the parent? Create a static method that returns the first matching element if found, or null. The procedure involves getting the list of child nodes and looping through them picking out element nodes with the specified name.

private static Node findFirstNamedElement(Node parent, String tagName)
{
  NodeList children = parent.getChildNodes();

  for (int i = 0, in = children.getLength() ; i < in ; i++) {
    Node child = children.item(i);

    if (child.getNodeType() != Node.ELEMENT_NODE)
      continue;

    if (child.getNodeName().equals(tagName))
      return child;
  }

  return null;
}
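
The DOM also has a built-in lookup, getElementsByTagName(), but it searches all descendants rather than only direct children, which is one reason a hand-written helper like the one above can be useful. The following sketch contrasts the two; the class name ChildLookupSketch, its method names, and the inline XML are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class ChildLookupSketch {

  // Parses XML text and returns its root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Counts only the direct child elements with the given tag name.
  static int countDirectChildren(Element parent, String tagName) {
    int count = 0;
    NodeList children = parent.getChildNodes();
    for (int i = 0, n = children.getLength(); i < n; i++) {
      Node child = children.item(i);
      if (child.getNodeType() == Node.ELEMENT_NODE
          && child.getNodeName().equals(tagName)) {
        count++;
      }
    }
    return count;
  }

  // Counts matching elements at any depth below the parent.
  static int countDescendants(Element parent, String tagName) {
    return parent.getElementsByTagName(tagName).getLength();
  }

  public static void main(String[] args) {
    Element root = parseRoot(
        "<catalog><title>Top</title><book><title>Nested</title></book></catalog>");
    System.out.println(countDirectChildren(root, "title")); // prints 1
    System.out.println(countDescendants(root, "title"));    // prints 2
  }
}
```

For the sample catalog, where each book has exactly one author, title, and so on, either approach finds the same elements. The difference matters only when the same tag name appears at several depths.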

Note that the DOM API treats text content within an element as a separate node of type TEXT_NODE. Text content can consist of multiple adjacent text nodes, so you'll need some special processing to fetch the text of an element:

private static String getCharacterData(Node parent)
{
  StringBuilder text = new StringBuilder();

  if ( parent == null )
    return text.toString();

  NodeList children = parent.getChildNodes();

  for (int k = 0, kn = children.getLength() ; k < kn ; k++) {
    Node child = children.item(k);

    if (child.getNodeType() != Node.TEXT_NODE)
      break;

    text.append(child.getNodeValue());
  }

  return text.toString();
}
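
Note that a helper of this shape stops at the first child that is not a text node. The standard Node.getTextContent() method is an alternative that concatenates the text of all descendants and skips comments. The sketch below compares the two on a small input; the class name TextContentSketch, the method leadingText, and the inline XML are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class TextContentSketch {

  // Parses XML text and returns its root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Same idea as the helper above: collect text nodes until the
  // first child that is not a text node.
  static String leadingText(Node parent) {
    StringBuilder text = new StringBuilder();
    NodeList children = parent.getChildNodes();
    for (int k = 0, kn = children.getLength(); k < kn; k++) {
      Node child = children.item(k);
      if (child.getNodeType() != Node.TEXT_NODE) {
        break;
      }
      text.append(child.getNodeValue());
    }
    return text.toString();
  }

  public static void main(String[] args) {
    Element descr = parseRoot(
        "<description>An in-depth look<!-- note --> at XML.</description>");
    System.out.println(leadingText(descr));          // prints An in-depth look
    System.out.println(descr.getTextContent());      // prints An in-depth look at XML.
  }
}
```

Which behavior you want depends on the data. For the sample catalog, where each leaf element holds plain text only, both return the same string.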

Armed with these convenience functions, take a look at this code to list out some information from the sample XML. It shows detailed information for each book available in a catalog:

NodeList books = catalog.getChildNodes();

for (int i = 0, ii = 0, n = books.getLength() ; i < n ; i++) {
  Node child = books.item(i);

  if (child.getNodeType() != Node.ELEMENT_NODE)
    continue;

  Element book = (Element)child;
  ii++;

  String id = book.getAttribute("id");
  String author = getCharacterData(findFirstNamedElement(child, "author"));
  String title = getCharacterData(findFirstNamedElement(child, "title"));
  String genre = getCharacterData(findFirstNamedElement(child, "genre"));
  String price = getCharacterData(findFirstNamedElement(child, "price"));
  String pubdate = getCharacterData(findFirstNamedElement(child, "publish_date"));
  String descr = getCharacterData(findFirstNamedElement(child, "description"));

  System.out.printf("%3d. book id = %s\n" +
    " author: %s\n" +
    " title: %s\n" +
    " genre: %s\n" +
    " price: %s\n" +
    " pubdate: %s\n" +
    " descr: %s\n",
    ii, id, author, title, genre, price, pubdate, descr);
}

Here's a step-by-step explanation of the code:

  1. The code iterates through the child nodes of catalog, the root element.
  2. For each child node, it checks whether the node's type is ELEMENT_NODE. If not (for example, when the node is whitespace text between elements), it continues to the next iteration.
  3. If the child node is an ELEMENT_NODE, (Element)child casts it to an Element object, which represents a book.
  4. The code then reads the id attribute and the character data of the author, title, genre, price, publish_date, and description child elements. It prints this data using the System.out.printf method.

Here's what the output looks like:

[Image: Parsing XML in Java source code and output]

Writing XML Output Using the Transform API

Java provides the XML Transform API to transform XML data. Used with the identity transform, which copies its input to its output unchanged, this API can write a DOM tree out as XML text. As an example, add a new book element to the sample catalog presented above.

You might obtain the details of a book (author, title, etc.) from an external source, like a properties file or a database. You can use the following properties file as an example:

id=bk113
author=Jane Austen
title=Pride and Prejudice
genre=Romance
price=6.99
publish_date=2010-04-01
description="It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife." So begins Pride and Prejudice, Jane Austen's witty comedy of manners-one of the most popular novels of all time-that features splendidly civilized sparring between the proud Mr. Darcy and the prejudiced Elizabeth Bennet as they play out their spirited courtship in a series of eighteenth-century drawing-room intrigues.

The first step is to parse the existing XML file using the method presented above:

File file = ...; // XML file to read
Document document = builder.parse(file);
Element catalog = document.getDocumentElement();

Now you load the data from the properties file using the Properties class provided in Java. The code is quite simple:

String propsFile = "<path_to_file>";
Properties props = new Properties();

try (FileReader in = new FileReader(propsFile)) {
  props.load(in);
}
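
As a self-contained illustration of the same loading step, the sketch below reads properties from an in-memory string instead of a file. The class name PropsLoadSketch and the shortened property text are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical class name, used only for this illustration.
class PropsLoadSketch {

  // Loads key=value pairs from text in the properties file format.
  static Properties load(String text) {
    Properties props = new Properties();
    try (StringReader in = new StringReader(text)) {
      props.load(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // Shortened stand-in for the properties file shown above.
    Properties props = load("id=bk113\nauthor=Jane Austen\nprice=6.99\n");
    System.out.println(props.getProperty("author"));         // prints Jane Austen
    System.out.println(props.getProperty("genre"));          // prints null
    System.out.println(props.getProperty("genre", "n/a"));   // prints n/a
  }
}
```

getProperty() returns null for a missing key, so when a property is optional, the two-argument form with a default value is a safer choice before passing the result on to the DOM.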

Once you’ve loaded the properties, you can retrieve the values you want to add from the properties file:

String id = props.getProperty("id");
String author = props.getProperty("author");
String title = props.getProperty("title");
String genre = props.getProperty("genre");
String price = props.getProperty("price");
String publish_date = props.getProperty("publish_date");
String descr = props.getProperty("description");

Now, create an empty book element.

Element book = document.createElement("book");
book.setAttribute("id", id);

Adding the child elements to the book is trivial. For convenience, you can collect the required element names in a List and add the values in a loop.

List<String> elnames = Arrays.asList("author", "title", "genre", "price",
  "publish_date", "description");

for (String elname : elnames) {
  Element el = document.createElement(elname);
  Text text = document.createTextNode(props.getProperty(elname));
  el.appendChild(text);
  book.appendChild(el);
}

catalog.appendChild(book);
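
The element-building steps can be tried in isolation with a sketch like the one below. It starts from an empty catalog created in memory rather than a parsed file, and it takes the values from a map rather than a Properties object. The class name AppendBookSketch and its method names are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
class AppendBookSketch {

  // Creates a new in-memory document with an empty <catalog> root.
  static Document newCatalog() {
    try {
      Document document =
          DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      document.appendChild(document.createElement("catalog"));
      return document;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException("Could not create document", e);
    }
  }

  // Builds a <book> element from the given fields and appends it to the root.
  static Element appendBook(Document document, String id, Map<String, String> fields) {
    Element book = document.createElement("book");
    book.setAttribute("id", id);
    for (Map.Entry<String, String> field : fields.entrySet()) {
      Element el = document.createElement(field.getKey());
      el.appendChild(document.createTextNode(field.getValue()));
      book.appendChild(el);
    }
    document.getDocumentElement().appendChild(book);
    return book;
  }

  public static void main(String[] args) {
    Document document = newCatalog();
    // LinkedHashMap keeps the child elements in insertion order.
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("author", "Jane Austen");
    fields.put("title", "Pride and Prejudice");
    Element book = appendBook(document, "bk113", fields);
    System.out.println(book.getChildNodes().getLength()); // prints 2
  }
}
```

Elements and text nodes must be created by the document they will be added to, which is why every create call goes through the same Document instance.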

The catalog element now has the new book element added. All that remains now is to write out the updated XML.

To write the XML, you need an instance of Transformer which you can create like this:

TransformerFactory tfact = TransformerFactory.newInstance();
Transformer tform = tfact.newTransformer();
tform.setOutputProperty(OutputKeys.INDENT, "yes");
tform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "3");

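The setOutputProperty() calls request indented output. OutputKeys.INDENT is part of the standard API, while the indent-amount key is specific to Xalan-derived transformers such as the one built into the JDK, so other implementations may ignore it.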

The final step is to apply the transformation. The result appears in the output stream, System.out.

tform.transform(new DOMSource(document), new StreamResult(System.out));

To write the output directly to a file, use the following:

tform.transform(new DOMSource(document), new StreamResult(new File("output.xml")));
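
If you want the XML as a string, for logging or testing, a StreamResult can also wrap a StringWriter. The following is a minimal sketch of that variation; the class name SerializeSketch, its method names, and the inline XML are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this illustration.
class SerializeSketch {

  // Parses XML text into a DOM Document.
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Serializes the document with the identity transform and returns the text.
  static String toXml(Document document) {
    try {
      Transformer tform = TransformerFactory.newInstance().newTransformer();
      // Leave out the <?xml ...?> header so only the elements are returned.
      tform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      tform.transform(new DOMSource(document), new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("Could not serialize XML", e);
    }
  }

  public static void main(String[] args) {
    Document document = parse("<catalog><book id=\"bk113\"/></catalog>");
    System.out.println(toXml(document));
  }
}
```

The exact whitespace and the form of empty elements in the result depend on the transformer implementation, so compare parsed documents rather than raw strings when you need a strict check.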

Those are all the steps you need to read and write XML files in Java.

Now You Know How to Read and Write XML Files With Java

Parsing and manipulating XML with Java is a valuable skill that you’ll often use in real-world programs. The DOM and Transform APIs are particularly useful.

Understanding the DOM, in particular, is vital if you plan to write client-side code for web applications or sites. The DOM’s interface is universal, so you can work with it using similar code in languages as diverse as Java and JavaScript.