How to write XML documents using Streaming API for XML (StAX)

XML has become a de facto standard for storing and exchanging documents over the Internet. Because it was designed to be extensible, it can be adapted to almost any need. In this post I describe how to create and write your own XML documents using the Streaming API for XML (StAX).

Need for XML writer

Of course, it is possible to write XML documents using a combination of print, println or printf calls, but the amount of work and the number of details to take care of would be quite large. For example, when writing documents manually it is very easy to forget to escape special characters such as < or ", or to prepend a namespace prefix to a tag. It is therefore better to leave these cumbersome tasks to an XML writer and concentrate on the general document layout.
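
To make the escaping problem concrete, here is a small sketch that produces the same element once by string concatenation and once through XMLStreamWriter. The class and method names (EscapingDemo, manual, withWriter) are made up for this illustration and are not part of the example project.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, used only to illustrate escaping.
class EscapingDemo {

    // Naive approach: the text is pasted into the markup unchanged.
    static String manual(String text) {
        return "<note>" + text + "</note>";
    }

    // The writer replaces markup characters in character data with entities.
    static String withWriter(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            writer.writeStartElement("note");
            writer.writeCharacters(text);
            writer.writeEndElement();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String text = "1 < 2 & 3";
        System.out.println(manual(text));     // raw < and & make this not well-formed
        System.out.println(withWriter(text)); // < and & are written as &lt; and &amp;
    }
}
```

The manual version is only safe if every caller remembers to escape the text first, which is exactly the detail that is easy to miss.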

XMLStreamWriter

The Streaming API for XML, introduced in Java 6, provides a handy interface, XMLStreamWriter, which can be used for writing XML files. The good thing about this API is that it does not require building an in-memory object structure, as DOM does, and it needs no intermediate steps.

Additionally, XMLStreamWriter supports namespaces out of the box, which is very useful in more advanced situations.

General XMLStreamWriter workflow

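To start writing a document we have to create an instance of XMLStreamWriter using XMLOutputFactory. Passing the encoding explicitly keeps the bytes written consistent with the encoding declared in the document header:

XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
XMLStreamWriter writer = outputFactory.createXMLStreamWriter(outputStream, "utf-8");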

Once we have the instance of XMLStreamWriter, we start with the header (the XML declaration) of the document:

writer.writeStartDocument("utf-8", "1.0");

Then we can proceed to writing elements:

writer.writeStartElement("books");

Once an element is started we can add attributes to it and then write character data or a CDATA section. Attributes have to be written right after the element is started, before any of its content:

writer.writeAttribute("id", "10");
writer.writeCharacters("text data");
writer.writeCData("more text data");
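
To see how character data and a CDATA section differ in the output, here is a minimal self-contained sketch. The class name CharactersVsCData and the element name sample are invented for the illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class comparing writeCharacters and writeCData.
class CharactersVsCData {

    static String render() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            writer.writeStartElement("sample");
            writer.writeAttribute("id", "10");  // attributes come before any content
            writer.writeCharacters("a & b");    // the ampersand is escaped as &amp;
            writer.writeCData("c & d");         // the text is wrapped in <![CDATA[...]]> as is
            writer.writeEndElement();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(render());
    }
}
```

Both forms carry the same text for a parser; CDATA only changes how it is spelled in the file, which can be easier on the eye when the text contains many special characters.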

We can also start some other nested elements and so on.

To close the most recently opened element we call:

writer.writeEndElement();

It is also possible to write empty elements:

writer.writeEmptyElement("used");

or write comments:

writer.writeComment("Some comment");

When we are done with writing, we should finish the document and close the writer:

writer.writeEndDocument();
writer.close();

The call to writeEndDocument automatically closes all elements that are still open, and close releases the resources used by the writer.

Closing the writer does not close the underlying stream or file, so that has to be done separately.
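
The steps above can be put together in one short sketch. It writes to an in-memory stream instead of a file, and uses a small helper stream (TrackingStream, invented for this example) that records whether close() was called on it, so we can observe that closing the writer leaves the stream open. The class name WorkflowDemo and the element names are assumptions for the illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class walking through the general workflow.
class WorkflowDemo {

    // In-memory stream that remembers whether close() was called on it.
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    static String write(TrackingStream os) throws XMLStreamException {
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(os, "utf-8");
        try {
            writer.writeStartDocument("utf-8", "1.0");
            writer.writeStartElement("books");
            writer.writeAttribute("id", "10");
            writer.writeCharacters("text data");
            writer.writeEmptyElement("used");
            writer.writeComment("Some comment");
            writer.writeEndDocument(); // also closes the still open <books> element
        } finally {
            writer.close();            // flushes the writer, does not close the stream
        }
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        TrackingStream os = new TrackingStream();
        System.out.println(write(os));
        System.out.println("stream closed by writer: " + os.closed);
        os.close(); // closing the stream stays our responsibility
    }
}
```

With a real file the same pattern applies: the stream is opened and closed by the caller, typically with try-with-resources, and the writer is closed inside that block.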

Issues with XMLStreamWriter

XMLStreamWriter is not perfect: it is still possible to create documents that are not well-formed, for example documents that contain more than one root element or lack a namespace declaration.
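
One way to catch such mistakes is to read the result back with a parser. The sketch below writes two root elements and then checks the text with XMLStreamReader. It assumes the writer bundled with the JDK, which does not check the document structure; other StAX implementations may reject the second root element while writing. The class and method names are made up for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: write a broken document, then verify it by parsing.
class WellFormedCheck {

    // Writes two top-level elements, which is not allowed in XML.
    static String writeTwoRoots() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            writer.writeStartElement("first");
            writer.writeEndElement();
            writer.writeStartElement("second");
            writer.writeEndElement();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    // Returns true if a StAX parser can read the whole text without an error.
    static boolean isWellFormed(String xml) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if closing the reader fails
                }
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(isWellFormed("<books><book/></books>"));
        System.out.println(isWellFormed(writeTwoRoots()));
    }
}
```

A check like this fits well into a unit test for the code that produces the document.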

Additionally, XMLStreamWriter does not indent its output, so the result may be hard to read in a plain text editor. For reading, I suggest opening the file in a web browser; most browsers can display the structure of an XML document in a user-friendly way.
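
If an indented copy is needed, one option outside StAX is to run the finished document through an identity transformation from the standard javax.xml.transform API with indentation switched on. This is a different technique from the writer itself, shown here only as a sketch; the class name IndentDemo is invented, and the exact layout of the result (for example the indentation width) depends on the JDK version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper that re-serializes an XML text with line breaks.
class IndentDemo {

    static String indent(String xml) throws TransformerException {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(indent("<books><book><title>T</title></book></books>"));
    }
}
```

This costs a second pass over the document, so it is better suited to debugging than to producing large files.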

Writing XML document without namespaces

Below is a simple example of how to write your own data structure into an XML document. In our case it is a list of books, where a single book is represented like this:

package com.example.staxwrite;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Book {
    private List<String> authors;
    private String title;
    private Category category;
    private String language;
    private int year;

    public Book(List<String> authors, String title, Category category, String language, int year) {
        this.authors = new ArrayList<>(authors);
        this.title = title;
        this.category = category;
        this.language = language;
        this.year = year;
    }

    public Book(String author, String title, Category category, String language, int year) {
        this(Collections.singletonList(author), title, category, language, year);
    }

    public List<String> getAuthors() {
        return Collections.unmodifiableList(authors);
    }

    public void addAuthor(String author) {
        authors.add(author);
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public Category getCategory() {
        return category;
    }

    public void setCategory(Category category) {
        this.category = category;
    }

    public String getLanguage() {
        return language;
    }

    public void setLanguage(String language) {
        this.language = language;
    }

    public int getYear() {
        return year;
    }

    public void setYear(int year) {
        this.year = year;
    }
    
}
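
The Book class refers to a Category type that is not listed in this post. A minimal sketch of it, assuming it is a plain enum whose constants are the category values that appear in the sample output further down, could look like this. In the example project it would live in the same com.example.staxwrite package; the package line is left out here to keep the snippet standalone.

```java
// Assumed definition: the constants are taken from the sample output in this post.
enum Category {
    FICTION,
    PASCAL,
    PROGRAMMING
}
```

The writer code only calls name() on the category, so any enum with these constants would do.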

The code that handles writing the XML document is the following:

package com.example.staxwrite;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class NoNSWriter {

    public void writeToXml(Path path, List<Book> books) throws IOException, XMLStreamException {
        try (OutputStream os = Files.newOutputStream(path)) {
            XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
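            // XMLStreamWriter does not implement AutoCloseable, so it cannot be
            // managed by try-with-resources and is closed in a finally block instead.
            XMLStreamWriter writer = null;
            try {
                writer = outputFactory.createXMLStreamWriter(os, "utf-8");
                writeBooksElem(writer, books);
            } finally {
                if (writer != null)
                    writer.close(); // the stream itself is closed by the outer try
            }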
        }
    }

    private void writeBooksElem(XMLStreamWriter writer, List<Book> books) throws XMLStreamException {
        writer.writeStartDocument("utf-8", "1.0");
        writer.writeComment("Describes list of books");
        
        writer.writeStartElement("books");
        for (Book book : books)
            writeBookElem(writer, book);
        writer.writeEndElement();

        writer.writeEndDocument();
    }

    private void writeBookElem(XMLStreamWriter writer, Book book) throws XMLStreamException {
        writer.writeStartElement("book");
        writer.writeAttribute("language", book.getLanguage());

        writeAuthorsElem(writer, book.getAuthors());

        writer.writeStartElement("title");
        writer.writeCData(book.getTitle());
        writer.writeEndElement();

        writer.writeStartElement("category");
        writer.writeCharacters(book.getCategory().name());
        writer.writeEndElement();

        writer.writeStartElement("year");
        writer.writeCharacters(Integer.toString(book.getYear()));
        writer.writeEndElement();

        writer.writeEndElement();
    }

    private void writeAuthorsElem(XMLStreamWriter writer, List<String> authors) throws XMLStreamException {
        writer.writeStartElement("authors");
        for (String author : authors) {
            writer.writeStartElement("author");
            writer.writeCharacters(author);
            writer.writeEndElement();
        }
        writer.writeEndElement();
    }
}

In the writeToXml method we create the output stream and the writer and ensure that both are closed properly. Then we call writeBooksElem, which starts the document, writes a comment, emits the list of books and finishes the document.

Most properties of a book are written as separate subelements, except for language, which is written as an attribute. The resulting XML document should look like this (line breaks and indentation are added here for readability; the writer itself does not produce them):

<?xml version="1.0" encoding="UTF-8"?>
<!--Describes list of books-->
<books>
  <book language="English">
    <authors>
      <author>Mark Twain</author>
    </authors>
    <title><![CDATA[The Adventures of Tom Sawyer]]></title>
    <category>FICTION</category>
    <year>1876</year>
  </book>
  <book language="English">
    <authors>
      <author>Niklaus Wirth</author>
    </authors>
    <title><![CDATA[The Programming Language Pascal]]></title>
    <category>PASCAL</category>
    <year>1971</year>
  </book>
  <book language="English">
    <authors>
      <author>O.-J. Dahl</author>
      <author>E. W. Dijkstra</author>
      <author>C. A. R. Hoare</author>
    </authors>
    <title><![CDATA[The Programming Language Pascal]]></title>
    <category>PROGRAMMING</category>
    <year>1972</year>
  </book>
</books>

Writing XML document with namespaces

The XML document created by the code above does not refer to any namespace. While this may be convenient in simple cases, more advanced XML documents will refer to one or more XML namespaces.

To create a document with namespace support we first have to bind a prefix to the namespace we want to use:

writer.setPrefix("b", "http://example.com/books");

and then start the element with the URI of this namespace and write the namespace declaration on it:

writer.writeStartElement("http://example.com/books", "books");
writer.writeNamespace("b", "http://example.com/books");

Once we do so, we can write elements and attributes from this namespace by passing the namespace URI as the first argument of the writeStartElement and writeAttribute methods. Here is the full code:

package com.example.staxwrite;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class NSWriter {

    private static final String NS = "http://example.com/books";
    
    public void writeToXml(Path path, List<Book> books) throws IOException, XMLStreamException {
        try (OutputStream os = Files.newOutputStream(path)) {
            XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
            XMLStreamWriter writer = null;
            try {
                writer = outputFactory.createXMLStreamWriter(os, "utf-8");
                writeBooksElem(writer, books);
            } finally {
                if (writer != null)
                    writer.close();
            }
        }
    }

    private void writeBooksElem(XMLStreamWriter writer, List<Book> books) throws XMLStreamException {
        writer.writeStartDocument("utf-8", "1.0");
        writer.writeComment("Describes list of books");
        
        writer.setPrefix("b", NS);
        writer.writeStartElement(NS, "books");
        writer.writeNamespace("b", NS);
        for (Book book : books)
            writeBookElem(writer, book);
        writer.writeEndElement();

        writer.writeEndDocument();
    }

    private void writeBookElem(XMLStreamWriter writer, Book book) throws XMLStreamException {
        writer.writeStartElement(NS, "book");
        writer.writeAttribute(NS, "language", book.getLanguage());

        writeAuthorsElem(writer, book.getAuthors());

        writer.writeStartElement(NS, "title");
        writer.writeCData(book.getTitle());
        writer.writeEndElement();

        writer.writeStartElement(NS, "category");
        writer.writeCharacters(book.getCategory().name());
        writer.writeEndElement();

        writer.writeStartElement(NS, "year");
        writer.writeCharacters(Integer.toString(book.getYear()));
        writer.writeEndElement();

        writer.writeEndElement();
    }

    private void writeAuthorsElem(XMLStreamWriter writer, List<String> authors) throws XMLStreamException {
        writer.writeStartElement(NS, "authors");
        for (String author : authors) {
            writer.writeStartElement(NS, "author");
            writer.writeCharacters(author);
            writer.writeEndElement();
        }
        writer.writeEndElement();
    }
}

and the created XML document:

<?xml version="1.0" encoding="UTF-8"?>
<!--Describes list of books-->
<b:books xmlns:b="http://example.com/books">
  <b:book b:language="English">
    <b:authors>
      <b:author>Mark Twain</b:author>
    </b:authors>
    <b:title><![CDATA[The Adventures of Tom Sawyer]]></b:title>
    <b:category>FICTION</b:category>
    <b:year>1876</b:year>
  </b:book>
  <b:book b:language="English">
    <b:authors>
      <b:author>Niklaus Wirth</b:author>
    </b:authors>
    <b:title><![CDATA[The Programming Language Pascal]]></b:title>
    <b:category>PASCAL</b:category>
    <b:year>1971</b:year>
  </b:book>
  <b:book b:language="English">
    <b:authors>
      <b:author>O.-J. Dahl</b:author>
      <b:author>E. W. Dijkstra</b:author>
      <b:author>C. A. R. Hoare</b:author>
    </b:authors>
    <b:title><![CDATA[The Programming Language Pascal]]></b:title>
    <b:category>PROGRAMMING</b:category>
    <b:year>1972</b:year>
  </b:book>
</b:books>
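
The same pattern extends to documents that use more than one namespace: each prefix is bound with setPrefix and declared with writeNamespace. The sketch below is a reduced, self-contained variant of the code above. The second namespace URI (http://example.com/meta), its prefix m, the source element and the class name are all invented for this illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class writing elements from two namespaces.
class TwoNamespacesDemo {

    private static final String BOOKS_NS = "http://example.com/books";
    private static final String META_NS = "http://example.com/meta"; // made-up second namespace

    static String render() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            // Bind both prefixes before the root element is started.
            writer.setPrefix("b", BOOKS_NS);
            writer.setPrefix("m", META_NS);

            writer.writeStartElement(BOOKS_NS, "books");
            // Declare both namespaces on the root so that all children can use them.
            writer.writeNamespace("b", BOOKS_NS);
            writer.writeNamespace("m", META_NS);

            writer.writeStartElement(META_NS, "source");
            writer.writeCharacters("library catalogue");
            writer.writeEndElement();

            writer.writeStartElement(BOOKS_NS, "book");
            writer.writeAttribute(BOOKS_NS, "language", "English");
            writer.writeEndElement();

            writer.writeEndElement();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(render());
    }
}
```

The writer picks the prefix for each element from the namespace URI passed to writeStartElement, so the calling code never has to spell out b: or m: itself.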

Writing XML document with default namespace

XML also supports the idea of a default namespace, which improves readability and reduces typing. Using it is very similar to using a namespace with an explicit prefix, but we have to call setDefaultNamespace and writeDefaultNamespace instead of setPrefix and writeNamespace:

package com.example.staxwrite;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class DefaultNSWriter {

    private static final String NS = "http://example.com/books";
    
    public void writeToXml(Path path, List<Book> books) throws IOException, XMLStreamException {
        try (OutputStream os = Files.newOutputStream(path)) {
            XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
            XMLStreamWriter writer = null;
            try {
                writer = outputFactory.createXMLStreamWriter(os, "utf-8");
                writeBooksElem(writer, books);
            } finally {
                if (writer != null)
                    writer.close();
            }
        }
    }

    private void writeBooksElem(XMLStreamWriter writer, List<Book> books) throws XMLStreamException {
        writer.writeStartDocument("utf-8", "1.0");
        writer.writeComment("Describes list of books");
        
        writer.setDefaultNamespace(NS);
        writer.writeStartElement(NS, "books");
        writer.writeDefaultNamespace(NS);
        for (Book book : books)
            writeBookElem(writer, book);
        writer.writeEndElement();

        writer.writeEndDocument();
    }

    private void writeBookElem(XMLStreamWriter writer, Book book) throws XMLStreamException {
        writer.writeStartElement(NS, "book");
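        // A default namespace applies to elements only; an unprefixed attribute
        // belongs to no namespace, so it is written without a namespace URI.
        writer.writeAttribute("language", book.getLanguage());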

        writeAuthorsElem(writer, book.getAuthors());

        writer.writeStartElement(NS, "title");
        writer.writeCData(book.getTitle());
        writer.writeEndElement();

        writer.writeStartElement(NS, "category");
        writer.writeCharacters(book.getCategory().name());
        writer.writeEndElement();

        writer.writeStartElement(NS, "year");
        writer.writeCharacters(Integer.toString(book.getYear()));
        writer.writeEndElement();

        writer.writeEndElement();
    }

    private void writeAuthorsElem(XMLStreamWriter writer, List<String> authors) throws XMLStreamException {
        writer.writeStartElement(NS, "authors");
        for (String author : authors) {
            writer.writeStartElement(NS, "author");
            writer.writeCharacters(author);
            writer.writeEndElement();
        }
        writer.writeEndElement();
    }
}

The created XML document refers to the namespace, but no prefix is used:

<?xml version="1.0" encoding="UTF-8"?>
<!--Describes list of books-->
<books xmlns="http://example.com/books">
  <book language="English">
    <authors>
      <author>Mark Twain</author>
    </authors>
    <title><![CDATA[The Adventures of Tom Sawyer]]></title>
    <category>FICTION</category>
    <year>1876</year>
  </book>
  <book language="English">
    <authors>
      <author>Niklaus Wirth</author>
    </authors>
    <title><![CDATA[The Programming Language Pascal]]></title>
    <category>PASCAL</category>
    <year>1971</year>
  </book>
  <book language="English">
    <authors>
      <author>O.-J. Dahl</author>
      <author>E. W. Dijkstra</author>
      <author>C. A. R. Hoare</author>
    </authors>
    <title><![CDATA[The Programming Language Pascal]]></title>
    <category>PROGRAMMING</category>
    <year>1972</year>
  </book>
</books>
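
For a namespace-aware parser the prefixed form and the default-namespace form name the same elements. The following sketch, with an invented class name and two hand-written one-line documents, reads the root element of each form with XMLStreamReader and compares the qualified names.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class comparing the two ways of spelling a namespace.
class NamespaceEquivalenceDemo {

    // Returns the qualified name of the root element of the given document.
    static QName rootName(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // moves to the first start element
            return reader.getName();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String prefixed = "<b:books xmlns:b=\"http://example.com/books\"/>";
        String defaulted = "<books xmlns=\"http://example.com/books\"/>";
        // QName equality compares the namespace URI and local name, not the prefix.
        System.out.println(rootName(prefixed).equals(rootName(defaulted))); // prints true
    }
}
```

Which form to write is therefore mostly a matter of readability and of what the consumers of the document expect.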

Conclusion

The Streaming API for XML provides a convenient, fast and memory-efficient way to write XML documents without worrying about details such as escaping special characters. It is a great alternative to DOM, especially when you do not need to keep and manage a DOM tree in memory.

The source code for the example is available on GitHub.

About Robert P.

Husband, software developer, Linux and open-source fan, blogger.