Thursday, December 11, 2008

Retrieving XML Declaration Information Using SAX

So the other day I was notified that, for a project that uses SAX, I would need to preserve the encoding information in the XML declaration. This seemed pretty straightforward; however, it really wasn't, and in fact I am still not sure why one or two things are done the way they are. So, to try to save others the hassle, here is how I did it...

Just so you know, I am doing this with org.xml.sax.helpers.XMLReaderFactory and XMLReader, so if you are using a different parser this may be a little different.

First off, if you are like me, you were calling the XMLReader.parse method and passing it an InputSource. This will not work, because the encoding that gets reported is overridden by whatever encoding the InputSource carries. This means we have to use the other overload, which takes a string in the form of a URI.

If you are like me, you think EASY! and pass it the file path. This will not work; you need to build the string from a java.net.URI in the following way:

String uri = new File(*filepath*).toURI().toString()
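For anyone who wants to try that one-liner on its own, here is a minimal sketch. The class name FileUriExample and the helper toSystemId are made up for illustration; only File.toURI() and URI.toString() come from the JDK.

```java
import java.io.File;

public class FileUriExample {
    // Hypothetical helper: turns a local file path into a URI string
    // that can be handed to XMLReader.parse(String).
    static String toSystemId(String filePath) {
        return new File(filePath).toURI().toString();
    }

    public static void main(String[] args) {
        // Falls back to a placeholder name when no path is given.
        String path = args.length > 0 ? args[0] : "example.xml";
        System.out.println(toSystemId(path));
    }
}
```

A relative path is resolved against the current working directory, and characters that are not legal in a URI (such as spaces) are percent-encoded, which is why the raw file path is not a safe substitute.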

Now we have a proper URI and we are ready to parse just like normal. Once parsing starts, though, we need a way to get at the information in the declaration. We do this by capturing the Locator. A Locator, according to Sun, is an "Interface for associating a SAX event with a document location." So the document's properties are essentially stored there. Here is how you get it; this goes in the handler class...

Locator locator
...
public void setDocumentLocator(Locator locator) {
    this.locator = locator
}

And of course, just to add a little wrinkle to things, retrieving the encoding and XML version was only added in SAX v2.0 (through the extension interface org.xml.sax.ext.Locator2), and we can't access the information until we are inside the document. So we need to add the following to the handler class...

Integer elementDepth = 0
...
public void startElement(String uri, String name,
        String qName, Attributes atts) {
    // Only check on the root element (depth 0); by then we are inside the document
    if (elementDepth++ == 0) {
        if (locator != null) {
            if (locator instanceof Locator2) {
                Locator2 loc = (Locator2) locator;
                println loc.getXMLVersion();
                println loc.getEncoding();
            }
        }
    }
    ...
}

And as you can see, you are now able to access the encoding and version information!
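Putting the pieces together, here is a rough, self-contained Java sketch of the whole approach. The class name DeclarationSniffer, the sniff helper, and the sample document are my own inventions for illustration. Also note that XMLReaderFactory is marked deprecated in newer JDKs, and that whether the Locator implements Locator2 depends on the parser, so treat this as a starting point rather than the definitive way to do it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class DeclarationSniffer extends DefaultHandler {
    private Locator locator;
    private int elementDepth = 0;
    String xmlVersion; // stays null if the parser offers no Locator2
    String encoding;   // stays null if the parser offers no Locator2

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser hands us the Locator before any other event.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        // Read the values once, on the root element.
        if (elementDepth++ == 0 && locator instanceof Locator2) {
            Locator2 loc = (Locator2) locator;
            xmlVersion = loc.getXMLVersion();
            encoding = loc.getEncoding();
        }
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        elementDepth--;
    }

    // Hypothetical helper: parses the file by URI string, not by InputSource.
    @SuppressWarnings("deprecation")
    static DeclarationSniffer sniff(File xmlFile) throws IOException, SAXException {
        DeclarationSniffer handler = new DeclarationSniffer();
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setContentHandler(handler);
        reader.parse(xmlFile.toURI().toString());
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample document so the example runs on its own.
        File tmp = File.createTempFile("sample", ".xml");
        tmp.deleteOnExit();
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root><child/></root>";
        Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.ISO_8859_1));

        DeclarationSniffer result = sniff(tmp);
        System.out.println("version:  " + result.xmlVersion);
        System.out.println("encoding: " + result.encoding);
    }
}
```

Storing the values in fields instead of printing them inside startElement makes it easy to reuse them later, for example when writing the document back out with the same declaration.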

Feel free to comment if this helped you or if you know of a better way to do this!
