Awed and Disgusted

The one pass XSLT, it is the least.
A two pass XSLT, now that is a beast.
But I would give a silk pajama
To never do this three pass drama.


We have data coming in from a SOAP server that is in pretty bad shape. I have a feeling the team that produces it is using a direct Oracle utility to pump it out of the database, so they don't have to do any cleanup or development. Most of the non-alphanumeric characters are replaced by codes, like this:

<BATCH xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" batchid="EnterpriseTransaction" version="2.0" xsi:noNamespaceSchemaLocation="config/xsd/DataReplication/2.0/DataReplication.xsd">
<TABLE_METADATA>
<TABLE name="TR_TRN">
<COLUMNS>
<COLUMN name=" …

YUCK!

One of the teams wrote two XSL stylesheets to transform this nasty XML data, and our web client uses them when it consumes the data. But sometimes we want to transform it outside of the program, to use it for integration tests or just to look at the data for errors. Most of the developers use the transform tool built into Eclipse, and I have been using a Java GUI called jSimpleX. But this being a TWO PASS XSLT process, it’s a pain in the ass. And the last output is unformatted: everything sits on a single line, with all the carriage returns stripped out. So I have to go to an online tool like FreeFormatter to finish the task, making this a THREE PASS transformation.
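The formatting pass that FreeFormatter handles can also be done locally. As a minimal sketch, assuming Java 8 or later and the XSLT processor bundled with the JDK behind `javax.xml.transform`, an identity transform with indentation switched on re-serializes a one-line document with line breaks. The class name `PrettyPrint` and its two file arguments are hypothetical, and the `indent-amount` key is an Apache/JDK extension property that other processors may ignore.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: the "third pass" done with the JDK's built-in transformer.
public class PrettyPrint {

    /** Copies the input unchanged (identity transform), but with indentation turned on. */
    static void format(Source in, Result out) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        // Indent width: an Apache/JDK extension, not part of the JAXP standard.
        identity.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        identity.transform(in, out);
    }

    /** Convenience overload for in-memory XML strings. */
    static String format(String xml) throws TransformerException {
        StringWriter sw = new StringWriter();
        format(new StreamSource(new StringReader(xml)), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 2) {
            System.err.println("usage: java PrettyPrint input.xml output.xml");
            System.exit(1);
        }
        // File-based Source/Result let the parser and serializer handle encoding (e.g. UTF-8).
        format(new StreamSource(new File(args[0])), new StreamResult(new File(args[1])));
    }
}
```

Reading and writing through `File` objects, rather than strings, leaves encoding detection to the XML parser, which matters given how badly some tools treat UTF-8.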

I decided to automate this a bit, so I looked for some command-line tools I could drive from a batch file.

First I ran across the old, tried-and-true Windows msxsl.exe. It worked OK on the first pass but, unbelievably, crazily, choked on UTF-8 data. A serious WTF moment.

Then there was good old Xalan. I abandoned it very quickly while trying to integrate it into a “simple” Java or Groovy script, the operative word being simple. Not having a lot of time from the PMs to do this, and running into Xalan’s poor documentation and its need for tons of dependencies, I dumped it.

Wow, is this really that hard?

So I tried Groovy, since I have a lot of experience with its slurper objects. But... of course my data exceeded the 65536 string max length.

Then there was Ant, and it worked OK. Just OK.

Finally, I tried a tool called xmlstarlet. Bingo. It does transforms AND formatting.

Why are XML tools so lacking these days, when lots of Big Data tools like MarkLogic and BaseX use XML, and SOAP isn’t dead because of its capability to do ACID transactions?

My batch file calls to xmlstarlet look like this:

REM Passes 1 and 2: apply the two stylesheets in sequence
xml tr phase1.xsl %TEMPFILE%.xml > %TEMPFILE%_phase1.xml
xml tr phase2.xsl %TEMPFILE%_phase1.xml > %TEMPFILE%_phase2.xml
REM Pass 3: pretty-print (indent) the final output
xml fo %TEMPFILE%_phase2.xml > %TEMPFILE%_formatted.xml

So there are the two transforms, with a formatting pass at the end. Super slick.
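For comparison, the two `xml tr` steps can also be chained inside one Java process with no intermediate files. This is a minimal sketch, assuming the JDK's built-in XSLT 1.0 processor supports `SAXTransformerFactory` (the code checks) and that phase1.xsl and phase2.xsl need nothing beyond XSLT 1.0. The class `TwoPass` and its argument order are hypothetical. Phase 1 streams its result as SAX events straight into phase 2.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical two-pass pipeline using only the JDK's javax.xml.transform API.
public class TwoPass {

    /** Runs input through phase 1, then phase 2, writing phase 2's output to {@code output}. */
    static void run(Source phase1Xsl, Source phase2Xsl, Source input, Result output)
            throws TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("this TransformerFactory cannot chain via SAX");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // Phase 2 acts as a SAX ContentHandler that serializes to the final result.
        TransformerHandler phase2 = stf.newTransformerHandler(phase2Xsl);
        phase2.setResult(output);

        // Phase 1 emits its result tree as SAX events directly into phase 2.
        Transformer phase1 = stf.newTransformer(phase1Xsl);
        phase1.transform(input, new SAXResult(phase2));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 4) {
            System.err.println("usage: java TwoPass phase1.xsl phase2.xsl input.xml output.xml");
            System.exit(1);
        }
        run(new StreamSource(new File(args[0])), new StreamSource(new File(args[1])),
            new StreamSource(new File(args[2])), new StreamResult(new File(args[3])));
    }
}
```

Usage would be along the lines of `java TwoPass phase1.xsl phase2.xsl input.xml output.xml`, after which the output could still go through `xml fo` for formatting.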

After that I had to write a little command interface for the file input. I needed a refresher, so I went looking, and ran into a Stack Overflow page that pretty much sums up my feelings about this part of the task. Awed and disgusted.
