{"id":1292,"date":"2014-10-10T08:22:18","date_gmt":"2014-10-10T15:22:18","guid":{"rendered":"http:\/\/10kdev.net\/?p=1292"},"modified":"2014-10-16T07:00:41","modified_gmt":"2014-10-16T14:00:41","slug":"awed-and-disgusted","status":"publish","type":"post","link":"http:\/\/10kdev.net\/?p=1292","title":{"rendered":"Awed and Disgusted"},"content":{"rendered":"<blockquote>\n<p style=\"text-align: left; padding-left: 30px;\"><strong><em>The one pass XSLT, it is the least.<\/em><\/strong><br \/>\n<strong><em>A two pass XSLT, now that is a beast.<\/em><\/strong><br \/>\n<strong><em>But I would give a silk pajama<\/em><\/strong><br \/>\n<strong><em>To never do this\u00a0three pass drama.<\/em><\/strong><\/p>\n<\/blockquote>\n<hr \/>\n<p>We have data coming in from a SOAP server, and it is in pretty bad shape. \u00a0I have a feeling the team that makes this is using a direct Oracle utility to pump it out of the database so as not to have to do any cleaning up or development. \u00a0Most of the non-alphanumeric\u00a0characters are replaced by entity codes, like this:<\/p>\n<p style=\"padding-left: 30px;\">&amp;lt;BATCH xmlns:xsi=&quot;http:\/\/www.w3.org\/2001\/XMLSchema-instance&quot; batchid=&quot;EnterpriseTransaction&quot; version=&quot;2.0&quot; xsi:noNamespaceSchemaLocation=&quot;config\/xsd\/DataReplication\/2.0\/DataReplication.xsd&quot;&amp;gt;<br \/>\n&amp;lt;TABLE_METADATA&amp;gt;<br \/>\n&amp;lt;TABLE name=&quot;TR_TRN&quot;&amp;gt;<br \/>\n&amp;lt;COLUMNS&amp;gt;<br \/>\n&amp;lt;COLUMN name=&quot; &#8230;<\/p>\n<p>YUCK!<\/p>\n<p>One of the teams wrote two XSL sheets to transform this\u00a0nasty XML data, and they are\u00a0used when our web client consumes this data. \u00a0But sometimes we want to transform it outside of the program to use it for integration\u00a0tests or just to look at the data for errors. 
\u00a0Most of the developers use the transform tool built into Eclipse, and I have been using a Java GUI called <a href=\"http:\/\/jsimplex.sourceforge.net\">jSimpleX<\/a>. \u00a0But this being a TWO PASS XSLT process, it&#8217;s a pain in the ass. \u00a0And the final output comes out unformatted on a single line, with all the carriage returns stripped out. \u00a0So I have to go to an online tool like <a href=\"http:\/\/www.freeformatter.com\/xml-formatter.html\">FreeFormatter<\/a> to finish the task, making this a <em>THREE<\/em> PASS transformation.<\/p>\n<p>I decided to automate this a bit. \u00a0I looked for some command-line tools I could script as a batch job.<\/p>\n<p>First I ran across the old tried-and-true Windows msxsl.exe. \u00a0It worked OK on the first pass but, unbelievably, crazily, choked on UTF-8 data. \u00a0A serious WTF moment.<\/p>\n<p>Next, good old Xalan. I gave up on this very quickly while trying to integrate it into a &#8220;simple&#8221; Java or Groovy script. \u00a0Operative\u00a0word being <em>simple<\/em>. \u00a0Without much time from the PMs to do this, and after running into Xalan&#8217;s poor documentation and need for tons of dependencies, I dumped it.<\/p>\n<p>Wow, is this really that hard?<\/p>\n<p>So I tried Groovy, since I have a lot of experience with the slurper objects. \u00a0But . . . of course my data exceeded the 65536 max string length.<\/p>\n<p>Then there was Ant, and it worked OK. \u00a0Just OK.<\/p>\n<p>Finally, I tried a tool called <a href=\"http:\/\/xmlstar.sourceforge.net\">xmlstarlet<\/a>. \u00a0 Bingo. 
\u00a0It does transforms AND formatting.<\/p>\n<blockquote>\n<p style=\"padding-left: 30px;\"><strong><em>Why are the tools to handle XML so lacking these days, when lots of Big Data tools like MarkLogic and BaseX use XML, and SOAP isn&#8217;t dead because of its capability to do ACID transactions?<\/em><\/strong><\/p>\n<\/blockquote>\n<p>My batch file calls with xmlstarlet look like this:<\/p>\n<pre style=\"padding-left: 30px;\"><code>xml tr phase1.xsl %TEMPFILE%.xml &gt; %TEMPFILE%_phase1.xml\r\nxml tr phase2.xsl %TEMPFILE%_phase1.xml &gt; %TEMPFILE%_phase2.xml\r\nxml fo %TEMPFILE%_phase2.xml &gt; %TEMPFILE%_formatted.xml\r\n<\/code><\/pre>\n<p>So here I have the two transforms, with a format at the end. Super slick.<\/p>\n<p>I had to write a little command interface for file input after that. \u00a0I needed a refresher, so I went looking, and ran into a <a href=\"http:\/\/stackoverflow.com\/questions\/6595977\/dos-command-line-string-parsing-folder-and-filename-in-string\">Stack Overflow<\/a> page that pretty much sums up my feelings on doing this part of the\u00a0task. \u00a0Awed <em>and<\/em> disgusted.<\/p>\n<p><a href=\"http:\/\/10kdev.net\/wp-content\/uploads\/2014\/10\/3774fda874c5ca7f783f186e3295a4f7.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter\" src=\"http:\/\/10kdev.net\/wp-content\/uploads\/2014\/10\/3774fda874c5ca7f783f186e3295a4f7.png\" alt=\"\" width=\"654\" height=\"713\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The one pass XSLT, it is the least. A two pass XSLT, now that is a beast. But I would give a silk pajama To never do this\u00a0three pass drama. We have data that comes in from a SOAP server that is in pretty bad shape. 
\u00a0I have a feeling the team that makes this [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/1292"}],"collection":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1292"}],"version-history":[{"count":11,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/1292\/revisions"}],"predecessor-version":[{"id":1309,"href":"http:\/\/10kdev.net\/index.php?rest_route=\/wp\/v2\/posts\/1292\/revisions\/1309"}],"wp:attachment":[{"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1292"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/10kdev.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}