CFML XMLTransform() and Character Encoding

Quick tip on using CFML’s XMLTransform(): if you see weird characters such as Â in the output of the transformation, and you’ve checked that the web server’s response headers correctly specify UTF-8, you probably just need to specify the charset on the CFFILE operations that read the XML and XSLT files from disk.

In my case I was seeing non-breaking spaces rendered as Â followed by the space, i.e. a capital ‘A’ with a circumflex before the non-breaking space. At first I thought maybe the response from the web server was ISO-8859-1 for some reason instead of UTF-8, but the headers turned out to be correct. Once I added charset="utf-8" to the CFFILE tags that read the XML and XSLT files from disk, all was right with the world.
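Here is a minimal sketch of the fix. The file names data.xml and transform.xsl are hypothetical placeholders, not the files from my actual project, so adjust the paths to suit:

```cfml
<!--- Hypothetical file names; adjust to your own paths. --->
<cfset xmlPath = ExpandPath("./data.xml") />
<cfset xslPath = ExpandPath("./transform.xsl") />

<!--- Read both files explicitly as UTF-8 instead of relying on the default charset. --->
<cffile action="read" file="#xmlPath#" variable="xmlSource" charset="utf-8" />
<cffile action="read" file="#xslPath#" variable="xslSource" charset="utf-8" />

<!--- XmlTransform() takes the XML and the XSLT as strings and returns the result as a string. --->
<cfset transformed = XmlTransform(xmlSource, xslSource) />

<cfoutput>#transformed#</cfoutput>
```

The symptom makes sense once you look at the bytes: a non-breaking space is stored in UTF-8 as the two bytes 0xC2 0xA0, and if the file is decoded with a single-byte charset such as ISO-8859-1, the 0xC2 byte shows up as Â. As far as I can tell, CFFILE falls back to the JVM’s default file encoding when no charset is given, which is why the problem can appear on one server and not another.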

XMLSearch(), Namespaces, and Lack of Namespaces

I ran into a similar issue using xquery in SQL Server, but today I was back on the ColdFusion side of the world and had to fix it slightly differently.

In the application I’m working on I’m processing tons of XML files and depending on how they’re generated, they may or may not have a namespace declared. Luckily other than that the XML structure is consistent, but XmlSearch() in CF behaves differently depending on whether or not a namespace is present.

Luckily I came across this post by Jeremy at his aftergeek blog that solved the issue for me (and to reiterate what I said in another post, I’m a bit of an xpath n00b). Using the local-name() function gets at the XML data correctly both when a namespace is present and when it isn’t. So where I was previously doing this:

<cfset foo = XmlSearch(myxml, "/*/mynode").get(0).XmlText />

I’m now doing this:

<cfset foo = XmlSearch(myxml, "//*[local-name()='mynode']").get(0).XmlText />
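To see the difference for yourself, here is a small sketch that runs the same local-name() expression against two tiny documents, one without a namespace and one with a default namespace. The element names and the http://example.com/ns URI are made up for illustration:

```cfml
<!--- Two hypothetical documents with the same structure; only the namespace differs. --->
<cfset plainDoc = XmlParse('<root><mynode>hello</mynode></root>') />
<cfset nsDoc = XmlParse('<root xmlns="http://example.com/ns"><mynode>hello</mynode></root>') />

<!--- local-name() compares only the part of the element name after any prefix,
      so the namespace (or lack of one) does not affect the match. --->
<cfset xpath = "//*[local-name()='mynode']" />

<cfset plainMatches = XmlSearch(plainDoc, xpath) />
<cfset nsMatches = XmlSearch(nsDoc, xpath) />

<cfoutput>
plain: #plainMatches[1].XmlText#<br />
namespaced: #nsMatches[1].XmlText#
</cfoutput>
```

Both searches should find the node. One caveat: //* walks the whole document, so on very large files a more specific path such as /*/*[local-name()='mynode'] may be worth trying.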

Blogging this mostly so I don’t forget myself, but I also thought others might be interested. Time for me to get an xpath book or something and learn this stuff for real. 😉

Comments

Very useful info Matt thanks for the post.

Posted by jim collins @ 3/14/08 6:59 AM

I thank you.

My forehead thanks you.

My wall thanks you.

Posted by DaveK @ 6/18/08 6:51 PM

I see that this thread is old – but it is clearly not dead. I just wanted to add my two cents:

If you are working with XML from an Excel 2003 Document (which is riddled with namespaces) this is the XML Search code I used to get at a worksheet’s rows:

subsiteXML = xmlSearch(mapXML, "//ss:Worksheet/ss:Table/ss:Row")

until CF 9 comes out anyways

Posted by Ron West @ 6/21/09 9:06 PM

Thanks Ron–handy tip!

Posted by Matt Woodward @ 6/21/09 9:51 PM

Detecting Duplicate XML Data in SQL Server

I’ve been working quite a bit with XML in SQL Server lately (I’ll try to do a post on some xquery stuff at some point), and I had a need to check XML data that I’m pulling off disk against a table in SQL Server to see if the data I pulled off disk is a duplicate with data already in the database.

The problem I ran into is that SQL Server “collapses” empty XML nodes when you insert data as XML (e.g. <myXmlNode></myXmlNode> is turned into <myXmlNode/> by SQL Server), so if the XML you’re checking against hasn’t gone through this collapsing process, you won’t find duplicates accurately.

The solution turned out to be pretty simple and was suggested to me by a co-worker. First, you can’t compare XML to XML directly in a query because SQL Server doesn’t allow the = operator on the xml datatype. Given the issue outlined above, you also can’t just convert the XML in the database to nvarchar(max) and compare it with the raw XML string from disk, because the string from disk hasn’t had its empty nodes collapsed.

The trick is to use SQL Server’s CONVERT() function and convert the XML from disk to SQL Server XML within a query, and then compare the result of that with the data already in the database:

DECLARE @xmlToCheck xml
SELECT @xmlToCheck = CONVERT(xml, '#theXmlFromDisk#')
SELECT COUNT(id) AS dupeCount
FROM xmlTable
WHERE CONVERT(nvarchar(max), xmlColumn) = CONVERT(nvarchar(max), @xmlToCheck)

If dupeCount comes back greater than 0, then you have a dupe on your hands. Hope that helps others since I spent more time than I had hoped wrangling with this issue.

Getting XML Text From XML Nodes

I’ve been working with XML very heavily the past couple of weeks, and for whatever reason this is just one of those things I haven’t had to do a whole lot of until now. Don’t get me wrong, XML comes up pretty regularly for most people, but I haven’t had to live and breathe it this much before.

As with most things in ColdFusion it’s incredibly easy to jump in and get going, and for what I’m doing anyway, XmlSearch makes it absolutely trivial to tear through the XML and get the data I need.

I did run into one annoyance and thought I’d see if anyone had a better solution. Again, since I seem to have dodged working with XML this heavily until now I could just be missing something. Specifically I’m referring to the seeming necessity of two lines to get at most things when you use XmlSearch. If you’ve worked with XML much in CF you probably already know what I mean. Let’s assume there’s a single node (meaning really a single element array with an XML node in it) returned by this:

<cfset myXmlNode = XmlSearch(myDoc, "/*/myNode") />

To get at the XmlText, I then have to do this:

<cfset myXmlText = myXmlNode[1].XmlText />

What I’d like to be able to do is this:

<cfset myXmlText = XmlSearch(myDoc, "/*/myNode")[1].XmlText />

But unfortunately that syntax isn’t valid. To solve the problem I wrote a little UDF called getXmlTextFromXmlSearch (couldn’t think of a longer name 😉) that takes in the array and the element number from which you want the XmlText. It works, but I’m a tad irked that this seems to be necessary. What CF could use IMO is an ArrayGetAt *function* so you could wrap arrays in the function instead of having to jump through these other hoops:

<cfset myXmlText = ArrayGetAt(XmlSearch(myDoc, "/*/myNode"), 1).XmlText />

That’s essentially what my UDF does but again, seems a bit klunky. Maybe I’m just nitpicking because ColdFusion is so great and I want it to be perfect. 😉
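For the curious, a UDF along these lines might look like the sketch below. This is a reconstruction for illustration rather than my exact code: the argument names nodes and index are assumptions, and the sample document is made up so the snippet stands on its own.

```cfml
<!--- Sketch of a helper that returns the XmlText of one element in an XmlSearch() result.
      The argument names "nodes" and "index" are illustrative assumptions. --->
<cffunction name="getXmlTextFromXmlSearch" access="public" returntype="string" output="false">
    <cfargument name="nodes" type="array" required="true" />
    <cfargument name="index" type="numeric" required="true" />
    <!--- CFML arrays are 1-based, so index 1 is the first match. --->
    <cfreturn arguments.nodes[arguments.index].XmlText />
</cffunction>

<!--- Hypothetical sample document for demonstration. --->
<cfset myDoc = XmlParse("<root><myNode>hello</myNode></root>") />

<!--- One line at the call site instead of two. --->
<cfset myXmlText = getXmlTextFromXmlSearch(XmlSearch(myDoc, "/*/myNode"), 1) />

<cfoutput>#myXmlText#</cfoutput>
```

Note that the helper does no bounds checking, so it will throw if the search returns fewer elements than the index you ask for.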

Comments

I’m in the same boat – I barely use XML but here’s what I came up with for an XPath expression to get the xml text:

hi

I’m not sure if it’s the perfect solution but I do know XPath is pretty powerful stuff (and I’ve yet to find a *really* good resource for it).

Posted by todd sharp @ 2/7/08 5:57 AM

crud…looks like you stripped my code…let me try this:

<cfxml variable="test">
<nodes>
<node>hi</node>
</nodes>
</cfxml>

<cfset text = xmlSearch(test, "string(//nodes/node[text()])") />

<cfdump var="#text#">

Posted by todd sharp @ 2/7/08 5:59 AM

Thanks Todd–I’m a bit of an XPath moron so I’ll need to look into that more.

Posted by Matt Woodward @ 2/7/08 6:09 AM

You could also do this:

Posted by radek @ 2/7/08 6:19 AM

<cfset myXmlText = XmlSearch(myDoc, "/*/myNode").get(0).XmlText />

Posted by radek @ 2/7/08 6:19 AM

Ah–didn’t think of tapping into the Java that way. That’s very slick! Thanks radek!

Posted by Matt Woodward @ 2/7/08 6:31 AM

This will work as well

Posted by Qasim Rasheed @ 2/7/08 9:23 AM

I wasn’t sure how to put code in a comment

or

#HTMLEditFormat( ”)#

Posted by Qasim Rasheed @ 2/7/08 9:25 AM

Sorry Qasim–I need to tidy up the code handling in the comments!

Posted by Matt Woodward @ 2/8/08 5:46 AM

i don’t think it’s the best solution

Posted by Juli @ 8/22/08 6:21 AM

not sure how i happened upon this, as i’m in virtually the same boat this morning, and your post is already indexed in google – well done!

anyhow, radek is right, arrayGetAt IS array.get(index), and fwiw, arrayFind would be array.indexOf(string)

@todd – I think that the xPath functions (such as string()) are only available in CF8, but I have no way to test on 7 atm.

this xmlSearch will expose whether a function is avail in CF xPath: xmlSearch(search, "function-available('testthisfunction')")

My situation was a *little* different, I wanted to get an array of just the text values in a particular node.

mytext

xmlSearch(search, '//nodes/node/text()') returns an array that looks like this:

[{xmlName='#text',xmlType='TEXT',xmlValue='mytext'},{xmlName='#text',xmlType='TEXT',xmlValue='mytext'}]

and i couldn’t for the life of me figure out how to get the xpath search down to just xmlValue…because it appears that CF is building that DOM node structure back in to the xmlSearch response. so, xPath finds the text, gives it back to CF, and CF goes ‘well, hell, this looks like xml, so let’s send it back as native CF XML DOM node structure’. as it turns out, in CF8, at least, if i output arr[1], i get a string, and not the XML object shown in the dump of the array.

Posted by tony @ 2/7/09 10:28 AM