XML Safe Text

I've been working on improving the RSS feeds generated by Lighthouse.  One persistent problem is that some RSS readers (IE, for instance) will choke on a lot of special characters, such as those pasted from Microsoft Word.  I have previously put in place code to replace many of those characters, but came across another one today that wasn't replaced.  I knew I needed a better solution.

On CFlib.org, I found a function called xmlFormat2 that smartly avoids maintaining a list of characters to replace, and just replaces all characters not in a list of "good" characters.  That makes sense.  And it works.
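To illustrate the "good list" idea, here is a rough sketch. It is not the CFlib code: the function name `EscapeUnlisted` and the particular set of allowed characters are assumptions made up for this example. It walks the string one character at a time and turns anything outside the list into a numeric character reference.

```cfm
<cffunction name="EscapeUnlisted" hint="Illustrative only: escapes every character not in a fixed list of good characters." returnType="string" output="false">
    <cfargument name="txt" type="string" required="true">
    <!--- Assumed list of "good" characters; a real list would likely be longer. --->
    <cfset var good = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,;:!?()-">
    <cfset var result = "">
    <cfset var i = 0>
    <cfset var c = "">

    <cfloop index="i" from="1" to="#Len(arguments.txt)#">
        <cfset c = Mid(arguments.txt, i, 1)>
        <cfif Find(c, good)>
            <!--- Character is in the good list: keep it as is. --->
            <cfset result = result & c>
        <cfelse>
            <!--- Anything else becomes a numeric character reference. --->
            <cfset result = result & "&##" & Asc(c) & ";">
        </cfif>
    </cfloop>

    <cfreturn result>
</cffunction>
```

Note that a sketch like this touches every character of the input, even when nothing needs replacing.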

I was concerned, though, about the performance of xmlFormat2, and I thought it could be done better.  Using the REMatch function (introduced in ColdFusion 8), I was able to make the function both simpler and much faster.  My testing has been limited, but so far it has handled everything I have thrown at it.  And here it is:

<cffunction name="XmlSafeText" hint="Replaces all characters that would break an xml file." returnType="string" output="false">
    <cfargument name="txt" hint="String to format" type="string" required="true">
    <cfset var chars = "">
    <cfset var replaced = "">
    <!--- Scope the loop variable locally so it does not leak into the variables scope. --->
    <cfset var char = "">

    <!--- Use XmlFormat function first --->
    <cfset txt = XmlFormat(txt)>
    <!--- Get all other characters to replace. --->
    <cfset chars = REMatch("[^[:ascii:]]",txt)>
    <!--- Loop through characters and do replace. Maintain a list of characters already replaced to avoid duplicate work. --->
    <cfloop index="char" array="#chars#">
        <cfif ListFind(replaced,char) is 0>
            <cfset txt = Replace(txt,char,"&##" & asc(char) & ";","all")>
            <cfset replaced = ListAppend(replaced,char)>
        </cfif>
    </cfloop>

    <cfreturn txt>
</cffunction>

It should be possible to use it as a replacement for the built-in XmlFormat function.  Let me know if you run into any problems with it.
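As a quick usage sketch, assuming XmlSafeText is defined on (or included into) the page, it can be called wherever XmlFormat would otherwise be used. The variable name `itemTitle` and the sample text are made up for this example; `Chr()` is used to build the Word-style curly quotes so the example does not depend on the file's encoding.

```cfm
<!--- Hypothetical feed item title containing an accented letter and curly quotes. --->
<cfset itemTitle = "Caf" & Chr(233) & " " & Chr(8220) & "specials" & Chr(8221) & " & more">

<!--- Escape the title before writing it into the feed. --->
<cfoutput><title>#XmlSafeText(itemTitle)#</title></cfoutput>
```

The ampersand should be escaped by XmlFormat, and the curly quotes should come out as numeric character references (for example `&#8220;`), leaving only plain ASCII in the output.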

Comments

This solution works great for the fix I needed. We had international characters introduced to an XML feed we use, and they broke when I ran them through XMLParse. This fixed the issue, so thanks!

Posted on April 13, 2011 1:34:50 PM EDT by Eric B

Glad to hear it worked for you!

Posted on April 13, 2011 1:53:07 PM EDT by David Hammond
