Modern Signal Development Blog

News, tips, tricks and discussions related to Modern Signal Lighthouse, ColdFusion, ASP.NET and other web development technologies.

Simple DaisyDiff CFC Wrapper

Some background first...

I needed to diff HTML content in a project I was working on, which brought me pretty quickly to DaisyDiff, a really nice Java-based utility.  DaisyDiff doesn't, however, have a simple built-in function to diff two strings.  There is a command-line option, which takes the paths of two files as arguments, and also a Java API that takes a number of Java objects as arguments.  What I wanted was a function that took two strings and returned the result.

I don't really do Java development -- that is, I've done some in the past, but it's been a while and it would probably take me some time to get my development environment up to snuff.  Besides, I didn't really feel like dealing with compiled code.

A quick Google search, of course, turns up CFX_CompareHTML and the JavaLoader version of the same thing.  So I used that, and it worked fine.  But it was using an old version of DaisyDiff, and it seemed to have some bugs with UTF characters and such.  What I really wanted to do was use JavaLoader to load the current version of DaisyDiff.  After much stumbling around in the code, I found that the test suite in the DaisyDiff repository has exactly the function I wanted -- it compares two strings and returns the result.

So, long story short, I took the code from that function and pulled it into a CFC, using JavaLoader, and rewrote everything in CFML.  The result is the simple function I was after.

So anyway, here it is:

<cfcomponent hint="Wrapper for DaisyDiff" output="false">

    <cffunction name="Init" output="false" returntype="DaisyDiff">
        <cfargument name="daisydiffpath" hint="absolute path to daisydiff jar file" type="string" required="true">
        <cfargument name="javaloaderpath" hint="component path to JavaLoader.cfc" type="string" required="true">
        <cfset This.daisydiffpath = arguments.daisydiffpath>
        <cfset This.javaloaderpath = arguments.javaloaderpath>
        <cfreturn This>
    </cffunction>

    <cffunction name="Diff" output="false" returntype="string">
        <cfargument name="olderHtml" type="string" required="true">
        <cfargument name="newerHtml" type="string" required="true">

        <cfset var paths = [This.daisydiffpath]>
        <cfset var loader = createObject("component", This.javaloaderpath).init(paths)>
        <cfset var TransformerFactoryImpl = loader.create("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl")>
        <cfset var StringReader = loader.create("java.io.StringReader")>
        <cfset var StringWriter = loader.create("java.io.StringWriter")>
        <cfset var Locale = loader.create("java.util.Locale")>
        <cfset var StreamResult = loader.create("javax.xml.transform.stream.StreamResult")>
        <cfset var OutputKeys = loader.create("javax.xml.transform.OutputKeys")>
        <cfset var NekoHtmlParser = loader.create("org.outerj.daisy.diff.helper.NekoHtmlParser")>
        <cfset var DomTreeBuilder = loader.create("org.outerj.daisy.diff.html.dom.DomTreeBuilder")>
        <cfset var HTMLDiffer = loader.create("org.outerj.daisy.diff.html.HTMLDiffer")>
        <cfset var HtmlSaxDiffOutput = loader.create("org.outerj.daisy.diff.html.HtmlSaxDiffOutput")>
        <cfset var TextNodeComparator = loader.create("org.outerj.daisy.diff.html.TextNodeComparator")>
        <cfset var InputSource = loader.create("org.xml.sax.InputSource")>

        <cfset var finalResult = StringWriter.Init()>
        <cfset var result = TransformerFactoryImpl.Init().newTransformerHandler()>
        <cfset var sr = StreamResult.Init(finalResult)>
        <cfset var prefix = "diff">
        <cfset var cleaner = NekoHtmlParser.Init()>
        <cfset var oldSource = InputSource.Init(StringReader.Init(olderHtml))>
        <cfset var newSource = InputSource.Init(StringReader.Init(newerHtml))>
        <cfset var oldHandler = DomTreeBuilder.Init()>
        <cfset var newHandler = DomTreeBuilder.Init()>
        <cfset var leftComparator = "">
        <cfset var rightComparator = "">
        <cfset var output = "">
        <cfset var differ = "">
        <cfset var diff = "">

        <!--- Send the transformer's serialized output to the StringWriter, without an XML declaration --->
        <cfset result.setResult(sr)>
        <cfset result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")>

        <!--- Parse each HTML string into a DOM tree and wrap it in a comparator --->
        <cfset cleaner.parse(oldSource, oldHandler)>
        <cfset leftComparator = TextNodeComparator.Init(oldHandler, Locale.getDefault())>

        <cfset cleaner.parse(newSource, newHandler)>
        <cfset rightComparator = TextNodeComparator.Init(newHandler, Locale.getDefault())>

        <!--- Run the diff; the marked-up HTML ends up in finalResult --->
        <cfset output = HtmlSaxDiffOutput.Init(result,prefix)>
        <cfset differ = HTMLDiffer.Init(output)>
        <cfset differ.diff(leftComparator, rightComparator)>
        <cfset diff = finalResult.toString()>

        <cfreturn diff>
    </cffunction>

</cfcomponent>

Usage:

<!--- "var" assumes these lines sit inside a function; drop it in a plain template --->
<cfset var daisy = CreateObject("component","cfc.DaisyDiff").Init(expandPath("../daisydiff-1.1/daisydiff.jar"),"Lighthouse.Utilities.javaloader.JavaLoader")>
<cfset var diff = daisy.Diff(olderHtml,newerHtml)>

The result is HTML that has been marked up by DaisyDiff with special classes.  You can take that and style it in any way you see fit.
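
As a rough sketch of that styling step, the template below outputs the diff together with a small stylesheet.  The class names (diff-html-added, diff-html-removed, diff-html-changed) are an assumption based on the stylesheet that ships with DaisyDiff, so check them against the markup your version actually produces.  The sample strings and colors are placeholders.

```cfml
<!--- Sample input; any two HTML strings will do --->
<cfset olderHtml = "<p>The quick brown fox.</p>">
<cfset newerHtml = "<p>The quick red fox jumps.</p>">

<!--- Same paths as in the usage example above; adjust for your setup --->
<cfset daisy = CreateObject("component","cfc.DaisyDiff").Init(expandPath("../daisydiff-1.1/daisydiff.jar"),"Lighthouse.Utilities.javaloader.JavaLoader")>
<cfset diffHtml = daisy.Diff(olderHtml, newerHtml)>

<!--- Assumed class names; inspect diffHtml to confirm them --->
<style type="text/css">
    .diff-html-added   { background-color: lightgreen; }
    .diff-html-removed { background-color: lightpink; text-decoration: line-through; }
    .diff-html-changed { background-color: lightblue; }
</style>

<cfoutput>#diffHtml#</cfoutput>
```

The stylesheet sits outside the cfoutput block on purpose, so any pound signs you add for hex colors are not treated as CFML expressions.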

I'm sure this CFC could be refined.  The class name prefix, for instance, is hardcoded to "diff", and you would have to change that if you need a different prefix.  Someone more familiar with the Java classes used here may well spot problems too, and I'd welcome hearing about them.
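
One possible way to lift the hardcoded prefix, offered as an untested sketch: make it an optional third argument to Diff that defaults to "diff".  The argument name "prefix" is an assumption chosen for illustration, not something DaisyDiff defines.  Everything else mirrors the component above.

```cfml
<cfcomponent hint="Wrapper for DaisyDiff, with a configurable prefix" output="false">

    <cffunction name="Init" output="false" returntype="DaisyDiff">
        <cfargument name="daisydiffpath" type="string" required="true">
        <cfargument name="javaloaderpath" type="string" required="true">
        <cfset This.daisydiffpath = arguments.daisydiffpath>
        <cfset This.javaloaderpath = arguments.javaloaderpath>
        <cfreturn This>
    </cffunction>

    <cffunction name="Diff" output="false" returntype="string">
        <cfargument name="olderHtml" type="string" required="true">
        <cfargument name="newerHtml" type="string" required="true">
        <!--- Assumed new argument; defaults to the previously hardcoded value --->
        <cfargument name="prefix" type="string" required="false" default="diff">

        <cfset var paths = [This.daisydiffpath]>
        <cfset var loader = createObject("component", This.javaloaderpath).init(paths)>
        <cfset var TransformerFactoryImpl = loader.create("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl")>
        <cfset var StringReader = loader.create("java.io.StringReader")>
        <cfset var StringWriter = loader.create("java.io.StringWriter")>
        <cfset var Locale = loader.create("java.util.Locale")>
        <cfset var StreamResult = loader.create("javax.xml.transform.stream.StreamResult")>
        <cfset var OutputKeys = loader.create("javax.xml.transform.OutputKeys")>
        <cfset var NekoHtmlParser = loader.create("org.outerj.daisy.diff.helper.NekoHtmlParser")>
        <cfset var DomTreeBuilder = loader.create("org.outerj.daisy.diff.html.dom.DomTreeBuilder")>
        <cfset var HTMLDiffer = loader.create("org.outerj.daisy.diff.html.HTMLDiffer")>
        <cfset var HtmlSaxDiffOutput = loader.create("org.outerj.daisy.diff.html.HtmlSaxDiffOutput")>
        <cfset var TextNodeComparator = loader.create("org.outerj.daisy.diff.html.TextNodeComparator")>
        <cfset var InputSource = loader.create("org.xml.sax.InputSource")>

        <!--- Note: no "var prefix" here any more; the argument replaces it --->
        <cfset var finalResult = StringWriter.Init()>
        <cfset var result = TransformerFactoryImpl.Init().newTransformerHandler()>
        <cfset var sr = StreamResult.Init(finalResult)>
        <cfset var cleaner = NekoHtmlParser.Init()>
        <cfset var oldSource = InputSource.Init(StringReader.Init(arguments.olderHtml))>
        <cfset var newSource = InputSource.Init(StringReader.Init(arguments.newerHtml))>
        <cfset var oldHandler = DomTreeBuilder.Init()>
        <cfset var newHandler = DomTreeBuilder.Init()>
        <cfset var leftComparator = "">
        <cfset var rightComparator = "">
        <cfset var output = "">
        <cfset var differ = "">

        <cfset result.setResult(sr)>
        <cfset result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")>

        <cfset cleaner.parse(oldSource, oldHandler)>
        <cfset leftComparator = TextNodeComparator.Init(oldHandler, Locale.getDefault())>
        <cfset cleaner.parse(newSource, newHandler)>
        <cfset rightComparator = TextNodeComparator.Init(newHandler, Locale.getDefault())>

        <!--- The only functional change: pass the argument instead of a literal --->
        <cfset output = HtmlSaxDiffOutput.Init(result, arguments.prefix)>
        <cfset differ = HTMLDiffer.Init(output)>
        <cfset differ.diff(leftComparator, rightComparator)>

        <cfreturn finalResult.toString()>
    </cffunction>

</cfcomponent>
```

Existing callers are unaffected: daisy.Diff(olderHtml, newerHtml) behaves as before, while daisy.Diff(olderHtml, newerHtml, "rev") passes "rev" through to HtmlSaxDiffOutput.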
