Some background first...
I needed to diff HTML content in a project I was working on, which brought me pretty quickly to DaisyDiff, a really nice Java-based utility. DaisyDiff doesn't, however, have a simple built-in function to diff two strings. There is a command-line option, which takes the paths of two files as arguments, and also a Java API that takes a number of Java objects as arguments. What I wanted was a function that took two strings and returned the result.
I don't really do Java development. I've done some in the past, but it's been a while, and it would probably take me some time to get my development environment up to snuff. Besides, I didn't really feel like dealing with compiled code.
A quick Google search, of course, turns up CFX_CompareHTML and the JavaLoader version of the same thing. So I used that, and it worked fine. But it was using an old version of DaisyDiff, and it seemed to have some bugs with UTF characters and such. What I really wanted to do was to use JavaLoader to load the current version of DaisyDiff. After much stumbling around in the code, I found that the test suite in the DaisyDiff repository has exactly the function I wanted: it compares two strings and returns the result.
So, long story short, I took the code from that function and pulled it into a CFC, using JavaLoader, and rewrote everything in CFML. The result is the simple function I was after.
So anyway, here it is:
<cfcomponent hint="Wrapper for DaisyDiff" output="false">

    <cffunction name="Init" output="false" returntype="DaisyDiff">
        <cfargument name="daisydiffpath" hint="absolute path to daisydiff jar file" type="string" required="true">
        <cfargument name="javaloaderpath" hint="component path to JavaLoader.cfc" type="string" required="true">
        <cfset This.daisydiffpath = arguments.daisydiffpath>
        <cfset This.javaloaderpath = arguments.javaloaderpath>
        <cfreturn This>
    </cffunction>

    <cffunction name="Diff" output="false" returntype="string">
        <cfargument name="olderHtml" type="string" required="true">
        <cfargument name="newerHtml" type="string" required="true">
        <!--- Load the DaisyDiff jar and get handles on the Java classes we need --->
        <cfset var paths = [This.daisydiffpath]>
        <cfset var loader = createObject("component", This.javaloaderpath).init(paths)>
        <cfset var TransformerFactoryImpl = loader.create("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl")>
        <cfset var StringReader = loader.create("java.io.StringReader")>
        <cfset var StringWriter = loader.create("java.io.StringWriter")>
        <cfset var Locale = loader.create("java.util.Locale")>
        <cfset var StreamResult = loader.create("javax.xml.transform.stream.StreamResult")>
        <cfset var OutputKeys = loader.create("javax.xml.transform.OutputKeys")>
        <cfset var NekoHtmlParser = loader.create("org.outerj.daisy.diff.helper.NekoHtmlParser")>
        <cfset var DomTreeBuilder = loader.create("org.outerj.daisy.diff.html.dom.DomTreeBuilder")>
        <cfset var HTMLDiffer = loader.create("org.outerj.daisy.diff.html.HTMLDiffer")>
        <cfset var HtmlSaxDiffOutput = loader.create("org.outerj.daisy.diff.html.HtmlSaxDiffOutput")>
        <cfset var TextNodeComparator = loader.create("org.outerj.daisy.diff.html.TextNodeComparator")>
        <cfset var InputSource = loader.create("org.xml.sax.InputSource")>
        <!--- The diff is serialized into finalResult through the transformer handler --->
        <cfset var finalResult = StringWriter.Init()>
        <cfset var result = TransformerFactoryImpl.Init().newTransformerHandler()>
        <cfset var sr = StreamResult.Init(finalResult)>
        <cfset var prefix = "diff">
        <cfset var cleaner = NekoHtmlParser.Init()>
        <cfset var oldSource = InputSource.Init(StringReader.Init(arguments.olderHtml))>
        <cfset var newSource = InputSource.Init(StringReader.Init(arguments.newerHtml))>
        <cfset var oldHandler = DomTreeBuilder.Init()>
        <cfset var newHandler = DomTreeBuilder.Init()>
        <cfset var leftComparator = "">
        <cfset var rightComparator = "">
        <cfset var output = "">
        <cfset var differ = "">
        <cfset var diff = "">
        <!--- Point the handler at the StringWriter; without this nothing reaches finalResult --->
        <cfset result.setResult(sr)>
        <cfset result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")>
        <!--- Parse both versions into DOM trees and wrap each in a comparator --->
        <cfset cleaner.parse(oldSource, oldHandler)>
        <cfset leftComparator = TextNodeComparator.Init(oldHandler, Locale.getDefault())>
        <cfset cleaner.parse(newSource, newHandler)>
        <cfset rightComparator = TextNodeComparator.Init(newHandler, Locale.getDefault())>
        <!--- Run the diff and return the marked-up HTML --->
        <cfset output = HtmlSaxDiffOutput.Init(result, prefix)>
        <cfset differ = HTMLDiffer.Init(output)>
        <cfset differ.diff(leftComparator, rightComparator)>
        <cfset diff = finalResult.toString()>
        <cfreturn diff>
    </cffunction>

</cfcomponent>
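To use it, create the component and call Diff (the var keywords assume this runs inside a function of your own):
<cfset var daisy = CreateObject("component","cfc.DaisyDiff").Init(expandPath("../daisydiff-1.1/daisydiff.jar"),"Lighthouse.Utilities.javaloader.JavaLoader")>
<cfset var diff = daisy.Diff(olderHtml,newerHtml)>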
The result is HTML that has been marked up by DaisyDiff with special classes. You can take that and style it in any way that you see fit.
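As a rough illustration of the styling step, here is a minimal template sketch. The class names (diff-html-added, diff-html-removed, diff-html-changed) are what I'd expect DaisyDiff to emit, but treat them as an assumption: check the markup you actually get back and adjust the selectors. The colors and the sample strings are made up for the example; the component and jar paths are the same ones used above.

```cfml
<!--- Sketch only: runs in a plain .cfm template, so no var keywords here --->
<cfset daisy = createObject("component", "cfc.DaisyDiff").Init(expandPath("../daisydiff-1.1/daisydiff.jar"), "Lighthouse.Utilities.javaloader.JavaLoader")>
<cfset diff = daisy.Diff("<p>Hello world</p>", "<p>Hello there, world</p>")>

<!--- Assumed class names; verify them against your own diff output --->
<style>
    .diff-html-added   { background-color: palegreen; }
    .diff-html-removed { background-color: lightpink; text-decoration: line-through; }
    .diff-html-changed { background-color: lightyellow; }
</style>

<!--- The diff is already HTML, so output it as-is --->
<cfoutput>#diff#</cfoutput>
```

The style block sits outside cfoutput on purpose, so that any hex colors you add later don't need their pound signs escaped.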
I'm sure there are some refinements that could be made to this CFC. The class name prefix, for instance, is hardcoded to "diff", and that could be changed if you need to use a different prefix. Someone more familiar with the Java classes used here may well find problems too, and I would welcome hearing about them.
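The prefix refinement could look something like the sketch below: a variant of the Diff function, meant as a replacement for the one inside the component, that takes a hypothetical optional third argument (I've called it prefix) defaulting to "diff". It has not been tested against every DaisyDiff version, so take it as a starting point rather than a finished implementation. It also includes a result.setResult(sr) call so the serialized diff is written to the StringWriter.

```cfml
<cffunction name="Diff" output="false" returntype="string">
    <cfargument name="olderHtml" type="string" required="true">
    <cfargument name="newerHtml" type="string" required="true">
    <!--- Hypothetical argument: replaces the hardcoded local prefix variable --->
    <cfargument name="prefix" type="string" required="false" default="diff">
    <cfset var paths = [This.daisydiffpath]>
    <cfset var loader = createObject("component", This.javaloaderpath).init(paths)>
    <cfset var TransformerFactoryImpl = loader.create("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl")>
    <cfset var StringReader = loader.create("java.io.StringReader")>
    <cfset var StringWriter = loader.create("java.io.StringWriter")>
    <cfset var Locale = loader.create("java.util.Locale")>
    <cfset var StreamResult = loader.create("javax.xml.transform.stream.StreamResult")>
    <cfset var OutputKeys = loader.create("javax.xml.transform.OutputKeys")>
    <cfset var NekoHtmlParser = loader.create("org.outerj.daisy.diff.helper.NekoHtmlParser")>
    <cfset var DomTreeBuilder = loader.create("org.outerj.daisy.diff.html.dom.DomTreeBuilder")>
    <cfset var HTMLDiffer = loader.create("org.outerj.daisy.diff.html.HTMLDiffer")>
    <cfset var HtmlSaxDiffOutput = loader.create("org.outerj.daisy.diff.html.HtmlSaxDiffOutput")>
    <cfset var TextNodeComparator = loader.create("org.outerj.daisy.diff.html.TextNodeComparator")>
    <cfset var InputSource = loader.create("org.xml.sax.InputSource")>
    <cfset var finalResult = StringWriter.Init()>
    <cfset var result = TransformerFactoryImpl.Init().newTransformerHandler()>
    <cfset var sr = StreamResult.Init(finalResult)>
    <cfset var cleaner = NekoHtmlParser.Init()>
    <cfset var oldSource = InputSource.Init(StringReader.Init(arguments.olderHtml))>
    <cfset var newSource = InputSource.Init(StringReader.Init(arguments.newerHtml))>
    <cfset var oldHandler = DomTreeBuilder.Init()>
    <cfset var newHandler = DomTreeBuilder.Init()>
    <cfset var leftComparator = "">
    <cfset var rightComparator = "">
    <cfset var output = "">
    <cfset var differ = "">
    <!--- Route the handler's output into the StringWriter --->
    <cfset result.setResult(sr)>
    <cfset result.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")>
    <cfset cleaner.parse(oldSource, oldHandler)>
    <cfset leftComparator = TextNodeComparator.Init(oldHandler, Locale.getDefault())>
    <cfset cleaner.parse(newSource, newHandler)>
    <cfset rightComparator = TextNodeComparator.Init(newHandler, Locale.getDefault())>
    <!--- The caller-supplied prefix is passed through instead of the literal "diff" --->
    <cfset output = HtmlSaxDiffOutput.Init(result, arguments.prefix)>
    <cfset differ = HTMLDiffer.Init(output)>
    <cfset differ.diff(leftComparator, rightComparator)>
    <cfreturn finalResult.toString()>
</cffunction>
```

Existing two-argument calls keep working because the new argument is optional; a call like daisy.Diff(olderHtml, newerHtml, "rev") would use "rev" instead. Note that the local prefix variable has to go, since a var-scoped local can't share a name with an argument.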
Posted on March 29, 2010 4:37:47 PM EDT by David Hammond