Porting a CFPDF / CFPDFFORM-Dependent Application to Open BlueDragon

In my continuing war against all things PDF in ColdFusion, today I believe I have achieved victory.

If you’ve been following my posts (OK, rants) over the past few weeks, you’ll know that I ran into some annoying bugs in CFPDFFORM that were wreaking all sorts of havoc with the one application I wrote that depends on CFPDFFORM for a key part of what it does. I sent my test case to Adobe Support and, well, there are probably a million ways to couch this nicely, but the bottom line is they told me they can’t/won’t fix the bug, and their suggested workaround was to use iText to populate my PDF forms instead of using CFPDFFORM. Given that I ran into another annoying bug with CFPDFFORM a couple of years ago and got another “can’t/won’t fix” answer, this was strike two for CFPDFFORM. And personally I’d rather not wait to be backed into a corner by a third strike before making a change.

With my CFPDFFORM code changed over to use iText instead (which gave some nice speed benefits as well), the only remaining thing keeping me tied to ColdFusion 8 was the use of CFPDF to merge multiple PDFs into a single file. I looked into doing this with iText and although it’s doable, even being the gearhead I am, I have to admit it’s a bit of a hassle. So before implementing that solution I decided to hunt around a bit more.

Enter Apache PDFBox.

PDFBox is already bundled with Open BlueDragon since we use it as the underlying library for some CFDOCUMENT functionality, but the bundled copy is a version behind the absolute latest. Turns out that the newest version of PDFBox has a great, easy-to-use PDFMergerUtility built in. This made removing my dependence on CFPDF (which isn’t in Open BlueDragon yet) pretty simple.

My application populates a varying number of individual PDF pages and then merges these pages into a single PDF file at the end of processing. So I use an “assembly” directory to build up the individual pages of the final result, and the final step is to merge the files. Previously I was using CFPDF for this:


<cfpdf action="merge"
       source="#filesToMerge#"
       destination="#destDir##destFileName#.pdf"
       overwrite="true" />

And the PDFBox PDFMergerUtility solution looks like this:


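<!--- PDFMergerUtility collects the source files, then writes them out as one PDF --->
<cfset pdfMerger = CreateObject("java", "org.apache.pdfbox.util.PDFMergerUtility").init() />

<!--- filesToMerge is a comma-delimited list of PDF file paths --->
<cfloop list="#filesToMerge#" index="fileToMerge">
  <cfset pdfMerger.addSource(fileToMerge) />
</cfloop>

<cfset pdfMerger.setDestinationFileName("#destDir##destFileName#.pdf") />
<cfset pdfMerger.mergeDocuments() />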

So it’s a few more lines of code, but honestly not bad at all compared to the CFPDF version, and if you’ve been looking for ways to do some of what CFPDF and CFPDFFORM do in CF 8, between iText and PDFBox it looks like you’re covered.
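
How the filesToMerge list gets built isn’t shown here. As a rough sketch only, assuming the assembly directory holds one PDF per page with zero-padded names (say page-001.pdf) so that sorting by name gives page order, the list could be put together along these lines in Java. The class, method, and file names are all made up for illustration; none of them come from the actual app.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; nothing here is taken from the real application.
final class AssemblyDir {

    // Keep only paths ending in .pdf (case-insensitive), sort them by name,
    // and join them with commas, the default delimiter for a CFML list.
    // Assumption: names are zero-padded so name order equals page order.
    static String toMergeList(List<String> paths) {
        return paths.stream()
                .filter(p -> p.toLowerCase().endsWith(".pdf"))
                .sorted()
                .collect(Collectors.joining(","));
    }

    // List the regular files directly inside the assembly directory
    // and turn them into a merge list.
    static String filesToMerge(Path assemblyDir) {
        try (Stream<Path> entries = Files.list(assemblyDir)) {
            List<String> paths = entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toString())
                    .collect(Collectors.toList());
            return toMergeList(paths);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the current directory when no path is given.
        System.out.println(filesToMerge(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```

The resulting string has the same shape as the filesToMerge value used in the CFPDF and cfloop examples above, so it could be passed to either one unchanged.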

I was forced to make the switch from CFPDFFORM to iText in order to get around a bug in CFPDFFORM, and with that out of the way it was only one more step to make this application CFML-engine agnostic. I need to do some additional testing, but with the CF 8 dependencies removed the app is running great on Open BlueDragon. I’ll do some comparison metrics, but in initial testing, particularly since this is a CFC-heavy application, the speed and CPU load are both dramatically improved. Processing is about 30-50% faster (which makes a BIG difference when this app runs for hours at a time), CPU utilization is about half (CF completely pegs the CPU while this app is in full processing mode), and it’s noticeably lighter on RAM as well. All told, the minor hassle of reworking these portions of the app will have been well worth it.

Yet Another PDF/Acrobat Pro Rant

I’m in PDF hell yet again today and had to vent. Now that iText fixed my
searchability problems (CFPDFFORM fail), I’m noticing cases where the font
in particular fields in the generated PDF does not in any way match what
the PDF form’s settings show when you look at them in Acrobat.

For example, all the form fields in one of the PDFs I’m working with are
set to font face Times New Roman and “Auto” for the font size. Random
fields here and there show up as Arial instead of Times New Roman and come
out at some massive font size, even though other fields with the same amount
of text (or less) are a reasonable size and are the correct font face.

Since I only recently figured out how to do mass changes of the font face
on multiple fields (usability fail; and this doesn’t work consistently by
any means, but it’s faster than doing it one by one), I thought I missed
setting the font face correctly on a field or two. But lo and behold when I
open the PDF form in Acrobat Pro the font is CLEARLY set correctly, yet the
generated PDF still renders the font incorrectly.

All that’s bad enough, but the PDF size issue is really starting to kill
me. The particular PDF I’m modifying started out at about 500K in size. I’m
having to experiment with some things to figure out these annoying font
issues, so I changed all the fonts from Times New Roman to Arial, saved the
PDF, and the file size went up to 800K. I then changed the font back from
Arial to Times New Roman (which is what it was originally) and the file
size is now 1MB.

What. The. @$&*.

I’m sure there are stupid subtleties or fancy Acrobat Guru tips and tricks
of which I am woefully unaware, but my file size shouldn’t grow by 200K or
more every time I save it, so I’ll declare this a “fail” and try to suffer
through. Once I get these god-forsaken things working, if I never have to
touch Acrobat the rest of my natural life it’ll be too soon.

Update on PDF Form Issues in ColdFusion

I posted previously on an odd problem I’ve run into with PDF forms populated by CFPDFFORM on ColdFusion 8 and the resultant PDF not being searchable via Ctrl-F in a PDF reader. Initially I thought it was a font encoding problem since I found some promising evidence to support that notion, so after spending literally an entire day changing hundreds of font encoding settings in PDF form fields individually in Acrobat (because you can’t select them all and change them globally … nice feature to add to Acrobat 10 maybe? YA THINK?), I thought I had the problem licked. I was able to open the PDF form in Acrobat, fill in the form field, save the PDF, and have the text I typed into the form field come up in searches.

Wrong. Turns out the only reason it was searchable was that I manually typed into the form field. When ColdFusion 8 populates the form fields with CFPDFFORM the form field still isn’t searchable. Same problem with ColdFusion 9.

I submitted the issue to the Adobe Support Center of Excellence, and a week and a half later they’re saying they can reproduce the issue and it’s getting escalated.

Since I have no clue when I’ll get a response from Adobe and I’m putting my money on a “can’t/won’t fix” response (based on previous experience), this afternoon I grabbed the latest version of iText, an open source Java PDF library. Among the numerous open source libraries bundled into ColdFusion, iText is one that’s potentially used to power CFPDFFORM. (I’m not saying this is what’s used since I can’t know that for sure; I’m just saying it’s a likely candidate if they didn’t roll something themselves.)

Anyway, with iText in hand I wrote a simple little Java application to populate said PDF form. This way I could narrow down further if the problem is with iText (if that is in fact what ColdFusion uses for CFPDFFORM) or if it’s something ColdFusion is doing that’s causing the issue.

The result? iText by itself works perfectly. So at least now I have it narrowed down to ColdFusion being the culprit. I even replaced the iText instance in ColdFusion with the latest version to make sure that wasn’t the issue, and either the version doesn’t matter or iText is not in fact what ColdFusion is using to populate PDF forms.

You might be thinking, “But isn’t CFPDFFORM so much simpler than Java?” Short answer is absolutely not. Longer answer is absolutely not and holy crap is using iText directly fast as hell.

Here’s some simple CFML code that populates a PDF form:


<cfpdfform action="populate" source="mypdfform.pdf" destination="filledpdfform.pdf">
    <cfpdfformparam name="myPDFField" value="Foo" />
    <cfpdfformparam name="myOtherPDFField" value="Bar" />
    <!--- ... etc. ... --->
</cfpdfform>

<!--- now if you want the data in the pdf form to actually be saved to disk,
        you have to do this little dance to flatten it --->
<cfpdf action="read" source="filledpdfform.pdf" name="myPDFData" />
<cfpdf action="write" source="myPDFData" destination="filledpdfform.pdf" flatten="true" overwrite="true" />

And here’s that same thing in Java; all I’m leaving out is some import statements that would be at the top of the class in which this code resides:


PdfReader reader = new PdfReader("mypdfform.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("filledpdfform.pdf"));
AcroFields form = stamper.getAcroFields();

form.setField("myPDFField", "Foo");
form.setField("myOtherPDFField", "Bar");
// ... etc. ...

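// Flatten the form on write so the field values become part of the page content
stamper.setFormFlattening(true);
stamper.close();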

As you can see, the Java code is about the same as the CFML code. You open the PDF form, you set the field values, and you write the file out, although with iText I don’t have to write/read/write again to get it flattened, which is one of my earlier “can’t/won’t fix” responses from Adobe.

What’s my next step in all of this? Well, I’m faced with a pretty easy decision really. This is for an application the sole purpose of which is to process mountains of XML and generate flattened PDF forms containing the data extracted from XML. iText lets people actually search the PDFs as they should be able to (which is important since some of these PDFs are hundreds of pages long), iText is incredibly fast at cranking out PDFs in my limited testing thus far (which will be important when I’m re-generating three years of PDFs since none of the existing ones are searchable), and the Java code to achieve a better end result isn’t any more complex than the CFML code.
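
To make the XML-to-form step a little more concrete, here’s a minimal sketch of pulling field values out of XML into a map, using only the XML parser that ships with Java. The XML layout (a field element with a name attribute) and the FieldExtractor class are assumptions for illustration only; the real application’s XML isn’t shown in this post.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the element and attribute names are invented.
final class FieldExtractor {

    // Collect every <field name="...">value</field> element, in document
    // order, into a map of PDF form field name -> value.
    static Map<String, String> fieldValues(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("field");

            // LinkedHashMap keeps the fields in the order they were read.
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                values.put(field.getAttribute("name"), field.getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse field XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record>"
                + "<field name=\"myPDFField\">Foo</field>"
                + "<field name=\"myOtherPDFField\">Bar</field>"
                + "</record>";
        System.out.println(fieldValues(xml));
    }
}
```

Each entry in the map would then become one form.setField(name, value) call on the AcroFields object from the iText example above.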

It’s an easy decision to use iText for this portion of the application. But as you can glean from the tone of this post I’m annoyed. CF 8 was new when this app was first built and one of the reasons we bought another CF 8 Enterprise license (since this app runs on its own server) was for this very feature because I figured it would be easier than using iText. It might have saved me some time initially, but boy am I paying dearly for it now.

So I’ll start by reworking the PDF generation portion of the app to use iText, and then I’m going to start implementing CFPDFFORM in Open BlueDragon. Frankly, given that there’s literally no user interface for this application, CF wasn’t a great choice to begin with, but at the point when I was making the decision CFPDFFORM was what sold me on using CF instead of writing the app in Java. But three years later, here I am. The other option I’m pondering is rewriting the entire application in Java or Groovy since that’s probably better suited to the no-UI aspects of the app anyway.

Nothing like having an itch to scratch to get motivated I guess, so at least the problems I’m having may lead to a nice new (and fully working!) feature in Open BlueDragon at some point, and maybe I’ll dig into Groovy more to rewrite my existing app. Interesting times ahead.