CFDOCUMENT Tips with Open BlueDragon

I’m porting a ColdFusion 8 application to Open BlueDragon. The app generates documents using both iText and PDFBox (which I posted about before), and also generates PDF files from HTML content using CFDOCUMENT. Compared with CF 8 I ran into some differences with CFDOCUMENT, so I figured I’d post them here. In general everything just works, so these are formatting issues more than anything else.

1. Use Full URLs for Images and CSS

This was covered by Nitai in a thread on the OpenBD mailing list a while ago, so consider this a reminder that you need to use full URLs for images and external stylesheets.
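As a rough illustration, here’s a minimal sketch of what that looks like. The host name, paths, and output file name are all made up for the example, so substitute your own:

```cfml
<!--- Hypothetical base URL; substitute your own host and application path --->
<cfset baseUrl = "http://www.example.com/myapp" />

<cfdocument format="PDF" filename="#expandPath('./report.pdf')#" overwrite="true">
<html>
<head>
  <!--- External stylesheet referenced by full URL rather than "css/report.css" --->
  <cfoutput><link rel="stylesheet" type="text/css" href="#baseUrl#/css/report.css" /></cfoutput>
</head>
<body>
  <!--- Same idea for images: full URL rather than a relative path --->
  <cfoutput><img src="#baseUrl#/images/logo.png" alt="Logo" /></cfoutput>
  <h1>Report</h1>
</body>
</html>
</cfdocument>
```

Keeping the base URL in one variable makes it easy to switch between development and production hosts.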

2. Tweak Your CSS as Needed

Because the underlying rendering engine differs between OpenBD and CF (not sure what CF is using, but OpenBD uses the amazing Flying Saucer project), you may see differences in the handling of CSS. None of the ones I ran into were biggies, and in many cases when I looked at the CSS being used, CF 8 wasn’t doing what the CSS said: the rendered output was what I wanted, but it didn’t actually adhere to the stylesheet. One example: an h1 tag had a style of float:left in the CSS. CF 8 ignored the float but OpenBD respected it, so the document generated in OpenBD didn’t have a break where I was expecting one. A quick change to float:none and all was well.

3. Empty Paragraphs Don’t Count

I had some instances of <p> tags with CSS applied that were being used as spacers, to generate horizontal rules using a border style on the paragraph, etc., but these paragraph tags had nothing between them (e.g. <p class="spacer"></p>). If you don’t have *something* between the opening and closing paragraph tags, the CSS doesn’t seem to apply. Throwing a non-breaking space in (<p class="spacer">&nbsp;</p>) worked great for me.
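Here’s a small sketch of the spacer trick in context. The class name, border style, and output file name are just examples, not the actual CSS from my app:

```cfml
<cfdocument format="PDF" filename="#expandPath('./spacer-test.pdf')#" overwrite="true">
<html>
<head>
  <style type="text/css">
    /* Hypothetical spacer class: the bottom border acts as a horizontal rule */
    p.spacer { border-bottom: 1px solid black; margin: 10px 0; }
  </style>
</head>
<body>
  <p>Section one</p>
  <!--- A completely empty spacer paragraph did not pick up the CSS for me;
        the non-breaking space gives the paragraph some content --->
  <p class="spacer">&nbsp;</p>
  <p>Section two</p>
</body>
</html>
</cfdocument>
```

If the named entity gives your markup trouble, the numeric form &#160; is the same character.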

4. Font Differences

Remember that depending on OS platform and a bunch of other variables you may find differences in the fonts being output. In my case the CSS (which I got from someone else originally) was using Georgia as the main font and I don’t have Georgia on my Ubuntu laptop, so the rendered output wasn’t the same. Just make sure you have the fonts you want to use available. You can check the Fonts page in the OpenBD administrator to see how OpenBD hunts for fonts and to add your own font paths if necessary.

That’s all I ran into with CFDOCUMENT on OpenBD–a few tweaks here and there and it’s working fantastically well!

Porting a CFPDF / CFPDFFORM-Dependent Application to Open BlueDragon

In my continuing war against all things PDF in ColdFusion, today I believe I have achieved victory.

If you’ve been following my posts (OK, rants) over the past few weeks you’ll know that I ran into some annoying bugs in CFPDFFORM that were wreaking all sorts of havoc with the one application I wrote that depends on CFPDFFORM for a key part of what it does. I sent my test case to Adobe Support and, well, there are probably a million ways to couch this nicely, but the bottom line is they told me they can’t/won’t fix the bug, and their suggested workaround was to use iText to populate my PDF forms instead of using CFPDFFORM. Given that I ran into another annoying bug with CFPDFFORM a couple of years ago and got another “can’t/won’t fix” answer, this was strike two for CFPDFFORM. And personally I don’t like being backed into a corner by a third strike before making a change.

Now that my CFPDFFORM code had been changed over to use iText instead (which gave some nice speed benefits as well), the only remaining thing keeping me tied to ColdFusion 8 was the use of CFPDF to merge multiple PDFs into a single file. I looked into doing this with iText and although it’s doable, even being the gearhead I am I have to admit that’s a bit of a hassle. So before implementing that solution I decided to hunt around a bit more.

Enter Apache PDFBox.

PDFBox is already bundled with Open BlueDragon since we use it as the underlying library for some CFDOCUMENT functionality, but the bundled version is one behind the absolute latest. Turns out that the newest version of PDFBox has a great, easy-to-use PDFMergerUtility built in. This made removing my dependence on CFPDF (which isn’t in Open BlueDragon yet) pretty simple.

My application populates a varying number of individual PDF pages and then merges these pages into a single PDF file at the end of processing. So I use an “assembly” directory to build up the individual pages of the final result, and the final step is to merge the files. Previously I was using CFPDF for this:


<cfpdf action="merge"
       source="#filesToMerge#"
       destination="#destDir##destFileName#.pdf"
       overwrite="true" />

And the PDFBox PDFMergerUtility solution looks like this:


<cfset pdfMerger = CreateObject("java", "org.apache.pdfbox.util.PDFMergerUtility").init() />

<cfloop list="#filesToMerge#" index="fileToMerge">
  <cfset pdfMerger.addSource(fileToMerge) />
</cfloop>

<cfset pdfMerger.setDestinationFileName("#destDir##destFileName#.pdf") />
<cfset pdfMerger.mergeDocuments() />

So it’s a few more lines of code, but honestly not bad at all compared to the CFPDF version, and if you’ve been looking for ways to do some of what CFPDF and CFPDFFORM do in CF 8, between iText and PDFBox it looks like you’re covered.
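If you don’t already have a list of files to merge, one way to build it is straight from the assembly directory. This is only a sketch: the directory and file names are hypothetical, and it assumes a PDFBox version that includes PDFMergerUtility is on the classpath.

```cfml
<!--- Hypothetical paths; adjust to your own layout --->
<cfset assemblyDir = expandPath("./assembly") />
<cfset destFile = expandPath("./output/merged.pdf") />

<!--- List the page PDFs sorted by name so they merge in a predictable order --->
<cfdirectory action="list" directory="#assemblyDir#" filter="*.pdf" sort="name asc" name="pageFiles" />

<cfset pdfMerger = CreateObject("java", "org.apache.pdfbox.util.PDFMergerUtility").init() />

<!--- Add each page file to the merger by its full path --->
<cfloop query="pageFiles">
  <cfset pdfMerger.addSource(assemblyDir & "/" & pageFiles.name) />
</cfloop>

<cfset pdfMerger.setDestinationFileName(destFile) />
<cfset pdfMerger.mergeDocuments() />
```

The sort is alphabetical, so page files need zero-padded names (page001.pdf, page002.pdf, and so on) to come out in numeric order.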

I was forced to make the switch from CFPDFFORM to iText in order to get around a bug in CFPDFFORM, and with that out of the way it was only one more step to make this application CFML engine agnostic. I need to do some additional testing but with the CF 8 dependencies removed the app is running great on Open BlueDragon. I’ll do some comparison metrics but in initial testing, particularly since this is a CFC-heavy application, the speed and CPU load are both dramatically improved. Speed is about 30-50% faster (which makes a BIG difference when this app runs processing for hours at a time), CPU utilization is about half (CF completely pegs the CPU while this app is in full processing mode), and it’s noticeably lighter on RAM as well. All told the minor hassle of reworking these portions of the app will have been well worth it.

Yet Another PDF/Acrobat Pro Rant

I’m in PDF hell yet again today and had to vent. Now that iText fixed my
searchability problems (CFPDFFORM fail), I’m noticing cases where the font
in particular fields in the generated PDF does not in any way match the
settings that are in the PDF form when you look at the settings in Acrobat.

For example, all the form fields in one of the PDFs I’m working with are
set to font face Times New Roman and “Auto” for the font size. Random
fields here and there show up as Arial instead of Times New Roman and come
out some massive font size, even though other fields with the same amount
of text (or less) are a reasonable size and are the correct font face.

Since I only recently figured out how to do mass changes of the font face
on multiple fields (usability fail; and this doesn’t work consistently by
any means, but it’s faster than doing it one by one), I thought I missed
setting the font face correctly on a field or two. But lo and behold when I
open the PDF form in Acrobat Pro the font is CLEARLY set correctly, yet the
generated PDF still renders the font incorrectly.

All that’s bad enough, but the PDF size issue is really starting to kill
me. The particular PDF I’m modifying started out at about 500K in size. I’m
having to experiment with some things to figure out these annoying font
issues, so I changed all the fonts from Times New Roman to Arial, saved the
PDF, and the file size went up to 800K. I then changed the font back from
Arial to Times New Roman (which is what it was originally) and the file
size is now 1MB.

What. The. @$&*.

I’m sure there are stupid subtleties or fancy Acrobat Guru tips and tricks
of which I am woefully unaware but my file size shouldn’t grow by 200K
every time I save it, so I’ll declare this a “fail” and try to suffer
through. Once I get these god-forsaken things working if I never have to
touch Acrobat the rest of my natural life it’ll be too soon.

Update on PDF Form Issues in ColdFusion

I posted previously on an odd problem I’ve run into with PDF forms populated by CFPDFFORM on ColdFusion 8 and the resultant PDF not being searchable via Ctrl-F in a PDF reader. Initially I thought it was a font encoding problem since I found some promising evidence to support that notion, so after spending literally an entire day changing hundreds of font encoding settings in PDF form fields individually in Acrobat (because you can’t select them all and change them globally … nice feature to add to Acrobat 10 maybe? YA THINK?), I thought I had the problem licked. I was able to open the PDF form in Acrobat, fill in the form field, save the PDF, and have the text I typed into the form field come up in searches.

Wrong. Turns out the only reason it was searchable was that I manually typed into the form field. When ColdFusion 8 populates the form fields with CFPDFFORM the form field still isn’t searchable. Same problem with ColdFusion 9.

I submitted the issue to Adobe Support Center of Excellence, and a week and a half later they’re saying they can reproduce the issue and it’s getting escalated.

Since I have no clue when I’ll get a response from Adobe and I’m putting my money on a “can’t/won’t fix” response (based on previous experience), this afternoon I grabbed the latest version of iText, which is an open source Java PDF library. Among the numerous open source libraries bundled into ColdFusion, iText is one that’s potentially used to power CFPDFFORM. (I’m not saying this is what’s used since I can’t know that for sure; I’m just saying it’s a likely candidate if they didn’t roll something themselves.)

Anyway, with iText in hand I wrote a simple little Java application to populate said PDF form. This way I could narrow down further if the problem is with iText (if that is in fact what ColdFusion uses for CFPDFFORM) or if it’s something ColdFusion is doing that’s causing the issue.

The result? iText by itself works perfectly. So at least now I have it narrowed down to ColdFusion being the culprit. I even replaced the iText instance in ColdFusion with the latest version to make sure that wasn’t the issue, and either the version doesn’t matter or iText is not in fact what ColdFusion is using to populate PDF forms.

You might be thinking, “But isn’t CFPDFFORM so much simpler than Java?” Short answer is absolutely not. Longer answer is absolutely not, and holy crap is using iText directly fast as hell.

Here’s some simple CFML code that populates a PDF form:


<cfpdfform action="populate" source="mypdfform.pdf" destination="filledpdfform.pdf">
    <cfpdfformparam name="myPDFField" value="Foo" />
    <cfpdfformparam name="myOtherPDFField" value="Bar" />
    ... etc. ...
</cfpdfform>

<!--- now if you want the data in the pdf form to actually be saved to disk,
        you have to do this little dance to flatten it --->
<cfpdf action="read" source="filledpdfform.pdf" name="myPDFData" />
<cfpdf action="write" source="myPDFData" destination="filledpdfform.pdf" flatten="true" overwrite="true" />

And here’s that same thing in Java–all I’m leaving out is some import statements that would be at the top of the class in which this code resides:


PdfReader reader = new PdfReader("mypdfform.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("filledpdfform.pdf"));
AcroFields form = stamper.getAcroFields();

form.setField("myPDFField", "Foo");
form.setField("myOtherPDFField", "Bar");
... etc. ...

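// flatten the form so the field values become part of the page content
stamper.setFormFlattening(true);
stamper.close();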

As you can see, the Java code is about the same as the CFML code: you open the PDF form, set the field values, and write the file out. With iText, though, I don’t have to write/read/write again to get the form flattened; that extra dance in CFML was the subject of one of my earlier “can’t/won’t fix” responses from Adobe.
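If you’d rather stay in CFML, the same iText calls can be made through CreateObject. Treat this as a sketch under a couple of assumptions: it uses the com.lowagie.text.pdf package names from iText 2.x (newer iText releases moved to com.itextpdf.text.pdf), and the iText jar has to be on your engine’s classpath.

```cfml
<!--- Open the empty form and set up the output file --->
<cfset reader = CreateObject("java", "com.lowagie.text.pdf.PdfReader").init(expandPath("./mypdfform.pdf")) />
<cfset outStream = CreateObject("java", "java.io.FileOutputStream").init(expandPath("./filledpdfform.pdf")) />
<cfset stamper = CreateObject("java", "com.lowagie.text.pdf.PdfStamper").init(reader, outStream) />

<!--- Named acroFields rather than form, since form is a CFML scope --->
<cfset acroFields = stamper.getAcroFields() />
<cfset acroFields.setField("myPDFField", "Foo") />
<cfset acroFields.setField("myOtherPDFField", "Bar") />

<!--- Flatten so the values become regular page content, then write the file --->
<cfset stamper.setFormFlattening(JavaCast("boolean", true)) />
<cfset stamper.close() />
<cfset reader.close() />
```

The JavaCast on the flattening flag is there to make sure a real Java boolean is passed to the method.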

What’s my next step in all of this? Well, I’m faced with a pretty easy decision really. This is for an application the sole purpose of which is to process mountains of XML and generate flattened PDF forms containing the data extracted from XML. iText lets people actually search the PDFs as they should be able to (which is important since some of these PDFs are hundreds of pages long), iText is incredibly fast at cranking out PDFs in my limited testing thus far (which will be important when I’m re-generating three years of PDFs since none of the existing ones are searchable), and the Java code to achieve a better end result isn’t any more complex than the CFML code.

It’s an easy decision to use iText for this portion of the application. But as you can glean from the tone of this post I’m annoyed. CF 8 was new when this app was first built and one of the reasons we bought another CF 8 Enterprise license (since this app runs on its own server) was for this very feature because I figured it would be easier than using iText. It might have saved me some time initially, but boy am I paying dearly for it now.

So I’ll start by reworking the PDF generation portion of the app to use iText, and then I’m going to start implementing CFPDFFORM in Open BlueDragon. Frankly, given that there’s literally no user interface for this application, CF wasn’t a great choice to begin with, but at the point when I was making the decision CFPDFFORM is what sold me on using CF instead of writing the app in Java. Three years later, here I am. The other option I’m pondering is rewriting the entire application in Java or Groovy, since that’s probably better suited to the no-UI aspects of the app anyway.

Nothing like having an itch to scratch to get motivated I guess, so at least the problems I’m having may lead to a nice new (and fully working!) feature in Open BlueDragon at some point, and maybe I’ll dig into Groovy more to rewrite my existing app. Interesting times ahead.

Font Encoding and Searchable PDFs

I ran into a weird issue today I thought I’d share in case anyone else runs
into this.

In one of my applications I’m populating PDF forms via CFPDFFORM in
ColdFusion. It works great but the PDFs generated aren’t searchable, by
which I mean if you’re in Acrobat Reader (or any PDF reader application
from what I tested), you can search the PDF but any data that was
programmatically inserted into the PDF form fields isn’t searched. So for
example I can be looking at the name “Smith” in the PDF, but if I do a
search for “Smith” it will yield 0 results.
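
If you want a rough check without opening a PDF reader, one option is to
extract the text and look for the value yourself. This sketch assumes
PDFBox 1.x (the org.apache.pdfbox packages) is on the classpath and that
the PDF has been flattened; the file name and search term are made up.
Text extraction is not guaranteed to behave the same way a reader’s
search does, so treat it as a hint rather than proof.

```cfml
<!--- Hypothetical file name and search term --->
<cfset pdfPath = expandPath("./filledpdfform.pdf") />
<cfset searchTerm = "Smith" />

<!--- Load the PDF and pull out its text content --->
<cfset pdDoc = CreateObject("java", "org.apache.pdfbox.pdmodel.PDDocument").load(pdfPath) />
<cfset stripper = CreateObject("java", "org.apache.pdfbox.util.PDFTextStripper").init() />
<cfset pdfText = stripper.getText(pdDoc) />
<cfset pdDoc.close() />

<!--- Case-insensitive search of the extracted text --->
<cfif findNoCase(searchTerm, pdfText) GT 0>
  <cfoutput>Found "#searchTerm#" in the extracted text.</cfoutput>
<cfelse>
  <cfoutput>"#searchTerm#" was not found in the extracted text.</cfoutput>
</cfif>
```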

It turns out that the reason for this is the encoding of the font
being used on the form fields. I chose Arial for the font (in Acrobat Pro
on the Mac if I remember correctly) when I was creating the empty form but
didn’t realize that the version of Arial I chose used Identity-H encoding.
Identity-H is a double-byte encoding so I find it a bit odd that it’s not
searchable, but the solution (at least that I’ve found so far) is to use a
font with ANSI encoding instead.

Since I’ve been generating PDFs with this app for 2+ years now (funny no
one noticed until now!), I guess I’ll be regenerating a lot of PDFs if I
want them to be searchable. Luckily there’s a function in the app for just
that purpose, but my server’s going to hate me for having to do all that
work over again.

Hope that saves someone else’s head and nearest wall from unnecessary
abuse.