I'm in the middle of moving some things from physical servers to a VM infrastructure, and one application makes heavy use of URL rewriting and proxying. This is on Windows Server 2003 and when I first set this app up a few years ago, I used ISAPI Rewrite 2 to handle the rewriting and proxying chores. It's been working fine so when I set up the new VM for this app I got a license for ISAPI Rewrite 3 and started configuring things.
I'll spare you all the gory details, but yesterday afternoon, a mere few hours before I was set to do the cutover, ISAPI Rewrite started choking hard. I started getting "Bad Request (Request header too long)" errors, but only some of the time, even on the same URL, so I hacked the registry as recommended by Microsoft in an attempt to fix it. That was followed by "Bad Request (Invalid Header Name)" errors, which led to another registry hack. This seemed to fix things for a while, but then suddenly IIS would stop responding and throw one of these two errors if I had any rewrite rules enabled. Things continued a downward spiral from there. I even tried installing the older version of ISAPI Rewrite, but that would immediately throw a 500 error whether or not any rewrite rules were enabled.
Needless to say I had to cancel the migration, and after the problems with ISAPI Rewrite I had absolutely zero confidence in that solution. There was no way I could move forward knowing that at any moment and without reason the whole thing would come crashing down.
I don't like being backed into a corner, particularly by Windows, so I shut down IIS and installed Apache. This app has a ton of server configuration to it but once I don't trust something I simply can't use it, so the configuration work on the Apache side would be beyond worth the effort since I'd wind up with a solution I can trust. (I would have chucked Windows altogether but not really my call in this case, and given that I'm under a bit of a time crunch that was one more variable I didn't need in the mix right this second.)
Here are the steps I went through, and it actually was easier than I thought it would be.
Download and Install Apache
Actually first, make sure to shut down IIS and set the startup to "Disabled" in your services panel. Now that I have everything set up I'm going to uninstall IIS entirely, but it was handy to have around for a bit so I could fire it up and go into the IIS admin console to check my settings as I moved things to Apache.
So grab the Windows version of Apache (make sure to grab the version with SSL if you need it), run the installer (which takes all of about 10 seconds), and tell it to run as a service for all users. Next, make sure that when you hit localhost in your browser you get Apache's "It works!" message. Congratulations, you just freed yourself from IIS.
Connect ColdFusion to Apache
This server is running ColdFusion 8 Enterprise, and the OS on the new VM is Windows 2003 64 bit. The easiest way to hook CF into Apache is to open the Web Server Configuration Tool, which is under Start -> Programs -> Adobe -> ColdFusion 8. Since I had previously connected CF to IIS, when I launched the Web Server Configuration Tool it indicated that "localhost:cfusion" was hooked into IIS. I clicked that entry to select it, then clicked "Remove."
Next I clicked "Add" and, after waiting about 60 seconds, saw the "Add Web Server Configuration" screen. Choose the JRun Server you want to hook to Apache from the drop-down (if you have more than one), and choose "Apache" from the Web Server drop-down. Click the "…" box next to the "Configuration Directory" box and browse to your Apache conf directory, check the box "Configure web server for ColdFusion 8 applications," and MAKE SURE to check the "Configure 32 bit webserver" box. I don't know this for a fact, but I'm pretty sure Apache for Windows is 32-bit. So even though I'm on a 64-bit box, when I didn't check that box Apache wouldn't start. This could be because I need a different version of the JRun shared object … who knows. Apache's running great, so at least at this point I don't have much motivation to look into it.
Also, click on Advanced, click on the "…" box next to "Directory and file name of server binary," and point to your httpd.exe. This way CF can restart Apache after it modifies your Apache conf file.
That's it–pretty simple stuff. Delete the IIS entry, add one for Apache, and you're done.
Basic Apache Terminology
Before moving forward with the specifics of the configuration, if you're used to IIS terminology like "web site" and "virtual directory," you'll be happy to know all that stuff exists in Apache, but it's called something different and of course you'll be editing a config file instead of clicking through configuration wizards. I prefer the directness of the config file approach anyway, and I bet many others will too once you get the hang of it.
Here's the basic terminology mapping between IIS and Apache:
- a "web site" in IIS is a VirtualHost in Apache
- a "virtual directory" in IIS is an Alias in Apache
- a "home directory" in IIS is a DocumentRoot (or docroot) in Apache
- a "host header" in IIS is a ServerName or ServerAlias in Apache
- a "default document" in IIS is a DirectoryIndex in Apache
That should cover about 99% of what you need to know if you're moving from IIS to Apache. Apache is tremendously powerful and highly configurable so of course you can get as deep into things as you need to, but that should get most people going.
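To make the mapping above a little more concrete, here is a rough sketch of where each IIS concept ends up inside an Apache virtual host. The host names and paths (example.com, C:/sites/example, C:/shared/images) are made up purely for illustration and aren't from my actual setup.

```apache
# Hypothetical site, used only to illustrate the IIS-to-Apache mapping.
# IIS "web site" -> the <VirtualHost> block itself
<VirtualHost *:80>
    # IIS "host header" -> ServerName / ServerAlias
    ServerName example.com
    ServerAlias www.example.com

    # IIS "home directory" -> DocumentRoot (placeholder path)
    DocumentRoot "C:/sites/example"

    # IIS "default document" -> DirectoryIndex
    DirectoryIndex index.cfm index.html

    # IIS "virtual directory" -> Alias (placeholder path)
    Alias /images "C:/shared/images"
</VirtualHost>
```

Each of these directives is covered in more detail in the sections that follow.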
Before digging into Apache, at a high level all I did to convert things over was to open up IIS Manager and make note of all my "web sites" and their home directories. These will become virtual hosts and docroots in Apache. Next, in each IIS site take a look to see if you have any virtual directories defined. If so, make note of these–they'll become Aliases in Apache.
With that basic information in hand you're ready to configure Apache.
Apache Configuration Files
One of the things I absolutely love about Apache is that you do all your configuration in configuration files. Once you get the hang of this approach, there's just nothing simpler than being able to open a file and make the changes you need instead of clicking through a mess of popup windows to find the one setting you need to change.
The main two configuration files that most people will need are httpd.conf and extra/httpd-vhosts.conf. These are both under the Apache conf directory. httpd.conf is the main configuration file where you set server-wide configuration details. You can actually shove everything in this one file, and that's how things were done in older versions of Apache, but it's much cleaner to keep things in different files and simply enable these additional files within the main configuration file.
I won't give you the full tour of httpd.conf since the docs do a very nice job of that, but I will go over what you'll likely need to edit in order to get things working the way most people want.
Going top to bottom in httpd.conf, the first thing you'll likely want to do is enable some modules. Specifically in this case since I know I'm going to be doing rewriting and proxying, I need to enable those modules since they're turned off by default. In the long list of LoadModule statements in httpd.conf, you'll want to uncomment (i.e. remove the #) these lines:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule rewrite_module modules/mod_rewrite.so
Next, if you're doing CFML stuff you'll want to add index.cfm as a DirectoryIndex, so find this section and update accordingly:
<IfModule dir_module>
    DirectoryIndex index.cfm index.html
</IfModule>
You can have as many directory indexes as you want; just separate them with spaces, and keep in mind that Apache tries them in the order in which they're declared.
Finally, you'll want to enable name-based virtual hosting so you can have multiple virtual hosts sharing the same IP address. Towards the bottom of httpd.conf, find this section and uncomment the Include directive that will load the virtual hosts configuration file. When you're done it should look like this:
# Virtual hosts
Include conf/extra/httpd-vhosts.conf
Save httpd.conf, and now let's take a look at how to configure your virtual hosts.
Virtual Host Configuration
Open up conf/extra/httpd-vhosts.conf so we can configure some virtual hosts. You'll be spending a lot of time in this file as you use Apache. First, make sure this line right after the big comment block at the top is uncommented:
NameVirtualHost *:80
This enables name-based virtual hosts for all IP addresses on port 80. Next you'll see a couple of examples of virtual hosts. You can either delete those or comment them out by putting a # on each line. I tend to leave them in there but comment them out for reference.
For your first virtual host, let's set one up for localhost because (at least in my experience) once you enable name-based virtual hosting, you have to have a virtual host even for localhost. Add the following section, adjusting the DocumentRoot as needed based on where you installed Apache:
<VirtualHost *:80>
    ServerName localhost
    DocumentRoot "C:/Program Files (x86)/Apache Software Foundation/Apache2.2/htdocs"
</VirtualHost>
Save the file, and then restart Apache just to make sure all the changes we've made are working. If Apache doesn't restart don't panic, that just means you have a syntax error somewhere. Double-check everything and try again. If you can hit localhost in your browser and see "It works!", well, that message says it all I guess.
Note that if you have multiple IP addresses on your machine and want to tell a virtual host to use a specific IP, or if you want to run a site on a port other than 80, you can replace the * with an IP, and the 80 with whatever port you need.
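As a rough sketch of what that looks like, here is a virtual host bound to one specific IP and a non-standard port. The address 192.168.1.10, the port 8080, and the docroot path are placeholders I made up for the example. Note that Apache also needs a Listen directive for any port other than the one it already listens on; that normally lives in httpd.conf next to the existing "Listen 80" line.

```apache
# Placeholder IP and port; substitute your own values.
# Tell Apache to accept connections on this address and port.
Listen 192.168.1.10:8080

# Enable name-based virtual hosting for that address and port.
NameVirtualHost 192.168.1.10:8080

<VirtualHost 192.168.1.10:8080>
    ServerName foo.com
    # Placeholder path; point this at your site's files.
    DocumentRoot "C:/path/to/foo"
</VirtualHost>
```

The address in the <VirtualHost> tag has to match the one in NameVirtualHost, otherwise Apache won't treat the host as part of that name-based set.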
Next let's configure a more real-world virtual host. I'll be using foo.com as my example, and we'll want people to be able to hit the site using either foo.com or www.foo.com. I'm also going to tell Apache to use a log file specific to this site to make diagnosing problems and doing reporting easier. There are a few other things in here that I'll explain in a moment.
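<VirtualHost *:80>
    ServerName foo.com
    ServerAlias www.foo.com
    DocumentRoot "C:/path/to/foo"
    Alias /CFIDE C:/path/to/CFIDE
    <Directory "C:/path/to/foo">
        Allow from all
    </Directory>
    CustomLog "logs/foo-access.log" common
</VirtualHost>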
The ServerName and ServerAlias information is pretty self-explanatory: foo.com is the primary name for this virtual host, but with the alias of www.foo.com, either foo.com or www.foo.com will hit this virtual host.
DocumentRoot tells Apache where to find the files that it will be serving when someone hits this virtual host.
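I threw an Alias in the mix simply to show how "virtual directories" (in IIS speak) work. Let's say in this case I want foo.com/CFIDE to serve files from C:/path/to/CFIDE, a directory that isn't under the site's docroot. The Alias line maps that URL path to that directory.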
The <Directory> directive requires a bit of explanation. For security reasons, by default all directory access (other than the default localhost site) is denied by Apache. This is done in the main httpd.conf file, so you can either make the change there, or I prefer to do this on a case-by-case basis inside each virtual host. In the case of a public site you won't know where people are coming from so you have to tell Apache to allow access to that directory from anywhere, which is done with the "Allow from all" line. I left this out, but note that you will likely have to add a <Directory> entry for the C:/path/to/CFIDE directory as well.
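For reference, the <Directory> entry for the aliased CFIDE directory that I left out would look something like the sketch below. This uses the Apache 2.2 access-control syntax, and the path is the same placeholder as in the Alias line. Opening it to everyone is an assumption for the sake of the example; on a public server you would probably want tighter rules, since this directory includes the ColdFusion Administrator.

```apache
# Goes inside the same <VirtualHost> block as the Alias line.
# Placeholder path; it must match the target of "Alias /CFIDE".
<Directory "C:/path/to/CFIDE">
    # Apache 2.2 syntax: evaluate Allow rules first, then Deny rules.
    Order allow,deny
    # Permit requests from any client address.
    Allow from all
</Directory>
```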
Finally, I tell Apache to create an access log specific to this site instead of using the global Apache logs.
For a lot of virtual hosts that's literally all there is to it. But since what started this whole process was rewriting issues, let's take a look at some of the cool things you can accomplish (and shoot yourself with) by using mod_rewrite.
URL Rewriting and Proxying
For the app in question we do a lot of URL rewriting and proxying so we can give the users a single site that actually is comprised of multiple sites, potentially on different physical servers. This is also a great way to handle long-term migrations where you have a legacy server that you don't really want to touch but still need content from, and you want to add a newer server in the mix.
As with everything else related to Apache this is powerful stuff, but the basics are relatively simple. I do love this quote from the mod_rewrite docs:
"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."
Let's start with a basic rewrite rule, and then we'll look at what I have to do a lot of, which is proxying. Let's say for whatever reason in the foo.com virtual host you want requests to foo.html to actually hit bar.html. First we need to enable the rewrite engine in our virtual host, so inside your <VirtualHost> block, add this line:
RewriteEngine On
Next we add a simple RewriteRule to tell requests for foo.html to be rewritten to bar.html:
RewriteRule ^/foo\.html$ /bar.html [NC]
The [NC] bit at the end stands for "no case," so that way both foo.html and FOO.HTML will be rewritten to bar.html. There are a ton of flags to do various things outlined in the docs, and if you want some nice rewrite examples they have those too.
So far so good? Next let's tackle proxying. Instead of a simple rewrite from foo.html to bar.html, let's say you want everything under a particular directory to be proxied to another server. To make the example more concrete, let's say your company has an intranet on one server and an employee directory that runs on another server, but you want people to be able to access the employee directory directly from your intranet. If you wanted to do a simple redirect from http://intranet/empdirectory to http://empdirectory, that's simple enough:
RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,R]
The (.*) after /empdirectory will include anything that comes after /empdirectory, and this is tacked onto the end of the remote URL via the $1. The "R" flag tells Apache to do a redirect for this RewriteRule, and you can even set the status code for the redirect. This does change the URL in the user's browser, however, so what if you didn't want that to happen? This is where proxying comes in.
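As a quick sketch of setting the status code: a bare R sends a 302 (temporary) redirect, and you can ask for a different 3xx code by writing it after an equals sign. The choice of 301 here is just for illustration, and the L flag is something I added so that no later rules are processed once this one matches.

```apache
# Same rule as above, but as a permanent (301) redirect.
# L stops rewrite processing once this rule has matched.
RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,R=301,L]
```

Be a little careful with 301s while you're still testing, since browsers tend to cache them.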
First, we change the "R" flag to a "P":
RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]
Now we're proxying instead of doing a redirect (and note that mod_proxy needs to be enabled to use the P flag, which is why we did that earlier), but if this is all you do, you may notice that the URL in the browser still changes whenever the remote server sends a redirect of its own. This is because nothing is rewriting the Location headers in the proxied responses so that they point back at the intranet URL. So we need to add a ProxyPassReverse directive, which will allow us to hit http://intranet/empdirectory and keep that URL while the content is actually served from http://empdirectory:
RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]
ProxyPassReverse /empdirectory http://empdirectory
With all this in place you can serve content from another server without your users knowing they're hitting another server.
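Putting the pieces from this section together, a complete virtual host for the intranet example might look roughly like the following. The ServerName of "intranet" is inferred from the example URLs, and the docroot path is a placeholder, so treat this as a sketch rather than a copy of my real config.

```apache
<VirtualHost *:80>
    # Assumed host name, taken from the http://intranet example URL.
    ServerName intranet
    # Placeholder path for the intranet's own files.
    DocumentRoot "C:/path/to/intranet"

    # Turn on mod_rewrite for this virtual host.
    RewriteEngine On

    # Proxy anything under /empdirectory to the other server (P flag
    # requires mod_proxy and mod_proxy_http to be loaded).
    RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]

    # Rewrite Location headers in the back end's redirects so they point
    # back at /empdirectory on this host.
    ProxyPassReverse /empdirectory http://empdirectory
</VirtualHost>
```

Everything outside /empdirectory is still served from the intranet's own docroot as usual.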
There are about a million and one other things you can do with mod_rewrite, but my only intent with this post was to share what I had to do in my specific move from IIS to Apache in the hopes it might help others who want to make this move.
Even though it was under duress, I'm honestly glad ISAPI Rewrite totally failed since that led me to setting up Apache on this box. After seeing ISAPI Rewrite have its various meltdowns I simply would not have felt comfortable using it. I'm sure I could have contacted support and gotten things figured out eventually, but it took me far longer to write this blog post than it did to switch to Apache, particularly since the rewrite syntax of ISAPI Rewrite is largely compatible with Apache's. I'm going to sleep much better at night knowing Apache is powering this app instead of being constantly worried that ISAPI Rewrite will have another meltdown.
I should have made this disclaimer at the beginning but I am in no way an Apache expert, so if there are different or better ways to do any of this, if anything is explained poorly or incorrectly, or if I omitted any important details, please comment.