Python 2.7, Django, Apache, and Gunicorn on CentOS 6.5

I’m working on a little Django project for KPTZ Radio in Port Townsend and since this project has to talk to a serial relay board from a specific server that has other things running on it, I’ve been going through the process of installing Python 2.7 on CentOS 6.5, along with configuring Django, Apache, and Gunicorn.

Since I’m a lot more used to dealing with Nginx and Gunicorn on Ubuntu, getting this all up and running correctly took a lot more trial and error than I thought it would. I finally got it figured out, so I figured I’d share, since I found a lot of incomplete or misleading information about this as I searched for solutions.

Installing Python 2.7

Your first question is probably why I’m not installing Python 3. In the case of this particular project, pyserial was not (when I first started the project) Python 3 compatible, so rather than fight that battle I decided to use Python 2.7.

The problem is that on CentOS 6.5, Python 2.6.6 is the default, and since there’s other Python-related stuff running on the server already I didn’t want to run the risk of screwing anything else up, so I had to install Python 2.7 as an alternate Python installation. Luckily there were a couple of resources from people who had already been through this so it wasn’t an issue. Here are the steps I took on a fresh CentOS 6.5 VM I was using to do some trial runs before doing everything on the production server (do all these as the root user).

  1. yum -y update
  2. yum -y groupinstall "development tools" --skip-broken
  3. yum -y install wget gcc gcc-c++ make httpd-devel git vim zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
  4. wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
  5. tar xvf Python-2.7.10.tgz
  6. cd Python-2.7.10
  7. ./configure --prefix=/usr/local --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"
  8. make && make altinstall
  9. python2.7 -V (to confirm it’s working)
  10. cd ..
  11. wget --no-check-certificate https://pypi.python.org/packages/source/s/setuptools/setuptools-18.2.tar.gz
  12. tar xvf setuptools-18.2.tar.gz
  13. cd setuptools-18.2
  14. python2.7 setup.py install
  15. cd ..
  16. curl https://raw.githubusercontent.com/pypa/pip/master/contrib/get-pip.py | python2.7 -
  17. pip2.7 install virtualenv
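
Since the point of all those -devel packages is to get the optional standard library modules compiled in, a quick import check after the build can save some head scratching later. This is only a sketch: the module list in the usage comment is my guess at what those packages feed, and missing_modules is a made-up helper name.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example usage, run with the freshly built interpreter (python2.7):
# print(missing_modules(["zlib", "bz2", "ssl", "sqlite3", "readline", "curses"]))
```

An empty list means everything imported; anything listed probably means a -devel package was missing when Python was compiled.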

Create a User to Own the Project

Depending on how you want to do things this could be considered optional, but I created a user to own the project files (again as root):
  1. useradd -m -s /bin/bash someuser

Create a Python virtualenv and Install the Django Project

Next we’ll create a Python 2.7 virtualenv, grab the Django code, install the Django project’s requirements, and do a couple of other configuration things for the Django app.
  1. sudo su - someuser
  2. mkdir ~/.virtualenvs
  3. cd ~/.virtualenvs
  4. virtualenv foo --python=python2.7
  5. cd foo
  6. source bin/activate
  7. cd ~/
  8. git clone foo
  9. cd foo
  10. pip install -r requirements.txt
  11. python manage.py runserver (just to make sure things are working at this point)
  12. python manage.py collectstatic
  13. python manage.py migrate

Configure Upstart to Start the Gunicorn Process When the Server Boots

I suppose on the next version of CentOS this will be done with systemd but thankfully on CentOS 6.5 we can still use Upstart. Note that if you’re familiar with Upstart on Ubuntu the syntax is quite different on CentOS — thanks to my good friend and former coworker Brandon Culpepper for pointing that out before I lost my mind.
First, we’ll do a quick test to make sure everything’s working at this point:
  1. sudo su -
  2. cd /home/someuser/foo
  3. /home/someuser/.virtualenvs/foo/bin/gunicorn --workers 4 --timeout 60 --bind 0.0.0.0:8000 foo.wsgi:application
  4. Hit Ctrl-C to kill the process if you don’t see any errors.
Next we’ll create the upstart file:
  1. vim /etc/init/foo.conf
  2. Put the following in the foo.conf file and save it:
    description "Gunicorn process for foo app"

    start on started sshd
    stop on shutdown

    script
      cd /home/someuser/foo
      /home/someuser/.virtualenvs/foo/bin/gunicorn --workers 4 --timeout 60 --log-level debug --bind 0.0.0.0:8000 foo.wsgi:application
    end script

  3. start foo (to make sure the upstart process works)
  4. ps -wef | grep python (you should see some python processes running under your virtualenv)
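
If the Gunicorn command line gets unwieldy, the same settings can live in a Gunicorn config file, which is just a Python module. This is a sketch of an alternative rather than what I actually deployed; the file name and location are assumptions.

```python
# gunicorn.conf.py (hypothetical file, e.g. kept in /home/someuser/foo)
# Mirrors the command-line flags used in the Upstart job.

bind = "0.0.0.0:8000"   # same as --bind 0.0.0.0:8000
workers = 4             # same as --workers 4
timeout = 60            # same as --timeout 60
loglevel = "debug"      # same as --log-level debug
```

The Upstart script line would then shrink to something like /home/someuser/.virtualenvs/foo/bin/gunicorn -c gunicorn.conf.py foo.wsgi:application.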

Create Apache Virtual Host for the App

There’s a bunch of ways to set up Django apps with Apache. In my early days with Django I would have used mod_wsgi but since I’m way more used to Nginx and Gunicorn these days, I figured I’d set up Apache in similar fashion and have it proxy to Gunicorn.
  1. vim /etc/httpd/conf/httpd.conf
  2. Uncomment the NameVirtualHost *:80 line if it isn’t already uncommented
  3. Add a new VirtualHost section at the bottom of the Apache config file:
    <VirtualHost *:80>
      ServerName whatever
      DocumentRoot /home/someuser/foo

      # serve static files from Apache
      RewriteEngine on
      RewriteRule ^/static/.* - [L]

      # proxy everything else to the gunicorn process
      ProxyPreserveHost on

      RewriteRule ^(.*)$ http://127.0.0.1:8000$1 [P]
      ProxyPassReverse / http://127.0.0.1:8000/
    </VirtualHost>

  4. apachectl restart
At that point you should be all set! Hope that helps people who are in this same or a similar boat save some time.
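
If the order of the rewrite rules in that virtual host isn’t obvious, here’s a rough model of the decision Apache makes for each request. It’s only an illustration of the two rules, not how mod_rewrite is implemented, and route is a made-up function name.

```python
import re

# Mirrors "RewriteRule ^/static/.* - [L]": leave the URL alone and stop.
STATIC_RULE = re.compile(r"^/static/.*")

# Mirrors "RewriteRule ^(.*)$ http://127.0.0.1:8000$1 [P]".
BACKEND = "http://127.0.0.1:8000"


def route(path):
    """Return where a request path ends up under the two rules."""
    if STATIC_RULE.match(path):
        # First rule matched: Apache serves the file itself.
        return "apache:" + path
    # Everything else falls through to the proxy rule.
    return BACKEND + path
```

So /static/css/site.css stays with Apache, while /admin/ is handed to Gunicorn at http://127.0.0.1:8000/admin/.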

Apache Error “Document Root Doesn’t Exist” on Red Hat Enterprise Linux When the Document Root DOES Exist

I upgraded one of our Red Hat Enterprise Linux VMs to Tomcat 7.0.4 tonight, did a bit of Apache reconfiguration, and when I restarted Apache I got a “document root doesn’t exist” error even though in fact the document root does exist. (Trust me Apache, it’s there.)

I double-checked the owners and permissions of all the directories in question and everything was identical to how things are set on another RHEL VM in this cluster on which I haven’t upgraded Tomcat and done the reconfiguration yet. I was at a bit of a loss, so I googled around a bit and the prevailing sentiment seemed to be this was related to either A) a config file copied from a Windows box and the line breaks were throwing things off (wasn’t applicable in my case), or B) the fact that SELinux was enabled.

If you’ve been around the Red Hat flavors of Linux long enough you’ll remember that SELinux used to be absolutely horrible. For years the very first thing you had to do on Red Hat and Fedora to get anything working at all was turn SELinux off, and for a long time I believe it was even Red Hat’s recommendation under their breath to just shut it off. It’s gotten better over the years, and honestly stays out of the way to the point where I’d almost forgotten about it.

Since I didn’t have anything else to try, however, I went into /etc/sysconfig/selinux and changed SELINUX=enforcing to SELINUX=disabled, restarted the server, and voila the complaining from Apache went away.

What I still don’t get is A) why this isn’t occurring on my other RHEL box with the same setup, and B) why it just started happening now. The only potential weirdness here is that my document root is a symlink, but again, it’s been that way since I set up these boxes originally and it hasn’t been an issue.
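
Two of the suspects mentioned above, Windows line endings in the config file and the symlinked document root, are easy to rule out from a script. This is a sketch with made-up helper names, and it says nothing about SELinux itself; it only confirms the other two aren’t the problem.

```python
import os


def has_crlf(data):
    """Return True if raw config-file bytes contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def describe_docroot(path):
    """Report whether a document root is a symlink and where it really points."""
    resolved = os.path.realpath(path)
    return {
        "is_symlink": os.path.islink(path),
        "resolved": resolved,
        "is_directory": os.path.isdir(resolved),
    }


# Example usage (the paths are placeholders, not taken from my servers):
# with open("/etc/httpd/conf/httpd.conf", "rb") as f:
#     print(has_crlf(f.read()))
# print(describe_docroot("/var/www/html"))
```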

So if you run into this same problem the fix (at least until I have more information about why it’s happening) is to disable SELinux, but if anyone has more ideas about why this might be happening I’d love to hear them.

Clustering and Load Balancing With tc Server and ERS httpd #s2gx

Mark Thomas – SpringSource

  • Tomcat committer
  • tc Server developer
  • responsible for keeping tc Server and Tomcat in sync
    • memory leak detection in tomcat manager app
    • recent logging improvements
    • simplifying jmx access
    • all of the above started in tc Server but have since been contributed back to and implemented in tomcat
    • don't want to get into having a significant fork of tomcat

Typical Architectures

  • load balancer (round robin) -> httpd (sticky sessions) -> tc Server (clustered)
    • don't go anywhere near tc Server clustering unless you absolutely have to–adds complexity and overhead
    • only thing tc Server clustering gives you is the ability for users not to lose sessions if an instance of tomcat goes down
    • ask yourself how big of a deal it is if your users lose their sessions when an outage occurs–if it's a big deal then you may need clustering

Starting Point

  • ubuntu 8.04.4 64-bit VM
  • vmware tools installed
  • 64-bit sun jdk 1.6.0_21
  • will be installing tc Server, Hyperic, etc. on this clean image

tc Server Installation

  • don't run tc Server as root
  • create a tcserver user
    • owns the tc Server files
    • runs the tc Server processes
  • install to /usr/local/tcserver

Instance Naming and Port Numbering

  • think about this in advance–may wind up with 100s of instances
  • tc01, tc02, etc. as the instance name, then follow this for ports
  • example scheme for ports
    • 1NN80 – http
    • 1NN43 – https
    • 1NN09 – ajp
    • 1NN05 – shutdown (if used)
    • 1NN69 – jmx
    • 1NN20 – cluster communication
  • server and jvmRoute naming–consider linking server name to IP address, e.g. srvXXX-tcYY where XXX is the end of the IP address, YY is the tomcat instance number
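
The port scheme above is mechanical enough to generate. A small sketch, assuming NN is the two-digit instance number from the tcNN instance name; the function names are mine, not anything from tc Server.

```python
# Last two digits of each port, per the 1NNxx scheme in the notes.
PORT_SUFFIXES = {
    "http": "80",
    "https": "43",
    "ajp": "09",
    "shutdown": "05",
    "jmx": "69",
    "cluster": "20",
}


def instance_name(instance_number):
    """Return the instance name, e.g. 1 -> 'tc01'."""
    return "tc%02d" % instance_number


def instance_ports(instance_number):
    """Return the port for each service of a given instance (1-99)."""
    if not 1 <= instance_number <= 99:
        raise ValueError("instance number must be between 1 and 99")
    return {
        service: int("1%02d%s" % (instance_number, suffix))
        for service, suffix in PORT_SUFFIXES.items()
    }
```

For instance 1 this gives http on 10180 and ajp on 10109; for instance 12, http lands on 11280.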

DEMO: Installing tc Server

  • tc Server version names are e.g. apache-tomcat-6.0.29.A.RELEASE where the first part is the version of Tomcat, the "A" means it's the first release of tc Server based on that tomcat release
  • if shutdown port is disabled, doing a kill -15 does a graceful shutdown. kill -9 works too and tomcat won't care, though your application might, so only do -9 if you have to
  • created two instances of tc Server using the tc Server create instance script
  • tc Server comes with templates for startup scripts–copy these over to /etc/init.d and edit as needed
  • parameterize cluster addresses and ports in the catalina.properties file
  • can use ${…} notation in server.xml to hit the properties in catalina.properties

Creating a Cluster

  • switching to static node membership
    • cumbersome for large clusters
    • remove the <Membership …/> element
    • need to add a bunch of config stuff after the <Interceptor …/> elements
  • easier to use dynamic node discovery
  • backup strategies — tomcat gives you DeltaManager and BackupManager
    • delta manager is simplest–replicates every session to every node in the cluster
    • if your sessions use a lot of memory, delta manager doesn't give you much scalability
    • if your limitation is CPU, delta manager gives you some scalability
    • amount of network traffic on delta manager increases with the square of the number of nodes–not terribly scalable
  • backup manager
    • replicates session data to one other node in the cluster
    • send options: synchronous vs. asynchronous
      • in synchronous, writes session changes to other nodes, waits for acknowledgement, and then sends response to the user. can mean a lag for the user.
      • asynchronous — changes to sessions are put on a queue and the user gets the response immediately. means there's a chance that the cluster will be in an inconsistent state. use of sticky sessions means the consistency of the cluster doesn't really matter.
      • because java thread running isn't deterministic, in asynchronous mode the session updates may not be processed in the same order in which they were placed on the queue, so if your application depends on these being processed in the same order this is a risk
    • no need for the WAR farm deployer — hyperic does this better
      • WAR farm deployer has been removed from tc Server
    • backup manager DOES know where the primary and backup nodes ARE for every session
      • i.e. it doesn't actually store all the sessions from all nodes, but it knows where to get the session it lost
    • backup manager scales much better than delta manager in both memory and network traffic
      • network traffic scales linearly with number of nodes
  • for availability on a small cluster, use the delta manager
  • if you're worried about scalability, go with the backup manager
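
To put rough numbers on the scaling difference between the two managers, here is a deliberately crude model. It assumes each node handles the same number of session changes and counts one replication message per change per receiving node; real traffic depends on session size, send options, and more, so treat it as an illustration only.

```python
def replication_messages(nodes, changes_per_node, manager):
    """Estimate replication messages for a cluster under a simplified model.

    "delta": every change is sent to every other node.
    "backup": every change is sent to one backup node.
    """
    if manager == "delta":
        copies = nodes - 1
    elif manager == "backup":
        copies = 1 if nodes > 1 else 0
    else:
        raise ValueError("manager must be 'delta' or 'backup'")
    return nodes * changes_per_node * copies


# Example usage: compare the two managers as the cluster grows.
# for n in (2, 4, 8, 16):
#     print(n, replication_messages(n, 10, "delta"), replication_messages(n, 10, "backup"))
```

With 10 changes per node, going from 4 to 8 nodes takes the delta count from 120 to 560, while the backup count only doubles from 40 to 80.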

Hyperic HQ Installation

  • create an hqs user
  • hqs user owns the hyperic hq agent files
  • the agent itself runs as the tcserver user
  • os security considerations
    • agent doesn't need root privileges to access OS mechanics, start/stop processes, etc.
    • tc Server needs to be able to read WAR files uploaded via the agent
    • don't want tc Server runtime running as root
  • hyperic security considerations
    • don't want agent connecting as hqadmin super user
    • create a dedicated agent user
    • requires create, modify, and delete privileges for platform and platform services only

ERS httpd

  • ERS = Enterprise Ready Server
  • SpringSource's distribution of Apache httpd
  • install ERS as root
    • httpd processes run as nobody:nobody so this is fine
  • remove the test instance
  • create a new instance
  • module configuration
    • enable mod_proxy_balancer
    • enable mod_proxy_ajp
    • mod_proxy_ajp isn't quite as stable as mod_jk and mod_proxy_http
    • mentioned something about mod_http now having remote IP addresses available–need to ask about this
  • configure balancer in ers


<Proxy balancer://tc>
  BalancerMember http://ip.address.here:port route=tc01-uniqueID
  BalancerMember http://ip.address.here:port route=tc02-uniqueID
</Proxy>

ProxyPass /cluster-test balancer://tc/cluster-test stickysession=JSESSIONID|jsessionid
ProxyPassReverse /cluster-test balancer://tc/cluster-test

Debugging Clusters

  • need something in your apps that tells you which cluster node you're on
  • also need something to spit out the session ID so you can test that the sticky sessions are working
  • if your context path differs from your host name in tc Server, this may cause your cookies not to work since the hosts are different
    • can use the ProxyPassReverseCookiePath directive
    • easier: just have your context path match your host name
  • anything you want replicated in sessions has to be serializable
    • if your application can't support having everything in the session be serializable, terracotta will support non-serializable data in session replication
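
For the "which node am I on" check, one cheap trick relies on Tomcat appending the jvmRoute to the session ID after a dot when a jvmRoute is configured. A sketch that pulls the route back out so you can compare it against the route= values in the balancer config; the function name and the sample IDs are made up.

```python
def route_from_session_id(session_id):
    """Return the jvmRoute suffix of a session ID, or None if there isn't one."""
    _, dot, route = session_id.rpartition(".")
    return route if dot else None


# Example usage with invented session IDs:
# route_from_session_id("0123456789ABCDEF.tc01")  -> "tc01"
# route_from_session_id("0123456789ABCDEF")       -> None
```

If sticky sessions are working, the route should stay the same across requests for a given session.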

Apache 2 Error “apr_sockaddr_info_get() failed”

I’m setting up a couple of Red Hat Enterprise Linux boxes (that will likely replace some Windows 2008 servers, which of course makes me exceptionally happy), and I ran into the following error when I started Apache:

Starting httpd: httpd: apr_sockaddr_info_get() failed for host.name.here
httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName



The second part of that error is pretty common (and nothing to worry about most of the time), but I hadn’t seen that first error message before.

A bit of scroogling and experimenting led to the solution quickly enough so this is another one of those “so I don’t forget” blog posts more than anything, but hopefully it’ll help someone else who runs into this.
In my case the server’s host name didn’t have a corresponding DNS entry yet, so to resolve the error I simply added the server’s host name to /etc/hosts and pointed it to 127.0.0.1.

Of course depending on your situation you may want a DNS entry too, but I tend to have all the host names for the server itself in the hosts file on the server so it doesn’t have to do a DNS lookup only to find itself. Particularly when you’re behind a load balancer things get weird if you don’t leverage your hosts file.
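
If you want to check a hosts file for the server’s own name before starting Apache, something along these lines would do it. It’s a minimal sketch that only parses hosts-file text; hosts_entries is a made-up helper and the sample content in the usage comment is invented.

```python
def hosts_entries(hosts_text):
    """Map each host name in hosts-file text to the first address listed for it."""
    entries = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            entries.setdefault(name.lower(), address)
    return entries


# Example usage:
# with open("/etc/hosts") as f:
#     print(hosts_entries(f.read()).get("host.name.here"))
```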

Essential Configuration Settings for Apache on Windows

This came up a couple of times on mailing lists recently, and since it’s something I sometimes forget to do when I set up new Windows servers with Apache, I figured I’d document it here.

If you’re running into stability problems with Apache on Windows and can’t switch to Linux, doing the following seems to help quite a bit.

First, find these lines in your httpd.conf file and uncomment them:
EnableMMAP off
EnableSendfile off



Then right below those lines, add this line:
Win32DisableAcceptEx

Particularly if you’re seeing errors along the lines of “The specified network name is no longer available. : winnt_accept: Asynchronous AcceptEx failed” or “The semaphore timeout period has expired. : winnt_accept: Asynchronous AcceptEx failed” that last line should eliminate those errors.
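
Since this is something I forget on new servers, a small check that the three directives are actually active (not still commented out) can be handy. A sketch only: it does plain text matching on the config and knows nothing about Include files; missing_directives is a made-up name.

```python
REQUIRED_DIRECTIVES = (
    "EnableMMAP off",
    "EnableSendfile off",
    "Win32DisableAcceptEx",
)


def missing_directives(conf_text):
    """Return the required directives that are not active in httpd.conf text."""
    active = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        # Normalize whitespace and case so "EnableMMAP   Off" still matches.
        active.add(" ".join(line.split()).lower())
    return [d for d in REQUIRED_DIRECTIVES if d.lower() not in active]


# Example usage (the path is a placeholder):
# with open("C:/Apache2.2/conf/httpd.conf") as f:
#     print(missing_directives(f.read()))
```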

If you’re interested in learning more about what’s behind these errors, there’s a nice post about it on the “My Digital Life” blog.

Accessing a Network Drive from Apache and Tomcat on Windows Server

A few quick tips if you find yourself having to access a network drive from Apache and Tomcat on a Windows Server. This is all pretty much old hat but since I still get questions about this fairly regularly and was just setting up some things on some servers this weekend, I figured I’d write this up.

In my situation we’re running an application on three VMs behind a load balancer, and users can upload files from the application. Since I didn’t want to set up a process behind the scenes that copied uploaded files between all three servers (though it would have given me a chance to take another run at using Unison), we have an uploads directory living on a SAN. This way no matter which VM the user is on, the uploads all go to and are served from a single file system.

On Windows Server, by default services run under the Local System Account, which doesn’t have access to network resources. So if you install Apache and Tomcat as services and expect to be able to point them to a UNC path for some of your content, that won’t work out of the box. You need to be running Apache and Tomcat under an account that has network access. In most environments in which I’ve worked this type of account is typically called a “service account,” because you’ll end up getting a domain account just like your typical Active Directory user account, but of course it won’t be associated with a human being.

Once you have that account in place, you go into your services panel, right click on the service, click on “Properties,” and then click the “Log On” tab. You’ll see by default the Local System Account radio button will be checked. Click the radio button next to “This Account,” enter your service account information, click “OK,” and then restart the service. At this point your service will be running under the service account and will have access to network resources. Note that you’ll have to do this for each service that needs access to the network drive, which in my case meant doing this for both Apache and Tomcat.

That takes care of the web server and things at the Tomcat level in terms of basic access, but you’ll likely be configuring an alias of some sort to point to the network drive. In my case I wanted /uploads to point to \\server\sharename\uploads, which meant setting up an alias in Apache, a Context in Tomcat, and a mapping in OpenBD. This is where a lot of people get confused, so I’ll go through each of these settings one by one.

The necessity for a web server alias is probably pretty obvious. If you’re serving an image directly from your web server, e.g. http://server/uploads/image.gif, if /uploads doesn’t exist under your virtual host’s docroot, then Apache will throw a 404.

Allowing Apache to access the network drive involves (depending on how you have Apache configured) using a Directory directive to give Apache permission to access the directory, and then an Alias directive so Apache knows where to go when someone requests something under /uploads. So the following goes in your virtual host configuration file:


<Directory "//server/sharename/uploads">
  Order allow,deny
  Allow from all
</Directory>

Alias /uploads "//server/sharename/uploads"

You may have other stuff in your Directory directive as well but that’s the basics of what will allow Apache to see /uploads as the appropriate location on your SAN.

The next layer down will be your CFML engine. Remember that if in your CFML code you want to read or write files to /uploads, even though Apache knows what that is now, your CFML engine will not. I’m emphasizing that point because it’s such a common source of confusion for people. If things are happening in your CFML code, it won’t be interacting with Apache at all, so it won’t know about the Alias you set up in Apache. Simple enough to solve with a mapping; just go into your CFML engine’s admin console and create a mapping that points /uploads to \\server\sharename\uploads and that handles things at the CFML level.

Lastly comes Tomcat. Depending on how you’re serving files, you may be proxying from Apache to Tomcat, so if Tomcat needs to know where /uploads lives, since it’s not in the webapp’s base directory Tomcat will throw a 404 unless you tell it where /uploads is located.

Tomcat doesn’t have Aliases in the same way Apache does, but what you can do in Tomcat is configure multiple Contexts under a single host. So in Tomcat’s server.xml (or in a separate host config file if you prefer), you simply add a Context that points /uploads to the network location:


<Host name="myapp">
  <Context path="" docBase="C:/location/of/my/app" />
  <Context path="/uploads" docBase="\\server\sharename\uploads" />
</Host>

So now you have things set up in such a way that Apache, your CFML engine, and Tomcat all know where /uploads really lives.

Another point of confusion for people on Windows is the concept of “mapped drive” letters. A lot of people think that if you map \\server\sharename to a drive letter of let’s say D:, then in your code you can access the uploads directory we’ve been using as our example via D:\uploads.

The simplest way to explain why this doesn’t work is to point out that mapped drive letters are associated with a Windows user account. They don’t exist at the operating system level. While you may remote into the server using your credentials, map to a network location, and assign that to drive letter D:, another user logging in won’t see that mapping, and services running on the server under various user accounts definitely won’t know anything about mapped drives.

This is why in all the examples above you see the full UNC path to the network resource being used. You have to use the UNC path in order to get this all to work correctly because that’s the only way services running under a service account will be able to address the network resource.
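
If you have paths coming in from config files and want to catch mapped-drive paths before they bite you, Python’s ntpath module can tell the two apart on any platform. A minimal sketch; is_unc is a made-up helper and the paths are just the running examples from this post.

```python
import ntpath


def is_unc(path):
    """Return True if a Windows path is a UNC path rather than a drive-letter path."""
    drive, _ = ntpath.splitdrive(path)
    # For UNC paths the "drive" part is \\server\share (or //server/share).
    return drive.replace("/", "\\").startswith("\\\\")


# Example usage:
# is_unc(r"\\server\sharename\uploads")  -> True
# is_unc(r"D:\uploads")                  -> False
```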

Hope this helps eliminate some of the persistent confusion I see around this issue.

Quick Apache/Tomcat Proxying Tip

I’m setting up some new servers with some semi-complex URL rewriting and proxying going on of both the HTTP and AJP varieties. For this particular application the home page is actually a completely separate application that’s running as its own webapp on Tomcat, and everything else runs from within the webapp itself. (I’ll spare you the details; just assume I’m not insane and this was the best way to handle this situation.)

So for all this to work properly I have the following rewrite rule:

RewriteRule ^/$ http://otherapp/ [NC,P]
ProxyPassReverse / http://otherapp/

Then below that I have a bunch of AJP rewrites for the webapp itself. What was giving me fits was that everything was working great except for this one rewrite rule, which would attempt to load indefinitely until I killed the request.

I stared at my virtual host config file for quite a while and finally paused on this line:

ProxyPreserveHost On

Given that the ^/$ rewrite rule is proxying out to a different host, it seems kind of obvious now that of course that would confuse Apache. Removing that line and adding some aliases for the other host’s CSS and images directory did the trick.

Maybe no one else will be in this exact situation, but if you’re ever faced with a blank white screen while waiting forever for an HTTP proxy to go through, this tip might come in handy.

Eliminating index.cfm From Mura URLs With Apache and mod_rewrite

This may seem like it’s a topic that’s been beaten to death since there are several examples of how to do this, but none of the examples I found quite fit my exact situation. And since URL rewrite rules are all about getting the exact right characters in the exact right order (pesky computers), I thought I’d share my situation in the hopes of saving someone else some time.

First, in case some of you aren’t using Mura but are reading out of general interest in URL rewriting (get a life! ;-), let’s take a look at Mura URLs. Mura URLs are in the format http://server/siteid, where ‘siteid’ is an actual physical directory on disk. Since Mura is built on CFML it uses a directory index file of index.cfm, so when you hit http://server/siteid it’s the equivalent of hitting http://server/siteid/index.cfm. The presence of a .cfm file is what triggers Apache to hand off the processing of that file to the CFML engine. This is rudimentary stuff I realize, but that index.cfm bit is important in this case since that’s what we’re going to be eliminating from the URL.

Pages other than the site index page in a Mura site have URLs along the lines of http://server/siteid/index.cfm/page-name, and the ‘page-name’ bit is not a directory or file on disk since all the Mura content is contained in the database. So if my site ID is ‘foo’ and my page name is ‘bar’, the URL for the ‘bar’ page would be http://server/foo/index.cfm/bar. What I’d like my URL to be for that page, however, is http://server/foo/bar, so I want to eliminate index.cfm from the URL.

The solution for this is to use URL rewriting in your web server, which in my case is Apache and mod_rewrite. Using URL rewriting I can have the web server match certain URL patterns and translate these patterns into something different behind the scenes so things still work properly. So in this case I need a rewrite rule that will take the URL http://server/foo/bar and treat that as if it were http://server/foo/index.cfm/bar. Otherwise, Apache would be looking for a directory ‘bar’ under the ‘foo’ directory, which of course doesn’t exist, so it would throw a 404.

Many of the URL rewrite examples I came across were designed for Mura sites that are also eliminating the site ID from the URL. This is all well and good, but since I’m working on an intranet site we want to keep the site IDs in the URLs (e.g. http://server/department), so that meant most of the examples I saw weren’t quite right for my needs.

The tricky part (at least for me as a relative mod_rewrite amateur) is that instead of being able to say “take my server name, add ‘index.cfm’ to that, and then tack on the rest of the URL being requested,” I had to take the first part of the URL being requested (the site ID, which again is a physical directory), then insert index.cfm, then tack on the rest of the URL being requested (i.e. everything after the site ID in the URL).

This is actually pretty easy if you want to create a rewrite rule for each site ID on the server, but that would mean additional Apache configuration and an Apache restart every time a new site is added to Mura. That seemed like a maintenance nightmare to me so I decided to try to come up with a solution that would handle eliminating index.cfm regardless of the site ID.

What I ended up doing was cobbling together several of the other great resources linked above for many of the other considerations with rewriting for Mura, and then came up with a rewrite rule that will leave the site ID in place, then add index.cfm, and then tack on the rest of the URL being requested.
Quite a lot of preamble for a one-line rewrite rule, but here it is in all its gory glory:

RewriteRule ^/([a-zA-Z0-9-]{1,})/(.*)$ ajp://%{HTTP_HOST}:8009/$1/index.cfm/$2 [NE,P]

Translation:

  • Create a backreference match for alphanumeric characters and hyphens in the URL up to the first / (these are the allowed characters in Mura site IDs)
  • Create a second backreference for everything else in the URL
  • Proxy this to port 8009 on the server (since I’m using AJP proxying with Tomcat), but make the URL backreference 1 + /index.cfm/ + backreference 2

The backreference stuff wound up being my savior here. Basically whatever you put in parens in the regex of your rewrite rule becomes a backreference, so whatever matches that pattern gets assigned to a placeholder. That way when you want to do the rewriting you can refer to pieces of your URL using $ and the position. So in this case $1 refers to the first backreference (the Mura site ID), and $2 refers to the second backreference (everything in the URL following the site ID).
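
If you want to play with the pattern outside of Apache, the same regex works in Python, with the difference that Python writes backreferences as \1 and \2 instead of $1 and $2. This sketch only models the path part of the rewrite (not the ajp:// host and port), and rewrite_path is a made-up name.

```python
import re

# Same pattern as the RewriteRule: site ID, a slash, then everything else.
MURA_RULE = re.compile(r"^/([a-zA-Z0-9-]{1,})/(.*)$")


def rewrite_path(path):
    """Insert index.cfm after the site ID, or return None if the rule doesn't match."""
    match = MURA_RULE.match(path)
    if match is None:
        return None
    # \1 is the site ID, \2 is the rest of the URL.
    return match.expand(r"/\1/index.cfm/\2")
```

So /foo/bar becomes /foo/index.cfm/bar, while /foo on its own (no slash after the site ID) doesn’t match this rule at all.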

There are a few other rewrite rules involved to get this all working when you take into account things like image files, etc., and as I said above I’m standing on the shoulders of giants (thanks guys!), so here’s the full set of rewrite rules:

  ProxyRequests Off

  <Proxy *>
    Order deny,allow
    Allow from 127.0.0.1
  </Proxy>

  ProxyPreserveHost On
  ProxyPassReverse / ajp://your-host-here:8009/

  RewriteEngine On

  # if it's a cfml request, proxy to tomcat
  RewriteRule ^(.+\.cf[cm])(/.*)?$ ajp://%{HTTP_HOST}:8009$1$2 [P]
 
  # if it's a trailing slash and a real directory, append index.cfm and proxy
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
  RewriteRule ^(.+/)$ ajp://%{HTTP_HOST}:8009%{REQUEST_URI}index.cfm [P]

  # if it's a real file and haven't proxied, just serve it
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
  RewriteRule . - [L]

  # require trailing slash at this point if otherwise valid CMS url
  RewriteRule ^([a-zA-Z0-9/-]+[^/])$ $1/ [R=301,L]

  # valid cms url path is proxied to tomcat
  # MUST COME AFTER ANY OTHER FIXED/EXPECTED REWRITES!
  RewriteRule ^/([a-zA-Z0-9-]{1,})/(.*)$ ajp://%{HTTP_HOST}:8009/$1/index.cfm/$2 [NE,P]

Note that you can put the rewrite rules in a .htaccess file or in the virtual host configuration file. I prefer having the rewrite rules be defined as part of the virtual host, so in my case I have all this in the <VirtualHost> block in my virtual hosts config file. (I have a bunch of other stuff in the virtual host configuration as well but I left everything other than the parts relevant to the URL rewriting out to avoid any confusion.)

That’s it! I hope that helps others looking into doing this with Mura, or that you at least learned a few things about mod_rewrite along the way.

Moving From IIS To Apache: It’s Easier Than You Think

I'm in the middle of moving some things from physical servers to a VM infrastructure, and one application makes heavy use of URL rewriting and proxying. This is on Windows Server 2003 and when I first set this app up a few years ago, I used ISAPI Rewrite 2 to handle the rewriting and proxying chores. It's been working fine so when I set up the new VM for this app I got a license for ISAPI Rewrite 3 and started configuring things.

I'll spare you all the gory details but yesterday afternoon–a mere few hours before I was set to do the cutover–ISAPI Rewrite started choking hard. I started getting "Bad Request (Request header too long)" errors, but only some of the time even on the same URL, so I hacked the registry as recommended by Microsoft in an attempt to fix it. That was followed with "Bad Request (Invalid Header Name)" errors, which led to another registry hack. This seemed to fix things for a while, but then suddenly IIS would stop responding and throw one of these two errors if I had any rewrite rules enabled. Things continued a downward spiral from there. I even tried installing the older version of ISAPI Rewrite but that would immediately throw a 500 error whether or not any rewrite rules were enabled.

Needless to say I had to cancel the migration, and after the problems with ISAPI Rewrite I had absolutely zero confidence in that solution. There was no way I could move forward knowing that at any moment and without reason the whole thing would come crashing down.

I don't like being backed into a corner, particularly by Windows, so I shut down IIS and installed Apache. This app has a ton of server configuration to it, but once I don't trust something I simply can't use it, so the configuration work on the Apache side would be well worth the effort since I'd wind up with a solution I can trust. (I would have chucked Windows altogether, but that's not really my call in this case, and given that I'm under a bit of a time crunch that was one more variable I didn't need in the mix right this second.)

Here are the steps I went through. It was actually easier than I thought it would be.

Download and Install Apache

Actually first, make sure to shut down IIS and set the startup to "Disabled" in your services panel. Now that I have everything set up I'm going to uninstall IIS entirely, but it was handy to have around for a bit so I could fire it up and go into the IIS admin console to check my settings as I moved things to Apache.

So grab the Windows version of Apache (make sure to grab the version with SSL if you need it), run the installer (which takes all of about 10 seconds), and tell it to run as a service for all users. Next make sure when you hit localhost in your browser you get Apache's "It works!" message. Congratulations, you just freed yourself from IIS.

Connect ColdFusion to Apache

This server is running ColdFusion 8 Enterprise, and the OS on the new VM is Windows 2003 64 bit. The easiest way to hook CF into Apache is to open the Web Server Configuration Tool, which is under Start -> Programs -> Adobe -> ColdFusion 8. Since I had previously connected CF to IIS, when I launched the Web Server Configuration Tool it indicated that "localhost:cfusion" was hooked into IIS. I clicked that entry to select it, then clicked "Remove."

Next I clicked "Add" and, after a wait of about 60 seconds, got the "Add Web Server Configuration" screen. Choose the JRun Server you want to hook to Apache from the drop-down (if you have more than one), and choose "Apache" from the Web Server drop-down. Click the "…" box next to the "Configuration Directory" box and browse to your Apache conf directory, check the box "Configure web server for ColdFusion 8 applications," and MAKE SURE to check the "Configure 32 bit webserver" box. I don't know this for a fact, but I'm pretty sure Apache for Windows is 32-bit, so even though I'm on a 64-bit box, Apache wouldn't start when I left that box unchecked. This could be because I need a different version of the JRun shared object … who knows. Apache's running great so at least at this point I don't have much motivation to look into it.

Also, click on Advanced, click on the "…" box next to "Directory and file name of server binary," and point to your httpd.exe. This way CF can restart Apache after it modifies your Apache conf file.

That's it–pretty simple stuff. Delete the IIS entry, add one for Apache, and you're done.

Basic Apache Terminology

Before moving forward with the specifics of the configuration, if you're used to IIS terminology like "web site" and "virtual directory," you'll be happy to know all that stuff exists in Apache, but it's called something different and of course you'll be editing a config file instead of clicking through configuration wizards. I prefer the directness of the config file approach anyway, and I bet many others will too once you get the hang of it.

Here's the basic terminology mapping between IIS and Apache:

  • a "web site" in IIS is a VirtualHost in Apache
  • a "virtual directory" in IIS is an Alias in Apache
  • a "home directory" in IIS is a DocumentRoot (or docroot) in Apache
  • a "host header" in IIS is a ServerName or ServerAlias in Apache
  • a "default document" in IIS is a DirectoryIndex in Apache

That should cover about 99% of what you need to know if you're moving from IIS to Apache. Apache is tremendously powerful and highly configurable so of course you can get as deep into things as you need to, but that should get most people going.

Before digging into Apache, at a high level all I did to convert things over was to open up IIS Manager and make note of all my "web sites" and their home directories. These will become virtual hosts and docroots in Apache. Next, in each IIS site take a look to see if you have any virtual directories defined. If so, make note of these–they'll become Aliases in Apache.

With that basic information in hand you're ready to configure Apache.

Apache Configuration Files

One of the things I absolutely love about Apache is that you do all your configuration in configuration files. Once you get the hang of this approach, there's just nothing simpler than being able to open a file and make the changes you need instead of clicking through a mess of popup windows to find the one setting you need to change.

The two main configuration files that most people will need are httpd.conf and extra/httpd-vhosts.conf. These are both under the Apache conf directory. httpd.conf is the main configuration file where you set server-wide configuration details. You can actually shove everything in this one file, and that's how things were done in older versions of Apache, but it's much cleaner to keep things in different files and simply enable these additional files within the main configuration file.

I won't give you the full tour of httpd.conf since the docs do a very nice job of that, but I will go over what you'll likely need to edit in order to get things working the way most people want.

Going top to bottom in httpd.conf, the first thing you'll likely want to do is enable some modules. Specifically in this case since I know I'm going to be doing rewriting and proxying, I need to enable those modules since they're turned off by default. In the long list of LoadModule statements in httpd.conf, you'll want to uncomment (i.e. remove the #) these lines:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule rewrite_module modules/mod_rewrite.so

Next, if you're doing CFML stuff you'll want to add index.cfm as a DirectoryIndex, so find this section and update accordingly:

<IfModule dir_module>
  DirectoryIndex index.cfm index.html
</IfModule>

You can list as many directory indexes as you want; just separate them with spaces. Apache checks them in the order in which they're declared and serves the first one it finds.

Finally, you'll want to enable name-based virtual hosting so you can have multiple virtual hosts sharing the same IP address. Towards the bottom of httpd.conf, find this section and uncomment the Include directive that will load the virtual hosts configuration file. When you're done it should look like this:

# Virtual hosts
Include conf/extra/httpd-vhosts.conf

Save httpd.conf, and now let's take a look at how to configure your virtual hosts.

Virtual Host Configuration

Open up conf/extra/httpd-vhosts.conf so we can configure some virtual hosts. You'll be spending a lot of time in this file as you use Apache. First, make sure this line right after the big comment block at the top is uncommented:

NameVirtualHost *:80

This enables name-based virtual hosts for all IP addresses on port 80. Next you'll see a couple of examples of virtual hosts. You can either delete those or comment them out by putting a # on each line. I tend to leave them in there but comment them out for reference.

For your first virtual host, let's set one up for localhost because (at least in my experience) once you enable name-based virtual hosting, you have to have a virtual host even for localhost. Add the following section, adjusting the DocumentRoot as needed based on where you installed Apache:

<VirtualHost *:80>
  ServerName localhost
  DocumentRoot "C:/Program Files (x86)/Apache Software Foundation/Apache2.2/htdocs"
</VirtualHost>

Save the file, and then restart Apache just to make sure all the changes we've made are working. If Apache doesn't restart don't panic, that just means you have a syntax error somewhere. Double-check everything and try again. If you can hit localhost in your browser and see "It works!", well, that message says it all I guess.

Note that if you have multiple IP addresses on your machine and want to tell a virtual host to use a specific IP, or if you want to run a site on a port other than 80, you can replace the * with an IP, and the 80 with whatever port you need.
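
As a rough sketch of what that looks like, here's the localhost virtual host pinned to one address and a non-standard port. The IP address 192.168.1.10 and port 8080 are made up for illustration, so substitute your own. Note that Apache also needs a Listen directive for any port other than the one it's already listening on, and in Apache 2.2 the NameVirtualHost argument should match what you put in the <VirtualHost> tag.

```apache
# Hypothetical address and port, for illustration only
# Listen normally lives in httpd.conf; Apache won't answer on a port it isn't listening on
Listen 8080

# In Apache 2.2 this should match the <VirtualHost> argument below
NameVirtualHost 192.168.1.10:8080

<VirtualHost 192.168.1.10:8080>
  ServerName localhost
  DocumentRoot "C:/Program Files (x86)/Apache Software Foundation/Apache2.2/htdocs"
</VirtualHost>
```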

Next let's configure a more real-world virtual host. I'll be using foo.com as my example, and we'll want people to be able to hit the site using foo.com or www.foo.com. I'm also going to tell Apache to use a log file specific to this site to make diagnosing problems and doing reporting easier. There are a few other things in here that I'll explain in a moment.

<VirtualHost *:80>
  ServerName foo.com
  ServerAlias www.foo.com

  DocumentRoot "C:/path/to/foo"

  Alias /CFIDE C:/path/to/CFIDE
 
  <Directory "C:/path/to/foo">
    Order allow,deny
    Allow from all
  </Directory>

  CustomLog "logs/foo-access.log" common
</VirtualHost>

The ServerName and ServerAlias information is pretty self-explanatory–foo.com is the primary name for this virtual host, but with the alias of www.foo.com, either foo.com or www.foo.com will hit this virtual host.

DocumentRoot tells Apache where to find the files that it will be serving when someone hits this virtual host.

I threw an Alias in the mix simply to show how "virtual directories" (in IIS speak) work. Let's say in this case I want foo.com to have access to my CF administrator or maybe the javascript files that are stored in the CFIDE directory, but that CFIDE directory is not inside this host's docroot. The Alias directive tells Apache that when someone is requesting /CFIDE on this virtual host, those files will actually be served from somewhere outside the virtual host's docroot.

The <Directory> directive requires a bit of explanation. For security reasons, by default all directory access (other than the default localhost site) is denied by Apache. This is done in the main httpd.conf file, so you can either make the change there, or I prefer to do this on a case-by-case basis inside each virtual host. In the case of a public site you won't know where people are coming from so you have to tell Apache to allow access to that directory from anywhere, which is done with the "Allow from all" line. I left this out, but note that you will likely have to add a <Directory> entry for the C:/path/to/CFIDE directory as well.
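
For completeness, here's a sketch of what that extra entry for the aliased directory might look like, using the same placeholder path as the Alias above and the same Apache 2.2 access syntax. Opening the directory to everyone is an assumption made to keep the example short. Since the CF administrator lives under CFIDE, you may well want something tighter on a public site.

```apache
# Goes inside the same <VirtualHost> block as the Alias
Alias /CFIDE "C:/path/to/CFIDE"

<Directory "C:/path/to/CFIDE">
  # Apache 2.2 access control syntax, matching the docroot block
  Order allow,deny
  # Assumption: open to everyone; restrict this if the CF administrator is reachable here
  Allow from all
</Directory>
```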

Finally, I tell Apache to create an access log specific to this site instead of using the global Apache logs.
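
If you want errors split out per site as well, a per-host ErrorLog works the same way. This is just a sketch: the file names are made up, and the "combined" format (which adds referer and user agent to what "common" records) is assumed to still be defined in your httpd.conf, as it is in the stock file.

```apache
# Inside the <VirtualHost> block; log file names are hypothetical
ErrorLog "logs/foo-error.log"

# "combined" is "common" plus referer and user agent
# (assumes the stock LogFormat definitions in httpd.conf are intact)
CustomLog "logs/foo-access.log" combined
```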

For a lot of virtual hosts that's literally all there is to it. But since what started this whole process was rewriting issues, let's take a look at some of the cool things you can accomplish (and shoot yourself in the foot with) by using mod_rewrite.

URL Rewriting and Proxying

For the app in question we do a lot of URL rewriting and proxying so we can give the users a single site that is actually made up of multiple sites, potentially on different physical servers. This is also a great way to handle long-term migrations where you have a legacy server that you don't really want to touch but still need content from, and you want to add a newer server in the mix.

As with everything else related to Apache this is powerful stuff, but the basics are relatively simple. I do love this quote from the mod_rewrite docs, however:

"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."

Let's start with a basic rewrite rule, and then we'll look at proxying, which is what I have to do a lot of. Let's say for whatever reason in the foo.com virtual host you want requests to foo.html to actually hit bar.html. First we need to enable the rewrite engine in our virtual host, so inside your <VirtualHost> block, add this line:

RewriteEngine on

Next we add a simple RewriteRule to tell requests for foo.html to be rewritten to bar.html:

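RewriteRule ^/foo\.html$ /bar.html [NC]

The pattern is a regular expression, so the ^ and $ anchor it to the whole URL path and the backslash makes the dot match a literal period. Without them the rule would also catch URLs that merely contain something resembling /foo.html. The [NC] bit at the end stands for "nocase," so both foo.html and FOO.HTML will be rewritten to bar.html. There are a ton of flags to do various things outlined in the docs, and if you want some nice rewrite examples they have those too.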

So far so good? Next let's tackle proxying. Instead of a simple rewrite from foo.html to bar.html, let's say you want everything under a particular directory to be proxied to another server. To make the example more concrete, let's say your company has an intranet on one server and an employee directory that runs on another server, but you want people to be able to access the employee directory directly from your intranet. If you wanted to do a simple redirect from http://intranet/empdirectory to http://empdirectory, that's simple enough:

RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,R]

The (.*) after /empdirectory will include anything that comes after /empdirectory, and this is tacked onto the end of the remote URL via the $1. The "R" flag tells Apache to do a redirect for this RewriteRule, and you can even set the status code for the redirect. This does change the URL in the user's browser, however, so what if you didn't want that to happen? This is where proxying comes in.
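
Setting the status code looks like this. By default the R flag sends a 302 (temporary) redirect. The sketch below asks for a 301 (permanent) instead and adds the L flag so no further rules are processed for the request. Whether you want a permanent redirect is your call, since browsers tend to cache them.

```apache
# Same rule as above, but as a permanent (301) redirect
# L = last rule: stop processing further RewriteRules for this request
RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,R=301,L]
```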

First, we change the "R" flag to a "P":

RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]

Now we're proxying instead of doing a redirect (and note that mod_proxy needs to be enabled to use the P flag, which is why we did that earlier), but if this is all you do you may notice that the URL in the browser still changes in some cases. That happens when the back-end server sends a redirect of its own: the Location header in its response points at http://empdirectory, and nothing is rewriting that header on the way back to the requestor. So we need to add a ProxyPassReverse directive, which adjusts those response headers and allows us to hit http://intranet/empdirectory and keep that URL while the content is actually served from http://empdirectory.

RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]
ProxyPassReverse /empdirectory http://empdirectory

With all this in place you can serve content from another server without your users knowing they're hitting another server.
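
Putting the pieces together, a minimal virtual host for the intranet example might look something like this. The DocumentRoot path is a placeholder I made up, and the sketch assumes the three LoadModule lines from earlier are already uncommented in httpd.conf.

```apache
# Assumes mod_proxy, mod_proxy_http, and mod_rewrite are loaded in httpd.conf
<VirtualHost *:80>
  ServerName intranet

  # Hypothetical path; point this at your own intranet files
  DocumentRoot "C:/path/to/intranet"

  RewriteEngine on

  # Proxy anything under /empdirectory to the employee directory server
  RewriteRule ^/empdirectory(.*) http://empdirectory$1 [NC,P]

  # Rewrite Location headers in redirects coming back from that server
  ProxyPassReverse /empdirectory http://empdirectory
</VirtualHost>
```

If you don't need regular expressions or the NC flag, mod_proxy's own ProxyPass directive can handle the same mapping without mod_rewrite, but I stuck with RewriteRule since that's what I was porting over from ISAPI Rewrite.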

There are about a million and one other things you can do with mod_rewrite, but my only intent with this post was to share what I had to do in my specific move from IIS to Apache in the hopes it might help others who want to make this move.

Conclusion

Even though it was under duress, I'm honestly glad ISAPI Rewrite totally failed since that led me to setting up Apache on this box. After seeing ISAPI Rewrite have its various meltdowns I simply would not have felt comfortable using it. I'm sure I could have contacted support and gotten things figured out eventually, but it took me far longer to write this blog post than it did to switch to Apache, particularly since the rewrite syntax of ISAPI Rewrite is largely compatible with Apache's. I'm going to sleep much better at night knowing Apache is powering this app instead of being constantly worried that ISAPI Rewrite will have another meltdown.

I should have made this disclaimer at the beginning but I am in no way an Apache expert, so if there are different or better ways to do any of this, if anything is explained poorly or incorrectly, or if I omitted any important details, please comment.