The pain of ETags, mod_deflate, Apache 2.4 and Tomcat 7

While working on a project recently, I found that the web server responded to every request with a 200 (OK). On the first load this is of course the expected result, but on subsequent loads the file should be unchanged, and the server should return a 304 (Not Modified) so that the data is retrieved from the cache.

After some investigation, I found that mod_deflate.c, in response to a bug found in Apache 2.2 (in 2008), appends -gzip to the generated ETag and continues on its merry little way. This breaks caching: the ETag sent in the request matches the one from the earlier response, but it does not match the ETag used in the comparison done by the DefaultServlet (checkIfNoneMatch) in Tomcat, because Tomcat's own value carries no -gzip suffix.

The result was a 200 response to every GET that mod_deflate had touched, so the cache went unused on every load after the first.
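
To make the mismatch concrete, here is a minimal Python sketch of an exact-match If-None-Match check. The function name and the ETag values are made up for illustration, the placement of the suffix inside the quotes is an assumption, and the logic is a simplification rather than Tomcat's actual checkIfNoneMatch code.

```python
def if_none_match_satisfied(if_none_match: str, current_etag: str) -> bool:
    """Return True when a 304 may be sent, using exact string comparison.

    Simplified illustration only; this is not Tomcat's implementation.
    """
    if if_none_match.strip() == "*":
        return True
    # If-None-Match may carry several comma-separated ETags.
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return current_etag in candidates


# Hypothetical values: the backend generated this ETag for the file...
backend_etag = 'W/"1234-1400000000000"'
# ...but the client cached the variant that mod_deflate had suffixed.
client_etag = 'W/"1234-1400000000000-gzip"'

print(if_none_match_satisfied(client_etag, backend_etag))   # False
print(if_none_match_satisfied(backend_etag, backend_etag))  # True
```

The first call is the situation described above: the client faithfully returns what it was given, but the backend never sees a value it recognises, so it answers 200.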

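A suggested solution for the problem is to strip the appended “-gzip” from the ETag that the client sends back in If-None-Match, and to restore it on the ETag in the response. This can be done in the runtime configuration (mod_headers must be loaded) using the following: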

RequestHeader  edit "If-None-Match" "^(.*)-gzip$" "$1"
Header  edit "ETag" "^(.*[^g][^z][^i][^p])$" "$1-gzip"
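
To see what these two patterns actually do, here is a small Python sketch that applies the same regular expressions to a few sample values. The helper names and ETag values are hypothetical, and Python's re module stands in for Apache's regex engine, which I am assuming behaves the same way for these simple patterns.

```python
import re


def strip_gzip_suffix(if_none_match: str) -> str:
    # Mirrors: RequestHeader edit "If-None-Match" "^(.*)-gzip$" "$1"
    # (Python writes the backreference as \1 where Apache uses $1.)
    return re.sub(r"^(.*)-gzip$", r"\1", if_none_match, count=1)


def append_gzip_suffix(etag: str) -> str:
    # Mirrors: Header edit "ETag" "^(.*[^g][^z][^i][^p])$" "$1-gzip"
    return re.sub(r"^(.*[^g][^z][^i][^p])$", r"\1-gzip", etag, count=1)


# Suffix after the closing quote: it is removed from the request header.
print(strip_gzip_suffix('"abc123"-gzip'))   # "abc123"
# Suffix inside the quotes: the pattern does not match, value is unchanged.
print(strip_gzip_suffix('"abc123-gzip"'))   # "abc123-gzip"
# A response ETag without the suffix gets it appended.
print(append_gzip_suffix('"abc123"'))       # "abc123"-gzip
# A response ETag that already ends in -gzip is left alone.
print(append_gzip_suffix('"abc123"-gzip'))  # "abc123"-gzip
```

Both patterns are anchored to the very end of the header value, so they only fire when -gzip is the last thing in the ETag. It is worth checking the exact ETag format your server emits before relying on them.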

This appears to be similar to the solution in Apache 2.5's mod_deflate.c, which adds the DeflateAlterETag directive. Setting its value to NoChange prevents mod_deflate from appending to or changing the ETag in the first place.

With the pain of ETags becoming more and more apparent, both from these issues around mod_deflate’s alteration of the ETag and from other issues that arise when the same file is served from multiple Apache servers, it was clear that perhaps a different solution was needed.

The solution is, of course, to remove ETags altogether and rely on other methods for cache validation. It is said in this article that:

It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources. It is redundant to specify both Expires and Cache-Control: max-age, or to specify both Last-Modified and ETag.

As a result, we decided to go forward using a combination of Last-Modified and Cache-Control.

To remove ETags in all situations, make the following changes to httpd.conf:

<IfModule mod_headers.c>
    Header unset ETag
</IfModule>

FileETag None

The second half of the solution is to set Cache-Control. There are some issues around setting it to private in Firefox, so the best solution is to set it as follows (also in httpd.conf):

Header set Cache-Control public,max-age=0,no-cache

Note: It is important that there is no space between the properties, because the value is not quoted and a space would split it into separate arguments.

Both no-cache and max-age=0 are used to force the browser to revalidate with the server before using the cached value. With ETags removed, that revalidation is a check against the Last-Modified date.
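
The revalidation step can be sketched in Python as follows. This is a simplified illustration of a Last-Modified check, not the code Apache or Tomcat runs; the function name and the dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def revalidate(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # first load: the browser has nothing to validate
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: send the full response
    if cached.tzinfo is None:
        # Treat a date without a usable zone as UTC so the comparison works.
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= cached:
        return 304
    return 200


modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)  # value of Last-Modified
newer = datetime(2015, 10, 22, 0, 0, 0, tzinfo=timezone.utc)

print(revalidate(None, modified))    # 200
print(revalidate(header, modified))  # 304
print(revalidate(header, newer))     # 200
```

Because the date depends only on the file and not on which server or filter handled the response, this check is not affected by mod_deflate in the way the ETag comparison was.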

Setting public allows both a client cache and a proxy cache to keep a copy of the files, whereas private only allows the client cache. There are some known issues with certain versions of Firefox if you set the Cache-Control value to private and are using HTTPS, so it is important to keep that in mind when choosing between public and private.

The result is the ability to return a 304 in a clustered environment while using mod_deflate.c, with a combination of Last-Modified and Cache-Control.
