<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>CKAN: Ticket #540: Implement caching in a systematic manner</title>
    <link>http://localhost/ticket/540</link>
    <description></description>
    <language>en-us</language>
    <image>
      <title>CKAN</title>
      <url>http://assets.okfn.org/p/ckan/img/ckan_logo_shortname.png</url>
      <link>http://localhost/ticket/540</link>
    </image>
    <generator>Trac 0.12.3</generator>
    <item>
      
        <dc:creator>wwaites</dc:creator>

      <pubDate>Wed, 01 Sep 2010 07:09:07 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/540#comment:1</link>
      <guid isPermaLink="false">http://localhost/ticket/540#comment:1</guid>
      <description>
        &lt;p&gt;
Cut-and-paste from ckan-discuss:
&lt;/p&gt;
&lt;p&gt;
I had a look at Varnish and I agree that the configuration
language is complicated. In fact, by default Varnish
disregards cache control headers and in general does not
behave in a standards-compliant way. I have no doubt
that it is very fast, provided you are willing to spend the
effort to customise its configuration for the exact
layout of pages and headers that each web site it
fronts will use. In other words,
there is a large administrative burden.
&lt;/p&gt;
&lt;p&gt;
So I decided to change tack and see where the Squid
proxy has gotten to in the decade or so since I last met
it. Squid is a general-purpose caching proxy that can
be configured as an HTTP accelerator. The configuration
is simple: you tell it which web servers serve
which sites, and the web servers make sure to set the
cache control headers appropriately.
&lt;/p&gt;
&lt;p&gt;
Here are some results from my testing, against
&lt;a class="ext-link" href="http://de.ckan.net/package/list?page=B"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://de.ckan.net/package/list?page=B&lt;/a&gt; which is an
example of a slow page. Except for the first, which
only did 100 requests, the tests were set to 8
simultaneous connections and a total of 1000
requests.
&lt;/p&gt;
&lt;pre class="wiki"&gt;No caching of any kind:
    Requests per second:    0.44 [#/sec] (mean)
Beaker Cache (filesystem):
    Requests per second:    43.16 [#/sec] (mean)
SQUID setting cache control headers correctly:
    Requests per second:    421.33 [#/sec] (mean)
&lt;/pre&gt;&lt;p&gt;
The results are clear. Using the application cache is
about 100 times faster than doing nothing. Using
Squid is about 1000 times faster. (Doing both wouldn't
necessarily help very much.)
&lt;/p&gt;
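&lt;p&gt;
To check the rough multipliers quoted above, here is a small sketch that
divides each measured throughput by the uncached baseline. The figures are
copied from the results above; the dictionary keys and the function name
are just labels made up for this illustration.
&lt;/p&gt;

```python
# Throughput figures (requests per second) copied from the results above.
results = {
    "no caching": 0.44,
    "Beaker cache (filesystem)": 43.16,
    "Squid with cache control headers": 421.33,
}


def speedups(results, baseline="no caching"):
    """Return each setup's throughput divided by the baseline throughput."""
    base = results[baseline]
    return {name: rps / base for name, rps in results.items()}


for name, factor in speedups(results).items():
    print(f"{name}: {factor:.0f}x")
# prints:
# no caching: 1x
# Beaker cache (filesystem): 98x
# Squid with cache control headers: 958x
```

&lt;p&gt;
So "about 100 times" and "about 1000 times" are roundings of roughly 98x
and 958x.
&lt;/p&gt;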
&lt;p&gt;
I'm sure we could squeeze a bit more performance out
of it if we used Varnish, but probably not an order of
magnitude and I don't think it is worth the
administrative burden.
&lt;/p&gt;
&lt;p&gt;
If we set up a production Squid instance (or farm),
with a bare minimum of work it can cache for any
number of sites, not just CKAN.
&lt;/p&gt;
&lt;p&gt;
For the Python coders, here's what you have to do
to set the headers properly so that Squid will cache
the page:
&lt;/p&gt;
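&lt;pre class="wiki"&gt;from time import gmtime, strftime

# Remove any existing Pragma and Cache-Control headers
# (del raises KeyError if the header is absent).
del response.headers["Pragma"]
del response.headers["Cache-Control"]
# Last-Modified in HTTP date format; the format string must stay on one line.
response.headers["Last-Modified"] = strftime("%a, %d %b %Y %H:%M:%S GMT", gmtime())
# Let caches keep the page for an hour.
response.cache_expires(seconds=3600)
&lt;/pre&gt;&lt;p&gt;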
A further advantage is that the *browsers* will also
understand these cache-control headers and do their
own caching - just setting them properly without
even using Squid should result in some subjective
performance improvements.
&lt;/p&gt;
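&lt;p&gt;
For anyone who wants to try the idea outside a controller, here is a minimal
standalone sketch of the same header changes applied to a plain dict. The
dict stands in for response.headers, and the function name, its parameters
and the explicit max-age value are assumptions made for this illustration;
it is not a claim about exactly which headers response.cache_expires emits.
&lt;/p&gt;

```python
from email.utils import formatdate


def cacheable_headers(headers, max_age=3600, now=None):
    """Return a copy of headers adjusted so that a cache may store the page.

    headers: plain dict standing in for the framework's response headers.
    max_age: lifetime in seconds that caches may keep the page.
    now:     optional Unix timestamp for Last-Modified (defaults to current time).
    """
    # Drop any existing Pragma and Cache-Control headers, whatever their case.
    dropped = ("pragma", "cache-control")
    result = {k: v for k, v in headers.items() if k.lower() not in dropped}
    # formatdate with usegmt=True yields an HTTP date such as
    # "Thu, 01 Jan 1970 00:00:00 GMT".
    result["Last-Modified"] = formatdate(timeval=now, usegmt=True)
    # Assumed target outcome: an explicit freshness lifetime.
    result["Cache-Control"] = "max-age=%d" % max_age
    return result


if __name__ == "__main__":
    before = {"Pragma": "no-cache", "Cache-Control": "no-cache"}
    print(cacheable_headers(before))
```

&lt;p&gt;
Using formatdate here also avoids hand-writing the date format string.
&lt;/p&gt;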
&lt;p&gt;
That's all for now. I suggest we dedicate a machine
to just running Squid (the more RAM the better, and
big discs are good) and put it between the world and
the CKAN instances. Oh, and comb through the controllers,
setting the headers correctly where appropriate...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>dread</dc:creator>

      <pubDate>Thu, 13 Jan 2011 11:07:54 GMT</pubDate>
      <title>priority, component set</title>
      <link>http://localhost/ticket/540#comment:2</link>
      <guid isPermaLink="false">http://localhost/ticket/540#comment:2</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                set to &lt;em&gt;awaiting triage&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;component&lt;/strong&gt;
                set to &lt;em&gt;ckan&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>dread</dc:creator>

      <pubDate>Wed, 13 Apr 2011 11:40:45 GMT</pubDate>
      <title>status changed; repo, theme, resolution set</title>
      <link>http://localhost/ticket/540#comment:3</link>
      <guid isPermaLink="false">http://localhost/ticket/540#comment:3</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;repo&lt;/strong&gt;
                set to &lt;em&gt;ckan&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;closed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;theme&lt;/strong&gt;
                set to &lt;em&gt;none&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;resolution&lt;/strong&gt;
                set to &lt;em&gt;fixed&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Closing - all the suggestions have been implemented: a Squid instance is set up and cache headers are set for high-traffic pages.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>