Ticket #537 (closed task: duplicate)

Opened 4 years ago

Last modified 3 years ago

Caching and Performance improvement

Reported by: wwaites Owned by: wwaites
Priority: awaiting triage Milestone:
Component: ckan Keywords:
Cc: Repository: ckan
Theme: none

Description (last modified by johnbywater)

There are several places where performance is unacceptably poor. Even where it is not, the system could be more responsive to read requests.

Introducing caching has to be done carefully and in a standards-compliant (HTTP/1.1) manner.

General strategy

  • Where possible, cache output within the Pylons app (using Beaker).
  • Facilitate external caching in an end user's web browser or in a caching proxy.
  • Slightly stale data is not necessarily much of a problem, so allow the output to be cached for a relatively short period (e.g. 5-15 minutes).
  • When the cache expiry has been reached, a request will be made to the server. The server should check whether its internally cached data is still valid and, if so, serve it; otherwise it should regenerate the data.
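The expire-then-revalidate behaviour in the last point can be sketched as follows. This is a minimal, standard-library-only illustration, not CKAN code: `RevalidatingCache` and the `last_modified` callback (assumed to return the time of the most recent package modification) are hypothetical, and a plain dict stands in for the Beaker cache a Pylons app would normally use.

```python
import time
from typing import Callable, Dict, Tuple


class RevalidatingCache:
    """Hypothetical in-process cache (not a CKAN or Beaker API).

    Entries are served as-is while they are younger than ``max_age``
    seconds. Once an entry expires, ``last_modified()`` is consulted: if the
    data has not changed since the entry was generated, the entry is reused
    and its validation time reset; otherwise the value is regenerated.
    """

    def __init__(self, max_age: float, last_modified: Callable[[], float],
                 clock: Callable[[], float] = time.time) -> None:
        self.max_age = max_age
        self.last_modified = last_modified  # assumed: newest package modification time
        self.clock = clock
        # key -> (value, time generated, time last validated)
        self._entries: Dict[str, Tuple[object, float, float]] = {}

    def get(self, key: str, generate: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, generated_at, validated_at = entry
            if now - validated_at < self.max_age:
                return value  # still fresh: skip the validity check
            if self.last_modified() < generated_at:
                # Expired, but the data predates this entry: reuse it.
                self._entries[key] = (value, generated_at, now)
                return value
        value = generate()  # missing or stale: regenerate
        self._entries[key] = (value, now, now)
        return value
```

With `max_age=300` (5 minutes), the expensive `generate` callable (for example, rendering the package list) runs only on a cache miss or after a package has actually changed; an expired but unchanged entry costs one timestamp lookup instead of a full render.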

Tasks

These tasks should be broken into sub-tickets:

  • Caching of parts of templates that are expensive to render (package list, tag list, group list).
  • Caching of the entire output using Beaker, particularly for API read operations.
  • A check to decide whether the cache should be invalidated, by testing whether anything in the output would have changed (i.e. checking the timestamps of package modifications). This is a natural place to introduce the ETag, which will help browsers and web caches.
  • Caching infrastructure front end (Varnish, Squid, etc.). To do this right, the controllers need to set the Cache-Control headers appropriately (max-age, must-revalidate). A good resource: http://www.mnot.net/cache_docs/#CACHE-CONTROL
    • Deploy Varnish on a host dedicated to this purpose, for research. This will be useful for other sites as well.
    • Do not configure Varnish to ignore Cache-Control headers or otherwise behave in a way that is not HTTP/1.1 compliant.
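The invalidation check and the header requirements in the tasks above fit together roughly as follows. This is a hypothetical sketch, not CKAN controller code: `make_etag` derives an ETag from (package id, modification timestamp) pairs, and `conditional_response` takes request headers as a plain dict rather than a Pylons request object. A matching `If-None-Match` yields a 304 without rendering the page, and every response carries `Cache-Control: max-age=..., must-revalidate` so that a front-end cache such as Varnish or Squid may store it but must check back with the server once it expires.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


def make_etag(packages: Iterable[Tuple[str, str]]) -> str:
    """Build a quoted ETag from (package id, modification timestamp) pairs.

    Sorting makes the tag independent of listing order; adding, removing or
    modifying any package changes it.
    """
    text = "\n".join("%s %s" % (pkg_id, ts) for pkg_id, ts in sorted(packages))
    return '"%s"' % hashlib.sha1(text.encode("utf-8")).hexdigest()


def conditional_response(request_headers: Dict[str, str], etag: str,
                         render: Callable[[], bytes],
                         max_age: int = 300) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy matches."""
    headers = {
        "ETag": etag,
        # Caches may reuse the response for max_age seconds, then must revalidate.
        "Cache-Control": "max-age=%d, must-revalidate" % max_age,
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client copy is current: skip rendering
    return 200, headers, render()
```

In practice the ETag would also need to reflect anything else that affects the output (template changes, the requested format, user-specific content), otherwise a cache could serve a stale or wrong variant.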

Future Work

  • Investigate having the ckanclient library maintain a local cache, as a web browser would.
  • Investigate using a CDN like Google Storage or Amazon for serving cached data.
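For the ckanclient idea, a local cache that behaves like a browser might look like the sketch below. `ClientCache` and the injected `fetch(url, headers)` function are hypothetical (they are not part of ckanclient), and only `max-age` and `ETag` are honoured; other Cache-Control directives such as `no-store` are ignored for brevity.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]
Fetch = Callable[[str, Dict[str, str]], Response]  # hypothetical transport


class ClientCache:
    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch
        self.clock = clock
        # url -> (ETag or None, body, expiry time)
        self._store: Dict[str, Tuple[Optional[str], bytes, float]] = {}

    @staticmethod
    def _max_age(headers: Dict[str, str]) -> int:
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        return int(match.group(1)) if match else 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None and now < cached[2]:
            return cached[1]  # fresh: no request at all
        request_headers: Dict[str, str] = {}
        if cached is not None and cached[0] is not None:
            request_headers["If-None-Match"] = cached[0]  # ask the server to revalidate
        status, headers, body = self.fetch(url, request_headers)
        if status == 304 and cached is not None:
            # The server confirmed the stored copy is still valid.
            body = cached[1]
            etag = headers.get("ETag", cached[0])
        elif status == 200:
            etag = headers.get("ETag")
        else:
            return body  # errors are passed through without being cached
        self._store[url] = (etag, body, now + self._max_age(headers))
        return body
```

Injecting `fetch` keeps the caching logic independent of the HTTP library and lets it be exercised without a network; a real client would wrap whatever transport ckanclient uses.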

Change History

comment:1 Changed 4 years ago by wwaites

  • Type changed from requirement to task

comment:2 Changed 4 years ago by johnbywater

  • Owner set to johnbywater
  • Milestone ckan v1.2 deleted

comment:3 Changed 4 years ago by johnbywater

  • Description modified (diff)

comment:4 Changed 4 years ago by johnbywater

  • Description modified (diff)

comment:5 Changed 4 years ago by johnbywater

comment:2 Changed 4 years ago by johnbywater

comment:3 Changed 4 years ago by johnbywater

  • Type changed from task to defect

comment:4 Changed 4 years ago by johnbywater

  • Type changed from defect to task

comment:5 Changed 4 years ago by johnbywater

  • Owner changed from johnbywater to wwaites

comment:6 Changed 4 years ago by wwaites

See #540 for a story about Varnish. Strongly favouring Squid at this juncture.

comment:7 Changed 4 years ago by dread

Can this ticket be updated? Were any tasks listed here done? Anything remaining still planned?

comment:8 Changed 3 years ago by thejimmyg

  • Priority set to awaiting triage
  • Repository set to ckan
  • Theme set to none
  • Status changed from new to closed
  • Resolution set to duplicate

Consolidation of the caching work has moved to ticket #995.
