Changes between Version 7 and Version 9 of Ticket #891


Timestamp:
11/01/11 12:17:21 (3 years ago)
Author:
johnglover
Comment:

Added cache_url and cache_last_updated to resources after archiving.

This does not check for a hash value in the headers. The process will generally only run when a new resource is added or someone updates a URL, so we don't expect to be regularly downloading the same resource.

We will need something along these lines if this is running as a regular cron job, but in that case the logic will be added to the cron job itself (probably a paster command).
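For reference, the kind of header check discussed above could look roughly like the sketch below. It is not part of this change. It assumes the "hash from headers" would be an ETag or Content-MD5 value, and the function name and the shape of the `previous` dict are hypothetical.

```python
def headers_unchanged(headers, previous):
    """Return True if response headers suggest the resource has not changed.

    `headers` is a dict of response headers from the current check.
    `previous` is a dict of lower-cased header values saved at the last
    check (a hypothetical storage format, not taken from the ticket).
    """
    # HTTP header names are case-insensitive, so normalise them first.
    current = {name.lower(): value for name, value in headers.items()}

    # Assumption: a header hash is an ETag or Content-MD5 value.
    # If one is available on both sides, treat it as decisive.
    for name in ("etag", "content-md5"):
        if name in current and name in previous:
            return current[name] == previous[name]

    # Weaker fallback: an identical Content-Length as last time.
    length = current.get("content-length")
    return length is not None and length == previous.get("content-length")
```

A cron-style job could call this before downloading and skip the archive step when it returns True.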

  • Ticket #891

    • Property Status changed from assigned to closed
    • Property Milestone changed from ckan-sprint-2011-10-24 to current-ckan-sprint-2011-11-07
    • Property Resolution changed from (empty) to fixed
  • Ticket #891 – Description

    v7  v9
    27  27       * IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource)
    28  28       * ELSE: archive it (compute md5 hash etc)
    29           * IF: get hash from headers and hash unchanged GOTO step 4
    30  29       * IF: get content-length and content-length unchanged GOTO step 4
    31           * IF: max-age / expires / other cache headers show this has not changed since last check GOTO step 4
    32  30  3. Archive it: connect to storage system and store it. Bucket: from config, Key: /archive/{timestamp}/{resourceid}/filename.ext
    33  31   * Add cache url to resource and updated date
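
A minimal sketch of the archive step described above: compute the md5 hash, build the storage key, and record cache_url and cache_last_updated on the resource. The function names, the `hash` field, the `base_url` parameter, and the timestamp format are assumptions for illustration. The actual upload to the storage system is left out.

```python
import hashlib
from datetime import datetime, timezone


def archive_key(resource_id, filename, timestamp):
    """Build a key of the form /archive/{timestamp}/{resourceid}/filename.ext."""
    return "/archive/{0}/{1}/{2}".format(timestamp, resource_id, filename)


def record_archive(resource, content, filename, base_url, now=None):
    """Update a resource dict after its content has been archived.

    `resource` is a plain dict with an "id" entry, `content` is the
    downloaded bytes, and `base_url` is the public root of the storage
    bucket (hypothetical; the ticket only says the bucket comes from config).
    """
    now = now or datetime.now(timezone.utc)
    # Assumed timestamp format; the ticket does not specify one.
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    key = archive_key(resource["id"], filename, timestamp)

    # md5 hash of the archived content, as mentioned in the ELSE step.
    resource["hash"] = hashlib.md5(content).hexdigest()
    # Fields named in the ticket comment.
    resource["cache_url"] = base_url.rstrip("/") + key
    resource["cache_last_updated"] = now.isoformat()
    return key
```

The returned key is what a storage client would use when uploading `content` to the configured bucket.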