| 5 | | * Resource change notifications in core - #1383 |
| 6 | | * Resource download worker daemon - #891 |
| 7 | | * Make archived data available in WUI - #892 |
| 8 | | * Introduce timed actions into ckanext-queue - #890 |
| | 5 | == Preliminaries == |
| | 6 | |
| | 7 | * Add task_status table to store qa/archiver/webstore information that does not need to be versioned. - #1363 (and #1371 - related logic functions) |
| | 8 | |
| | 9 | == Tasks == |
| | 10 | |
| | 11 | 1. Resource change notifications in core - Make an IResourceChange and IResourceUrlChange. [1d] [0.75d] - #1383 |
| | 12 | 2. ckanext-archiver implements IResourceUrlChange and sends tasks to celery. [0.25d][0.25d] - ??? |
| | 13 | 3. Archiver daemon #891 |
| | 14 | 1. implement link-check function and task (point 2 from Archiver.update above) [1d] [0.5d] |
| | 15 | 2. Rewrite archiver to use external storage. (decide how!)[3d][~2d] |
| | 16 | 5. Write to resource and task status table.[1d][0.75d] |
| | 17 | 6. Make archived data available in WUI - #892 |
| | 18 | |
| | 19 | == Archiver process == |
| | 20 | |
| | 21 | Archiver: |
| | 22 | |
| | 23 | 0. A resource is added to CKAN |
| | 24 | 1. IResourceCreate event generated |
| | 25 | 2. IF: resource URL points to CKAN storage or meets any other exclusion condition: END. ELSE: continue |
| | 26 | 3. Generate an Archiver.Update task with resource.id |
| | 27 | |
| | 28 | Archiver.update |
| | 29 | |
| | 30 | 1. [Archiver.Update checks queue (automated as part of celery)] |
| | 31 | 2. Open url |
| | 32 | 1. If FAILURE status: update task_status table (could retry if not more than 3 failures so far). Report task failure in celery |
| | 33 | 2. Check headers for content-length and content-type ... |
| | 34 | * IF: content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?) |
| | 35 | * ELSE: check content-type. |
| | 36 | * IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource) |
| | 37 | * ELSE: archive it (compute md5 hash etc) |
| | 38 | 3. Archive it: connect to storage system and store it. Bucket: from config, Key: /{timestamp}/{resourceid}/filename.ext |
| | 39 | * Add cache url to resource and updated date |
| | 40 | * Update task_status |
| | 41 | * Add other relevant info to resource such as md5, content-type etc |
| | 42 | |
| | 43 | Link checker: same as Archiver.update up to 2.1 |
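
The decision steps above (exclusion check, header checks, storage key) can be sketched in Python roughly as follows. This is a minimal illustration, not the archiver's actual code: the names `should_queue`, `decide` and `storage_key`, the constants `MAX_CONTENT_LENGTH`, `NON_DATA_TYPES` and `EXCLUDED_HOSTS`, the lower-cased header dictionary, and the timestamp format are all assumptions made for the example.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# All values below are hypothetical placeholders; the real ones would come from config.
MAX_CONTENT_LENGTH = 50000000
NON_DATA_TYPES = {"text/html"}
EXCLUDED_HOSTS = {"storage.ckan.example"}


def should_queue(url, excluded_hosts=EXCLUDED_HOSTS):
    """Archiver step 2: skip resources whose URL matches an exclusion condition."""
    host = urlparse(url).hostname
    return host is not None and host not in excluded_hosts


def decide(headers, max_length=MAX_CONTENT_LENGTH, non_data=NON_DATA_TYPES):
    """Archiver.update step 2.2: return 'too_large', 'not_data' or 'archive'.

    `headers` is assumed to be a dict with lower-case header names.
    """
    length = headers.get("content-length", "")
    if length.isdigit() and int(length) > max_length:
        return "too_large"
    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type in non_data:
        return "not_data"
    return "archive"


def storage_key(resource_id, filename, now=None):
    """Archiver.update step 3: build /{timestamp}/{resourceid}/filename.ext."""
    now = now or datetime.now(timezone.utc)
    # The timestamp format is an assumption; the notes do not specify one.
    return "/{0}/{1}/{2}".format(now.strftime("%Y-%m-%dT%H:%M:%S"), resource_id, filename)
```

In this sketch the caller would record the outcome on task_status for every return value of `decide`, and only connect to the storage system when it returns `"archive"`. A link checker could reuse the same flow and stop once the URL has been opened successfully.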