28 | | Archiver.update |
29 | | |
30 | | 1. [Archiver.Update checks queue (automated as part of celery)] |
31 | | 2. Open url |
32 | |    1. If the status is FAILURE: update the task_status table (could retry if there have been no more than 3 failures so far) and report the task failure in celery.
33 | |    2. Check headers for content-length and content-type ...
34 | |       * IF content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?)
35 | |       * ELSE: check content-type.
36 | |         * IF the content-type is not a data format (e.g. text/html): EXIT (store outcomes and info on resource).
37 | |         * ELSE: archive it (compute md5 hash etc.)
38 | | 3. Archive it: connect to storage system and store it. Bucket: from config, Key: /{timestamp}/{resourceid}/filename.ext |
39 | |    * Add cache url to resource and updated date
40 | |    * Update task_status
41 | |    * Add other relevant info to resource such as md5, content-type etc.
| 28 | Archiver.update: see #891 |
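
The decision steps in the removed outline (failure handling, size check, content-type check, storage key) can be sketched as a small pure function. This is only an illustration under stated assumptions, not the Archiver code: `decide`, `Decision`, `storage_key`, the `MAX_CONTENT_LENGTH` value and the `NON_DATA_TYPES` set are hypothetical names and values, and reading "not more than 3 failures so far" as `failures_so_far <= 3` is one possible interpretation. No network or storage calls are made.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed values: the outline names max_content_length and a limit of
# 3 failures, but gives no concrete size and only text/html as an example.
MAX_CONTENT_LENGTH = 50_000_000
MAX_FAILURES = 3
NON_DATA_TYPES = {"text/html"}


@dataclass
class Decision:
    action: str                      # "retry", "fail", "skip" or "archive"
    reason: str
    size: Optional[int] = None       # parsed content-length, if present
    content_type: Optional[str] = None


def decide(status_ok: bool,
           headers: Mapping[str, str],
           failures_so_far: int = 0,
           max_content_length: int = MAX_CONTENT_LENGTH) -> Decision:
    """Return what the archiver should do for one resource URL."""
    if not status_ok:
        # Step 2.1: retry only while the failure count is within the limit.
        action = "retry" if failures_so_far <= MAX_FAILURES else "fail"
        return Decision(action, "request failed")

    # Step 2.2: header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw_length = lowered.get("content-length", "").strip()
    size = int(raw_length) if raw_length.isdigit() else None
    # Drop parameters such as "; charset=utf-8" from the content-type.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower() or None

    if size is not None and size > max_content_length:
        return Decision("skip", "too large", size, content_type)
    if content_type in NON_DATA_TYPES:
        return Decision("skip", "not data", size, content_type)
    return Decision("archive", "ok", size, content_type)


def storage_key(timestamp: str, resource_id: str, filename: str) -> str:
    """Step 3: build the key as /{timestamp}/{resourceid}/filename.ext."""
    return f"/{timestamp}/{resource_id}/{filename}"
```

In every branch the caller would still record the outcome on task_status and copy `size` and `content_type` onto the resource, as the outline describes; the sketch leaves that out because the storage and database layers are not specified here.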