Custom Query (2152 matches)
Results (865 - 867 of 2152)
Ticket
|
Resolution
|
Summary
|
Owner
|
Reporter
|
#891 |
fixed
|
Resource download worker daemon
|
johnglover
|
pudo
|
Reported by pudo,
3 years ago.
|
Description |
Superticket: #1397
Write a worker daemon to download all resources from a CKAN instance to a local repository.
Questions
- Do we only want to download openly licensed information? ANS: no, we download everything (though we do need to think about this re IP issues)
- Should we have clever ways to dump APIs? ANS: no.
- Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling, we're archiving)
- Use HTTP/1.1 caching headers? ANS: if the resource has not changed since we last updated, don't bother to recache.
- Complete support for ETags
- Expires, Max-Age etc.
- Check
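The caching answer above could be handled with a conditional request. The following is a minimal sketch, not the archiver's actual code: the function names are hypothetical, and it assumes the ETag and Last-Modified values from the previous download were stored alongside the resource.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers asking the server to skip the body if unchanged.

    Both arguments are values remembered from the previous download
    (hypothetical storage; the ticket does not say where they are kept).
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def needs_recache(status_code: int) -> bool:
    """A 304 Not Modified response means the archived copy is still current."""
    return status_code != 304
```

For example, `conditional_headers('"abc"', None)` yields only an `If-None-Match` header, and a 304 response to that request would mean the resource can be skipped.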
Functionality
- Download files via HTTP, HTTPS (will not do FTP)
Process:
- [Archiver.Update checks queue (automated as part of celery)]
- Open url and get any info from resource on cache / content-length etc
- If FAILURE status: update task_status table (could retry if not more than 3 failures so far). Report task failure in celery
- Check headers for content-length and content-type ...
- IF: content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?)
- ELSE: check content-type.
- IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource)
- ELSE: archive it (compute md5 hash etc)
- IF: get content-length and content-length unchanged GOTO step 4
- Archive it: connect to storage system and store it. Bucket: from config, Key: /archive/{timestamp}/{resourceid}/filename.ext
- Add cache url to resource and updated date
- Add other relevant info to resource such as md5, content-type etc
- Update task_status
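The header checks and storage key in the process above can be sketched as a few pure functions. This is an illustration under stated assumptions, not the daemon's implementation: the function names, the `MAX_CONTENT_LENGTH` value, and the `NON_DATA_TYPES` set are hypothetical (the ticket only says such settings exist), and header names are assumed to be lowercased already.

```python
import hashlib
from typing import Mapping, Tuple

# Hypothetical defaults; the ticket only refers to max_content_length.
MAX_CONTENT_LENGTH = 50_000_000
NON_DATA_TYPES = {"text/html"}


def should_archive(headers: Mapping[str, str],
                   max_length: int = MAX_CONTENT_LENGTH) -> Tuple[bool, str]:
    """Decide from response headers whether to archive; return (ok, reason)."""
    length = headers.get("content-length")
    if length is not None and length.isdigit() and int(length) > max_length:
        return False, "content-length exceeds max_content_length"
    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type in NON_DATA_TYPES:
        return False, "content-type is not data: " + content_type
    return True, "ok"


def archive_key(timestamp: str, resource_id: str, filename: str) -> str:
    """Build the storage key /archive/{timestamp}/{resourceid}/filename.ext."""
    return "/archive/{0}/{1}/{2}".format(timestamp, resource_id, filename)


def md5_hex(content: bytes) -> str:
    """Hex MD5 digest of the downloaded content, to store on the resource."""
    return hashlib.md5(content).hexdigest()
```

The reason string returned by `should_archive` is the kind of outcome that could be written to the task_status table when a resource is skipped.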
Optional functionality
- If result object is HTML, search for references to "proper data" (CSV download pages etc.)
- Download from POST forms (accepting licenses or weird proprietary systems)
- Support running on Google App Engine to save traffic costs.
Existing work
|
#892 |
fixed
|
Make stored data available in WUI - 0.5d
|
johnglover
|
pudo
|
Reported by pudo,
3 years ago.
|
Description |
Once we have storage, make the data available in the following ways:
- Now that we have a cached_url field, we can show it in the frontend ...
- Add a [<a href="${cached_url}">cached</a>] link to right of real url on resource listing on dataset page.
- On resource page: will not add it yet.
- At the moment there is no clear place to put this given the nice big download button (it could go in the list of items on the left, but that does not seem right; note that it will turn up in the big list of info at the bottom)
- Add test (?)
- Deploy
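The cached link described above would presumably be rendered only when a resource actually has a cached_url. A minimal sketch of that logic, written as a plain Python helper rather than a template (the `cached_link` name is hypothetical, and the real frontend would do this in its template layer):

```python
from html import escape
from typing import Optional


def cached_link(cached_url: Optional[str]) -> str:
    """Return the '[cached]' link markup, or an empty string if not archived."""
    if not cached_url:
        return ""
    # Escape the URL so characters such as & and " are safe in the attribute.
    return '[<a href="{0}">cached</a>]'.format(escape(cached_url, quote=True))
```

Returning an empty string for resources without a cached copy keeps the resource listing unchanged for anything the archiver has not yet processed.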
|
#893 |
wontfix
|
ExtrasField should not overwrite more specific extras
|
|
pudo
|
Reported by pudo,
3 years ago.
|
Description |
At the moment, ExtrasField cannot be used in conjunction with more specific extra fields, such as TextExtraField or SuggestTextExtraField.
|