Custom Query (2152 matches)
Results (1357 - 1359 of 2152)
Ticket | Resolution | Summary | Owner | Reporter
#891 | fixed | Resource download worker daemon | johnglover | pudo
Reported by pudo, 3 years ago.

Description
Superticket: #1397
Write a worker daemon to download all resources from a CKAN instance to a local repository.
Questions
- Do we only want to download openly licensed information? ANS: No, we download everything (though we do need to think about this with regard to IP issues).
- Should we have clever ways to dump APIs? ANS: No.
- Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling; we're archiving).
- Should we use HTTP/1.1 caching headers? ANS: If the resource has not changed since we last updated it, don't bother to re-cache it.
  - Complete support for ETags
  - Expires, Max-Age etc.
  - Check
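The caching answer above (re-cache only if changed, with ETag and Expires/Max-Age support) could look roughly like the following minimal sketch. It is not the archiver's actual code: the function names are hypothetical, and it assumes the previous response's `ETag`, `Last-Modified` and fetch time were stored alongside the resource. It uses only the standard library.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Request headers that let the server answer 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Return the max-age directive of a Cache-Control value, if any."""
    for directive in (cache_control or "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_still_fresh(fetched_at: datetime, response_headers: Dict[str, str],
                   now: datetime) -> bool:
    """True if the stored copy can be reused without contacting the server.

    As in HTTP/1.1, max-age takes precedence over Expires. Datetimes passed in
    must be timezone-aware.
    """
    headers = {k.lower(): v for k, v in response_headers.items()}
    max_age = max_age_seconds(headers.get("cache-control"))
    if max_age is not None:
        return now < fetched_at + timedelta(seconds=max_age)
    expires = headers.get("expires")
    if not expires:
        return False
    try:
        expires_at = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return False  # an invalid Expires value is treated as already expired
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

The headers from `conditional_headers` could be passed to `urllib.request.Request(url, headers=...)`; a `304 Not Modified` reply (which `urllib` surfaces as an `HTTPError` with code 304) would then mean the archived copy is still current.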
Functionality
- Download files via HTTP, HTTPS (will not do FTP)
Process:
- [Archiver.Update checks the queue (automated as part of Celery)]
- Open the URL and get any cache info, content-length, etc. from the resource
- If the status is FAILURE: update the task_status table (we could retry if there have been no more than 3 failures so far). Report the task failure in Celery.
- Check headers for content-length and content-type ...
- IF: content-length > max_content_length: EXIT (store outcomes in task_status, and update the resource with its size, content-type and any other info we get?)
- ELSE: check content-type.
- IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource)
- ELSE: archive it (compute MD5 hash, etc.)
- IF: we get a content-length and it is unchanged, GOTO step 4
- Archive it: connect to the storage system and store it. Bucket: from config; key: /archive/{timestamp}/{resourceid}/filename.ext
- Add the cache URL and the updated date to the resource
- Add other relevant info to the resource, such as MD5, content-type, etc.
- Update task_status
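A minimal sketch of the decision logic in the process above, written as pure functions so it can be tested without network access. All names (`decide`, `archive_key`, `NON_DATA_TYPES`, the action strings) are hypothetical. It assumes that a FAILURE status means an HTTP status of 400 or above, that `failures` counts the current failure, that `text/html` is the only non-data type, and that "GOTO step 4" for an unchanged content-length means "don't re-archive" (following the caching answer under Questions).

```python
import hashlib
import posixpath
from dataclasses import dataclass
from typing import Iterable, Optional
from urllib.parse import urlparse

MAX_FAILURES = 3                # assumption: at most 3 attempts in total
NON_DATA_TYPES = {"text/html"}  # hypothetical: content types treated as "not data"


@dataclass
class Decision:
    action: str  # one of: retry, fail, too_large, not_data, unchanged, archive
    content_length: Optional[int] = None
    content_type: Optional[str] = None


def is_supported_url(url: str) -> bool:
    """Only HTTP and HTTPS are downloaded (no FTP)."""
    return urlparse(url).scheme in ("http", "https")


def decide(status: int, headers: dict, failures: int,
           max_content_length: int, previous_length: Optional[int]) -> Decision:
    """Choose what to do with a response, in the order of the steps above."""
    if status >= 400:
        return Decision("retry" if failures < MAX_FAILURES else "fail")

    lowered = {k.lower(): v for k, v in headers.items()}
    raw_length = lowered.get("content-length", "")
    length = int(raw_length) if raw_length.isdigit() else None
    ctype = lowered.get("content-type", "").split(";")[0].strip().lower() or None

    if length is not None and length > max_content_length:
        return Decision("too_large", length, ctype)
    if ctype in NON_DATA_TYPES:
        return Decision("not_data", length, ctype)
    # Assumption: an unchanged content-length means "don't re-cache".
    if length is not None and length == previous_length:
        return Decision("unchanged", length, ctype)
    return Decision("archive", length, ctype)


def archive_key(timestamp: str, resource_id: str, url: str) -> str:
    """Build /archive/{timestamp}/{resourceid}/filename.ext for the storage bucket."""
    filename = posixpath.basename(urlparse(url).path) or "resource"  # fallback name is hypothetical
    return f"/archive/{timestamp}/{resource_id}/{filename}"


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """MD5 of the downloaded content, fed in chunks to avoid loading it all at once."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

Keeping the decision separate from the download and storage code means the Celery task only has to perform the I/O and record the returned `Decision` in task_status.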
Optional functionality
- If result object is HTML, search for references to "proper data" (CSV download pages etc.)
- Download from POST forms (accepting licenses or weird proprietary systems)
- Support running on Google App Engine to save traffic costs.
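For the first optional item, one way to look for "proper data" in an HTML result is to collect links whose path ends in a data-file extension. This is a sketch using the standard library's `html.parser`; the extension list and the function and class names are assumptions, and real download pages (JavaScript-driven pages, POST forms) would need more than this.

```python
import posixpath
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

# Hypothetical set of extensions that count as "proper data".
DATA_EXTENSIONS = {".csv", ".tsv", ".xls", ".xlsx", ".json", ".xml", ".zip"}


class DataLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href> links that point at data files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        extension = posixpath.splitext(urlparse(absolute).path)[1].lower()
        if extension in DATA_EXTENSIONS and absolute not in self.links:
            self.links.append(absolute)


def find_data_links(html: str, base_url: str) -> List[str]:
    """Return data-file links found in an HTML page, in document order."""
    finder = DataLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links
```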
Existing work

#229 | fixed | Resource hashes | dread | dread
Reported by dread, 4 years ago.

Description
New field for resources - hash of the resource file.
- CKAN itself will not calculate the hash value; the user just pastes it in.
- Display it as a text field in the resource table.
- Requires a migration script.
Questions for the field's value:
- Which hash should we use? Restrict the choice to algorithms in Python's hashlib and other major hash libraries.
- Should we use Merkle trees?
Thanks to Julien D'Assanges for the suggestion.
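Although CKAN itself would not compute the hash, a consumer could check a pasted value against a downloaded file. The sketch below assumes a hypothetical `algorithm:hexdigest` format for the field and restricts algorithms to `hashlib.algorithms_guaranteed`, excluding the variable-length SHAKE digests. The `merkle_root` function illustrates the Merkle tree option; its chunking and its rule of pairing an odd node with itself are assumptions, not a proposed standard.

```python
import hashlib
from typing import BinaryIO, Iterable, Iterator, Tuple

# Variable-length SHAKE digests need an explicit length, so they are excluded.
ALLOWED = {a for a in hashlib.algorithms_guaranteed if not a.startswith("shake_")}


def parse_hash_field(value: str) -> Tuple[str, str]:
    """Split a pasted value such as 'sha256:ab12...' into (algorithm, hexdigest)."""
    algorithm, sep, digest = value.strip().partition(":")
    algorithm = algorithm.lower()
    if not sep or algorithm not in ALLOWED:
        raise ValueError(f"unsupported or missing hash algorithm in {value!r}")
    return algorithm, digest.strip().lower()


def read_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a binary file in fixed-size chunks."""
    return iter(lambda: stream.read(chunk_size), b"")


def verify(chunks: Iterable[bytes], value: str) -> bool:
    """True if the content's digest matches the pasted hash value."""
    algorithm, expected = parse_hash_field(value)
    digest = hashlib.new(algorithm)
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest() == expected


def merkle_root(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Merkle root over content chunks; an odd node is paired with itself (assumption)."""
    level = [hashlib.new(algorithm, chunk).digest() for chunk in chunks]
    if not level:
        return hashlib.new(algorithm, b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.new(algorithm, level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

For example, `with open(path, "rb") as f: ok = verify(read_chunks(f), pasted_value)` would check a downloaded file without reading it into memory at once (`path` and `pasted_value` are placeholders).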

#1646 | worksforme | Resource navigator options display spuriously | zephod | dread
Reported by dread, 2 years ago.

Description
When viewing a dataset, the "Resources" navigator button displayed the resource titles directly on the button, instead of in a drop-down mouse-hover menu.
http://thedatahub.org/dataset/realtime-birth-data-in-bulgaria/resource/66fc5831-ce01-4954-9beb-e2889ef8a20f
Chrome/Linux?