Custom Query – CKAN

Results (1315 - 1317 of 2152)


Ticket	Resolution	Summary	Owner	Reporter
#890	invalid	Introduce timed actions into ckanext-queue	kindly	pudo
Reported by pudo, 3 years ago.
Description	The ckan queuing system should provide the option to subscribe to timed re-submissions of specific resources. This could look as follows:

    routing_key: Package
    operation: daily
    payload: pkg.as_dict()

Where operation is one of daily, weekly, monthly or any other interval.
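The message shape proposed in this ticket can be sketched in Python as follows. This is only an illustration: `build_timed_message`, `is_due` and the `INTERVALS` table are hypothetical names, not part of ckanext-queue, and treating "monthly" as 30 days is an assumption made to keep the example small.

```python
from datetime import datetime, timedelta

# Hypothetical interval table; "monthly" as 30 days is an assumption.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


def build_timed_message(pkg_dict, operation):
    """Build a message in the shape proposed in the ticket."""
    if operation not in INTERVALS:
        raise ValueError("unknown interval: %r" % (operation,))
    return {
        "routing_key": "Package",
        "operation": operation,
        "payload": pkg_dict,  # in CKAN this would be pkg.as_dict()
    }


def is_due(last_sent, operation, now):
    """Return True if a re-submission is due for this interval."""
    return now - last_sent >= INTERVALS[operation]


if __name__ == "__main__":
    message = build_timed_message({"name": "demo-dataset"}, "daily")
    print(message["routing_key"], message["operation"])
    print(is_due(datetime(2011, 1, 1), "weekly", datetime(2011, 1, 8)))
```

A subscriber would then filter on the `operation` field to receive only the re-submissions it asked for.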
#891	fixed	Resource download worker daemon	johnglover	pudo
Reported by pudo, 3 years ago.
Description	Superticket: #1397

Write a worker daemon to download all resources from a CKAN instance to a local repository.

Questions

- Do we only want to download openly licensed information? ANS: No, we do everything (though we do need to think about this re. IP issues).
- Should we have clever ways to dump APIs? ANS: No.
- Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling, we're archiving).
- Use HTTP/1.1 caching headers? ANS: If not changed since we last updated, don't bother to recache.
- Complete support for ETags, Expires, Max-Age etc. Check

Functionality

- Download files via HTTP, HTTPS (will not do FTP)

Process:

- [Archiver.Update checks queue (automated as part of celery)]
- Open url and get any info from resource on cache / content-length etc.
  - If FAILURE status: update task_status table (could retry if not more than 3 failures so far). Report task failure in celery.
- Check headers for content-length and content-type ...
  - IF content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?)
  - ELSE: check content-type.
    - IF NOT data stuff (e.g. text/html): EXIT (store outcomes and info on resource).
    - ELSE: archive it (compute md5 hash etc.)
  - IF get content-length and content-length unchanged: GOTO step 4
- Archive it: connect to storage system and store it.
  - Bucket: from config
  - Key: /archive/{timestamp}/{resourceid}/filename.ext
- Add cache url to resource and updated date
- Add other relevant info to resource such as md5, content-type etc.
- Update task_status

Optional functionality

- If result object is HTML, search for references to "proper data" (CSV download pages etc.)
- Download from POST forms (accepting licenses or weird proprietary systems)
- Support running on Google App Engine to save traffic costs.

Existing work

- https://bitbucket.org/okfn/ckanext-qa/overview
- out of date: https://bitbucket.org/pudo/ckanextarchive - Old archiver extension, largely experimental.
- out of date: https://bitbucket.org/ollyc/ckan/changeset/1b16fbe9aa65 - Openness scores by ollyc
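The header checks and storage key described in this ticket might look roughly like the sketch below. It is not the archiver's actual code: the function names, the `MAX_CONTENT_LENGTH` value and the `DATA_CONTENT_TYPES` set are all hypothetical, and response headers are assumed to arrive as a dict with lower-cased keys.

```python
import hashlib

# Hypothetical settings; the ticket only says these come from config.
MAX_CONTENT_LENGTH = 50000000
DATA_CONTENT_TYPES = {"text/csv", "application/json", "application/xml"}
MAX_FAILURES = 3  # "could retry if not more than 3 failures so far"


def should_archive(headers, max_length=MAX_CONTENT_LENGTH,
                   data_types=DATA_CONTENT_TYPES):
    """Apply the content-length and content-type checks.

    Returns (ok, reason); the reason is what would be stored on task_status.
    """
    length = headers.get("content-length", "")
    if length.isdigit() and int(length) > max_length:
        return False, "content-length exceeds maximum"
    # Strip parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in data_types:
        return False, "content-type is not a data format"
    return True, "ok"


def may_retry(failures_so_far):
    """Retry only while there have been no more than MAX_FAILURES failures."""
    return failures_so_far <= MAX_FAILURES


def archive_key(timestamp, resource_id, filename):
    """Build the key /archive/{timestamp}/{resourceid}/filename.ext."""
    return "/archive/{0}/{1}/{2}".format(timestamp, resource_id, filename)


def md5_of(content):
    """Hex md5 digest of the downloaded bytes, to store on the resource."""
    return hashlib.md5(content).hexdigest()


if __name__ == "__main__":
    print(should_archive({"content-type": "text/html"}))
    print(archive_key("2011-01-01", "abc123", "data.csv"))
```

Keeping the checks as pure functions, separate from the download and storage calls, would make them easy to test without a network connection or a celery worker.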
#892	fixed	Make stored data available in WUI - 0.5d	johnglover	pudo
Reported by pudo, 3 years ago.
Description	Once we have storage, make the data available in the following ways:

- Now that we have a cached_url field, we can show it in the frontend ...
- Add a [<a href="${cached_url}">cached</a>] link to the right of the real url on the resource listing on the dataset page.
- On resource page: will not add it yet. At the moment there is no clear place to put this given the nice big download button (could put it in the list of items on the left, but that does not seem right, and note that it will turn up in the big list of info at the bottom).
- Add test (?)
- Deploy
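As a rough illustration of the link described in this ticket, a small helper could emit the markup only when a resource actually has a cached copy. The helper name `cached_link` is hypothetical, and the resource is assumed to be a plain dict with an optional `cached_url` key; in CKAN itself this would more likely live in the template.

```python
from html import escape


def cached_link(resource):
    """Return the '[cached]' link markup, or '' if there is no cached copy."""
    cached_url = resource.get("cached_url")
    if not cached_url:
        return ""
    # Escape the URL so that characters like & and " cannot break the attribute.
    return '[<a href="{0}">cached</a>]'.format(escape(cached_url, quote=True))


if __name__ == "__main__":
    print(cached_link({"cached_url": "http://example.org/archive/data.csv"}))
    print(repr(cached_link({"url": "http://example.org/data.csv"})))
```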
