Custom Query (2152 matches)

Filters
 
Or
 
  
 
Columns

Show under each result:


Results (1315 - 1317 of 2152)

Ticket Resolution Summary Owner Reporter
#890 invalid Introduce timed actions into ckanext-queue kindly pudo

Reported by pudo, 3 years ago.

Description

The ckan queuing system should provide the option to subscribe to timed re-submissions of specific resources. This could look as follows:

  • routing_key: Package
  • operation: daily
  • payload: pkg.as_dict()

Where operation is one of daily, weekly, monthly or any other interval.

#891 fixed Resource download worker daemon johnglover pudo

Reported by pudo, 3 years ago.

Description

Superticket: #1397

Write a worker daemon to download all resources from a CKAN instance to a local repository.

Questions

  • Do we only want to download openly licensed information? ANS: no, we do everything (though do need to think about this re. IP issues)
  • Should we have clever ways to dump APIs? ANS: no.
  • Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling we're archiving)
  • Use HTTP/1.1 Caching headers? ANS: if not changed since we last updated don't bother to recache.
    • Complete support for ETags
    • Expires, Max-Age etc.
  • Check

Functionality

  • Download files via HTTP, HTTPS (will not do FTP)

Process:

  1. [Archiver.Update checks queue (automated as part of celery)]
  2. Open url and get any info from resource on cache / content-length etc
    1. If FAILURE status: update task_status table (could retry if not more than 3 failures so far). Report task failure in celery
    2. Check headers for content-length and content-type ...
      • IF: content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?)
      • ELSE: check content-type.
        • IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource)
        • ELSE: archive it (compute md5 hash etc)
      • IF: get content-length and content-length unchanged GOTO step 4
  3. Archive it: connect to storage system and store it. Bucket: from config, Key: /archive/{timestamp}/{resourceid}/filename.ext
    • Add cache url to resource and updated date
    • Add other relevant info to resource such as md5, content-type etc
  4. Update task_status

Optional functionality

  • If result object is HTML, search for references to "proper data" (CSV download pages etc.)
  • Download from POST forms (accepting licenses or weird proprietary systems)
  • Support running on Google Apps Engine to save traffic costs.

Existing work

#892 fixed Make stored data available in WUI - 0.5d johnglover pudo

Reported by pudo, 3 years ago.

Description

Once we have storage, make the data available in the following ways:

  • Now have a cached_url field can show in the frontend ...
  • Add a [<a href="${cached_url}">cached</a>] link to right of real url on resource listing on dataset page.
  • On resource page: will not add it yet.
    • At the moment no clear place to pu this given nice big download button (could put in list of items on left but that does not seem right and note that it will turn up in big list of info at bottom)
  • Add test (?)
  • Deploy
Note: See TracQuery for help on using queries.