Custom Query (2152 matches)

Results (796 - 798 of 2152)

Ticket Resolution Summary Owner Reporter
#891 fixed Resource download worker daemon johnglover pudo

Reported by pudo, 3 years ago.

Description

Superticket: #1397

Write a worker daemon to download all resources from a CKAN instance to a local repository.

Questions

  • Do we only want to download openly licensed information? ANS: no, we download everything (though we do need to think about this regarding IP issues)
  • Should we have clever ways to dump APIs? ANS: no.
  • Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling, we're archiving)
  • Use HTTP/1.1 caching headers? ANS: if a resource has not changed since we last updated it, don't bother to re-cache it (see the sketch after this list).
    • Complete support for ETags
    • Expires, Max-Age etc.
  • Check
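
A minimal sketch of the conditional-request check this implies, assuming the requests library; last_etag and last_modified are hypothetical values saved from the previous archiving run (e.g. on task_status):

  import requests

  def check_if_modified(url, last_etag=None, last_modified=None):
      """Return (needs_recache, response), using HTTP/1.1 conditional headers.

      last_etag and last_modified are hypothetical values stored from the
      previous archiving run.
      """
      headers = {}
      if last_etag:
          headers['If-None-Match'] = last_etag
      if last_modified:
          headers['If-Modified-Since'] = last_modified

      response = requests.get(url, headers=headers, stream=True, timeout=30)
      if response.status_code == 304:
          # Unchanged since we last archived it: don't bother to re-cache.
          return False, response
      return True, response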

Functionality

  • Download files via HTTP, HTTPS (will not do FTP)

Process (a code sketch follows the list):

  1. [Archiver.Update checks queue (automated as part of celery)]
  2. Open the URL and get any info from the resource on caching / content-length etc.
    1. If FAILURE status: update the task_status table (could retry if not more than 3 failures so far). Report the task failure in celery.
    2. Check headers for content-length and content-type ...
      • IF: content-length > max_content_length: EXIT (store outcomes on task_status, and update the resource with size, content-type and any other info we get?)
      • ELSE: check content-type.
        • IF: NOT a data format (e.g. text/html) then EXIT (store outcomes and info on the resource).
        • ELSE: archive it (compute md5 hash etc.)
      • IF: we got a content-length and it is unchanged since the last archive, GOTO step 4
  3. Archive it: connect to the storage system and store it. Bucket: from config, Key: /archive/{timestamp}/{resourceid}/filename.ext
    • Add the cache URL and updated date to the resource
    • Add other relevant info to the resource such as md5, content-type etc.
  4. Update task_status
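
A minimal sketch of steps 2-4, assuming the requests library. update_task_status and the storage object (with put/url_for methods) are hypothetical stand-ins for the real task_status and storage code, the size limit and mimetype whitelist are made-up values, and the "content-length unchanged" shortcut is omitted:

  import datetime
  import hashlib
  import os
  from urllib.parse import urlparse

  import requests

  MAX_CONTENT_LENGTH = 50 * 1024 * 1024              # assumed config value
  DATA_MIMETYPES = {'text/csv', 'application/json'}  # illustrative whitelist only

  def archive_resource(resource, storage, bucket, max_length=MAX_CONTENT_LENGTH):
      """Rough sketch of steps 2-4 for one resource dict."""
      try:
          response = requests.get(resource['url'], stream=True, timeout=30)
          response.raise_for_status()
      except requests.RequestException as exc:
          # Step 2.1: record the failure; celery can retry (up to 3 times).
          update_task_status(resource['id'], 'FAILURE', error=str(exc))
          raise

      # Step 2.2: check content-length and content-type before archiving.
      length = int(response.headers.get('content-length') or 0)
      mimetype = (response.headers.get('content-type') or '').split(';')[0]
      if length > max_length:
          update_task_status(resource['id'], 'Too large', size=length, mimetype=mimetype)
          return
      if mimetype not in DATA_MIMETYPES:
          # Not a data format (e.g. text/html): store what we learned and exit.
          update_task_status(resource['id'], 'Not a data format', mimetype=mimetype)
          return

      # Step 3: archive it under /archive/{timestamp}/{resourceid}/filename.ext.
      data = response.content
      timestamp = datetime.datetime.utcnow().isoformat()
      filename = os.path.basename(urlparse(resource['url']).path) or 'data'
      key = '/archive/%s/%s/%s' % (timestamp, resource['id'], filename)
      storage.put(bucket, key, data)

      resource['cache_url'] = storage.url_for(bucket, key)
      resource['cache_last_updated'] = timestamp
      resource['hash'] = hashlib.md5(data).hexdigest()
      resource['mimetype'] = mimetype

      # Step 4: update task_status.
      update_task_status(resource['id'], 'Archived', size=len(data), mimetype=mimetype)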

Optional functionality

  • If the result object is HTML, search it for references to "proper data" (CSV download pages etc.)
  • Download from POST forms (accepting licenses or weird proprietary systems)
  • Support running on Google App Engine to save traffic costs.

Existing work

#1445 fixed Resource View page in WUI zephod rgrp

Reported by rgrp, 2 years ago.

Description

Super ticket: #1032

  • Locate at: /dataset/{dataset}/resource/{id}

See: http://wiki.ckan.org/Dataset_View_Page

Implemented in branch feature-1445-resource-view.
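
A rough sketch of how the route might be wired up in a Pylons-style routing config (the controller and action names here are assumptions, not taken from the ticket):

  # Hypothetical addition to the routing config: map the resource view
  # page onto a package-controller action.
  map.connect('/dataset/{dataset}/resource/{id}',
              controller='package', action='resource_read')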

Still to do:

  • Add inline data explorer to page
#336 fixed Resource Search API dread donovanhide

Reported by donovanhide, 4 years ago.

Description

As a

CKAN client such as ScraperWiki

I want to

search for Package Resources, either by URL or another field, or just get them all. I want to be able to get all of a resource's fields, such as its URL.

Proposed implementation

Add resource search API at:

/api/search/resource

AND add the resource to the model API at:

api/rest/resource

(see ticket:358)

Functional differences from the ScraperWiki suggested patch:

  • URL is not normalised
  • URLs are not grouped
  • All fields of the resource object are returned, not just the URL
  • Package is identified by its ID, not name or full URL. (This is for consistency in the API - you can simply prepend 'http://ckan.net/package/' to the package ID.)

This is to make our API more general, simple and consistent. It means that the ScraperWiki client has to do a bit more processing to get exactly what it needs. Is this ok?

Example search

POST

{"url": "scraperwiki.com/", "all_fields": 1}

to: /api/2/search/resource

returns JSON:

 [{"id": "a3dd8f64-9078-4f04-845c-e3f047125028",
   "package_id": "b8a325c8-af2a-43f3-8245-9db7d73dfbfe",
   "URL": "http://scraperwiki.com/lincolnshire-councillors", 
   "format": "CSV", 
   "Description": "Scrape of www.lincs.gov/councillors.pdf by ScraperWiki.",
   "hash": "", 
   "position": 2
 }]

Note: the use of package_id instead of package_name is something we're moving towards in the API, since names can change. Once ticket:341 is done, ckan.net/package/lincs-councillors will be a synonym of ckan.net/package/b8a325c8-af2a-43f3-8245-9db7d73dfbfe.
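
For illustration, the example search above could be issued from a Python client like this (a sketch assuming the requests library and the proposed endpoint):

  import requests

  # Search for resources whose url contains "scraperwiki.com/",
  # returning full records rather than just resource IDs.
  response = requests.post(
      'http://ckan.net/api/2/search/resource',
      json={'url': 'scraperwiki.com/', 'all_fields': 1},
  )
  for resource in response.json():
      print(resource['package_id'], resource['url'], resource['format'])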

Search Parameters

Key: q
Description: Search all resource fields for the value

Key: url / description / format
Description: Search the given field for the value

Key: all_fields
Value: 0 or 1 (0 is the default)
Description: If 1 (true), the full record of the package resource
(and its package reference) is returned, rather than just the
PackageResource ID.

We may also choose to introduce 'offset' and 'limit' parameters to page through a large number of results.
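
If those parameters were introduced, a client might page through results roughly like this (a sketch; the parameter names simply follow the proposal above):

  import requests

  def iter_resources(query, page_size=100):
      """Page through resource search results, assuming the proposed
      'offset'/'limit' parameters are adopted."""
      offset = 0
      while True:
          params = dict(query, all_fields=1, offset=offset, limit=page_size)
          results = requests.post(
              'http://ckan.net/api/2/search/resource', json=params,
          ).json()
          if not results:
              break
          for resource in results:
              yield resource
          offset += page_size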

JSONP is achieved through an API-wide parameter - see ticket:342.

Search is case insensitive.

Original request

Hi, have attached a patch for adding a resource list API call. Have also added a JSONP-compatible callback section, along the lines of #388.

Could also add a search version. Not sure what the best URL would be for that, though.

Haven't written a test, as the structure seems to follow a functional spec. Is that document around somewhere?

Donovan
