{22} Trac tickets (2647 matches)

Results (1701 - 1800 of 2647)

Id Type Owner Reporter Milestone Status Resolution Summary Description Posixtime Modifiedtime
#336 defect dread donovanhide v1.1 closed fixed Resource Search API

As a

CKAN client such as ScraperWiki

I want to

search for Package Resources, either by URL or other field, or just get them all. I want to be able to get all the resource's fields, such as URL.

Proposed implementation

Add resource search API at:

/api/search/resource

AND resource added to model API at:

api/rest/resource

(see ticket:358)

Functional differences from the ScraperWiki suggested patch:

  • URL is not normalised

  • URLs are not grouped
  • All fields of the resource object are returned, not just the URL
  • Package is identified by its ID, not name or full URL. (This is for consistency in the API - you can simple prepend 'http://ckan.net/package/' to the package ID)

This is to make our API more general, simple and consistent. It means that the ScraperWiki client has to do a bit more processing to get exactly what it needs. Is this ok?

Example search

POST

{"url": "scraperwiki.com/", "all_fields": 1}

to: /api/2/search/resource

returns JSON:

 [{"id": "a3dd8f64-9078-4f04-845c-e3f047125028",
   "package_id": "b8a325c8-af2a-43f3-8245-9db7d73dfbfe",
   "URL": "http://scraperwiki.com/lincolnshire-councillors", 
   "format": "CSV", 
   "Description": "Scrape of www.lincs.gov/councillors.pdf by ScraperWiki.",
   "hash": "", 
   "position": 2
 }]

Note use of package_id instead of package_name is something we're moving towards in the API, since names can change. When we've done ticket:341 then ckan.net/package/lincs-councillors will be a synonym of ckan.net/package/b8a325c8-af2a-43f3-8245-9db7d73dfbfe

Search Parameters

Key:  q
Description: Search all resource fields for the value

Key: url / description / format / 
Description: Search particular field for the value

Key: all_fields
Value: 0 or 1 (0 is default)
Description: If 1 (true), the full record of the package resource
(and it's package reference) are returned, rather than just the
PackageResource ID.

May also choose to introduce 'offset' and 'limit' to page through a large number of results.

JSONP achieved through API-wide parameter - see ticket:342

Search is case insensitive.

Original request

Hi, have attached a patch for adding a resource list api call. Have also added a JSONP compatible callback section, along the lines of #388.

Could also add a search version. Not sure what the best url would be for that though.

Haven't written a test as the structure seems to follow a functional spec. Is that document around somewhere?

Donovan

1275411765000000 1279373842000000
#1445 enhancement zephod rgrp ckan-sprint-2011-12-05 closed fixed Resource View page in WUI

Super ticket: #1032

  • Locate at: /dataset/{dataset}/resource/{id}

See: http://wiki.ckan.org/Dataset_View_Page

Implemented in branch feature-1445-resource-view.

Still to do:

  • Add inline data explorer to page
1320665049000000 1330019916000000
#2822 enhancement toby toby demo phase 4 new Resource additional info titles format/i18n

the title for additional info should be translated

capitalised etc

1344504620000000 1344543985000000
#891 task johnglover pudo ckan-sprint-2011-11-07 closed fixed Resource download worker daemon

Superticket: #1397

Write a worker daemon to download all resources from a CKAN instance to a local repository.

Questions

  • Do we only want to download openly licensed information? ANS: no, we do everything (though do need to think about this re. IP issues)
  • Should we have clever ways to dump APIs? ANS: no.
  • Do we respect robots.txt even for openly licensed information? ANS: No (we're not crawling we're archiving)
  • Use HTTP/1.1 Caching headers? ANS: if not changed since we last updated don't bother to recache.
    • Complete support for ETags
    • Expires, Max-Age etc.
  • Check

Functionality

  • Download files via HTTP, HTTPS (will not do FTP)

Process:

  1. [Archiver.Update checks queue (automated as part of celery)]
  2. Open url and get any info from resource on cache / content-length etc
    1. If FAILURE status: update task_status table (could retry if not more than 3 failures so far). Report task failure in celery
    2. Check headers for content-length and content-type ...
      • IF: content-length > max_content_length: EXIT (store outcomes on task_status, and update resource with size and content-type and any other info we get?)
      • ELSE: check content-type.
        • IF: NOT data stuff (e.g. text/html) then EXIT. (store outcomes and info on resource)
        • ELSE: archive it (compute md5 hash etc)
      • IF: get content-length and content-length unchanged GOTO step 4
  3. Archive it: connect to storage system and store it. Bucket: from config, Key: /archive/{timestamp}/{resourceid}/filename.ext
    • Add cache url to resource and updated date
    • Add other relevant info to resource such as md5, content-type etc
  4. Update task_status

Optional functionality

  • If result object is HTML, search for references to "proper data" (CSV download pages etc.)
  • Download from POST forms (accepting licenses or weird proprietary systems)
  • Support running on Google Apps Engine to save traffic costs.

Existing work

1294052979000000 1320149841000000
#235 enhancement tobes dread ckan-v1.9 assigned Resource format normalization and detection

Try to gather proper MIME information for all package resources in CKAN. This is a shared ticket with dcat-tools (https://bitbucket.org/pudo/dcat-tools), i.e. opendatasearch.org. This can then also be used by ckanrdf, the CKAN RDF conversion service.

Sub-tasks:

  • Create a Google Spreadsheet with two Worksheets: "MIME-Mappings", i.e. "CSV" -> "text/csv" and "Name mappings", i.e. "text/csv" -> "Comma-Separated Spreadsheet".
  • Collect and map surface forms from all CKANs
  • Access this via Swiss and apply, store as a PackageResource? extra field pending #826 (Resource extras).
  • Add heuristics for format auto-detections:
    • Map well-known file extensions
    • Recognize obvious magic (Zip, Tar)
    • Peek into Zipfile/Tarfiles?
  • Define a convention for generic data types (many CKAN packages have only "Spreadsheet" defined, either detect specific type or set MIME to */tabular-data or similar)
  • See also: #816 (Autocomplete for the resource format field)
1263827604000000 1340627624000000
#229 enhancement dread dread v0.11 closed fixed Resource hashes

New field for resources - hash of the resource file.

  • CKAN itself will not calculate the hash value - user just pastes it in.
  • Display text field in resource table.
  • Requires migration script.

Questions for the field's value:

  1. Which hash to use? Restrict to python hashlib and other major hash libraries.
  1. Should we use merkle trees?

Thanks to Julien D'Assanges for the suggestion.

1262686287000000 1265891612000000
#1646 defect zephod dread ckan-sprint-2012-01-23 closed worksforme Resource navigator options display spuriously

When viewing a dataset, the "Resources" navigation button contained the Resource titles on the Resource navigator button, instead of in a drop-down mouse-hover menu.

http://thedatahub.org/dataset/realtime-birth-data-in-bulgaria/resource/66fc5831-ce01-4954-9beb-e2889ef8a20f

Chrome/Linux?

1326452700000000 1327407044000000
#300 defect rgrp dread v1.0 closed fixed Resource ordering issue

Failing test: ckan.tests.models.test_resource.TestResourceLifecycle?.test_03_reorder_resources

Not clear how visible this is to the user.

Related to ticket:292

1272285994000000 1272384474000000
#2247 enhancement zephod ckan-backlog new Resource preview glitch in some browsers

From Ira: Preview for google spreadsheets are not displaying correctly for me in Firefox v.10.0.02, fine in Chrome. http://i.imgur.com/KJaqz.png

1332246614000000 1332246614000000
#1394 defect dread dread ckan-sprint-2012-01-09 closed fixed Resource validation error messages misleading

(Editing a dataset) If the second resource contains any validation error then it says "Resources: Package resource(s) incomplete" and "Resource 1:".

1318515262000000 1325604784000000
#1711 enhancement icmurray icmurray ckan-sprint-2012-03-19 closed fixed Resource validation page
  • On the resources tab, there's a "Check Resources" button which, when clicked makes an ajax request with the list of URLs entered by the user.

  • The server checka each link for errors and header information about the linked resource. (Using ckanext/archiver/tasks.py:link_checker()).

  • The server returns a list of dicts (json), containing information about the linked resource, and the client uses that to:
  • populate the format field of each resource

The (guessed) 'file_extension' populates the 'format' field. If it's 'htm' or 'html', then we assume it's a listing page, and so don't populate the format field with 'htm' or 'html'.

  • provide feedback if a URL appears to be invalid

If the URL doesn't appear to be a URL at all, or returns a HTTP error, or times-out, then URL field is highlighted in red to indicate it's a bad URL. A tooltip shows the error message to the user.

  • [Optional] provide feedback if a URL appears to point to "Additional Information" - ie the Resource should be entered under "Additional Resources", rather than "Timeseries" or "Individual" datasets.

Analysis and further description on etherpad: http://ckan.okfnpad.org/dgu-package-form? [Section I]

1327589759000000 1332151557000000
#365 enhancement dread dread closed fixed ResourceNotifications

If you change a resource then you not only get a PackageNotification?, but also a ResourceNofication?.

1279037411000000 1279300621000000
#276 defect dread rgrp v1.0 closed fixed Resources in Package form seen multiple times upon preview

Create a new package with a name 'blah' and resource format 'blah'. Hit preview. There is an error because of the lack of resource url, but in the resource input boxes, there are now four resources with format 'blah'.

1269255399000000 1272996237000000
#358 enhancement rgrp dread ckan-v1.5 closed duplicate Resources in REST API

(spun out of ticket:336)

Resource added to model API at:

api/rest/resource

Example model request

GET to: /api/2/rest/resource/a3dd8f64-9078-4f04-845c-e3f047125028

returns:

 [{"id": "a3dd8f64-9078-4f04-845c-e3f047125028",
   "package_id": "b8a325c8-af2a-43f3-8245-9db7d73dfbfe",
   "URL": "http://scraperwiki.com/lincolnshire-councillors", 
   "format": "CSV", 
   "Description": "Scrape of www.lincs.gov/councillors.pdf by ScraperWiki.",
   "hash": "", 
   "position": 2
 }]

Authorization

  1. Have it generic (ie. not per resource) and use an action/role on system
  2. Require all resources to attach to packages an inherit their permissions (i.e. read/write etc if and only read/write on associated packages)
  3. Introduce Resource in authorization system (requires migration)

Mixed model

Create / Edit:

if resource associated to package:
    check_permissions(package, update)
else:
    check_system_permissions(c.user, model.Action.Resouce Create/Update, model.System)
1277483282000000 1310128782000000
#497 story johnbywater johnbywater closed duplicate Respond to CSW "GetRecords" request 1282427334000000 1294407718000000
#179 defect dread rgrp v1.0 closed fixed Restore 404 and 500 messages in WUI

Cost: 0.5h (?)

Conjecture this went missing in cset:a35db862a841

1257412668000000 1265305549000000
#1812 enhancement johnglover johnglover ckan-backlog closed fixed Restrict editing rights/permissions based on publisher
  • see how this currently works with DGU first
1329747889000000 1335874864000000
#484 story johnbywater closed invalid Restricted partial read-write access to catalogue Web UI 1282422858000000 1294417248000000
#483 story johnbywater closed invalid Restricted total read-write access to catalogue Web UI 1282422801000000 1294417216000000
#912 defect pudo pudo ckan-v1.5 closed wontfix Rethink result row representation in dcat-tools/rdfsolr
  • Should license go in the bottom line?
  • Formats should be styled as in CKAN
1295266299000000 1306774876000000
#818 requirement cygri ckan-backlog new Rethinking the author and maintainer fields

The semantics of the Author and Maintainer fields are really unclear at the moment. This leads to very inconsistent usage. Also, perhaps Name and Email are not the only fields that are needed for a contact.

Here is a table that shows the current usage of these fields in CKAN: http://richard.cyganiak.de/2010/ckan/ckan-ppl.html

We note several problems:

  • Author and Maintainer are often the same
  • Author and Maintainer are often used interchangeably
  • People really want to specify URLs for the contacts and stick them into random places because there is no field for it
  • Multiple comma-separated names in a single field

I'm not sure what to do about this, but a redesign is necessary in my opinion.

Some ideas:

  • Remove the maintainer field?
  • Make really clear that Author doesn't refer to the metadata on CKAN, but to the original data
  • Add an “author URL” field?
1290003524000000 1339774621000000
#405 task rgrp rgrp datapkg-0.8 closed fixed Retrieval options for package resources

Download Command (was install command) should support the following modes:

  • Download only the first listed resource (current behaviour, slightly arbitrary)
  • Download resources by interactive selection
  • Download resources by MIME type (cf #235)
  • Download all resources.
1281346806000000 1297214833000000
#47 enhancement rgrp johnbywater v0.6 closed fixed Return OpenID signin pages that look and feel like normal pages 1201183920000000 1215543616000000
#46 enhancement rgrp johnbywater v0.9 closed fixed Return error documents that look and feel like normal CKAN pages 1201111018000000 1265891789000000
#655 story johnbywater ckan-v1.2 closed Return status code 404 when harvest source is not found 1285252399000000 1285254088000000
#656 story johnbywater ckan-v1.2 closed Return status code 404 when harvesting job is not found 1285252429000000 1285254084000000
#84 enhancement kindly rgrp ckan-future assigned Revert support on versioned objects

Basic revert in the classic wiki form is already support by purging a Revision. However may wish to support:

  1. Cases where multiple objects changed in a revision but only want to revert 1 (low priority)
  2. Want to revert but have reversion as a new revision of that object.

Seems low priority at present.

Cost: ?

1248339543000000 1340626385000000
#904 task rgrp Stiivi ckan-v1.4-sprint-3 closed fixed Review CKAN documentation

What's bad at the moment?

  • lack of documentation e.g. config (very poorly documented)
  • too many sources of documentation
  • no common theming

Sources:

Resulting meta-ticket with things to do: ticket:927

1294832610000000 1299840539000000
#785 task johnbywater johnbywater ckan-v1.3 closed fixed Review document from PP regarding CSW and WAF guidance

Scope of CSW guidance document has been broadened to include WAF.

1289210338000000 1289482499000000
#713 task johnbywater johnbywater ckan-v1.3 closed fixed Review four definitions of DGU package attributes 1287581158000000 1288260915000000
#2274 enhancement seanh seanh ckan-sprint-2012-04-30 closed fixed Review multilingual branch with kindly, merge into master 1333375804000000 1335644408000000
#2273 enhancement seanh seanh ckan-v1.8 closed duplicate Review publisher organisations code with Ross 1333375723000000 1340635788000000
#2862 enhancement toby markw demo phase 4 new Revised revised groups description

Revised text for 'What are groups?' box at demo.ckan.org/group (after discussion with IB re #2812):

What are groups?

Groups allow you to group users and data together so that they are easier to manage. Group owners can assign roles and authorisations, giving each project or department control of its own data publishing.

Users can browse or facet by groups, which could be an organisation (for example, the Department of Health) or topic (e.g. Transport, Health), making it easier to find the data they are looking for.

1345114322000000 1345115072000000
#290 defect johnbywater dread v1.0 closed fixed Revision API - docs

doc/api.rst needs to cover the new Revision REST interface.

1271268759000000 1271636910000000
#346 defect dread johnbywater ckan-v1.3 closed wontfix Revision search API (response data format and documentation issue)

Whilst going through the API docs for the revision search API, it was noticed that the "Gdu" SoS doc doesn't match up. It returns revision IDs (perhaps this is useful to note in the spec?) so the format is probably not 'limitedstring'. Also, they appear to be ordered youngest first, not oldest as stated.

And in the revision model, it refers to 'simplestring' which it doesn't define - I guess the names should be 'limitedstring'?

Could this be checked out?

1276523517000000 1296477510000000
#804 task johnbywater johnbywater ckan-v1.3 closed fixed Rework analysis for publisher/provider in UKLP

We need an incremental plan that connects with current state of DGU and reflects what is actually required by UKLP.

1289816054000000 1294233156000000
#797 enhancement rgrp rgrp ckan-v1.3 closed fixed Rework core html layout to mirror wordpress twentyten

WP twentyten has an excellent core html structure. Furthermore, using that structure makes us compatible with all the WP twentyten compatible themes.

  • Convert to wp twentyten html
  • Switch css to be based off twentyten css (and perhaps rework somewhat)
1289402873000000 1289402982000000
#2345 enhancement seanh seanh ckan-sprint-2012-05-29 closed fixed Rewrite action API docs using autodoc 1335876769000000 1337962454000000
#2384 enhancement dread dread ckan-sprint-2012-05-15 closed fixed Rights tool factored out

The command line tool 'rights' is quite handy but it is glued to the CLI. I'm going to factor out the bit which searches for objects etc so it can be used by CreateTestData? etc and will be used by DGU.

1337080794000000 1337100810000000
#2878 enhancement icmurray ross ckan 2.0 closed wontfix Roles and Permissions for Organisations

As part of merging Organisations into core, it is necessary that we clarify the capacity field with which the users/datasets are added as members to the group 'subclass'.

Rather than the capacity being an opaque string that implies auth but doesn't clearly specify it, we will use role names where roles are defined in the database - with a clearly defined set of standard roles. The Role table is expected to have simply a string name/representation and acts as a container for permissions.

Each permission is a string of the form object.action (such as package.add, group.delete) of which several are expected to be associated with a role. This means the permission table will contain a string and a reference to the role.

This work will require UI changes to the screens allowing users to be added to a group/organisation so that the list of available roles is available to add those users.

[x] Model for Role and Permission

[ ] Logic layer changes for managing roles/permissions etc.

[ ] Determine default roles, perhaps just admin/editor/viewer

[ ] Fix the auth layer to use the permissions/roles - may be better implemented as another ticket.

1345466266000000 1350561906000000
#607 task johnbywater ckan-v1.2 closed fixed Routing configuration for harvest job entity 1284218136000000 1284344113000000
#605 task johnbywater ckan-v1.2 closed fixed Routing configuration for harvest job register 1284217966000000 1284336788000000
#586 task johnbywater ckan-v1.2 closed fixed Routing configuration for harvest source create form API resource 1284211336000000 1284261447000000
#591 task johnbywater ckan-v1.2 closed fixed Routing configuration for harvest source entity API resource 1284212082000000 1284342925000000
#599 task johnbywater ckan-v1.2 closed fixed Routing configuration for listing remote metadata entities for a given publisher 1284213814000000 1284347290000000
#550 task johnbywater johnbywater ckan-v1.2 closed fixed Routing configuration for package create form API resource 1283339785000000 1283351840000000
#608 task johnbywater ckan-v1.2 closed fixed Routing configuration for register of harvest jobs with error status 1284218378000000 1285347281000000
#594 task johnbywater ckan-v1.2 closed fixed Routing configuration for remote metadata edit form API resource 1284212532000000 1285198859000000
#2928 enhancement seanh ckan 2.0 new Run CKAN tests with example_i*form extensions enabled

Before releasing CKAN 2.0 we need to run all the CKAN tests with a modified test-core.ini with the example_idatasetform, example_igroupform and example_iorganizationform plugins enabled. If any tests fail, fix the bugs. This needs to be done for each release so add it to the release process.

1347552334000000 1347552334000000
#772 story johnbywater ckan-v1.3 closed duplicate Run CLI harvester command without arguments

Should return help for the harvester (currently crashes).

1288190373000000 1294413313000000
#771 story johnbywater ckan-v1.3 closed worksforme Run CLI help command without arguments

Should return something sensible (currently reports an error).

1288190318000000 1294413256000000
#2225 enhancement aron.carroll rgrp demo phase 2 closed fixed Run jshint on our javascript and clean up as needed 1331407316000000 1343220502000000
#996 task kindly kindly ckan-v1.4 closed fixed Run some basic load testing.

This will involve running a sample of real requests synchronously against real data.

1298283994000000 1300364398000000
#1305 defect nils.toedtmann amercader ckan-backlog closed fixed SMTP config for thedatahub.org and IATI

The email sending functionality (e.g for password reset) does not work on thedatahub.org and IATI (and probably some other instances) when using an address which is not a okfn.org one.

Could not send reset link: SMTPRecipientsRefused({u'[email protected]…': (550, 'relay not permitted')},)

As I said, [email protected]… works fine. The SMTP server used mail.okfn.org

1314956657000000 1315317033000000
#455 task johnbywater closed invalid SOLR - suggest 1 pager about how system would work

Either CKAN writes to SOLR and Drupal reads from SOLR, or CKAN writes to SOLR and Drupal reads SOLR via CKAN API (so search resource locations are unaffected).

1282299913000000 1291637172000000
#1708 defect dread dread ckan-sprint-2012-02-06 closed fixed SOLR configuration lost

The SOLR url, user and password defined in the CKAN config file are ignored and the default SOLR url is used.

This causes:

  • "0 datasets" displayed on the home page
  • Dataset searches result in 0 results and a small message "There was an error while searching". (Nothing about it in the logs)

To reproduce

This bug is only visible if your SOLR instance is not at the default place. To quickly reproduce this problem, setup your machine as a SOLR multicore instance and run: "paster db clean && paster create-test-data && paster serve development.ini". It quits with the error: "solr.core.SolrException?: HTTP code=400, reason=Missing solr core name in path"

Code affected

1327493428000000 1327580995000000
#353 defect dread closed fixed SOLR search indexing

As a

SOLR instance

I want to

keep my search index of CKAN packages up-to-date

Implementation

  • Using asynchronous event notifications
  • Running in a separate process to CKAN
1277123480000000 1280756399000000
#2844 enhancement rgrp new SQL-only (no solr) version of CKAN
  • Search needs to run of local DB (again)
  • paster db clean attemps to connect to SOLR (still works as does db first but then excepts which is not nice UX)
1344859168000000 1345454527000000
#2535 enhancement rgrp assigned SSL certificate for DataHub + https by default

DataHub? is increasingly used and we should ensure it uses ssl as part of general security.

See also #1446 (Need to support https login for multiple instances as part of the CKAN package install)

1339758027000000 1346662082000000
#802 story johnbywater johnbywater closed duplicate Save last harvested time on source 1289484226000000 1294233294000000
#571 story johnbywater johnbywater ckan-v1.3 closed fixed Save metadata document and associate with harvest source entity 1284040495000000 1288038218000000
#1156 enhancement pudo pudo pdeu-1 closed fixed Scraping harvesters for Paris and Vienna Catalogues

Import metadata from both sources into PDEU via the Harvesting framework but by scraping their respective catalogue pages.

1306337428000000 1306855111000000
#1540 defect amercader amercader ckan-sprint-2012-01-09 closed fixed Search API returns an error if empty parameters are provided

Both in 1.5.1b:

http://thedatahub.org/api/search/dataset?groups=lodcloud&title=

and 1.5.2a (current master):

http://test.ckan.net/api/search/dataset?groups=lodcloud&title=

Although the error message in 1.5.2a is more verbose:

"Bad request - Bad search option: HTTP code=400, reason=org.apache.lucene.queryParser.ParseException?: Cannot parse 'groups:lodcloud title:': Encountered \"<EOF>\" at line 1, column 22. Was expecting one of: \"(\" ... \"*\" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... \"[\" ... \"{\" ... <NUMBER> ..."

Some parameter validation before sending it to Solr should do the trick

1323359388000000 1326060385000000
#1404 enhancement zephod zephod ckan-v1.7 closed wontfix Search Page UI improvements

[Refactored] :: Follows on from #1506 UX changes.

  • Declutter the sidebar. No yellow box.
  • Facets to go on the left, rather than the right. More logical flow.
  • Did you know you can search by author? Probably not. Find a nice way of presenting extended search options.
  • Make Datasets in the search page look more like Datasets on the groups pages (ie. like awesome sexy search results).
    • Update that look-and-feel to include the new resource icons created in #1506
1318847818000000 1338203639000000
#316 defect rgrp dread closed fixed Search URL escaping

If you search for unescaped characters such as '`' (backtick) in the URL in Chrome then you get a 500 error.

e.g. http://www.ckan.net/package/search?q=fjdkf2B%C2%B4gfhgfkgf{gpk fjdkf2B´gfhgfkgf{gpk

returns this exception:

URL: http://www.ckan.net/package/search?q=fjdkf%2B%C2%B4gfhgfkgf%7Bg%C2%B4pk&search=Search+Packages+%C2%BB
Module weberror.errormiddleware:162 in __call__
<<              __traceback_supplement__ = Supplement, self, environ
                   sr_checker = ResponseStartChecker(start_response)
                   app_iter = self.application(environ, sr_checker)
                   return self.make_catching_iter(app_iter, environ, sr_checker)
               except:
>>  app_iter = self.application(environ, sr_checker)
Module repoze.who.middleware:107 in __call__
<<          wrapper = StartResponseWrapper(start_response)
               app_iter = app(environ, wrapper.wrap_start_response)
       
               # The challenge decider almost(?) always needs information from the
>>  app_iter = app(environ, wrapper.wrap_start_response)
Module beaker.middleware:73 in __call__
<<                                                     self.cache_manager)
               environ[self.environ_key] = self.cache_manager
               return self.app(environ, start_response)
>>  return self.app(environ, start_response)
Module beaker.middleware:152 in __call__
<<                          headers.append(('Set-cookie', cookie))
                   return start_response(status, headers, exc_info)
               return self.wrap_app(environ, session_start_response)
           
           def _get_session(self):
>>  return self.wrap_app(environ, session_start_response)
Module routes.middleware:130 in __call__
<<                  environ['SCRIPT_NAME'] = environ['SCRIPT_NAME'][:-1]
               
               response = self.app(environ, start_response)
               
               # Wrapped in try as in rare cases the attribute will be gone already
>>  response = self.app(environ, start_response)
Module pylons.wsgiapp:125 in __call__
<<          
               controller = self.resolve(environ, start_response)
               response = self.dispatch(controller, environ, start_response)
               
               if 'paste.testing_variables' in environ and hasattr(response,
>>  response = self.dispatch(controller, environ, start_response)
Module pylons.wsgiapp:324 in dispatch
<<          if log_debug:
                   log.debug("Calling controller class with WSGI interface")
               return controller(environ, start_response)
           
           def load_test_env(self, environ):
>>  return controller(environ, start_response)
Module ckan.lib.base:50 in __call__
<<          # available in environ['pylons.routes_dict']
               try:
                   return WSGIController.__call__(self, environ, start_response)
               finally:
                   model.Session.remove()
>>  return WSGIController.__call__(self, environ, start_response)
Module pylons.controllers.core:221 in __call__
<<                  return response(environ, self.start_response)
               
               response = self._dispatch_call()
               if not start_response_called:
                   self.start_response = start_response
>>  response = self._dispatch_call()
Module pylons.controllers.core:172 in _dispatch_call
<<              req.environ['pylons.action_method'] = func
                   
                   response = self._inspect_call(func)
               else:
                   if log_debug:
>>  response = self._inspect_call(func)
Module pylons.controllers.core:107 in _inspect_call
<<                        func.__name__, args)
               try:
                   result = self._perform_call(func, args)
               except HTTPException, httpe:
                   if log_debug:
>>  result = self._perform_call(func, args)
Module pylons.controllers.core:60 in _perform_call
<<          """Hide the traceback for everything above this method"""
               __traceback_hide__ = 'before_and_this'
               return func(**args)
           
           def _inspect_call(self, func):
>>  return func(**args)
Module ckan.controllers.package:52 in search
<<                  collection=query,
                       page=request.params.get('page', 1),
                       items_per_page=50
                   )
                   # filter out ranks from the query result
>>  items_per_page=50
Module webhelpers.paginate:333 in __init__
<<              self.item_count = item_count
               else:
                   self.item_count = len(self.collection)
       
               # Compute the number of the first and last available page
>>  self.item_count = len(self.collection)
Module webhelpers.paginate:204 in __len__
<<      def __len__(self):
               return self.obj.count()
       
       # Since the items on a page are mainly a list we subclass the "list" type
>>  return self.obj.count()
Module sqlalchemy.orm.query:1094 in count
<<              q = q.params(params)
               q = q._legacy_select_kwargs(**kwargs)
               return q._count()
       
           def _count(self):
>>  return q._count()
Module sqlalchemy.orm.query:1103 in _count
<<          """
               return self._col_aggregate(sql.literal_column('1'), sql.func.count, nested_cols=list(self.mapper.primary_key))
       
           def _col_aggregate(self, col, func, nested_cols=None):
>>  return self._col_aggregate(sql.literal_column('1'), sql.func.count, nested_cols=list(self.mapper.primary_key))
Module sqlalchemy.orm.query:1125 in _col_aggregate
<<          if self._autoflush and not self._populate_existing:
                   self.session._autoflush()
               return self.session.scalar(s, params=self._params, mapper=self.mapper)
       
           def compile(self):
>>  return self.session.scalar(s, params=self._params, mapper=self.mapper)
Module sqlalchemy.orm.session:635 in scalar
<<          engine = self.get_bind(mapper, clause=clause, instance=instance)
       
               return self.__connection(engine, close_with_result=True).scalar(clause, params or {})
       
           def close(self):
>>  return self.__connection(engine, close_with_result=True).scalar(clause, params or {})
Module sqlalchemy.engine.base:834 in scalar
<<          """
       
               return self.execute(object, *multiparams, **params).scalar()
       
           def statement_compiler(self, statement, **kwargs):
>>  return self.execute(object, *multiparams, **params).scalar()
Module sqlalchemy.engine.base:844 in execute
<<          for c in type(object).__mro__:
                   if c in Connection.executors:
                       return Connection.executors[c](self, object, multiparams, params)
               else:
                   raise exceptions.InvalidRequestError("Unexecutable object type: " + str(type(object)))
>>  return Connection.executors[c](self, object, multiparams, params)
Module sqlalchemy.engine.base:895 in execute_clauseelement
<<          else:
                   keys = None
               return self._execute_compiled(elem.compile(dialect=self.dialect, column_keys=keys, inline=len(params) > 1), distilled_params=params)
       
           def _execute_compiled(self, compiled, multiparams=None, params=None, distilled_params=None):
>>  return self._execute_compiled(elem.compile(dialect=self.dialect, column_keys=keys, inline=len(params) > 1), distilled_params=params)
Module sqlalchemy.engine.base:907 in _execute_compiled
<<          context.pre_execution()
               self.__execute_raw(context)
               context.post_execution()
               self._autocommit(context)
>>  self.__execute_raw(context)
Module sqlalchemy.engine.base:916 in __execute_raw
<<              self._cursor_executemany(context.cursor, context.statement, context.parameters, context=context)
               else:
                   self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
       
           def _execute_ddl(self, ddl, params, multiparams):
>>  self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
Module sqlalchemy.engine.base:958 in _cursor_execute
<<              self.engine.logger.info(repr(parameters))
               try:
                   self.dialect.do_execute(cursor, statement, parameters, context=context)
               except Exception, e:
                   self._handle_dbapi_exception(e, statement, parameters, cursor)
>>  self.dialect.do_execute(cursor, statement, parameters, context=context)
Module sqlalchemy.engine.default:133 in do_execute
<<      def do_execute(self, cursor, statement, parameters, context=None):
               cursor.execute(statement, parameters)
       
           def is_disconnect(self, e):
>>  cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb4' in position 6: ordinal not in range(128)
1274265928000000 1291831177000000
#2962 enhancement dominik ckan-backlog new Search across multiple ckan instances

Could be done by:

  • using the solr distributed search
    • difficult set up
  • merging result sets from apis
    • make sure that results can be merged properly (score, facets, ...)
1349736622000000 1349736622000000
#141 task dread rgrp v0.11 closed fixed Search api docs

Write up search api documentation and put it in a template that shows up at api/search/.

Cost: 2h

1254903008000000 1255007583000000
#875 enhancement pudo rgrp ckan-v1.4-sprint-1 closed fixed Search backend supports solr interface and query API mimics solr

Consolidate search API interface (and backend) on solr (solrpy) type interface.

  • Support for standard query structure
  • Support for facet options

Do not need to change response formats. (Or do we?)

2 options here for advanced features like facets in non-solr:

  1. Disable (happens automatically)
  2. Implement - suggest using group by etc

Extras

  • Front-page tag cloud: change this to use facets
    • Accept this means that if facets not functional in backend we have no tag cloud
1292844957000000 1297085261000000
#923 defect rgrp dread closed worksforme Search box doesn't work in leaderboard page
  1. Go to: http://ckan.net/stats/leaderboard#content
  2. In the far top-right of the browser, select the search box in 'water'.
  3. Press enter to search. Nothing happens.

Tried in: chrome, firefox

1295867328000000 1340632144000000
#924 enhancement dread ckan-backlog new Search box has no search button

The search box at the top-right of CKAN's page doesn't have a 'go' button. I feel that a larger percentage of users expect a 'go' or 'search' button on the right-hand side of the box to press to start searching. Techies tend to know the keyboard shortcut of pressing 'carriage-return' but it might be better to follow standard practise on this.

Examples with 'search' button: Internet Explorer, Firefox, Google, Amazon, trac Examples without: ?

1295867533000000 1323170436000000
#356 enhancement rgrp v1.1 closed fixed Search box in at top of page (UI)

A small but useful ui improvement would be to have a search box at top right on every page.

As an example see the one here on trac or on github.com or bitbucket.org.

  • It would be particularly good to include a small advanced search link that took you to the full search page. Need to keep it small because screen real-estate here is limited (see how github.com does this for inspiration).
1277235411000000 1278931830000000
#2507 enhancement seanh seanh ckan-sprint-2012-06-25 closed fixed Search button on dataset search page wraps onto next line

If you change the language to one where the word for 'Search' is longer than the English word (e.g. fr) then the search button wraps onto the next line, below the text field, which looks bad.

1339409432000000 1339596399000000
#350 enhancement dread ckan-backlog reopened Search engine optimisation

Need to research what can easily be done to improve CKAN packages in the search rankings.

Comments from Glen Barnes:

We've been pretty successful at SEO without even really trying (see http://www.google.co.nz/search?client=safari&rls=en&q=auckland+google+transit+feed&ie=UTF-8&oe=UTF-8&redir_esc=&ei=dsYSTOzJLs2eceuZiI8I as an example). This to me is key. If we are to make data available it has to be findable which is the main reason for a catalogue. There are probably things we should be doing on CKAN like using slugged urls (http://www.ckan.net/package/ascoe -> http://www.ckan.net/package/ascoe/atmospheric-chemistry-studies-in-the-oceanic-environment), setting the H1 tag correctly ("Atmospheric Chemistry Studies in the Oceanic Environment" on the example above). Some basic SEO 101 on page optimisations.

1276594541000000 1339774690000000
#364 defect dread ckan-v1.3 closed fixed Search for 'statistic' returns nothing

On ckan.net there are plenty of packages (and indeed their tags) with the word 'statistic' in them, but no packages turn up when you search for it:

http://ckan.net/package/search?q=statistic&search=Search+Packages+%C2%BB

(Using Postgres full text search)

1278949620000000 1291637291000000
#1073 enhancement dread dread ckan-v1.4-sprint-5 closed fixed Search index checker

Tool that checks which packages have not been indexed.

Required for DGU: https://trac.dataco.coi.gov.uk/projects/datagov/ticket/940

1302185444000000 1302185825000000
#1715 enhancement kindly kindly ckan-sprint-2012-02-20 closed fixed Search index multilingual

Need to make solr schema work for many languages. Get stopwords and choose correct analysis for each.

1327598884000000 1329393450000000
#695 bug pudo dread closed fixed Search indexing broken on ckan.net

e.g. searching for 'buddhist' or 'sanskrit', you don't get this newly created package: http://ckan.net/package/digitalsanskritbuddhistcanon

1286991201000000 1287766973000000
#324 enhancement dread dread v1.1 closed fixed Search indexing using notifications

Currently search indexing is triggered directly using a Postgresql db callback. Now take advantage of the Notification system to register interest in all package changes and db changes to trigger this instead.

The indexing shall run in a separate shell/process, managed by supervisord.

1274723483000000 1278599927000000
#1366 defect dread ckan-future assigned Search inside extra fields

SOLR search doesn't support searching for part of an extra field, but it does for other fields.

i.e. title="One Two Three" matches q=one AND q=title:one and geographic_coverage="England Scotland" matches q=England BUT NOT q=geographic_coverage:England

This problem emerged when we went to SOLR in #1275 (CKAN 1.5a). Tests were skipped.

This is could be a problem for DGU and maybe elsewhere.

1317290992000000 1338206707000000
#498 story johnbywater closed invalid Search packages within location "bounding box" 1282427412000000 1294412520000000
#1131 defect dread dread ckan-v1.4-sprint-7 closed fixed Search param validation exception not caught

Example request:

http://nl.ckan.net/api/2/search/package?q=delft&order_by=&offset=&limit=&tags=

Gives 500 error:

<type 'exceptions.ValueError'>: invalid literal for int() with base 10: ''
1304942023000000 1305537897000000
#1603 enhancement zephod rgrp ckan-v1.7 closed duplicate Search query builder

Super ticket: #1745

Ability to build up search query using a nice javascript-y interface.

  • Add facets by selecting attribute and adding -> search facet options in dropdown -> added to search (with 'x' to remove -- as we currently do).
    • (a bit like the data.hri.fi)
  • Some improvements to css
  • Improvements to faceting
    • Ability to configure faceting and number of items to show (?)
  • Pure JS search implementation to make it easy to reuse across site
1325268364000000 1338202654000000
#305 defect johnbywater johnbywater v1.0 closed fixed Search result pagination is broken

Expect to page through results.

Only page 1 is shown, all other pages fail to display.

Reproduce by searching for something common and browsing to the second page.

1272468229000000 1272994804000000
#2723 defect aron.carroll shevski demo phase 2 closed fixed Search result summary badly displayed

Text goes over the order by text, with drop down floating on top meaning it's all impossible to read

http://s031.okserver.org:2375/dataset?q=unemployment%2C+data&sort=metadata_modified+desc

1342949143000000 1343136190000000
#864 enhancement memespring memespring ckan-v1.3-sprint-1 closed fixed Search results UI changes

as per http://ckan.org/wiki/UIRedesignSearch

1291736441000000 1291741028000000
#931 enhancement dread dread ckan-v1.4-sprint-1 closed fixed Search results generator hides paging functionality

ckanclient's search results list only packages up to the 'limit'. It would be good to return a generator instead of a list. When the limit is reached on the generator then another 'page' is loaded automatically.

1296147360000000 1298379187000000
#1246 defect pudo [email protected] ckan-sprint-2011-10-28 closed fixed Search results on ckan.net are mistakenly all 'open'

All package search results on ckan.net are labelled as 'open' even when their license is closed or unknown: http://ckan.net/package

1311863353000000 1311892816000000
#1250 enhancement pudo rgrp ckan-v1.5 closed fixed Search results should be sorted by score rather than alphabetical

At the moment we sort search results alphabetically. While this is useful for doing 'browse' case where no search bad for all other cases.

Adopt default sort order of 'score' though may wish to keep alphabetical for no search term (i.e. wildcard).

Options:

  • Default this in solr (no need to touch code) but fragile and affects everything ...
  • Do it in code and default to score
  • Do it in code and have alphabetical (on name or title?) when no criteria otherwise score

Aside: may also wish to support search in query api but that is for later!

1311964752000000 1312370201000000
#1455 defect johnglover dread ckan-sprint-2011-12-05 closed fixed Search results when 'all_fields' don't include 'extra' fields

When you do a search like this:

http://thedatahub.org/api/search/package?q=tauberer+census&all_fields=1

the "extra" fields (e.g. "triples", "shortname") get missed off the results. The docs say it should be a "full record" and I don't see any reason why this is missed off.

This is a problem because search all_fields is the only way for clients and front-ends to get packages in bulk. They end up (like lodcloud) doing thousands of requests to get packages individually.

The full record is:

http://thedatahub.org/api/rest/dataset/2000-us-census-rdf
{"count": 1, "results": [{"res_description": ["Download", "XML Sitemap", "SPARQL enpdoint", "Example (RDF/XML)"], "name": "2000-us-census-rdf", "license": "Non-OKD Compliant::Creative Commons Non-Commercial (Any)", "author": "Joshua Tauberer", "author_email": "http://razor.occams.info/", "ckan_url": "http://thedatahub.org/dataset/2000-us-census-rdf", "notes": "2000 U.S. Census converted into over a billion RDF triples.\n\nPopulation statistics at various geographic levels, from the U.S. as a whole, down through states, counties, sub-counties (roughly, cities and incorporated towns)\n\nNotes: also found in the of SPARQL Endpoints.\n\nFrom home page:\n\n> * For the detailed Census statistics, you'll have to download the raw Census data files from the Census Bureau, my Perl script and the patch file below and run it yourself because the files are too big for me to offer as a download!\n> \n> * The data and scripts can be reused under Creative Commons Attribution-NonCommercial-ShareAlike.\n", "entity_type": "package", "site_id": "www.ckan.net", "download_url": "http://www.rdfabout.com/demo/census/", "indexed_ts": "2011-11-01T12:52:36.034Z", "url": "http://www.rdfabout.com/demo/census/", "state": "active", "title": "2000 U.S. Census in RDF (rdfabout.com)", "groups": ["lod", "lodcloud"], "res_format": ["", "meta/sitemap", "api/sparql", "example/rdf+xml"], "license_id": "cc-nc", "revision_id": "fcbad0de-79ea-41bd-8e01-eb832a05b732", "res_url": ["http://www.rdfabout.com/demo/census/", "http://www.rdfabout.com/sitemap.xml", "http://www.rdfabout.com/sparql", "http://www.rdfabout.com/rdf/usgov/geo/us/ny"], "id": "551ec435-f198-4d52-9b56-ec0b0be6aec9", "tags": ["census", "data", "demographics", "deref-vocab", "format-dc", "format-geonames", "format-politico", "format-rdf", "geographic", "linkeddata", "lod", "lodcloud.nolinks", "no-license-metadata", "no-provenance-metadata", "no-vocab-mappings", "population", "published-by-third-party", "rdf", "statistics", "us"]}]}
1320858265000000 1324474466000000
#2331 defect kindly rgrp ckan-sprint-2012-05-29 reopened Search should AND terms not OR terms

Appears current default search in CKAN ORs terms rather than ANDing them (i.e. adding more terms increasing number of items found rather than reducing it).

Not sure when this crept in or if it has been there for a long time.

1335637485000000 1356474344000000
#775 task pudo dread closed fixed Search warning

We're getting this warning a great deal on live servers. Is these really a sign of the system not operating correctly or can we reduce the level to an INFO?

e.g. on hmg.ckan.net:

2010-10-29 17:12:08,262 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
2010-10-29 17:12:08,333 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
2010-10-29 17:12:08,375 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
2010-10-29 17:12:08,406 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
2010-10-29 17:12:08,480 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
2010-10-29 17:12:08,613 WARNI [ckan.lib.search.common] NOOP Index: id,package_id,url,format,description,hash,position
1288372692000000 1295260144000000
#1505 defect dread dread ckan-sprint-2011-12-05 closed fixed SearchError and SearchQueryError cause exception in Action API

This query caused ckan to except because ckan/controllers/api.py doesn't catch SearchError? and SearchQueryError?:

curl http://localhost:5000/api/action/package_search -d '{"sort": "metadata_modified"}'
1322758968000000 1324474577000000
#191 enhancement johnglover dread ckan-sprint-2011-12-19 closed fixed Searching by modification date

Cost - 2 days

Search interface has new options to filter and sort the results by the date the package has been last modified in ckan. Search options are included in both Web UI and Search API.

The filter specifies a range of dates. The results can be sorted by ascending or descending dates. The last modification date is surfaced in the package.

Example search parameters:

modification-range=5/4/09- Exclude packages last modified earlier than 5/4/09
modification-range=5/4/09-5/12/09 Exclude packages last modified outside of 5/4/09-5/12/09
order_by=mod Sort by metadata modification. Defaults to newest first.
order_by=mod-newest Sort by metadata modification, newest first.
order_by=mod-oldest Sort by metadata modification, oldest first.
1258387778000000 1330020983000000
#193 enhancement rgrp dread closed wontfix Searching by time-related field

Cost - 2 days

Search interface has new options to filter and sort the results by the time-related field of the package. Search options are included in both Web UI and Search API.

The filter specifies a range of dates. The results can be sorted by ascending or descending dates. The last modification date is surfaced in the package. Need to decide for a time-related field value that is date range, what date is used for the search.

Example search parameters:

reldate-range=5/4/09- Exclude packages related to earlier than 5/4/09
reldate-range=5/4/09-5/12/09 Exclude packages related to date outside of 5/4/09-5/12/09
order_by=reldate Sort by date package is related to. Defaults to newest first.
order_by=reldate-newest Sort by date package is related to, newest first.
order_by=reldate-oldest Sort by date package is related to, oldest first.

Related to ticket:192

1258388169000000 1340626463000000
#159 defect dread rgrp v0.11 closed fixed Searching for tags:... resulting in lots of tags being found

Search of form: tags: ... behaves differently depending on whether there is a leading space:

  • tags: postcode - tags found correctly but no packages found
  • tags:postcode - tags incorrectly found but correct packages found

Let's fix this.

Cost: 0.5h

1256030097000000 1256060264000000
#834 task Alexander ckan-v1.3 closed worksforme Searching in CKAN

Hello.

I've installed stable CKAN v1.1 from PyPI.

I can't find any docs about using CKAN API in order to query packages.

Query ./api/search/package?q=str works fine, but with extra parameters, such as limit, offset, fullinfo, order_by, search_notes, don't. Should I use new version for this? How can I perform this query via Ckanclient? Is it possible?

Also I'm interested how to find open-licensed files? I tied URL ./api/search/package?q=str&open_only=1&downloadable_only=1 and Ckanclient:

result = ckan.package_search('str', {'open_only': 1, 'downloadable_only': 1})

As result nothing found.

Thanks.

1290769564000000 1291633657000000
#129 enhancement rgrp dread ckan-backlog closed invalid Secure db access by channelling query generation through authz module

Controllers and templates should not access db objects directly - they should do all access via authz module giving username. They are handed by a query that has already been filtered by the packages they are authorized to read.

(Additional idea to be discussed: When they request a package object, they are handed an copy of the db object - disconnected from the database - so it the db object can't be changed.)

A couple of tests can be reenabled when this is done: ckan.tests.functional.test_authz.TestUsage?.test_admin_list_deleted ckan.tests.functional.test_authz.TestUsage?.test_search_deleted

1253886136000000 1267719162000000
#120 enhancement dread dread v0.10 closed fixed Security audit

Look for all places where model is accessed and check authorization is checked.

Document holes (and, as necessary, suggestions for fixes) as new tickets. Likely areas that need looking at:

  • search i/f
  • package WUI commit

Write holes are obviously much more significant to us than read holes.

1253529427000000 1254406544000000
#1585 enhancement dread closed fixed Security fix

(details embargoed until 31/1/2012)

1324473465000000 1340633128000000
#132 defect rgrp dread closed fixed Security hole - read package/group list (REST)

Using REST interface you can list packages and groups without authorization being checked.

Can be fixed using more advanced query to check authz.

1254389493000000 1273254514000000
Note: See TracReports for help on using and creating reports.