{22} Trac tickets (2647 matches)

Results (2201 - 2300 of 2647)

Id Type Owner Reporter Milestone Status Resolution Summary Description Posixtime Modifiedtime
#1021 enhancement pudo pudo ckan-v1.4-sprint-3 closed fixed Config option to disable OpenID

HRI don't like federation, want to login normal way only. Make this a config option and perhaps even mess with runtime repoze config

1299492920000000 1299518828000000
#1048 enhancement dread dread ckan-v1.4-sprint-4 closed fixed Complete making groups versioned
  • Deleting a group changes state to 'deleted' rather than purging it
  • Adding authz tests for deleted groups
1300387655000000 1300702752000000
#1557 enhancement David Rasnik jilly mathews ckan-future new Complete Webstore Preview Extension

Finish any outstanding work on the web store preview extension so that it can be packaged and released.

Ref: James and I going through existing features and trying to mention any polishing that needs doing to get existing features ready for release with projects such as CKAN hosted.

1324291253000000 1324291253000000
#987 defect pudo pudo closed duplicate Common harvesting framework

We are now harvesting metadata from other sources in various places around CKAN. Such harvesting can include:

  • CSW/WFS for INSPIRE/UKLII (yields CKAN packages)
  • Catalogue scraping for LOD2 experiments (yields RDF graphs)
  • Atom/DCat for LOD2 production (yields RDF graphs)
  • OAI-PMH for http://datadryad.org/ and other dspace (yields CKAN packages)

We should aim to consolidate the harvesting clients into a common system that is easy to extend when needed and can be re-used in different scenarios.

In general, such a system would have the following stages:

  • Source selection: find what to download/scrape/harvest/parse
  • Index retrieval (i.e. package index)
  • Item retrieval (i.e. package entity)
  • (Optional: Serialization)
  • Normalisation
  • Loading/Merging? into CKAN

Existing harvesters are at:

1297684756000000 1311177705000000
#310 defect dread rgrp v1.1 closed fixed Commit message box looks wrong in edit page since edit style overhaul

Suggest move this below the label and make full width of screen and only 3/4 rows high (more like a wiki site).

  • Also change label to: Edit summary (Briefly describe the changes you have made)
  • Remove: you can markdown formatting here.
  • Move author: if you have not signed in smaller and closer (like markdown instructions are now).
  • Change commit -> save
  • Remove "please save" just have the bullet points
1273348714000000 1279300525000000
#296 enhancement johnbywater johnbywater closed duplicate Commit CKAN revisions to changeset system 1272279521000000 1294407032000000
#1415 enhancement thejimmyg nils.toedtmann ckan-sprint-2011-12-05 closed fixed Comments on current status of ckan deb packages

This is a scratch pad ticket with some comments on the current status of our ckan deb packages. I know that some of it is on the deb packaging roadmap anyway; please forgive me if I mention it here again.

Rufus and I re-deployed some community ckan instances onto s022 (see http://trac.okfn.org/ticket/926). We followed the documentation http://docs.ckan.org/en/latest/install-from-package.html

  • Deb package version number: the version of the deb package is "python-ckan 1309471251~149be76faabc+lucid-1", and it's hard to guess from there that it contains a ckan 1.4.2a
  • When is 1.4.3/1.5.x expected as deb package?
  • There was a bug in the DB upgrade script /usr/share/pyshared/ckan/migration/versions/029_version_groups.py (line 150) which looks like it was fixed 1.4.1==>1.4.2 but was nevertheless present in this deb package.
  • The current script /usr/bin/ckan-std-install
    • does not set the Apache ServerName according to the $INSTANCE variable
    • automatically configures a ckan extension named after $INSTANCE
    • depends on local postgres
    • could be replaced with "/usr/bin/ckan-deploy --name=ckan-std --domain=ckan-std.localhost" (see next point)
  • (I think this is exactly James' plan): have a more generic deployment script /usr/bin/ckan-deploy as part of python-ckan which takes arguments like
    • --domain=cc.ckan.net
    • --aliases=$list-of-domains
    • --name=cc (defaults to "domain")
    • --no-db (does not configure a DB)
    • --sql-alchemy=$DB_CONFIG_STRING (also runs "paster --plugin ckan db upgrade --config")
    • --extension $list-of-extensions
    • ...
1319457069000000 1323167941000000
#1182 defect timmcnamara ckan-backlog new Comments from deleted packages appear in "Recent Comments" feed

When a package has been deleted, say for spam moderation, comments still appear in the recent comments section.

This is a problem because non-admin users will be shown a warning that they're not authorised to view the package if they click on the link.

At CKAN.net currently, this affects the most recent comment.

1307658251000000 1339774319000000
#1559 enhancement rgrp jilly mathews ckan-sprint-2012-04-02 closed fixed Comments Extension / Disqus

Polish off comments extension dev and test. estimate 2 days.

1324291720000000 1332242129000000
#1218 enhancement dread minspamboks@… ckan-sprint-2011-10-28 closed fixed Colour the History tab icon

Change the color of the "History" tab icon to yellowish, like the rest of the icons in the other tabs ("View" and "Edit", "Authorization").

Reasoning

When you view a data package, for instance http://ckan.net/package/thesaurus-w, you will see "View", "Edit", "History" tabs on the top. "History" tab has a black-and-white icon which makes it look like an inactive/disabled tab (since the text is also grayed out when the tab is not selected). This is not a major issue, but it is a little bit confusing for the users. This icon exists in v1.3.2 and also in v1.4.1a (that runs on ckan.net).

The simple solution would be to change the color of the "History" tab icon and give it the same yellowish color like the rest of the icons in the other tabs (e.g. "View" and "Edit").

1310375768000000 1310389390000000
#354 defect johnbywater johnbywater closed invalid Collect together requirements and top-level design for user/package 'groups'

Collect together requirements and top-level design for user/package 'groups': existing tickets, Rufus spec, Sean spec, meeting notes (dread) email, based on existing user authz stuff.

http://knowledgeforge.net/ckan/trac/wiki/AccessControl

Do we add these into user-role table somehow or new table? To present this to team

1277131335000000 1282908983000000
#2890 enhancement seanh ckan-v1.8.1 new Collect data previews and data store docs in one chapter

Currently there is this page:

http://docs.ckan.org/en/latest/data-viewer.html

which covers Recline Data Explorer and other kinds of data preview in CKAN. It is under the Publishing Datasets section in the documentation. I had to do a search for 'recline' to find it.

Separately there is this page: http://docs.ckan.org/en/ckan-1.7.1/datastore.html which covers datastore, datastorer, and the data api.

I suggest collecting this together in one chapter called 'Data Previews'. If I understand it right the general gist would be:

CKAN has builtin previews of data resources on resource pages, enabled by default.

Images, Google Documents, and web page resources will be loaded into embedded iframes for preview.

Text-like files will be displayed raw.

CSV or Excel files uploaded to CKAN will be previewed using Recline Data Explorer.

Additionally, you can enable CKAN's DataStore, which requires you to install ElasticSearch and nginx and put datastore.enabled=1 in your ini file. This lets you use the Data API to query data.

Does having DataStore enabled mean you get preview of more types of resources? Any resource that's available via the Data API will be previewed using Recline.

You can install ckanext-datastorer, and then CSV and Excel files _linked to_ as CKAN resources will be previewed using Recline also. Requires celeryd.

1346149236000000 1346175867000000
#55 enhancement rgrp rgrp v0.7 closed fixed Code to migrate data from v0.6 to v0.7 using dump and load

Associated to ticket:51 (upgrade CKAN to new vdm) and ticket:54 (dump/load) need to convert v0.6 data for v0.7.

Obvious way to do this is via alteration to data load method.

1223908240000000 1223909891000000
#1164 enhancement amercader amercader pdeu-1 closed fixed Choropleth Map of European Data Availability for PDEU

A nice map in the homepage showing the availability of data across Europe

1306408824000000 1308647224000000
#853 enhancement wwaites wwaites ckan-v1.3-sprint-1 closed fixed Client upload to storage without having primary storage keys

Reverse engineer boto and work out how to get headers to support upload to google storage without holding api keys.

This would lead to an extension to OFS.

This analysis should inform (and go hand-in-hand) with the implementation of ticket:879 (Storage Auth API in CKAN).

1291723063000000 1294594581000000
#2835 enhancement aron.carroll demo phase 5 new Client module needs a template loading method
Client#getTemplate(name, params, success, error);

Where params, success and error are optional arguments. test/index.html already has an implementation called loadFixture().

1344532233000000 1344532233000000
#322 enhancement dread dread v1.1 closed fixed Client interface for Notification Service

Use cases

  • Register for package changes
  • Register for all revisions
  • Notified of a package change
  • Notified of a revision
  • Deregistration
  • Configuration of port in pylons config

Design

  • Default port: 5672 (standard for AMQP)
  • Exchange name: 'ckan'
  • Exchange type: topic exchange (most flexible)
  • Routing keys: (see below)

Routing detail

Routing key format: "OBJ_TYPE" (NB tags should be identified by their name, not ID)

Example routing keys

  • 'package' - Package edited/created
  • 'resource' - Resource edited/created
  • 'revision' - Any change
  • 'db.clean'
  • 'db.rebuild'

Example queue bindings that clients may use:

  • * - no filtering - client receives all notifications
  • package - only changes to packages
  • revision - all revisions
  • db - all database operations
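
As a rough illustration of how a topic exchange would match the routing keys and bindings listed above, here is a minimal Python sketch. It assumes standard AMQP topic semantics, in which `*` matches exactly one dot-separated word and `#` matches zero or more words; the function name `topic_matches` is hypothetical and not part of CKAN. Under these assumed semantics, a client that wants every notification would bind with `#`, and one that wants all database operations would bind with `db.*`, which may differ from the shorthand used in the list above.

```python
def topic_matches(binding, routing_key):
    """Return True if an AMQP-style topic binding matches a routing key.

    Assumed semantics: '*' matches exactly one word, '#' matches zero or
    more words, and words are separated by dots.
    """
    pattern = binding.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb any number of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)
```

For example, `topic_matches("db.*", "db.clean")` is true, while `topic_matches("package", "revision")` is false.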

Versioning

Since message payloads will be tied into the REST Entities, it makes sense to join up with the REST versioning. This could be achieved by providing new exchanges called 'ckan-1.1' perhaps?

Documentation

  • How to use
  • simple example of an external client?
1274720042000000 1277722821000000
#1790 enhancement dread ckan-future new Click to delete tags, rather than have all existing tags in the tag text box

From Pablo:

Editing the tags field is clumsy when there are too many tags. Could show existing effectively as tags (like delicious), then allow clicks to delete. New tags added via text box.

1328888674000000 1328888674000000
#1362 defect johnglover johnglover ckan-sprint-2011-10-10 closed fixed Clearing the database should also clear the search index

When paster db clean is run, the search index should also be cleared.

1317121861000000 1318256546000000
#2229 enhancement kindly kindly ckan-sprint-2012-03-19 closed fixed Cleanup plugin system after some test failed to run.

The logic test did not have init. This caused lots of tests to fail because there were mock extensions that ran automatically in them. Fix plugin system so this can work.

1331721611000000 1332163408000000
#2417 enhancement toby aron.carroll closed fixed Clean up output for dataset search results

Currently, due to the data structure, the search result filters are output incorrectly: the facet name is repeated with each tag.

See http://s031.okserver.org:2375/dataset?tags=fibre&tags=terrestrial&q=Africa

It outputs:

Tags: fibre Tags: terrestrial

It should be:

Tags: fibre terrestrial
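
The desired output above amounts to grouping the selected values by facet name before rendering. A minimal Python sketch of that grouping follows; `format_filters` is a hypothetical helper, and it assumes the active filters are available as a list of (name, value) pairs, which is not necessarily how the template receives them.

```python
from collections import OrderedDict


def format_filters(params):
    """Group (facet, value) pairs so each facet label is printed once."""
    grouped = OrderedDict()
    for name, value in params:
        # Keep facets in first-seen order and collect their values.
        grouped.setdefault(name, []).append(value)
    return [
        "%s: %s" % (name.capitalize(), " ".join(values))
        for name, values in grouped.items()
    ]
```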

1337793213000000 1337864155000000
#2874 enhancement rgrp rgrp assigned Clean up bin directory

Full of obsolete material

1345190508000000 1345190515000000
#1648 enhancement shevski ckan-backlog closed fixed Clarify that additional info = extra fields + add guidance

Super ticket: #1506

Need to decide which term to use and then have the same for editing as well as viewing a dataset.

In creating/editing a dataset, want more explanation about adding extra fields (probably as a tooltip or similar), i.e. that this lets you add extra custom metadata such as 'location: uk' which is then searchable etc.

1326674843000000 1330632344000000
#1461 defect pudo closed fixed CkanClient doesn't submit auth headers for GET requests

e.g. package_register_get.

1321354037000000 1321359503000000
#1377 defect zephod zephod ckan-sprint-2011-10-10 closed fixed Ckan admin repair

Integrating ckanext-admin into core has thrown up a number of problems:

  • Look & feel does not match the rest of the site
  • Tests are not passing
  • On the trash page, clicking 'undelete' triggers a purge
  • Using the purge functionality is dangerous; deleting and purging the latest revision will corrupt a dataset (& several corrupt datasets have been found on thedatahub.org)
  • Trash page can contain nested form tags in certain cases (breaking test harness & form redirection)
1318240018000000 1318245795000000
#1171 enhancement mark.wainwright dread ckan 2.0 assigned Citation instructions on dataset and resource view pages

Some sort of citation helper. Something small on the dataset and resource page that would show how to cite.

wwaites: Some related thoughts on this from opb: http://homepages.inf.ed.ac.uk/opb/papers/ssdbm2006.pdf

timclicks: I'm looking at Dataverse for the first time[0]. It seems very popular in the social sciences. I noticed that there is a recommended citation for each dataset. For example, [1] has this one: "Targeted Input Programme (TIP) 2000-01", http://hdl.handle.net/1902.1/SSC-MWI-TIP2000-01-M1 V1 [Version]"

Implementation

Add a small box at bottom of dataset / resource page (or in sidebar on dataset page) with title "Cite this" with contents like:

%title. %author. Retrieved %date. %site_title.

For resource: %title = %dataset_title. %resource_name.

Could also add export to ref managers (e.g. to bibtex) but that is for later.
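
The "Cite this" template above could be filled in along these lines. This is only a sketch: the function names are hypothetical, and the ISO date format for "Retrieved" is an assumption, since the ticket does not specify one.

```python
from datetime import date


def resource_title(dataset_title, resource_name):
    """For a resource: %title = %dataset_title. %resource_name."""
    return "%s. %s" % (dataset_title, resource_name)


def cite(title, author, site_title, retrieved=None):
    """Render '%title. %author. Retrieved %date. %site_title.'"""
    # Default to today's date when no retrieval date is given.
    retrieved = retrieved or date.today()
    return "%s. %s. Retrieved %s. %s." % (
        title, author, retrieved.isoformat(), site_title)
```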

1306920799000000 1347358705000000
#2943 enhancement dominik new Chrome does not resize preview

Chrome does not resize iframe after a full refresh/ on first load

1349089686000000 1349090759000000
#991 defect dread ckan-v1.3 closed fixed Checkbox defaults to True

Form for new package has CheckboxExtraField checked, when the value is False. (as used in ckanext-dgu package v3 form)

1298035175000000 1298037717000000
#871 defect nils.toedtmann closed invalid Check whether localhost-only exim installations need upgrading too

The infamous exim bug only needs one mail with prepared headers to travel through an exim system to infect it. All local processes could do that, and some services (e.g. cron, webapps) send messages and might be convinced by malicious remote users to produce evil headers.

We should either rule out that this could happen on our systems, or upgrade all exims regardless of whether they are localhost-only or not.

BTW did we already run a rootkit checker like Rootkit hunter on eu1? If not we should maybe do it now - there was already an exploit out in the wild. ByteMark has (a) already observed infections and (b) notified us because they remotely fingerprinted our mailer to be exim<4.70 (our EHLO banner contains the exim version), just as anyone could.

1292264117000000 1296340558000000
#2795 enhancement toby demo phase 5 new Check validation of HTML, CSS, JS

Ensure that we are being standards compliant

1343903128000000 1343903128000000
#1470 defect dread amercader ckan-sprint-2011-11-21 closed fixed Check user name in the profile form 1321446143000000 1324473955000000
#2476 defect seanh johnglover ckan-sprint-2012-06-25 closed wontfix Check that translating lists of strings is being tested in multilingual tests

Check that lists of strings are being correctly translated. See https://github.com/okfn/ckan/commit/f1d68c3d2d4d25a0c0f8a89a68940643fc19b156

1338378078000000 1339151396000000
#2651 enhancement icmurray ross ckan-v1.8 closed fixed Check support for TSV which doesn't appear to work well.

TSV support doesn't seem to work very well, may be the mimetype ( text/tab-separated-values )

See http://thedatahub.org/dataset/wikipedia-e3-timestamp-position-modification/resource/d883ab44-07f4-4992-800a-3e4bf5d53a96

1341923318000000 1343209784000000
#194 defect rgrp dread v0.11 closed fixed Check star ratings aren't influenced by search engine crawlers

rel=nofollow or robots.txt ?

1258471512000000 1265284389000000
#1814 enhancement amercader amercader ckan-sprint-2012-03-19 closed fixed Check publicdata.eu harvesters

Estimate 2d

Once ckanext-pdeu is running on CKAN 1.6, upgrade ckanext-harvest to be able to update the CKAN harvesters (default tags and extras).

Also check non-CKAN harvesters (especially scrapers) to see if they are still working.

Make a list of current harvesters with status and potential ones.

1329757408000000 1332152596000000
#612 task johnbywater johnbywater ckan-v1.3 closed duplicate Check given XML schema validates given metadata document 1284218750000000 1294408188000000
#1089 enhancement dread dread ckan-v1.4-sprint-6 closed fixed Check for "--ckan" when running nosetests

(because if you forget, you get difficult to understand errors, and more than one person has tripped up on this)

1302631189000000 1302631733000000
#617 task johnbywater johnbywater ckan-v1.3 closed duplicate Check UKLP schematron validates given metadata document 1284219298000000 1294408164000000
#659 enhancement nils.toedtmann dread closed fixed Check CKAN instance works automatically

Auto way to check web and API interface of a CKAN instance basically works. Several gotchas can be quickly determined, such as logging in, search not working. Needs to be configurable per site basis.

1285348333000000 1311183031000000
#660 requirement dread closed invalid Check CKAN instance works

As an admin I want to check a CKAN instance works having just upgraded it or configured it.

1285348463000000 1311183115000000
#415 task dread ckan-v1.2 closed fixed Chase Talis about loading RDF from CKAN. 1281431656000000 1288003954000000
#1791 defect dread dread ckan-sprint-2012-02-20 closed fixed Changing locale on /dataset/new causes exception

When you are on the /dataset/new page and you try and change locale then you get a 500 error.

This is because it adds the 'cache' parameter, to ensure any proxy cache in the chain does not just send the cached page.

e.g. http://127.0.0.1:5000/dataset/new?__cache=37713707

1329134556000000 1329138315000000
#1829 defect dread dread ckan-sprint-2012-03-05 closed fixed Changing back to English prints the flash message in the previous non-English language

On the homepage click "francais" and then "English". The flash message reads "Le langage a été fixé à: français" when it should say "The language is now: English".

1330000660000000 1330001990000000
#2959 defect icmurray icmurray ckan 2.0 new Changing a Group's name through the action api disassociates it from its datasets in the index

Repro:

  • Create a new Group, named "test-group".
  • Add a dataset to it.
  • Verify the dataset belongs to the group by visiting the Group's read-page
  • Update the Group through the action api (group_update), using the uid in the "id" field, and a new name in the "name" field.
  • Visit the group's read-page. The list of datasets will be empty.

This was an issue when editing a Group through the web interface, which was fixed in [1]. However it only fixes the issue in the group controller.

[1] https://github.com/okfn/ckan/commit/dbe25d8b8d7fabfc40c5d794a920b91cec349335

1349363935000000 1349363935000000
#1135 enhancement kindly rgrp assigned Changeset model for vdm

Move to Changeset model for vdm.

A changeset model is like an Audit-Log model in which we just record Changesets with Change-Objects rather than have Revision-Objects for each Object that is revisioned.

This change would also incorporate significant simplification of vdm.

1305209986000000 1340632267000000
#898 defect rgrp dread closed fixed Changes stored indefinitely

Every change to every object is being stored in memory, which could add up to quite a lot of memory.

This fixes it by making sure the objects are in a weakref. https://bitbucket.org/kindly/vdm/changeset/8d5f91db641f

1294659490000000 1294662408000000
#69 enhancement rgrp rgrp v0.9 closed fixed Change to text-only license field and use external license repo

Switch from license domain object to a simple license field and use license list from new centralised license repo:

<http://knowledgeforge.net/okfn/licenses/>

  • This will require a migration

Cost: 4h (plus migration ...)

1245687449000000 1246437494000000
#725 story pudo pudo iati-3 closed fixed Change to allow anyone (logged in) to create a publisher

With a pending state set ("unapproved")

1287584630000000 1289296038000000
#925 defect dread ckan-backlog closed fixed Change the search box icon to remove the down arrow

Is there a good reason why the search box has a 'down arrow' icon when there is no drop-down menu? Or can this be usefully removed?

1295867593000000 1323168588000000
#414 task johnbywater dread ckan-v1.2 closed fixed Change the Apache and Varnish ports

Ask Paul for a new machine for testing. Then one for varnish-live and one for varnish-test.

1281431639000000 1288003770000000
#62 enhancement dread rgrp v0.10 closed fixed Change tags to contain any character (other than space)

Requires us to url encode the tag names when displaying them ...
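
A minimal sketch of the encoding step, written against Python 3's `urllib.parse` (the CKAN code of the time would have used Python 2's `urllib`). The helper name `tag_url` and the `/tag/` path prefix are assumptions for illustration only.

```python
from urllib.parse import quote


def tag_url(tag_name):
    """Build a link to a tag, percent-encoding every reserved character."""
    # safe="" ensures that '/' inside a tag name is encoded as well.
    return "/tag/" + quote(tag_name, safe="")
```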

1240585095000000 1250181376000000
#2206 enhancement johnglover johnglover ckan-sprint-2012-03-19 closed fixed Change site header to match latest ODP template 1330958095000000 1331046486000000
#1534 enhancement rgrp ckan-backlog new Change revisions to record userid rather than username

The use of username is problematic because usernames can change.

  • Change all revision creation code to use user id (simplest is to change c.author field in lib/base.py (?))
    • (?) Add a field ipaddr for ip address of anonymous users? (or just keep putting this in author field on Revision and then accepting that those won't match when we do a look up against user table)
  • Change user view page to look up against user id rather than name
  • Perform migration on existing Revision objects
    • Match should probably be against both openid and username when searching Revisions' author field (especially true on CKAN where some people have already changed their username from being their openid)
1323278790000000 1338205050000000
#252 enhancement dread johnbywater closed invalid Change revision object so that it has parent(s) attribute 1266519767000000 1296477560000000
#2923 defect seanh ckan 2.0 new Change regularise -> regularize

The function is called regularise_html(), can't remember what file it's in.

1347530582000000 1347530582000000
#2246 enhancement johnglover johnglover ckan-sprint-2012-04-02 closed fixed Change published_by metadata field to reference group instead of a custom extra
  • probably needs a new converter, as needs to be usable via API as 'published_by'.
1332243036000000 1332864871000000
#126 enhancement dread dread v0.10 closed fixed Change package state in the WUI (delete and undelete)

As a Package Admin I want to change the state of the package. In particular I wish to delete and undelete it.

(NB: this is quite separate from "purging" objects which is the term we shall use for irrevocable removal of an object from the domain model).

  • Only Package Admins (and sysadmins) should be able to change state

Implementation Suggestions

  • 'delete' action should be renamed to 'change-state' (NB: this requires a db migration ...)
  • Have new package formalchemy form (created via inheritance?) to incorporate state attribute. Suggest this is rendered as a dropdown (and may be simple object rendering of state, i.e. do NOT need to change it to a single name such 'active').
  • This form should then be used when the user satisfies is_authorized(..., model.Action.CHANGE_STATE) instead of the usual fieldset
1253789571000000 1254740244000000
#752 task johnbywater johnbywater ckan-v1.3 closed duplicate Change package attribute names used by Gemini harvesting to DGU "v.4" 1288039205000000 1294408472000000
#198 enhancement rgrp dread closed fixed Change package and tag ids to uuids

See how we did it already for other things.

Note: on ckan.net older PackageRevision.id might not be identical to Package.id but this may need sorting at this point.

1258980613000000 1266837606000000
#1149 enhancement kindly kindly ckan-v1.5-sprint-1 closed fixed Change domain object modification plugin to use Session extension.

This should make it more efficient as it currently does a lot of repeating work. i.e if you change a package and a resource in the same commit it sends out 2 notifications and should only really send out 1.

1305969863000000 1306090663000000
#2679 enhancement icmurray icmurray ckan-v1.9 new Change default behaviour of TemplateController.view to 404.

The current behaviour of TemplateController.view() (which is the fallback controller should all others fail) is to attempt to render (as a genshi template) the requested file.

Although this may be a feature that some instances want, in general it leads to:

  • 500s when attempting to access a normal template (eg - http://datahub.io/importer/preview)
  • A way of inadvertently serving things you may not want to serve. (Small risk, as it needs to be renderable as a genshi template).

Solution:

  • Change the controller to 404
  • Ensure there's a way for existing ckan instances to override that behaviour should they need it.
1342436133000000 1342436133000000
#1653 enhancement toby ross ckan-sprint-2012-03-05 closed fixed Change URLs for multilingual site

To support multiple languages we should have an easy way to specify the language as part of the URL, so that URLs are both specific and we also reduce the dependency on the session.

  • Analysis [1d] - Find the best way of implementing this and how everyone else does their language URLs.
  • Write Middleware + update url_for to take account of the language. [2d]

  • Document the language setup, and how to replicate it. [1d]
1326710590000000 1329845387000000
#461 task dread johnbywater ckan-v1.2 closed fixed Change ONS data importing to work via API
  • Move script out to ckandgu repo
  • Change script to convert xml into package dicts
  • Test (against test.ckan.net, hmg.test.ckan.net)
  • Deploy
1282303411000000 1283250478000000
#2315 enhancement dread dread ckan-sprint-2012-04-30 closed fixed Change Cookie expiry

Change login cookie from a default expiry of 50 years to 2 years. You can also uncheck a 'remember me' checkbox on the login form for the cookie to just last the session.

Background conversation on ckan-dev:

DR: I wonder if anyone objects to the expiry of the login cookie to be changed from 50 years to 2 years? 50 years might be appropriate for thedatahub.org, but for government sites it seems (to me) to be too lax.

Toby: is this the repoze.who cookie? If so that seems sensible to me.

Rufus: Definitely agree. I would also like to see introduction of a standard "remember me" checkbox (set to true by default). At the moment a login lasts forever (until you logout) automatically.
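
To make the proposal concrete, here is a minimal sketch of a login cookie with a two-year lifetime that falls back to a session cookie when "remember me" is unchecked. It uses Python 3's `http.cookies` rather than the repoze.who plugin itself; the helper name `login_cookie` and the cookie name `auth_tkt` are assumptions for illustration.

```python
from http.cookies import SimpleCookie

# Two years expressed in seconds (ignoring leap days).
TWO_YEARS = 2 * 365 * 24 * 60 * 60


def login_cookie(value, remember_me=True, name="auth_tkt"):
    """Return a Set-Cookie value for the login cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if remember_me:
        # Persist the cookie for two years.
        cookie[name]["max-age"] = TWO_YEARS
    # Without Max-Age or Expires the browser drops it at session end.
    return cookie[name].OutputString()
```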

1334919449000000 1334919522000000
#758 task johnbywater johnbywater ckan-v1.3 closed duplicate Change API documentation to indicate harvest source entity has filter attribute 1288040643000000 1294409053000000
#760 task johnbywater johnbywater ckan-v1.3 closed duplicate Change "CSW Get Records" request class to accept and use given CSW filter 1288040993000000 1294409111000000
#326 task dread dread v1.1 closed fixed Centralise importation of json library

Later versions of python use json which is better than simplejson, but it must be kept as an option for compatibility. So centralise the import of json to ckan.lib.helpers.
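
The usual pattern for such a centralised import is a guarded fallback, sketched below. Other modules would then import `json` from this one module (ckan.lib.helpers, per the ticket) rather than importing either library directly.

```python
try:
    # Standard library module, available from Python 2.6 onwards.
    import json
except ImportError:
    # Fall back to the third-party package on older interpreters.
    import simplejson as json
```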

1274784223000000 1274789296000000
#1609 enhancement ross ross ckan-sprint-2012-01-23 closed fixed Celery task for ckanext-archiver to write to webstore.

From super Storage changes - #1574 - and http://ckan.okfnpad.org/newstorage we determined that ckanext-archiver should have a celery task for grabbing local file uploads and writing to webstore

Analysis

When I upload a file to CKAN:

  • End up with file in permanent storage
  • If file is of type ... csv,xls,xlsx,sqlite,.sql
    • End up with new db in webstore
      • Where? {username}/{resource-id}/...
        • If a single table: name it after the file name (appropriately slugified)
      • A resource *always* corresponds to a 'database' in webstore ...
      • In Data Explorer have "Sheets" tab ...
  • Resource url = /dataset/{x}/resource/{y}/link -> cached_url ...
1325582253000000 1327057030000000
#1809 enhancement johnglover johnglover ckan-sprint-2012-03-05 closed fixed Catch request exceptions in archiver link_checker task

Some request exceptions are currently not being caught (see the celery log on thedatahub for examples)

1329746267000000 1330528828000000
#1616 defect amercader amercader ckan-sprint-2012-04-02 closed fixed Catch exceptions when rebuilding the search index

Right now if an exception is found while reindexing, the whole process stops and the remaining datasets are left out of the index. The process should continue after logging the exception. If more than a certain number of exceptions occur in a row, the process should stop.
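
The behaviour described above could be sketched as follows. The names `rebuild_index`, `index_one` and `max_consecutive_failures` are hypothetical, and the default threshold of 5 is an arbitrary assumption; this is not the code that was committed for the ticket.

```python
import logging

log = logging.getLogger(__name__)


def rebuild_index(dataset_ids, index_one, max_consecutive_failures=5):
    """Index each dataset, logging failures instead of aborting.

    Stops early only when more than `max_consecutive_failures`
    exceptions occur in a row. Returns the ids that failed.
    """
    failed = []
    consecutive = 0
    for dataset_id in dataset_ids:
        try:
            index_one(dataset_id)
        except Exception:
            log.exception("Could not index dataset %s", dataset_id)
            failed.append(dataset_id)
            consecutive += 1
            if consecutive > max_consecutive_failures:
                log.error("Too many consecutive failures, stopping")
                break
        else:
            # A success resets the run of consecutive failures.
            consecutive = 0
    return failed
```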

1325844669000000 1332327635000000
#486 requirement johnbywater ckan-v1.3 closed duplicate Catalogue service shall notify and query SOLR service 1282425790000000 1291639321000000
#488 requirement johnbywater closed wontfix Catalogue service shall notify RDF service 1282426021000000 1320930240000000
#480 requirement thejimmyg johnbywater ckan-v1.4 closed fixed Catalogue service shall conform to specification

Common requirements for running CKAN behind a (e.g Wordpress or Drupal) front-end:

  1. Unrestricted total read-only access to catalogue API for general public (e.g. voluntary organisation).
    • monitored by API key
    • not monitored by API key
  2. Restricted total read-write access to catalogue API for authorized clients (e.g. front-end system, bulk upload clients).
    • restricted by CKAN access controller
    • restricted by HTTP Auth
    • restricted by IP address
  3. Restricted total read-write access to catalogue Web UI for authorized users (e.g. site admins).
    • restricted by CKAN access controller
    • restricted by HTTP Auth
  4. Restricted partial read-write access to catalogue Web UI for authorized users (e.g. group admins).
    • restricted by CKAN access controller
    • restricted by HTTP Auth

CKAN as a catalogue service

1282422612000000 1300281551000000
#2725 enhancement toby shevski demo phase 5 new Case sensitivity on tags

My feeling is that 'country-US' and 'country-us' should be the same tag. However currently tags with caps are treated differently

see http://s031.okserver.org:2375/en/dataset/test-dataset

with TEST and test - these also get indexed twice in the search page

1342949667000000 1343030773000000
#1431 defect dread dread ckan-v1.5 closed fixed Captcha field - foreign chars cause exception

During registering a user, the user inputs foreign chars into the captcha field.

URL: http://thedatahub.org/user/register
...
Module ckan.lib.captcha:22 in check_recaptcha
<<                                     remoteip=client_ip_address,
                                          challenge=recaptcha_challenge_field,
                                          response=recaptcha_response_field))
           f = urllib2.urlopen(recaptcha_server_name, params)
           data = f.read()
>>  response=recaptcha_response_field))
Module urllib:1267 in urlencode
<<          for k, v in query:
                   k = quote_plus(str(k))
                   v = quote_plus(str(v))
                   l.append(k + '=' + v)
           else:
>>  v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)
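
The traceback shows urlencode calling str() on a unicode value. A minimal sketch of one way to avoid that is to encode text values as UTF-8 before encoding the parameters. The helper name encode_params is hypothetical, and this is not necessarily the fix that was committed.

```python
try:
    from urllib import urlencode        # Python 2, as in the traceback
except ImportError:
    from urllib.parse import urlencode  # Python 3


def encode_params(pairs):
    """URL-encode (key, value) pairs after encoding text values as UTF-8."""
    text_type = type(u'')
    encoded = []
    for key, value in pairs:
        if isinstance(value, text_type):
            # Hand urlencode bytes so it never falls back to the ascii codec.
            value = value.encode('utf-8')
        encoded.append((key, value))
    return urlencode(encoded)
```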
1320078849000000 1320084104000000
#1659 defect dread dread ckan-sprint-2012-01-23 closed fixed Cannot logout if CKAN mounted at non-root url

If you set WSGIScriptAlias to mount CKAN at a URL other than / then you cannot logout without adjusting the OpenID logged_out_url to match in who.ini config. e.g.

[plugin:openid] ... logged_out_url = /sub/dir/user/logged_out

Note: all the other URLs in who.ini should not have the /sub/dir/ - it is just this one that doesn't take account of the mounting point.

The solution is to fix-up the repoze.who OpenID plugin to take account of the mounting point.
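
As a rough illustration of taking the mounting point into account, the sketch below prefixes a configured path with the WSGI SCRIPT_NAME. The helper mounted_url is hypothetical and is not part of the repoze.who OpenID plugin.

```python
def mounted_url(environ, path):
    """Prefix `path` with the WSGI mount point found in SCRIPT_NAME."""
    # SCRIPT_NAME is empty when the app is mounted at the root URL.
    script_name = environ.get('SCRIPT_NAME', '').rstrip('/')
    return script_name + path
```

With CKAN mounted at /sub/dir, a configured logged_out_url of /user/logged_out would then resolve to /sub/dir/user/logged_out without editing who.ini.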

1326716302000000 1326747205000000
#3019 defect seanh ckan 2.0 new Cannot delete dataset extras

Deleting extras in the web interface is broken

1352918678000000 1352918678000000
#1577 defect rgrp dread ckan-backlog new Can't upload file with foreign chars in filename

Looks like uploading a file with foreign characters fails due to encoding reasons.

URL: http://thedatahub.org/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ%C3%AD-%C4%8Cesk%C3%A9-republiky-_-P%C5%99%C3%ADprava-rozpo%C4%8Dtu.pdf
Module weberror.errormiddleware:162 in __call__
<<              __traceback_supplement__ = Supplement, self, environ
                   sr_checker = ResponseStartChecker(start_response)
                   app_iter = self.application(environ, sr_checker)
                   return self.make_catching_iter(app_iter, environ, sr_checker)
               except:
>>  app_iter = self.application(environ, sr_checker)
Module beaker.middleware:73 in __call__
<<                                                     self.cache_manager)
               environ[self.environ_key] = self.cache_manager
               return self.app(environ, start_response)
>>  return self.app(environ, start_response)
Module beaker.middleware:152 in __call__
<<                          headers.append(('Set-cookie', cookie))
                   return start_response(status, headers, exc_info)
               return self.wrap_app(environ, session_start_response)
           
           def _get_session(self):
>>  return self.wrap_app(environ, session_start_response)
Module routes.middleware:130 in __call__
<<                  environ['SCRIPT_NAME'] = environ['SCRIPT_NAME'][:-1]
               
               response = self.app(environ, start_response)
               
               # Wrapped in try as in rare cases the attribute will be gone already
>>  response = self.app(environ, start_response)
Module pylons.wsgiapp:125 in __call__
<<          
               controller = self.resolve(environ, start_response)
               response = self.dispatch(controller, environ, start_response)
               
               if 'paste.testing_variables' in environ and hasattr(response,
>>  response = self.dispatch(controller, environ, start_response)
Module pylons.wsgiapp:324 in dispatch
<<          if log_debug:
                   log.debug("Calling controller class with WSGI interface")
               return controller(environ, start_response)
           
           def load_test_env(self, environ):
>>  return controller(environ, start_response)
Module ckan.lib.base:123 in __call__
<<          # available in environ['pylons.routes_dict']    
               try:
                   return WSGIController.__call__(self, environ, start_response)
               finally:
                   model.Session.remove()
>>  return WSGIController.__call__(self, environ, start_response)
Module pylons.controllers.core:221 in __call__
<<                  return response(environ, self.start_response)
               
               response = self._dispatch_call()
               if not start_response_called:
                   self.start_response = start_response
>>  response = self._dispatch_call()
Module pylons.controllers.core:172 in _dispatch_call
<<              req.environ['pylons.action_method'] = func
                   
                   response = self._inspect_call(func)
               else:
                   if log_debug:
>>  response = self._inspect_call(func)
Module pylons.controllers.core:107 in _inspect_call
<<                        func.__name__, args)
               try:
                   result = self._perform_call(func, args)
               except HTTPException, httpe:
                   if log_debug:
>>  result = self._perform_call(func, args)
Module pylons.controllers.core:60 in _perform_call
<<          """Hide the traceback for everything above this method"""
               __traceback_hide__ = 'before_and_this'
               return func(**args)
           
           def _inspect_call(self, func):
>>  return func(**args)
Module ckanext.storage.controller:2 in auth_form
Module ckan.lib.jsonp:26 in jsonpify
<<      Very much modelled after pylons.decorators.jsonify .
           """
           data = func(*args, **kwargs)
           return to_jsonp(data)
>>  data = func(*args, **kwargs)
Module ckanext.storage.controller:301 in auth_form
<<          method = 'POST'
               authorize(method, bucket, label, c.userobj, self.ofs)
               data = self._get_form_data(label)
               return data
>>  authorize(method, bucket, label, c.userobj, self.ofs)
Module ckanext.storage.controller:79 in authorize
<<      if method != 'GET':
               # do not allow overwriting
               if ofs.exists(bucket, key):
                   abort(409)
               # now check user stuff
>>  if ofs.exists(bucket, key):
Module ofs.remote.botostore:53 in exists
<<          if bucket is None: 
                   return False
               return (label is None) or (label in bucket)
           
           def claim_bucket(self, bucket):
>>  return (label is None) or (label in bucket)
Module boto.s3.bucket:87 in __contains__
<<      def __contains__(self, key_name):
              return not (self.get_key(key_name) is None)
       
           def startElement(self, name, attrs, connection):
>>  return not (self.get_key(key_name) is None)
Module boto.s3.bucket:144 in get_key
<<          response = self.connection.make_request('HEAD', self.name, key_name,
                                                       headers=headers,
                                                       query_args=query_args)
               # Allow any success status (2xx) - for example this lets us
               # support Range gets, which return status 206:
>>  query_args=query_args)
Module boto.s3.connection:388 in make_request
<<          if isinstance(key, Key):
                   key = key.name
               path = self.calling_format.build_path_base(bucket, key)
               boto.log.debug('path=%s' % path)
               auth_path = self.calling_format.build_auth_path(bucket, key)
>>  path = self.calling_format.build_path_base(bucket, key)
Module boto.s3.connection:88 in build_path_base
<<      def build_path_base(self, bucket, key=''):
               return '/%s' % urllib.quote(key)
       
       class SubdomainCallingFormat(_CallingFormat):
>>  return '/%s' % urllib.quote(key)
Module urllib:1222 in quote
<<              safe_map[c] = (c in safe) and c or ('%%%02X' % i)
               _safemaps[cachekey] = safe_map
           res = map(safe_map.__getitem__, s)
           return ''.join(res)
>>  res = map(safe_map.__getitem__, s)
KeyError: u'\xed'
CGI Variables
AUTH_TYPE	'cookie'
CONTENT_TYPE	'; charset=utf-8'
DOCUMENT_ROOT	'/htdocs'
GATEWAY_INTERFACE	'CGI/1.1'
HTTP_ACCEPT	'*/*'
HTTP_ACCEPT_CHARSET	'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
HTTP_ACCEPT_ENCODING	'gzip,deflate,sdch'
HTTP_ACCEPT_LANGUAGE	'en-US,en;q=0.8'
HTTP_CACHE_CONTROL	'max-age=259200'
HTTP_CONNECTION	'keep-alive'
HTTP_COOKIE	'thedatahub_net=27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; ckan_user=elenaibp; ckan_display_name="Elena Mondo"; ckan_apikey=decd48b1-49ee-4250-bff4-98ccca9c02a5; hide_welcome_message=1; __utma=119670349.1809834699.1323782464.1324293066.1324298316.4; __utmb=119670349.3.10.1324298316; __utmc=119670349; __utmz=119670349.1323782464.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)'
HTTP_HOST	'thedatahub.org'
HTTP_REFERER	'http://thedatahub.org/dataset/edit/budget-library-czeck-republic'
HTTP_USER_AGENT	'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7'
HTTP_VIA	'1.1 localhost (squid/3.0.STABLE19)'
HTTP_X_FORWARDED_FOR	'87.114.74.190'
HTTP_X_REQUESTED_WITH	'XMLHttpRequest'
PATH	'/usr/local/bin:/usr/bin:/bin'
PATH_INFO	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
PATH_TRANSLATED	'/home/okfn/var/srvc/ckan.net/pyenv/bin/ckan.net.py/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
REMOTE_ADDR	'193.34.146.142'
REMOTE_PORT	'55419'
REMOTE_USER	u'elenaibp'
REMOTE_USER_DATA	'userid_type:unicode'
REMOTE_USER_TOKENS	['']
REQUEST_METHOD	'GET'
REQUEST_URI	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ%C3%AD-%C4%8Cesk%C3%A9-republiky-_-P%C5%99%C3%ADprava-rozpo%C4%8Dtu.pdf'
SCRIPT_FILENAME	'/home/okfn/var/srvc/ckan.net/pyenv/bin/ckan.net.py'
SCRIPT_URI	'http://thedatahub.org/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
SCRIPT_URL	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
SERVER_ADDR	'193.34.146.146'
SERVER_ADMIN	'[no address given]'
SERVER_NAME	'thedatahub.org'
SERVER_PORT	'80'
SERVER_PROTOCOL	'HTTP/1.0'
SERVER_SIGNATURE	'<address>Apache/2.2.14 (Ubuntu) Server at thedatahub.org Port 80</address>\n'
SERVER_SOFTWARE	'Apache/2.2.14 (Ubuntu)'
WSGI Variables
application	<beaker.middleware.CacheMiddleware object at 0x7f22601c7dd0>
beaker.cache	<beaker.cache.CacheManager object at 0x7f22601c7b50>
beaker.get_session	<bound method SessionMiddleware._get_session of <beaker.middleware.SessionMiddleware object at 0x7f22601c7a90>>
beaker.session	{'_accessed_time': 1324298703.071357, '_creation_time': 1324293077.4139669}
mod_wsgi.application_group	'ckan.net|'
mod_wsgi.callable_object	'application'
mod_wsgi.listener_host	''
mod_wsgi.listener_port	'80'
mod_wsgi.process_group	'ckan.net'
mod_wsgi.reload_mechanism	'1'
mod_wsgi.script_reloading	'1'
mod_wsgi.version	(2, 8)
paste.cookies	(<SimpleCookie: __utma='119670349.1809834699.1323782464.1324293066.1324298316.4' __utmb='119670349.3.10.1324298316' __utmc='119670349' __utmz='119670349.1323782464.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)' auth_tkt='f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode' ckan_apikey='decd48b1-49ee-4250-bff4-98ccca9c02a5' ckan_display_name='Elena Mondo' ckan_user='elenaibp' hide_welcome_message='1' thedatahub_net='27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e'>, 'thedatahub_net=27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; ckan_user=elenaibp; ckan_display_name="Elena Mondo"; ckan_apikey=decd48b1-49ee-4250-bff4-98ccca9c02a5; hide_welcome_message=1; _ _utma=119670349.1809834699.1323782464.1324293066.1324298316.4; __utmb=119670349.3.10...)|utmcmd=(none)')
paste.registry	<paste.registry.Registry object at 0x7f226194df50>
paste.throw_errors	True
pylons.action_method	<bound method StorageAPIController.auth_form of <ckanext.storage.controller.StorageAPIController object at 0x7f2261dad990>>
pylons.controller	<ckanext.storage.controller.StorageAPIController object at 0x7f2261dad990>
pylons.environ_config	{'session': 'beaker.session', 'cache': 'beaker.cache'}
pylons.pylons	<pylons.util.PylonsContext object at 0x7f2261daddd0>
pylons.routes_dict	{'action': u'auth_form', 'controller': u'ckanext.storage.controller:StorageAPIController', 'label': u'2011-12-19T124447/Ministerstvo-financ\xed-\u010cesk\xe9-republiky-_-P\u0159\xedprava-rozpo\u010dtu.pdf'}
repoze.who.identity	<repoze.who identity (hidden, dict-like) at 139785645747120>
repoze.who.logger	<logging.Logger instance at 0x7f225e23c098>
repoze.who.plugins	{'openid': <OpenIdIdentificationPlugin 139785625065680>, 'friendlyform': <FriendlyFormPlugin 139785618095248>, 'ckan.lib.authenticator:UsernamePasswordAuthenticator': <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7f2260874c10>, 'auth_tkt': <AuthTktCookiePlugin 139785625065808>, 'ckan.lib.authenticator:OpenIDAuthenticator': <ckan.lib.authenticator.OpenIDAuthenticator object at 0x7f2260874c90>}
routes.route	<routes.route.Route object at 0x7f22601a1090>
routes.url	<routes.util.URLGenerator object at 0x7f2261dadf50>
webob._parsed_query_vars	(GET([]), '')
webob.adhoc_attrs	{'language': 'en-us'}
wsgi process	'Multiprocess'
wsgi.file_wrapper	<built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f2261da9af8>
wsgiorg.routing_args	(<routes.util.URLGenerator object at 0x7f2261dadf50>, {'action': u'auth_form', 'controller': u'ckanext.storage.controller:StorageAPIController', 'label': u'2011-12-19T124447/Ministerstvo-financ\xed-\u010cesk\xe9-republiky-_-P\u0159\xedprava-rozpo\u010dtu.pdf'})
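
The KeyError above is raised when urllib.quote is handed a unicode label containing non-ASCII characters. A minimal sketch of a workaround, assuming the label should be sent as UTF-8, is to encode it before quoting. The helper name quote_key is hypothetical; it is not an ofs or boto function.

```python
try:
    from urllib import quote        # Python 2, as in the traceback
except ImportError:
    from urllib.parse import quote  # Python 3


def quote_key(key):
    """Percent-encode a storage key, encoding text to UTF-8 bytes first."""
    if isinstance(key, type(u'')):
        key = key.encode('utf-8')
    return quote(key)
```

The result matches the percent-encoding seen in REQUEST_URI above, where the character \xed appears as %C3%AD.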
1324317659000000 1325473564000000
#1374 defect dread dread ckan-sprint-2011-10-24 closed fixed Can't switch to English if default is non-English

e.g. cz.ckan.net defaults to Czech (config option lang=cs_CZ) but it fails when you try to switch to English.

1317893975000000 1319648746000000
#2918 enhancement johnmartin ross ckan 2.0 closed fixed Can't remove users from organizations

When you remove someone, without adding them, the text box at the bottom (which should probably autocomplete) is empty, and this causes problems on the server.

Ideally when you add a user (select from the autocomplete) it would add another row to the table, defaulting the user to editor and setting the names to user{{X}}name and user{{X}}capacity where X is $('tr').size()

1347455572000000 1347970735000000
#662 defect sebbacon johnbywater ckan-v1.4 closed fixed Can't put entity that is returned by posting to package register

It's because Package carries several out-of-band values, which are snagged on the way back out. Entity get response also can't be posted.

However, the post response can be re-posted (because it isn't the same as the register-post/entity-get responses).

An issue for CKAN too.

Sub-ticket of #961 (form, validation, model sync meta-ticket) and depends on that work.

1285410546000000 1301076463000000
#1419 enhancement dread ckan-sprint-2011-11-07 closed invalid Can't log in via OpenID

I couldn't log into theDataHub with OpenID today. I tried both Google ID and MyOpenID. Both times the login on the remote auth server went fine, but when it returns you to theDataHub you get error "Login failed. Bad username or password."

1319543013000000 1319796164000000
#1479 defect dread dread ckan-sprint-2011-12-05 closed fixed Can't edit a user with a unicode email address
  1. Register User with an email address with a unicode char (e.g. u'\u044e')
  2. View the User in the UI (/user/) or with 'user_show' Action API

Exception:

Module ckan.controllers.user:98 in read
<<          try:
                   user_dict = get_action('user_show')(context,data_dict)
               except NotFound:
                   h.redirect_to(controller='user', action='login', id=None)
>>  user_dict = get_action('user_show')(context,data_dict)
Module ckan.logic.action.get:488 in user_show
<<      check_access('user_show',context, data_dict)
       
           user_dict = user_dictize(user_obj,context)
       
           if not (Authorizer().is_sysadmin(unicode(user)) or user == user_obj.name):
>>  user_dict = user_dictize(user_obj,context)
Module ckan.lib.dictization.model_dictize:189 in user_dictize
<<      
           result_dict['display_name'] = user.display_name
           result_dict['email_hash'] = user.email_hash
           result_dict['number_of_edits'] = user.number_of_edits()
           result_dict['number_administered_packages'] = user.number_administered_packages()
>>  result_dict['email_hash'] = user.email_hash
Module ckan.model.user:59 in email_hash
<<          if self.email:
                   e = self.email.strip().lower()
               return hashlib.md5(e).hexdigest()
               
           def get_reference_preferred_for_uri(self):
>>  return hashlib.md5(e).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u044e' in position 17: ordinal not in range(128)
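
The failure comes from passing a unicode string with non-ASCII characters straight to hashlib.md5. A minimal sketch of a safer hash, written as a standalone function rather than the actual ckan.model.user method, is shown below. Treating a missing email as an empty string is an assumption.

```python
import hashlib


def email_hash(email):
    """MD5 hex digest of a normalised email address, encoded as UTF-8.

    Assumption: a missing email hashes as the empty string.
    """
    normalised = (email or u'').strip().lower()
    # Encode explicitly so the digest never relies on the ascii codec.
    return hashlib.md5(normalised.encode('utf-8')).hexdigest()
```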
1321960486000000 1321961592000000
#2266 defect dread dread ckan-sprint-2012-04-02 closed fixed Can't delete all of a package's resources over REST API

Nothing happens if you set resources=[] or resources=null.
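
The ticket does not say how this was fixed. One plausible reading is that an empty value was being treated like a missing one. The sketch below separates the two cases; merge_resources is a hypothetical helper, not CKAN code.

```python
def merge_resources(existing, payload):
    """Return the resource list a package should have after an update.

    A missing 'resources' key leaves the list untouched; an explicit
    empty list or None (JSON null) clears it.
    """
    if 'resources' not in payload:
        return list(existing)
    return list(payload['resources'] or [])
```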

1332932504000000 1332932634000000
#951 defect adrian.pohl@… closed invalid Can't add a package to group

I can't add a package (e.g. http://ckan.net/package/ub-konstanz) to a group (e.g. http://ckan.net/group/bibliographic). It's neither possible when editing a package (the only group in drop down menu is "history") nor on the group page.

1296726886000000 1314031006000000
#1356 enhancement kindly amercader ckan-sprint-2011-10-10 closed fixed Can not recreate a deleted extra

If you delete an extra and later on change your mind, you can not recreate it with the same value (Different value works fine).

1317034180000000 1318279617000000
#1223 enhancement pudo pudo closed fixed Caching of static files

StaticURLParser can have caching - use it

1310573854000000 1310573893000000
#668 defect thejimmyg Colin Calnan closed invalid Caching issues on API v1

It seems like the API v1 on CKAN metastable (cset:ec21f8e1c87e) has some caching issues.

Steps to test:

  1. Modify a dataset on datadotgc.ca, redirects to CKAN
  2. On save, redirects to http://www.datadotgc.ca/update/geogratisnat_hydrography_v100 which in turn redirects to http://www.datadotgc.ca/dataset/geogratisnat_hydrography_v100
  3. You can see that the Dataset has not updated correctly. Run a check on the API v1 - http://ca.ckan.net/api/1/rest/package/geogratisnat_hydrography_v100 - the updates are not present
  4. Check the v2 of the API - http://ca.ckan.net/api/rest/package/geogratisnat_hydrography_v100 - the updates are present.
  5. Setting the headers to 'Cache-control: no-cache' or 'Pragma: no-cache' does not work either.
1285953542000000 1311176649000000
#841 enhancement kindly dread ckan-v1.4-sprint-4 closed duplicate Caching docs (as a whole)

Documentation article on caching / improving performance. (To complement configuration docs.)

  • Different sorts of cache - beaker style, etags, package_dict in search results(?)
  • How each one affects performance
  • How to turn them on/off and configure them
  • Is it possible to bypass each of them in the browser or with wget/curl?
1291308879000000 1300364333000000
#537 task wwaites wwaites closed duplicate Caching and Performance improvement

There are several places where performance is unacceptably slow. Even in places where it is not, the system could still be more responsive for read requests.

Introducing caching has to be done carefully and should be done in a standards compliant manner.

General strategy

  • Where possible, cache output within the pylons app (beaker).
  • Facilitate external caching in an end-user's web browser or a caching proxy
  • Slightly stale data is not necessarily much of a problem so allow the output to be cached for a relatively short period (e.g. 5-15 minutes).
  • When cache expiry has been reached, a request will be made to the server. The server should check if its internally cached data is still valid, and serve that, otherwise regenerate the data.

Tasks

These tasks should be broken into sub-tickets:

  • caching of parts of templates that are expensive to render (package list, tag list, group list)
  • caching of entire output using beaker particularly for API read operations.
  • need to perform a check to see if the cache should be invalidated by checking if anything in the output would have changed -- i.e. checking timestamps on package modifications. this is a natural place to introduce the ETag which will help browsers and web caches.
  • cache infrastructure front end - varnish, squid, etc. To do this right, the controllers need to set the cache control headers appropriately (max-age, must-revalidate). This is a good resource: http://www.mnot.net/cache_docs/#CACHE-CONTROL
    • Deploy varnish on a host dedicated to this purpose for research. This will be useful for other sites as well
    • Do not configure varnish to ignore cache control headers or otherwise behave in a non HTTP/1.1 compliant manner
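
The ETag check and cache control headers described in the tasks above could look roughly like this. It is a framework-independent sketch, not Pylons or beaker code; the function names, the header dictionary and the 300 second max-age are assumptions for illustration.

```python
import hashlib


def make_etag(package_id, last_modified):
    """Derive an ETag from a package id and its last-modified timestamp."""
    raw = '%s:%s' % (package_id, last_modified)
    return '"%s"' % hashlib.md5(raw.encode('utf-8')).hexdigest()


def conditional_response(request_headers, package_id, last_modified, render,
                         max_age=300):
    """Return (status, headers, body), skipping render() on an ETag match."""
    etag = make_etag(package_id, last_modified)
    headers = [
        ('ETag', etag),
        ('Cache-Control', 'max-age=%d, must-revalidate' % max_age),
    ]
    if request_headers.get('If-None-Match') == etag:
        # The client's copy is still valid, so the expensive render is skipped.
        return '304 Not Modified', headers, b''
    return '200 OK', headers, render()
```

Because the ETag changes whenever the modification timestamp changes, a browser or caching proxy revalidates cheaply and only receives a full body after an edit.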

Future Work

  • Investigate ckanclient library maintaining a local cache as a web browser would
  • Investigate using a CDN like Google Storage or Amazon for serving cached data.
1283184362000000 1311178929000000
#728 requirement amercader johnbywater ckan-backlog assigned CSW Harvesting shall be optimised in respect of reharvesting only records that have changed

Hi Will, this is important again because some CSW servers we use contain over 300 documents. Could you take a look at modifying the filter please?

1287675340000000 1310124784000000
#623 task johnbywater johnbywater ckan-v1.2 closed fixed CSW GetRecords request for all identifiers (with CSW authentication) 1284220777000000 1287507837000000
#710 task johnbywater johnbywater ckan-v1.2 closed fixed CSW GetRecordById request for given identifier 1287432675000000 1287507854000000
#1660 defect rgrp lucychambers ckan-sprint-2012-02-06 closed wontfix CSV preview broken - OpenSpending

This CSV resource used to preview but now the format appears to be unsupported: "We are unable to preview this type of resource: x-osdata-csv"

http://thedatahub.org/dataset/lbhf-spending-2010/resource/9661abbd-2816-4d58-8b20-3cb0eb770c69

This is used as an example by the OpenSpending team all the time.

1326717846000000 1328013627000000
#1219 defect timmcnamara closed fixed CSS issues on IE7

As reported on ckan-dev:

items in the footer of CKAN ("Packages", "Groups & Tags", "About", "Language", etc.) are shown vertically instead of horizontally in IE7. This works fine in later browsers like IE8, IE9, FF4, and latest Opera and Chrome.

This seems to exist in all recent CKAN versions up to 1.4.1a.

1310423688000000 1310740534000000
#1134 CREP amercader ckan-backlog new CREP0003: Description and Configuration of Harvesters

Proposer: Adrià Mercader

Abstract

The new harvester interface allows harvesters to be created for different sources, but right now harvesters don't have many ways to describe and configure themselves. We need a way of allowing them to:

  • Expose their type and other details so they can be used internally and on the UI.
  • Define configuration settings for particular harvester instances.

The Problem

Harvester description

The current UI for adding and editing harvest sources is the same one used in ckanext-dgu, and thus the 3 harvester types used in DGU to harvest various GEMINI related sources are hardcoded in the form. The form will be migrated to a DGU-independent one, so we need the harvesters to provide all the necessary data. There is a current get_type method that returns the harvester type, but to make it compatible with the DGU forms, it returns a machine-readable string (e.g. "CSW Server"), making it error prone.

Arbitrary configuration

In the current implementation, when the harvest process is started, ckanext-harvest looks for all the available plugins that implement the IHarvester interface and calls the appropriate methods for the current stage (gather_stage, fetch_stage, import_stage). At these stages, harvesters have no way of applying arbitrary configuration options, so all harvesters of the same type behave in the same way. For instance, the CKAN harvester needs a way to define the API version to use when harvesting remote instances (right now, version 2 is hardcoded in the code).

Specification

Harvester description

Harvesters will need to provide the following information so the UI form can be built:

  • name: machine-readable name (e.g. "waf"). This will be the value stored in the database, and the one used by ckanext-harvest to call the appropriate harvester.
  • title: human-readable name (e.g. "Web Accessible Folder (WAF)"). This will appear in the form's select box.
  • description: a description of what the harvester does (e.g. "A Web Accessible Folder (WAF) displaying a list of GEMINI 2.1 documents"). This will appear on the form as a guidance to the user.

The way to provide it will be an info method that all harvesters must implement, which will return a dictionary with the previous elements:

    {
        'name': 'csw',
        'title': 'CSW Server',
        'description': "A server that implements OGC's Catalog Service "
                       "for the Web (CSW) standard"
    }
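
As a rough sketch of how such an info method might be used to build the form, assuming a plain Python class rather than the real IHarvester plugin interface (CSWHarvester and form_choices are hypothetical names):

```python
class CSWHarvester(object):
    """Hypothetical harvester exposing the proposed info() method."""

    def info(self):
        return {
            'name': 'csw',
            'title': 'CSW Server',
            'description': "A server that implements OGC's Catalog Service "
                           "for the Web (CSW) standard",
        }


def form_choices(harvesters):
    """Build (value, label) pairs for the form's select box."""
    choices = []
    for harvester in harvesters:
        info = harvester.info()
        # 'name' is stored in the database, 'title' is shown to the user.
        choices.append((info['name'], info['title']))
    return choices
```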

Arbitrary configuration

As different harvesters will have very different needs, we need to provide a way to persist arbitrary configuration flags for each harvest source. The most flexible way given the current architecture, in my opinion, would be to store the configuration options as a JSON-encoded object in a property of the harvest source (there is already an unused DB field called config in the database), maybe using JsonType.

This will mean adding an extra field in the harvest source form to allow entering the configuration. This could be just a simple text field where users enter the JSON-encoded object or a more clever mechanism (e.g. an "Add a configuration flag" link that adds two new text fields for the key and value of each flag, and a mechanism to later build the JSON object). In any case, this should probably be hidden in an "Advanced options" section.

Why do it this way

Harvester description

The info method would provide a single point to get all the information related to the harvester, and future properties could be added to the dictionary returned without having to modify the interface.

Arbitrary configuration

There is an already existing config field in the database, so we won't need to change the model. Harvesters could access the config object at any of the stages. Of course, they could provide default values in their implementations so users don't need to enter them every time.
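
A minimal sketch of reading such a config value with defaults, assuming the field holds a JSON-encoded object or is empty. The api_version key and the load_config helper are illustrative assumptions, not existing ckanext-harvest names.

```python
import json

# Hypothetical defaults for a CKAN harvester; 'api_version' is illustrative.
DEFAULT_CONFIG = {'api_version': 2}


def load_config(raw_config, defaults=DEFAULT_CONFIG):
    """Merge a JSON-encoded config string over a harvester's defaults."""
    config = dict(defaults)
    if raw_config:
        loaded = json.loads(raw_config)
        if not isinstance(loaded, dict):
            raise ValueError('harvest source config must be a JSON object')
        config.update(loaded)
    return config
```

A harvester could call this at the start of any stage, so a source with an empty config field still gets the default API version.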

Implementation plan

Deliverables

Risks and mitigations

The highest risk on the harvester info method side is that a harvester implementation doesn't offer one of the necessary properties (namely name and title). This could fire a warning when showing the UI form or using the CLI.
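
The warning could be raised by a small check like the one below. This is a sketch only; REQUIRED_INFO_KEYS and missing_info_keys are hypothetical names.

```python
import warnings

REQUIRED_INFO_KEYS = ('name', 'title')


def missing_info_keys(info):
    """Return required keys that are absent or empty, warning if any are."""
    missing = [key for key in REQUIRED_INFO_KEYS if not info.get(key)]
    if missing:
        warnings.warn('harvester info is missing: %s' % ', '.join(missing))
    return missing
```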

Participants

Adrià Mercader to do it.

Progress

None yet.

1305108868000000 1339774554000000
#1129 CREP kindly ckan-v1.5 closed fixed CREP0002: Moderated Edits

Proposer: David Raznick

Abstract.

We are trying to achieve these goals.

  • To get people involved with making edits to CKAN metadata.
  • To have an ownership model as to who can moderate and validate these changes
  • To not put too huge a burden on these owners.

In order to achieve this, we need a feature which lets anyone edit a package but only lets the moderator/owner accept it. The moderator should be able to look at a list of changes and accept the ones that

This CREP is not about 'if' we need such a feature, it is about 'how' we go about implementing it. Another CREP may be needed for the 'if' case.

The Problem

We need the following to be possible.

  • Storing revision of objects that are not the current active one.
  • A way for the user to view past revisions.
  • Accessing not only the history of a particular object but also of related objects at that time, i.e. if a resource related to a package changes we need a way to see this when looking at the package.
  • A robust way of doing this in the face of database schema changes.
  • Make sure database queries are quick.

Solutions.

  1. Store the whole dictization of the package and all its related objects every time you change anything in its dictized representation and only save to the database proper if accepted.

Pros

  • Easy to implement, we already have a preview which makes the dictized form of a package without actually saving it. This will just need to be persisted in some way.
  • Fast retrieval.
  • Potential to store a branching revision tree of changes.

Cons

  • No easy way to remake the dictized packages historically, or if there is a change in the way we represent packages, i.e. schema changes.
  • Will only work for the particular objects we decide to store these changes for.
  • Stores a lot of repeated information

  2. Write specialized queries for every read of the database looking only at the revision tables.

This method requires there to be a change in the way we use VDM, so that we manage statefulness ourselves. We will need to add other states such as 'waiting for approval'.

Pros

  • No specialized storage required
  • Only need to change queries when schema changes
  • Can be made to work easily for other objects

Cons

  • Slower query time on read, as even looking at the last active package will need to do a fairly complicated query.

Implementation details.

1.

A new table with columns id, user, package_id, timestamp, revision_id, parent_id, dictized_package. revision_id should be null unless it is actually persisted to the database. parent_id is the id that this package_dict was changed from.

We could store only the diffs of the dictized_package, as long as we ensure that everything inside the JSON is stably sorted, but this will make getting the historical data out slower.

Getting out the history of the dictized packages is an intensive task, as it will require replaying the whole history of all the changes and creating the dict for each change. This re-caching will need to be redone for every change we make to the dictized representation of a package.
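
A rough sketch of the table for option 1, using sqlite3 for illustration. The column names follow the text above, while the table name, column types and helper functions are assumptions. The stable_json helper shows the stable sorting that diff storage would rely on.

```python
import json
import sqlite3

# Column names follow the ticket; types and the table name are assumptions.
SCHEMA = """
CREATE TABLE package_dict_revision (
    id TEXT PRIMARY KEY,
    user TEXT,
    package_id TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    revision_id TEXT,          -- NULL until persisted to the database proper
    parent_id TEXT,            -- the row this package_dict was changed from
    dictized_package TEXT NOT NULL
)
"""


def stable_json(package_dict):
    """Serialise with sorted keys so equal dicts give identical text."""
    return json.dumps(package_dict, sort_keys=True)


def store_pending(conn, row_id, user, package_dict, timestamp, parent_id=None):
    """Store an unapproved edit; revision_id stays NULL until accepted."""
    conn.execute(
        'INSERT INTO package_dict_revision VALUES (?, ?, ?, ?, NULL, ?, ?)',
        (row_id, user, package_dict['id'], timestamp, parent_id,
         stable_json(package_dict)))
```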

2.

Every normal package read needs to look at the revision table to see the last accepted change in the dictized representation of the package. We also need a way to get what the dictized representation of the package was like at any point in its revision history. This querying is non-trivial in SQL.

Participants

David Raznick to do it.

Progress.

Decided to go with option 2. However we will change the revisioning system to be like the schema attached. This gets rid of difficult querying problems caused by querying the revision tables by adding an end date, meaning you can do range queries.

The better and more normalized version of a revisioning system is outlined at https://docs.google.com/drawings/d/1Y7nMgVsrs081Pame2RdbZHlCAlV33ddTZ8VAsab1j-0/edit?hl=en_GB&authkey=CJfd8vsB. We will be a step closer to that with this change, but we will keep the current vdm more or less intact.
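
The range query that an end date makes possible can be sketched as follows. The table and column names (package_revision, revision_timestamp, expired_timestamp) are hypothetical and are not taken from the attached schema; the current row is assumed to carry a far-future end date.

```python
import sqlite3

# Table and column names are hypothetical, not the schema attached to the
# ticket. An open-ended (current) row uses a far-future end date.
FAR_FUTURE = '9999-12-31'


def package_as_of(conn, package_id, when):
    """Return the title that was current for a package at time `when`."""
    row = conn.execute(
        'SELECT title FROM package_revision '
        'WHERE id = ? AND revision_timestamp <= ? AND ? < expired_timestamp',
        (package_id, when, when)).fetchone()
    return row[0] if row else None
```

Each row is valid from its start timestamp up to, but not including, its end timestamp, so a single range condition finds the revision in force at any moment.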

1304851498000000 1325268100000000
#1127 CREP sebbacon closed fixed CREP0001: Formalise new feature discussion and definition using CREPs

Proposer: Seb Bacon
Seconder: Rufus Pollock

Abstract

When adding major new features to CKAN, a longer, more formal discussion will improve software design quality and documentation, better engage the wider community, and ensure the core team are up to date with latest developments.

I propose a formal process (CREP -- CKAN Revision and Enhancement Proposal) for making this happen.

The Problem

The current workflow for introducing major new features into CKAN is very informal, typically based around one person's great idea, which they've discussed with one or two other people in the team. The originator of the idea is typically the only person with access to all the input they've had through such discussions. Often, the only location of this information is in that person's head.

However, there is a lot of experience embodied in the CKAN community which should be drawn on before making large design decisions. This will lead to better software. Additionally, building consensus in the community around a proposal before implementation ensures positive community engagement and buy-in to new features, making them more likely to be a success.

We aren't great at documenting new features. Documentation after coding is complete is an unrewarding experience for most programmers. Requiring skeleton documentation before code is written is a good discipline that can form the basis of better documentation in the future (e.g. by a writer rather than a programmer).

Specification

Minor features don't require a CREP, and can just be entered in the issue tracking system as a bug or feature. As a rule of thumb, a feature is major if it will take more than a day to implement, or is likely to involve matters of opinion in its design.

A developer may decide that a CREP is too formal and long-winded. The decision to write a CREP is at their discretion; however, new features MUST always be proposed via email, even if this is just a couple of sentences.

If a feature requires a CREP, the proposer should find a seconder for their idea. This sanity check step happens before a CREP is written to ensure at least the possibility of consensus on the CREP.

Next, the proposer should write a CREP, starting by copying and pasting the template on the wiki into a new Trac ticket. The ticket should have a status of "new" and a Type of "CREP". The proposer should notify the ckan-dev mailing list, and possibly the ckan-discuss list for less technical CREPs.

The draft can be discussed via email, verbally, or via the trac ticket. In any case, it is the proposer's responsibility to keep the CREP updated to reflect the current consensus.

Once consensus has been reached, the ticket should be marked with the "accepted" status and assigned to a CKAN release milestone.

When an accepted CREP has been implemented, it should be resolved as "fixed".

If no consensus can be reached on a draft CREP, or for some reason an accepted CREP doesn't get completed, it should be marked as "wontfix".

If a completed CREP becomes obsolete, it should be marked as "invalid", with a note pointing to the obsoleting ticket(s).

Why do it this way

Given the distributed nature of the core team plus other volunteers, some kind of written procedure is necessary to ensure a fully documented and discussed proposal.

The idea of "Enhancement Proposals" which can be semi-formally proposed and discussed prior to implementation is common in the Open Source world (PEPs, DEPs, PLIPs, to name three).

Historic proposals, called CEPs, already exist. The proposed system is called CREP (CKAN Revision and Enhancement Proposal) to disambiguate it from the legacy proposals, and from the delicious fungus Boletus edulis.

Giving a formal structure to the proposal is useful as it gives the community a means to identify a CREP that's not had sufficient thought or discussion. An informal email thread can easily be lost and important questions (such as backwards compatibility) overlooked. The use of the proposed template empowers any community member to ask the proposer to expand on rationale, deliverables, etc.

The structure chosen is somewhere between Debian's and Plone's. It aims to give a structure to the debate, a clear start at documentation, and also prompt some thinking about implementation and timescales.

All this policy about structure should not be construed as mandatory. In particular, the later fields in the CREP template regarding Implementation Plan may be omitted if the author doesn't find them helpful.

Some projects (e.g. Debian) keep their enhancement proposals in a versioning repository; others (e.g. Plone) keep them in an issue tracking system. Trac is proposed for CKAN because we already use it for small feature proposals and for team planning. It seems unlikely that change tracking on an individual CREP will be useful; a CREP that changes sufficiently from its original form should probably be marked "obsolete" and a new CREP started. Using an issue tracking system also means we can easily track CREPs by state.

Backwards Compatibility

Some [https://bitbucket.org/okfn/ceps/src/76b274888bcf/cep/ legacy enhancement proposals], called CEPs, have previously been started.

They are currently all marked as "active". Any which require discussion should be altered by the proposer to match the new CREP specification and submitted to trac. The original CEP should be updated with a banner at the top pointing a reader to the new CREP.

Any that are now obsolete should be clearly marked as such in a banner at the top, pointing a reader to the trac for new CREPs.

Implementation plan

Deliverables

  • This CREP, agreed
  • Support for proposed statuses in Trac
  • Canned reports for listing CREPs in Trac

Risks and mitigations

  • That this CREP is agreed, but rarely acted on. This risk can be mitigated by nominating a CREP champion in the community or core team, whose job it is to say "where's the CREP for that?" and generally own the quality of CREPs.

Participants

Seb Bacon: as current Documentation Czar (May 2011), responsible for ensuring CREPs are up to date.

Progress

This document is the entire proposal.

1304601313000000 1305622850000000
#1271 enhancement rgrp rgrp ckan-sprint-2011-10-28 closed fixed CORS support

CORS - http://www.w3.org/TR/cors/ - support.

This is how you do it in Apache. We should do the equivalent in lib/base.py or similar.

    Header always set Access-Control-Allow-Origin "*"
    Header always set Access-Control-Allow-Methods "POST, PUT, GET, OPTIONS"
    Header always set Access-Control-Allow-Headers "X-CKAN-API-KEY, Content-Type"

    # Respond to all OPTIONS requests with 200 OK
    # This could be done in the webapp
    # This is needed for pre-flighted requests (POSTs/PUTs)
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} OPTIONS
    RewriteRule ^(.*)$ $1 [R=200,L]
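
One possible way to do the same thing in the webapp is a small WSGI wrapper, sketched below. The class `CorsMiddleware` and the helpers `demo_app` and `call` are hypothetical names for this example; this is not CKAN's actual implementation, and the header values are simply copied from the Apache configuration above.

```python
from wsgiref.util import setup_testing_defaults

# Same header values as the Apache configuration above.
CORS_HEADERS = [
    ("Access-Control-Allow-Origin", "*"),
    ("Access-Control-Allow-Methods", "POST, PUT, GET, OPTIONS"),
    ("Access-Control-Allow-Headers", "X-CKAN-API-KEY, Content-Type"),
]


class CorsMiddleware:
    """Hypothetical WSGI wrapper that adds CORS headers to every response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Answer pre-flight OPTIONS requests directly with 200 OK.
        if environ.get("REQUEST_METHOD") == "OPTIONS":
            start_response("200 OK", CORS_HEADERS + [("Content-Length", "0")])
            return [b""]

        def cors_start_response(status, headers, exc_info=None):
            # Append the CORS headers to whatever the wrapped app sends.
            return start_response(status, list(headers) + CORS_HEADERS, exc_info)

        return self.app(environ, cors_start_response)


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, method):
    """Invoke a WSGI app once and return (status, headers dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call(CorsMiddleware(demo_app), "OPTIONS")` exercises the pre-flight path, while `"GET"` passes through to the wrapped app with the extra headers attached.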
1313241839000000 1313753663000000
#2253 enhancement toby toby closed wontfix CMAP [super]

Somewhere for CMAP stuff not in other tickets

need to create some general tickets

  • template changes
  • general demo server setup
1332341133000000 1340038490000000