{22} Trac tickets (2647 matches)

Results (301 - 400 of 2647)

1 2 3 4 5 6 7 8 9 10 11 12 13 14
Id Type Owner Reporter Milestone Status Resolution Summary Description Posixtime Modifiedtime
#1665 task seanh seanh closed fixed Begin doing research into eurovoc

How big is it? How are we going to store it? etc.

1326795828000000 1329742600000000
#2319 enhancement rgrp ckan-v1.8 closed wontfix Better auto-complete for groups on dataset edit page

Use jquery chosen?

1335211353000000 1340624083000000
#2924 defect seanh ckan 2.0 new Better docs for trans js command, and add to release process

Add a better docstring to for trans js command explaining what it does and why, and how to use it.

Also add this command to the CKAN Release Process as it's needed there

1347530671000000 1347530671000000
#2922 defect seanh ckan 2.0 new Better docstring for CKANInternationalizationExtension

I'm unsure about what's going on here. As I understand it, when we run python setup.py extract_messages it's going to use the extract_ckan() function from ckan/lib/extract.py to process the HTML files. For the HTML files that are jinja2 templates, extract_ckan() will call jinja2_cleaner() which will call regularise_html() on the strings. So the strings are regularised when they are extracted from the source files.

But then we have the parse() method of CkanInternationalizationExtension? also calling regularise_html(). I don't get what's happening here. Why do the strings need to be regularised twice?

My guess is that CkanInternationalizationExtension? is used when the strings are extracted from the templates at runtime, and they need to be regularised at this time in order to match them against the regularised strings in the mo files to find the translations to output?

Maybe CkanInternationalizationExtension? needs a better docstring saying what it does?

1347530507000000 1347530507000000
#2931 enhancement seanh ckan 2.0 new Better docstring for app_globals.py

The application's Globals object is not very informative.

1347891194000000 1347891378000000
#1290 enhancement dread dread ckan-backlog closed fixed Better error when blank database

When installing CKAN, when doing "paster serve development.ini", lots of users encounter the error for every request:

ProgrammingError: (ProgrammingError) relation "user" does not exist

This is because the database tables have not been created - they have forgotten or missed the "paster db init" step.

Can we provide a better error to say that the database is not initialised yet?

Implementation options

  1. At the start of every request we reflect the database tables and check they are there. This is rather expensive!
  1. Uncached requests to the home page start with a cheap database query. If there is an exception then return this error about database setup. I really like Drupal's page for this that has in large letters that the site is currently off-line. Below a line, in small letters, there are developer suggestions on what is wrong and where to look to fix it.
1314264255000000 1314270894000000
#2480 enhancement markw ckan-v1.9 new Better message when dataset has no resources

If a dataset has no resources the resources list currently says '(none)'.

Here is a suggested improvement, provided that a maintainer is named: 'There are no data resources here yet. For information about this data, contact the dataset maintainer.'

1338453093000000 1339771086000000
#511 requirement dread ckan-v1.3 closed worksforme Better warnings and errors when using API 1282754677000000 1297075354000000
#363 defect kindly dread ckan-backlog closed wontfix Blank revisions

Occasionally we seem to get revisions that are not connected to packages. These shouldn't appear, since all revisioned objects are linked to a package aren't they?

They appear on the 'Recently changed' list on the home page with an empty 'Packages' column.

1278947772000000 1310125872000000
#1581 enhancement mark.wainwright@… johnglover ckan-future new Blog post about Google Analytics extension for CKAN

The CKAN Google Analytics extension has been updated to work with the latest version of CKAN, could make for a nice blog post.

Can ping John Glover in January for any details required.

Key link is: http://thedatahub.org/analytics/dataset/top though this should probably move to be under stats (e.g. http://thedatahub.org/stats/usage)

1324402800000000 1325474274000000
#2747 enhancement toby shevski demo phase 2 closed fixed Breadcrumb should use title not URL of dataset

For example here: http://s031.okserver.org:2375/en/dataset/new_resource/gold-prices The breadcrumb shows "gold-prices" which is not the title of the dataset

1343211742000000 1343215679000000
#1527 enhancement icmurray icmurray ckan-sprint-2012-01-23 closed fixed Break DGU package edit form into sections
  • use javascript to selectively hide/show parts of the form
  • there's no validation between steps at this stage. It's still a "big save button at the end".
1323174829000000 1327589456000000
#1258 enhancement kindly kindly ckan-sprint-2011-10-28 closed fixed Bring purge revision into ckan repo from vdm

In order to make purge revision work correctly with the moderated edits we need to modify purge revision in vdm. This is best modified in ckan so we will override the vdm one in the reposotory.

1312289539000000 1319812452000000
#986 defect wwitzel3 thejimmyg ckan-v1.4-sprint-2 closed fixed Broken link report from the ckanext-qa code

Should have the following features:

  • A list of all packages with broken links on the resource
  • Under each packages a description of any resources with broken links to include:
    • Link
    • Description
    • Format
    • First found to be broken date

[If you can't do the last one yet, don't worry]

1297683994000000 1297812401000000
#1719 defect rgrp dread ckan-sprint-2012-02-06 closed fixed Broken links for non-Gravatar use icons - 0.25d

Super ticket: #1506

e.g. http://thedatahub.org/user The users with Gravatar have their nice user icons, but the majority of users (without Gravatars) have 'broken link' symbols. Same problem wherever users are shown, such as the dataset history pages.

Broken on 1.5.2b - currently on thedatahub.org.

1327938788000000 1328541630000000
#5 enhancement johnbywater johnbywater v0.3 closed fixed Browse list of packages and select one to view

As a

Visitor

I want to

Browse a list of packages resulting from a search or browse request (see other use cases)

So that

I can select one of the packages to view in more detail (-> viewing an individual package ticket:6)

Notes

  • When browsing a list of packages you should be able to see summary information about the package such as title (though this may be shortened in order to conveniently fit the list
  • The list should be broken up into pages so that the number of packages per page should be kept to a reasonable number (<= 50). Response time should be kept reasonable
1152549884000000 1185473622000000
#183 enhancement rgrp rgrp closed worksforme Browse packages by rating

At moment order packages by title.

1257534606000000 1290604779000000
#520 requirement pudo pudo iati-1 closed fixed Browseable web interface onto the data
  • e.g. find/browse by country and by publishing entity and by donor
1282893270000000 1283538080000000
#1417 defect dread dread ckan-sprint-2011-10-24 closed fixed Browser language detection doesn't work

In Firefox:

  • Edit | Preferences | Content | Languages - put 'de' at top language
  • Clear cookies
  • Load thedatahub.org
  • it should be in German, but it's in the default language - English.
1319539010000000 1319651617000000
#1457 defect jilly.mathews ckan-future new Bug with DataNL instance

"when logging into http://register.data.overheid.nl/ with OpenID, the /user/me page gives a 404. CKAN version 1.3.4."

n the manual it says an API key kan be created via http://test.ckan.net/user/me /. However, when I try the corresponding http://register.data.overheid.nl/user/me, I get a 404 error (not found).

1320925041000000 1320925041000000
#2318 enhancement seanh rgrp ckan-sprint-2012-04-30 closed fixed Bug with Portugese translation and Javascript

If you switch to Portugese and try to add a resource on dataset edit it will fail. This is because there is a string translation in js_strings.html that has quotes in it. When this is inserted into the file this causes a js exception which then prevents any further js processing.

Specifically:

CKAN.Strings.youHaveUnsavedChanges = "Você tem alterações não salvas. Certifique-se de ter clicado "Salvar Alterações" abaixo antes de sair desta página.";
Uncaught SyntaxError: Unexpected identifier

To fix is simple i imagine: need to escape " correctly in js_strings.html (or translations used).

I have temporarily patched this live in order to allow the hackday here to continue.

1335033889000000 1335774178000000
#314 defect johnbywater johnbywater closed fixed Bugs getting revisions from the REST API

Bug report regarding getting revisions:

Getting revisons by ID (on the latest ID) GET "http://test-hmg.ckan.net/api/search/revision?since_revision=44aac9b6-ba24-43a8-87a1-f6923dc523ff"

Returns a whole load of stuff (it's also quite slow - about 10 seconds)

I'm expecting it to return just an empty array - am I doing something wrong here - if so could you clarify correct use of the API?

GET "http://test-hmg.ckan.net/api/search/revision?since_time=2010-04-30T23:45" Returns the empty string - I'd expect an empty array ie []

GET "http://test-hmg.ckan.net/api/search/revision?since_time=2010-04-31T23:45" Returns an internal server error 500 - I think it should probably be "bad Request" 400 (the date is invalid)

1273743755000000 1276523983000000
#1106 defect rgrp rgrp ckan-v1.4-sprint-6 closed fixed Bugs related to routes arising from API refactor + removal of default routes

Various bugs I've been encountering:

Latter issue was masked by existence of 'default' routes:

   map.connect('/{controller}', action='index')
   map.connect('/:controller/{action}')
   map.connect('/{controller}/{action}/{id}')

Having these is, I think, bad practice as it is better to be explicit and we should therefore remove asap.

In addition I think we should be cautious about 'default' routes in core such as:

    map.connect('/api/rest/:register', controller='api', action='list',
        conditions=dict(method=['GET'])
        )

As it makes it harder for extensions to introduce their own APIs (here one could perhaps add something at /api/rest/{my-object} but only by using before_map rather than after_map).

1303747360000000 1303834069000000
#2877 enhancement kindly rgrp assigned Bugs with datastore v2

In progress

  1. [major] q does not seem to work reliably. e.g. using the setup from this gist https://gist.github.com/1930806 and doing a ?q=DE yields no results (does not work with "q=de" either)
    • q=second does work ...
  2. [major] q does not work with 2 values (see below)
  3. Query on search with limit 0 results in total of 0 (should either be null or correct total). Queries with other limits yield correct total AFAICT
    • Also weird fact that limit is returned but as as as string - should it not be an integer
  4. Types on fields: could these not be canonical and as per recline (or is it important to allow exact sql types ...)

Multiple query values

Try a query such as: "second UK" and you will get 500 error:

http://localhost:5000/api/3/action/datastore_search?resource_id=4f1299ab-a100-4e5f-ba81-e6d234a2f3bd&q=second%20UK

ProgrammingError: (ProgrammingError) syntax error in tsquery: "second UK" 'select "_id", "id", "date", "x", "y", "z", "country", "title", "lat", "lon", count(*) over() as "_full_count"\n from "4f1299ab-a100-4e5f-ba81-e6d234a2f3bd" where _full_text @@ to_tsquery(%s) limit 100 offset 0' (u'second UK',)

Suggestions

Filter support: should think in more detail about this (may want to follow recline style)

Simple filters in query parameters would be nice too ...

1345250002000000 1346320395000000
#1161 requirement pudo amercader pdeu-1 closed duplicate Build a simplified theme for PDEU

We need to offer a strongly simplified version, read-only of CKAN under publicdata.eu, with a focus on its role as search engine instead of a data catalogue.

This ticket relates to work on the PDEU theme only!

1306407835000000 1306408026000000
#172 enhancement rgrp rgrp v0.11 closed fixed Build ckan documentation using sphinx and upload

Use python sphinx to build documenation in ./doc and then upload it somewhere publicly accessible.

NB: improving the documentation is another matter (as is integrating e.g. existing api docs).

Upload location (these are docs for CKAN codebase/concept not the ckan service at ckan.net so good not to associate it too closely with ckan.net):

1256489019000000 1257532331000000
#437 bug dread ckan-v1.2 closed fixed Buildbot test failures - ascii codec

On today's buildbot: http://buildbot.okfn.org/builders/buildbot-test/builds/201

2 failures about ascii (ignore other 2)

1282223640000000 1288004009000000
#1542 enhancement dread ckan-backlog new Buttons to purge spam datasets and groups

A sysadmin should be able to easily examine a suspect group or package, determine if it was created by a spammer (as opposed to being a legitimate object that has been graffitied by a spammer) and purge it.

The existing two-stage revision delete is currently unreliable and perhaps too laborious.

Olav and Richard have needs along this line.

1323364930000000 1339774000000000
#3016 enhancement johnmartin johnmartin ckan 2.0 new CKAN 2.0 template tweaks

Just a ticket to keep track of a few suggested template changes.

1352813417000000 1352813417000000
#698 task Stiivi thejimmyg ckan-v1.3-sprint-1 closed fixed CKAN Data API v1

This proposal is to discuss adding a new API for proxying certain spreadsheet data via JSON-P to make it possible to build simple browser apps directly off the API.

See the attached proposal for information.

1287073433000000 1293649815000000
#1003 enhancement rgrp rgrp ckan-v1.4-sprint-3 closed fixed CKAN Javascript library and demonstration web interface

A plain javascript library for interfacing with CKAN would be very useful (why? see below!). It would also be nice to have a pure html + javascript web interface to CKAN both for its own sake and to act as a demonstrator for the library.

Why?

  • Development of bespoke interfaces -- much easier to edit html + javascript than to change ckan core
    • E.g. for specific communities e.g. geodata, science
    • Specialized tasks - multi-package editing
  • Very easy deployment and integration (e.g. can drop in to getthedata or other sites)
1298490086000000 1300100411000000
#576 defect sebbacon wwaites ckan-v1.3 closed fixed CKAN Requires Old Version of SQLAlchemy

Requires 0.4. 0.5 is a maintenance branch, 0.6 is current.

VDM appears to work correctly (all the tests pass) with 0.5. CKAN does not -- requires more investigation to determine exactly why.

It would actually be nice to be able to run with 0.6 or 0.5, though 0.6 will require some changes to VDM as well

1284141573000000 1294753848000000
#1615 enhancement thejimmyg ckan-v1.6 closed worksforme CKAN Should work behind a proxy server

This would allow deployment via Nginx or Apache using proxy to Paster, uWSGI. At the moment CKAN isn't aware of the proxy's IP address so when you perform an action which does a redirect (such as adding a package), CKAN redirects you to the *internal IP* not the external *proxy IP*.

We would like this work to facilitate testing within VMs as part of our new build infrastructure.

It would also be nice if CKAN worked when mounted at a path other than /. That could be dealt with in another ticket because it isn't a problem at the moment.

1325687841000000 1328888870000000
#1518 defect rgrp markbrough ckan-sprint-2011-12-19 closed fixed CKAN Upload fails if filename has spaces in it

E.g. uploading a file with spaces in it: OECD Monthly Exchange Rates.zip

Gives a 404 Not Found response to the following file: http://test.ckan.org/api/storage/metadata/2011-12-05T193046/OECD%20Monthly%20Exchange%20Rates.zip

The upload wheel keeps spinning and the user is not informed that the upload has failed.

Analysis: turns out that google storage (possibly s3 but not checked) replaces with ' ' in keys with '+' on upload. This breaks things because we try and look up metadata about upload using the filename/key we put in but of course that does not exist because google has changed name.

Fix is trivial: replace ' ' in keys / filenames with '-'.

1323114236000000 1330020742000000
#1551 enhancement ross ross ckan-backlog closed fixed CKAN auth for webstore changes

Webstore should use auth api ( #1550 ) for authenticating users accessing webstore rather than talking directly to the CKAN database. We also need it to suppose /user/ urls and /userid/ for accessing databases.

1324049966000000 1346662048000000
#1467 defect thejimmyg thejimmyg ckan-sprint-2012-01-09 closed worksforme CKAN dumps dgu miss certain publisher information

Pawel knows about this so David Read, Pawel and I need to find time to discuss it.

1321376042000000 1326120319000000
#333 enhancement dread v1.1 closed wontfix CKAN front end requirements for package notifications

Use case: new package

  1. An external front-end system provides a web page with a list of packages. Each package has the option to edit it or and there is also a button to create a new package.
  1. User: clicks 'new package'.
  1. CKAN presents the package/new form to the user.
  1. (After a couple of previews) User: clicks 'commit'.
  1. Notification message goes from CKAN to the front-end detailing the new package.
  1. The user is redirected back to the front-end web page displaying the list of packages, which contains the new one.

The notification message (step 5) has to get through to the front-end that the new package is created before the redirect (step 6). This suggests that the message sending needs to be *synchronous*, i.e. acknowledged by the front-end, before CKAN redirects the user to the front-end package listing page (step 6).

In addition, this use case suggests the front-end listens for package notifications, to save another call to CKAN to get the package details, before the displaying the list of packages. If this isn't possible (see next use case) and it must listen for revision notifications instead, then perhaps it is worth including the full package details in the payload for the revision notification message. Would there be a problem with such a large message in the next use case, with 100 packages?

Use case: CKAN imports packages

  1. CKAN administrator runs a script that adds 100 new packages into CKAN.
  1. CKAN sends notification message to front-end to report the new packages/revisions.
  1. Knowing there are new revisions, the front-end queries the CKAN revision interface to get the list of new packages.
  1. The front-end queries CKAN for each new package one-by-one.
  1. A new user request to the front-end will include the info about the new packages.

The package addition could be achieved in 1 revision, 100 revisions or some compromise:

  • If it is 1 revision then potentially there are problems displaying the long list of packages in the 'recent changes'.
  • If it is 100 revisions, then the notification webhook would be called 100 times, which creates unnecessary load on the front-end. Suppose each Webhook call-back (step 2) triggers the front-end to make a call to CKAN to get the latest revisions (step3), in this case it would make 100 calls, most of them fruitless, causing unnecessary load on CKAN.

This use case suggests a bulk import of packages should go into one revision, and therefore generate one revision notification message and 100 package notification messages. The front-end client should listen to only revision messages.

1275324042000000 1275407987000000
#837 enhancement rgrp ckan-backlog new CKAN integration with freebase gridworks / google refine

Thread: http://lists.okfn.org/pipermail/ckan-discuss/2010-November/000718.html

Scenario 1

  1. User installs Refine and CKAN extension for refine
  2. On booting refine and asked to load data they can choose from any data package on CKAN.net (or any other CKAN instance)
  3. They edit the dataset on Refine
  4. On save (or perhaps as a separate option) they are prompted as to whether they wish to sync the dataset back to CKAN (either as a new package or as a new resource on the existing package)

NB: for the dataset sync back some form of "CKAN" storage would be required (we already have storage.ckan.net running but a closer integration would be required)

Scenario 2

  1. User visits a package on CKAN.net (or another CKAN instance)
  2. There is a button on the page "View and edit this dataset in Google Refine"
  3. Click button -- ask them if they have Google refine installed
    • Yes: instructions for loading dataset into refine
    • No: load dataset in hosted version of google refine (we could run this)
  4. User edits dataset and hits save. As in previous scenario they are prompted to sync the dataset.
1291140609000000 1339774605000000
#1804 defect toby dread ckan-sprint-2012-03-05 closed fixed CKAN mounted at URL - changing language problem

e.g. http://189.9.137.65/dados/ clicking on Deutsch link is http://189.9.137.65/dados/locale?locale=de&return_to=%2Fdados%2F&hash=1dc17c315c419df850da0dd3599eefa9da76fbeb and redirect goes to http://189.9.137.65/dados/dados/?__cache=97995106 so /dados/dados/ when it should be /dados.

Affects CKAN 1.6b only (not yet released).

1329484956000000 1330347185000000
#441 requirement dread dread ckan-v1.3 closed duplicate CKAN read-only state

When performing maintenance on CKAN it may be necessary to make CKAN obviously read-only, telling the users and restricting access to 'edit' pages.

Examples of use:

  • Administrator wants to upgrade CKAN or move it to another server. During this time the database is being administered and either edits are lost or can't be done.
  • A CKAN is used just for distributing metadata and so is always read-only. Updates may still arrive through direct db manipulation, e.g.:
    • another (but writable) CKAN instance is connected to the same db
    • restoring database dumps from another CKAN db
  • Should a security be breached, all editing could be stopped
1282227314000000 1292586309000000
#847 enhancement pudo rgrp ckan-v1.3-sprint-2 closed fixed CKAN search using SOLR backend

This is a meta-ticket to pull together all the work on SOLR as a backend for CKAN search. (Work on SOLR search has been going on since March of 2010).

3 key aspects of this:

  • Index package information in SOLR - ticket:353 - SOLR search indexing (index CKAN)
    • Synchronously (generify Search Index update method to support SOLR or postgres?) - #317: Make search pluggable
    • Asynchroously: via worker with queue
      • #324 Search indexing using notifications
      • #669: Refactor Solr (indexer) to become a generic worker
      • #874: Extract Solr search backend into an extension
  • In core support using SOLR backend as well as postgres for queries
    • #317: Make search pluggable (?)
  • Integrate into WUI (if changes needed)
    • #672 - Facets in CKAN search results

Extras:

  • Solr index testing tool #431
1291639273000000 1295259902000000
#952 task wwaites pudo ckan-backlog closed invalid CKAN should run under nginx/uswgi

second part of #908

1296730498000000 1310125204000000
#1384 task rgrp shevski ckan-backlog new CKAN wiki needs updating to refer to thedatahub.org instead of ckan.net and datasets instead of packages

Most articles still refer and link to ckan.net, wiki.ckan.net and to packages

1318414077000000 1318414077000000
#2682 defect seanh seanh ckan-v1.8 closed fixed CKAN's internal tracking counts each view twice, needs unit tests

CKAN's internal tracking seems to count each page view twice, the problem appears to be with the SQL in the update_tracking() method in ckan/lib/cli.py.

The internal tracking feature needs some tests, and some of the code could maybe do with some more explanatory comments, e.g. what is the intended difference between count and running_total?

1342446402000000 1343225636000000
#1067 enhancement dread dread ckan-v1.4-sprint-5 closed fixed CLI for loading/dumping complete databases

Use 'db dump' and 'db load' for 'pg_dump' and 'psql -f' of a database. Use pylons config to find out database options.

1301645463000000 1302186503000000
#2401 enhancement dread dread ckan-sprint-2012-05-29 closed fixed CLI for time/speed profiling

To enable you to easily track down what is taking all the time when you make a request.

1337269571000000 1337273042000000
#1114 enhancement dread dread ckan-v1.4-sprint-7 closed fixed CLI for viewing search index of a package

To see what lexemes are generated for debug purposes.

1304078353000000 1304085484000000
#2253 enhancement toby toby closed wontfix CMAP [super]

Somewhere for CMAP stuff not in other tickets

need to create some general tickets

  • template changes
  • general demo server setup
1332341133000000 1340038490000000
#1271 enhancement rgrp rgrp ckan-sprint-2011-10-28 closed fixed CORS support

CORS - http://www.w3.org/TR/cors/ - support.

This is what you do in Apache. Should do this in lib/base.py or similar.

    Header always set Access-Control-Allow-Origin "*"
    Header always set Access-Control-Allow-Methods "POST, PUT, GET, OPTIONS"
    Header always set Access-Control-Allow-Headers "X-CKAN-API-KEY, Content-Type"

    # Respond to all OPTIONS requests with 200 OK
    # This could be done in the webapp
    # This is need for pre-flighted requests (POSTs/PUTs)
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} OPTIONS
    RewriteRule ^(.*)$ $1 [R=200,L]
1313241839000000 1313753663000000
#1127 CREP sebbacon closed fixed CREP0001: Formalise new feature discussion and definition using CREPs

Proposer: Seb Bacon
Seconder: Rufus Pollock

Abstract

When adding major new features to CKAN, a longer, more formal discussion will improve software design quality and documentation, better engage the wider community, and ensure the core team are up to date with latest developments.

I propose a formal process (CREP -- CKAN Revision and Enhancement Proposal) for making this happen.

The Problem

The current workflow for introducing major new features into CKAN is very informal, typically based around one person's great idea, which they've discussed with one or two other people in the team. The originator of the idea is typically the only person with access to all the input they've had through such discussions. Often, the only location of this information is in that person's head.

However, there is a lot of experience embodied in the CKAN community which should be drawn on before making large design decisions. This will lead to better software. Additionally, building consensus in the community around a proposal before implementation ensures positive community engagement and buy-in to new features, making them more likely to be a success.

We aren't great at documenting new features. Documentation after coding is complete is an unrewarding experience for most programmers. Requiring skeleton documentation before code is written is a good discipline that can form the basis of better documentation in the future (e.g. by a writer rather than a programmer).

Specification

Minor features don't require a CREP, and can just be entered in the issue tracking system as a bug or feature. As a rule of thumb, a feature is major if it will take more than a day to implement, or is likely to involve matters of opinion in its design.

A developer may decide that a CREP is too formal and long-winded. The decision to write a CREP is at at their discretion; however, new features MUST always be proposed via email, even if this is just a couple of sentences.

If a feature requires a CREP, the proposer should find a seconder for their idea. This sanity check step happens before a CREP is written to ensure at least the possibility of consensus on the CREP.

Next the proposer should write a CREP, starting by copying and pasting the template on the wiki into a new Trac ticket. This will be with a status of "new" and Type of "CREP". The proposer should notify the ckan-dev mailing list, and possibly the ckan-discuss list for less technical CREPs.

The draft can be discussed via email, verbally, or via the trac ticket. In any case, it is the proposer's responsibility to keep the CREP updated to reflect the current consensus.

Once consensus has been reached, the ticket should be marked with the "accepted" status and assigned to a CKAN release milestone.

When an accepted CREP has been implemented, it should be resolved as "fixed".

If no consensus can be reached on a draft CREP, or for some reason an accepted CREP doesn't get completed, it should be marked as or "wontfix".

If a completed CREP becomes obsolete, it should be marked as "invalid", with a note pointing to the obsoleting ticket(s)

Why do it this way

Given the distributed nature of the core team plus other volunteers, some kind of written procedure is necessary to ensure a fully documented and discussed proposal.

The idea of "Enhancement Proposals" which can be semi-formally proposed and discussed prior to implementation is common in the Open Source world (PEPs, DEPs, PLIPs, to name three).

Existing historic proposals exist, called CEPs. The proposed system is called CREP (CKAN revision or enhancement proposal) to disambiguate it from the legacy proposals, and from the delicious fungus Boletus Edulis.

Giving a formal structure to the proposal is useful as it gives the community a means to identify a CREP that's not had sufficient thought or discussion. An informal email thread can easily be lost and important questions (such as backwards compatibility) overlooked. The use of the proposed template empowers any community member to ask the proposer to expand on rationale, deliverables, etc.

The structure chosen is somewhere between Debian's and Plone's. It aims to give a structure to the debate, a clear start at documentation, and also prompt some thinking about implementation and timescales.

All this policy about structure should not be construed as mandatory. In particular, the later fields in the CREP template regarding Implementation Plan may be omitted if the author doesn't find them helpful.

Some projects (e.g. Debian) keep their enhancement proposals in a versioning repository; others (e.g. Plone) keep them in an issue tracking system. Trac is proposed for CKAN because we already use it for small feature proposals and for team planning. It seems unlikely that change tracking on an individual CREP will be useful; a CREP that changes sufficiently from its original form should probably be marked "obselete" and a new CREP started. Using an issue tracking system also means we can easily track CREPs by state.

Backwards Compatibility

Some [https://bitbucket.org/okfn/ceps/src/76b274888bcf/cep/ legacy enhancement proposals], called CEPs, have previously been started.

They are currently all marked as "active". Any which require discussion should be altered by the proposer to match the new CREP specification and submitted to trac. The original CEP should be updated with a banner at the top pointing a reader to the new CREP.

Any that are now obselete should be clearly marked as such in a banner at the top, pointing a reader to the trac for new CREPs.

Implementation plan

Deliverables

  • This CREP, agreed
  • Support for proposed statuses in Trac
  • Canned reports for listing CREPs in Trac

Risks and mitigations

  • That this CREP is agreed, but rarely acted on. This risk can be mitigated by nominating a CREP champion in the community or core team, whose job it is to say "where's the CREP for that?" and generally own the quality of CREPS

Participants

Seb Bacon: as current Documentation Czar (May 2011), responsible for ensuring CREPs are up to date.

Progress

This document is the entire proposal.

1304601313000000 1305622850000000
#1129 CREP kindly ckan-v1.5 closed fixed CREP0002: Moderated Edits

Proposer: David Raznick

Abstract.

We are trying to achieve these goals.

  • To get people involved with making edits to CKAN metadata.
  • To have an ownership model as to who can moderate and validate these changes
  • To not put too huge a burden on these owners.

In order to achieve this, a feature which lets anyone edit a package but only let the moderator/owner accept it. The moderator should be able to look at a list of changes and accept the ones that

This cep is not about 'if' we need such a feature, it is about 'how' we go about implementing it. Another cep may needed for the 'if' case.

The Problem

We need the following to be possible.

  • Storing revision of objects that are not the current active one.
  • A way of the user viewing past revisions.
  • Accessing not only the history of a particular object but also of related objects at that time. i.e If a resource related to a package changes we need a way to see this when looking at the package.
  • A robust way of doing this in the face of database schema changes.
  • Make sure database queries are quick.

Solutions.

  1. Store the whole dictization of the package and all its related objects every time you change anything in its dictized representation and only save to the database proper if accepted.

Pros

  • Easy to implement, we already have a preview which makes the dictized form of a package without actually saving it. This will just need to be persisted in some way.
  • Fast retrieval.
  • Potential to store a branching revision tree of changes.

Cons

  • No easy way to remake the dictized packages historically or if there is an there a change in the way we represent packages, i.e schema changes.
  • Will only work for the particular objects we decide to store these changes for.
  • Stores a lot of repeated information
  1. Write specialized queries for every read of the database looking only at the revision tables.

This method requires there to be a change in the way we use VDM, so that we manage statefulness ourselves. We will need to add other states such as 'waiting for approval'.

Pros

  • No specialized storage required
  • Only need to change queries when schema changes
  • Can be made to work easily for other objects

Cons

  • Slower query time on read, as even looking at the last active package will need to do a fairly complicated query.

Implementation details.

1.

A new table with columns id, user, package_id, timestamp, revision_id, parent_id, dictized_package. revision_id should be null unless it is actually persisted to the database. parent_id is the id that this package_dict was changed from.

We could store only the diffs of the dictized_package as long as we assure that everything inside the json is stably sorted, this will make getting the historical data out slower.

Getting out the history of the dictized packages is an intensive task, as it will require replaying the whole history of all the changes and creating the dict for each change. This re-caching will need to be redone for every change we make to dictized representation of a package.

2.

Every normal packages read needs to look at the revision table to see the last accepted change in the dictized representation of the package. We also need to way to get what the dictized representation of the package was like at any point of its revision history. This querying is non-trivial in sql.

Participants

David Raznick to do it.

Progress.

Decided to go with option 2. However we will change the revisioning system to be like the schema attached. This gets rid of difficult querying problems caused by querying the revision tables by adding an end date, meaning you can do range queries.

The better and more normalized version of a revisioning system is outlined https://docs.google.com/drawings/d/1Y7nMgVsrs081Pame2RdbZHlCAlV33ddTZ8VAsab1j-0/edit?hl=en_GB&authkey=CJfd8vsB. We will be a step closer to that, with this change, but we will keep the current vdm more or less, intact.

1304851498000000 1325268100000000
#1134 CREP amercader ckan-backlog new CREP0003: Description and Configuration of Harvesters

Proposer: Adrià Mercader

Abstract

The new harvester interface allows to create harvesters for different sources, but right now harvesters don't have many ways to describe and configure themselves. We need a way of allowing them to:

  • Expose their type and other details so they can be used internally and on the UI.
  • Define configuration settings for particular harvester instances.

The Problem

Harvester description

The current UI for adding and editing harvest sources is the same used in ckanext-dgu, and thus the 3 harvester types used in DGU to harvest various GEMINI realted sources are hardcoded in the form. The form will be migrated to a DGU-independent one, so we need the harvesters to provide all the necessary data. There is a current get_type method that returns the harvester type, but for make it compatible with the DGU forms, it returns a machine-readable string (e.g. "CSW Server"), making it error prone.

Arbitrary configuration

In the current implementation, when the harvest process is started, ckanext-harvest looks for all the available plugins that implement the IHarvester interface and calls the appropiate methods for the current stage (gather_stage,fetch_stage,import_stage). At these stages, harvesters have no way of applying arbitrary configuration options, so all harvesters of the same type behave on the same way. For instance, the CKAN harvester needs a way to define the API version to use when harvesting remote instances (Right now, the version 2 is hardcoded on the code).

Specification

Harvester description

Harvesters will need to provide the following information so the UI form can be built:

  • name: machine-readable name (e.g. "waf"). This will be the value stored in the database, and the one used by ckanext-harvest to call the appropiate harvester.
  • title: human-readable name (e.g. "Web Accessible Folder (WAF)"). This will appear in the form's select box.
  • description: a description of what the harvester does (e.g. "A Web Accessible Folder (WAF) displaying a list of GEMINI 2.1 documents"). This will appear on the form as a guidance to the user.

The way to provide it will be an info method that all harvesters must implement, which will return a dictionary with the previous elements:

    {
        'name': 'csw',
        'title': 'CSW Server',
        'description': 'A server that implements OGC's Catalog Service 
                        for the Web (CSW) standard'
    }

Arbitrary configuration

As different harvesters will have very different needs, we need to provide a way to persist arbitrary configuration flags for each harvest source. The more flexible way given the current architecture in my opinion would be to store the configuration options as a JSON encoded object as a property of the harvest source (There already is an unused DB field called config in the database) (Maybe using JsonType??).

This will mean adding an extra field in the harvest source form to allow entering the configuration. This could be just a simple text field where users enter the JSON encoded object or a more clever mechanism (i.e an "Add a configuration flag" link that adds two new text fields for the key and value for each flag, and a mechanism to later build the JSON object). In any case, this should probably be hidden in an "Advance options" section.

Why do it this way

Harvester description

The info method would provide a single point to get all the information related to the harvester, and future properties could be added to the dictionary returned without having to modify the interface.

Arbitrary configuration

There is an already existing config field in the database, so we won't need to change the model. Harvesters could access the config object at any of the stages. Of course they could provide default values in their implementations so users don't need to enter them everytime.

Implementation plan

Deliverables

Risks and mitigations

The highest risk on the harvesters info method side is that harvester implementation don't offer one of the necessary properties (namely name and title). This could fire a warning when showing the UI form or using the CLI.

Participants

Adrià Mercader to do it.

Progress

None yet.

1305108868000000 1339774554000000
#1219 defect timmcnamara closed fixed CSS issues on IE7

As reported on ckan-dev:

items in the footer of CKAN ("Packages", "Groups & Tags", "About", "Language", etc.) are shown vertically instead of horizontally in IE7. This works fine in later browsers like IE8, IE9, FF4, and latest Opera and Chrome.

This seems to exist in all recent CKAN versions up to 1.4.1a.

1310423688000000 1310740534000000
#1660 defect rgrp lucychambers ckan-sprint-2012-02-06 closed wontfix CSV preview broken - OpenSpending

This CSV resource used to preview but now the format appears to be unsupported: "We are unable to preview this type of resource: x-osdata-csv"

http://thedatahub.org/dataset/lbhf-spending-2010/resource/9661abbd-2816-4d58-8b20-3cb0eb770c69

This is used as an example by the OpenSpending? team all the time.

1326717846000000 1328013627000000
#710 task johnbywater johnbywater ckan-v1.2 closed fixed CSW GetRecordById request for given identifier 1287432675000000 1287507854000000
#623 task johnbywater johnbywater ckan-v1.2 closed fixed CSW GetRecords request for all identifiers (with CSW authentication) 1284220777000000 1287507837000000
#728 requirement amercader johnbywater ckan-backlog assigned CSW Harvesting shall be optimised in respect of reharvesting only records that have changed

Hi Will, this is important again because some CSW servers we use have over 300 documents in. Could you take a look at modifying the filter please?

1287675340000000 1310124784000000
#537 task wwaites wwaites closed duplicate Caching and Performance improvement

There are several places where performance is unacceptably slow. Even in places where it is not, the system could still be more responsive for read requests.

Introducing caching has to be done carefully and should be done in a standards compliant manner.

General strategy

  • Where possible, cache output within the pylons app (beaker).
  • Facilitate external caching in an end-user's web browser or a caching proxy
  • Slightly stale data is not necessarily much of a problem so allow the output to be cached for a relatively short period (e.g. 5-15 minutes).
  • When cache expiry has been reached, a request will be made to the server. The server should check if its internally cached data is still valid, and serve that, otherwise regenerate the data.

Tasks

These tasks should be broken into sub-tickets:

  • caching of parts of templates that are expensive to render (package list, tag list, group list)
  • caching of entire output using beaker particularly for API read operations.
  • need to perform a check to see if the cache should be invalidated by checking if anything in the output would have changed -- i.e. checking timestamps on package modifications. this is a natural place to introduce the ETag which will help browsers and web caches.
  • cache infrastructure front end - varnish, squid, etc. To do this right, the controllers need to set the cache control headers appropriately (max-age, must-revalidate). This is a good resource: http://www.mnot.net/cache_docs/#CACHE-CONTROL
    • Deploy varnish on a host dedicated to this purpose for research. This will be useful for other sites as well
    • Do not configure varnish to ignore cache control headers or otherwise behave in a non HTTP/1.1 compliant manner

Future Work

  • Investigate ckanclient library maintaining a local cache as a web browser would
  • Investigate using a CDN like Google Storage or Amazon for serving cached data.
1283184362000000 1311178929000000
#841 enhancement kindly dread ckan-v1.4-sprint-4 closed duplicate Caching docs (as a whole)

Documentation article on caching / improving performance. (To complement configuration docs.)

  • Different sorts of cache - beaker style, etags, package_dict in search results(?)
  • How each one affects performance
  • How to turn them on/off and configure them
  • Is it possible to bypass each of them in the browser or with wget/curl?
1291308879000000 1300364333000000
#668 defect thejimmyg Colin Calnan closed invalid Caching issues on API v1

It seems like the API v1 on CKAN metastable (cset:ec21f8e1c87e) has some caching issues.

Steps to test:

  1. Modify a dataset on datadotgc.ca, redirects to CKAN
  1. On save, redirects to http://www.datadotgc.ca/update/geogratisnat_hydrography_v100 which in turn redirects to http://www.datadotgc.ca/dataset/geogratisnat_hydrography_v100
  1. You can see that the Dataset has not updated correctly. Run a check on the API v1 - http://ca.ckan.net/api/1/rest/package/geogratisnat_hydrography_v100 the updates are not present
  1. Check the v2 of the API - http://ca.ckan.net/api/rest/package/geogratisnat_hydrography_v100, the updates are present.
  1. Setting the headers to 'Cache-control: no-cache' or 'Pragma: no-cache' does not work either.
1285953542000000 1311176649000000
#1223 enhancement pudo pudo closed fixed Caching of static files

StaticURLParser can have caching - use it

1310573854000000 1310573893000000
#1356 enhancement kindly amercader ckan-sprint-2011-10-10 closed fixed Can not recreate a deleted extra

If you delete an extra and later on change your mind, you can not recreate it with the same value (Different value works fine).

1317034180000000 1318279617000000
#951 defect adrian.pohl@… closed invalid Can't add a package to group

I can't add a package (e.g. http://ckan.net/package/ub-konstanz) to a group (e.g. http://ckan.net/group/bibliographic). It's neither possible when editing a package (the only group in drop down menu is "history") nor on the group page.

1296726886000000 1314031006000000
#2266 defect dread dread ckan-sprint-2012-04-02 closed fixed Can't delete all of a package's resources over REST API

Nothing happens if you set resources=[] or resources=null.

1332932504000000 1332932634000000
#1479 defect dread dread ckan-sprint-2011-12-05 closed fixed Can't edit a user with a unicode email address
  1. Register User with an email address with a unicode char (e.g. u'\u044e')
  2. View the User in the UI (/user/) or with 'user_show' Action API

Exception:

Module ckan.controllers.user:98 in read
<<          try:
                   user_dict = get_action('user_show')(context,data_dict)
               except NotFound:
                   h.redirect_to(controller='user', action='login', id=None)
>>  user_dict = get_action('user_show')(context,data_dict)
Module ckan.logic.action.get:488 in user_show
<<      check_access('user_show',context, data_dict)
       
           user_dict = user_dictize(user_obj,context)
       
           if not (Authorizer().is_sysadmin(unicode(user)) or user == user_obj.name):
>>  user_dict = user_dictize(user_obj,context)
Module ckan.lib.dictization.model_dictize:189 in user_dictize
<<      
           result_dict['display_name'] = user.display_name
           result_dict['email_hash'] = user.email_hash
           result_dict['number_of_edits'] = user.number_of_edits()
           result_dict['number_administered_packages'] = user.number_administered_packages()
>>  result_dict['email_hash'] = user.email_hash
Module ckan.model.user:59 in email_hash
<<          if self.email:
                   e = self.email.strip().lower()
               return hashlib.md5(e).hexdigest()
               
           def get_reference_preferred_for_uri(self):
>>  return hashlib.md5(e).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u044e' in position 17: ordinal not in range(128)
1321960486000000 1321961592000000
#1419 enhancement dread ckan-sprint-2011-11-07 closed invalid Can't log in via OpenID

I couldn't log into theDataHub with OpenID today. I tried both Google ID and MyOpenID. Both times the login on the remote auth server went fine, but when it returns you to theDataHub you get error "Login failed. Bad username or password."

1319543013000000 1319796164000000
#662 defect sebbacon johnbywater ckan-v1.4 closed fixed Can't put entity that is returned by posting to package register

It's because Package carries several out-of-band values, which are snagged on the way back out. Entity get response also can't be posted.

However, post response can be re-posted (because it isn't the same as the register-post/entity-get responses.

An issue for CKAN too.

Sub-ticket of #961 (form, validation, model sync meta-ticket) and depends on that work.

1285410546000000 1301076463000000
#2918 enhancement johnmartin ross ckan 2.0 closed fixed Can't remove users from organizations

When you remove someone, without adding them, the text box at the bottom (which should probably autocomplete) is empty, and this causes problems on the server.

Ideally when you add a user (select from the autocomplete) it would add another row to the table, defaulting the user to editor and setting the names to user{{X}}name and user{{X}}capacity where X is $('tr').size()

1347455572000000 1347970735000000
#1374 defect dread dread ckan-sprint-2011-10-24 closed fixed Can't switch to English if default is non-English

e.g. cz.ckan.net defaults to Czech (config option lang=cs_CZ) but it fails when you try to switch to English.

1317893975000000 1319648746000000
#1577 defect rgrp dread ckan-backlog new Can't upload file with foreign chars in filename

Looks like uploading a file with foreign characters fails due to encoding reasons.

URL: http://thedatahub.org/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ%C3%AD-%C4%8Cesk%C3%A9-republiky-_-P%C5%99%C3%ADprava-rozpo%C4%8Dtu.pdf
Module weberror.errormiddleware:162 in __call__
<<              __traceback_supplement__ = Supplement, self, environ
                   sr_checker = ResponseStartChecker(start_response)
                   app_iter = self.application(environ, sr_checker)
                   return self.make_catching_iter(app_iter, environ, sr_checker)
               except:
>>  app_iter = self.application(environ, sr_checker)
Module beaker.middleware:73 in __call__
<<                                                     self.cache_manager)
               environ[self.environ_key] = self.cache_manager
               return self.app(environ, start_response)
>>  return self.app(environ, start_response)
Module beaker.middleware:152 in __call__
<<                          headers.append(('Set-cookie', cookie))
                   return start_response(status, headers, exc_info)
               return self.wrap_app(environ, session_start_response)
           
           def _get_session(self):
>>  return self.wrap_app(environ, session_start_response)
Module routes.middleware:130 in __call__
<<                  environ['SCRIPT_NAME'] = environ['SCRIPT_NAME'][:-1]
               
               response = self.app(environ, start_response)
               
               # Wrapped in try as in rare cases the attribute will be gone already
>>  response = self.app(environ, start_response)
Module pylons.wsgiapp:125 in __call__
<<          
               controller = self.resolve(environ, start_response)
               response = self.dispatch(controller, environ, start_response)
               
               if 'paste.testing_variables' in environ and hasattr(response,
>>  response = self.dispatch(controller, environ, start_response)
Module pylons.wsgiapp:324 in dispatch
<<          if log_debug:
                   log.debug("Calling controller class with WSGI interface")
               return controller(environ, start_response)
           
           def load_test_env(self, environ):
>>  return controller(environ, start_response)
Module ckan.lib.base:123 in __call__
<<          # available in environ['pylons.routes_dict']    
               try:
                   return WSGIController.__call__(self, environ, start_response)
               finally:
                   model.Session.remove()
>>  return WSGIController.__call__(self, environ, start_response)
Module pylons.controllers.core:221 in __call__
<<                  return response(environ, self.start_response)
               
               response = self._dispatch_call()
               if not start_response_called:
                   self.start_response = start_response
>>  response = self._dispatch_call()
Module pylons.controllers.core:172 in _dispatch_call
<<              req.environ['pylons.action_method'] = func
                   
                   response = self._inspect_call(func)
               else:
                   if log_debug:
>>  response = self._inspect_call(func)
Module pylons.controllers.core:107 in _inspect_call
<<                        func.__name__, args)
               try:
                   result = self._perform_call(func, args)
               except HTTPException, httpe:
                   if log_debug:
>>  result = self._perform_call(func, args)
Module pylons.controllers.core:60 in _perform_call
<<          """Hide the traceback for everything above this method"""
               __traceback_hide__ = 'before_and_this'
               return func(**args)
           
           def _inspect_call(self, func):
>>  return func(**args)
Module ckanext.storage.controller:2 in auth_form
Module ckan.lib.jsonp:26 in jsonpify
<<      Very much modelled after pylons.decorators.jsonify .
           """
           data = func(*args, **kwargs)
           return to_jsonp(data)
>>  data = func(*args, **kwargs)
Module ckanext.storage.controller:301 in auth_form
<<          method = 'POST'
               authorize(method, bucket, label, c.userobj, self.ofs)
               data = self._get_form_data(label)
               return data
>>  authorize(method, bucket, label, c.userobj, self.ofs)
Module ckanext.storage.controller:79 in authorize
<<      if method != 'GET':
               # do not allow overwriting
               if ofs.exists(bucket, key):
                   abort(409)
               # now check user stuff
>>  if ofs.exists(bucket, key):
Module ofs.remote.botostore:53 in exists
<<          if bucket is None: 
                   return False
               return (label is None) or (label in bucket)
           
           def claim_bucket(self, bucket):
>>  return (label is None) or (label in bucket)
Module boto.s3.bucket:87 in __contains__
<<      def __contains__(self, key_name):
              return not (self.get_key(key_name) is None)
       
           def startElement(self, name, attrs, connection):
>>  return not (self.get_key(key_name) is None)
Module boto.s3.bucket:144 in get_key
<<          response = self.connection.make_request('HEAD', self.name, key_name,
                                                       headers=headers,
                                                       query_args=query_args)
               # Allow any success status (2xx) - for example this lets us
               # support Range gets, which return status 206:
>>  query_args=query_args)
Module boto.s3.connection:388 in make_request
<<          if isinstance(key, Key):
                   key = key.name
               path = self.calling_format.build_path_base(bucket, key)
               boto.log.debug('path=%s' % path)
               auth_path = self.calling_format.build_auth_path(bucket, key)
>>  path = self.calling_format.build_path_base(bucket, key)
Module boto.s3.connection:88 in build_path_base
<<      def build_path_base(self, bucket, key=''):
               return '/%s' % urllib.quote(key)
       
       class SubdomainCallingFormat(_CallingFormat):
>>  return '/%s' % urllib.quote(key)
Module urllib:1222 in quote
<<              safe_map[c] = (c in safe) and c or ('%%%02X' % i)
               _safemaps[cachekey] = safe_map
           res = map(safe_map.__getitem__, s)
           return ''.join(res)
>>  res = map(safe_map.__getitem__, s)
KeyError: u'\xed'
CGI Variables
AUTH_TYPE	'cookie'
CONTENT_TYPE	'; charset=utf-8'
DOCUMENT_ROOT	'/htdocs'
GATEWAY_INTERFACE	'CGI/1.1'
HTTP_ACCEPT	'*/*'
HTTP_ACCEPT_CHARSET	'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
HTTP_ACCEPT_ENCODING	'gzip,deflate,sdch'
HTTP_ACCEPT_LANGUAGE	'en-US,en;q=0.8'
HTTP_CACHE_CONTROL	'max-age=259200'
HTTP_CONNECTION	'keep-alive'
HTTP_COOKIE	'thedatahub_net=27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; ckan_user=elenaibp; ckan_display_name="Elena Mondo"; ckan_apikey=decd48b1-49ee-4250-bff4-98ccca9c02a5; hide_welcome_message=1; __utma=119670349.1809834699.1323782464.1324293066.1324298316.4; __utmb=119670349.3.10.1324298316; __utmc=119670349; __utmz=119670349.1323782464.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)'
HTTP_HOST	'thedatahub.org'
HTTP_REFERER	'http://thedatahub.org/dataset/edit/budget-library-czeck-republic'
HTTP_USER_AGENT	'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7'
HTTP_VIA	'1.1 localhost (squid/3.0.STABLE19)'
HTTP_X_FORWARDED_FOR	'87.114.74.190'
HTTP_X_REQUESTED_WITH	'XMLHttpRequest'
PATH	'/usr/local/bin:/usr/bin:/bin'
PATH_INFO	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
PATH_TRANSLATED	'/home/okfn/var/srvc/ckan.net/pyenv/bin/ckan.net.py/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
REMOTE_ADDR	'193.34.146.142'
REMOTE_PORT	'55419'
REMOTE_USER	u'elenaibp'
REMOTE_USER_DATA	'userid_type:unicode'
REMOTE_USER_TOKENS	['']
REQUEST_METHOD	'GET'
REQUEST_URI	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ%C3%AD-%C4%8Cesk%C3%A9-republiky-_-P%C5%99%C3%ADprava-rozpo%C4%8Dtu.pdf'
SCRIPT_FILENAME	'/home/okfn/var/srvc/ckan.net/pyenv/bin/ckan.net.py'
SCRIPT_URI	'http://thedatahub.org/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
SCRIPT_URL	'/api/storage/auth/form/2011-12-19T124447/Ministerstvo-financ\xc3\xad-\xc4\x8cesk\xc3\xa9-republiky-_-P\xc5\x99\xc3\xadprava-rozpo\xc4\x8dtu.pdf'
SERVER_ADDR	'193.34.146.146'
SERVER_ADMIN	'[no address given]'
SERVER_NAME	'thedatahub.org'
SERVER_PORT	'80'
SERVER_PROTOCOL	'HTTP/1.0'
SERVER_SIGNATURE	'<address>Apache/2.2.14 (Ubuntu) Server at thedatahub.org Port 80</address>\n'
SERVER_SOFTWARE	'Apache/2.2.14 (Ubuntu)'
WSGI Variables
application	<beaker.middleware.CacheMiddleware object at 0x7f22601c7dd0>
beaker.cache	<beaker.cache.CacheManager object at 0x7f22601c7b50>
beaker.get_session	<bound method SessionMiddleware._get_session of <beaker.middleware.SessionMiddleware object at 0x7f22601c7a90>>
beaker.session	{'_accessed_time': 1324298703.071357, '_creation_time': 1324293077.4139669}
mod_wsgi.application_group	'ckan.net|'
mod_wsgi.callable_object	'application'
mod_wsgi.listener_host	''
mod_wsgi.listener_port	'80'
mod_wsgi.process_group	'ckan.net'
mod_wsgi.reload_mechanism	'1'
mod_wsgi.script_reloading	'1'
mod_wsgi.version	(2, 8)
paste.cookies	(<SimpleCookie: __utma='119670349.1809834699.1323782464.1324293066.1324298316.4' __utmb='119670349.3.10.1324298316' __utmc='119670349' __utmz='119670349.1323782464.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)' auth_tkt='f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode' ckan_apikey='decd48b1-49ee-4250-bff4-98ccca9c02a5' ckan_display_name='Elena Mondo' ckan_user='elenaibp' hide_welcome_message='1' thedatahub_net='27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e'>, 'thedatahub_net=27a7f095fcca1ea6b36df996d595e3278b16f4538862bf7f88d49e2000b9246547c8fd0e; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; auth_tkt="f9c6ab2b0d9fcd71c4c2408bc12fab544eef1c45elenaibp!userid_type:unicode"; ckan_user=elenaibp; ckan_display_name="Elena Mondo"; ckan_apikey=decd48b1-49ee-4250-bff4-98ccca9c02a5; hide_welcome_message=1; _ _utma=119670349.1809834699.1323782464.1324293066.1324298316.4; __utmb=119670349.3.10...)|utmcmd=(none)')
paste.registry	<paste.registry.Registry object at 0x7f226194df50>
paste.throw_errors	True
pylons.action_method	<bound method StorageAPIController.auth_form of <ckanext.storage.controller.StorageAPIController object at 0x7f2261dad990>>
pylons.controller	<ckanext.storage.controller.StorageAPIController object at 0x7f2261dad990>
pylons.environ_config	{'session': 'beaker.session', 'cache': 'beaker.cache'}
pylons.pylons	<pylons.util.PylonsContext object at 0x7f2261daddd0>
pylons.routes_dict	{'action': u'auth_form', 'controller': u'ckanext.storage.controller:StorageAPIController', 'label': u'2011-12-19T124447/Ministerstvo-financ\xed-\u010cesk\xe9-republiky-_-P\u0159\xedprava-rozpo\u010dtu.pdf'}
repoze.who.identity	<repoze.who identity (hidden, dict-like) at 139785645747120>
repoze.who.logger	<logging.Logger instance at 0x7f225e23c098>
repoze.who.plugins	{'openid': <OpenIdIdentificationPlugin 139785625065680>, 'friendlyform': <FriendlyFormPlugin 139785618095248>, 'ckan.lib.authenticator:UsernamePasswordAuthenticator': <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7f2260874c10>, 'auth_tkt': <AuthTktCookiePlugin 139785625065808>, 'ckan.lib.authenticator:OpenIDAuthenticator': <ckan.lib.authenticator.OpenIDAuthenticator object at 0x7f2260874c90>}
routes.route	<routes.route.Route object at 0x7f22601a1090>
routes.url	<routes.util.URLGenerator object at 0x7f2261dadf50>
webob._parsed_query_vars	(GET([]), '')
webob.adhoc_attrs	{'language': 'en-us'}
wsgi process	'Multiprocess'
wsgi.file_wrapper	<built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f2261da9af8>
wsgiorg.routing_args	(<routes.util.URLGenerator object at 0x7f2261dadf50>, {'action': u'auth_form', 'controller': u'ckanext.storage.controller:StorageAPIController', 'label': u'2011-12-19T124447/Ministerstvo-financ\xed-\u010cesk\xe9-republiky-_-P\u0159\xedprava-rozpo\u010dtu.pdf'})
1324317659000000 1325473564000000
#3019 defect seanh ckan 2.0 new Cannot delete dataset extras

Deleting extras in the web interface is broken

1352918678000000 1352918678000000
#1659 defect dread dread ckan-sprint-2012-01-23 closed fixed Cannot logout if CKAN mounted at non-root url

If you set WSGIScriptAlias to mount CKAN at a URL other than / then you cannot logout without adjusting the OpenID logged_out_url to match in who.ini config. e.g.

[plugin:openid] ... logged_out_url = /sub/dir/user/logged_out

Note: all the other URLs in who.ini should not have the /sub/dir/ - it is just this one that doesn't take account of the mounting point.

The solution is to fix-up the repoze.who OpenID plugin to take account of the mounting point.

1326716302000000 1326747205000000
#1431 defect dread dread ckan-v1.5 closed fixed Captcha field - foreign chars cause exception

During registering a user, the user inputs foreign chars into the captcha field.

URL: http://thedatahub.org/user/register
...
Module ckan.lib.captcha:22 in check_recaptcha
<<                                     remoteip=client_ip_address,
                                          challenge=recaptcha_challenge_field,
                                          response=recaptcha_response_field))
           f = urllib2.urlopen(recaptcha_server_name, params)
           data = f.read()
>>  response=recaptcha_response_field))
Module urllib:1267 in urlencode
<<          for k, v in query:
                   k = quote_plus(str(k))
                   v = quote_plus(str(v))
                   l.append(k + '=' + v)
           else:
>>  v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)
1320078849000000 1320084104000000
#2725 enhancement toby shevski demo phase 5 new Case sensitivity on tags

My feeling is that 'country-US' and 'country-us' should be the same tag. However currently tags with caps are treated differently

see http://s031.okserver.org:2375/en/dataset/test-dataset

with TEST and test - there also get indexed twice in the search page

1342949667000000 1343030773000000
#480 requirement thejimmyg johnbywater ckan-v1.4 closed fixed Catalogue service shall conform to specification

Common requirements for running CKAN behind a (e.g Wordpress or Drupal) front-end:

  1. Unrestricted total read-only access to catalogue API for general public (e.g. voluntary organisation).
    • monitored by API key
    • not monitored by API key
  2. Restricted total read-write access to catalogue API for authorized clients (e.g. front-end system, bulk upload clients).
    • restricted by CKAN access controller
    • restricted by HTTP Auth
    • restricted by IP address
  3. Restricted total read-write access to catalogue Web UI for authorized users (e.g. site admins).
    • restricted by CKAN access controller
    • restricted by HTTP Auth
  4. Restricted partial read-write access to catalogue Web UI for authorized users (e.g. group admins).
    • restricted by CKAN access controller
    • restricted by HTTP Auth

CKAN as a catalogue service

1282422612000000 1300281551000000
#488 requirement johnbywater closed wontfix Catalogue service shall notify RDF service 1282426021000000 1320930240000000
#486 requirement johnbywater ckan-v1.3 closed duplicate Catalogue service shall notify and query SOLR service 1282425790000000 1291639321000000
#1616 defect amercader amercader ckan-sprint-2012-04-02 closed fixed Catch exceptions when rebuilding the search index

Right now if an exception is found while reindexing, the whole process stops and the remaining datasets are left out of the index. The process should continue after logging the exception. If more than a certain number of exceptions occur in a row, the process should stop.

1325844669000000 1332327635000000
#1809 enhancement johnglover johnglover ckan-sprint-2012-03-05 closed fixed Catch request exceptions in archiver link_checker task

Some request exceptions are currently not being caught (see the celery log on thedatahub for examples)

1329746267000000 1330528828000000
#1609 enhancement ross ross ckan-sprint-2012-01-23 closed fixed Celery task for ckanext-archiver to write to webstore.

From super Storage changes - #1574 - and http://ckan.okfnpad.org/newstorage we determined that ckanext-archiver should have a celery task for grabbing local file uploads and writing to webstore

Analysis

When I upload a file to CKAN:

  • End up with file in permanent storage
  • IF file is ot type ... csv,xls,xlsx,sqlite,.sql
    • End up with new db in webstore
      • Where? {username}/{resource-id}/...
        • If a single table: name it after the file name (appropriately slugified)
      • A resource *always* corresponds to a 'database' in webstore ...
      • In Data Explorer have "Sheets" tab ...
  • Resource url = /dataset/{x}/resource/{y}/link -> cached_url ...
1325582253000000 1327057030000000
#326 task dread dread v1.1 closed fixed Centralise importation of json library

Later versions of python use json which is better than simplejson, but it must be kept as an option for compatibility. So centralise the import of json to ckan.lib.helpers.

1274784223000000 1274789296000000
#760 task johnbywater johnbywater ckan-v1.3 closed duplicate Change "CSW Get Records" request class to accept and used given CSW filter 1288040993000000 1294409111000000
#758 task johnbywater johnbywater ckan-v1.3 closed duplicate Change API documentation to indicate harvest source entity has filter attribute 1288040643000000 1294409053000000
#2315 enhancement dread dread ckan-sprint-2012-04-30 closed fixed Change Cookie expiry

Change login cookie from a default expiry of 50 years to 2 years. You can also uncheck a 'remember me' checkbox on the login form for the cookie to just last the session.

Background conversation on ckan-dev:

DR: I wonder if anyone objects to the expiry of the login cookie to be changed from 50 years to 2 years? 50 years might be appropriate for thedatahub.org, but for government sites it seems (to me) to be too lax.

Toby: is this the repoze.who cookie? If so that seems sensible to me.

Rufus: Definitely agree. I would also like to see introduction of a standard "remember me" checkbox (set to true by default). At the moment a login lasts forever (until you logout) automatically.

1334919449000000 1334919522000000
#461 task dread johnbywater ckan-v1.2 closed fixed Change ONS data importing to work via API
  • Move script out to ckandgu repo
  • Change script to convert xml into package dicts
  • Test (against test.ckan.net, hmg.test.ckan.net)
  • Deploy
1282303411000000 1283250478000000
#1653 enhancement toby ross ckan-sprint-2012-03-05 closed fixed Change URLs for multilingual site

To support multiple languages we should have an easy way to specify the language as part of the URL, so that URLs are both specific and we also reduce the dependency on the session.

  • Analysis [1d] - Find the best way of implementing this and how everyone else does their language URLs.
  • Write Middleware + update url_for to take account of the language. [2d]

  • Document the language setup, and how to replicate it. [1d]
1326710590000000 1329845387000000
#2679 enhancement icmurray icmurray ckan-v1.9 new Change default behaviour of TemplateController.view to 404.

The current behaviour of TemplateController?.view() (which is the fallback controller should all others fail) is to attempt to render (as a genshi template) the requested file.

Although this may be a feature that some instances want. In general, it leads to:

  • 500s when attempting to access a normal template (eg - http://datahub.io/importer/preview)
  • A way of inadvertantly serving things you may not want to serve. (Small risk, as it needs to be renderable as a genshi template).

Solution:

  • Change the controller to 404
  • Ensure there's a way for existing ckan instances to override that behaviour should they need it.
1342436133000000 1342436133000000
#1149 enhancement kindly kindly ckan-v1.5-sprint-1 closed fixed Change domain object modification plugin to use Session extension.

This should make it more efficient as it currently does a lot of repeating work. i.e if you change a package and a resource in the same commit it sends out 2 notifications and should only really send out 1.

1305969863000000 1306090663000000
#198 enhancement rgrp dread closed fixed Change package and tag ids to uuids

See how we did it already for other things.

Note: on ckan.net older PackageRevision?.id might not be identical to Package.id but this may need sorting at this point.

1258980613000000 1266837606000000
#752 task johnbywater johnbywater ckan-v1.3 closed duplicate Change package attribute names used by Gemini harvesting to DGU "v.4" 1288039205000000 1294408472000000
#126 enhancement dread dread v0.10 closed fixed Change package state in the WUI (delete and undelete)

As a Package Admin I want to change the state of the package. In particular I wish to delete and undelete it.

(NB: this is quite separate from "purging" objects which is the term we shall use for irrevocable removal of an object from the domain model).

  • Only Package Admins (and sysadmins) should be able to change state

Implementation Suggestions

  • 'delete' action should be renamed to 'change-state' (NB: this requires a db migration ...)
  • Have new package formalchemy form (created via inheritance?) to incorporate state attribute. Suggest this is rendered as a dropdown (and may be simple object rendering of state, i.e. do NOT need to change it to a single name such 'active').
  • This form should then be used when the user satisfies is_authorized(..., model.Action.CHANGE_STATE) instead of the usual fieldset
1253789571000000 1254740244000000
#2246 enhancement johnglover johnglover ckan-sprint-2012-04-02 closed fixed Change published_by metadata field to reference group instead of a custom extra
  • probably needs a new converter, as needs to be usable via API as 'published_by'.
1332243036000000 1332864871000000
#2923 defect seanh ckan 2.0 new Change regularise -> regularize

The function is called regularise_html(), can't remember what file it's in.

1347530582000000 1347530582000000
#252 enhancement dread johnbywater closed invalid Change revision object so that it has parent(s) attribute 1266519767000000 1296477560000000
#1534 enhancement rgrp ckan-backlog new Change revisions to record userid rather than username

The use of username is problematic because username's can change.

  • Change all revision creation code to use user id (simplest is to change c.author field in lib/base.py (?))
    • (?) Add a field ipaddr for ip address of anonymous users? (or just keep putting this in author field on Revision and then acception that those won't match when we do a look up against user table)
  • Change user view page to look up against user id rather than name
  • Perform migration on existing Revision objects
    • Match should probably be against both openid and username when searching Revisions' author field (especially true on CKAN where some people have already changed their username from being their openid)
1323278790000000 1338205050000000
#2206 enhancement johnglover johnglover ckan-sprint-2012-03-19 closed fixed Change site header to match latest ODP template 1330958095000000 1331046486000000
#62 enhancement dread rgrp v0.10 closed fixed Change tags to contain any character (other than space)

Requires us to url encode the tag names when displaying them ...

1240585095000000 1250181376000000
#414 task johnbywater dread ckan-v1.2 closed fixed Change the Apache and Varnish ports

Ask Paul for a new machine for testing. Then one for varnish-live and one for varnish-test.

1281431639000000 1288003770000000
#925 defect dread ckan-backlog closed fixed Change the search box icon to remove the down arrow

Is there a good reason why the search box has a 'down arrow' icon when there is no drop-down menu? Or can this be usefully removed?

1295867593000000 1323168588000000
#725 story pudo pudo iati-3 closed fixed Change to allow anyone (logged in) to create a publisher

With a pending state set ("unapproved")

1287584630000000 1289296038000000
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Note: See TracReports for help on using and creating reports.