Proposer: David Raznick
Abstract.
We are trying to achieve these goals.
- To get people involved with making edits to CKAN metadata.
- To have an ownership model as to who can moderate and validate these changes
- To not put too huge a burden on these owners.
In order to achieve this, a feature which lets anyone edit a package but only let the moderator/owner accept it. The moderator should be able to look at a list of changes and accept the ones that
This cep is not about 'if' we need such a feature, it is about 'how' we go about implementing it. Another cep may needed for the 'if' case.
The Problem
We need the following to be possible.
- Storing revision of objects that are not the current active one.
- A way of the user viewing past revisions.
- Accessing not only the history of a particular object but also of related objects at that time. i.e If a resource related to a package changes we need a way to see this when looking at the package.
- A robust way of doing this in the face of database schema changes.
- Make sure database queries are quick.
Solutions.
- Store the whole dictization of the package and all its related objects every time you change anything in its dictized representation and only save to the database proper if accepted.
Pros
- Easy to implement, we already have a preview which makes the dictized form of a package without actually saving it. This will just need to be persisted in some way.
- Fast retrieval.
- Potential to store a branching revision tree of changes.
Cons
- No easy way to remake the dictized packages historically or if there is an there a change in the way we represent packages, i.e schema changes.
- Will only work for the particular objects we decide to store these changes for.
- Stores a lot of repeated information
- Write specialized queries for every read of the database looking only at the revision tables.
This method requires there to be a change in the way we use VDM, so that we manage statefulness ourselves. We will need to add other states such as 'waiting for approval'.
Pros
- No specialized storage required
- Only need to change queries when schema changes
- Can be made to work easily for other objects
Cons
- Slower query time on read, as even looking at the last active package will need to do a fairly complicated query.
Implementation details.
1.
A new table with columns id, user, package_id, timestamp, revision_id, parent_id, dictized_package.
revision_id should be null unless it is actually persisted to the database. parent_id is the id that this package_dict was changed from.
We could store only the diffs of the dictized_package as long as we assure that everything inside the json is stably sorted, this will make getting the historical data out slower.
Getting out the history of the dictized packages is an intensive task, as it will require replaying the whole history of all the changes and creating the dict for each change. This re-caching will need to be redone for every change we make to dictized representation of a package.
2.
Every normal packages read needs to look at the revision table to see the last accepted change in the dictized representation of the package.
We also need to way to get what the dictized representation of the package was like at any point of its revision history. This querying is non-trivial in sql.
Participants
David Raznick to do it.
Progress.
Decided to go with option 2. However we will change the revisioning system to be like the schema attached.
This gets rid of difficult querying problems caused by querying the revision tables by adding an end date, meaning you can do range queries.
The better and more normalized version of a revisioning system is outlined https://docs.google.com/drawings/d/1Y7nMgVsrs081Pame2RdbZHlCAlV33ddTZ8VAsab1j-0/edit?hl=en_GB&authkey=CJfd8vsB.
We will be a step closer to that, with this change, but we will keep the current vdm more or less, intact.
|