<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>CKAN: Ticket #1129: CREP0002: Moderated  Edits</title>
    <link>http://localhost/ticket/1129</link>
    <description>&lt;p&gt;
&lt;strong&gt;Proposer:&lt;/strong&gt; David Raznick&lt;br /&gt;
&lt;/p&gt;
&lt;h2 id="Abstract."&gt;Abstract.&lt;/h2&gt;
&lt;p&gt;
We are trying to achieve these goals.
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;To get people involved with making edits to CKAN metadata.
&lt;/li&gt;&lt;li&gt;To have an ownership model as to who can moderate and validate these changes
&lt;/li&gt;&lt;li&gt;To not put too huge a burden on these owners.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
In order to achieve this, a feature which lets anyone edit a package but only let the moderator/owner accept it.  The moderator should be able to look at a list of changes and accept the ones that
&lt;/p&gt;
&lt;p&gt;
This cep is not about 'if' we need such a feature, it is about 'how' we go about implementing it. Another cep may needed for the 'if' case.
&lt;/p&gt;
&lt;h2 id="TheProblem"&gt;The Problem&lt;/h2&gt;
&lt;p&gt;
We need the following to be possible.
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Storing revision of objects that are not the current active one.
&lt;/li&gt;&lt;li&gt;A way of the user viewing past revisions.
&lt;/li&gt;&lt;li&gt;Accessing not only the history of a particular object but also of related objects at that time. i.e If a resource related to a package changes we need a way to see this when looking at the package.
&lt;/li&gt;&lt;li&gt;A robust way of doing this in the face of database schema changes.
&lt;/li&gt;&lt;li&gt;Make sure database queries are quick.
&lt;/li&gt;&lt;/ul&gt;&lt;h2 id="Solutions."&gt;Solutions.&lt;/h2&gt;
&lt;ol&gt;&lt;li&gt; Store the whole dictization of the package and all its related objects every time you change anything in its dictized representation and only save to the database proper if accepted.
&lt;/li&gt;&lt;/ol&gt;&lt;blockquote&gt;
&lt;p&gt;
Pros
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;Easy to implement, we already have a preview which makes the dictized form of a package without actually saving it.  This will just need to be persisted in some way.
&lt;/li&gt;&lt;li&gt;Fast retrieval.
&lt;/li&gt;&lt;li&gt;Potential to store a branching revision tree of changes.
&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;
&lt;p&gt;
Cons
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;No easy way to remake the dictized packages historically or if there is an there a change in the way we represent packages, i.e schema changes.
&lt;/li&gt;&lt;li&gt;Will only work for the particular objects we decide to store these changes for.
&lt;/li&gt;&lt;li&gt;Stores a lot of repeated information
&lt;/li&gt;&lt;/ul&gt;&lt;ol start="2"&gt;&lt;li&gt; Write specialized queries for every read of the database looking only at the revision tables.
&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;
This method requires there to be a change in the way we use VDM, so that we manage statefulness ourselves. We will need to add other states such as 'waiting for approval'.
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Pros
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;No specialized storage required
&lt;/li&gt;&lt;li&gt;Only need to change queries when schema changes
&lt;/li&gt;&lt;li&gt;Can be made to work easily for other objects
&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;
&lt;p&gt;
Cons
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;Slower query time on read, as even looking at the last active package will need to do a fairly complicated query.
&lt;/li&gt;&lt;/ul&gt;&lt;h2 id="Implementationdetails."&gt;Implementation details.&lt;/h2&gt;
&lt;p&gt;
1.
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
A new table with columns  id, user, package_id, timestamp, revision_id, parent_id, dictized_package.
revision_id should be null unless it is actually persisted to the database.  parent_id is the id that this package_dict was changed from.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
We could store only the diffs of the dictized_package as long as we assure that everything inside the json is stably sorted, this will make getting the historical data out slower.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
Getting out the history of the dictized packages is an intensive task, as it will require replaying the whole history of all the changes and creating the dict for each change.  This re-caching will need to be redone for every change we make to dictized representation of a package.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
2.
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Every normal packages read needs to look at the revision table to see the last accepted change in the dictized representation of the package.
We also need to way to get what the dictized representation of the package was like at any point of its revision history.  This querying is non-trivial in sql.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
There are 3 ways found of doing such queries efficiently.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
These ways have been profiled/tested against package_extras in dgu.  This was chosen as its likely that the package_extras_revision table is the largest table we currently have, with over 100,000 rows.
The test was to see how long it would take to get out the package_extras for each of the 6864 packages separately as they were 100 days before the test was run.
The test scripts are attached.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ol&gt;&lt;li&gt;Using distinct on.
&lt;/li&gt;&lt;/ol&gt;&lt;blockquote&gt;
&lt;p&gt;
Pros:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;The fastest method tested taking 19.622389s
&lt;/li&gt;&lt;li&gt;The most intuitive and maintainable script to write.
&lt;/li&gt;&lt;li&gt;The shortest script.
&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;
&lt;p&gt;
Cons:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;No direct sqlalchemy support (coming in 0.7) so only hand written script.
&lt;/li&gt;&lt;li&gt;Very postgres specific and not part of the sql standard.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
&lt;/p&gt;
&lt;ol start="2"&gt;&lt;li&gt;Using Window functions
&lt;/li&gt;&lt;/ol&gt;&lt;blockquote&gt;
&lt;p&gt;
Pros:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;Very fast method tested taking 21.120075s
&lt;/li&gt;&lt;li&gt;Fairly straight forward to write
&lt;/li&gt;&lt;li&gt;Part of sql standard.
&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;
&lt;p&gt;
Cons:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;No direct sqlalchemy support yet also but coming in 0.7.
&lt;/li&gt;&lt;li&gt;Need upto date version of posgres.
&lt;/li&gt;&lt;li&gt;Not supported by sqlite, mysql.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
&lt;/p&gt;
&lt;ol start="3"&gt;&lt;li&gt;Self join
&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
Pros:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;Database agnostic.
&lt;/li&gt;&lt;/ul&gt;&lt;blockquote&gt;
&lt;p&gt;
Cons:
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;&lt;li&gt;Fairly slow 35.285168s
&lt;/li&gt;&lt;li&gt;Difficult to write in a way thats reusable
&lt;/li&gt;&lt;li&gt;Difficult to nicely in sqlalchemy sql expression
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;
If we were to use 1 or 2 we would need to drop any sqlite support for the tests.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;
There is a forth option of just returning all the rows and finding out the last change before a date however this will throw back a lot of rows especially if there are lots of edits.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="Participants"&gt;Participants&lt;/h3&gt;
&lt;p&gt;
David Raznick to do it.
&lt;/p&gt;
&lt;h2 id="Progress."&gt;Progress.&lt;/h2&gt;
&lt;p&gt;
Need to decide on the correct solution.
&lt;/p&gt;
</description>
    <language>en-us</language>
    <image>
      <title>CKAN</title>
      <url>http://assets.okfn.org/p/ckan/img/ckan_logo_shortname.png</url>
      <link>http://localhost/ticket/1129</link>
    </image>
    <generator>Trac 0.12.3</generator>
    <item>
      
        <dc:creator>kindly</dc:creator>

      <pubDate>Sun, 08 May 2011 10:49:02 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/1129</link>
      <guid isPermaLink="false">http://localhost/ticket/1129</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;sqlspeed.py&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
script run do test speed of various ways of doing a "last before this date" query
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>thejimmyg</dc:creator>

      <pubDate>Wed, 11 May 2011 12:20:19 GMT</pubDate>
      <title>priority, state changed; cc, milestone set</title>
      <link>http://localhost/ticket/1129#comment:1</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:1</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;cc&lt;/strong&gt;
              &lt;em&gt;rufus.pollock@…&lt;/em&gt; added
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                changed from &lt;em&gt;awaiting triage&lt;/em&gt; to &lt;em&gt;major&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;state&lt;/strong&gt;
                changed from &lt;em&gt;draft&lt;/em&gt; to &lt;em&gt;accepted&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;milestone&lt;/strong&gt;
                set to &lt;em&gt;ckan-v1.5&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Great proposal, here's my thinking:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;The current VDM model (where we have things like a single package table which logs changes to a package_revision table) is very well tested and works well and we should only consider a radical change with good reason
&lt;/li&gt;&lt;li&gt;Although we are making a move from *model objects* being the central components to the *logic layer functions* being the central components, there is no need yet to have the database structures we store (for revisioning or otherwise) start to look like the dictized representation. The logic layer itself is designed to do that mapping.
&lt;/li&gt;&lt;li&gt;Since we do have this logic layer though, there is less need for "magic" SQLAlchemy objects to support dicts and lists (eg the stateful objects used to make a list of tags on a package behave like a list), a few simple (and easily debuggable) queries in the logic layer would work better.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
In the longer term we may want to look at serialising more dictized-looking objects and that may lend itself to a more dictized changeset model or even a more dictized storage system (eg no-SQL) but for the time being we are not at that point.
&lt;/p&gt;
&lt;p&gt;
I recommend the following:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;We continue to use the current default branch of VDM (not the changeset branch)
&lt;/li&gt;&lt;li&gt;We continue to treat the package table (and other non-revision tables) as the most recent revision (even though the *active* revision displayed in CKAN might well be in the revision tables because more recent changes haven't been moderated yet)
&lt;/li&gt;&lt;li&gt;We slowly stop using stateful lists and dicts in CKAN because we have move control with explicit queries in the logic layer
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Sound good?
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Wed, 11 May 2011 18:53:00 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/1129#comment:2</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:2</guid>
      <description>
        &lt;p&gt;
It seems like this ticket is getting a bit confused with &lt;a class="closed ticket" href="http://localhost/ticket/1076" title="enhancement: Improve revision and package purge system (closed: fixed)"&gt;ticket:1076&lt;/a&gt; (changes re vdm) :-)
&lt;/p&gt;
&lt;p&gt;
Some general conclusions relevant to both:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;General feeling is not to change to changeset model
&lt;ul&gt;&lt;li&gt;I'm personally +0 on changeset model
&lt;/li&gt;&lt;li&gt;I would note main benefit is simplicity and orthogonality of changesets to core system. I also think cost of change is small.
&lt;/li&gt;&lt;li&gt;That said we can simplify current model as well.
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;There seems some misunderstanding: change to have logic layer has almost nothing to do with being able to remove main stateful stuff in vdm. To be able to remove most of stateful stuff in vdm requires us to make some other changes (re foreign keys from revision objects to continuity)
&lt;/li&gt;&lt;li&gt;There are other simplifications we should make to vdm before embarking on this (e.g. move to &lt;a class="missing wiki"&gt;SessionExtension?&lt;/a&gt; from &lt;a class="missing wiki"&gt;MapperExtension?&lt;/a&gt;). This is easy as that work has been done in changeset branch and can be backported.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
What I don't see in this CEP as yet is any discussion of the complexities around pending stuff (e.g. do we allow multiple pending, do new edits get shown current or latest pending). I'll try and comment more on this but a lot of this was in original vdm discussion last summer.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Thu, 12 May 2011 14:13:21 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/1129#comment:3</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:3</guid>
      <description>
        &lt;p&gt;
Replying to &lt;a href="http://localhost/ticket/1129#comment:2" title="Comment 2 for Ticket #1129"&gt;rgrp&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
It seems like this ticket is getting a bit confused with &lt;a class="closed ticket" href="http://localhost/ticket/1076" title="enhancement: Improve revision and package purge system (closed: fixed)"&gt;ticket:1076&lt;/a&gt; (changes re vdm) :-)
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
I of course ;0 meant &lt;a class="new ticket" href="http://localhost/ticket/1077" title="enhancement: Move to simpler vdm system (new)"&gt;ticket:1077&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>kindly</dc:creator>

      <pubDate>Thu, 12 May 2011 15:01:07 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/1129#comment:4</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:4</guid>
      <description>
        &lt;blockquote class="citation"&gt;
&lt;ul&gt;&lt;li&gt;There seems some misunderstanding: change to have logic layer has almost nothing to do with being able to remove main stateful stuff in vdm. To be able to remove most of stateful stuff in vdm requires us to make some other changes (re foreign keys from revision objects to continuity)
&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p&gt;
The logic layer does not automatically help out, however it makes our life easier if we want to handle state ourselves.  For example take package tags, if we remove the stateful_m2m properties and just use normal sqlalchemy relations. We will still want statefullness (i.e active, pending, deleted) on the package_tags table. We should update those on the table ourselves in the  logic layer.
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;ul&gt;&lt;li&gt;There are other simplifications we should make to vdm before embarking on this (e.g. move to &lt;a class="missing wiki"&gt;SessionExtension?&lt;/a&gt; from &lt;a class="missing wiki"&gt;MapperExtension?&lt;/a&gt;). This is easy as that work has been done in changeset branch and can be backported.
&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;
&lt;p&gt;
I agree but event thought the &lt;a class="missing wiki"&gt;MapperExtension?&lt;/a&gt; way is not great, it is very well field tested.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>kindly</dc:creator>

      <pubDate>Thu, 19 May 2011 19:07:48 GMT</pubDate>
      <title>description changed</title>
      <link>http://localhost/ticket/1129#comment:5</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:5</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;
              modified (&lt;a href="/ticket/1129?action=diff&amp;amp;version=5"&gt;diff&lt;/a&gt;)
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>kindly</dc:creator>

      <pubDate>Thu, 19 May 2011 19:18:38 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/1129</link>
      <guid isPermaLink="false">http://localhost/ticket/1129</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;sudgested_vdm.svg&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>kindly</dc:creator>

      <pubDate>Thu, 19 May 2011 19:19:16 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/1129</link>
      <guid isPermaLink="false">http://localhost/ticket/1129</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;suggested_vdm.png&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>thejimmyg</dc:creator>

      <pubDate>Fri, 08 Jul 2011 12:41:02 GMT</pubDate>
      <title>status changed; resolution set</title>
      <link>http://localhost/ticket/1129#comment:6</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:6</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;closed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;resolution&lt;/strong&gt;
                set to &lt;em&gt;fixed&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Fri, 30 Dec 2011 18:01:40 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/1129#comment:7</link>
      <guid isPermaLink="false">http://localhost/ticket/1129#comment:7</guid>
      <description>
        &lt;p&gt;
More UI work remains in &lt;a class="closed ticket" href="http://localhost/ticket/1141" title="CREP: [super] Moderated Edits User Interface (closed: fixed)"&gt;#1141&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>