Ticket #1077 (new enhancement) — at Version 4

Opened 3 years ago

Last modified 23 months ago

Switch to new vdm changeset model

Reported by: rgrp Owned by: kindly
Priority: awaiting triage Milestone: ckan-backlog
Component: ckan Keywords:
Cc: Repository: ckan
Theme: none

Description (last modified by kindly) (diff)

We have developed a new "changeset"-based model for revisioning in vdm. This has several advantages:

  • Much simpler
  • Cleaner separation of continuity from changesets
    • Supports certain operations that are impossible now (e.g. deleting all changes to a particular object, irrespective of whether other objects were changed in the same revisions).
  • Easier support for pending state and similar behaviour
  • No need to introduce new tables (and hence migrations) when making something revisioned (or not).
  • Almost identical API
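
To make the "delete all changes to one object" advantage concrete, here is a minimal in-memory sketch. The class names `Changeset` and `ChangeObject`, their fields, and the `purge_object` helper are assumptions made for illustration and are not taken from the vdm code; dropping changesets that end up empty is likewise just one possible policy.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration; not the actual vdm classes.
@dataclass
class ChangeObject:
    object_id: tuple  # (table name, primary key)
    data: dict        # dictized snapshot of the object after the change

@dataclass
class Changeset:
    id: int
    message: str
    changes: list = field(default_factory=list)

def purge_object(changesets, object_id):
    """Drop every change recorded for object_id, leaving other objects' changes intact."""
    for cset in changesets:
        cset.changes = [c for c in cset.changes if c.object_id != object_id]
    # Assumed policy: a changeset with no remaining changes is dropped.
    return [cset for cset in changesets if cset.changes]

history = [
    Changeset(1, "create two packages", [
        ChangeObject(("package", "p1"), {"title": "A"}),
        ChangeObject(("package", "p2"), {"title": "B"}),
    ]),
    Changeset(2, "edit p1", [ChangeObject(("package", "p1"), {"title": "A2"})]),
]
remaining = purge_object(history, ("package", "p1"))
```

Changeset 1 survives with only the change to `p2`, even though `p1` was changed in the same changeset.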

Possible Disadvantages

  • Difficult to query revision history. Currently we have a way of finding the diffs of particular packages. These diffs *include* changes to objects associated with packages (e.g. a resource attached to a package). With the new model the only way to get this information is by looking in the JSON stored in the change object, which is very awkward.
    • RP: not sure this is true. You can query on object id very easily in the changeset model. A possible complication is working out which objects are associated with, say, a package (e.g. having to look up the ids of package_tags), but this does not seem more problematic than what you would do in the other model to achieve the same ends.
      • DR: In looking for related objects we do joins between revision tables and the main tables. For example, we join the package_extras_revision table to the package table. We could not do this with the new model, as we would need to look into the change object table's dict for the join, which is painful. Also, the object_ids are tuples at the moment, which are difficult to join on.
  • Does not give us anything extra if we simplify our use of vdm currently. (see alternative below)
    • RP: not quite true. E.g. pending support and API.
      • DR: pending support would be there if we did not use any stateful lists/dicts and used vdm purely as a copy-on-write mechanism keyed on revision_id. I do not know what you mean by API.
  • A large change to database structure needs to happen.
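
The disagreement over query difficulty can be illustrated with a small sketch. The row layout (changeset id, table, object id, JSON payload) and both function names are assumptions for illustration only; the real change object table may look different.

```python
import json

# Hypothetical rows of a change-object table:
# (changeset_id, table, object_id, json_data)
change_rows = [
    (1, "package", "p1", json.dumps({"id": "p1", "title": "A"})),
    (1, "resource", "r1", json.dumps({"id": "r1", "package_id": "p1", "name": "a.csv"})),
    (2, "resource", "r2", json.dumps({"id": "r2", "package_id": "p2", "name": "b.csv"})),
    (3, "resource", "r1", json.dumps({"id": "r1", "package_id": "p1", "name": "a2.csv"})),
]

def history_of(table, object_id):
    """RP's point: filtering on object id never has to look inside the stored data."""
    return [cs for cs, t, oid, _ in change_rows if (t, oid) == (table, object_id)]

def package_history(package_id):
    """DR's point: related objects are only found by decoding each row's JSON."""
    changesets = set(history_of("package", package_id))
    for cs, table, _, data in change_rows:
        if table == "resource" and json.loads(data).get("package_id") == package_id:
            changesets.add(cs)
    return sorted(changesets)
```

The first function maps onto a plain indexed lookup, while the second has no equivalent of the current join between a revision table and its parent table.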


  • The main challenge with this change is schema and data migration


Every revisioned object has a revision_id and revision attribute.

Approximate algorithm:

Revision -> Changeset

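for revtype in [PackageRevision, ...]:
    # table: name of the table that revtype maps to, e.g. package_revision
    for objrev in rows_of(revtype):
        # each old revision maps to exactly one changeset
        changeset = lookupchangeset(objrev.revision_id)
        ChangeObject(changeset, (table, objrev.id), dictize(objrev))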


  • Does pkg include tag attributes or not? Or do we have to dictize pkgrev, pkg2tagrev, and tag separately? Probably the latter.
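
A runnable sketch of the Revision -> Changeset migration, under the assumption that each old revision becomes one changeset and that every revision row is dictized separately (the "latter" option). The namedtuple row types, the sample data, and the `migrate` function are hypothetical stand-ins, not the real CKAN revision classes.

```python
from collections import namedtuple

# Hypothetical stand-ins for rows of the old *_revision tables.
PackageRevision = namedtuple("PackageRevision", "id revision_id title")
TagRevision = namedtuple("TagRevision", "id revision_id name")

revision_tables = {
    "package": [
        PackageRevision("p1", "rev1", "A"),
        PackageRevision("p1", "rev2", "A2"),
    ],
    "tag": [TagRevision("t1", "rev1", "geo")],
}

def dictize(row):
    """Turn a revision row into a plain dict snapshot."""
    return dict(row._asdict())

def migrate(tables):
    """Group revision rows by revision_id: one changeset per old revision."""
    changesets = {}
    for table, rows in tables.items():
        for row in rows:
            cset = changesets.setdefault(row.revision_id, {})
            # Key each change object by (table, id), as in the algorithm above.
            cset[(table, row.id)] = dictize(row)
    return changesets

changesets = migrate(revision_tables)
```

Here `rev1` ends up with two change objects (the package and the tag) and `rev2` with one.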


As an alternative to restructuring the whole database to fit the new changeset model, it could be adequate to simplify our use of the current vdm by removing stateful lists/dicts and handling this state ourselves in the logic layer. The vdm would then be just a simple copy-on-write mechanism at the table level. This seems to cover all the advantages and disadvantages above.
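
A minimal sketch of what table-level copy-on-write with only a revision_id could look like. The in-memory `current` and `revisions` containers and the `save` and `as_of` helpers are assumptions for illustration; this is not how vdm is implemented.

```python
import itertools

_next_revision_id = itertools.count(1)

current = {}    # object id -> latest row (stands in for the main table)
revisions = []  # append-only copies, one per write (stands in for the revision table)

def save(object_id, **values):
    """Write new values and keep a copy tagged with a fresh revision_id."""
    row = dict(current.get(object_id, {}))
    row.update(values, id=object_id, revision_id=next(_next_revision_id))
    current[object_id] = row
    revisions.append(dict(row))
    return row["revision_id"]

def as_of(object_id, revision_id):
    """Return the object as it stood at revision_id, or None if it did not exist yet."""
    matches = [r for r in revisions
               if r["id"] == object_id and r["revision_id"] <= revision_id]
    return matches[-1] if matches else None

save("p1", title="A")
save("p2", title="B")
save("p1", title="A2")
```

Relationships such as a package's tags would then be reassembled in the logic layer by reading each related table at the same revision_id, rather than through stateful lists or dicts.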

Change History

comment:1 Changed 3 years ago by rgrp

  • Priority changed from awaiting triage to critical
  • Description modified (diff)

comment:2 Changed 3 years ago by kindly

  • Description modified (diff)

comment:3 Changed 3 years ago by rgrp

  • state set to draft
  • Description modified (diff)

comment:4 Changed 3 years ago by kindly

  • Description modified (diff)