Version 3 (modified by johnbywater, 4 years ago)

added consideration of longer text attribute merging

Abstracting and continuing considerations from:

Apparent wants and needs:

  • to distribute local CKAN changes to foreign CKAN instances
  • to preserve history (perhaps so that normally each instance's Recent Changes resembles the others')
  • to support selective distribution (either by selecting for some logical subset of CKAN records, or by manual approval of changes on a case-by-case basis, or a combination, or otherwise)

Inferred wants and needs:

  • to support use through a firewall (client-server: the client pushes its own changes to the server, and polls the server to pull the server's changes)
  • to support use across the internet (peer-peer: each instance accepts listeners, each can register to listen for changes, each notifies its listeners of any new changes, each listener pulls the changes it doesn't have, etc.)

Highlighted core models of Mercurial and of CKAN (because it is thought that CKAN is like a DVCS):

of Mercurial:

  • Repository (create with: hg init; hg clone)
  • File (create with: editor)
  • Repository History (view with: hg log; hg glog)
  • Changeset (create with: hg commit; hg merge)
  • Working Directory (create with: hg update)
  • Changeset Patch (create with: hg export)
  • Branch (create with: hg pull)

of CKAN:

  • Repository (create with: paste setup-app)
  • Package
  • Tag
  • Group
  • Recent Changes
  • Revision

Concepts that don't easily carry over from Mercurial to CKAN:

  • Mercurial Branch (CKAN just has a single change history)
  • Mercurial Merge (CKAN doesn't have any branches to merge)
  • Mercurial Push/Pull (CKAN can't send or receive foreign branches)
  • Working Directory (CKAN presents its repository directly)

Concepts that do easily carry over from Mercurial to CKAN:

  • Changeset Patch
  • Export/Import (import applies changes to the working directory before committing, unlike Pull, which doesn't affect the working directory. Import also normally aborts if there are outstanding changes in the working directory. That behaviour would carry over to CKAN if received changeset patches were queued and applied progressively: a conflict [could] normally hold up the queue and send a notification to the site admins to intervene, and once the conflicting patch is resolved the queue would continue until the next conflict. Other behaviours could include: automatic merging; automatic skipping; intervention each time)
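
The queued import behaviour described in the last item can be sketched roughly as follows. This is only an illustration: PatchQueue, the dict-shaped patch, and the three callbacks (conflicts, apply, notify_admins) are hypothetical names, not existing CKAN code. The sketch shows the "hold the queue at the first conflict" behaviour; automatic merging or skipping would replace the break.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Hypothetical: a received changeset patch is just a dict here.
Patch = Dict[str, object]


class PatchQueue:
    """FIFO queue of received patches that holds at the first conflict."""

    def __init__(
        self,
        conflicts: Callable[[Patch], bool],
        apply: Callable[[Patch], None],
        notify_admins: Callable[[Patch], None],
    ) -> None:
        self.pending: Deque[Patch] = deque()
        self.conflicts = conflicts
        self.apply = apply
        self.notify_admins = notify_admins

    def add(self, patch: Patch) -> None:
        """Append a newly received patch to the end of the queue."""
        self.pending.append(patch)

    def run(self) -> int:
        """Apply patches in arrival order; return how many were applied."""
        applied = 0
        while self.pending:
            patch = self.pending[0]
            if self.conflicts(patch):
                # Hold the queue: the patch stays at the head until resolved.
                self.notify_admins(patch)
                break
            self.apply(patch)
            self.pending.popleft()
            applied += 1
        return applied

    def resolve_head(self) -> None:
        """Drop the head patch once a human has decided the new state."""
        self.pending.popleft()
```

After an admin resolves the conflict, calling resolve_head() and then run() again continues the queue until the next conflict.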

Actions needed in CKAN to distribute changes (functional requirements):

  • Changeset patch, creation (on new revision: create serialised diff)
  • Changeset patch, publication (handle register-get and entity-get, searchable, publish-subscribe)
  • Changeset patch, retrieval (entity-get new changeset patches and add them to the local queue)
  • Changeset patch, conflict detection (possibly by asserting either that the new values of changed attributes match the current values of the same attributes in the model, so the patch would leave the local model in its current state, or that the old values of changed attributes match the current values of the same attributes in the model, with refinements for merging diffs to "long text")
  • Changeset patch, resolution (human response to conflict notification, decide new state, continue the queue)
  • Changeset patch, application (model merge, record changeset patch has been applied)
  • Model Merge (to include add/remove aggregated children e.g. packages)
  • Model Package Merge (to include add/remove child associations e.g. taggings)
  • Model Tag Merge (if there are any editable attributes)
  • Model Group Merge (if there are any editable attributes)
  • Model Text Merge (so parts of a longer piece of text can be merged into an otherwise conflicting text attribute)
  • CKAN merge (merge queued changeset patches into the model: FIFO, for each: if changeset patch conflicts, request resolution and stop; otherwise apply changeset patch and continue with next in queue)
  • CKAN pull (retrieve new patches)
  • CKAN push (send new patches)
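
The two assertions suggested for conflict detection above could look something like this. It is a minimal sketch under stated assumptions: a patch is taken to be a mapping from attribute name to an (old value, new value) pair, the model entity is a plain dict, and Status, check_patch and apply_patch are hypothetical names. It does not attempt the "long text" merge refinement.

```python
from enum import Enum
from typing import Dict, Tuple

# Hypothetical patch shape: attribute name -> (old value, new value).
Diff = Dict[str, Tuple[object, object]]


class Status(Enum):
    ALREADY_APPLIED = "already applied"
    CLEAN = "clean"
    CONFLICT = "conflict"


def check_patch(model: Dict[str, object], diff: Diff) -> Status:
    """Classify a patch against the current attribute values of one entity."""
    current = {name: model.get(name) for name in diff}
    # New values already match: applying would leave the model unchanged.
    if all(current[name] == new for name, (_old, new) in diff.items()):
        return Status.ALREADY_APPLIED
    # Old values match: the patch was made against the state we have now.
    if all(current[name] == old for name, (old, _new) in diff.items()):
        return Status.CLEAN
    return Status.CONFLICT


def apply_patch(model: Dict[str, object], diff: Diff) -> bool:
    """Apply the patch unless it conflicts; return False on conflict."""
    if check_patch(model, diff) is Status.CONFLICT:
        return False
    for name, (_old, new) in diff.items():
        model[name] = new
    return True
```

This reading is deliberately strict: a patch whose attributes are only partly applied counts as a conflict. A more forceful variant could check each attribute separately.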

Sub-domain models needed in CKAN:

  • changeset patch model (changeset patches need uuids; each instance needs to record which patches it has already received and what state of application each is in; things need to be arranged so that new change numbers are created only when a change does not arise from applying a patch)
  • changeset patch notification publish-subscribe model (for event-driven changeset patch distribution)
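
As a rough illustration of the two sub-domain models above, the sketch below keeps a per-instance register of patches keyed by uuid and notifies subscribed listeners of new local changes. All names (ChangesetPatch, PatchRegister, the state strings) are assumptions made for the example, and network transport is replaced by direct function calls.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChangesetPatch:
    diff: Dict[str, object]
    # A fresh uuid is minted only when no id is passed in.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: str = "queued"  # hypothetical states: queued, applied, conflict


class PatchRegister:
    """Per-instance record of known patches, plus a listener list."""

    def __init__(self) -> None:
        self.patches: Dict[str, ChangesetPatch] = {}
        self.listeners: List[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        """Register a callback that is told the id of each new local patch."""
        self.listeners.append(listener)

    def commit_local(self, diff: Dict[str, object]) -> ChangesetPatch:
        """A local edit: create a new patch id and notify listeners."""
        patch = ChangesetPatch(diff, state="applied")
        self.patches[patch.id] = patch
        for listener in self.listeners:
            listener(patch.id)
        return patch

    def receive(self, patch: ChangesetPatch) -> bool:
        """Queue a foreign patch under its original id; False if known."""
        if patch.id in self.patches:
            return False
        self.patches[patch.id] = ChangesetPatch(
            dict(patch.diff), id=patch.id, state="queued"
        )
        return True
```

For example, instance b could listen to instance a with a.subscribe(lambda pid: b.receive(a.patches[pid])). Received patches keep their foreign uuid, so only local edits create new identifiers; forwarding received patches on to further listeners is left out of the sketch.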


  • the more two instances diverge, the more likely it is that a changeset will conflict, so there is a very good reason to make the changeset distribution loop as tight as possible (in order to minimise the need for conflicts to be resolved); hence the event-driven peer-peer considerations
  • the frequency of human intervention will also depend on the strictness of the changeset patch conflict detection and the forcefulness of the patch application mechanism