wiki:DistributingChanges

Version 6 (modified by johnbywater, 4 years ago)


Abstracting and continuing considerations from:

Wants and needs

Apparent wants and needs:

  • to distribute Changesets from one (local) CKAN Repository to a foreign CKAN Repository
    • to do this on a repeated basis
  • to be able to also pull Changesets from a foreign CKAN repository to a local one
    • to be able to do this with a foreign repository that has previously received Changesets from the local one (in particular, Changesets made on the foreign instance can be pulled back without duplicating Changesets pushed from the local repository)
  • to preserve history (perhaps so that normally each instance's Recent Changes resembles the others')
  • to support selective distribution (either by selecting for some logical subset of CKAN records, or by manual approval of changes on a case-by-case basis, or a combination, or otherwise)

Inferred wants and needs:

  • to support use through firewall (client-server: where client pushes changes on client to server, and client polls and pulls changes on server to client)
  • to support use across internet (peer-peer: each accepts listeners, each can register to listen for changes, each notifies its listeners of any new changes, each listener pulls changes it doesn't have, etc)
  • for this process to be integrated into the AccessControl

Comparison between Mercurial and CKAN Situation

Highlighted core models of Mercurial and of CKAN (compared because CKAN is thought to resemble a DVCS):

of Mercurial:

  • Repository (create with: hg init; hg clone)
  • File (create with: editor)
  • Repository History (view with: hg log; hg glog)
  • Changeset (create with: hg commit; hg merge)
  • Working Directory (create with: hg update)
  • Changeset Patch (create with: hg export)
  • Branch (create with: hg pull)

of CKAN:

  • Repository (create with: paste setup-app)
  • Package
  • Tag
  • Group
  • Recent Changes
  • Revision

Concepts that don't easily carry over from Mercurial to CKAN:

  • Mercurial Branch (CKAN just has a single change history)
  • Mercurial Merge (CKAN doesn't have any branches to merge)
  • Mercurial Push/Pull (CKAN can't send or receive foreign branches)
  • Working Directory (CKAN presents its repository directly)

Concepts that do easily carry over from Mercurial to CKAN:

  • Changeset Patch
  • Export/Import
    • Import applies changes to the working directory before committing, which differs from Pull, which doesn't affect the working directory.
    • Import normally aborts if there are outstanding changes in the working directory. This would carry over to CKAN if received changeset patches were queued and progressively applied until a conflict occurs.
    • A conflict could normally cause the queue to be held up and a notification to be sent to the site admins to intervene; after the conflicting patch is resolved, the queue would continue until the next conflict.
    • Other behaviours could include: automatic merging; automatic skipping; intervention each time.
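
The queueing behaviour described above can be sketched roughly as follows. This is only an illustration, not CKAN code: the `OnConflict` policy names, `process_queue`, and the `(attribute, old, new)` patch tuples are all hypothetical, and the conflict test and apply step are passed in as plain callables.

```python
from collections import deque
from enum import Enum


class OnConflict(Enum):
    HOLD = "hold"  # stop the queue and wait for a site admin (assumed default)
    SKIP = "skip"  # discard the conflicting patch and carry on


def process_queue(queue, model, conflicts, apply, on_conflict=OnConflict.HOLD):
    """Apply queued patches in FIFO order and return those that were applied.

    With HOLD, a conflicting patch stays at the head of the queue until it
    is resolved; with SKIP, it is dropped and processing continues.
    """
    applied = []
    while queue:
        patch = queue[0]
        if conflicts(model, patch):
            if on_conflict is OnConflict.HOLD:
                break  # a real system would notify the site admins here
            queue.popleft()
            continue
        apply(model, patch)
        applied.append(queue.popleft())
    return applied


# Hypothetical patch shape: (attribute, old value, new value).
def demo_conflicts(model, patch):
    attribute, old, _new = patch
    return model.get(attribute) != old


def demo_apply(model, patch):
    attribute, _old, new = patch
    model[attribute] = new


demo_model = {"title": "A"}
demo_queue = deque([("title", "A", "B"), ("title", "X", "Y"), ("title", "B", "C")])
held = process_queue(demo_queue, demo_model, demo_conflicts, demo_apply)
```

In this run the first patch applies, the second conflicts, and the queue is held with two patches still waiting. Calling `process_queue` again with `OnConflict.SKIP` would drop the conflicting patch and apply the third.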

Actions needed in CKAN to distribute changes (functional requirements)

  • Changeset patch creation (on new revision: create serialised diff)
  • Changeset patch publication (handle register-get and entity-get, searchable, publish-subscribe)
  • Changeset patch retrieval (get new changeset patches via register-get and entity-get, and add them to the local queue)
  • Changeset patch conflict detection (possibly by asserting either that the new values of changed attributes match the current values of the same attributes in the model, so the patch would leave the local model in its current state, or that the old values of changed attributes match the current values of the same attributes in the model; with refinements for merging diffs to "long text")
  • Changeset patch resolution (human response to conflict notification, decide new state, continue the queue)
  • Changeset patch application (model merge, record changeset patch has been applied)
  • Model merge (to include add/remove aggregated children e.g. packages)
  • Package merge (to include add/remove child associations e.g. taggings)
  • Tag merge (if there are any editable attributes)
  • Group merge (if there are any editable attributes)
  • Text merge (so parts of a longer piece of text can be merged into an otherwise conflicting text attribute)
  • CKAN merge (merge queued changeset patches into the model: FIFO, for each: if changeset patch conflicts, request resolution and stop; otherwise apply changeset patch and continue with next in queue)
  • CKAN pull (retrieve new patches)
  • CKAN push (send new patches)
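
As a rough illustration of the conflict detection and application steps listed above, the sketch below treats a changeset patch as a mapping from attribute name to an `(old, new)` pair. That shape, and the function names, are assumptions made for the example. It also assumes the two assertions are combined, so a patch is accepted if either holds; a stricter instance might accept only the "old values match" case.

```python
def already_applied(current, diff):
    """True if every new value already matches the model (patch is a no-op)."""
    return all(current.get(attr) == new for attr, (_old, new) in diff.items())


def applies_cleanly(current, diff):
    """True if every old value matches the model the patch is applied to."""
    return all(current.get(attr) == old for attr, (old, _new) in diff.items())


def detect_conflict(current, diff):
    """A patch conflicts when neither assertion holds."""
    return not (already_applied(current, diff) or applies_cleanly(current, diff))


def apply_diff(current, diff):
    """Return a copy of the model with the patch applied, or raise on conflict."""
    if detect_conflict(current, diff):
        raise ValueError("conflicting changeset patch; needs resolution")
    merged = dict(current)
    merged.update({attr: new for attr, (_old, new) in diff.items()})
    return merged


package = {"title": "Old", "license": "PD"}
title_diff = {"title": ("Old", "New")}
updated = apply_diff(package, title_diff)
```

Text merge for long attributes is left out here; it would replace the simple equality checks for those attributes.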

Sub-domain models needed in CKAN:

  • changeset patch model (need uuids for changeset patches, need each to record which patches have already been received, need to record what state of application they are in, need to arrange things so that new change numbers are created only in the case where the change does not arise from applying a patch)
  • changeset patch notification publish-subscribe model (for event-driven changeset patch distribution)
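
A minimal sketch of how these two sub-domain models might fit together, under stated assumptions: `PatchRecord`, `Instance`, the state names, and the in-process `subscribe`/`receive` calls are all hypothetical stand-ins (a real deployment would notify listeners over the network). It shows uuid-based deduplication of received patches, and that change numbers are only allocated for local edits.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PatchRecord:
    diff: dict
    origin: str  # name of the instance that created the patch
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    state: str = "queued"  # assumed states: "queued", "applied", "conflict"


class Instance:
    def __init__(self, name):
        self.name = name
        self.patches = {}  # uuid -> PatchRecord, whether created or received
        self.listeners = []
        self.change_number = 0

    def subscribe(self, listener):
        """Register another instance to be notified of new patches."""
        self.listeners.append(listener)

    def commit(self, diff):
        """A local edit: allocate a new change number and publish the patch."""
        self.change_number += 1
        record = PatchRecord(diff=diff, origin=self.name, state="applied")
        self.patches[record.id] = record
        for listener in self.listeners:
            listener.receive(record)
        return record

    def receive(self, record):
        """Queue a foreign patch once; a uuid seen before is ignored."""
        if record.id in self.patches:
            return False
        self.patches[record.id] = PatchRecord(record.diff, record.origin, record.id)
        return True


local, foreign = Instance("local"), Instance("foreign")
local.subscribe(foreign)
record = local.commit({"title": ("Old", "New")})
```

After the commit, `foreign` holds a queued copy of the patch under the same uuid, so offering it the same patch again (for example when pulling back from a third instance) has no effect.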

Notes:

  • the more divergent two instances the more likely it is that a changeset will conflict, so there is a very good reason to make the changeset distribution loop as tight as possible (in order to minimise the need for conflicts to be resolved) -- hence the event-driven peer-peer considerations
  • the frequency of human intervention will also depend on the strictness of the changeset patch conflict detection and the forcefulness of the patch application mechanism

Glossary (Early Draft)

  • Repository: a given standalone instance containing domain objects.
  • Revision: metadata about a particular change such as unique-id, author, timestamp (and maybe more such as: parents, hash ...)
  • Patch: description of the changes to the domain model: e.g. a set of ids for changed versioned objects along with relevant diffs, necessary changes to non-versioned objects etc
  • Changeset: the combination of a Revision and its associated Patch
    • NB: a patch alone can be applied but it can only be applied "blind" (for example one has no idea whether the patch has already been applied earlier in the Repository history)
    • NB: similarly a Revision alone can be applied (it is a Changeset with a null Patch)
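
The glossary terms could be modelled along the following lines. This is a sketch only: the field names, the use of `None` for a null Patch, and the `apply_changeset` helper are assumptions for illustration rather than an agreed design.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Revision:
    author: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Patch:
    diffs: dict  # assumed shape: object id -> {attribute: (old, new)}


@dataclass
class Changeset:
    revision: Revision
    patch: Optional[Patch] = None  # None models a Revision with a null Patch


def apply_changeset(history, changeset):
    """Record the changeset unless its revision id is already in the history.

    The revision id is what a bare Patch lacks: without it there is no way
    to tell whether the same change was applied earlier.
    """
    if changeset.revision.id in history:
        return False
    history.append(changeset.revision.id)
    return True


history = []
changeset = Changeset(Revision("alice"), Patch({"pkg-1": {"title": ("Old", "New")}}))
first = apply_changeset(history, changeset)
second = apply_changeset(history, changeset)
```

Here the second call is refused because the Revision's id is already in the history, which is the check a "blind" Patch cannot support.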