Version 5 (modified by rgrp, 4 years ago) (diff) |
---|
Abstracting and continuing considerations from:
Terminology
To facilitate discussion here is some clarificatory terminology:
- Repository: a given standalone instance containing domain objects.
- Revision: metadata about a particular change such as unique-id, author, timestamp (and maybe more such as: parents, hash ...)
- Patch: description of the changes to the domain model: e.g. a set of ids for changed versioned objects along with relevant diffs, necessary changes to non-versioned objects etc
- Changeset: the combination of a Revision and its associated Patch
- NB: a patch alone can be applied but it can only be applied "blind" (for example one has no idea whether the patch has already been applied earlier in the Repository history)
- NB: similarly a Revision along can be applied (it is a Changeset with a null Patch)
Wants and needs
Apparent wants and needs:
- to distribute Changesets from one (local) CKAN Repository to a foreign CKAN Repository
- to do this on a repeated basis
- to be able to also pull Changesets from a foreign CKAN repository to a local one
- to be able to do this with a foreign repository that has previously received Changesets from the local (in particular Changesets made on foreign instance can be pulled back without duplication of Changesets pushed from the local repository)
- to preserve history (perhaps so that normally each instance's Recent Changes resembles the others')
- to support selective distribution (either by selecting for some logical subset of CKAN records, or by manual approval of changes on a case-by-case basis, or a combination, or otherwise)
Inferred wants and needs:
- to support use through firewall (client-server: where client pushes changes on client to server, and client polls and pulls changes on server to client)
- to support use across internet (peer-peer: each accepts listeners, each can register to listen for changes, each notifies its listeners of any new changes, each listener pulls changes it doesn't have, etc)
- for this process to be integrated into the AccessControl
Comparison between Mercurial and CKAN Situation
Highlighted core models of Mercurial and of CKAN: (because it is thought that CKAN is like a DVCS)
of Mercurial:
- Repository (create with: hg init; hg clone)
- File (create with: editor)
- Repository History (create with: hg log; hg glog)
- Changeset (create with: hg commit; hg merge)
- Working Directory (create with: hg update)
- Changeset Patch (create with: hg export)
- Branch (create with hg: pull)
of CKAN:
- Repository (create with: paste setup-app)
- Package
- Tag
- Group
- Recent Changes
- Revision
Concepts that don't easily carry over from Mercurial to CKAN:
- Mercurial Branch (CKAN just has a single change history)
- Mercurial Merge (CKAN doesn't have any branches to merge)
- Mercurial Push/Pull? (CKAN can't send or receive foreign branches)
- Working Directory (CKAN presents its repository directly)
Concepts that do easily carry over from Mercurial to CKAN:
- Changeset Patch
- Export/Import? (which BTW causes changes to be applied to the working directory before committing, that is different from Pull which doesn't affect the working directory - also worth noting that import normally aborts if there are outstanding changes in the working directory, which would carry over to CKAN should received changeset patches be queued and progressively applied unless there is a conflict, which [could] normally cause the queue to be held up, and notification to be sent to the site admins to intervene, after the conflicting patch is resolved the queue would continue until the next conflict -- other behaviours could include: automatic merging; automatic skipping; intervention each time)
Actions needed in CKAN to distribute changes (functional requirements)
- Changeset patch, creation (on new revision: create serialised diff)
- Changeset patch, publication (handle register-get and entity-get, searchable, publish-subscribe)
- Changeset patch, retrieval (get and add new changeset patches to local queue entity-get, add to queue)
- Changeset patch, conflict detection (possibly by asserting either that new values of changed attributes match current values of same attributes in model - so the patch would leave the local model in it's current state, or that old values of changed attributes match current values of same attributes in model - with refinements for merging diffs to "long text")
- Changeset patch, resolution (human response to conflict notification, decide new state, continue the queue)
- Changeset patch, application (model merge, record changeset patch has been applied)
- Model Merge (to include add/remove aggregated children e.g. packages)
- Model Package Merge (to include add/remove child associations e.g. taggings)
- Model Tag Merge (if there are any editable attributes )
- Model Group Merge (if there are any editable attributes)
- Model Text Merge (so parts of a longer piece of text can be merged into an otherwise conflicting text attribute)
- CKAN merge (merge queued changeset patches into the model: FIFO, for each: if changeset patch conflicts, request resolution and stop; otherwise apply changeset patch and continue with next in queue)
- CKAN pull (retrieve new patches)
- CKAN push (send new patches)
Sub-domain models needed in CKAN:
- changeset patch model (need uuids for changeset patches, need each to record which patches have already been received, need to record what state of application they are in, need to arrange things so that new change numbers are created only in the case where the change does not arise from applying a patch)
- changeset patch notification publish-subscribe model (for event-driven changeset patch distribution)
Notes:
- the more divergent two instances the more likely it is that a changeset will conflict, so there is a very good reason to make the changeset distribution loop as tight as possible (in order to minimise the need for conflicts to be resolved) -- hence the event-driven peer-peer considerations
- the frequency of human intervention will also depend on the strictness of the changeset patch conflict detection and the forcefulness of the patch application mechanism