Version 4 (modified by dread, 4 years ago)

--

Syncing

Scenarios

  • 1-way: On setup, Server A's packages are copied to Server B. On sync, changes to packages on Server A are transferred to Server B.
  • 2-way: On setup, packages from each server are copied to the other. On sync, changes on each are transferred to the other.

We will focus now on 1-way, leaving 2-way for the future.

Requirements

  • Merging of changes from both machines. If there is a conflict then it is logged and a result is chosen.
  • Use of Server A and Server B continues undisturbed during sync.

Issues

  • Clashes of package/tag/group names.
  • Sync between CKAN instances running slightly different versions of CKAN and vdm.
  • Unversioned objects - make versioned? User, Group, Authz, Rating.
  • How to test the system.
  • Copy authorization tables? Allow API access to objects not-authorized for reading by visitor?

Use cases

  • First sync - all packages and revisions are copied from Server A to Server B.
  • Subsequent sync after package changes on A and/or B.
  • Sync after package purged on A. (Package also purged on B.)
  • Sync after package purged on B. (Package not recreated on B.)
  • Server B syncs at different times from a third server.
  • Package/Tag/Group name on Server A clashes with an existing one on Server B. Log all clashes. Merge tags and groups. How to handle packages is undecided.
  • Objects on Server A with restricted authz are by default editable on Server B.
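
The name-clash use case could be detected roughly as follows. This is only a sketch: the function name `find_name_clashes` and the name-to-UUID mappings are assumptions made for illustration, not part of CKAN. It logs every clash and leaves the decision (merge for tags and groups, undecided for packages) to the caller.

```python
import logging

log = logging.getLogger("sync.clash")


def find_name_clashes(remote, local):
    """Return names present on both servers under different UUIDs.

    `remote` and `local` map object name -> UUID (a hypothetical shape
    chosen for this sketch).
    """
    clashes = []
    for name, remote_id in sorted(remote.items()):
        local_id = local.get(name)
        # Same name and same UUID is the same object, so it is not a clash.
        if local_id is not None and local_id != remote_id:
            log.warning("name clash on %r: remote %s, local %s",
                        name, remote_id, local_id)
            clashes.append(name)
    return clashes


# Tag names on Server A (remote) and Server B (local); UUIDs are made up.
tags_a = {"geo": "a1", "health": "b2"}
tags_b = {"geo": "zz", "health": "b2", "transport": "c3"}
clashing = find_name_clashes(tags_a, tags_b)
```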

Operation

First sync - 10am

Server B asks "Give me all your revisions and unrevisioned objects."

Server A replies "Rev1 and associated revisions Pkg1Rev1, Pkg2Rev1, PkgTagRev1, PkgResource1; Rev2 with Pkg1Rev2; Tag1; User1; PackageGroup1; Group1; ratings"

Server B creates Rev1, Pkg1Rev1, Pkg2Rev1, Pkg1Rev2, Pkg1, Pkg2, PkgTagRev1, PkgResource1, Tag1, User1, Auth, PackageGroup1, Group1. UUIDs are the same as Server A.

Server B updates search vectors for Pkg1 and Pkg2.

Server B records the time of the sync - 10am.
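
The first sync described above can be sketched with plain dictionaries standing in for the two databases. The store layout (`revisions`, `objects`, `last_sync`) and the function name `first_sync` are assumptions for illustration only. The point is that objects are stored on Server B under the same UUIDs as on Server A, and that the sync time is recorded for the next request.

```python
from datetime import datetime


def first_sync(server_a, server_b, now):
    """Copy all revisions and unrevisioned objects from A to B, keeping UUIDs."""
    for kind in ("revisions", "objects"):
        for uuid, payload in server_a[kind].items():
            # Key by the same UUID that Server A uses.
            server_b[kind][uuid] = dict(payload)
    # Remember the sync time so the next sync can ask for changes since then.
    server_b["last_sync"] = now
    return server_b


# Hypothetical data; the UUIDs and the date are arbitrary.
a = {
    "revisions": {"rev-1": {"items": ["Pkg1Rev1", "Pkg2Rev1"]},
                  "rev-2": {"items": ["Pkg1Rev2"]}},
    "objects": {"tag-1": {"name": "Tag1"}, "user-1": {"name": "User1"}},
}
b = {"revisions": {}, "objects": {}}
first_sync(a, b, datetime(2010, 1, 1, 10, 0))
```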

Meanwhile - 10.20am

On Server A, a user updates Pkg1 twice, creating Rev2/Pkg1Rev3 and Rev3/Pkg1Rev4 with PkgTagRev2. User1 updates his name.

Meanwhile - 10.40am

On Server B, a user updates Pkg1 once, creating Pkg1Rev5.

Subsequent sync

Server B asks "Give me revisions and diffs since 10am."

Server A replies "Rev2/Pkg1Rev3, Rev3/Pkg1Rev4" and gives the diff of Pkg1Rev2 -> Pkg1Rev4.

Server B looks at its own revisions since 10am and sees Pkg1 now has two heads. It calculates diff of Pkg1Rev2 -> Pkg1Rev5.

Server B takes Pkg1Rev2 and applies the two diffs in the order of priority, logging any conflicts, calling the result Pkg1Rev6.
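
The merge step can be sketched as below, treating a package revision as a flat dictionary of fields and a diff as the set of changed fields. The functions `diff` and `merge_diffs` here are illustrative stand-ins, not the CKAN methods of the same name, and the field values are invented. The text does not say which server has priority, so the sketch assumes the caller passes the higher-priority diff first. Removed fields are not handled.

```python
def diff(old, new):
    """Return {field: new_value} for every field whose value changed."""
    return {k: v for k, v in new.items() if old.get(k) != v}


def merge_diffs(base, first, second):
    """Apply two diffs to `base`; `first` wins when both change a field.

    Returns (merged, conflicts), where `conflicts` lists the fields that
    both diffs changed to different values.
    """
    merged = dict(base)
    merged.update(second)  # lower-priority changes go in first
    conflicts = []
    for field, value in first.items():
        if field in second and second[field] != value:
            conflicts.append(field)  # a real system would log this
        merged[field] = value  # higher-priority change overwrites
    return merged, conflicts


pkg1_rev2 = {"title": "Rivers", "license": "OKD", "notes": ""}
pkg1_rev4 = {"title": "Rivers of Europe", "license": "OKD", "notes": "from A"}
pkg1_rev5 = {"title": "European rivers", "license": "CC0", "notes": ""}

diff_a = diff(pkg1_rev2, pkg1_rev4)  # changes made on Server A
diff_b = diff(pkg1_rev2, pkg1_rev5)  # changes made on Server B
pkg1_rev6, conflicts = merge_diffs(pkg1_rev2, diff_a, diff_b)
```

In this example both servers changed `title`, so it is reported as a conflict and Server A's value is kept. The `license` change from B and the `notes` change from A do not overlap and are both applied.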

Tickets

  • Sync set-up stored in config file (server URI). Last sync status stored in local db.
  • Repository method 'all_revs_since'. It returns all revisions since a time/revision (or since the beginning).
  • Object method 'diff'. It returns a Diff object which is the diff of two ObjectResources. It already exists for Package, but is also needed for PackageTag and PackageExtra.
  • Revision method 'serialize'.
  • Diff method 'serialize'.
  • API access to revisions: /api/search/revision?since=ab49f348-fd23-ae3c
  • API access to diffs: /api/diff/revision?diff=8f77c992-5eec-4909&oldid=ab49f348-fd23-ae3c
  • API access to unrevisioned objects?
  • API access to version of CKAN / vdm.
  • Repository method 'import_revisions'. It takes serialised revisions and diffs and creates revision objects exactly matching spec.
  • Object method 'merge_diffs'. It takes an original object and two diffs that apply to it and applies them both in a new revision.
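
Two of the tickets above can be sketched together: selecting revisions since a given time, and building the request URLs for the proposed API paths. The function names, the `timestamp` field and the host `a.example.org` are assumptions for illustration. Only the paths and query parameters (`since`, `diff`, `oldid`) come from the list above.

```python
from datetime import datetime
from urllib.parse import urlencode, urljoin


def all_revs_since(revisions, since=None):
    """Return revisions newer than `since`, oldest first (all if None)."""
    selected = [r for r in revisions
                if since is None or r["timestamp"] > since]
    return sorted(selected, key=lambda r: r["timestamp"])


def revisions_since_url(server_uri, since_id):
    """URL asking a server for the revisions made since `since_id`."""
    query = urlencode({"since": since_id})
    return urljoin(server_uri, "/api/search/revision") + "?" + query


def diff_url(server_uri, new_id, old_id):
    """URL asking a server for the diff between two revisions."""
    query = urlencode({"diff": new_id, "oldid": old_id})
    return urljoin(server_uri, "/api/diff/revision") + "?" + query


revs = [
    {"id": "rev-3", "timestamp": datetime(2010, 1, 1, 10, 25)},
    {"id": "rev-1", "timestamp": datetime(2010, 1, 1, 9, 0)},
    {"id": "rev-2", "timestamp": datetime(2010, 1, 1, 10, 20)},
]
newer = all_revs_since(revs, since=datetime(2010, 1, 1, 10, 0))
url = revisions_since_url("http://a.example.org", "ab49f348-fd23-ae3c")
```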