Ticket #1037 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

More Robust Harvesting for DGU

Reported by: thejimmyg
Owned by: amercader
Priority: major
Milestone: ckan-v1.4-sprint-6
Component: uklii
Keywords:
Cc:
Repository: ckan
Theme: none

Description

CKAN's harvesting facility is now live on DGU but there are some major improvements that could be made to make it more robust and better fit the generic CKAN harvesting framework proposed in #987.

Some of the key issues:

Error reports do not currently contain the ID or title of the document with the error.
Jobs currently log only "added" and "error" outcomes; we really need a report of "added", "updated", "not changed" and "errors", with each item referencing a real metadata document for which harvesting was attempted.
We need deletion and editing of sources, without deleting the harvested documents or packages.
We need a more robust harvesting mechanism than a cron job, or we need to handle the case of multiple cron jobs running at once.
We need to know the last time a list of documents was scheduled for harvest and the last time each one was fetched.
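
The four-way report requested above could be modelled roughly as follows. This is only a sketch: `ReportItem`, `JobReport` and their fields are hypothetical names chosen for illustration and are not part of CKAN or ckanext-harvest. The point is that every entry, including errors, carries the ID and title of the metadata document it refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical names throughout; this is not the ckanext-harvest API.
OUTCOMES = ("added", "updated", "not changed", "errors")


@dataclass
class ReportItem:
    document_id: str                # ID of the harvested metadata document
    title: str                      # title of that document
    message: Optional[str] = None   # extra detail, mainly useful for errors


@dataclass
class JobReport:
    # One list of items per outcome, so every category is always present.
    items: Dict[str, List[ReportItem]] = field(
        default_factory=lambda: {outcome: [] for outcome in OUTCOMES}
    )

    def record(self, outcome, document_id, title, message=None):
        """File one document under one of the four outcomes."""
        if outcome not in self.items:
            raise ValueError("unknown outcome: %r" % (outcome,))
        self.items[outcome].append(ReportItem(document_id, title, message))

    def summary(self):
        """Return the number of documents recorded for each outcome."""
        return {outcome: len(entries) for outcome, entries in self.items.items()}
```

A job would call `record()` once per document it attempts, so an error entry can always be traced back to a specific document rather than to the job as a whole.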

Change History

comment:1 Changed 3 years ago by thejimmyg

Repository set to ckan
Theme set to none
Milestone set to ckan-v1.4-sprint-5

comment:2 Changed 3 years ago by thejimmyg

Owner changed from thejimmyg to amercader

comment:3 Changed 3 years ago by thejimmyg

Milestone changed from ckan-v1.4-sprint-5 to ckan-v1.4-sprint-6

We spent last week integrating the new harvesting architecture and testing the code, but some areas still need looking at:

The source type and label should be part of the plugin, not named in DGU.
We need warnings if a document changes but its date doesn't. Do we have these?
I noticed there are some tests in DGU; should these perhaps be in ckanext-harvest?
If active is False, the job should not be put on the queue.
If the wrong type of URL is entered, log it as an error the user can see.
Deny registration if the source is already registered.
Overwrite all extras, not just merge new ones.
During the import stage, use iswms.py to add an extra if the document is a WMS, so that we can add a link to the WMS later: https://gist.github.com/900878
Can errors/warnings be logged in the import stage? Do all fetched documents get passed to import in one go?
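
Three of the points above (deny duplicate sources, skip inactive sources when queueing, and overwrite extras instead of merging) can be sketched as below. Everything here is an assumption made for illustration: `SourceRegistry`, `DuplicateSourceError` and `overwrite_extras` are hypothetical names, sources and packages are plain dicts, and treating URLs that differ only by a trailing slash as the same source is a guess at what "already registered" should mean.

```python
class DuplicateSourceError(Exception):
    """Raised when a source URL is registered a second time (hypothetical)."""


class SourceRegistry:
    """In-memory stand-in for the harvest source table and job queue."""

    def __init__(self):
        self._sources = {}   # normalized URL -> source dict
        self.queue = []      # URLs of sources waiting to be harvested

    @staticmethod
    def _normalize(url):
        # Assumption: surrounding whitespace and a trailing slash are ignored.
        return url.strip().rstrip("/")

    def register(self, url, active=True):
        key = self._normalize(url)
        if key in self._sources:
            raise DuplicateSourceError("source already registered: %s" % url)
        source = {"url": url, "active": active}
        self._sources[key] = source
        return source

    def enqueue_job(self, url):
        """Queue a job for the source; return False if the source is inactive."""
        source = self._sources[self._normalize(url)]
        if not source["active"]:
            return False
        self.queue.append(source["url"])
        return True


def overwrite_extras(package, harvested_extras):
    """Replace the package's extras wholesale rather than merging new keys."""
    package["extras"] = dict(harvested_extras)
    return package
```

Replacing the extras dict outright means keys that have disappeared from the harvested document are also dropped from the package, which a merge would leave behind.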

comment:4 Changed 3 years ago by thejimmyg

Status changed from new to closed
state set to draft
Resolution set to fixed

Closing this now; any outstanding small issues will be logged in new tickets.
