id summary reporter owner description type status priority milestone component resolution keywords cc repo theme
987 Common harvesting framework pudo pudo "We are now harvesting metadata from other sources in various places around CKAN. Such harvesting can include:

* CSW/WFS for INSPIRE/UKLII (yields CKAN packages)
* Catalogue scraping for LOD2 experiments (yields RDF graphs)
* Atom/DCat for LOD2 production (yields RDF graphs)
* OAI-PMH for http://datadryad.org/ and other DSpace instances (yields CKAN packages)

We should aim to consolidate the harvesting clients into a common system that is easy to extend when needed and can be re-used in different scenarios. In general, such a system would have the following stages:

* Source selection: find what to download/scrape/harvest/parse
* Index retrieval (i.e. package index)
* Item retrieval (i.e. package entity)
* (Optional: Serialization)
* Normalisation
* Loading/Merging into CKAN

Existing harvesters are at:

* CSW: https://bitbucket.org/okfn/ckanext-csw/src/
* Scraper+CKAN: https://bitbucket.org/pudo/dcat-tools/src/d5d96b06ec9a/dcat/crawl/" defect closed major lod2 duplicate ckan none
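
The stages listed in the ticket description could be sketched roughly as follows. This is only an illustration of the proposed shape, not code from any existing harvester: the names `Harvester`, `InMemoryHarvester`, `retrieve_index`, `retrieve_item`, `serialize`, `normalise`, `run` and the `load` callback are all hypothetical, and the loading step is a plain callable standing in for whatever would write into CKAN.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, List


class Harvester(ABC):
    """Hypothetical base class; one subclass per source type (CSW, OAI-PMH, ...)."""

    @abstractmethod
    def retrieve_index(self, source: str) -> Iterable[str]:
        """Index retrieval: list the item identifiers a source offers."""

    @abstractmethod
    def retrieve_item(self, source: str, item_id: str) -> Any:
        """Item retrieval: fetch one raw record from the source."""

    def serialize(self, raw: Any) -> Any:
        """Optional serialization stage; a no-op unless overridden."""
        return raw

    @abstractmethod
    def normalise(self, raw: Any) -> Dict[str, Any]:
        """Normalisation: map a raw record onto a common dict shape."""

    def run(self, sources: Iterable[str],
            load: Callable[[Dict[str, Any]], None]) -> List[Dict[str, Any]]:
        """Drive every stage in order; source selection is left to the caller."""
        loaded = []
        for source in sources:
            for item_id in self.retrieve_index(source):
                raw = self.serialize(self.retrieve_item(source, item_id))
                record = self.normalise(raw)
                load(record)  # assumption: stands in for loading/merging into CKAN
                loaded.append(record)
        return loaded


class InMemoryHarvester(Harvester):
    """Toy subclass reading from a dict, used only to exercise the stages."""

    def __init__(self, catalogue: Dict[str, Dict[str, Dict[str, str]]]):
        self.catalogue = catalogue  # {source: {item_id: raw record}}

    def retrieve_index(self, source: str) -> List[str]:
        return sorted(self.catalogue[source])

    def retrieve_item(self, source: str, item_id: str) -> Dict[str, str]:
        return self.catalogue[source][item_id]

    def normalise(self, raw: Dict[str, str]) -> Dict[str, Any]:
        return {"name": raw["id"].lower(), "title": raw["title"].strip()}
```

A new source type would then only need to override the retrieval and normalisation methods, while `run` stays shared; whether the real framework should use inheritance or a plugin interface is left open here.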