Ticket #235 (new enhancement) — at Version 1
Resource format normalization and detection
Reported by: | dread | Owned by: | rgrp |
---|---|---|---|
Priority: | awaiting triage | Milestone: | ckan-v1.9 |
Component: | ckan | Keywords: | |
Cc: | | Repository: | ckan |
Theme: | none | | |
Description (last modified by pudo)
Try to gather proper MIME information for all package resources in CKAN. This is a shared ticket with dcat-tools (https://bitbucket.org/pudo/dcat-tools), i.e. opendatasearch.org. This can then also be used by ckanrdf, the CKAN RDF conversion service.
Sub-tasks:
- Create a Google Spreadsheet with two worksheets: "MIME-Mappings" (e.g. "CSV" -> "text/csv") and "Name mappings" (e.g. "text/csv" -> "Comma-Separated Spreadsheet").
- Collect and map surface forms from all CKANs
- Access the spreadsheet via Swiss, apply the mappings, and store the result as a PackageResource extra field, pending #826 (Resource extras).
- Add heuristics for format auto-detection:
  - Map well-known file extensions
  - Recognize obvious magic (Zip, Tar)
  - Peek into Zip/Tar files?
- Define a convention for generic data types (many CKAN packages have only "Spreadsheet" defined; either detect the specific type or set the MIME type to */tabular-data or similar)
- See also: #816 (Autocomplete for the resource format field)
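A rough sketch of how the sub-tasks above could fit together, for discussion only. It assumes Python and the standard library; the two dictionaries are hypothetical stand-ins for the "MIME-Mappings" and "Name mappings" worksheets, and every function name here is made up for illustration (none of this is existing CKAN or Swiss API).

```python
import io
import mimetypes
import tarfile
import zipfile

# Hypothetical sample rows of the "MIME-Mappings" worksheet: surface form -> MIME type.
MIME_MAPPINGS = {
    "csv": "text/csv",
    "comma separated values": "text/csv",
    "xls": "application/vnd.ms-excel",
    "zip": "application/zip",
}

# Hypothetical sample rows of the "Name mappings" worksheet: MIME type -> display name.
NAME_MAPPINGS = {
    "text/csv": "Comma-Separated Spreadsheet",
}


def normalize_format(surface_form):
    """Map a free-text resource format to a MIME type, or None if unknown."""
    cleaned = surface_form.strip().lower()
    if "/" in cleaned:  # already looks like a MIME type, keep it
        return cleaned
    # Drop a leading dot, treat hyphens as spaces, collapse whitespace.
    key = " ".join(cleaned.lstrip(".").replace("-", " ").split())
    return MIME_MAPPINGS.get(key)


def guess_from_url(url):
    """Heuristic 1: map well-known file extensions via the stdlib table."""
    mime, _encoding = mimetypes.guess_type(url)
    return mime


def sniff_archive(data):
    """Heuristic 2: recognize Zip and Tar content from raw bytes."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        return "application/zip"
    buf.seek(0)
    try:
        # "r:*" also accepts compressed tarballs; this sketch reports
        # them all as plain tar, which a real mapping would refine.
        with tarfile.open(fileobj=buf, mode="r:*"):
            return "application/x-tar"
    except tarfile.TarError:
        return None


def peek_zip_members(data):
    """Heuristic 3: guess a MIME type for each member of a Zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: guess_from_url(name) for name in archive.namelist()}


def detect_mime(format_field, url, head=None):
    """Try the declared format first, then the URL extension, then content."""
    return (
        normalize_format(format_field)
        or guess_from_url(url)
        or (sniff_archive(head) if head else None)
    )
```

The order in `detect_mime` (declared format, then extension, then content sniffing) is only one possible choice. A generic value such as "Spreadsheet" returns None from `normalize_format` here, since the convention for generic types is still open.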