Ticket #235 (assigned enhancement)
Resource format normalization and detection
Reported by: dread | Owned by: tobes
Description (last modified by pudo)
Try to gather proper MIME type information for all package resources in CKAN. This ticket is shared with dcat-tools (https://bitbucket.org/pudo/dcat-tools), i.e. opendatasearch.org. The result can then also be used by ckanrdf, the CKAN RDF conversion service.
- Create a Google Spreadsheet with two worksheets: "MIME-Mappings" (e.g. "CSV" -> "text/csv") and "Name mappings" (e.g. "text/csv" -> "Comma-Separated Spreadsheet").
- Collect the format surface forms used across all CKAN instances and map them.
- Access the spreadsheet via Swiss and apply the mappings; store the result as a PackageResource extra field, pending #826 (Resource extras).
- Add heuristics for format auto-detection:
  - Map well-known file extensions
  - Recognize obvious magic numbers (Zip, Tar)
  - Peek into Zip and Tar files?
- Define a convention for generic data types (many CKAN packages have only "Spreadsheet" defined; either detect the specific type or set the MIME type to */tabular-data or similar)
- See also: #816 (Autocomplete for the resource format field)
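
The steps in the list above could be sketched roughly as follows. This is only an illustration, not an agreed design: all function names and the small in-code tables are hypothetical stand-ins for the two spreadsheet worksheets, and "*/tabular-data" is simply the placeholder suggested in this ticket for generic types.

```python
import io
import posixpath
import tarfile
import zipfile
from urllib.parse import urlparse

# Hypothetical excerpt of the "MIME-Mappings" worksheet: surface form -> MIME type.
MIME_MAPPINGS = {
    "csv": "text/csv",
    "xls": "application/vnd.ms-excel",
    "excel": "application/vnd.ms-excel",
    "zip": "application/zip",
}

# Hypothetical excerpt of the "Name mappings" worksheet: MIME type -> display name.
NAME_MAPPINGS = {"text/csv": "Comma-Separated Spreadsheet"}

# Assumed convention for generic types, using the placeholder from the ticket.
GENERIC_FORMATS = {"spreadsheet": "*/tabular-data"}

# Assumed table of well-known file extensions.
EXTENSION_MAPPINGS = {
    "csv": "text/csv",
    "xls": "application/vnd.ms-excel",
    "zip": "application/zip",
    "tar": "application/x-tar",
}


def normalize_format(surface_form):
    """Map a free-text format value to a MIME type, or None if unknown."""
    key = surface_form.strip().lower().lstrip(".")
    if "/" in key:
        return key  # already looks like a MIME type
    return MIME_MAPPINGS.get(key) or GENERIC_FORMATS.get(key)


def guess_from_extension(url):
    """Guess a MIME type from the file extension in a URL or file name."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower().lstrip(".")
    return EXTENSION_MAPPINGS.get(ext)


def sniff_magic(head):
    """Recognize Zip and Tar from the first bytes of a resource."""
    if head.startswith(b"PK\x03\x04"):  # Zip local file header
        return "application/zip"
    if head[257:262] == b"ustar":  # POSIX/GNU tar magic at offset 257
        return "application/x-tar"
    return None


def peek_zip_members(data):
    """Return the guessed MIME type of each member of an in-memory Zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [guess_from_extension(name) for name in zf.namelist()]


def detect_format(declared, url, head=b""):
    """Prefer a specific declared format, then the extension, then magic bytes."""
    mime = normalize_format(declared) if declared else None
    if mime and not mime.startswith("*/"):
        return mime
    # Fall back to heuristics; keep the generic type if nothing better is found.
    return guess_from_extension(url) or sniff_magic(head) or mime
```

With these assumed tables, a resource declared only as "Spreadsheet" but linking to a .xls file would resolve to the Excel MIME type, while one with no recognizable extension or magic bytes would keep the generic placeholder.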