Changes between Initial Version and Version 1 of Ticket #235


Timestamp: 01/03/11 10:10:25 (3 years ago)
Author: pudo
  • Ticket #235

    • Property Summary changed from "Resource format detect from filename extension" to "Resource format normalization and detection"
  • Ticket #235 – Description

    Try to gather proper MIME information for all package resources in CKAN. This is a shared ticket with dcat-tools (https://bitbucket.org/pudo/dcat-tools), i.e. opendatasearch.org. The results can then also be used by ckanrdf, the CKAN RDF conversion service.

    Sub-tasks:

     * Create a Google Spreadsheet with two worksheets: "MIME-Mappings", e.g. "CSV" -> "text/csv", and "Name mappings", e.g. "text/csv" -> "Comma-Separated Spreadsheet".
     * Collect and map surface forms from all CKANs.
     * Access this via Swiss and apply it; store the result as a PackageResource extra field pending #826 (Resource extras).
     * Add heuristics for format auto-detection:
      * Map well-known file extensions
      * Recognize obvious magic (Zip, Tar)
      * Peek into Zip and Tar archives
     * Define a convention for generic data types (many CKAN packages have only "Spreadsheet" defined; either detect the specific type or set the MIME type to */tabular-data or similar).
     * See also: #816 (Autocomplete for the resource format field)
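
    The normalization and auto-detection sub-tasks could be sketched roughly as follows. This is only an illustration under assumptions, not the planned implementation: the mapping tables are hypothetical stand-ins for the two spreadsheet worksheets, and the function names (normalize_format, sniff_magic, member_names, detect_format) are made up for this example. It uses only the Python standard library and does not touch CKAN or Swiss.

```python
import io
import os
import tarfile
import zipfile

# Hypothetical sample of the "MIME-Mappings" worksheet: surface form -> MIME type.
MIME_MAPPINGS = {
    "csv": "text/csv",
    "xls": "application/vnd.ms-excel",
    "zip": "application/zip",
    "tar": "application/x-tar",
}

# Hypothetical sample of the "Name mappings" worksheet: MIME type -> display name.
NAME_MAPPINGS = {
    "text/csv": "Comma-Separated Spreadsheet",
}


def normalize_format(surface_form):
    """Map a free-text format value or file extension to a MIME type, or None."""
    key = surface_form.strip().lower().lstrip(".")
    if "/" in key:
        # Assumption: a value containing a slash is already a MIME type.
        return key
    return MIME_MAPPINGS.get(key)


def sniff_magic(data):
    """Recognize Zip and Tar archives from their leading bytes."""
    # Non-empty Zip archives start with a local file header signature.
    if data[:4] == b"PK\x03\x04":
        return "application/zip"
    # POSIX tar stores the string "ustar" at offset 257 of the first header.
    if data[257:262] == b"ustar":
        return "application/x-tar"
    return None


def member_names(data, mime):
    """List the member file names of an in-memory Zip or Tar archive."""
    if mime == "application/zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.namelist()
    if mime == "application/x-tar":
        with tarfile.open(fileobj=io.BytesIO(data)) as archive:
            return archive.getnames()
    return []


def detect_format(filename, data=b""):
    """Return (MIME type of the resource, MIME types of archive members)."""
    magic = sniff_magic(data)
    # Magic bytes win over the extension; the extension is the fallback.
    mime = magic or normalize_format(os.path.splitext(filename)[1])
    inner = []
    if magic:
        # Only peek inside when the content really looks like an archive.
        for name in member_names(data, magic):
            inner.append(normalize_format(os.path.splitext(name)[1]))
    return mime, inner
```

    In this sketch a resource named bundle.dat whose bytes are a Zip archive containing data.csv would be reported as application/zip with one text/csv member, while a bare file name such as report.xls falls back to the extension table. Unknown values come back as None, which leaves room for the generic-type convention discussed above.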