id	summary	reporter	owner	description	type	status	priority	milestone	component	resolution	keywords	cc	repo	theme
235	Resource format normalization and detection	dread	tobes	"Gather proper MIME type information for all package resources in CKAN. This ticket is shared with dcat-tools (https://bitbucket.org/pudo/dcat-tools), i.e. opendatasearch.org. The results can then also be used by ckanrdf, the CKAN RDF conversion service.

Sub-tasks: 

 * Create a Google Spreadsheet with two worksheets: ""MIME-Mappings"" (e.g. ""CSV"" -> ""text/csv"") and ""Name mappings"" (e.g. ""text/csv"" -> ""Comma-Separated Values"").
 * Collect the format strings (surface forms) used across all CKAN instances and map them to MIME types.
 * Access the spreadsheet via Swiss, apply the mappings, and store the result as a PackageResource extra field (pending #826, Resource extras).
 * Add heuristics for format auto-detection:
  * Map well-known file extensions
  * Recognize obvious magic numbers (Zip, Tar)
  * Peek inside Zip/Tar archives
 * Define a convention for generic data types: many CKAN packages specify only ""Spreadsheet"", so either detect the specific type or set the MIME type to */tabular-data or similar.
 * See also: #816 (Autocomplete for the resource format field)"	enhancement	assigned	awaiting triage	ckan-v1.9	ckan				ckan	none
