<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>CKAN: Ticket #235: Resource format normalization and detection</title>
    <link>http://localhost/ticket/235</link>
    <description>&lt;p&gt;
Try to gather proper MIME  information for all package resources in CKAN. This is a shared ticket with dcat-tools (&lt;a class="ext-link" href="https://bitbucket.org/pudo/dcat-tools"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://bitbucket.org/pudo/dcat-tools&lt;/a&gt;), i.e. opendatasearch.org. This can then also be used by ckanrdf, the CKAN RDF conversion service.
&lt;/p&gt;
&lt;p&gt;
Sub-tasks:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Create a Google Spreadsheet with two Worksheets: "MIME-Mappings", i.e. "CSV" -&amp;gt; "text/csv" and "Name mappings", i.e. "text/csv" -&amp;gt; "Comma-Separated Spreadsheet".
&lt;/li&gt;&lt;li&gt;Collect and map surface forms from all CKANs
&lt;/li&gt;&lt;li&gt;Access this via Swiss and apply, store as a &lt;a class="missing wiki"&gt;PackageResource?&lt;/a&gt; extra field pending &lt;a class="closed ticket" href="http://localhost/ticket/826" title="enhancement: Resource 'extra' fields (closed: fixed)"&gt;#826&lt;/a&gt; (Resource extras).
&lt;/li&gt;&lt;li&gt;Add heuristics for format auto-detections:
&lt;ul&gt;&lt;li&gt;Map well-known file extensions
&lt;/li&gt;&lt;li&gt;Recognize obvious magic (Zip, Tar)
&lt;/li&gt;&lt;li&gt;Peek into &lt;a class="missing wiki"&gt;Zipfile/Tarfiles?&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Define a convention for generic data types (many CKAN packages have only "Spreadsheet" defined, either detect specific type or set MIME to */tabular-data or similar)
&lt;/li&gt;&lt;li&gt;See also: &lt;a class="closed ticket" href="http://localhost/ticket/816" title="enhancement: Autocomplete for the resource format field (closed: fixed)"&gt;#816&lt;/a&gt; (Autocomplete for the resource format field)
&lt;/li&gt;&lt;/ul&gt;</description>
    <language>en-us</language>
    <image>
      <title>CKAN</title>
      <url>http://assets.okfn.org/p/ckan/img/ckan_logo_shortname.png</url>
      <link>http://localhost/ticket/235</link>
    </image>
    <generator>Trac 0.12.3</generator>
    <item>
      
        <dc:creator>pudo</dc:creator>

      <pubDate>Mon, 03 Jan 2011 10:10:25 GMT</pubDate>
      <title>description, summary changed</title>
      <link>http://localhost/ticket/235#comment:1</link>
      <guid isPermaLink="false">http://localhost/ticket/235#comment:1</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;description&lt;/strong&gt;
              modified (&lt;a href="/ticket/235?action=diff&amp;amp;version=1"&gt;diff&lt;/a&gt;)
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;summary&lt;/strong&gt;
                changed from &lt;em&gt;Resource format detect from filename extension&lt;/em&gt; to &lt;em&gt;Resource format normalization and detection&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>pudo</dc:creator>

      <pubDate>Mon, 11 Apr 2011 07:59:48 GMT</pubDate>
      <title>repo, theme set</title>
      <link>http://localhost/ticket/235#comment:2</link>
      <guid isPermaLink="false">http://localhost/ticket/235#comment:2</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;repo&lt;/strong&gt;
                set to &lt;em&gt;ckan&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;theme&lt;/strong&gt;
                set to &lt;em&gt;none&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
The first three sub-items of this ticket are done in datautil and dcat-tools:
&lt;/p&gt;
&lt;p&gt;
Basic GDocs-based normalizer:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://bitbucket.org/okfn/datautil/src/8bba83b4fa45/datautil/normalization/table_based.py"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://bitbucket.org/okfn/datautil/src/8bba83b4fa45/datautil/normalization/table_based.py&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Example of use:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://bitbucket.org/okfn/dcat-tools/src/0ec5012bf12a/dcat/core/normalize.py#cl-32"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://bitbucket.org/okfn/dcat-tools/src/0ec5012bf12a/dcat/core/normalize.py#cl-32&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Spreadsheet (as referenced in datautil source, should be a kwarg):
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a class="ext-link" href="https://spreadsheets.google.com/ccc?key=0AplklDf0nYxWdE8tVlRrN1F3bG9PdDBFUDNZcENDNEE&amp;amp;hl=en#gid=0"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://spreadsheets.google.com/ccc?key=0AplklDf0nYxWdE8tVlRrN1F3bG9PdDBFUDNZcENDNEE&amp;amp;hl=en#gid=0&lt;/a&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Experience so far has been that Google rate limits the current implementation so we should perform all ops in one or two big calls rather than "piece by piece".
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>icmurray</dc:creator>

      <pubDate>Mon, 25 Jun 2012 12:24:17 GMT</pubDate>
      <title>owner, status changed; milestone set</title>
      <link>http://localhost/ticket/235#comment:3</link>
      <guid isPermaLink="false">http://localhost/ticket/235#comment:3</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;owner&lt;/strong&gt;
              changed from &lt;em&gt;rgrp&lt;/em&gt; to &lt;em&gt;tobes&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;assigned&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;milestone&lt;/strong&gt;
                set to &lt;em&gt;ckan-v1.9&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Moved to ckan-future and unassign so that this ticket will be picked up in triage.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>icmurray</dc:creator>

      <pubDate>Mon, 25 Jun 2012 12:33:44 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/235#comment:4</link>
      <guid isPermaLink="false">http://localhost/ticket/235#comment:4</guid>
      <description>
        &lt;p&gt;
Replying to &lt;a href="http://localhost/ticket/235#comment:3" title="Comment 3 for Ticket #235"&gt;icmurray&lt;/a&gt;:
&lt;/p&gt;
&lt;blockquote class="citation"&gt;
&lt;p&gt;
Moved to ckan-future and unassign so that this ticket will be picked up in triage.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Sorry, this comment bears no resemblemnce to what I actually did!  Assigned to tobes for 1.9.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>