<?xml version="1.0"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>CKAN: Ticket #698: CKAN Data API v1</title>
    <link>http://localhost/ticket/698</link>
    <description>&lt;p&gt;
This proposal is to discuss adding a new API for proxying certain spreadsheet data via JSON-P to make it possible to build simple browser apps directly off the API.
&lt;/p&gt;
&lt;p&gt;
See the attached proposal for information.
&lt;/p&gt;
</description>
    <language>en-us</language>
    <image>
      <title>CKAN</title>
      <url>http://assets.okfn.org/p/ckan/img/ckan_logo_shortname.png</url>
      <link>http://localhost/ticket/698</link>
    </image>
    <generator>Trac 0.12.3</generator>
    <item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Thu, 14 Oct 2010 16:24:52 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/698</link>
      <guid isPermaLink="false">http://localhost/ticket/698</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;data-api-jsonp-proxy.txt&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Tue, 07 Dec 2010 17:25:52 GMT</pubDate>
      <title>priority, component set</title>
      <link>http://localhost/ticket/698#comment:1</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:1</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                set to &lt;em&gt;awaiting triage&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;component&lt;/strong&gt;
                set to &lt;em&gt;ckan&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
I see two possible options:
&lt;/p&gt;
&lt;p&gt;
Option A: store only mirrors of the source files and have file-format-based plugins for querying them.
&lt;/p&gt;
&lt;p&gt;
Option B: store mirrors of the source files, have plugin-based scripts that load them into a "common structured format", and have a single query module.
&lt;/p&gt;
&lt;p&gt;
I would go with option B as it is:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;easier to implement: file-format-based transformations are simpler than file-format-based queries
&lt;/li&gt;&lt;li&gt;more transparent data management process
&lt;/li&gt;&lt;li&gt;only one simple query module
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
(see the attached ckan-srcmirror.png)
&lt;/p&gt;
&lt;p&gt;
Option B also fits better into the broader data architecture context:
&lt;/p&gt;
&lt;p&gt;
&lt;a class="ext-link" href="http://democracyfarm.org/f/ckan/data_arch.png"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://democracyfarm.org/f/ckan/data_arch.png&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
Concerning the API, I would suggest trying to be compatible with the Google Spreadsheets API:
&lt;/p&gt;
&lt;p&gt;
&lt;a class="ext-link" href="http://code.google.com/apis/spreadsheets/data/3.0/reference.html"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://code.google.com/apis/spreadsheets/data/3.0/reference.html&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Tue, 07 Dec 2010 17:26:35 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/698</link>
      <guid isPermaLink="false">http://localhost/ticket/698</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;ckan-srcmirror.png&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
CKAN Source Mirror and transformations for Data API
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>thejimmyg</dc:creator>

      <pubDate>Wed, 08 Dec 2010 18:15:33 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:2</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:2</guid>
      <description>
        &lt;p&gt;
Actually we've implemented a first version which doesn't store the data.
&lt;/p&gt;
&lt;p&gt;
See this post: &lt;a class="ext-link" href="http://blog.ckan.org/2010/12/04/open-data-day-announcing-ckan-data-proxy/"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://blog.ckan.org/2010/12/04/open-data-day-announcing-ckan-data-proxy/&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
You can get data like this:
&lt;/p&gt;
&lt;p&gt;
&lt;a class="ext-link" href="http://1.latest.jsonpdataproxy.appspot.com/?sheet=1&amp;amp;indent=4&amp;amp;url=http://research.dwp.gov.uk/asd/asd4/r1_values.xls"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://1.latest.jsonpdataproxy.appspot.com/?sheet=1&amp;amp;indent=4&amp;amp;url=http://research.dwp.gov.uk/asd/asd4/r1_values.xls&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Wed, 08 Dec 2010 23:34:35 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:3</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:3</guid>
      <description>
        &lt;p&gt;
@thejimmyg: It is a neat, simple solution.
&lt;/p&gt;
&lt;p&gt;
You have suggested a proxy API:
&lt;/p&gt;
&lt;p&gt;
&lt;em&gt;There will be a new API at &lt;tt&gt;/api/spreadsheet?callback=jsonpcallback&amp;amp;url=&lt;/tt&gt;
&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
There are two options:
&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;Have a public CKAN data proxy as a stand-alone service: I get the package resource URL from CKAN and pass it to the proxy.
&lt;/li&gt;&lt;/ol&gt;&lt;ol start="2"&gt;&lt;li&gt;Have a CKAN data API (as the ticket title suggests): if I am talking to CKAN, I am getting data from CKAN. I should not have to care about the proxy or anything behind it, nor about the original data source. I care only about getting the resource data in a format that I can process (CSV/JSON).
&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;
For the CKAN data API I would suggest something like:
&lt;/p&gt;
&lt;pre class="wiki"&gt;/api/resource_data/RESOURCE_ID?...
&lt;/pre&gt;&lt;p&gt;
or more human readable:
&lt;/p&gt;
&lt;pre class="wiki"&gt;/api/resource_data/PACKAGE_NAME/RESOURCE_NUMBER?...
&lt;/pre&gt;&lt;p&gt;
This will allow others to get only CKAN resources. Moreover, restricting the API to resource data (rather than data at any URL) would allow us to pre-process resources in the future.
&lt;/p&gt;
&lt;p&gt;
First version/implementation: pass each requested resource URL to your proxy service (external, not CKAN related), which determines the file type from the file extension in the URL and fails on an unknown or unprocessable file.
&lt;/p&gt;
&lt;p&gt;
/api/resource_data/PACKAGE/RESOURCE?output=jsonp&amp;amp;sheet=1...
&lt;/p&gt;
&lt;p&gt;
would be redirected to (for example):
&lt;/p&gt;
&lt;p&gt;
&lt;a class="ext-link" href="http://1.latest.jsonpdataproxy.appspot.com/?url=RESOURCE[&amp;#34;URL&amp;#34;]&amp;amp;sheet=1"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://1.latest.jsonpdataproxy.appspot.com/?url=RESOURCE["URL"]&amp;amp;sheet=1&lt;/a&gt;...
&lt;/p&gt;
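&lt;p&gt;
A minimal sketch of how such a redirect target could be built, assuming the resource is a dictionary with a "url" key and that the remaining request parameters are forwarded to the proxy unchanged. The names proxy_redirect_url and PROXY_URL are illustrative only and not part of CKAN:
&lt;/p&gt;
&lt;pre class="wiki"&gt;
```python
from urllib.parse import urlencode

# Proxy endpoint taken from the example above.
PROXY_URL = "http://1.latest.jsonpdataproxy.appspot.com/"


def proxy_redirect_url(resource, params):
    """Build the proxy URL a resource_data request would be redirected to.

    `resource` is assumed to be a dict with a "url" key; `params` holds the
    query options to forward (for example {"sheet": "1"}).
    """
    query = {"url": resource["url"]}
    query.update(params)
    # urlencode percent-encodes the resource URL so it survives as one value.
    return PROXY_URL + "?" + urlencode(query)
```
&lt;/pre&gt;
&lt;p&gt;
The CKAN side would then answer with an HTTP redirect to the returned URL.
&lt;/p&gt;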
&lt;p&gt;
Second version/implementation: determine the file type in advance and pass the resource to the appropriate conversion service when requested.
&lt;/p&gt;
&lt;p&gt;
If you upload a document to Scribd or SlideShare, it gets processed in the background. The same can be done in CKAN after any resource change. We do not need to download the file at the moment; what can be done, however, is:
&lt;/p&gt;
&lt;ol&gt;&lt;li&gt;try a converter by URL file extension
&lt;/li&gt;&lt;li&gt;try a converter by MIME type (content-type header)
&lt;/li&gt;&lt;li&gt;brute-force try all converters
&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;
There is no need to store copies of files; just store the determined file type somewhere in the resource record (as a MIME type).
&lt;/p&gt;
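&lt;p&gt;
The detection order above could be sketched roughly as follows. The lookup tables, the function name and the shape of the "probes" argument (pairs of a type name and a callable that reports whether a converter accepts the URL) are assumptions made for illustration:
&lt;/p&gt;
&lt;pre class="wiki"&gt;
```python
import os
from urllib.parse import urlparse

# Hypothetical lookup tables; a real deployment would list its own converters.
BY_EXTENSION = {".csv": "csv", ".xls": "xls"}
BY_MIME_TYPE = {"text/csv": "csv", "application/vnd.ms-excel": "xls"}


def guess_resource_type(url, content_type=None, probes=()):
    """Return a converter name for the resource, or None if nothing matches."""
    # 1. Try the file extension in the URL path.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext in BY_EXTENSION:
        return BY_EXTENSION[ext]
    # 2. Try the MIME type from the Content-Type header, ignoring parameters.
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime in BY_MIME_TYPE:
            return BY_MIME_TYPE[mime]
    # 3. Brute force: ask every converter in turn.
    for name, can_convert in probes:
        if can_convert(url):
            return name
    return None
```
&lt;/pre&gt;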
&lt;p&gt;
Also, it would be nice if every data conversion service provided output in both JSON and CSV. Then we would be able to have a "Download CSV" link directly on the CKAN web page for browsing users:
&lt;/p&gt;
&lt;p&gt;
/api/resource_data/PACKAGE/RESOURCE?output=csv...
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Thu, 09 Dec 2010 01:32:26 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:4</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:4</guid>
      <description>
        &lt;p&gt;
I have created a "proof of concept" implementation that uses an external data proxy service when accessing:
&lt;/p&gt;
&lt;pre class="wiki"&gt;/api/data/PACKAGE_ID
&lt;/pre&gt;&lt;p&gt;
like:
&lt;/p&gt;
&lt;pre class="wiki"&gt;http://127.0.0.1:5000/api/data/069c80f8-8476-452e-bfd4-0a9077666c14
&lt;/pre&gt;&lt;p&gt;
It just works, but requires refactoring to match CKAN standards. I would need help from someone who knows the CKAN internals better.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Thu, 09 Dec 2010 01:48:18 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:5</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:5</guid>
      <description>
        &lt;p&gt;
One more note: it would be good if packages had names/identifiers as well, as referencing internal IDs from the outside world is not very good practice. They are quite volatile, mostly in regard to expected objects.
&lt;/p&gt;
&lt;p&gt;
PACKAGE/RESOURCE_REFERENCE
&lt;/p&gt;
&lt;p&gt;
Possible resource references:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;'default' - reserved keyword for the only resource if there is just one, or, if there are more, the first resource or the one flagged 'default'
&lt;/li&gt;&lt;li&gt;'latest' - to be able to access the 'latest' resource within a package (or 'actual' or 'last'?)
&lt;/li&gt;&lt;li&gt;alphanumeric identifier (not starting with a number)
&lt;/li&gt;&lt;li&gt;number - index of the resource as a human visitor sees it on the page. This is not the same as the "position" attribute, which might contain gaps or differ (and in some cases does). The index of a resource should be something like:
&lt;/li&gt;&lt;/ul&gt;&lt;pre class="wiki"&gt;SELECT package_id, id, url, ROW_NUMBER() OVER (PARTITION BY package_id ORDER BY position) AS index FROM package_resource
&lt;/pre&gt;
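&lt;p&gt;
For illustration, the same gap-free numbering can be sketched in Python. This assumes resources are dictionaries with a "position" key and an optional "default" flag, and it gives the flagged resource precedence over the first one, which the list above leaves open. 'latest' and alphanumeric identifiers are left out because their semantics are not settled here:
&lt;/p&gt;
&lt;pre class="wiki"&gt;
```python
def index_resources(resources):
    """Number resources 1..n in position order, like ROW_NUMBER() above."""
    ordered = sorted(resources, key=lambda r: r["position"])
    return {i: r for i, r in enumerate(ordered, start=1)}


def resolve_reference(resources, reference):
    """Resolve 'default' or a numeric index; return None if nothing matches."""
    indexed = index_resources(resources)
    if not indexed:
        return None
    if reference == "default":
        # Assumption: a resource flagged 'default' wins over the first one.
        flagged = [r for r in indexed.values() if r.get("default")]
        return flagged[0] if flagged else indexed[1]
    if reference.isdigit():
        return indexed.get(int(reference))
    return None
```
&lt;/pre&gt;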
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Fri, 10 Dec 2010 17:21:49 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:6</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:6</guid>
      <description>
        &lt;p&gt;
Draft: &lt;a class="ext-link" href="https://github.com/Stiivi/ckanext-dataapi"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://github.com/Stiivi/ckanext-dataapi&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
It requires that the client handles HTTP 302 redirects correctly.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Mon, 13 Dec 2010 11:22:52 GMT</pubDate>
      <title>owner, priority changed; milestone set</title>
      <link>http://localhost/ticket/698#comment:7</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:7</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;owner&lt;/strong&gt;
              changed from &lt;em&gt;rgrp&lt;/em&gt; to &lt;em&gt;Stiivi&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;priority&lt;/strong&gt;
                changed from &lt;em&gt;awaiting triage&lt;/em&gt; to &lt;em&gt;critical&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;milestone&lt;/strong&gt;
                set to &lt;em&gt;ckan-v1.3-sprint-1&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;ol&gt;&lt;li&gt;move repo to bitbucket
&lt;/li&gt;&lt;li&gt;clone James's proxy code and modify it to be Google Spreadsheets compatible (add a test ...)
&lt;/li&gt;&lt;li&gt;update the ckanext to pass on parameters ....
&lt;/li&gt;&lt;li&gt;Deploy all of this to test.ckan.net
&lt;/li&gt;&lt;li&gt;Rufus: check redirects with javascript
&lt;/li&gt;&lt;/ol&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Fri, 17 Dec 2010 14:36:29 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:8</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:8</guid>
      <description>
        &lt;p&gt;
Here is the fork of the (JSON) data proxy:
&lt;/p&gt;
&lt;p&gt;
&lt;a class="ext-link" href="https://bitbucket.org/Stiivi/dataproxy"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://bitbucket.org/Stiivi/dataproxy&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
I've refactored it and moved transformations into separate modules. For each resource type there should be a module in transform/&amp;lt;type&amp;gt;_transform.py
&lt;/p&gt;
&lt;p&gt;
Each module should implement &lt;tt&gt;transform(flow, url, query)&lt;/tt&gt; and should return a dictionary as a result.
&lt;/p&gt;
&lt;p&gt;
Existing modules:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;transform/csv_transform - CSV files
&lt;/li&gt;&lt;li&gt;transform/xls_transform - Excel XLS files
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
If there is no module for the resource type, an HTTP 200 reply with the error "Resource type not supported" is returned.
&lt;/p&gt;
&lt;p&gt;
You can override the URL file extension, or specify the type if the extension is missing, through the type= URL option. For example, if you have a URL that contains CSV data but the URL is just foo.com/data, then you can pass: url=&lt;a class="ext-link" href="http://foo.com/data&amp;amp;type=csv"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://foo.com/data&amp;amp;type=csv&lt;/a&gt;
&lt;/p&gt;
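&lt;p&gt;
A rough sketch of the lookup described above, not the actual dataproxy code. The registry, the dispatch function and the shape of the error dictionary are assumptions, and the transform here is a stub that does not fetch anything:
&lt;/p&gt;
&lt;pre class="wiki"&gt;
```python
import os
from urllib.parse import urlparse


def csv_transform(flow, url, query):
    # Stand-in for transform/csv_transform; a real module would fetch and
    # parse the URL. `flow` is passed through untouched in this sketch.
    return {"url": url, "type": "csv", "data": []}


# Hypothetical registry mapping resource types to transform functions.
TRANSFORMS = {"csv": csv_transform}


def dispatch(flow, url, query):
    """Pick a transform by the type= option, falling back to the extension."""
    ext = os.path.splitext(urlparse(url).path)[1].lstrip(".").lower()
    resource_type = query.get("type") or ext
    transform = TRANSFORMS.get(resource_type)
    if transform is None:
        # Reported in the reply body rather than raised as an exception.
        return {"error": "Resource type not supported", "type": resource_type}
    return transform(flow, url, query)
```
&lt;/pre&gt;
&lt;p&gt;
With this sketch, dispatch(None, "http://foo.com/data", {"type": "csv"}) reaches the CSV transform even though the URL has no extension.
&lt;/p&gt;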
&lt;p&gt;
Note: the source was refactored/updated in example/dataproxy and is being tested by running it locally at localhost:8000.
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Sun, 19 Dec 2010 17:56:08 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:9</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:9</guid>
      <description>
        &lt;p&gt;
Pushed parameter passing; changed handling of an unknown reply type on the proxy side: do not raise an exception, but reply with 200 Error - unknown reply type, use json/jsonp
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>Stiivi</dc:creator>

      <pubDate>Mon, 20 Dec 2010 15:22:47 GMT</pubDate>
      <title>attachment set</title>
      <link>http://localhost/ticket/698</link>
      <guid isPermaLink="false">http://localhost/ticket/698</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;attachment&lt;/strong&gt;
                set to &lt;em&gt;ckan-dataapi.png&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
Data API through remote data proxy
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>anonymous</dc:creator>

      <pubDate>Mon, 27 Dec 2010 17:56:53 GMT</pubDate>
      <title></title>
      <link>http://localhost/ticket/698#comment:10</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:10</guid>
      <description>
        &lt;p&gt;
Data proxy documentation: &lt;a class="ext-link" href="http://democracyfarm.org/dataproxy/api.html"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://democracyfarm.org/dataproxy/api.html&lt;/a&gt; (included in sources)
&lt;/p&gt;
&lt;p&gt;
Updated ('s' as in structured) data proxy app: &lt;a class="ext-link" href="http://sdataproxy.appspot.com"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://sdataproxy.appspot.com&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item><item>
      
        <dc:creator>rgrp</dc:creator>

      <pubDate>Wed, 29 Dec 2010 19:10:15 GMT</pubDate>
      <title>status changed; resolution set</title>
      <link>http://localhost/ticket/698#comment:11</link>
      <guid isPermaLink="false">http://localhost/ticket/698#comment:11</guid>
      <description>
          &lt;ul&gt;
            &lt;li&gt;&lt;strong&gt;status&lt;/strong&gt;
                changed from &lt;em&gt;new&lt;/em&gt; to &lt;em&gt;closed&lt;/em&gt;
            &lt;/li&gt;
            &lt;li&gt;&lt;strong&gt;resolution&lt;/strong&gt;
                set to &lt;em&gt;fixed&lt;/em&gt;
            &lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;
This ticket is complete:
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;ckanext-dataapi: working /api/data/{resource-id} with tests
&lt;/li&gt;&lt;li&gt;&lt;a class="ext-link" href="https://bitbucket.org/okfn/dataproxy"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;https://bitbucket.org/okfn/dataproxy&lt;/a&gt; - the dataproxy code running at &lt;a class="ext-link" href="http://jsonpdataproxy.appspot.com"&gt;&lt;span class="icon"&gt;​&lt;/span&gt;http://jsonpdataproxy.appspot.com&lt;/a&gt;
&lt;ul&gt;&lt;li&gt;functioning but needs tests and improvements
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
There are a whole bunch of improvements to be done, but these will be in &lt;a class="closed ticket" href="http://localhost/ticket/888" title="enhancement: Improvements to the dataproxy and the data API (closed: fixed)"&gt;ticket:888&lt;/a&gt;
&lt;/p&gt;
      </description>
      <category>Ticket</category>
    </item>
 </channel>
</rss>