Ticket #433 (closed defect: fixed)
Data package metadata in the Egg
| Reported by: | wwaites | Owned by: | |
|---|---|---|---|
| Priority: | awaiting triage | Milestone: | datapkg-0.8 |
| Component: | Keywords: | ||
| Cc: | Repository: | ||
| Theme: |
Description
Still not sure if we shouldn't use the existing setuptools machinery to manage this -- there is already a way to get at the metadata. In any event, I've just made an addition to datapkg that makes it possible to put datapkg_sources as a dictionary in your package's setup.py. Afterwards it is possible to pull the metadata out of the egg. Of course this could easily be changed to save the information in whatever form, indeed if you pass it a string instead of a dictionary it will just write whatever you gave it into the datapkg_sources.spec. The point is, I think that the egg is a good place to stuff this information.
For non-python users, it is always possible to simply put up the datapkg_sources.spec somewhere on the web so they can directly retrieve the data files.
From the docstring::
This is the implementation for an [egg_info.writers] entrypoint.
Datapkg adds an argument to setuptools's setup() function called
datapkg_sources. The argument should be a dictionary of the form:
.. code-block:: python
setup(
...,
datapkg_sources = {
"cra2009" : "http://www.hm-treasury.gov.uk/d/cra_2009_db.csv"
}
)
The result of this is that there will be a file in the egg called
datapkg_sources.spec that looks like this::
[sources]
cra2009=http://www.hm-treasury.gov.uk/d/cra_2009_db.csv
How do you get at this data? Simple::
.. code-block:: python
import pkg_resources
dist = pkg_resources.get_distribution("ukgov_treasury_cra")
spec = dist.get_metadata("datapkg_sources.spec")
and 'spec' will be the contents of the file as a string.

So one more modification, the datapkg_sources argument is no longer a dictionary but a string, similar to the metadata.txt but with the name as the section heading rather than [DEFAULT] so as to be able to support more than one download. As of now, this works::
This class treats an installed python package as a data index. For instructions on creating such a package, what needs to go in its setup.py and such, see :func:`datapkg.pypkgtools.datapkg_sources`. Here we are concerned with how to use such a package. An example of one such package can be installed like so:: % pip install hg+http://bitbucket.org/ww/ukgov_treasury_cra Once installed, datapkg can be used to inspect it and install parts wherever desired:: % datapkg list egg://ukgov_treasury_cra cra2009 -- Country and Regional Analysis 2009 % datapkg install egg://ukgov_treasury_cra/cra2009 file:///tmp [...] % ls -l /tmp/cra2009/ total 11112 -rw-r--r-- 1 ww wheel 5681852 May 12 15:48 cra_2009_db.csv -rw-r--r-- 1 ww wheel 292 Aug 17 22:37 metadata.txtOf course the related python code and machinery hasn't been ported over to that package yet, but that's quite another matter.