Ticket #433 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

Data package metadata in the Egg

Reported by: wwaites
Owned by:
Priority: awaiting triage
Milestone: datapkg-0.8
Component:
Keywords:
Cc:
Repository:


I'm still not sure whether we should use the existing setuptools machinery to manage this, since it already provides a way to get at the metadata. In any event, I've just made an addition to datapkg that lets you pass datapkg_sources as a dictionary in your package's setup.py; afterwards the metadata can be pulled back out of the egg. The storage format could easily be changed: if you pass a string instead of a dictionary, it is written as given into datapkg_sources.spec. The point is that I think the egg is a good place to keep this information.

For non-Python users, the datapkg_sources.spec file can simply be published somewhere on the web so that they can retrieve the data files directly.

From the docstring::

    This is the implementation for an [egg_info.writers] entrypoint.
    Datapkg adds an argument to setuptools's setup() function called
    datapkg_sources. The argument should be a dictionary of the form:

    .. code-block:: python

        datapkg_sources = {
            "cra2009": "http://www.hm-treasury.gov.uk/d/cra_2009_db.csv",
        }

    The result of this is that there will be a file in the egg called
    datapkg_sources.spec that looks like this::


    How do you get at this data? Simple:

    .. code-block:: python

        import pkg_resources
        dist = pkg_resources.get_distribution("ukgov_treasury_cra")
        spec = dist.get_metadata("datapkg_sources.spec")

    and 'spec' will be the contents of the file as a string.
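For readers unfamiliar with the setuptools side, here is a minimal sketch of what a writer registered under the ``egg_info.writers`` entry point group might look like. It is not the datapkg implementation: the function name ``write_datapkg_sources`` is hypothetical, and the ``name = url`` line layout used for dictionaries is only a placeholder, since the ticket does not show the real contents of datapkg_sources.spec. The sketch relies on the calling convention setuptools uses for such writers, ``(cmd, basename, filename)``, and on the ``write_or_delete_file`` method of the ``egg_info`` command.

```python
def write_datapkg_sources(cmd, basename, filename):
    """Hypothetical egg_info.writers entry point for datapkg_sources.

    setuptools calls a writer with the running egg_info command, the
    base name of the metadata file, and its full path inside the
    .egg-info directory.
    """
    sources = getattr(cmd.distribution, "datapkg_sources", None)

    if sources is None:
        # No datapkg_sources argument was passed to setup(); hand None
        # to setuptools and let it deal with any existing file.
        data = None
    elif isinstance(sources, str):
        # A string is written out exactly as given.
        data = sources
    else:
        # ASSUMPTION: placeholder layout, one "name = url" line per
        # entry. The real datapkg format is not shown in the ticket.
        data = "".join(
            f"{name} = {url}\n" for name, url in sorted(sources.items())
        )

    cmd.write_or_delete_file("datapkg sources", filename, data)
```

For setup() to accept the extra argument at all, the keyword itself would also have to be registered, which setuptools handles through the separate ``distutils.setup_keywords`` entry point group.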

Change History

comment:1 Changed 4 years ago by wwaites

One more modification: the datapkg_sources argument is no longer a dictionary but a string, similar to metadata.txt but with the name as the section heading rather than [DEFAULT], so that more than one download can be supported. As of now, this works::

    This class treats an installed python package as a data
    index. For instructions on creating such a package, what
    needs to go in its setup.py and such, see 
    :func:`datapkg.pypkgtools.datapkg_sources`. Here we are
    concerned with how to use such a package.

    An example of one such package can be installed like so::

        % pip install hg+http://bitbucket.org/ww/ukgov_treasury_cra

    Once installed, datapkg can be used to inspect it and 
    install parts wherever desired::

        % datapkg list egg://ukgov_treasury_cra
        cra2009 -- Country and Regional Analysis 2009
        % datapkg install egg://ukgov_treasury_cra/cra2009 file:///tmp
        % ls -l /tmp/cra2009/ 
        total 11112
        -rw-r--r--  1 ww  wheel  5681852 May 12 15:48 cra_2009_db.csv
        -rw-r--r--  1 ww  wheel      292 Aug 17 22:37 metadata.txt

Of course the related Python code and machinery haven't been ported over to that package yet, but that's quite another matter.
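The string form described in this comment, with one section per download, can be read back with the standard library's configparser. The following is only a sketch under assumptions: the field names ``title`` and ``download_url`` are guesses at what a metadata.txt-style section might contain, and ``parse_index`` is a hypothetical helper, not part of datapkg. The section name, title and URL are taken from the examples in this ticket.

```python
import configparser

# ASSUMPTION: hypothetical spec text. The section heading is the
# download's name; the field names are illustrative only.
SPEC = """\
[cra2009]
title = Country and Regional Analysis 2009
download_url = http://www.hm-treasury.gov.uk/d/cra_2009_db.csv
"""


def parse_index(spec_text):
    """Return {section name: {field: value}} for an INI-style spec."""
    # Interpolation is disabled so that a '%' in a URL is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(spec_text)
    return {name: dict(parser[name]) for name in parser.sections()}


index = parse_index(SPEC)
for name, fields in index.items():
    # Mirrors the style of the "datapkg list" output quoted above.
    print(f"{name} -- {fields.get('title', '')}")
```

In practice ``spec_text`` would come from ``dist.get_metadata(...)`` as shown in the ticket description, rather than from a literal string.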

comment:2 Changed 4 years ago by wwaites

Changed datapkg_sources to datapkg_index and updated it to work with the recent changes to how the downloader works.

comment:3 Changed 4 years ago by wwaites

  • Status changed from new to closed
  • Resolution set to fixed