Custom Query (2152 matches)

Results (2107 - 2109 of 2152)

Ticket Resolution Summary Owner Reporter
#2876 fixed Admin Config changes are not forced toby toby

Reported by toby, 21 months ago.

Description

Need to make sure these changes are applied everywhere once they are made.

#318 fixed Insufficient validation of resource URIs johnglover wwaites

Reported by wwaites, 4 years ago.

Description

The CKAN instance on data.gov.uk serves invalid URIs out of its API.

For example, the following can be found:

http://uk.sitestat.com/homeoffice/rds/s?rds.hosb0509tabsxls&ns_type=pdf&ns_url=[http://www.homeoffice.gov.uk/rds/pdfs09/hosb0509tabs.xls]

In this URI, the : and / characters after the ? in the query part are invalid according to section 3.4 of RFC 2396.

Also, trailing whitespace is not stripped from URIs.

This causes problems when other software that interprets URI validity more strictly attempts to consume data from CKAN. In this instance, the Talis triplestore complains about such URIs.

"Be liberal in what you accept and conservative in what you send" would seem apt.

Actions

  • Validate URLs as part of form entry or data loading
    • Need to consider the situation where this should happen out-of-band (i.e. we allow the load even with invalid data and then flag bad dates in a separate validation process). In general it is doubtful that we should do this here, because URL invalidity is such a big deal
  • This code should support analysis of existing data, so we can go through the existing database and find invalid URLs
    • Also useful to have this so we can do out-of-band validation
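
The actions above could be sketched roughly as follows. This is a minimal illustration, not CKAN's actual validation code: the names `url_problems` and `find_invalid_urls` are hypothetical, and the check is deliberately simple. It assumes a character-level test against the set RFC 3986 allows outside a bracketed IP-literal host, so it flags the square brackets and stray whitespace in the example above, but it does not enforce per-component rules for reserved characters and it would reject IPv6 literal hosts.

```python
import re
import string

# Characters assumed acceptable anywhere in a URI: unreserved and reserved
# characters from RFC 3986 plus "%" for escapes. Square brackets are left
# out on purpose (assumption: no IPv6 literal hosts in the data).
_ALLOWED = set(string.ascii_letters + string.digits + "-._~:/?#@!$&'()*+,;=%")
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def url_problems(url):
    """Return a list of human-readable problems found in one URL."""
    problems = []
    stripped = url.strip()
    if stripped != url:
        problems.append("surrounding whitespace")
    if not _SCHEME.match(stripped):
        problems.append("missing scheme")
    bad = sorted(set(c for c in stripped if c not in _ALLOWED))
    if bad:
        problems.append("illegal characters: " + "".join(bad))
    return problems


def find_invalid_urls(urls):
    """Map each invalid URL to its problems; usable on existing data."""
    report = {}
    for url in urls:
        problems = url_problems(url)
        if problems:
            report[url] = problems
    return report
```

The same `url_problems` function could be called at form entry to reject bad input, or through `find_invalid_urls` over URLs already in the database for out-of-band checks.
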
#433 fixed Data package metadata in the Egg wwaites

Reported by wwaites, 4 years ago.

Description

Still not sure whether we should instead use the existing setuptools machinery to manage this, since there is already a way to get at the metadata. In any event, I've just made an addition to datapkg that makes it possible to put datapkg_sources as a dictionary in your package's setup.py. Afterwards it is possible to pull the metadata out of the egg. Of course this could easily be changed to save the information in whatever form. Indeed, if you pass it a string instead of a dictionary, it will just write whatever you gave it into datapkg_sources.spec. The point is, I think that the egg is a good place to stuff this information.
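
To make the dictionary-versus-string behaviour concrete, here is a small sketch of how the file contents might be produced. It is not datapkg's actual code: `render_sources_spec` is a hypothetical helper, and the exact output layout is assumed from the example in the docstring.

```python
def render_sources_spec(sources):
    """Return the text to store in datapkg_sources.spec (hypothetical helper)."""
    if isinstance(sources, str):
        # A string is written out unchanged, whatever it contains.
        return sources
    # A dictionary becomes a [sources] section with one name=url line each.
    lines = ["[sources]"]
    for name, url in sources.items():
        lines.append("%s=%s" % (name, url))
    return "\n".join(lines) + "\n"
```
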

For non-python users, it is always possible to simply put up the datapkg_sources.spec somewhere on the web so they can directly retrieve the data files.

From the docstring::

    This is the implementation for an [egg_info.writers] entrypoint.
    Datapkg adds an argument to setuptools's setup() function called
    datapkg_sources. The argument should be a dictionary of the form:

    .. code-block:: python

        setup(
            ...,
            datapkg_sources = {
                "cra2009" : "http://www.hm-treasury.gov.uk/d/cra_2009_db.csv"
            }
        )

    The result of this is that there will be a file in the egg called
    datapkg_sources.spec that looks like this::

        [sources]
        cra2009=http://www.hm-treasury.gov.uk/d/cra_2009_db.csv

    How do you get at this data? Simple:

    .. code-block:: python

        import pkg_resources
        dist = pkg_resources.get_distribution("ukgov_treasury_cra")
        spec = dist.get_metadata("datapkg_sources.spec")

    and 'spec' will be the contents of the file as a string.
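
Once `spec` is available as a string, it can be turned back into a dictionary. The sketch below assumes Python 3 and the `[sources]` layout shown in the docstring. `parse_sources` is a hypothetical helper, not part of datapkg, and the `spec` value is hard-coded here in place of the `get_metadata` call.

```python
import configparser


def parse_sources(spec):
    """Parse datapkg_sources.spec text into a {name: url} dictionary."""
    # Interpolation is disabled so that "%" escapes in URLs are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(spec)
    return dict(parser["sources"])


# Stand-in for dist.get_metadata("datapkg_sources.spec")
spec = "[sources]\ncra2009=http://www.hm-treasury.gov.uk/d/cra_2009_db.csv\n"
sources = parse_sources(spec)
```

Note that `ConfigParser` lowercases option names by default, so source names with capital letters would come back in lower case.
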