Custom Query (2152 matches)

Filters
 
Or
 
  
 
Columns

Show under each result:


Results (37 - 39 of 2152)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Ticket Resolution Summary Owner Reporter
#1616 fixed Catch exceptions when rebuilding the search index amercader amercader

Reported by amercader, 2 years ago.

Description

Right now if an exception is found while reindexing, the whole process stops and the remaining datasets are left out of the index. The process should continue after logging the exception. If more than a certain number of exceptions occur in a row, the process should stop.

#1640 fixed Setup publicdata.eu harvester for Serbian CKAN datasets amercader dread

Reported by dread, 2 years ago.

Description

Set-up publicdata.eu to harvest datasets at rs.ckan.net (Serbian community CKAN).

#1641 fixed ckanext-archiver: Content-length header not reliable to check if resource has been modified amercader amercader

Reported by amercader, 2 years ago.

Description

The download task in ckanext-archiver performs a HEAD request on the resource URL and checks if the "Content-Type" and "Content-Length" headers differ from the values stored to see if the resource needs to be updated [1].

The "Content-Length" header, although widely used, is not mandatory and some servers don't provide it, e.g.:

$ curl -I http://portfolio.theglobalfund.org/en/IATI/Activities?countryCode=AFG
HTTP/1.1 200 OK
Cache-Control: private
Transfer-Encoding: chunked
Content-Type: text/xml
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
Set-Cookie: ASP.NET_SessionId=3qhqekddgmre0kmk5cynq0sy; path=/; HttpOnly
X-AspNetMvc-Version: 3.0
content-disposition: attachment; filename=AFG_IATI_12012012.xml
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 12 Jan 2012 12:36:43 GMT

Also worth noting that requests, the python library that uses ckanext-archiver, sets an "Accept-Encoding: gzip" header by default, which depending on the configuration of the remote web server, may prevent the "Content-Length" server from being sent, e.g.:

$ curl -H "Accept-Encoding: gzip" -I http://iatistandard.org/published-temp/adb-activities.xml
HTTP/1.1 200 OK
Date: Thu, 12 Jan 2012 12:12:46 GMT
Server: Apache
Last-Modified: Mon, 28 Nov 2011 15:55:35 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Type: application/xml

curl -I http://iatistandard.org/published-temp/adb-activities.xml
HTTP/1.1 200 OK
Date: Thu, 12 Jan 2012 11:56:23 GMT
Server: Apache
Last-Modified: Mon, 28 Nov 2011 15:55:35 GMT
Accept-Ranges: bytes
Content-Length: 2686720
Vary: Accept-Encoding
Content-Type: application/xml

All this can lead to some resources never getting updated, and of course the size property of the resource not being set.

As we need to download the resource anyway, it would be better to check if the real length of the data has been modified (and store it).

[1] https://github.com/okfn/ckanext-archiver/blob/0a189262dca4ab5b286fb6a02b4ab8a201f639f3/ckanext/archiver/tasks.py#L72

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Note: See TracQuery for help on using queries.