Ticket #851 (closed defect: fixed)
Link Checking
Reported by: | wwaites | Owned by: | wwaites |
---|---|---|---|
Priority: | major | Milestone: | ckan-v1.3-sprint-1 |
Component: | DGU | Keywords: | |
Cc: | | Repository: | |
Theme: | | | |
Description
revisit link checker from http://knowledgeforge.net/ckan/ckanext/file/tip/ckanext/link_checker.py
revisit ollyc's parallel work
look at how this gets used in practice (maybe deprecate in favour of the curate tool, suitably wrapped to hide details from the user)
Change History
comment:1 Changed 3 years ago by wwaites
- Status changed from new to assigned
- the link checker above uses the queue, but the queue is not generally running
- quickest way forward is to put the curate tool in a cron job and write a suitable rule. shall do this soonest
comment:2 Changed 3 years ago by wwaites
Ready to put into cron job. cf:
- http://groups.google.com/group/fuxi-discussion/browse_thread/thread/47f131fc2e3817e3 (Actions)
- http://groups.google.com/group/fuxi-discussion/browse_thread/thread/bf955620a6ae77d8 (denoted/calculated functions)
- http://groups.google.com/group/fuxi-discussion/browse_thread/thread/71a94191e9fef384 (FuXi 1.2)
- https://github.com/wwaites/curate/commit/042a96c1589c0fa4980aca733c64c080e02f111e (curate tool update)
comment:3 Changed 3 years ago by wwaites
- Status changed from assigned to closed
- Resolution set to fixed
currently running against ckan.net, adding a broken_link tag if a broken link is found. perhaps something more elaborate should be done? works for now anyhow...
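For illustration only, the effect of the tagging rule can be sketched in plain Python. This is not how curate does it (curate evaluates N3 rules); the package layout (dicts with "name", "resources" and "tags" keys), the function name tag_broken_links and the link_ok callable are all assumptions made for the example.

```python
def tag_broken_links(packages, link_ok, tag="broken_link"):
    """Add `tag` to every package with at least one failing resource URL.

    `packages` is a list of dicts (hypothetical layout, see the note above).
    `link_ok` is any callable taking a URL and returning True if it resolves.
    Returns the names of the packages that were newly tagged.
    """
    newly_tagged = []
    for pkg in packages:
        urls = [res["url"] for res in pkg.get("resources", []) if res.get("url")]
        has_broken = any(not link_ok(url) for url in urls)
        tags = pkg.setdefault("tags", [])
        # Only ever add the tag, and never add it twice.
        if has_broken and tag not in tags:
            tags.append(tag)
            newly_tagged.append(pkg["name"])
    return newly_tagged


# Offline demonstration with a stand-in checker: only URLs in `live` count as working.
packages = [
    {"name": "good-data", "resources": [{"url": "http://example.org/a.csv"}], "tags": []},
    {"name": "stale-data", "resources": [{"url": "ftp://example.org/gone.csv"}], "tags": ["csv"]},
]
live = {"http://example.org/a.csv"}
print(tag_broken_links(packages, lambda url: url in live))
```

The sketch only adds the tag, as described in this comment; removing it again once a link recovers would be one of the "more elaborate" options.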
comment:4 Changed 3 years ago by wwaites
- Status changed from closed to reopened
- Resolution fixed deleted
urllib2 is good for http(s) urls but not, unfortunately, for other types, most prominently ftp.
change the httpReq action to use http://curl.haxx.se/libcurl/python/
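A rough sketch of what a libcurl-based check could look like with pycurl (the binding linked above), written for Python 3. The function name link_is_ok and the timeout default are assumptions for illustration; this is not the httpReq or curlReq action from curate. It asks for headers only and treats any libcurl error as a broken link, so a server that refuses header-only requests would be reported as broken.

```python
import pycurl


def link_is_ok(url, timeout=20):
    """Return True if libcurl can fetch `url` without error (http, https, ftp, ...)."""
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.NOBODY, 1)          # do not download the body
    c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow redirects
    c.setopt(pycurl.FAILONERROR, 1)     # make HTTP status >= 400 raise an error
    c.setopt(pycurl.CONNECTTIMEOUT, timeout)
    c.setopt(pycurl.TIMEOUT, timeout)
    try:
        c.perform()
        return True
    except pycurl.error:
        # Covers unresolved hosts, timeouts, missing files, HTTP errors, etc.
        return False
    finally:
        c.close()
```

Calling link_is_ok on an ftp:// URL goes through the same code path as an http:// one, which is the point of switching to libcurl.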
comment:5 Changed 3 years ago by wwaites
- Status changed from reopened to closed
- Resolution set to fixed
Implemented curlReq, which does a curl request and returns statements analogous to those of httpReq. Requires curate<=0.8
Running this in a cron job is now sufficient to go through all the packages and update them with a broken link tag:
curate -r https://github.com/wwaites/curate/raw/master/examples/tagging.n3 -s -k API_KEY