Ticket #851 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

Link Checking

Reported by: wwaites
Owned by: wwaites
Priority: major
Milestone: ckan-v1.3-sprint-1
Component: DGU
Keywords:
Cc:
Repository:
Theme:

Description

revisit link checker from  http://knowledgeforge.net/ckan/ckanext/file/tip/ckanext/link_checker.py

revisit ollyc's parallel work

look at how this gets used in practice (maybe deprecate in favour of the curate tool, suitably wrapped to hide details from the user)

Change History

Changed 2 years ago by wwaites

  • status changed from new to assigned
  • the link checker above uses the queue, and the queue is not generally running
  • the quickest way forward is to put the curate tool in a cron job and write a suitable rule. shall do this soonest

Changed 2 years ago by wwaites

  • status changed from assigned to closed
  • resolution set to fixed

Currently running against ckan.net, adding a broken_link tag if a broken link is found. Perhaps something more elaborate should be done? Works for now anyhow...

Changed 2 years ago by wwaites

  • status changed from closed to reopened
  • resolution fixed deleted

urllib2 is good for http(s) URLs but unfortunately not for other types, most prominently ftp.

Change the httpReq action to use pycurl: http://curl.haxx.se/libcurl/python/
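
A minimal sketch of what a curl-based check could look like, assuming the pycurl bindings are installed. The names check_link and is_broken are hypothetical and are not taken from curate; the rule for what counts as "broken" is also an assumption made for illustration.

```python
from urllib.parse import urlsplit


def is_broken(scheme, response_code, transfer_failed):
    """Decide whether a link counts as broken (illustrative rule, an assumption)."""
    if transfer_failed:
        # libcurl could not complete the transfer: DNS failure, timeout,
        # missing FTP file, and so on.
        return True
    if scheme in ("http", "https"):
        # For HTTP(S) the transfer can succeed while the server reports an error.
        return response_code >= 400
    # For other schemes such as ftp, a completed transfer is treated as success.
    return False


def check_link(url, timeout=30):
    """Return True if the URL looks broken. Hypothetical helper, not curate's code."""
    import pycurl  # imported here so is_broken stays usable without pycurl

    scheme = urlsplit(url).scheme.lower()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.NOBODY, True)          # do not download the body
    curl.setopt(pycurl.FOLLOWLOCATION, True)  # follow HTTP redirects
    curl.setopt(pycurl.TIMEOUT, timeout)
    try:
        curl.perform()
        code = curl.getinfo(pycurl.RESPONSE_CODE)
        failed = False
    except pycurl.error:
        code, failed = 0, True
    finally:
        curl.close()
    return is_broken(scheme, code, failed)
```

Keeping the decision in is_broken separate from the network call means the rule can be adjusted and checked without touching any remote server.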

Changed 2 years ago by wwaites

  • status changed from reopened to closed
  • resolution set to fixed

Implemented curlReq, which does a curl request and returns statements analogous to those of httpReq. Requires curate<=0.8.

Running this in a cron job is now sufficient to go through all the packages and update them with a broken link tag:

curate -r https://github.com/wwaites/curate/raw/master/examples/tagging.n3 -s -k API_KEY  
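
One possible way to wrap the command above so that a cron job does not carry the API key on its own command line. This is only a sketch: the function names and the CKAN_API_KEY environment variable are hypothetical, and the flags are copied verbatim from the command above rather than taken from curate's documentation.

```python
import os
import subprocess

# Rule file URL as given in the ticket.
RULES_URL = "https://github.com/wwaites/curate/raw/master/examples/tagging.n3"


def build_curate_command(api_key, rules_url=RULES_URL):
    """Build the argument list for the curate invocation shown in the ticket."""
    if not api_key:
        raise ValueError("an API key is required")
    # Flags -r, -s and -k are reproduced exactly as they appear in the ticket.
    return ["curate", "-r", rules_url, "-s", "-k", api_key]


def run_curate():
    """Run curate with the key read from CKAN_API_KEY (a hypothetical variable name)."""
    api_key = os.environ.get("CKAN_API_KEY", "")
    # subprocess.call returns the exit status of the curate process.
    return subprocess.call(build_curate_command(api_key))
```

A cron entry would then invoke a small script that calls run_curate(), which keeps the details of the curate invocation hidden from whoever maintains the schedule.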