Ticket #851 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

Link Checking

Reported by: wwaites
Owned by: wwaites
Priority: major
Milestone: ckan-v1.3-sprint-1
Component: DGU
Keywords:
Cc:
Repository:
Theme:

Description

revisit link checker from http://knowledgeforge.net/ckan/ckanext/file/tip/ckanext/link_checker.py

revisit ollyc's parallel work

look at how this gets used in practice (maybe deprecate in favour of the curate tool, suitably wrapped to hide details from the user)

Change History

comment:1 Changed 3 years ago by wwaites

  • Status changed from new to assigned
  • The link checker above uses the queue, and the queue is not generally running.
  • The quickest way forward is to put the curate tool in a cron job and write a suitable rule. Will do this as soon as possible.

comment:3 Changed 3 years ago by wwaites

  • Status changed from assigned to closed
  • Resolution set to fixed

Currently running against ckan.net, adding a broken_link tag whenever a broken link is found. Perhaps something more elaborate should be done, but this works for now.
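
The tagging step described here can be sketched roughly as follows. This is only an illustration, not the curate rule itself: the package is assumed to be a plain dict with a `resources` list (each entry carrying a `url`) and a `tags` list, and `is_broken` is a hypothetical callable supplied by the caller that decides whether a URL is dead.

```python
def tag_broken_links(package, is_broken, tag="broken_link"):
    """Return (updated_package, broken_urls).

    `package` is assumed to be a dict with "resources" and "tags" keys;
    `is_broken` is a hypothetical callable taking a URL and returning a bool.
    The input dict is left unmodified.
    """
    urls = [r.get("url") for r in package.get("resources", [])]
    broken = [u for u in urls if u and is_broken(u)]

    tags = list(package.get("tags", []))
    # Add the tag only once, and only if at least one link is broken.
    if broken and tag not in tags:
        tags.append(tag)

    updated = dict(package)
    updated["tags"] = tags
    return updated, broken
```

Passing the checker in as a callable keeps the tagging logic independent of how links are actually probed, so the same function works whether the probe is urllib-based or curl-based.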

comment:4 Changed 3 years ago by wwaites

  • Status changed from closed to reopened
  • Resolution fixed deleted

urllib2 is good for http(s) URLs but, unfortunately, not for other types, most prominently ftp.

Change the httpReq action to use pycurl (http://curl.haxx.se/libcurl/python/).
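
A curl-based probe along these lines might look like the sketch below. It is not the curate action itself: the function names `link_ok` and `curl_check` are made up for illustration, and the choice to treat any non-HTTP transfer that completes without a libcurl error as a working link is an assumption.

```python
from urllib.parse import urlparse


def link_ok(url, response_code):
    """Decide whether a completed transfer counts as a working link.

    For http(s) the status code must be in the 2xx/3xx range. For other
    schemes (e.g. ftp) a transfer that finished without a libcurl error
    is treated as success, since their reply codes mean different things.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme in ("http", "https"):
        return 200 <= response_code < 400
    return True


def curl_check(url, timeout=30):
    """Probe `url` with libcurl; return (ok, response_code, error_message)."""
    import pycurl  # imported lazily so link_ok is usable without pycurl

    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.NOBODY, True)  # skip the body (a HEAD request for HTTP)
        c.setopt(pycurl.FOLLOWLOCATION, True)
        c.setopt(pycurl.CONNECTTIMEOUT, timeout)
        c.setopt(pycurl.TIMEOUT, timeout)
        c.perform()
        code = c.getinfo(pycurl.RESPONSE_CODE)
        return link_ok(url, code), code, None
    except pycurl.error as exc:
        # libcurl failures (DNS, connection refused, timeout, ...) land here
        return False, None, str(exc)
    finally:
        c.close()
```

Because libcurl handles ftp as well as http(s), a single code path like `curl_check` can cover both kinds of resource URL.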

comment:5 Changed 3 years ago by wwaites

  • Status changed from reopened to closed
  • Resolution set to fixed

Implemented curlReq, which performs a curl request and returns statements analogous to those of httpReq. Requires curate<=0.8

Running this in a cron job is now sufficient to go through all the packages and update them with a broken link tag:

curate -r https://github.com/wwaites/curate/raw/master/examples/tagging.n3 -s -k API_KEY  