Ticket #1154 (closed enhancement: fixed)

Opened 3 years ago

Last modified 3 years ago

Make ckan robust against solr failure

Reported by: nils.toedtmann Owned by: johnglover
Priority: major Milestone: ckan-sprint-2011-10-28
Component: ckan Keywords: search
Cc: pudo Repository: ckanext-solr
Theme: none

Description (last modified by johnglover) (diff)

According to pudo, a ckan with activated solr extension throws a 5xx when solr is unreachable. Instead, it should behave more like a ckan without ckanext-solr when this happens.

Change History

comment:1 Changed 3 years ago by johnglover

  • Status changed from new to assigned
  • Cc sebbacon removed
  • Priority changed from awaiting triage to major
  • Owner set to johnglover
  • Milestone set to ckan-current-sprint
  • Keywords search added

comment:2 Changed 3 years ago by johnglover

  • Status changed from assigned to closed
  • Resolution set to fixed

Fixed in core (branch feature-1275-solr-search).

comment:3 Changed 3 years ago by dread

  • Status changed from closed to reopened
  • Resolution fixed deleted

Looking at the changeset, it now does a connection check at start-up of CKAN. If it fails then CKAN runs but doesn't show any packages, apart from via the Tags interface, and the search page and widget on the front-page are removed. (BTW Is this a useful thing to do? Without search, perhaps CKAN should simply not start?)

I believe the problem pudo is referring to is temporary periods when the SOLR server is slow/overloaded/down. During this time pudo and I get exception emails saying connection failed for request. e.g.

Module ckan.controllers.home:29 in index
<<          query = query_for(model.Package)
               query.run(query='*:*', facet_by=g.facets,
                         limit=0, offset=0, username=c.user)
               c.facets = query.facets
               c.fields = []
>>  limit=0, offset=0, username=c.user)
Module ckan.lib.search.common:115 in run
<<          self.query = QueryParser(query, terms, fields)
               self.query.validate()
               self._run()
               self._format_results()
               return {'results': self.results, 'count': self.count}
>>  self._run()
Module ckanext.solr.backend:63 in _run
<<              # this wrapping will be caught further up in the WUI.
                   log.exception(e)
                   raise SearchError(e)
               finally:
                   conn.close()
>>  raise SearchError(e)
SearchError: [Errno 111] Connection refused
}}} (recently from nl.ckan.net)

comment:4 Changed 3 years ago by johnglover

  • Description modified (diff)

Looking at the changeset, it now does a connection check at start-up of CKAN. If it fails then CKAN runs but doesn't show any packages, apart from via the Tags interface, and the search page and widget on the front-page are removed. (BTW Is this a useful thing to do? Without search, perhaps CKAN should simply not start?)

When discussing the ticket we agreed that we should allow ckan to start without search. It is possible that people simply want to list/view information without searching for it and we decided not to prevent them from doing so.

I believe the problem pudo is referring to is temporary periods when the SOLR server is slow/overloaded/down. During this time pudo and I get exception emails saying connection failed for request.

Ok, other options then:

  • if an error occurs during search, return 0 results
  • if an error occurs during search, redirect to a generic 'search is unavailable' page
  • if an error occurs during search, retry the query a certain number of times before giving up and doing one of the above.

What do you suggest?

comment:5 Changed 3 years ago by dread

Search requests - option one seems like a bad user experience, and option three seems too complicated. I should do option 2, explain it may well be a temporary problem, and let the user press reload, as per any normal web request.

A more difficult problem though is what to do about a failed search indexing. Because of the try/except catch we've had until the patch yesterday, we get no exception emails for this, but I'm worried that this has caused packages to not be indexed, and hence are essentially lost in CKAN (unless someone realises there is a problem and does a reindex at some point).

BTW I added paster command 'search-index check' for looking at this problem, but I don't think it was ever added to solr. Is it easy to add get_all_entity_ids for the purpose of checking for this problem?

comment:6 Changed 3 years ago by johnglover

Ok I'll implement option 2 for search query issues.

Yes it is easy to get a list of IDs from Solr so I'll have a look at that command and make sure that it works. I think a case could be made for not allowing a package to be added if it cannot be indexed however, and making the user simply retry the request.

comment:7 Changed 3 years ago by dread

Yes, forcing a retry may be possible, although I guess on these occasions you'd be looking at rolling back the creation of the package object too, which is not really in the spirit of the notifications system at present. This is why we had the queue for solr for occasions like this - is it here no longer?

comment:8 Changed 3 years ago by johnglover

Not in core, there is a queue feature in the extension still which I could add to core. I think though it was a case of using either the queue or synchronous search, I don't think it would automatically fall back to the queue if there was an indexing problem, but we could look at that. Was there not talk of changing the queue system recently though? There are also no tests for the Solr queue so that would take a bit of work as well.

comment:9 Changed 3 years ago by johnglover

RE search queries failing: Just having a look through the controller code and the package search controller and the search api both perform option 2 already, so I'll just update the home controller.

comment:10 Changed 3 years ago by dread

Ok, I chatted to David about this and we feel it should go like this: notification to search indexer fails and raises an exception, this gets passed through the notification system back up to the code that creates the package, and you do a roll back. Does this tally with what you mentioned in the last comment or is what's there better?

Thanks for the info on the queue and ckanext-solr. It sounds like this isn't deployed successfully anywhere (or does pudo know if it is?). In this case we should be clear that it is not what it says it is: "This extension adds support for package search using Apache Solr to CKAN" and maybe simply mark it as deprecated. Pudo what do you think?

comment:11 Changed 3 years ago by johnglover

Ok, I chatted to David about this and we feel it should go like this: notification to search indexer fails and raises an exception, this gets passed through the notification system back up to the code that creates the package, and you do a roll back. Does this tally with what you mentioned in the last comment or is what's there better?

This sounds good to me, definitely better than what is currently there.

Thanks for the info on the queue and ckanext-solr. It sounds like this isn't deployed successfully anywhere (or does pudo know if it is?).

I'm not actually sure if the queue works or not, but if it does I'll have to write some tests for it before we could add it to core. A working queue system would be useful generally however, as there are other things that it would be useful for (such as QA), but yes, going forward the Solr extension should really be deprecated.

comment:12 Changed 3 years ago by johnglover

  • Status changed from reopened to closed
  • Resolution set to fixed

All attempts to index packages should now throw a SearchIndexError? if Solr is unavailable and the create/update will be rolled back. These errors are caught in the main package controller (redirecting to an error page) and the API controller. I have added tests for both controllers.

Also updated the paster 'search-index check' command to work with solr.

Note: See TracTickets for help on using tickets.