Ticket #1737 (closed enhancement: fixed)

Opened 2 years ago

Last modified 23 months ago

Expose solr-based search API

Reported by: rgrp
Owned by: icmurray
Priority: major
Milestone: ckan-v1.8
Component: ckan
Keywords: [0.5d]
Cc: rgrp
Repository: ckan
Theme: none

Description (last modified by rgrp)

Super ticket: #1745

Required for some improvements to UX (such as autocomplete and better search).

Change History

comment:1 Changed 2 years ago by rgrp

  • Keywords 1d added

comment:2 Changed 2 years ago by rgrp

  • Description modified

comment:3 Changed 2 years ago by rgrp

  • Milestone changed from ckan-sprint-2012-02-20 to ckan-v1.7

comment:4 Changed 2 years ago by rgrp

  • Status changed from new to assigned
  • Milestone changed from ckan-v1.7 to current-ckan-sprint-2012-02-20

comment:5 Changed 2 years ago by rgrp

  • Keywords [1d] added; 1d removed

comment:6 Changed 2 years ago by rgrp

  • Owner changed from zephod to rgrp

comment:7 Changed 2 years ago by dread

SOLR syntax is already accepted in:

  • api/3/search/package
  • Action API "package_search"

In fact, api/1/search/package and api/2/search/package accept SOLR syntax too, but they also translate the old CKAN search-parameter syntax to SOLR.

See: http://docs.ckan.org/en/latest/apiv3.html
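
As a rough illustration of the above, a request body for the package_search action could be assembled like this. It is a minimal sketch, not CKAN's own client code: build_search_payload is a hypothetical helper, and the q, fq and rows parameter names (and the tags field used in the query string) are assumptions taken from the package_search docs, not from this ticket.

```python
import json


def build_search_payload(q, fq=None, rows=None):
    """Build the JSON body for a package_search action call.

    Hypothetical helper: the parameter names (q, fq, rows) are assumed.
    """
    payload = {"q": q}
    if fq is not None:
        payload["fq"] = fq
    if rows is not None:
        payload["rows"] = rows
    # sort_keys gives a stable body, which is handy when logging or testing.
    return json.dumps(payload, sort_keys=True)


# SOLR syntax goes straight into q: a field-qualified term plus a boolean operator.
body = build_search_payload("tags:russian AND tolstoy", rows=5)
print(body)
```

The resulting string would then be POSTed to /api/3/action/package_search.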

comment:8 Changed 2 years ago by rgrp

  • Milestone changed from current-ckan-sprint-2012-03-05 to ckan-v1.7

Did not get to this sprint as focused on #1797.

comment:9 Changed 2 years ago by icmurray

  • Keywords [1-2d] added; [1d] removed
  • Owner changed from rgrp to icmurray

See http://ckan.okfnpad.org/feature-1737-expose-solr-based-search-api

Immediate actions

  • analysis of whether the current action/get.py:package_search() function exposes all we currently need for the use cases described above.
  • how to return that data (expand the current format?)
  • analysis of whether the current (action) API v.3 can use query parameters rather than, or as well as, POSTed data.

(controllers/api.py:action() uses "self._get_request_data()", which in turn pulls the data out of the POST body.)

Tasks

Extend the existing package_search action.

  • pass the facet fields into the logic layer from the request parameters.
    • /api/3/action/package_search
  • facet information not being returned via package_search action (is empty).
  • whitelist any GETable api actions, and optionally construct the query from url params rather than body
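
The last task could look roughly like the sketch below. Everything in it is an assumption made for illustration: the GETABLE_ACTIONS whitelist, the get_request_data name and signature, and the choice to treat a repeated URL parameter as a list of values. It is not the code in controllers/api.py.

```python
import json
from urllib.parse import parse_qs

# Hypothetical whitelist: the ticket does not say which actions qualify.
GETABLE_ACTIONS = {"package_search"}


def get_request_data(action, method, query_string="", body=""):
    """Return the action's parameters as a dict, from the URL or the body."""
    if method == "GET":
        if action not in GETABLE_ACTIONS:
            raise ValueError("action %r may not be called with GET" % action)
        params = parse_qs(query_string)
        # parse_qs always returns lists; unwrap single values and keep
        # repeated parameters (assumed to mean "several values") as lists.
        return {k: v[0] if len(v) == 1 else v for k, v in params.items()}
    # Any other method: fall back to the JSON body, as the POST path does.
    return json.loads(body) if body else {}


data = get_request_data(
    "package_search", "GET", "q=data&facet.field=tags&facet.field=groups"
)
print(data)
```

Keeping the whitelist explicit means that actions which modify data stay POST-only unless someone deliberately opts them in.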

comment:10 Changed 2 years ago by icmurray

  • Milestone changed from ckan-v1.7 to current-ckan-sprint-2012-04-30

comment:11 Changed 2 years ago by icmurray

  • Status changed from assigned to closed
  • Resolution set to worksforme

Facet results are already available through the package_search action API. The facet fields need to be specified in the query, otherwise no faceting will be done (i.e. the default facets specified in the .ini file are not used).

This leaves "whitelist any GETable api actions, and optionally construct the query from url params rather than body" todo, which I've pulled out into another ticket, as it wasn't originally in this ticket. (#2330)

comment:12 Changed 2 years ago by icmurray

... just to add to the above:

>>> pprint(json.loads(requests.post('http://127.0.0.1:8088/api/3/action/package_search', json.dumps({'facet.field': ["tags", "groups"]})).content))

<snip>


u'search_facets': {u'groups': {u'items': [{u'count': 1,
                                                        u'display_name': u"Roger's books",
                                                        u'name': u'roger'},
                                                       {u'count': 2,
                                                        u'display_name': u"Dave's books",
                                                        u'name': u'david'}],
                                            u'title': u'groups'},
                                u'tags': {u'items': [{u'count': 1,
                                                      u'display_name': u'tolstoy',
                                                      u'name': u'tolstoy'},
                                                     {u'count': 2,
                                                      u'display_name': u'russian',
                                                      u'name': u'russian'},
                                                     {u'count': 2,
                                                      u'display_name': u'Flexible \u30a1',
                                                      u'name': u'Flexible \u30a1'}],
                                          u'title': u'tags'}}},
 u'success': True}
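
For anyone consuming this from a client, the search_facets structure above can be flattened into plain name-to-count mappings. This is only a sketch: facet_counts is a hypothetical helper, and the sample dict is an abridged copy of the output shown in this comment.

```python
def facet_counts(search_facets):
    """Map each facet field to a {name: count} dict."""
    return {
        field: {item["name"]: item["count"] for item in facet["items"]}
        for field, facet in search_facets.items()
    }


# Abridged copy of the search_facets dict from the response above.
search_facets = {
    "groups": {
        "title": "groups",
        "items": [
            {"count": 1, "display_name": "Roger's books", "name": "roger"},
            {"count": 2, "display_name": "Dave's books", "name": "david"},
        ],
    },
    "tags": {
        "title": "tags",
        "items": [
            {"count": 1, "display_name": "tolstoy", "name": "tolstoy"},
            {"count": 2, "display_name": "russian", "name": "russian"},
        ],
    },
}

counts = facet_counts(search_facets)
print(counts["groups"])
```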

comment:13 Changed 2 years ago by rgrp

  • Status changed from closed to reopened
  • Resolution worksforme deleted
  • Milestone changed from ckan-sprint-2012-04-30 to current-ckan-sprint-2012-05-15

Having now thought about this I'm re-opening this ticket for the following reasons:

  • No real documentation yet available (other than that in this ticket)
    • It would also be nice to know how this maps to SOLR API (can i use all of the facet options solr provides or not ...?)
    • And I would again emphasize my preference for having *direct* access to something that looks *exactly* like SOLR API as I can then use client and docs from SOLR to work with it.
  • No clear resolution of separation between Action and REST API (and search API). Really seems to me there should be convergence between latter 2 (as suggested in the ticket) -- this would also resolve the problem that having GET /api/dataset return all datasets is *not* a great idea
  • The Action API requires a POST request. Since the primary purpose of the search API would be usage from JS it would be nice if GET and JSONP were supported. (Though given our CORS support we could argue this was optional).

Not saying *all* of this needs fixing but some clear approach here would be useful

comment:14 Changed 2 years ago by icmurray

  • Keywords [0.5d] added; [1-2d] removed
  • Milestone changed from ckan-sprint-2012-05-15 to current-ckan-sprint-2012-05-29

comment:15 Changed 2 years ago by icmurray

  • we won't be providing direct access to solr api, as we think the cost outweighs the benefit.

comment:16 Changed 2 years ago by icmurray

  • Cc rgrp added

comment:17 Changed 2 years ago by rgrp

I *strongly* disagree re access to solr API -- i don't really care if it is direct but I want something that looks like it at least for core query parameters and facets ...

Is there some major issue around security etc (e.g. limiting to only public datasets or similar?)

comment:18 Changed 2 years ago by icmurray

For completeness, package_search docs can be viewed at https://github.com/okfn/ckan/blob/master/ckan/logic/action/get.py#L983

@rgrp: are you happy with this ticket now, does it expose all you need? If so, I'll close it...

comment:19 Changed 2 years ago by ross

  • Milestone changed from current-ckan-sprint-2012-05-29 to ckan-v1.8

comment:20 Changed 2 years ago by rgrp

Let's close. What would be nice though is ETA on the GET support getting deployed on the DataHub? -- just tried using it today and realized it didn't work :-)

comment:21 Changed 23 months ago by icmurray

  • Status changed from reopened to closed
  • Resolution set to fixed

The GET-able actions ( #2330 ) are in master, and will make it into 1.8

comment:22 follow-up: ↓ 23 Changed 23 months ago by rgrp

But not v1.7.1? (When is v1.8 due?).

Also for the record a couple of things I found when trying to use this:

  • No support for facet sort order or facet limit afaict ...

comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 23 months ago by icmurray

Replying to rgrp:

> But not v1.7.1? (When is v1.8 due?).

Not 1.7.1 as it's not a bug fix (https://docs.google.com/document/d/170fxET3kd9dJ4L6VAj3yZugtK0rrVe44J4HuLbTUsEU/)

I don't know when 1.8 is due.

> Also for the record a couple of things I found when trying to use this:
>
>   • No support for facet sort order or facet limit afaict ...

Thanks, that's good to know.

facet.limit should be working [1], so if it's not, then that's a bug. I can check that.

for facet.sort, I've added ticket #2543

[1] https://github.com/okfn/ckan/blob/master/ckan/logic/action/get.py#L1022

comment:24 in reply to: ↑ 23 Changed 23 months ago by icmurray

Replying to icmurray:

> > Also for the record a couple of things I found when trying to use this:
> >
> >   • No support for facet sort order or facet limit afaict ...
>
> Thanks, that's good to know.
>
> facet.limit should be working [1], so if it's not, then that's a bug. I can check that.

facet.limit is working as I'd expect. eg (on master):

import requests
import json
from pprint import pprint as pp
pp(json.loads(requests.get('http://ian-laptop:5000/api/3/action/package_search?q=data&facet.field=tags&facet.limit=1').content))

Returns a result where only the tag with the highest count is returned in the search_facets result dict.
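
When building such GET URLs from code, it is probably safer to let the standard library do the escaping. A minimal sketch follows, with package_search_url as a hypothetical helper name; it only handles a single facet field, since that is all the example above demonstrates.

```python
from urllib.parse import urlencode


def package_search_url(site_url, q, facet_field=None, facet_limit=None):
    """Build a GET URL for the package_search action (hypothetical helper)."""
    params = [("q", q)]
    if facet_field is not None:
        params.append(("facet.field", facet_field))
    if facet_limit is not None:
        params.append(("facet.limit", facet_limit))
    # urlencode percent-escapes characters such as ':' and encodes spaces as '+'.
    return site_url.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)


url = package_search_url(
    "http://ian-laptop:5000", "data", facet_field="tags", facet_limit=1
)
print(url)
```

For the arguments shown, this reproduces the URL used in the requests.get call above.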

One thing I did spot though was that we aren't able to specify per-field parameters, i.e. whilst facet.limit sets the facet limit for all facet fields, Solr allows that to be overridden on a per-field basis, e.g. f.tags.facet.limit. This isn't something we support at the moment. I've created a ticket for this, #2573. Let me know if this is something that you'd find useful; otherwise I'll leave it on the backlog for a future iteration.
