Ticket #1737 (closed enhancement: fixed)
Expose solr-based search API
Reported by: | rgrp | Owned by: | icmurray |
---|---|---|---|
Priority: | major | Milestone: | ckan-v1.8 |
Component: | ckan | Keywords: | [0.5d] |
Cc: | rgrp | Repository: | ckan |
Theme: | none |
Description (last modified by rgrp)
Super ticket: #1745
- Convert /api/rest/dataset to be search query (i.e. take arguments in ?....)
- Directly expose solr though may want to override limit. See https://github.com/okfn/openspending/blob/master/openspending/ui/controllers/api.py#L48
Required for some improvements to UX (such as autocomplete and better search).
Change History
comment:4 Changed 2 years ago by rgrp
- Status changed from new to assigned
- Milestone changed from ckan-v1.7 to current-ckan-sprint-2012-02-20
comment:7 Changed 2 years ago by dread
SOLR syntax is already accepted in:
- api/3/search/package
- Action API "package_search"
Actually api/1/search/package and api/2/search/package accept SOLR syntax too, and they additionally translate the old CKAN search parameter syntax to SOLR.
comment:8 Changed 2 years ago by rgrp
- Milestone changed from current-ckan-sprint-2012-03-05 to ckan-v1.7
Did not get to this sprint as focused on #1797.
comment:9 Changed 2 years ago by icmurray
- Keywords [1-2d] added; [1d] removed
- Owner changed from rgrp to icmurray
See http://ckan.okfnpad.org/feature-1737-expose-solr-based-search-api
Immediate actions
- analysis of whether the current action/get.py:package_search() function exposes all we currently need for the use cases described above.
- how to return that data (expand the current format?)
- analysis of whether the current (action) API v.3 can accept query parameters instead of, or as well as, POSTed data.
(controllers/api.py:action() uses "self._get_request_data()", which in turn pulls the data out of the POST body.)
Tasks
Extend the existing package_search action.
- pass the facet fields into the logic layer from the request parameters.
- /api/3/action/package_search
- facet information not being returned via package_search action (is empty).
- curl -X POST -d '{"q": "{!lucene q.op=AND df=text}tags:health +community -profile"}' 'http://thedatahub.org/api/3/action/package_search'
- figure out why it's not working, and what the facet information should look like
- whitelist any GETable api actions, and optionally construct the query from url params rather than body
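The last task above (whitelisting GET-able actions and building the query from URL params rather than the body) could look roughly like the sketch below. It is illustrative only: `GETABLE_ACTIONS` and `build_data_dict` are hypothetical names, not CKAN functions, and the ticket does not say which actions would end up on the whitelist.

```python
import json
from urllib.parse import parse_qs

# Hypothetical whitelist; the ticket does not list which actions qualify.
GETABLE_ACTIONS = {"package_search"}


def build_data_dict(action, method, query_string="", body=""):
    """Build an action's data dict from a JSON body (POST) or URL params (GET)."""
    if method == "POST":
        # Existing behaviour described in the ticket: data comes from the POST body.
        return json.loads(body) if body else {}
    if method == "GET":
        if action not in GETABLE_ACTIONS:
            raise ValueError("action %r is not GET-able" % action)
        params = parse_qs(query_string)
        # Unwrap single values; repeated keys (e.g. facet.field) stay as lists.
        return {k: v[0] if len(v) == 1 else v for k, v in params.items()}
    raise ValueError("unsupported method %r" % method)
```

Keeping the whitelist explicit means actions that modify data cannot be triggered by a plain link, which is one plausible reason for the "whitelist" wording in the task.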
comment:10 Changed 2 years ago by icmurray
- Milestone changed from ckan-v1.7 to current-ckan-sprint-2012-04-30
comment:11 Changed 2 years ago by icmurray
- Status changed from assigned to closed
- Resolution set to worksforme
Facet results are already available through the package_search action api. The facet fields need to be specified in the query, otherwise no faceting will be done (ie - the default facets specified in the .ini file are not used).
This leaves "whitelist any GETable api actions, and optionally construct the query from url params rather than body" todo, which I've pulled out into another ticket, as it wasn't originally in this ticket. (#2330)
comment:12 Changed 2 years ago by icmurray
... just to add to the above:
>>> pprint(json.loads(requests.post('http://127.0.0.1:8088/api/3/action/package_search', json.dumps({'facet.field': ["tags", "groups"]})).content))
<snip>
    u'search_facets': {
        u'groups': {
            u'items': [
                {u'count': 1, u'display_name': u"Roger's books", u'name': u'roger'},
                {u'count': 2, u'display_name': u"Dave's books", u'name': u'david'}],
            u'title': u'groups'},
        u'tags': {
            u'items': [
                {u'count': 1, u'display_name': u'tolstoy', u'name': u'tolstoy'},
                {u'count': 2, u'display_name': u'russian', u'name': u'russian'},
                {u'count': 2, u'display_name': u'Flexible \u30a1', u'name': u'Flexible \u30a1'}],
            u'title': u'tags'}}},
    u'success': True}
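For anyone consuming a response shaped like the one above, a small helper can flatten the facet items into per-field counts. This is a sketch: `facet_counts` is a hypothetical helper, and it assumes the snipped part of the output nests `search_facets` under a top-level `result` key.

```python
def facet_counts(response):
    """Map each facet field to a {name: count} dict."""
    # Assumption: search_facets lives under "result" (hidden by the <snip> above).
    facets = response.get("result", {}).get("search_facets", {})
    return {
        field: {item["name"]: item["count"] for item in data.get("items", [])}
        for field, data in facets.items()
    }


# Sample data copied from the "groups" facet in the output above.
sample = {
    "success": True,
    "result": {
        "search_facets": {
            "groups": {
                "title": "groups",
                "items": [
                    {"count": 1, "display_name": "Roger's books", "name": "roger"},
                    {"count": 2, "display_name": "Dave's books", "name": "david"},
                ],
            }
        }
    },
}

print(facet_counts(sample))  # {'groups': {'roger': 1, 'david': 2}}
```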
comment:13 Changed 2 years ago by rgrp
- Status changed from closed to reopened
- Resolution worksforme deleted
- Milestone changed from ckan-sprint-2012-04-30 to current-ckan-sprint-2012-05-15
Having now thought about this I'm re-opening this ticket for the following reasons:
- No real documentation is available yet (other than what is in this ticket)
- It would also be nice to know how this maps to SOLR API (can i use all of the facet options solr provides or not ...?)
- And I would again emphasize my preference for having *direct* access to something that looks *exactly* like SOLR API as I can then use client and docs from SOLR to work with it.
- No clear resolution of separation between Action and REST API (and search API). Really seems to me there should be convergence between latter 2 (as suggested in the ticket) -- this would also resolve the problem that having GET /api/dataset return all datasets is *not* a great idea
- The Action API requires a POST request. Since the primary purpose of the search API would be usage from JS it would be nice if GET and JSONP were supported. (Though given our CORS support we could argue this was optional).
Not saying *all* of this needs fixing but some clear approach here would be useful
comment:14 Changed 2 years ago by icmurray
- Keywords [0.5d] added; [1-2d] removed
- Milestone changed from ckan-sprint-2012-05-15 to current-ckan-sprint-2012-05-29
comment:15 Changed 2 years ago by icmurray
- package_search action is now documented, with reference to the solr search parameters available. http://docs.ckan.org/en/latest/apiv3.html (auto-docs not working on rtd at the moment).
- actions defined in get.py are now GETable. http://docs.ckan.org/en/latest/apiv3.html#get-able-actions
- we won't be providing direct access to solr api, as we think the cost outweighs the benefit.
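With the get.py actions GET-able, a package_search request can be expressed entirely in the URL. Below is a minimal client-side sketch of building such a URL. `package_search_url` and the base URL are illustrative, and passing multi-valued parameters such as facet.field as repeated keys is an assumption rather than something confirmed in this ticket.

```python
from urllib.parse import urlencode


def package_search_url(base_url, params):
    """Build a GET URL for the package_search action from a dict of params."""
    # doseq=True expands list values into repeated keys:
    # {"facet.field": ["tags", "groups"]} -> facet.field=tags&facet.field=groups
    query = urlencode(params, doseq=True)
    return "%s/api/3/action/package_search?%s" % (base_url.rstrip("/"), query)


url = package_search_url(
    "http://127.0.0.1:5000",  # hypothetical local CKAN instance
    {"q": "data", "facet.field": ["tags", "groups"], "facet.limit": 1},
)
print(url)
```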
comment:17 Changed 2 years ago by rgrp
I *strongly* disagree re access to solr API -- i don't really care if it is direct but I want something that looks like it at least for core query parameters and facets ...
Is there some major issue around security etc (e.g. limiting to only public datasets or similar?)
comment:18 Changed 2 years ago by icmurray
For completeness, package_search docs can be viewed at https://github.com/okfn/ckan/blob/master/ckan/logic/action/get.py#L983
@rgrp: are you happy with this ticket now, does it expose all you need? If so, I'll close it...
comment:19 Changed 2 years ago by ross
- Milestone changed from current-ckan-sprint-2012-05-29 to ckan-v1.8
comment:20 Changed 2 years ago by rgrp
Let's close. What would be nice though is ETA on the GET support getting deployed on the DataHub? -- just tried using it today and realized it didn't work :-)
comment:21 Changed 23 months ago by icmurray
- Status changed from reopened to closed
- Resolution set to fixed
The GET-able actions ( #2330 ) are in master, and will make it into 1.8
comment:22 follow-up: ↓ 23 Changed 23 months ago by rgrp
But not v1.7.1? (When is v1.8 due?).
Also for the record a couple of things I found when trying to use this:
- No support for facet sort order or facet limit afaict ...
comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 23 months ago by icmurray
Replying to rgrp:
> But not v1.7.1? (When is v1.8 due?).
Not 1.7.1 as it's not a bug fix (https://docs.google.com/document/d/170fxET3kd9dJ4L6VAj3yZugtK0rrVe44J4HuLbTUsEU/)
I don't know when 1.8 is due.
> Also for the record a couple of things I found when trying to use this:
> - No support for facet sort order or facet limit afaict ...
Thanks, that's good to know.
facet.limit should be working [1], so if it's not, then that's a bug. I can check that.
For facet.sort, I've added ticket #2543.
[1] https://github.com/okfn/ckan/blob/master/ckan/logic/action/get.py#L1022
comment:24 in reply to: ↑ 23 Changed 23 months ago by icmurray
Replying to icmurray:
>> Also for the record a couple of things I found when trying to use this:
>> - No support for facet sort order or facet limit afaict ...
> Thanks, that's good to know.
> facet.limit should be working [1], so if it's not, then that's a bug. I can check that.
facet.limit is working as I'd expect. eg (on master):
    import requests
    import json
    from pprint import pprint as pp
    pp(json.loads(requests.get('http://ian-laptop:5000/api/3/action/package_search?q=data&facet.field=tags&facet.limit=1').content))
Returns a result where only the tag with the highest count is returned in the search_facets result dict.
One thing I did spot though was that we aren't able to specify per-field parameters. That is, whilst facet.limit sets the facet limit for all facet fields, solr allows that to be overridden on a per-field basis, eg f.tags.facet.limit. This isn't something we support at the moment. I've created a ticket for this, #2573. Let me know if this is something that you'd find useful, otherwise I'll leave it on the backlog for a future iteration.
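To illustrate the per-field override behaviour described above: Solr writes a per-field facet parameter as f.<field>.facet.limit, and it takes precedence over the global facet.limit. The sketch below shows that lookup order only. `facet_limit_for` is a hypothetical helper, not part of CKAN, which (per this comment) did not support per-field parameters at the time.

```python
def facet_limit_for(field, params, default=None):
    """Resolve the facet limit for one field, preferring a per-field override."""
    # Solr's per-field naming convention: f.<field>.facet.limit
    per_field_key = "f.%s.facet.limit" % field
    if per_field_key in params:
        return int(params[per_field_key])
    # Otherwise fall back to the limit that applies to all facet fields.
    if "facet.limit" in params:
        return int(params["facet.limit"])
    return default


params = {"facet.limit": "10", "f.tags.facet.limit": "1"}
print(facet_limit_for("tags", params))    # 1
print(facet_limit_for("groups", params))  # 10
```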