Ticket #2251 (closed enhancement: fixed)
Internal analytics for ckan.
Reported by: | kindly | Owned by: | toby |
---|---|---|---|
Priority: | major | Milestone: | ckan-sprint-2012-04-16 |
Component: | ecportal | Keywords: | 6d |
Cc: | | Repository: | ckan |
Theme: | none |
Description (last modified by toby)
Page views and resource clicks need to be tracked.
User Stories
US1 As a Site Admin / Visitor (?), I want to see how often a page has been viewed (every page) and how often resources have been downloaded.
US1a Next to a resource or a dataset, I want to see how often it has been downloaded / viewed.
US1b I want to see datasets or resources ranked by most downloaded or viewed.
US1c I want to see a trend graph for a dataset (and its resources).
Administrative Dashboard (?)
- I want to see the breakdown of traffic to my site by country ...
- Ditto for browser type, language, etc.
- I want to see it graphed over time ...
Implementation Details
- How do we store this data in CKAN?
- How do we track (and store)?
- How do we display it?
- Config option ckan.status.enabled = False (by default)
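A minimal sketch of how the proposed option could be read so that tracking stays off unless explicitly enabled. Only the option name comes from this ticket; the `stats_enabled` helper and the set of accepted truthy strings are assumptions for illustration.

```python
# Hypothetical helper: reads the proposed ckan.status.enabled option,
# treating a missing or non-truthy value as False (the default).
TRUTHY = {"true", "yes", "on", "1"}


def stats_enabled(config: dict) -> bool:
    """Return True only if the option is explicitly set to a truthy value."""
    raw = config.get("ckan.status.enabled", "false")
    return str(raw).strip().lower() in TRUTHY


print(stats_enabled({}))                               # False
print(stats_enabled({"ckan.status.enabled": "True"}))  # True
```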
Storing Data
How does ckanext-googleanalytics do this? Current table:
package_id | count_recent | count_total
Move to a new stats_summary table
id | item_id | object_type | stats_type (total, month_yyyy_mm, ...) | value
Should we store this data in the search index (Solr) so we can search by it?
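A rough sketch of the proposed stats_summary layout, using SQLite only so it runs standalone. The column names follow the ticket; the UNIQUE constraint and the `record_hit` helper (which bumps both the "total" and the "month_yyyy_mm" counters) are assumptions, not a decided design.

```python
import sqlite3
from datetime import date

# Columns as proposed in the ticket; the UNIQUE constraint is an assumption
# so each (item, type, stats_type) combination has exactly one counter row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stats_summary (
    id          INTEGER PRIMARY KEY,
    item_id     TEXT NOT NULL,
    object_type TEXT NOT NULL,   -- e.g. 'package' or 'resource'
    stats_type  TEXT NOT NULL,   -- 'total' or 'month_yyyy_mm'
    value       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (item_id, object_type, stats_type)
)
"""


def record_hit(conn, item_id, object_type, day):
    """Hypothetical helper: increment the all-time and per-month counters."""
    for stats_type in ("total", day.strftime("month_%Y_%m")):
        key = (item_id, object_type, stats_type)
        # Create the counter row at 0 if missing, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO stats_summary "
            "(item_id, object_type, stats_type, value) VALUES (?, ?, ?, 0)",
            key,
        )
        conn.execute(
            "UPDATE stats_summary SET value = value + 1 "
            "WHERE item_id = ? AND object_type = ? AND stats_type = ?",
            key,
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_hit(conn, "pkg-1", "package", date(2012, 4, 2))
record_hit(conn, "pkg-1", "package", date(2012, 4, 20))
rows = conn.execute(
    "SELECT stats_type, value FROM stats_summary ORDER BY stats_type"
).fetchall()
print(rows)  # [('month_2012_04', 2), ('total', 2)]
```

Keeping one row per counter means a page can read a single value without aggregating raw hits at request time.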
Displaying Data
- Helper functions / dictize:
- Helper function: h.stats_get(object_type, id, stats_type)
- h.stats_top_ranked(object_type, number) -> returns object_dicts or just labels or ...
- Change to dictize
- Helper function: h.stats_get(object_type, id, stats_type)
- Location in the default theme (do we show it in search results too, for example?)
- Support for ranking by most popular in search?
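A sketch of what the two proposed helpers might do, run against an in-memory dict standing in for stats_summary rows. The names mirror the ticket's h.stats_get / h.stats_top_ranked, but the extra `stats` argument, the default stats_type of "total" and returning (item_id, count) pairs rather than full object dicts are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for stats_summary: (object_type, item_id, stats_type) -> value
StatsKey = Tuple[str, str, str]


def stats_get(stats: Dict[StatsKey, int], object_type: str, item_id: str,
              stats_type: str = "total") -> int:
    """Return one counter, or 0 if nothing has been recorded yet."""
    return stats.get((object_type, item_id, stats_type), 0)


def stats_top_ranked(stats: Dict[StatsKey, int], object_type: str,
                     number: int, stats_type: str = "total") -> List[Tuple[str, int]]:
    """Return the `number` highest-counted items of one type as (item_id, count)."""
    counts = [(item_id, value)
              for (otype, item_id, stype), value in stats.items()
              if otype == object_type and stype == stats_type]
    # Highest count first; ties broken by item_id so the order is stable.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:number]


stats = {
    ("package", "pkg-a", "total"): 12,
    ("package", "pkg-b", "total"): 40,
    ("package", "pkg-c", "total"): 7,
    ("resource", "res-1", "total"): 99,
}
print(stats_get(stats, "package", "pkg-a"))   # 12
print(stats_top_ranked(stats, "package", 2))  # [('pkg-b', 40), ('pkg-a', 12)]
```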
Tracking Data
- Our own solution (just write to site_tracking)
- Google analytics (plus extension for retrieving data) <- would need a refactor
- Piwik
Own Solution
(For later: probably not part of this ticket)
site_tracking table
id | url | timestamp | action (page_view, resource_download) |
- Write JavaScript that sends a request to CKAN to store clicks and page views.
- Add middleware so these requests bypass Pylons and the data is stored quickly.
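A minimal WSGI sketch of the middleware idea: a tracking POST is answered and written to site_tracking before the request reaches the wrapped (Pylons) application. The `/_tracking` path, the `url`/`action` form fields and SQLite storage are assumptions chosen for illustration; the table columns follow the ticket.

```python
import io
import sqlite3
import time
from urllib.parse import parse_qs

TRACKING_PATH = "/_tracking"  # hypothetical endpoint the JavaScript would POST to


class TrackingMiddleware:
    """Sits in front of the app and stores tracking hits without calling it."""

    def __init__(self, app, conn):
        self.app = app
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS site_tracking ("
            "id INTEGER PRIMARY KEY, url TEXT, timestamp REAL, action TEXT)"
        )

    def __call__(self, environ, start_response):
        if (environ.get("PATH_INFO") == TRACKING_PATH
                and environ.get("REQUEST_METHOD") == "POST"):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            fields = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
            url = fields.get("url", [""])[0]
            action = fields.get("action", ["page_view"])[0]
            # Only the two actions named in the ticket are stored.
            if action in ("page_view", "resource_download"):
                self.conn.execute(
                    "INSERT INTO site_tracking (url, timestamp, action) VALUES (?, ?, ?)",
                    (url, time.time(), action),
                )
                self.conn.commit()
            start_response("204 No Content", [])
            return [b""]
        # Everything else is passed to the wrapped application unchanged.
        return self.app(environ, start_response)


# Usage: simulate one tracking POST and one ordinary page request.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page"]


conn = sqlite3.connect(":memory:")
wrapped = TrackingMiddleware(demo_app, conn)
body = b"url=%2Fdataset%2Fexample&action=page_view"
statuses = []
wrapped({"PATH_INFO": TRACKING_PATH, "REQUEST_METHOD": "POST",
         "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)},
        lambda status, headers: statuses.append(status))
wrapped({"PATH_INFO": "/dataset/example", "REQUEST_METHOD": "GET"},
        lambda status, headers: statuses.append(status))
print(conn.execute("SELECT url, action FROM site_tracking").fetchall())
# [('/dataset/example', 'page_view')]
```

Because the tracking branch returns before `self.app` is called, none of the framework's routing or templating runs for these requests.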
Change History
comment:2 Changed 2 years ago by rgrp
Big +1: everyone wants to know page views. I would like detail on how this goes into the interface. Downloads are already being tracked. Also, isn't this just an extension to ckanext-googleanalytics?
comment:3 Changed 2 years ago by toby
- Status changed from new to accepted
Not really part of ckanext-ga, as we need to sit first in the middleware for speed reasons. I'm looking at recording the details and will do a summary page like the ga extension.
For other uses, such as showing popular resources, we need to decide where the data will live: on the resource or separately.
Downloads are tracked? Where is this done in the code, and where is the data stored?
comment:4 Changed 2 years ago by toby
Notes from meeting 21-3-2012:
US1 As a Site Admin / Visitor (?), I want to see how often a page has been viewed (every page) and how often resources have been downloaded.
US1a Next to a resource or a dataset, I want to see how often it has been downloaded / viewed.
US1b I want to see datasets or resources ranked by most downloaded or viewed.
US1c I want to see a trend graph for a dataset (and its resources).
Administrative Dashboard (?)
- I want to see the breakdown of traffic to my site by country ...
- Ditto for browser type, language, etc.
- I want to see it graphed over time ...
Implementation Details
- How do we store this data in CKAN?
- How do we track (and store)?
- How do we display it?
- Config option ckan.status.enabled = False (by default)
Storing Data
How does ckanext-googleanalytics do this? Current table:
package_id | count_recent | count_total
Move to a new stats_summary table
id | item_id | object_type | stats_type (total, month_yyyy_mm, ...) | value
Should we store this data in the search index (Solr) so we can search by it?
Displaying Data
- Helper functions / dictize:
- Helper function: h.stats_get(object_type, id, stats_type)
- h.stats_top_ranked(object_type, number) -> returns object_dicts or just labels or ...
- Change to dictize
- Helper function: h.stats_get(object_type, id, stats_type)
- Location in the default theme (do we show it in search results too, for example?)
- Support for ranking by most popular in search?
Tracking Data
- Our own solution (just write to site_tracking)
- Google analytics (plus extension for retrieving data) <- would need a refactor
- Piwik
Own Solution
site_tracking table
id | url | timestamp | action (page_view, resource_download) |
- Write JavaScript that sends a request to CKAN to store clicks and page views.
- Add middleware so these requests bypass Pylons and the data is stored quickly.
comment:5 Changed 2 years ago by toby
TODO: look at
- How do we store this data in CKAN?
- How do we display it?
- Config option ckan.status.enabled = False (by default)
comment:6 Changed 2 years ago by rgrp
- Description modified
Updated the description in much greater detail.
comment:9 Changed 2 years ago by toby
- Priority changed from awaiting triage to major
We want to:
- show downloaded/viewed counts on resources
- show total/recent counts on the dataset
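A small sketch of deriving the two dataset figures from per-day view counts. The 14-day "recent" window and the `total_and_recent` helper are assumptions; the ticket does not define how long "recent" is.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

RECENT_DAYS = 14  # assumed length of the "recent" window


def total_and_recent(daily_counts: Dict[date, int], today: date,
                     recent_days: int = RECENT_DAYS) -> Tuple[int, int]:
    """Return (total, recent) view counts from a {day: count} mapping."""
    cutoff = today - timedelta(days=recent_days)
    total = sum(daily_counts.values())
    # "Recent" counts only days strictly after the cutoff, up to today.
    recent = sum(n for day, n in daily_counts.items() if cutoff < day <= today)
    return total, recent


counts = {date(2012, 3, 1): 5, date(2012, 4, 10): 3, date(2012, 4, 15): 4}
print(total_and_recent(counts, today=date(2012, 4, 16)))  # (12, 7)
```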
comment:11 Changed 2 years ago by toby
- Keywords 6d added; 4d removed
- Milestone changed from ckan-sprint-2012-04-02 to current-ckan-sprint-2012-04-16
comment:12 Changed 2 years ago by toby
We have various options for tracking; counting unique users seems the sane approach:
- unique daily views: needed for nice graphing
- unique total views
- total of unique daily views: gives higher numbers and feels more right; easier to calculate if the original data is lost or archived
For now I will collect all of them and decide later which to use for display, as this gives us flexibility if needed.
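A sketch showing how the three candidate measures differ, computed from raw (user_key, url, day) records. The record shape and how a user_key is derived (for example, a hash of IP address and user agent) are assumptions. The example also shows why the total of unique daily views comes out higher: a visitor who returns on another day is counted again, and only the per-day summaries are needed to compute it.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Raw = Tuple[str, str, date]  # hypothetical raw record: (user_key, url, day)


def summarise(rows: Iterable[Raw], url: str) -> Tuple[Dict[date, int], int, int]:
    """Return (unique daily views, unique total views, total of unique daily views)."""
    per_day: Dict[date, Set[str]] = defaultdict(set)
    for user_key, row_url, day in rows:
        if row_url == url:
            per_day[day].add(user_key)
    unique_daily = {day: len(users) for day, users in per_day.items()}
    # Distinct visitors over all time: needs the raw user keys.
    unique_total = len(set().union(*per_day.values()))
    # Sum of the daily figures: needs only the daily summaries.
    total_unique_daily = sum(unique_daily.values())
    return unique_daily, unique_total, total_unique_daily


rows = [
    ("u1", "/dataset/a", date(2012, 4, 1)),
    ("u1", "/dataset/a", date(2012, 4, 1)),  # repeat view, same day
    ("u2", "/dataset/a", date(2012, 4, 1)),
    ("u1", "/dataset/a", date(2012, 4, 2)),  # returning visitor
    ("u3", "/dataset/b", date(2012, 4, 2)),
]
daily, total, summed = summarise(rows, "/dataset/a")
print(total, summed)  # 2 3
```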
Display
Why is the group listing of datasets inconsistent with the main search (TDH at least)? Should we make these consistent?
The group listing allows resources to be downloaded without tracking (this should be fixed).
If we make the listings uniform, do we want the download links or not?
I don't like the term "download" when it is just a link (maybe that's just me; what would be better?).
comment:13 Changed 2 years ago by toby
Most work is completed. Outstanding issues:
- add a command to prune tracking_raw data to the CLI
- add indexes to tracking_raw
- add a summary page for the new data format
- add an API counter
1) Do we record the URL or the action etc.? (The action seems more sensible.)
2) Do we add it in the call or in the action?
comment:14 Changed 2 years ago by toby
API counter moved to ticket #2282, as it needs a better specification.
comment:15 Changed 2 years ago by toby
For today's meeting with Adria:
- How do we add tracking data to the Solr index? Can the data live in tracking_summary? Do we need any extra indexes on the table to help indexing?
- How will we add ordering by popularity to search?
- Can we add resources too?
- Will search be the best way to find the most popular packages/resources, or should we get that page's data from tracking_summary?
comment:16 Changed 2 years ago by toby
- Status changed from accepted to closed
- Resolution set to fixed