Ticket #2251 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

Internal analytics for ckan.

Reported by: kindly Owned by: toby
Priority: major Milestone: ckan-sprint-2012-04-16
Component: ecportal Keywords: 6d
Cc: Repository: ckan
Theme: none

Description (last modified by toby) (diff)

Page views and Resources clicks need to be tracked.

User Stories

US1 As a Site Admin / Visitor (?) I want to see how often a page has been viewed (every page) and how often resources have been downloaded.

US1a Next to a resource or a dataset see how often it has been downloaded / viewed

US1b I want to see datasets or resources ranked by most downloaded or viewed

US1c See a trend graph for a dataset (and resources)

Administrative Dashboard (?)

  • I want to see the traffic breakdown by country to my site ...
  • Ditto for browser type, language, etc etc
  • I want to see it graphed over time ...

Implementation Details

  1. How do we store this data in CKAN?
  2. How do we track (and store)?
  3. How do we display?
  • Config option ckan.status.enabled = False (by default)

Storing Data

How does ckanext-googleanalytics do this? Current table:

package_id | count_recent | count_total

Move to a new stats_summary table

id | item_id | object_type | stats_type (total, month_yyyy_mm, ...) | value

Do we store this data into the search (solr) so we can search by it?
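
A minimal sketch of how the proposed stats_summary table could be created and updated, using an in-memory SQLite database for illustration. The column types, the unique constraint, and the helper names (make_db, increment, record_view) are assumptions and not part of the ticket.

```python
import sqlite3

# Hypothetical schema following the columns listed in the ticket;
# the column types and the UNIQUE constraint are assumptions.
SCHEMA = """
CREATE TABLE stats_summary (
    id          INTEGER PRIMARY KEY,
    item_id     TEXT NOT NULL,      -- id of the dataset or resource
    object_type TEXT NOT NULL,      -- e.g. 'package' or 'resource'
    stats_type  TEXT NOT NULL,      -- 'total', 'month_2012_04', ...
    value       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (item_id, object_type, stats_type)
)
"""


def make_db():
    """Create an in-memory database holding the stats_summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def increment(conn, item_id, object_type, stats_type, amount=1):
    """Add `amount` to one counter, creating the row if it is missing."""
    cur = conn.execute(
        "UPDATE stats_summary SET value = value + ? "
        "WHERE item_id = ? AND object_type = ? AND stats_type = ?",
        (amount, item_id, object_type, stats_type),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO stats_summary "
            "(item_id, object_type, stats_type, value) VALUES (?, ?, ?, ?)",
            (item_id, object_type, stats_type, amount),
        )


def record_view(conn, item_id, object_type, year, month):
    """Bump both the running total and the month_yyyy_mm counter."""
    increment(conn, item_id, object_type, "total")
    increment(conn, item_id, object_type, "month_%04d_%02d" % (year, month))
```

Keeping one row per (item, stats_type) means a new period such as month_2012_05 needs no schema change, only a new row.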

Displaying Data

  • Helper functions / dictize:
    • Helper function: h.stats_get(object_type, id, stats_type)
      • h.stats_top_ranked(object_type, number) -> returns object_dicts or just labels or ...
    • Change to dictize
  • Location in the default theme (do we show for example in search results too!)
  • Support for ranking by most popular in search?
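
A rough sketch of what the two helpers could look like. The ticket leaves the return type of stats_top_ranked open, so this version returns (item_id, value) pairs. The Stat tuple, the extra rows argument, and the sample data are assumptions standing in for the real stats_summary table.

```python
from collections import namedtuple

# Stand-in for one row of the proposed stats_summary table (assumption).
Stat = namedtuple("Stat", "item_id object_type stats_type value")

# Made-up sample rows, for illustration only.
ROWS = [
    Stat("ds1", "package", "total", 10),
    Stat("ds2", "package", "total", 25),
    Stat("ds3", "package", "total", 7),
    Stat("res1", "resource", "total", 4),
    Stat("ds1", "package", "month_2012_04", 3),
]


def stats_get(rows, object_type, id, stats_type):
    """Return the stored value for one object, or 0 if nothing is recorded."""
    for row in rows:
        if (row.object_type, row.item_id, row.stats_type) == (
            object_type, id, stats_type
        ):
            return row.value
    return 0


def stats_top_ranked(rows, object_type, number, stats_type="total"):
    """Return up to `number` (item_id, value) pairs, highest value first."""
    matching = [
        row for row in rows
        if row.object_type == object_type and row.stats_type == stats_type
    ]
    matching.sort(key=lambda row: row.value, reverse=True)
    return [(row.item_id, row.value) for row in matching[:number]]
```

In a template this might be called as h.stats_get("package", pkg_id, "total"), with the helper looking up the table itself rather than taking rows as an argument.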

Tracking Data

  • Our own solution (just write to site_tracking)
  • Google analytics (plus extension for retrieving data) <- would need a refactor
  • Piwik

Own Solution

(For later: not as part of this ticket probably)

site_tracking table

id | url | timestamp | action (page_view, resource_download) |

  • Make javascript to make request to ckan to store clicks and page views.
  • Add middleware so these requests do not go through pylons and just store data quickly.
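
The middleware idea could be sketched as a small WSGI wrapper that answers tracking requests itself and hands everything else to the wrapped application (Pylons in this case). The /_tracking path, the url and type form fields, and the use of SQLite are assumptions made for the sake of a runnable example.

```python
import sqlite3
from datetime import datetime, timezone
from urllib.parse import parse_qs

ALLOWED_ACTIONS = ("page_view", "resource_download")


class TrackingMiddleware:
    """Store tracking POSTs directly and pass all other requests through."""

    def __init__(self, app, conn, path="/_tracking"):
        self.app = app      # the wrapped WSGI application
        self.conn = conn    # DB-API connection used for storage
        self.path = path    # hypothetical endpoint the javascript posts to
        conn.execute(
            "CREATE TABLE IF NOT EXISTS site_tracking ("
            "id INTEGER PRIMARY KEY, url TEXT, timestamp TEXT, action TEXT)"
        )

    def __call__(self, environ, start_response):
        is_tracking = (
            environ.get("PATH_INFO") == self.path
            and environ.get("REQUEST_METHOD") == "POST"
        )
        if not is_tracking:
            # Not a tracking request: let the normal application handle it.
            return self.app(environ, start_response)

        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        fields = parse_qs(body)
        url = fields.get("url", [""])[0]
        action = fields.get("type", ["page_view"])[0]

        # Silently ignore malformed requests; only known actions are stored.
        if url and action in ALLOWED_ACTIONS:
            self.conn.execute(
                "INSERT INTO site_tracking (url, timestamp, action) "
                "VALUES (?, ?, ?)",
                (url, datetime.now(timezone.utc).isoformat(), action),
            )
            self.conn.commit()

        start_response(
            "200 OK",
            [("Content-Type", "text/plain"), ("Content-Length", "0")],
        )
        return [b""]
```

Because the wrapper returns before calling the inner application, a tracking request costs one insert and never reaches the framework's routing or templating.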

Change History

comment:1 Changed 2 years ago by kindly

  • Description modified (diff)

comment:2 Changed 2 years ago by rgrp

Big +1: everyone wants to know page views. Would like detail of how this goes into the interface. Downloads are already being tracked. Also, isn't this just an extension to ckanext-googleanalytics?

comment:3 Changed 2 years ago by toby

  • Status changed from new to accepted

Not really part of ckanext-ga, as we need to sit first in the middleware for speed reasons. I'm looking at recording the details and will do a summary page like the ga extension.

As for other uses, like showing popular resources etc., we need to decide where the data will live: on the resource or separately.

Downloads tracked? Where is this done in the code, and where is the data stored?

comment:4 Changed 2 years ago by toby

notes from meeting 21-3-2012

US1 As a Site Admin / Visitor (?) I want to see how often a page has been viewed (every page) and how often resources have been downloaded.

US1a Next to a resource or a dataset see how often it has been downloaded / viewed

US1b I want to see datasets or resources ranked by most downloaded or viewed

US1c See a trend graph for a dataset (and resources)

Administrative Dashboard (?)

  • I want to see the traffic breakdown by country to my site ...
  • Ditto for browser type, language, etc etc
  • I want to see it graphed over time ...

Implementation Details

  1. How do we store this data in CKAN?
  2. How do we track (and store)?
  3. How do we display?
  • Config option ckan.status.enabled = False (by default)

Storing Data

How does ckanext-googleanalytics do this? Current table:

package_id | count_recent | count_total

Move to a new stats_summary table

id | item_id | object_type | stats_type (total, month_yyyy_mm, ...) | value

Do we store this data into the search (solr) so we can search by it?

Displaying Data

  • Helper functions / dictize:
    • Helper function: h.stats_get(object_type, id, stats_type)
      • h.stats_top_ranked(object_type, number) -> returns object_dicts or just labels or ...
    • Change to dictize
  • Location in the default theme (do we show for example in search results too!)
  • Support for ranking by most popular in search?

Tracking Data

  • Our own solution (just write to site_tracking)
  • Google analytics (plus extension for retrieving data) <- would need a refactor
  • Piwik

Own Solution

site_tracking table

id | url | timestamp | action (page_view, resource_download) |

  • Make javascript to make request to ckan to store clicks and page views.
  • Add middleware so these requests do not go through pylons and just store data quickly.

comment:5 Changed 2 years ago by toby

TODO

look at

  1. How do we store this data in CKAN?
  2. How do we display?
  • Config option ckan.status.enabled = False (by default)

comment:6 Changed 2 years ago by rgrp

  • Description modified (diff)

Updated description in great detail.

comment:7 Changed 2 years ago by rgrp

  • Description modified (diff)

comment:8 Changed 2 years ago by rgrp

  • Description modified (diff)

comment:9 Changed 2 years ago by toby

  • Priority changed from awaiting triage to major

We want:

show downloaded/viewed counts on resources

show total/recent counts on the dataset

comment:10 Changed 2 years ago by toby

  • Description modified (diff)

comment:11 Changed 2 years ago by toby

  • Keywords 6d added; 4d removed
  • Milestone changed from ckan-sprint-2012-04-02 to current-ckan-sprint-2012-04-16

comment:12 Changed 2 years ago by toby

We have various options for tracking; tracking unique users seems the sane approach.

  • unique daily views - needed for nice graphing
  • unique total views
  • total unique daily views - higher numbers and feels more right - easier to calculate if the original data is lost or archived

For now we will collect all of them and then decide which to use for display, as this gives us flexibility if needed.
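
To make the difference between the three counts concrete, here is a small sketch that derives all of them from raw tracking records. The (user_key, url, day) record layout and the function name view_counts are assumptions; the ticket does not specify how a unique user is identified.

```python
from collections import defaultdict


def view_counts(raw):
    """Compute the three candidate counts from raw tracking records.

    `raw` is an iterable of (user_key, url, day) tuples (assumed layout).
    Returns (unique_daily, unique_total, total_unique_daily).
    """
    users_per_day = defaultdict(set)   # (url, day) -> users seen that day
    users_overall = defaultdict(set)   # url -> users seen at any time
    for user_key, url, day in raw:
        users_per_day[(url, day)].add(user_key)
        users_overall[url].add(user_key)

    # Unique daily views: distinct users per url per day (good for graphs).
    unique_daily = {key: len(users) for key, users in users_per_day.items()}

    # Unique total views: distinct users per url over all time.
    unique_total = {url: len(users) for url, users in users_overall.items()}

    # Total unique daily views: the daily figures summed per url. This can
    # be rebuilt from daily summaries alone, without the raw records.
    total_unique_daily = defaultdict(int)
    for (url, _day), count in unique_daily.items():
        total_unique_daily[url] += count

    return unique_daily, unique_total, dict(total_unique_daily)
```

A user who returns on a second day counts once in unique total views but twice in total unique daily views, which is why the latter gives higher numbers.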

display


Why is the group listing of datasets inconsistent with the main search (TDH at least)? Should we make these consistent?

The group listing allows downloading of a resource without tracking (this should be fixed).

If we make listings uniform, do we want the download links or not?

I don't like the term download when it is just a link (maybe that's just me - what would be better?)

comment:13 Changed 2 years ago by toby

Most work completed. Outstanding issues:

  • add a command to prune tracking_raw data to the cli
  • add indexes to tracking_raw
  • add summary page for new data format
  • add api counter

1) do we record the url or action etc (action seems more sensible)

2) add in call or in action?

comment:14 Changed 2 years ago by toby

api counter moved to ticket #2282 as it needs better specification

comment:15 Changed 2 years ago by toby

For meeting today with Adria

  • How to add tracking to solr index. Can the data live in tracking_summary? Do we need any extra indexes on table to help indexing?
  • How will we add order by popularity to search?
  • Can we add resources too?
  • Will search be the best way to find the most popular packages/resources, or should we get that page's data from tracking_summary?

comment:16 Changed 2 years ago by toby

  • Status changed from accepted to closed
  • Resolution set to fixed