Changes between Initial Version and Version 5 of Ticket #1129


Ignore:
Timestamp:
05/19/11 19:07:48 (3 years ago)
Author:
kindly
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • Ticket #1129

    • Property Cc rufus.pollock@… added
    • Property Priority changed from awaiting triage to major
    • Property State changed from draft to accepted
    • Property Milestone changed from to ckan-v1.5
  • Ticket #1129 – Description

    initial v5  
    6666    We also need to way to get what the dictized representation of the package was like at any point of its revision history.  This querying is non-trivial in sql. 
    6767 
    68     There are 3 ways found of doing such queries efficiently.  
    69  
    70     These ways have been profiled/tested against package_extras in dgu.  This was chosen as its likely that the package_extras_revision table is the largest table we currently have, with over 100,000 rows.   
    71     The test was to see how long it would take to get out the package_extras for each of the 6864 packages separately as they were 100 days before the test was run.  
    72     The test scripts are attached. 
    73  
    74     1. Using distinct on. 
    75  
    76     Pros:   
    77          * The fastest method tested taking 19.622389s 
    78          * The most intuitive and maintainable script to write. 
    79          * The shortest script. 
    80     Cons: 
    81          * No direct sqlalchemy support (coming in 0.7) so only hand written script. 
    82          * Very postgres specific and not part of the sql standard. 
    83          
    84     2. Using Window functions 
    85  
    86     Pros:   
    87          * Very fast method tested taking 21.120075s 
    88          * Fairly straight forward to write 
    89          * Part of sql standard. 
    90     Cons: 
    91          * No direct sqlalchemy support yet also but coming in 0.7.  
    92          * Need upto date version of posgres. 
    93          * Not supported by sqlite, mysql. 
    94           
    95     3. Self join 
    96        
    97     Pros:   
    98          * Database agnostic. 
    99     Cons: 
    100          * Fairly slow 35.285168s 
    101          * Difficult to write in a way thats reusable 
    102          * Difficult to nicely in sqlalchemy sql expression 
    103    
    104  If we were to use 1 or 2 we would need to drop any sqlite support for the tests. 
    105  
    106  There is a forth option of just returning all the rows and finding out the last change before a date however this will throw back a lot of rows especially if there are lots of edits. 
    107  
    10868=== Participants === 
    10969 
     
    11272== Progress. == 
    11373 
    114 Need to decide on the correct solution. 
     74Decided to go with option 2. However we will change the revisioning system to be like the schema attached. 
     75This gets rid of difficult querying problems caused by querying the revision tables by adding an end date, meaning you can do range queries. 
     76 
     77The better and more normalized version of a revisioning system is outlined [https://docs.google.com/drawings/d/1Y7nMgVsrs081Pame2RdbZHlCAlV33ddTZ8VAsab1j-0/edit?hl=en_GB&authkey=CJfd8vsB].  
     78We will be a step closer to that, with this change, but we will keep the current vdm more or less, intact. 
     79