68 | | There are 3 ways found of doing such queries efficiently. |
69 | | |
70 | | These ways have been profiled/tested against package_extras in dgu. This was chosen as its likely that the package_extras_revision table is the largest table we currently have, with over 100,000 rows. |
71 | | The test was to see how long it would take to get out the package_extras for each of the 6864 packages separately as they were 100 days before the test was run. |
72 | | The test scripts are attached. |
73 | | |
74 | | 1. Using distinct on. |
75 | | |
76 | | Pros: |
77 | | * The fastest method tested taking 19.622389s |
78 | | * The most intuitive and maintainable script to write. |
79 | | * The shortest script. |
80 | | Cons: |
81 | | * No direct sqlalchemy support (coming in 0.7) so only hand written script. |
82 | | * Very postgres specific and not part of the sql standard. |
83 | | |
84 | | 2. Using Window functions |
85 | | |
86 | | Pros: |
87 | | * Very fast method tested taking 21.120075s |
88 | | * Fairly straight forward to write |
89 | | * Part of sql standard. |
90 | | Cons: |
91 | | * No direct sqlalchemy support yet also but coming in 0.7. |
92 | | * Need upto date version of posgres. |
93 | | * Not supported by sqlite, mysql. |
94 | | |
95 | | 3. Self join |
96 | | |
97 | | Pros: |
98 | | * Database agnostic. |
99 | | Cons: |
100 | | * Fairly slow 35.285168s |
101 | | * Difficult to write in a way thats reusable |
102 | | * Difficult to nicely in sqlalchemy sql expression |
103 | | |
104 | | If we were to use 1 or 2 we would need to drop any sqlite support for the tests. |
105 | | |
106 | | There is a forth option of just returning all the rows and finding out the last change before a date however this will throw back a lot of rows especially if there are lots of edits. |
107 | | |