| 68 | | There are 3 ways found of doing such queries efficiently. |
| 69 | | |
| 70 | | These ways have been profiled/tested against package_extras in dgu. This was chosen as its likely that the package_extras_revision table is the largest table we currently have, with over 100,000 rows. |
| 71 | | The test was to see how long it would take to get out the package_extras for each of the 6864 packages separately as they were 100 days before the test was run. |
| 72 | | The test scripts are attached. |
| 73 | | |
| 74 | | 1. Using distinct on. |
| 75 | | |
| 76 | | Pros: |
| 77 | | * The fastest method tested taking 19.622389s |
| 78 | | * The most intuitive and maintainable script to write. |
| 79 | | * The shortest script. |
| 80 | | Cons: |
| 81 | | * No direct sqlalchemy support (coming in 0.7) so only hand written script. |
| 82 | | * Very postgres specific and not part of the sql standard. |
| 83 | | |
| 84 | | 2. Using Window functions |
| 85 | | |
| 86 | | Pros: |
| 87 | | * Very fast method tested taking 21.120075s |
| 88 | | * Fairly straight forward to write |
| 89 | | * Part of sql standard. |
| 90 | | Cons: |
| 91 | | * No direct sqlalchemy support yet also but coming in 0.7. |
| 92 | | * Need upto date version of posgres. |
| 93 | | * Not supported by sqlite, mysql. |
| 94 | | |
| 95 | | 3. Self join |
| 96 | | |
| 97 | | Pros: |
| 98 | | * Database agnostic. |
| 99 | | Cons: |
| 100 | | * Fairly slow 35.285168s |
| 101 | | * Difficult to write in a way thats reusable |
| 102 | | * Difficult to nicely in sqlalchemy sql expression |
| 103 | | |
| 104 | | If we were to use 1 or 2 we would need to drop any sqlite support for the tests. |
| 105 | | |
| 106 | | There is a forth option of just returning all the rows and finding out the last change before a date however this will throw back a lot of rows especially if there are lots of edits. |
| 107 | | |