Ticket #876 (closed enhancement: fixed)

Opened 3 years ago

Last modified 3 years ago

Support sqlite as a database backend for CKAN

Reported by: rgrp Owned by: sebbacon
Priority: critical Milestone: ckan-v1.3-sprint-2
Component: ckan Keywords:
Cc: Repository:


Among other things this will allow the tests to run much quicker.

Change History

comment:1 Changed 3 years ago by [email protected]

I have looked into this already, so I can give you a head start. I work on a project that uses many backends, so I have some experience. Here is what I have found so far.

nested transactions

VDM does not support sqlite, as it uses nested transactions. I do not think vdm needs nested transactions. It can use a flush instead. Here is the patch that works. All vdm tests pass.

--- a/vdm/sqlalchemy/base.py	Sat Sep 11 23:06:26 2010 +0000
+++ b/vdm/sqlalchemy/base.py	Mon Dec 20 16:16:34 2010 +0000
@@ -40,9 +40,8 @@
         self.setattr(session, 'HEAD', True)
         self.setattr(session, 'revision', revision)
         if revision.id is None:
-            session.begin_nested()
-            session.commit()
+            session.flush()


The index file 021_postgres_upgrade.sql in the migrate repository will not run, as it uses postgres-specific syntax. A similar sqlite version will need to be written. sqlite also does not support expression indexes like upper(text), so a workaround will need to be found.
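For illustration, here is a minimal sketch of one possible workaround (the pkg table and name_lower column are hypothetical, not CKAN's actual schema): since sqlite rejects an index on an expression like upper(text), a plain column holding the case-folded value is maintained and indexed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pkg (name TEXT, name_lower TEXT)")
# a plain indexed column stands in for postgres's index on upper(text)
conn.execute("CREATE INDEX idx_pkg_name_lower ON pkg (name_lower)")
conn.execute("INSERT INTO pkg VALUES (?, ?)", ("CKAN", "ckan"))
row = conn.execute(
    "SELECT name FROM pkg WHERE name_lower = ?", ("ckan",)).fetchone()
assert row == ("CKAN",)
```

The cost is keeping the shadow column in sync on every write, which is why the mapper-level conversion mentioned below is attractive.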


The harvesting returns utf8-encoded byte strings, and the pysqlite dbapi only supports python unicode objects (as far as I can tell). There will need to be a step that converts every string going into the database with string.decode("utf8").
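A minimal sketch of such a conversion step (the helper name ensure_unicode is hypothetical, not from CKAN): byte strings are decoded to unicode before they reach the dbapi, while values that are already text pass through untouched.

```python
def ensure_unicode(value):
    # pysqlite only accepts text (unicode) for TEXT columns; harvested
    # data may arrive as utf8-encoded byte strings
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

assert ensure_unicode(b"caf\xc3\xa9") == u"caf\xe9"
```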


I have not looked into this one much. However, as sqlite stores everything as strings, timestamps appear to fail on conversion back into python.
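One way to see, and work around, the timestamp round-trip problem is with the stdlib sqlite3 module's adapter/converter hooks. This is a sketch of the general technique, not CKAN's actual fix, and the revision table is illustrative.

```python
import sqlite3
import datetime

# sqlite stores timestamps as text; explicit adapter/converter
# functions restore real datetime objects on the way back out
sqlite3.register_adapter(datetime.datetime, lambda dt: dt.isoformat(" "))
sqlite3.register_converter(
    "timestamp", lambda b: datetime.datetime.fromisoformat(b.decode()))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE revision (created timestamp)")
now = datetime.datetime(2010, 12, 20, 16, 16, 34)
conn.execute("INSERT INTO revision VALUES (?)", (now,))
value = conn.execute("SELECT created FROM revision").fetchone()[0]
assert value == now
```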

I have solved the above two issues before by adding attribute extensions to sqlalchemy mappers to do the conversions without affecting too much code.

in memory sqlite

Some tests need to change to make sure the database schema is created first, because an in-memory database is lost each time the connection closes. The tests I have made pass run in about a seventh of the time they take on postgres.
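The schema point can be sketched with the stdlib sqlite3 module (the package table here is illustrative, not CKAN's schema): every :memory: connection starts as a brand-new empty database, so each test must create its tables up front.

```python
import sqlite3

def fresh_db():
    # each :memory: connection is a brand-new, empty database,
    # so the schema must be created at the start of every test
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

conn = fresh_db()
conn.execute("INSERT INTO package (name) VALUES ('test-pkg')")
assert conn.execute("SELECT count(*) FROM package").fetchone()[0] == 1
# a second connection sees none of that data
assert fresh_db().execute("SELECT count(*) FROM package").fetchone()[0] == 0
```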

Other things to keep in mind.

  • Need a new flag in test.ini to remove full text indexing completely, or always use it with solr.
  • There are enough incompatibilities between the databases that you would also want to test against postgres as well, at least before a release.
  • I would probably upgrade sqlalchemy first, so you will not have to make the changes twice. The new versions are significantly faster too.
  • I have submitted a patch to #868 that makes the tests run about 2.5 times as fast, and I think there is more low-hanging fruit if the aim is test speed.

comment:2 Changed 3 years ago by sebbacon

Thanks for the info!

Re. nested transactions: I am getting repeated non-deterministic test failures against sqlite (and indeed postgres, though the failures appear more frequent against sqlite). One of them I seemed to be able to get rid of by eliminating the savepoint, as per your first point. However, it appears that sqlite does support savepoints; to demonstrate, the following test code appears to work with the latest sqlalchemy:

from sqlalchemy import *
from sqlalchemy.orm import *

db = create_engine('sqlite:///')

metadata = MetaData(db)

users = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('name', String(40)),)
metadata.create_all()

class User(object):
    pass

usermapper = mapper(User, users)

Session = sessionmaker(bind=db)
session = Session()

fred = User()
fred.name = "Fred"
session.add(fred)

# open a savepoint, then roll it back
session.begin_nested()
sue = User()
sue.name = "Sue"
session.add(sue)
session.rollback()

amy = User()
amy.name = "Amy"
session.add(amy)
session.commit()

assert session.query(User).count() == 2
print "OK"

So, while I agree they're not needed, I'm not sure they're a problem. What do you think? Also, have you seen non-deterministic test errors like this?

comment:3 Changed 3 years ago by [email protected]

I have read about quite a lot of people having problems with savepoints on sqlite, and thought they were not supported in sqlalchemy. They are at least not consistent with postgres savepoints. I may well be out of date on this. Here is an example, even though it's a bit old.

I did get some non-deterministic errors; the patch above seemed to fix them. A failed subtransaction is not handled well by sqlalchemy, and I think this causes knock-on effects due to the unresolved transaction. I would stay well clear of them entirely if possible.

What are the errors you are getting?

My 2 cents; ignore me at will...

I would think about using a different backend for testing than production. Look here. If you want to support both, then you should test on both. There are simple ways to shave a few more minutes off the tests. If you want real speed, then a multiprocess solution (with a database per core) would be sensible, if a bit tricky.
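The database-per-core idea can be sketched like this (threads stand in for processes, and the partitioning is deliberately trivial): each worker runs its slice of the suite against its own private database, so the partitions cannot interfere with each other.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def run_partition(worker_id):
    # each worker gets its own private in-memory database; a real
    # runner would execute one partition of the test suite here
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (worker INTEGER)")
    conn.execute("INSERT INTO results VALUES (?)", (worker_id,))
    return conn.execute("SELECT worker FROM results").fetchone()[0]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(run_partition, range(4)))
assert results == [0, 1, 2, 3]
```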

comment:4 Changed 3 years ago by anonymous

Thanks for your feedback, very useful.

I don't really agree with the people in the linked discussion who say it's pointless testing against a different database from production.

The goal here is to make it easy enough for people to run as many tests as possible that they actually do so. Even 15 minutes is too long for that. With sqlite we can get it to under 5 minutes. I would also like to identify the longest-running tests (which I would characterise as "functional" or "integration" tests) and make them run as a separate suite, and then encourage a culture of writing true unit tests before functional tests, so that running the unit tests takes 1 minute and is part of the regular development cycle.

That's no replacement for also running *all* tests periodically, and also running tests under postgres, which we can continue to do on the continuous integration server.

Longer term I agree that it would be better to run local tests against postgres too, but that will I think involve refactoring many of the tests.

comment:5 Changed 3 years ago by anonymous

I agree with all your points about testing apart from using sqlite, especially splitting out the functional tests and continuous integration.

Longer term I agree that it would be better to run local tests against postgres too, but that will I think involve refactoring many of the tests.

Well there are two options

  1. refactor the tests
  2. refactor the code to use sqlite and postgres

It is a value judgment as to which is more complicated. I personally think 2 is more complicated, but I may be wrong on that. The real danger with 2 is that you are needlessly adding complication to production code; with 1 you are only changing the tests.

Upgrading to sqlalchemy 0.5+ should happen first regardless. You will need up-to-date documentation.

There is another option too: put the postgres data directory on tmpfs/ramfs and turn off durability. We would need a way to run db init before the tests run (or at boot). This may be the best of both worlds.
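A rough sketch of the tmpfs idea, with hypothetical paths and sizes, not a tested recipe: create a RAM-backed data directory, init a cluster there, and disable fsync so postgres never waits on (pointless) disk syncs.

```shell
# hypothetical paths; a sketch of the tmpfs idea, not a tested recipe
sudo mount -t tmpfs -o size=512M tmpfs /var/lib/postgresql/ramdata
sudo -u postgres initdb -D /var/lib/postgresql/ramdata
echo "fsync = off" | sudo tee -a /var/lib/postgresql/ramdata/postgresql.conf
sudo -u postgres pg_ctl -D /var/lib/postgresql/ramdata -l /tmp/pg.log start
```

Everything is lost on reboot, which is exactly why the init step has to be repeatable.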

Anyway Happy xmas!!

comment:6 Changed 3 years ago by dread

  • Status changed from new to closed
  • Resolution set to fixed

Seb and David have completed this I believe. I've merged the changes into core CKAN in cset:68d63fda4814.

Note: See TracTickets for help on using tickets.