A better way : SaaS with Django and PostgreSQL Schemas

This is a continuation of the previous post. ibjoeb on Hacker News rightly pointed out the issues with creating one database per customer, and suggested using schemas. I completely agree. Less management issues, less resource consumption and just a better design.

There is a Django ticket to support schemas in a generic manner, but I’m doubtful if it is useful for schemas created on the fly. Accordingly, now, here are the changes I tried out to use schemas instead of databases :

settings.py

CONNECTION_PREPROCESSOR='common.routers.SetSearchPathPreprocessor'
MASTER_SCHEMA='public'
...
MIDDLEWARE_CLASSES = (
'middleware.threadlocal.SchemaMiddleware',
...)

Here we definee a connection preprocessor. Unlike the connection_created signal of Django, which is invoked every time a connection is created, this will be called every time a connection is looked up. The middleware is a standard middleware to make the currently active schema available throughout the stack

middleware.threadlocal

from django.utils.thread_support import currentThread
_schemas = {}

class SchemaMiddleware:

    def process_request(self, request):
         subdomain =request.get_host().split(".")[0]
         #Set up the Company model meta option db_table as explained here - http://stackoverflow.com/questions/1160598/how-to-use-schemas-in-django
         # This will make sure that queries on this model will always go to the master schema
         company=Company.objects.get(subdomain)
         set_schema(company.schema)
         return None

The middleware looks up the registration details by subdomain and sets the schema in the threadlocal

commons.router

from middleware.threadlocal import get_schema
from settings import MASTER_SCHEMA
class SetSearchPathPreprocessor(object):
    def process(self,cursor):
        schema= get_schema()
        if not schema: schema=MASTER_SCHEMA
        cursor.execute("SET search_path TO  %s" % schema)

The preprocessor is PostgreSQL specific and sets the search path. See PostgreSQL schemas. This is called when a connection is looked up. Note that using django.db.connection will break this code. The preferred way is to lookup by database alias. For example, django.db.connections[‘default’]

django.db.utils.ConnectionHandler

def __getitem__(self, alias):
    if alias in self._connections:
        return self._connections[alias]
    self.ensure_defaults(alias)
    db = self.databases[alias]
    backend = load_backend(db['ENGINE'])
    conn = backend.DatabaseWrapper(db, alias)
    try:
        from settings import CONNECTION_PREPROCESSOR
        preprocessor=load_class(CONNECTION_PREPROCESSOR)()
        preprocessor.process(conn.cursor())
    except ImportError:
        pass
    self._connections[alias] = conn

return conn


def load_class(path):
    i = path.rfind('.')
    from django.utils.importlib import import_module
    module, attr = path[:i], path[i+1:]
    try:
        mod = import_module(module)
    except ImportError, e:
        raise ImproperlyConfigured('Error importing class %s: "%s"' % (module, e))
    except ValueError, e:
        raise ImproperlyConfigured('Error class')
    try:
        return getattr(mod, attr)
    except AttributeError:
        raise ImproperlyConfigured('Module "%s" does not define class "%s"' % (module, attr))

Our old connection handler class, modified differently. The getitem tries to import a preprocessor. If it is available, it loads the class, instantiates it and calls the process method. In our case, the preprocessor is SetSearchPathPreprocessor, which will set the PostgreSQL search path.

Voila! Now you can switch to different schemas on the fly. Add a few migration command magic and you are set!

Thanks a lot, ibjoeb!