This is a continuation of the previous post. ibjoeb on Hacker News rightly pointed out the issues with creating one database per customer, and suggested using schemas. I completely agree. Less management issues, less resource consumption and just a better design.
There is a Django ticket to support schemas in a generic manner, but I’m doubtful if it is useful for schemas created on the fly.
Accordingly, now, here are the changes I tried out to use schemas instead of databases :
settings.py
CONNECTION_PREPROCESSOR='common.routers.SetSearchPathPreprocessor'
MASTER_SCHEMA='public'
...
MIDDLEWARE_CLASSES = (
'middleware.threadlocal.SchemaMiddleware',
...)
Here we definee a connection preprocessor. Unlike the connection_created signal of Django, which is invoked every time a connection is created, this will be called every time a connection is looked up.
The middleware is a standard middleware to make the currently active schema available throughout the stack
middleware.threadlocal
from django.utils.thread_support import currentThread
_schemas = {}
class SchemaMiddleware:
def process_request(self, request):
subdomain =request.get_host().split(".")[0]
#Set up the Company model meta option db_table as explained here - http://stackoverflow.com/questions/1160598/how-to-use-schemas-in-django
# This will make sure that queries on this model will always go to the master schema
company=Company.objects.get(subdomain)
set_schema(company.schema)
return None
The middleware looks up the registration details by subdomain and sets the schema in the threadlocal
commons.router
from middleware.threadlocal import get_schema
from settings import MASTER_SCHEMA
class SetSearchPathPreprocessor(object):
def process(self,cursor):
schema= get_schema()
if not schema: schema=MASTER_SCHEMA
cursor.execute("SET search_path TO %s" % schema)
The preprocessor is PostgreSQL specific and sets the search path. See PostgreSQL schemas. This is called when a connection is looked up. Note that using django.db.connection will break this code. The preferred way is to lookup by database alias. For example, django.db.connections[‘default’]
django.db.utils.ConnectionHandler
def __getitem__(self, alias):
if alias in self._connections:
return self._connections[alias]
self.ensure_defaults(alias)
db = self.databases[alias]
backend = load_backend(db['ENGINE'])
conn = backend.DatabaseWrapper(db, alias)
try:
from settings import CONNECTION_PREPROCESSOR
preprocessor=load_class(CONNECTION_PREPROCESSOR)()
preprocessor.process(conn.cursor())
except ImportError:
pass
self._connections[alias] = conn
return conn
def load_class(path):
i = path.rfind('.')
from django.utils.importlib import import_module
module, attr = path[:i], path[i+1:]
try:
mod = import_module(module)
except ImportError, e:
raise ImproperlyConfigured('Error importing class %s: "%s"' % (module, e))
except ValueError, e:
raise ImproperlyConfigured('Error class')
try:
return getattr(mod, attr)
except AttributeError:
raise ImproperlyConfigured('Module "%s" does not define class "%s"' % (module, attr))
Our old connection handler class, modified differently. The getitem tries to import a preprocessor. If it is available, it loads the class, instantiates it and calls the process method.
In our case, the preprocessor is SetSearchPathPreprocessor, which will set the PostgreSQL search path.
Voila! Now you can switch to different schemas on the fly. Add a few migration command magic and you are set!
Thanks a lot, ibjoeb!