SaaS with Django and PostgreSQL

UPDATE: Based on comments on HackerNews, I have added another post which explains how to use a single database, and one schema per customer, rather than one database per customer.

The problem

I am building a B2B web application (Django +PostgreSQL) . I want to isolate customer’s data from one another. As in a typical multi-tenant architecture, one database per customer. A customer is typically a small/medium company. The customer signs up on www.mygreatservice.com and gets a subdomain – mygreatcompany.mygreatservice.com . How do I tell the Django application running www.mygreatservic.com that a new subdomain was created, a new database was created and for all requests coming for the subdomain mygreatcompany , use this new database?

The solution

I am using PostgreSQL and creating one database per customer. Django 1.2+ supports multiple databases, but all of databases must be predefined in settings.py. Here’s what am planning to do:

  1. Store the customer registration data in a master database. Generate a database name and store it as well. If the subdomain requested by the customer was “mygreatcompany”, the database name cound be “mygreatcompany_6a5f2d” – subdomain + random suffix.Similarly,the database username and password can be generated. All of this is stored in a Company model, on the master database

    class CompanyManager(models.Manager):
    
     """    This  manager always uses master db """
    
     def __init__(self):
         super(CompanyManager, self).__init__()
         self._db = MASTER_DB
    
     def db_manager(self, using=MASTER_DB):
         obj = copy.copy(self)
         obj._db = MASTER_DB
         return obj
    
     class Company(models.Model):
         contact_name=models.CharField(max_length=512)
         company_name=models.CharField(max_length=512)
         subdomain=models.CharField(max_length=64,unique=True)
         database_name=models.CharField(max_length=128,unique=True)
         database_user=models.CharField(max_length=128,unique=True)
         database_pwd=models.CharField(max_length=128)
         contact_email=models.EmailField()
         objects=CompanyManager()

    The manager is overridden such that it always uses the master DB (Okay, its not perfect yet, but you get the idea).

  2. Use a middleware to lookup the name of the new database, based on the subdomain.

    from django.utils.thread_support import currentThread
     _db = {}
    
     def set_db(db):
         _db[currentThread()] = db
    
     def get_db():
         return _db.get(currentThread(),None)
    
     class DBMiddleware:
         def process_request(self, request):
             subdomain =request.get_host().split(".")[0]
             company=Company.objects.get(subdomain)
             set_db({'database_name':company.database_name,'database_user':company.database_user,'database_pwd':company.database_pwd})
             return None

    Again, not perfect, and purists will balk at the use of thread locals.But hey, its a start, right?

  3. Next, tweak django a little, to retrieve a db connection based on this database name. utils.py in django.db holds the key.

    In the class ConnectionHandler :

    def __getitem__(self, alias):
         if alias in self._connections:
             return self._connections[alias]
         self.ensure_defaults(alias)
         db = self.databases[alias]
         backend = load_backend(db['ENGINE'])
         conn = backend.DatabaseWrapper(db, alias)
         self._connections[alias] = conn
         return conn

    This method returns the connection if its already there in the alais-connection map. Otherwise, it creates a connection by looking up the backend by alias name, loading it, and then creating a connection.

    This will not work for us since our database definitions and aliases are not predefined, but available elsewhere.

    So the new getitem looks like:

    def __getitem__(self, alias):
         if alias in self._connections:
             return self._connections[alias]
         #This is a new customer database,so make a copy of the default DB, set the alias, and other database details and store it in the dictionary of databases
         if not alias in self.databases:
             #Get db details from threadlocal. Aha! Call Pollution Control now!
             db_details=get_db()
             new_db = self.databases[DEFAULT_DB_ALIAS].copy()
             new_db['NAME']=db_details['db_name']
             new_db['USER']=db_details['db_user']
             new_db['PASSWORD']=db_details['db_pwd']
             self.databases[alias]=new_db
         self.ensure_defaults(alias)
         db = self.databases[alias]
         backend = load_backend(db['ENGINE'])
         conn = backend.DatabaseWrapper(db, alias)
         self._connections[alias] = conn
         return conn
  4. Finally, define a custom ConnectionRouter, to pick up the right alias,based on the subdomain

    from middleware.threadlocal import get_db
     class DynamicDBRouter(object):
    
         def db_for_read(self, model, **hints):
             return get_db()['database_name']
    
         def db_for_write(self, model, **hints):
             return get_db()['database_name']
    
         def allow_relation(self, obj1, obj2, **hints):
             return True
    
         def allow_syncdb(self, db, model):
             return True

If only Django let me define my own connection handler, just like it does for routers, middleware, and template loaders.

I have tested this on a dev environment and seems to work well, and with a custom command to migrate all databases at once (I use south ), life is easy.

However, I have to extensively test this on a multi-user environment to convince myself that this works as expected in a production environment.