SaaS with Django and PostgreSQL
UPDATE: Based on comments on HackerNews, I have added another post which explains how to use a single database, and one schema per customer, rather than one database per customer.
The problem
I am building a B2B web application (Django +PostgreSQL) . I want to isolate customer’s data from one another. As in a typical multi-tenant architecture, one database per customer. A customer is typically a small/medium company. The customer signs up on www.mygreatservice.com and gets a subdomain – mygreatcompany.mygreatservice.com . How do I tell the Django application running www.mygreatservic.com that a new subdomain was created, a new database was created and for all requests coming for the subdomain mygreatcompany , use this new database?
The solution
I am using PostgreSQL and creating one database per customer. Django 1.2+ supports multiple databases, but all of databases must be predefined in settings.py. Here’s what am planning to do:
Store the customer registration data in a master database. Generate a database name and store it as well. If the subdomain requested by the customer was “mygreatcompany”, the database name cound be “mygreatcompany_6a5f2d” – subdomain + random suffix.Similarly,the database username and password can be generated. All of this is stored in a Company model, on the master database
class CompanyManager(models.Manager): """ This manager always uses master db """ def __init__(self): super(CompanyManager, self).__init__() self._db = MASTER_DB def db_manager(self, using=MASTER_DB): obj = copy.copy(self) obj._db = MASTER_DB return obj class Company(models.Model): contact_name=models.CharField(max_length=512) company_name=models.CharField(max_length=512) subdomain=models.CharField(max_length=64,unique=True) database_name=models.CharField(max_length=128,unique=True) database_user=models.CharField(max_length=128,unique=True) database_pwd=models.CharField(max_length=128) contact_email=models.EmailField() objects=CompanyManager()
The manager is overridden such that it always uses the master DB (Okay, its not perfect yet, but you get the idea).
Use a middleware to lookup the name of the new database, based on the subdomain.
from django.utils.thread_support import currentThread _db = {} def set_db(db): _db[currentThread()] = db def get_db(): return _db.get(currentThread(),None) class DBMiddleware: def process_request(self, request): subdomain =request.get_host().split(".")[0] company=Company.objects.get(subdomain) set_db({'database_name':company.database_name,'database_user':company.database_user,'database_pwd':company.database_pwd}) return None
Again, not perfect, and purists will balk at the use of thread locals.But hey, its a start, right?
Next, tweak django a little, to retrieve a db connection based on this database name. utils.py in django.db holds the key.
In the class ConnectionHandler :
def __getitem__(self, alias): if alias in self._connections: return self._connections[alias] self.ensure_defaults(alias) db = self.databases[alias] backend = load_backend(db['ENGINE']) conn = backend.DatabaseWrapper(db, alias) self._connections[alias] = conn return conn
This method returns the connection if its already there in the alais-connection map. Otherwise, it creates a connection by looking up the backend by alias name, loading it, and then creating a connection.
This will not work for us since our database definitions and aliases are not predefined, but available elsewhere.
So the new getitem looks like:
def __getitem__(self, alias): if alias in self._connections: return self._connections[alias] #This is a new customer database,so make a copy of the default DB, set the alias, and other database details and store it in the dictionary of databases if not alias in self.databases: #Get db details from threadlocal. Aha! Call Pollution Control now! db_details=get_db() new_db = self.databases[DEFAULT_DB_ALIAS].copy() new_db['NAME']=db_details['db_name'] new_db['USER']=db_details['db_user'] new_db['PASSWORD']=db_details['db_pwd'] self.databases[alias]=new_db self.ensure_defaults(alias) db = self.databases[alias] backend = load_backend(db['ENGINE']) conn = backend.DatabaseWrapper(db, alias) self._connections[alias] = conn return conn
Finally, define a custom ConnectionRouter, to pick up the right alias,based on the subdomain
from middleware.threadlocal import get_db class DynamicDBRouter(object): def db_for_read(self, model, **hints): return get_db()['database_name'] def db_for_write(self, model, **hints): return get_db()['database_name'] def allow_relation(self, obj1, obj2, **hints): return True def allow_syncdb(self, db, model): return True
If only Django let me define my own connection handler, just like it does for routers, middleware, and template loaders.
I have tested this on a dev environment and seems to work well, and with a custom command to migrate all databases at once (I use south ), life is easy.
However, I have to extensively test this on a multi-user environment to convince myself that this works as expected in a production environment.