Enhanced Django forms

Django forms are great, with all the validations and the works. If there is one thing that annoys me about forms, it is the syntax for retrieving form data: myform.cleaned_data['username']. It would be a lot cleaner if I could write something like myform.username(). Here's one way of doing it with a bit of __getattr__ magic (which goes against Django's design principles, but is too good to pass up).

Create a utility class with a __getattr__ that returns a closure which looks up the given field name in the cleaned_data dictionary, provided the field name is valid. What the heck! Python is more precise than English, so here goes:
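The original snippet did not survive on this page, so what follows is only a minimal sketch of the idea described above, not the author's code. The names `CleanedDataAccessMixin` and `FakeForm` are made up for illustration; in a real project the mixin would presumably be combined with `django.forms.Form`, which supplies the `fields` and `cleaned_data` attributes assumed here.

```python
class CleanedDataAccessMixin(object):
    """Expose each form field as a method: form.username() -> cleaned_data['username']."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so real
        # attributes and methods of the form are never shadowed.
        # Read 'fields' from the instance dict directly so that a missing
        # 'fields' attribute cannot recurse back into __getattr__.
        fields = self.__dict__.get('fields', {})
        if name not in fields:
            raise AttributeError(name)

        def getter():
            # Closure over the field name; the value is looked up when called.
            return self.cleaned_data[name]
        return getter


class FakeForm(CleanedDataAccessMixin):
    """Hypothetical stand-in for a validated Django form."""

    def __init__(self, data):
        self.fields = dict.fromkeys(data)
        self.cleaned_data = dict(data)


form = FakeForm({'username': 'alice', 'age': 30})
print(form.username())
print(form.age())
```

Unknown names still raise AttributeError, so a typo such as form.usernme() fails loudly instead of returning None.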

Django deployments made easy

I use a combination of Git, pip, virtualenv and Fabric to make my Django deployments painless. Going from development to deployment is now a single step for me. Here is the Fabric file I use; feel free to use it in your projects. My project layout is very similar to what is described here.

ROOT
 apps/ # apps
 config/ # settings.py, urls.py, manage.py
 lib/ # Third party apps
 public/ # static resources
 templates/ #templates
 fabfile.py

And here is the fabfile. Change the variables defined at the beginning of the file and you are all set!

from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm
from fabric.contrib.files import *
from os import path
env.hosts = ['<>']
BASE_DIR='<>'
APP_NAME='<>'
USER_NAME='<>'
DEPLOY_DIR=BASE_DIR+APP_NAME
GIT_CLONE_URL="<>"
GIT_PUSH_URL="<>"
FCGI_PORT='<>'
CUR_DIR= path.abspath(path.dirname(__file__).decode('utf-8'))

def setup():
    """Sets up the remote Ubuntu server with pip, virtualenv , clones the repo"""
    with cd(BASE_DIR):
        run('sudo apt-get install python-setuptools python-dev build-essential')
        run('sudo easy_install -U pip')
        run('sudo pip install -U virtualenv')
        run('virtualenv --no-site-packages '+ DEPLOY_DIR)
    with cd(DEPLOY_DIR):
        run('git clone %s' % GIT_CLONE_URL)
        run('source bin/activate && pip -E . install -r %s/requirements.txt' % APP_NAME)

def upgrade():
    with cd(DEPLOY_DIR):
        run('source bin/activate && pip -E . install -r %s/requirements.txt' % APP_NAME)
        
def init():
    with cd(DEPLOY_DIR):
        run('source bin/activate')

def commit_deploy(rev='HEAD',comment='Commit'):
    """ Add, commit, push changes to git repo, and deploy"""
    # Quote the commit message so that messages containing spaces work
    local('git add . && git commit -m "%s"' % comment)
    deploy(rev)
    
def deploy(rev='HEAD'):
    """Push changes to remote git and deploy the app. Takes an argument 'rev' which can be used to deploy a particular revision."""
    local('git push %s master' % GIT_PUSH_URL)
    with cd(DEPLOY_DIR+'/'+APP_NAME):
        run('git pull')
    stop_fcgi()
    migrate_and_start()

def restart():
    """Restart the FCGI processes on deployment machine"""
    stop_fcgi()
    with cd("%s/%s/config" % (DEPLOY_DIR,APP_NAME)):
        run("source ../../bin/activate && ./manage.py runfcgi pidfile=%(base_dir)s/process.file outlog=%(base_dir)s/out.log errlog=%(base_dir)s/err.log host=127.0.0.1 port=%(port)s " % {'port':FCGI_PORT,'base_dir':DEPLOY_DIR},shell=True)

def stop_fcgi():
    """Stop the FCGI processes on deployment machine"""
    with cd(DEPLOY_DIR):
        print("Stopping FCGI....")
        with settings(warn_only=True):
            result=run("cat process.file |xargs kill -9")
            if result.failed:
                print("No FCGI running")
        #if exists(APP_NAME):run("unlink %s" % APP_NAME)

def migrate_and_start():
    """Run syncdb, migrate(south) and start the FCGI processes on deployment machine"""
    with cd("%s/%s/config" % (DEPLOY_DIR,APP_NAME)):
        print("Migrating database...")
        run('source ../../bin/activate && ./manage.py syncdb')
        run('source ../../bin/activate && ./manage.py migrate')
        print("Restarting server....")
        run("source ../../bin/activate && ./manage.py runfcgi pidfile=%(base_dir)s/process.file outlog=%(base_dir)s/out.log errlog=%(base_dir)s/err.log host=127.0.0.1 port=%(port)s " % {'port':FCGI_PORT,'base_dir':DEPLOY_DIR},shell=True)
        print("...Done!")

Features not supported yet

  • Reverting a bad deployment
  • Run tests before deployment
  • Support multi-level deployment - Staging/Production
  • Cross-platform support (Ubuntu only for now)

ZeroMQ and Django

One of my Django projects needed to pull a substantial amount of data from LinkedIn and Facebook when a user registered. I was scrambling for options, as the data would be used almost immediately (within the user's next few requests). I could not do it in the registration request itself, and a scheduled job (thanks for making this easy, django_extensions) was not really a good solution.

Enter ZeroMQ. Here's what I did :

1. Install it. Details from this excellent piece - http://johanharjono.com/archives/633

In a nutshell,

sudo add-apt-repository ppa:chris-lea/zeromq
sudo apt-get update
sudo easy_install pyzmq

2. Look at the examples here - https://github.com/imatix/zguide/tree/master/examples/Python and write a simple PUSH/PULL server as a Django command

from django.core.management import call_command
from django.core.mail import send_mail
from settings import ADMINS, ZMQ_URL
 
from django.core.management.base import BaseCommand
import zmq
 
class Command(BaseCommand):
    args = ''
    help = 'Start the ZMQ server'
 
    def handle(self, *args, **options):
 
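        context = zmq.Context()
        # Create a pull socket
        socket = context.socket(zmq.PULL)
        print "Server started"
        # connect() needs a concrete address such as "tcp://127.0.0.1:5555";
        # the wildcard form "tcp://*:5555" is only valid for bind()
        socket.connect(ZMQ_URL)
        while True:
            message = socket.recv_json()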
            try:
                call_command(message['command'],*message['args'],**message['options'])
            except Exception,e:
                print "ERROR",e
                #Send an email to an admin configured in settings.py
                send_mail("ZMQ Error:%s failed" % message['command'], "The command failed with %s arguments" % (message), ADMINS[0][1], [ADMINS[0][1]], True)

3. Write the client utility to send messages

import zmq
from settings import ZMQ_URL

def send_command(command, *args, **options):
    context = zmq.Context()
    socket = context.socket(zmq.PUSH)
    socket.bind(ZMQ_URL)
    socket.send_json(dict(command=command, args=args, options=options))
    socket.close()
    context.term()

There! Now I can execute Django commands asynchronously by calling:

send_command("migrate")
send_command("send_email",to_addr,from_addr,subject)

It cannot get simpler than this!

Disclaimer: I am not an expert on ZeroMQ/pyzmq and am still learning the tricks. The connection cleanup code is not all that great, and the error handling is basic at best.
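To make the message format concrete without needing a running ZeroMQ socket, here is a small standard-library sketch of what travels over the wire and how the server loop turns it back into a call. The `COMMANDS` registry and the `register`, `build_message` and `dispatch` helpers are hypothetical stand-ins (the real server hands the message to Django's call_command), so treat this as an illustration rather than part of the setup above.

```python
import json

# Hypothetical registry standing in for Django's management commands.
COMMANDS = {}


def register(name):
    """Decorator that files a function under a command name."""
    def decorator(func):
        COMMANDS[name] = func
        return func
    return decorator


def build_message(command, *args, **options):
    """Serialize a command into the same dict shape that send_command() sends."""
    return json.dumps(dict(command=command, args=args, options=options))


def dispatch(raw):
    """Decode one message and run the matching command, as the server loop does."""
    message = json.loads(raw)
    func = COMMANDS[message['command']]
    return func(*message['args'], **message['options'])


@register('send_email')
def send_email(to_addr, from_addr, subject, urgent=False):
    return '%s -> %s: %s (urgent=%s)' % (from_addr, to_addr, subject, urgent)


raw = build_message('send_email', 'bob@example.com', 'me@example.com', 'Hi', urgent=True)
print(dispatch(raw))
```

One consequence worth keeping in mind: everything in the message has to survive a JSON round trip, so positional arguments arrive as a list and model instances or other rich objects cannot be passed directly.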

A better way : SaaS with Django and PostgreSQL Schemas

This is a continuation of the previous post. ibjoeb on Hacker News rightly pointed out the issues with creating one database per customer, and suggested using schemas. I completely agree: fewer management issues, lower resource consumption, and just a better design.

There is a Django ticket to support schemas in a generic manner, but I doubt it will be useful for schemas created on the fly. So here are the changes I tried out to use schemas instead of databases:

settings.py

CONNECTION_PREPROCESSOR='common.routers.SetSearchPathPreprocessor'
MASTER_SCHEMA='public'
...
MIDDLEWARE_CLASSES = (
'middleware.threadlocal.SchemaMiddleware',
...)

Here we define a connection preprocessor. Unlike Django's connection_created signal, which is sent every time a connection is created, this will be called every time a connection is looked up. The middleware is a standard middleware that makes the currently active schema available throughout the stack.

middleware.threadlocal

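from django.utils.thread_support import currentThread
# Company is the registration model from the previous post; import it from
# wherever it lives in your project.
_schemas = {}

# Same pattern as set_db/get_db in the previous post
def set_schema(schema):
    _schemas[currentThread()] = schema

def get_schema():
    return _schemas.get(currentThread(), None)

class SchemaMiddleware:

    def process_request(self, request):
        subdomain = request.get_host().split(".")[0]
        # Set up the Company model meta option db_table as explained here - http://stackoverflow.com/questions/1160598/how-to-use-schemas-in-django
        # This will make sure that queries on this model will always go to the master schema
        company = Company.objects.get(subdomain=subdomain)
        set_schema(company.schema)
        return None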

The middleware looks up the registration details by subdomain and stores the schema in a thread local.
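A minimal sketch of the per-thread storage the middleware relies on, written with the standard library's threading.local rather than the dictionary keyed by currentThread() used in this post. The `handle_request` function and the schema names are invented for the demonstration; only `set_schema` and `get_schema` correspond to names used by the post's code.

```python
import threading

# One storage object; each thread sees only its own attributes on it.
_local = threading.local()


def set_schema(schema):
    _local.schema = schema


def get_schema():
    # Threads that never called set_schema() get None.
    return getattr(_local, 'schema', None)


def handle_request(schema, results):
    """Pretend request handler: set the schema, then read it back."""
    set_schema(schema)
    results[schema] = get_schema()


results = {}
threads = [threading.Thread(target=handle_request, args=(name, results))
           for name in ('acme', 'globex')]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(get_schema())  # the main thread never set a schema
```

A possible advantage of threading.local is that a thread's value disappears with the thread, whereas a module-level dictionary keeps its entries until they are removed explicitly.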

common.routers

from middleware.threadlocal import get_schema
from settings import MASTER_SCHEMA
class SetSearchPathPreprocessor(object):
    def process(self,cursor):
        schema= get_schema()
        if not schema: schema=MASTER_SCHEMA
        cursor.execute("SET search_path TO  %s" % schema)

The preprocessor is PostgreSQL-specific and sets the search path (see PostgreSQL schemas). It is called whenever a connection is looked up. Note that using django.db.connection will break this code; the preferred way is to look up the connection by database alias, for example django.db.connections['default'].
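Because the schema name is interpolated straight into the SQL string, it may be worth validating it before it reaches cursor.execute. The helper below is a sketch under that assumption; `search_path_sql` and the identifier pattern are not part of the post's code, and the pattern is deliberately stricter than what PostgreSQL itself accepts.

```python
import re

# Conservative pattern for unquoted identifiers: a letter or underscore,
# then letters, digits or underscores. (An assumption of this sketch.)
_SCHEMA_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def search_path_sql(schema, default='public'):
    """Return the SET search_path statement for a schema, or raise ValueError."""
    schema = schema or default
    if not _SCHEMA_NAME.match(schema):
        raise ValueError('unsafe schema name: %r' % (schema,))
    return 'SET search_path TO %s' % schema


print(search_path_sql('acme'))   # SET search_path TO acme
print(search_path_sql(None))     # SET search_path TO public
```

The preprocessor could then call cursor.execute(search_path_sql(get_schema())) and let a bad name fail before any SQL is sent.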

django.db.utils.ConnectionHandler

def __getitem__(self, alias):
    if alias in self._connections:
        conn = self._connections[alias]
    else:
        self.ensure_defaults(alias)
        db = self.databases[alias]
        backend = load_backend(db['ENGINE'])
        conn = backend.DatabaseWrapper(db, alias)
        self._connections[alias] = conn
    # Run the preprocessor on every lookup, cached connection or not,
    # so that the search path follows the current request's schema
    try:
        from settings import CONNECTION_PREPROCESSOR
        preprocessor = load_class(CONNECTION_PREPROCESSOR)()
        preprocessor.process(conn.cursor())
    except ImportError:
        pass
    return conn


def load_class(path):
    i = path.rfind('.')
    from django.utils.importlib import import_module
    module, attr = path[:i], path[i+1:]
    try:
        mod = import_module(module)
    except ImportError, e:
        raise ImproperlyConfigured('Error importing class %s: "%s"' % (module, e))
    except ValueError, e:
        raise ImproperlyConfigured('Error class')
    try:
        return getattr(mod, attr)
    except AttributeError:
        raise ImproperlyConfigured('Module "%s" does not define class "%s"' % (module, attr))

This is our old connection handler class, modified differently this time. __getitem__ tries to import a preprocessor; if one is configured, it loads the class, instantiates it and calls its process method. In our case the preprocessor is SetSearchPathPreprocessor, which sets the PostgreSQL search path.
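Later Django releases dropped django.utils.importlib in favour of the standard library's importlib, so on a newer stack the dotted-path loader could look roughly like the sketch below. It is a standalone approximation, not the code used in this post: it raises ValueError and ImportError instead of Django's ImproperlyConfigured so that it runs without Django installed.

```python
from importlib import import_module


def load_class(path):
    """Resolve a dotted path such as 'package.module.ClassName' to the class."""
    module_name, _, attr = path.rpartition('.')
    if not module_name:
        raise ValueError('%r is not a dotted path' % (path,))
    module = import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError:
        raise ImportError('Module %r does not define %r' % (module_name, attr))


OrderedDict = load_class('collections.OrderedDict')
print(OrderedDict.__name__)
```

With a helper like this, a setting such as CONNECTION_PREPROCESSOR stays a plain string in settings.py and is only resolved to a class when it is first needed.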

Voila! Now you can switch to different schemas on the fly. Add a little migration command magic and you are set!

Thanks a lot, ibjoeb!

SaaS with Django and PostgreSQL

UPDATE: Based on comments on HackerNews, I have added another post which explains how to use a single database, and one schema per customer, rather than one database per customer.

The problem

I am building a B2B web application (Django + PostgreSQL). I want to isolate each customer’s data from the others, as in a typical multi-tenant architecture with one database per customer. A customer is typically a small/medium company. The customer signs up on www.mygreatservice.com and gets a subdomain – mygreatcompany.mygreatservice.com. How do I tell the Django application running www.mygreatservice.com that a new subdomain and a new database were created, and that all requests for the subdomain mygreatcompany should use this new database?

The solution

I am using PostgreSQL and creating one database per customer. Django 1.2+ supports multiple databases, but all of the databases must be predefined in settings.py. Here’s what I am planning to do:

  1. Store the customer registration data in a master database. Generate a database name and store it as well. If the subdomain requested by the customer was “mygreatcompany”, the database name could be “mygreatcompany_6a5f2d” – subdomain + random suffix. Similarly, the database username and password can be generated. All of this is stored in a Company model on the master database.

    import copy
    from django.db import models
    # MASTER_DB is the alias of the master database, defined elsewhere in the project

    class CompanyManager(models.Manager):
        """This manager always uses the master DB"""

        def __init__(self):
            super(CompanyManager, self).__init__()
            self._db = MASTER_DB

        def db_manager(self, using=MASTER_DB):
            obj = copy.copy(self)
            obj._db = MASTER_DB
            return obj

    class Company(models.Model):
        contact_name = models.CharField(max_length=512)
        company_name = models.CharField(max_length=512)
        subdomain = models.CharField(max_length=64, unique=True)
        database_name = models.CharField(max_length=128, unique=True)
        database_user = models.CharField(max_length=128, unique=True)
        database_pwd = models.CharField(max_length=128)
        contact_email = models.EmailField()
        objects = CompanyManager()

    The manager is overridden so that it always uses the master DB (okay, it's not perfect yet, but you get the idea).

  2. Use a middleware to look up the name of the new database, based on the subdomain.

    from django.utils.thread_support import currentThread
    _db = {}

    def set_db(db):
        _db[currentThread()] = db

    def get_db():
        return _db.get(currentThread(), None)

    class DBMiddleware:
        def process_request(self, request):
            subdomain = request.get_host().split(".")[0]
            # get() needs a keyword lookup, not a positional argument
            company = Company.objects.get(subdomain=subdomain)
            set_db({'database_name': company.database_name, 'database_user': company.database_user, 'database_pwd': company.database_pwd})
            return None

    Again, not perfect, and purists will balk at the use of thread locals. But hey, it's a start, right?

  3. Next, tweak Django a little to retrieve a DB connection based on this database name. utils.py in django.db holds the key.

    In the class ConnectionHandler :

    def __getitem__(self, alias):
        if alias in self._connections:
            return self._connections[alias]
        self.ensure_defaults(alias)
        db = self.databases[alias]
        backend = load_backend(db['ENGINE'])
        conn = backend.DatabaseWrapper(db, alias)
        self._connections[alias] = conn
        return conn

    This method returns the connection if it's already in the alias-connection map. Otherwise, it creates one by looking up the database settings for the alias, loading the backend, and then creating a connection.

    This will not work for us since our database definitions and aliases are not predefined, but available elsewhere.

    So the new getitem looks like:

    def __getitem__(self, alias):
        if alias in self._connections:
            return self._connections[alias]
        # This is a new customer database, so make a copy of the default DB, set the alias and other database details, and store it in the dictionary of databases
        if alias not in self.databases:
            # Get DB details from the thread local. Aha! Call Pollution Control now!
            db_details = get_db()
            new_db = self.databases[DEFAULT_DB_ALIAS].copy()
            # Keys must match the ones the middleware passes to set_db()
            new_db['NAME'] = db_details['database_name']
            new_db['USER'] = db_details['database_user']
            new_db['PASSWORD'] = db_details['database_pwd']
            self.databases[alias] = new_db
        self.ensure_defaults(alias)
        db = self.databases[alias]
        backend = load_backend(db['ENGINE'])
        conn = backend.DatabaseWrapper(db, alias)
        self._connections[alias] = conn
        return conn

  4. Finally, define a custom ConnectionRouter to pick up the right alias, based on the subdomain.

    from middleware.threadlocal import get_db

    class DynamicDBRouter(object):

        def db_for_read(self, model, **hints):
            return get_db()['database_name']

        def db_for_write(self, model, **hints):
            return get_db()['database_name']

        def allow_relation(self, obj1, obj2, **hints):
            return True

        def allow_syncdb(self, db, model):
            return True
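Two small pieces from steps 1 and 2 are described in words only: generating the database name and extracting the subdomain from the request host. A possible sketch of both follows; the function names, the six-character hex suffix and the character filtering are assumptions for illustration, not the code this post runs.

```python
import re
import secrets


def subdomain_from_host(host):
    """'mygreatcompany.mygreatservice.com:8000' -> 'mygreatcompany'."""
    hostname = host.split(':')[0]  # drop an optional port
    return hostname.split('.')[0].lower()


def generate_database_name(subdomain, suffix_bytes=3):
    """Subdomain plus a random hex suffix, in the style of 'mygreatcompany_6a5f2d'."""
    # Keep only characters that are safe in an unquoted identifier.
    base = re.sub(r'[^a-z0-9_]', '_', subdomain.lower())
    return '%s_%s' % (base, secrets.token_hex(suffix_bytes))


sub = subdomain_from_host('MyGreatCompany.mygreatservice.com:8000')
print(sub)
print(generate_database_name(sub))
```

The random suffix only makes collisions unlikely; the unique=True constraint on database_name in the Company model is what would actually reject a duplicate.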

If only Django let me define my own connection handler, just like it does for routers, middleware, and template loaders.

I have tested this in a dev environment and it seems to work well. With a custom command to migrate all databases at once (I use South), life is easy.

However, I have to test this extensively in a multi-user environment to convince myself that it works as expected in production.