Weblog » Running Porcupine on SMP systems
For the past few days I have been wondering what it takes to have multiple Porcupine processes running on SMP systems, as a first step towards scalability. Since almost all modern CPUs are multi-core, and because of Python's Global Interpreter Lock, the only way to take full advantage of a CPU's horsepower is to run multiple Porcupine processes (sub-processes, to be precise), all accessing the same database.
The good news is that Berkeley DB allows multiple processes to access the same database. The current implementation, however, has a minor fault: the database environment is opened using the db.DB_RECOVER flag, so each time Porcupine starts, database recovery is attempted. The problem with this procedure is that recovery may only be initiated while no other process or thread is using the environment (see what happens if you try to start Porcupine twice without having stopped the first instance). Therefore, database recovery should only be performed by an external script, after all Porcupine services (processes) have been stopped.
The next thing I had to consider is what gets shared between these processes. The ideal scenario says "nothing". The first issue I need to deal with is the session manager. Since the Porcupine session manager is extensible by sub-classing the GenericSessionManager class, one could write a new session manager that persists sessions in the database. Another alternative is to design a session manager that keeps the session state inside cookies. The latter has two main drawbacks: it requires cookies to be enabled on the browser side, and the size of each request grows in proportion to the amount of session data stored. But on the other hand, no disk access.
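The cookie idea boils down to serializing the session state and signing it, so the server can detect tampering without storing anything itself. A minimal sketch of the encode/decode half, with standalone helpers rather than GenericSessionManager's real interface (the `SECRET` and function names are mine, not Porcupine's):

```python
import base64
import hashlib
import hmac
import json

# Shared by all Porcupine processes, so any of them can verify a cookie.
SECRET = b"server-side secret"


def encode_session(data):
    """Serialize session state into a tamper-evident cookie value."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def decode_session(cookie):
    """Return the session dict, or None if the cookie was tampered with."""
    payload, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Both drawbacks are visible here: the cookie travels with every request and grows with the session data, and signing only prevents tampering, not reading (sensitive state would also need encryption).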
Last but not least, the object cache. The current object cache is useless when we are talking about multiple processes, since each process would keep its own private, potentially stale copy of the objects. The memcached project seems to be the right fit, although it is not as portable as writing a native Python alternative.