Tuning Berkeley DB for high performance web applications

Posted by tkouts on 1 June 2009

Tags: berkeleydb tuning

Extensive stress tests of the new Porcupine release under different Berkeley DB configurations have produced some interesting results. Here are some tips for configuring Berkeley DB so as to get the most out of it when it is used for web applications.

1. Use the DB_TXN_NOWAIT flag for transactional operations. This flag minimizes the number of write locks held at any given time, since transactions do not block waiting for a lock to become available; instead they are aborted immediately, releasing all the locks they currently hold. It also eliminates the need for a deadlock detector, because no transaction ever blocks waiting for a lock held by another, resulting in a deadlock-free database. Whenever a lock cannot be granted, the transaction is aborted at once and then retried.
Of course, this approach requires that you wrap all of your transactional operations in a special wrapper (a Python decorator) that retries each transaction a given number of times whenever a lock is not granted, using some kind of exponential or linear backoff between retries.
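A minimal sketch of such a decorator, assuming the bsddb3 bindings and a hypothetical DBEnv handle named env (the retry limit and backoff figures are arbitrary):

    import time
    import functools
    from bsddb3 import db

    MAX_RETRIES = 8       # assumed retry limit; tune for your workload
    BASE_DELAY = 0.01     # initial backoff in seconds (assumption)

    def transactional(env):
        """Decorator factory: run the wrapped callable inside a DB_TXN_NOWAIT
        transaction, retrying with exponential backoff when a lock is denied."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                delay = BASE_DELAY
                for attempt in range(MAX_RETRIES):
                    txn = env.txn_begin(None, db.DB_TXN_NOWAIT)
                    try:
                        result = func(*args, txn=txn, **kwargs)
                        txn.commit()
                        return result
                    except (db.DBLockNotGrantedError, db.DBLockDeadlockError):
                        # A lock could not be granted: abort, releasing every
                        # lock the transaction holds, back off and try again.
                        txn.abort()
                        time.sleep(delay)
                        delay *= 2
                    except Exception:
                        txn.abort()
                        raise
                raise RuntimeError('transaction failed after %d retries' % MAX_RETRIES)
            return wrapper
        return decorator

    # Hypothetical usage with a sessions database handle:
    # @transactional(env)
    # def store_session(key, value, txn=None):
    #     sessions_db.put(key, value, txn=txn)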

2. Drop the D out of ACID for transactions that need not be durable by using the DB_TXN_NOSYNC flag. Such transactions include session-state writes, or even better all transactions, if you have some kind of synchronous replication to multiple nodes for failover.
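As a sketch, DB_TXN_NOSYNC can be applied either to an individual transaction or to the whole environment (env and sessions_db are the same hypothetical handles as above):

    from bsddb3 import db

    # Per transaction: this commit does not flush the log to disk.
    txn = env.txn_begin(None, db.DB_TXN_NOSYNC)
    sessions_db.put(b'session-id', b'session-data', txn=txn)
    txn.commit()

    # Environment-wide: every transaction in this environment skips the sync.
    env.set_flags(db.DB_TXN_NOSYNC, 1)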

3. For transactional cursors, use degree 2 isolation whenever possible by passing the DB_READ_COMMITTED flag. This increases overall write throughput, because data items read by such cursors can be modified or deleted by other transactions before the cursor's own transaction commits.
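A sketch of opening such a cursor (env and items_db are hypothetical handles):

    from bsddb3 import db

    txn = env.txn_begin()
    # Degree 2 isolation: read locks on visited records are released as the
    # cursor moves on, so writers are not blocked until this transaction commits.
    cursor = items_db.cursor(txn, db.DB_READ_COMMITTED)
    try:
        record = cursor.first()
        while record:
            key, value = record
            # ... process key/value ...
            record = cursor.next()
    except db.DBNotFoundError:
        pass          # empty database or end of iteration, depending on settings
    finally:
        cursor.close()
        txn.commit()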

4. If you have many requests competing for updates on the same resource, consider using a semaphore to limit the maximum number of concurrent transactions per process. Tests have shown that under these conditions, and without a semaphore, single-process server instances achieve better throughput than multi-process instances. This happens because in a single-process instance the number of concurrent transactions stays low enough that transactions do not compete hard against one another, resulting in fewer retries.
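A sketch of this idea using a plain threading semaphore, combined with the retrying decorator from tip 1 (the limit of four concurrent transactions is an arbitrary figure):

    import threading

    MAX_CONCURRENT_TXNS = 4                            # assumed per-process limit
    _txn_slots = threading.BoundedSemaphore(MAX_CONCURRENT_TXNS)

    def throttled(func):
        """Allow at most MAX_CONCURRENT_TXNS wrapped calls to run concurrently."""
        def wrapper(*args, **kwargs):
            with _txn_slots:
                return func(*args, **kwargs)
        return wrapper

    # Hypothetical usage:
    # @throttled
    # @transactional(env)
    # def update_counter(key, txn=None):
    #     ...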

5. I guess this last one is well known: keep the data and log files on separate disks.
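One way to do this (a sketch; the paths are placeholders) is to point the environment's log directory at a different disk before opening it, either from code or through a DB_CONFIG file in the environment's home directory:

    from bsddb3 import db

    env = db.DBEnv()
    # Keep the write-ahead log on a different physical disk than the data files.
    env.set_lg_dir('/logs/porcupine')                  # placeholder path
    env.open('/data/porcupine',                        # placeholder home directory
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
             db.DB_INIT_LOG | db.DB_INIT_TXN)

    # Equivalent DB_CONFIG line:
    # set_lg_dir /logs/porcupine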

