eXtremeDB & Multi-Version Concurrency Control
In eXtremeDB version 4.0, McObject introduced an optional multi-version concurrency control (MVCC) transaction manager. MVCC offers an alternative to the “pessimistic” database locking of the original eXtremeDB MURSIW (MUltiple Reader, SIngle Writer) transaction manager, and can dramatically improve scalability and performance, especially in applications with one or more of the following characteristics:
- On-disk or hybrid (in-memory and on-disk) database storage
- Many tasks or processes concurrently modifying the database (versus read-only)
- Multi-core systems
Pessimistic database locking makes all or portions of the database unavailable to all but the single task that is updating it, thereby blocking other tasks. In practice, for an in-memory eXtremeDB database, this is often not an issue because transactions execute faster (in microseconds) than complex lock arbitration itself would take.
But even with the speed of an in-memory eXtremeDB database, serializing a large number of concurrent tasks writing to the database can be a performance disadvantage. And even a few concurrent tasks writing to much slower file system-based tables in a hybrid database could be problematic for the MURSIW transaction manager (even the fastest solid-state disks, or SSDs, are more than an order of magnitude slower than RAM).
In contrast, MVCC is an optimistic model in which no task or thread is ever blocked by another because each is given its own copy (version) of objects in the database to work with during a transaction. When a transaction is committed, its copy of objects it has modified replaces what is in the database. Because no explicit locks are ever required, and no task is ever blocked by another task with locks, MVCC can provide significantly faster performance and greater utilization of multiple CPUs/cores.
Under MVCC, when tasks want to update the same data at the same time, a conflict does arise, and a retry will be required by one or more tasks. However, an occasional retry is far better, in performance terms, than the guaranteed complex lock arbitration, and blocking, caused by pessimistic locking. Further, in most systems, conflicts under MVCC are infrequent because of the logically separate duties among tasks–that is, task A tends to work with a different set of data than tasks B, C and D, etc.
All editions of eXtremeDB from version 4.0 onward, including downloadable evaluation packages, include both the MURSIW transaction manager and the MVCC transaction manager.
MVCC In Operation
Figure 1 compares MVCC to pessimistic locking, in operation. The diagram shows three database tables, each with five rows, and three tasks that are reading and/or modifying certain rows of certain tables. Task 1 is modifying Table 1’s row 3 (T1R3) and Table 2’s row 5 (T2R5). Task 2 is modifying T3R1, T3R3 and T3R5. Task 3 is reading T3R3 and modifying T1R5 and T3R2. Note that there are two copies (versions) of T3R3: a copy in Task 2 and Task 3.
For purposes of this discussion, assume that all three tasks are started as close together in time as possible, but in the order Task 1, Task 2, Task 3. With MVCC, the three tasks run in parallel. With pessimistic locking, there are three possibilities: database locking, table locking and row locking. Each will exhibit different degrees of parallelism (but all less than MVCC).
Database locking: Task 1, Task 2 and Task 3 will be serialized. In other words, Task 1 will be granted access to the database while Task 2 and Task 3 are blocked as they “wait their turn.” When Task 1 completes its transaction, Task 2 will run while Task 3 continues to wait. Finally, Task 3 will run after Task 2 completes its transaction.
Table locking: Task 1 and Task 2 will run in parallel because Task 1 acquires locks on Table 1 and Table 2, while Task 2 acquires locks only on Table 3. Task 3 will block until Task 1 and Task 2 complete because it also needs a lock on Table 1 (which is locked by Task 1) and Table 3 (which is locked by Task 2). Task 3 will be blocked for the length of time required by Task 1 or Task 2, whichever is greater.
Row locking: Again, Task 1 and Task 2 will run in parallel because they operate on different tables, (hence on different rows). Task 3 will again block because Task 2 has a write lock on T3R3, which Task 3 wants to read.
MVCC Performance Test
Any serialization effectively defeats a multicore system, because all but one core will be idle with respect to the utilization of the shared database. However, strategies to maximize parallelism, such as MVCC or fine-grained locking, impose their own overhead. In the case of fine-grained locking (row locking) there is lock arbitration, which can be complex. In the case of MVCC, there is version management–creating object versions, merging them and discarding them.
So for MVCC to be justified, the gain in parallelism has to outweigh the additional processing overhead. To illustrate, the graphs in Figures 2, 3 and 4 show the relative performance of McObject’s eXtremeDB in-memory database system on identical multithreaded tests executed on a multicore system, using a multiple-reader, single-writer (MURSIW, or database-locking) transaction manager, and its multi-version concurrency control (MVCC) transaction manager.
Figure 3. eXtremeDB UPDATE performance with MVCC (red) vs. MURSIW (blue) transaction managers.
Figure 4. eXtremeDB DELETE performance with MVCC (red) vs. MURSIW (blue) transaction managers.
Note that the MURSIW transaction manager’s serialization of read-write transactions (emphasis on the SIngle Writer characteristic) in the above example explains the nearly-flat performance line of MURSIW transactions as the number of cores increases from 1 to 20. In other words, if eXtremeDB with the MURSIW transaction manager can achieve 700,000 transactions per second, it can do so with one thread (1 thread executing 700,000 transactions-per-second) or with 20 threads (20 threads executing 35,000 transactions-per-second each).
Cursor Iteration, Hash and Tree Search
McObject also compared performance of eXtremeDB’s MVCC and MURSIW transaction managers in performing three read-only operations: a cursor iteration using a database cursor to loop over every object in a table, as well as primary key searches using hash and tree indexes.
The results are very different from the write-intensive benchmark results, above. MURSIW will outperform MVCC for read-only operations because MURSIW is a very lightweight transaction manager. MVCC, on the other hand, has to make versions of objects, track them and discard them when they’re no longer referenced in an active transaction.
The choice of MURSIW or MVCC should be a function of the ratio of query transactions to insert/update/delete transactions for a given application. Almost all applications are a blend. Because MURSIW is more efficient for database reads, applications in which reads predominate are may be better-served by the MURSIW transaction manager. Conversely, as the proportion of read-write transactions increases, so does the likelihood that MVCC will improve performance.
Figure 5. Cursor iteration, or database performance using a cursor to loop over every object in a table.
Figure 6. This graph shows the number of objects per second that can be returned using a hash index search.
Figure 7. Performance (objects-per-second returned) using a tree index search.