Object Oriented Databases Concurrency
Concurrency: One of the best features of third-party database management systems is their ability to help users and applications work with commonly accessible data while providing predictable behavior to all clients.
Concurrency is usually managed by “locking” data and by using transactions to define when the locks are released. The transactions almost always must adhere to the ACID properties.2
In order to support the ACID properties, standard locking models include at least two obvious types of locks – read and update. Read locks can allow other readers to read data, but must restrict updaters from changing the data as they are reading it. Update locks must exclude access to the data that the updater is modifying.
Despite the seeming simplicity of the concept, there are many ways to implement concurrency that can have a dramatic effect on performance and confidence in the data.
One of the important implementation details is when the data is locked and how conflicts are resolved. Another important implementation detail is the granularity of the locks. It can be very important to know (if not control) how much data is being locked when an attribute is being read or modified.
Here again, physically organizing data by type can create severe inefficiencies. Almost all database management systems have coupled locking with the physical layouts of data for efficiency. This introduces another trade-off of poor choices.
If page-level locking is implemented, rows that have no relationship with each other than a common type will be locked together. For example, locking one customer record will effectively lock many customers.
Not only does this prevent access to the other customer records, knowing which customers are locked together is impossible because users and applications generally have no control of which customers are on the same page.
This unpredictability can be disastrous not only because one must handle cases where locks aren’t granted, but also because it can lead to deadlocks. The other choice is row-level locking, but this has obvious performance ramifications. If an application is going to read or update hundreds of thousands of rows in a second, it can mean that the same number of locks must be evaluated per second. Since the lock manager is by nature centralized, it also becomes a bottleneck.
The one-size-fits-all solutions generally deal with this trade-off with two strategies: Optimistic locking and short transactions. Although the release of an update lock must occur at or after the end of a transaction to maintain atomicity and isolation, when the update lock is initiated is much more open to implementation choice.
Optimistic locking assumes that the likelihood of a transaction reading a piece of data and that piece of data being updated at the same time is very small. Optimistic locking allows for the rows to be read without locking other updaters out for the length of the transaction. This strategy depends on short transactions, since the shorter a transaction is, the more likely this assumption will be true.
There’s a severe danger with this approach, however. Although the database will prevent “dirty” reads (reads of partially modified data), this does allow a user or application to violate the spirit, if not the letter, of the “Isolation” and “Durability” parts of the ACID properties by allowing updates on “stale” reads.
A common example is an employee (say Mary) who is given a $5 an hour raise at around the same time the whole company is given a 10% raise. Let’s say that transaction A (entered by one HR person) is responsible for the $5 raise and transaction B (entered by another HR person) is responsible for the 10% raise.
If transaction B reads Mary’s current salary after transaction A reads it, but before transaction A commits the $5 an hour change, transaction B will be using stale, obsolete data as the basis for the update. Not only will Mary lose the 10% increase on the extra $5, she might lose the $5 raise entirely, because transaction B may overwrite transaction A’s changes. This is known as “last one in wins” case and can cause serious problems.
The typical cure for this is a “SELECT FOR UPDATE” SQL statement. This causes pessimistic locking, where the read locks are obtained in advance of updating. Another cure for this is to set up a stored procedure that timestamps data so that it can be checked for staleness the time it’s committed. If the data is stale, the transaction is aborted.
If the database is supporting row level locking and there are a lot of updates, both of these strategies will lead to severe performance problems. This is compounded when users don’t want to or can’t keep their transactions short. Users may not be able to keep their transactions short if they need to ensure atomicity over a complex set of operations.
A simple example is that a bank does not want to transfer money from one account to another in two transactions just to keep them short. If the withdrawal from one account fails, the bank certainly doesn’t want the money deposited anyway. They want both operations (and possibly many more) in the same transaction.
This is where Objectivity/DB containers and pessimistic locking help solve this dilemma. Most applications want to lock more than an individual row at a time, not only for efficiency, but also logically. If you are locking a customer, it makes sense that you’d also lock his accounts. Standard database pagelevel locking doesn’t allow you to do this because pages are generally homogenous according to type.
Objectivity/DB containers, however, give users and applications a clear way to cluster objects together according to how they’re related. An Objectivity/DB user can simply cluster a customer and his accounts in the same container. If there’s a lock conflict, the application learns of it immediately and is able to handle the exception before more work is done.
Some SQL DBMS vendors offer “clustering”, but only in a very limited way and consequently rarely used. For example, Objectivity/DB allows clustering on an objectby-by-object basis, regardless of type. Other clustering features require objects to be of a limited number of specific types. Comparing the documentation on a vendor-by-vendor basis shows other substantial differences.
Not only does Objectivity/DB provide standard read and update locks, it also offers MROW (Multiple Readers, One Writer) mode, which is a form of “stale” reads except far safer and easier than using the optimistic locking approach.
As one may remember from the “Persistence” section, Objectivity/DB updates pages by writing the modified page, changing the page map, then releasing the old page to a free list.
To provide MROW mode, Objectivity/DB simply allows readers to read the old pages as the new ones are being created. By keeping reference counts on the old pages, Objectivity/DB can release them when the last reader has committed or refreshed their view.
This technique allows very high concurrency applications the ability to read existing data without waiting for updaters to finish. Because Objectivity/DB still maintains pessimistic locking and keeps track of the modified pages, the user will get notified immediately if the application tries to update data that has been modified. The user or application can then refresh the data and re-request the update privilege.
This fully prevents the “last one in wins” problem, since transaction B would be able to read Mary’s old salary, but would know immediately if transaction A updated the salary when transaction B attempted to make its change. The effect would be that transaction B would simply refresh the view (getting the updated salary), then apply the 10% increase. The application code is clean, there are no complicated stored procedures, the transaction completes quickly and no operation is lost to a race condition.
The end result is that Objectivity/DB users and applications have tremendous control and confidence in the integrity of their data and the ability to choose the length of their transactions based on what’s most logical and efficient for the application – not what’s least likely to cause errors.
