The Role of Smart Caching in High Performance, Scalable Database Systems
One of the mantras of most mainstream Database Management Systems (DBMSs) is that the physical organization of the data should be hidden under the hood. The argument is that the user should not need to know anything about the underlying mechanics in order to store and retrieve data. This is a worthy goal, but in reality the task of tuning that physical organization is often delegated to a trained Database Administrator.
When we architected ThingSpan, powered by Objectivity/DB, we took a different approach. We gave the application engineer the power to decide how to best cache, cluster and distribute data. However, once placed, ThingSpan presents a “Single Logical View” of the data. The kernel works out where data is stored and communicates with remote data servers if it isn’t on the client machine. Part 1 of this blog series describes the advantages of this approach.
The Logical and Physical Environments
The key challenge in distributed database environments is making disparate physical resources look like a single logical environment to clients. This can be done in several ways: hiding the databases behind a single server interface; using a federation layer that routes requests to multiple database servers, sometimes in parallel; or making the physical resources appear to be part of a single address space. Objectivity uses the last model, termed a “Single Logical View”. We will return to this topic later.
Figure 1 - The Single Logical View
One of the primary goals of any database system is to make all of the data that an application requires appear to be in memory whenever needed. This is difficult because of the physics associated with standard storage devices.
A random memory access takes about 50 to 100 nanoseconds, whereas a random magnetic disk access takes around 4 milliseconds. A Solid State Disk (SSD) can be 4 to 10 times faster than a conventional HDD. Even so, 1 millisecond is still 10,000 times slower than 100 nanoseconds.
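The gap is easy to quantify with a back-of-the-envelope calculation. This sketch uses the round figures above (illustrative orders of magnitude, not measured benchmarks):

```python
# Order-of-magnitude latencies from the figures above (illustrative, not measured)
ram_ns = 100          # random RAM access: ~100 ns
hdd_ns = 4_000_000    # random HDD access: ~4 ms
ssd_ns = 1_000_000    # best case for an SSD ~4x faster than that HDD: ~1 ms

print(hdd_ns // ram_ns)   # one HDD miss costs ~40,000 RAM accesses
print(ssd_ns // ram_ns)   # even an SSD miss costs ~10,000
```

Every physical read avoided by a cache hit therefore saves thousands of RAM-access times.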
Physical Input/Output operations (I/Os) are the enemy of all database systems. RAM is still much more expensive than magnetic or optical storage (the latter being the slowest technology for random access). However, using as much RAM as possible to keep data available after it has been created and “persisted” to disk, or after it has been read from storage, can considerably speed up some operations. It can also slow things down, as I’ll demonstrate.
The area where the data is stored in memory is called a cache. It may be created in standard process/thread workspace or be a part of the hardware. It may be basic, hierarchical, segmented by object size or usage type, or a combination of those techniques.
A basic cache is used as an intermediary storage area between the application and the persistent storage. It may be a single, contiguous area or a dynamically managed group of smaller areas.
Figure 2 - Basic Cache
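A basic read-through cache of this kind can be sketched in a few lines. This is a hypothetical illustration, not ThingSpan’s implementation; `backing_read` stands in for whatever performs the physical I/O:

```python
class BasicCache:
    """Minimal read-through cache sitting between the application and storage."""

    def __init__(self, backing_read):
        self._read = backing_read   # callable that performs a physical I/O
        self._pages = {}            # page id -> cached page data

    def get(self, page_id):
        if page_id not in self._pages:             # miss: pay for one I/O
            self._pages[page_id] = self._read(page_id)
        return self._pages[page_id]                # hit: served from memory


# Usage: count how many physical reads three accesses actually cause
reads = []
cache = BasicCache(lambda pid: reads.append(pid) or f"page-{pid}")
cache.get(7)
cache.get(7)
cache.get(7)
print(len(reads))   # only the first access touched storage
```

Repeated accesses to the same page cost one physical I/O rather than three.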
A hierarchical cache exploits devices with varying performance characteristics and price points. Some of the cache may be located within the storage device or infrastructure. However, separate cache managers that aren’t aware of the overall system performance and throughput goals can cause problems. During initial benchmarking of ThingSpan on a ZFS file system infrastructure, throughput was lower than we expected. Log analysis showed that the large caches in the I/O subsystem were queuing up data that was being rapidly streamed through the small ThingSpan caches. When the application committed the transaction, there was a significant delay while the I/O system wrote the queued data to disk. Making the storage system caches smaller increased overall system throughput.
Figure 3 - Hierarchical Cache
Segmented Cache Based on Object Characteristics
Small objects may be densely packed within large chunks of pre-allocated memory, whereas large objects may each be allocated their own dynamically managed segment of memory. Structures such as B-Tree indices may have the top layers of the tree locked in memory while other nodes are managed with LRU or MFU algorithms (described below). Hash table buckets may be handled in a similar way.
Figure 4 - Segmented Cache Based on Object Size or Type
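The routing logic can be sketched as follows. This is illustrative only: the size threshold and the use of plain dictionaries are assumptions, whereas a real kernel would pack small objects into slabs and give large objects their own allocations:

```python
LARGE_THRESHOLD = 4096   # hypothetical cutoff, in bytes


class SegmentedCache:
    """Routes objects to different cache segments by size; pins hot structures."""

    def __init__(self):
        self.small = {}    # densely packed region for many small objects
        self.large = {}    # individually allocated segments for large objects
        self.pinned = {}   # e.g. top layers of a B-Tree, never evicted

    def put(self, key, data, pin=False):
        if pin:
            self.pinned[key] = data
        elif len(data) >= LARGE_THRESHOLD:
            self.large[key] = data
        else:
            self.small[key] = data


cache = SegmentedCache()
cache.put("row:1", b"x" * 64)                 # small: goes to the packed region
cache.put("blob:1", b"x" * 8192)              # large: gets its own segment
cache.put("btree:root", b"x" * 256, pin=True) # index root: locked in memory
```

Eviction policies can then run per segment, so a stream of large blobs cannot flush out the densely packed small objects or the pinned index layers.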
Segmented Cache Based on Data Usage
It often makes sense to segment the cache based on the way that the data is primarily used. Frequently accessed lookup tables may be given a dedicated area. A write-through cache can be small, buffering just enough incoming data to fill the next free space that the disk’s rotation brings under the head.
A Least Recently Used (LRU) algorithm is useful for data that is being randomly accessed, but in logically related groups, such as customer and order details. A Most Frequently Used (MFU) algorithm is handy for preventing data that is very likely to be used periodically during a process from being discarded or written out to disk too soon. Tuning different cache management strategies can significantly impact the throughput and performance of a system.
Figure 5 - Segmented Cache Based on Data Usage
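An LRU policy is simple to sketch. This minimal illustration uses Python’s `OrderedDict`, whose insertion order doubles as recency order; a production cache would add locking and byte-based sizing:

```python
from collections import OrderedDict


class LRUCache:
    """Discards the entry that has gone unused for the longest time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)              # touched: now most recent
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)       # evict least recently used


# Usage: logically related customer/order data, capacity of two entries
cache = LRUCache(2)
cache.put("customer:1", "Ada")
cache.put("order:9", "widgets")
cache.get("customer:1")            # touch customer:1, making order:9 oldest
cache.put("order:10", "gears")     # evicts order:9
print(cache.get("order:9"))        # the evicted entry is gone
```

An MFU-style policy would instead track access counts and protect the most frequently used entries from eviction, at the cost of adapting more slowly when the hot set changes.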
It is very hard for a centralized server cache to juggle the conflicting demands of a mix of different applications. A distributed system such as Objectivity’s ThingSpan, however, can have each client’s cache behave optimally for its own usage. Objectivity/DB, the database component of ThingSpan, exploits Operating System, local, and remote hardware caches (hierarchically), along with separate caches for small objects, large objects, and system structures. It primarily uses LRU discard algorithms, but behavior can be tuned by setting different cache sizes for each processing thread. The application designer can also set the initial, incremental growth, and maximum cache sizes. There is no point in allocating more cache than the amount of RAM available, or the Operating System will have to move memory pages to and from swap space.
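The effect of those three settings can be sketched as a simple growth policy. The function name and parameters below are hypothetical, chosen to mirror the initial, incremental, and maximum sizes described above; they are not the actual Objectivity/DB API:

```python
def next_cache_size(current, needed, increment, maximum):
    """Grow a cache in fixed increments, never past its configured maximum."""
    size = current
    while size < needed and size < maximum:
        size = min(size + increment, maximum)    # grow one increment at a time
    return size


# Sizes in pages: start at 100, grow by 50, cap at 400
print(next_cache_size(100, 180, 50, 400))    # two increments cover the need
print(next_cache_size(100, 1000, 50, 400))   # demand beyond the cap is refused
```

Capping growth at the maximum is what keeps the cache from pushing the Operating System into swapping.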
Because data can be cached locally, retaining it across transactions makes sense in many applications. For instance, the content of a map or a factory layout isn’t likely to change between transactions. ThingSpan therefore uses a Lock Server process to indicate whether a particular database, or a sub-section of one (a container), has been updated since the data was read and cached. If not, the new transaction can use the cached data; otherwise it has to re-read and re-cache it. This is handled automatically within the kernel and is an important part of the ThingSpan “smart caching” strategy.
Figure 6 - Cross-Transaction Caching
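The validation step can be sketched as a version check per container. This is a hedged illustration: `version_of` stands in for asking the Lock Server whether a container changed, and `read_container` for the physical read; the real ThingSpan kernel does all of this transparently:

```python
class CrossTransactionCache:
    """Reuses cached container data across transactions while it stays valid."""

    def __init__(self, read_container, version_of):
        self._read = read_container     # performs the physical read
        self._version_of = version_of   # asks the "Lock Server" for a version
        self._cached = {}               # container id -> (version, data)

    def open_for_read(self, container_id):
        current = self._version_of(container_id)
        entry = self._cached.get(container_id)
        if entry is not None and entry[0] == current:
            return entry[1]                        # unchanged: reuse the cache
        data = self._read(container_id)            # stale or absent: re-read
        self._cached[container_id] = (current, data)
        return data


# Usage: three transactions, with an update between the second and third
versions = {"map": 1}
reads = []
cache = CrossTransactionCache(
    lambda cid: reads.append(cid) or f"{cid}-data", versions.get)

cache.open_for_read("map")   # transaction 1: physical read
cache.open_for_read("map")   # transaction 2: version unchanged, cache hit
versions["map"] = 2          # another client updates the container
cache.open_for_read("map")   # transaction 3: stale, so re-read
print(len(reads))            # only two of the three transactions hit storage
```

The stable map data costs one read however many transactions use it, while updates are still picked up on the next transaction.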
In this article we examined the need to reduce the number of I/Os required to perform database operations, and then discussed smart caching. In the next article in this series we’ll look at data clustering.