
A More Detailed Introduction to NuoDB's Cache

As Seth mentioned in his post, NuoDB’s Transaction Engines implement a form of caching. He gave a high-level overview of what’s going on, and compared NuoDB’s techniques with several other database cache-like approaches. In this post, I’m going to elaborate on Seth’s points from the point of view of an implementer firmly entrenched in the details.

What the Cache Is, Technically:

A NuoDB domain is a collection of processes: one or more Transaction Engines, and one or more Storage Managers. The Storage Managers are all about the 'D' in ACID. The Transaction Engines get to worry about 'A', 'C' and 'I'. At an abstract level, a Transaction Engine is a SQL front end (query parser, compiler, optimizer and executor) sitting on top of an in-memory cache of distributed data objects. Between the two sits a small abstraction layer that handles mapping things from front-end-land (e.g. 'Table foo') to things in distributed-object-land (e.g. object 14). It's important to understand how different these two pieces are; I often say that NuoDB is a transactional, consistent distributed object system that happens to understand SQL. The front end lives and breathes SQL; the back end lives and breathes distributed objects (for which the NuoDB-ism is 'atoms').
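As a rough illustration of that small abstraction layer, here is a minimal sketch in Python. The class, method, and id assignment are entirely invented for illustration; NuoDB exposes no such API. The point is just that the front end speaks names and the back end speaks object ids.

```python
# A toy 'abstraction layer' between SQL-land and atom-land. Hypothetical
# names throughout; this is not NuoDB's actual catalog.

class Catalog:
    """Maps front-end things ('Table foo') to distributed-object ids."""

    def __init__(self):
        self._ids = {}        # name -> atom id
        self._next_id = 0

    def atom_id_for(self, name: str) -> int:
        # Assign a fresh atom id the first time a name is seen.
        if name not in self._ids:
            self._ids[name] = self._next_id
            self._next_id += 1
        return self._ids[name]

catalog = Catalog()
print(catalog.atom_id_for("Table foo"))   # some small integer, e.g. 0
```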

Each atom is distinct and independent. While it may take several atoms to execute a query, each atom is updated and read independently. Those of us implementing this part of the system take great pains to make sure that atoms never have to co-ordinate with one another. The reason is simple: if atoms don't have to co-ordinate, then they can be reasoned about separately. They can be updated independently, process replication messages independently, and be managed by the caching system independently.

An atom can reside on zero or more Transaction Engines and zero or more Storage Managers. By reside, I mean it is present in memory. Every atom has an on-disk presence in the Storage Managers, but if the atom isn't being used for anything, it might not be in any node's memory. Note that this is the only time an atom is read from disk: when absolutely no NuoDB process has a copy in memory. When a Transaction Engine (TE) needs an atom to process a query, it fetches a copy from its nearest neighbor and starts using it. When the TE needs to free up some memory, the cache management process starts up (sometimes referred to as 'Atom Garbage Collection') and evicts 'cold' atoms. That is, it drops atoms that are deemed no longer necessary (specifically, by using a variation of Least-Recently-Used (LRU)). That's it: as long as an atom isn't being actively used by a transaction executing on the TE, it is eligible for dropping. And if the application's behavior changes and the atom is needed again, a new copy will be fetched from the nearest neighbor.

Those readers who've either used or implemented caches in the past probably have some questions, and I'll do my best to address them here.
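To make the eviction idea concrete, here is a minimal sketch of an LRU-ordered atom cache with 'Atom GC'. Everything here (the class, the pin counts standing in for "actively used by a transaction", the capacity model) is an assumption made for illustration, not NuoDB's implementation.

```python
# A sketch of 'Atom Garbage Collection': evict cold (least-recently-used)
# atom copies, but never a copy pinned by a transaction that's using it.

from collections import OrderedDict

class AtomCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._atoms = OrderedDict()   # atom_id -> [data, pin_count]

    def get(self, atom_id, fetch_from_peer):
        if atom_id not in self._atoms:
            self._evict_cold()        # make room before caching a new copy
            self._atoms[atom_id] = [fetch_from_peer(atom_id), 0]
        self._atoms.move_to_end(atom_id)  # mark as most recently used
        return self._atoms[atom_id][0]

    def pin(self, atom_id):           # actively in use; not evictable
        self._atoms[atom_id][1] += 1

    def unpin(self, atom_id):         # eligible for dropping again
        self._atoms[atom_id][1] -= 1

    def _evict_cold(self):
        # Walk from coldest to hottest, dropping unpinned copies as needed.
        for atom_id in list(self._atoms):
            if len(self._atoms) < self.capacity:
                break
            if self._atoms[atom_id][1] == 0:
                del self._atoms[atom_id]  # drop; a peer can resupply later
```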

Question 1: 'What Is Being Cached, Really?' In a word, atoms. More specifically, instances of atoms. The cache is really a cache of distributed objects, not a more traditional DB cache. In a 'classic' cache, what's being cached is often 'pages' or 'blocks' of memory: the units of organization into which the database has chunked its disk-based data structures. NuoDB's atoms are implemented to be fast and convenient in memory, and so are not constrained by disk layout considerations. Another approach taken by 'classic' caches is to cache results extracted from such a 'page' (often in something that looks like a key/value store). This means a given datum has two representations: the representation in memory on the actual database machine and the representation in the cache. That makes issues like invalidation and cache coherence very complicated (some systems force application developers to handle them themselves). NuoDB caches only the atoms themselves, and each atom will be at most 50-60K in size (many are considerably smaller).

Question 2: 'What if the Database/Table Doesn't Fit in Memory?' Each Transaction Engine only needs the subset of atoms that represents the current working set for all outstanding queries. Consider a table scan: even if the table is ginormous (say 1000x memory size), at any given moment the scan only needs the current and next row in memory to perform a single iteration. In NuoDB that is at most 3 atoms (4 for an index scan), which comes to approximately 150K. Additionally, atoms are structured for locality of reference, so the cost of fetching an atom copy for a scan is amortized over the next few thousand iterations. So, as the query scans the table, it pulls in the atoms it needs when it needs them. As memory pressure increases, the Atom GC process will kick in and evict the least recently used atom copies, but since the scan has already processed those atoms, their eviction won't cause any hiccoughs in the scan's forward progress. In short, NuoDB is designed so that a query needs only a few atoms to process an increment of work; therefore a database can be much larger than any single TE's memory, and yet any TE can process queries against it.
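Here is a hedged sketch of that scan pattern, reusing the AtomCache sketch from above. The helper names and the rows-per-atom figure are invented; the point is that one fetch is amortized over many row iterations, and already-scanned atoms are immediately evictable behind the scan.

```python
# A toy scan over a table far larger than memory, built on the AtomCache
# sketch above. Each atom is fetched on demand, pinned only while its rows
# are being read, and eligible for eviction as soon as the scan moves on.

def scan(table_atom_ids, cache, fetch_from_peer):
    for atom_id in table_atom_ids:
        rows = cache.get(atom_id, fetch_from_peer)  # one fetch per atom
        cache.pin(atom_id)        # in active use by this transaction
        try:
            for row in rows:      # amortized over the atom's many rows
                yield row
        finally:
            cache.unpin(atom_id)  # now a candidate for Atom GC

# A 'table' of 100 atoms streams through a 4-atom cache without fitting.
def fetch(atom_id):
    return [f"row-{atom_id}-{r}" for r in range(1000)]

total = sum(1 for _ in scan(range(100), AtomCache(capacity=4), fetch))
print(total)  # 100000 rows scanned through a tiny cache
```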

Question 3: 'What about Cache Coherence/Consistency?' For those who have not made a deep study of caches, cache coherence is the term for the consistency mechanisms used to keep multiple independent caches in sync with the backing store. In hardware, it's the set of mechanisms ensuring that if one processor writes a value to main memory, other processors with their own caches will eventually see that updated value. For database caches, coherence boils down to how to implement database consistency across a set of independent caches. This is a huge problem, and very difficult to solve (especially if one wants the caches to be as high-performing as possible). NuoDB side-steps this whole issue by delegating consistency concerns to the atoms themselves. Remember that atoms are designed to be independent from one another. What this means is that each atom keeps itself consistent with its peers (remote copies) on other nodes in the presence of local and remote updates. A dropped atom copy doesn't have to worry about consistency, so the caching layer just has to make sure it isn't trying to drop something that's in the middle of being used. All the consistency is handled by the individual atoms themselves.
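What does "each atom keeps itself consistent with its peers" look like? Here is a deliberately naive sketch, with invented names and none of the real subtleties (ordering, conflicts, failures): each copy of an atom replicates its local changes directly to the remote copies of that same atom, and nothing else.

```python
# Per-atom consistency in miniature: copies of one atom replicate changes
# among themselves, so the caching layer never coordinates across atoms.

class AtomCopy:
    def __init__(self, atom_id):
        self.atom_id = atom_id
        self.state = {}
        self.peers = []   # remote copies of this same atom on other nodes

    def local_update(self, key, value):
        self.state[key] = value
        for peer in self.peers:           # replicate to every remote copy
            peer.apply_remote(key, value)

    def apply_remote(self, key, value):   # a replication message arrived
        self.state[key] = value

# Two copies of 'atom 14' on different nodes stay in sync on their own:
a, b = AtomCopy(14), AtomCopy(14)
a.peers.append(b); b.peers.append(a)
a.local_update("row7", "hello")
assert b.state["row7"] == "hello"
```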

Question 4: 'Is the Cache Write-Through or Write-Back?' This is another cache-specific question, and it relates to how changes made against the cache are propagated to the 'backing store'. In a classical database cache, this is a question about how a change made to a cache on one machine eventually trickles back to the main database itself. With a classical system, this question is often deeply tied to that of coherence, as changes (or information about them) may have to be propagated between caches as well as back to the main database. Many systems bypass this issue by making all the caches read-only. However, this isn't a complete solution, as changes really do need to be propagated to caches, so some kind of timeout or invalidation mechanism must exist even in a read-only cache. NuoDB side-steps this issue as well, by removing the concept of a 'backing database'. Remember that all atoms are responsible for maintaining consistency with their peers. Remember, too, that Storage Managers are responsible for durability, and for the durability of atoms alone. Therefore, every change made to any Transaction Engine's atom copy will be replicated to its peers (and rapidly, too). As long as Storage Managers are logical peers of all atoms, they will receive all such changes and will be able to keep fully updated atom copies (which they are constantly writing out in the background). NuoDB doesn't have 'authoritative' or 'master' copies of atoms. Every atom copy is as good as any other, which means there is no 'backing store' to worry about. Every TE's local copy of an atom is good enough to execute transactions against, and we just rely on the built-in atom-by-atom replication/consistency mechanisms to keep all the copies in sync in a transactionally consistent way. NuoDB is concerned with durability, but that is achieved by having an atom copy in a Storage Manager that can receive the changes and then write them out to disk in whatever way is most convenient for the Storage Manager.
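Continuing the AtomCopy sketch above: if a Storage Manager's copy is just another peer, durability falls out of the very same replication path, with no 'backing store' anywhere in the picture. The journal below is a stand-in for the SM's background write-out; the subclass and its mechanics are my illustration, not NuoDB's code.

```python
# A Storage Manager's atom copy as an ordinary replication peer that also
# persists what it receives, in whatever format suits the SM.

class StorageManagerCopy(AtomCopy):
    def __init__(self, atom_id, journal):
        super().__init__(atom_id)
        self.journal = journal            # stand-in for durable storage

    def apply_remote(self, key, value):
        super().apply_remote(key, value)  # stay in sync like any peer
        self.journal.append((self.atom_id, key, value))  # then persist

journal = []
te_copy, sm_copy = AtomCopy(14), StorageManagerCopy(14, journal)
te_copy.peers.append(sm_copy)
te_copy.local_update("row7", "hello")     # the TE's change reaches disk
assert journal == [(14, "row7", "hello")]
```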

Question 5: 'Do Storage Managers Need a Copy of Every Atom in Memory?' The previous question mentioned that Storage Managers are 'logical peers' of every atom in the system. This doesn't mean that they necessarily have a full copy of every atom in the system. If a Transaction Engine is a SQL processor sitting on top of an atom cache, then a Storage Manager is an atom cache sitting on top of an atom serializer. Storage Managers have finite memory, and it needs to be managed with as much care as a Transaction Engine's. Therefore, a cache management policy identical to the Transaction Engine's is used to maximize the use of the SM's memory. However, atom copies can't be unilaterally purged from memory if there's a Transaction Engine out there making changes to them, and yet there's no sense in making every SM keep an in-memory copy of every atom ever. The solution is: instead of dropping completely out of memory, the SM retains a tiny stub a few bytes in size that can be re-inflated on demand if that atom becomes 'hot' again. Note that re-inflating doesn't necessarily mean restoring from durable storage. If another SM or TE has a copy, it may be faster to just copy that instead. Only when all copies on all TEs have been dropped can the SM safely drop its in-memory copy of the atom entirely.
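One way to picture the stub trick, again as an assumed sketch rather than real code: deflating keeps only the atom's identity in a few bytes, and inflating prefers copying from a live peer over reading from durable storage.

```python
# Deflate an SM's atom copy to a stub under memory pressure; re-inflate it
# on demand, from a peer's copy if one exists, else from disk.

class AtomSlot:
    def __init__(self, atom_id):
        self.atom_id = atom_id
        self.data = None                  # None means 'just a stub'

    def deflate(self):
        self.data = None                  # retain only the identity

    def inflate(self, find_peer_copy, read_from_disk):
        if self.data is None:             # re-inflate a cold stub
            peer_data = find_peer_copy(self.atom_id)
            # Prefer a live peer's copy; durable storage is the fallback.
            if peer_data is not None:
                self.data = peer_data
            else:
                self.data = read_from_disk(self.atom_id)
        return self.data

slot = AtomSlot(14)
slot.deflate()
print(slot.inflate(lambda i: None, lambda i: f"atom {i} read from disk"))
```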

Is NuoDB’s Cache a Traditional Cache? I hope that I’ve convinced you that NuoDB’s caching mechanisms have all the benefits of traditional database memory caches without the usual drawbacks. In fact, application developers don’t even have to know that under the hood of their Transaction Engine sits a fancy distributed object cache. They don’t have to modify their application one bit to take advantage of it. I’ll summarize some of the more common caching drawbacks and call out how NuoDB avoids them.

  1. No Manual/Application-level Coherence and Invalidation - NuoDB exposes a normal SQL interface to clients. All the caching logic is divorced from atom consistency logic, so the complications of coherent caching are avoided and no client code has to be harmed to support distributed caching.
  2. No Partition Specification/Manual Partitioning - Some distributed caches side-step consistency concerns by requiring that the data be partitioned so that each separate partition fits in a single memory cache. NuoDB doesn’t require the programmer to partition anything. The atoms are in memory where they are being used. Each atom is responsible for making sure that it’s consistent with its peers, so each TE and SM just has to worry about operating against its in-memory atoms. Partitioning in NuoDB is implicit and dynamic. If one TE uses a different subset of the data from another, it will have a different collection of atoms, with no synchronization or replication required (see the sketch after this list). And if some data becomes ‘hot’ everywhere, then the atoms representing that data will be on multiple TEs. No sweat, we handle that case too. What this means is that as the application changes over time, NuoDB changes in step. And if you add or remove nodes, the system handles that dynamically and on the fly. No rebooting, no prolonged ‘sync’ or ‘repartitioning’ step.
  3. Database doesn’t have to be in memory - Some caching systems require that some node, somewhere, have a copy of all the database’s data in its memory. This means that the size of the database is limited to the sum total of all memory in the cluster of nodes. NuoDB has no such restriction. NuoDB will take advantage of all the memory you give it, but it requires only enough memory to load the few atoms each running query needs at any given moment. Storage Managers make all data durable, so Transaction Engines can drop atoms with impunity and respond to changes in application behavior immediately. This has two important consequences. First, a NuoDB database is not limited to the amount of memory available in the domain. Second, and because of the first, the number of active nodes for a database can easily grow and shrink to meet client demand. So if you add 10 nodes on a busy day and your database doubles in size, that doesn’t mean you can’t shrink back down to 2 nodes because the database no longer fits in two nodes’ worth of memory. NuoDB imposes no restrictions on adding or removing Transaction Engines from a running database.
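To illustrate item 2's implicit, dynamic partitioning, here is a toy run on the AtomCache sketch from earlier. The two workloads are invented; the point is that each TE's cache naturally ends up holding just the atoms its own queries touch, with a 'hot' atom present on both.

```python
# Two TEs with different workloads partition the atoms implicitly: each
# cache holds what it uses, and the shared hot atom (99) lives on both.

def fetch(atom_id):
    return f"atom-{atom_id}"              # stand-in for a peer-to-peer fetch

te1, te2 = AtomCache(capacity=8), AtomCache(capacity=8)
for atom_id in (1, 2, 3, 99):             # workload running on TE 1
    te1.get(atom_id, fetch)
for atom_id in (7, 8, 9, 99):             # different workload on TE 2
    te2.get(atom_id, fetch)

print(sorted(te1._atoms))                  # [1, 2, 3, 99]
print(sorted(te2._atoms))                  # [7, 8, 9, 99]
```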

Well, that’s all I was planning on discussing about how NuoDB’s caching works. I’ll be turning over the discussion to Adam, who’ll be diving into specific differences between NuoDB and other systems. Adam, tag, you’re it!
