[Figure 5: Durable Distributed Cache]
New DDC Architecture Offers Comprehensive Solution
By stepping back and rethinking database design from the ground up, Jim Starkey, NuoDB’s technical founder, has come up with an innovative solution that makes very different trade-offs. It’s an entirely new design approach called Durable Distributed Cache (DDC). The net effect is a system that scales out and in dynamically on commodity machines and VMs, has no single point of failure, and delivers full ACID transactional semantics.
Let’s take a look at Starkey’s thought process and the DDC architecture.
Memory-Centric vs. Storage-Centric
The first insight Starkey had was that all general-purpose relational databases to date have been architected around a storage-centric assumption, and that this is a fundamental problem when it comes to scaling out. In effect, database systems have been fancy file systems that arrange for concurrent read/write access to disk-based files such that users do not trample on each other. The DDC architecture inverts that idea, imagining the database as a set of in-memory container objects that can overflow to disk if necessary and can be retained in backing stores for durability purposes. Memory-centric vs. storage-centric may sound like splitting hairs, but it turns out to be really significant. The reasons are best described by example.
Suppose you have a single, logical DDC database running on 50 servers (which is entirely feasible with an ACID transactional DDC-based database), and suppose that at some moment Server 23 needs Object #17. Server 23 might determine that Object #17 is instantiated in memory on seven other servers, and it simply requests the object from the most responsive of them. Because the object was in memory, the operation involves no disk IO: it is a remote memory fetch, which is an order of magnitude faster than going to disk.
You might ask about the case in which Object #17 does not exist in memory elsewhere. In the DDC architecture this is handled by certain servers “faking” that they have all the objects in memory. In practice, of course, they maintain backing stores on disk, SSD, or whatever they choose (in the NuoDB implementation they can use arbitrary key/value stores such as Amazon S3 or Hadoop HDFS). As suppliers of objects, these “backing store servers” behave exactly like the other servers, except that they can’t guarantee the same response times.
So all servers in the DDC architecture can request objects and supply objects. They are peers in that sense (and in all other senses). Some servers have a subset of the objects at any given time, and can therefore only supply a subset of the database to other servers. Other servers have all the objects and can supply any of them, but will be slower to supply objects that are not resident in memory.
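To make that fetch path concrete, here is a minimal sketch in Python. The names (Peer, fetch_object, observed_latency_ms) are invented for this article, not NuoDB’s actual API, and a dictionary lookup stands in for what is really a network round trip:

    from dataclasses import dataclass, field

    @dataclass
    class Peer:
        name: str
        observed_latency_ms: float                    # tracked from past responses
        objects: dict = field(default_factory=dict)   # object_id -> object state

        def can_supply(self, object_id):
            # True for a peer holding the object in memory; a backing-store
            # server answers for everything, since it "fakes" having all objects.
            return object_id in self.objects

        def request(self, object_id):
            # In the real system this is a remote memory fetch over the network.
            return self.objects[object_id]

    def fetch_object(object_id, peers):
        """Ask the most responsive peer that can supply the object."""
        candidates = [p for p in peers if p.can_supply(object_id)]
        best = min(candidates, key=lambda p: p.observed_latency_ms)
        return best.request(object_id)

    # Server 23 finds Object #17 in memory on several peers and picks the fastest.
    peers = [Peer("server-7", 0.4, {17: "state"}),
             Peer("server-12", 1.1, {17: "state"})]
    print(fetch_object(17, peers))                    # supplied by server-7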
Let’s call the servers with a subset of the objects Transaction Engines (TEs), and the other servers Storage Managers (SMs). TEs are pure in-memory servers that do not use disks at all. They are autonomous and can unilaterally load and eject objects from memory according to their needs. Unlike TEs, SMs can’t just drop objects on the floor when they are finished with them; instead they must ensure that the objects are safely placed in durable storage.
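As a rough illustration, and with the caveat that these classes are invented for this article rather than taken from NuoDB’s code, the difference between the two roles boils down to their eviction rules:

    class TransactionEngine:
        def __init__(self):
            self.cache = {}                      # pure in-memory object store

        def evict(self, object_id):
            # A TE can simply drop the object; durability lives elsewhere.
            self.cache.pop(object_id, None)

    class StorageManager(TransactionEngine):
        def __init__(self, backing_store):
            super().__init__()
            self.backing_store = backing_store   # disk, SSD, S3, HDFS, ...

        def evict(self, object_id):
            # An SM must place the object safely in durable storage before
            # it is allowed to drop the in-memory copy.
            if object_id in self.cache:
                self.backing_store[object_id] = self.cache[object_id]
            super().evict(object_id)

Modeling the SM as a subclass of the TE mirrors the point above: an SM is simply a TE with an added durability obligation.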
For readers familiar with caching architectures you might have already recognized that these TEs are in effect a distributed DRAM cache, and the SMs are specialized TEs that ensure durability. Hence the name Durable Distributed Cache.
Resilience to Failure
It turns out that any object state present on a TE is either already committed to disk (i.e. safe on one or more SMs), or part of an uncommitted transaction that will simply fail at the application level if the object goes away. This gives the database the interesting property of being resilient to the loss of TEs. You can shut a TE down, or just unplug it, and the system does not lose data. It will lose throughput capacity, of course, and any partial transactions on the TE will be reported to the application as failed transactions. But transactional applications are designed to handle transaction failure: if you reissue the transaction at the application level, it will be assigned to a different TE and will proceed to completion.
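The retry pattern this relies on is the standard one transactional applications already use. A hedged sketch, with an invented TransientTransactionError standing in for whatever error the client driver actually reports when a TE is lost:

    class TransientTransactionError(Exception):
        """Stand-in for the failure reported when a TE dies mid-transaction."""

    def run_with_retry(pool, do_work, max_attempts=3):
        for attempt in range(max_attempts):
            conn = pool.get_connection()   # a live TE is assigned
            try:
                do_work(conn)              # the transaction's reads and writes
                conn.commit()
                return
            except TransientTransactionError:
                conn.rollback()            # uncommitted state on the lost TE vanishes
                # The next attempt is simply routed to a different, healthy TE.
        raise RuntimeError(f"transaction failed after {max_attempts} attempts")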
What about SMs? Recall that you can have as many SMs as you like. They are effectively just TEs that secretly stash the objects away in some durable store. And unless you configure it otherwise, each SM might as well store all the objects. Disks are cheap, which means you can have as many redundant copies of the whole database as you want. In consequence the DDC architecture is resilient not only to the loss of TEs, but also to the loss of SMs.
In fact, as long as you have at least one TE and one SM running, you still have a running database. Resilience to failure is one of the longstanding but unfulfilled promises of distributed transactional databases. The DDC architecture addresses this directly.
Elastic Scale-Out and Scale-In
What happens if I add a server to a DDC database? Think of the TE layer as a cache. If the new TE is given work to do, it will start asking for objects and doing the assigned work. It will also start serving objects to other TEs that need them. In fact the new TE is a true peer of the other TEs. Furthermore, if you were to shut down all of the other TEs, the database would still be running, and the new TE would be the only server doing transactional work.
Storage Managers, being specialized Transaction Engines, can also be added and shut down dynamically. If you add an “empty” (or stale) SM to a running database, it will cycle through the list of objects and load them into its durable store, fetching each one from the most responsive place, as usual. Once it has all the objects it will raise its hand and take part as a peer to the other SMs. And, just as with the new TE described above, you can delete all the other SMs once you have added the new one. The system will keep running without missing a beat or losing any data.
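A sketch of that bootstrap sequence, reusing the hypothetical fetch_object helper from the earlier sketch (again, invented names rather than NuoDB’s real interfaces):

    def bootstrap_sm(new_sm, database):
        # Cycle through every object, pulling each one from the most
        # responsive supplier, exactly as any other fetch would.
        for object_id in database.all_object_ids():
            if object_id not in new_sm.backing_store:
                new_sm.backing_store[object_id] = fetch_object(object_id, database.peers)
        # Only now does the SM "raise its hand" and serve objects as a full peer.
        database.register_peer(new_sm)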
So the bottom line is that the DDC architecture enables capacity on demand. You can elastically scale out the number of TEs and SMs, and scale them back in again, according to workload requirements. Capacity on demand is the second promise of distributed databases that is finally a reality with the DDC architecture.
The astute reader will no doubt be wondering about the hardest part of this distributed database problem, namely that we are talking about distributed transactional databases. Transactions, specifically ACID transactions, are an enormously simplifying abstraction that allows application programmers to build their applications with very clean, high-level and well-defined data guarantees. If I store my data in an ACID transactional database, I know that it will isolate my program from other programs, maintain data consistency, avoid partial failure of state changes and guarantee that stored data will still be there at a later date, irrespective of external factors. Application programs are vastly simpler when they can trust an ACID compliant database to look after their data, whatever the weather.
The DDC architecture adopts a model of append-only updates. Traditionally an update to a record in a database overwrites that record, and a deletion of a record removes the record. That may sound obvious, but there is another way, invented by Jim Starkey about 25 years ago. The idea is to create and maintain versions of everything. In this model you never do a destructive update or destructive delete. You only ever create new versions of records, and in the case of a delete, the new version is a record version that notes the record is no longer extant. This model is called MVCC (multi-version concurrency control), and it has a number of well-known benefits even in scale-up databases. MVCC has even greater benefits in distributed database architectures, including DDC.
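To ground the idea, here is a much-simplified, single-node sketch of append-only versioning. Real MVCC tracks commit status and visibility far more carefully; this toy assumes transaction IDs are assigned in commit order:

    class VersionedRecord:
        def __init__(self):
            self.versions = []                       # (txn_id, value); None marks a delete

        def write(self, txn_id, value):
            self.versions.append((txn_id, value))    # never overwrite in place

        def delete(self, txn_id):
            self.versions.append((txn_id, None))     # a tombstone, not a removal

        def read(self, snapshot_txn_id):
            # Return the newest version visible to this reader's snapshot. A
            # concurrent writer appending a newer version never blocks this read.
            for txn_id, value in reversed(self.versions):
                if txn_id <= snapshot_txn_id:
                    return value
            return None

    record = VersionedRecord()
    record.write(txn_id=1, value="v1")
    record.write(txn_id=5, value="v2")
    print(record.read(snapshot_txn_id=3))            # "v1": the later write is invisible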
There isn’t space here to cover MVCC in further detail, but it is worth noting that one of the things it does is allow a DBMS to manage read/write concurrency without the use of traditional locks: readers don’t block writers, and writers don’t block readers. It also has some exotic features, including the ability to maintain a full history of the entire database. But as it relates to DDC and the distributed transactional database challenge, there is something very neat about MVCC. DDC leverages a distributed variety of MVCC, in concert with its distributed object semantics, to allow almost all inter-server communication to be asynchronous.
The implications of DDC being asynchronous are far-reaching. In general it allows much higher utilization of system resources (cores, networks, disks, etc.) than synchronous models can. But specifically it allows the system to be fairly insensitive to network latencies, and to the location of the servers relative to each other. Or to put it a different way, it means you can start up your next TE or SM in a remote datacenter and connect it to the running database. Or you can start up half of the database servers in your datacenter and the other half on a public cloud.
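As a toy illustration of why asynchrony blunts the cost of latency, consider broadcasting an update with Python’s asyncio. The sender fires off all messages at once, so a slow WAN link delays only that one delivery, not the sender’s other work (the peers, message, and delays here are invented for the example):

    import asyncio

    async def send_update(peer_name, update, latency_s):
        await asyncio.sleep(latency_s)           # stands in for a slow network link
        print(f"{peer_name} applied {update}")

    async def broadcast(update, peers):
        # Send to every peer without waiting on each round trip in turn.
        tasks = [asyncio.create_task(send_update(name, update, lat))
                 for name, lat in peers]
        await asyncio.gather(*tasks)             # reconcile acknowledgements later

    asyncio.run(broadcast("Object #17, version 3",
                          [("local-dc", 0.001), ("remote-dc", 0.080)]))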
Modern applications are distributed. Users of a particular website are usually spread across the globe. Mobile applications are geo-distributed by nature. Internet of Things (IoT) applications are connecting gazillions of consumer devices that could be anywhere at any time. None of these applications are well served by a single big database server in a single location, or even a cluster of smaller database servers in a single location. What they need is a single database running on a group of database servers in multiple datacenters (or cloud regions). That can give them higher performance, datacenter failover and the potential to manage issues of data privacy and sovereignty.
The third historical promise of distributed transactional database systems is geo-distribution. Along with the other major promises (resilience to failure and elastic scalability), geo-distribution has heretofore been an unattainable dream. The DDC architecture with its memory-centric distributed object model and its asynchronous inter-server protocols finally delivers on this capability.