How NuoDB works
Arthur C. Clarke, the distinguished science fiction author, once said: “Any sufficiently advanced technology is indistinguishable from magic”. And a SQL database that can maintain ACID-compliant transactions, active-active, across a geo-distributed network sounds a bit like magic. But in reality it is a radically innovative solution to a major problem of database technologies.
For a deeper look under the hood of NuoDB, check out our Technical White Paper
NuoDB multi-layer architecture
The NuoDB architecture is split into three layers: an administrative tier, a transactional tier and a storage tier. The transactional and storage tiers are scaled separately and handle failure independently. Because of this, transactional throughput can be increased with no impact on where or how data is stored. Similarly, data can be stored in multiple locations with no effect on the application model.
This entirely new design, the Durable Distributed Cache (DDC), scales out and in dynamically, on commodity machines and VMs and has no single point of failure. Uniquely, it does all this while delivering full ACID transactional semantics.
Figure 2: Multi-level architecture
In a DDC, Transaction Engines (TE) manage client transactions. For example, a TE needs the record for Customer #17; it determines that the data for Customer #17 is cached on a number of other servers. It gets the data from the most responsive server, as a remote memory fetch. But TEs can only share data they already have cached; if a request cannot be served from cache, a Storage Manager (SM) will serve the data. SMs don’t support transactional connections to clients; they only manage the persistence of data and serving that up for the TEs.
All servers in the DDC architecture can request objects and supply objects. Any one TE has only a subset of all the objects in the database cached at any time, and can therefore only supply a subset of the database to other TEs. SMs, collectively, have all the objects and can supply any of them, but will be slower to supply objects that are not already in memory.
Resilience to Failure
Any data in a TE’s cache is either already committed to the disk (that is, safe on one or more SMs), or is part of an uncommitted transaction that will simply fail at the application level if the object goes away. This means that the database is resilient to the loss of TEs. You can shut a TE down or just unplug it, and the system does not lose data. It will lose throughput capacity of course until a replacement TE is spun up, and any partial transactions on the TE will be reported back to their client applications as failed transactions. If the client restarts the transaction it will be assigned to a different TE to proceed to completion. This means the DDC architecture is resilient to the loss of TEs.
SMs scale out as needed, in the same way that TEs do. Database designers use NuoDB’s Storage Groups to map sub-sets of the database to specific SMs. They may do this to satisfy residency requirements, to localize data with its main users or to enable scale out of data and I/O performance. Multiple SMs also mean the DDC architecture is resilient to the loss of individual or groups of SMs.
Geo-distributed ACID transactions
Transactions, specifically ACID transactions, are an enormously simplifying abstraction that allows application programmers to build their applications with very clean, high-level and well-defined data guarantees. Application programs are vastly simpler when they can trust an ACID-compliant database to look after their data. The DDC architecture adopts MVCC (multi-version concurrency control). MVCC confers a number of well-known benefits, most notable of which is that it allows a database to manage read/ write concurrency without centralized locks. In NuoDB, readers don’t block writers, and writers don’t block readers. Even more importantly, almost all inter-server communications are asynchronous, so network and server latencies are much less critical. A NuoDB instance can start TEs or SMs in a remote data center and connect it to the running database. Or it can start up some of the database servers in a private data center and the rest on a public cloud. Modern applications are distributed. Users of websites are spread across the globe. Mobile applications are geo-distributed by nature.
"Mobile applications need a unified database, running on a group of database servers in multiple locations."
These applications are not well served by a single monolithic database server. Nor are they well served by clusters of sharded, separate databases that do not cope well with roaming users. They need a unified database, running on a group of database servers in multiple locations. That gives them scale-out performance, data center failover and the potential to manage issues of data privacy and sovereignty.