
Quick Dive into NuoDB Architecture


Traditionally, relational databases were designed for scale-up architectures. Supporting more clients or higher throughput required an upgrade to a larger server. Until recently, this meant that implementing a read and write scale-out architecture required either adopting a NoSQL database and abandoning traditional relational database benefits, or relying on sharding and explicit replication. There were no solutions that could scale out and still provide fully ACID-compliant (Atomicity, Consistency, Isolation, and Durability) semantics. This tension is what inspired the NewSQL movement and ultimately led to today’s modern “elastic SQL” databases.

NuoDB is an elastic SQL database designed with distributed application deployment challenges in mind. It’s a full SQL solution that provides all the properties of ACID-compliant transactions and standard relational SQL language support. It’s also designed from the start as a distributed system that scales the way a cloud service has to scale, providing high availability and resiliency with no single point of failure. Unlike traditional shared-disk or shared-nothing architectures, NuoDB’s patented database presents a new kind of peer-to-peer, on-demand independence that yields high availability, low latency, and a deployment model that is easy to manage.

This article highlights the key concepts and architectural differences that set NuoDB apart from traditional relational databases and even other elastic SQL databases.

The NuoDB Architecture

Multiple Independent Services

NuoDB splits the traditional monolithic database process into two independent services: a transaction processing service and a storage management service, each of which can be scaled independently. There is also an administration component. This section focuses on the transaction and storage management tiers that support application database activity.

Splitting the transaction and storage processing services is key to making a relational system scale. Traditionally, an SQL database is designed to synchronize an on-disk representation of data with an in-memory structure (often based on a B-tree data structure). This tight coupling of processing and storage management results in a process that is hard to scale out. Separating these services allows for an architecture that can scale out without being as sensitive to disk throughput (as seen in shared-disk architectures) or requiring explicit sharding (as seen in shared-nothing architectures).

In NuoDB, durability is separated from transactional processing. These services are scaled separately and handle failure independently. Because of this, transactional throughput can be increased with no impact on where or how data is being stored. Similarly, data can be stored in multiple locations with no effect on the application model. Not only is this key to making a database scale, it enables NuoDB to scale on-demand and implement powerful automation models.

Figure 1: The architecture is made up of three independent services.

The transaction service is responsible for maintaining Atomicity, Consistency, and Isolation for running transactions. It has no visibility into how data is made durable. Because it is a purely in-memory tier with no durability responsibilities, it is highly efficient. The transaction service is an always-active, always-consistent, on-demand cache.

The storage management service ensures Durability. It’s responsible for making data durable on commit and providing access to data when there’s a miss in the transactional cache. It does this through a set of peer-to-peer coordination messages.

The two services discussed above consist of processes running across an arbitrary number of hosts. NuoDB defines these services by running a single executable in one of two modes: as a Transaction Engine (TE) or a Storage Manager (SM). All processes are peers, with no single coordinator or point of failure and with no special configuration required at any of the hosts. Because there is only one executable, all peers know how to coordinate even when playing separate roles. We refer to TEs and SMs as Engines.

TEs accept SQL client connections, parsing and running SQL queries against cached data. All processes (SMs and TEs) communicate with each other over a simple peer-to-peer coordination protocol. When a TE takes a miss on its local cache, it can get the data it needs from any of its peers (either another TE that has the data in-cache or an SM that has access to the durable store).

This simple, flexible model makes bootstrapping, on-demand scale-out, and live migration very easy. Starting and then scaling a database is simply a matter of choosing how many processes to run, where, and in which roles. The minimal ACID-compliant NuoDB database consists of two processes, one TE and one SM, running on the same host.
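As an illustration, this role-based scaling model can be sketched in a few lines of Python. The `Database` and `add_engine` names are invented for this sketch; they are not NuoDB's actual management API:

```python
from dataclasses import dataclass, field

@dataclass
class Database:
    """Toy model of a NuoDB database as a set of peer processes."""
    engines: list = field(default_factory=list)

    def add_engine(self, role, host):
        # Every process runs the same executable in one of two roles.
        assert role in ("TE", "SM")
        self.engines.append((role, host))

    def is_acid_capable(self):
        # The minimal ACID database needs at least one TE and one SM.
        roles = {role for role, _ in self.engines}
        return {"TE", "SM"} <= roles

db = Database()
db.add_engine("TE", "host1")   # transactional processing
db.add_engine("SM", "host1")   # durability
print(db.is_acid_capable())    # True: a minimal TE+SM pair on one host
db.add_engine("TE", "host2")   # scale out transactional throughput
```

The point of the sketch is that scaling is purely additive: there is no special "primary" process to reconfigure, only more peers in one role or the other.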

Starting with this minimal database, running a second TE on a second host doubles transactional throughput and provides transactional redundancy in the event of failure. When the new TE starts up, it mutually authenticates with the running processes, populates a few root objects in its cache, and then is available to take on transactional load. The whole process takes less than 100ms on typical systems. The two TEs have the same capabilities and are both active participants in the database.

Similarly, maintaining multiple, independent, durable copies of a database is done by starting more than one SM. A new SM can be started at any point, and will automatically synchronize with the running database before taking on an active role. Once synchronized, the new SM will maintain an active, consistent archive of the complete database.

Atoms: Internal Object Structure

The front-end of the transactional tier accepts SQL requests. Beneath that layer, all data is stored in and managed through objects called Atoms. Atoms are self-coordinating objects that represent specific types of information (such as data, indexes, or schemas). All data associated with a database, including the internal metadata, is represented through an Atom.

The rules of Atomicity, Consistency, and Isolation are applied to Atom interaction with no specific knowledge that the Atom contains SQL structure. The front-end of a TE is responsible for mapping SQL content to the associated Atoms, and likewise part of the optimizer’s responsibility is to understand this mapping and which Atoms are most immediately available.

Figure 2: A Transaction Engine has a client-facing layer that accepts SQL requests and internally drives transactions and communicates with its peers in a language-neutral form.

Atoms are chunks of the database that can range in size, unlike pages or other traditional on-disk structures that are fixed in size. Atoms are also self-replicating, ensuring that an Atom’s state is consistent across Engines. The size of an Atom is chosen to maximize the efficiency of communication, minimize the number of objects tracked in-cache, and reduce the complexity of tracking changes.

Only required objects are pulled into a cache. Once an object is no longer needed, it can be dropped from the cache. A TE can request an object it needs from another TE’s cache at any time. If a TE doesn’t have a given Atom in its cache, it doesn’t participate in cache update protocols for that Atom.
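The cache-membership rule above can be sketched as follows. This is a deliberate simplification with invented names (`TECache`, `fetch`, `on_update`), not the actual Atom protocol:

```python
class TECache:
    """Toy sketch of a TE's on-demand Atom cache."""

    def __init__(self):
        self.atoms = {}

    def fetch(self, atom_id, peers):
        # On a local miss, pull the Atom from any peer that has it
        # (another TE's cache, or an SM with the durable copy).
        if atom_id not in self.atoms:
            for peer in peers:
                if atom_id in peer.atoms:
                    self.atoms[atom_id] = peer.atoms[atom_id]
                    break
        return self.atoms.get(atom_id)

    def on_update(self, atom_id, new_state):
        # A TE that doesn't hold the Atom ignores its update protocol.
        if atom_id in self.atoms:
            self.atoms[atom_id] = new_state

    def evict(self, atom_id):
        # Unneeded Atoms can simply be dropped from the cache.
        self.atoms.pop(atom_id, None)
```

Evicting an Atom thus also removes the TE from that Atom's coordination traffic, which keeps cache-update chatter proportional to what each Engine actually holds.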

Data Durability

Abstracting all data into Atoms is done in part to simplify the durability model. Each Atom is stored as a key-value pair. By design, durability can be provided by any file system that supports a CRUD interface and can hold a full archive of the database.

Figure 3: The SM handles caching and storage of Atoms to disk, including journal and archive management.

In addition to tracking the canonical database state in its archive, SMs also maintain a journal of all updates. Because NuoDB uses MVCC, the journal is simply an append-only set of diffs, which in practice are quite small. Writing to and replaying from the journal is efficient.
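A minimal sketch of such an append-only journal, assuming diffs are simple key-value updates (this is illustrative structure, not the real SM on-disk format):

```python
class Journal:
    """Toy append-only journal of MVCC diffs."""

    def __init__(self):
        self.entries = []

    def append(self, txn_id, diffs):
        # diffs: {atom_id: new_value}. Entries are only ever appended,
        # never rewritten in place, so writes are sequential and cheap.
        self.entries.append((txn_id, diffs))

    def replay(self, archive):
        # Recovery folds the diffs, in order, over the archive state.
        state = dict(archive)
        for _, diffs in self.entries:
            state.update(diffs)
        return state

j = Journal()
j.append(1, {"atom7": "v1"})
j.append(2, {"atom7": "v2", "atom9": "x"})
print(j.replay({"atom7": "v0"}))  # {'atom7': 'v2', 'atom9': 'x'}
```

Because each entry is just a small diff, both the append on commit and the sequential replay on recovery stay inexpensive even for long-running databases.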

Management Tier

Along with the two database services is a management service. Like the database services, the management service is a collection of peer processes. These processes are called Brokers, and one runs on every host where a database could be active. Starting a management service on a host is a provisioning step: it makes the host available to run database processes and visible to the management environment. This collection of provisioned hosts is called a Management Domain.

A Domain is a management boundary. It defines the pool of resources available to run databases and the set of users with permission to manage those resources. In traditional terms, a DBA focuses on a given database and a systems administrator works at the management domain level.

Each Broker manages a set of Engines. For physical server deployments, this is typically the set of Engines running on the same local host. A Broker is responsible for Engine process management (start/stop), monitoring, and configuration management. A Broker also has a global view of all Brokers in the Domain, and therefore of all processes, databases, and events that are useful from a monitoring point of view.

All Brokers have the same view of the Domain and the same management capabilities. So, like the Engines, there is no single point of failure at the management level as long as multiple Brokers are deployed. Provisioning a new host for a Domain is done by starting a new Broker peered to one of the existing Brokers.

Figure 4: An admin client sends a single management message to a Broker to start a process on some host. Once the TE is started, management messages flow back to all Brokers.

When an SQL client wants to communicate with a TE, it starts by connecting to a Broker. The Broker tells the client which TE to establish a connection with. This connection brokering is one of the key roles of a Broker, and means that a Broker is also a load-balancing point. Load-balancing policies are pluggable and can be implemented to support optimizations around key factors, like resource utilization or locality.
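Connection brokering with a pluggable policy might be modeled like this. The `Broker` and `round_robin` names are invented for the sketch; the real policy interface is NuoDB-specific:

```python
import itertools

class Broker:
    """Toy connection broker: it does not proxy SQL traffic, it only
    tells the client which TE to connect to directly."""

    def __init__(self, tes, policy):
        self.tes = tes
        self.policy = policy  # pluggable load-balancing policy

    def broker_connection(self, client):
        return self.policy(self.tes, client)

def round_robin():
    # One possible policy: cycle evenly over the available TEs.
    counter = itertools.count()
    def policy(tes, client):
        return tes[next(counter) % len(tes)]
    return policy

b = Broker(["te-host1", "te-host2"], round_robin())
print([b.broker_connection(c) for c in range(4)])
# ['te-host1', 'te-host2', 'te-host1', 'te-host2']
```

A locality- or utilization-aware policy would have the same shape: it only changes which TE the `policy` callable returns for a given client.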

Just as an SQL programmer addresses a NuoDB database as a single, logical entity even though it’s distributed across many processes, a systems administrator addresses a Domain as a single, logical point of management. This is done through any of the Brokers. They provide a single place to connect to a Domain, manage and monitor databases, and ensure that the system is running as expected.


As software development organizations move to a cloud-deployment model, a new database architecture is needed. This article has covered the key architectural elements of NuoDB, the elastic SQL database. To see NuoDB in action, watch the demo video. To try NuoDB out for yourself, download the free Community Edition.

To read the full version of this article, download our Technical Whitepaper.



Martin Kysel has been part of NuoDB as a C++ engineer, scrum master, and tech lead since 2015. His work at the intersection of Platform and SQL has given him ample opportunity to understand not only the strategic differentiators of NuoDB but also the commonalities with established players. Martin is passionate about technology and shares his knowledge via talks, articles, and meetups. His 10+ years of experience span a variety of roles, ranging from DB/network administration to high-performance distributed software engineering.



Hi Martin,

Thank you so much for the detailed architecture blog. I also read the technical whitepaper and it's quite interesting. I had a few questions regarding the communication between the TE and SM layers. Based on my reading, I understand that the TE layer manages query execution in terms of Atoms in a distributed cache and the SM is a complete copy of the database.

Let's say we are sending thousands of write (INSERT) requests. In the scope of the active transaction, a pending record now exists in-cache and messages are sent to the Storage Managers immediately. Once the transaction successfully commits, the change is visible in the TE’s cache. How long would it take to get an acknowledgment from at least one SM so the change is available in the cache? Assume the database is very big; as far as I understand, the SM holds a complete copy of the database, so this would mean writing INSERTs into a complete, unsharded database. Wouldn't this take a long time? My assumption is that if the database were horizontally scaled out at the SM level, this would be faster.
Let's also say we're sending a read (SELECT) request against the big database. In the case where an Atom is no longer in cache at any TE, this will also require populating the cache by communicating with the Storage Manager, which means querying against the complete database in the SM. Wouldn't this hurt performance? As far as I understand, the purpose of the SM layer is high availability and durability.

So to conclude my questions: what I'm worried about is the case where data is not available in the TE cache and we need to communicate with the SM, which holds a complete copy of the database. Wouldn't this be like querying directly against a big unsharded database? To my understanding from your architecture blog, scaling (horizontal distribution) only happens at the TE layer, so when a TE doesn't have Atoms available in cache, how would this affect performance? I'm thinking maybe I missed some big picture here.

Thank you again and I'm really interested in the product so far except for the few questions I had. I will definitely recommend it for our Company once I resolve all my doubts.

Hi Nehmia, You ask a good set of questions. Let me explain them one by one. TL;DR: we do support manual sharding.
  • NuoDB supports different commit protocols that allow the user to select different levels of durability. In SAFE commit (which is the recommended protocol), all SMs need to acknowledge a transaction before it is considered durable. This means that commit latency is governed by the slowest SM in your network. You can select different commit protocols according to your use case; just be aware of the consequences.
  • As for the size of data in the SM. Since all tables are split into smaller chunks (atoms), the SM only needs to have the latest atom loaded (the one that is being appended to). This performs well if there are no indexes on the table or if the inserts are strictly increasing. Your writes will be limited by the disk bandwidth of the SMs. Adding more TEs won’t generally increase the bulk load speed, since they all contact the same set of SMs. Your commit latency is TE->SM network latency + journal disk write for commit message on SM + SM->TE network latency.
  • You can scale the SMs horizontally by using table partitioning. This means that each TE will only wait for a commit from SMs serving a subset of the data. This not only reduces the set of SMs that you need to wait for, but it also parallelizes the disk writes. While you still have to pay for the network round trip, you will decrease disk write latencies during high insert workloads.
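A toy calculation, with entirely made-up latencies, shows why partitioning reduces SAFE-commit latency:

```python
def safe_commit_latency(sm_latencies):
    """SAFE commit waits for every SM in the transaction's write set,
    so latency is governed by the slowest one. Inputs are per-SM
    round-trip-plus-journal-write times in ms (invented numbers)."""
    return max(sm_latencies)

# Unpartitioned: a commit waits on all three SMs.
all_sms = [4.0, 5.5, 12.0]
print(safe_commit_latency(all_sms))        # 12.0

# With table partitioning, a commit touching one partition waits
# only on the SMs serving that partition.
partition_sms = [4.0, 5.5]
print(safe_commit_latency(partition_sms))  # 5.5
```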
Let us look into reads and different read scenarios.
  • Trivial scenario 1) the whole database fits into memory. In this case there is nothing you need to do. The whole DB is in the cache on both TEs and SMs.
  • Trivial scenario 2) the database can be sharded perfectly. Let us assume that all queries access a completely separate subset of data. No sharing is required. You can manually select TEs that handle particular queries and shard the DB on SM level as well. If you can keep your shards small enough to fit in memory, you will never have to fetch from cold cache.
  • Real scenario: Your queries access some subset of data across multiple shards. Atoms are always fetched from TEs first, before we load them from cold storage on disk. This means that frequently accessed data will be in memory somewhere in the cluster. Only infrequently accessed data needs to be fetched from disk. Again, sharding on the SM layer helps with parallelism.
Of course, if you cannot shard the database in a way that fits in memory, you will eventually have to fetch atoms from disk. NuoDB pre-fetches atoms from SMs during index and table scans, which offsets some of the disk load cost during large scans. If you have more questions, don’t hesitate to ask. Martin
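The peer-first read path described in this thread can be summarized in a short Python sketch (`resolve_atom` and its arguments are invented for illustration, not the wire protocol):

```python
def resolve_atom(atom_id, te_caches, sm_archive):
    """A missing Atom is fetched from another TE's cache first; only
    on a cluster-wide cache miss does it come from SM cold storage."""
    for cache in te_caches:
        if atom_id in cache:
            return cache[atom_id], "te-cache"
    return sm_archive[atom_id], "sm-disk"

te_caches = [{"a1": "hot"}, {}]        # one TE holds a1, one holds nothing
archive = {"a1": "hot", "a2": "cold"}  # the SM holds everything
print(resolve_atom("a1", te_caches, archive))  # ('hot', 'te-cache')
print(resolve_atom("a2", te_caches, archive))  # ('cold', 'sm-disk')
```

So a working set that fits in the combined TE caches rarely touches SM disk at all; only the cold tail pays the disk-read cost.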


NuoDB has a very interesting architecture. I have worked closely with in-memory and distributed databases for many years and I'm interested in how you have solved some of the challenges. For example:
1. Network latency. Compared to in-memory processing speeds even the best networks are many orders of magnitude slower. Every time you have to do anything over the network this can hugely impact performance (both latency and throughput) so in general the network is the 'weak spot' for distributed in-memory systems. Presumably the same is true for NuoDB? Do you leverage any technologies such as RDMA or RoCE to mitigate this? What about virtualisation/containerisation; virtualising network access is generally problematic in performance terms so again how do you mitigate this?
2. I don't see any reference to a membership service or a quorum service; how do you handle network partition/split brain scenarios to avoid inconsistency (or worse)?
3. Connection 'explosion'; in a distributed system often every 'node' (or process/service in NuoDB terms) needs at least one (and maybe many) logical connections to every other 'node'. This typically results in a lot of physical network connections and as the number of nodes increases the number of connections increases, often in a near-exponential manner. Since most (all) OS are not very good at handling huge numbers of network connections how do you mitigate this?
Cheers, Chris

