Single, Logical System
One key way that the Domain abstraction supports higher availability is by providing the view of a single, logical system. When a SQL client connects to a database, it uses a single connection string, just as it would with any single-server relational database. The connection string uses a standard format that specifies the server and database name. In this case, however, the server is a Broker, which uses its load-balancing policy to tell the client which TE to connect to.
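The Broker's role in this handshake can be illustrated with a minimal Python sketch. It assumes a simple round-robin load-balancing policy; the `Broker` class, its `register` and `pick_te` methods, and the TE names are hypothetical and do not reflect NuoDB's actual Broker or driver API.

```python
# Hypothetical sketch: Broker, register(), pick_te() and the TE names are
# illustrative only; they are not NuoDB's real Broker or driver API.
from itertools import cycle


class Broker:
    """Routes each connect request for a database to one of its TEs."""

    def __init__(self):
        self._policies = {}  # database name -> iterator over TE addresses

    def register(self, database, te_addresses):
        # Assumed policy: plain round-robin. A real Broker could weigh
        # load, locality, or health instead.
        self._policies[database] = cycle(list(te_addresses))

    def pick_te(self, database):
        return next(self._policies[database])


def connect(broker, database):
    """Client side: ask the Broker which TE to use, then 'connect' to it."""
    te = broker.pick_te(database)
    return {"database": database, "te": te}


broker = Broker()
broker.register("flights", ["te-a", "te-b"])
print(connect(broker, "flights")["te"])  # te-a
print(connect(broker, "flights")["te"])  # te-b
```

The client only ever knows the Broker's address; which TE it lands on is the Broker's decision.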
If the client loses its connection, it repeats the same process and reconnects to the database. That it may now be connected to a different TE is transparent, because all TEs are equivalent. So when a TE’s host fails, or is simply turned off, the client continues to operate. Standard connection pools already handle reconnection attempts, so applications run uninterrupted when hosts fail. Likewise, connection pools periodically drop connections and re-establish them, so if new TEs are added to a database, they are put to use without the client having any knowledge of the change.
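The reconnect-and-recycle behavior can be sketched as a tiny client-side pool. This is a hedged illustration rather than how any particular pool library works: `pick_te` stands in for the Broker round trip, `open_conn` for a driver's connect call, and `max_uses` is a hypothetical recycling threshold.

```python
# Hypothetical sketch: pick_te stands in for asking a Broker, open_conn for a
# real driver connect call, and max_uses for a pool's recycling policy.


class ReconnectingPool:
    def __init__(self, pick_te, open_conn, max_uses=100, retries=3):
        self._pick_te = pick_te
        self._open_conn = open_conn
        self._max_uses = max_uses
        self._retries = retries
        self._conn = None
        self._uses = 0

    def _reconnect(self):
        # Every TE is equivalent, so whichever one the Broker returns is
        # fine, including a TE that joined after the last connect.
        self._conn = self._open_conn(self._pick_te())
        self._uses = 0

    def execute(self, sql):
        for _ in range(self._retries):
            try:
                if self._conn is None or self._uses >= self._max_uses:
                    self._reconnect()  # first use, after a failure, or recycling
                self._uses += 1
                return self._conn(sql)
            except ConnectionError:
                self._conn = None  # the TE's host failed; go back to the Broker
        raise ConnectionError(f"no TE reachable after {self._retries} attempts")
```

Recycling after `max_uses` is what lets newly added TEs receive traffic without any client-side configuration change.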
Figure 2. An administrator selects a database template outlining a minimally redundant system of two Transaction Engines (TEs) and two Storage Managers (SMs). NuoDB’s orchestration capabilities start a database meeting those requirements (two TEs and two SMs). NuoDB continues to monitor the created database to ensure it meets the service level agreement (SLA). If a TE becomes unavailable, NuoDB automatically enforces the SLA by starting up a new TE. The administrator is alerted to the activity but does not need to take action.
By using DNS rules, load balancers, or other standard tools, a single name can be given to the collection of Brokers. That way, an application continues normally regardless of which hosts fail or are added to the Domain. Likewise, the tools provided to monitor and manage a Domain get a logical view by connecting to any single Broker, so this simple indirection provides a continuously available point of management.
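The effect of putting one name in front of several Brokers can be sketched as a client that walks the resolved addresses until a Broker answers. In this hedged sketch, `resolve` stands in for a DNS lookup that returns multiple records, `ask_broker` for the Broker round trip, and the name `db.example.com` is a placeholder.

```python
# Hypothetical sketch: resolve() stands in for a DNS lookup returning several
# records, ask_broker() for the Broker round trip; names are placeholders.


def connect_via_name(name, resolve, ask_broker):
    """Try each Broker behind one logical name until one answers with a TE."""
    errors = []
    for broker in resolve(name):
        try:
            return ask_broker(broker)
        except ConnectionError as exc:
            errors.append(f"{broker}: {exc}")  # host is down; try the next one
    raise ConnectionError("no Broker reachable (" + "; ".join(errors) + ")")


records = {"db.example.com": ["broker-1", "broker-2"]}
down = {"broker-1"}


def ask_broker(broker):
    if broker in down:
        raise ConnectionError("refused")
    return f"te-assigned-by-{broker}"


print(connect_via_name("db.example.com", records.get, ask_broker))
# te-assigned-by-broker-2
```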
Online Upgrade & Migration
This view of a logical database provides a high degree of availability: clients continue to operate even as resources fail unexpectedly. The same model makes expected, planned outages transparent to clients as well. One common example is an upgrade.
Because all Domain and Database components are redundant, and because losing any single entity is transparent to an operating client, an administrator can run a rolling upgrade with no downtime. In this case, “upgrade” could mean changes to the hardware, operating system, local software, or even the NuoDB installation. It could also mean migrating to new networks or new hardware. By shutting down one service or host at a time, running the update, and then restoring that service or host, you can run a complete upgrade with no loss of availability. Because new TEs and SMs can be added to a running database at any time, an administrator can choose to pre-provision additional capacity before rolling out an upgrade so that there is no loss of capacity either.
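The rolling procedure, including optional pre-provisioning, can be expressed as a short loop. This is a hedged sketch under simplifying assumptions: hosts are plain dictionaries, an upgrade is just a version change, `make_spare` is a hypothetical hook for adding a TE or SM, and the return value reports the lowest number of running hosts observed so the capacity floor can be checked.

```python
# Hypothetical sketch: hosts are plain dicts and make_spare() is a placeholder
# for provisioning an extra TE or SM; real tooling would do far more.


def rolling_upgrade(hosts, new_version, min_running, make_spare):
    """Upgrade one host at a time; return the lowest running count seen."""

    def running():
        return sum(1 for h in hosts if h["running"])

    # Pre-provision so that taking any one host out keeps us at the floor.
    while running() - 1 < min_running:
        hosts.append(make_spare(new_version))

    lowest = running()
    for host in list(hosts):
        if host["version"] == new_version:
            continue  # already current, e.g. a freshly added spare
        host["running"] = False        # shut down one service or host
        lowest = min(lowest, running())
        host["version"] = new_version  # apply the update
        host["running"] = True         # restore it
    return lowest


def make_spare(version):
    return {"name": "spare", "version": version, "running": True}


hosts = [
    {"name": "te1", "version": "2.1", "running": True},
    {"name": "te2", "version": "2.1", "running": True},
]
print(rolling_upgrade(hosts, "2.2", min_running=2, make_spare=make_spare))  # 2
```

Without the pre-provisioning step, the same loop would still avoid downtime but would briefly run with one fewer host.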
NuoDB is designed to run in a mixed-version deployment and supports version-forward compatibility. For instance, if a Domain is set up using NuoDB Swifts Release 2.1, a rolling upgrade as described in the previous paragraph can take that deployment to NuoDB Larks Release 2.2. Upgraded peers wait to use new protocol features until all peers have been upgraded. If for any reason the upgrade cannot be completed, the upgraded peers can be rolled back to the original version. Only when all peers are running the same version is the durable state updated, at which point the database is running the new version of NuoDB.
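One way to picture the "wait until all peers are upgraded" rule is to let the group speak the oldest version any peer understands. The sketch below is a hedged illustration of that idea, not NuoDB's actual negotiation; the `(major, minor)` version tuples and the `FEATURES` table are assumptions.

```python
# Hypothetical sketch: peers advertise (major, minor) protocol versions, and
# the FEATURES table is invented for illustration.

FEATURES = {"new_protocol_message": (2, 2)}  # assumed: introduced in 2.2


def effective_version(peer_versions):
    # Every peer understands the oldest version, so the group speaks that one.
    return min(peer_versions)


def feature_enabled(feature, peer_versions):
    return effective_version(peer_versions) >= FEATURES[feature]


def all_same_version(peer_versions):
    # Only then is it safe to update the durable state to the new version.
    return len(set(peer_versions)) == 1


mid_upgrade = [(2, 1), (2, 2), (2, 2)]
print(effective_version(mid_upgrade), feature_enabled("new_protocol_message", mid_upgrade))
# (2, 1) False
```

Rolling back during the upgrade is safe in this model because no peer has yet used a feature the old version cannot understand.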
Infrastructure and Representation Agnostic
The ability to react to failure or run live upgrades is sometimes made more difficult when a system requires a homogeneous environment. NuoDB Domains support running on mixed operating systems and hardware. In addition to lowering costs, this often makes it much easier to react to failure by bringing new resources into the Domain to compensate.
Internally, NuoDB is also independent in its representation of data. While the front-end looks like a standard SQL database, the peer communication is built around objects that understand what role they play and where they are replicated within the system. This results in a simpler communications model and means that durability is focused on storing named objects, not the traditional blocks or pages that are tightly coupled with SQL structure. From an availability point of view, this has two significant benefits.
The first benefit is that SMs store data in a key-value store. That can be a local file system, Amazon S3, HDFS, or any number of other services that themselves provide higher availability and easier failover models than high-end disk arrays. The second benefit is that a SQL schema is just a mapping from object representation to application structure, not something rigidly fixed in the on-disk layout. When an application needs to change its data structure, it can do so in constant time without taking down the database or incurring disk churn.
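Because SMs persist named objects rather than pages, the storage backend reduces to a small key-value interface. The Python sketch below is hedged: the `KeyValueStore` protocol, the `obj-<id>` naming scheme, and both backends are assumptions for illustration, and NuoDB's real archive format is not shown. An S3 or HDFS backend would implement the same two methods.

```python
# Hypothetical sketch: the KeyValueStore protocol, the "obj-<id>" key scheme,
# and both backends are illustrative; this is not NuoDB's archive format.
import os
from typing import Protocol


class KeyValueStore(Protocol):
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class MemoryStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class FileStore:
    """Stores each named object as one file in a local directory."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def put(self, key, value):
        with open(os.path.join(self._root, key), "wb") as f:
            f.write(value)

    def get(self, key):
        with open(os.path.join(self._root, key), "rb") as f:
            return f.read()


class StorageManager:
    def __init__(self, store: KeyValueStore):
        self._store = store

    def archive(self, object_id: int, payload: bytes) -> None:
        # Objects are stored by name, not by disk block or page.
        self._store.put(f"obj-{object_id}", payload)

    def load(self, object_id: int) -> bytes:
        return self._store.get(f"obj-{object_id}")


sm = StorageManager(MemoryStore())
sm.archive(7, b"row data")
print(sm.load(7))  # b'row data'
```

Swapping the backend changes only the object passed to `StorageManager`; nothing above the key-value interface depends on where the bytes end up.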
Service Level Agreements
The abstraction provided by the Domain has another important advantage for availability. By formalizing the provisioning model, collecting global statistics, and exposing a single point for managing database peer processes, the Domain is a building block for automating database management. NuoDB exposes Templates, which are a formal way of defining Service Level Agreements. A user may run with predefined Templates or write their own. When a database is instantiated against a Template, the Domain takes care of starting appropriate processes, monitoring the state of the system, reacting to failure, and alerting the user if the system is in an unfulfillable state.
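A Template-driven SLA can be thought of as a reconciliation pass that compares what is running against what the Template demands. This hedged sketch reduces a Template to minimum TE and SM counts and treats `start_process` and `alert` as hypothetical callbacks for Domain actions; real Templates can express more than process counts.

```python
# Hypothetical sketch: a Template is reduced to minimum process counts, and
# start_process()/alert() are placeholders for Domain actions.
from dataclasses import dataclass


@dataclass
class Template:
    min_tes: int
    min_sms: int


def enforce(template, running, free_hosts, start_process, alert):
    """One monitoring pass: start missing processes, or alert if we cannot.

    `running` is a list like ["TE", "SM", ...] and is updated in place.
    """
    wanted = {"TE": template.min_tes, "SM": template.min_sms}
    for kind, minimum in wanted.items():
        while running.count(kind) < minimum:
            if free_hosts == 0:
                alert(f"SLA unfulfillable: need {minimum} {kind}s")
                return False
            start_process(kind)
            running.append(kind)
            free_hosts -= 1
    return True


# Figure 2 scenario: two TEs and two SMs required, one TE has just failed.
log = []
ok = enforce(Template(min_tes=2, min_sms=2), ["TE", "SM", "SM"], 1,
             lambda kind: log.append(f"started {kind}"), log.append)
print(ok, log)  # True ['started TE']
```

Running this pass periodically is what turns a one-time provisioning request into a continuously enforced agreement.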
This model guarantees minimum availability while letting the Domain decide how best to use its resources. In some cases, that may mean making proactive changes to get ahead of likely failures. In other cases, such as multi-tenant deployments, it can mean completely shutting down databases that are not currently active. In this mode, called Hibernation, peer processes can always be restarted on demand as long as they fit within the specified SLA. The database therefore remains available even though it is not using any resources.
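Hibernation can be sketched as a small state machine: an idle timer stops the peers, and the next request starts them again. In this hedged sketch the idle limit, the explicit clock values, and the `HibernatingDatabase` class are illustrative assumptions; time is passed in rather than read from a clock so the behavior is easy to follow.

```python
# Hypothetical sketch: the idle limit and explicit clock values are
# illustrative; times are passed in so the behavior is easy to follow.


class HibernatingDatabase:
    def __init__(self, name, idle_limit_s):
        self.name = name
        self.idle_limit_s = idle_limit_s
        self.active = True  # peer processes running
        self.last_used = 0.0

    def tick(self, now):
        # Periodic check by the Domain: stop all peers of an idle database.
        if self.active and now - self.last_used >= self.idle_limit_s:
            self.active = False

    def handle_request(self, now):
        # The database is still available: a request restarts peers on demand.
        if not self.active:
            self.active = True
        self.last_used = now
        return f"{self.name} served at t={now}"


db = HibernatingDatabase("tenant-42", idle_limit_s=60)
db.handle_request(0)
db.tick(60)
print(db.active)              # False: hibernating, using no resources
print(db.handle_request(90))  # tenant-42 served at t=90
```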
NuoDB’s automation, resiliency, and disaster recovery features combined with its advanced distributed architecture means that it can easily be configured to deliver continuous availability. Many vendors talk about high availability, but few can leap the bar of continuous availability.
- Robin Bloor, CHIEF ANALYST, THE BLOOR GROUP
A critical element of true availability is surviving the complete failure of networks or data centers. Everything described to this point works across physically distributed sites: a Domain or Database provides all of the same active, logical properties when running in multiple locations. A complete data center can fail without any loss of data or global service availability. There is, of course, a loss of capacity, but the architecture is designed to react to that efficiently. When the site is restored, the database expands to pick up operations there again, and the storage points re-synchronize automatically.
Because of the heterogeneous nature of NuoDB operations, this global model extends to hybrid deployments. In one model, this may mean running across different public or private clouds to survive the failure of any one service provider. In another model, it means running in one cloud but always keeping data stored in another for disaster recovery. In either case, the service and its data are available in multiple locations and highly resilient to catastrophic failure.
Figure 3. A globally distributed database is running across three data centers. As the London data center fails due to a power outage, incoming database requests need to be rerouted. Ideally, these requests would be redirected to the most responsive database components (in this case, the nearest data center in New York), and users would naturally experience a degradation in database performance until London comes back online.
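The rerouting described in Figure 3 amounts to choosing the most responsive data center that is still up. This hedged sketch assumes per-site latencies measured from the client's region; the site names echo the figure, but the latency numbers are invented for illustration, and real routing would typically live in DNS or a load balancer.

```python
# Hypothetical sketch: latencies are per-site round-trip times measured from
# the client's region; the numbers below are invented for illustration.


def route(latency_ms, failed):
    """Pick the lowest-latency data center that has not failed."""
    candidates = {site: ms for site, ms in latency_ms.items() if site not in failed}
    if not candidates:
        raise RuntimeError("no data center available")
    return min(candidates, key=candidates.get)


lat = {"London": 5, "New York": 70, "Tokyo": 220}
print(route(lat, failed=set()))       # London
print(route(lat, failed={"London"}))  # New York
```

Once London is restored and removed from `failed`, the same rule sends nearby clients back to it.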
1 As of NuoDB Larks Release 2.2, all SMs archive the entire database. Future releases will introduce a new capability to partition storage, so that each SM may keep some or all of the database. See the section on Future Directions for more detail.