Replication? Is It Easy?
A guest blog by Dr. Robin Bloor, Bloor Research
Database replication is simple, right? You pick the tables to replicate, copy them and then apply new transactions to them (from a log file) to keep them up-to-date.
Actually, it’s not so simple. In fact most databases don’t do it well at all. We need to distinguish between data copying and sophisticated replication, because copying actually is simple.
But why would anyone want to replicate data? I think there are three reasons:
- For workload distribution. In essence you replicate all or part of a database to distribute query workloads.
- For geographic distribution. You keep data in the locations where it is used and replicate it to other sites.
- For resilience. You replicate the whole database and use it as a stand-by, in some DBMS solutions.
So you can copy all or part of a database and distribute it to one or more servers. You might do that and update the copies at the end of every day using the database log file. It is relatively simple. In fact you could organize this without any help from the database software.
It only becomes difficult when:
- You want the data to be completely up-to-date.
- You want to be able to recover automatically if anything fails.
If you provide a query service, ideally the data should be up-to-date. But in some business situations it won’t matter too much if it is a few minutes behind. And if the replica fails in some way, it may be OK for the business to wait for a while to recover, which should be possible using the database log file.
It is interesting to note here that, in the early days of the Internet, a great deal was achieved with simple replication. You could replicate the whole web site several times. Most users were just reading web pages and not changing anything. All traffic went through one or more web servers and web sites themselves were not being updated particularly frequently. Of course, there were problems with transactional sites, but response times did not need to be very fast, so there was leeway. This is less the case now.
Nevertheless, depending on circumstance, you can actually achieve quite a lot with fairly primitive replication. However, when you need a high service level, up-to-date information and failover, you are no longer in the world of simple replication.
Technically, the term “replication” really means that the copy is constantly being updated and is actually up-to-date to within a fraction of a second. The arrangement has to be able to failover automatically if any component fails. If this is your requirement, few databases will be able to deliver and those that can will only be able to do so to a given level of service. It is technically difficult to achieve.
So what are the business circumstances where you might wish to do this?
They all involve distributing data. Consider a retail chain that has outlets all over the US and changes product prices every day. It can distribute the daily prices by simple replication. However, if it wishes to change the prices every second it cannot.
There are more business situations like this than you might imagine. Every on-line auction or trading operation can fall into this category. When the market is centralized there is no need to distribute, but nowadays many markets are distributed across multiple times zones. So is the market for internet adverts. So, as it happens, is the market for airline tickets, hotel rooms and car rentals. And similar kinds of business situation arise in many global supply chains where local availability of transport or storage meets varying demand.
The Master Slave Problem
Replication is difficult because a transaction can take place at more than one site. If we consider the simple situation of just two sites, then for any table (the prices table, say) that needs to be updated, one database will be the “master” and the other, the “slave.” All transactions that update the table are first applied to master and then the slave table is updated. The problem is that the database cannot regard the transaction as complete until it knows that both copies are updated and hence there is no possibility of data inconsistency.
This is not the only drawback. All applications that wish to update the table must connect to the master, no matter where they are running. In practice, replication done in this way is slow and the in-built latency may be unwelcome to the business. And if the master site fails, then recovery is far more complicated than if only one database were running.
The situation becomes more complex when you add more sites that need replicated data. You can even have multi-master replication where different sites are the master site for different tables. This makes everything even more complex since every site has to behave both as a master and as a slave depending on what is happening. And of course recovery becomes more challenging. It gets worse if the sites a very distant from each - halfway round the world for example. Network latency can become a barrier when the database is trying to perform updates or send messages from a server, through a local network over a wide-area network to another server on another network. The delays due to protocol requirements, message queuing, transfer time and synchronization can easily become prohibitive.
NuoDB: Replication Built-In
NuoDB is the exception. In fact, it is unique in its approach to replication, because it isn’t organized in a master-slave way at all. It is peer-to-peer (also known as “active/active.”) Wherever a transaction arrives is where it is executed. NuoDB keeps a full copy of the database in every location, so in practice every table is replicated at every location. When a transaction is executed at one site, it is also sent to all other sites and these are updated asynchronously.
It is unlikely that there will be a clash where two different transactions at two different locations try to update the same record at the same time. But even if there is, NuoDB handles it. NuoDB never updates data, it simply creates a new version of the data that changed. And all records are time stamped. If there were a consistency clash then it can be resolved locally without any need to back-out of the transaction.
An interesting and very useful aspect of this capability is that it’s invisible. When NuoDB runs in more than one location, it replicates everything automatically. You do not need to configure it to do this. And because it doesn’t have to synchronize each transaction its replication is as fast as the network allows. The data at one site is unlikely to ever be more than a fraction of a second behind another site even if the two sites are on opposite sides of the planet.