Blog

Cloud Computing: It’s Not The Destination, It’s The Journey!

A Guest Blog By Dr. Robin Bloor, Bloor Research

For some companies, especially large ones, the big attraction of the cloud is cost. The simple fact is that data centers are expensive to build and expensive to run. Of course cloud data centers are expensive to run too, but the big cloud vendors got going at a propitious time. They got to choose where to locate their data centers - in cooler climates close to sources of cheap electricity in areas where labor was relatively inexpensive. If you compare such data centers to those that were built before the Internet began to change everything, for example by banks and other big IT users in the financial sector, it’s clear that there are huge cost savings to be had.

So why haven’t the big banks either moved a great deal of their applications to run in the public cloud or, perhaps better, built their own cloud data centers and gradually migrated applications from expensive data centers in places like New York and Los Angeles to inexpensive data centers near hydroelectric power stations in Washington State?

The Advantage of the Green Field

There is an obvious explanation. It’s about flexibility. If you set up a public-facing cloud infrastructure service, you get to decide what applications run in it. You can structure the service and pricing so that only the convenient and profitable applications run in your cloud. In short, you get to choose the applications.

In reality, large financial sector companies don’t have many such applications that can be easily transported from the data center into the cloud. Many of their applications are running on mainframes or large Unix clusters - not the commodity servers of the cloud. And when you consider the applications that run quite happily on commodity x86 servers, migrating those applications is rarely a simple matter.

Database Intransigence

Moving an application from the data center to the cloud is not necessarily difficult. If it’s a stand-alone application that isn’t running 24x7, it’s simply a matter of copying. You set up a copy of the application environment. You create a database, you take a copy of the data and upload it, and when you’ve done all that, you’re probably good to go. Sure, you’ll need to test the configuration and maybe also application performance, but it’s likely to be fairly painless.
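To make that “copy and go” path concrete, here is a minimal sketch of a one-shot migration, assuming the application can stay offline while it runs. It uses Python’s built-in sqlite3 module purely as a stand-in for the source and target databases; in practice you would use the driver and bulk-load tooling of whatever database you are actually moving.

```python
import sqlite3

def copy_database(source_path: str, target_path: str) -> None:
    """One-shot migration: copy schema and rows while the source is offline."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(target_path)

    # Recreate each user table's schema on the target.
    tables = src.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for _, ddl in tables:
        dst.execute(ddl)

    # Bulk-copy the rows, table by table.
    for name, _ in tables:
        rows = src.execute(f"SELECT * FROM {name}").fetchall()
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            dst.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)

    dst.commit()
    src.close()
    dst.close()
```

Once the copy is verified, you simply point the application at the new database and switch it back on.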

The situation is distinctly different when the database needs to be running all or most of the time. Such situations have become increasingly common in many areas of the financial sector. This is partly because the financial sector is global. It rarely sleeps and even when it takes a nap, it has to do so with one eye open, because the Internet is always awake.

This situation is exacerbated by the fact that nowadays there are more and more dependencies between applications. Some of these dependencies derive from regulatory requirements. When you are managing risk, you may need to be able to generate snapshots of the state of multiple applications and data sets at precisely defined times - perhaps even globally. This naturally increases the availability requirement for applications.

Other dependencies derive from the way we do IT. In the past ten years or so we have built systems that share functionality through what is called Service Oriented Architecture. This software architecture reduces a great deal of duplication, but it does so at the expense of creating application dependencies.

When you have direct dependencies across many applications, if you want to migrate some of them to the cloud, you probably need to move them all. You will want to avoid direct application-to-application calls going between the data center and the cloud. Consequently, groups of applications often have to migrate together. The most difficult part of this migration is moving the databases.

The Data Journey

Moving data takes time. Moving a great deal of data can take considerable time. If the applications are running 24x7, or if the level of dependency between applications is high, you have to be able to switch over from running in the data center to running in the cloud almost instantly. This takes some arranging.

The only strategy that is going to work effectively is one using database replication. In effect, you set up the data center databases to replicate their contents to database instances in the cloud. You can load the cloud databases at leisure from a full database backup and then gradually replicate all updates to the cloud databases as they occur. Once you have the two databases in step you can move all the applications and users into the cloud. You may even be able to move the applications gradually because you’ve got the data layer in sync from the get go.
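A rough sketch of that backup-then-replicate pattern is shown below. The backup, restore and change-capture steps are left as placeholder callables because every database exposes them differently; the names here are illustrative, not any particular product’s API.

```python
import time

def migrate_with_replication(restore_backup, fetch_changes, apply_change,
                             cutover_requested, poll_interval: float = 1.0) -> None:
    """Bulk-load the cloud database from a backup, then stream changes from
    the data-center database until the two copies are in step."""
    # Step 1: load the cloud copy from a full backup taken at a known log position.
    last_applied = restore_backup()

    # Step 2: replay everything recorded after that position, and keep going
    # as each new batch of changes arrives.
    while True:
        for position, change in fetch_changes(last_applied):
            apply_change(change)
            last_applied = position

        # Step 3: once the replica has caught up, switch applications over.
        if cutover_requested():
            break
        time.sleep(poll_interval)
```

Because the cloud copy is kept in step continuously, the final cut-over is a matter of redirecting connections rather than waiting for a long data transfer.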

It may not necessarily be plain sailing. Replication almost always places a significant extra load on a database - and thus the applications may slow down during the migration until they are fully operational in the cloud.

Truly Distributed Database

A relatively new database, NuoDB, offers a truly distributed capability and enables a slightly different approach. A good way to think about its capability is that it replicates between multiple sites, but it does so in a peer-to-peer way. Normally with database replication one database is the master and the other the slave. But with this product, transactions can execute on either copy of the database and the two copies are automatically kept in step. A product that can work in this manner - there may be more than one for all I know - would be perfect for migrating databases to (or from) the cloud.

However, the truly neat thing about this capability is that it will also enable an application and its database to straddle the data center and the cloud permanently. With such a mode of operation, you could choose to use cloud resources only at peak times. It would provide an unusual level of flexibility in managing infrastructure.

Such distributed capability is likely to become popular if it makes the use of the cloud more flexible, especially in areas like the financial sector and particularly banking, where data center costs are high and there is an obvious need for greater infrastructure flexibility.

NuoDB Note: Thanks, Robin. Readers: we just posted an independent research study on the cloud in Financial Services. You can find it here. Take a look and let us know what you think about the study, Robin’s point-of-view or… your point-of-view.

Q&A With NuoDB CTO On Big Data And Product Development

NuoDB CTO, Seth Proctor, sat down with Roberto Zicari of ODBMS.org to discuss big data, common customer use cases and the technical features currently under development at NuoDB.

Here are some of the highlights of the conversation.

What is your current product offering?

“NuoDB 2.0 is a webscale distributed database providing standard SQL capabilities for applications that need to operate in a cloud model. SQL ’92 support (joins, indexes, DDL, etc.) mean ACID transactions while our novel distributed architecture is designed for scale-out performance by adding or removing nodes on-demand, …”

Who are your current customers and how do they typically use your products?

“Customers use NuoDB today to support operational use-cases that need transactions, availability guarantees, active-active deployment and analytic capabilities. Often these are users with existing applications needing to migrate to the cloud or users who are building new applications designed from the start with a requirement for elasticity…”

What are the main technical features you are currently working on and why?

“…Starting with this core set of scale-out features, one major area of development for NuoDB is enhanced operational efficiency and performance capabilities for geo-distributed deployments.”

“…Because these kinds of deployments are complicated to manage, another key area of development is around automation… As NuoDB matures in the market this feature will continue to simplify the operational experience of running at scale.”

For more detail on product development, visit the NuoDB DevCenter and Tech Blog.

The full transcript can be found here.

Replication? Is It Easy?

A guest blog by Dr. Robin Bloor, Bloor Research

Database replication is simple, right? You pick the tables to replicate, copy them and then apply new transactions to them (from a log file) to keep them up-to-date.

Actually, it’s not so simple. In fact most databases don’t do it well at all. We need to distinguish between data copying and sophisticated replication, because copying actually is simple.

But why would anyone want to replicate data? I think there are three reasons:

  1. For workload distribution. In essence you replicate all or part of a database to distribute query workloads.
  2. For geographic distribution. You keep data in the locations where it is used and replicate it to other sites.
  3. For resilience. In some DBMS solutions, you replicate the whole database and use it as a standby.

Simple Replication

So you can copy all or part of a database and distribute it to one or more servers. You might do that and update the copies at the end of every day using the database log file. It is relatively simple. In fact you could organize this without any help from the database software.
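As a sketch of how such a do-it-yourself, end-of-day refresh might be organized, suppose the day’s changes were captured as a plain text file with one SQL statement per line (an illustration, not any real database’s log format):

```python
import sqlite3

def nightly_refresh(log_path: str, replica_paths: list[str], last_line: int = 0) -> int:
    """End-of-day refresh: replay the day's logged SQL statements against
    each replica copy, starting from where yesterday's run left off."""
    with open(log_path) as log:
        statements = log.readlines()[last_line:]

    for path in replica_paths:
        conn = sqlite3.connect(path)  # stand-in for a connection to each replica
        for stmt in statements:
            conn.execute(stmt)
        conn.commit()
        conn.close()

    # Return the new position so tomorrow's run starts from here.
    return last_line + len(statements)
```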

It only becomes difficult when:

  1. You want the data to be completely up-to-date.
  2. You want to be able to recover automatically if anything fails.

If you provide a query service, ideally the data should be up-to-date. But in some business situations it won’t matter too much if it is a few minutes behind. And if the replica fails in some way, it may be OK for the business to wait for a while to recover, which should be possible using the database log file.

It is interesting to note here that, in the early days of the Internet, a great deal was achieved with simple replication. You could replicate the whole web site several times. Most users were just reading web pages and not changing anything. All traffic went through one or more web servers and web sites themselves were not being updated particularly frequently. Of course, there were problems with transactional sites, but response times did not need to be very fast, so there was leeway. This is less the case now.

Nevertheless, depending on circumstance, you can actually achieve quite a lot with fairly primitive replication. However, when you need a high service level, up-to-date information and failover, you are no longer in the world of simple replication.

Sophisticated Replication

Technically, the term “replication” really means that the copy is constantly being updated and is actually up-to-date to within a fraction of a second. The arrangement has to be able to failover automatically if any component fails. If this is your requirement, few databases will be able to deliver and those that can will only be able to do so to a given level of service. It is technically difficult to achieve.

So what are the business circumstances where you might wish to do this?

They all involve distributing data. Consider a retail chain that has outlets all over the US and changes product prices every day. It can distribute the daily prices by simple replication. However, if it wishes to change the prices every second it cannot.

There are more business situations like this than you might imagine. Every on-line auction or trading operation can fall into this category. When the market is centralized there is no need to distribute, but nowadays many markets are distributed across multiple time zones. So is the market for internet adverts. So, as it happens, is the market for airline tickets, hotel rooms and car rentals. And similar kinds of business situations arise in many global supply chains where local availability of transport or storage meets varying demand.

The Master Slave Problem

Replication is difficult because a transaction can take place at more than one site. If we consider the simple situation of just two sites, then for any table (the prices table, say) that needs to be updated, one database will be the “master” and the other, the “slave.” All transactions that update the table are first applied to the master, and then the slave table is updated. The problem is that the database cannot regard the transaction as complete until it knows that both copies have been updated, so that there is no possibility of data inconsistency.
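To make that cost concrete, here is a toy sketch of the two-site rule: an update only counts as committed once both the master and the slave acknowledge it, so every write waits on the slower of the two sites. The classes are purely illustrative, not a real replication protocol.

```python
class Replica:
    """Toy in-memory copy of a table; stands in for a remote database site."""
    def __init__(self):
        self.rows = {}

    def apply(self, key, value) -> bool:
        self.rows[key] = value
        return True  # acknowledge the write

def update_price(master: Replica, slave: Replica, key, value) -> bool:
    """Synchronous master-slave update: the caller only sees 'committed'
    once both copies have acknowledged, however far apart they are."""
    if not master.apply(key, value):
        return False
    if not slave.apply(key, value):
        # A real system would now have to roll back the master or fail over.
        raise RuntimeError("slave did not acknowledge; transaction cannot commit")
    return True
```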

This is not the only drawback. All applications that wish to update the table must connect to the master, no matter where they are running. In practice, replication done in this way is slow and the in-built latency may be unwelcome to the business. And if the master site fails, then recovery is far more complicated than if only one database were running.

The situation becomes more complex when you add more sites that need replicated data. You can even have multi-master replication where different sites are the master site for different tables. This makes everything even more complex since every site has to behave both as a master and as a slave depending on what is happening. And of course recovery becomes more challenging. It gets worse if the sites are very distant from each other - halfway around the world, for example. Network latency can become a barrier when the database is trying to perform updates or send messages from a server, through a local network, over a wide-area network, to another server on another network. The delays due to protocol requirements, message queuing, transfer time and synchronization can easily become prohibitive.

NuoDB: Replication Built-In

NuoDB is the exception. In fact, it is unique in its approach to replication, because it isn’t organized in a master-slave way at all. It is peer-to-peer (also known as “active/active”). Wherever a transaction arrives is where it is executed. NuoDB keeps a full copy of the database in every location, so in practice every table is replicated at every location. When a transaction is executed at one site, it is also sent to all other sites and these are updated asynchronously.

It is unlikely that there will be a clash where two different transactions at two different locations try to update the same record at the same time. But even if there is, NuoDB handles it. NuoDB never updates data in place; it simply creates a new version of the data that changed. And all records are timestamped. If there is a consistency clash, it can be resolved locally without any need to back out of the transaction.
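The general technique being described here - appending a new timestamped version of a record rather than overwriting it, and letting the latest version win - can be sketched as follows. This is an illustration of multi-version records in general, not NuoDB’s actual implementation.

```python
import time
from collections import defaultdict

class VersionedTable:
    """Illustrative multi-version store: writes append a timestamped version,
    so two sites can apply the same updates in different orders and still
    converge on the version with the latest timestamp."""

    def __init__(self):
        self.versions = defaultdict(list)  # key -> list of (timestamp, value)

    def write(self, key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.versions[key].append((ts, value))

    def read(self, key):
        if not self.versions[key]:
            return None
        # The newest version wins; an older conflicting write is simply
        # superseded, with no need to back out a transaction.
        return max(self.versions[key], key=lambda v: v[0])[1]
```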

An interesting and very useful aspect of this capability is that it’s invisible. When NuoDB runs in more than one location, it replicates everything automatically. You do not need to configure it to do this. And because it doesn’t have to synchronize each transaction, its replication is as fast as the network allows. The data at one site is unlikely to ever be more than a fraction of a second behind another site, even if the two sites are on opposite sides of the planet.
