Learn how NuoDB can help you get it all. From scale-out performance, continuous availability, and geo-distribution to multi-tenancy and no knobs administration. All with SQL and ACID guarantees!
(Sam): The presentation is our illustration of our technology and our solution as we look at SQL in the cloud. A few administration points to mention as we go through the presentation: everyone should see a control panel on the right? Everyone is going to be in mute mode to make sure we optimize sound quality. We encourage questions and will have various Q and A sessions, one slightly in the middle and the other one at the end. Any questions you would like to have addressed, please fill out the question section and post that to the control panel. Myself, who is the organizer, will take that information down, and we will work to address questions as possible as we go through the presentation.
Again, this is NuoDB’s SQL in the cloud presentation. I also have my colleague, Boris Bulanov, on the line with me. Boris is our vice president of technology, has many, many, many years of distributed technology, database technology experience, and as we go through this presentation, Boris will hit many of the technical elements of our product solution in addressing architecture components of what we bring to the table in a product of a database that is built for supporting SQL in the cloud. Boris, can we go to the first slide?
(Boris Bulanov): Sam, can you see it? It should be there.
(Sam): Nope, it’s not going yet. There we go. Just a quick overview of who we are as a company. NuoDB is a next-generation web-scale database built for the cloud. We have defined and have tackled many of the complex issues that we see people who are moving to cloud computing or want to leverage distributed IT infrastructure, the problems that they have with historic database architectures. And as we go through this presentation, you should get a very firm feel of what we’ve solved, our approach in solving them, and the solution we’re bringing to bear in the market.
Our first version of the product was released in January 2013. In that time, we’ve promoted a very aggressive free download community, and that community currently numbers a little around 13,000. So we’ve got a strong presence of people who have the technology out in the market, using the technology, and giving us feedback as we move through more mature phase-- releases of our product. In the short time, we’ve been a recipient of numerous industry awards. The one I like to highlight the most is, we are being tracked by Gartner in the Magic Quadrant, which is no small feat, considering the time frame that we’ve launched our product in the market, and what we’ve been able to do in that short time.
Another thing I like to talk about NuoDB, there’s a lot of activity in the market with a lot of niche vendors and people trying to approach certain different problems of the database problem in cloud computing. The one thing we really like to tout in our structure as a company is our deep database DNA, as we call it, the management team and our investors. And just a note on that, in our investor class, we have three former CEOs of former database companies, represented by Ingres, Sybase, and Informix, and the management team itself has a strong history of distributed database computing expertise. That enables us to look at the problem of bringing a fully functional RDBMS system, SQL-based, to the market, based on years of learning, years of expertise, and also understanding the market as it exists today.
And we are headquartered in Cambridge, Massachusetts, at this time. And I think that kind of describes us as a company. We’re -- we see ourselves as innovators. We see ourselves as very agile and thought leaders in a very complex, changing marketplace.
I’m going to turn this over to Boris, who’s going to address the topics that we’re going to go through today. And again, as we go through the rest of the presentation, please, in addressing any questions, note those questions to the question panel. I will be collecting those, and then we can address those in Q and A as that session comes in the presentation. Boris?
(Boris Bulanov): Yes, thank you Sam. Great introduction. Let me quickly overview what we’re going to be talking about in the next hour or so. We’ll touch on the market. That will take us a few minutes. And then I’m going to cover a couple of customer examples, which are typical customers of NuoDB. And most of the time we’re going to spend, probably half an hour or a little bit more than that, describing the architecture. And that’s a very elegant -- you know, NuoDB has taken a very elegant approach to the architecture. It’s, well -- everybody will understand what it does and how it does it. And that becomes sort of the lynchpin for understanding other kinds of capabilities that we are pursuing as a database. And those are really the ones which are true differentiators of what NuoDB does in the market and for our applications. And then towards the end, we’re going to have a short Q&A session, and as Sam said, please contribute your questions all throughout, and then we’ll (inaudible) some of those either in the middle of the session or towards the end of it.
So before we dive into the actual meat of our discussion, what I wanted to do is step back, a little bit of an overview of the overall database market as it exists today. If you’re listening to us, clearly you have some background in databases, and interest, and we appreciate your interest and time. But if you look at the history of databases, clearly the majority of interesting stuff happened early on, and then relational databases really paved the way for databases to become probably the most prolific, the most successful applications in software history, right?
So a little bit more than 30 years ago, relational databases were first described as a research paper, and then implemented by a number of companies, and for the last 30 years, they are the engines of a lot of applications, of a lot of systems. And generally what became very successful with relational databases is this notion that SQL is the language. It’s a very simple language that people can understand and relate to it in a standard way. So as you go from one environment to another environment, the knowledge that you have is actually -- you know, can be transported very easily. Also, what’s very important about relational architecture is the fact that application code is actually decoupled from the knowledge of how data is stored, how data is manipulated, and therefore, applications -- application developers -- can really concentrate on what needs to be done, rather than how the actual storage model is implemented. And that gave relational databases a tremendous boost, tremendous acceptance and ubiquity.
And just to add to this kind of, the list of features, the notion of transactions, the fact that whatever an application does actually is guaranteed to be recorded, and it is recoverable. So whether it’s, you know, banking transactions or telecommunication transactions, the functionality of the databases is critical, of relational databases.
And then probably 10 or so years ago, another trend in databases emerged. And most of you know about this. It’s called NoSQL. And essentially what NoSQL databases have done, they’ve pretty much disregarded most of the functionality from the relational databases, such as SQL, such as transactions, and they concentrate on a very important aspect of database systems, which is scalability. And they’ve reached wonderful results with that. Today, the NoSQL market segment is very vibrant. Mostly it’s popular with internet-scale applications and systems, like Google, Facebook, Yahoo, but also a lot of enterprises are starting to adopt NoSQL technology. The challenge of NoSQL is that, because it doesn’t have that standard SQL interface, it’s much more difficult to create applications which are ubiquitous, right? It’s more difficult to create an ecosystem of users, and applications, and tools like the one which exists around a relational database. But nevertheless, the NoSQL market is very, very successful.
And lastly, in the last couple of years, a new market segment is emerging. And sometimes people refer to it as NewSQL. Really what NewSQL marketplace is pursuing, is to be able to go back to the sort of roots of relational database, and implement the SQL layer in its old glory and functionality, but at the same time provide this kind of scalability that NoSQL implementations achieved so well, and therefore have sort of like the best of both worlds, right? Have the relational interface, and at the same time, to be able to scale up. So this is the place. NewSQL market segment is really where NuoDB is playing. This is our market. And we consider that our technology is unique. It’s -- we’re the market leader, market segment leader. So throughout today, we will try to share with you our thoughts, why we think that is the case.
So switching gears a little bit. Also in the beginning, what I wanted to give you is a little bit of a -- it’s sort of like a frame of reference, if you will, for what is so unique about what NuoDB does, and why it is so important. This slide that you have in front of you attempts to capture that. So if you look at one of the most important characteristics of either application or system, scalability comes to mind, right? People need systems which scale, right? And scale means that as your load increases, as your size increases, you’ve got to be able to -- your application needs to be able to scale with that load, with that size. And historically the approach to this challenge was something that is called scale-up, right? So if you look at mainframes, if you look at traditional database design, if you need to process more data, the way to do this is to have more and more hardware, have stronger machines with more CPUs, bigger processors, you know, larger disks, and that is the architecture. And interestingly enough, if we look at the traditional relational database market, this is the approach that that classical design takes. How do I create a system which scales up?
Alternatively, also, for a very long time, roughly 30 years -- Sun Microsystems really invented this approach -- the alternative to this scale-up approach is a scale-out approach. Scale-out approach -- and then you see that diagram on the right side -- the approach is very different. The approach is to take advantage of the network of computers. Sun’s CEO, Scott McNealy, roughly 30 years ago, coined the phrase that “network is the computer” right? And that’s an excellent description of what the benefit of this kind of network of computers is. And if we’ll extend that analogy to network is the application, a very interesting picture emerges.
Let me go to this next slide and give you a little bit of a flavor for the next level of detail. If we’ll think about, how do we build applications? Applications generally come with a very specific stack or layers of functionality. Generally, you would have something which is called a web tier. Web tier allows browsers to connect and (inaudible) to connect to the application. And then you have application tier. That’s where application and business [logic?] goes. And then you have a database tier. That’s where data is persistent and safely stored. And then you have storage layer, which stores data on a disk, and data persists and becomes durable.
So interestingly enough, from the scale-out perspective, right -- and scale-out is that alternative model to gain additional capacity -- the web tier scales out very nicely, because it’s stateless, right? There is no state. There is no data in that tier. Everything can disappear and be reincarnated elsewhere.
Then there is an application server tier. And this tier also scales out very well. And generally, the approach to designing this application layer technology is something that is called stateless, right? So the idea is that you push the actual data management into the database, and then the application server scales out very nicely. However, sometimes people decide that they need a little bit more functionality control of the data, and they start manipulating data in memory of the application layer. And that becomes very, you know, difficult to manage, problematic, but at the same time, you get certain benefits.
But what’s interesting about this picture is that the most important layer of scale-out in this application stack is the database layer. And also interestingly enough, because of the history and heritage of technology like relational databases, this tier is generally very difficult to scale out. And people go through all different types of tricks and, you know, trade-offs, to achieve what they need. But that is where the fundamental problem of, how do we build applications which scale out in a network for a distributed database system. This is where the challenge is, but at the same time, this is where the opportunity is. This is opportunity for innovation. This is, you know, out of all software systems, this is the place where, if we invest correctly and we come up with the right technology, that’s where we’re going to get the next, you know, huge leap in terms of how we build applications, how they scale, and so forth.
And, by the way, this is exactly where our designs, our approaches, work the best. And we will describe those through this presentation.
So, but if we step back and concentrate for a minute on this notion of, within that application stack, on a platform which is a distributed processing platform, or commodity resource platform -- within that platform, what are the approaches that people take to give the scale-out capabilities to the application server stack, to the databases? And this chart here roughly captures the approaches. Again, this is just sort of background.
One approach, which is very common, is referred to as a shared disk approach, right? In that approach, you have a system which is very well designed. It’s -- we’ll call it a high-fidelity system, right? There are multiple CPUs. There are multiple processes running on several machines. All of these processes and databases share the disk. Therefore, there is much less challenge in terms of, how do you replicate data across different processes? There’s fiber optics. It’s really a great piece of engineering, and it works well. And examples of that -- Oracle has a system that’s a combination of hardware and software called RAC, and that machine delivers a certain level of scale-out for multiple databases. IBM has a similar offering with DB2.
However, there are several drawbacks to this approach. One of them is that this system is scalable -- it can scale out only to a certain degree, right? There is a certain number of processors and database instances that you can put in that box. And secondly, it’s very, very expensive, but it works very well. It works very well. It’s somewhat difficult to manage -- in terms of failure -- but the system operates nicely. The final drawback is that it’s complex and it’s expensive.
Another approach, which we can see in the next column, is something that, you know, 90% of today’s approaches, and vendors, and applications take, in order to manage these scale-out capabilities, right? And generally, these types of approaches -- there are several types of architectures. Sometimes some of them are referred to as sharded approach or shared-nothing approach. In that case, data is separated into pools, which are really not related to each other, but they contain certain subsets of data that can be accessed very efficiently.
But there are multiple subsets, and therefore you have to play different kinds of games in terms of providing some sort of federated layer and caching layer on the top of those pools of data, and that becomes difficult. But that said, this is still a workable approach. It’s not a generic approach, in the sense that you always compromise something to achieve a certain goal, and that’s okay. If you don’t have any other solution, this solution works the best, right? And if you look at the NoSQL market and some of the players in the NewSQL market, this is the approach that is taken. And again, people generally refer to it as sharded approach.
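To make the sharded (shared-nothing) approach described above concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the `shard_for` routing rule, the `ShardedStore` class, and the use of a SHA-256 hash are all assumptions chosen for the example. It shows why keyed lookups are efficient (one pool is touched) while a query that is not keyed has to visit every pool, which is the job of the federated layer the speaker mentions.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy shared-nothing store: each shard is an independent pool of data."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # A keyed write touches exactly one shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # A keyed read also touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def scan(self, predicate) -> list:
        # A query that is not keyed must fan out to every shard and merge results.
        results = []
        for shard in self.shards:
            results.extend(v for v in shard.values() if predicate(v))
        return results
```

A keyed `get` stays cheap as shards are added, but `scan` grows with the number of shards, which is one form of the compromise mentioned above.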
Then to the next column to the right, you have a -- we’ll call it a tier for completeness architectures. There’s a very interesting project that has been done by Google. In that case, Google have taken an approach which is very extensive, very interesting to both implement and actually roll out, and they’ve done it in order to support a very specific application, an application which actually contributes most of the revenues to Google, an application called AdWords. But essentially you have to be able to track all keywords and ads that Google displays around the globe and be able to properly monetize all of those keywords, and Google (inaudible) how. And for that, Google invented this phenomenal engine. They call it a planet-sized database, right?
It runs -- there are multiple challenges. They have to synchronize time, so the system is based on atomic clocks and GPS to make sure that transactions can be done globally, correctly, and data is reliable. And Google invested a tremendous mind share and expense to do this. And it’s a very impressive system, but it’s really not a generally available system. It exists to support a certain kind of usage pattern. But on the other hand, you know, it’s a great example of how a relational database may be scaled up and may be designed, and support, you know, features of distributed systems such as [scale-up?], and such as distributed presence across the globe, and so forth. So very impressive system, and it relies, technically, on the notion of synchronous replication state between different instances of database servers.
And then as you move to the right, you see NuoDB, right? And this is really just an introduction in terms of comparison to other approaches. The approach we take, we call it durable distributed cache, and really the mechanism that underlies the approach is the ability to replicate data to multiple processes on demand in real time. So data is available, as we will see from the architecture slides, in multiple processes at once, it’s guaranteed to be consistent, and so forth. And that is really what we’re going to be talking about. So a very high-level introduction to what is NuoDB conceptually.
And really, it’s actually very easy to describe and explain what NuoDB does, right? NuoDB is a pure relational database. So everything you expect from Oracle, from MySQL, from SQL Server, you’ll find in NuoDB. And generally, on a very high level, we’ll list those types of features. It’s SQL support, ACID SQL support. You can, as a developer, as a user of technology, as a tool provider, you can rely on [standard?] SQL, and that is critically important. But also there is native support for ACID transactions, and that’s the capability which is absolutely critical for operational systems, for OLTP systems. Without that, the data doesn’t have reliability. When two people go to, you know, two different people go to branches of a bank or two ATMs and transfer money, you have to guarantee that the money left one account and ended up in another account. And the only way you can do it in a database system is by supporting these transactions. And that’s what, you know, that’s what NuoDB actually provides.
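The bank-transfer guarantee described above can be sketched with any ACID-compliant SQL engine. The example below uses Python's built-in `sqlite3` module purely as a stand-in (it is not NuoDB, and the `accounts` table, the `CHECK` constraint, and the function names are assumptions made for illustration). The credit is deliberately applied before the debit so that a failed debit shows the whole transfer being rolled back.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would overdraw the account, so the credit is undone too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Without the transaction, the failed debit would leave the credit in place and money would appear from nowhere; with it, a failed transfer leaves both balances exactly as they were.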
But that’s a very high-level view. On the back end, though, NuoDB is very different from any relational database you’ve seen so far, right? The back end of NuoDB is a distributed system, and therefore, looping back to this original discussion, really the place that NuoDB is addressing is, how do you scale out in an application stack, but you scale out the database. And there are very interesting observations about NuoDB architecture. If you’ll consider traditional relational designs, relational database designs, they all derive from the same patterns, from the design books, which were written 30 years ago. And now database companies essentially just change certain functionalities, change certain features, but the actual design is very fixed. It’s the same for many, many databases.
What creators of NuoDB have done, they actually started with a clean sheet of paper, right, and they designed this database bottom up, specifically for one purpose. And that is to provide a database for a commodity platform which can scale up and deliver functionalities, which we’ll talk about later on, which are consistent with this type of approach, with this type of architecture. And we’ll talk about those.
So, with that, let me move into a slightly different set of slides. What we wanted to do, because this is sort of an introductory session for NuoDB, we want to give you a little bit of a flavor, not only for the market it will play on, and the problem we’ll solve, but also who are our customers? Not all customers, but just typical customers. And we have -- I have three examples here, and hopefully they’ll describe the spectrum reasonably well.
One of our customers and actual partners is a customer called Dassault Systèmes. Dassault is probably not a household name. I happen to know it for a very long time. They are a leader in (inaudible), which is mechanical design, product design, and something called product lifecycle management. Historically, they’ve been spun off from IBM and have a phenomenal product called CATIA. And CATIA is being used around the world for various tasks, like designing airplanes and airplane wings, and designing cars, and designing machinery, submarines, you name it.
And in the past, Dassault competed head-to-head with a number of also very significant players. I don’t know if you’ve heard the names Unigraphics, and Parametric Technology, and AutoCAD. Those are all players in that space, but Dassault, over the last number of years, really perfected their technology, and they are the, sort of, like the undisputed leader of this market. Also what is quite interesting is they are a very large company, the second-largest ISV in Europe, and the first largest is SAP, so you can imagine that being the second one is not that bad.
So the interesting thing about them is that the company is very engineering-focused. The engineers really run the company. And they’ve looked at the marketplace, and they wanted to really innovate and become even, you know, a bigger leader in their market segment. And the way to do this is not only to work with top companies, but also enable to provide these high-level services and tools to the rest of the market. Like, usually you have, in each market you have -- well, let’s say the market is product design. You have companies which produce, you know, high-end products, like airplanes, but also you have designers of clothing, and food packaging, and so forth. So ability to provide a platform and gain customers on this sort of tail end, if you will, smaller customers, into their mix of who they serve, is really a fundamental market-changing event. And if you look at the other markets, like, you know, you look at the market of Salesforce, sales automation, or you’re looking at market of movie rentals, one of the concerns that market leaders have is not to be so like Salesforce or Netflix by other people in their industry, because they (inaudible). So I could talk about them for a long time, but the point being is that, what they want to do, what this particular company, Dassault, is doing, they are moving very aggressively in the cloud, and they would like to be able to provide all of their services and products to a variety of different customers, large and small. And that’s the challenge.
And as they were doing this -- and the reason why this is very important is because that’s not a unique strategy for Dassault, but from the other examples and from a variety of our customers and prospects in industry, that is really the trend that is driving people towards going in the cloud and making all of these kinds of investments. But one of the things that they observed is that the current database technology that they used -- and I will not name names, but they are the usual suspects that you probably would recognize -- wouldn’t allow Dassault to grow and be dynamically competitive, for the exact reasons that I described before, but there are also other scalability reasons that we’ll talk about in a minute. But they’ve made -- they [loved?] NuoDB. They loved the architecture. They’ve done extensive testing for a number of years. We’ve been working with Dassault for more than three years. And they decided that this is the technology for them to take them into the next phase of growth and ability to provide services as, you know, as a service in the cloud provider. And they’ve standardized on NuoDB, and not only did they become an adopter of our technology, using it throughout the enterprise, but also they invested in the company, and they wanted to accelerate certain features of our product, which are interesting to Dassault, and they’re funding a whole number of engineers for that purpose. So this is an example of a very large ISV who essentially bet their future on NuoDB. And that’s a great example.
So let me go quickly to another example, which is on a very different side of the business from Dassault, a company called DropShip Commerce. And what DropShip does, it provides outsourcing of supply chain, of a certain part of the supply chain. They have very large customers, such as Amazon and Home Depot. And they provide a very sophisticated system which allows them to host catalogs. It allows them to do things like, you know, reconcile orders and provide invoices. And the most critical part of what they do is the actual shipping. So if you imagine that, if you’re a manufacturer of shoes, and somebody buys your shoes online, that one way of doing business would be having a warehouse full of shoes, and then you would ship shoes from the warehouse. What DropShip Commerce allows businesses to do is to actually go directly to the manufacturing warehouse and ship it directly to the consumer, right? And therefore cutting out the inventory, the transfer cost, and so on, the transportation cost, and so forth.
And the challenge that DropShip Commerce has is very similar to Dassault’s, even though the business model is totally different. They do have large customers, like I mentioned, the caliber of Amazon and Home Depot, but also they have very small customers who rely on them for the same type of functionality. And that is really the approach that companies take to dominate a certain market segment. And again, just like in the case of Dassault, the current relational database systems or NoSQL systems could not really solve the kind of challenges that this type of implementation entails, in an efficient way or economic way. And NuoDB did, and DropShip Commerce is a very successful user of our technology.
And the last example I wanted to give you very quickly is Platform28. And this is again a very similar example, where they have a range of very large customers as well as smaller customers, and a huge user base, but the use case and the pattern of what they’re trying to do is very simple. It’s a call center, right? They’re a call center in a cloud implementation. And the interesting thing about this implementation is that, again, it relies on high scale on one end and small scale on another end, depending on who the users are. And with a single platform, they were able to provide this kind of functionality, but again, for that, you need to scale out. You need that -- if you remember that application stack, right? Web tier, application tier, database, storage. You need to be able to scale out and scale in on all levels of application. So that’s the flexibility that NuoDB provides to the market.
So I wanted to pause here for a second. Sam, did you have any other comments or observations about what we just talked about? Sam, you online?
(Sam): Boris, I muted myself. No, not really. I think comments from the audience are that they’re eager to move forward to the technology component of the presentation.
(Boris Bulanov): Aha! Okay. Perfect, perfect. So let’s -- we’ll make them wait no more.
So, this is really the architecture side. And we can spend a little bit of time here, but as I said, architecture of NuoDB is very straightforward, very elegant, and it’s based on something that would be called a peer-to-peer architecture, right? And essentially, if you look at these boxes, which you see in the middle of the picture, as I mentioned, the distributed architecture is, in the case of NuoDB, entails many processes working together in a peer-to-peer fashion, okay. And these boxes in the middle -- you see, some of them are black, some of them are green -- are the representation of those processes that work together as a peer-to-peer network and deliver services, right?
Interestingly enough, a process within the NuoDB back end implementation can play one of two roles. One role is that the process serves actual SQL clients, and this is the role of processes you see on the top, the green ones. So when an application, a SQL client, connects to NuoDB, it’s going to be connected to one of those, we’ll call them transaction engines. That’s really where the work is done, where the SQL statement comes in. It’s parsed. It’s optimized. The query plan is created. The data caching is done. And these are the processes responsible for processing of data.
On the bottom of this box, you see a set of black boxes. And those are the processes which play another role. And this role is the role of durability, so we call these processes storage managers. And what they are responsible for is writing data to the disk and reading data off the disk. Importantly, these are the same processes, right, so these are the same executables. They just have to play different roles, and they’re considered to be peers of each other, right, which is very important. So it’s totally a peer-to-peer network.
Because this network of processes is a database, one of the main functions it delivers is durability of data and proper handling of transactions. There is something called ACID in relational database terminology, which stands for atomicity, consistency, isolation, and durability. So one can think of this kind of network architecture, or peer-to-peer architecture, this way: the processes on the top of this diagram serve ACI, atomicity, consistency, and isolation, and the processes on the bottom of this picture actually interact with the storage, and they are responsible for durability. So this is a very high-level view of the processes.
The second very important aspect of this architecture is something that is called memory centricity, and that is a fundamental design decision and architectural intent of NuoDB. And once you understand it and appreciate that particular element of the architecture, I think everything falls into place. So let’s contrast a storage-centric system with an in-memory-centric system. In a storage-centric system, when a particular database process doesn’t have the data which is necessary to process a request in memory, it would go to the storage layer, to the disk, and retrieve that record, right? That’s a storage-centric system. In NuoDB’s in-memory-centric system, if the data element that is needed to process a certain request is not available in the memory of the process fulfilling that request, then the system would actually -- the process would actually reach out to its peer process, a neighbor, and take data from that. So rather than incur the storage read I/O operation, which is generally very expensive, the process would reach out over the network to a place where -- to a process which has the data loaded in memory and get it from there. And that totally changes the dynamic of how the system operates, right? So the idea is that, unlike in a storage-centric system, where the center of gravity is the disk, the center of gravity for an in-memory system is the memory of the processes, right, of that network of peer processes, which work with each other, and they keep all this data in memory for all kinds of processes. So that’s the second element of the architecture, right, the ability to be in-memory centric.
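The lookup order described above (local memory first, then a peer's memory, and only then storage) can be sketched as a toy model. This is not NuoDB code: the `Engine` and `Storage` classes, the `get` method, and the direct dictionary access between peers are all assumptions made for illustration, and the model ignores networking, consistency, and the storage manager role entirely.

```python
class Storage:
    """Stand-in for the disk; counts how many expensive reads were needed."""

    def __init__(self, records: dict) -> None:
        self.records = dict(records)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.records[key]


class Engine:
    """Toy peer process holding part of the data in memory."""

    def __init__(self, name: str, storage: Storage) -> None:
        self.name = name
        self.storage = storage
        self.cache = {}
        self.peers = []

    def get(self, key):
        """Return (value, source), where source shows where the data was found."""
        if key in self.cache:
            return self.cache[key], "local"
        for peer in self.peers:
            if key in peer.cache:
                # Memory-centric step: take the data from a neighbor's memory.
                self.cache[key] = peer.cache[key]
                return self.cache[key], "peer"
        # Last resort: pay for a storage read, then keep the result in memory.
        value = self.storage.read(key)
        self.cache[key] = value
        return value, "storage"
```

In this model, once one engine has loaded a record, other engines obtain it from that engine's memory, so the storage read counter stays at one no matter how many peers later ask for the same record.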
And then just to touch on a couple other things. There is a blue box around those peer processes, and we would call it a management tier. So there is another layer of processes, very lightweight processes. We call them brokers and agents. And they really provide housekeeping functionality for the system. They connect application clients to a particular transaction engine, to a particular process which does the actual job. They collect information about how processes run. They, you know, they act in cases of network failures. They figure out who should be functioning and who should be taken offline, and so forth. So that management tier is very important. And we’ll talk about this: just like the core peer processes, this is also a fault-tolerant system with no single point of failure, and it works in conjunction with the core engines.
So I wanted to give you a quick example here in terms of how the system operates. The developer, or an application, sees the NuoDB back end as a JDBC driver or as an ODBC driver, so it’s a normal SQL engine. You provide a URL that points to a NuoDB back end, to a system, and you are connected by the broker to the appropriate transaction engine, and that’s where the actual session begins, right? So as the application issues the SQL request, it is received by the transaction engine. It’s processed. In case the data element is not found in the process itself, it will be fetched from one of the peer processes, and the result of the processing is returned.
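The broker’s role in that sequence can be sketched as follows. This is an illustration only: the talk says the broker connects the client to an appropriate transaction engine but does not say how it chooses one, so the least-loaded policy below, like the class and field names, is an assumption.

```python
class TransactionEngine:
    """Hypothetical transaction engine that only counts its open sessions."""

    def __init__(self, name):
        self.name = name
        self.sessions = 0


class Broker:
    """Hypothetical broker that hands each new client to one engine."""

    def __init__(self, engines):
        self.engines = list(engines)

    def connect(self):
        # Assumed policy: pick the engine with the fewest sessions.
        # min() returns the first minimum, so ties go to the earliest engine.
        engine = min(self.engines, key=lambda e: e.sessions)
        engine.sessions += 1
        return engine


broker = Broker([TransactionEngine("te1"), TransactionEngine("te2")])
assigned = [broker.connect().name for _ in range(4)]
print(assigned)  # clients alternate between the two engines
```

After the broker has made the assignment, the session runs against the chosen engine; the broker is not in the query path in this sketch.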
In case of the application doing an update or insert operation, a slightly different sequence of steps takes place, right? So again, the application connects to the proper transaction engine, the query is issued. Then the engine turns around, and then either opens transaction for just that particular statement, or that particular statement is done in the context of a longer running transaction, but the update is actually communicated to a storage manager and written to the disk, right? And only when a transaction engine gets a handshake or acknowledgement from a storage manager -- that’s the process responsible for durability -- that the data has been written to the disk, only then the application will get the control back, saying that the transaction was successfully [committed?].
So why is this important? It’s important because -- to reinforce this point that, on one hand, this is in-memory-centric system, but durability is key to the database functionality, and therefore one way to think about this is as a write-through memory system. So whenever you update the data, not only is it kept in memory, but it is also written to the disk in the reliable transactional manner.
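The write-through behaviour can be sketched in a few lines of Python. Everything here is hypothetical (the class names, the `available` flag used to simulate a failed storage manager, the return values); the one property taken from the talk is that the caller gets control back only after the storage manager has acknowledged the write.

```python
class StorageManager:
    """Hypothetical durability process; a dict stands in for the disk."""

    def __init__(self, available=True):
        self.disk = {}
        self.available = available

    def write(self, key, value):
        """Persist the update and return True as the acknowledgement."""
        if not self.available:
            return False
        self.disk[key] = value
        return True


class WriteThroughEngine:
    """Hypothetical transaction engine with write-through semantics."""

    def __init__(self, storage_manager):
        self.memory = {}
        self.storage_manager = storage_manager

    def commit(self, key, value):
        # Wait for the durability acknowledgement before reporting success.
        acknowledged = self.storage_manager.write(key, value)
        if not acknowledged:
            raise RuntimeError("update was not made durable; commit refused")
        # Simplification: memory is updated only once the write is durable.
        self.memory[key] = value
        return "committed"


engine = WriteThroughEngine(StorageManager())
print(engine.commit("acct:1", 100))
print(engine.memory == engine.storage_manager.disk)  # memory and disk agree
```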
So, with that, let me move on to the next slide and show you a slightly different take on what we’ve talked about, and this notion of why this idea of in memory centric distributed system is so key, and that’s really the right design for the next generation of databases. And the reason for why it is, (inaudible) just for the sake of our design. It is very elegant, it works, but what is the real value of this? And we like to think about our technology, and really concentrate on what kind of capabilities we provide, around five core capabilities, right? And we’ll go to certain detail in each one of them, but they are captured in the slide right here, right?
So really, I think in terms of the most fundamental thing is the scale-out performance. How do you horizontally scale out in an environment which is a commodity platform, right? That’s the first one. We’ll touch on a continuous availability. A key aspect of keeping these kinds of distributed systems up and running is not really (inaudible) availability. That’s not good enough. You have to be continuously available, and we’ll look how it’s done.
But then there’s a couple of other things which are key: geo-distribution, multi-tenancy, and this notion of, you know, very easy, low-touch administration, which we’ll call no-knobs administration. These are capabilities which are not nice-to-haves, right? If you think about what a database should do in the next generation of systems, how it should support internet-scale applications in the cloud, these are the capabilities which, you know, anybody should be after. And we are pursuing those capabilities as a part of where we invest in the product, how we invest, how we work with our customers.
So let me quickly touch on a few of those concepts, because they’re very critical. So the first concept is this notion of scale-out performance. And I don’t know if you can see it well on your screens, but generally the way we’d demonstrate this notion is by running all types of standard benchmarks, applications, and custom applications. And this is the effect that you’re after. You can see that there’s a step-like diagram, right? And essentially what it depicts is the system running at a certain capacity, and simply by adding more resources on the back end, the system capacity increases. For instance, capacity may mean, you know, running more transactions, right? It may mean something else. This particular run, as you can see on the top, is performing over two million transactions per second, and that’s a stunning number, right? These are mainframe or, you know, beyond-mainframe scale numbers. And they’re achieved here. I don’t quite recall where this particular graph comes from, but it was actually performed on Amazon Web Services, you know, using 20 or 30 servers on Amazon, and we were able to scale a certain benchmark in this fashion, which is very remarkable.
So, another aspect of this is -- you can’t see that for sure, but there’s a little graph on the left side, which also depicts the notion of latency, right? The latency with these types of volumes is still millisecond latency, 10, 15, 20 milliseconds per request, which is, again, very remarkable.
Let me move on to the next slide -- or actually, we can spend a few minutes here to describe this in a little bit more detail. So how does scale-out work, right? We’ve talked about these transaction engines in the architecture. Remember -- this is actually a little subset of the architecture slide -- there were some engines on the top which were doing transactions. Simply by adding these types of processes to your system, the system gains capacity to process data, which is very significant. We can provision -- provision meaning you have a machine which either has NuoDB already installed on it, and NuoDB installation takes literally seconds, or it may be an empty, what we call clean host. We can provision the host and start running a database on the host in a matter of seconds, right? And all the APIs are scripted. So there is this notion of scale-out on demand -- and not only scale out, but also scale in. When you decrease the number of transaction engines you have in the system, your capacity to process data decreases, and therefore you experience what is called elastic capacity, right? You can grow when you need to, and you can shrink back when you need to, and that’s a very powerful capability.
The second point I wanted to spend a few minutes on is this notion of continuous availability, and that is absolutely critical. One of the fundamental differences between scale up and scale out is that scale out generally occurs on what nowadays people call commodity platforms. And commodity platforms can be things like, you know, a data center running virtual machines, or having sort of bare hardware, right, where you run certain processes directly on the machines. It could be private cloud or it could be public cloud. So one of the characteristics of all of these environments is that it’s not a high-fidelity environment. Virtual machines can fail. You can have network partitions. Disks can fail, because they’re much cheaper disks. So the database design should be such that it makes the system, the database, the entire application stack, durable, or reliable, in light of those types of failures, because those failures are much likelier to occur in a distributed system than they are on a single box, clearly.
And therefore that is one of the (inaudible) designs behind the NuoDB architecture, to be able to continuously operate, right? High availability is not enough. Continuous availability is what’s needed for this class of system. And that’s how the architecture that we already talked about works in this kind of environment, right? So we have the ability to take certain processes, such as transaction engines and storage managers, offline for maintenance. That is the, sort of, scheduled case, where you anticipate those kinds of events, like, for instance, an upgrade from version to version. We’ll call it a rolling upgrade. So in this case, in NuoDB, you don’t have to take the system down. You don’t have to shut it down. The system operates as you upgrade different elements of the system, and it’s designed in such a way that it can be comprised of servers and processes which actually interoperate, even though they are on different versions of software, as an example. But the same kind of approach also works for failures which are not scheduled -- you know, they’re catastrophic failures, right? The processors fail. Storage fails. The system just continues to operate, because, as we talked about with the scale-in model, right, there’s nothing different about a transaction engine failure, you know, as compared to the scale-out, scale-in model. But at the same time, if a particular storage manager fails, or a disk fails, there is also the ability to have multiple storage managers maintaining the durability of the data in multiple places, and therefore, you can not only recover, but you can continue to operate in a mode which, you know, doesn’t cause any interruptions.
So, let me go a little bit faster here, touch on a couple other points. The notion of geo-distribution is something that we hear from our customers very often, and it’s a very natural requirement which comes, you know, from globalization of business in general. But if you look at banking, if you look at finance in general, if you look at (inaudible) or supply chains, they span the globe, right? And today, it’s really a nightmare to move data around the globe in a consistent manner that preserves the data, on one hand, and on the other hand, gives everybody the most recent version of the data. And just like what we’ve talked about with continuous availability, where the outcome of this type of a (inaudible) architecture is that the system is continuously available, another aspect of this design is that you can run this peer-to-peer network of processes in a geo-distributed manner, right? You can run it across multiple data centers. You can run it across geographies, and the right things happen, right? So the data moves from place to place, (inaudible), communicate in memory over the network with each other. They have the most up-to-date view of the data. Data may reside in multiple clients’ memory spaces, and it’s all managed correctly. Transactions function properly and update data in all storage managers. But at the same time, we have certain optimizations for geo-distribution, and that’s also where, you know, there’s ongoing work to make these kinds of capabilities much more interesting and optimized.
But even now, you can run a single logical NuoDB database in multiple data centers, or in multiple geographies, and the data will be read locally from the local copy of the storage or of the database, but at the same time, updates are still applied across geographies, right? The system still behaves in exactly the same way as you would expect, with the optimization that a read or a write which is local to you will go to your storage manager and to your peer processes in your locality before it spans the wide-area network. But again, very interesting functionality.
The next one I wanted to quickly touch on is this notion of multi-tenancy, and that is the process which allows you to -- On one hand, if you’ll think about the scale out -- scale out is the notion of, I have a huge demand for a particular database, and I want to just add more and more capacity to it. The flip side of that is, I am actually trying to shrink my footprint, and I would like to be able to run multiple databases on the smallest amount of resources. And very interestingly, NuoDB is architected in such a way that it can actually cover both ends of the spectrum. And without going into too much detail, we’ve done joint work with HP, as an example, and ran really a tremendous number of databases in a single installation. In this case, it’s 72,000 databases on a single system. So we can put databases to sleep, we can hibernate them, wake them up in milliseconds. There are multiple techniques we implemented in this multi-tenancy world that allow us to do very interesting things.
Also very importantly, our multi-tenancy approach is unlike the traditional approaches, where you have to either partition schemas and create additional per-tenant columns in every table, or you have to go with, you know, something like the MySQL approach, where you have a database per tenant. Those approaches have a fundamental flaw, which doesn’t allow them to be very flexible. In the case of NuoDB, we can shrink databases down to a smaller footprint, and have huge density of databases on a particular piece of hardware or stack, right, OpenStack or something like that, or conversely, we can scale those databases out very effectively. And that’s a very interesting capability.
And the last one I wanted to touch on is also a fundamental key capability, which is also very logical, if you’ll think about this. If you look at a traditional database, this notion of database administration and database management is pretty much, you know, whenever you think of a relational database, you always think about a DBA, right? They go hand in hand. However, if you start thinking about the fact that you have to run, you know, hundreds and thousands of databases and processes across multiple, you know, business domains, if the fixed overhead of maintenance per process is significant, then as you scale out, it becomes prohibitively expensive. And that is really not something that you can design later on top of a product. It has to be designed into the product from day one. And that’s what we call no-knobs administration. Essentially what it means is that generally, NuoDB databases need no hand-feeding or configuration. They just run. They’re all scripted. You can start and manage them through script interfaces.
But also what we’ve done is we’ve put together something that we called an automation view of the databases. And that is a very high-level view of what you’d like the database to do and behave like, right, in terms of redundancy, in terms of geo-distribution, in terms of its core high-level behavior, and the kind of isolation you’d like to have. And this notion of a template-driven view of databases is very powerful. If you have a database which already runs, and it runs on two hosts, or machines, or resources, and you’d like to make it redundant, right, all you have to do is just apply a different template to the running database, and the system automatically changes the shape and topology of how the database works. And that is really a very fundamental view of how we see this market of distributed databases progressing and emerging, from the perspective of not only being a great system to, you know, scale out and handle a big load, but also to be operationally consistent with the environments which exist today in the cloud, and private cloud, public cloud, and data centers, which allows the operators of those environments to really control how the database performs and is configured in a very low-touch, no-knobs kind of way.
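One way to picture the template idea is as a comparison between what is running and what the template asks for. The Python sketch below is an assumption-laden illustration, not NuoDB’s automation interface: the template format (a count per process type), the process-type names, and the `plan_changes` function are all invented for the example.

```python
def plan_changes(running, template):
    """Return the (action, process_type, count) steps needed to match the template.

    Both arguments map a process type to a number of processes.
    """
    actions = []
    for kind in sorted(set(running) | set(template)):
        diff = template.get(kind, 0) - running.get(kind, 0)
        if diff > 0:
            actions.append(("start", kind, diff))
        elif diff < 0:
            actions.append(("stop", kind, -diff))
    return actions


# A database running without redundancy, and a hypothetical redundant template.
running = {"transaction_engine": 1, "storage_manager": 1}
redundant_template = {"transaction_engine": 2, "storage_manager": 2}

for step in plan_changes(running, redundant_template):
    print(step)
```

Applying a different template then amounts to computing a new plan against the same running database, which is the sense in which the operator describes the desired shape rather than the individual steps.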
So with that, I wanted to pretty much just step back. And just the last thing I wanted to mention is the fact that, you know, we’re a young company, but we’re starting to develop a very impressive list of customers -- not customers, but partners, across the spectrum. We’ll be definitely investing much more in the [space?], both from the perspective of bringing more and more system integrators on board, you know, relating to more and more technology partners, but you can imagine, with our type of relational interface, we’ll be concentrating on bringing more and more relational tools into the mix, so they can just operate with our environment. I’ve also worked quite closely with a number of technology providers, like IBM and HP, as well as cloud vendors, because we see tremendous uptake and interest from our customers in the cloud infrastructure.
So I’ve taken a little bit longer than I wanted in terms of actually going through this material, so unfortunately we will not have too much time to answer questions, but please, a couple of -- so, like housekeeping things, right? First of all, if you’re a developer and you haven’t tried our latest release, please do. It’s a fantastic release. We’re really proud of it. You clearly can just download it, no strings attached. And secondly, if you have questions other than this particular session, please fire off to firstname.lastname@example.org. Somebody will get back in touch with you very quickly. We promise that.
And with that, what I wanted to do is, first of all, thank you for your time and your attendance, but secondly, Sam, what do we have in terms of questions that we can fit into the next five minutes?
(Sam): Yeah, we have about four questions. I think we can address those in the time. I’m going to start with the first one that came through. The question is, how does on-demand replication work in NuoDB as compared to normal replication in databases like Mongo? Is it possible to have replication across data centers?
(Boris Bulanov): Great question. It’s -- I’ll probably be able to give just a short version of the answer, but if you’re interested, we can definitely connect after. But essentially, replication -- NuoDB architecture doesn’t have replication per se, but it’s an outcome of how the engines and storage managers are working with each other, right? The replication is an outcome -- natural replication (inaudible) architecture itself. So as the logical database would span two data centers, let’s say east and west. As the data is inserted or updated in one of the data centers, essentially the same data from a transaction engine is committed first to a storage manager which is local to that transaction engine, but at the same time, it’s moved into the storage manager which operates in a different data center, right? So if you update in the east, then the data will find its way on the west. And interestingly enough, that those storage managers actually know each other’s state, and will guarantee that even if the reading clients are located in different geographies, they’ll still see the consistent view. Sometimes the view may be slightly delayed, because there are laws of physics, and we cannot violate them, but the architecture itself is such that it presents the correct view of the data, even in a geo-distributed, multi-data center deployment. And more details, again, if you’re interested to follow up on this.
(Sam): From the same party, kind of a parallel question. How is data consistency ensured when there are multiple transaction engines working in conjunction with a single storage engine?
(Boris Bulanov): Right. So, because this is a peer-to-peer system, transaction engines, or actually all the processes in this peer system, they may not necessarily have all the data, and very likely not to have all the data that they need, in their own memory, but whenever -- but at the same time, they are continuously aware of each other’s state, right? And that’s really the trick. How do you propagate the metadata about where the data is, what state it is in, and then to get it? That is the metadata synchronization implementation that was done. And that’s really the secret sauce and the difficult part about implementation like that. How do you maintain coherency of multiple processes having data in memory, while not slowing it down? So the simple answer, that it does work correctly, but the details are clearly not as simplistic as we can explain them in one sentence.
(Sam): And the final question I have here that kind of is still within the same parallel. You hit upon some of this. I’m going to still ask the question, and maybe you can reframe it a little bit. Assume you replicate data between the durability processes to protect node failure. In that case, would you not have large latencies when the database is distributed over large distances?
(Boris Bulanov): Excellent question. And it actually brings out another important characteristic which I have not touched on before, which is that the database, the storage manager, or any process for that matter, can run in any type of environment, right? So it can run in a data center. It can run in a cloud. It can run on your Mac or Windows machine. And interestingly enough, just like in general in distributed system design, if you look at designs of systems like Cassandra, and Hadoop, and so forth, the way that distributed systems gain reliability is by replicating, or by maintaining multiple copies of data in different places. That’s something referred to as K-safety.
So in the case of NuoDB, where we take advantage of the same kind of approach -- this is a distributed system with all kinds of design patterns that are well proven -- and therefore, the storage managers can actually be running in very different environments and provide different qualities of service. But again going back to this notion, because all the storage managers are still peers of each other, whenever something, somebody, one of the transaction engines or one of the applications, updates a certain record, that message is really communicated to multiple -- I would go as far as to say to all -- storage managers at once. The fact is that some storage managers will get this data faster than others.
And you can actually implement different degrees of what we call the transaction commit protocol, which is an application-driven sliding scale, if you will, between latency and reliability. So the more storage managers have to acknowledge the fact that they’ve received and written your update to the disk, the more latency you experience in the response, but you gain greater and greater reliability, because you can commit data to 1 storage manager, to 5 storage managers, to 20 storage managers, right? And they have to acknowledge it. So it’s actually the application’s choice. The application decides how to balance this question of reliability and storage durability versus latency. And that scale is in the control of the application developer and the person who’s responsible for configuration.
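The trade-off can be made concrete with a small Python sketch. It assumes, as described above, that an update is sent to all storage managers at once and that the commit returns as soon as the required number of acknowledgements has arrived, so the commit latency is simply the k-th fastest acknowledgement. The function name and the latency figures are hypothetical and are not measurements.

```python
def commit_latency_ms(ack_latencies_ms, required_acks):
    """Latency of a commit that waits for `required_acks` acknowledgements.

    `ack_latencies_ms` holds one acknowledgement time per storage manager,
    all measured from the moment the update is sent to every manager at once.
    """
    if not 1 <= required_acks <= len(ack_latencies_ms):
        raise ValueError("required_acks must be between 1 and the number of managers")
    # The commit completes when the k-th fastest acknowledgement arrives.
    return sorted(ack_latencies_ms)[required_acks - 1]


# Hypothetical deployment: two local storage managers and two remote ones.
latencies = [2, 3, 40, 85]

for k in range(1, len(latencies) + 1):
    print(k, "acks ->", commit_latency_ms(latencies, k), "ms")
```

With these made-up numbers, requiring one or two acknowledgements keeps the commit within local latency, while requiring three or four means waiting on a remote data center, which is the slider between latency and durability that the application controls.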
Sam, unfortunately, I don’t think I’m going to have time for more answers, questions and answers, but these are great questions and great participation, so really looking forward to continuing the discussions.