VP of Products, Ariff Kassam, discusses what makes an elastic SQL database an ideal solution for modern hybrid cloud applications at NYC Database Month.
M1: Our esteemed speaker this evening was formerly the vice president of Teradata, has 20 years of experience, and is currently the VP of product for NuoDB. Please welcome Ariff Kassam. (applause)
Ariff Kassam: Thank you, Eric. Thank you, everyone. Good evening. I also have three other colleagues from NuoDB; if you guys have questions later you can always find them out. It’s Bill, Christina, and Nik down front here.
I appreciated you taking the time today. This talk is all about elastic SQL.
So elastic SQL -- is it an oxymoron to reality? So the basis of the talk is the discussion and sort of the confluence between sort of two trends. General, traditional databases aren’t really suited for modern-day architectures -- elastic, containerized, distributed architectures that a lot of people are building applications for. However, NoSQL databases -- it’s hard to migrate existing legacy SQL applications to NoSQL databases due to sort of the transition between in the loss of SQL semantics as well as ACID semantics. So this discussion is around this emergent topic of what we are naming elastic SQL databases. And there’s a few other -- as you guys heard from the trivia questions -- a few other databases that are sort of doing the same sort of thing in this space. And we believe it’s an emergent space that allows the best of both worlds: the idea of being able to elastically scale your database layer while maintaining ACIDity transactions and SQL compatibility.
I’ll go through these couple slides pretty quick, right, just set-up slides. I’m sure a lot of people understand sort of the architecture trends that have been going on recently over the last few decades, right? Today we are in a world where everything is monetized, distributed, and in the cloud. You’re building applications based on architectures that are fundamentally going down smaller and smaller components -- these ideas of microservices that encapsulate a singular API set and singular service that then get strung along together through APIs. However, in this world, the database has been traditionally still monolithic, single system that doesn’t scale out and you have to scale it up, you don’t distribute it. It’s very hard to make it agile; it’s very hard to put it into containers. That whole paradigm doesn’t fit with how you guys are building applications today.
There have been obviously different ways that legacy databases have tried to address this. From a hardware perspective, obviously scale-up is always a way to sort of improve performance on your database. But you can also move to cluster databases, right? You can take a MySQL cluster, you can do other types of clustering where you’ve got sort of a shared disk architecture and scale out a few new nodes. There are databases out there like Oracle RAC, IBM pureScale, that have advanced shared memory capabilities that allow you to cluster multiple nodes. More for an HA perspective, but I’ve seen customers out there deploying three or four or five nodes on Oracle RAC for scalability. So those are advanced database technologies that allow you to do some scale-out, but you’re always left with a shared disk or single system or single disk environment that doesn’t scale out as you go across either an environment or into sort of a cloud or container type of environment.
From a software perspective, obviously there’s other ways to also address this, this lack of scalability and scale-out through software. All right, you have two-phase command, traditional way of distributing transactions across multiple servers. I’m sure that everyone here is aware of some of the limitations around two-phase commit protocols. And then there’s always deal with a sharded database, all right, taking a database and sharding it into multiple independent servers that could be distributed across multiple nodes to provide scale-out and availability. And again, sharding is a great technology, but it does put a lot of onus on the application to manage those shards. DBAs have to manage those shards as well. So it’s not a perfect solution -- it’s a good solution but it’s not a perfect solution.
By the way, if you guys have questions, feel free to ask. I don’t mind sort of taking questions as I present. There will be a question-and-answer period at the end of the talk if you want to save your questions.
M3: What is sharding?
Ariff Kassam: Sharding, for those who are unfamiliar with the term, is taking a traditional database and breaking it up into smaller databases where -- it’s another way of partitioning it. You can think about a partitioning database where, for example, if I’ve got a customer table, and the customer’s -- I can take A to K and put them on one shard, L to P on another shard, O to --
Ariff Kassam: Partitioning and sharding are sort of interchanged, but typically sharding means it’s the entire schema of the database in one shard, but parts of the data in that shard. That make sense?
So I’ve talked about traditional databases and how they -- the lack of scale and how we sort of talk though and address scale for traditional databases. Obviously there’s NoSQL databases, right? They were quite honestly born of the frustration of customers’ lack with -- the inability to scale their database. Right? So the whole NoSQL movement was to address the shortcomings of traditional SQL databases. They automatically shard, they automatically partition the data that is sent to it, right, but there’s all sorts of, again, different trade-offs. There’s typically a lack of SQL, there’s typically a lack of ACID or transactional semantics. Data management is different. Data management is typically on the onus of the application developer, right, the idea of schema or no schema, right? With relational databases, you obviously have to have a schema up front. You got to define that schema, you have to model the data, you have to be able to load data into a schema, and that takes time. NoSQL: schemaless. Right? Just dump it in there. Just dump the data under a key and I get a value, and it’s JSON structure or other document structure, and it’s just data is there. However, in order to use that data, somebody’s got to interpret it. Somebody needs to understand the format, needs to understand the fields, needs to know what that data is for. All right? So it just shifts the onus of schema management to that application. And I would contend that that actually is hard, because as applications evolve, the management of schema becomes a lot -- becomes very difficult. All right? Schema management becomes an issue. It’s great -- I would say it’s great for applications where the schema could be changing a lot. I mean, typically you’ve got these things in terms of IOT, in terms of sensors, where you’ll have a lot of sensors out there with a lot of different data formats, and it’s easy enough to have a schemaless environment where I’ve got all the different data formats in one database. So you don’t have to try and model the entire world and say “What kind of data could a sensor give me?” and model that in terms of a fixed schema. So for IOT and those types of environments, schemaless databases are great. But again, if you’re developing sort of operational applications where schema does not change significantly day to day or month to month, I would contend the schemaless management is in the long term harder.
So as I said at the beginning of the talk, right, we are seeing a market gap where we are trying to address customer applications. The need for dynamic scale-out, the need for distributed databases, the need for availability, continuous availability, within a datacenter and across datacenters on one side, and on the other side, the need for transactions, the need for ACID, the need for schema, the need for SQL, and the ability to take existing applications that you already have today in your companies and try to migrate that to the next sort of modern architectures that are distributed, containerized that are continuously available. So we are sort of terming -- we are using the term elastic SQL, and you’ll hear that a lot in my talk. That’s sort of the idea of “elastic” being the ability to scale in, scale out, to containerize, to virtualize, to run on commodity, to run on distributed. So “elastic” is sort of an overloaded term in our language, and it just talks about sort of the ease of which you can sort of dynamically scale out and use databases. There’s a few logos here, and I’ll actually go through another slide that sort of talks through sort of the players in this space. But any sort of questions at this level of what we’re trying to address and sort of -- where we’re sort of focused in the market?
M4: Yeah, so you did mention that you have a schema environment, right, environment where you need a schema, or you’re in the IOT space where you don’t need one. Considering that’s where we’re headed, you know, do you see -- are you -- what’s the balance? Like, why are you in the middle?
Ariff Kassam: Yeah, that’s a good question. The question -- if I understood you correctly, we’re -- I’m talking about a schema type of environment where you defined a schema and you have an application that requires a schema, and I talked about IOT or sensors or big data in general, where the trends there are sort of schemaless, where there’s a lot more data being generated in the IOT space. I would -- so the question is why are we sort of on one side versus the other? Why are we addressing a market that may be not growing as fast as the IOT market? Is that the question?
Ariff Kassam: So I would contend there are a ton of technologies, open-source and closed-source, that address the IOT market. There’s still a need -- there’s always going to be a need for operational transactional systems. Bank accounts -- standard, canonical example, right? We are -- we’re going after sort of that market just because we believe that it is still a lot of legacy applications out there that are stuck in Oracle, SQL Server world, that we want to move to sort of a scale-out type of environment.
So for most people who have, say, Oracle or Microsoft environments, the schema is the smallest piece of what’s blocking them from moving forward. The Oracle environment, I have a half million lines of PL/SQL. I have OCI clients. I have stuff written in the Oracle APEX environment. I could make similar statements with different tools at Microsoft. How can you address that whole ecosystem rather than just the database schema?
Ariff Kassam: So, for those who didn’t hear the question, his question concerned what is around not just the data in the database, but the tooling and sort of how people use the database in terms of store procedures, the clients and all those other components. Is that...?
M5: Yeah. Because if you move just the database, that’s like upgrading my car by just moving my transmission. You kind of need the rest of the car.
Ariff Kassam: That’s a fair statement, right? So there is no magic bullet here, right? If I’ve got a million lines of PL/SQL and Oracle, unfortunately you’re stuck, right? I’m sorry to say, but you -- Oracle owns you, right? (laughter) And they’re -- they’re going to continually milk you for maintenance revenues for as long as that application stays there. There is no magic bullet.
M6: But I mean, I’ll be honest with you, and I think all these newer, semi-newer platforms on the database side have really not built any ecosystems and partner to build those ecosystems. Because if you look at, you know, the Oracle and SQL Servers, when they all were the cool ones coming off the mainframe, they did the same thing. Right? They built a killer engine and everyone in the mainframe world were just kind of like, “So what? You don’t have Omegamon, you don’t have JCL, you don’t have all this infrastructure.” And then the Toad started coming out, the Quests, all that infrastructure came out, and then you guys, the new ones said, “Oh yeah, and then here’s a profiling tool, here’s a data modeler tool, etc.,” and that’s what we’re stuck on. And I think that’s one of the things that I really believe you guys could build. I mean, I’ve dealt with one of your competitors that sells things online also. And, you know, they won’t build some of this infrastructure. And you’re sitting there arguing with them about, you know, Informatica is where it is today, and you won’t build an ETL migrating mapper in S3, so then, you know, we’ll just not spend the money next year on buying more S3 instances we don’t use. I mean, it’s -- and I think we’re all in that same boat when it comes to these tools. Having that infrastructure, you know, there is critical.
Ariff Kassam: Yeah, I agree. So again, the comment is around the ecosystem around the database. The database is not the thing, right; there is a set of business processes that get data from one spot to another, and you need the entire infrastructure to make business sense for migrating. I agree with you that we -- generally in this space, this sort of elastic SQL space or new SQL space, or whatever you want to call it, is lacking maturity in terms of ecosystems and partnerships. That’s a function of still working through development of the database, as well as working through partnerships and getting traction to attract companies like Informatica or someone to sort of partner with. So it’s a maturity thing. I think it will come. I think the ability to have SQL as sort of the layer that floats all boats -- right, if I got a tool that works on SQL and we support SQL, then generally the tooling around that, like Toad or other things, will work. Great questions -- or even observations. I appreciate those.
Ariff Kassam: Okay. What we tried to do with this mapping -- and I’ll try to walk through it generally -- is provide sort of a pathing to show you sort of the different choices you have in the database landscape. There a plethora of databases out there for you guys. There’s a lot, between NoSQL, new SQL, old SQL, streaming -- there’s just a lot of databases out there. So up at the very top, you’ve got sort of analytics, operational, and targeted applications. So the two big ones: there’s always a different between analytical workloads or operational workloads. Analytical, again, you’re working on a large amount of data, running long-running queries, complex queries, to analyze and predict and understand what the data is telling you. All right? I’m not taking money out of a bank account, I’m not doing a fraud detection charge, I’m not doing something transactional. Okay? There are a ton of data warehouses out there, between Hadoop, between Teradata, between Oracle, IBM -- they all have data warehouse technologies. On the other side are targeted applications. These are very niche types of applications for a very, very specific type of environment. These could be scientific databases, they could be time series databases, they could be streaming databases -- they could be all sorts of different databases. And there are a few examples, again, down here, which you probably can’t read -- I know you can’t read.
So we are focusing on the operational environment. In the operational environment, there are sort of two main categories: there’s single server or distributed databases. In the single server -- I talk through some of these use cases -- in the single server, I either shard it, in a monolithic environment, or I cluster it. So cluster, again, you’ve got Oracle RAC, IBM pureScale, MySQL Cluster. There’s tons and tons of different cluster technologies out there you could take a database to. From a sharded -- again, you can shard a database with any technology, okay? They all support sharding environments.
Moving to sort of the distributed environment, we sort of broke them out into sort of ACID and Rich SQL transactional versus partial. And there’s all sorts of different flavors and all sorts of different combinations and all sorts of different permutations between transactions, ACID support, SQL support. Right? Cassandra has some transactions; other databases out there have other SQL support. It’s a mixed bag of what each vendor supports. Right? These are typically your NoSQL environments.
And then in terms of Rich ACID SQL and ACID -- we list ourselves in that space. There’s also newer technologies out there that you may have heard of, between of Google Cloud Spanner, and CochroachDB obviously here. They’re all in this space, and there’s also Clustrix mentioned earlier.
So, as I sort of talked about elastic SQL, the way we define elastic SQL databases is this idea of simplicity with scale-out and elasticity, while maintaining ACID transactions, consistency, and SQL semantics that databases of record, applications of record, need. These are your typical transactional databases that have to have durability of the data and a consistent view of the data.
M10: Could you expand on the streaming application?
Ariff Kassam: Streaming applications?
M10: Yeah, use cases.
Ariff Kassam: So streaming, the most common one today, currently, is this idea of IOT, internet of things, where I’ve got sensors everywhere, submitting pings or heartbeats or whatever -- again, another example that was pretty popular -- an oil rig. Right? The oil rig’s got multimillion-dollar equipment on there. All of them have sensors, temperature, pressure, all sorts of gauges that are sending out data in a stream, continuous stream. And you’ve got something capturing all that, analyzing that, typically at the point of [presence?], on the rig, analyzing the data there, looking for trends, looking for anomalies, looking for thresholds that have been hit. They’re capturing it -- they’re capturing the data for learning so that they can -- you can stop things, stop working on things or do some maintenance on equipment before the actual problem occurs. And the data is captured for historical models, for predictive analytics. How do I build better models on prediction? Look at the data, find out where I missed an even, and figure out how I would capture that. That help?
M10: Yeah, thanks.
Ariff Kassam: So this is a chart that sort of quickly summarized sort of the key features that we think distinguish the different categories of databases. You’ve got traditional databases, you’ve got your NoSQL databases, elastic, which has Google Cloud Spanner, Cochroach, and NuoDB. The top four are around SQL. Database of record -- is the data durable and consistent? ANSI-SQL -- can we migrate SQL apps’ in-memory performance? And then the bottom three are elastic: Can I scale in, out? Do I have continuous availability? And can I deploy this in different environments, either a combination of on-premises, in the cloud, or hybrid? Or even different clouds.
All right. As normal, as we expect, traditional databases are great for SQL, but they’re not elastic. NoSQL, I contend, have issues with maintaining the SQL properties of this. Obviously they’re great for elastic. Google Cloud Spanner has a partial checkmark for SQL, because if you actually look at their API and interfaces, it’s not SQL. Some of the operations have to be done through APIs. A lot of the writes have to be done through an API called [NoSQL call?]. And also, in terms of deployment flexibility, you can only run that in the Google Cloud; you can’t run anywhere else.
Cockroach is an open-source. Again, partial, because the level of SQL syntax they support is limited. They’re getting better, but just because -- because -- building databases is hard, building SQL syntax is hard. It takes time. So they haven’t gotten to that level of maturity yet. They also don’t have an in-memory section to their capabilities. Whereas -- and I’ll talk to you a lot of these -- NuoDB has checkboxes in all of these categories.
So what I’m putting -- again, I’m not going to spend a lot of time on this slide, but it’s sort of talking about the different -- when you talk about elastic SQL database architecture -- and I’ll get into more of the architecture in subsequent slides -- but when you start talking about the architecture between Cockroach and Google Cloud Spanner versus us, the key difference with us is we’ve got this in-memory cache that’s shared between multiple different processes. With GCP Spanner and Cockroach, they do rights -- they partition the data and they do rights to a specific set of the data in those partitions, and when you need to do rights across partitions, they do either two-phase commit or some other coordinated transactions.
M11: I have a question regarding your comparison between Spanner DB and NuoDB. So when we are talking about -- so is NuoDB more co-located in terms of geography? As far as I know, Spanner DB uses cloud to synchronize across different regions. So since you’re using in-memory technology, so you are co-locating most of your service?
Ariff Kassam: So the question is Google Cloud Spanner is a globally distributed database, right? I can have data nodes in Europe, data nodes in San Francisco, data nodes here in New York. The question is, can NuoDB do that? Right? That’s what I’m talking about -- we’re talking about in-memory. Can we distribute the database across multiple regions? We do do distribution of data, database nodes within regions, but we don’t support long-distance global distribution. There is -- for consistency, for conflicting records, we maintain coordination between the nodes. So there will be latency impact depending on how far. And I’ll talk through some of the architecture of how we support different datacenters, multiple datacenters distributed data. That data is cross (inaudible) in datacenters.
So the next set of slides are going to be a little bit more technical, for those who are interested in a lot of the sort of how we sort of talk -- how we built this database. So any other -- any general questions upfront of sort of the market, where we see the differences between NoSQL, new SQL, us, Cockroach, and Spanner? Okay.
So NuoDB was started back in 2010. It was started by a person named Jim Starkey, who was Ingress, MySQL, and a few others; I can’t remember exactly. Sorry. But the key thing, the key architectural change that he thought of when he started that database is he took a traditional database stack, right. Your traditional database has a query engine. It takes the query, it parses the query, it optimized the query, and executes the query. And you have a storage layer that makes the data durable, that sort of makes sure that that data, when it happens, it’s saved on disk so I never lose it. What Jim did is he took those two related but separate processes and split it up.
So NuoDB is made up of two different processes: a transaction engine that does the SQL optimization and execution, and a storage manager layer that manages durability. So those two processes, you need both to make a database, but you can have multiple instances of each one of those. So shown here is your traditional database architecture and split. So these are your transactional engines and these are your storage managers. The transaction engines do your typical thing: they accept the query, they parse the query, they optimize the query, and they execute the query. But they execute fully in memory. There are no storage components to the transaction engines; they are a fully distributed in-memory transaction processing engines. The storage managers also have an in-memory component to them, but they also store things on disk. That’s their job: making sure that the data gets written to disk for durability. The storage managers could have a full copy of the data, so the number of storage managers defined at the redundancy of your data that’s stored to disk. Or the data could be sharded or partitioned across storage managers if you want to scale out your IO bandwidth at your storage layer.
So, generally speaking, the transaction engine and the storage engine are peer-to-peer nodes on a network or across networks that talk among themselves to manage and coordinate transactions. Again, the transaction engines, the green dots -- the bright green dots, sorry -- are fully in memory, can be spun up and spun down dynamically. They are not -- the caches -- the data caches that they manage is not preallocated, is not pre-sharded. The caches are dynamically allocated as applications execute workloads against the transaction engines. The storage managers, again, manage data on (inaudible), and they could be located within a datacenter or across datacenters. But at the -- the basic understanding here is that this is a peer-to-peer network between all the nodes. All right? There’s no master, there’s no head node, there’s no leader that all applications have to go to. An application can connect out to any transaction engine in the environment.
Again, sorry for the eye chart. But what I wanted to show is sort of the architecture between the transaction engine and the storage manager. And what I wanted to show is that each component has what you would expect in a standard database architecture. So I talk about the transaction engine. Transaction engine is all about processing queries. So it has, at the very top, a SQL parser. It has a query engine that has a graphing model, a rewrite rule engine, it has statistics, it has a cost space optimizer, it has a planner, it has an execution engine. These two, three layers here are the exact same as you would find in Oracle, SQL Server, or any other relational database.
The bottom layer on the storage manager -- as I said again, the storage manager is all the durability. The bottom layer of the storage manager is also the same that you would find in any traditional database. It is basically management of the disk. In our case we’ve got what we call the archive, which is your persistent data, and then you have a journal, which is your write-ahead log, is your transaction log, right. So our journal is our (inaudible). So this part of the architecture diagram is, again, the same as if it was the Oracle, MySQL environment. What’s different is the layers in between.
On the transaction engine, the layer from here down is different. On the storage manager, these two layers above are different. And what they are is what we call this dynamic cache, this dynamic durable cache. I’ll talk about this in a second, but one of the key concepts within a database is this idea of taking data and metadata represented in a database into what we call atoms. Atoms are representations of data and metadata, but the concept of atoms is actually transformative. It basically allows -- each atom knows how to self-replicate and how to be serialized to disk. Sorry, I’m getting a little bit ahead of myself. So I’ll talk about atoms in a second. But ultimately, this layer and this layer is all about cache management. Right? Data cache management. We call it atom cache, but it’s the same thing. Data within the system, on the TE, is fully cached, and in the storage layer, there’s a representative cache at the same time.
So I talked about atoms. So atoms in NuoDB are one of the fundamental architectural changes that we -- again, that enable us to do what we do. They are objects, both data and metadata. They are self-replicating. They are also what is known as CRDTs, so they are conflict-free replicated data objects or data types. That means that if I do a write on one node and a write on another node, the atom changes can be merged together, right, without any sort of conflict or resolutions. As you can sort of see, there’s several different types of atoms within NuoDB. And again, we don’t have to go into a lot of detail here, but the key takeaways that I want you to think about when you think about NuoDB and how we do what we say we can do is this concept of splitting your traditional stack into two processing nodes: transaction engines and storage managers. This concept of atoms, which allow us to have a cache that is intelligent and durable, both in the transaction engine as well as the storage manager.
So, I want to walk through some typical sort of use cases in typical environments that we see in our customers. So over here on the left-hand side, I have two transaction engines and two SMs, and then I’ve got applications. We call this a two-by-two -- two transaction engines, two storage managers. This is sort of your minimally redundant database. I can have a failure of my transaction engine and still continue operating. I could have a failure of my storage manager and continue operating.
The scale-out aspect of NuoDB is very simple. All I need to do is add a subsequent transaction engine into this environment, and I can start processing application workload as soon as the transaction engine is there. The data that that application requires -- when that transaction engine starts up, it’s empty. There’s no data in it. It’s not prepopulated; you don’t have to load data in it. I just start a process, and an application can connect up to that process and start issuing SQL commands. As soon as that happens, the transaction engine says, “Okay, I need this data.” If that data already exists in other transaction engines, it will get that data from peers in memory. If that data is not in any of the transaction engines, it will have to go to disk to get that data, so it coordinates that with the storage engine. So the data that is populated in each transaction engine is dependent on the workload that the application drives. If I’ve got an application that is partitioned, I could have an application connected up to one TE across those partitions, and I could have an optimized balance of application throughput for that particular TE. If that TE fails, okay, so that data’s gone, the application can connect to another TE and get that data. It’s just going to take a little more time because that data is not pre-cached or already hot in that transaction. Does that make sense?
So I already talked through this, but again, just as simply as you can add a node and add application workload, the failure of a transaction engine does not affect application availability. It will obviously affect application throughput, because I’ve lost one of my nodes for processing transactions. But just like a failure of this node, the application will get a reconnect, a failure of the socket. It does a reconnect, and it connects it to any other transaction engine -- it doesn’t have to connect up to that specific one -- and start processing those transactions. So from a failure protection standpoint, the loss of a transaction engine doesn’t affect your availability.
M12: So how does the client know how to move to the new endpoint?
Ariff Kassam: So the question was how does the client know how to move to the new endpoint. You could either use a load balancer in front and indicate that the transactions are out there, or we also have a -- it’s not shown in this diagram, but we also have an administration process that knows we -- the locations of all the transaction engines. The typical client connection would -- we’d do an initial connect to our admin process, and (inaudible) say connect to this TE. So it’s effectively a load balancer built in.
F1: Is there a particular ratio to the number of transaction engines that you have for a situation? I mean, supposing all you have is two engines and they both conk out?
Ariff Kassam: So the question is is there a typical ratio. Unfortunately, it depends, situation depending on your workload. If you’ve got a really heavy workload, then you can scale up your transaction engines as much as you want to either load-balance -- a round-robin type of load-balancing algorithm across those engines, or you could have affinity applications with TE, depending on how you’ve architected your application. If you have an application -- if you’ve sort of containerized your application -- microservice, right? -- if I deploy a new application container, you could potentially deploy a TE with that container as sort of a bundle to scale out your application with the TE. If it’s a write-heavy workload, then it’s going to be dependent on your throughput on your SMs. The way that’s shown here is the SMs are fully redundant copies of the data, so all writes will go to both of them. If you want to scale up your IO bandwidth, what you’d need to do is partition your data across SMs and scale out your SMs that way. Does that help?
F1: So you’re saying they’re not dynamic -- they don’t pop up as needed; you have to --
Ariff Kassam: Yeah. So the question, is it -- do they dynamically add themselves to the environment? The question is no. We are working with integrating with orchestration tools like Kubernetes, Mesos, Docker Swarm, to manage the rules, to say “If my utilization hits X, automatically deploy a new TE.” So you could do that, and actually we have a demo, a video demo later on in the presentation that shows that exactly, where as load was going, we dynamically increased the transaction engines for a really heavy workload to scale out.
M6: So do you guys have like a resource mastering engine or, I mean something like RAC does, you know, most of the availability clustering allows you to do that.
Ariff Kassam: Like what kind of resource?
M6: Well, so that you can constantly measure -- so as an end user, as someone running console, you could constantly see the risk.
Ariff Kassam: So we don’t have our own. We provide metrics and use other tools, integration points, to -- we typically just Graphana and order something like that to show utilization across the nodes and performance. Of the -- again, it’s part of this sort of conversation around ecosystem, right? Our mantra is providing APIs to integrate with other tools that (inaudible) their functionality.
M6: So you integrate with like EM and Microsoft SMS, that kind of stuff?
Ariff Kassam: Not the vendor-specific tools, but sort of we provide templates like Graphana and a few others, which actually are open-source monitoring tools. Was there a question out there?
M2: More of a clarification. I guess I’m just curious if currently most of the database is being used in banking versus trading. Like, what’s the current...?
Ariff Kassam: So the question is what’s our customer use cases today. We have a wide variety. We have some banking customers, we have some telco customers, we have some web startups. We have a manufacturing -- a software manufacturing company. So we’ve got customers across a wide spectrum of use cases.
So again, talking about typical deployment scenarios. I mentioned the two-by-two is sort of your standard deployment. It provides availability and scale-out, at least for two nodes, for most applications. Scaling out is, again, as simple as adding additional transaction engines to the same environment. Now, if I want availability across data centers, I could have a number of different types of protection. In this case, this is a warm DR site, so what I’ve done is just put a storage manager in that remote site. All that storage manager is doing is capturing changes to the data. If I lose my primary site, all I need to do is start up a transaction engine and I have a fully running database that’s consistent with what was lost. So what we call a warm DR scenario, because it takes time to start up a transaction engine. You could also have a hot DR site, and what that means is I’ve got a DR site that has both transaction engines and storage managers ready to go, but no application workload through that. So your fail-over point, your recovery time, is less because there’s no startup time. As soon as the application connects up here, it’s ready to go. We also support active-active. And there was an original question back here about geodistributed, right? We don’t do geodistributed active-active, but we do availabilities over metro area active-active, right, where your network latencies are 30, 20 milliseconds round-trip. We support the ability to have applications doing read-write workloads across two regions, metro-area regions, at the same time. We ensure that the data’s consistent, we’ll make sure it’s available, and you can scale out in any of those datacenters. We’re also trying to get to a three data center environment, where it’s active-active-active, where you actually get the benefit, a 33% benefit in utilization in that environment. Questions?
M13: What are those different companies? What are they looking for?
Ariff Kassam: So the question is, if I heard you correctly with the question was the customers, our customers, what were they looking for? Quite honestly, the most common things is elastic scaling. These customers are migrating their applications to the crowd. As part of a SAAS software that they’re providing out there to their customers, they now manage the database. And they don’t have -- they didn’t want to architect a system for customers that they don’t get in three years. They want to be able to start with a database installation that meets their demand now and grows over time as their crowd demand grows. So the ease of elastic scale-out was critical for them.
M13: Do you do any migration?
Ariff Kassam: Yes. So we’ve done Oracle and SQL Server migrations. We’re seeing a lot of -- we’re seeing a lot of both those. Did you have a question?
M14: Yeah. How do you ensure consistency when you go (inaudible) architecture (inaudible)?
Ariff Kassam: So the question is how do we ensure consistency. So that’s a great question. What I will -- I’ll try to summarize it and go into more detail offline. But we do have a lock manager for each -- a particular set -- so we talked about atoms, and I don’t know if you saw that part, right. So an atom is a set of data within the database. We have what we call this idea of a chairman. A chairman is effectively a lock manager for that set of atoms. But that chairman can be distributed across nodes. So we don’t have a fixed lock manager, we don’t have a fixed set of chairmans, we have a distributed environment where if there’s conflicting writes at the same time to the same piece of data, the first person that goes, that communicates to the chairman, will get that write lock, effectively. All right? And the chairman is effectively located -- is always located in the transaction engine.
M14: And what happens when you have a partition?
Ariff Kassam: I’ll get there in a second. So the question was what happens with a partition, and I’ll get to that -- I have a slide on that specifically. Ha! Next slide.
So we are a CP system. So it -- if you guys are familiar with CAP, right, continuous -- consistency, availability, and partitioning tolerance. In the face of a partition tolerance, in the face of a network partition, to avoid split brain, we maintain consistency. We don’t maintain availability. So what happens -- if I’ve got a network partition, in order to maintain consistency, we will automatically shut down nodes on one side to avoid splitting them.
M14: But what if you don’t know which side you’re on? You may do the wrong side.
Ariff Kassam: There’s no you, right. This is done automatically. We will shut down the minority.
M14: No, obviously there’s no me. I’m the application.
Ariff Kassam: You’re the application. If you’re an application over here, if you were a client that connected up to the application over here, and there’s a partition, we will automatically shut down the database nodes here -- I don’t know if I’m supposed to touch this.
M15: It’s all right.
Ariff Kassam: We automatically shut down these nodes, so the application will see a failure. So one area of -- from a (inaudible) perspective, one area that we are actively working on is the ability to tolerate partitions. In that case, what we’re going to be able to -- what we’re trying to do is allow the database to diverge -- be locally consistent but diverge over time -- when the network yields, similar to a golden gate or other types of application systems, the tech complex and dissolvers conference. The person back there -- sorry, hard to see.
Ariff Kassam: So the question, do we need cache between them?
M16: Yeah, do you replicate the cache between them.
Ariff Kassam: No, so we don’t actually replicate -- sorry, the question is do we replicate the cache, and the answer is maybe. (laughter) If I’ve got -- if I access a piece of data at this transaction -- say it’s my customer record, right, I update it here. And then for some reason my wife’s over here and she wants to update my profile as well, on this side. So that data is now also cached -- my record is also cached in this transaction engine. Any changes that happen on either side will get replicated to both, as well as to the durable disk. So the cache is replicated, so long as that data is in memory somewhere else.
M16: How do you avoid the conflict / collisions?
Ariff Kassam: So again, the idea of what I talk about the chairman, right, so that record atom, my record, has a particular chairman. Say this transaction engine owns the chairmanship of my record, right. If I do an update here at the same time that my wife does an update here, the first person to the chairman will get the lock. Since I am local here, I will get that lock; she will not.
M14: But if they happen at the same time, you may not have control over there. I mean, this is the problem that everyone is trying to solve. So if you are saying that you shut down the outer part of the network, that’s your decision. You basically shut down the outer part of the network. And by doing that, you probably will create a situation where, in a lot of cases, the system will not be available, so your website or whatever your clients will not be able to use it. If they’re okay with that, that’s fine, but if you’re trying to do somewhere in between, you know, get the lock on one side and not to get on the other side, it will lead to chaos very, very soon.
Ariff Kassam: So there’s a number of questions there. Can we take this one offline, specifically. I mean, do you have a specific -- I’d be happy to answer the question, but can you -- are you talking about network partitions or the lock manager? Or both?
M14: They are kind of the same, from this perspective.
M6: I mean, think about it. If you have a trading application -- I mean, anything that involves concurrency, but trading is a great one -- that you might have traders in London, traders here, traders in Tokyo, right, and they build -- they use a SunGard-type application, pump data in, you suck all that in, you put all your algos around it, etc., and then basically you want a concurrency on that pricing and you put a lock on one part of it, you can’t wait for that lock manager to basically, you know, determine which one to unlock, right?
M14: You can wait, but your clients will (inaudible).
M6: Exactly. Exactly, that’s fair.
M14: Will it be okay for them wait? Basically because this will happen all the time.
M6: Well, the challenge is that you’re maintaining integrity, which is right.
M14: But if you start tampering with the states over there --
M6: Yeah, exactly.
M14: -- you immediately inflict chaos, because no control.
Ariff Kassam: So let me see if I understand your thing. When a write comes -- when a conflicting write comes on a non-chairman node, the transaction engine has to communicate with the chairman. If it can’t do that because of the partition, then that request fails.
M6: And everyone’s been screwing around with Veritas and RAC for 15 years to optimize that, across WANs and -- we all just play the same game. There’s no magic button.
Ariff Kassam: There’s no magic solution here.
Ariff Kassam: What we’ve done is distributed the cache across multiple nodes. Those nodes could be in different locations. Again, I’m putting this upfront -- we’re not geodistributed, so that particular example is not the best, but I can still see the same situation between New Jersey and --
M6: Healthcare, same thing. Healthcare’s all the same.
Ariff Kassam: So we’re distributing the cache up here to allow quick access to applications -- access here, right. But on writes, it will coordinate with the transaction manager somewhere, on this side or this side.
M6: The question is is your in-memory -- you have it in memory later; that’s hierarchical. Right? Because that’s what a lot of the legacy database vendors are doing -- SQL Server, Oracle database -- they’ll use their in-memory engine, because they all bought one, basically, or five, to figure out which one works -- and then basically use that to create the vector for that little cache that you’re actually trying to get a transaction from.
Ariff Kassam: Okay. This is going a little bit further than --
M6: All right, all right, that’s fine.
Ariff Kassam: Let’s take this one offline. Yes?
M17: How do you tell apart -- so you said if a specific node fails, the system pretty much continues. How do you tell the network segmentation from a node failure, specifically, let’s say, a node that happens to be a chairman for one of your records, just, you know, collapses, crashes, whatever, goes away. How do you differentiate that from a network segmentation --
Ariff Kassam: Again, that’s a very good question. So this is a peer-to-peer network, so everybody’s pinging each other, right, and so once there is a -- once one node detects a failure to another node, it advertises that to its peers, and they all --
M17: So they vote them out?
Ariff Kassam: They vote them out, effectively.
M17: So a new chairman gets elected?
Ariff Kassam: Yes.
M17: So the old chairman forgets the (inaudible)?
Ariff Kassam: It’s not chairman anymore.
M17: Yeah, starting off based on that distributed cache to figure out that one of its vectors that it thought was the latest, greatest in machine --
Ariff Kassam: Yeah, it’ll sync from the other people when it wakes up.
M6: It’ll do a destructive --
Ariff Kassam: Yes, yes.
M6: Yeah, all right.
M17: (inaudible), you know, it could have (inaudible), but, you know, whatever, but nobody else got (inaudible) --
Ariff Kassam: Right. So if it had a transaction that it wasn’t committed, right, and it’s not durable, so it’s lost.
M6: Well, see, that’s a rough vacation policy. You take a nap and you lose your job and get voted out of office.
M14: And it’s not only a lost, the data is lost. In that situation, data is lost.
Ariff Kassam: No, so again, the data might be written here, but if it’s not durable, right, it’s lost. If a transaction engine -- if I get one commit -- so again, so there’s also this idea of commit and committing, right. When I do a commit, right, we have policies -- there’s (inaudible) commit policies within the database. I could accept -- I can accept the fact that that would be -- the commit was (inaudible). I receive the commit at the transaction engine, and then I can go back to the application and say I’m done. In that case, the failure of the transaction engines, we lose this data.
M14: That’s right.
Ariff Kassam: Agreed. We also have the ability to say, “I at least need to hear back from one storage manager before I say that.” Or, “I have to hear back to n storage managers before I turn that (inaudible).”
M14: (inaudible) consistency or full consistency? (inaudible) your full consistency model, if you kind of disqualify upon any network partition the whole portion, then that will become failure for the clients. The clients will get many more failures.
Ariff Kassam: If I -- again, I think we’re going around in circles on this, but if I lose the network, all these nodes are gone.
M14: Right. So the applications will disconnect.
Ariff Kassam: The application is gone. Agreed, right. Again, we say consistent, not availability.
M18: So what happens when the chairmen, when it fails? What happens next? Does it (inaudible) to challenge it or what?
Ariff Kassam: So if the chairman was on this node and this node failed, right, we have this concept of nodeless. Where is this particular atom in cache anywhere else? The next node in that list. It could be here. If that data was here, it could be here, or it could be here.
M18: So it inherits it.
Ariff Kassam: It inherits it. And then we also have algorithms to do what we call chairman migration. If, for example, the data’s here, and the next node on the list is over here -- let’s assume that this is still up and running -- the next node is over here, the chairman moves over here, but all the workload for that data is mostly over here, we will migrate it to that point.
M18: So is that done by (inaudible), or?
Ariff Kassam: Yes, it’s all done automatically.
Ariff Kassam: Today we can’t. May be something we can add in the future. But generally the migration of the chairman is pretty fast.
M18: So (inaudible) used to determine which node is the right one to migration?
Ariff Kassam: Right now it’s just a list. We just go through the list of (inaudible) chairmans.
M18: Just at random.
Ariff Kassam: Yeah.
M19: So when you’re doing the multiphase commit thing that you’re describing, I’m guessing that the minimum requirement for a commit is that the transaction node is happy, the chairman of that record says that he’s happy, and one of the storage nodes says that he’s happy?
Ariff Kassam: Again, it’s tunable, so you couldn’t do it, but the chairman is not required for a commit to happen. Right? The chairman’s required to be able to do the write, right, but once that doesn’t commit, then either that acknowledgement comes from the transaction engine or the storage manager or both.
M18: Can I determine which node would get to receive the query?
Ariff Kassam: Yeah. So you can connect your application -- the question was can I determine which node gets the query. The applications can connect to any one of these transaction engines. So you can send your application workload to a particular transaction engine based on workload type. And again, we’ve got customers that are doing what we call HTAP -- hybrid transactional analytical processing -- where they’ve taken one of the transaction engines, put a lot of memory on that server, and are submitting a lot of analytical queries to that particular transaction engine so that that cache is loaded up and populated and quick.
M20: So. I know you’re getting a lot of bombarding questions, so I’ll try to simplify. (laughter) So NoSQL is addressing, you know, if you’re moving from traditional databases to NoSQL, there is a shift of responsibility to the developer more to maintain that --
Ariff Kassam: In general, yes.
M20: So if you’re moving -- and this is to address the traditional expansion of traditional databases like banking systems and so forth -- so is it that the learning curve for the traditional database developer, is that a large leap to provide solutions in your environment here?
Ariff Kassam: Generally no, because we support SQL, and it’s just a SQL abstraction layer, JMC, ODC, whatever type your fiber is. It looks and feels just like another database. The application is unaware of the distribution of nodes or the type of nodes that exist in this type of environment. We look like a single logical database to the application.
M20: And as far as those clients that (inaudible), is that more like traditional?
Ariff Kassam: Yes. It’s obviously a little more work on the operations side, because it’s a multinode distributed environment, just like any other environment, but again, where we’re going with this is integrating with the newer sort of orchestration tools that are out there that manage these types of distributed environments.
F2: Amazon Redshift’s (inaudible), how it differs with this.
Ariff Kassam: So Amazon Redshift is an analytical solution, right, so if you remember my eye chart of different databases, and Redshift was in the analytical, against Teradata, Hadoop, Oracle, Exadata. They’re more of an analytical system, not an operational system.
F2: I know. In terms of the dynamo?
Ariff Kassam: Dynamo is a more inductive database, it’s a NoSQL type of solution.
So I’ve got a -- hopefully this works -- I’ve got a demo, so just to set up the demo. This was a demo that we did at Red Hat here in -- sorry, not here -- in Boston, at the Red Hat conference show about a couple months ago. It is an integration of NuoDB with OpenShift. OpenShift is a Kubernetes solution, so what we did was we had applications in containers, we had our transaction engines and storage managers in containers. And what we did was scale out the number of transactions. This is a read-mostly workload, so we scaled out to 33 transaction engines, showing you throughput of transactions. We also showed a migration from on-premises to a cloud environment, so we took systems -- we had a database that was on-premises, and we added nodes in AWS, so we were adding application workload in AWS. So hopefully this works.
M: So one more question.
Ariff Kassam: Sorry, you’re going to have to wait. I can’t stop this.
All right, so the demo’s going to show availability, active-active hybrid cloud. All right, so we -- to orient you, the bottom left is going to be a topology diagram, the bottom right is going to be OpenShift, and the top right is going to graphs showing you transactions per second and latency.
So in this show, we sort of scale that to 33 or 31 transaction engines, and this goes out to 2.89 transactions per second, and this is the -- so this is Kubernetes OpenShift console. Each of these are pods or, sorry, containers. So in this case you can shut down the on-premises and the application will continue working. Sorry, it’s a little fast.
So that was sort of the summary. Each one of those steps will then be shown in the following images.
So that’s OpenShift. Graphs between transactions per second, and this line is the number of nodes. And then this is just a logical diagram. Right? So one TE, one SM, the application. So one TE getting 37,000 transactions per second. So here it’s showing you -- the right-hand side is on-premises, so one TE, one SM. You see you add another TE on-premises. There’s nothing in the cloud on the left-hand side. So we added another TE to the environment.
So in this case we’ve got four transaction engines. So what happens if a node goes down? All right, the application reconnects to another transaction engine and continues its transactions. All right, so scale out to four transaction engines.
So as I said, this looks like a single logical environment. You can have on-premises, so we’ve got four nodes on premises. You add a storage manager in AWS, and now you’ve got the data in the cloud. You can then add transaction engines and applications in AWS to start workload in the cloud. You can then add another node for redundancy for storage management, and you can start scaling your transaction engines in the cloud.
So in this case I’ve got on-premises environment and in the cloud. If I lose the on-premises, either part migration or failure -- so we’re just simulating a failure by shutting down all the nodes -- the application continues to keep running, but obviously with lower throughput.
And then finally, we just add a whole bunch of transaction engines out to 33, or 32, and you can see the transaction into scaling here, and going out to 2.8 million transactions per second.
The demo’s on the website if you’re interested in looking at it slower. (laughter) And it’s got a good voiceover, better than mine. So, quickly?
M14: When you’re saying in your demo that the system is shut down, part of the system got shut down, and the application’s still running, it’s only running because it connected to those nodes --
Ariff Kassam: Correct.
M14: -- which are in charge of --
Ariff Kassam: Which are available.
M14: If your engine’s connected to the nodes which are on the other side, that application is not running.
Ariff Kassam: Obviously, right.
M14: But you’re saying it’s running, and in fact, half of the situations is that it’s not running.
Ariff Kassam: My assumption is that you have, above the database tier, the application tier, or the client tier, you have load balancers to direct your client workload across your application servers that are available. Right? That’s the assumption that we’re going with. And that’s sort of general best practices for application development, is you’ve got different load balancer tiers and you direct client workload to what’s available. If I shut down the database on one side, the application servers will have no connection. So.
M14: This is at least confusing, because, you know, when you’re saying it’s running, it’s running only on those nodes which are still up. And this is a very valid assumption for everything.
Ariff Kassam: Sorry, I got to finish up, quickly. Very quickly, a custom use case.
Alfa Systems is a UK-based software company. They provide leasing software for assets. Most common is if you go lease a car from any manufacturer, most of the software behind that leasing infrastructure is Alfa Systems. They just did an IPO on the UK stock exchange, and it was the biggest IPO since, I can’t remember, 2000. It was a big IPO. Their challenge was that they’re offering a SaaS model to their customers and they needed a database to support their application, and they wanted elastic scaling [in terms of?] availability. They were using Oracle, they found it too expensive, and MySQL couldn’t scale. And so they chose us for scalability, continuous scalability from an active-active perspective, and the benefits was the 90% savings on Oracle.
This is it. (applause)
M1: Thank you, that was Ariff Kassam. Thank you all for coming.