Watch NuoDB's Trek Palmer at NYC Database Month explain the compromises traditional databases and NoSQL databases face in cloud environments, and see where he believes SQL is headed to meet the cloud deficits of other DBMSs.
(Eric): The first guy walking up is Trek Palmer who is chief engineer (applause) (inaudible) memory implementation for Java, so seriously (inaudible)? Exactly, he built it. (inaudible) is, he built it. OK and the other gentleman here is Wiqar Chaudry. He is the Tech Evangelist manager for NuoDB. (applause) (inaudible) large scale marketing databases for banks and for all of you who aren’t clear on what that means, it means all that spam mail that you get from Bank of America is through his system. (laughter) Please welcome (inaudible). (applause)
(Wiqar): All right, guys, while this is getting warmed up, let me just kind of lay out how tonight’s presentation is going to work. I’ll be giving everybody sort of a very high level five minute overview of the company, why the company was started, and then I’m going to hand it over to Trek, and then Trek will take over from there to do his magic, whoever it is that it is. Very good, so NuoDB is the elastically scalable database. What does that mean? Well, before I can answer what it means, I have to probably go back in time and try to share with you guys the great things that our friends at Oracle, Microsoft and IBM and the like gave to us back in the twentieth century. They gave us a powerful query language known as SQL. They gave us industry standards, data guarantees, such as ACID. They gave us tools like (inaudible). They helped our employees build up their skills on these data assessment tools that they provided us. And they let us put our existing data someplace where it was structured and safe and secure. But we’re in the twenty-first century and what the twentieth century databases cannot do is meet the twenty-first century challenges that we’re all dealing with, right? First and foremost commodity datacenters. So you talk about commodity datacenters, really the question is who do you want to buy your hardware from? Do you want to buy it from Best Buy or do you want to buy it from HP or (inaudible). And so you want to buy it from Best Buy. Especially if you’re a small company up and coming you probably don’t have the hundreds of thousands of dollars to spend (inaudible). You want to be able to (inaudible) big data and again big data means different things to different people. The way that we define big data is really the three-V definition: volume, velocity, and variety.
The amount of data that companies are generating these days is unbelievable and the amount of the data that’s being consumed by applications, such as marketing applications, the reason why Google knows everything about you. That is insatiable and the speed at which this data is being generated is also incredible and it’s also something that twentieth century databases cannot deal with very well. Bottom workload, again, that speaks to the big data challenge. Twenty-four by seven operations -- somebody mentioned The Dalles, Oregon, right? That’s where all the Google datacenters are. Well, there’s datacenters everywhere in the world, right? Microsoft’s datacenters are in Japan. Or some of their datacenters. Lots of datacenters are in places like (inaudible) Texas. Well, these datacenters are distributed all around the world to accommodate the 24/7 workload and operations that we as consumers are going to (inaudible) because it’s noontime somewhere in the world. Geo-distribution, again, I just mentioned datacenters being distributed all around the world. Traditional twentieth century database technologies were not designed to be distributed in that fashion. And then finally, developer empowerment. So, you go about all these employee skills, we have all these tools that we’re used to using and we need to accommodate the twenty-first century challenges to be able to empower our developers to use the existing skills (inaudible) have to solve these (inaudible). So, in short, there’s a crisis in the database world. And you bring up all these companies here, Amazon, Facebook, Flickr, Wikipedia, Google, not because they are models in terms of solving these challenges but in the sense that if you take a look at all their different data architectures, they use the very same technology for MySQL in various different configurations, some of them are using NoSQL in various different configurations.
Some of them are using a combination of MySQL and NoSQL, and then you have these companies that actually invented NoSQL technologies to kind of get over this crisis in the database world. And that’s a problem, right? So a lot of these companies are doing the same thing, right? They’re writing data, they’re reading data, they’re processing transactions, so why are their architectures so vastly different? Well, there’s lots of reasons for that but we can get into some of those a little bit later. So let me introduce to you a guy, many of you may already know him, obviously from the trip you're on. Jim Starkey is actually one of the founders of NuoDB. He’s also our CTO. James Starkey started his career at DEC, as many of you guys may already be familiar with. He invented Firebird, Falcon, he invented the blob data type. He also invented multi-version concurrency control, which is also the heart of NuoDB in terms of how we manage our data. And Jim had this great idea. His great idea was to build a database on what we call the emergent architecture. And emergent architecture is something that exists in nature. So the example that we give here is a flock of birds. So a flock of birds takes off and a flock of birds lands, there’s no real leader. There’s no master bird that’s orchestrating the entire flight process. And we as humans, we also believe, we also behave in the same way, right? So if you think about all the big movements that happen on Facebook and on Twitter, that’s also considered emergent behavior. There’s no leader of, let’s say, the Occupy Wall Street movement. It’s just a bunch of random people that are out there doing crazy things or just (inaudible). But our database is built on the idea of emergence. So before we get into the architecture of the database, which Trek will cover, I just want to show everybody a couple slides that prove that our database works, right?
So when you add a database to your system, you’re able to process a certain number of transactions per second. When you add a second node to our database, your throughput, in terms of transactions per second, should roughly (inaudible). But add a third machine, again, you should see a significant increase in performance in terms of throughput. And what we’re trying to prove here is that we’re doing all this without sort of the traditional administrative hurdles that you have to go over, go through in traditional databases, like (inaudible)and MySQL or (inaudible) that you add, the more scalability that you’ll be able to get. And we’re not just adding nodes, we’re scaling your database and you’re keeping your database across all these nodes a signal to (inaudible) because there’s no sharding or partitioning or anything like that going on. So from an application perspective, when you connect to our database, your application sees a database, it doesn’t see the hundred nodes that your database might be distributed on and (inaudible). We’re also able to scale out on information (inaudible) service. What does that mean for you guys? Well, it means that you can spin up nodes on Amazon as needed. So as your demand goes up, you can provision dynamically additional (inaudible). We also support what we call a heterogeneous deployment infrastructure. So for example, let’s say you’re in a tight crunch and you’re a startup and your application goes viral overnight, and all you have left in terms of processing power are your laptops. So every single employee can get out their laptop and install NuoDB on them, and keep your applications running through the night so you get funding the next morning, and boom, (inaudible). (laughter) That’s absolutely possible with our database. So without further ado, I don’t want to take up too much of your time with my stuff. I want to introduce to you guys to Trek Palmer who is our Chief Engineer. 
So once again, this is Jim Starkey’s idea but there’s a lot of people that are involved in actually executing his idea and Trek is actually one of those people and he plays a very integral role in bringing this (inaudible) to market. Trek. (applause)
(Trek Palmer): So as (inaudible) said, I’m Trek Palmer. I’m a Chief Engineer at NuoDB, and I’m going to talk about the architecture a little bit. So the agenda (inaudible) so first I should talk a little bit about myself because (inaudible) pay attention to me. The second is (inaudible) overall (inaudible) dive into the architecture and I’m going to give you and example of some of the natural acts that I, and probably many people in the room, have been forced to perform on a normal daily basis and, it’s OK. I was young and I needed the money. (laughter) I had (inaudible) I’m going to give a quick demo just starting with the database (inaudible) show you how easy it is to actually (inaudible). And then here. So (inaudible). OK, so I am a refugee from academia. I was in the PhD program up in Massachusetts, the University of Massachusetts. I researched programming languages and especially transactional (inaudible) with a huge reaction (inaudible) not a lot of people were into software transaction (inaudible) but I spent a lot of time working on (inaudible). My last professional gig before NuoDB, is I worked on the (inaudible) meta database for the (inaudible) system content platform. It’s a (inaudible) and they had us distributing databases and (inaudible). So (inaudible) it’s elastically scalable. Another nice feature, but it’s mostly (inaudible). So it actually means is that you can have lots, and lots, and lots of databases. So we’re not talking about having one database that has a half a dozen (inaudible). We’re talking about the ability for the system to provision what we call a (inaudible) pool (inaudible) resources. And say you’ve got a thousand machines. If you need to run 10,000 tiny databases, and run 10,000 tiny databases and (inaudible) or you have one giant database to process (inaudible) the machines. (inaudible) refers to that possibility. 
You can tell the system where you want to put databases and you add resources to a particular database on a particular (inaudible). This is transactionally consistent. So we don’t make compromises. So we’re not a NoSQL solution, we are a NewSQL solution and that is buzzword and what it really means is we’re actually committed to being transactionally consistent, your application doesn’t -- you don’t have to reinvent a database every time you use our data. So you would (inaudible) a NoSQL (inaudible). You have to sit there and think about how to make things consistent, how to deal with conflicts, all that sort of stuff. So you (inaudible) to make sure and you can actually interact with it. With SQL it would (inaudible) transactions. And because we’re supporting it across lots of hardware, and because of supporting lots of databases, we also are trying to make this as easy to manage as possible. So that came up when the question was asked how long does it take to install and people are throwing out numbers like 90 minutes perhaps half seriously, but half seriously. And the idea is that it takes almost no time to install it, you throw it on there, you start the little agent, and all of a sudden that hardware is available for (inaudible). All you had was a giant piece of metal and (inaudible) once a month, but it’s really easy. So Seth (inaudible) who’s going to give a talk, he’s actually out in China getting married right now, and he informs me that (inaudible) don’t call me on that. I’d probably (inaudible) it’s actually Chinese for (inaudible) specifically and (inaudible). So the architecture for the super high level (inaudible) is that we practically have three tiers. So the top tier is this management layer, and that’s where basically you just sort of add machines to the node. So that’s how you (inaudible) and say I want to group all these together, this is going to be my database. And then you have a transaction handling layer. 
There’s a series of transaction engines, (inaudible) and they talk SQL and these are the guys that actually handle mutating data in memory only and then keeping in sync with each other on all the different nodes. And then there’s a storage layer. The storage managers sit on the same or different nodes, and they receive the updates from the transaction engines, and they’re storing this in durable storage. And, again, spin up as many storage managers as you want, as many transaction engines as you want, and basically they need to be flexible so that if your application has particular requirements you can customize your database to your application rather than what most of us end up doing which is spending a lot of time customizing our applications. So the agents, the top management layer: an agent tells everybody else that this machine is available for use in the database. They have a (inaudible) messaging system (inaudible) and the (inaudible) so you can do scripting automation. You also have this (inaudible) and I’ll be showing some of that later. So a broker is a special kind of agent plus it has additional special knowledge, (inaudible) has a sort of global knowledge (inaudible). And the reason that you want something like this is because clients don’t have to know anything about a particular (inaudible). So if a client wants to connect to your database, all it needs to know is where a broker is and then ask the broker, “I want to connect to this particular database,” and the broker then redirects it to the nearest transaction engine that can satisfy the request. So our JDBC client is actually really simple. It’s a normal JDBC client with just a little bit of stuff in the front, which is (inaudible). So the transaction engines themselves, they’re these (inaudible) transaction processing units.
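To illustrate the broker lookup just described, here is a minimal Python sketch. It is not NuoDB's API: the `Broker` and `TransactionEngine` classes, the `distance` field, and the host and database names are all hypothetical, and "nearest" is modeled as a plain integer purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionEngine:
    host: str
    database: str
    distance: int  # hypothetical "nearness" metric; lower means closer


@dataclass
class Broker:
    # The broker's "global knowledge": every engine it has been told about.
    engines: list = field(default_factory=list)

    def register(self, engine):
        self.engines.append(engine)

    def connect(self, database):
        """Redirect a client to the nearest engine serving `database`."""
        candidates = [e for e in self.engines if e.database == database]
        if not candidates:
            raise LookupError(f"no transaction engine serves {database!r}")
        return min(candidates, key=lambda e: e.distance)


broker = Broker()
broker.register(TransactionEngine("hostA", "orders", distance=5))
broker.register(TransactionEngine("hostB", "orders", distance=2))
broker.register(TransactionEngine("hostC", "billing", distance=1))

# The client only names the database; the broker picks the engine.
print(broker.connect("orders").host)
```

The point of the design is that the client holds one broker address and a database name, nothing about the individual nodes.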
So they see the nodes, they set client connections, and they maintain their little slices of the data that we (inaudible) that they’re manipulating according to the user’s request and they’re (inaudible) one another (inaudible). And the whole thing is using multi-version concurrency control so you (inaudible) data to support a lot of transactions without anyone stepping on each other’s toes. And all of the replication from one node to the other in the background (inaudible). So the (inaudible) normal database, what we have is (inaudible) of replication where we decided where the main database is going to be and then (inaudible) where you’re going to keep your backup database and you have to send this (inaudible) and just watch it like a hawk and everything. You still have a transaction engine as long as you have (inaudible) message (inaudible) a little bit ahead and a little bit behind (inaudible). The storage manager is a (inaudible) point for (inaudible). So they also have (inaudible) mobile transaction engines and then (inaudible) are continuously writing this stuff (inaudible). So behind the storage manager is (inaudible) file system which is (inaudible). We also support a (inaudible) backend. So we don’t agonize that much about particular storage on disk or anything, and what that means is if you have any sort of storage (inaudible) they can support it. And each individual storage manager is going to have (inaudible) form of redundancy and (inaudible). OK, and now I’m going to go through (inaudible). OK, so sharding. People like to make fun of sharding. I like to make fun of sharding. So on (inaudible) database application (inaudible) application. (inaudible) clients coming in (inaudible) outside talking with (inaudible). And that’s fine. Databases -- all the twentieth century database technology departments are talking about a big part of that whole (inaudible) system is just completely dominated (inaudible).
So as long as you know (inaudible) at a particular node, and as long as (inaudible) particular node, this is fine. The problem is when you can’t, when you have too much load or you want to have some redundancy. And so one of the standards for (inaudible) around this problem is (inaudible). So it sounds simple. So (inaudible) cut our data in half and then the application will decide to which half it directs any particular client at any particular time. The problem is that it’s never that simple. All of a sudden now you have to (inaudible) application. And even though (inaudible) or we’re not going to do that, we’re going to push everything down and make everything [smaller?] after a while marketing, product management, and all those people conspire to ask for features that require (inaudible) consistency (inaudible) and so it becomes a problem all the time. And it turns out that the transaction (inaudible) very, very hard to (inaudible) databases is (inaudible) and difficult because there’s lots of (inaudible) all the time. It’s now (inaudible) I get is we’re going to take what the database is good at doing and it forces application writers to replicate it themselves. And that’s why sharding is a nightmare. And that’s just the beginning, right? Usually another thing that people ask for is (inaudible) application or (inaudible) and then you have to worry about choking things and you have to worry about managing memory because things are just way too big. And all these things just build up. And the funny thing is these are all problems the databases solve, right? But most databases can only run on a single node and that’s like part of the whole problem. So people have been solving this problem (inaudible) but there’s no need for that. (inaudible) database (inaudible) storing data (inaudible) and of course the worst one is (inaudible). So we need a scale chart -- here’s (inaudible) then because (inaudible) application (inaudible).
People say (inaudible) blah, blah, blah this is (inaudible) and that’s all true but no one ever gets it right the first time and it just becomes a nightmare. And everybody knows that. Just sort of like in order to get sharding right, expect to (inaudible) perfection on the part of (inaudible) database (inaudible) a little unrealistic. So (inaudible). So here’s the new (inaudible) version of (inaudible). So you (inaudible) host A, host B, host C (inaudible) broker (inaudible). So this is (inaudible) the ideal solution. So you’re running a transaction (inaudible) makes (inaudible) initial ideal world (inaudible) and the application (inaudible). But then of course those customers show up and they start driving up the demand. And your boss says, “Boy, we’re spending a lot of money. Maybe we should (inaudible) them.” And so you start adding things to the domain. But (inaudible) simple. You saw that host A is bogged down. So what are you going to do? You’re going to go to the management (inaudible) and (inaudible) everything’s fine. So now I’ve got (inaudible). And then -- so this is the initial sharding solution. (inaudible) well, we need to scale even more, we need another (inaudible). Now we (inaudible) same amount of time. We start a transaction (inaudible). Now you can handle (inaudible). And so (inaudible) adding a (inaudible) because there’s no sustaining (inaudible). The transaction engines are going to have access to the entire database. If there is -- we call them atoms, but they’re (inaudible) whatever transaction (inaudible) you just ask your storage manager (inaudible) gets it (inaudible) drops it. It’s basically like this (inaudible) memory cache (inaudible) with a giant (inaudible) processor on the front end. And so just add more when you need them. So it literally is as simple as just adding (inaudible). You can keep your application nice and stupid, which means you can get by with nice and stupid applications (inaudible) wonderful people.
And so the brokers are (inaudible) they had to connect (inaudible) all they know is that there’s a database named (inaudible) whatever at this terminal. It goes to it and connects to the transaction and they have no idea (inaudible) or anything, they just run with it. (inaudible) regardless of the number of nodes it’s running on. So (inaudible) and there’s going to be a lot of them when you’re writing SQL code, a lot of them you don’t know until you do something horrible to yourself (inaudible) database and so you find out (inaudible) consistency. You don’t have to worry about that. (inaudible) keep doing that for you. So another unnatural thing is (inaudible) consistency. So the way I like to think about it is (inaudible) late inconsistency. It’s just (inaudible) and bitten after it’s too late for you to do anything reasonable about it. So you have to -- it’s not transactionally consistent and application (inaudible) think about (inaudible) think about not caring about certain things, all the same actions, and basically what [conventionally?] consistent database is asking you to do is to make a tradeoff. For some people, engineers mostly, a tradeoff should be [anathema?] to us. But the tradeoff (inaudible). It’s a tradeoff between performance and correctness. Correctness is a really hard thing to trade on, it’s really hard to think about. It’s really hard to look at something and say, “This is slightly broken.” How bad could that be? That’s essentially what correctness tradeoffs are. It’s like, what if he gets it wrong sometimes? What does that mean? And that’s hard to think about. So NuoDB is consistent, transactionally consistent all the time, everywhere, on every node and you have transactions, you have thousands upon thousands of simultaneous transactions, everything’s fine. And when a transaction [committed?] it (inaudible) consistent.
So the way (inaudible) protocols (inaudible) is that you specify basically the number of sort of durable places that you want the (inaudible) to be registered at before you (inaudible) and so what you’re trading off there is performance and availability. So what’s committed is guaranteed (inaudible) guaranteed consistent through all time and all you’re saying is how many storage (inaudible) and that basically means what kind of availability (inaudible) survive. So you can decide basically at commit – when you’re (inaudible) you should decide sort of how much you care about a particular transaction. And so you can make tradeoffs within the same application (inaudible) availability and that’s a lot easier tradeoff to make between performance (inaudible).
(Trek): Oh, sure. (inaudible)
(Eric): (inaudible) because right now you’re picking the storage (inaudible) like the (inaudible) can you say (inaudible).
(Trek): Well, the transactions are going to (inaudible). So if you have three or four -- you know, if you have a thousand machines that are all running transactions, they’re all going to have -- could potentially have (inaudible) they’re all running however many transactions they want to run at the same time. (inaudible) order for it to be durable. So it’s an (inaudible) database. So in order for it to be durable (inaudible) it has to be durable somewhere. So all you’re saying is (inaudible) durable in one place, I care about being durable in ten places, or however much like availability you decide that you needed to (inaudible). So if that’s all you’re doing, you’re just specifying (inaudible) because eventually, (inaudible) the chain’s going to get replicated to the other storage (inaudible). So eventually every storage (inaudible) is going to have (inaudible). All you’re saying is what -- how many do you need to guarantee to have it (inaudible) to this before you’re willing to return (inaudible). So that’s what (inaudible) between performance and availability. You have nothing to worry about correctness (inaudible) returns it is actually (inaudible) and (inaudible).
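The trade-off described here, choosing how many storage managers must hold a transaction's changes before commit returns, can be sketched as follows. This is a minimal illustration under stated assumptions, not NuoDB's commit protocol: `StorageManager`, `persist`, `commit`, and `required_acks` are hypothetical names, and the sketch writes to every storage manager synchronously, whereas the talk describes the remaining copies catching up through background replication.

```python
class StorageManager:
    """Hypothetical durable store; `available=False` simulates a down node."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.log = []  # stands in for durable storage

    def persist(self, txn_id, changes):
        if not self.available:
            return False
        self.log.append((txn_id, tuple(changes)))
        return True


def commit(txn_id, changes, storage_managers, required_acks):
    """Report success only if `required_acks` storage managers persisted it.

    A low `required_acks` favors speed; a high one favors surviving
    machine failures. Correctness of committed data is not what is traded.
    """
    acks = sum(1 for sm in storage_managers if sm.persist(txn_id, changes))
    return acks >= required_acks


managers = [
    StorageManager("sm1"),
    StorageManager("sm2", available=False),
    StorageManager("sm3"),
]

# A low-value write: one durable copy is enough.
print(commit(1, ["clicks += 1"], managers, required_acks=1))
# A financial write: demand three copies, which fails while sm2 is down.
print(commit(2, ["balance -= 100"], managers, required_acks=3))
```

Both calls run inside the same hypothetical database, which mirrors the point that the durability level is chosen per transaction rather than per system.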
(Trek): Yes, so that’s actually something we’re working on right now (inaudible) cases. So the commit protocol is something that we’re (inaudible). And that’s (inaudible). But the idea is that it would be a (inaudible). So again, that’s one of the possibilities. (inaudible) OK, so multi- (inaudible) is another thing. (inaudible) So a traditional database can monopolize a single node (inaudible) unfortunately that time has passed, or fortunately because (inaudible). So in order (inaudible) many databases in a single (inaudible) so like I said, if you have a thousand machines, you can have one giant (inaudible) machine database or you can have 10,000 or 100,000 databases scattered around there if you have -- in this case we have lots and lots of (inaudible). And so each database can be set to scale (inaudible). So that’s the idea (inaudible) we figure maybe some of these guys are going to grow and (inaudible) more resources and (inaudible). So each (inaudible) process and they can only serve one database. (inaudible) many, many different processes operating on one machine and that’s something we call isolation. So basically you have many different databases running on the same machine and they’re actually different processes (inaudible). So here’s (inaudible). We’re back to the [three host?] domain, so you’ve got database A running the transaction on (inaudible) and the JDBC client (inaudible). And the (inaudible) so you’re deciding to scale this one (inaudible). And so (inaudible) host C and then you have (inaudible). And now you need another database (inaudible) preference (inaudible) is going to be called database one. So that (inaudible) host C (inaudible). (inaudible) and it just sits there (inaudible) but it’s all one domain. So (inaudible) two databases. And the (inaudible) adding transactions (inaudible) agents running (inaudible). And now (inaudible).
(M): So for example (inaudible). So what would be the total storage available for the (inaudible)? Is it like (inaudible).
(Trek): One terabyte.
(M): Yeah, but (inaudible) thousands more (inaudible). (inaudible) one terabyte (inaudible).
(Trek): That’s correct if you’re going to limit (inaudible). That’s why we support (inaudible) because (inaudible). The trick to that solution is to support massive storage, and then you can have (inaudible) writing to some giant (inaudible) the key value guys are going to be working really good at (inaudible) discs together and allow people read and write (inaudible) to it, and we’re just going to focus on how do we do that in (inaudible). So this is actually a positive thing. So you think of your normal databases, giants (inaudible) software sitting on top of the machine, right? Lots of agonizing decisions have been made over the decades about how exactly how (inaudible) each and every (inaudible). Databases have and in some cases they’re on file formats (inaudible) and things like that. (inaudible) and the problem is (inaudible) basically when you sort of optimize something in the storage (inaudible) compromising other layers (inaudible) and so all these engineering compromises actually start to pollute the stack and make it much more complicated. And so we simplified that whole thing (inaudible) for a small company, and we’re not going to spend a lot of time inadequacy people’s (inaudible). The other thing is people don’t care, right? Like you have these (inaudible) people want to write storage wherever they want, they decided to pool their disks in this fashion (inaudible) and we’re not making you commit to one solution or another. And so we can keep the storage -- basically the idea is the format of our object we optimize in memory processing and then we just worry (inaudible) as fast as possible (inaudible).
(M): OK, so if I have (inaudible) that I’m using (inaudible) is noted? (inaudible) So how much (inaudible) do I have to position (inaudible).
(F): In this transaction, is there a constantly [polling?] or is it just that once the transaction is committed you extend a portion to the (inaudible)?
(Trek): No, so the idea is that you’re sitting there, so you open a transaction (inaudible) a new transaction. Everybody finds out about that transaction, that’s an extremely likely process. And then you perform your operations, we’re using multi-version concurrency control (inaudible) associate with a particular transaction. Nothing particularly revolutionary about that. What is revolutionary is these changes are continuously in the background being (inaudible) everyone who has a copy of this (inaudible) and then when you actually request a commit, basically what that means is I want to make sure that all the changes I made (inaudible) transaction have [been made?] durable somewhere. So the storage managers themselves, they have a process that’s writing (inaudible) and they also have a (inaudible) which in some databases they call them (inaudible). So it’s basically sort of like logging things (inaudible) and so as long as there’s a permanent record somewhere in the system of every change that’s been made as part of your transaction, then it is in some sense (inaudible) and whether (inaudible) excited about unless you specify is how many of these storage managers have to agree if that transaction is completely committed in order for you to rely on the return to the client. So if you’ve got a transaction that you don’t particularly care about its durability, like you don’t care because if you lose a single machine, you’re going to lose the last five seconds of (inaudible) data or the last [ten minutes?] of data, let’s say it’s (inaudible) the lowest possible level so that it’s as fast as possible. But if you have something like financial data, for which you are legally culpable if something horrible goes wrong, you’re going to write it onto (inaudible) machines, (inaudible) regulatory agencies (inaudible) and that’s all within a single database.
You don’t have to do any of this ping-pong that people do now (inaudible) storing all this (inaudible) as fast as possible, (inaudible) to do all my financial processing and I had two complete software [stacks?] two completely different sets of (inaudible) one database (inaudible).
(M): That isn’t quite what I was asking. Are you familiar with the IBM product (inaudible) I think (inaudible) but it was very transactional and a lot of financial houses used it. But what it (inaudible) transaction and (inaudible) yes, there is. OK you do it and then commit and then (inaudible) transaction (inaudible). And that’s what I was asking when you were saying that (inaudible) passive transaction type thing or (inaudible) a transaction is done (inaudible).
(Trek): Well it’s SQL so you actually have to -- it’s like if (inaudible) you actually have to tell the system (inaudible). Which means you can (inaudible) start a transaction that’s (inaudible) database, they forget to close it, they go on vacation, and all of a sudden you’ve got some crazy thing. (inaudible) what the hell is going on? (inaudible) But like basically when you open a transaction there’s sort of this unending list of operations that are going on and then at some point you (inaudible) I would like to know when all this stuff is done. So the way that’s implemented is in the semantics in the SQL standard, if anybody’s unfortunate enough to have read that (inaudible) tell you that you’re not supposed to return from any sort of (inaudible). And since you already asked the database, it has to be (inaudible) somewhere. (inaudible) affirmative record of everything necessarily (inaudible).
(M): So you’re saying the action after (inaudible)?
(Trek): Well, no, this whole time it’s asynchronously propagating (inaudible) so it’s even got a transaction when you’re updating a thousand (inaudible) or something. You read a bunch, you update a bunch, you read a bunch, you update a bunch. As these things are going on, the transaction engine (inaudible) logging this and writing it out this whole time. So the whole idea is you’re hiding all this latency by (inaudible) the background and then you reach (inaudible) and in many cases it’s going to effect (inaudible) because when the storage managers have done their job, they tell everybody about it. So then a lot of people know that a transaction’s committed to (inaudible).
(M): (inaudible) each transaction has to be affected on the hundred nodes by the end of the day, right?
(Trek): Well no, so again, ideally like the (inaudible) case all you have to do is have all the (inaudible) from one transaction stored on one (inaudible), that’s it. Because (inaudible). So if they make their way from one storage manager, all the other storage managers (inaudible) the systems (inaudible) going to get released. That’s actually (inaudible). And in the (inaudible) they pull (inaudible) their memory (inaudible). So we are consistent everywhere. We’re not eventually consistent. We are simultaneously consistent (inaudible). So every node knows all about the changes, it’s just made (inaudible) little bit ahead in the last few years, but because those (inaudible) also made changes and those (inaudible) are inbound, you’re also a little behind (inaudible) but through the magic of multi-version concurrency control we can manage all this and basically minimize (inaudible) incredibly low level and then make it very, very fast.
(M): So is it true that if you have a hundred nodes (inaudible) across every single one of those (inaudible).
(Trek): Right. Unless you have an extraordinarily popular (inaudible) right. So the idea is that (inaudible) so whatever [atoms?] the clients can just add individual node (inaudible) process their transactions are going to be (inaudible) and doesn’t need those atoms (inaudible). They’re going to get dropped out of memory and it’s fine because you can always (inaudible) another transaction or some storage manager somewhere. And so basically over time you (inaudible) those transactions (inaudible) have some partition of the data that’s relevant to the transaction that it’s processing at the moment and through (inaudible) collection and aging of these things like has the workload changed on the transaction and (inaudible). And so part of setting up these things is you can actually specify how you want to provision these guys out. So if you have a small database, and you know it’s a small database, you can say I really don’t want you to (inaudible) a hundred (inaudible) or whatever it is. I would like to let it just fill up now or if you just want (inaudible).
(M): But (inaudible) have the same exact (inaudible).
(Trek): No, no, no. They just have to (inaudible). So I make a change to it, that means I’m going to be (inaudible) by making all these changes. My changes get propagated to these other nodes. So they made their changes and are receiving my changes. As my changes come in, basically it’s like another secret client changing my (inaudible) representation. Now all these changes are going on (inaudible) storage managers don’t actually process client requests, they just sit there and passively receive all the transaction (inaudible) and so as those flow in, the storage managers are continuously updating their views (inaudible). Standard database (inaudible) makes sure the disk is being written to as fast as possible so you’re getting like low latency and high (inaudible).
(Trek): Oh, are you talking about the transactional (inaudible)?
(Trek): No. So the client makes the transaction and then the transaction (inaudible) local to that transaction, even though (inaudible) it’s registered to all these other guys. The only client that can add operations to that transaction is the one that’s (inaudible). So we’re not (inaudible). So if you’re fortunate enough to (inaudible) pick some random example out of the air. And your transaction isn’t (inaudible) at the time, but that node is submerged, you’re probably (inaudible). But the whole point is (inaudible) transactionally assisted database (inaudible) if you have a faulty (inaudible) there’s no reason for you to expect any of that to be [in a position persistent?] at all (inaudible). But that’s the beauty part. People like ACID. SQL’s an oddly weird language, right? The semantics are really useful and meaningful and much easier to sort of think about (inaudible) and what you want (inaudible) world to be, but it is just sort of like deal with something that kind of writes things whenever it feels like it and then maybe your applications (inaudible). Because let me tell you, it’s going to be much harder to get it right every time you write the application from scratch. (inaudible) heart and soul (inaudible).
(Trek): Well it depends which storage manager, right? So you have all these storage managers, when you’re processing a transaction and you say commit, it gets committed to that storage manager, and that storage manager dies before that change is committed to anyone else. Then as long as that transaction is still around everything’s fine because [any other?] storage manager in half the system is going to pick up that change along with (inaudible). The problem you have is when you start having these horrible catastrophic failures and that’s why we allow (inaudible) because that’s basically telling the system like I’m super paranoid about this data and (inaudible) So you can have better (inaudible) in New York. You’ve got your hundred transaction nodes, you’ve got your ten storage managers but you’re paranoid. So you’ve got another storage manager (inaudible) I don’t know where I came up with that idea. And it just receives all these updates and is processing there. So only the big things like the big transactions along with the money involved (inaudible) you actually wait until (inaudible).
(M): (inaudible) just to add to that, let’s say you get your hands on (inaudible) database (inaudible) that you’re getting. You hit update and hit enter on your command line, because that’s what we’re using to connect to the database, (inaudible) shut that (inaudible) down (inaudible) and then they’re like oh, OK, (inaudible). Now, what’s going to happen is you’re going to find out -- the client’s going to return back some type of error and it’s going to (inaudible) and when (inaudible) it’s going to say, oh, that node went down. But guess what? I have like 800 other ones that (inaudible) right? So you (inaudible) and it’ll just -- next time you try to hit submit again (inaudible).
(Trek): (inaudible) like if you’re going to make this incredibly direct linkage between the (inaudible) and the web browser (inaudible) that’s not going to work for anybody (inaudible) but the (inaudible) we’re going with here is because you’ve got direct (inaudible) transaction (inaudible) by the time you establish that connection, if things do disconnect, it’s all on the application level, so basically just (inaudible). That’s not all that unusual, that’s what the (inaudible) do all the time anyway. So you have your (inaudible) database and it dies, I don’t care what people say about (inaudible) and crap like that (inaudible) phone calls happen, people run screaming to the bathroom (inaudible) go by and finally the backup database is on. You know, those are like the worst ten minutes of your life and for ten minutes (inaudible). So applications have to [retry?], it’s just in the nature of things. So you’ve got to retry (inaudible) all of a sudden what works for your (inaudible) takes me ten minutes (inaudible) be like that (inaudible).
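The retry behavior described here can be sketched roughly as follows. This is only an illustration under stated assumptions: `NodeDownError`, `run_with_retry`, and the `connect` and `work` callables are hypothetical names, not part of any NuoDB driver.

```python
import time


class NodeDownError(Exception):
    """Hypothetical error a driver might raise when its node disappears."""


def run_with_retry(connect, work, attempts=3, delay_seconds=0.0):
    """Reconnect and re-run the whole unit of work after a node failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            connection = connect()   # may hand back a different node each time
            return work(connection)  # the entire transaction is resubmitted
        except NodeDownError as error:
            last_error = error
            time.sleep(delay_seconds)
    raise last_error
```

The point of the sketch is that the retry lives at the application level: the failed transaction is not resumed, it is simply submitted again on a fresh connection.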
(M): When you say [durably committed?] to a storage manager, is that to the [journal?] or is that to the (inaudible) whatever (inaudible).
(Trek): Normally it would be to whatever (inaudible). Because if the entire (inaudible) itself happens to be written to disk before the journal’s caught up to that particular thing, then that’s fine.
(M): Doesn’t it go to the journal first?
(Trek): No, it goes to both places. So basically what’s going on is the storage manager is (inaudible) of these atoms. They’re being changed by (inaudible). And in the background, it’s writing them out as (inaudible). So (inaudible) but they’ve also got some kind of (inaudible) because that’s the ones that’s probably going to get (inaudible). And so in the current system I think (inaudible) is going to be (inaudible).
(M): (inaudible) reside in memory on some transaction managers?
(Trek): No, if you’re operating on it, yes. In fact it will reside on the ones you’re operating on. But let’s say your database goes quiet. You’re just a startup and only the people in Akron, Ohio care about it and (inaudible) and when the sun sets and people in Akron (inaudible) can’t get to your website, then everything’s just going to (inaudible) down, (inaudible) kick in, and the amount of memory used on your node is going to shrink drastically as the atoms are [purged?].
(M): So it’s not always in memory somewhere --
(Trek): No --
(M): How does that [caching?] mechanism work?
(Trek): Well so you need something in a transaction. So let’s say you’ve got your storage managers and they’re backed by (inaudible) data, you’ve got cold storage or cold transaction (inaudible) plus there’s this thing called master (inaudible) which sits on every transaction engine (inaudible) which basically is exactly what it sounds like. It tells you (inaudible) it tells you where a particular atom is. And so the (inaudible) do a really good job of translating god-awful weird kind of like English (inaudible) SQL into a (inaudible) that’s being run (inaudible) and that actually tell you which particular atoms (inaudible) is going to go to the nearest node it has (inaudible) so it’s going to go (inaudible) because in the background (inaudible) where everybody knows who’s (inaudible) so you’re going to go to the best node, the closest node. And the storage manager is going to have to read off of this and (inaudible) and then send it to you (inaudible). So that’s going to be slow. And the next (inaudible) is going to get it from you or your storage (inaudible). And so on, and so on, and so on. Basically you’re going to fill [up?] your memory and as the (inaudible) changes some (inaudible) are going to get dropped (inaudible) background. It’s totally concurrent. It’s not like -- (inaudible) we spend a lot of time making sure it’s not going to stop (inaudible) to drop things that are truly whole. It’s going to get rid of all those -- free up memory (inaudible). So as the working (inaudible) changes, (inaudible) are going to retire, new guys are going to come in to fill up that space and it’s just going to (inaudible). Only if you have -- only if your working set (inaudible) connected to your transaction and exceeds physical memory, you have a problem. But hey, guess what you can do? Start another transaction and then some of the (inaudible) will get shuffled over there. (inaudible) And that’s the whole point.
You see a node that’s stressed, you see a node that’s in danger, (inaudible) start another node that (inaudible). And that’s the idea behind the whole thing.
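The fill-and-drop behavior described above resembles an on-demand cache. The sketch below assumes a least-recently-used eviction policy, which is a stand-in chosen for illustration; the talk only mentions garbage collection and aging, and `AtomCache` and `load_atom` are hypothetical names.

```python
from collections import OrderedDict


class AtomCache:
    """Keeps recently used atoms in memory and drops the coldest one when full."""

    def __init__(self, capacity, load_atom):
        self._capacity = capacity
        self._load_atom = load_atom  # slow path: fetch from a peer or storage manager
        self._atoms = OrderedDict()  # oldest entry first, most recent last

    def get(self, atom_id):
        if atom_id in self._atoms:
            self._atoms.move_to_end(atom_id)  # mark as recently used
            return self._atoms[atom_id]
        atom = self._load_atom(atom_id)       # miss: go fetch it
        self._atoms[atom_id] = atom
        if len(self._atoms) > self._capacity:
            self._atoms.popitem(last=False)   # evict the least recently used atom
        return atom
```

Dropping an atom is safe in this model because it can always be fetched again from wherever else it lives, which is the property the talk relies on.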
(M): How does the JDBC driver know of each of the transaction (inaudible) it doesn’t need to.
(Trek): It does not. (inaudible) When it [goes?] it’s time to talk to a broker, and the broker is wise and all knowing. And it tells it where the nearest transaction engine is. So as (inaudible) out of existence, (inaudible) coming out of existence and they update the (inaudible) with what’s going on. And so then the [program?] can (inaudible). So all the JDBC drivers, plain old vanilla boring, (inaudible) stuff, which is just this little thing at the front (inaudible) just pointing at a broker (inaudible) the broker will respond go to this machine (inaudible) database. And then from that point on it’s (inaudible).
(M): So any of these like application codes or any client code, will never have to know the existence of any transaction (inaudible).
(Trek): All you need to know is it’s the simple (inaudible) single node database. You’ve got to know where someone is to talk to them, so you have to have a broker, a list of brokers that (inaudible) knows about and then it’s just something that has (inaudible) and if you (inaudible) application well, that’s all layered down (inaudible) all your SQL (inaudible) and then you’ve got all your (inaudible) web designer (inaudible) and other weird stuff (inaudible) and they never have to see any of it.
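The broker lookup described in this exchange can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Broker` class, the `distance` metric, and `find_engine` are hypothetical and do not reflect the actual NuoDB broker protocol.

```python
class Broker:
    """Hypothetical broker that tracks which transaction engines are alive."""

    def __init__(self):
        self._engines = {}  # engine address -> distance from the client (assumed metric)

    def register(self, address, distance):
        self._engines[address] = distance

    def unregister(self, address):
        self._engines.pop(address, None)

    def nearest_engine(self):
        if not self._engines:
            raise LookupError("no transaction engines registered")
        return min(self._engines, key=self._engines.get)


def find_engine(brokers):
    """Ask each known broker in turn; the client never lists engines itself."""
    for broker in brokers:
        try:
            return broker.nearest_engine()
        except LookupError:
            continue
    raise LookupError("no broker could offer a transaction engine")
```

The client only holds a list of brokers. Engines can come and go, and the next lookup simply returns whichever one is nearest at that moment.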
(M): The catalog is actually stored in (inaudible)?
(Trek): No, the catalogs are actually on the nodes. They’re (inaudible) everybody needs it. (inaudible) So you’ve got the transaction manager and the master catalog. The master catalog’s supposed to (inaudible) everything you want. The transaction manager tells you what the hell is going on. Between the two of them you’re pretty much set. And so everyone has a copy of that. (inaudible) serialized (inaudible) So the (inaudible) is making sure everything’s there. And that’s sort of how you can restore a lot of (inaudible) and so (inaudible) is this weird kind of thing and the SQL (inaudible) basically just (inaudible) and so they go through the exact same process as everybody else. So I mean who’s tracking the data (inaudible) from our point of view but (inaudible) it means there’s less of (inaudible).
(M): And if the (inaudible) dies then your application is pretty much toast.
(Trek): If the (inaudible) dies then you have (inaudible). Right, so it is just like I said. (inaudible) a little handshake (inaudible) once they’re connected to that node, if that transaction engine is still up and running and the database to which it’s connected is still functional, (inaudible) like the last guy running. Then you’ll sit there and you’ll never know about the outage because (inaudible) basically (inaudible).
(M): How do people (inaudible) connection (inaudible).
(Trek): So the broker is actually going to redirect you to a node but then once (inaudible). That doesn’t mean (inaudible) same database is going to be (inaudible) node. (inaudible).
(M): So exactly if you have enough (inaudible) to let’s say node one, and another (inaudible) connected to node 1,000 and (inaudible) same (inaudible).
(Trek): (inaudible) only responsible for the (inaudible) They’re only responsible for writing those changes (inaudible). So they’re actually replicating to each other. So every node replicates to every other node. (inaudible)
(M): (inaudible) on node number 1,000 would get the message from node (inaudible).
(Trek): If they both have the same (inaudible). So (inaudible) represented by (inaudible) random client (inaudible) then he will receive that (inaudible). So updates will (inaudible). So we leverage (inaudible). So people don’t have to do (inaudible). All you do is make an update (inaudible). That’s weird (inaudible). So all these things are just new versions on top of that. So there’s just a little bit of the tango that has to happen if you’re (inaudible). Basically it’s just (inaudible) and the first person (inaudible). In many cases an update doesn’t go away, it just gets a new version (inaudible) and then the transaction keeps marching on ahead. And so the whole system is set up so it uses MVCC so that you’ve got node one and he’s updating and node 1,000 is reading well that’s a race. It’s always going to be a race. It’s always been a race in the database world. So what you can do is you can sort of (inaudible) lock down the world and say like this (inaudible) node one, sorry, and then just do that. Or you can just have this multi-version concurrency control so that basically what that’s doing is it’s sort of overlaying a particular view of time on this whole thing, so that if node 1,000 (inaudible) exact same moment that the node (inaudible) update, it’s basically like node 1,000 did the (inaudible). And at that point you’re sort of like (inaudible) in time and space, everything’s fine. As long as there’s no conflicts, as long as everyone has (inaudible) and that’s what (inaudible).
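The idea that an update adds a new version while readers keep their own view of time is the core of multi-version concurrency control. A minimal sketch follows, assuming integer commit timestamps; `VersionedRecord` is a hypothetical name and this is not a description of NuoDB internals.

```python
class VersionedRecord:
    """A record that keeps every committed version instead of overwriting."""

    def __init__(self):
        self._versions = []  # (commit_timestamp, value), kept in ascending order

    def write(self, commit_timestamp, value):
        # An update never replaces the old value; it adds a newer version.
        self._versions.append((commit_timestamp, value))
        self._versions.sort(key=lambda version: version[0])

    def read(self, snapshot_timestamp):
        # A reader sees the newest version committed at or before its snapshot.
        visible = [value for ts, value in self._versions if ts <= snapshot_timestamp]
        return visible[-1] if visible else None
```

A reader whose snapshot predates a write keeps seeing the older version, so the race between a writer on one node and a reader on another resolves without locking.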
(M): (inaudible) special case or a different case of the same (inaudible)?
(Trek): (inaudible) so basically you’re saying (inaudible) nodes (inaudible). So in every partition there’s a problem. So we don’t want (inaudible) and it’s very unlikely that (inaudible). So we basically need to add one partition (inaudible) ongoing. And so that’s actually something that we’re actually working on at the moment with the idea that it would be (inaudible) partition would be (inaudible) and we’d keep on processing, plus, the smaller partition (inaudible) and stop processing (inaudible) and shut itself down until (inaudible).
(M): So (inaudible) take drastic measures to make sure that a partition can never happen? Because it doesn’t seem like (inaudible).
(Trek): So it’s not so much the partition never happened, it’s that you notice the partition and you fix it. Because multi-partitions there (inaudible). So you have (inaudible) but network partition means that you can’t actually use all these resources together. (inaudible) and so finally it gets resolved (inaudible) and you can start populating the (inaudible). And the storage managers may come back online and have this whole reconciliation process that goes on, so that they get all the updates that they missed. So everybody sees fresh data. The nice thing about this is unlike the (inaudible) separate it into partition, it takes forever because the network has to get all this data on disk from the source and that’s (inaudible). What happens in this case is the storage manager (inaudible) from the other (inaudible) that never had an outage. And so basically our transaction (inaudible). The minute (inaudible).
(M): So the application (inaudible) partition (inaudible) does someone somewhere have to be aware the application (inaudible) shut them down. (inaudible) and your (inaudible) nodes? (inaudible) shut the whole thing down or I imagine you would, right.
(Trek): Well, I mean, it sort of depends (inaudible) application (inaudible) so it’s got similar problems. But those nodes are basically (inaudible) exactly what would happen (inaudible). But they would somehow know (inaudible).
(Trek): So this is the Chairmanship (inaudible). OK, so if we’re given an atom it’s basically a chairman. All the chairman basically (inaudible) a fencepost and it’s the first person past the fence post in that race is going to get that version. So what happens is you sit there and (inaudible) continue on. Meanwhile the chairman gets to decide. Because (inaudible) update goes out and the chairman sees it and basically the first person to get to the chairman that particular sequence number is going to own that sequence number (inaudible) however, it’s (inaudible) the other guy fails, there’s lots of different possibilities that are just (inaudible) things (inaudible) sequence number and everything’s OK. You just keep on processing (inaudible) everything’s fine. So that’s basically how that’s handled. So in cases where you have to have a certain amount of globally consistent (inaudible) we have this chairmanship process, but it’s basically (inaudible).
(Trek): No, no, no. It’s not like there’s a chairman host and all the chairman (inaudible). It’s just a breaking in atoms. Some nodes (inaudible) to be chairman for that node. So chairman (inaudible) as atoms come into (inaudible). And if there’s no atom on any transaction node, there’s no (inaudible). And so that’s basically (inaudible).
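The "first past the fence post" rule for the chairman can be illustrated with a toy arbiter. This is a sketch under assumptions: `Chairman.claim` and the transaction identifiers are hypothetical, and the real chairmanship process is only partly audible in the recording.

```python
class Chairman:
    """Toy per-atom arbiter: the first claim on a sequence number wins."""

    def __init__(self):
        self._owners = {}  # sequence number -> transaction that claimed it first

    def claim(self, sequence_number, transaction_id):
        # setdefault records the claimant only if nobody got there earlier.
        owner = self._owners.setdefault(sequence_number, transaction_id)
        return owner == transaction_id
```

A transaction that loses the race gets `False` back and has to deal with the conflict, while the winner carries on processing.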
(Trek): I don’t know. So for instance (inaudible) but there’s no reason why you can’t have nodes on the east coast, nodes on the west coast. Basically because everyone’s pinging all the time, the nodes on the east coast (inaudible). If you (inaudible) on the east coast and all your storage on the west coast (inaudible) make their way to the west coast. It doesn’t mean they’re not going there. They’re just going to take a lot longer. And so if all you care about is committing whatever subset of nodes are on the east coast (inaudible) you can commit and just keep on processing and then the west coast becomes (inaudible) and vice versa. (inaudible).
(Trek): No, (inaudible). It depends on your application.
(F): (inaudible) can we get the demo because we’re going to have more questions.
(Trek): (inaudible) have four hosts, four brokers and (inaudible) storage manager on this guy (inaudible) now we add storage manager and then you can select however many transaction managers you want. (inaudible) And now we need to create a DBA account. And a node for your database (inaudible) if you want (inaudible) unfortunately the resolution is a little bit low. It’s going to show you here are the two things running, the storage manager and the transaction engine. So now this is just doing something really stupid. It’s just creating (inaudible) on this particular database (inaudible) five seconds, and then dropping (inaudible) and the whole point behind this is so I can show you these graphs sitting here. It takes every ten seconds but they’re just slowly pulling up here. (inaudible) You have (inaudible) and then you can go back. (inaudible) I just created a database and it started growing data. And then I come back here, I can see it (inaudible) and then let’s say for some reason I decided that I want to add a process, want to add a transaction engine, add another one here. (inaudible) in real time (inaudible) transactions per second it’s going through and (inaudible) and so as this thing sleeps you’ll see (inaudible). So as this thing basically disconnects and reconnects, it’s going to be bouncing back and forth (inaudible). And start a new agent (inaudible) here (inaudible) now there’s five of us. Bing. And that’s seriously how simple it is, right? So you have a machine (inaudible) and now you can start (inaudible). He’s sort of (inaudible) so he’s a lot slicker. (inaudible) I’m just a poor engineer with a foul mouth, so this is my (inaudible). (applause)
(M): (inaudible) we’ll get a few questions in that (inaudible) before but then it’s getting late so they’ll be around for one-on-one questions. He’ll stay as long as you guys want to hang around and ask them. Also, we’re giving away (inaudible) headphones. We’re going to pass around a bucket. Feel free to throw your card in there. You can potentially win the awesome (inaudible) headphones. The bucket is coming around. The first question.
(Trek): Well, the atoms are addressed by (inaudible) that’s the mathematical (inaudible).
(Trek): No, the atoms are chosen -- like the atoms are basically almost the same size, so they’re (inaudible). They’re optimized for good memory performance and (inaudible).
(Trek): Yes, (inaudible) is a very ambiguous (inaudible) different things to different people. So if you are talking about (inaudible) processing millions of transactions per second, (inaudible) so that’s part of the way that you handle that is (inaudible) being able to -- quickly being able to respond (inaudible). If that’s what your definition of [big?] data is, that’s (inaudible).
(Trek): (inaudible) part of what we’re doing (inaudible) typically in an OLTP environment you’re replicating (inaudible). One of the things that’s really cool about NuoDB is because you have all these different nodes, if you’re running a long running query on node one, that doesn’t block (inaudible) engine from (inaudible) other existing (inaudible). So you don’t have to do this other separate replication process to handle some crazy messed up (inaudible). So you can continue to process your (inaudible) and it’s (inaudible). (inaudible) you’re not going to have stuff vanish out from under you (inaudible) weird consistency issues (inaudible) that’s why in a lot of cases people replicate (inaudible) special purpose (inaudible). So let’s say you’re running (inaudible) and you’re updating records. But at the same time you’re running a long running query where you need to sort (inaudible) merge data all together, with (inaudible).
(M): I think you mentioned you’re working on being able to say X number of storage managers (inaudible) participate in the transaction, is it just a number or can you give that names? Can you say like where? Can you say I want these two in this (inaudible) and this one --
(Trek): So again, that’s part of the ongoing sort of basically whatever you specify is going to translate to a number. So there’s certain things that can make sense. So one is specifying a subset of them, one might be specifying majority. So at the moment the actual mechanism is just going to be specifying a number. Like so it’s (inaudible) technology in it. It’s going to be a big part of the (inaudible) release. But the specifics, it’s not single standard. (inaudible) I specify what that means (inaudible) I can’t really say anything specific about that. But we’ll be able to support (inaudible) to a number where it’s between zero and N where N is the number (inaudible).
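The commit rule described here, where whatever is specified translates to a number between zero and N, can be sketched as a simple check. The function names below are hypothetical, and treating "majority" as `N // 2 + 1` is an assumption about what that option would mean.

```python
def majority(total_storage_managers):
    """Smallest count that is more than half of the storage managers."""
    return total_storage_managers // 2 + 1


def commit_is_acknowledged(acks_received, required_acks, total_storage_managers):
    """True once enough storage managers have confirmed the commit."""
    if not 0 <= required_acks <= total_storage_managers:
        raise ValueError("required_acks must be between 0 and the number of storage managers")
    return acks_received >= required_acks
```

With `required_acks` set to zero the commit returns without waiting on any storage manager, and with it set to N the commit waits for all of them.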
(M): OK, Trek and Wiqar will be here for questions. Thank you all for coming. (applause)