NuoDB CTO Seth Proctor walks through the challenges and benefits of alternative database architectures - from sharding to running NoSQL to taking a chance on a distributed SQL architecture - and highlights how the more common alternatives compare to a distributed SQL architecture.
(Eric): All right, we are going to very soon give away our drone, which we believe flies, or it may be a rolling one. OK, you’ve got to put your business card in that bowl. Please don’t leave it on the -- oh, you’re filling out? That’s OK. When you’re done filling it out please pass it back. Put your card, business card, in there. If you don’t have a business card you can draw your own business card on those sticky notes. We don’t give E for effort so if you get like really creative it’s cool but you still might not win. (laughter) But whoever throws their card in there has a chance to win. Please keep that moving around the room and we will get started right now. There is one empty seat here. Somebody abandoned it. Nope, sorry for giving away your seat. Oh, it’s yours? OK. Our speaker this evening has 15 plus years of experience, holds eight patents, and formerly worked for both Nokia and Sun Microsystems. Please welcome the CTO of NuoDB, Seth Proctor. (applause)
(Seth Proctor): In the back can people hear me at this point? Good. First of all thank you for coming here tonight. This is great to be speaking to such a full room, such a lively room. This is great. Also, thank you for taking our T-shirts. [Knox?] in the back of the room particularly thanks you because he did not want to have to carry them home. So there are more. Please come up after the talk and feel free to grab one if you didn’t get one or if you got the wrong size. In addition to making T-shirts, at NuoDB we also do a database. (laughter) And I’m actually not going to talk a lot about -- OK, so the real question is how many times I’m going to trip over these cables tonight. I’m not going to talk a lot tonight about our database specifically. I’ll be around afterwards. I’m happy to talk about what NuoDB is. I will come back to it a little bit at the end of the talk. The short version is what we do at NuoDB is a relational database. Believe it or not SQL like became so uncool that it has come back around now and it’s like hipster cool again. (laughter) So we’re glad. Like rarely am I on the front end of a trend so I was kind of excited when that happened.
We are a relational database that does all the things that a SQL database should do, multiway joins, massively complicated transactions, you know, views, stored procedures, yadda, yadda, yadda, but designed with the assumption that what people are building today are distributed architectures. We’ve done client server for a long time. We’re now thinking about distribution, we’re thinking about cloud, and that’s the set of problems people want to solve and that’s what we’ve done at NuoDB. And what I’m going to talk about tonight is less NuoDB specifically and more what I think that means. You heard earlier I was at Nokia, I was at Sun for a very long time. For those of you in the room who have been in the computing industry for more than a few years cloud is not really a new thing, right? It’s just a new name. We’ve been doing grid computing, horizontal scale, on demand computing, you know, all kinds of different words for kind of similar concepts for a very long time. And so cloud, to me on the one hand there’s a lot of buzz around it, there’s a huge hype curve, on the other hand I think there actually are interesting architectural issues that come up when you start talking about cloud. And I think to me that’s kind of where the interesting definitions come from. When I talk about cloud it’s not about is this public cloud or is this, you know, private, on premise? Does it have to be Amazon, does it have to be OpenStack, does it have to be a set of protocols?
To me when you’re talking about cloud you’re talking about a set of architectures and a set of problems you’re trying to solve. And that’s really what I want to talk with all of you about tonight is kind of where I think that comes from. And particularly in the data management space why that’s challenging. Why kind of the approaches that we’ve taken for the last 10 years, taking those client server models and trying to retrofit them into this new set of problems, only gets you so far. And hopefully convince you all that those of us who have spent the last five or 10 years kind of stepping back and fundamentally rethinking architectures weren’t totally crazy. A little crazy maybe, but to survive this winter in Boston you had to be at least a little crazy. So that’s really what I want to talk about tonight is what is unique about cloud, what I think that is, and why we’ve done what we’ve done at NuoDB.
What do I think cloud means? I think it means many things, but I think there are a couple of common threads when people talk about cloud architectures that are really important, right? And if you’ve gone onto Amazon you’ve experienced this. You’ve tried running OpenStack. But if you’ve just tried to spin up some virtual machines and like get them running on a new server, right, or you tried to run Docker or you tried to do kind of anything where you’re running something in a slightly more virtualized fashion than you’ve done before, you’ve kind of gone through this set of issues, right? What I think people mean when they’re talking about cloud architecture is first I think everyone is talking about on demand capacity, right? Pay for what you want, get resources on line when you need them, shut them down when you don’t, right? And I think that’s really one of the reasons that Amazon is so -- did anyone see their numbers last week? Like $5.1 billion and growing at 50% a year? Anyway sorry. If you haven’t seen it Amazon like last week for the first time was like actually had to announce their numbers and they’re really -- they’re kind of huge.
But you know, that’s kind of, when you look at Amazon as a dev test environment that’s part of what makes it so powerful. It is so wicked easy to get on there and just spin up an instance when you need it, shut it down when you don’t, add a few more, go to another region and spin some things up. Right, that’s one of those really powerful ideas of cloud. It’s this like infinite pool of resources at your disposal. And you know, whether that’s public infrastructure or not what that really means is there needs to be a model for dynamic provision, right? There needs to be a way to get at those resources, to get things. And that gives you flexibility, right? That gives you the ability to get things when you need them. That gives you the ability to program the model you need kind of operationally. And increasingly the way people think about that is more and more towards the model. One of the things that people associate with cloud is getting away from those big vertically scaled systems that we certainly did at Sun, that other people did for years, right? You don’t go faster today or handle more capacity by deciding to buy like a bigger refrigerator and waiting six months for it to show up, right? You do it by spinning up a few more machines. My CEO and I have an argument all the time. He likes to say, “You want to go faster? Add more machines.” And I’m like, “No, no, we want more throughput.” It’s different.
(Seth Proctor): It’s going to be that kind of night, huh? OK, good. (laughter) That’s good. That’s kind of expectation setting. Thank you all, I appreciate it. But like I think that’s important, right? That notion of being able to get to commodity infrastructure but be able to solve all of those enterprise scale problems, right? And to me another thing that’s great about cloud that’s not on the slide is it’s very democratizing. Right? It’s like I loved hacking on giant servers when I was at Sun, but I love the fact that today I’m not a Sun employee and I can still hack on giant systems. And it’s because I can get to that kind of capacity and we’re building models based on this kind of demand model, not on hacking to be one of the few people who can get on one of these big systems. Big systems will always be there, right? Until the heat death of the universe there will still be like mainframe systems and they’ll keep going, right? And that’s just kind of the reality of our community today, but increasingly that’s not where we are, right? And increasingly that commodity model doesn’t just mean cheap, it doesn’t just mean kind of easy to get to, but it means hybrid, right? It means I can run things that might be in my data center and a public cloud or might be in multiple public clouds or might be just in my data center but on different kinds of machines with different kind of resources, right? Why? It’s cost effective, helps me solve problems efficiently, it doesn’t make me worry about the next upgrade cycle, right? Again, I can get things when I need them and shut them down when I don’t.
And how do you make all that work? This is one that we sometimes lose track of, right? But as you build these increasingly complicated systems you need to think about simplicity, right? Because when you start talking about well, I’ve got 10,000 nodes running and it’s across, you know, seven Amazon regions and I’m running four different OSs, you know, across 10 different data types and I’m using EBS, S3, and Glacier and everything else, and suddenly you’re like I have no idea what’s going on, right? And so that’s, you know, when Netflix talks about being wildly successful the way they’ve done it is they’ve built all these tools on top of it, to abstract, to simplify, to make it possible to actually manage at this scale, right? So when we’re talking about cloud we’re also talking about increasingly moving towards simplifying the programming experience, the operational experience, you know, the ability to understand what can happen. And that’s really one of the core ideas that we’re going to talk about today, why I think rethinking core architectures for data management is so important, because that’s what gets you to those much simpler models.
And you see this everywhere, right? You see this with monitoring APIs, you see this with management APIs, you see this with the tools we build. I’m sure everyone in this room has at least some exposure either to like being pleased with the simplicity of something or being infinitely frustrated with the fact that it feels like you’ve got great resources on hand but you can’t do anything with them. I won’t cause people to date themselves by raising their hand, but if you ever did, you know, MPI programming and were really frustrated that like your FORTRAN program or your C program or whatever was getting like 20% of the cycles across a cluster, like that’s what I’m talking about. Now it’s all JSON or something. I don’t know, whatever the kids are using now. I can’t keep up with it.
And resiliency, right? Resiliency. This is one of those things that I think is one of the real promises when people think about cloud computing, right? Is that if I run in multiple places then I can sustain failures, right? My service as a whole can keep going even as I lose pieces, right? And if any of you in this room is from the telecom industry, right, you can replace the word cloud architecture with telecom architecture and like a lot of this just works, right? Because like why is there a dial tone 100% of the time we pick up -- OK, for people who don’t know this there used to be these things called phones sitting like on a table and you picked it up and were like beep, beep, beep. You know, the dial tone, right? I joke, but seriously, I recently was doing this for a university audience and I got a whole lot of blank stares. I was like, you know, the dial tone when you pick up your phone and they’re all like (laughter). I was like God, I hate everyone in this room now. (laughter) But like the dial tone was always there and it has to always be there, right? And the reason it’s always there is because this is the same design methodology, right? You don’t just design with redundancy; you don’t just design with like another copy of something you can fail over to. You design assuming things will fail, assuming there’s like mad complexity, you build abstraction layers on top of it to make it simpler to work with, you build systems that by their nature not just are redundant but are able to react to failure and get themselves back into a working state.
And then you have a dial tone. You have a thing that is always there, right? And in the telecom industry that’s just like -- that’s taken for granted. In the computing industry we’re kind of starting to get spoiled enough by the public cloud that we’re getting there. And this is like one of those cases where I think getting spoiled is actually really good, right? Because it’s kind of making us all raise the bar a little bit higher and say like wait, that’s what I really want. I don’t want to have to be a Fortune 100 company to get this. Like I want that to be the baseline for my startup, I want that to be the baseline for any new thing I’m building today. It should never go down. It should always be there, right? The platform layer should always be there.
And that’s really like when we talk about benefits, like why we’re doing all of this cloud nonsense, right? We’re talking about being able to handle greater capacity, but do it in a cost effective fashion, right? We’re talking about higher availability and better failure handling because everything’s automated, everything’s monitored, right? Everything is designed with this failure model in mind. Commodity is great except guess what? There’s a reason it costs less, but it fails. You know what I mean? Seriously, when you spend like $1,000 for a disk or you spend $100 for a disk part of that’s the name brand and the hype and like people have been fooled into buying $1,000 disks for 30 years, but -- there we go. It failed and it came back. Hurray. Part of it really is that like the mean time between failures is radically different, right? Or the temperatures the disk can sustain or like the other weird things that you don’t expect, right? The error correction and things like that. So when you talk about commodity, great, things are cheaper, things will fail much more frequently, right? And so we’re building systems with that in mind and that’s starting to bubble up past kind of core infrastructure to software services we’re building, which I think is a really good thing.
Another benefit that I’m going to talk about a little bit later is that increasingly we’re taking all these ideas and we’re expanding them beyond the data center, right? And we’re really thinking about global operations and that gets really interesting, right? So I’m going to come back to that. What are the challenges, right? Anyone in this room who’s ever studied distributed systems or thought for like maybe two minutes about the challenges involved in like getting something off the laptop and onto say more than one machine, right, has appreciated that it’s an exercise in tradeoffs, right? Later on I’m going to define what I think a distributed system is, but the other definition possibility is it’s an exercise in tradeoffs. And when you decide to distribute something it brings challenges. It brings a lot of great benefits; it also brings real challenges, right? And some of them -- lots of failures happen with greater frequency in crazy, crazy ways. Not to pick on Amazon, but like anyone in this room who’s building anything remotely distributed, test it on Amazon. Because I have never seen things fail the way they fail on Amazon. And if it runs there it will run anywhere, I guarantee. Like seriously, take that one for a spin. For the record, I do love Amazon fiercely. It’s just that like, you know, I’ve never seen TCP connections like end up in the state that they can end up in Amazon. Like I still can’t explain it. And I’ve talked to Amazon architects and they’re like, “Magic, it happens.” No, it doesn’t happen. TCP is a thing. Like it’s supposed to deliver. How did you (laughter). It is much more difficult, by definition, when you distribute something to get a global view of what’s going on. If you want to understand holistically what’s happening as a service and now you’re running in lots of different places, guess what? That’s harder and it costs.
Security. It’s harder. You have more things running in more locations that can be attacked in more places. You have more copies of your data. You probably have more users accessing the system. You have more network entry points, right? Security is harder. Security -- if distribution is an exercise in tradeoffs, security is a process, not any one thing, right? You’re as strong as your weakest link and so as you have more links guess what happens? And managing the lifecycle of your data as a result is naturally harder, right? And everything else about distributed computing. For folks who haven’t done a lot of distributed computing stuff, you know, go home tonight and read like, you know, ask Wikipedia to tell you about the two generals problem or something. And like that’s -- you’re welcome, that’s the next hour of your life that you’ll spend sitting going, “Oh, that’s interesting.”
But the thing is that like you kind of understand fundamentally how to scale a lot of stuff, which is good, right? Which is why we’re growing at the rate we’re growing, right? I mean we do understand load balancers and NAT services. We have been able to build really good application level horizontal scaling. We have caches. We have content distribution networks, right? We have a lot of great technology. Down at the storage layer we understand how to like take arrays of disks and build that out to build increasingly redundant, resilient systems, right? We understand a lot of these pieces which is really great. The reason I’m here tonight is because of the database. Because despite all those other things the thing that is foundational to almost every application online is the thing that is fundamentally hard to distribute and make work in this cloud world that we’re talking about. All right? Why?
Why is it so hard? A couple of reasons. I’m sure there are lots of possible reasons. I’m going to offer at least one, right? You go back to the late 1960s, right? And there’s this guy Edgar Codd who’s sitting around thinking about formalism for consistency models, right? And thinking about a programming model for data. So he comes up with a set of rules that is really pretty cool, really foundational and fundamental, right? And it’s this kind of abstract calculus for how to reason about correctness of data. And then a bunch of people at IBM sit down and try to build something inspired by this. And depending on, you know, who you talk to either they kind of got it right or they were like wildly off base and like Ted Codd was angry and never wanted to talk to the guys at IBM again. But basically they went off and they designed this thing called System R, right? And System R was kind of this very early first take at building a database based on SQL, right? And for those of you who are SQL people, you know, you may love it, you may hate it. For those of you who are not SQL people my personal feeling is that SQL has at least one truly brilliant component to it which is that it’s declarative, right? You don’t tell a SQL database how to go solve a problem, right? When you’re doing MapReduce you’re kind of saying like brute force, here’s how to go do it. When you form a SQL query you’re saying this is the problem I want solved. You, SQL database, you go optimize it. You figure out how to rewrite it, how to do whatever with it, right? I mean this is what compilers do. This is like, you know, if Itanium had been successful, which it wasn’t because it was totally insane, but like had Itanium been successful like that was the value proposition, that like we’re going to do an even better job than that, right?
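That declarative contract can be sketched in a few lines of Python with SQLite (the table, columns, and data here are invented for illustration): the imperative version spells out how to compute the answer, while the SQL version only states what answer is wanted and leaves the access path to the engine.

```python
# A minimal sketch of the declarative idea: the same question asked
# imperatively (we spell out *how*) and declaratively (we state *what*).
# The orders table and its data are made up for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "east", 40.0), (2, "west", 10.0), (3, "east", 25.0)])

# Imperative, MapReduce-style: fetch everything and brute-force the answer.
by_region = {}
for region, total in conn.execute("SELECT region, total FROM orders"):
    by_region[region] = by_region.get(region, 0.0) + total

# Declarative: state the problem; the engine picks the plan.
declarative = dict(conn.execute(
    "SELECT region, SUM(total) FROM orders GROUP BY region"))

print(declarative == by_region)  # True: same answer, different contracts
```

The point of the second form is that the engine is free to add an index, reorder work, or parallelize without any change to the query text, which is exactly the compiler-style abstraction described above.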
SQL has this declarative nature and that’s -- again, that’s abstraction, that’s really powerful, lets you write your application logic and separately think about the database. So a lot of really good things came out of that early work. One of the limiting things was that just realistically in the 1970s not only did we not have much data, but we didn’t have many places to put it. Like disks weren’t big, memory wasn’t big, you know, bandwidth wasn’t a thing yet really. I mean there really wasn’t a lot of capacity, right? And so what did they do? They said OK, we’re going to come up with a database architecture where we’re going to essentially reinvent virtual memory. Maybe not reinvent because that wasn’t a thing yet either, but like we’re going to do virtual memory, right? We’re going to take data; we’re going to chunk it into things called pages. So we’re going to lay out the data. We’re going to say a table looks like this, a row looks like this, we’re going to write that row, you know, in order into memory, we’re going to map that memory directly onto disk. And guess what? Now it’s really cheap to take that chunk of contiguous data and pull it up into memory and it’s the right size page so that it fits on a disk really efficiently, so that it goes through, you know, your cache lines really efficiently, so we can work with extremely, extremely limited resources.
Really, really good idea in 1975. Probably a really good idea through the ’80s and maybe into the early ’90s, right? Fast forward to today, that’s fundamentally why when people talk about relational databases they’re still talking about vertical scale, they’re still talking about getting that bigger box to handle more capacity, right? Because the optimization is all about that pipe between disk and memory, right? And that’s really, really hard to scale up. You can try and apply caching, but guess what? Like your caches aren’t going to be aware of the transactional model, your ACID model, kind of your data model. And so it’s going to break the consistency rules, it’s going to break like what clients are trying to do in terms of interacting with each other. That’s a real problem, right? It’s also really hard to provide high availability at this point because you’ve got this single pipeline and if it fails what happens, right? I mean you can try to make a copy. We’ll talk about that in the next few slides, but like that’s problematic, right? It makes schema hard to evolve, right? So people hate schema because they’re like, “Oh, schema’s so inflexible. I’ve got to start out right from the beginning and decide what my data looks like.” People shouldn’t hate schema. They should hate the fact that this page based model means that making any changes to your schema requires going through and touching everything on disk and moving it around. And so it’s highly disruptive, it’s highly expensive. It’s not schema that you all should hate and I’m sure a few people in this room are like, “Grr, schema, schema.” It’s not schema, right? It’s inflexibility of schema. Structure is good. Understanding like what your data is and how to work with it is a really good thing. Being able to work with it through different views and like dynamically change that schema and look at the same data from different points of view, even better. We’ll talk about that later.
But like it’s that rigid notion of schema that is so challenging.
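The page-oriented layout described above can be sketched as a toy (this is not any real database's on-disk format; the page size, row format, and names are invented): rows are packed contiguously into fixed-size pages, which makes reading a row one cheap sequential I/O, and which also shows why changing the row format means rewriting every page.

```python
# A toy sketch of the page idea: rows packed contiguously into fixed-size
# pages. Any schema change alters ROW_FMT and forces rewriting every page.
import struct

PAGE_SIZE = 64        # bytes; real systems use 4KB-16KB pages
ROW_FMT = "<i8s"      # toy schema: (id INT, name CHAR(8))
ROW_SIZE = struct.calcsize(ROW_FMT)       # 12 bytes per row
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE     # 5 rows fit per page

def write_pages(rows):
    """Pack rows into fixed-size pages, padding the tail of each page."""
    pages = []
    for i in range(0, len(rows), ROWS_PER_PAGE):
        page = b"".join(
            struct.pack(ROW_FMT, rid, name.encode().ljust(8, b"\0"))
            for rid, name in rows[i:i + ROWS_PER_PAGE])
        pages.append(page.ljust(PAGE_SIZE, b"\0"))
    return pages

rows = [(n, f"user{n}") for n in range(10)]
pages = write_pages(rows)
print(len(pages))  # 2 pages hold all 10 rows

# Reading row 7 is one page fetch plus a fixed offset -- cheap and
# cache-friendly, exactly the 1970s optimization being described.
page = pages[7 // ROWS_PER_PAGE]
off = (7 % ROWS_PER_PAGE) * ROW_SIZE
rid, name = struct.unpack(ROW_FMT, page[off:off + ROW_SIZE])
print(rid, name.rstrip(b"\0").decode())  # 7 user7
```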
So this architecture is inflexible, it is hard to scale out, it is hard to provide a continuous availability model around, it is hard to harness commodity because by definition what are you doing? You’re optimizing around disks and bandwidth in a system that can like just scream through things. And that’s why if you talk to like a seasoned Oracle person they’re all like, “Well, tell me about IOPS.” IOPS, IOPS, IOPS, right? We’ve all been trained from the data point of view to care about IOPS, but cloud is not IOPS. Cloud is on demand scale and it’s running things in memory and those are contradictory ideas.
So what have we done? Like one of the things I think we’re all really good at, like everyone in this room, whether you’re a hardcore programmer, you know, whether you’re someone who kind of plays with different computer systems, whether you don’t identify as an operator or a DBA or a programmer or anything else, right? I think probably one of the common threads of everyone in this room is that what we do really well is pragmatically solve problems, right? Is recognize there’s a thing we have to do and if we don’t have all the right tools for it right now, well, maybe eventually the right tool will be there. In the meantime we have problems to solve and we use the tools at hand to figure out how to solve them, right? So what have we done as an industry over the last 10 years? We’ve done the best we can with the tools we have to try to solve these problems. And a couple of common patterns that we see, right? We’ve applied replication. We’ve said we want a database in one place and every time you make a change, copy that change over to another place, right? Or like if you’re really brave like do that in both directions and hope that maybe you got it right. I’m not going to go into the details of why that’s a -- oh. All right, we’re back.
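That replication pattern can be sketched as a toy log-shipping model (the class and method names here are invented): the primary records every change in a log, a replica applies the log asynchronously, and reads against the replica can lag until it catches up, which is exactly where the contract with that evil deity gets made.

```python
# A toy sketch of primary-to-replica replication: changes are logged on
# the primary and applied asynchronously on the replica, so the replica
# can lag. Names are illustrative, not any real system's API.
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []                 # change log shipped to replicas

    def put(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0              # position in the primary's log

    def catch_up(self, primary, max_entries=None):
        """Apply shipped log entries; stopping early models replication lag."""
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

primary, replica = Primary(), Replica()
primary.put("a", 1)
primary.put("b", 2)
replica.catch_up(primary, max_entries=1)   # models network/apply delay
print(replica.data)                        # lagging: only {'a': 1} so far
replica.catch_up(primary)
print(replica.data == primary.data)        # eventually converges: True
```

The window between the two `catch_up` calls is the window in which a failover loses writes, which is why bidirectional versions of this scheme get hairy so fast.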
A lot of good systems built with this model to provide higher degrees of availability, to provide redundancy and safety, right? But anyone who’s actually built a system around this knows where the challenges are, knows kind of what the contract is that you’re making with whichever evil deity you fear. We can take the other approach, right? Sharding. Oh, you know, we should have saved a [sophomore?]. Who knows where the word sharding comes from? So everyone in this room kind of knows what sharding is, right? You take a database and you say, you know, I’m going to decompose it into four databases and each of them has a completely disjoint set of the data, right? And now I can do like a subset in one place and a subset in another place. And that’s why like, you know, when you play your online game with your buddies you may or may not be able to play with your buddies because like they may be on a different shard and that’s really frustrating. Then you’re like, “Can I play with them?” and they’re like, “Yeah, it’ll take us a day to like figure out how to get you over there”, right? The game industry is actually where this term came from, right? So you go way, way back to one of the first massively multiplayer online games that people were trying to build and those developers were pretty sharp and they realized that their system was not going to scale, that they were going to have this problem where on day one there were a couple of users, day two they were going to get a few more users, you know, by day 10 they were going to get swamped and the whole system was going to go down because one database couldn’t handle it.
And then some -- again, this is like, you know, pragmatically how do we solve problems. Some bright person was like, “Wait, I’ve got it. We’re a game. We’re going write into the lore of the game that the reason people are playing this is there used to be one world and it existed in the view of this gem and then some evil force came through and shattered that gem into all these different pieces, shards. And so the universe was broken into shards and everyone’s going to play in a different place and ideally everyone’s working towards trying to bring the gem back together.” And of course that was never going to happen (laughter) but it gave them -- like the semantics of the game gave them an excuse for doing this. I think it’s just genius, right? It’s just like the epitome of make do with what you can and figure out how to do -- and the game industry is -- I spent some time in that world and it’s just awesome because you do get to come up with all these incredibly creative clever outs essentially. For the rest of us we’re like, “Yeah, sorry, you’re not on the same like database as your friend so like, you know, someone saw an update before they did. Like sorry, you know.” And that’s what sharding is, right? It’s this separation view of the world. And it lets you address more capacity, it lets you solve a number of real world problems, it also breaks the view of consistency because now you’ve got multiple databases, right? And you’ve got queries you simply cannot run because they cross these different shards, these different views of the world. Or you can run them, they’re very expensive.
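Sharding itself can be sketched in a few lines (the shard count, keys, and routing scheme here are invented for illustration): a stable hash sends each key to exactly one shard, so single-key lookups touch one database, while any query that crosses shards has to scatter-gather over all of them, which is exactly the expensive, consistency-breaking case described above.

```python
# A minimal sketch of hash sharding: each key lives on exactly one shard.
# Single-shard lookups are cheap; cross-shard queries must visit them all.
import hashlib

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]  # four disjoint "databases"

def shard_for(key):
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)  # touches exactly one shard

def scan(predicate):
    """Cross-shard query: must scatter-gather over every shard."""
    hits = (v for s in shards for v in s.values() if predicate(v))
    return sorted(hits, key=lambda v: v["name"])

for player in ["alice", "bob", "carol", "dave"]:
    put(player, {"name": player, "level": len(player)})

print(get("alice")["level"])                                  # 5
print([p["name"] for p in scan(lambda p: p["level"] >= 5)])   # ['alice', 'carol']
```

Note that `scan` also has no transactional view across shards: each shard is read at a slightly different moment, which is the consistency gap the talk keeps coming back to.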
And again, if anyone wants to like, you know, brain teaser, go read about what XA is tonight and, you know, go in to your colleagues tomorrow and be like, “I just read something awful that I want to unread now.” We can also take a third approach, right? Which is increasingly popular and this is the one that kind of keeps me awake at night which is that we can say you know what, who cares about consistency, let’s abandon that, people don’t really need that. Like there are a few, you know, high end applications that care about consistency, care about transactions. The rest of the world, we’ll just get rid of the model of consistency. Application developers are pretty good, they can probably figure out what to do, when there’s conflict they can probably sort it out. You know, when there are failures, eh, they can probably figure out how to get back in the right state. You know, if someone lost a few dollars along the way, meh.
You know, I mean seriously, like I make fun of it and laugh, but like that is the world that we live in, right? That is actually like a lot of systems that behave like that. Again, not because people are idiots, right? But because we’re trying to do the best with what we have, right? We tried [i-dunks?]. In fact, part of the reason I’ve been with NuoDB, I’ve been with the company now for about four and a half years since we first started, and the reason is because I did try to do this a few times and it just got increasingly frustrating and maddening having to explain to application developers or having to explain to myself when I wrote something that didn’t work correctly, what I’d done wrong. But I said you know what? No, I’m done with that. And see, here’s the thing about consistency, right? By the way, if this was like one of the cool kid hacker conferences where you had to like write a certain amount of Clojure to even be accepted or something like that, this would be one of those slides with like a kitty with like big eyes going, “I can haz consistency with something?” I don’t do that. Like I don’t do the clip art thing. Or like if this was like -- if we were challenging like James Mickens right now this would be like one of his angry like -- sorry, do people not know who James Mickens is? OK, so you’re all wasting time being here at my talk. Like stay because I’d like you to hear this talk, but seriously go home tonight, forget about the other stuff I said to read, go home tonight and go pull up one of his talks on databases because they are hilarious. Also he’s really smart, but they’re also really funny and they’re probably like pound for pound much more entertaining than this talk. I apologize in advance.
But seriously, consistency, here’s the thing, right? We’ve spent a lot of time over the last few years convincing ourselves that transactions are these very special purpose things, right? I hear people say all the time, “I have a few things that need transactions, but the bulk of my system doesn’t need it.” Transactions are complicated, transactions are expensive, transactions are whatever. Transactions are a thing that we invented to make our lives easier, right? Because they’re a programming model that gives you consistency, right? But consistency doesn’t mean transactions, right? Consistency is just a way of talking about the correctness of your data, right? It’s a way of understanding something about the state of your data, about the state of it relative to other actors in your system, and when things fail, and again, things will fail like crazy in the cloud, what state you’re in, right? In that previous slide where I’m talking about sharding, splitting things into multiple different databases, there’s still a consistency model, right? It’s not as strong maybe as the first one because you’ve got disjoint sets of data with disjoint sets of transactions and if everything fails you don’t really know how each shard is left relative to every other shard, but there’s still a consistency there, right? You can abandon transactions and still have a consistency model in a number of fashions, right? But when you get rid of consistency entirely, when you can’t understand anything about the state of your system, that’s when stuff fails at 2:00 in the morning and the pager goes off and like people are -- sorry, pager, that’s another one. (laughter)
Dial tone? Pager. Those of you who don’t know, there used to be this thing that you wore like right here and it would beep and you’d be like oh, now I have to go like call -- and all it would tell you is the phone number. It’s like go call this phone number, right? Which also probably had a dial tone. That’s pretty cool, right? But seriously like I was talking with someone who works at PagerDuty and he’s like, “Yeah, our employees, like some of them don’t know what a pager is. Like on day one we had to explain to them like this is why the company is called PagerDuty.” I was like that’s kind of awesome. And kind of sad. But like seriously, this is the thing, like beyond everything else, right? This is the thing that worries me that we’re losing track of. It’s not so much SQL. We can argue is SQL the right programming language or not. I like it and we can use a SQL database. You know, you could argue whether transactions and the way we implemented them in SQL are the right model or whether you want something that is a base model or whether you want something that is kind of coordination avoiding. There are all these really interesting techniques people are playing with today, right? But at the end of the day what you really want is you do want consistency because that’s the foundation that you start to build everything else off and reason about what’s going on in your system. And without a consistency model, without an ability to understand your data at any given point, there are all these crazy side effects, right? Side effects in terms of failure, side effects in terms of what like I see and someone else sees, right?
And I’m now going to cite the thing that everyone cites when they talk about consistency that’s actually not consistent, which is ATM machines. But everyone loves to talk about ATM machines. It’s like if you’ve only got 50 bucks in your account, like if you and your wife both go at the same time to take out those 50 bucks how do you make sure it works correctly? And the trick here is that actually ATMs are non-consistent. Like they are eventually consistent. They actually break this, but everyone loves to cite that example so I’m going to cite that example and say you want those ATMs to be consistent, but maybe you don’t. Maybe you both want the 50 bucks, but I’m saying like then what’s going to happen, is one of these going to be audited (inaudible) it’s going to be a mess and like you’re going to spend money that you don’t have, you know, and then the economy goes down. So, consistency. Sorry?
(M): The bank (inaudible)?
(Seth Proctor): Well, the bank -- [someone has happened?]. Yeah, someone -- other side effects that happen, right? When we’re not thinking about these models and we’re starting to segment everything and split into all of these different views of the world, one side effect of starting to take a database and decomposing it into all these different pieces that you can’t reason clearly about is that you get applications that are tied to your deployment model, right? So if I take a database that’s one database and I split it into two shards and I say like, you know, A through L is on one database and M through Z is on the other database, right? What’s just happened? I’ve now had to change my application logic to understand that. And if I want to change my database I’ve got to change my application logic again, right? Hence DevOps. OK, now so here’s the thing. I’m actually a huge fan of the idea that developers should appreciate the pain that operators live through day in and day out. And I’m a huge believer that operators don’t appreciate enough how hard it is to write application logic and should be fluent in certain languages. So like philosophically a lot of what comes out of the DevOps movement I really love because it really forces people to understand the world in which they live.
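The A-through-L, M-through-Z split described above can be sketched in a few lines of Python. The host names and function here are hypothetical, but they show how the deployment layout leaks into application code:

```python
# Hypothetical sketch of application-level shard routing: once a single
# database is split into two shards by key range, the application itself
# has to know the deployment layout to find a row.

SHARDS = {
    "users_a_to_l": "db-host-1",   # usernames starting A-L (hypothetical hosts)
    "users_m_to_z": "db-host-2",   # usernames starting M-Z
}

def shard_for(username: str) -> str:
    """Return which database host holds this user's row."""
    first = username[0].upper()
    if "A" <= first <= "L":
        return SHARDS["users_a_to_l"]
    return SHARDS["users_m_to_z"]
```

If the operator later re-splits the data, `shard_for()` has to change too, which is exactly the application/deployment coupling being described.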
That said, I kind of feel like the fact that we’ve had to go into this world is really problematic. And in fact, recently I was giving a talk and it was early morning, it was a beautiful thing, I hadn’t had enough coffee and I actually went as far as to say that DevOps is a bug, which got a couple irate questions from the audience. I didn’t mean that. What I meant was just simply that like why we have such a fascination with the DevOps movement is in large part because we have this tightly coupled application and operational model, right? And that’s the thing that drives me crazy, right? Because it means that when the application developer wants to make changes it’s hard to do so. And when the operator wants to say, “I just want to run in more places” it’s hard to do so. And all those things we were talking about earlier, on demand, flexibility, all those things go out the window, right? And that’s a problem. Failure handling is a problem, reasoning about your system is a problem. You end up with more independent pieces that are harder to manage, are harder to get that global view around and are harder to interpret when they fail, right? When you’ve got a system sharded into 100 pieces and two of those shards fail and you want to bring them back online, like what’s the state of your service? Like hopefully the application developer knows what to do with that.
This is all about complexity, right? I said earlier as we build increasingly complicated systems what we want is we want the net result of this to be simple, right? And my fear is that as we build things in this direction we’re building more and more complexity, right? Can you solve problems by doing replication, can you solve problems sharding, can you solve problems without consistency models? Yeah, you can. And our industry is like one massive existence proof that you can do it. Is it hard, is it expensive, is it increasingly complicated, is it at odds with everything we’re trying to do in cloud architectures, and is it going to be a problem as we look at the next generation of problems we need to solve? Yes. Yes. And that’s kind of my argument. And particularly when I say next generation, this is an example, right? Global deployment, right? Increasingly I talk to people where the baseline is not I need my system to be redundant on two machines. It’s I need my system to be running in at least two data centers. Like that’s the baseline. That’s not like a nice to have. That’s the one we can talk later about whether you can expand to 20 data centers, but like two data centers, must have. Why?
The common one is disaster recovery. That’s the one we’ve done for a long time, but increasingly it’s more about (inaudible), right? It’s not failover. I want to run in multiple places because I have users in multiple places and I want all of them to experience low latency, but I don’t want to give up consistency, I don’t want to give up simplicity of management, right? That’s kind of what I’m looking for. Increasingly it’s also because like I’ve got users in multiple places and they tend to access different sets of data, right? My operational database that’s storing, you know, account information, Boston users tend to stay in Boston, right? New York users tend to stay in New York. Occasionally we travel to each other’s cities and they try to be polite about our various sports teams and whatever else, but like -- and you know, and I throw out the word wicked a few times just to remind everyone where I’m from, but it’s, you know, that tends to be the case. Why should I pay the actual cost? Like not theoretical cost, the actual cost of transferring data between different data centers, right? Again, if you run on Amazon it costs you money to move data from disk in California to disk in Oregon or to disk in Virginia or over to Europe, right? I mean that’s actual money and if you don’t need to move the data around, why do it? Or more to the point, you may not be allowed to, right?
Amazon this fall opened a data center in Frankfurt which is one of my favorite things they’ve done in a long time because one of my favorite topics is data residency. It’s something I’m really, really working hard on. If anyone in this room is interested in talking about this afterwards I’d love to bore you to tears with everything I’m thinking about there. But like in Germany there are all kinds of laws now that like a German citizen’s data has to stay on disk in Germany. Like it doesn’t matter where in the world they’re accessing something, it’s got to be on disk in Germany, right? Go ask anyone at Microsoft how they feel about the fact that their Irish users’ data are in Ireland, right? And go -- if you haven’t like come across these use cases go read about this, right? These use cases about where data lives and how it’s accessed are fascinating. And this is a really important use case for why people want to run in multiple locations.
Global deployment is kind of by definition, among other things, a tradeoff between latencies and safety, right? We all understand that physics is in play. There’s no like one database that’s magically going to make all data consistent in all places simultaneously. Unless we have quantum machines, and then a lot of us are going to be kind of out of a job, so hopefully that doesn’t happen like right away. But like, you know, that is true, but that said, that’s the thing that we want to be able to understand in these systems. And the way we get that is, again, a different architecture. Cloud poses different challenges. The kinds of use cases we care about in these architectures like global deployment have different requirements and pose different challenges and we can’t just retrofit the stuff that we’ve been doing for 10, 20, 30 years in the client server world into this distributed world. Because to solve this kind of thing what we really want is we want to be able to say a data service exists as a service you can access separate from the concerns of how it stores information. That’s how you start to solve these problems.
And again, that means that the database is not the disk. This is another one of these slides that would have like -- I don’t know what clip art. Something funny would be on here, right? And everybody would be like, “Haha, it’s another internet meme.” And if you all want I can like take a break right now and try to find the pictures, but I mean you get the idea, right? It’s that like the database is not the disk, right? We thought for 40 years of the database as the disk, and the database is not the disk. The database, the data management system, DBMS, whatever it is you want to call it, the database is the place where you get your information and it consists of a service that you interact with and it consists of data, right? And when you start thinking about those as separate concerns you start to understand how you actually address some of these cloud scale problems. But the thing is that we all think about the database as the disk, right? And like we do too. So here’s a slide taken right from our marketing deck at NuoDB. You can tell it’s from NuoDB because it’s like fabulously lime green, it’s got birds all over it, and it’s got a lot of disks on there because we all kind of understand when we put a disk on the slide people kind of map that to be like oh, yeah, that’s the database because that’s the disk, got it, right? So I mean we do this too, right?
And by the way, this is kind of another way of looking at what I was talking about earlier, right? When we think about database architectures, when we think about kind of how we’ve tried to take client server models and distribute them, there are a couple of familiar patterns that we’ve done time and time again, right? One is what we call shared disk which is exactly what it sounds like. Like make the disk bigger, whether it’s actually a bigger disk or whether it’s a disk array, you know, whether it’s like some big, huge, complicated thing from EMC or NetApp or something that, but like essentially scale by making the disk have more capacity and then have a lot of tightly coupled clients around it. We’ve also scaled using shared nothing architectures which are not the same as sharded architectures. Sharded architectures are probably an example of that, right? But shared nothing is that idea of saying split things apart, right? Totally different view of the world. But again, still focused fundamentally on the disk, right?
I was having this conversation with someone beforehand. We were talking about Hadoop and we were talking about MapReduce. Hadoop is a wildly powerful and popular technology. It solves a lot of really good problems, right? But what is MapReduce? MapReduce is like chunk the data up into a lot of places, put it on a lot of individual disks, and then send work to where that data lives on disk, right? And as a side effect, yes, you get to take advantage of the memory and the CPU located with that, but it’s still focused on where does the data live, where’s the disk, right? Google, for those of you who haven’t read the Google papers, the Spanner and F1 and -- someone’s going to have to help me. What’s the one? What is it they’ve built on top of F1? There’s a third one. Go read them. They’re actually really interesting papers. Google a few years ago basically stood up and said, “We know we were kind of the spiritual parents of MapReduce and this whole NoSQL movement, but it turns out like we can’t do AdWords without SQL, but we can’t make SQL scale because it’s all this client server stuff. And so what we’ve decided to do is build this whole new database around synchronous commit.” And as the name implies synchronous essentially means that you have some notion of global consistency at the cost of latency. That’s what synchronization is, right? And how does Google solve it? They bought a whole lot of atomic clocks, they built a whole lot of very special purpose high speed networks, and they took the schema and they chunked it up in such a way that typically, you know, you can run things at pretty good latency. That’s great if like you’ve got like piles of money and you really don’t know what to do with it and you’re like let’s get a lot of bespoke hardware and build something that solves a very few specific problems.
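The MapReduce shape described here can be sketched in-process, with each string chunk standing in for data living on one node's disk. This is a toy illustration of the idea, not Hadoop's actual API:

```python
# Toy sketch of MapReduce: chunk the data across "nodes", run the map
# function where each chunk lives, then reduce the partial results.
from collections import Counter

def map_phase(chunk: str) -> Counter:
    # Runs "where the data lives": count words in one local chunk.
    return Counter(chunk.split())

def reduce_phase(partials: list) -> Counter:
    # Combine the per-chunk partial counts into one answer.
    total = Counter()
    for p in partials:
        total += p
    return total

chunks = ["the disk is not the database", "the database is a service"]
result = reduce_phase([map_phase(c) for c in chunks])
```

The work moves to the data rather than the data moving to the work, which is the point Seth makes about the whole design still being organized around where the disk is.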
It is an amazing piece of technology. The papers are remarkably readable. I really highly recommend going and checking it out. The thing is most of us can’t do that. Like I know I don’t have like an atomic clock sitting around, much less the thousands that Google has deployed. So for the rest of us, you know, there’s this idea of how you build these architectures that do all the stuff I’m talking about but address these challenges that are inherent to trying to build these systems. Some time ago we decided to come up with a term to describe this because we kept talking -- you know, people kept saying, “are you shared disk or are you shared nothing? Or maybe you’re synchronous commit.” And we kept saying, “No, we’re not any of those things.” And they’re like, “Well, what are you?” you know, this is like -- Eddie Izzard has this comedic routine where it’s like, you know, people are already living somewhere and then like colonists show up and they plant the flag and they’re like, “I took your land” and they’re like, “You can’t take my land.” He’s like, “Well, do you have a flag? Didn’t think so. So now it’s my land.” So it’s like if you don’t have a name you can’t be an architecture. So we stepped back and tried to think not just about what NuoDB is, but what is it we’re talking about with all these kinds of systems we’re trying to build?
We came up with this thing that we call the durable distributed cache. And it’s interesting, this has been like a slide we’ve used for ages and a couple of weeks ago someone actually was like, “Hey, Seth, is that a thing? Like durable distributed?” And I was like, “Yeah, it is.” And they’re like, “Really? I’ve never heard of it.” I’m like, “That’s probably because it’s like not a thing we talk about kind of in detail. It’s just like how we talk about our architecture.” And they’re like, “Well, what does that really mean?” And you know, I was like, “all right, let me explain what it means.” So let me explain to you all what it means. It’s not about our product specifically. It’s about kind of the kind of architecture that I think you want to address these challenges. The durable distributed cache.
First part of that is the word cache, right? Lots of things run in memory, right? But what I think you want in a modern architecture is you want the database to run in memory. You want an in-memory database. What do I mean? I don’t necessarily mean all the data has to be in memory, right? Whether you decide to hoist all your data into memory or whether you just have some small subset of it, that’s a separate question, right? But you want the database, you want the service, to be something running in memory, right? Not just because that gets you memory speeds, although obviously that’s good because memory is faster than disk -- well, at least for the moment. As memory and disk start to blur that’s less true, but we’ll take it just for the moment that memory speeds are good. But when you’re running in memory there are all kinds of optimizations you can start to think about, again, that you can’t deploy when your focus is on IOPS and how quickly I interact with a disk, right?
What are the characteristics of a cache, right? Everyone in this room has probably interacted with some kind of cache, whether it’s like memcache or whether it’s Varnish, whether it’s some kind of custom caching technology pulled into your database, into your application. I think a cache has a couple of important properties. They should be transient, right? Shouldn’t be durable, shouldn’t be like safety for the database. They should be able to fail, should be able to come and go. It should be a thing that you populate on demand as you need information that you can drop from when you don’t need information. I think caches should act independently, right? I think you should be able to say I’m caching information in different locations, and that’s fine. I don’t have to rely on the fact that like, you know, my record can only be in cache at a given place, right? Because that gets you right back into this notion of what happens when that place is unavailable or what happens when I want to put my information somewhere else for a specific reason.
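Those properties (populated on demand, transient, safe to drop, independent of the durable store) can be sketched as a toy cache. The class, loader, and eviction policy here are hypothetical, not any particular product's API:

```python
# Toy on-demand cache with the properties described: it fills lazily on
# a miss, evicts when full, and can be dropped entirely without losing
# data, because the backing store stays authoritative.
class DemandCache:
    def __init__(self, loader, capacity=2):
        self.loader = loader          # fetches from the real store on a miss
        self.capacity = capacity
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            if len(self.entries) >= self.capacity:
                # Evict the oldest entry (insertion order) to make room.
                self.entries.pop(next(iter(self.entries)))
            self.entries[key] = self.loader(key)
        return self.entries[key]

    def drop(self):
        # Losing the cache loses no data; it just repopulates on demand.
        self.entries.clear()
```

A cache instance can fail or be cleared at any point and the next `get` simply reloads from the store, which is the transience property being argued for.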
Caches naturally support hierarchies, right? We all talk about memory and L1 and L2 and whatever and whatever and whatever and disk, right? And again, like those lines are getting blurred today and like everything has, you know, hidden caches in it and like people don’t trust disk drivers because disks now have like caches in them and you never know when you say “Write something to disk” is it on disk, right? That’s like this existential mystery the database people love to get worked up about, right? But caches are everywhere. Hierarchies are everywhere. Hierarchies are really good because they inherently take advantage of kind of what’s interesting and what’s not interesting. They optimize naturally the things that you’re working with at any given time and that makes a lot of sense. Like I want that in my cache. And perhaps most importantly, a cache should comprehend service models, right? So I run Postgres and then I put memcache in front of it, right? I may have just accelerated the ability to get at data, but I’m not programming to memcache with the same model that I program to Postgres, right? Ditto if I put -- it’s not about SQL, right? I mean I could put a cache in front of a key value store. And in fact the key value store was more than just a key value store. It had some notion of transaction consistency, right? The cache doesn’t. And so I’ve just broken something about the way I’m supposed to interact with the service, right?
So it’s not just about a cache that I layer on as an afterthought, right? When I say the database in the modern world should be something in memory I really mean like the database is the thing in memory, the database is the cache, the cache is the database, right? This is how you interact; this is the service end point. And I think it should be distributed, right? So I said earlier I was going to give you my view on what distribution means, right? A distributed system, that’s a term we throw around a lot, right? We like to talk about distributed meaning like it’s running in multiple places. It turns out the term distributed system actually has a specific meaning, right? And if you go read like, you know, your CS whatever, whatever, whatever book a distributed system is a system composed of independent actors that coordinate via messages, right? What does that mean? It means you have a bunch of things that run independently. They can all do the same kinds of things, they can all take on the same workloads, they can get started independently and they can be stopped independently, but for the things that they share they have to know how to coordinate with each other, right? It’s not a shared disk model, it’s not some, you know, magic. It’s just that they have to be able to talk to each other, right?
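The textbook definition given here, independent actors that coordinate only via messages, can be illustrated with a toy pair of peers. This is a sketch of the idea, not of NuoDB's actual protocol:

```python
# Minimal sketch of "independent actors coordinating via messages":
# each peer holds its own state and shares updates only by sending
# messages to the peers it knows about.
class Peer:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.peers = []           # the other actors we coordinate with

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def write(self, key, value):
        self.state[key] = value
        for p in self.peers:      # coordination happens only via messages
            p.receive({"key": key, "value": value})

    def receive(self, msg):
        self.state[msg["key"]] = msg["value"]
```

Either peer can be started or stopped independently and either can take on writes; what they share, they share only by talking to each other, which is the property that makes the collection look like one logical service.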
This is a view of the world that promotes resiliency, right? Promotes continuous availability and promotes the idea that you can lose things and add things kind of arbitrarily at any point at any time, which is that whole thing about cloud, right? That’s what we want, right? It’s on demand capacity without having to think about how it impacts anything else that’s running. And that distributed system is made up of a whole bunch of pieces, but it should define a single logical service, right? That’s really that abstraction, that ability to say a service is made up of a whole bunch of things, but it just looks like a single thing. It’s how you insulate your applications, it’s how you build towards simplicity, it’s how you build a system that will survive all kinds of crazy failures. So you know, again, not a new idea for anyone who’s ever done like peer to peer programming when that was like hot and everyone was talking about ooh, it’s a peer to peer system, right? That’s what we’re talking about, a peer to peer system.
And if access is, you know, supposed to be in memory then access is via peers that are running in memory, right? And you should be able to scale those independent peers to be able to solve separate tasks. It is a cache. It is paired with this caching technology so you can optimize not just around the fact that you’re in memory, but now because these peers are independent they might be doing different things, different users are interacting with different peers, what does that mean in the world of caches? It means affinity patterns start to naturally emerge. That’s what caches are, right? They’re going to naturally start to bring certain things into memory in one place, certain things in another place, and that’s what you can start to optimize around. Not just the fact that it’s fast because it’s in memory. You can start to take advantage of those patterns, right? That’s how you build a system that really takes advantage of cloud and scales.
Durable. I’m going to argue that almost every data system wants to be durable, right? And what do I mean by durable? I mean like if everything shuts down you should be able to restart it and like your data’s still there, right? That’s a good thing. You can build systems that are purely in memory and just have so many different replicas running in so many different places that you can sustain failure and there’s a really good chance your data’s still there. And for some systems that’s fine, but I’m going to argue that like most of us really want to know that data is written on disk. And actually like most of us take this to an extreme. I’ve got like nine backup drives sitting at home, each like with a subset of my various laptops and disk, you know, desktops and like technically that’s really durable, but if I actually had to go find anything I’d be in a lot of trouble, right? So you want durability, but you also want some like rules about what that is.
If access to a system is in memory or if the database is in memory and that’s where you’re interacting with the database and the database is no longer the disk, then that storage level, that durability is no longer about how you get performance in a system, right? That storage level is now about safety and it’s about availability, right? Safety. The system can crash and I can get my data back in the correct state. Availability. Like if I need something and I don’t currently have it I know I can go there to get it, right? It is the complete set of my data. You free yourself from optimizing around disk I/O, which as I said earlier is one of those things that has hindered us in client server architectures, right? Why? What does this help us with? Like why do you care, right? You care because if management is operational you can manage things like storage and redundancy and replication and when you need to upgrade a disk and when you need to move things around, you can do all that totally independently from how you’ve written your application, right? And again, that gives your application developers the chance to go off and do whatever they want and your operator can sit there all day and complain about the fact that the disks are, you know, filling up and you’ve got to replace them and you’ve got to swap in whatever else, and it’s not disruptive to the running application, right? And that’s really where we start to get the promise of everything we’re trying to do.
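One way to picture "storage is for safety and availability, not performance" is a toy map that serves every read from memory and keeps an append-only journal purely so state survives a restart. The journal format here is invented for illustration:

```python
# Toy sketch: the in-memory map is the service; the journal is only for
# safety. A "restart" rebuilds the same state by replaying the journal.
import io
import json

class DurableMap:
    def __init__(self, journal):
        self.journal = journal
        self.mem = {}
        journal.seek(0)
        for line in journal:                  # recover state after a crash
            rec = json.loads(line)
            self.mem[rec["k"]] = rec["v"]

    def put(self, k, v):
        # Append for durability, then update memory; reads never touch storage.
        self.journal.write(json.dumps({"k": k, "v": v}) + "\n")
        self.mem[k] = v

    def get(self, k):
        return self.mem[k]
```

Because reads never depend on the storage tier, the operator can manage that tier (redundancy, replication, upgrades) without changing how the application interacts with the service.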
It also lets you then think about layout to address different challenges. When I talked earlier about why you want global operations, there are different kinds of problems you want to solve. Some of them are about performance, some of them are about just simply managing the data, some of them are about ensuring that data is known to only be in certain places or not be in other places, right? Lots of different problems we want to be able to solve. If your database is predicated on the idea that data lives in a particular location because that’s how your system runs, you can’t make those different choices application to application, right? Which means you can’t really solve different problems on the same kind of cloud commodity architecture. And this does get you to commodity, right? Because now we don’t care about massive disk arrays, right? It’s about safety of data; it’s not about raw IOPS. And you can now -- one thing you do need to think about is you need to think about how do the memory pieces work with that storage tier? Will there be enough flexibility? You can decide is it about k-safety, is it about synchronous replication, you know, is it about writing in one place and eventually things will move around? You know, you can decide what those problems are you want to solve but, again, it’s not going to affect what the application sees.
All right, so I started out talking about cloud and saying this was all about cloud and this was motivated by kind of what’s going on in the cloud world. So how do I get back to that now that I’ve kind of gone off on this deep philosophical view of the purity of what I think we should be building? Cloud requirements, right? What did I really say in the beginning? You want to be able to do scale on demand. You want to be able to provide continuous availability, right? You want to be able to simplify application and operation. You want a single logical thing you can program to, a single logical thing you can manage, but you don’t want your operator and your application developer to have to be commingled, right? You want independence there. You want to be able to run hybrid. Hybrid means heterogeneous infrastructure. It means running in multiple locations. It means running in different environments with different requirements. And probably hybrid workloads, right? Everyone today wants to get at least a little bit of analytic insight out of their operational systems, right? Traditionally that’s a really hard problem, right? We’re getting a lot better at that. That’s a thing we do at NuoDB pretty regularly with people. That ends up being a challenge for a lot of systems and it’s a challenge because systems are designed to be one thing or the other.
You want a thing with clear failure semantics and you want to operate as a service. You don’t want to think about lots of random things. You want to think about a service. And I’m going to argue that anyone in this room who’s thinking about public cloud, private cloud, hybrid, you know, OpenStack or Amazon or kind of any other view, or just simply running a whole lot of virtual machines, right? Most of you have most of these things on your check list about ideally what you’d like when you’re thinking about your foundational, you know, platform systems. And this is what we were just talking about, right? To get to these things you need to have stepped back and thought about these core architectural ideas. Now, can we solve some of these things kind of today? Can we hack at the other stuff and do replication or sharding and can we kind of, you know, build wrappers around things and expose some things to the application? Sure, we can. We can do that. It’s hard, it’s painful, but we can kind of do that. What we can’t do are the things that are the next round of requirements that everyone is starting to ask about, right?
And everyone I talk to today, when I come down here and talk to people on Wall Street, what everyone is asking is the same set of things. Great, I understand all those things, but the real reason I’m moving to these cloud architectures is because of the promise they have to be able to do all of these new things that are incredibly important today in a global world, right? That global operation -- that ability to be active in multiple places and still have a clear consistency model. That data residency problem of being able to understand exactly where your data lives and speak with confidence that you actually are meeting regional requirements correctly and that the audit proves it, right? The ability to run a system in an automated fashion where it’s driven from policy that is your SLA, right? You’re saying this is the problem I want to solve, my database is a service, it’s a collection of things, but I can treat it as a service and I can drive it in an automated fashion so that that’s not the thing I’m spending all my time on. The thing I’m spending my time on is writing my application and I have a service that gives me the on demand capacity I need when I need it and doesn’t use it when I don’t need it. You only get that when you drive things from policy and that’s why everyone who’s building kind of modern infrastructure as a service is thinking first about how do I drive things from policy? You want the same thing from your data management system.
And multiple models, right? Everyone is running more than one database because they have some data they want to think about in a relational format, some things are documents, some things are graphs, some things are key value, some things are big data, and you know, now we’ve got data lakes where we’re just like throwing everything in there and hoping that like they can swim or like whatever the metaphor is. Data lakes is my least favorite new term, but I kind of appreciate what they’re trying to do with it. You know, the reason we have all these different things is because it’s very hard to handle these multiple models in one database. And that’s not because it’s hard to like put different front ends on a database, right? I mean you can take a SQL database and say now be a graph database because I put a SPARQL front end on it. You know, SPARQL is like the worst language ever, but like it’s standard, it works pretty well, solves really good problems. But the hard part of graph is not SPARQL or implementing RDF or some other standard. The hard part of graph is that fundamentally it’s a different set of questions you want to ask that you optimize in a different fashion, right? Why do people care about the database being column or row? Because they understand that there are different questions you will ask and one of those is really well optimized to answer one set of questions and the other one the other. Why? Because, again, all of those things, are you row or are you column, are you graph or are you relational, those all come back to how do you store your data on disk. That’s the gating thing for all of us, right? And when you’re not about disk, when you’re thinking about architectures that are all getting data into memory because that’s where the database is, you can actually think about these multi model problems that are fundamental in a different way.
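The row-versus-column point can be made concrete with a tiny table stored both ways; each layout makes a different question cheap. This is a toy sketch of the layout tradeoff, not any engine's storage format:

```python
# The same two-row table in two layouts. Row layout makes "fetch one
# whole record" one contiguous read; column layout makes "scan one
# field across all records" one contiguous read.
rows = [                       # row-oriented: one record per entry
    {"id": 1, "city": "Boston", "balance": 50},
    {"id": 2, "city": "New York", "balance": 75},
]

columns = {                    # column-oriented: one field per entry
    "id": [1, 2],
    "city": ["Boston", "New York"],
    "balance": [50, 75],
}

record = rows[0]                    # operational question: one user's data
total = sum(columns["balance"])     # analytic question: aggregate one field
```

On disk, committing to one of these layouts optimizes one class of question at the expense of the other, which is why the choice has traditionally forced separate systems.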
Here’s the one slide. I’m going to show you what NuoDB is. And my time is basically up so I’m not going to spend a lot of time talking about this, but when you look at NuoDB we are that thing that I’ve been describing. We are a distributed peer to peer system that is in memory but is also durable. It’s designed to scale on demand, to run in multiple locations, to be able to address all of these problems and provide to both the developer and the operator what looks like a single logical system. Standard SQL, you know, joins, views, schema, but flexible schema because we don’t care about representation on disk. You want to add a column? You want to change a column? Just do it. It’s not touching anything on disk because we don’t care about the format on disk. We just care about how you interact with it in memory.
Summary. Cloud is hype, definitely like huge hype curve around it for sure, but it’s not just hype, right? It’s really identifying, I think, a new set of architectural views and problems that people want to solve. And when you think about the foundational services that sit on top of these kinds of models that let you exploit all the things that -- whatever you want to call them, cloud or scale out or horizontal scale, yadda, yadda, yadda, all the things you want around on demand capacity, around continuous availability and global operations and like simplicity in the face of running in, you know, hybrid or heterogeneous environments, right? Those foundational services need to be built, assuming that that’s the stuff you care about, right? Or they’re just not in the end going to be able to give you what you want. Extending that and having services that actually can provide a global consistency model, again, not talking about transactions per se, not talking about SQL, whatever, but a clear consistent view of your data. We are transactional in SQL, but like putting that aside that’s what’s going to make this thing work and be simple, right? Then when you have simpler systems those are going to be easier to scale, they’re going to be far more resilient and they’re going to let you move to what everyone is trying to move to today, right? Which is that global view of the world, global operations, where all of these issues get magnified. Where latencies suddenly are much harder, where failures are suddenly much harder, where security models where you’ve got users from all over the place suddenly are much scarier, right?
If you’re interested in reading more about NuoDB I can point you -- obviously NuoDB.com is our website. Dev.NuoDB.com is the landing page to go find our documentation, to read about what it is we actually support. We have technical blogs, white papers, piles of information in there that I think is actually pretty interesting. And with that I just want to say most of you stayed and most of you stayed awake and I really appreciate that. Thank you to everyone. Thank you to Eric. This is (applause). I was going to say this is actually two and a half years now -- I was looking at the calendar -- that we’ve been coming down and doing this and we’ve talked a lot about this stuff over the years, but it’s always fun to come down here and it’s always really interesting questions. And so I don’t know how much time we have before we get kicked out, but if we have time I’ll take a few questions.
(Eric): Before we go on questions I want to give a great big thank you to Seth Proctor from NuoDB. (applause) We’re going to be giving away this drone very shortly. The bucket to throw your business card in is right there at the back, moving it around. There you go, he’s holding it up. Moving it around the room. Maybe you don’t have a business card. Please design your own business card on those little sticky notes in the bucket, bowl, whatever you want to call that. It’s somewhere in between. And before -- the other thing I’d like to mention is if you’ve got -- if you’ve taken any photos here tonight please post them to our meetup page. Everybody here should be a member of the [New York’s?] SQL NoSQL meetup so you should be able to post them to the page at the end of the event. I just noticed that meetup doesn’t let you do it until after the event is over. But at the end of the day you can all post your photos on there. And oh, we’ve got T-shirts here right at the end of QA, right after the raffle. Everyone is welcome to come up and grab them. We have (inaudible) and we probably don’t have enough shirts, but hey, whoever gets them first. So let’s start with the first question. You are very excited to get to him and you get the first question.
(Q): (inaudible) durability, the cache and durability guarantees that (inaudible) technology kind of more a challenge around that?
(Seth Proctor): In NuoDB specifically or kind of like generally?
(Seth Proctor): Yeah, sure, I’m happy to. So NuoDB fits this model I was talking about, a peer to peer system where there are some peers that are focused on running in memory and other peers that are providing durability for the database. And in particular in our system you can have an arbitrary number of both and they’re completely independent. So you can have in memory peers, they all can do the same work. Part of what we’ve spent a lot of time doing is optimizing how those in memory peers work with each other so that you have that [property?] kind of in memory hierarchy and you can work really efficiently with things in memory. The really interesting question becomes when I make a change in memory how do I make sure that’s reflected in the durable state, right? So that if the system fails or, even worse, something, you know, a sub piece of the system -- it’s really nice when the whole system fails. That one’s really easy. That doesn’t happen so often in real life. More often like something subtle happens in a piece of the system.
We provide what we call a commit protocol and this is very similar to what you see in a traditional database where you can say, you know, does commit mean I’ve written to a kernel buffer or does it mean I’ve actually written to the disk or have I written to the disk and read it back (inaudible) disk controller’s live and disk controllers. We provide the ability basically to say when I make a change in memory -- here’s the simple version and then I’m happy to go through the more complicated version. The simple version is you make a change in memory, asynchronously we’ll send that change to all the peers who care about it, right? That’s that contract about messaging between peers. We let you essentially decide which subset of those durability peers have to get pulled into the synchronous path to acknowledge back to a client that the commit has succeeded, right? So it’s not tweaking consistency. It’s not like eventual consistency of some kind. You are consistent at the moment you commit, right? The question --
(Q): So you’re waiting for it to form and then you’re --
(Seth Proctor): You can choose. You can say just broadcast it to everyone in memory, done. You could say broadcast to everyone and make sure at least one point of storage acknowledges that it’s written to disk, or two or N. Or one data center, two data centers, three data centers.
(Q): So depending on the -- there’s a performance hit if I enable durability (inaudible), right?
(Seth Proctor): There’s almost no hit in performance if you enable the baseline of durability in the system. What you get to choose is as you scale out your system, is that enough for you or are you more conservative, right? As you choose to write to more locations that are further spread apart, right? Think about a database running in three continents, right? Your commit protocol could be make sure that it’s written to disk in at least one place on all three continents before I acknowledge commit. That’s a choice you can make, right? Obviously that’s high latency, but it’s extremely safe. You can also say make sure it’s written to my local data center synchronously in one place. It will get written to all the other places, right? And in the case of failure it will always be reconciled correctly. That’s our contract to you. We can’t violate that or we wouldn’t be an ACID database, right?
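The tunable commit Seth describes could be sketched roughly like this. It is a hypothetical model, not NuoDB's actual API or protocol; the `Peer` class and the `min_storage_acks` parameter are invented names for illustration.

```python
# Hypothetical sketch of a tunable commit protocol like the one described:
# broadcast to in-memory peers, then block only for as many durable acks
# as the deployment's policy demands. Names are invented, not NuoDB's API.

class Peer:
    """Stand-in for an in-memory or durable peer."""
    def __init__(self):
        self.log = []

    def apply(self, change):
        self.log.append(change)

    def write(self, change):
        self.log.append(change)   # pretend this is a durable disk write
        return True               # ...and the ack coming back

def commit(change, memory_peers, storage_peers, min_storage_acks):
    """Acknowledge the commit once `min_storage_acks` durable peers ack.

    min_storage_acks=0 -> broadcast to memory only, done
    min_storage_acks=1 -> at least one point of storage has it on disk
    min_storage_acks=N -> e.g. one durable copy per data center
    """
    for p in memory_peers:
        p.apply(change)                       # async broadcast in reality
    waited = storage_peers[:min_storage_acks]
    background = storage_peers[min_storage_acks:]
    for p in waited:
        assert p.write(change)                # the client blocks on these
    for p in background:
        p.write(change)                       # off the synchronous path
    return True                               # commit acknowledged

memory = [Peer(), Peer()]
storage = [Peer(), Peer(), Peer()]            # say, one per data center
assert commit("txn-1", memory, storage, min_storage_acks=1)
assert all(p.log == ["txn-1"] for p in memory + storage)
```

The point of the sketch is the policy knob: every peer eventually gets the change, but only the first `min_storage_acks` durable writes sit on the client's synchronous path, which is why raising it trades latency for safety rather than changing consistency.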
(Q): So (inaudible) resolution, it isn’t satisfied with the contract, I have application (inaudible) then have to deal with --
(Seth Proctor): Well, you as the application developer, again, you don’t see any of this. Like I’m an application guy and so I sympathize with people writing application logic. I don’t want to have to worry about this. I want my operator to decide like this is the rule. Like I deploy this database for routine applications. Some of them like essentially need durability, others are super conservative. And it might be the same application, but different users who are deploying it need a different contract in a given deployment of it against a different database. So it’s an operational aspect, it’s not embedded in the application.
(Q): OK, another question. Yeah, so I have a question. So there is a [physical cost?] technology and could you please explain the difference between the distribution mechanisms in NuoDB and Hazelcast, if there is any difference.
(Seth Proctor): There are some pretty substantial differences. I’m not going to try to go into all the detail here for two reasons. One, it will take too long and, two, I’m sure I will get it wrong. It’s been a few years since I was deeply, deeply familiar with Hazelcast. It’s been longer still since I actually tried to program to it. But you’re right, Hazelcast is another system that would have similar characteristics, right? That is designed around distribution that is designed around being able to coordinate at a caching tier. Obviously it’s not designed with the kind of SQL semantics that we’re designed for, so that’s an important difference. It’s not designed for kind of the arbitrary operations in terms of like transactions, in terms of scope, in terms of how things interact with, you know, it’s not designed with the same notion of global scale-out. I think Hazelcast is, you know, again I don’t want to talk a lot about it because I’m going to get it wrong. I mean my memory of the last time I did anything deeply with Hazelcast was that it was very powerful for doing kind of specific caching scale-out applications. I think that’s probably still true today.
(Q): What was the name of the individual you mentioned we should all look up his stuff?
(Seth Proctor): James Mickens. He’s a researcher at Microsoft. And yeah, actually specifically go try to find the talk that he gave at Monitorama. I saw the longer version of it at the MIT database day about a year and a half ago and like it was like 35 minutes that I didn’t stop laughing. It was -- when he starts doing his like Bane impression from Batman, like rallying the troops around NoSQL, like I just, I lost it. It’s real brilliant. I highly recommend it.
(Seth Proctor): M-I-C-K-E-N-S.
(Q): Can you repeat the name again?
(Seth Proctor): M-I-C-K-E-N-S.
(Q): Can you touch on the difference between NuoDB and Spark (inaudible).
(Seth Proctor): Sure. I mean Spark, I really like what Spark is. I mean I think Spark is a really good technology that comes from -- unsurprisingly -- it comes from a group at the AMPLab that is doing just some next generation awesome stuff. The AMPLab is also where a lot of the kind of next generation optimistic coordination systems and kind of conflict avoidance systems are coming from, which I also think are really fascinating. You know, the way I look at Spark is the way I look at a number of things that are trying to sit on top of the (inaudible) infrastructure and ecosystem. Which is that, again, Hadoop is designed around kind of the MapReduce model. It’s designed around this notion of you want to do very large batch operations and so you distribute those to where you’ve got, you know, segments of your data stored. It’s not so much about on demand like I need more capacity right now. You know, because like if you look at traditional scientific batch scheduling -- I mean traditional batch schedulers -- you might spend an hour deciding how to schedule a job because it saves you a day of compute time, right? I mean that’s kind of more the scale for people who are thinking about these large, large scale systems. You know, what is Spark doing? Spark is doing something really smart and they’re saying let’s still do these scale-out things but let’s get things hoisted up into memory, let’s run it. And that’s more about memory speed than it is about optimizing how the peers coordinate with each other. It’s still a fairly decentralized system, right? It’s still kind of assuming something about the access patterns and where data lives on disk, right? So I think it’s really more fundamentally about how do you provide someone a really fast familiar model for doing query than it is for kind of arbitrary operational processing.
But certainly if someone came to me and said, “I really want a Hadoop in my ecosystem” I’d be like, “well, first of all you want Hadoop, you don’t want a Hadoop”, but like that is how I hear it a lot. People are like, “I need a Hadoop” and you’re like, “Good on you, let me know when that Hadoop arrives for you.” I would definitely tell them like Spark should be on the like short list of two technologies you look at.
(Q): So did you just disprove CAP theorem?
(Seth Proctor): No. No. OK, no, I did not disprove the CAP theorem. For the record, no, I did not disprove the theorem. CAP is a -- again, in that spirit of distributed systems are about tradeoffs, right? CAP is a really good sensible formulation of the idea that when you scale things out there will be tradeoffs and you really have to decide like when things fail do you care more about the correctness of your system, do you care more about, you know, being able to get at a system in all places? Like by definition if you’re running in two places and the network between them gets cut you have a choice. You can say if I keep running in both places I can keep running, but there’s a chance for my system to diverge, right? Or if I don’t keep running I have to shut something down, right? Now, in a distributed system if I choose to shut down half of that, is my service still available and able to address all the same data? Yes, it is. Right? And in CAP, you know, the formal definition of CAP is that all queries to the system must have responses, right? And that’s why it violates that notion of availability. In practice do most of us care about that formal notion? No, we care about the service being largely available, right?
What’s interesting about that? Well, what’s interesting about that is that consistency is only violated in that network partition case if there’s a chance for the same thing to be updated in a different way on both sides of that partition, right? And again, this comes back to like it’s not about, you know, pedantic notions of transactions or SQL or whatever else. It’s about what is the consistency, right? You can construct a consistency model where you say I’m going to allow changes to happen on both sides in a way that I know how to reconcile them later. Google Wave is a great example. It’s where people largely gave Google Wave a pass because like it was this brilliant core technology with this awful front end and Google did what they did with like 90% of their stuff. They threw it over the wall for a year and then they killed it. But if you go read about what Google Wave was under the covers, it was essentially a thing that said it’s not eventually consistent, it is consistent. It’s just that in any given moment in time when you observed it two things may be out of sync, but the programming model is such that we know that those changes that are out of sync will always eventually have been consistent relative to each other. I think I did that one correctly.
And so you can always put the system back together. And I think that’s -- certainly philosophically that’s where we’re thinking and we’re taking our software. I think there are lots of different views on this. You know, I think that CAP is -- will always be a useful kind of reality check when you’re thinking about systems, but I think as systems evolve and they become increasingly global and distributed we will have to think more about the subtleties and go beyond kind of is it about CAP, but where do you fall on the spectrum of the problems you’re trying to solve.
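The Wave-style idea of changes that are momentarily out of sync but guaranteed to converge can be illustrated with something far simpler than Wave's actual operational-transform machinery: a grow-only set of tagged edits whose merge rule is commutative, associative, and idempotent, so both sides of a partition can keep writing and still reconcile deterministically.

```python
# A toy model (much simpler than Wave's operational transforms) of a
# consistency guarantee that survives a partition: a grow-only set of
# tagged edits. Because the merge rule is commutative, associative, and
# idempotent, the two sides always converge to the same state.

def merge(a, b):
    return a | b   # set union: order and repetition of merges never matter

site1 = {("alice", "insert x")}
site2 = {("bob", "insert y")}
site1 = site1 | {("alice", "insert z")}   # partition: both sides keep writing

# After the partition heals, merging in either order gives the same state,
merged = merge(site1, site2)
assert merged == merge(site2, site1)
# ...and re-delivering already-seen changes is harmless.
assert merge(merged, site2) == merged
```

This is the flavor of design (today you would call it CRDT-like) where observers may briefly disagree, yet the programming model guarantees every replica can always be put back together into one consistent whole.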
(Eric): All right, we have time for a couple more questions, but before you do has everyone had a chance to put their card or information in for this drone? Anyone not, raise your hand now because I’ll pass the bowl back. No. OK. So we’re going to raffle this off very shortly. We’ll take a couple more questions and raffle it off. Yes?
(Q): So (inaudible) very similar to the one that search is using, you know, on top of the (inaudible) and the (inaudible) model. How do you -- so it’s like, it’s a question, how do you route the calls to one shard or the other or do you decide which one it is? And if you (inaudible) shards at the same time how do you handle conflicts?
(Seth Proctor): The answer to both of them is kind of the same thing, right? Which is, again, we don’t think about shards. There is no notion of sharding in NuoDB. What there is is a model where the database is in memory, it’s running across a collection of peers that know how to coordinate and at any given time data may be sitting in more than one cache. And when data is sitting in more than one cache it’s up to those peers to do what any distributed system has to do to coordinate between them and make sure that we are not violating those rules of consistency. You’re right, there are similarities to Elasticsearch, there are similarities to some other similar systems. You know, Elasticsearch I think is fundamentally different because it is thinking about more that kind of -- maybe it’s not shards, maybe it’s (inaudible) because it’s not as, you know, directly visible, but that is how it’s thinking about the model. And we’re thinking about something in a fundamentally different fashion.
(Q): I’m sorry, maybe the question was wrong.
(Seth Proctor): OK.
(Q): How do you reconciliate [sic] those changes? Because if I make a transaction and I hit one memory, one data center if you want, and so this same change that is (inaudible)?
(Seth Proctor): The answer -- and I don’t want to sound flip -- to this one takes at least half an hour to go through. So I’m going to point you to our white paper. The really short version is that if you have something in cache in two places and you’re making a change in one place, part of what our system knows is exactly where something is at any given time. And so it knows that like there may be 100 nodes that the database is made up of, but there are only two places in memory where that object is sitting. And so it knows that those two nodes need to coordinate with each other to ensure that the change is valid. Of course what that means is that if something is only in memory in one place then coordination is entirely local, and it means that if you scale it across geographies things tend to have locality in geography, you’re a [land rover?] in terms of the changes you’re making.
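The "coordinate only with the caches that actually hold the object" idea can be sketched in a few lines. Everything here is invented for illustration (the `Directory` class and its methods are not NuoDB internals); it just shows why a write to data cached in one place needs no remote coordination at all.

```python
# Invented sketch (not NuoDB internals) of a system that knows which peers
# hold an object, so a writer coordinates only with those peers.

from collections import defaultdict

class Directory:
    """Tracks, per object, which peers currently have it cached."""
    def __init__(self):
        self.holders = defaultdict(set)

    def load(self, obj_id, peer):
        """Record that `peer` pulled `obj_id` into its cache."""
        self.holders[obj_id].add(peer)

    def peers_to_coordinate(self, obj_id, writer):
        """Peers (other than the writer) that must see a change to obj_id.

        If the object lives in only one cache, the result is empty and
        coordination is entirely local -- no messages needed at all.
        """
        return self.holders[obj_id] - {writer}

d = Directory()
d.load("row:42", "node-a")
assert d.peers_to_coordinate("row:42", "node-a") == set()

d.load("row:42", "node-b")                    # a second cache pulls it in
assert d.peers_to_coordinate("row:42", "node-a") == {"node-b"}
```

The cluster might have 100 nodes, but a change only ever fans out to the handful of caches that hold the object, which is why workloads with geographic locality mostly coordinate locally.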
(Eric): We have time for just one more question, but Seth will be here for one-on-one questions afterwards. So you’ve had your hand up awhile.
(Q): Thank you. Based on (inaudible) availability is one of the bigger picture of NuoDB and this is one of the reasons (inaudible) could have been used for a long period of time. So my question is, is NuoDB challenging that one? Because in NoSQL we have to lose (inaudible) properties, we have to lose joins (inaudible) we have to lose (inaudible). Are you saying that NuoDB is an alternative to the NoSQL?
(Seth Proctor): For sure. I mean I think, you know, the way I’d look at it is we went through this whole client server world with relational databases, right? And then we spent a long time doing NoSQL systems, partially ACID systems, eventually consistent systems, sharded systems, because they helped solve these problems, right? And I think that 10 years ago if someone stood up -- we may not have the right answer, but if someone had stood up and said, “I have an answer that lets you take Oracle or Postgres or whatever and run it on 100 machines and nothing changes and your application is still a single connection string” I think most of us would have signed up for that. Been like, “Yes, that’s great, now I can keep going”, right? We’ve gone down this path of doing these other things because they’re the things we could do. So yes, you know, we’re certainly doing something that I think for some NoSQL users makes their life a lot easier. For other people NoSQL is not about scale, it’s about a simpler programming model or it’s about, you know, there are lots of different use cases for it. We have actually more users who are traditional relational database users who are saying, “I’m an enterprise and I need to get that thing that I’ve been doing into the cloud model.” But we also have people who are saying, you know, “I tried Cassandra and I really like that ability to scale. I didn’t like losing consistency. I didn’t like losing my SQL model. I didn’t like losing whatever else.” So yes, I think it’s less about competition. It’s more about kind of the, you know, understanding what the problems are you want to solve and understanding kind of where you want to put whatever.
(Eric): OK, Seth will be here for one on one questions. We’re going to raffle off this awesome drone. After the raffle feel free to come up, grab T-shirts. Thank you, Seth Proctor, CTO of NuoDB. (applause) Can I ask you to mix this up? Come on, mix it up better than that. (laughter) All right, look in the other direction, reach in and grab a card. Look in there, reach in and grab a card. (overlapping dialogue; inaudible) That’s two there. You’ve got to pick one. Read it out. And the winner is?
(M): And the winner is Mark something. Is it Mark or -- I don’t know? Is this Mark?
(Eric): Mark Sutso?
(Mark): No way.
(Eric): All right, thank you all. (applause) Thank you all for coming.