During this breakfast seminar Matt Aslett, 451 Research, presents his data platform map and touches upon the advantages of SQL in the cloud. He then covers common issues he encounters in the marketplace from SaaS vendors, ISVs, and end-user companies.
(M1): I would like to welcome you to the NuoDB breakfast series. Today is our first event. We will be having multiple events throughout the year with specific topics. Today’s topic is entitled Sharing Experiences in Cloud Adoption. We will be having two speakers at today’s event. The first speaker will be Matt Aslett, who is a senior analyst at The 451 Group, followed by Seth Proctor, who is our CTO at NuoDB. And the presentations will last around an hour. And then there will be a Q&A session and we have this room available until ten o’clock today. OK. Without further ado, Matt, please.
(Matt Aslett): Thank you, Paul. Just to briefly introduce myself. So I’m actually a research director for data platforms at 451 Research. Anybody who’s not come across the company before (inaudible) industry analyst firm focused on IT innovation, founded in 2000, you see here, about 210 employees, just under half of those, about 100 analysts, covering the whole breadth and depth of the IT industry. We have over 1,000 clients including obviously vendors, service providers, enterprises. Also professional services as well. And we have over 10,000 senior IT professionals in our research community who consume our research and advise us and guide us in terms of the direction that we think we should go in. If you want to know more about that, just ask me after the speeches. But what I was going to talk about this morning is this data platforms map. See here there is a version available, which is a slightly old version, but we renewed this one, so we’re going to send this one around I think on a PDF by email of the latest version as well. And I’m going to talk about the map itself, why we did it, how we did it. But more importantly what it represents, what it tells us about the data platform landscape and how it’s evolving. So just to start off talking about the map itself.
So obviously, industry analysts, I think we have this urge, somewhat perhaps pathological urge, to put things in boxes. Maybe we didn’t get to play with our toys enough as kids, I don’t know. But it’s also something that clients want to see from us. And they’re consistently asking us to help them understand the choices available to them and which products can be compared with others and which don’t compare with others. Particularly in relation to individual projects. So it’s something industry analysts often do and something we certainly did in terms of back in 2011 we were putting together our first report on the emerging database landscape. And we called that NoSQL, NewSQL and Beyond, and actually invented the term NewSQL for this report to describe the emerging data platforms that we saw and the choices, and trying to get an understanding of why that was happening at that point, that there had been this explosion in the number of products available. And then also where they fit, where they didn’t fit, how they could be compared with each other.
And this, or something like this, it’s slightly a matter of formatting, but we came up with this kind of chart at the time to illustrate the different choices. As you can see it’s pretty much an overly complex Venn diagram with different relational, nonrelational being the core choices and then different segmentations based on that.
And this was pretty well received. It helped them understand these emerging technologies and where they fit, and it did serve a purpose. But almost immediately I was aware that it had a limited life span. And the issue became known to me as the Drawn to Scale problem. I’m not sure people will really know Drawn to Scale. It was a company that emerged a few years ago and didn’t quite make it for various reasons. But they were doing something quite similar to what Splice Machine is now doing which was a Hadoop-based relational database. And so you see you’ve got Hadoop up here and you’ve got the NewSQL databases up here. So somehow we have to squash these together and not mess up everything in between. There were various products and services emerging that, you know, everything was going to end up on top of each other. And this clearly wasn’t going to last. It wasn’t something that kept me awake at night but it was there in the back of my mind that we needed to have an alternative solution to this in the long term in terms of explaining the market. I had no idea what that was going to be and what happened was it was a complete accident, I was at an event as it happens in Germany, and I bumped into a former colleague, and we were chatting away just catching up. And she said, “I love what you’ve done with that tube map.” I had no idea what she was talking about, but being British, I smiled and thanked her and just carried on talking. And as we carried on talking I realized that she was talking about this. And what I realized was even though she was in the industry she didn’t actually have a database background, so all the words on here were not really relevant to her. So probably when she was looking at this she was actually seeing this and some blurry stuff in the middle that wasn’t really that relevant. And of course you only have to add a few stops on here and you realize that actually the important thing was not what’s inside the lines, but the lines themselves.
And this is the way that you could actually illustrate the convergence of all these different lines of technologies and different products. So I realized that that was probably the way we were going to do it. And obviously we had some inspiration to help us. I’m actually based outside London now. I’ve lived in London for 12 years I think. So Harry Beck’s original iconic London underground map has been something that’s always been there for a long time. And that itself has evolved over time to -- this is the modern-day map. Obviously a lot more complicated and a lot more stylized. So clearly this was something we could work on.
Also being a longtime fan, I’ve always liked Simon Patterson’s The Great Bear which is a work of art which takes the London underground tube map, the exact same map, and puts on it sportsmen, famous people, politicians, and it’s just a really good, just shows the way you can use visual imagery to represent something completely different. And so that obviously was a big inspiration. And then I have to give due credit to another industry analyst firm, the Real Story Group, who’ve actually been doing this for a long time. And so they cover mainly the content technology space, for a long time this content technology vendor map. It seems to have evolved into the digital workplace and marketing technology vendor map over the years. So they’ve been doing this for a long time. So we have to give credit to them as an inspiration as well although, that’s just me.
So this was the inspiration. But I knew it was going to be a big project. And I wasn’t quite sure it would actually work. And it wasn’t until -- well this was Thanksgiving 2012. And of course you guys are all busy relaxing, eating your turkey and watching football, whatever it is you do while we’re over in the UK twiddling our thumbs, actually it’s a really good day to get some work done. Or to mess around trying to do something that you’ve been thinking about for a long time. So I just started drawing it. And I started with the NoSQL space because that seemed to be in my mind where there was a lot of overlap, and I had to figure that out. As you can see by the end of the day I had already realized that this was going to be a huge undertaking and perhaps I shouldn’t have started.
But we persevered. And by December 2012 we had the initial version, which we published -- it was published on the blog really just to get some feedback, see what people thought of it. And what people mainly thought of it was that there was loads of stuff we’d missed out. Just loads of responses, oh, you haven’t got this, haven’t got that, haven’t got this, that, and the other. So obviously we worked on that at the time, and we expanded it in January and April and May and realized that we couldn’t just keep doing this. So what we finally ended up doing is working on it in the background and then publishing versions every now and then. So we had a big update in February which I think is the version you have. We actually had a slight update in September which was more about formatting, which is by the by. And then actually a really big overhaul for this version which is the October version which we’re going to send you.
So that’s the map and how it came about and why it came about. As I said, the most important thing actually is really what it tells us and what we can learn from it. As I say, I started doing it just to see if it could be done. So there was never really any aim that it would have any kind of functional purpose.
We have found some interesting functions for it. It’s actually quite good for the investment community in terms of finding out if there is an investment here and trying to figure out what and who their competitors are or for following the lines to figure out whom they could invest with which isn’t competing with their existing investments. I did get one person apparently that in the NoSQL space was just going round from stop to stop having interviews and trying to figure out where he was going to choose to get work, that’s just one use for it, but it is what it is. As I say, the more interesting thing is what it tells us.
And I think what it tells us, it’s interesting to look back. And obviously you didn’t have the map five, six years ago. But we can take things away and try to figure out what the map would have looked like five, six years ago based on (inaudible) and what we see is there were some interesting things happening isolated around stream processing and grid and cache technologies. But really the market was dominated by the incumbent providers in terms of market share and revenue (inaudible) today. Some specialists in analytics and nonrelational. But really the bulk of the action was happening here.
And what happened over that timeframe is then obviously the expansion of these existing market (inaudible) players into the market, also some consolidation, some M&A activity, particularly the analytics space. But the market then began to really evolve in four main areas. So we saw the emergence of this NoSQL space, the new breed of nonrelational databases. Then the NewSQL space, new breed of relational databases (inaudible) modern-day distributed cloud architectures. And obviously Hadoop, whole new ecosystem of providers emerged around Hadoop. And then last (inaudible) tends to bring everything together because everything (inaudible) some specialists over here but really brings everything together in terms of the new providers.
And we even probably can think of this as almost like the ongoing expansion of the universe, and that the database space in that five, six years just seemed to keep expanding. And one of the things we get from our clients is a lot of questions about why this is happening, what’s driving this, particularly these four areas.
So if we look at those four areas, NoSQL, NewSQL, Hadoop, and database as a service, there’s a lot of different things going on which is driving innovation in the space. But for us really they fall into three main buckets, which is about applications, about developers, and about architecture. Obviously all of those things influence each other (inaudible) applications and developers closely related, architecture (inaudible) based on applications, etc.
So they do influence each other. But of course we (inaudible) put them in this one whole. So if we just (inaudible) through them individually. In terms of applications we see the social, mobile, global, local drivers for new application projects all have significant implications for data connectivity. So if you talk about social we see this increased interaction from users generating more data, consistently generating more data. Obviously mobile access being a big part of that. But mobile itself (inaudible) another driver in terms of different form factors, access methods, different requirements in terms of data delivery. Global, I was having a conversation with a client the other day (inaudible) launching a new application or service only in a certain (inaudible) or just in Britain. I just don’t think you can. People expect everything to be available globally. If not immediately then pretty quickly. And also local. So both in terms of local delivery of content, so you expect to get the same performance from your application or you hope to get the same performance from your application if you’re in Europe, you’re in the US. You expect that local delivery of global content. Also of course you expect local content. You want local data, you want to go find a hotel, restaurant, or whatever it is that you’re looking for. So you expect that localized content based on knowledge of where you are. All these things are putting new implications on data delivery. Obviously each of these things we can do a whole presentation about. But in the interest of time we’re going to skip through it pretty quickly.
In terms of developers in particular we’ve got a whole group within 451 of my analyst colleagues who focus on developer (inaudible) and DevOps. But what we see in relation to the database space is that developers are increasingly driving data management choices, database selection in terms of the DevOps movement and the shift towards continuous delivery. So driving the choices about database with rapid development. In particular we see that it’s somewhat inconsistent -- again we can get into this in more detail -- with (inaudible) traditional data management processes, in particular data purchasing processes. And so (inaudible) there is this growing need to unite the application development and data management people and processes to achieve common goals. To some extent this is happening. This quick snapshot of a survey that as I say some of my colleagues did, looking at organizations that are already putting DevOps into practice. And the interesting thing I found about this is looking at why they are reducing release cycles and why they’re moving towards this more continuous delivery. But you can see just over half are doing it for business and strategy reasons, competitive advantage, business productivity, revenue generation. Just under half are doing it for more technology reasons, so functionality, new feature sets, reduced development costs. So as this whole DevOps movement matures we expect and we think we will see the alignment of those two. I think (inaudible) there’s a split, I don’t think it’s a problem (inaudible) both of those are positive (inaudible) but they need to align within organizations rather than within the industry as a whole to drive forward development processes.
In terms of architecture what we see is in terms of database architecture a transition from traditional scale-up approaches towards more distributed databases. And this is actually one of the core focuses of that NoSQL, NewSQL and Beyond report when we were looking at the evolution of some of these databases. And what we came up with there was this idea about database SPRAIN. So over the years the existing relational databases have been asked consistently just to do more, to deal with objects, to deal with XML, to deal with distributed environments. And SPRAIN is when you stretch something almost but not quite to breaking point. And this is what we see a lot of database administrators dealing with with their database landscape, a straining under the pressure of this. If it’s not the database (inaudible) it’s the administrator themselves who’s having to manage all of those complexities. And SPRAIN obviously as you see is an acronym which stands for the six key drivers that we saw for enterprise interest in some of these emerging technologies. So scalability, performance, relaxed consistency, agility, intricacy, necessity. Not all of them obviously necessary at the same time. But if we look at different projects we often see a combination of two or three of these that are driving organizations to at least look at emerging alternatives.
Of course in relation to architecture as well as the shift to distributed databases we also see a shift towards (inaudible) on-premises computing, instead of traditional on-premises computing toward the cloud (inaudible) hybrid. Today by and large we do see these trends happening. They are clearly running parallel. But they’re not really as united as they could be. So you’ve got the infrastructure team thinking about the next generation infrastructure choices. You’ve got the database administration team or even perhaps some of the developers thinking about the next generation database choices. And a lot of the same themes are driving them forward. But they’re not as united as they could be. And we think that will increasingly be the case. We’ll see these teams working together a lot earlier in terms of planning that next generation infrastructure.
Just a quick word on public cloud database as a service. We do see some interest in public cloud database as a service. The interesting thing though is if you look at Amazon Web Services’ top enterprise use cases or Amazon Web Services in general in order of the most popular, so development and test, new workloads, supplementing existing workloads, then things like migrating existing workloads, migration over the entire data center, or what they call all-in cloud (inaudible) everything they do is in the cloud. Only the top three of those, the three most popular use cases for public cloud according to Amazon Web Services, are actually quite additive to the existing database landscape. It’s not about migrating existing workloads, it’s not about displacing existing workloads, it’s about new projects, new applications. And this is something we consistently see. The public cloud has a role, will increasingly have a role. But in terms of displacing the existing infrastructure it’s not happening to this point.
Again we’ve got a whole (inaudible) team around this that focus specifically (inaudible) and this is some research that they’ve done from a Cloud Computing - Wave 6 (inaudible) talking to those organizations (inaudible) part of our team of 10,000 end users. And what we see is obviously the majority of workloads today are on internal, on-premises, noncloudy, traditional infrastructures. But as we see that dropping, 41% in a couple years, and a shift towards more private cloud-based from 19% to 30%, so (inaudible) in terms of public cloud it is being adopted, it is growing, but it’s not having a significant impact (inaudible) 5% to 9% for software as a service, 1% to 6% for -- we say other than SaaS, so infrastructure as a service or platform as a service. So it’s growing but it’s not having that significant impact.
What we see actually growing more is hybrid cloud. So 10% to 25%. And it’s our view within 451 Research that the future of the cloud is hybrid. And actually for enterprise applications (inaudible) obviously we’re talking about traditional database-based applications if you like. The route to the hybrid cloud is private. So if you look at enterprise applications, be they customer-facing, back office, they are predominantly for the next two years going to be delivered on internal private cloud or potentially (inaudible) hybrid cloud. They’re not going to jump to the public cloud (inaudible) there will be some isolated organizations that do that. But by and large it just isn’t going to happen.
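As a quick check on the comparison above, the following is a minimal sketch that computes the percentage-point change for each deployment category using only the figures quoted in the talk. The current on-premises share was not stated, so that category is left out; the dictionary and function names are illustrative assumptions, not part of the survey.

```python
# Workload share figures as quoted in the talk: (share today, share in about two years),
# in percent of workloads. The current on-premises figure was not given, so it is omitted.
shares = {
    "private cloud": (19, 30),
    "SaaS": (5, 9),
    "IaaS/PaaS (other than SaaS)": (1, 6),
    "hybrid cloud": (10, 25),
}


def point_changes(data):
    """Return the percentage-point change (later minus now) for each category."""
    return {name: later - now for name, (now, later) in data.items()}


changes = point_changes(shares)
largest = max(changes, key=changes.get)

# Print the categories from largest to smallest change.
for name, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{delta} points")
```

On these quoted figures, hybrid cloud shows the largest change, which is consistent with the point being made.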
So back to the three drivers for change and the influence specifically on the database space. So obviously I talked about how these all influence each other. Clearly we must recognize that actually it’s a combination of all three that’s driving adoption of and the emergence of (inaudible) NoSQL, NewSQL, Hadoop, and database as a service. So these new database types do have some things in common. But obviously they’re actually very different in terms of the use cases and the types of technologies involved. We look at NoSQL. We’re talking about nonrelational data models. Not always but most often trading off consistency for availability. Compare that to NewSQL where you’re talking about obviously maintaining (inaudible) perhaps best aspects of the relational data model but adding availability, flexibility, and scalability to that. Hadoop, obviously actually quite different. You’re talking about batch. Increasingly also interactive. But traditionally batch analytic processing of unstructured data. And then database as a service is something of a red herring in that it’s not actually -- it’s really like a deployment model. It’s not a different (inaudible) of database. It’s all of the above. Or indeed traditional relational database being delivered as a service.
In terms of the core use cases, obviously NoSQL being used predominantly -- obviously again you could do an entire presentation on each of these -- predominantly (inaudible) nontransactional operational applications, dealing with unstructured data and lightweight query. Compare that with NewSQL. We’re talking there about transactional operational applications, more structured data, more complex query. Increasingly what we call operational intelligence. Hybrid transactional analytic workloads. Hadoop, obviously nontransactional analytic applications, multistructured data (inaudible) complex query. Plus you see some examples (inaudible) and then obviously as I said database as a service, bit of a (inaudible) any of the above or traditional relational database management service delivered as a service. Obviously there’s (inaudible) different arguments about why and whether you would deploy as a service versus on-premises traditional requirements.
So we’ve seen this expansion of the database space as I’ve described over the last five, six years. And (inaudible) if you think of this as I said about -- you can compare it to the continuing ongoing expansion of the universe. Driven as I said partly out of (inaudible) polyglot persistence (inaudible) this idea about using specialist databases for specialist workloads. And so a lot of adoption of particular -- the different NoSQL databases being driven by this in recent years. And that gives you some advantages.
But what we’ve found is that you potentially get to a point where you have even -- could be a single application which has two, three databases underneath it supporting that for different data models or different transaction (inaudible) requirements. And that obviously introduces administrative complexity, development complexity.
And so we’ve seen a little bit of a shift. And it’s still early days in terms of this. But we have seen a bit of a shift towards organizations looking at what we call multimodel databases. So databases that actually -- you still want those multiple models. You don’t necessarily want three, four databases to deliver that. Obviously there’s trade-offs involved in that. But it’s an interesting emerging space.
Especially when you consider there’s the potential for NewSQL too to get involved in that as well. So we’ve seen the NoSQL vendors increasingly adding SQL-like capabilities to their databases. Some in terms of an actual (inaudible) SQL itself. And obviously the NewSQL vendors seeing an opportunity to add some (inaudible) or some key-value access (inaudible) so the lines are continuing to blur between these different database categories.
Thinking ahead about where this might end up, we talk about database as a service or X as a service (inaudible) being a deployment model. So it’s wrong to think of it really as a different category. It’s really how you (inaudible) deploy the database technology or consume your database services.
There’s also an interesting thing potentially with Hadoop, in particular in relation to the Hadoop distributed file system. Certainly in some organizations -- so we’re looking further ahead but we are seeing this happen in some early adopter organizations. Where data gravity -- so much of the data is now in the Hadoop distributed file system. Data gravity means that the applications and other database services are being drawn to that data, rather than (inaudible) around. So there’s an interesting prospect that HDFS becomes a common substrate for multiple data processing models and engines on top of HDF (inaudible) and HDFS. It’s not going to suit every requirement obviously but as I said we are seeing some interesting projects working this direction. So there is an argument to be made at the very least if we’re thinking about where this market is going the next five years rather than looking back at the previous five years. Potentially we’re talking here not about different database technologies to be deployed in isolation but actually the building blocks of a next generation data platform.
So I just want to finish off in terms of thinking about how we get from here where obviously there’s lots of different organizations in different stages of development and adoption around database (inaudible) but I think you can see that the mainstream still today is in this space. Centralized, scale-up, SQL relational databases being the majority of the database landscape. And we see that organizations are thinking about and wanting to work towards a place where they have this next generation data platform, which is multitenant, multimodel, multi-data-center, hybrid, agile, elastic, distributed, delivered as a service, and automated.
Actually those are a lot of things that the traditional relational database isn’t. And so the question is how you get from here to there. This is a work in progress. I’m really interested to get feedback on this. This (inaudible) thinking about where things are going. But certainly we’ve seen -- what we (inaudible) over the last few years is this shift towards tactical deployments of NewSQL databases, NoSQL databases, Hadoop, database as a service. Proof of concept and individual departmental application-driven deployments, often driven by shadow IT. So this (inaudible) tactical expansion in terms of number of databases that organizations are using.
We talked about the move towards multimodel. And what you can think about that as being is more around tactical consolidation. So organizations looking to reduce perhaps the number of different deployments that they have in terms of the number of suppliers. Looking at managed database as a service. SQL-on-Hadoop (inaudible) you’ve got your army of SQL developers and administrators. You’ve got Hadoop. You want to bring those all together. That’s why we’ve seen this big focus on SQL-on-Hadoop in the last few years. Federated query is part of that.
But at the same time there’s been this tactical consolidation of projects. There’s been a strategic expansion in terms of the types of applications and the mission criticality if you like of the applications that are being deployed on these databases. And they are becoming increasingly important to organizations in terms of their (inaudible) core enterprise applications but those next generation applications. Obviously an increase (inaudible) strategic vendor relationships driven by traditional data processing policies and requirements rather than this developer-led shadow IT adoption that we saw before. So this as I say is a work in progress. Be interested in feedback. It seems to be the way we see the shift going towards then a more strategic consolidation where we will see I think some M&A activity (inaudible) happens in this space. And we will see some consolidation from users demanding these different providers to work together more, bring their technologies together. And in terms of -- you talk about multimodel. We’ve already seen this in terms of the core functionality and core data models and approaches.
So what this means in terms of our map. Just to bring things back to this. I think we’ll see quite a few of these stops vanish either through M&A or through -- they can’t all make it. Just not the way the database market works, particularly when a lot of the money seems to be consolidated here. So we will see M&A. We will see companies disappear. But what I think we’ll also see is these lines becoming increasingly blurred and really being drawn across the map. We’ve already got Oracle supporting the ability to store JSON documents in there and IBM doing that. So the lines are really coming together. So whether this will actually continue to make sense and be usable I’m not sure. But I hope so. But for now it works. So we’ll stick with it.
So that’s (inaudible) thank you for your time. I hope that was potentially useful and interesting. I hope the map (inaudible) you in whatever you (inaudible) as I said we will send you the latest version of that. I think (inaudible) next presentation. Maybe have some questions after. But yeah, thanks very much for your time.
(Seth Proctor): So, I’m Seth Proctor. I am not going to show you nearly as many pretty pictures as Matt did. I apologize in advance. Matt, if you haven’t come across his stuff before, I highly recommend you go track it down, do some reading. He’s doing a lot of (inaudible) not just the subway map which I think is really cool, but he’s got a lot of good stuff he’s been writing on where we’re going in the industry in general.
And he and I have been doing a lot of talking lately and thinking in the same lines. And so what I’m going to try to talk about today are a couple things. Talk to you about what NuoDB is for those of you who maybe are interested in that. I’m going to show you a little bit of a demo at the end of this talk.
But I want to start by continuing that line of thinking that Matt (inaudible) around this trend that we’re starting to see around consolidation, this trend around multimodel. And talk a little bit about why that’s interesting but also architecturally why it’s challenging.
At NuoDB we use the word architecture a lot. If you’ve ever been to one of our talks, if you’ve ever read any of our papers, if you found out front the book we have that is our architecture white paper, we really focus on architecture architecture architecture. Really believe that there are systems that are well suited to do certain kinds of problems and systems that are not as well suited to certain problems and that if we can have a core architecture that’s designed to be flexible, a core architecture that’s designed to solve different kinds of problems, lets you tackle some of the things that Matt was talking about here.
Particularly you just heard this commentary on convergence. Matt mentioned at the end things like Oracle implementing JSON. We certainly see this kind of thing happening increasingly in our industry. We see NoSQL systems adding things that look more like traditional ACID systems, traditional SQL capabilities. If anyone’s looked at Cassandra recently, people talk about capabilities from CQL. You go talk (inaudible) DataStax, they’re talking about how they’re adding essentially limited transactions into the system.
If you look at non-ACID systems out there today, they’re increasingly trying to figure out how do we get some notion of consistency built back into the model. Maybe it’s not full transactional consistency. Maybe it’s statements. Maybe it’s transactional consistency on a single table or on a single element or schema. Maybe it’s some limited ability to run some kind of operation but not a join, or maybe a join but not a join across more than one (inaudible) maybe it’s time-limited. Other little paths that people are putting in place to essentially say how do we get towards something of a transactionally consistent model where there didn’t use to be one. Again the reason that’s hard -- well, aside from the fact that (inaudible) that’s a hard thing to do, but the reason that’s hard is again it comes back to the architecture. Systems that are designed to be able to handle (inaudible) transactions. Matt mentioned earlier that many NoSQL systems or specifically non-ACID nontransactional systems have often taken the approach of trading off consistency for availability because that ends up being an easier technical trade-off to make when you’re actually building a system. It’s arguably easier to build something that’s available by sacrificing notions of consistency. It’s just that then your application developers pay for it (inaudible) have to figure out some way to sort out the mess.
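The trade-off described above can be illustrated with a minimal sketch. It assumes a toy in-memory store with two replicas that each accept writes without coordinating, so both stay available, and a last-write-wins rule that the application applies afterwards to reconcile them. The `Replica` class, the `reconcile_last_write_wins` helper, and the logical timestamps are all hypothetical and do not represent any particular product.

```python
class Replica:
    """Toy replica that always accepts writes locally (available, not coordinated)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (logical timestamp, value)

    def write(self, key, value, timestamp):
        # No cross-replica coordination: the write succeeds immediately.
        self.store[key] = (timestamp, value)

    def read(self, key):
        return self.store[key][1]


def reconcile_last_write_wins(replicas, key):
    """Application-level cleanup: keep the entry with the highest timestamp everywhere."""
    newest = max(r.store[key] for r in replicas if key in r.store)
    for r in replicas:
        r.store[key] = newest
    return newest[1]


east = Replica("east")
west = Replica("west")

# Two clients update the same key on different replicas.
east.write("cart", "book", timestamp=1)
west.write("cart", "laptop", timestamp=2)

# Both writes succeeded, but readers now see different answers.
diverged = east.read("cart") != west.read("cart")

# The application has to sort out the mess; here the earlier write is simply discarded.
winner = reconcile_last_write_wins([east, west], "cart")
print(diverged, winner)
```

Last-write-wins is only one possible policy, and it silently drops the earlier update. Choosing and implementing a policy like this is the cost that falls on application developers when the database does not provide transactional consistency.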
And yes, you do see a lot of SQL databases adding support for things like JSON, adding support for key-value interfaces, adding support for RDF (inaudible) and we do see increasingly this interest in how HDFS and SQL work together. I’m not sure I’m in the camp of people who believe that Hadoop and HDFS is going to be the core for everything. I think it solves some very (inaudible) problems but again I think if you come back to core architecture, it’s designed for a set of problems. And I’m not sure I believe it’s a ubiquitous framework that lets you solve all the different kinds of problems (inaudible) interesting question is why is this happening, why now, and why is this happening in our industry. Completely agree with Matt. The subway map is (inaudible) easier to draw in a few years. Completely agree that we’re moving towards this trend.
I think Matt talked about some of the reasons for why this is happening. One of them has to do with simplicity. One of them has to do with the fact that you don’t want to have to maintain four different database technologies, six different database technologies. I talked to companies that are literally deploying dozens of different databases, tens of thousands of (inaudible) they’re juggling many many dozens of different technologies (inaudible) all different and they’re all running in different versions.
So you’ve got lots of different technologies and lots of different assumptions about consistency models. Lots of different APIs. Lots of different (inaudible) on disk that you have to maintain and manage. Different (inaudible) models. It’s a mess. And so I completely agree. That’s one of the core key reasons why people are looking to try and solve more problems in the same technology.
I think another aspect of simplicity increasingly is about the data itself. So something that we’re increasingly concerned about as you scale -- as you scale across a data center, as you scale across multiple data centers, as you start thinking about geographic scale. One of the real problems that people are facing is what happens to that data as it starts traversing different technologies in different locations. So if anyone has thought about deploying service in Europe (inaudible) turns out Europe, while it sounds like it’s a nice big boundary and everything in it is the same (inaudible) anyone who’s ever been to Europe knows that that is far from the truth.
And while there are laws that govern the EU, you’re going to deploy a service in Europe and you’ve got German citizens and you’ve got French citizens and you’ve got Italian citizens all using your service, their data needs to be managed differently. Needs to be stored in physically separate locations. Has different rules of audit. Has different rules of governance. Different rules about retention, about who can access that data and what (inaudible) and so just when you step back and think about a really simple problem, like you’ve got your operational database, you’re doing ETL out of it on a regular basis, you can do some batch analytics, well, now you’ve got two problems with your data. So there’s simplicity like I don’t want to have to (inaudible) operation, I don’t want to have to maintain two databases. But there’s also simplicity of I really wish I didn’t have to worry about the audit trail, governance, and everything else of that data in two different places as well. That (inaudible) is becoming increasingly a driver here.
The other reason. We’re starting to understand how to do it. We didn’t have this explosion of hundreds of databases because we had a generation of people who said, “You know what I really want to do today? I want to invent the n plus oneth database.” That’s not really -- that does happen. There’s a time and place for everything and it’s called college (inaudible) we all go to (inaudible) we all go to university, I know it’s early in the morning. Come on, guys. We all go to university, we all go through that, we all rebuild an OS or database or something like that in computer science. And then we realize that in the real world you don’t really want to be in that industry of constantly (inaudible) but databases really are fundamentally hard systems to build. And the scale-up and scope at which we’re trying to manage systems today is crazy.
Distributed systems are really complicated. Database systems are really complicated (inaudible) really complicated. Put all these things together, it’s hard. And so we really have I think -- part of the reason we had this explosion of technologies for a long time was because as technologists we’re very good at looking at tools we have in front of us and coming up with pragmatic solutions. We’re very good at saying, “Tomorrow I need to be running on 10 times the number of machines I’m running on today because I’m about to get (inaudible) it’s Black Friday. I’ve got to do something.” Or if you’re in Europe it’s sale day in France. Or it’s singles day in China. Round the world (inaudible) happens.
I’ve got to figure out some way of making this scale now. The technology is not there to just take my Oracle database and make it do that. Take MySQL and make it do that. And so we built a lot of tools that got us over those scale points. And it’s (inaudible) hard to say, “Let’s actually step back and take half a dozen years to rethink the architectures and rethink why those technologies don’t scale (inaudible) and then try to build a solution that actually solves them the way we would have liked to from the beginning.”
I think that’s what’s been happening in the industry over the last several years. Companies like NuoDB -- we’re not alone. I’d like to think we’re doing it in a pretty interesting way but there are several technologies out there that are basically stepping back and saying, “If you could have done it right the first time how would you have done it?”
So we’re looking at a tipping point now where we really understand what it is about these other approaches that traded off things that we didn’t want to trade off, where the complexity is coming from in trying to manage multiple systems or trying to conflate lots of technologies in the same place, and how do we build new architectures that actually address problems we care about today.
So multimodel. Multimodel I think means a lot of things to a lot of different people. And obviously yes, it’s I should be able to do document, I should be able to do SQL, I should be able to do graph, I should be able to do whatever else in a single thing. Or I should be able to solve operational stuff and analytics in a single thing. Simplicity, it’s really nice.
But I’m going to argue it’s less about is it SQL or is it JSON or is it something else, but it’s really fundamentally about how the data is used. It’s about what is that model for your data, what is the expected access pattern. This is how you optimize a system. So again it’s not about slapping on a new front end parser. It’s not about being able to support a new data type (inaudible) it’s about understanding when I say that I have a SQL database and now I really want JSON, what does that mean, why would I want JSON. Yeah, I’ve got a bunch of Web hackers who I’m bringing on and that’s what they know, great. But what’s really going on, what does that mean in terms of different access patterns, what does that mean in terms of different assumptions around security, what does it mean in terms of how the data should be indexed and accessed, what are the (inaudible) what are the things I care about, what are the things I don’t care about, and can the architecture, the underlying database, understand how to give you those kinds of scaling benefits, not just does it understand how to parse SQL and parse JSON and turn it into something, but then actually give you those kinds of performance characteristics you want.
This is what drives those access patterns, this is what defines where you should store data and how you should store data and how you think about do I need a fast disk, do I need a slow disk, do I need multiple disks, can I parallelize something, can I not parallelize something. RDF tends to be used for applications that are highly parallelizable. They tend to be things that you can distribute in a way that is harder to do in SQL. JSON tends to be flat and then it’s primary key indexing that is really critical. These are different assumptions about how you structure the data (inaudible) and how your architecture should react to them to be able to give you really good performance.
And ideally I would argue -- I know this has not been sexy for a long time, I think it’s making a comeback. Transactions are a really good thing. There’s a lot of discussion. I’ll say (inaudible) lot of discussion today (inaudible) NoSQL community about how transactions are heavyweight and they’re bespoke and it’s really archaic things that few people want and most applications don’t need them. I would argue that actually transactions were invented as a utility. They were invented to make our lives easier, to simplify failure cases, to simplify rules of the system, to make it simpler to program systems that are resilient and (inaudible) and to be more hands-off (inaudible) declarative interface like SQL so that you tell the system what it is you want to have happen. Not how the system should do it. And then you let the system optimize how to actually do the thing that you want to do as effectively as possible.
And so I think ideally this whole multimodel world is also about being able to wrap a transaction boundary around more than one set of data, around more than one use case, and being able to do it in a way where your core architecture understands what the problems are you’re trying to solve, and can optimize for those different sets of problems even though it’s in one system.
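As a small illustration of wrapping one transaction boundary around more than one kind of data, here is a minimal sketch using Python's built-in sqlite3 module rather than NuoDB. The table names, the `place_order` helper, and the choice to store the document as JSON text are all assumptions made for the example; the point is only that the relational row and the document either both land or neither does.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory database with one relational table and one document table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_docs (order_id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def place_order(conn, order_id, customer, items):
    """Write a relational row and a JSON document inside a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer) VALUES (?, ?)",
            (order_id, customer),
        )
        if not items:
            # Failing here must also undo the INSERT above.
            raise ValueError("an order needs at least one item")
        conn.execute(
            "INSERT INTO order_docs (order_id, body) VALUES (?, ?)",
            (order_id, json.dumps({"items": items})),
        )


def count(conn, table):
    """Return the number of rows in one of the two example tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The caller states what should happen (both writes, or nothing) and leaves the failure handling to the database, which is the utility argument made above.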
And I think this is where the architectures in the next five years get really interesting. This is where it all gets differentiated. And let’s take an example of why that architectural differentiation is so important. Matt talked about database as a service and how database as a service initially was specific products. It was things that were offered as a service. Then it started to become more just a general model (inaudible) if you ever tried to take one of the traditional relational databases and deploy it as a service, what you’ve done is you figured out some way of wrapping it in piles of scripts and building lots of complexity around it and somehow getting it into a cloud model or a multi-data-center model. At the end you haven’t really built a service. You built something that abstracts it, that makes it a little bit easier to get the bits installed somewhere that automates the process of doing it again when you need to do it. But it’s not really a service in the sense that you can just hit a button and something spins up in your data center or the public cloud, scale on demand, give you what you need in terms of resource capacity on demand (inaudible) single end point (inaudible) again that’s because of the architecture. The architectures were not designed to do that. And this is something that we want increasingly from everything we do. When we deploy stuff, when we think about cloud models, we think of our application tier as being something that we treat as a scale-up service. We think about our Web tier as something we treat as a scale-up service. Load balancing. Storage. Want the database to be able to do the same thing (inaudible) in particular we really want it because we want to get more and more towards a notion of whether you call it automation, whether you think about it as autopilot, whether you think about it as some way of being able to get a service running and then make it self-aware, that’s what we want to get to. 
Because again these large systems we’re talking about are very complicated. And while we’ve had this big trend towards DevOps in the last few years, everyone’s really excited about it, again I think DevOps is one of those things that is pragmatic in nature (inaudible) really know how to decouple the development experience from the operations experience as we start building more and more complicated systems that require embedding in your application logic, knowledge of how your database works. So we have these very rigid things. Then to make them scale, to change your operations model, to scale out, to add more machines, to react to failure, requires knowledge of what’s happening in development.
I think what we really want to get to is we want to get to the point where we can tease those back apart. Database is a service. The database is a service that is automatic. I can describe again just like SQL. I can describe what the problem is I’m trying to solve, not how to make that problem happen. Describe an SLA. I can say, “This is the database I want. It needs to be able to solve these kinds of problems. Needs to be able to support these models. I need these kinds of resources. I need this kind of capacity on demand. Make it happen. And now I’m going to go focus on my application. Or I’m going to go take a vacation.”
Whatever it is you do. This is not a nice to have. This is not a wouldn’t it be cool if we had this. Because when you start talking to people in the enterprise who are as Matt says trying to get to that cloud model, who are trying to move from a pure on-premises model either to something in their own data center or something in other data centers, public cloud, trying to move towards service model, the limits of scale people are talking about are crazy. I regularly talk to people who say things like I’ve got 50,000 databases running, I’m running in 25 data centers, I’ve got this many applications, I’ve got whatever.
I say, “How do you manage all of that?” I recently talked to someone who had this incredible answer. They’re like, “Well, whatever percentage -- literally my percentage of the network traffic running the world and I know how it’s being managed blah blah blah, and I’ve got my data center with (inaudible) and there are four screens in it. There’s someone who sits in front of it. That’s it. Managing the system.”
The only reason (inaudible) build a system that’s that complicated and that scales to those levels of extremity is because this is how they think. It’s not about provisioning as a nice simple thing that I can plug into my (inaudible) then I have a DevOps person who’s always sitting there monitoring. But you can’t hit these next generation limits of scale if you want all the multimodel stuff or if we want real (inaudible) unless you also think about a way of setting it on autopilot and just telling when something’s wrong.
That’s as critical to scale as things like core storage. So where does that leave us? In Matt’s vein of what’s happening next. I think there are a couple requirements when you think about what you want in a distributed database (inaudible) it needs to be able to scale in and out on demand. Should be something that can use resources on demand when you need them, drop them when you don’t. Part of what that does is that gives you transaction flexibility (inaudible) part of what it does is it provides resiliency. It provides you a model (inaudible) to be able to handle failure; it provides you a model (inaudible) get ahead of failure. It provides you a model for being able to do online upgrade, online maintenance, without taking down (inaudible) no one is ever -- you should never buy a product from someone who claims 100% uptime. Like pro tip, 100% uptime doesn’t exist (inaudible) chose design (inaudible) 100% uptime (inaudible) that’s a thing you aim for. Probably not something you achieve. But it’s something you want. You want an architecture that has the audacity up front to say this is designed for 100% uptime.
But again you can’t scale. You can’t scale your development team. You can’t scale your operations team. You can’t scale around automation if you’re doing any of the weird tricks around sharding or active/passive models or latencies between when you can write something and when you can read something. You want a database that looks like a single logical database. Because it’s a single logical database, you can make operational changes without affecting application logic. Because it’s a single logical database you can run it wherever you want and it doesn’t matter. It’s a resource allocation problem, not a development problem. Because it’s a single logical database your application developers can start thinking about the next generation of what they’re doing without worrying about where it’s going to run, is it hybrid, is it public cloud, is it on a local data center, is it a mix of things. Doesn’t matter, it’s a database.
I think you absolutely want multimodel (inaudible) across mixed infrastructure. If it’s multimodel and scale-up and hybrid but every machine in the hybrid cloud has to be the same footprint, same size disk, the same memory, then you haven’t really gotten to the scale (inaudible) needs to be something that can run mixed resources, something that can run multiple locations, and be something that is simple to use. Because if you’re doing all these crazy complicated things and it’s not getting easier and easier, you’re never going to be able to take advantage of all these neat things.
At NuoDB we talk about architectures in a couple buckets. We talk about two traditional architectures that people use when they talk about distributed databases. And without going into a lot of detail, this is probably familiar to a lot of people in the room, there’s the shared disk view of the world (inaudible) argue something like RAC from Oracle represents something of a shared disk model. There are other systems that are built around essentially scaling out. By scaling out the disk substructure. That works well because a lot of our current databases today (inaudible) based on System R architecture from the early 1970s. Are still based on a fundamental scaling assumption. That’s cool; it’s amazing that that idea has such longevity. But it also suggests that probably we should be looking at what’s different today in our architectures.
There’s also the shared (inaudible) approach at the other end of the spectrum. Whether you think about that as a sharding solution, whether you think about that as a replication solution, essentially it’s the main thing that (inaudible) to the cloud. Take a database, chunk it up into lots of smaller disjoint databases. And then you can scale those databases independently. Again it’s a nightmare in terms of the number of copies of data you have, different places where the data lives. There’s no ability to think about a single consistent view of the world. Now your developers and your operators have to work very closely together to make sure they understand exactly how something is being separated. But it’s pragmatic, it works. Really interesting solution that’s come out of Google the last few years is Spanner. On top of Spanner is F1. On top of F1 there’s Mesa (inaudible) sorry. Yeah (inaudible) there’s ESOS and there’s a third (inaudible) I’m always getting them mixed. I think Mesa. Really cool stuff. Really really interesting. And really interesting in part because 10 years ago it was Google that stood up and said, “You don’t have to do SQL anymore. We’re Google and we bought into the idea that key-value is totally acceptable way to do enterprise. And here’s big data.” And it was like oh OK, I guess key-value is the way to go. And 10 years later they’ve gone the other way around. And they’ve said, “Actually AdWords, our only application that actually makes money, requires SQL. Without SQL we can’t make it scale. And by the way we’re moving a lot of our other core technologies onto a relational database. And by the way we don’t think MapReduce is the future.”
So interesting politically what they’re doing. But also really interesting technically. If you haven’t read the F1 paper I highly recommend going and reading it. It’s like 10 pages, it’s surprisingly readable. It’s very clear why it works. Why it works is because they bought a whole lot of atomic clocks, they bought a whole lot of high speed networks, they chunk their schema up into very specific models that map well to the geographies, and assume something about how the transactions run across those schemas.
You can do that when you’re Google, good job. The rest of us, we try to come up with a fourth architecture, something we call a durable distributed cache. And it is exactly what it sounds like. It’s a system that assumes that in memory is important. That the way you scale a system is by running in memory through a caching architecture. That is something you will want in a cloud environment. But then to be a relational database you have to be durable. And so you can’t skimp on that, you can’t say, “Well, it’s eventually durable, or it’s durable but in these failure models it doesn’t work.” No, you need something that is a cache that understands the rules of a transactionally consistent database. You need to be durable. But you need something that can scale out very effectively in these kinds of models we’re talking about.
At a really really really high level this is what NuoDB looks like. Which is totally different than what you see in any other database architecture. NuoDB is a peer-to-peer system. It’s a system where there is no central coordinator, there’s no master process, there’s no owner for any particular piece of data. It’s a true egalitarian vision. And it’s made up of these independent peers. In the spirit of a good distributed system it’s distributed in that formal sense. It’s seven peers that are all running completely independently but understand on what points they need to coordinate with their peers and know how to do so.
So it’s a system where any one of these peers can fail at any point. Any new peer can be added at any point. And the only difference between these peer processes is they understand that they serve one of two roles (inaudible) they’re either in memory peers, peers that have no interaction with disk, that’s where SQL clients actually connect to the database, that’s where transactions are run, that’s where we maintain atomicity, consistency, and isolation, that front end is also where we understand SQL. But separate from understanding SQL there’s a whole (inaudible) system that understands all of the rules of ACID transactions and how to avoid write-write conflicts, how to make sure the data is consistent at all times, how to coordinate among multiple cached copies of data, how to optimize caches, how to optimize the transactions that are actually running. So you think about SQL and that whole multimodel (inaudible) SQL is a very different problem than how you optimize for a scale-out transactional system.
And we also have these peers down here that focus on durability. No clients talk to them. They don’t care. They have no idea that it’s SQL. All they know is that there’s a distributed object system and that they’re part of that distributed object system and that they can make data durable and they can provide access to that durable data as needed. And in fact because this is an object system our storage tier is very simple, and it really is a key-value system. And so out of the box this could be disk. This could be S3 on Amazon. This could be HDFS. To us that storage tier matters a lot less than how you build out the rest of this peer coordination process. And again this just becomes now a question of resource management, what’s the problem you’re trying to solve, how many dollars do you have to bring to bear on the problem, where do you want your data, what (inaudible) reliability models. So you can have independent points of storage with independent points of access to the database.
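To make the point about a simple key-value storage tier concrete, here is a minimal sketch of what a swappable backend could look like. It is not NuoDB's storage interface: `ObjectStore`, `MemoryStore`, `DirectoryStore`, and `save_and_load` are hypothetical names invented for this illustration, and only an in-memory and a local-directory backend are shown.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical minimal contract for a durable key-value tier."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class MemoryStore(ObjectStore):
    """Keeps objects in a dict; handy for tests, but nothing is durable."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = bytes(value)

    def get(self, key):
        return self._data[key]


class DirectoryStore(ObjectStore):
    """Stores each object as one file under a local directory."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Hex-encode the key so characters like "/" are safe in a file name.
        return os.path.join(self._root, key.encode("utf-8").hex())

    def put(self, key, value):
        # Write to a temporary file, then rename, so a reader never sees a partial object.
        fd, tmp = tempfile.mkstemp(dir=self._root)
        with os.fdopen(fd, "wb") as f:
            f.write(value)
        os.replace(tmp, self._path(key))

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None


def save_and_load(store, key, value):
    """Code written against ObjectStore does not care which backend it is given."""
    store.put(key, value)
    return store.get(key)
```

Because callers only see `put` and `get`, a backend for an object service or a distributed file system could be added under the same contract without changing the code above it, which is the sense in which the choice of storage becomes a resource question.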
And it’s wrapped in a formal provisioning and operations model. So there’s a management tier running (inaudible) at all times that you can access through our tools or through our APIs -- we don’t care if you use our tools -- that gives you information about data. It lets you actually start and manage, stop all these processes (inaudible) track what’s happening at all times and lets you do some really interesting things around automating the system.
At high level (inaudible) on that requirements list this is really what we’re talking about. Something that you do (inaudible) on demand, very high levels of transactional throughput, very low latency as you scale out. Something that’s designed with continuous availability in mind for failures but also for (inaudible) upgrades. Something that can be distributed not just across a data center but across multiple geographies. Continue to do low latency access for those applications that are designed to scale up. Social apps. Operational applications. Blogging, account management, session management, all of those things where -- applications that (inaudible) things that actually scale out would be things that give you low latency, high availability, and some really interesting rules around data durability.
Multitenancy. As a first-class solution it’s actually designed at that architectural level to be based on process level isolation, separate security credentials, separate storage locations. So you get a single place to manage the system but you can do all the traditional (inaudible) multitenancy but you can also get real isolation without having to buy a big honking (inaudible) system and use pluggable databases, whatever you call them these days.
I would give everyone just a couple-minute demonstration, because I think this is interesting. So this is from the new version of software we’ve been working on. NuoDB 2.1. There’s beta available. This is actually bleeding-edge. This is like Saturday night’s build so you’re like the first human eyes outside of NuoDB to see this set of features.
Just to give you an idea of why I’m harping on all this around the things that you usually don’t talk about databases. Simplicity of the automation, the management tools. I haven’t been talking at all about TPS. I haven’t been citing TPC-C numbers. I haven’t been talking about SQL isolation. I haven’t been talking about (inaudible) syntaxes. We do all that stuff, we’re happy to talk (inaudible) but when we’re talking about scaling out a database going forward, talking about the generation of things we’re trying to build now with cloud in mind, these are some of the things that I think are just as critical.
So what you’re looking at here is you’re looking at our new Web-based console for managing our system. And again this layers on top of APIs so you can use our Web GUI, use our command line tools, you can use our REST API, Python API. Interact with our system. Whatever you want to do. This one is a little bit prettier (inaudible) and from a single point of management what you’re able to see is you’re able to see statistics about what’s happening across a collection of provisioned systems. By provisioned systems what I mean is when you install NuoDB what you’re doing is you’re provisioning a host. You’re not starting a database. You’re getting those resources into the pool of resources where you can then decide is it one big database, is it lots of small databases, is it couple small, couple big databases, what is it -- again, what’s the resource management problem you’re trying to solve.
Each of those databases looks to the developer like a single (inaudible) database. So operationally you can decide what you’re going to do separate from writing code. And because I have a nice single point of management here I can just decide there are new things I want to be able to measure. So for example I can say, “Show me (inaudible) in a minute. Show me network traffic breakdown.” So very quickly, very easily out of the box you can start to measure key statistics that you care about from an operational point of view.
And because this is a database product I’m now going to start the database. Because at the end of the day that’s why you use our software. I’m going to do it a little bit differently than you might be used to experiencing. This is running on Amazon. I’ve got some hosts provisioned to my Amazon with no database running. So what I’m going to do is I’m going to say add database.
Anyone who’s used something like RDS or some other wizard, and wondered why you’ve got to walk through five screens and predecide ahead of time all the storage limits and everything else and (inaudible) frustrating to you, this may be for you. What you’re about to see. Because what I’m going to do is I’m going to start a database. Give a name. Initial (inaudible) account. And then I’m going to pick a template. A template is that autopilot thing I was talking about. It is a service level agreement. It’s not a statement about where to run and how much disk (inaudible) provision and whatever else. It’s an abstraction. Just like SQL is a declarative language that abstracts describing the problem you’re trying to solve from how you actually run it as quickly as possible.
I’m going to start with the simplest database you can run in our system. Single host database. Click. Say submit. It’s going to go off. Think about that. And then what we’re going to have is we’re going to have a database running (inaudible) bleeding-edge developer build. It looks really pretty but sometimes the refresh (inaudible) so it’s running.
You see down here at the bottom it says it’s active. Really all you care about. Doesn’t matter how it started. Doesn’t matter what’s going on. It is able to monitor itself and tell you that it’s OK. It’s running (inaudible) it’s made up as it happens of two peer processes. I talked about those two kinds of processes. The storage manager and the transaction engine. The in memory peer and the durability peer. That’s what we’ve got. Got these two things running together. Security credentials all set up correctly. So it’s talking over secure channels yadda yadda yadda.
What you care about is there’s a JDBC connection string now or a .NET connection string or whatever it is you want to use to talk to the database. It’s a database and it’s running. But at the moment it’s running on only one host. Which is not very interesting from a distributed database point of view. So what I’m going to do now is live I’m going to edit this database and I’m going to target a different service level and say, “Actually I’ve been doing some hacking for a while, probably ready to bring this to production.” Let me change it to run in a minimally redundant configuration.
By minimally redundant basically what I mean is make sure that any host in the system can fail and I’ve still got full access to my data. And I still can interact with the database (inaudible) that in memory peer. So I’m good with any one host in the system failing. Now I can do this live. And again I can do this with a GUI. I can also do this through command line or APIs.
And so what it’s done is it’s gone off and it’s actually live taken my database. Applications, if they were still running, right now they’d keep running. They’re unaffected. So this really is like a live operational thing I’ve just done. I’m now running across three hosts. And I’m made up of four processes. Two of those in memory peers, and two of those durability peers. And so now any one of these three hosts can fail and I’ve still got a database running.
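The "any one host can fail" property described here can be stated as a small check. The sketch below is an illustration only, not NuoDB's placement logic: the `Process` tuple, the "TE" and "SM" role labels (transaction engine and storage manager), and the host names are assumptions made for the example. It asks whether, after removing any single host, at least one in-memory peer and one durability peer remain.

```python
from collections import namedtuple

# role is "TE" (in-memory transaction engine) or "SM" (durable storage manager)
Process = namedtuple("Process", ["host", "role"])


def survives_any_single_host_failure(processes):
    """Return True if losing any one host still leaves at least one TE and one SM."""
    if not processes:
        return False
    hosts = {p.host for p in processes}
    for failed in hosts:
        # Roles still available once every process on the failed host is gone.
        remaining = {p.role for p in processes if p.host != failed}
        if not {"TE", "SM"} <= remaining:
            return False
    return True


# One host running both peers: any failure takes the whole database down.
single_host = [Process("host-a", "TE"), Process("host-a", "SM")]

# Three hosts, four processes, two of each role, as in the demo.
minimally_redundant = [
    Process("host-a", "TE"),
    Process("host-a", "SM"),
    Process("host-b", "TE"),
    Process("host-c", "SM"),
]
```

Note that the count of processes alone is not enough: two transaction engines on the same host would fail the check, because that one host going down removes every in-memory peer.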
I didn’t have to do anything. I didn’t have to set up weird replication schemes. I didn’t have to think about how to do additional backup techniques. It just worked. And what’s cool is I can keep doing this. I was going to say I can keep doing this all day, except I can’t, because I’m out of time, and because we’re going to run out of templates.
But just to make the point. I can now change and say, “Actually we want to do now (inaudible) earlier.” So we’re running in Amazon and I’ve got hosts configured right now in Virginia, California, and Oregon, they’re all working together. At the moment this database that I’ve been running is only here in US East. That’s where, if we go back and look at my database called breakfast here, look at the three hosts running, you see they’re all just in Virginia region. So I’ve been scaling out but I’ve only been scaling out Virginia region. So now what I’m going to do is I’m going to go back to my breakfast database. I’m going to change it one more time. This time I’m going to say run against that geodistributed template. Which basically says continue to run at minimally redundant configuration but now do it in multiple locations. And again still looks like a single database. Don’t affect any applications that are currently running. Just scale out.
It’s still thinking. It’s still growing. It doesn’t move across geographies too aggressively because -- for those of you who haven’t come across the word hysteresis go look it up after breakfast. But that’s why it’s not (inaudible) too crazy.
There we go. So now in like 10 seconds we have a database that’s running in all three of those Amazon regions I configured. It’s running on 12 machines. It’s made up now of 16 peer processes that represent that in memory component. Durability. By the way in order to create those extra points of durability it’s had to just automatically bring up new points of durability to synchronize data. It’s just running. It’s just a database. This is where I think the logical extension of what Matt was talking about gets really interesting. Because this is where we’re going as an industry. Increasingly towards systems that have been architected from the start to really care about these problems of scale, distribution. To really understand why an architecture has to be able to support automation, why an architecture (inaudible) support different models, why you might be scaling across multiple regions for availability or latency, and I’d also be doing that because in a big small world I want my operational transactions running on cheaper systems, my analytic transactions running on bigger systems than Amazon, and still make it look like a single database. Just with different caches. Different access patterns.
Those kinds of problems are the things that we’re trying to solve here in NuoDB. Those are the kinds of problems I think are increasingly of interest to the industry as a whole. And I’ll just leave this running so everyone -- I realize that zero (inaudible) per second isn’t so impressive (inaudible) it’s a real database that’s running. I just didn’t think we’d have enough time to set up an application against it as well. I think that’s all I had. Since we’ve been running over, I’m going to ask Matt to come back up. We’ll take a few questions. Thanks, everyone. Yeah.
(M2): You both use the term multimodel (inaudible) perspective of (inaudible) reference model or from the database organization and management model, both?
(Seth Proctor): I’ll start with the specific use case, lower level. Then Matt, maybe you can (inaudible) some perspective to it. Here’s a use case I’ve heard several times from large IT shops. I got my physical inventory. Massive numbers of devices. IT infrastructure. I got them all in a relational database. I can inventory them so I know when I add something new. I can do depreciation. All the other stuff you need to think about realistically in an IT infrastructure. But because it’s infrastructure it also means that that router isn’t just in my relational database, it’s got a whole lot of cables coming out of it connected to other devices. And I really would like to be able to pose a hypothetical question. If I unplug one of those cables or when I turn off a router or when I change a firewall I’d love to be able to ask the question before I do that of what’s about to happen to my system. That’s not a relational problem, it’s a graph problem. It requires being able to look at the data that I was originally querying through a relational model (inaudible) looks much more like graph (inaudible) but it’s really the same data. It’s just viewed from a different point of view.
And when people talk graph they often talk about columns. And what is a column store? A column store may be something about how you lay out data on a disk. But it’s really more about what you assume you can optimize. Column stores work well in large part because they tend to be applied to problems where things like vertical partitioning are really useful, where dictionary compression may be really effective, where parallelization is really good, where you care about being able to pull information as selectively as possible off disk, because every extra latency in the read has an amplifying effect on the overall length of the transaction.
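To make two of the column-store ideas mentioned above concrete, here is a minimal sketch of vertical partitioning and dictionary compression. The inventory rows and the helper names `dictionary_encode` and `dictionary_decode` are invented for illustration, and real column stores do far more than this.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = []   # code -> original value
    index = {}        # original value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the lookup table and the codes."""
    return [dictionary[code] for code in codes]


# Row-oriented records (hypothetical inventory data).
rows = [
    {"id": 1, "kind": "router", "site": "virginia"},
    {"id": 2, "kind": "switch", "site": "virginia"},
    {"id": 3, "kind": "router", "site": "oregon"},
    {"id": 4, "kind": "router", "site": "virginia"},
]

# Vertical partitioning: one list per column, so a query that only needs
# "kind" never has to read "id" or "site".
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Dictionary compression: repeated strings become small integers.
kind_dict, kind_codes = dictionary_encode(columns["kind"])
print(kind_dict, kind_codes)
```

The payoff grows with repetition: a column holding a few distinct values across millions of rows shrinks to a tiny dictionary plus an array of small integers.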
And so when I’m talking about multimodel I’m really saying you may have data that’s even in the same data set or at least related in some fashion but you want to be able to ask different questions or work with that data in different fashions.
And there are different front end languages you want to be able to use. Different tools you want to be able to use for those queries. But there are also now (inaudible) different optimizations you want to make. And your system needs to have some way of understanding how to help (inaudible) optimizations. And keeping it all inside the same database first of all (inaudible) lot easier. But it also means that now I really need to be able to do a relational query, find the infrastructure I’m talking about, perform a query now to do some hypothetical analysis on it, go back to my relational model, and actually make the change, and register the making of that change. I need all of that to be wrapped in the same transaction boundary.
That’s the thing that needs to be (inaudible) when I’m talking about multimodel I’m talking about that end-to-end view of the world that lets you solve some really interesting problems that are hard to solve if they’re in separate databases.
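The router example can be sketched end to end using SQLite from Python's standard library as a stand-in. This is an illustration of the idea under stated assumptions, not how NuoDB does it: the `device`, `cable`, and `change_log` tables, the device names, and the `decommission` helper are all hypothetical. The relational lookup, the graph-style "what gets cut off" question (a recursive query over the same rows), the update, and the audit record all happen inside one transaction.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN / COMMIT / ROLLBACK can be issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE device (name TEXT PRIMARY KEY, kind TEXT, active INTEGER);
    CREATE TABLE cable (a TEXT, b TEXT);
    CREATE TABLE change_log (device TEXT, note TEXT);
    INSERT INTO device VALUES ('core', 'router', 1), ('edge', 'router', 1),
                              ('sw1', 'switch', 1), ('db1', 'server', 1);
    INSERT INTO cable VALUES ('core', 'edge'), ('edge', 'sw1'), ('sw1', 'db1');
""")

# Graph view of the relational rows: everything reachable from a start
# device when one device is treated as switched off.
REACH_SQL = """
    WITH RECURSIVE
      link(src, dst) AS (
        SELECT a, b FROM cable UNION ALL SELECT b, a FROM cable
      ),
      reach(name) AS (
        SELECT ?
        UNION
        SELECT link.dst FROM link JOIN reach ON link.src = reach.name
        WHERE link.dst <> ?
      )
    SELECT name FROM reach
"""


def cut_off_by_removing(conn, start, removed):
    """Active devices that can no longer be reached from `start`."""
    reachable = {r[0] for r in conn.execute(REACH_SQL, (start, removed))}
    active = {r[0] for r in conn.execute(
        "SELECT name FROM device WHERE active = 1")}
    return active - reachable - {removed}


def decommission(conn, start, target):
    """Turn off `target` only if nothing else gets stranded.
    Returns the set of devices that would be stranded (empty on success)."""
    conn.execute("BEGIN")
    try:
        # Relational step: find the device being changed.
        row = conn.execute(
            "SELECT kind FROM device WHERE name = ?", (target,)).fetchone()
        if row is None:
            raise ValueError("unknown device: " + target)
        # Graph step: hypothetical analysis on the same data.
        stranded = cut_off_by_removing(conn, start, target)
        if stranded:
            conn.execute("ROLLBACK")
            return stranded
        # Relational step again: make the change and record it.
        conn.execute("UPDATE device SET active = 0 WHERE name = ?", (target,))
        conn.execute("INSERT INTO change_log VALUES (?, ?)",
                     (target, "decommissioned"))
        conn.execute("COMMIT")
        return set()
    except Exception:
        conn.execute("ROLLBACK")
        raise


blocked = decommission(conn, "core", "edge")   # would strand sw1 and db1
allowed = decommission(conn, "core", "db1")    # a leaf, safe to turn off
print(blocked, allowed)
```

If the analysis and the update lived in separate databases, another change could land between the "what if" answer and the update. Keeping both steps in one transaction is what closes that gap.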
(Matt Aslett): Yeah (inaudible) hypothetical example (inaudible) going to an (inaudible) initial thing is they want to figure out you are you, you’ve been there before, what did you do before. So potentially simple key-value (inaudible) identify who you are. Then they want to deliver you some information about what you looked at last time. And what you purchased last time. And what therefore may be of interest to you. Potentially (inaudible) you then may want to look at what do people in your social network on the site purchase after they purchase that, and what do they look at, what’s interesting to them. Potentially that’s a graph problem (inaudible) purchase something, that’s (inaudible) transaction system for that. So you think about that (inaudible) you could potentially (inaudible) four, five different databases (inaudible) what appears (inaudible) single application. Could actually be multiple applications (inaudible) or potentially (inaudible) single database that supports different models (inaudible) you did say it does mean different things to different organizations. But it is something to bear in mind when you’re talking (inaudible).
(Seth Proctor): I’m not going to say that cloud means the same thing to everyone now. But just like three years ago cloud definitively did not mean the same thing to two different people. I don’t think multimodel has a clear definitive term yet I think in part because it’s the point of view you come from. Again you come from the point of view of (inaudible) you say it’s all about that core architecture (inaudible) multimodel becomes very much about that architecture and getting towards optimizations on different paths for the same data. When you talk to someone that’s really focused on front end technologies, they’re probably going to say multimodel means I can do SQL, I can do document, I can do whatever. And the data is there (inaudible) I think it’s still early days on (inaudible) yes.
(M3): I had a question for Matt (inaudible) two questions. Sorry if I missed this in your presentation (inaudible) NewSQL extension. I’m not seeing that on the map (inaudible).
(Matt Aslett): OK, it’s a work in progress (inaudible) actually for reasons I won’t go into now (inaudible) basically if you look at MySQL Cluster (inaudible) NewSQL database (inaudible) it doesn’t really matter (inaudible) obviously the line is still there on the map even though the (inaudible).
(M3): Are there products associated with that (inaudible).
(Matt Aslett): Not really. It was more back to our requirement to put things in buckets. I just got a bit carried away (inaudible).
(M3): Other question real quick is Postgres-XL. Was that on here, or would it be (inaudible).
(Matt Aslett): I think it is on the new version, yeah.
(M3): Oh. Can we get the new version?
(Matt Aslett): Yeah. We will send that. I think send it out to everybody (inaudible) so you don’t need to look like this to (inaudible) yeah, I think we’ll end on that one (inaudible).
(M4): I have a question for you as well, Matt. So in this platform landscape map what are your thoughts about companies that aren’t necessarily pivoting but almost trying to pick up their station and move it? Like I’m looking at Postgres right now and their EnterpriseDB is really pushing JSON hard. And I think they’re ignoring the access language aspect of it saying, “We can do documents.” Well, you can’t take your Mongo app over. The (inaudible) protocol is different and such. So what are your thoughts about companies picking their dot up and moving it as opposed to sticking with their core competency and trying to make that dot just more substantive?
(Matt Aslett): Yeah. It is interesting. Talk about convergence. Obviously historically in the database space what we’ve seen (inaudible) relational database model (inaudible) absorbed one of the challenges that came out (inaudible) incumbent suppliers have done that. So it’s that classic it’s good enough for most people. But actually clearly there’s an emerging -- talk about emerging application requirements and use cases that (inaudible) that’s why people turn to specialist databases in the first place. So yeah, I think (inaudible) but I think yeah.
(M4): Maybe adding sizes to these dots would be interesting in a future version of this too.
(Matt Aslett): We had thought about that.
(M4): Give it to an intern.
(Matt Aslett): Yeah.
(Seth Proctor): Should also add real estate prices I was thinking.
(Matt Aslett): Yeah.
(M4): And I have a question for you, Seth. So early in your presentation you talked about companies that want to achieve massive scale need to start thinking differently but then later in your presentation you gave the example of -- and I could name Box, Pinterest, Facebook. All these companies that are not -- they’re succeeding without embracing this change. So is it your opinion? Are they succeeding because of or in spite of their stack?
(Seth Proctor): It’s a great question. I think there probably are many reasons, primary among them being that -- like I know some people at Box. Wicked smart. Really good people. Not too surprising that they’re doing well. I think there are a couple answers to that. One is the question of whether you’re trying to build something new when you have something existing.
I certainly spend a lot of my time talking to people who are a mixture of that, who have a lot of legacy systems for existing applications they put effort into that they’re trying to move from the (inaudible) model to service model or to mixed model, maybe it’s cloud, maybe it’s their data center. Trying to get into that common model.
At the same time they’re building new applications. They’d like to have some common infrastructure for that. When you look at something like Box, Pinterest, or some essentially -- or like (inaudible) when you look at those things people who got to start from scratch and say, “We’ve got a very small amount of time to scale something ridiculously, what works today?” And they sat down, they built things on what works today.
I don’t think they said, “We really want (inaudible) or we really want active/passive replication in our (inaudible).” They said, “This is what we know will work. We’ll do it.” And they built the application (inaudible) at that time I’d been able to go to them and say, “Hey, actually here’s a database that just looks like a database,” they’d probably go, “Great, yes. One fewer thing to worry about.”
So I don’t think it’s that -- I would never claim that you can’t make things scale with existing technologies, because there are too many counterexamples of that in industry. Facebook. Very few people have the resources to be a Facebook. Very few people really want to put that kind of engineering effort into something like that today. And I think given the option for something that just looks like a single thing that I can manage and is already cloud-native, I think that’s a preferable place to be. And I think there really is this massive collection of applications that people are trying to modernize. Those are (inaudible) things that you’re going to be able to take and shard or take and say, “Well, consistency models (inaudible) be delays (inaudible) something like that.” Those are things that will always require clearances (inaudible) that’s the way I look at those kinds of applications.
(Matt Aslett): (inaudible) looking at the database space is actually an exercise in cognitive dissonance. Holding two opposing beliefs in your head at the same time. There is definitely a considerable difference between the sort of thing you’re talking about, new projects (inaudible) and obviously all the existing projects which -- people don’t move their databases unless they really really have to. And we do see the landscape is changing. But it is going to take a long time (inaudible) there’s different motivations (inaudible).
(Seth Proctor): Let’s get a new person. Yeah.
(M5): For example. Facebook using NuoDB. Would it have similar problems? Or it would be some progress in (inaudible) you know what I mean? So let’s say (inaudible) such huge company use for example NuoDB --
(Seth Proctor): Well, so Facebook. That’s very much -- I’m going to punt a little bit. I apologize. Facebook, that’s like a whole other hour conversation. Because Facebook is on the scale that is very unusual. We’ve done public demonstrations on the Facebook (inaudible) benchmarks where we’ve taken public infrastructure and we’ve said, “Well, there it is running at 2 million transactions per second.” We don’t usually (inaudible) push it very far. Because for most people, they look at that and they’re like, “I’m never (inaudible) 2 million transactions per second (inaudible).” Facebook is running at -- what is it now, 50, 60 million? Does anyone know what the number is today? It’s something like that. It’s on a different scale than almost anyone else thinks about. So honestly we have very rarely ever tried to say, “What happens if you deploy 1,000 machines and try to push to that level of scale?” That’s really not where we test. We’re much more interested in people who have operational nightmares and are trying to (inaudible) the evolution of the product is towards much larger transaction quantities, much larger data quantities, because we think that’s where the industry as a whole is going. But I couldn’t tell you what would happen today if we tried to take the Facebook application and just make it (inaudible).
(M5): There’s a good chance (inaudible).
(Seth Proctor): I think there are very few things in this world that could really handle that. The amount of custom engineering that they have done is just awesome. It’s remarkable the amount of work they put in. And it’s a great example of what you can do when you want to shift resources and put money into the infrastructure and the resources and the engineers and everything else. On the flip side of that of course there was a huge fire drill the first time someone complained and they said, “My coworker who’s a woman (inaudible) before my wife saw it and she was really angry.” Facebook guys were like, “You’re right, she’s on the same shard, your wife isn’t, because when you first met you weren’t married, so we didn’t put you on the same shard.” And literally there was a six-month engineering fire drill (inaudible) to figure out how to solve that.
Because that ends up like seriously, no kidding. Anyone in this room who’s married understands that’s a real problem. It’s a real problem. And it’s not a real problem for me because I actually (inaudible) and joking aside, any number of relationships. Not just your spouse. Any number of relationships, this is an important problem. Trying to do (inaudible) coordination, trying to do other things around social networks. Being able to rebalance things.
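The shape of that problem can be shown with a minimal sketch of hash-based sharding. It is a generic illustration, not a description of how Facebook places users: the user names, the shard count, and the helpers `shard_for` and `shards_touched` are assumptions. Placement depends only on the user's own ID, so people who later become related can land on different shards, and anything involving both of them has to coordinate across shards.

```python
import hashlib


def shard_for(user_id, shard_count):
    """Map a user ID to a shard using a stable hash of the ID alone."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shards_touched(user_ids, shard_count):
    """The set of shards an operation over these users has to visit."""
    return {shard_for(user_id, shard_count) for user_id in user_ids}


SHARD_COUNT = 8  # assumed for illustration
household = ["alice", "bob"]

touched = shards_touched(household, SHARD_COUNT)
if len(touched) > 1:
    print("cross-shard coordination needed across shards", sorted(touched))
else:
    print("both users happen to share shard", touched.pop())
```

Moving one user so that both sit together means changing the placement rule for that user and migrating their data, which is the rebalancing cost mentioned above.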
If anyone played online games over the last 20 years, this problem of online games scaling by sharding, and then trying to go back and figure out how you get all your friends together on the same shard, so they can play together. There are games where, famously, that would be something you had to pay for and would take eight hours (inaudible) by the way everyone knows where the word shard comes from? From the gaming industry. Yeah. Because that’s actually one of the early early early online games. Couldn’t make their system scale. They were trying to figure out how to make it scale. They couldn’t figure out how to make it scale. They said, “You know what we’re going to do, we’re going to write into the story of the game, backstory. Reason you’re playing this game is because there used to be a single universe, spiritual energy existed in single jewel, and one day some evil force came along and shattered that jewel. Everyone’s living in these disjoint universes, disjoint shards. The goal is to somehow try to figure out how to reunite (inaudible).” Scale, so it was genius. They actually embedded into the game. And that’s why we have the word shard.
But yeah, so that’s the flip side of this. Could an existing system handle Facebook scale? I don’t know. But Facebook today has all these incredible problems where they can’t do things they want with their application. They can’t fix certain communications problems. There are all these weird challenges, and it’s because they have this architecture. So it means that they’re frantically trying to understand how do they expand, how do they grow what they can do, given this architecture.
They’re also going back to the drawing board trying to figure. They’re doing it from the other -- they’re trying to simulate many fewer numbers of much larger machines and build on that technology. But it’s a similar idea. We’re out of time. So just wanted again to say thank you to everyone. Thank you to Matt for being here. And a couple of us will be around for a little while. If folks have questions, want to chat more. Thanks, everyone.