Sean Catlin, Chief Innovation Officer at Atos, and NuoDB's cloud and data expert, Dai Clegg, touch on how organizations are meeting the challenges associated with building new or migrating existing applications to the cloud.
(Bob Walmsley): Well, good morning, everyone. It looks like despite the best efforts of the circle line it looks like we’ve got a full house here, so thank you for that. I’m just going to start out by introducing some of the folks here from NuoDB. Myself, and Boris actually flew in from the US here. Boris back there is our VP of Technology, and Boris will be staying around afterwards for any deep-dive technical questions you might have. And then I want to introduce the European team. We have [Di?], who’s going to be talking Senior Director of Marketing, [Shaun?], who leads our technical team, and Amir, Martin, who runs (inaudible) for us here, Mark, who’s Account Executive, the person responsible for getting everyone in the room today, and Ross here actually runs the Western region for us, although he’s a guest today here in London.
So we’ve rejigged the agenda a little bit, so I’m actually going to spend a little extra time walking you through some of the customer scenarios that we have in the cloud. Di is going to walk you through a deep dive on the product set and the technology, and then we’ll allow plenty of time for questions after that.
So we’re going to talk about cloud technology, distributed SQL databases, but I want to start out and sort of talk about some -- the meta-trends here to provide some context. So I spent my career helping high-tech companies grow quickly. That’s what I’ve done for the bulk of my career. And the opportunity for a company to go -- grow quickly is very much based on what’s happening in the marketplace. We’re very fortunate right now that there’s three major trends that are happening that are allowing NuoDB to grow so quickly. The first one is a business transition, which is everyone recognizes the value of moving to the cloud in terms of what that can provide, in terms of economies of scale, ease of access, etc. So you’ve got a huge business driver, which is moving to either a public or a private cloud. So that’s driver number one.
Driver number two, then, is -- in the database world is the understanding that now in order to support that business transition you need a distributed database, and since you -- if you’re dealing with important data you need dis-- a distributed database with transactions. And since the world talks SQL, you need a distributed database with transactions in SQL. So that’s the second thread. You’ve got a technology set thread.
The third thread is that the incumbent database vendors typically have not necessarily maintained the best relationships with their customer base. So you’ve got... (laughter) So you’ve got these three things: you’ve got a business trend; you’ve got a technology trend; and then you’ve got an emotional trend, which is, yeah, how long can people put up with Larry? (laughter)
So those three things playing together really create an enormous opportunity for NuoDB, and now I’m going to talk about -- I’m going to start out talking about who we are, what our customers are doing with the technology, and then Di will give you much more of a deep dive.
So background on the company: headquartered in Cambridge, Massachusetts. For those of you who haven’t been to the US, that’s basically Boston. Cambridge is where pretty much many of the universities are in Boston, just across the river. It’s a single, logical SQL distributed database, so the illusion for a programmer is that they’re writing to a plain vanilla SQL database and it just happens to be distributed anywhere that you want in the world, and all that complexity’s taken care of for you. So that’s the paradigm. We’re a mainstream, general purpose transaction processing system, so we’re not trying to do... If you’ve got, you know, 10 billion petabytes, and you want to look up some random query, that’s not us. We’re about online transaction processing, and then analytics processing on your sort of real-time dataset. We’re not trying to do -- we’re not trying to be a big data database. There’s plenty of big data databases that you can go after. Our focus is on transaction processing and analytics over your operational dataset.
I’m going to talk about the management team, because I think that’s a key differentiator in terms of who we are, so we can jump into that. This is what I call my everyone but Larry slide. So the original four database companies, Gary Morgenthaler is the founder and CEO of Ingres. Mitch Kertzman became the CEO of Sybase, and then Roger Sippl over here was the CEO of Informix. So we’ve -- of the core, original core databases, we’ve got everyone but Oracle here as part of an investment in who we are. And then the founder is Jim Starkey, so Jim was one of the key architects behind DEC Rdb. He went on to invent MVCC, multi-version concurrency control, which is now the core database feature that is used in pretty much every leading edge database technology. And he was a person that in 2006 to 2008 came up with a radically new architecture. In order to have a distributed, relational database, he basically had to, you know, step back. Instead of just tweak architectures that were 30 years old, he needed to fundamentally change what the architecture was, and that’s what Di is going to be talking about today: a radically different architecture that makes it very simple for the programmer, makes it a little more complicated to implement, and that’s what we have a patent on. And then the technology team here are all folks that didn’t leave Facebook or Google two weeks ago and decide they know how to build an operational database. All of us have got a lot of battle scars from having built databases, distributed systems many times in the past. We recognize that, you know, the applications we’re going after are serious, critical to your business, and we are an organization that’s been set up to, you know, support that.
So one of our largest customers in EMEA is Dassault Systèmes. Dassault are close to a $3 billion ISV, the second largest ISV in Europe after SAP. And the business driver for Dassault is really very simple: they’re most famous for a product called CATIA. It’s an on-premise 3D modeling product. If you’ve got an automobile, you’ve ever flown in a plane, or, in fact, Larry Ellison’s yacht, it was designed using this. It’s a very, very serious set of software. They now have 13, 14 different applications. And their business driver was they wanted to move to the cloud. They want to take this enormous class of best of breed applications, put them together, move those set of applications to the cloud, and as they looked at the underlying database technology that they were using, DB2 and Oracle, they realized that the sets of requirements that they have for moving to the cloud would just not be sufficient for the existing set of architectures. So they actually came to NuoDB now a little over a year ago, and we worked out a partnership with them. They deployed in January of this year over 20,000 users, and their cloud platform now sits on top of NuoDB.
So there are companies now who do every single thing in the cloud, in the public cloud. An example of that is Tesla. So Tesla have no on-premise software in their building. They build the, you know, the world’s, you know, most modern -- probably the fastest-growing car company in the world, most modern automobiles. They don’t have any on-premise software whatsoever. Everything they do is in the cloud. The entire Tesla system was designed on top of Dassault Systèmes in the cloud, no on-premise software whatsoever.
So we’ve been working very closely with Dassault. We’ve learned a lot from that partnership, and you’ll see -- you’ll gradually see them transition more and more applications over to their cloud environment.
This company is a payments processing company. We actually can’t reveal the name, because they view us as such a competitive advantage that they, as part of our partnership with them, they actually gave us a list of 15 companies that we’re not allowed to do business with, because they view us as such a differentiator in this marketplace. They’re a billion-dollar payments processing company. They have 21 of the world’s 25 largest banks. Little-name companies like Barclays and HSBC run their payments processing on them. They’ve got $13 trillion a day going through their system. Last time I calculated, $13 trillion was a lot of money. And their business driver is they’re -- they’ve grown through acquisition, so they’ve acquired about eight or nine companies in the last few years. As they think about unifying their data management platform, they want to do that on today’s technology. So very much a key driver for them was to integrate -- use the latest technology to do that, use the technology that would allow them to scale very comfortably to the cloud, and not only did they become a customer, they also invested in us, and, in fact, they invested in us in the latest round. So very serious investment here. We go into deployment with them the end of the year, at which point they will then, you know, announce -- we’ll do lots of joint press around what we’re doing.
And their technology problem was really that -- for those of you who are familiar with Oracle GoldenGate, it’s a very sophisticated replication technology. It’s incredibly difficult to implement. They have a number of the world’s top folks who actually used GoldenGate before Oracle acquired them. And for two data centers, if you’ve got the smartest DBAs in the world maybe you can get it to work. For three data centers, there’s no known examples of it working correctly. So they’re very, very comfortable with technology, have explored everything there is, and reached the conclusion that for what they wanted to do as part of their next generation architecture they really needed NuoDB.
In England you have -- you’re a little more advanced: you don’t use many push-to-talk technologies, but, believe it or not, in the US it’s still very common, construction companies or everything else, to basically have a modern version of a Walkie-Talkie. And that’s push-to-talk, where you just push a button and suddenly you’re talking to someone else on the network. And their challenge was that their large customers like AT&T, Verizon, Ericsson, came to them and said, “We, you know, we love your technology, but you’re currently -- what you’re asking us to do is drop an appliance into our data center. We’ve had it with appliances. We want you to drop your software into our, effectively, private cloud environment and run it that way.” So this is an opportunity for them to step back and say, “Our customer base is growing dramatically. You know, what architecture should we use?” They’re using a technology called Oracle TimesTen, effectively an in-memory database. So, very high performance. The challenge with it was that in order to scale -- so they wanted -- AT&T said, “Well, that’s nice, but we’d like to grow dramatically. We’d like to give you a lot more money, because we’re going to have a lot more subscribers on this system,” and their current challenge was that the way the system was working is they had to shard the database. Effectively, they couldn’t have one logical database. They needed more subscribers; they had to basically re-architect things in order to scale out. So we solved that problem, giving them one logical database, allowed them to think about distributing it geographically, allowed them to get off an appliance model, and just provided a lot of core capabilities. They go into deployment here in the next 60 days.
They have an enormous number of IT assets spread out throughout the world, and the current mechanism for keeping track of those assets was spread-- distributed spreadsheets. (laughter) Everyone just types in that spreadsheet where their assets are, and the challenge with that is, you know, worldwide nobody knows what everyone else is doing. You’ve got no way of keeping track of licenses, of cables, what impact do I have if I take this system down. So very much a significant business problem for an organization that large. As they went to look at the product requirements, they needed a system that could work easily geographically throughout the world. They needed a system in which the person who was entering data in Asia, the person who was entering data in New York, the person who was entering data in London, all got real-time performance. The current architecture before they looked at us had everything sitting in New York, and so if the connections were slow to London or Asia, what actually happened was the people in London and Asia just gave up, didn’t bother entering the information. So these geographic systems here or the latency had a huge impact on your business results, which is if a system is slow nobody wants to hear that, “Sorry, you know, my architecture only allows me to have a database in New York.” If the systems are slow, people won’t use them. They’re not really business systems after all.
So those are just a few examples there of the customers that we have, but again, the drivers are: business driver of getting to the cloud, private cloud or public cloud; the desire to build on top of a distributed architecture; and then, lastly, in many cases, the desire not to have to do business with a lot of the incumbent companies whose business practices have become very painful.
So with that, I’m going to hand things over to Di here, to jump you through the technology. We’ll be around afterwards to discuss either more business-level examples, customers, business problems, and we’ve also got our technical team here. We can do a deep dive. So we’ll save questions here till later on, but we can go in either direction and spend as much time as you’d like this morning. Thank you.
(M1): I’m sure -- (inaudible) everything else (inaudible), but I’ve got some nice, pretty pictures, as well, (laughter) so stick around for pretty pictures. “Database in the cloud.” Can I just take a show of hands: how many people are actually implementing applications on private, hybrid, or public cloud right now? All right, some of you. And I presume, since you’re here, it wasn’t just the breakfast you got here. The rest of you are seriously planning to do that. So, you know, what Bob said about the importance of the movement to cloud, the migration to cloud has been a major business driver, and that’s no exaggeration, I don’t think, just, you know, [evinced?] by your response to that. And I want to talk about some of the challenges for getting database applications into that environment and get them effective. I’ll talk about -- I’ll [include, then?] a couple more customer examples than Bob talked, and then we’ll do a little bit of a kind of peek under the covers of how NuoDB works.
But first, what I’m going to talk about is the cloud database problem. That was going to end in titters, wasn’t it? (laughter) “Lift and shift” is not the way to get your applications to the cloud. Why is that true? Why does NuoDB fix it? Who says so? So we’ll talk about some customers (inaudible). We’ll peek under the hood. But the reason why migrating a database application to the cloud, why do I think that is fraught with difficulties, I’m hoping some of you will kind of nod sagely when I drill down on that, is that -- you know, and Bob’s already talked about it -- this is a key motivator for a lot of our customers -- ISVs, major organizations. Old SQL, if you want to move that into a cloud environment where you want to scale out -- they don’t, they scale up -- or where you want to do distribution -- well, they don’t, they do replication -- then the NoSQL guys, they were born for the cloud in the sense that scale-out is automatic. Scale out on commodity, elastic, great. Except, where are all the skills in your organizations? Where is all the code? Where are all the tools that only your end users want to use? They’re all built in SQL. Then you’ve got the NoSQL guys, “Oh, it’s no problem, you do Elasticsearch.” Oh, great, then you’ve got -- you’ve got SQL for the Elasticsearch. Well, exactly! [Oh, but just ad hoc here?]. Well, not quite. So everything is like a complete rebuild, re-architect. So I have spent a great many years in the SQL world, with Oracle, with IBM, and with Netezza, but I’ve also worked in the NoSQL world for the last four or five years, so I kind of -- I’m one of the few people who’ve got old SQL, NoSQL, and new SQL skill -- experience. Maybe “skills” is maybe pushing it too far, but I’ve certainly been there. I’ve got some of the scars. And there’s plenty of applications for NoSQL technologies. I’m not saying there aren’t, and Bob alluded to that, some of the things you do there.
But if you want -- if you’ve got SQL applications, you’ve got transaction processing (inaudible) applications, moving that to the cloud, that’s a big challenge. And so I want to talk a little bit more about that and a little bit about NuoDB will fix that for you.
But first -- and I’ve turned my own slide around when I was shuffling the deck. I was thinking, yeah, actually, it’s more important to talk about customers who do stuff like you do, which hopefully has resonance to you, than focusing on our technology. This was why they came to us. Active/active -- actually, they’re not deployed active/active/active yet, but that was certainly what they’re expecting to do with our technologies, and this idea that it’s -- distribution doesn’t mean I have a disaster recovery setup somewhere else. It doesn’t mean I have [sharded?] databases that are completely independent and running the same application [and times?] and different databases. It doesn’t mean that I am running on -- I am kind of in the background replicating so that I’ve got two -- the same database twice. It means you’ve got one database. It is accessible by clients in different locations, different geographies. The storage of the data in it is actually distributed across your network, so you’ll often (inaudible) -- for locality reasons, you’ll store the data on the East Coast or in Europe, (inaudible) mostly used by the Europeans, but it doesn’t mean that that data isn’t accessible by US or Asian-based client end users. It’s a distributed database, active/active, and they are all updated, and they’re all executing transactions all the time.
I hope that some of you are going... Because, like, hello, distributed ACID transactions -- I hope that that gives you pause for thought, because if it doesn’t, then that’s our differentiator, (laughter) the fact that -- as I said, the NoSQL guys, you give up transactions if you want distributed. The old SQL guys, you give up distributed if you want transactions. What NuoDB does is allows you to do that active/active/active, fully distributed, fully transactional.
The other problem that Bob was talking about was that not just this replication, multiple data centers, it was a nightmare when they wanted to upgrade anything. They want to upgrade hardware, upgrade software, they have to open a third -- create another DR, copy the DR to that, so they could upgrade the DR, switch back to that DR, then make that one the active while they upgrade... Nightmare for doing that, and that was whether it was infrastructure, hardware, software, or their application. I’m sure some of you have experienced that, the difficulty in those situations. We made that a lot simpler for them, and for -- particularly for ISVs, people who -- and it doesn’t matter, actually, whether you are genuinely an ISV or whether you build applications within your -- an enterprise, you know, as a service. You do not want to have to impose on your users a huge administrative load. A, they don’t want it. They won’t deal with it. They will fail to execute effectively, and then blame you. And it doesn’t scale, because you’ve got this huge service burden on the back, either within your IT team or, if you’re at an ISV, in your service organization, to try and support customers. And also, this whole thing, it’s SQL, therefore you do not have to learn a complete new architecture, languages, toolset, the whole thing, in order to get them going (inaudible).
I’ll talk about another of our migrations, another of our customers, which was -- yes, this was a European ISV, a mobile product. They’d been piloting this in relatively, what I call, emerging markets, i.e. small, (laughter) relatively small countries where the number of subscribers will be no more than a million. They were shipping this on an appliance-type solution, but the problem is it wouldn’t scale. They were getting reluctance to putting their own hardware into their customers’ data centers. In fact, they did the migration themselves. We didn’t help them with the migration. They took their SQL out on this appliance, and they migrated it on NuoDB onto commodity hardware. In fact, their performance when they first did this, which they did as their own -- making the POC themselves, proof of concept project -- when they first did that, the performance was slightly less, about 80% or 90% [short?] of what there was -- what they’d been achieving on a dedicated appliance built to do that software/hardware stack. We went in, helped them tune it up, and actually we were ending up getting better performance on cheaper hardware, on commodity hardware, than they’d been getting on the -- they’d been getting on their appliance. And we broke this scale-out offering. Now they could offer it on the cloud, which is what their customers wanted, and because they had scale-out, they could offer it for larger markets. They could deploy this and move this into larger markets. So we broke the -- we fixed the cloud problems for them, scale-up, performance. Soft commodity hardware, genuine cloud platform, and the ease of migration is the third case (inaudible).
Yeah, this is interesting: they’re actually a cloud provider. We make -- we have two offerings, because IS-- I deliberately picked ISVs and cloud providers for today, for my customer stories, rather than Bob picked his (inaudible), people like that, major banks and huge software companies. It’s because these are typically -- well, if you’re a cloud provider, you’re definitely well committed to cloud technologies, and you want to know what you’re talking about. So we do a couple of things with these guys -- in fact, they’re not our only customer in the same offerings [though?] -- is that we provide the database that runs the cloud in the sense of, you know, subscriptions and connections and provisions and metering, which feeds into the billing systems. The actual metadata that runs the cloud for them is a NuoDB database. In fact, they’re piloting that, because their intention is that they offer database as a service to their customers, to end customers, would be NuoDB on top of it. And we’re seeing that with a number of customers, a number of cloud providers.
So those are my stories, and I want to go back to where I said I wanted to be. This is what our customers are telling us. They want cloud applications. They want geo-distributed -- genuinely distributed -- databases, but they do not want to give up skills, and their SQL code, and they can’t give up transactions. Now, for some applications you can give up SQL transactions. They’re not important. Even some updating applications. A lot of -- for example, some Internet of Things type applications where effectively all you’re doing is you are capturing events: this thing happened, this thing happened, this thing happened. There cannot be any locking problems, because no two people -- there is only “I am this device, you know, embedded in this car at this location -- well, at this time, and here is my reading.” You know, it’s an append-only write effect in the applications. There is no locking problem. You don’t need transactions in the sense of -- and there’s only one record you’re ever updating: it’s a new one of these. So for something like that -- and, as I said, my background in NoSQL databases, six months I’d (inaudible). That was a use case for me in my Cassandra database and my Couchbase database, of course, yeah?
So I’m not saying that every application, every updating application must go to NuoDB because it’s transactional. That’s not actually entirely true. There are lots of updating applications which don’t need that, but there are plenty, all effectively transaction processing, (inaudible) transactions. You know, if you’re running -- if you are running the sessions store for a big e-commerce site, and somebody puts an extra one of these in the shopping cart, and then when they come to check out that extra one of those wasn’t available actually because you didn’t -- you know, because transactionally you weren’t secure, and somebody else has already checked out with the last one of those, OK, put it on back order, it’s not a big problem. But if you just did a financial transaction, like Bob’s example, where you move some money out of this account, into that account, well, that better still be in that account when this account gets it. Well, (inaudible) already moved out here (inaudible) somewhere else. That’s the difference between transactional and just maybe -- OK, well, it’s updating, but who cares? And that’s what... You know, have I said that -- is that cogent for...? You all recognize that either in your own organizations, your clients? Is that if I am going to go away from SQL, because of some other reason, there will be what I call the hidden cost of free software. You know, the NoSQL stuff (inaudible) all open source, it’s all free. The hidden cost of free software is the data [unicorns?], the big data [unicorns?] you need.
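The account-to-account example can be sketched in a few lines. This is only an illustration of what "transactional" buys you, using Python's built-in sqlite3 module rather than NuoDB; the table name, account names, and the `transfer` helper are hypothetical and not from the talk.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("this", 100), ("that", 0)]
    )
    conn.commit()
    return conn

def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction.

    Either both updates are committed or neither is: the `with conn`
    block commits on success and rolls back if an exception is raised.
    """
    try:
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (left,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if left < 0:
                # Abort part-way through: the debit above is undone.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "this", "that", 30), balances(bank))
    print(transfer(bank, "this", "that", 500), balances(bank))
```

The failed transfer leaves both balances exactly as they were, which is the guarantee the shopping-cart case can live without and the payments case cannot.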
So it’s interesting, right? This is your application. Users connect to an application server, connected to a database server, connects to storage. So that’s pretty much the modern application, nothing controversial, nothing interesting about that. You want more of them? It’s no problem. More users connect to more application servers. You scale out the application servers. The web tier will scale out. That’s all -- you know, that’s all cloud -- that’s all been internet, web-scale stuff, no problem at all with that. Storage: storage servers, a great -- a huge, you know, a huge investment over the last 10 years, in the storage server area with, you know, massively parallel and scale-out database services. Your database in the middle, if that’s a traditional SQL database, well, it will tend to have got bigger. You’ll have scaled that one up. You’ll have bought bigger processors. You’ll have bought a fatter cluster. You’ll have had more cores in all of the processors. That’s how you would scale up; you know, Oracle RAC, and maybe (inaudible) data, yeah, or on HP. So you can’t -- that one gets bigger, those get wider. And that -- you do it again. What happens to the database, right? Eventually, that thing’s going to start straining, because it only scales up so fast. The others scale out.
And that is the problem of the traditional application such that when you take it to the cloud and you want to scale, these are fine for scale-out. They all have built into them the ability to do parallelization. Your database, the traditional database, SQL database structure is a scale-up, and you need to break that, and that’s what I was talking about, and that’s what our customers are seeing. That’s what we’re doing is this stuff here, about the geo-distributed, about the scale-out, and about elasticity. Because the other thing is that even if that big, fat box was big enough and fat enough to handle your peak, how many weeks in a year, days in a year do you need that peak? But you’ve provisioned the hardware for that for three years -- for three weeks of peak demand. You’ve licensed the software for that. You know, you say what you like about, you know, the business models and the commercial practices of the traditional vendors in terms of holding you to the license for the maximum, but the fact is even if they didn’t do that, you’ve still got to buy the kit to run it. That’s the whole point of the cloud is I want to provision twice as much capacity, three times as much capacity for one week or one month in the year. Well, I can do it. I couldn’t do it with those three. That’s the [turkey?] which in the past gave you a real hard time, even if it wasn’t bursting at the seams, even if it was still -- still had that big box all the time, with all the cost and complexity of doing that.
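The peak-provisioning argument is simple arithmetic, sketched below. Every number here is made up for illustration (node counts, weekly price, three peak weeks); none of them come from the talk or from any vendor's pricing.

```python
def yearly_cost(weekly_nodes, cost_per_node_week, elastic):
    """Cost of running a year's demand profile.

    weekly_nodes: number of nodes actually needed in each week.
    elastic=False models the big fixed box: you pay for the peak
    capacity every week. elastic=True models paying only for what
    each week needs.
    """
    if elastic:
        return sum(weekly_nodes) * cost_per_node_week
    return max(weekly_nodes) * len(weekly_nodes) * cost_per_node_week

# Hypothetical profile: 49 ordinary weeks at 4 nodes, 3 peak weeks at 12.
demand = [4] * 49 + [12] * 3

fixed = yearly_cost(demand, 100, elastic=False)
flexible = yearly_cost(demand, 100, elastic=True)

if __name__ == "__main__":
    print("provisioned for peak:", fixed)
    print("elastic:", flexible)
```

Under these assumed figures the fixed box costs well over twice as much, and the gap grows as the peak gets shorter and taller.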
So nothing on this slide I haven’t talked about already. As I said, not all use cases require transactional, but there are plenty that do require ACID transactions. And I’ve listened to the CTO of Cassandra, the biggest NoSQL database vendor, and the most successful -- and, you know, for many good reasons they’re successful. I’ve heard him stand up and say, “We’ve introduced in Cassandra 2.1 lightweight transactions so you have transactional processing in Cassandra.” And the room goes quiet. This is a tech meetup in London when he did this, yeah? “But I don’t recommend you use it at all, because it really slows it down.” (laughter) And a few -- another time I was an analyst with Bloor Research, and I took a briefing from DataStax’s SVP of marketing, [got him in that file?], and I said to Matt in the course of this briefing, I said, “Yes, well, one of the problems a lot of customers have with adopting not just Cassandra but most NoSQL databases generally is the problem of transactions.” I said, “You know, what’s your roadmap on that?” He says, “Oh, we’ve got transactions.” So, that’s interesting. I heard Jonathan the other month say you shouldn’t use... (laughter) Yeah, and that’s the problem: you’ve got the marketing guy saying, “Oh, we’ve got transactions, we’ve got the problem sorted.” You’ve got the CTO saying, “Yeah, guys, tech guys, we’ve got -- we can do transactions, but you don’t really want to do it unless you absolutely have to.” And then there’s a load of coding, and then there’s a load of rearchitecting, and then there’s a whole load of, you know, learning, and, as I said, the big data [unicorns?], if you want to go that route, are very transaction in SQL, “This is our sweet spot, this is where we want to be.” I’m not trying to claim that every application in the world is ours. I’m just saying that if you’ve got this sweet spot -- and I know lots of people do have -- so our customers are telling us that.
And this is where we fit, in part of the strategy, and particularly (inaudible) the cloud and (inaudible), because active/active/active with semantics, with ACID semantics around the world -- this is -- that’s just a chart -- I know you can’t read the chart. That’s on Google Compute Engine where we ran, you know, two, four, eight, 16 NuoDB processes, and we were just pushing a transaction generator at it, because (inaudible) benchmark, and, you know, you top out -- that’s the transaction rate that you manage. At this point we kick in more transaction processes. The database does-- isn’t scaling here; it’s the same database. What we’re doing is computing elasticity. Because database -- your actual data storage, you don’t need to be elastic, generally, do you? You know, yes, your database will grow over time, and your problem is not that suddenly on Black Friday you’ve got 50,000 more products to sell, or 150,000 new customers. It’s just they’re the same customers you have all buying more of the same products you have. It’s -- compute scale-out is typically the thing where elasticity really counts and is important. And if the only way, frankly, that you can get elasticity, you can get scale-out, is to put more database nodes in and share the data out between them -- which is how, for example, something like Riak or Couchbase works -- you have to keep shuffling data when you scale out, which means that scaling back in is a pain, because you’ve got to shuffle data back. You don’t want to do that. What we do is complete elasticity, because that’s typically where you really need that elastic. Scale-out, yes, gradually, on the database. Scale-out on the compute power. Let’s say we just switch more on, and then de-provision them when you don’t need the peak anymore, so (inaudible) you’re not paying for them -- you’re not paying for the cloud that you don’t use, which is obviously one of the main reasons why you might want to go to cloud in the first place.
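The cost of "shuffling data when you scale out" can be made concrete with a toy model. The sketch below assumes the simplest possible scheme, where each key lives on node `hash(key) % number_of_nodes`; that is an assumption for illustration only and is not how any particular product partitions data. Real systems generally use partitioning schemes designed to move less data, but they still have to rebalance when nodes join or leave.

```python
import hashlib

def shard_of(key, n_nodes):
    """Pick the node that stores a key under naive modulo sharding."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes

def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose home node changes when the cluster resizes."""
    moved = sum(1 for k in keys if shard_of(k, old_n) != shard_of(k, new_n))
    return moved / len(keys)

# Hypothetical key set standing in for customer records.
keys = [f"customer-{i}" for i in range(10_000)]

if __name__ == "__main__":
    print("4 -> 8 nodes:", moved_fraction(keys, 4, 8))
    print("8 -> 4 nodes:", moved_fraction(keys, 8, 4))
```

Scaling out and scaling back in both relocate a large share of the records in this model. When the extra capacity is compute only, sitting in front of the same stored data, adding or removing a process moves no stored records at all, which is the distinction the talk is drawing.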
[Arbitrary?], typical to any scale-out, you’re going to have as much redundancy as you want. You can tune the Commit levels. So, for example, your transactions have to be persistent in at least two regions before you’ll allow it, or you’ll say, no, it’s got to be persistent in the -- in memory cache, and it’s also got to be persistent on at least one back-end store in order to be -- [you tune that?]. Most of our customers will use what we call Commit 1, which means it must have been broadcast to everybody in memory cache, so you’ve got to be one cache and a log, you’ve got to broadcast a synchronicity to every other cache (inaudible) processing cache, and you’ve got it on at least one disk somewhere, and it’s also been broadcast to as many other disks as you want. But you can (inaudible), say Commit 1 is what most of our customers use, and all of this stuff is high levels of automation in it, scripting, preconfiguring, so that when you need to, for example, provision from scale-up, you don’t have to worry about that. You needn’t have it automatically monitor itself when latency is rising on transactions. [As a rule?], I think you need a bit more Commit here, or (inaudible) some more service, and have it de-provision them automatically when you’ve -- when the transaction load drops.
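The automatic provisioning described above (add transaction engines when latency rises, remove them when the load drops) can be pictured with a minimal Python sketch. This is an illustration only, not NuoDB's scripting or configuration interface: the `ScalingPolicy` class, the `desired_engines` function, and every threshold value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not NuoDB settings.
    high_latency_ms: float = 50.0  # add an engine above this latency
    low_latency_ms: float = 10.0   # remove an engine below this latency
    min_engines: int = 2           # never scale in below this count
    max_engines: int = 16          # never scale out above this count


def desired_engines(current: int, latency_ms: float, policy: ScalingPolicy) -> int:
    """Return the transaction-engine count the policy asks for."""
    if latency_ms > policy.high_latency_ms:
        return min(current + 1, policy.max_engines)
    if latency_ms < policy.low_latency_ms:
        return max(current - 1, policy.min_engines)
    return current
```

Only the compute tier changes size in this sketch; the stored data is untouched, which is the point the speaker makes about not having to shuffle data when scaling back in.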
But particularly in the face of what Bob talked about before, I don’t want to repeat myself too much, banging on about this stuff, but... And so I’m going to wrap up and say we’re going to wrap up a little early, but, as Bob said, we have a full kicking team here of the tech guys, and the customer-facing people, and so any follow-up conversations you want to have, we’ve got plenty of time for that. And I hope we’ve got some fresh coffee in the other room so you don’t have to go on your way just yet. But before I do that, I have -- it’s like -- Arthur C. Clarke once famously said, “Any technology, if sufficiently advanced, is indistinguishable from magic.” And my experience is that solution architects, generally speaking, don’t believe in magic. (laughter) So I’m not going to try and prove to you, if you like, exactly how -- or show to you, or [establish?]... I just want to say that if you did raise that eyebrow when I said “distributed ACID transactions,” I’m going to show you enough that you said, “Oh, that’s interesting. I can see how that might work. I ain’t convinced yet.” If you are, then you probably convince easy. But I want you to believe -- I would like you to go away thinking, my God, these guys might have something here, and then you can grab the rest of us and have the drill-down conversation after this.
So what does it do? How does it work? Multi-tiered architecture, four levels, basically: storage, and, of course, storage managers, transaction engines, and the brokers and agents. And basically, clients connect to brokers and agents. They connect with the transaction engines. They scale out, as I said, elastically. And all the transaction engine does is it shares, with all the other transaction engines, a distributed cache of all of the let’s call them records that are being used or recently have been used in any transaction. So this is in -- your processing is all an [in-memory?] machine, effectively, except it’s all persistent. And these turkeys down here, what we call storage managers -- as clear in the title, all they do is manage the storage -- they share that same cache, which is replicating automatically between them, all in memory. And all these things do is they don’t -- no clients connect to them. They don’t -- all they have to do is make sure that whichever bit of the cache they’re responsible for, they persist in it, and that’s -- when they get an update. And whenever somebody requests for a transaction for a client any record that isn’t already in the cache somewhere, they serve it up and share it. What typically happens in a (inaudible) system is pretty much everything that’s going on is in cache, and the only things these things are doing is they’re storing updates. They’ve pretty much served everything up that they need to. And none of -- they don’t all have everything that’s in cache. They have what they need, either what they’re responsible for, if they’re a storage manager, or what their clients have asked them for, their transaction engine. And that’s all being done dynamically. So the processing speed is all in-memory. It’s effectively -- it’s like a distributed in-memory database, except that it’s unlike [so?] pure memory databases, which is if you run out of physical memory, and you run -- and it falls over, (inaudible) this thing is not. 
It is [of a?] system, what we call a distributed persistent cache.
And, of course, at any point in this, you can stick the (inaudible) from a -- if you like, the difference, logically, between a distributed one of these and a local [N1?] data center, for where this is all [LAN-ed?], so where you have one more WAN connections in there, logically there is no difference at all. If I’m over here, if I’m connected to this transaction engine, and I request a record that this storage manager here on the other side of the globe is managing, and it’s not replicated in my -- in the local one to me, then I’m going to have a little bit more latency lost that goes and gets that, unless it’s already in cache. When I [was?] a new record here, what I’ll actually do is say, “Excuse me, can you give me this?” This is a broker and agent job at this point, is this thing knows it’s got a map here of everybody’s cache. It says, “I ain’t got that, but I know that this one’s got it, so I’ll ask for it.” Or, “Nobody’s got it in cache; I’m going to have to ask that storage manager for it.” And they may, either on the WAN, then -- OK, we’re going to get some latency in that, but that’s all. And that’s -- again, on a [warmed-up?] database, that is relatively infrequent, particularly if you do something like, for example, this is Europe and this is mostly European data, and maybe reference data is replicated in both places, and this is the American data. If you looked at DBT-2, which is modeled on the TPC-C benchmark -- we just did a million NOTPM for DBT-2 on our latest release. Ooh, should I have said that? Yes, it’s on the website today. (laughter) Phew. We did a million. And the way we did that was that benchmark works where you locally -- most clients will connect locally to their local warehouse (inaudible) processing, and we just partitioned our data to match that.
But it’s not partitioned; it’s segmented. It’s not sharded, in the sense that this is one database and that’s another. It’s the same database, but I can -- any piece of data, I can decide it’s in here, it’s in there, or it’s in both. Or, indeed, we’ve got ten of these things, which you put it into -- table partitions go into storage groups, storage groups (inaudible) to storage managers. And that’s part of the design (inaudible) for that. So you’ve got this (inaudible), but to a person writing SQL up here, or writing a query using Tableau or something else, or any of the tools that you’re already using, it’s just a SQL database. You don’t care that it’s actually going to Singapore, New York, and Munich to get the data for your query. You’ll care if it’s actually [got to go to the disk?]. (laughter) Seems like a little longer if [it starts going to?] disks in those places, but it probably won’t, and it certainly won’t do the second time you run that query. But you don’t care, and you don’t care if... You don’t even care if they’re busily upgrading the version, and it’s no long... You know, you’re in Tokyo. You want to be getting it locally, but your local one of these has just gone down. Well, you don’t care. You’ll get it from somewhere else. You won’t know.
And that’s -- you know, when I talked about the cost of migration, that’s a huge thing. Transparency had just [been able to stick to their?] SQL tools. And like I said, you lose one of these things, you lose that, yeah. What will typically happen, if you’ve got that auto provisioning capability turned on, if you lose that, and the system is under load, then a new one will get provisioned anyway. A new transaction engine will get provisioned to pick up the load. If you lose -- a storage manager goes offline, when he comes back online, it will resync itself automatically with the cache, for everything that it is responsible for. So you’ve got this capability to do all of this (inaudible) processing, and this is the only bit of that -- this is the magic bit, which is what happens if you have -- how do you do transactions in this distributed environment? And it’s not two-phase commit, nor is it eventual consistency. It’s ACID consistency. It’s like there is no single lock manager, which could go down, or which could cause a block. You’ve effectively got distributed locking.
So in this case, say I’ve got record A, and record A is being managed by this storage manager, and this one alone. It happens to be in cache in these three places. Every time a record is in cache anywhere, somebody is in charge of it. Somebody is the chairman for that piece of data. In this case, it’s this one, the red ring around it here, and anybody else knows, if I want -- if I know here, what (inaudible) [the fact?], those two A’s, they both want to update A. They both know they’ve got it in cache, so they can update it immediately, but they know that they are not the chairman for this one. So they say, “Excuse me, please, Mr. Chairman. Can I have update access on this?” And whoever gets it first will get their update accepted. That one will be rejected. The TE over there will back out his transaction. It rolls back. And then that transaction goes through, and will propagate to its storage manager for persistence, and also to update every other cache instance of that, so that when that transaction retries through there, it will now have the latest version. It will get (inaudible).
Now, that’s really easy. There’s a million A’s and B’s and C’s and God-knows-what, and they’ve all got their own chairmen, and the chairmen are scattered around all of these, so it’s slightly more complex than that. I’m not going to -- you know, not going to try and explain that to you now. But you see how -- what the magic here is that you’re not saying that I’m updating here and I’ve got to wait for that one, not doing two-phase commit, and you’re not doing a centralized lock manager. You’re doing a distributed lock manager. If you lose that TE, then the chairmanship will -- you know that you haven’t got the... As soon as you know you’ve lost it, chairmanship for everything that it owns will devolve to somebody else, so there’s another chairman for this -- for (inaudible). And maybe there is no other instance, in which case you will go straight back to the SM for it and create a new cache entry for that record. And then the first place -- typically, it’s the first place -- in this case, this person must have been the first one to have rubbed A out of -- in its current version. And, of course, this is -- all of the A’s -- there’s multi-version concurrency control, which is what Bob talked about before -- they’re all behind that.
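The chairman idea, where two transaction engines both ask for update access on record A and only the first request is accepted, can be pictured with a small Python sketch. It is a single-process toy under stated assumptions, not NuoDB's implementation: the `Atom` class, its `request_update` and `commit` methods, and the `try_update` helper are hypothetical names, and there is no networking, failover, or chairmanship hand-off in it.

```python
class Atom:
    """One cached record with a single 'chairman' engine that arbitrates updates."""

    def __init__(self, key, value, chairman):
        self.key = key
        self.value = value
        self.chairman = chairman   # name of the engine acting as chairman
        self.pending_owner = None  # transaction currently holding update access

    def request_update(self, txn_id):
        """Ask the chairman for update access; the first requester wins."""
        if self.pending_owner is None:
            self.pending_owner = txn_id
            return True
        return self.pending_owner == txn_id

    def commit(self, txn_id, new_value):
        """Apply the winner's update and release the atom."""
        if self.pending_owner != txn_id:
            raise RuntimeError("transaction does not hold update access")
        self.value = new_value
        self.pending_owner = None


def try_update(atom, txn_id, new_value):
    """Return 'committed' or 'rolled back' for one complete update attempt."""
    if not atom.request_update(txn_id):
        return "rolled back"
    atom.commit(txn_id, new_value)
    return "committed"
```

In this toy, a second transaction that calls `request_update` while the first still holds access gets `False` and would roll back; once the first commits, a retry through `try_update` succeeds against the new value, which mirrors the retry described in the talk.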
If somebody wants the... You know, in the middle of that point, if this transaction that only wanted A for read, but this transaction had updated A and had got control, this would have read the previous version of A, because there would be a later version, but it wasn’t committed. So they would get read consistency. They’d see the previous version, until that was committed within their transaction. And if they wanted more data, then they -- “Sorry, pal, you can’t. There’s an uncommitted update already open on this one, got here before you. You’re going to have to wait.”
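The read behaviour just described, where a reader sees the previous committed version of A while another transaction's update is still uncommitted, can be sketched as follows. This is a deliberately simplified illustration and not NuoDB code: `VersionedRecord` and its methods are hypothetical, and a real multi-version system would also track per-transaction snapshots rather than always returning the latest committed version.

```python
class VersionedRecord:
    """Toy multi-version record: committed history plus one open update."""

    def __init__(self, initial):
        self.committed = [initial]  # committed versions, oldest first
        self.pending = None         # (txn_id, value) of the open update, if any

    def read(self, txn_id):
        """A reader sees its own pending write, else the last committed version."""
        if self.pending is not None and self.pending[0] == txn_id:
            return self.pending[1]
        return self.committed[-1]

    def write(self, txn_id, value):
        """Return False if another transaction already has an open update."""
        if self.pending is not None and self.pending[0] != txn_id:
            return False
        self.pending = (txn_id, value)
        return True

    def commit(self, txn_id):
        """Make the open update the newest committed version."""
        if self.pending is None or self.pending[0] != txn_id:
            raise RuntimeError("nothing to commit for this transaction")
        self.committed.append(self.pending[1])
        self.pending = None
```

A reader is never blocked in this sketch; only a second writer is refused while an uncommitted update is open, which matches the "you're going to have to wait" case in the talk.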
And that’s kind of me finished, a little early (inaudible). And my apologies, I should have said I was happy to be interrupted for questions on the way, but I am also equally aware that I rarely breathed during the presentation (laughter) (inaudible) [opportunities?] that prompt. [I’m aware of that?]. Is any -- firstly, questions now? I say technical, customer-related, commercial, whatever. One over here first. Sir?
(Q): It’s Black Friday. You’re [handing out orders?] in North America and Europe, and sequential order numbers. So who’s going to decide who gets the number 1,000,000 or 1,000,001? Do you actually have to wait for every single machine in a cache to --
(M1): No, not --
(Q): -- [to try?] the next sequential --
(M1): Typically, you’d do the database sequence to generate that, and the chairmanship of that would belong to one person, who’d be updating that, yeah. Although it’s -- those little round things, I kept calling them records. I was deliberately not calling them rows. They’re in -- if you go into NuoDB architecture, they’re actually called atoms. They’re the unit of update, and they can be anything. They can be indexes. They can be rows, data rows. They can be sequences, or whatever. [Avi?]? Or, sorry, [Christopher?].
(Q): (inaudible). So the first was, assuming you’ve got a shared -- let’s take (inaudible) shared account balance, which can be accessed on, if you’re a customer, multiple routes. In other words, he can act on that balance in many ways simultaneously, of course.
(M1): Through the different devices, you mean.
(Q): Through different devices, through different products, through -- I’ll take the [earlier?] example, (inaudible) example. He interacts here, consumes some of his balance. At the same time, he interacts here and attempts to consume that same balance. If we’ve got that slight (inaudible), which you discussed, they’re not going to risk, we are going to put them into deficit. If you read from the previous -- maybe I just missed...
(M1): Ah, right. No, hang on. When you do -- you’re going to do a read for update on the second one, aren’t you? Because you are going to update the bal-- it’s not --
(Q): Well, I’m going to read in order to permit a transaction, in other words.
(M1): Yes. Yes, because that transaction goes through, it will change that balance, won’t it, and update it. So that’s -- effectively, you are -- you’re requesting an update on it within two transactions, which are alive at the same time, right?
(M1): This is on my phone and...
(Q): Yeah, I’ve read both. One is [actioned?] -- in other words, the balance is (overlapping dialogue; inaudible) --
(M1): The other one’s now locked out until that commits.
(Q): It is. OK, fine. And (inaudible) second, it is also -- just the question you mentioned about scaling of [compute?], which made sense, but my example was where, for instance, you do have a massive spike in your customer base usage, which, again, is (inaudible) issue but it does happen from time to time, where actually what you need to do is actually you need your entire customer base very rapidly brought into the transaction space. How would that... Would that just then operate at that load, spin out more and more?
(M1): If you’ve got -- so, say, I don’t know, you’ve got, like, a million customers --
(Q): A big race, a big race --
(M1): Right, and so I --
(Q): So it’s two or three a year.
(Q): That’s the race (overlapping dialogue; inaudible) --
(M1): It’s Grand National day, and all of your punters are active, whereas on any given Saturday afternoon only 10% are active. On this Saturday, 80%. That means, yeah, you’re going to need more -- you haven’t got actually more database here. You already had them. You’re going to need more of those, because you’re going to need more of them in cache.
(Q): So I can drive that --
(Q): -- in anticipation of --
(M1): You -- yeah, well, there’s a couple things you can do there. One is you can pre-provision the transaction engines beyond the automatic response. Yeah. And the second thing you can do is warm them up. You can preload -- for example, in that particular case you might say, “I am going to run the script,” which will load every customer across -- you know, every -- before, then you’ve probably got the same script, connected once to each to load every fourth customer, so every customer is somewhere in cache at that point.
(Q): Super, thanks.
(M1): It probably wouldn’t fit into [four?] transaction engines, in your example, on Grand National day, but yeah. Question back there?
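The warm-up idea from this exchange, running the same script once against each transaction engine so that each one preloads every Nth customer, can be sketched in Python. The function name `warmup_plan` and the engine names are hypothetical; the sketch only computes which customers each engine would load and does not connect to any database.

```python
def warmup_plan(customer_ids, engine_names):
    """Assign customers round-robin so each engine preloads every Nth one."""
    if not engine_names:
        raise ValueError("at least one transaction engine is required")
    plan = {name: [] for name in engine_names}
    for index, customer_id in enumerate(customer_ids):
        engine = engine_names[index % len(engine_names)]
        plan[engine].append(customer_id)
    return plan
```

With four engines, each list holds every fourth customer, so every customer ends up in cache somewhere before the peak, as in the speaker's example.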
(Q): How would transaction size impact the system, if the transaction is replacing potentially 70% of your (overlapping dialogue; inaudible)?
(M1): You mean if you -- you mean, by transaction size, you mean the size of an actual record, or the number of individual...?
(Q): The number of [individual records?] within (overlapping dialogue; inaudible) --
(M1): Yeah, I think that -- the -- I don’t know the answer to that. Boris or [Sean?], do you have any (inaudible)? If you’ve got -- [you’re?] -- you’ve got either one transaction that’s got 100 items for updating there at the same time, and that means the opportunity for an update conflict is higher, yeah? Particularly of other people’s transactions in the same kind of (inaudible). Have we seen that happen anywhere?
(M2): Yeah. So what they -- hopefully everybody can hear me without a microphone. So, to step back, what Di said in the beginning is critical: we’re really a generalized relational database, so anything that works today works with us, right? When we think about SQL Server, or Oracle, and so forth. At the same time, it’s very important to, so, like, scope down what is our target, all right? So our target, if you look at the relational databases in general, there are analytical use cases -- right, warehousing, ETL -- and then there are operational use cases, which is -- you know, make some transactions, high-frequency transactions (inaudible) each other, or sometimes you have a pattern where you’re loading things up, and then you read, right? So all those (inaudible), we actually offer much more flexibility. If you look at this kind of, you know, peer-to-peer network of processes (inaudible), you can dedicate certain engines to be responsible for a certain kind of processing. So out of the box, there is -- you don’t have to think about it. It’s just a relational database.
As you get a little bit smarter about how you’d like to separate your [loads?], you can actually add certain affinity between applications, and these transaction engines. Right, so essentially each transaction engine for an application is just like your relational server, right? But depending what kind of [load?] you’d like (inaudible), you may have a set of engines dedicated to OLTP load. You may have a set of engines dedicated to something like reporting, right? And, as you imagine, in the days of operational databases, you always have some sort of, you know, transactional load, but then operational analytics on the side. And in today’s world, the interaction is actually quite contentious, right? Reporting loads or read-only loads are very CPU-intensive. [Whereas OLTP?] is generally, you know, much less CPU-intensive, but more (inaudible), right? Because (inaudible). So in this system, you can actually create pools of servers, which have special purpose. Therefore, it allows separation of concerns, separation of cache affinity, right, and it gives you much more flexibility. But to your point of, you know, is there kind of a limit in terms of what goes into a transaction? No, right?
And there are many [control mechanics?], one of which is MVCC, multi-version concurrency control, which actually allows you to keep multiple copies of the same data with different timestamps and visibility, and that greatly improves the concurrency of the entire distributed system.
(M3): Can I...? I just have a bit to that, as well. One of the things that we actually provide is different consistency levels, so depending on what the actual use case is, you may say, “Well, you must have a read-consistent transaction (inaudible).” You set that level of consistency, if you want, but there are other cases where you may think, “Actually, we only want to see the last committed value, so we don’t really care if it gets updated as you go through, so therefore the big update’s going through, and you’re going to allow your other transactions to see, you know, the internal updates as they actually occur.” So there are actually different transaction isolation levels that you can actually set within the database, as well, depending on the particular use cases you’ve got.
(M2): Actually, let me jump back, just before we go there. I had a question, which I think is pretty critical. Di talked about magic, right? And one of the [aspects of?] the system which is critical for understanding the scale-out and scaling is the fact that these new transaction engines can be added to the system in a matter of milliseconds, right? So you can spin out additional capacity, and those engines, as they come up, they become essentially usable in a matter of milliseconds. And that’s the real departure from the classical, so a scale-out architecture for a rack or (inaudible) cluster, right, where you’d have to think about and pre-provision that scale-out capability with, you know, high-fidelity hardware, with servers (inaudible) takes, you know, anywhere from hours to days to weeks to plan this kind of scale-out. This system is specifically designed to be able to scale out in a matter of milliseconds, as capacity is needed, and, if need be, it can scale back in, because if you lose engines, it’s OK. You just lose capacity, but the system continues to operate. So, just an important (overlapping dialogue; inaudible).
(M1): Yeah, that was the chart -- that’s exactly what our chart (inaudible), yeah. [Alan?], you have something real quick. I know there’s another question over on the left there.
(Q): Say, for example, you’ve grown your system and you’ve catered for Black Friday, and you’ve got 100 nodes. If your database suddenly is too big, do you then have to scale up all 100 nodes, or can you -- if your database is too big to fit on one node --
(M1): No. You do -- so if I go back through -- back to my architecture picture, as your data is growing, you can replicate everything onto every node if you choose, but typically, (inaudible) distributed, to (inaudible) what Boris was saying, a data affinity, you will typically -- as you add these, they will only store the bit that’s most relevant to where they are. It might be data residency requirements. So, for example, it’s European data. It must be stored on a -- only on an EU server. So your Tokyo server and your New York server don’t have that data, so... Or it might be affinity that, well, actually, this data is accessible everywhere, and the system’s available, follows the sun, so it will be accessed from around the globe, but mostly -- 80% of the people are going to be interested in that -- going to be interested in its -- in this geography. Therefore, you would partition around that. Or you might just partition out and say, “Well, hey, I’ll just share the data out that way. There will be a lower transaction rate on any one storage manager, and therefore I can sustain higher update transaction levels without, you know, without stressing anything.”
(M3): And the other thing is the 100 nodes we’re talking about, potentially, if you scale out is -- are actually the in-memory cache. It’s not the [persistent?] data. So if this [persistent?] data grows, it really doesn’t matter. You can carry on scaling out and get a new cache to add to the size of the overall cache that you actually have, but you don’t need to upgrade all of these servers. That’s just an in-memory cache, and we’ll just use the hottest data that’s currently being used.
(M1): Yeah, this is -- one of the critical things about this is its elasticity. When you get 10 times as many users connecting, you know, executing 10 times as many transactions, they haven’t -- you haven’t got suddenly 10 times as much data. You’ve got more data, because it -- but those are typically transactional, you know, increments. You’ve got 100,000 orders come in in an hour on Black Friday, but they’re all of them -- they’re only this big. You know, you -- we’ve all done our entity relationship models. We know how big that intersection of the order line item -- that’s a very small row in a table. We’ve still got the same number of products, the same number of customers. We’re just creating the -- typically the smaller [entities?]. So this data is growing, but it’s not -- it hasn’t scaled out ten or a hundredfold. That’s scaled out tenfold plus. So we just spin up those, and we saw that chart. They come up to speed in no time. When they first appear, their cache is empty, and they may be going (inaudible) the disk at first, once they warm up. But they may also be just saying, “Oh, well, you’ve already got it.” They may be pulling it from other memory caches. So they’re not necessarily slowed down when they...
(Q): Actually, my question was probably badly phrased.
(M1): Oh, I’m sorry.
(Q): You have answered the question I asked.
(M1): I’ve answered (inaudible) [you ask another one?] (overlapping dialogue; inaudible).
(Q): (overlapping dialogue; inaudible), yeah, and in a lot of model designs there’s going to be a split between what information needs to go into a (inaudible) SQL database, and you will do a hybrid solution, and then you’ll have the transactional elements. The transactional elements, typically everything does need (inaudible), and then you wouldn’t necessarily have only 20%. You’ll have -- your hot data is going to be everything. Anything that’s not hot, you’re going to have maybe a hybrid solution that takes that load off there. (overlapping dialogue; inaudible) scale up everything again. I’m looking at this maybe not from the enterprise scale that you guys are (inaudible). How do I come in, in a couple of commodity servers, and how do I take (inaudible) really small, and not have to worry about -- and just grow it linearly with the [promise?] that you promise?
(M1): Well, the interesting thing is Boris has already answered that when he talked about different kinds of servers. You know, you can part-- you can tell the brokers and the agents that if they’ve got this kind of tran-- this kind of process, either it’s a transactional or it’s maybe analytic, then you will use different TEs. That means they will have different subsets of the cache. What you can also do is partition this. This is -- you know, the live servers are running this month’s data, but we can have another server that’s got last month’s, and another one with the previous month’s, and another, and those get very low transaction rates, and they only ever get transaction rates from these TEs that are partitioned for that. That, I think, is maybe, in answer to the question you -- maybe you (inaudible). Another question?
(Q): Actually, I think you answered most of my question, which is we have a very large deployment where we have to have data that lives in Brazil, data that lives in Europe, and data that lives in the US, but we need it distributed to all of our agents all over the world. So I think what you said is I can actually have storage units that reside in Brazil, reside in Europe, but the SMs will then distribute that out cached.
(M1): It’s cached where it’s needed.
(Q): Wherever it’s needed.
(M1): I suspect -- yeah, typically the Brazilian data will be more accessed in Brazil, and...
(Q): Yeah, and China, as well, because we have a huge transactional relationship between China and Brazil, and actually China’s starting to do the same thing, so --
(M1): (inaudible), so one -- I mean, one -- sometimes it’s data residency. You have to have it there because whatever. Sometimes it’s affinity, which is let’s move the data, because we’ll -- we will lower the latency the first time it’s needed, and we’ll lower the latency any time. That is the Commit 1 that must have -- I must have that Commit answer from that one. So you want to do that as locally as you can so that --
(Q): Right. We have certain requirements, as well, where the data can’t be updated unless it’s done locally, so we could apply it where it’s a read-only cache. Do you give -- have you got that layer of...? So we can distribute the data --
(M1): Can we do that (inaudible)?
(Q): -- where people can see it, but they’re not allowed to (inaudible).
(M2): So, great question, and let me answer it maybe little bit roundabout way. There are separate issues of where the data is physically stored and where it’s used, right? And in our architecture, really the way to think about the whole architecture -- you know, this is an in-memory database. Everything that we do is in-memory, right? The fact that the data’s stored on a disk is really an afterthought, right? When you really would like to have data to be durably stored somewhere, you introduce this kind of storage manager, and that’s a location where the data may be stored.
And a very simple way to think about this is that if you have two storage managers, then you have replicas of the same database completely. So this is one instance of the database, and you have another instance of the same database running elsewhere, right? But these are two big replicas. A more complex way to think about this -- and sometimes my head really hurts to think about it that way because of the complexity involved -- but you can take this database, logical database, and split it into multiple partitions, and you’ll have multiple storage managers managing a single logical database, right? And now you can start moving in different physical locations. And the reality of the distributed of design, that data is not uniform, right? In the sense that some of it is much more active in the place where it’s originated, but at the same time you may have the need of actually global view of the data across multiple areas, right? While it’s stored locally. For a variety of reasons: governance, efficiency, so forth.
So, currently, we do not necessarily have a layer of control which would say while the data is stored in Switzerland, right, it only can be used by the users in Switzerland. This is the logic which at this point in time general application implements and manages. But we do have in the roadmap the next level of sophistication, where the SQL layer will enable these tables of rules, if you will, to be applied to data residency in a very much logical sense. The architecture allows it, but it’s sort of roadmap in terms of actually making it very transparent. The notion of this distributed, geographically distributed data management, right, where data is stored in multiple locations but can be used globally, is really the overarching architectural solution, because otherwise today you have to manage an application space (inaudible).
(M1): Could be in that particular instance --
(M3): Can I just add to that, as well? One of the other things to remember, as well, is it’s a standard relational database. Therefore, you can actually set up the roles such that a role for a particular country has read access to a particular piece of data. So you can actually set it up in exactly the same way as you would do with any other relational database --
(Q): That’s how we’re doing it at the moment, but we do (overlapping dialogue; inaudible).
(M3): Yeah, so (overlapping dialogue; inaudible) down the line that will actually help towards the data residency a lot more, but it’s a standard relational database. All the things you can do with roles and grants and (overlapping dialogue; inaudible) --
(Q): And that would apply to the cache [states?].
(M3): Absolutely. [You can apply this?] --
(Q): OK, yeah, so you can cache out the data, but the role will only allow them to do certain things with that cached data.
(M3): Based on the (inaudible), yes.
(Q): OK, OK, that answered my question.
(M1): And another question over here?
(Q): (inaudible) departments and managed databases [in?] Oracle (inaudible), [our DBAs?] are a quirky bunch of people who -- and (inaudible) new [emerging?] complexity [in the end?]. What are you doing to see the community of DBAs who have any idea at all how to manage this (inaudible)?
(M3): (inaudible), I can take that.
(M1): Yeah, Sean, please do.
(M3): Great question. So I was -- in fact, we started our Windsor office in January, and one of the first tricks is I took Martin and the team over to Dassault, and I said, “Let’s go in and, you know, sit down with customers and have them describe it.” And so we met with the lead DBA for the sales systems who’s, you know, managing DB2 and Oracle and everything else. And the way he described it was that with NuoDB he needs someone to help part-time, once a week, to basically just look at the screen to make sure everything’s green. With Oracle, he needed half a dozen people to do exactly the same thing. So the way to think about this is the DBA’s life evolves from constantly managing moving things, dealing with complexity, to [wander around?] thinking about how do I think about policies for auto administration so that... Think of it this way: a DBA, A, doesn’t have to do a lot of mundane tasks, but, B, can start thinking about, you know, an automated data center. So the vision we have looking forward is how do we set up policies so this is auto-managed, and your performance and other profiles are there. The standard set of relational tools still works, so we integrate with all the -- you know, all the standard set of tools. So their environment doesn’t change, per se, it just gets a lot easier.
(M1): Yes, sir?
(Q): So sort of [answer that?] saying I need less of them, and (inaudible), but there’s still a whole lot of skills that they don’t currently know --
(M1): There is.
(Q): -- and I’m wondering how they’re going to --
(M1): OK, well, what I would say to that is my background the last several years has been in the London NoSQL tech startup environment, and the answer to that question, or the equivalent question, for all of those is meetups, is a community that engages with the DBAs. And, as Bob said, we’ve only just started up in Europe. So what I would say to that question is: watch this space. Because it’s -- yes, there is less to do, but they need to understand what it is they have to do, and they have to believe that there is less of it to do. And I think the only way you can address that is community.
(Q): Well, I think --
(M1): Customers [don’t?] (overlapping dialogue; inaudible).
(Q): -- yeah, the [only thing?] to understanding what type of person... So is it a similar skill, that you’re looking for a retrained Oracle DBA? Are you looking for...? What kind of person are you looking for? What kind of (overlapping dialogue; inaudible) --
(M1): Well, existing SQL DBAs who are saying that they’ve done Oracle, they’ve probably done MySQL, as well, and they might’ve done SQL Server, or whatever it is, or even a 20-year, dyed-in-the-wool Oracle DBA, they will know about [other things?]. We’re saying this is another flavor, and for this particular reason this is right for this particular application, and here is what you don’t need to do, and here’s the other thing you do need to do. Yeah, that would be -- it’s like either you have to engage in... I remember kind of getting people into relational from other relational technologies (inaudible) to remember where you had to persuade people that were -- this newfangled relational stuff was the way (inaudible). And as I’ve said, I’ve been through the NoSQL [thing?] as well in the last few years, and you have to engage with people. And we will do, [Martin?], won’t we? (inaudible).
(Q): So I have a question about security specific to that, how we secure the data.
(M1): So, do you want to take that?
(M2): So data in flight is all done over secure channels, so communication between all the processes, in all the [entrants?], is secured, as well as communication (overlapping dialogue; inaudible) --
(Q): (inaudible) do you use (inaudible) people could eavesdrop on these sorts of (inaudible) transactions? How do you secure the data?
(M2): I’ll have to tell you about the standard (inaudible), but it’s a standard encryption technology, which actually has been improved for our use by the [base?] (inaudible).
(M1): Processes, sure.
(M1): [Chase processes?].
(M2): So it’s standard, but, you know, it would go through a lot of security [points?]. Data on the fly in transmission is also (inaudible).
(M1): Yeah. So our next one over here?
(Q): Hopefully quite an easy one. If someone wants to invest the next few hours and find out if it’s all for real, do you provide any kind of sandbox environments (inaudible) way you could [run the process?] without needing to set up six machines yourself and (overlapping dialogue; inaudible)?
(M1): You can do a download, but the download works -- is a one-node kind of download, so that’s the simple... That proves that it works and it’s a good SQL database, etc., yeah. What you’re looking for is more kind of, like, what we call an enterprise evaluation. The way we like to engage it is Sean to give you... Yeah, you can download the software. He’ll talk you through the -- you describe the scenarios you want to test and he’ll get it up. And if you want to (inaudible). (laughter) OK, good.
(M3): What we can do, as well -- obviously, you won’t have to originally go all in on physical hardware. We can set up (inaudible) environments (inaudible).
(Q): Right. I’m a very small startup. We’re interested in sleeping at night and not doing massive scaling, but there are a lot of others like that on the London startup scene, and having to actually find the time and effort to get -- set up six machines around the globe, when maybe what you want is just a one-hour [play?] or something that’s already up there, and set up the MySQL instance I think will probably help a lot of people to [believe?] (overlapping dialogue; inaudible).
(M3): I know we’ve actually set up (inaudible) [even on?] the free instances that you can get from [EC2?], these [possibles?] that were very, very small [can do that?], so [we’ll have a look at?] --
(M1): Two questions in the back, but was there a question I missed down front somewhere?
(Q): Yeah, my question was with the [startup?] manager, how flexible are they to move data around? If you have the example of I have this one (overlapping dialogue; inaudible) --
(M1): If I -- OK, so I want to take this one out for [PM?], because I have a dodgy disk on that. I’m getting [bad?] -- I don’t trust it anymore. Spin up a new [SM?], map it to the same data this one’s got. It will synchronize it. If necessary, run up a script that will cause a PM on all those. It will move -- it will replicate it there automatically, in flight. Take that offline. Question in the back, sir?
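The replacement sequence described in this answer (bring up a new storage manager, let it synchronize, then take the suspect one offline) can be sketched in Python. This is only an illustration of the ordering; the `StorageManager` class, the `replace_storage_manager` function, and the node names are hypothetical and do not correspond to any NuoDB tool or command.

```python
from dataclasses import dataclass, field


@dataclass
class StorageManager:
    """Hypothetical stand-in for a storage node holding a copy of the data."""
    name: str
    data: dict = field(default_factory=dict)
    online: bool = True


def replace_storage_manager(cluster, old_name, new_name):
    """Bring up a replacement, copy the data across, then retire the old node."""
    old = cluster[old_name]
    new = StorageManager(new_name)
    cluster[new_name] = new        # 1. spin up the new storage manager
    new.data.update(old.data)      # 2. synchronize it from the existing copy
    if new.data != old.data:       # 3. retire the old node only once in sync
        raise RuntimeError("replacement is not in sync; keeping old node online")
    old.online = False             # 4. take the suspect node offline
    return new


cluster = {"sm-old": StorageManager("sm-old", {"orders": [1, 2, 3]})}
replacement = replace_storage_manager(cluster, "sm-old", "sm-new")
```

The point of the ordering is that the old node stays online until the replacement holds the same data, so there is always at least one complete copy available.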
(Q): Well, yeah, it was just going back to that issue, you know, lots of startups here and lots of people who... Your webinars are really great, but seeing it is believing it, and even if we go, for example, to something that you set up, what would be your suggested setup for just how many computers, like... I don’t think we’d need six machines, probably three. You know, if we would set it up right now --
(M1): Can I just make...? If we were to (inaudible) -- [if we were to offer to host?] --
(Q): Very simple, (overlapping dialogue; inaudible).
(M1): If we were to offer to have Sean host a half-day workshop where we’d do exactly that -- we’d prep it, yeah, and then, like, rather than Sean spending the rest of his life doing (inaudible) (laughter) -- if anybody will be interested -- there’s clearly two people already -- we can do that. If anybody wants to sign up for that one before you leave, please give me or Sean a card. We’ll sort that out.
(Q): (inaudible) community in London.
(M1): Yes, and then a community in London. Yes, it’s already on the plan.
(Q): (inaudible) the AWS space.
(Q): (inaudible) the AWS space? (inaudible).
(M1): No, the -- no.
(Q): It’s a very good example (inaudible) what’s been asked before, (inaudible) [half a day?] workshop.
(M1): OK. I did ask -- have we got fresh coffee outside?
(M1): One more question in the back. Sorry, yes.
(Q): So a fairly easy and possibly irrelevant question, but as far as this architecture goes, it seems fairly (inaudible), right? So you’ve got all the transaction engines talking to each other, blah, blah, blah. Is there any -- in your use cases, is there any significant investment in the network that’s required to maintain this architecture (inaudible) something where you have, like, replicated storage or something like that?
(M2): Let me grab this.
(M2): So if you’ll think about -- great question, right? I want to sort of, like, step back for a second and reinforce the fact that this architecture is specifically designed for commodity platforms, which means the centerpiece of that is the network, right? So if you’ll think about a traditional design, it’s all storage-centric, so storage is very important. Here, it’s totally the opposite. It’s all in memory. What’s important is CPUs and network, and therefore you’re absolutely right. I mean, a better network implies, you know, a better ability to synchronize multiple engines. That said, a lot of what we do is actually around [the example of?] that. How do you optimize this traffic between multiple engines in such a way that it doesn’t happen at the time when you need to do -- so, like, have dates between the [object?]. So a lot of what we do is actually synchronizing metadata lazily in the background, so at the time when you need to use a certain type of data, right, at that time you can use it, right? So there is sort of, like, a back [plate?] chatter, if you will, in the lazy mode, which happens in the background, but when you need to do transactional access to (inaudible), at that time data is mostly prepared. And that’s the trick: how do you combine asynchronous and synchronous messaging to guarantee the least amount of chatter, but, at the same time, have data prepared for immediate use.
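The combination of lazy background synchronization and synchronous access described here can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not NuoDB's actual protocol: the `EngineCache` class and its method names are hypothetical, and a plain dictionary stands in for a remote peer.

```python
class EngineCache:
    """Toy model of an engine that keeps local copies of remote data."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical synchronous fetch from a peer
        self._local = {}
        self.sync_fetches = 0    # fetches paid for on the transaction path

    def background_sync(self, keys):
        """Lazy path: copy data ahead of need, off the transaction path."""
        for key in keys:
            if key not in self._local:
                self._local[key] = self._fetch(key)

    def read(self, key):
        """Transaction path: use the local copy; fetch only on a miss."""
        if key not in self._local:
            self.sync_fetches += 1
            self._local[key] = self._fetch(key)
        return self._local[key]


remote = {"a": 1, "b": 2}            # stands in for data held by another engine
cache = EngineCache(remote.__getitem__)
cache.background_sync(["a"])         # "a" is prepared in the background
value_a = cache.read("a")            # served locally, no synchronous fetch
value_b = cache.read("b")            # not prepared, so one synchronous fetch
```

The more data the background path has already prepared, the fewer synchronous round trips a transaction has to wait for, which is the trade-off the speaker describes.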
(Q): But [not to mention?] being a provider, so if you want to provision your network (inaudible) or see how much traffic you [might?] have... Let’s say if you had a storage-replicated solution, you know exactly how much traffic you have, because you’ve got exactly X amount of traffic on the database (overlapping dialogue; inaudible) --
(M2): That’s right, but it’s the nature of every distributed system that (inaudible) system [handily?] complies. So, to a certain degree, you cannot as easily do capacity planning. You always need to experiment with the types of loads that you have and see how a system actually performs. But just to put you a little bit at ease, from our practice -- and we’re doing a lot of these systems -- we generally see that with a very small footprint our system is just as good as the best of the existing relational storage-centric solutions. So generally, you know, with one TE and one SM, we’re a little bit slower than Oracle, right? With two transaction engines, we’re generally 50% faster than equivalent Oracle installations, right? And then, as you scale out, you can see that you don’t have to go very far to really get to the next level of scalability and performance, while using a truly commoditized platform. And that’s the trick. I don’t know if it answers your question --
(Q): Yeah, it does.
(M2): -- but the answer is you don’t have to go to hundreds to see the, you know, the different scale of what the system can do for you.
(M3): And one of the other advantages, obviously, because we’re dealing with these (inaudible), the [amount that actually gets?] data at any given point in time is actually a small amount of data, so it’s not huge, great, big chunks of data.
(Q): So you’re not updating, like, a whole bunch of [rows?]. You’re just updating bits that need to be updated, [server?] (overlapping dialogue; inaudible) --
(M3): Yeah. We’re just updating (inaudible), in the absence of cells, we’re actually [at three grand?]. I mean, a lot of the principles behind this are actually very simple. I mean, (inaudible) complex parts to make sure the transactions are in the correct place, and (inaudible) work together, but a lot of the background stuff is actually very simple, and this is part of the beauty and the ease with which it’s able to do all the things it can do. Moving things around is actually pretty quick, and just talking to each other and saying, “Who’s got this? You know, who’s responding to (inaudible)?” Which is another thing that it actually does, by seeing, you know -- as we were saying, with the data locality, the transaction (inaudible) each other, they speak to the storage managers and see who responds the quickest, and that’s the one they prefer to go to. If there’s another one that’s actually on the other side of the Atlantic, it’s not going to use that unless it hasn’t got access to this one, (inaudible) actually have a preference for this. So it tries to minimize the amount of traffic and the amount of movement, and especially as we separate out the storage in the storage managers now, so that US data lives in the US, and that data’s not coming across there anyway, so all the time we’re cutting down the amount of traffic, as well, that’s actually going on.
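The preference described here (go to whichever storage manager responds quickest, and use a distant one only when the nearer one is unavailable) can be sketched in Python. This is a simplified illustration, not NuoDB's selection logic; the `StorageManager` fields, the `preferred_storage_manager` function, and the latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageManager:
    """Hypothetical view of a storage manager as seen by one engine."""
    name: str
    latency_ms: float        # assumed measured response time
    reachable: bool = True


def preferred_storage_manager(candidates):
    """Pick the reachable storage manager with the lowest response time."""
    reachable = [sm for sm in candidates if sm.reachable]
    if not reachable:
        return None
    return min(reachable, key=lambda sm: sm.latency_ms)


london = StorageManager("sm-london", latency_ms=2.0)
new_york = StorageManager("sm-new-york", latency_ms=80.0)

# Normal case: the nearby storage manager wins.
normal_choice = preferred_storage_manager([london, new_york])

# Fallback: if the nearby one is unreachable, use the transatlantic one.
london_down = StorageManager("sm-london", latency_ms=2.0, reachable=False)
fallback_choice = preferred_storage_manager([london_down, new_york])
```

Choosing by observed response time keeps most traffic local, which matches the speaker's point about minimizing movement across the Atlantic.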
(Q): Yeah, sorry, I have two questions. One is about the broker and the agents. You said spinning up TE (inaudible) seconds, so how does the broker or agent decide (inaudible) [apply for that?]? Is it just completely (inaudible)? And the other question was about the [path enders?]. And so if we were to put this up in the cloud, would you have to (inaudible) [infrastructure?] and [store it?] yourself, or (inaudible) [background?]?
(M2): So let me start with the second question first. We don’t provide anything special for the cloud vendors. It’s -- but we’re -- you know, what we’re regularly testing -- and actually, we have customers who, for instance, have the database running at Amazon Web Services. And for those of us who work with AWS, it’s probably the most difficult [environment?] from the perspective of the quality of service provided on the platform. It’s very (inaudible). So if we think about the network being sort of like a gating factor for this type of system, AWS is a wonderful testing and deployment environment, because if you can run [there?] reliably, you can run anywhere, (laughter) and from that perspective [it’s great?], so...
But the whole beauty of -- not the whole beauty, but the mindset is this is just a database, right? Which you can deploy on AWS. You can deploy it in the private cloud, or you can deploy it on [bare metal?]. It really doesn’t matter where it is deployed. Over time, we see ourselves integrating much closer into the stack of the cloud providers, whether it’s (inaudible) cloud providers or private cloud providers, where we are at the (inaudible) infrastructure. That is your (inaudible) data store, and that eventually couple [use?] cases where you can use it before that. Or you can pull existing applications -- so one of the -- so our focus for now is actually taking existing relational applications, which are very valuable, and moving them almost seamlessly to the cloud, and actually adding the benefits of the cloud -- scale out, right, ease of use, ease of management, geo-distribution -- to the existing applications. So that’s where we’re concentrating now. But in the future, we see ourselves much more as a database as a service as a part of the [flavor?]. But each one of those environments is very specialized, and we’re really not investing ahead of the curve, but working with those cloud providers to integrate into their environments. So they are the ones managing database as a service, whether it’s (inaudible) or it’s IBM’s, you know, SoftLayer, or it’s Amazon, we’re working with all the -- Rackspace, for that matter -- we’re working with all those cloud providers, but it’s on them to integrate us as one of the resources available within their platforms. We design for it [very well?].
(Q): Yes, it’s prompted me to think that the network thing is probably the thing that is disturbing me the most from what I’ve heard this morning, and having had significant experience now with AWS, their levels of service scale -- the network, the networking (inaudible) is not [under?] control, but the hosting provider [or myself?], it’s this element that is out there, that is influenced by so many (inaudible), and has seen many valuations, has failed (inaudible), which has driven us more and more towards thinking, again, about moving back to [ten?], about a single location. So to simply overcome this significant problem (inaudible). So I guess this still worries me. I’m not -- I haven’t yet quite understood, I think, how you’re overcoming the fact that there is a network issue going across the (inaudible). It’s just the reality, and it’s a fact. There’s an [internet?] issue going --