NayaTech CTO David Yahalom and NuoDB VP of Products Ariff Kassam discuss the primary motivators behind the growing adoption of next-generation, cloud-centric database technologies and five steps to ensure database migration projects succeed.
Lorita Ba: Welcome to our webinar, “Five Steps for Migrating Relational Databases to Next Generation Architectures.” My name’s Lorita Ba. I’m from NuoDB, and I’ll be the moderator for today’s webinar. I’m joined by David Yahalom, CTO of Naya Tech, a leading provider of database and managed IT services. David, thanks for joining us. And Ariff Kassam, VP of products at NuoDB. Hi, Ariff.
Today, David will walk through the practical considerations involved in migrating your database to the cloud, and the five steps you can take to ensure such migrations are successful. From there, Ariff will talk through some of the characteristics you should expect of a next generation database, and why traditional databases just don’t cut it. He’ll also provide a quick overview of NuoDB’s elastic SQL database.
At the end, we’ll open it up for Q and A. But before we begin today’s presentation, I’d like to review a few logistics. Our webinar will last just under an hour. The webinar will be recorded and made available for replay, and the slides will be made available after the call as well. Attendees will be muted during the call; however, you may submit questions at any time during the presentation using the Questions box in the GoToWebinar control panel, probably on your right. We’ll answer as many questions as time allows at the end of the presentation.
So with that, I’m going to go ahead and get started. David, I’ll turn it over to you.
David Yahalom: Thank you very much, Lorita. The purpose of my presentation is to discuss the five steps when it comes to database migrations. As Lorita mentioned, we are a consulting company. We do a lot of professional services centered around databases and data platform technologies. We are located in San Jose, California, in the heart of Silicon Valley. We do a lot of database migrations. We help many of our customers -- and you’ll see why and how during my presentation -- who are looking to migrate away from commercial, monolithic database engines to either open source solutions or better alternatives, which are cloud-centric database systems. I’m going to structure my presentation so that we first discuss the paradigm shift that’s currently going on in the market, and why customers are putting so much focus on database migrations -- specifically, database migrations to the cloud. Then I’ll share some of our experience about how to structure these database migration projects. It really doesn’t matter what your source and target databases are; most of these projects are structured in a similar way. So we’re going to talk about that. I would, of course, love to answer any questions that you may have. Next slide, please.
OK, so let’s start by talking about the paradigm shift that’s currently going on in the market before going into the nitty gritty about how database migrations actually, actually work. Next slide, please.
So this is the slide I usually like to show whenever I speak about database migrations. Just to put things in perspective, this is not a tombstone. It’s meant to be a monolith. It’s meant to represent the traditional architectures for database systems as they’ve existed for, I don’t know, the past 20 years. When you look at the 800-pound gorillas of the database market -- companies such as Oracle, Microsoft, IBM -- they have amazing database products, which are extremely feature-full and extremely capable. But because these database engines, these database platforms, have a lot of heritage, a lot of baggage -- they’ve been around for a long, long, long time -- they are structured and modeled around a monolithic architecture. What does that mean? Next slide, please.
A monolithic database engine basically means that the database itself contains a ton of different features and capabilities, built directly into the database. Now that made sense -- next slide, please -- with a traditional database architecture, where you had relatively few applications accessing relatively few databases. Going back, let’s say, a decade ago, the number of databases you had running in your data center was relatively limited. You had relatively few applications before the advent of IoT, digital transformation, the cloud, and the need for global scale. You probably had, I don’t know, an ERP application, a CRM application -- very few applications accessing very few databases. Now these commercial database engines, being monolithic, had a lot of functionality and features built directly into them. They were very expensive to license; they still are. It made sense that when a customer purchased a license for a commercial database engine -- and again, it could be Oracle, it could be SQL Server, it could be Db2, it doesn’t matter -- they were paying quite a bit of money for the database licenses themselves, so they wanted to get the most bang for their buck, so to speak. They expected the database to have tons of built-in features and functionality, and they wanted to use the database as much as possible.
So these are the traditional architectures. Relatively few databases; these databases are very monolithic. They are feature-packed, but they are also expensive, and complex to manage. Next slide, please.
The problem is that this architecture of having very few, extremely high-end, big, monolithic, 800-pound-gorilla databases in your data center only made sense when you had relatively few applications. That’s because these traditional databases are feature-full, but they also incur a lot of complexity and overhead when it comes to operations and management. You need quite a big staff of database experts, database administrators, network experts, and storage experts to make sure that they are properly maintained and tuned, and to make sure that all of these features -- we’ll see a few examples in a moment -- function as expected. Next slide, please.
In next generation data architectures -- where you see the monolithic database architecture changing to maybe a more microservices-based architecture, or where you need to have different databases distributed globally -- the requirements and expectations of the database engine itself are also shifting away from this monolithic approach. In a modern data architecture, where you have many, many different applications, each with its own requirements for scale, elasticity, flexibility, and security, deploying one of those monolithic commercial databases -- those 800-pound-gorilla databases, so to speak -- for each one of your applications quickly becomes very complex and very expensive, because again, these are very expensive databases. They are limited in their ability to scale. Next slide, please.
Again, this becomes complex and expensive. Deploying an Oracle database or a SQL Server database or a Db2 database for each one of your applications, and supporting these databases, is going to be -- again, it’s doable, but it’s going to be a little bit complex and it’s going to incur a lot of operational overhead. Maybe there is a different approach. This is kind of like the state of the market right now, in terms of why our customers are looking into these migrations.
In the second part of our presentation, we’re going to go into the actual process of doing these database migrations. By the way, when I mention migrations, I’m talking about heterogeneous migrations. It’s not about when you have, like, an Oracle database and you want to migrate it to AWS, for example -- when you lift and shift your databases without switching database engines, that’s relatively easy. What I would like to focus on is heterogeneous database migration: when you want to migrate from a commercial database -- again, it could be SQL Server, Oracle, Db2, any other commercial, traditional database flavor -- into a next generation, cloud-centric database engine. So switching the database engine itself -- that’s the tricky part. That’s what’s known as a heterogeneous database migration, and that’s what I’m going to be focusing on in my presentation. Next slide, please.
OK, so database migrations. Next slide. It looks easy, right? Let’s say you have your existing database architecture running these monolithic database engines, and you’re looking to switch, you’re looking to migrate. It doesn’t matter if you want to migrate to MySQL, Postgres, NuoDB -- it doesn’t matter. But when you do those migrations and switch database engines, these heterogeneous migration projects only look easy on slides. When you approach them for the first time, they might look intimidating and complex. But there is a way, based on our experience, to break these database migration projects down into a five-step process, which we use each time we have a customer or a client who is looking to make the switch. I would like to share with you throughout this presentation how we structure these database migration projects -- what the steps are -- and try to clear some of the fog about how to do those migrations. Next slide, please.
OK, when you look at the detailed steps that are required to do a heterogeneous database migration project, switching between database engines, there are quite a few steps. You have to, first of all, identify which databases you want to migrate, and you have to map out which proprietary features you are using in your source database engine and what the equivalents are in the target database engine. You have to convert your application code and QA it. You document the new architecture and set it up for high availability. You have quite a few steps, and it’s a detailed list. But I think we can simplify the process. These are the low-level, very granular steps to doing those migrations, but we can simplify that into a five-step process, which I think will be easier to understand. We can group some of those steps into high-level categories. Let’s go over them one by one, and hopefully we can lay out what a heterogeneous migration project -- where you switch database engines -- looks like. Next slide, please.
OK, so let’s group those individual micro steps into, let’s call them, high-level migration steps, or database migration categories. After doing so many different heterogeneous database migrations -- probably from every imaginable source database engine to every imaginable target database engine -- we have devised a five-step approach which I think is ideal, and very optimized, for these types of projects. This is the same approach you can implement when you consider doing these database migration projects. So there are five steps involved in this process: identifying your database migration targets, achieving feature parity, schema migration, converting your applications and your data access layer, and actually migrating your production data from the source to the target database and going live on the target system. So let’s go over each one of those steps in additional detail and see what it means. Next slide, please.
We’ll start with identifying your migration targets. This might seem a bit silly, but it’s actually a very important step. It kicks off the database migration project. I would recommend, based on our experience, spending time on this step and making sure that you choose the databases you want to migrate wisely. What do I mean by that? Modern organizations usually have more than one database -- it makes sense. You have tens of databases, hundreds, even thousands for larger corporations. The need to migrate from one database engine to another can come from both nonfunctional requirements and functional requirements. When it comes to nonfunctional requirements, we have customers who are doing these migrations mainly so that they can reduce licensing costs. They are paying a lot of money for their monolithic commercial database engines, which are, again, fantastic products, but very expensive. They want to see how they can reduce their licensing spend by embracing next generation databases, which provide lower cost both in terms of the database software license itself and in terms of reduced operational costs. This is a nonfunctional requirement, because you’re not looking to gain better scalability or better high availability. You’re just looking to reduce costs, and that’s the trigger for the database migration, because database licenses are usually some of the worst offenders when it comes to the IT budget. Again, Oracle, SQL Server, and Db2 are amazing database engines, but they’re also very expensive. So this could be one of the nonfunctional requirements that can kick off, or trigger, a database migration project.
Or, for some of our customers, another nonfunctional requirement that we see is looking to reduce vendor lock-in. They feel uncomfortable, perhaps, having all of their databases, or most of their databases, powered by a single database vendor, because then that vendor can, you know, increase the prices for the licensing next year. So they want to be able to at least migrate some of their databases. Again, this is a nonfunctional requirement. It’s a business requirement, and one which we’ve seen with many of our customers. They want to break, or reduce, vendor lock-in, and be able to spread their investment across more than one basket -- to run parts of their data architecture with one vendor and other parts with another vendor, so that they know they have spread their investment and don’t have all of their databases exclusively powered by a single database engine. So these are nonfunctional requirements.
And if you’re approaching a database migration project coming from these nonfunctional requirements, it’s very important to choose wisely when you identify the databases you want to migrate. What I mean by that is: if you don’t have a specific database that you need to switch to another platform because you need better scale, or better high availability, or to move to the cloud, but you are looking from a business angle to reduce licensing costs or vendor lock-in, then it’s very, very important to choose the least complex databases for the first wave of migrations. We’ll see what that means in a moment, because no two databases are identical -- they are snowflakes. Migrating a specific SQL Server or Oracle database to another database platform could, for one database, be an easy and relatively fast project, but for another database could be extremely complex. We’ll see what affects that in a moment. But it’s very important, if you’re coming to the database migration project from a business perspective, to pick and choose the first databases you’re going to migrate wisely, so that you can get success fast and show value in your organization relatively quickly.
And of course, there are also functional requirements for doing those migrations. You might have an existing system that doesn’t scale easily, or at all, and you need to change that. You need to be able to scale rapidly on demand -- scale out, for example -- or maybe migrate to the cloud so you can achieve rapid scalability on demand. You might also want to increase flexibility, improve high availability, or adopt cloud platforms. These are functional requirements: if you have an existing system where the existing database engine doesn’t meet your requirements, that can be a trigger for a heterogeneous database migration project by itself. So again, there is a mix of nonfunctional and functional requirements, and I would recommend spending some time in this first step to identify which databases you want to migrate. If you’re coming from a business perspective, you can pick and choose the databases you want to migrate based on their migration complexity. If you’re coming from a functional requirement, you have a specific system or systems that you want to migrate, and then you have to deal with how complex or not complex those systems are. So this is the first step: identify your database migration targets. Next slide, please.
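[Editor’s note: the "choose the least complex databases first" advice above can be made concrete with a small scoring sketch. Everything here -- the feature names, the weights, the threshold -- is an illustrative assumption, not a NayaTech methodology.]

```python
# Hypothetical sketch: rank candidate databases by migration complexity so the
# least complex ones can go into the first wave of a business-driven migration.
# Feature names and weights below are assumptions for illustration only.
FEATURE_WEIGHTS = {
    "stored_procedures": 5,   # PL/SQL / T-SQL logic needs manual conversion
    "advanced_queuing": 4,    # in-database queues need an external equivalent
    "rac_clustering": 3,      # HA / scale-out must be re-architected
    "xml_json_storage": 2,
    "scheduler_jobs": 2,
}

def migration_complexity(db):
    """Score one database: weighted proprietary features plus app coupling."""
    feature_score = sum(FEATURE_WEIGHTS.get(f, 1) for f in db["features"])
    return feature_score + 2 * db["dependent_apps"]

def first_wave(databases, max_score=10):
    """Pick the least complex candidates for the first migration wave."""
    ranked = sorted(databases, key=migration_complexity)
    return [db["name"] for db in ranked if migration_complexity(db) <= max_score]

dbs = [
    {"name": "crm",  "features": ["stored_procedures", "rac_clustering"], "dependent_apps": 3},
    {"name": "logs", "features": ["scheduler_jobs"], "dependent_apps": 1},
]
print(first_wave(dbs))  # the low-risk candidates come first
```

However a team weights the factors, the point of the sketch is the shape of the decision: score every candidate the same way, then migrate the cheapest wins first to show value quickly.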
Once you’ve finished that and identified the databases you want to migrate, you come to the next step, which is feature parity -- also extremely important. Next slide, please.
So when you have an existing data architecture running those monolithic database engines, those 20-year-old databases -- again, it could be Oracle, SQL Server, Db2, and other databases as well; these are just the most popular ones -- these databases are very complex, very feature-full, and have a lot of functionality built into them. For example, this slide shows some of the features you get built into an Oracle database. Again, Oracle is a great database engine with a lot of different features built into it: Oracle Advanced Queuing, Transparent Data Encryption, very fancy table partitioning methods, the ability to store XML documents and JSON documents inside the database, the result cache, Oracle Scheduler -- tons of features. Once you’ve decided which databases you’re going to migrate, the next step is to take a look and see: OK, what are the vendor proprietary features that my application is currently using in the source database engine? Then create, basically, a map of what you can do in your target database engine to achieve what I’d like to call “feature parity.” So let’s dive deeper into this part of the database migration project. Next slide, please.
So for example -- after you’ve identified the databases you want to migrate, map out the vendor proprietary features that you’re using in your source database. These could be features that relate to high availability. You might be using Oracle Real Application Clusters to get high availability or scalability, or you might be using SQL Server Always On. Or if this is a MySQL or Postgres database you want to migrate -- sometimes we see customers migrate from open source databases, which are relatively cheap to operate, to next generation cloud-centric databases because they need specific functionality that is missing in Postgres or MySQL -- in that case, they might be using Postgres or MySQL built-in replication for high availability and read scalability. Again, it’s important to map out these features.
Security -- OK, whichever source database engine you’re currently using, when it comes to database security you have those major pillars: authentication -- are you using a username and password, or maybe strong authentication with (inaudible 00:21:31) or Active Directory, or single sign-on -- user authentication, authorization, audit, and encryption. You might have used, or currently be using, some of those security features in your source database. So when you do this migration, you need to be able to identify them and see: OK, what can I do in my target database to make sure I have the same level of security, or even better security? I’m going to show a few examples in a moment.
The same goes for DR capabilities: you might be using Data Guard, or some type of log shipping replication, in your source database engine, so it’s important to map these features as well. And of course, monitoring. Again, this slide is just to show you that you might be using certain proprietary features in your source database. When you’re doing your migration and you decide, OK, I’m going to migrate these databases to whatever target database engine you have chosen, you need to be able to map out the features you’re using and reach feature parity. And I think one of the most important things here is that these monolithic database engines I spoke about -- Oracle, SQL Server, or Db2, for example -- have 20 years, some of them even more, of R&D effort put into them, so they have all of those features built in. It might not be easy, when you decide to migrate from one of those source database engines to NuoDB or to MySQL or to Postgres, to reach feature parity -- to have the exact same features that you’re currently using in your source database supported by your target database. But there is a way around that, and this is something I think is very important to put forward, especially if you plan to do these migrations onto a cloud platform -- if you plan to run NuoDB, MySQL, or Postgres in AWS or in Azure. These cloud platforms -- I want to show you a couple of examples -- offer features and functionality in the cloud platform itself that can help mitigate missing functionality in your target database. For example, if you plan to migrate, let’s say, Oracle to Postgres in AWS, Postgres doesn’t have the same features that Oracle does. Quite a few features that Oracle has built in are missing from Postgres.
But by embracing some of the functionality that Amazon offers -- and similarly if you were migrating to, say, MySQL in Azure -- you can leverage some of the capabilities of the cloud platform to achieve feature parity and mitigate functionality that might be missing in the target database engine. So let’s see a couple of examples. Next slide, please.
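[Editor’s note: the feature-to-feature mapping described here can be kept as simple structured data. This sketch is purely illustrative -- the pairings are hypothetical candidates of the kind discussed in this talk, not a definitive equivalence table.]

```python
# Hypothetical feature-parity worksheet for an Oracle-to-cloud migration.
# Each entry pairs a source feature with a candidate equivalent, either in
# the target database or in the cloud platform itself (assumed mappings).
PARITY_MAP = {
    "Oracle RAC (HA / scale-out)":      "NuoDB transaction engines / storage managers",
    "Oracle Advanced Queuing":          "Amazon SQS (managed queue service)",
    "Oracle Scheduler + PL/SQL chains": "Azure Data Factory pipelines",
    "Transparent Data Encryption":      "Cloud-managed storage encryption",
    "Data Guard (DR / log shipping)":   "Cross-region replication in the cloud platform",
}

def parity_gaps(required_features):
    """Return the source features that have no mapped equivalent yet."""
    return [f for f in required_features if f not in PARITY_MAP]

# Any feature left in the gap list needs a design decision before migrating.
print(parity_gaps(["Oracle RAC (HA / scale-out)", "Oracle Spatial"]))
```

The value of keeping the worksheet as data rather than a slide is that the gap list becomes a checklist: the migration is not ready to proceed until every required source feature has a mapped target.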
Before that, there is one thing I want to talk about. Before we get to the features that the cloud platforms can provide in terms of missing functionality, I also want to make sure we talk about the, let’s say, hallmark features of your source database. Taking Oracle RAC as an example -- Oracle RAC, by the way, for those of you who are unfamiliar with the technology, is a great active-active database clustering technology allowing you to use a cluster of machines, all of them open for reads and writes, accessing a shared storage volume. RAC is mainly used for high availability, but also, to some extent, read scalability and write scalability. We see a lot of our customers who are running Oracle RAC databases, or SQL Server Always On databases, who are in a mindset that there is no other technology on the market right now that can go head to head and give them the same high availability and scalability functionality that they get from their source -- for example, a RAC cluster. But this is not the case. So next slide, please.
Just showing NuoDB as an example -- and Ariff will show the product in additional detail in his presentation -- NuoDB is an elastic database cluster, allowing you to get the same levels of high availability and scalability you’re getting in Oracle RAC, but from a next generation cloud database. NuoDB allows you to scale out your cluster. It’s made up of a set of transaction engines, which are processes running on your database cluster that handle the incoming SQL connections from your applications, actually execute the SQL queries, and cache results in memory; and a set of storage managers, which make sure that the database is durably persisted. Ariff will go into this in additional detail in his slides, but I just want to put it out there that we are all very lucky that right now, in the blooming age of next generation data platforms, we have next-gen, cloud-centric databases that allow you to get similar or even better functionality than what you currently have in your source database engines -- even when it comes to the hallmark features, SQL Server Always On, Oracle RAC, the features that are considered, again, the hallmarks of your source database engine. So don’t be afraid, if you have very complex source databases using advanced high availability or DR features -- don’t be afraid of considering migrating to next generation databases, because it’s actually practically doable. This is just something I want to put out there, giving Oracle RAC and NuoDB as an example. So next slide.
Now, this is what I was referencing earlier. Another thing which I think is extremely important is to remember what it takes to reach feature parity. Say you have a source Oracle database, and you’re using Oracle Advanced Queuing and XML DB and the result cache, and pluggable databases, and all of the great functionality built into the database, and you decide to migrate to a next generation database -- you know, Postgres, MySQL, NuoDB, whichever database you choose -- and you plan to deploy your target database in the cloud, in AWS or in Azure. Here, I think, you have to think outside the box and understand the possibilities that open up by leveraging some of the features and functionality built into the cloud platforms themselves -- and I’m going to give you a couple of features to show you what I mean in a moment. You can leverage those cloud features -- services and solutions that are available in AWS and Azure, as an example -- to mitigate functionality that your target database engine might not have, but your source database engine -- your Oracle, your SQL Server, your Db2 database -- did have. And this way, you can reach feature parity. Let me put my money where my mouth is and show a couple of examples. So next slide, please.
For example, let’s say you have an application that’s currently using an Oracle database, and you want to migrate it to a next-gen database engine through a heterogeneous migration. That application is using the Oracle Advanced Queuing feature, which is the ability to deploy and run queues -- asynchronous queues, messaging queues -- inside the database. It’s a great feature. We have a lot of customers whose applications rely on being able to use the database as a service bus. Now let’s say you want to migrate to Postgres or NuoDB, and these databases might not have that specific functionality inside the database itself. So what do you do? Are you stuck? Is there no way to migrate from your source database engine if you’re using Oracle Advanced Queuing? This is not the case. Again, breaking up the monolithic database that has all of those features built in -- where you basically utilize all of the functionality of your source database -- and migrating to a next generation database engine that runs in Azure or AWS, in this example, you can use some of the cloud services that are offered by AWS, such as Amazon SQS.
Let’s say you’re using Oracle Scheduler. Again, I feel comfortable showing you Oracle examples just because I’m an Oracle DBA originally; for me, it’s more comfortable to discuss Oracle functionality and the equivalent functionality in cloud platforms, but the same thing is true for SQL Server, Db2, Sybase, and other databases as well. You might have an application right now that’s using Oracle Scheduler to do some data transformations. Maybe you have PL/SQL logic in your database that does a lot of transformations of data: you have incoming data, and you’re leveraging Oracle Scheduler jobs and job chains that basically act like a data pipeline within the database itself, running business logic. All of that might exist in your Oracle database. Now you want to do a heterogeneous migration to another database engine, and say you chose Microsoft Azure as your target platform. That database -- again, any database that’s not Oracle -- might not have the specific functionality you were using in Oracle. It might not have a robust, fully configurable scheduler. Maybe the (inaudible 00:32:12) language built into the target database engine has some gaps. It might not have the exact functionality that PL/SQL does, because PL/SQL has been out there for, I don’t know, what, 20 years now? So it’s a very, very robust and extensive language. What do you do?
So, for example, you can leverage Azure Data Factory. Azure Data Factory is basically a platform -- a service solution offered by Microsoft -- allowing you to create data pipelines in Azure and process your data: read your data from a source database, process and manipulate the data, and load it back to the database, or to another database. So if you’re using a scheduler in your Oracle database with PL/SQL logic to create those complex data transformations -- again, if your target database engine has a built-in scheduler and a procedural language, you can of course migrate and use that. But thinking outside the box -- thinking about microservices, about breaking monolithic databases into more agile, flexible, next generation architectures -- why not use the functionality offered by the cloud platform? I can say from firsthand experience that Azure Data Factory allows you to do some things that Oracle Scheduler does not. You might even gain additional functionality and capabilities by embracing these cloud features. The previous slide, where I showed Amazon SQS (inaudible 00:33:44), and this slide, where I’m giving Data Factory as an example -- these are just to open your minds to the possibilities when you migrate a source database to a different target database engine that might not have the exact same functionality, but you deploy it on one of the cloud platforms. And by the way, the same holds true if you’re going to use Google Cloud, but I’m more intimately familiar with AWS and Azure, so I’m referencing these solutions. You can leverage ecosystem components that exist in the target cloud platform to achieve functionality that might be missing from the target database engine when you compare it to Oracle or SQL Server, just as an example. So I’m putting it out there.
It’s a very important, I would say, architectural paradigm that we implement for our customers who are doing these migrations. So next slide, please.
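[Editor’s note: the Scheduler-to-Data-Factory idea above boils down to moving an in-database job chain into an external, orchestrated pipeline. This is a toy sketch of that shape only -- the step names and data are hypothetical, and a real service like Azure Data Factory handles scheduling, retries, and connectivity for you.]

```python
# Illustrative sketch: a PL/SQL job chain re-imagined as an ordered pipeline
# of extract/transform/load steps, the shape a cloud orchestration service
# runs on your behalf. All names and logic here are stand-ins.
def extract(rows):
    """Pull the raw incoming rows (stand-in for a database read)."""
    return [r for r in rows if r is not None]

def transform(rows):
    """Apply the business logic that used to live in PL/SQL."""
    return [r.strip().upper() for r in rows]

def load(rows):
    """Write the processed rows back (stand-in for a database write)."""
    return {"loaded": len(rows), "rows": rows}

def run_chain(rows, steps=(extract, transform)):
    """Run the steps in order, like scheduler job-chain dependencies."""
    for step in steps:
        rows = step(rows)
    return load(rows)

result = run_chain([" widget ", None, "gear"])
print(result["loaded"])  # 2
```

The design point is the same one the talk makes: once the transformation steps are expressed as a pipeline outside the database, the database engine no longer needs a built-in scheduler at all.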
So this is feature parity. Once you have identified your databases for migration and created this feature-to-feature map between your source and target database engines -- or, if you deploy to the cloud, leveraged the cloud ecosystem capabilities as well -- you reach the third step of the migration, which is schema migration. Let’s go to the next slide.
So schema migration is quite straightforward, actually. It basically means that you take the schema and all of the schema objects -- tables, constraints, data types, indexes, views, procedures, functions, everything that you had in your source database -- and migrate them to your target database dialect. This is usually done by a combination of automated tools and some manual touchup. It is never done as a fully automated process, and it doesn’t matter what target database engine you choose. For heterogeneous migrations -- whether you migrate from Oracle to SQL Server, from SQL Server to MySQL, from Postgres to NuoDB, it doesn’t matter -- as long as you switch your database engine, schema conversion is always a combination of an automated process and manual, DBA-level intervention. Most database software vendors provide a tool that allows you to migrate schemas from other database engines to their own dialect. NuoDB, for example, has a migrator tool that does that. Microsoft has SSMA, which is their migration assistant. And Amazon provides their database migration service and schema conversion tools. So these tools exist, but they are never able to do the job fully, because there is never a one-to-one match between what you’re using in your source database schema, in terms of objects, and your target database. There are always some manual touchups required, so keep that in mind. We have seen customers where 50, 60, 70 percent of their schema objects could be automatically converted to the target schema, but there are still manual touchups. Always. We never did a heterogeneous migration where the automated tool -- whichever tool was used, depending on the vendor of the database they migrated to -- was able to do the entire process automatically. So this is something that has to be done. It’s a monotonous process, so it’s not fun, but it still has to be done.
You have to convert your objects, your views, your indexes, your functions, procedures, business logic and all of that. This is the schema conversion piece of the puzzle. Next slide, please.
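The partially-automated conversion David describes can be pictured as a type-mapping pass: known equivalents are translated mechanically, and anything without a one-to-one match is flagged for the DBA. Here is a minimal, hypothetical sketch in Python -- the mappings shown are common Oracle-to-PostgreSQL equivalents, not an exhaustive or vendor-accurate list:

```python
# Hypothetical sketch of the type-mapping pass an automated schema
# converter performs. Mappings are illustrative Oracle -> PostgreSQL
# equivalents; unknown types fall through to manual review, which is
# exactly the "manual touchup" step described above.
TYPE_MAP = {
    "VARCHAR2": "VARCHAR",
    "NUMBER": "NUMERIC",
    "CLOB": "TEXT",
    "BLOB": "BYTEA",
    "DATE": "TIMESTAMP",
}

def convert_column(name, source_type):
    """Return (target_type, needs_manual_review) for one column."""
    target = TYPE_MAP.get(source_type.upper())
    if target is None:
        # No one-to-one equivalent in the target dialect.
        return source_type, True
    return target, False

columns = [("id", "NUMBER"), ("doc", "XMLTYPE"), ("note", "CLOB")]
for name, src in columns:
    tgt, review = convert_column(name, src)
    print(name, src, "->", tgt, "(manual review)" if review else "")
```

In a real migration the vendor tools (SSMA, AWS SCT, and so on) do this at far greater depth -- including procedures and business logic -- but the split between "converted automatically" and "left for the DBA" is the same.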
So now, three down, two to go. Let’s talk about data access layer conversion. Next slide, please. At this point, you have identified the database migration target, made sure you have feature parity -- so you mapped the source features you are using to their best target equivalents -- and converted your schema. The next step, which I think is the most complex step in the migration -- not that it cannot be done, you just have to put the required effort into it -- is to take your applications -- Java applications, .NET applications, Python applications, in-house developed applications, third-party applications, whatever -- and convert their data access layer, modify their data access layer, so that it can work with your target database platform.
This can be relatively straightforward. We have some applications where all we had to do was switch a JDBC driver, say from an Oracle driver to a Postgres driver. But some applications will require a bit more manual work -- having developers change and modify parts of the application code. This has to be done; there is no way around it. Still, for many customers this process is very much worthwhile, because the benefits of moving to a more agile, flexible database engine -- higher capabilities in terms of scale, high availability, and security, and maybe even a reduction in licensing cost -- outweigh the effort of putting some resources into a team that can modify the data access layer in your applications. You have to do this step; it’s important.
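The cheap "driver swap" case is cheapest when connection creation is already isolated behind one factory, so switching engines is a config change rather than an edit at every call site. A minimal sketch, assuming DB-API-style drivers; the registry keys and config shape are illustrative, and only the stdlib sqlite3 driver is wired up so the example stays runnable:

```python
import sqlite3

# Hypothetical registry of connection factories. A real application
# would register entries for its actual drivers (psycopg2, cx_Oracle,
# and so on); only sqlite3 is included here to keep this runnable.
DRIVERS = {
    "sqlite": lambda cfg: sqlite3.connect(cfg["database"]),
    # "postgres": lambda cfg: psycopg2.connect(**cfg),
}

def get_connection(cfg):
    """Create a connection from config. Swapping database engines
    means changing cfg["engine"], not touching every call site."""
    try:
        factory = DRIVERS[cfg["engine"]]
    except KeyError:
        raise ValueError("no driver registered for %r" % cfg["engine"])
    return factory(cfg)

conn = get_connection({"engine": "sqlite", "database": ":memory:"})
print(conn.execute("SELECT 1").fetchone())  # (1,)
```

Applications whose SQL and connection handling are funneled through one layer like this tend to land on the easy end of the conversion effort David describes; SQL scattered through business logic lands on the hard end.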
You also have to keep in mind that this step involves QA and performance testing. Every database engine has its own unique optimizer, and the optimizer is the component responsible for parsing and executing your SQL queries. The SQL dialect between database engines is usually compatible -- there is an ANSI standard for SQL -- but each vendor has its own extensions to the dialect. So if you’re migrating from Oracle to SQL Server, you might have SQL statements that fail to execute as-is on the target database engine you are choosing -- for example, certain OLAP functions which are not ANSI-compliant. So there is a QA component here. And even when something does break when you switch database engines, the modifications required to make the application work again are, for many applications, not massive. It’s not years and years of effort that have to be put into modifying the application. But it still has to be done.
But in addition to QA, you also have to do performance testing, again because the database optimizer changes. Some queries might run on both your source and target database engines, but the execution plan might change, because each database has its own optimizer. So you might need to tune and tweak your SQL queries to get optimal performance on your target database engine. There are a lot of tools that allow you to automate this process -- we don’t have time to go into them, but tools such as Selenium, LoadRunner, JMeter, and so on can help you automate testing your applications and making sure that your target database engine runs your SQL statements at least as well as your source database engine. So this is the data access layer conversion piece of the puzzle. Next slide, please.
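A useful first pass at the QA problem above is simply scanning the application’s SQL for vendor-specific constructs before anything is run against the target. This is a hypothetical sketch -- a real migration would use a proper SQL parser, and the pattern list below is a small, illustrative set of Oracle-isms:

```python
import re

# Hypothetical first-pass dialect check: flag Oracle-specific
# constructs likely to break on an ANSI-leaning target engine.
# Regex matching is crude but catches the obvious cases cheaply.
NON_PORTABLE = {
    r"\bNVL\s*\(": "use COALESCE",
    r"\bSYSDATE\b": "use CURRENT_TIMESTAMP",
    r"\bROWNUM\b": "use LIMIT / FETCH FIRST",
    r"\(\+\)": "rewrite as an ANSI OUTER JOIN",
}

def flag_query(sql):
    """Return a list of (pattern, suggestion) hits for one query."""
    hits = []
    for pattern, advice in NON_PORTABLE.items():
        if re.search(pattern, sql, re.IGNORECASE):
            hits.append((pattern, advice))
    return hits

for hit in flag_query("SELECT NVL(name, 'n/a') FROM emp WHERE ROWNUM < 10"):
    print(hit)
```

A scan like this only tells you what will fail to parse; the performance-testing half -- execution plans changing under a different optimizer -- still needs the load-testing tools mentioned above, run against the real target.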
So once you have done all of that -- reached feature parity, converted your schema, made the required modifications to your applications so they are compatible with the target engine -- the final step is the actual data migration itself. Next slide, please.
So here, it basically means that once you have your target database ready and deployed, the schema converted, the applications modified and all of that, you still have to move the data that exists in your source database to the target database. There are tools out there that allow you to do that with minimal downtime -- you basically bring your own tools. There are a lot of really good products that can facilitate data replication across heterogeneous database engines, such as Oracle GoldenGate, Attunity Replicate, SharePlex -- tools that can do a CDC copy. They basically mine change vectors from your source database’s redo logs and apply them to the target. But you can also use probably any ETL tool that you currently have to copy the data across those different databases. So there isn’t one best practice here; it depends on each organization’s internal competency and which ETL or data replication tools they currently have in place. If they don’t have anything in place, there are a lot of open source ETL tools out there that you can use. But again, this is the final piece of the migration process: copy the data either in real time with a CDC -- change data capture -- tool to minimize downtime, or, if you can tolerate some downtime when switching between your source and target databases, maybe over a weekend or during a period of the day when there’s less activity, just use any ETL tool that you want and load the data in batches from your source database to the target database. The approach we recommend here is essentially to bring your own tool. So next slide.
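The batch-loading path David describes -- read from the source in chunks, write to the target, then validate -- reduces to a small loop. A minimal sketch using two in-memory SQLite databases as stand-ins for the real source and target engines; a real migration would point two connections at the actual databases and use a proper ETL or CDC tool:

```python
import sqlite3

# Batch-copy sketch of the "any ETL tool" approach: fetch rows from
# the source in chunks and bulk-insert them into the target.
BATCH = 2  # tiny batch size so the loop is visible; tune in practice

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
source.executemany("INSERT INTO emp VALUES (?, ?)",
                   [(1, "alice"), (2, "bob"), (3, "carol")])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

cur = source.execute("SELECT id, name FROM emp ORDER BY id")
while True:
    rows = cur.fetchmany(BATCH)
    if not rows:
        break
    target.executemany("INSERT INTO emp VALUES (?, ?)", rows)
target.commit()

# Row-count comparison: the simplest form of the validation step.
print(target.execute("SELECT COUNT(*) FROM emp").fetchone())  # (3,)
```

This is the downtime-tolerant variant; the CDC tools mentioned above avoid the outage by streaming changes continuously instead of copying in batches, but the validation step at the end applies either way.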
I think -- yeah, so we are done now. These are the five steps of database migrations. Again, no matter what your source or target database is, you usually have to go through these steps. Hopefully this helps break database migrations down from a complex, (inaudible 00:43:13) -- how do we do it? How do we tackle it? What do we need to do to make it successful? -- into a more structured approach, a structured plan. Again, we have a lot of customers who have done countless heterogeneous migrations with success. If there is a requirement for it, functional or nonfunctional, I just want to let you know that you need to put the work into it, but it’s definitely doable.
So I think that’s it, yeah, from my end, at least.
Ariff Kassam: Excellent, thanks, David. This is Ariff from NuoDB. David, I appreciate your time explaining some of the best practices and experiences from the countless migrations that you’ve done. What I want to quickly do is take a step back, toward the beginning of David’s talk, and talk through why customers -- why you -- would want to take on all that migration work. It is work, right? Why would you take all that work on to migrate to some of these modern databases?
So one of the benefits that David talked about was scalability, right? Maybe your existing database just does not meet the scalability needs of your applications, or your -- (break in audio 44:31--45:02) -- migration. They allow you to move from an existing server to a larger server without any downtime at all, all on the back end, right? A lot of services around Amazon’s RDS, and even Azure, make scaling up the database -- migrating from a small server to a bigger server -- pretty transparent to the application. Again, that’s a benefit of how some of these cloud databases have been architected: there are services behind the scenes that you don’t see and don’t have to worry about that allow you to move servers and get that instant scalability, if you will. Other modern databases take a different approach, which is a scale-out approach, right? You may be familiar with scale-out databases, like NoSQL databases such as Mongo or Cassandra. But newer relational databases and newer SQL databases also have the ability to scale out, and scale out your performance, by simply adding nodes to the environment. Again, it’s seamless from the application perspective. There’s no outage, there’s no downtime. You simply add a node or a server to the existing cluster, and that allows the application to take advantage of the additional resources available to that database environment.
Another benefit that David talked about earlier in the talk was increased flexibility, right? We are all in an age of applications and architectures where we’ve got to produce more functionality and change things in a more agile, quicker way, right? We are in an environment where things change daily. Your requirements change daily, your needs change daily. And being able to anticipate and react to those changes quickly is part of what a next generation or modern database architecture can provide.
So shown here is a standard web stack that I’m sure everybody on the call has a copy of somewhere, right? It’s a browser talking to a load balancer and an app server, where the pieces of your app are in that app server. You scale out that app server to get scalability, all talking to a single, monolithic database.
As we move to microservices -- this idea of flexibility and independence, taking pieces of the application stack and breaking it up into individual services that are independent and can be deployed continuously and independently from everything else -- you get to this idea of a fairly complicated microservices architecture that’s distributed across servers, right? That flexibility does come with the cost of additional complexity, and with the need to access data across different services and aggregate that data across different services. Again, modern database architectures have the ability to work in a containerized or microservices environment. You could take a monolithic application and put it into a container -- it’s not really the canonical definition of a container, it’s fairly heavyweight, but you could do that. It’s very hard, though, to continually deploy that and migrate it across different services. Smaller, more agile databases with services that can be distributed across environments or across services are a much better way to look at a microservices architecture for modern applications.
Then high availability -- again, one of the key value propositions of migrating to a modern database is better availability. Oracle, SQL Server, Db2 -- they all have good replication technologies. They’re phenomenal -- Always On, for example. They provide very seamless ways to fail over from one database to another database. But in today’s architectures and environments, the idea of a failover event is going away, right? What you have in today’s environment is more active-active environments where there is no failover event -- all the database servers are running and active at any point in time, right? So you get rid of this idea of buying servers just for failover, and then having to wait for that failover to happen. We are in an environment where customers are asking for all their database servers to be actively accepting reads and writes. Again, modern database architectures typically have that built into their architecture: the ability to have multiple instances or services where the data and transactions are replicated between those servers, both for a local HA solution -- so there’s no single point of failure within an availability zone -- and across availability zones, so that you have DR protection as well. The loss of an availability zone does not impact the application, and the application continues to run in an active-active environment. These are all examples of benefits that modern database architectures can provide.
Very quickly -- obviously, this webinar is hosted by NuoDB, and NuoDB is a database with a modern architecture. We have taken the single, monolithic database stack that David talked about earlier and split it into two different services: an in-memory transaction layer, which processes all transactions in memory, and a storage layer that handles durable persistence and manages the transactional ACID properties of the data. Each of these layers can be scaled independently of the other, so you can tune the environment for better throughput on read-heavy workloads through additional transaction engines, or on write-heavy workloads through additional storage engines. At the end of the day, this is a database that provides both SQL and transactional ACID capabilities in an architecture that has been built for the cloud -- a cloud-native database.
One last advantage that NuoDB provides, which some other cloud databases may not, is total flexibility in the cloud. We are an independent software vendor. We work on AWS, we work in Azure, we work in Google Cloud Platform -- and we work across all those clouds. We also work in on-premises environments, if you have a private cloud, or even just on physical servers. So vendors like us, who are not locked into a particular cloud stack, allow you to span multiple clouds -- in case you want to protect yourself against an outage of one cloud -- deploy a hybrid or multi-cloud solution, and migrate easily from an on-premises to a cloud environment.
So in summary, David did a great job explaining why you would want to move to next generation data architectures and databases, and how to break that migration down into five simple -- but still five -- steps that you have to work through. Then I tried to explain some of the benefits that modern databases provide, as well as give a brief introduction to NuoDB. If you want to hear more about NuoDB, we have further webinars that you can sign up for, where we go into more detail about our architecture and capabilities.
With that, I think we want to open it up for questions.
Lorita Ba: Great, thanks so much, Ariff, and also to David. We do have a few more minutes. We have a few questions that have come in through the GoToWebinar control panel. Please, if you have a question, feel free to submit it there, and we’ll try to get to it, presuming that we have time. The first question, David, is for you. And the question is, is there a similar feature, like Microsoft Service Broker, available in MySQL or in Postgres?
David Yahalom: OK, that’s a great question. I think it ties back into the slides I had about leveraging cloud services and solutions to make up for some of the missing functionality. Service Broker, Oracle Advanced Queuing -- all of these technologies -- and by the way, I’m not a SQL Server DBA, but I am familiar with Service Broker at a high level. From my experience, Service Broker is pretty similar to Oracle Advanced Queuing. It’s a service bus in the database, allowing you to enqueue and dequeue messages asynchronously from applications and clients. So when we have customers who heavily leverage these capabilities in SQL Server or Oracle, and they migrate to another database on a major cloud platform such as AWS or Azure -- because this functionality doesn’t exist in a lot of the target database engines they choose, such as MySQL, for example -- my advice is to -- sorry?
Lorita Ba: No.
David Yahalom: Oh, OK, some background noise. What we recommend our customers do is basically leverage Amazon SQS, for example. So deploy your MySQL database, or whatever target database engine you choose, in AWS, and use Amazon Simple Queue Service and Lambda functions, which let you easily enqueue and dequeue messages. The functionality might not exist in your target database engine, but if your target database engine is running in AWS, for example, you get the same functionality you got from Service Broker and Advanced Queuing by using Amazon SQS. And by the way, it’s a platform-as-a-service solution, so you don’t have to babysit a cluster of nodes -- like Java message queues and so on -- to handle the enqueuing and dequeuing of messages. It’s all handled by Amazon. Just set up a queue, give it a name, and start sending and receiving messages from it.
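The application-side change David is describing is small because the queue API surface is small: an enqueue call and a dequeue call. Here is a stand-in sketch using Python’s stdlib queue module -- the function names mirror the shape of SQS’s send_message/receive_message operations, but the local queue and the message shape are illustrative, not the SQS API itself:

```python
import queue

# Local stand-in for a managed queue service. Application code that
# previously used Service Broker or Advanced Queuing enqueue/dequeue
# would target an interface of roughly this shape; with real SQS,
# these bodies would become boto3 send_message/receive_message calls.
_q = queue.Queue()

def send_message(body):
    """Enqueue one message."""
    _q.put({"Body": body})

def receive_message(timeout=1.0):
    """Dequeue one message, or return None if the queue stays empty."""
    try:
        return _q.get(timeout=timeout)
    except queue.Empty:
        return None

send_message("order-created:42")
msg = receive_message()
print(msg["Body"])  # order-created:42
```

The point of the advice above is that once the messaging logic is behind a thin interface like this, swapping the in-database queue for a managed service is a localized change rather than a rewrite.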
Lorita Ba: Great, thank you, David. All right, Ariff, this next question I think is more for you. How do you synchronize multiple database servers in modern databases?
Ariff Kassam: That’s a great question. Typically, the synchronization is part of a replication capability built into modern databases. There is no one way to do it; each database has a different implementation. NuoDB, for example, synchronizes data between instances through an internal replication queue. And you can tune the replication to be synchronous or asynchronous, trading performance against durability depending on how far apart the nodes of your database are.
Lorita Ba: Right. David, a question for you. Are there any tools to sanitize the source database prior to migrating it?
David Yahalom: Sanitize -- do we have any additional details about what “sanitize” means? Is it in terms of removing duplicate data, or sensitive data?
Lorita Ba: It doesn’t clarify, but I would just answer it, make some assumptions, yeah.
David Yahalom: Yeah, OK. So the first thing that comes to my mind when someone says “sanitize your source database” would be removing duplicate data, I guess, because you don’t want to remove sensitive data -- the assumption is that when you migrate, you want all of your actual, useful business data in your target database. There are a lot of tools on the market that allow you to de-duplicate data in your source database, but I would say that is a different discussion. The tools I mentioned, which allow you to replicate your data from source to target across heterogeneous environments, are meant to take the data as-is and basically make sure that your target database is a copy of your source database -- which is, by the way, very important for the migration. Some of these tools can also help you with data validation -- making sure that all of the data from the source has been replicated successfully to the target. But I don’t think these tools do actual data sanitization. Customers can either write their own procedures, or there is a different set of tools that help you sanitize your data, if needed.
Lorita Ba: Great, thanks so much, David. Unfortunately, we’ve hit the top of the hour. For those of you that have asked questions that we haven’t been able to get back to, I will have someone follow up with you following the webinar. For all of you, I wanted to extend a special thanks to David and Ariff for leading our discussion today. And thanks to you, our audience, for attending our session. We hope that you found it interesting and useful, and hopefully you got some tips for when you migrate your own database to modern architectures. Again, take a look at the upcoming webinar next week on an Introduction to NuoDB, if you’re interested in learning more. And in the meantime, thanks for attending, and this concludes today’s webinar. Have a great day!