Sharding Not the Only Way to Scale a Database
When it comes to addressing data requirements, scale is often a primary concern. If the database cannot scale to the required level, it doesn’t matter how advanced the algorithms are or how robust the underlying infrastructure.
With both virtualization and the cloud now in play, however, scalability is invariably tied to distributed architectures and is usually addressed with a variety of sharding and/or partitioning schemes. While these approaches have so far proven effective at maintaining speed and agility of scale-out workloads, there is still a danger that once things get really large they will start to falter.
Sharding essentially breaks the database into smaller pieces (shards), each of which can be placed on a discrete resource set, either through standard virtualization or, increasingly, containers. This makes it easy to scale the database in a linear fashion as the workload rises and falls, and it lends itself well to the cloud, where shards can be placed on abstract, dynamic architectures running atop commodity hardware. It also benefits management and automation, as it is easier to oversee multiple small databases than a single gargantuan one.
Shards are usually partitioned horizontally: a table's rows are spread across multiple servers so that the number of rows each server holds is reduced. This is particularly useful for search functions because it also shrinks the index on each server. And with a larger number of smaller machines servicing the database, you not only gain access to tremendous computing power but can also handle multiple smaller workloads in parallel, a key advantage when it comes to large transactional applications.
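To make the idea of horizontal partitioning concrete, here is a minimal Python sketch that assigns rows to shards by hashing a key. It is an illustration under stated assumptions, not how any particular database does it: the shard count, the `customer_id` key and the function names `shard_for` and `partition_rows` are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_rows(rows, key_field, num_shards=NUM_SHARDS):
    """Group rows by the shard their key hashes to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Hypothetical table of 1,000 rows keyed by customer_id.
rows = [{"customer_id": i, "total": i * 10} for i in range(1000)]
shards = partition_rows(rows, "customer_id")

for index in sorted(shards):
    print(f"shard {index}: {len(shards[index])} rows")
```

Each shard ends up holding only a fraction of the table, which is why the per-server index is smaller. One design caveat: with simple modulo placement, changing the shard count moves most keys to a different shard, which is one reason growing a sharded deployment takes planning.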
The problem, as I mentioned, arises when the database becomes distributed across so many points that management stacks and compute resources can no longer cope. For example, distributed architectures depend heavily on network connectivity, so as traffic increases the ability of sharded tables to interact with one another diminishes, particularly once we start dealing with long-haul connectivity. As well, applications that are not coded specifically for distributed architectures will need to rely on sophisticated automation software to keep track of where everything resides.
Sharding therefore runs the risk of losing critical data when related fields are split among multiple shards. At best, this can slow down the query process, but at worst it can produce inaccurate results. Proper management, and perhaps some retraining in the correct commands for executing queries across multiple shards, will alleviate this problem, but again, this becomes more problematic as related tables expand from hundreds of thousands of rows to millions, or even billions.
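The risk of inaccurate results can be shown with a small Python sketch. It assumes a hypothetical orders table sharded by order ID, so that one customer's orders land on several shards; the data and the function names `local_total` and `scatter_gather_total` are invented for illustration. A query that consults only one shard returns a plausible but wrong total, while a scatter-gather query asks every shard and merges the partial answers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards. Because rows are placed by order_id,
# a single customer's orders end up spread across several shards.
shards = [
    [{"order_id": 1, "customer": "alice", "total": 40},
     {"order_id": 5, "customer": "bob", "total": 15}],
    [{"order_id": 2, "customer": "alice", "total": 25}],
    [{"order_id": 3, "customer": "bob", "total": 10},
     {"order_id": 7, "customer": "alice", "total": 35}],
]


def local_total(shard, customer):
    """Sum one customer's order totals within a single shard."""
    return sum(row["total"] for row in shard if row["customer"] == customer)


def scatter_gather_total(all_shards, customer):
    """Query every shard in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = pool.map(lambda shard: local_total(shard, customer), all_shards)
        return sum(partials)


single_shard_answer = local_total(shards[0], "alice")  # sees only one shard
full_answer = scatter_gather_total(shards, "alice")    # sees all shards

print(single_shard_answer)  # 40, an incomplete total
print(full_answer)          # 100, the correct total
```

The correct version has to wait for every shard to respond, so its speed is bounded by the slowest shard and the network path to it. That is the slowdown described above, and it grows as the number of shards increases.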
Of course, sharding, partitioning and even clustering are symptoms of the fact that traditional databases do not accommodate on-demand scaling very well, at least not without massive hardware expansion. By virtualizing the database itself, however, these limitations can be overcome without a lot of overhead. In NuoDB’s case, the three-tiered architecture decouples the management, transaction and storage layers so each tier can scale a single, logical database on demand across multiple hosts, while at the same time improving performance for both high-volume and highly concurrent data flows. In this way, it isn’t unheard of to handle millions of transactions per second on less than $100,000 in hardware.
In this light, deploying standard database technology on distributed architectures is like trying to put the proverbial square peg in the round hole. It’s possible, but it takes a bit of work and the results are less than stellar. Deploying a multi-tiered, abstract database, however, shaves the corners off the square peg: not only does it fit better in the overall data architecture, but it enables much more efficient and effective scalability.
Arthur Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet. Follow Art on Twitter @acole602.