Optimizing Cost vs. Time: Batch Processing to the Cloud

Online banking. Ecommerce. Social media. Digital applications are generally transactional, as users must complete interactions with back-end services in order to achieve their goals. In today’s world of ever-faster, real-time analytics, big data applications are often transactional as well.

In spite of this real-time, transactional nature of today’s digital world, there is still a place for high-performance computing (HPC) batch jobs. After all, sometimes all you want to do is crunch some numbers – although today, it’s a boatload of numbers that need crunching.

Of course, enterprises have been using computers for such number crunching for over fifty years, since back in the day when we called it data processing. And with every generation of computing since, we’ve upped the ante on such batch jobs, processing increasingly large quantities of information while dedicating more and more resources along the way.

Today, the challenge most enterprises face when dealing with such HPC jobs is ballooning infrastructure cost. Such jobs typically require several high-end servers and monopolize them for the duration of processing, which may take days.

As a result, companies must clear all other jobs off of available infrastructure in order to run these jobs, leading to battles among departments over limited HPC resources, as they struggle to take turns on the equipment.

Simply investing in more HPC power is rarely a cost-effective alternative. After all, today’s CIOs are concerned with how to achieve a more efficient IT infrastructure. It’s no wonder that many organizations are turning to the cloud to address their HPC batch job needs.

The Story of Observant Insurance

Take for example the story of Observant Insurance, a large US-based insurance company (the name has been changed, but the story is true). Observant’s director of Solution Delivery discussed the context for their HPC batch processing.

Enterprises like Observant conduct financial risk modeling using the Monte Carlo Method – a statistical sampling technique that approximates solutions to quantitative problems. Observant Insurance uses this method to simulate sources of uncertainty that affect the value of their various risk portfolios.
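
For readers unfamiliar with the technique, here is a minimal sketch of the idea in Python, assuming a simple lognormal model of portfolio returns. The numbers, the function name, and the Value-at-Risk calculation are illustrative assumptions for exposition, not Observant’s actual model.

```python
import numpy as np

def simulate_portfolio_losses(value, mu, sigma, horizon_years, n_trials, seed=0):
    """Draw random portfolio returns and convert them to losses (positive = loss)."""
    rng = np.random.default_rng(seed)
    # Sample normally distributed log-returns over the horizon
    log_returns = rng.normal(loc=mu * horizon_years,
                             scale=sigma * np.sqrt(horizon_years),
                             size=n_trials)
    end_values = value * np.exp(log_returns)
    return value - end_values

losses = simulate_portfolio_losses(value=1_000_000, mu=0.05, sigma=0.20,
                                   horizon_years=1.0, n_trials=1_000_000)
# 99% Value at Risk: the loss exceeded in only 1% of simulated scenarios
var_99 = np.quantile(losses, 0.99)
print(f"99% one-year VaR: ${var_99:,.0f}")
```

The accuracy of the estimate improves with the number of trials, which is exactly why production-scale runs consume so much compute.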

To meet their monthly and quarterly financial reporting requirements, Observant runs Monte Carlo simulations at massive scale. However, the simulation algorithms run on a decade-old Visual FoxPro financial analysis application that would take up to two years to rewrite.

They had been running these legacy batch jobs on premises, where monthly jobs would take up to three weeks to run, tying up expensive computing resources and limiting the ability to run other jobs.

Then the financial crisis of the last decade drove new regulations, raising the bar on Observant’s financial reporting. In order to meet government-mandated compliance deadlines, the insurance company had to move the data processing to the cloud. It was simply no longer feasible to run such jobs on premises.

As a result, they were able to reduce the three weeks necessary to run some jobs down to as little as eight hours. The cost benefits of moving Observant’s HPC jobs to the cloud were dramatic. Each run costs a few thousand dollars, as opposed to millions of dollars of infrastructure investment that would have been required to support their increased financial reporting needs.

Fortunately, Observant’s Monte Carlo-based application was fully parallelizable. In other words, it was possible to run as many instances of the legacy application as necessary, where each node processes a subset of the larger data set.
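
This pattern is often described as “embarrassingly parallel”: split the data, fan out identical workers, then aggregate the results. Below is a minimal sketch of that pattern in Python, using local worker processes as stand-ins for separate cloud instances; the function names and toy workload are assumptions for illustration, not Observant’s FoxPro application.

```python
from multiprocessing import Pool

# Placeholder workload: in Observant's case, each cloud node runs its own instance
# of the legacy analysis application against its slice of the portfolio data. This
# toy function simply applies a flat 5% growth factor so the example is runnable.
def run_simulation(portfolio_chunk):
    return sum(value * 1.05 for value in portfolio_chunk)

def split(data, n_chunks):
    """Divide the full data set into roughly equal subsets, one per worker."""
    return [data[i::n_chunks] for i in range(n_chunks)]

if __name__ == "__main__":
    portfolios = [float(v) for v in range(1, 100_001)]  # stand-in for the real data set
    chunks = split(portfolios, n_chunks=8)              # eight workers stand in for cloud nodes
    with Pool(processes=8) as pool:
        partial_results = pool.map(run_simulation, chunks)  # process the subsets in parallel
    print(sum(partial_results))                         # aggregate the results
```

The same split/fan-out/aggregate structure applies whether the workers are local processes or hundreds of cloud instances; only the dispatch mechanism changes, which is what makes this class of workload such a natural fit for on-demand cloud capacity.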

Unexpected Business Benefits

In the old days, people had to wait their turn to run jobs because of the limited availability of the necessary systems. Now in the cloud, multiple departments can run jobs at the same time. Furthermore, it’s possible to provision additional cloud resources whenever Observant needs to run more jobs or complete them faster.

Moreover, because of the reduced time to run each job and the modest infrastructure expense, it became possible – and then advantageous – to run the jobs more frequently. Instead of simply running the minimum number of jobs necessary to satisfy government reporting requirements, Observant Insurance found additional business value in its cloud strategy as different individuals uncovered new uses for the reports.

The Observant Insurance story underscores the role the public cloud plays, even for highly regulated, security-conscious industries like insurance. In many cases, there is simply no reasonable economic argument for obtaining IT infrastructure any other way. With the capacity on demand available in the cloud today, there’s little reason to build a data center unless an organization can keep it busy 24 x 7 – a difficult task for even the largest enterprises.

Jason Bloomberg is the leading industry analyst and expert on achieving digital transformation by architecting business agility in the enterprise. He writes for Forbes, Wired, DevX, and his biweekly newsletter, the Cortex. As president of Intellyx, he advises business executives on their digital transformation initiatives, trains architecture teams on Agile Architecture, and helps technology vendors and service providers communicate their agility stories. His latest book is The Agile Architecture Revolution (Wiley, 2013).
