I’ve been frustrated by the lack of comprehensive benchmarks comparing Infrastructure-as-a-Service (IaaS) providers, or even comparing instance types within a single provider. Every benchmark I’ve found tested a small number of the available “instance types” (to use Amazon Web Services’ nomenclature) in a small number of regions, and none provided an easy way to match the results to pricing data or to particular types of application performance. As an AWS customer, I know from my own experience that the c1.xlarge performs really well for many applications, but no one appears to have benchmarked that instance type against AWS competitors.
I'm not an expert in selecting and running benchmarks, but given the massive hole in what’s available out there, I hope to add to our collective understanding of IaaS performance by running my own IaaS benchmark project.
So, I will be benchmarking every instance type across every major public IaaS provider region on as consistent a setup as possible. I will post the results in a series of articles.
My primary goal in evaluating different instance types in different data centers across different providers is to provide a broad, comprehensive view of essentially every serious compute option available to IaaS customers. To do this, I will rely primarily on two well-respected benchmarking suites: UnixBench and SysBench’s MySQL tests.
I chose UnixBench because it has long been considered a good way to compare the performance of different Linux machines. It tests string handling, floating-point math, file transfers, and various system-level actions, such as creating processes and context switching. UnixBench runs a number of different tests and rolls them all up into a single score, where higher is better. It also produces both a single-threaded score and a multi-threaded (one thread per core) score, which gives good visibility into two different types of use.
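To illustrate how many tests can roll up into one number: as far as I understand UnixBench, each test result is divided by a baseline value and scaled so that the baseline machine scores 10, and the overall index is the geometric mean of those per-test values. The sketch below is a minimal illustration under that assumption. The function names and the sample numbers are hypothetical; they are not taken from UnixBench's source or from any real run.

```python
import math


def index_score(result, baseline):
    """Scale a raw test result so that the baseline machine scores 10."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return result / baseline * 10.0


def overall_index(per_test_indexes):
    """Combine per-test index values with a geometric mean."""
    values = list(per_test_indexes)
    if not values:
        raise ValueError("need at least one test index")
    if any(v <= 0 for v in values):
        raise ValueError("index values must be positive")
    log_sum = sum(math.log(v) for v in values)
    return math.exp(log_sum / len(values))


# Made-up (raw result, baseline) pairs, purely for illustration.
samples = {
    "test_a": (2000.0, 1000.0),  # twice as fast as the baseline
    "test_b": (500.0, 1000.0),   # half as fast as the baseline
}

indexes = [index_score(result, base) for result, base in samples.values()]
print("per-test indexes:", indexes)
print("overall index:", overall_index(indexes))
```

A geometric mean keeps one unusually fast test from dominating the total: in the made-up data above, one test at twice the baseline and one at half the baseline balance out to roughly the baseline score.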
I chose SysBench’s MySQL tests because I have found that relational database master performance is one of the bigger performance issues with the cloud. Because it’s usually painful to scale relational databases to multiple masters, it can be helpful to understand which instances might be the best choices for running a master database server.
To conduct my testing, I built a ServerTemplate in RightScale (a cloud management platform) that sets up a 64-bit CentOS 6 system with UnixBench and SysBench, runs a series of benchmarks, sends me the results in a text file, and then self-terminates.
I can use this template for every cloud that’s supported by RightScale, and I’ll replicate it as best I can for clouds that are not. I selected CentOS 6 because, in my experience, it’s a widely supported Linux OS across IaaS providers. UnixBench and SysBench were easy to install and run automatically on a headless system. (One note: to get UnixBench running properly in headless mode, I had to skip the 2D and 3D graphics tests, so all UnixBench scores are complete except for those.)
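The flow described above (run the benchmarks, send the results as text, then self-terminate) can be sketched roughly as follows. This is not the actual ServerTemplate, which is not shown here; it is a minimal sketch with hypothetical names (`run_and_report`, `shell_run`), and it assumes the reporting and termination steps are supplied by the caller, since those depend on the cloud and the tooling in use.

```python
import subprocess


def shell_run(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


def run_and_report(commands, run, report, terminate):
    """Run each (name, cmd) pair, send one text report, then terminate.

    `run`, `report`, and `terminate` are caller-supplied callables
    (hypothetical hooks for this sketch).
    """
    sections = []
    try:
        for name, cmd in commands:
            try:
                output = run(cmd)
            except Exception as exc:
                # Record the failure and keep going with the other benchmarks.
                output = f"FAILED: {exc}"
            sections.append(f"== {name} ==\n{output}")
        report("\n".join(sections))
    finally:
        # Always shut the instance down, even if reporting failed.
        terminate()


if __name__ == "__main__":
    # Dry run with stand-in callables; nothing is executed or terminated.
    run_and_report(
        [("example", ["echo", "hello"])],
        run=lambda cmd: "(would run: " + " ".join(cmd) + ")",
        report=print,
        terminate=lambda: print("(would terminate the instance here)"),
    )
```

Putting the termination call in a `finally` block is a deliberate choice for this kind of job: an instance that fails partway through should still shut down rather than keep accruing charges.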
I am going to focus on four benchmarks throughout this project: single-threaded UnixBench, multi-threaded UnixBench, 4-thread SysBench Read/Write MySQL, and 64-thread SysBench Read/Write MySQL. These seem like good approximations of, respectively, simple raw machine power, parallel-operations machine power, low-load database servers, and higher-load database servers.
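As a rough sketch, the four benchmarks could be expressed as command lines like the ones built below. This assumes UnixBench's `Run` script with its `-c` (number of copies) option and the older SysBench 0.4-style OLTP options; newer SysBench releases use a different syntax. The database name, user, table size, and run time are placeholder assumptions, as are the function names, and the table is assumed to have been created beforehand with a matching `prepare` step.

```python
def unixbench_cmd(copies):
    """UnixBench run with a given number of parallel copies."""
    return ["./Run", "-c", str(copies)]


def sysbench_rw_cmd(threads, db="sbtest", user="root",
                    table_size=1000000, max_time=60):
    """SysBench 0.4-style read/write OLTP run against MySQL.

    All default values here are placeholders for illustration.
    """
    return [
        "sysbench",
        "--test=oltp",
        "--oltp-read-only=off",          # read/write rather than read-only
        f"--num-threads={threads}",
        f"--max-time={max_time}",        # seconds
        "--max-requests=0",              # no request cap; stop on time limit
        f"--oltp-table-size={table_size}",
        f"--mysql-db={db}",
        f"--mysql-user={user}",
        "run",
    ]


def benchmark_plan(cores):
    """Map each of the four benchmarks to a command line."""
    return {
        "unixbench_single": unixbench_cmd(1),
        "unixbench_multi": unixbench_cmd(cores),   # one copy per core
        "sysbench_rw_4": sysbench_rw_cmd(4),
        "sysbench_rw_64": sysbench_rw_cmd(64),
    }


for name, cmd in benchmark_plan(cores=8).items():
    print(name, "->", " ".join(cmd))
```

Keeping the plan as plain data makes it easy to run the identical set of commands on every instance type, which is the point of a consistent setup.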
After more than five years of running applications on IaaS, I have found that these are the general metrics I want to know. At the end of the project, I will post all of the results and all of the relevant code I used to generate them, so that anyone can re-run the tests or modify them as needed.
When you read these benchmark results, you should see them as a starting point -- information that helps you narrow down your choice of instance types and providers for your own tests. I do not see these results as comprehensive enough -- or real-world enough -- to be definitive for anyone. In your own tests, you should stand up your particular workloads and test the real-world performance.