One of the challenges for cloud computing is providing effective platforms for workloads that are episodic in nature but very intense when active. Examples include video editing and rendering, where the workload is paced by manual input, and big data analytics, where time-limited jobs run at fixed intervals.
Today we are seeing the development of specialist clouds targeting these use cases, where graphics processing units (GPUs) improve performance dramatically. NVIDIA, as the leading GPU vendor, and Adobe, as the premier vendor for editing software, have created such clouds. Before delving into this trend, let's first look at what led to its development.
Editing jobs typically run on very powerful workstations, while rendering and big data applications use clusters of expensive servers. The technology for this work is evolving rapidly, making the server-farm approach obsolete in favor of systems with very large memories and/or onboard GPUs. The trend is amplified by GPU performance growth, which is running at roughly twice the pace of Moore's Law, along with improvements in programming tools (NVIDIA's CUDA toolchain) and in memory management between CPUs and GPUs.
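To make the programming-model point concrete, here is a minimal sketch of how CPU/GPU memory management looks to a developer today, using Python with the CuPy library (an assumption for illustration; the article names only NVIDIA's CUDA toolchain). Data is copied explicitly between host and device memory, but the computation itself uses a familiar NumPy-like API:

```python
# Minimal sketch of the CPU/GPU memory model using CuPy (assumed library;
# the article mentions only NVIDIA's CUDA toolchain). Requires an NVIDIA GPU.
import numpy as np
import cupy as cp

# Data starts in CPU (host) DRAM.
frames = np.random.random((1024, 1024)).astype(np.float32)

# Explicitly copy to GPU (device) memory; this transfer is the overhead
# that improved CPU/GPU memory management keeps shrinking.
frames_gpu = cp.asarray(frames)

# The computation runs on the GPU with a NumPy-like API.
spectrum_gpu = cp.fft.fft2(frames_gpu)

# Copy the result back to host memory for further CPU-side processing.
spectrum = cp.asnumpy(spectrum_gpu)
print(spectrum.shape)
```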
The optimal big data system today has one or more high-end GPU cards and a terabyte of DRAM, bolstered by very fast networking, likely multiple 10 GbE links or even a 40 GbE link, and high-performance SSDs or flash cards as buffer storage.
Translating this to the cloud isn't easy. The older method of using standard x64 servers in large clusters is too slow, and it consumes large numbers of server instances along with substantial storage and network bandwidth. The GPU/in-memory approach speeds up job runs by as much as 100x while using less than a third of the server count.
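As a back-of-the-envelope illustration of why those two numbers matter together (the 100x and one-third figures are the article's; the baseline job size below is invented for the example):

```python
# Back-of-the-envelope cost comparison. The 100x speedup and one-third
# server count come from the article; the baseline job size is invented.
baseline_servers = 300          # hypothetical standard x64 cluster size
baseline_hours = 10.0           # hypothetical job duration on that cluster
baseline_instance_hours = baseline_servers * baseline_hours

gpu_servers = baseline_servers // 3    # "less than a third of the server count"
gpu_hours = baseline_hours / 100       # "as much as 100x" faster

gpu_instance_hours = gpu_servers * gpu_hours

print(f"x64 cluster: {baseline_instance_hours:.0f} instance-hours")
print(f"GPU cluster: {gpu_instance_hours:.0f} instance-hours")
# Fewer instance-hours means less rented capacity, and correspondingly
# less storage traffic and network bandwidth, for the same job.
```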
Today's cloud orchestration software allows the definition of very large instances with big memory spaces and many cores, but the need for very fast local SSD/flash storage poses a major problem. It has been resolved by making these virtual machine instances stateful, in the sense that they have an SSD inside the host server that remains available for the life of the instance.
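A sketch of how an application might use that instance-local SSD as scratch space follows, assuming a hypothetical mount point such as /mnt/instance-ssd (the actual path varies by cloud provider and image); anything written there disappears when the instance terminates:

```python
# Using instance-local (ephemeral) SSD as a fast scratch buffer.
# /mnt/instance-ssd is a hypothetical mount point; the real path depends
# on the cloud provider and image configuration.
import os
import tempfile

SCRATCH_ROOT = "/mnt/instance-ssd"

def open_scratch_file(prefix="job-"):
    """Create a temp file on the local SSD, falling back to the default
    temp directory if the ephemeral volume isn't mounted."""
    root = SCRATCH_ROOT if os.path.isdir(SCRATCH_ROOT) else None
    return tempfile.NamedTemporaryFile(prefix=prefix, dir=root, delete=False)

with open_scratch_file() as f:
    f.write(b"intermediate results live only as long as the instance\n")
    print("scratch file:", f.name)
```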
However, instance storage complicates security, because guaranteeing the full erasure of an SSD is difficult: the drive's large overprovisioned area isn't directly addressable by the host, so overwrites can't be verified to have reached every cell.
Even so, these large instances aren't enough on their own to push performance into the comfort zone. Editing is much more efficient when editors get a reasonably fast response to each edit they make.
NVIDIA has created a public cloud based on GPUs. The offering lets customers evaluate the cloud concept, run full production workloads, or draw on it as surge capacity for an in-house GPU farm. NVIDIA also assists in the development of private GPU clouds, which is helping the company win many of the new top 100 supercomputer designs. The NVIDIA GPU cloud serves as a template for builders of private GPU clouds, allowing them to become comfortable with the approach and to characterize their own workloads.
Adobe's cloud service is now the company's preferred way to deliver video editing functions. The substantial upfront costs of powerful computers and expensive software licenses are history. At the same time, Google, VMware, and NVIDIA have created a reference design for a tablet with streamlined transmission and in-browser image rendering. This tablet can front-end Adobe's cloud services and replace the expensive workstation. At around $399, it is much cheaper than the workstation it replaces and has the advantage of being totally portable.
One other great advantage of the Adobe cloud approach is that collaboration becomes very easy; editors in New York and Los Angeles can work on the same data simultaneously, for instance. This is a game changer for the editing space. We are now seeing the GPU-cloud approach expand to other public and private clouds, such as AWS, so it clearly resonates with a sizable market niche.
In the near future, we may see Hybrid Memory Cube architectures bring DRAM bandwidth into the terabyte-per-second range. That would greatly increase GPU system performance, over and above the gains from rising core counts. Next-generation GPUs will continue to outpace Moore's Law on performance, too. We can expect GPU-based clouds to keep expanding as the solution of choice for high-performance computing.