The orchestration options for Docker, the popular container technology that has taken the data center world by storm over the last couple of years, recently expanded. Specifically, Docker Inc. -- the company that manages the Docker open source project -- released beta versions of Docker Machine and Docker Swarm.
In this blog post, I'll examine these new Docker orchestration tools, their role in a Docker-heavy environment, and how, if at all, they should affect your plans for introducing Docker into your data center.
Docker Machine
Docker Machine is intended to help automate and streamline the process of getting a target host ready to run Docker containers. Instead of requiring users to manually install and configure the Docker daemon (now referred to as Docker Engine) on each host, Docker Machine simplifies the process of deploying Docker Engine instances:
- If you want that instance to run locally -- say, on your laptop for testing purposes -- then Docker Machine can talk to VirtualBox or VMware Fusion to spin up a virtual machine (VM) and install Docker Engine onto that VM.
- Want that Docker Engine instance to run in your data center? Docker Machine can work with OpenStack and VMware vSphere to provision a new VM and install Docker Engine onto that VM.
- Let’s say you want that instance of Docker Engine to run in the cloud. In this case, Docker Machine can talk to public cloud services like Amazon EC2, Azure, Google Compute Engine, and VMware vCloud Air to provision a VM and install Docker Engine onto that VM.
Docker Machine interacts with all these various products (hosted virtualization, enterprise virtualization, and public cloud services) through the idea of providers, which allow vendors to extend Docker Machine to work with their own products.
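To make this concrete, here is a minimal sketch of what the Docker Machine workflow looks like from the command line. The machine names ("dev" and "cloud-dev") are hypothetical, and the exact subcommands and flags may differ across beta releases, so treat this as illustrative rather than definitive:

```bash
# Provision a local VirtualBox VM and install Docker Engine on it
docker-machine create --driver virtualbox dev

# Point the local Docker client at the new Docker Engine instance
eval "$(docker-machine env dev)"

# The same workflow targets a cloud provider by swapping the driver
# (credential and region flags omitted for brevity)
docker-machine create --driver amazonec2 cloud-dev

# List all the Docker Engine instances Docker Machine knows about
docker-machine ls
```

Note that the provider is selected with a single flag; everything else about the workflow stays the same regardless of where the Docker Engine instance lands.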
Docker Swarm
Docker Swarm aims to provide native clustering functionality for Docker containers. What does this mean, exactly? The idea is that you could use the same Docker API that you use now with Docker Engine, but against an endpoint that has the ability to pool together multiple Docker Engine instances transparently. By using the standard Docker API, any tool that works with Docker will also -- in theory -- work with Docker Swarm.
As a cluster manager, Docker Swarm needs to be able to do more than just pool resources. It also needs to handle the scheduling and placement of workloads across those pooled resources. By default, Docker Swarm uses what’s known as a “bin packing” strategy for container placement in the cluster. Bin packing can help optimize overall utilization of the cluster, but it can also introduce some performance complications, since new workloads are likely to be placed where existing workloads already run (assuming resource availability). I encourage you to read through Docker Swarm’s scheduler strategy document for more details.
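The placement strategy is chosen when the Swarm manager is started. A minimal sketch, assuming the beta-era `swarm manage` command and its `--strategy` flag (the cluster token is a placeholder, and flag names may shift between releases):

```bash
# Run the Swarm manager with an explicit bin-packing strategy;
# "binpack" fills one node before spilling onto the next
docker run -d -p 3376:2375 swarm manage --strategy binpack token://<cluster_token>
```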
Naturally, Docker Swarm also supports the idea of constraints (a container must run on an instance of Docker Engine that supports a particular feature, like high-speed storage) and affinities (a container must -- or must not -- be placed on the same Docker Engine instance as another container).
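In the beta, constraints and affinities are expressed as environment variables on `docker run` and matched against labels set on each Docker Engine instance. A sketch, where the `storage=ssd` label and the `myapp` image are hypothetical:

```bash
# On the host: start Docker Engine with a label describing its storage
# (beta-era daemon invocation; exact syntax may vary by release)
docker -d --label storage=ssd

# Constraint: schedule this container only on engines labeled storage=ssd
docker run -d -e constraint:storage==ssd --name db postgres

# Affinity: place this container on the same engine as the "db" container
docker run -d -e affinity:container==db --name app myapp
```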
Beyond these features, though, there are some issues you’ll want to take into account before using Docker Swarm:
- Swarm currently operates in single-master mode. If the master goes down, you can’t schedule workloads onto the cluster. This is a single point of failure that Docker plans to address in a future release.
- Swarm doesn't detect if a container scheduled through Swarm fails, and therefore won’t automatically restart a failed container. Until this functionality is implemented in an unspecified future release, you’ll need some sort of monitoring solution that can help address this need.
- Swarm doesn’t currently interoperate with other schedulers, like Mesos or Kubernetes. Although plans call for a scheduler API that will allow Swarm to integrate with these other schedulers, the extent or depth of this integration is unclear.
- You’re also going to need a discovery service, which helps the Docker Swarm nodes and the Swarm manager find each other. Docker offers its own hosted discovery service tied to the Docker Hub, but you can also use tools like Consul, ZooKeeper, or etcd. These tools introduce some additional complexity and require some additional expertise, but they eliminate any dependency on an external service when standing up your Docker Swarm cluster. Just be aware that there are currently some interoperability challenges as these various projects evolve (for example, etcd 0.4.6 doesn’t work with Docker Swarm). A sketch of both discovery approaches follows this list.
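For reference, here is what forming a small cluster with the hosted discovery service looks like. Node addresses and the cluster token are placeholders, and the exact `swarm` subcommand flags may differ across beta releases:

```bash
# Create a cluster ID via Docker's hosted discovery service (uses Docker Hub)
docker run --rm swarm create
# -> prints a cluster token to use in the commands below

# On each node: join the cluster, advertising the local engine's address
docker run -d swarm join --addr=<node_ip>:2375 token://<cluster_token>

# On the manager: manage the cluster using the same token
docker run -d -p 3376:2375 swarm manage token://<cluster_token>

# A self-hosted backend like Consul or etcd replaces the token:// URL:
#   docker run -d swarm join --addr=<node_ip>:2375 consul://<consul_ip>:8500/swarm
```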
To be fair, Swarm is still in beta, and Docker plans to address these shortcomings in future releases of Docker Swarm. Moreover, Docker Swarm isn’t the only container orchestration option around; there’s also the aforementioned Kubernetes and a planned container service for OpenStack.
So, does the introduction of Docker Machine and Docker Swarm mean you should rush out and start containerizing all your applications? Not exactly. While these are both useful tools (you should definitely track their progress), neither project addresses the “day 2” operational considerations that come with embracing a containerized architecture -- issues like uncovering and understanding application and service dependencies, or handling log collection and indexing.
I'll discuss the architectural and operational considerations of containers in my session, "Container Challenges: Know Before You Deploy," at Interop Las Vegas this spring. See you there!
Register now for Interop, April 27 to May 1, and receive $200 off.