The IT industry is renowned for chasing the tail of the “next big thing” in technology. The current favorite is containerization, with Docker leading the charge thanks to its lightweight application deployment framework. As the next step on from server virtualization, containers are meant to be efficient, lightweight and stateless; you can fire up new container instances on demand on either virtual or physical infrastructure.
As we move past anecdotal examples, the whole subject of data management has to be addressed before containers can be deployed in the enterprise -- in particular, the fact that data storage must be persistent.
Docker volumes & data containers
For the most part, server virtualization continued the emphasis on the server as an object to be managed and cared for. In most cases, the virtual machine and data are inextricably linked -- think about how applications such as Oracle, Exchange or SharePoint are implemented. Containers offer a different approach that treats the container as a transient object, specifically with an assumption that scaling can be achieved by firing up more containers on the infrastructure.
To date (and remember it’s a fast-moving industry), the Docker approach has been to provide two options for managing data: Docker volumes and data containers. Docker volumes allow a directory on the hosting server to be mounted within the container. The hosting server can use local or shared storage as the back end; the container doesn’t care. However, a problem arises if the container is moved or restarted on another physical/virtual server: the data still resides on the previous host -- the container moves but the data doesn’t.
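As a minimal sketch of the volume approach (the container name, image and host path here are illustrative, not from the article):

```shell
# Mount the host directory /srv/pgdata into the container as a Docker volume.
# All names and paths are hypothetical.
docker run -d --name orders-db \
  -v /srv/pgdata:/var/lib/postgresql/data \
  postgres
# If the container is later rescheduled onto another host, /srv/pgdata stays
# behind on this one -- the container moves, the data does not.
```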
The second solution is to have a specific data container whose volumes can be shared with other containers. Again, because the volume is tied to a single physical/virtual server, the data container and all dependent application containers have to be maintained together on the same host.
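The data-container pattern can be sketched as follows, assuming hypothetical image and container names:

```shell
# Create a container whose only job is to own the /data volume.
docker create -v /data --name datastore busybox
# Application containers then mount that volume with --volumes-from.
docker run -d --name app1 --volumes-from datastore myapp
docker run -d --name app2 --volumes-from datastore myapp
# Note: app1, app2 and datastore must all remain on the same host.
```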
Keeping data with the container
Startups are looking to address this data mobility issue using techniques that move the data with the container. One technology is Flocker from ClusterHQ. At a high level, Flocker ensures that when an application container moves, the data container moves with it. The Flocker architecture uses a set of agents and daemons on each host to maintain information on applications and their data, ensuring both are kept together across the infrastructure.
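Loosely based on ClusterHQ’s early tutorials, a Flocker deployment is described in two YAML files -- one for applications and their volumes, one mapping applications to nodes. This is a hedged sketch only; exact keys, node addresses and CLI arguments vary by Flocker version:

```shell
# Describe the application and the volume that must travel with it.
cat > application.yml <<'EOF'
"version": 1
"applications":
  "orders-db":
    "image": "postgres"
    "volume":
      "mountpoint": "/var/lib/postgresql/data"
EOF

# Pin the application to a node (address is illustrative).
cat > deployment.yml <<'EOF'
"version": 1
"nodes":
  "192.0.2.10": ["orders-db"]
EOF

# Moving "orders-db" to a different node in deployment.yml and re-running
# flocker-deploy moves the data volume along with the container.
flocker-deploy deployment.yml application.yml
```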
As a first step, keeping data and applications together is a positive move; however, there are some big issues with this approach. There is no flexibility to run containers across multiple hosts and have them all access the same shared data.
Security controls are also weak, resting solely on the fact that application and data containers sit together on the same physical/virtual host. As applications and data become separated, security credentials need to ensure that data is accessible only by the right application and, of course, that the data is encrypted in flight and can’t be intercepted through man-in-the-middle or other exploits.
A dedicated data layer
Another approach to consider for container storage is implementing a dedicated data layer. In this model, data services (databases, file systems) are implemented on more persistent entities such as virtual machines and physical servers. Containers interact with these services directly across the network in a configuration analogous to client/server application architecture.
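In this client/server-style model, a container simply points at the data service over the network. A hedged sketch, with the hostname and environment variables invented for illustration:

```shell
# The database runs on a long-lived VM or physical server, not in a container.
# Containers reach it over the network, so they can move freely between hosts.
docker run -d --name web \
  -e DB_HOST=db01.internal.example \
  -e DB_PORT=5432 \
  mywebapp
```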
This halfway house means that containers provide application scalability, but scaling the data layer remains a manual process. In addition, one other key feature is needed: a data name server that maps logical data services to their physical location on the network. The name server also provides a natural place to validate access credentials.
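At its core, such a data name server is a lookup from logical service name to network endpoint; a toy sketch, with service names and addresses invented for illustration:

```shell
# Map a logical data-service name to its current network location.
# All names and addresses below are hypothetical.
lookup_data_service() {
  case "$1" in
    orders-db)   echo "10.0.1.20:5432" ;;
    media-files) echo "10.0.1.21:2049" ;;
    *)           echo "unknown service: $1" >&2; return 1 ;;
  esac
}

lookup_data_service orders-db   # prints 10.0.1.20:5432
```

A real implementation would back this with a replicated store and an authentication check before returning the endpoint, so that relocating a data service means updating one record rather than reconfiguring every container.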
We’ve seen the data server idea implemented already in OpenStack as part of the Cinder project, and some of the concepts used there could be adapted for container-based storage solutions. Ultimately, the answer to the container storage issue needs to be driven by the Open Container Initiative. With members that include EMC, IBM, HP, Microsoft, Oracle and Red Hat, we should hopefully see more robust solutions appear in the near future.