It’s possible that containers and container management tools like Docker will be the single most important thing to happen to the data center since the mainstream adoption of hardware virtualization in the 90s. In the past 12 months, the technology has matured beyond powering large-scale startups like Twitter and Airbnb and found its way into the data centers of major banks, retailers and even NASA. When I first heard about Docker a couple years ago, I started off as a skeptic. I blew it off as skillful marketing hype around an old concept of Linux containers. But after incorporating it successfully into several projects at Spantree I am now a convert. It has saved my team an enormous amount of time, money and headaches and has become the underpinning of our technical stack.
If you’re anything like me, you’re often time crunched and may not have a chance to check out every shiny new toy that blows up on Github overnight. So this article is an attempt to quickly impart 8 nuggets of wisdom that will help you understand what Docker is and why it’s useful.
Docker is a container management tool.
Docker is an engine designed to help you build, ship and execute applications stacks and services as lightweight, portable and isolated containers. The Docker engine sits directly on top of the host operating system. Its containers share the kernel and hardware of the host machine with roughly the same overhead as processes launched directly on the host machine.
But Docker itself isn’t a container system; it merely piggybacks off the existing container facilities baked into the OS, such as LXC on Linux. These container facilities have been baked into operating systems for many years, but Docker provides a much friendlier image management and deployment system for working with these features.
Docker is not a hardware virtualization engine.
When Docker was first released, many people compared it to virtualization hypervisors like VMware, KVM and Virtualbox. While Docker solves a lot of the same problems and shares many of the same advantages as hypervisors, Docker takes a very different approach. Virtual machines emulate hardware. In other words, when you launch a VM and run a program that hits disk, it’s generally talking to a “virtual” disk. When you run a CPU-intensive task, those CPU commands need to be translated to something the host CPU understands. All these abstractions come at a cost– two disk layers, two network layers, two processor schedulers, even two whole operating systems that need to be loaded into memory. These limitations typically mean you can only run a few virtual machines on a given piece of hardware before you start to see an unpleasant amount of overhead and churn. On the other hand, you can theoretically run hundreds of Docker containers on the same host machine without issue.
All that being said, containers aren’t a wholesale replacement for virtual machines. Virtual machines provide a tremendous amount of flexibility in areas where containers generally can’t. For example, if you want to run a Linux guest operating system on top of a Windows host, that’s where virtual machines shine.
Docker uses a layered file system.
As mentioned earlier, one of the key design goals for Docker is to provide image management on top of existing container technology. In Docker terms, an image is a static, immutable snapshot of a container’s file system. But Docker rather cleverly takes this snapshotting concept a step further by incorporating a copy-on-write filesystem into its design. I’ve found the best way to explain this is by example:
Let’s say you want to build a Docker image to run your Java web application. You may start with one of the official Docker base images that have Java 8 pre-installed. In your Dockerfile (a text file which tells Docker how to build your image) you’d specify that you’re extending the Java 8 image, which instructs Docker to pull down the pre-built snapshot associated with this image. Now, let’s say you execute a command that downloads, extracts and configures Apache Tomcat into /opt/tomcat. This command will not affect the state of original Java 8 image. Instead, it will start writing to a brand new filesystem layer. When a container boots up, it will merge these file systems together. It may load /usr/bin/java from one layer and /opt/tomcat/bin from another. In fact, every step in a Dockerfile produces a new filesystem layer, even if only one file is changed. If you’re familiar with the Git version control system, this is similar to a commit tree. But with Docker, it provides users with tremendous flexibility to compose application stacks iteratively.
At Spantree, we have a base image with Tomcat pre-installed and on each application release we merely copy the latest deployable asset into a new image, tagging the Docker image to match the release version as well. Since the only variation on these images is the very last layer, a 90MB WAR file in our case, each image is able to share the same ancestors on disk. This means we can keep our old images around and rollback on-demand with very little added cost. Furthermore, when we launch several instances of these applications side-by-side, they share the same read-only filesystems.
Docker can save you time.
Many years ago, I was working on a project for a major restaurant chain and on the first day I was handed a 12 page Word document describing how to get my development environment set up to develop against all the various applications. I had to install a local Oracle database, a specific version of the Java runtime, along with a number of other system and library dependencies and tooling. The whole setup process cost each member of my team approximately a day of productivity, which unfortunately translated to thousands of dollars in sunk costs for our client. Our client was used to this and considered this part of the cost of doing business when onboarding new team members, but as consultants we would have much rather spent that time building useful features that add value to our client’s business.
Had Docker existed at the time, we could have cut this process from a day to mere minutes. With Docker, you can express servers and services through code, similarly to configuration tools like Puppet, Chef, Salt and Ansible. But, unlike these tools, Docker goes a step further by actually pre-executing these steps for you during its build process snapshotting the output as an indexed, shareable disk image. Need to compile Node.js from source? No problem. The Docker runtime will do that on build and simply snapshot the output for you at the end. Furthermore, because Docker containers sit directly on top of the Linux kernel, there’s no risk of environmental variations getting in the way.
Nowadays, when we bring a new team member into a client project, they merely have to run `Docker-compose up`, grab a cup of coffee and by the time they’re back they should have everything they need to start working.
Docker can save you money.
Of course, time is money, but Docker can also save you hard, physical dollars as it relates to infrastructure costs. Studies at Gartner and McKinsey cite the average data center utilization as between 6% to 12%. Quite a lot of that underutilized space is due to static partitioning. With physical machines or even hypervisors, you need to defensively provision the CPU, disk and memory based on the high watermark of possible usage. Containers, on the other hand, allow you to share unused memory and disk between instances. This allows you to pack many more services onto the same hardware, spinning them down when they’re not needed without worrying about the cost of bringing them back up again. If it’s 3am and no one is hitting your Dockerized intranet application but you need a little extra horsepower for your Dockerized nightly batch job, you can simply swap some resources between the two applications running on common infrastructure.
Docker has a robust ecosystem of existing images.
At the time of writing, there are over 14,000 public Docker images available on the web. Most of these images are shared through Docker Hub. Similar to how Github has largely become the home of most major open-source projects, Docker Hub is the de facto resource for sharing and working with public Docker images. These images can serve as building blocks for your application or database services. Want to test drive the latest version of that hot new graph database you’ve been hearing about? Someone’s probably already gone to the trouble of Dockerizing it. Need to build and host a simple Rails application with a special version of Ruby? It’s now at your fingertips in a single command.
Docker helps you avoid production bugs.
At Spantree, we’re big fans of “immutable infrastructure.” That is to say, if at all possible, we avoid doing upgrades or changes on live servers at all costs. Instead, we build out new servers from scratch, applying the new application code directly to a pristine image and rolling the new release servers into the load balancer when they’re ready, retiring the old server instances after all our health checks pass. This gives us the ability to cleanly roll back if something goes wrong. It also gives us the ability to promote the same master images from dev to QA to production with no risk of configuration drift. By extending this approach all the way to the developer machine with Docker, we can also avoid the “it works on my machine” problem because each developer is able to test their build locally in a parallel
Docker only works on Linux (for now).
The technologies powering Docker are not necessarily new, but many of them, like LXC and cgroups, are specific to the Linux kernel. This means that, at the time of writing, Docker is only capable of hosting applications and services that can run on Linux. That is likely to change in the coming years as Microsoft has recently announced plans for first-class container support in the next version of Windows Server. Microsoft has been working closely with Docker to achieve this goal. In the meantime, tools like boot2docker and Docker Machine make it possible to run and proxy Docker commands to a lightweight Linux VM on Mac and Windows environments.
Have you used Docker? What has your experience been like? If you're interested in learning more about how Spantree and GreenPages can help with your application development initiatives, please reach out!
By Cedric Hurst, Principal, Spantree Technology Group, LLC
Cedric Hurst is Principal at Spantree Technology Group, a boutique software engineering firm based primarily out of Chicago, Illinois that focuses on delivering scalable, high-quality solutions for the web. Spantree provides clients throughout North America with deep insights, strategies and development around cloud computing, devops and infrastructure automation. Spantree is partnered with GreenPages to provide high-value application development and devops enablement to their growing enterprise client base. In his spare time, Cedric speaks at technical meetups, makes music, mentors students and hangs out with his daughter. To stay up to date with Spantree, follow them on Twitter @spantreellc