It's been just over a year since my last review of Docker, which heavily criticised its flawed architectural design and poor user experience. The project has since matured into 1.0 and gained some notoriety from Amazon, but has suffered growing user frustration, hype accusations and even breakout exploits leading to host contamination. However, the introduction of private repos in Docker Hub, which eliminated the need to run your own registry for hosted deployments, coupled with webhooks and tight GitHub build integrations, looked to be a promising start.
So I decided to give Docker another chance and put it into production for 6 months. The result was an absolute shit show of abysmal performance, hacky workarounds and rage-inducing user experience which left me wanting to smash my face into the desk. Indeed, performance was so bad that disabling caching features actually resulted in faster build times.
Dockerfile has numerous problems: it's ugly, restrictive, contradictory and fundamentally flawed. Let's say you want to build multiple images from a single repo, for example a second image which contains debugging tools, with both using the same base requirements. Docker does not support this (per #9198). There is no ability to extend a Dockerfile (per #735), using subdirectories will break the build context and prevent you from using ADD/COPY (per #2224), as would piping (per #2112), and you cannot use env vars at build time to conditionally change instructions (per #2637).
Our hacky workaround was to create a base image, two environment-specific images and some Makefile automation which involved renaming and sed replacement. There are also some unexpected "features" which lead to env $HOME disappearing, resulting in unhelpful error messages. Absolutely disgusting.
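To give a feel for what that kind of templating looks like, here is a minimal sketch in Python rather than Makefile and sed. It is not our actual automation: the placeholder names, the `example/base` image, the debug packages and the `Dockerfile.<variant>` naming are all made up for illustration.

```python
# Sketch of sed-style Dockerfile templating. Every image name, package name,
# placeholder and file name here is hypothetical.
TEMPLATE = "FROM @BASE_IMAGE@\n@EXTRA_STEPS@COPY . /app\n"

VARIANTS = {
    "production": {"@BASE_IMAGE@": "example/base", "@EXTRA_STEPS@": ""},
    "debug": {
        "@BASE_IMAGE@": "example/base",
        "@EXTRA_STEPS@": "RUN apt-get install -y gdb strace\n",
    },
}


def render(variant):
    """Substitute each placeholder in the shared template for one variant."""
    text = TEMPLATE
    for placeholder, value in VARIANTS[variant].items():
        text = text.replace(placeholder, value)
    return text


def render_all():
    """Render every variant, keyed by the file name it would be written to."""
    return {"Dockerfile." + name: render(name) for name in VARIANTS}
```

Each rendered file would then have to be renamed to `Dockerfile` in the project root before building, which is exactly the sort of shuffling that should not be necessary.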
Docker caches Dockerfile instructions using COW (copy-on-write) filesystems, similar to LVM snapshots. Until recently it only supported AuFS, which has numerous problems. Release 0.7 then introduced different COW implementations to improve stability and performance, which you can read about in detail here.
However, this caching system is unintelligent, resulting in some surprising side effects, with no ability to prevent a single instruction from caching (per #1996). It's also painfully slow, to the point that builds will be faster if you disable caching and avoid using layers. This is exacerbated by slow upload/download speeds in Docker Hub, detailed further down.
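The side effects are easier to see with a toy model. The sketch below is a deliberate simplification, not Docker's actual implementation: it assumes each layer's cache key is derived only from its parent's key plus the instruction text, and ignores details such as file contents for ADD/COPY. The function names are hypothetical.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction, each chained to its parent's key."""
    keys = []
    parent = ""
    for instruction in instructions:
        digest = hashlib.sha256((parent + "\n" + instruction).encode("utf-8"))
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(old, new):
    """Count the leading instructions of `new` that could reuse layers from `old`."""
    hits = 0
    for old_key, new_key in zip(layer_keys(old), layer_keys(new)):
        if old_key != new_key:
            break  # every later layer has a different parent, so none can match
        hits += 1
    return hits
```

In this model, changing one instruction near the top invalidates everything beneath it, and there is no way to mark a single step as "always run" without also rebuilding every step that follows.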
These problems are caused by the poor architectural design of Docker as a whole, enforcing linear instruction execution even in situations where it is entirely inappropriate (per #2439). As a workaround for slow builds, you can use a third party tool which supports asynchronous execution, such as Salt Stack, Puppet or even bash, completely defeating the purpose of layers and making them useless.
Docker encourages social collaboration via Docker Hub, which allows Dockerfiles to be published, both publicly and privately, and later extended by other users via the FROM instruction rather than by copy/pasting. This ecosystem is akin to AMIs in the AWS Marketplace and Vagrant boxes, which in principle are very useful.
However, the Docker Hub implementation is flawed for several reasons. Dockerfile does not support multiple FROM instructions (per #3378, #5714 and #5726), meaning you can only inherit from a single image. It also has no version enforcement: for example, the author of dockerfile/ubuntu:14.04 could replace the contents of that tag, which is the equivalent of using a package manager without enforcing versions. And as mentioned later in the article, it has frustratingly slow speed restrictions.
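For contrast, this is roughly what version enforcement means in principle: record a digest of the content you tested against, and refuse anything that no longer matches. The sketch is a generic illustration and not a Docker Hub API; `content_digest` and `verify_pinned` are hypothetical helpers, and the byte strings stand in for image contents.

```python
import hashlib


def content_digest(data):
    """Return a digest that identifies the exact bytes, not a mutable tag name."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_pinned(data, expected_digest):
    """True only if the fetched content matches the digest recorded earlier."""
    return content_digest(data) == expected_digest
```

A tag name alone gives you none of this: if the bytes behind it change, nothing on the consuming side notices.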
Docker Hub also has an automated build system which detects new commits in your repository and triggers a container build. It is also completely useless for many reasons. Build configuration is restrictive with little to no ability for customisation, missing even the basics of pre/post script hooks. It enforces a specific project structure, expecting a single Dockerfile in the project root, which breaks our previously mentioned build workarounds, and build times were horribly slow.
Our workaround was to use CircleCI, an exceptional hosted CI platform, which triggered Docker builds from Makefile and pushed up to Docker Hub. This did not solve the problem of slow speeds, but the only alternative was to use our own Docker Registry, which is ridiculously complex.
Docker originally used LXC as its default execution environment, but has used its own libcontainer by default since 0.9. This introduced the ability to tweak namespace capabilities and privileges, and to use customised LXC configs when using the appropriate exec-driver.
It requires a root daemon to be running at all times on the host, and there have been numerous security vulnerabilities in Docker, for example CVE-2014-6407 and CVE-2014-6408, which, quite frankly, should not have existed in the first place. Even Gartner, with their track record for poor assessments, expressed concern over the immaturity of Docker and the security implications.
Docker, by design, puts ultimate trust in namespace capabilities, which expose a much larger attack surface than a typical hypervisor, with Xen having 129 CVEs in comparison with the 1279 in Linux. This can be acceptable in some situations, for example public builds in Travis CI, but is dangerous in private, multi-user environments.
Namespaces and cgroups are beautifully powerful, allowing a process and its children to have a private view of shared kernel resources, such as the network stack and process table. This fine-grained control and segregation, coupled with chroot jailing and grsec, can provide an excellent layer of protection. Some applications, for example uWSGI, take direct advantage of these features without Docker, and applications which don't support namespaces directly can be sandboxed using firejail. If you're feeling adventurous, you can build namespace support directly into your code.
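As a rough idea of what using namespaces from your own code can look like, here is a small Python sketch. It assumes a Linux host with /proc mounted; `os.unshare` and `os.CLONE_NEWUTS` are only available on Python 3.12 and later, and unsharing normally needs elevated privileges, so the code falls back gracefully when any of that is missing. The function names are hypothetical.

```python
import os

NS_DIR = "/proc/self/ns"  # assumption: Linux with /proc mounted


def namespace_ids():
    """Map namespace type (e.g. 'pid', 'net') to its identifier; empty if unavailable."""
    ids = {}
    if not os.path.isdir(NS_DIR):
        return ids
    for name in sorted(os.listdir(NS_DIR)):
        try:
            ids[name] = os.readlink(os.path.join(NS_DIR, name))
        except OSError:
            continue  # entry vanished or is unreadable; skip it
    return ids


def try_private_uts():
    """Try to move this process into a new UTS namespace; return True on success."""
    unshare = getattr(os, "unshare", None)    # Python 3.12+ on Linux only
    flag = getattr(os, "CLONE_NEWUTS", None)
    if unshare is None or flag is None:
        return False
    try:
        unshare(flag)
    except OSError:
        return False  # typically a permission error when run unprivileged
    return True
```

Comparing `namespace_ids()` before and after a successful `try_private_uts()` should show a different identifier for the `uts` entry, which is the private view of the hostname that a container would get.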
Containerisation projects, such as LXC and Docker, take advantage of these features to effectively run multiple distros inside the same kernel space. In comparison with hypervisors, this can sometimes have the advantage of lower memory usage and faster startup times, but at the cost of reduced security, stability and compatibility. One horrible edge case relates to Linux Kernel Interfaces, running incompatible or untested combinations of glibc versions in kernel and userspace, resulting in unexpected behaviour.
Back in 2008 when LXC was conceived, hardware-assisted virtualisation had only been around for a couple of years and many hypervisors had performance and stability issues. As such, virtualisation was not a widely used technology, and these were acceptable tradeoffs to keep costs low and reduce physical footprint. However, we have now reached the point where hypervisor performance is almost as fast as bare metal and, interestingly, faster in some cases. Hosted on-demand VMs are also becoming faster and cheaper, with DigitalOcean massively outperforming EC2 in both performance and cost, making it financially viable to have a 1:1 mapping of applications to VMs.
There are some specific use cases in which containerisation is the correct approach, but unless you can explain precisely why it fits yours, you should probably be using a hypervisor instead. Even if you're using virtualisation, you should still be taking advantage of namespaces, and tools such as firejail can help when your application lacks native support for these features.
Docker adds an intrusive layer of complexity which makes development, troubleshooting and debugging frustratingly difficult, often creating more problems than it solves. It doesn't offer any benefits for deployment either, because you still need to utilise snapshots to achieve responsive auto-scaling. Even worse, if you're not using snapshots then your production environment scaling is dependent on the stability of Docker Hub.
It is already being abused by projects such as baseimage-docker, an image which intends to make inspection, debugging and compatibility easier by running init.d as its entry point and even giving you an optional SSH server, effectively treating the container like a VM, although the authors reject this notion with a poor argument.
If your development workflow is sane, then you will already understand that Docker is unnecessary. All of the features which it claims to be helpful are either useless or poorly implemented, and its primary benefits can be easily achieved using namespaces directly. Docker would have been a cute idea 8 years ago, but it's pretty much useless today.
On the surface, Docker has a lot going for it. Its ecosystem is encouraging developers towards a mindset of immutable deployments, and starting new projects can be done quickly and easily, something which many people find useful. However, it's important to note that this article focuses on the daily, long-term usage of Docker, both locally and in production.
Although most of the problems mentioned are self-explanatory, this post makes no effort to explain how Docker could do it better. There are many alternative solutions to Docker, each with their own pros/cons, and I'll be explaining these in detail in a follow-up post. If you expect anything positive from Docker, or its maintainers, then you're shit outta luck.
I'd like to say thank you to everyone who took time to give their feedback. It's fantastic to see so many people enjoy my style of writing, and reading responses from several high profile engineers, including those who have inspired me for many years, has been very humbling.
And not a single fuck was given that day... pastebin link here.