Unpacking Docker images with Undocker

In some ways, the most exciting thing about Docker isn’t the ability to start containers. That’s been around for a long time in various forms, such as LXC or OpenVZ. What Docker brought to the party was a convenient method of building and distributing the filesystems necessary for running containers. Suddenly, it was easy to build a containerized service and to share it with other people.

I was taking a closer at the systemd-nspawn command, which it seems has been developing it’s own set of container-related superpowers recently, including a number of options for setting up the network environment of a container. Like Docker, systemd-nspawn needs a filesystem on which to operate, but unlike Docker, there is no convenient distribution mechanism and no ecosystem of existing images. In fact, the official documentation seems to assume that you’ll be building your own from scratch. Ain’t nobody got time for that…

…but with that attracting Docker image ecosystem sitting right next door, surely there was something we can do?

The format of a Docker image⌗

A Docker image is a tar archive that contains a top level repositories files, and then a number of layers stored as directories containing a json file with some metadata about the layer and a tar file named layer.tar with the layer content. For example, if you docker save busybox, you get:

4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125/
4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125/VERSION
4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125/json
4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125/layer.tar
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/VERSION
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/json
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158/layer.tar
df7546f9f060a2268024c8a230d8639878585defcc1bc6f79d2728a13957871b/
df7546f9f060a2268024c8a230d8639878585defcc1bc6f79d2728a13957871b/VERSION
df7546f9f060a2268024c8a230d8639878585defcc1bc6f79d2728a13957871b/json
df7546f9f060a2268024c8a230d8639878585defcc1bc6f79d2728a13957871b/layer.tar
ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2/
ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2/VERSION
ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2/json
ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2/layer.tar
repositories

In order to re-create the filesystem that would result from starting a Docker container with this image, you need to unpack the layer.tar files from the bottom up. You can find the topmost layer in the repositories file, which looks like this:

{ “busybox”: { “latest”: “4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125” } }

From there, you can investigate the json file for each layer looking for the parent tag.

Introducing undocker⌗

I wrote the undocker command to extract all or part of the layers of a Docker image onto the local filesystem. In other words, if you want to use the busybox Docker image, you can fetch and unpack the image:

# docker pull busybox
# docker save busybox | undocker -o busybox

This will first look in the repositories file for the busybox entry with the latest tag, then build the necessary chain of layers and unpack them in the correct order.

Once you have the filesystem extracted, you can boot it with systemd-nspawn:

# systemd-nspawn -D busybox /bin/sh
Spawning container busybox on /root/busybox.
Press ^] three times within 1s to kill container.
Timezone America/New_York does not exist in container, not updating container timezone.
Failed to copy /etc/resolv.conf to /root/busybox/etc/resolv.conf: Too many levels of symbolic links
/bin/sh: can't access tty; job control turned off
/ #

Undocker is able to extract specific layers from the image as well. We can get a list of layers with the --layers option:

$ docker save busybox | undocker --layers
511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
df7546f9f060a2268024c8a230d8639878585defcc1bc6f79d2728a13957871b
ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2
4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125

And we can extract one or more specific layers with the --layer (-l) option:

$ docker save busybox |
  undocker -vi -o busybox -l ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2
INFO:undocker:extracting image busybox (4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125)
INFO:undocker:extracting layer ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2

I’m using the -i (--ignore-errors) option here because this layer contains a device node (/dev/console), and I am running this as an unprivileged user. Without the -i option, we would see:

OSError: [Errno 1] Operation not permitted

A Docker image archive can actually contain multiple images, each with multiple tags. For a single image, undocker will default to extracting the latest tag. If the latest tag doesn’t exist, you’ll see:

# docker pull fedora:20
# docker save fedora:20 | undocker -o fedora
ERROR:undocker:failed to find image fedora with tag latest

You can specify an explicit tag in the same way you provide one to Docker:

# docker save fedora:20 | undocker -o fedora fedora:20

If an archive contains multiple images, you’ll get a different error:

# docker save busybox larsks/thttpd | undocker -o busybox
ERROR:undocker:No image name specified and multiple images contained in archive

You can get a list of available images and tags with the --list option:

# docker save busybox larsks/thttpd | undocker --list
larsks/thttpd: latest
busybox: latest

# docker save fedora | undocker --list
fedora: heisenbug 20 21 rawhide latest

You can specify the image (and tag) to extract on the command line:

# docker save busybox larsks/thttpd | undocker -o busybox busybox