Manage your Docker image layers with Ansible

If you manage infrastructure at large scale, as we do at OpenSooq, you need a good level of automation and orchestration, and your sysadmins and operators should handle their infrastructure as code. At OpenSooq, our preferred tool is Ansible.

Ansible is great at putting your servers into a desired state, which makes it ideal for deployment orchestration and configuration management. For containerized environments, people typically use Dockerfiles to describe the setup inside the container, so they end up maintaining two sets of files: Ansible playbooks and Dockerfiles. In this article we discuss why it makes more sense to maintain only the Ansible playbooks and let them generate the Docker image for you.

No developer likes pulling gigabytes of images

Ideal world

In an ideal world, the layers inside a Docker image would be: a common base image, a milestone release, and a single layer containing all the small hot-fixes on top of the milestone release. After each hot-fix release, the top layer is replaced with one that also contains the new fix, so the total number of layers always stays the same.

When the hot-fix layer grows too big, we replace the milestone layer with a new milestone layer.

[Figure: ideal-layers]

Dockerfile hell

If you compose your images by writing Dockerfiles, the most obvious limitation is that they can only produce Docker images (unlike Ansible playbooks), but this is not the worst thing.

First, you have to stick with one hard-coded base image: a Dockerfile is a list of imperative steps that depend on your choice of base image.

To achieve the ideal scenario described above, you might end up maintaining multiple Dockerfiles. One would stack endless layers on top of the milestone release, and the other would take a bare OS base image and put everything on top of it. The first makes things slow and might hit the maximum number of layers in AUFS (if you are using a Debian/Ubuntu host).

Big layers over the base image mean your developers need to pull large images after each release.

[Figure: dockerfile]

Ansible Features

Unlike the Dockerfile format, Ansible playbooks are written in a declarative YAML-based language that describes the desired state, which means you can feed them whatever distro or server in whatever state, and it will end up in the desired state.

Ansible supports “vars”, and it can gather facts (such as the distro) and adjust its behavior based on them. It can apply those “vars” and “facts” to Jinja templates to produce config files. It has a rich library of built-in core modules (which can manipulate INI files, install packages, create users, …) and community roles.
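
As a small illustration of facts, vars, and templates working together, the tasks below are a minimal sketch and are not taken from our playbooks: the package name, the template path, and the destination file are assumptions made up for the example. “ansible_os_family” is a fact that Ansible gathers from the target, and “app_user” is assumed to be defined in a vars file.

```yaml
---
# Sketch only: the package, the paths and the template are hypothetical.
- name: install nginx on the Debian family
  apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: install nginx on the Red Hat family
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

- name: render a config file from a Jinja template using vars and facts
  template:
    src: templates/myapp.conf.j2   # hypothetical template file
    dest: /etc/myapp.conf          # hypothetical destination
    owner: "{{ app_user }}"        # assumed to be defined in vars
    mode: "0644"
```

The same task file can then be applied to either distro family, because each package task is skipped when its “when” condition does not match.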

Ansible can work on many hosts at once, or do something on one host, then another thing on another host, then go back to the first, and so on; it can delegate tasks from one host to another.

Ansible is agent-less and can work on anything with SSH and python2. But this is not the limit: Ansible can also connect via “docker exec” and then put the container into the “desired state”.
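
To make the “docker exec” connection concrete, here is a minimal sketch of a YAML inventory entry. It assumes that a container named “mycontainer” (a made-up name) is already running on the machine where Ansible is invoked, and that python2 is available inside it.

```yaml
# inventory.yml (hypothetical file name)
all:
  hosts:
    mycontainer:                  # must match the name of a running container
      ansible_connection: docker  # use the docker connection plugin instead of SSH
      ansible_user: root          # user to run as inside the container
```

With such an inventory, a command like “ansible-playbook -i inventory.yml deploy.yml” would target the container just as it would target a remote server.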

How does it work?

Regular non-containerized deployment

Inside your project source code (which typically lives in Git), have a directory called “ansible” containing directories like “vars”, “files”, and “tasks” beside your regular playbook “deploy.yml”.

---
- name: start regular non-containerized deployment
  # assumption: run against every host in the inventory; narrow this as needed
  hosts: all
  vars_files:
    - vars/defaults.yml
    - vars/no-container.yml
  tasks:
    - include: tasks/main.yml

You can apply this playbook using a simple command like this:

ansible-playbook deploy.yml

You can pass a custom inventory of hosts, apply only the tagged parts, or target a specific host, for example:

ansible-playbook -i "myhost," deploy.yml
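
Tags are attached to tasks (or includes) inside the playbook. The sketch below uses two made-up tags, “packages” and “config”, on hypothetical tasks to show how only part of a playbook can be applied. The commands in the comments rely on the standard “--tags” and “--limit” options of ansible-playbook.

```yaml
---
# tasks/main.yml (illustrative content, not our real tasks)
# Apply only the config part:  ansible-playbook deploy.yml --tags config
# Restrict to a single host:   ansible-playbook deploy.yml --limit myhost
- name: install required packages
  package:
    name: rsync            # hypothetical package
    state: present
  tags:
    - packages

- name: install the message of the day
  copy:
    content: "managed by ansible\n"
    dest: /etc/motd
  tags:
    - config
```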

Adding a Docker image building playbook

Beside your regular “deploy.yml”, add a file called “img-build.yml” which looks like this:

---
- name: start up a docker container
  hosts: localhost
  vars_files:
    - vars/defaults.yml
    - vars/container.yml
  tasks:
    - name: start up a docker container by running bash
      docker_container: image={{ base_image }} name={{ container_name }} command=/bin/bash recreate={{container_recreate}} state=started tty=yes 
      changed_when: false
    - name: add the host
      add_host:
         name: "{{ container_name }}"
         ansible_connection: docker
         ansible_docker_extra_args: "{{docker_connection_args}}"
         ansible_user: root
      changed_when: false
    - name: fix minimal containers
      delegate_to: "{{ container_name }}"
      include: tasks/fix-minimal.yml
    - name: setup to gather facts
      delegate_to: "{{ container_name }}"
      setup:
    - name: setting the container
      delegate_to: "{{ container_name }}"
      include: tasks/main.yml
    - name: save image
      local_action: command /bin/bash -c 'docker commit -a "{{ image_author }}" -m "`date '+%F'`" -c "USER {{app_user}}" -c "LABEL build_date=`date '+%F'`" -c "VOLUME /data" -c "ENTRYPOINT [ \"/usr/local/bin/dumb-init\", \"--\" ] " -c "CMD [ \"/start.sh\" ]" {{ container_name }} {{ target_image }}'

Simply put, we ran the same “tasks/main.yml” as in the previous “deploy.yml”, but we wrapped it with the following:

  • we start with “localhost” instead of remote machines
  • we load vars, but from a different file (“vars/container.yml” instead of “vars/no-container.yml”) to make some adjustments
  • we create a container from whatever base image is desired (either the default one in vars or an overridden one)
  • we add the container to the inventory using the docker connection method
  • we delegate execution to the container
    • “tasks/fix-minimal.yml” installs python2 using the “raw” module
    • we explicitly call the “setup” module to gather facts inside the container
    • finally, we call the desired “tasks/main.yml”
  • we commit the container into a Docker image and set some metadata using LABEL, such as the build date; we could also add the git branch and last commit hash.
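
The playbook above expects several variables to be defined. A minimal sketch of what “vars/container.yml” (or “vars/defaults.yml”) might contain follows. The variable names come from the playbook, while every value is a placeholder chosen for illustration.

```yaml
---
# vars/container.yml: all values below are hypothetical placeholders
base_image: "debian:stable"       # any distro that the tasks support
container_name: "myapp-build"     # temporary build container
container_recreate: yes           # always start from a fresh container
docker_connection_args: ""        # extra arguments for the docker connection, if any
image_author: "Jane Doe <jane@example.com>"
app_user: "app"                   # assumed to be created by tasks/main.yml
target_image: "myapp:latest"      # name given to the committed image
```

A different base image can then be tried without editing any file, for example with “ansible-playbook img-build.yml -e base_image=ubuntu:16.04”, since extra vars take precedence over vars files.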

An alternative way would be:

---
- name: start up a docker container
  hosts: localhost
  vars_files:
    - vars/defaults.yml
    - vars/container.yml
  tasks:
    - name: start up a docker container by running bash
      docker_container: image={{ base_image }} name={{ container_name }} command=/bin/bash recreate={{container_recreate}} state=started tty=yes 
      changed_when: false
    - name: add the host
      add_host:
         name: "{{ container_name }}"
         ansible_connection: docker
         ansible_docker_extra_args: "{{docker_connection_args}}"
         ansible_user: root
      changed_when: false
    - name: fix minimal containers
      delegate_to: "{{ container_name }}"
      include: tasks/fix-minimal.yml
- name: setup the container
  hosts: "{{ container_name }}"
  vars_files:
    - vars/defaults.yml
    - vars/container.yml
  tasks:
    - include: tasks/main.yml
    - name: save image
      local_action: command /bin/bash -c 'docker commit -a "{{ image_author }}" -m "`date '+%F'`" -c "USER {{app_user}}" -c "LABEL build_date=`date '+%F'`" -c "VOLUME /data" -c "ENTRYPOINT [ \"/usr/local/bin/dumb-init\", \"--\" ] " -c "CMD [ \"/start.sh\" ]" {{ container_name }} {{ target_image }}'

Here we have two plays: one connects to localhost and the other connects to the container. We did not use delegation to run “tasks/main.yml”; instead, we made a fresh connection with its own implied fact gathering. For this to work you need a functional python2 on the target by the time the second play starts, which is why “tasks/fix-minimal.yml” still runs through delegation in the first play. One could instead use “gather_facts: no” in the second play, but I short-cut this with delegation. For saving the resulting image, I used “local_action” to move from the container back to “localhost” instead of starting yet another play.
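
The “gather_facts: no” variant could look roughly like the following second play. This is a sketch under the assumption that the first play has already started the container and added it to the inventory; the python fix then runs over the fresh connection, and facts are gathered explicitly afterwards. The final “save image” task is unchanged and omitted here.

```yaml
---
# Second play only; the first play (start container, add_host) is assumed to precede it.
- name: setup the container
  hosts: "{{ container_name }}"
  gather_facts: no                      # python may not be installed yet
  vars_files:
    - vars/defaults.yml
    - vars/container.yml
  tasks:
    - include: tasks/fix-minimal.yml    # raw tasks, which work without python
    - name: gather facts now that python is available
      setup:
    - include: tasks/main.yml
```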

Personally, I prefer the first one because it has fewer context jumps, at least visually.

How do you “Fix minimal”s?

Well, with just good old shell scripts. I used the “raw” module (which works even if the target does not have a functional python2):

---
- name: install python on redhat family
  raw: /bin/bash -c " [ ! -e /ostree ] && [ -x /bin/yum ] && [ ! -x /bin/python ] && yum install -y python python2-dnf || :"
  changed_when: False
- name: install python2-dnf on fedora
  raw: /bin/bash -c " [ ! -e /ostree ] && [ -x /bin/dnf ] && [ ! -d /usr/lib/python2.7/site-packages/dnf/ ] && dnf install -y python2-dnf || :"
  changed_when: False
# official Debian/Ubuntu images ship without package lists, so refresh them before installing
- name: install python on debian family
  raw: /bin/bash -c " [ -x /usr/bin/apt-get ] && [ ! -x /usr/bin/python ] && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python || :"
  changed_when: False
- name: install python-apt on debian family
  raw: /bin/bash -c " [ -x /usr/bin/apt-get ] && [ ! -d /usr/lib/python2.7/dist-packages/apt/ ] && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python-apt || :"
  changed_when: False