At $JOB we often find ourselves at customer sites where we see the same set of basic problems that we have previously encountered elsewhere ("your clocks aren't in sync" or "your filesystem is full" or "you haven't installed a critical update", etc). We would like a simple tool that could be run either by the customer or by our own engineers to test for and report on these common issues. Fundamentally, we want something that acts like a typical code test suite, but for infrastructure.

It turns out that Ansible is almost the right tool for the job:

  • It's easy to write simple tests (see the sketch just below this list).
  • It works well in distributed environments.
  • It's easy to extend with custom modules and plugins.
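
As a quick illustration of the first point, one of the checks mentioned at the top of this post ("your filesystem is full") might be written along these lines (a sketch only; the 10% free-space threshold is an arbitrary choice for illustration):

- hosts: all
  tasks:
    # Check every mounted filesystem for at least 10% free space
    # (the threshold is purely illustrative).
    - assert:
        that:
          - item.size_available > item.size_total * 0.1
        msg: "filesystem {{ item.mount }} is (nearly) full"
      with_items: "{{ ansible_mounts }}"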

The only real problem is that Ansible has, by default, "fail fast" behavior: once a task fails on a host, no more tasks will run on that host. That's great if you're actually making configuration changes, but for our purposes we are running a set of independent, read-only checks, and we want to know the success or failure of all of those checks in a single operation (in many situations we may not even have the option of correcting the underlying problem ourselves).

In this post, I would like to discuss a few Ansible extensions I've put together to make it more useful as an infrastructure testing tool.

The ansible-assertive project

The ansible-assertive project contains two extensions for Ansible:

  • The assert action plugin replaces Ansible's native assert behavior with something more appropriate for infrastructure testing.

  • The assertive callback plugin modifies the output of assert tasks and collects and reports results.

The idea is that you write all of your tests as assert tasks. You can then run your playbooks in a stock environment and get the standard Ansible fail-fast behavior, or activate the assert plugin from the ansible-assertive project and get behavior more useful for infrastructure testing.

A simple example

Ansible's native assert plugin will trigger a task failure when an assertion evaluates to false. Consider the following example:

- hosts: localhost
  vars:
    fruits:
      - oranges
      - lemons
  tasks:
    - assert:
        that: >-
          'apples' in fruits
        msg: you have no apples

    - assert:
        that: >-
          'lemons' in fruits
        msg: you have no lemons

If we run this in a stock Ansible environment, we will see the following:

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [assert] ******************************************************************
fatal: [localhost]: FAILED! => {
    "assertion": "'apples' in fruits",
    "changed": false,
    "evaluated_to": false,
    "failed": true,
    "msg": "you have no apples"
}
    to retry, use: --limit @/home/lars/projects/ansible-assertive/examples/ex-005/playbook1.retry

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1

A modified assert plugin

Let's activate the assert plugin in ansible-assertive. We'll start by cloning the project into our local directory:

$ git clone https://github.com/larsks/ansible-assertive

And we'll activate the plugin by creating an ansible.cfg file with the following content:

[defaults]
action_plugins = ./ansible-assertive/action_plugins

Now when we re-run the playbook, a failed assertion registers as changed rather than failed:

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [assert] ******************************************************************
changed: [localhost]

TASK [assert] ******************************************************************
ok: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=1    unreachable=0    failed=0

While that doesn't look like much of a change, there are two things of interest going on here. The first is that the assert plugin provides detailed information about the assertions specified in the task; if we were to register the result of the failed assertion and display it in a debug task, it would look like:

TASK [debug] *******************************************************************
ok: [localhost] => {
    "apples": {
        "ansible_stats": {
            "aggregate": true,
            "data": {
                "assertions": 1,
                "assertions_failed": 1,
                "assertions_passed": 0
            },
            "per_host": true
        },
        "assertions": [
            {
                "assertion": "'apples' in fruits",
                "evaluated_to": false
            }
        ],
        "changed": true,
        "failed": false,
        "msg": "you have no apples"
    }
}
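
For reference, the register and debug tasks that produce output like the above might look something like this (a minimal sketch extending the earlier playbook; the variable name apples simply matches the output shown):

    # Register the result of the failing assertion...
    - assert:
        that: >-
          'apples' in fruits
        msg: you have no apples
      register: apples

    # ...and dump the registered variable.
    - debug:
        var: apples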

The assertions key in the result dictionary contains a list of tests and their results. The second thing of interest is the ansible_stats key, which contains metadata that will be consumed by the custom statistics support in recent versions of Ansible. If you have Ansible 2.3.0.0 or later, add the following to the [defaults] section of your ansible.cfg:

show_custom_stats = yes

With this feature enabled, your playbook run will conclude with:

CUSTOM STATS: ******************************************************************
    localhost: { "assertions": 2,  "assertions_failed": 1,  "assertions_passed": 1}

A callback plugin for better output

The assertive callback plugin from the ansible-assertive project produces more useful output about the results of failed assertions. We activate it by adding the following to the [defaults] section of our ansible.cfg:

callback_plugins = ./ansible-assertive/callback_plugins
stdout_callback = assertive
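
At this point, the complete ansible.cfg looks like this (including the optional show_custom_stats line from earlier):

[defaults]
action_plugins = ./ansible-assertive/action_plugins
callback_plugins = ./ansible-assertive/callback_plugins
stdout_callback = assertive
show_custom_stats = yes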

Now when we run our playbook we see:

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [assert] ******************************************************************
failed: [localhost]  ASSERT('apples' in fruits)
failed: you have no apples

TASK [assert] ******************************************************************
passed: [localhost]  ASSERT('lemons' in fruits)

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=1    unreachable=0    failed=0

Machine-readable statistics

The above is nice, but it is still aimed primarily at human readers. What if we want to collect test statistics for machine processing (to produce a nicely formatted report, to aggregate information from multiple test runs, to trigger some action when there are failed tests, and so on)? You can ask the assertive callback plugin to write a YAML document with this information by adding the following to your ansible.cfg:

[assertive]
results = testresult.yml

After running our playbook, this file would contain:

groups:
- hosts:
    localhost:
      stats:
        assertions: 2
        assertions_failed: 1
        assertions_passed: 1
        assertions_skipped: 0
      tests:
      - assertions:
        - test: '''apples'' in fruits'
          testresult: failed
        msg: you have no apples
        testresult: failed
        testtime: '2017-08-04T21:20:58.624789'
      - assertions:
        - test: '''lemons'' in fruits'
          testresult: passed
        msg: All assertions passed
        testresult: passed
        testtime: '2017-08-04T21:20:58.669144'
  name: localhost
  stats:
    assertions: 2
    assertions_failed: 1
    assertions_passed: 1
    assertions_skipped: 0
stats:
  assertions: 2
  assertions_failed: 1
  assertions_passed: 1
  assertions_skipped: 0
timing:
  test_finished_at: '2017-08-04T21:20:58.670802'
  test_started_at: '2017-08-04T21:20:57.918412'
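
Because this is plain YAML, it is easy to act on these numbers with whatever tooling you prefer. As one sketch (staying within Ansible, and assuming the testresult.yml name configured above), a small follow-up playbook could abort whenever any assertions failed:

- hosts: localhost
  gather_facts: false
  vars:
    # Parse the results file written by the assertive callback plugin.
    testresults: "{{ lookup('file', 'testresult.yml') | from_yaml }}"
  tasks:
    # Fail the run if any assertions failed.
    - fail:
        msg: "{{ testresults.stats.assertions_failed }} assertion(s) failed"
      when: testresults.stats.assertions_failed > 0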

With these tools it becomes much easier to design playbooks for testing your infrastructure. I'll follow up this article with some practical examples.

