Automatic configuration of Windows instances in OpenStack, part 1

This is the first of two articles in which I discuss my work in getting some Windows instances up and running in our OpenStack environment. This article is primarily about problems I encountered along the way.

Motivations

Like many organizations, we have a mix of Linux and Windows in our environment. Some folks in my group felt that it would be nice to let our Windows admins take advantage of OpenStack for prototyping and sandboxing in the same ways our Linux admins can use it.

While it is trivial to get Linux instances running in OpenStack (there are downloadable images from several distributions that will magically configure themselves on first boot), getting Windows systems set up is a little trickier. There are no pre-configured images to download, and it looks as if there aren’t that many people trying to run Windows under OpenStack right now, so there is a lot less common experience to reference.

Like the cool kids do it

My first approach to this situation was to set up our Windows instances to act just like our Linux instances:

Install Cygwin.
Run an SSH server.
Have the system pull down an SSH public key on first boot and use this for administrative access.

This worked reasonably well, but many people felt that this wasn’t a great solution because it wouldn’t feel natural to a typical Windows administrator. It also required a full Cygwin install to drive things, which isn’t terrible but still feels like a pretty big hammer.

As an alternative, we decided we needed some way to either (a) allow the user to pass a password into the instance environment, or (b) provide some way for the instance to communicate a generated password back to the user.

How about user-data?

One of my colleagues suggested that we could allow people to pass an administrative password into the environment via the user-data attribute available from the metadata service. While this sounds like a reasonable idea at first, it has one major flaw: data from the metadata service is available to anyone on the system who is able to retrieve a URL. This would make it trivial for anyone on the instance to retrieve the administrator password.
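To make the exposure concrete, here is a minimal sketch of how any local process could read user-data. It assumes the EC2-compatible metadata endpoint at http://169.254.169.254/latest/user-data; the function name and the injectable opener parameter are my own, purely for illustration.

```python
from urllib.request import urlopen

# Assumed: the EC2-compatible metadata endpoint exposed to instances.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, opener=urlopen, timeout=5):
    """Return the instance's user-data as text.

    No credentials are involved: any process on the instance that can
    make an HTTP request gets the same answer.
    """
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_user_data() from inside an instance, as any unprivileged user, would return whatever was passed as user-data at boot, including a password if one had been placed there.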

How about adminPass?

When you boot an instance using the nova command line tools…

nova boot ...

…you get back a chunk of metadata, including an adminPass key, which is a password randomly generated by OpenStack and available during the instance provisioning process:

+------------------------+--------------------------------------+
|        Property        |                Value                 |
+------------------------+--------------------------------------+
...
| adminPass              | RBiWrSNYqK5R                         |
...
+------------------------+--------------------------------------+

This would be an ideal solution, if only I were able to figure out how OpenStack made this value available to the instance. After asking around on #openstack, I found that not many people were even aware this feature existed, so information was hard to come by. I ran across some documentation that mentioned the libvirt_inject_password option in nova.conf with the following description:

(BoolOpt) Inject the admin password at boot time, without an agent.

…but that still didn’t actually explain how it worked, so I went diving through the code. The libvirt_inject_password option appears in only a single file, nova/virt/libvirt/connection.py, so I knew where to start. This led me to the _create_image method, which grabs the admin_pass generated by OpenStack:

if FLAGS.libvirt_inject_password:
    admin_pass = instance.get('admin_pass')
else:
    admin_pass = None

And then passes it to the inject_data method:

disk.inject_data(injection_path,
                 key, net, metadata, admin_pass, files,
                 partition=target_partition,
                 use_cow=FLAGS.use_cow_images,
                 config_drive=config_drive)

The inject_data method comes from nova/virt/disk/api.py, which is where things get interesting. It turns out that the injection mechanism works by:

Mounting the root filesystem,
Copying out /etc/passwd and /etc/shadow,
Modifying them, and
Copying them back.

Like this:

passwd_path = _join_and_check_path_within_fs(fs, 'etc', 'passwd')
shadow_path = _join_and_check_path_within_fs(fs, 'etc', 'shadow')

utils.execute('cp', passwd_path, tmp_passwd, run_as_root=True)
utils.execute('cp', shadow_path, tmp_shadow, run_as_root=True)
_set_passwd(admin_user, admin_passwd, tmp_passwd, tmp_shadow)
utils.execute('cp', tmp_passwd, passwd_path, run_as_root=True)
os.unlink(tmp_passwd)
utils.execute('cp', tmp_shadow, shadow_path, run_as_root=True)
os.unlink(tmp_shadow)

Do you see a problem here, given that I’m working with a Windows instance? First, it’s possible that the host will be unable to mount the NTFS filesystem; second, there are no passwd or shadow files of any use on the target.
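To show why this approach is so tied to Unix, here is a rough sketch of what the "modifying them" step amounts to for the shadow file. This is not nova's _set_passwd; the function name and behavior are my own assumptions, and it expects an already-computed password hash rather than generating one.

```python
def set_shadow_hash(shadow_text, user, new_hash):
    """Return shadow_text with the password-hash field for `user` replaced.

    Each line of /etc/shadow is colon-separated; field 0 is the user
    name and field 1 is the password hash.
    """
    lines = shadow_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        fields = line.split(":")
        if len(fields) > 1 and fields[0] == user:
            fields[1] = new_hash
            lines[i] = ":".join(fields)
            found = True
    if not found:
        # Hypothetical behavior: fail loudly when there is no such entry.
        raise KeyError("no shadow entry for %r" % user)
    return "\n".join(lines) + "\n"
```

Everything here depends on a colon-separated text file with one line per account. A Windows image has no such file to rewrite, so there is nothing for this kind of edit to act on.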

You can pass --config-drive=True to nova boot and it will use a configuration drive (a whole-disk FAT filesystem) for configuration data (and make this available as a block device when the system boots), but this fails, hard: most of the code treats this as being identical to the original root filesystem, so it still tries to perform the modifications to /etc/passwd and /etc/shadow which, of course, don’t exist.

I whipped up some quick patches that would write the configuration data (such as admin_pass) to simple files at the root of the configuration drive…but then I ran into a new problem:

Windows doesn’t know how to deal with whole-disk filesystems (nor, apparently, do many Windows admins). In the absence of a partition map, Windows assumes that the device is empty.

Oops. At this point it was obvious I was treading on ground best left undisturbed.