Cloud-init and the case of the changing hostname
Setting the stage
I ran into a problem earlier this week deploying RDO Icehouse under
RHEL 6. My target systems were a set of libvirt guests deployed from
the RHEL 6 KVM guest image, which includes cloud-init in order to
support automatic configuration in cloud environments. I take
advantage of this when using libvirt by attaching a configuration
drive so that I can pass in ssh keys and a user-data script.
Once the systems were up, I used packstack to deploy OpenStack
onto a single controller and two compute nodes, and at the conclusion
of the packstack run everything was functioning correctly. Running
neutron agent-list showed all agents in good order:
+--------------------------------------+--------------------+------------+-------+----------------+
| id                                   | agent_type         | host       | alive | admin_state_up |
+--------------------------------------+--------------------+------------+-------+----------------+
| 0d51d200-d902-4e05-847a-858b69c03088 | DHCP agent         | controller | :-)   | True           |
| 192f76e9-a816-4bd9-8a90-a263a1d54031 | Open vSwitch agent | compute-0  | :-)   | True           |
| 3d97d7ba-1b1f-43f8-9582-f860fbfe50df | Open vSwitch agent | controller | :-)   | True           |
| 54d387a6-dca1-4ace-8c1b-7788fb0bc090 | Metadata agent     | controller | :-)   | True           |
| 92fc83bf-0995-43c3-92d1-70002c734604 | L3 agent           | controller | :-)   | True           |
| e06575c2-43b3-4691-80bc-454f501debfe | Open vSwitch agent | compute-1  | :-)   | True           |
+--------------------------------------+--------------------+------------+-------+----------------+
A problem rears its ugly head
After rebooting the system, I found that I was missing an expected Neutron router namespace. Specifically, given:
# neutron router-list
+--------------------------------------+---------+-----------------------------------------------------------------------------+
| id                                   | name    | external_gateway_info                                                       |
+--------------------------------------+---------+-----------------------------------------------------------------------------+
| e83eec10-0de2-4bfa-8e58-c1bcbe702f51 | router1 | {"network_id": "b53a9ecd-01fc-4bee-b20d-8fbe0cd2e010", "enable_snat": true} |
+--------------------------------------+---------+-----------------------------------------------------------------------------+
I expected to see:
# ip netns
qrouter-e83eec10-0de2-4bfa-8e58-c1bcbe702f51
But the qrouter namespace was missing.
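The comparison above can be automated. The following is a minimal sketch, assuming the `qrouter-<router id>` naming shown in the output above; the function name `missing_router_namespaces` is hypothetical, and the router IDs and `ip netns` output are passed in as plain values rather than fetched from a live system.

```python
def missing_router_namespaces(router_ids, netns_output):
    """Return expected qrouter namespaces that do not appear in `ip netns` output."""
    # `ip netns` prints one namespace per line. Some iproute2 versions append
    # extra fields after the name, so only the first token on each line is kept.
    present = {
        line.split()[0]
        for line in netns_output.splitlines()
        if line.strip()
    }
    expected = ["qrouter-" + router_id for router_id in router_ids]
    return [name for name in expected if name not in present]


router_ids = ["e83eec10-0de2-4bfa-8e58-c1bcbe702f51"]

# With no namespaces present, the router's namespace is reported as missing.
print(missing_router_namespaces(router_ids, ""))

# With the namespace present, nothing is reported.
print(missing_router_namespaces(
    router_ids, "qrouter-e83eec10-0de2-4bfa-8e58-c1bcbe702f51\n"))
```

An empty list means every router has its namespace on this host.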
The output of neutron agent-list shed some light on the problem:
+--------------------------------------+--------------------+------------------------+-------+----------------+
| id                                   | agent_type         | host                   | alive | admin_state_up |
+--------------------------------------+--------------------+------------------------+-------+----------------+
| 0832e8f3-61f9-49cf-b49c-886cc94d3d28 | Metadata agent     | controller.localdomain | :-)   | True           |
| 0d51d200-d902-4e05-847a-858b69c03088 | DHCP agent         | controller             | xxx   | True           |
| 192f76e9-a816-4bd9-8a90-a263a1d54031 | Open vSwitch agent | compute-0              | :-)   | True           |
| 3be34828-ca8d-4638-9b3a-4e2f688a9ca9 | L3 agent           | controller.localdomain | :-)   | True           |
| 3d97d7ba-1b1f-43f8-9582-f860fbfe50df | Open vSwitch agent | controller             | xxx   | True           |
| 54d387a6-dca1-4ace-8c1b-7788fb0bc090 | Metadata agent     | controller             | xxx   | True           |
| 87b53741-f28b-4582-9ea8-6062ab9962e9 | Open vSwitch agent | controller.localdomain | :-)   | True           |
| 92fc83bf-0995-43c3-92d1-70002c734604 | L3 agent           | controller             | xxx   | True           |
| e06575c2-43b3-4691-80bc-454f501debfe | Open vSwitch agent | compute-1              | :-)   | True           |
| e327b7f9-c9ce-49f8-89c1-b699d9f7d253 | DHCP agent         | controller.localdomain | :-)   | True           |
+--------------------------------------+--------------------+------------------------+-------+----------------+
There were two sets of Neutron agents registered using different hostnames – one set using the short name of the host, and the other set using the fully qualified hostname.
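Spotting this kind of duplication by eye gets harder as the agent list grows. Here is a rough sketch that groups rows from the table text by agent type and short hostname. The helper names `parse_agent_rows` and `duplicate_registrations` are made up for illustration, and the parser assumes the five-column layout shown above rather than any official output format.

```python
from collections import defaultdict


def parse_agent_rows(table_text):
    """Parse `neutron agent-list` table text into (agent_type, host, alive) tuples."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +-----+ and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "id":
            continue  # skip the header row and anything with an unexpected shape
        _agent_id, agent_type, host, alive, _admin_state = cells
        rows.append((agent_type, host, alive == ":-)"))
    return rows


def duplicate_registrations(rows):
    """Return {(agent_type, short_name): [hostnames]} for agents seen under several names."""
    seen = defaultdict(set)
    for agent_type, host, _alive in rows:
        short_name = host.split(".", 1)[0]
        seen[(agent_type, short_name)].add(host)
    return {key: sorted(hosts) for key, hosts in seen.items() if len(hosts) > 1}


sample = """\
| 0832e8f3-61f9-49cf-b49c-886cc94d3d28 | Metadata agent | controller.localdomain | :-) | True |
| 54d387a6-dca1-4ace-8c1b-7788fb0bc090 | Metadata agent | controller | xxx | True |
| 192f76e9-a816-4bd9-8a90-a263a1d54031 | Open vSwitch agent | compute-0 | :-) | True |
"""
print(duplicate_registrations(parse_agent_rows(sample)))
```

Any key in the result is an agent that registered once under the short name and once under the fully qualified one.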
What’s up with that?
In the cc_set_hostname.py module, cloud-init performs the
following operation:
(hostname, fqdn) = util.get_hostname_fqdn(cfg, cloud)
try:
    log.debug("Setting the hostname to %s (%s)", fqdn, hostname)
    cloud.distro.set_hostname(hostname, fqdn)
except Exception:
    util.logexc(log, "Failed to set the hostname to %s (%s)", fqdn,
                hostname)
    raise
It starts by retrieving the hostname (both the qualified and
unqualified versions) from the cloud environment, and then calls
cloud.distro.set_hostname(hostname, fqdn). This ends up calling:
def set_hostname(self, hostname, fqdn=None):
    writeable_hostname = self._select_hostname(hostname, fqdn)
    self._write_hostname(writeable_hostname, self.hostname_conf_fn)
    self._apply_hostname(hostname)
Where, on a RHEL system, _select_hostname is:
def _select_hostname(self, hostname, fqdn):
    # See: http://bit.ly/TwitgL
    # Should be fqdn if we can use it
    if fqdn:
        return fqdn
    return hostname
So:
- cloud-init sets writeable_hostname to the fully qualified name of the system (assuming it is available).
- cloud-init writes the fully qualified hostname to /etc/sysconfig/network.
- cloud-init sets the hostname to the unqualified hostname.
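The result is that the running hostname stays unqualified until your first reboot, at which point the system comes up with the fully qualified name stored in /etc/sysconfig/network. The Neutron agents then register under the new name as if they were running on a different host, which throws off Neutron.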
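To make the mismatch concrete, here is a small simulation of the sequence quoted above. `FakeDistro` is a hypothetical stand-in, not cloud-init's actual distro class. It records the two values instead of touching the system, and it assumes that the boot scripts apply whatever name was persisted.

```python
class FakeDistro:
    """Hypothetical stand-in that records hostnames instead of changing the system."""

    def __init__(self):
        self.persisted = None  # what would be written to /etc/sysconfig/network
        self.running = None    # what the live hostname would be set to

    def _select_hostname(self, hostname, fqdn):
        # Same preference as the quoted code: use the FQDN when one is available.
        return fqdn if fqdn else hostname

    def set_hostname(self, hostname, fqdn=None):
        # Same order as the quoted code: persist one name, then apply another.
        self.persisted = self._select_hostname(hostname, fqdn)
        self.running = hostname


distro = FakeDistro()
distro.set_hostname("controller", "controller.localdomain")

name_before_reboot = distro.running
name_after_reboot = distro.persisted  # assumption: boot applies the persisted name

print(name_before_reboot)
print(name_after_reboot)
print(name_before_reboot != name_after_reboot)
```

When no FQDN is available, both values come out the same and the problem does not appear.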
And they all lived happily ever after?
It turns out this bug was reported upstream back in October of 2013 as bug 1246485. The bug has been marked as “low” priority and has not been fixed, although there are patches attached to the bug report that purport to fix the problem.
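Until a fix lands, it may be useful to check whether a host is in this state before rebooting it. The sketch below compares the running hostname against the HOSTNAME= entry in /etc/sysconfig/network. The function names are hypothetical, and the parsing assumes a simple KEY=value file. It is a diagnostic only, not one of the patches from the bug report.

```python
import socket


def persisted_hostname(network_file_text):
    """Extract the HOSTNAME= value from the text of /etc/sysconfig/network, if any."""
    for line in network_file_text.splitlines():
        line = line.strip()
        if line.startswith("HOSTNAME="):
            # Drop optional surrounding quotes around the value.
            return line.split("=", 1)[1].strip().strip("\"'")
    return None


def hostname_will_change(running, network_file_text):
    """Return True when the persisted name differs from the running one."""
    persisted = persisted_hostname(network_file_text)
    return persisted is not None and persisted != running


def check_this_host(path="/etc/sysconfig/network"):
    """Run the comparison against the live system (path is the RHEL 6 default)."""
    with open(path) as handle:
        return hostname_will_change(socket.gethostname(), handle.read())


sample = "NETWORKING=yes\nHOSTNAME=controller.localdomain\n"
print(hostname_will_change("controller", sample))
print(hostname_will_change("controller.localdomain", sample))
```

On an affected guest, `check_this_host()` would be expected to return True after cloud-init has run and before the first reboot.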