VerifyHostKeyDNS
… or how I enroll new hosts into my infrastructure.

Prologue

I run my own infrastructure. I self-host my email, DNS, this website, a git server, backups, and probably a bunch of other stuff that I forgot about. Ah yes, monitoring, Ubiquiti UniFi for my Wi-Fi access points at home, and probably even more stuff.

All of it running OpenBSD, except for one machine running Debian. It's all tied together with ansible1.

So far it's eight machines. I was reinstalling and consolidating some VMs and physical machines the other day, and hooking up new machines became annoying because of ssh host-keys.

StrictHostKeyChecking

My ansible orchestration host needs to be able to talk to new machines over ssh. New machines need to talk to the backup server over ssh and submit passive check results over ssh to the monitoring server. The monitoring server needs to talk to new hosts over ssh2.

So we have the issue of existing infrastructure needing to verify host-keys of new hosts, and of new hosts needing to verify host-keys of existing infrastructure. One way to deal with this is to run a CA, sign host-keys with it, and roll out certificates.

I, on the other hand, prefer to use DNS3. RFC4255 provides facilities to store host-key fingerprints in SSHFP resource records in DNS, and we can secure those with DNSSEC.
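For illustration only, such a record in a zone file looks roughly like this; the host name is made up and the fingerprint is left as a placeholder. The first number is the key algorithm (4 means Ed25519), the second the fingerprint type (2 means SHA-256), followed by the hex-encoded fingerprint of the host's public key:

newhost.example.com. IN SSHFP 4 2 <sha256-fingerprint-in-hex>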

VerifyHostKeyDNS

ssh_config(5) explains how ssh(1) can use SSHFP records to verify host-keys:

VerifyHostKeyDNS
Specifies whether to verify the remote key using DNS and SSHFP resource records. If this option is set to yes, the client will implicitly trust keys that match a secure fingerprint from DNS. Insecure fingerprints will be handled as if this option was set to ask. If this option is set to ask, information on fingerprint match will be displayed, but the user will still need to confirm new host keys according to the StrictHostKeyChecking option. The default is no.

One problem with this is that if you put

Host *
    VerifyHostKeyDNS yes

into your .ssh/config, it will not work. The magic is in secure fingerprint: what the documentation means is that the DNS answer for the SSHFP query needs to have the Authentic Data (AD) flag set. The flag gets set by a validating name-server if it can DNSSEC-validate the SSHFP record.

But when the libc stub resolver4 gets that answer, it will strip the AD flag for security reasons. You see, it does not know that it can trust the validating name-server. One way to have a trustworthy validating name-server is to run one on localhost.

resolv.conf(5) explains the trust-ad option:

trust-ad
A name server indicating that it performed DNSSEC validation by setting the Authentic Data (AD) flag in the answer can only be trusted if the name server itself is trusted and the network path is trusted. Generally this is not the case and the AD flag is cleared in the answer. The trust-ad option lets the system administrator indicate that the name server and the network path are trusted. This option is automatically enabled if resolv.conf only lists name servers on localhost.

The easiest way to get a trusted validating name-server on localhost is to run unwind(8):

doas rcctl enable unwind
doas rcctl start unwind

resolvd(8) will then add nameserver 127.0.0.1 to /etc/resolv.conf and comment out all other dynamically learned name servers. Just make sure that you are not using any statically configured name servers6, because you really want to have only nameserver 127.0.0.1 in there.
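To convince yourself that this works, you can query an SSHFP record through the local resolver with dig(1) and look for the ad flag in the header of the answer. The host name below is a placeholder:

dig newhost.example.com. SSHFP +dnssec

If unwind was able to validate the record, the flags line of the answer reads something like flags: qr rd ra ad.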

Putting it all together

When I install a new host, I have out-of-band access in one way or another. It might be a serial console, a fake HTML5 console, or some KVM contraption. Heck, I even used qemu to get OpenBSD running on some Hetzner physical machine.

On the installed machine I use said out-of-band access to run

ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub

This gives me one ssh host-key fingerprint and I can log in over ssh.
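The output is one line per key, roughly in this shape, with the fingerprint left out and a made-up host name:

256 SHA256:<base64-fingerprint> root@newhost.example.com (ED25519)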

I have to add IPv6 and legacy-IP addresses to DNS for the machine, so I also grab the SSHFP records to add them at the same time:

ls /etc/ssh/*.pub | xargs -n1 ssh-keygen -r $(hostname) -f
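The output is ready to paste into the zone file. For every public key, ssh-keygen(1) prints two SSHFP lines, one with a SHA-1 and one with a SHA-256 fingerprint; the placeholders below stand in for the hex values, and similar lines appear for the other host-key algorithms:

newhost.example.com IN SSHFP 4 1 <sha1-fingerprint-in-hex>
newhost.example.com IN SSHFP 4 2 <sha256-fingerprint-in-hex>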

While still logged in, I install python3 and add an ssh-key for ansible. I then add the host to the ansible inventory. The ansible orchestrator can now finish the installation of the host over ssh while trusting the SSHFP it finds in DNS.
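A minimal sketch of what the inventory entry can look like, in INI style and with a made-up host name; the ssh options are redundant if the orchestrator already has VerifyHostKeyDNS in its .ssh/config and are shown here only to make the intent explicit:

[newhosts]
newhost.example.com ansible_ssh_common_args='-o VerifyHostKeyDNS=yes -o StrictHostKeyChecking=yes'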

Ansible also hooks up the host to my monitoring system and the monitoring system can connect to the new host over ssh, again trusting that it talks to the correct host because of SSHFP in DNS.

The newly installed host knows that it's talking to my backup and monitoring server using their published SSHFP records.

Epilogue

I have some ideas on how to streamline this even more, but I do not install new machines that often. This strikes a reasonable balance between manual work and working on automation. It's probably best to leave it like this.

Footnotes:

1

I started out with ansible, switched to SaltStack, and moved back to ansible. Because reasons.

2

I don't trust nrpe. I have seen the code. Instead I use by_ssh to monitor hosts. Ansible adds an ssh public-key to a monitoring user with a force-command. The force-command is a shell-script switching over ${SSH_ORIGINAL_COMMAND} to run specific check_commands. It does not trust the remote ssh at all.
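A rough sketch of the idea, with made-up paths and check names. The authorized_keys entry for the monitoring user pins the key to the wrapper:

restrict,command="/usr/local/libexec/monitoring-wrapper" ssh-ed25519 <public-key> monitoring

And the wrapper dispatches on the requested check without ever executing the remote input directly:

#!/bin/sh
# monitoring-wrapper: map allowed check names to local commands
case "${SSH_ORIGINAL_COMMAND}" in
check_disk_root)
        exec /usr/local/libexec/nagios/check_disk -w 10% -c 5% -p / ;;
check_load)
        exec /usr/local/libexec/nagios/check_load -w 5,4,3 -c 10,8,6 ;;
*)
        echo "UNKNOWN: unexpected command"
        exit 3 ;;
esac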

3

I have a laptop sticker and travel mug with "We reject kings, presidents and voting. We believe in rough consensus and running code." crossed out with "Fuck that! Just put it in DNS." I also have a RUN DNS sticker. I am biased.

4

The thingy5 that ssh uses to talk to the validating name-server. On OpenBSD that is asr.

5

Thingy is a technical term, don't worry about it.

6

I use ! route nameserver $if 149.112.112.9 2620:fe::9 9.9.9.9 2620:fe::fe:9 in my main hostname.if(5) to add some static name servers in case unwind(8) crashes7.

7

Not sure why it would do that though. Sounds unpleasant.

Published: 2023-01-15