Jan 15 2009

VPN DNS

I have a VPN and a DNS server that serves forward and reverse DNS for my VPN hosts, in a zone I call wan. When I want to look at my Cacti graphs, I go to gwythaint.wan, and as long as my laptop is on the VPN I can see them wherever I am. In theory, anyway. In practice, getting this to work without screwing up other things is harder.

I’ll leave out the myriad permutations I tried over the past couple of weeks and show you the one that actually works well: run a caching and forwarding name server on your laptop, and add localhost to your list of nameservers. For best results, have it forward to the name server your DHCP server gives you, with an explicit forward over the VPN for the wan zone (and its reverse zone). resolvconf on Linux can do this. Your situation may warrant a static forwarder for non-wan addresses, in which case you just set that forwarder and are done with it. If your various DHCP nameservers are a bit more subtle (perhaps serving up internal domains of their own), then you may have to avoid forwarding and/or recursing for anything except wan.

I just took the default BIND9 configuration on my system and tweaked it thus:

// local/vpn stuff
zone "wan" {
    type forward;
    forwarders { 172.17.77.1; 172.17.0.1; };
};
zone "17.172.in-addr.arpa" {
    type forward;
    forwarders { 172.17.77.1; 172.17.0.1; };
};
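The reverse zone name in the second stanza is just the VPN’s network octets written backwards under in-addr.arpa. If you add forward zones for other prefixes, a tiny helper like this one works the name out for you. It’s a sketch: reverse_zone is a hypothetical name, not part of BIND, and it only handles octet-aligned prefixes such as /8, /16, and /24.

```bash
#!/bin/bash
# reverse_zone NETWORK_PREFIX  (hypothetical helper, not part of BIND)
# Write the dotted octets of an octet-aligned prefix in reverse order and
# append in-addr.arpa, e.g. 172.17 (for 172.17.0.0/16) -> 17.172.in-addr.arpa
reverse_zone() {
    local IFS=.
    local -a octets=($1)          # split the prefix on dots
    local zone="in-addr.arpa" o
    for o in "${octets[@]}"; do
        zone="$o.$zone"           # each later octet ends up further left
    done
    echo "$zone"
}
```

`reverse_zone 172.17` prints `17.172.in-addr.arpa`, the zone name used in the config above.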

On most systems the default named.conf is already some reasonable caching setup, so you wouldn’t have to tweak it beyond that. Then I added localhost to the nameserver list (/etc/resolv.conf on Linux, in the network preferences pane on OS X) and checked that it works with a dig @localhost gwythaint.wan.
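If you’d rather script the resolv.conf step, here is a minimal sketch that puts nameserver 127.0.0.1 at the top of a resolv.conf-style file unless that exact line is already present. ensure_local_ns is a hypothetical helper. One loudly flagged assumption: if resolvconf or your DHCP client regenerates /etc/resolv.conf, an edit like this gets overwritten, and a `prepend domain-name-servers 127.0.0.1;` line in dhclient.conf is the more durable route.

```bash
#!/bin/bash
# ensure_local_ns FILE  (hypothetical helper)
# Prepend "nameserver 127.0.0.1" to a resolv.conf-style FILE, but only if no
# line exactly matches it already. FILE is rewritten in place (same inode).
ensure_local_ns() {
    local file=$1 tmp
    if grep -qxF 'nameserver 127.0.0.1' "$file"; then
        return 0                  # already there, nothing to do
    fi
    tmp=$(mktemp) || return 1
    { echo 'nameserver 127.0.0.1'; cat "$file"; } > "$tmp" &&
        cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running it twice is harmless; afterwards `dig @localhost gwythaint.wan` is still the check that matters.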

Things got tricky because dig and host on my laptop were taking forever to return when I queried localhost: 6 seconds or so. I chased this wild goose for a while, and in the end I didn’t find the reason (it still does it), but I verified that it’s not a problem. If you pass the -v flag to host, you’ll notice that the actual queries took <1ms, so whatever else host and dig are doing may not be relevant. Even stranger, if you do host -v gwythaint.wan without telling it to query localhost, everything resolves instantly, and yet it reports that it queried localhost (which you can verify with tcpdump: repeat requests produce no upstream traffic). It hasn’t slowed down any other applications (a 6-second slowdown on DNS lookups would be very obvious), so I chalk it up to “who cares?” If host and dig on OS X return the right answer, and you verify they’re querying the right server, then you’re good to go.


Jan 7 2009

Subnet-to-Subnet Routing

This is a note to myself, since I always seem to get this wrong and spend an hour or two racking my brain over it, and yet it’s so simple.

Consider the following network:

172.17.77.0/24 -- A --+-- B -- 172.17.82.0/24
                      |
                      S

A: 172.17.77.1/24 and 172.17.0.77/24
B: 172.17.82.1/24 and 172.17.0.82/24
S: 172.17.0.1/24

It is instructive to watch a tcpdump on A, S, and B while you ping between these three hosts. In particular, S sees nothing when A and B ping each other. Well, not nothing (S will see the ARP requests), but if you were running tcpdump icmp you wouldn’t see anything. Now, if A is the gateway for its subnet and B is the gateway for its subnet, and you put a route for 172.17.0.0/16 via S on both A and B, the two subnets can find each other. But what if you instead put a route for 172.17.0.0/16 via the interface alone, and try to leave S out of it? Then B treats 172.17.77.42 as if it were on the shared link and sends an ARP request for it directly. A will not respond to ARP requests for 172.17.77.42, since that address isn’t its own, and so packets from B’s subnet for A’s subnet will fall off the edge of the network at B.
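The ARP behavior above boils down to one rule: pick the longest matching route, then ARP for its gateway if it has one, or for the destination itself if the route is on-link. Here is a toy model of that rule, as a sketch only. It ignores metrics, policy routing, and proxy ARP, and the route notation (`PREFIX/LEN` for on-link, `PREFIX/LEN@GW` for "via GW") and function names are made up for the example.

```bash
#!/bin/bash
# Toy model (hypothetical helpers): which address must a host ARP for?
# Route syntax is invented for this sketch:
#   "PREFIX/LEN"     on-link route (like "ip route add ... dev tap0")
#   "PREFIX/LEN@GW"  route via gateway GW

ip2int() {      # dotted quad -> 32-bit integer
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

in_prefix() {   # in_prefix ADDR PREFIX/LEN -> success if ADDR is inside
    local addr net len mask
    addr=$(ip2int "$1")
    net=$(ip2int "${2%/*}")
    len=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    (( (addr & mask) == (net & mask) ))
}

arp_target() {  # arp_target DEST ROUTE... -> prints the ARP target
    local dst=$1 best=-1 target="" r prefix gw len
    shift
    for r in "$@"; do
        prefix=${r%@*}
        gw=""
        [[ $r == *@* ]] && gw=${r#*@}
        len=${prefix#*/}
        # longest prefix wins; gateway routes ARP for the gateway
        if in_prefix "$dst" "$prefix" && (( len > best )); then
            best=$len
            target=${gw:-$dst}
        fi
    done
    [[ -n $target ]] && echo "$target"
}
```

With B’s routes set to `172.17.82.0/24 172.17.0.0/24 172.17.0.0/16@172.17.0.1`, `arp_target 172.17.77.42 ...` prints `172.17.0.1`, which S answers. Drop the `@172.17.0.1` and it prints `172.17.77.42`, which nobody answers.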

I hope that makes sense. It’s rather simple when you look at it that way and not much to sing about. But when I make this little modification my brain always seems to go on vacation:

172.17.77.0/24 -- A      B -- 172.17.82.0/24
                   \    /
                    tap0
                     |
                     S

Now A, B, and S are connected by OpenVPN using TAP. TAP is like a virtual switch (Layer 2), so in reality it’s the exact same setup. But for some reason whenever I set this up I tend to think that a route on A and B for 172.17.0.0/16 via dev tap0 will work. And so it does, when pinging just A and B. Then when I finally get around to hooking up their subnets, they can’t see the other side of the VPN and I get confused. Then I fire up umpteen tcpdumps and, having forgotten to look for ARP traffic, I get utterly flabbergasted. My mind insists that since S is the VPN server, I should see ping traffic from A to B (or A’s subnet to B’s subnet) on S if it’s making it through the VPN. Then I assume that OpenVPN is doing something funky. At this point I get confused by the client-to-client option, and things go downhill fast.

So let’s set things straight once and for all. OpenVPN’s client-to-client option, when used with TAP, makes the VPN behave like a true switch. When it is set, A can see B’s ARPs, and vice versa. When it is not, they can’t. Think of it as S having one NIC for each client, and they’re all bridged together on tap0, or not, depending on the setting of client-to-client.

If you set routes for 172.17.0.0/24 via 172.17.0.1 then A can reach B anyway, but S will helpfully send ICMP redirects which won’t work if followed. I suppose you could turn off this “helpfulness”, but if you want to get from A to B just turn on the aptly-named client-to-client option.
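For the record, turning off that “helpfulness” on S would look something like this sketch. It only prints the commands, so you can review them before feeding them to a root shell; redirect_off_cmds is a hypothetical helper, and tap0 is assumed to be S’s VPN interface. Both the `all` and the per-interface knobs are cleared because Linux keeps sending redirects on an interface if either one is set.

```bash
#!/bin/bash
# redirect_off_cmds [IFACE]  (hypothetical helper; prints, does not run)
# Linux sends ICMP redirects on IFACE if EITHER conf/all or conf/IFACE
# has send_redirects=1, so both have to be set to 0.
redirect_off_cmds() {
    local iface=${1:-tap0}
    echo "sysctl -w net.ipv4.conf.all.send_redirects=0"
    echo "sysctl -w net.ipv4.conf.${iface}.send_redirects=0"
}
```

Usage: `redirect_off_cmds tap0 | sudo sh`. This only silences the redirects; A-to-B traffic then hairpins through S, which must have IP forwarding enabled. client-to-client is still the simpler fix.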

The next important thing to remember is that when client-to-client is set and you’re using TAP, the VPN behaves like a true switch. Packets sent directly from A to B will not show up on tap0 as far as external programs like tcpdump are concerned. That also goes for packets from A’s subnet to B’s subnet. Of course, they are still running through the VPN, and so S is playing the middleman as far as bandwidth and encryption go (though not firewalling: since the packets never reach tap0, the kernel’s firewall on S never sees them either). But you won’t see it with tcpdump. (It makes me wonder whether tap0 also behaves like a switch in that traffic from A to S never travels to B at all. I think this is probably the case.) Switch, not hub.

Finally, the important thing to realize when doing TAP is that the network looks like this:

A -- + -- B
     |
     S

not like this:

A -- S -- B

And the final take-home lesson is, use tcpdump icmp or arp to avoid confusion and hair loss.

There. Hopefully that straightens me out, if nobody else.


Jan 6 2009

Putting OpenVPN in its place

Update: I had some errors and oversights in my general config that didn’t have any direct bearing on the main message of this post. I have fixed them below and I beg you to pretend they never happened.

OpenVPN is a fantastic piece of software. No, it’s an essential piece of software. A godsend.

But it has this tendency to try to be all that and a bag of chips.

My primary gripe with OpenVPN over the years has been what I call “pseudo-DHCP”. It pretends, poorly, to be a DHCP server. If you have the audacity to prefer a real DHCP server, you find very little help and sometimes even resistance from the tools and the community. I once tried to get it working and failed.

This week I was refreshing my OpenVPN setup and reading through the manpage for version 2.1, and saw a few references to people actually using DHCP. Still no explicit documentation, but it gave me hope. So I duly tilted at that windmill.
Now I will show you how to get DHCP working with OpenVPN. What’s more, we’ll get rid of ifconfig and route options (for the most part). In short, we’ll put OpenVPN in its place: as a secure tunnel manager.

The important paradigm shift here is that you aren’t required to do anything from within OpenVPN to configure the interface. You can just bring up the tunnel, and your TUN/TAP device will be alive but unconfigured. At that point you could do something like this:

ip link set tap0 up
ip addr add 172.17.0.1/24 dev tap0

You could do this manually, or in an up script, or whatever. Or you could let your distro do it. Ah, so we can have a tap0 stanza in /etc/network/interfaces (Debian-based distros) that will configure tap0 when we ask it to. Let’s look at a client example:

# in /etc/network/interfaces
iface tap0 inet dhcp
    hostname falcon
    # dhclient doesn't pay attention to this, so if you use dhclient (you
    # probably do) see /etc/dhcp3/dhclient.conf
    client falcon

# in the openvpn config
dev tap0
route-delay 10
cd /etc/openvpn
up "up.sh"
down-pre
down "down.sh"
…

# up.sh
#! /bin/bash
ifdown tap0 2>/dev/null
ifup tap0 &

# down.sh
ifdown tap0

There’s some subtlety here, so let’s talk about it. Note that we’re specifying both the DHCP client identifier and the DHCP hostname; more on that later. We use an external script because of the way OpenVPN’s up option works, so that we can background the ifup call. This is important because the tunnel isn’t fully up when the up script runs, so your DHCP client won’t succeed unless we background it (I tried up-delay to no avail). The ifdown bit is there as a safety measure: if for whatever reason Debian thinks the interface is already up, ifup won’t start the DHCP client, and that would be bad. Hopefully this doesn’t happen much, thanks to the down option. Finally, the route-delay option gives the DHCP negotiation a chance to finish before any routes are applied (and in my setup there is one important route that I push to clients).

On the server side, we need to set up the DHCP server. ISC DHCP (dhcp3-server on Debian) isn’t very intelligent about interfaces that materialize out of nowhere, so we’ll need to set up a persistent TAP device.

# in /etc/network/interfaces
auto tap0
iface tap0 inet static
    address 172.17.0.1
    netmask 255.255.255.0
    pre-up openvpn --dev tap0 --mktun

# in openvpn config
dev tap0

Now tap0 will be brought up automatically at boot, and will stay up even if you restart OpenVPN (you can bring it up now with ifup tap0). Notice that no ifconfig option is needed in the OpenVPN config. Now you can configure your DHCP server for the subnet:

# in dhcpd.conf
subnet 172.17.0.0 netmask 255.255.255.0 {
    # example options for VPN hosts
    option domain-name "vpn.example.com";
    option domain-name-servers 172.17.0.1;
    option netbios-name-servers 172.17.0.1;
    option ntp-servers 172.17.0.1;

    range 172.17.0.100 172.17.0.199;
}

host falcon {
    option dhcp-client-identifier "falcon";
    fixed-address 172.17.0.77;
}
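A range that strays outside its subnet declaration is an easy typo to make in a file like this. A quick shell sanity check, as a sketch, is to confirm both ends of the range fall inside the subnet. ip2int and range_in_subnet are hypothetical helpers, not part of ISC DHCP; dhcpd’s own config test, `dhcpd -t -cf /etc/dhcp3/dhcpd.conf`, remains the authoritative check.

```bash
#!/bin/bash
# range_in_subnet NETWORK NETMASK LOW HIGH  (hypothetical helpers)
# Succeeds if LOW..HIGH is a sane dhcpd "range" for the subnet declaration:
# both ends inside NETWORK/NETMASK and LOW <= HIGH.

ip2int() {      # dotted quad -> 32-bit integer
    local IFS=.
    local -a o=($1)
    echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

range_in_subnet() {
    local net mask lo hi
    net=$(ip2int "$1"); mask=$(ip2int "$2")
    lo=$(ip2int "$3");  hi=$(ip2int "$4")
    (( (lo & mask) == (net & mask) && (hi & mask) == (net & mask) && lo <= hi ))
}
```

`range_in_subnet 172.17.0.0 255.255.255.0 172.17.0.100 172.17.0.199` succeeds; change the low end to 172.16.0.100 and it fails.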

Observe the dhcp-client-identifier option and its matching entry in falcon’s /etc/network/interfaces (or /etc/dhcp3/dhclient.conf). This is important because TAP MAC addresses don’t persist: you get a new one every time. dhcpd will use the client identifier to match a host. Alternatively, you could spoof a static MAC address in falcon’s /etc/network/interfaces config, but I think the client identifier is cleaner. Even if you don’t use static leases, this way dhcpd will know it’s the same client and give him the IP address he had before. Of course, if you don’t need (semi-)static leases you don’t need to worry about client identifiers. You’ll have some cruft leases, but they should expire and disappear.

Unfortunately dhcpd doesn’t use the client identifier for dynamic DNS updates (one of the big reasons I wanted to use real DHCP in the first place), which is why I specify the hostname option in falcon’s /etc/network/interfaces. dhclient (as configured on Debian) sends the hostname whether or not you specify it in /etc/network/interfaces.
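If you use dhclient, the matching client-side settings belong in dhclient.conf rather than /etc/network/interfaces. Here is a sketch that writes such a fragment; write_vpn_dhclient_conf is a hypothetical helper, and tap0 and falcon come from the example above. It overwrites whatever FILE you give it, so point it at a scratch file and merge the result into /etc/dhcp3/dhclient.conf by hand.

```bash
#!/bin/bash
# write_vpn_dhclient_conf NAME FILE [IFACE]  (hypothetical helper)
# Write a dhclient.conf fragment for the VPN interface: the client identifier
# is what dhcpd's host block matches on, the host name is what it uses for
# dynamic DNS. FILE is overwritten.
write_vpn_dhclient_conf() {
    local name=$1 file=$2 iface=${3:-tap0}
    cat > "$file" <<EOF
interface "$iface" {
    send dhcp-client-identifier "$name";
    send host-name "$name";
}
EOF
}
```

`write_vpn_dhclient_conf falcon /tmp/dhclient-vpn.conf` produces a block whose identifier matches the `host falcon` entry in dhcpd.conf.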

Other DHCP clients that do honor /etc/network/interfaces are available. See interfaces(5). I’m kind of partial to udhcpc, especially for hand-testing, though I usually end up sticking with dhclient.

Caveats: I haven’t been able to get DHCP working with an OS X client. I tried initiating DHCP on the TAP interface with ipconfig set tap0 DHCP but it didn’t work and once locked up my machine. So for this situation, or for any other reason you may have, you can still push ifconfig and route options in the client configuration directory entry for that client.

I haven’t tried DHCP over OpenVPN on Windows clients yet but I see no reason why it wouldn’t work.

Finally, I tried briefly to do it with a TUN device and though I can think of no obvious reason why it shouldn’t work, it didn’t. I like TAP better anyway.

Now after all this I can see some of you shaking your heads, wondering what the point of all this is. “Surely this is more complicated than ifconfig and route in OpenVPN.” Yes, it’s more complicated, but it’s more powerful. If all you need is pseudo-DHCP, then by all means use pseudo-DHCP. But if you are a sysadmin serving a gaggle of clients, you soon find yourself pining for a real DHCP server. Or perhaps you want dynamic DNS updates, or proper DHCP option support. (You do realize that DHCP options sent by OpenVPN’s dhcp-option are not applied on Linux unless you apply them yourself by reading the environment variables in an up script, don’t you?)
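Here is a minimal sketch of what “reading the environment variables” means. Before running the up script, OpenVPN records each pushed option the client doesn’t natively support (such as dhcp-option on Linux) in variables named foreign_option_1, foreign_option_2, and so on. resolv_from_openvpn is a hypothetical helper that turns the DNS and DOMAIN entries into resolv.conf lines; mapping DOMAIN to `search` is a choice made for this sketch, not something OpenVPN dictates.

```bash
#!/bin/bash
# resolv_from_openvpn  (hypothetical helper)
# Walk foreign_option_1, foreign_option_2, ... (set by OpenVPN, e.g.
# foreign_option_1="dhcp-option DNS 172.17.0.1") until one is missing,
# printing resolv.conf lines for DNS and DOMAIN options.
resolv_from_openvpn() {
    local i=1 var opt kind key value
    while :; do
        var="foreign_option_$i"
        opt=${!var:-}                     # indirect lookup by name
        [ -n "$opt" ] || break
        read -r kind key value <<< "$opt"
        if [ "$kind" = "dhcp-option" ]; then
            case $key in
                DNS)    echo "nameserver $value" ;;
                DOMAIN) echo "search $value" ;;
            esac                          # other options are ignored
        fi
        i=$((i + 1))
    done
}
```

On a system with resolvconf, an up script could pipe the output to `resolvconf -a "$dev.openvpn"`. Debian’s openvpn package also ships an update-resolv-conf script that does something similar.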

When you realize OpenVPN can just set up the tunnel and get out of the way, you realize that all your fancy networking knowledge and tools can come into play to create the ultimate VPN tailored exactly to your needs. Plus, I think it snaps things into focus so that things just make more sense in your head.

And now, I present my OpenVPN configs (sanitized) for the server (frodo) and a client (falcon):

## frodo (server)
dev tap0
mode server
tls-server

cd /home/fugalh/vpn
ca cacert.pem
dh dh.pem
cert frodo.pem
key frodo.pem

keepalive 10 60
comp-lzo
client-to-client
# this new option is nifty
passtos

client-config-dir ccd

# See /etc/network/interfaces for interface configuration and routing.
# (reproduced here for our web audience)
# auto tap0
# iface tap0 inet static
#         address 172.17.0.1
#         netmask 255.255.0.0
#         pre-up openvpn --dev tap0 --mktun
#         up ip route add 172.17.64.0/24 via 172.17.0.64
#         up ip route add 172.17.77.0/24 via 172.17.0.77
#         up ip route add 172.17.82.0/24 via 172.17.0.82
#         up ip route add 172.17.83.0/24 via 172.17.0.83
push "route 172.17.0.0 255.255.0.0 172.17.0.1"

#verb 3
mute 2
status /var/log/openvpn.status 60

## falcon (client)
dev tap0
client
remote frodo.fugal.net
nobind

cd /etc/openvpn
ca falcon-cacert.pem
cert falcon-cert.pem
key falcon-key.pem
tls-remote frodo.fugal.net

comp-lzo
passtos

route-delay 10
cd /etc/openvpn
up "up.sh"
# (reproduced here)
# #!/bin/bash
# ifdown tap0 &>/dev/null
# ifup tap0 &

down "down.sh"
# (reproduced here)
# #!/bin/bash
# ifdown tap0

mute 2
#verb 3

In my setup the 172.17.0.0/24 subnet is for the OpenVPN server and clients, and each client is a gateway to a 172.17.x.0/24 subnet for his LAN. Assuming a static route on the LAN for 172.17.0.0/16 via the OpenVPN client, frodo will route everything so people on one LAN can find people on another.

I also have dynamic DNS updates for both forward and reverse DNS in my vpn.fugal.net zone.

One thing I haven’t set up, though it’s feasible, is having the LAN DHCP servers send dynamic DNS updates to frodo.

OpenVPN is in its place, and our relationship is that much stronger. Good luck with yours!


Nov 23 2007

OpenVPN for LARTC Readers

I love OpenVPN, but man what a man page. I’ve always been annoyed that I have to go digging through the manpage to figure out which options I need and what parameters they take, just to do stuff I already know how to do with iproute2, no doubt familiar to you all from your dutiful study of the LARTC. Right? Right?!

Finally I got fed up with it. I took a step back and looked at the big picture. There are three big pieces to this puzzle. First is tun/tap, which allows a userspace program to read and write IP packets (tun) or Ethernet frames (tap) through a virtual network device. One could set up a virtual unprivate network using tun/tap alone. OpenVPN sets up a virtual private network by adding all the authentication and encryption stuff. The third piece of the puzzle is the piece that’s always there whenever you have any kind of network device, virtual or not: configuring the device and setting up the routing. OpenVPN has a plethora of options for all this, but IMHO it’s better done with the proper tools, in good UNIX style.

So I changed up my OpenVPN server config to look like this:

# tap because we act like a switch
dev tap

# mode server and tls-server to allow us to have many tls clients
mode server
tls-server

# certificate stuff
ca falcon-cacert.pem
cert falcon-cert.pem
key falcon-key.pem
dh dh2048.pem

# I don't like openvpn's --ifconfig and --route family of options for my
# server, so I do it with a script instead
up "/etc/openvpn/falcon.up"

# allow clients to find other clients and their subnets with this pushed
# route
push "route-gateway 172.17.0.3"
push "route 172.17.0.0 255.255.0.0"
client-config-dir clients

# eventually it would be nice to use a real dhcp server instead of this
ifconfig-pool 172.17.0.200 172.17.0.250 255.255.255.0

# options
keepalive 10 60
comp-lzo
learn-address "echo $*"
status /var/log/openvpn.status 60

Let’s walk through it. I use dev tap because I’ve never had occasion to notice the overhead and it makes wrapping your head around the routing a lot easier. Basically (along with a couple of bits later), the server acts as a switch into which all the clients plug. mode server and tls-server denote server mode (with public key authentication), instead of peer-to-peer (one-to-one) mode. We don’t use server-bridge because that sets some options I want to take care of myself. The cert stuff is nothing new.

The next line is the important bit. We use an up script to do the equivalent of OpenVPN’s ifconfig and route options. This is where the added flexibility and tool familiarity come into play. Here’s my up script:

#!/bin/sh

ip addr add 172.17.0.3/24 dev $dev
ip link set $dev up
ip link set $dev mtu $tun_mtu

for i in 64 77 79 82 84; do
    ip route add 172.17.$i.0/24 via 172.17.0.$i
done

When this script is run, $dev is tap0 (OpenVPN passes $dev and $tun_mtu to the script as environment variables). I add the address, set the link up, and set the MTU for the link. This is the equivalent of the OpenVPN ifconfig 172.17.0.3 255.255.255.0, which is implied by server-bridge with the appropriate options. The next bit is the equivalent of several route options in OpenVPN. Note how I stay DRY while adding 5 routes. If I were doing something fancier, this script would let me do it in the natural way, using the appropriate tools for the job.
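Since much of the appeal is being able to read and check the up script at a glance, a dry-run switch is a natural variation. This is a sketch of a variation on the script above, not the author’s actual falcon.up: DRY_RUN and the run/vpn_up function names are hypothetical, while $dev and $tun_mtu are the environment variables OpenVPN hands to up scripts.

```bash
#!/bin/bash
# Variation on the up script with a dry-run switch (hypothetical names).
# $dev and $tun_mtu come from OpenVPN; DRY_RUN is invented for this sketch.

run() {         # print the command if DRY_RUN is set, otherwise execute it
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

vpn_up() {
    local i
    run ip addr add 172.17.0.3/24 dev "$dev"
    run ip link set "$dev" up
    run ip link set "$dev" mtu "$tun_mtu"
    # one /24 per client: client 172.17.0.N is the gateway for 172.17.N.0/24
    for i in 64 77 79 82 84; do
        run ip route add "172.17.$i.0/24" via "172.17.0.$i"
    done
}
```

An actual up script would end with a bare `vpn_up` line. To preview, source the file and run `DRY_RUN=1 dev=tap0 tun_mtu=1500 vpn_up`.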

Back to the OpenVPN config. I’m using push instead of putting the burden on future me or whoever is going to set up new clients down the road. In this case route and ifconfig-push (in the client configuration file) are the right tools for the job, and we’re glad they’re there. As an aside, notice the routes I’m setting up. The route-gateway 172.17.0.3 tells clients where the “switch” is, and the route 172.17.0.0 255.255.0.0 tells them how to find other subnets. A quick example: ungol (172.17.0.64) and caradhras (172.17.0.82) both connect to falcon (172.17.0.3). ungol’s LAN subnet is 172.17.64.0/24; caradhras’ LAN subnet is 172.17.82.0/24. If 172.17.64.x pings 172.17.82.x, these very important routes come into play. Otherwise, you’d only be able to reach falcon from either subnet. Naturally, the subnets need the correct routing too (ungol and caradhras are probably the default gateways for those subnets anyway). This is roughly the same thing that happens when you use server-bridge and client-to-client, but not precisely. The reverse routes (in the up script) are needed in either case. Incidentally, you could put client-to-client here too, which would keep the packets from leaving OpenVPN into the kernel stack halfway through their journey. I’m not sure how much overhead that saves, but in the interest of minimalism (i.e. “because I can”) I left it out.
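The ifconfig-push half of that lives in the client-config-dir (clients in the config above), one file per client, named after the common name on its certificate. A sketch of generating those files, assuming the 172.17.0.N convention from the example and a /24 on the TAP network; write_ccd is a hypothetical helper.

```bash
#!/bin/bash
# write_ccd DIR CLIENT_NAME N  (hypothetical helper)
# Create DIR/CLIENT_NAME pinning that client to 172.17.0.N on the TAP
# network (in TAP mode, ifconfig-push takes an address and a netmask).
write_ccd() {
    local dir=$1 name=$2 n=$3
    mkdir -p "$dir" || return 1
    printf 'ifconfig-push 172.17.0.%s 255.255.255.0\n' "$n" > "$dir/$name"
}
```

Assuming the directory lives at /etc/openvpn/clients, `write_ccd /etc/openvpn/clients ungol 64` would pin ungol to 172.17.0.64, matching its 172.17.64.0/24 route in the up script.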

The ifconfig-pool bit is playing at being a DHCP server. I’m even less happy about OpenVPN playing DHCP server than I am about ifconfig and route. I think configuring my DHCP server on falcon to serve up addresses and options instead of OpenVPN is doable, but there may be some timing issues and I still need to figure out the best way to trigger the DHCP behavior on the client. I just haven’t tackled it yet. At least, not this time around. I did try to do this once upon a time using tun instead of tap, and it was mostly a crash and burn. But IIRC that was primarily due to the nature of tun.

The rest is just some standardish options.
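One of those options, learn-address, is worth a closer look. OpenVPN runs the given command with an operation (add, update, or delete), an address (a MAC address when using dev tap), and, for add and update only, the client’s common name. Here is a sketch of a slightly more useful handler than echo; learn_address is a hypothetical name, and on newer OpenVPN versions calling external scripts also requires `script-security 2`.

```bash
#!/bin/bash
# learn_address OPERATION ADDRESS [COMMON_NAME]  (hypothetical handler)
# Format one line per learn-address event. With dev tap, ADDRESS is a MAC
# address; COMMON_NAME is only passed for add and update.
learn_address() {
    local op=$1 addr=$2 cn=${3:-}
    case $op in
        add|update) echo "$op $addr ($cn)" ;;
        delete)     echo "$op $addr" ;;
        *)          echo "unknown operation: $op" >&2; return 1 ;;
    esac
}
```

A script wrapping this could end with `logger -t openvpn "$(learn_address "$@")"` to send the events to syslog.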

So, I can hear you now, “you lunatic, look how much more work that was than just using the OpenVPN options”. It may look that way, but in the long run it is not. I know what that up script is doing, at a glance. I know now. I know 3 months from now. I know a year from now. Reading OpenVPN options is like reading a second language. It takes effort and I just don’t do it often enough, compared to the iproute2 version. I can remember what an up script does, and I can form a mental model of the network topology from that up script a lot easier than I can from the equivalent OpenVPN config, and that will save me time and frustration down the road, and it is definitely worth it.

As a bonus, I actually understand exactly where OpenVPN fits into the picture, and by putting it in its place I have expanded my understanding of the situation and opened the door to more advanced techniques down the road, should I find that I need them. Even if you take the road more traveled by, hopefully your mind will also have been expanded by reading this post. Happy VPNing!