
Building a failover gateway (and VPN access point)

Introducing single points of failure into a production system is almost always a bad idea, but having something like a gateway/firewall is almost always necessary. I was looking for a way to build a layer 3 failover gateway with Ubuntu/Linux.

I wanted this to work on layer 3 because every time I had to build a setup like this, I was working in a virtualized environment. The problem is that cloud providers usually limit layer 2 traffic quite a bit (for good reasons, of course). This means that I could not use a technology like pfSense CARP, because it is essentially based on a MAC spoofing technique.

The solution I describe in this article works completely on layer 3 and has a few drawbacks compared to the very complete pfSense implementation. For example, it cannot sync the TCP state table across the failover servers, which will result in a few lost TCP sessions in case of a failover. This is also the main reason I would most likely prefer a pfSense setup over this one wherever that is possible.

I will guide you through the setup of a simple but powerful solution: two (Ubuntu) Linux servers that provide the following services and can fail over all of them if one server goes down.

Provided services on the failover gateway servers

These are the services I usually want on a system like this. All of them can be provided in a redundant way with this technique.

  1. Gateway functionality to provide internet access to machines in the private network via NAT.
  2. DNS to forward requests for external domains and resolve internal domains based on the hosts file.
  3. DHCP (DHCP server which will provide statically assigned IP addresses to the target servers). If you want to do DHCP with dynamically assigned IPs, you will need to do a little more here. The failover server needs to be kept in sync with the DHCP leases that the primary hands out, so in case of a failover you will not end up with duplicate IPs. This is not part of this howto. I’m using static MAC to IP address mappings. Take a look here to get a first idea on how to accomplish this with real dynamic DHCP.
  4. NTP to provide accurate time to our internal network. This is fairly optional, but I still like to do it because it’s quite simple and allows stricter filter rules on the gateway.
  5. OpenVPN Server to allow access to the machines behind the gateway.

The moment you get these working in your setup, adding new services should be easy, as long as keeping their states in sync can be done easily.

General Network Setup

Before I describe the bits and pieces needed to get this to work, I’ll describe the general network layout and the concepts behind this solution. The network setup I like to work with usually looks something like this:

network overview

This is pretty basic. The servers in the internal network have no direct connection to the internet. To be able to log in to them, I’m going to set up a VPN server on the gateways. These gateways can also be seen as a form of firewall: if you route all traffic from and to the internet over them, you can easily filter the traffic and allow only specific connections.

As you can see the gateways get 2 interfaces, 1 with a publicly reachable IP address and one to connect to the private (non-routed) network.

Since I’m not providing any services on the public interface that need a failover mechanism on our side (OpenVPN can do this from the client side, more on that later), I can ignore that side for the failover.

On the internal interface, I need to provide an IP that I can hand out to all the servers as a default gateway and as a DNS resolver. I’m going to call this the virtual IP.

This virtual IP will get assigned to one of the gateway servers. As long as this server is up, the IP will stay on that server. I will install a service called keepalived that runs on both servers and monitors the reachability of the other server. If the server currently holding the virtual IP goes down, the secondary server will assign the IP to its internal interface and respond to all traffic that goes to that IP.

As soon as the primary server is back up again, the virtual IP will get assigned to its internal interface and removed from the secondary server. Since I provide the same services with the same configuration on both servers, this switch is mostly transparent to the nodes using these services.
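The takeover and fail-back behaviour described above can be modelled in a few lines of Python. This is only a sketch of the decision rule (the live server with the highest priority holds the virtual IP), not of keepalived itself; the names `gateway1` and `gateway2` and the priority values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    priority: int  # higher value wins, like keepalived's "priority" setting
    alive: bool = True


def vip_holder(gateways: List[Gateway]) -> Optional[str]:
    """Return the name of the gateway that should hold the virtual IP."""
    candidates = [g for g in gateways if g.alive]
    if not candidates:
        return None  # every gateway is down: nobody answers on the virtual IP
    return max(candidates, key=lambda g: g.priority).name


# Hypothetical pair of gateways; gateway1 is the preferred (primary) one.
gw1 = Gateway("gateway1", priority=101)
gw2 = Gateway("gateway2", priority=100)

print(vip_holder([gw1, gw2]))  # both up: the primary holds the IP
gw1.alive = False
print(vip_holder([gw1, gw2]))  # primary down: the secondary takes over
gw1.alive = True
print(vip_holder([gw1, gw2]))  # primary back: the IP moves back
```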

The one thing to keep in mind is that you will want to keep the configuration of both servers in sync, so I strongly recommend using a tool like Salt, Puppet or Chef to achieve that.


Setup of services

Now to the configuration and setup part. I will only show the important ones for the failover technique I describe here.

OpenVPN

The OpenVPN client can select a random server out of a list to connect to. If it can’t connect, it will simply try the next one. Keep in mind that with this setup you are actually using the failover server more as a load balancer. Both systems are running in production; it just doesn’t matter so much if one goes down.
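As a rough illustration of the client side, the following Python sketch renders the relevant lines of a client configuration: one `remote` entry per gateway plus the `remote-random` directive. The hostnames, port and protocol are assumptions; replace them with the public addresses of your own gateways.

```python
def client_remote_lines(servers, port=1194, proto="udp"):
    """Render the server list part of an OpenVPN client config."""
    lines = [f"remote {host} {port} {proto}" for host in servers]
    # remote-random makes the client pick the first server at random;
    # if a connection attempt fails, it moves on to the next entry.
    lines.append("remote-random")
    return "\n".join(lines)


# Hypothetical public hostnames of the two gateways.
print(client_remote_lines(["gw1.example.com", "gw2.example.com"]))
```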

There is not much else to do. Just make sure both servers run with (almost) the same configuration and that users are able to log in to both of them. You should also use a different VPN network (IP and netmask) on the primary and the failover system.

So make sure this is different on the two systems in /etc/openvpn/server.conf:

System 1:

System 2:

The gotcha here is that you need to set the routes on the gateways correctly. Each gateway needs to know the route to the OpenVPN network on the “other” gateway. Otherwise it would send replies to packets coming from the other gateway’s VPN to the default route, which cannot work.
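To make the routing requirement concrete, here is a small Python sketch that checks that the two VPN networks do not overlap and derives the route each gateway needs towards its peer. All addresses are assumptions chosen for illustration: 10.8.0.0/24 and 10.9.0.0/24 as the VPN networks, and 10.1.1.2 and 10.1.1.3 as the gateways’ own internal addresses.

```python
import ipaddress
from itertools import combinations

# Hypothetical values: each gateway's own internal IP and its VPN network.
GATEWAYS = {
    "gateway1": {"internal_ip": "10.1.1.2", "vpn_net": "10.8.0.0/24"},
    "gateway2": {"internal_ip": "10.1.1.3", "vpn_net": "10.9.0.0/24"},
}


def check_no_overlap(gateways):
    """Raise ValueError if any two VPN networks overlap."""
    nets = {n: ipaddress.ip_network(c["vpn_net"]) for n, c in gateways.items()}
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"VPN networks of {a} and {b} overlap")


def peer_routes(gateways):
    """For each gateway, list the routes to the VPN networks of its peers."""
    check_no_overlap(gateways)
    return {
        name: [
            f"ip route add {peer['vpn_net']} via {peer['internal_ip']}"
            for peer_name, peer in gateways.items()
            if peer_name != name
        ]
        for name in gateways
    }


for name, routes in peer_routes(GATEWAYS).items():
    print(name, routes)
```

Each generated command could, for example, go into a `post-up` line of the internal interface in /etc/network/interfaces so that the route survives a reboot.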

What I like to do is to push the virtual IP (10.1.1.1 in this example) as a DNS server over the VPN. That way you can have name resolution for all your internal systems. (Like backend01.local)

Do this in the OpenVPN server.conf:

On server 1 do this in /etc/network/interfaces (eth0 is the internal interface):

On server 2 do this in /etc/network/interfaces (eth0 is the internal interface):

Otherwise you can just follow the “normal” OpenVPN howtos. Configure everything the same way as you would if you were setting up a single server. In fact, you can configure both servers separately, check that each works on its own, and then change the configuration so that they work in a redundant setup.

Gateway

If you have the virtual IP (10.1.1.1) set as the default gateway on your systems, all you need to take care of is that this IP is always up. I’m going to use keepalived for this. Here are the two keepalived configs for server 1 and 2.

Server 1:

Server 2:

This configures the virtual IP 10.1.1.1 to be assigned to eth0. Keepalived decides based on the priority which server assigns the IP if both systems are up. The higher priority wins, so in this case server 1 will be your gateway. If one system goes down the other will assign the virtual IP to the configured interface.
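As a hedged sketch of what such a pair of configurations can look like, the following Python snippet renders a `vrrp_instance` block for each server from one template, so that only the state, priority and addresses differ. The instance name `VI_1`, the router ID, the priorities, the gateways’ own addresses (10.1.1.2 and 10.1.1.3) and the script path are assumptions. So is the use of `unicast_peer`, which I chose here because multicast is often filtered in virtualized environments.

```python
KEEPALIVED_TEMPLATE = """\
vrrp_instance VI_1 {{
    state {state}
    interface {interface}
    virtual_router_id 51
    priority {priority}
    advert_int 1
    unicast_src_ip {own_ip}
    unicast_peer {{
        {peer_ip}
    }}
    virtual_ipaddress {{
        {virtual_ip}
    }}
    notify {notify_script}
}}
"""


def keepalived_conf(state, priority, own_ip, peer_ip,
                    interface="eth0", virtual_ip="10.1.1.1",
                    notify_script="/etc/keepalived/notify.sh"):
    """Render one server's keepalived.conf from the shared template."""
    return KEEPALIVED_TEMPLATE.format(
        state=state, priority=priority, own_ip=own_ip, peer_ip=peer_ip,
        interface=interface, virtual_ip=virtual_ip,
        notify_script=notify_script,
    )


# Hypothetical internal addresses of the two gateways.
server1 = keepalived_conf("MASTER", 101, own_ip="10.1.1.2", peer_ip="10.1.1.3")
server2 = keepalived_conf("BACKUP", 100, own_ip="10.1.1.3", peer_ip="10.1.1.2")
print(server1)
print(server2)
```

Generating both files from one template is also a simple way to keep the two servers from drifting apart.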

The notify.sh script looks like this:

I use it to disable the DNS and DHCP server if the system is not in MASTER mode. This is optional. It doesn’t hurt to keep the DNS service running if the system is in backup mode. For DHCP it should not make a difference as long as you keep giving out static IPs. But to prevent sending out duplicate DHCP replies I like to shut the service down if it isn’t needed.
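The decision such a notify script makes can be sketched as follows. I use Python here instead of shell, and the function only returns the commands instead of running them, so this is an illustration of the logic rather than the original script. keepalived passes the type, the instance name and the new state (MASTER, BACKUP or FAULT) as arguments to a notify script. The service names `dnsmasq` and `isc-dhcp-server` are assumptions.

```python
# Hypothetical service names; use whatever DNS and DHCP daemons you run.
SERVICES = ["dnsmasq", "isc-dhcp-server"]


def commands_for_state(state, services=SERVICES):
    """Start the services when becoming MASTER, stop them in any other state."""
    action = "start" if state == "MASTER" else "stop"
    return [["service", svc, action] for svc in services]


def notify_commands(args):
    """args mirrors what keepalived passes: TYPE, NAME, STATE, [PRIORITY]."""
    if len(args) < 3:
        raise ValueError("expected at least TYPE, NAME and STATE")
    return commands_for_state(args[2])


# Dry run: print what would be executed on a switch to BACKUP.
for cmd in notify_commands(["INSTANCE", "VI_1", "BACKUP", "100"]):
    print(" ".join(cmd))
```

To turn this into a working script, each command list could be handed to `subprocess.run`.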

So now all you need to do is to advertise the virtual IP as the default route and you are set.

DNS

You can just advertise the virtual IP as your DNS server and you are done.

DHCP

DHCP listens for broadcast traffic, so whenever the server is up, it answers requests. As stated before, this is fine if you run only with static IPs. As soon as you want to use dynamic IPs, you need to take care of synchronisation.
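With static mappings, both gateways can serve exactly the same DHCP configuration. A minimal Python sketch, assuming the ISC DHCP server syntax, a 10.1.1.0/24 internal network, and made-up host names and MAC addresses: it renders identical output on both servers and advertises the virtual IP as router, DNS server and NTP server.

```python
VIRTUAL_IP = "10.1.1.1"

# Hypothetical hosts: name -> (MAC address, fixed IP address).
HOSTS = {
    "backend01": ("52:54:00:00:00:01", "10.1.1.11"),
    "backend02": ("52:54:00:00:00:02", "10.1.1.12"),
}


def dhcpd_conf(hosts, vip=VIRTUAL_IP):
    """Render a dhcpd.conf with static mappings only (no dynamic range)."""
    lines = [
        "subnet 10.1.1.0 netmask 255.255.255.0 {",
        f"    option routers {vip};",
        f"    option domain-name-servers {vip};",
        f"    option ntp-servers {vip};",
        "}",
    ]
    # Sort by host name so both gateways produce byte-identical files.
    for name, (mac, ip) in sorted(hosts.items()):
        lines += [
            f"host {name} {{",
            f"    hardware ethernet {mac};",
            f"    fixed-address {ip};",
            "}",
        ]
    return "\n".join(lines) + "\n"


print(dhcpd_conf(HOSTS))
```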

NTP

NTP is also stateless. You can just advertise the virtual IP as your main NTP server and you are done. You could also advertise both servers’ real addresses as NTP servers and use both.

Published in Solutions
