Load Balanced DNS with dnsdist

In recent weeks I’ve found the need to configure and deploy a proper load balancing solution for an authoritative DNS cluster. For most deployments (up to a certain scale, and you’d know if you were there) a single-purpose authoritative DNS server doesn’t really need a balancing frontend; you can reasonably expect a decent-sized box running a modern kernel to handle several hundred thousand UDP packets per second, with a minimal amount of complementary TCP traffic. Putting a load balancing tier in front of an authoritative DNS cluster is really only necessary when hardware redundancy or significant traffic shaping is a requirement, or when the generation of authoritative data is expensive and needs to be horizontally scalable. I found myself needing to satisfy a few of these conditions, and have had a wonderful time playing and poking at a purpose-built FOSS DNS load balancing solution: dnsdist.

dnsdist, produced by the same team behind PowerDNS, is billed as follows:

dnsdist is a highly DNS-, DoS- and abuse-aware loadbalancer. Its goal in life is to route traffic to the best server, delivering top performance to legitimate users while shunting or blocking abusive traffic

Having some experience with PowerDNS, I was intrigued by the offering, but wary of its beta tag and short development lifetime (note: dnsdist is not currently in beta; it was at the time I started doing initial research). Its feature set is fairly impressive, and covered the requirements we had:

  • Active layer 7 health checks
  • Visualized upstream data
  • Dynamic configuration
  • In-memory proxy caching (bonus, not a necessity)

In reality, dnsdist is probably one of the best solutions available anyway, despite its youth. Consider the following open source alternatives:

  • Kernel-level load balancing via IPVS would likely be the most purely performant solution, but it is severely lacking in features – no DNS health checking, no downstream caching
  • Nginx offers UDP stream proxying, but this feature is largely maintained as part of the open-core framework; documentation is fairly sparse and the available configuration options are very limited, so at this point we’d have the overhead of userspace packet processing with no real gain
  • In the same vein, there’s been chatter about an OpenResty authoritative DNS server, which would likely provide the features we’d want (and more), but I’ve yet to see it released
  • HAProxy doesn’t offer UDP balancing
  • pen seemed like another viable solution, but again lacking in features and any significant community (not a good sign when you’re about to deploy something to prod and ship a few Hail Marys with it)

Any other marketed UDP/DNS traffic balancing solution is commercial/black-box hardware. So dnsdist it is!

Building dnsdist from source was relatively straightforward (the developers provide packages for a handful of distros for the hurried or build-averse among us). The configuration file (and runtime configuration client) is handled directly in Lua, which is wonderful: defining a configuration in a scripting language allows for a lot of flexibility, which we’ll explore in a bit. I found the documentation to be fairly straightforward, but a bit sparse on some details. Most of it is consumable enough, so I’ll touch on a few things that were either initially unclear or incompletely/vaguely documented.

One of the features I enjoyed playing with was the built-in PacketCache, which largely leverages the existing PowerDNS caching codebase. The idea here is simple: cache responses for repetitive queries to save traffic and computation upstream. PowerDNS employs this in largely the same manner to avoid querying its data source when unnecessary, but unlike the authoritative server, dnsdist configures packet cache memory with an upper limit as defined by the newPacketCache directive. A side effect of this is that, also unlike PowerDNS, dnsdist will allocate all of the memory needed to hold cache data up front, so be wary of this when creating caches meant to hold hundreds of millions of elements (we found that, on x64, 10^9 cache slots took around 8G of allocation up front).
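As a rough sketch of what this looks like in the configuration file: the cache size below is an arbitrary illustrative value rather than a recommendation, and attaching the cache to the default pool is an assumption about the deployment.

```lua
-- Create a packet cache with an upper bound of 100,000 entries.
-- The size is an illustrative assumption; since memory is reserved
-- based on this limit, pick it with the host's available RAM in mind.
pc = newPacketCache(100000)

-- Attach the cache to the default pool (the pool with the empty name),
-- so responses from any backend in that pool can be served from cache.
getPool(""):setCache(pc)
```

Scaling the entry count up by several orders of magnitude is where the up-front allocation described above starts to matter.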

dnsdist really shines as an abuse and reflection mitigator. Truncation of large/ANY queries is trivial:
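A rule along these lines should do it. This is a sketch that assumes a reasonably recent dnsdist release, where the query-type constants live under DNSQType; older releases spelled the constant dnsdist.ANY instead.

```lua
-- Answer ANY queries that arrive over UDP with a truncated (TC=1) response,
-- which makes well-behaved clients retry over TCP and denies spoofed-source
-- senders a large UDP reply to reflect at a victim.
addAction(
    AndRule({ QTypeRule(DNSQType.ANY), TCPRule(false) }),
    TCAction()
)
```

Restricting the rule to UDP with TCPRule(false) keeps the TCP retry itself from being truncated again.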

There is a sizable interface for examining packet data, allowing for the creation of custom rules based on query behavior, all of which are fairly well documented. In addition, dnsdist provides a mechanism to run recurring actions via the maintenance() function. This function runs once a second (as long as it’s defined), and provides a way to work with cumulative or query rate data, allowing for dynamic rule creation based on client behavior over time via the addDynBlocks function (among other things). As the example from the docs shows:
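The example, reproduced here from memory of the dnsdist documentation (the specific numbers are illustrative), looks roughly like this:

```lua
-- Called by dnsdist about once per second when defined.
function maintenance()
    -- exceedQRate(20, 10): clients that averaged more than 20 queries
    -- per second over the last 10 seconds.
    -- addDynBlocks(..., message, 60): block those clients for 60 seconds,
    -- recording the given message as the reason.
    addDynBlocks(exceedQRate(20, 10), "Exceeded query rate", 60)
end
```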

addDynBlocks has a fairly straightforward signature, but exceedQRate is a bit more vague. Documentation for this function indicates that it returns a “set of addresses”. To me, “set” indicated an array of IP addresses, but this turned out not to be the case; it returns a hash-type table whose key is a cdata type and whose value is a number. It was easy to figure out that the value represented the number of queries seen, but the opaque key made manipulation tricky. The key is actually a ComboAddress, a C++ object that the PowerDNS codebase uses to represent IPv4/IPv6 addresses. This is noted in, of all places, the PowerDNS Recursor docs, and nowhere in the dnsdist documentation, but a quick read-through of the available methods for this object makes manipulation of exceedQRate results fairly straightforward. For our use case, we needed a way to whitelist certain CIDR blocks from any dynamic rules. Ideally, this would be handled by ensuring that exceedQRate (or whatever related function is being called) doesn’t return a whitelisted range, but this doesn’t seem possible at the moment. Instead, we can simply nil any results in the returned table that match our desired CIDR block. CIDR blocks can be defined as a separate PowerDNS-native type (again, noted in the Recursor documentation):
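A minimal sketch of such a definition follows. The variable name and the address ranges are placeholders (documentation prefixes), not the ranges we actually used.

```lua
-- Build a NetmaskGroup holding the ranges that should never be
-- dynamically blocked. "whitelist" is a hypothetical name, and the
-- CIDR blocks below are placeholder documentation ranges.
whitelist = newNMG()
whitelist:addMask("192.0.2.0/24")
whitelist:addMask("2001:db8::/32")
```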

We define a NetmaskGroup object and call the addMask method as needed. From here, this object can be used to compare each ComboAddress inside our maintenance function:
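Putting the pieces together might look something like the following. This is a sketch under a few assumptions: the thresholds mirror the docs example, the whitelist ranges are placeholders, and the names whitelist and offenders are my own.

```lua
-- Placeholder whitelist; substitute the ranges that must never be blocked.
local whitelist = newNMG()
whitelist:addMask("192.0.2.0/24")

function maintenance()
    -- Table keyed by ComboAddress, with the observed query count as value.
    local offenders = exceedQRate(20, 10)

    -- Drop any offender that falls inside a whitelisted range.
    for addr, _ in pairs(offenders) do
        if whitelist:match(addr) then
            offenders[addr] = nil
        end
    end

    -- Block whatever remains for 60 seconds.
    addDynBlocks(offenders, "Exceeded query rate", 60)
end
```

Setting an existing key to nil while iterating with pairs is permitted in Lua (only adding new keys during traversal is unsafe), so the table can be filtered in place.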

Of note, the documentation for the NetmaskGroup:match function is as follows:

match(str) – true if the address passed in str matches

While the type of the function’s str param isn’t defined, the name suggests that it could take an address-looking string, but other examples in the documentation show usage via a ComboAddress. I might also add that, at least for testing purposes, it’s worth configuring dynamically-blocked queries to be answered with a REFUSED response instead of merely dropping the request on the floor:
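This is a one-line change in the configuration, sketched here on the assumption that the installed release supports the Refused action for dynamic blocks:

```lua
-- Answer dynamically blocked clients with REFUSED instead of the default
-- behavior of silently dropping the query. This makes blocks easy to
-- spot from the client side (e.g. with dig) while testing.
setDynBlocksAction(DNSAction.Refused)
```

For production, the default drop behavior may be preferable, since answering abusive clients still costs outbound traffic.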

Overall, I’m very pleased with the performance and flexibility of dnsdist. I think the documentation leaves a bit to be desired (as does a good chunk of the PowerDNS codebase), but the development team seems to have done a solid job of addressing customer bugs and maintaining a frequent-enough release schedule, and there are a host of other features (eBPF filtering, upstream pools, packet teeing) that I haven’t poked at deeply. Being able to dynamically (and conveniently) shape DNS traffic at this level, while maintaining a small footprint, is a big win for infrastructure operators.

Update: Within a day of publishing this, a few folks from PowerDNS got in touch and opened a GitHub issue to improve the docs (https://github.com/PowerDNS/pdns/issues/4835 – nice!). They also noted that dnsdist is no longer in beta; I’ve corrected my original statement (it was in beta when I started my initial search, but it is currently marked stable). Thanks @PowerDNS_Bert and @Habbie!
