Minimal Resources, Maximum Ludicrousness

There are already plenty of articles on optimizing Nginx performance, so I'm not going to focus on that here. Instead, I'm going to demonstrate a quick-and-dirty method of measuring traffic without the biggest performance drag in a standard Nginx setup: on-disk logging.

The goal was to squeeze as much performance as possible out of a single Nginx instance on the smallest available hardware: a virtual machine with 512 MB of RAM, 1 vCPU, and 20 GB of shared SSD-backed disk space, communicating over a shared gigabit NIC. A quick look at our setup:
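The original configuration isn't shown here, but a minimal sketch of the worker and event settings described below might look like this (the worker count and connection cap are illustrative assumptions, not the author's exact values):

```nginx
# Sketch of the core worker/event configuration; values are assumptions.
worker_processes 1;            # one worker for the single vCPU

events {
    worker_connections 16384;  # raise the per-worker connection ceiling
    multi_accept on;           # drain the accept queue on each wakeup
}
```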

Pretty standard stuff in the events block. We're using the OpenResty bundle, which includes the standard Lua module. We set up a 64 MB key-value shared memory zone for stats logging. We've also configured reverse-proxy caching, and pointed the proxy cache path at a tmpfs mount point:
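A sketch of the http-level configuration this describes, assuming `/var/cache/nginx` is the tmpfs mount (the paths, zone names, and sizes other than the 64 MB dictionary are illustrative):

```nginx
# Sketch of the cache and stats zones; paths and zone names are assumptions.
http {
    lua_shared_dict stats 64m;        # shared key-value zone for request counters

    proxy_cache_path /var/cache/nginx/cache
                     levels=1:2
                     keys_zone=testcache:8m
                     inactive=10m;

    # Temp files live on the same tmpfs mount, so a completed upstream fetch
    # is moved into the cache with a rename rather than a cross-filesystem copy.
    proxy_temp_path  /var/cache/nginx/tmp;
}
```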

Note also that the proxy temp path is kept on the same mount point; a fresh response is moved into the cache directory with a cheap rename operation instead of being written across a file system boundary. In practice, this doesn't result in much performance gain (high-concurrency tests quickly approach a 1:1 cache hit ratio, so fresh writes are rare), but setting it doesn't hurt anything.

We've set up a vanilla reverse-proxy server block to use for our test bed:
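The server block itself isn't reproduced here; a minimal version matching the description below might look like this (the listen port, upstream address, and cache zone name are assumptions):

```nginx
# Sketch of the test-bed server block; upstream and zone names are assumptions.
server {
    listen 80;

    access_log off;   # disk logging is the biggest I/O cost we can eliminate

    location / {
        proxy_cache testcache;
        proxy_pass  http://127.0.0.1:8080;
    }
}
```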

Note that access logs are disabled completely; even with buffering, an impressive amount of I/O is still dedicated to writing them to disk (buffered writes must be atomic; on this test bench, the buffer was 64 KB, which still resulted in hundreds of writes per second). Error logs inherit their level from the http block (error by default), so we'll likely not have to deal with a performance drain here.

We use the previously defined shared dictionary to store request data. We're only interested in counting requests per second, so we implement a simple counter pattern, using the current timestamp as the key. A separate location block is set up to iterate over the contents of the dictionary; we can query it at any time after testing is complete (shared dictionaries survive a reload and are only cleared after a SIGQUIT).
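A sketch of the counter and the dump endpoint, assuming the 64 MB zone is named `stats` and the dump location is `/stats` (both names are illustrative):

```nginx
# Sketch of the per-second counter; zone and location names are assumptions.
log_by_lua_block {
    -- Increment a counter keyed by the current epoch second.
    local stats = ngx.shared.stats
    local newval, err = stats:incr(tostring(ngx.time()), 1, 0)  -- init at 0
    if not newval then
        ngx.log(ngx.WARN, "stats incr failed: ", err)
    end
}

location = /stats {
    content_by_lua_block {
        -- Emit every timestamp/count pair stored in the dictionary.
        local stats = ngx.shared.stats
        for _, key in ipairs(stats:get_keys(0)) do  -- 0 = no key limit
            ngx.say(key, " ", stats:get(key))
        end
    }
}
```

Running the counter in the log phase keeps it off the critical path of the response, and `incr` with an init value avoids a separate get/set round trip.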

High-concurrency testing is often bottlenecked by the client, not the host, so we need to distribute our request pattern. Since we're shooting for pure numbers, we'll fire a series of requests from a number of hosts using ApacheBench, requesting the root resource:
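The exact invocation isn't shown; a typical run from each client host might look like this (request count, concurrency level, and target address are assumptions):

```shell
# Run from each load-generating host; -k enables keep-alive so the client
# TCP stack reuses connections instead of churning through ephemeral ports.
ab -k -n 100000 -c 250 http://10.0.0.5/
```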

Balancing the number of concurrent hosts can be tricky, as our test bench will OOM when too many connections are opened at once: the kernel's TCP stack must allocate memory for every simultaneous connection. The Nginx core itself barely uses any:

14 MB of RSS for the worker process; the master process only lives to handle signals.

So, how did we do? Nginx was written to solve the C10K problem, but surely we can’t reach that with such limited resources:

With some more tweaking, we might be able to hit 10K; or, we could pay more than $5 per month.
