FreeWAF Updates and New Features

January tends to be a pretty quiet month in the admin/operations world. Most people are still coming back from holiday, new yearly plans are being made, meetings are held, and the server monkeys… sit and watch the graphs scroll by. The rest of the world’s gradual return to work means the start of a seasonal upswing, but we’re still at a relatively low point, so that generally means a light workload. That extra free time has given me a chance to put in a good chunk of work towards FreeWAF, cleaning up code, adding new features, and interacting with a total stranger (score!). I’ve just tagged a new release, v0.4, which provides a handful of new features that were sorely missing:

  • Configurable event logging: In addition to logging rule match events to the server’s error_log location, users can now ship logs to a remote UDP server, or write to a separate location on the file system. Both options take advantage of lua-resty-logger-socket‘s buffering (file logging implements a simple fork of the module, using Lua’s built-in I/O library). File logging is, by nature, expensive, so it’s recommended that high-concurrency environments use UDP socket logging. Additionally, the amount of data in each event log entry is now customizable (thanks to nebpor for the suggestion). A rough configuration sketch follows this list.
  • Data collection transformation: Data transformation is a relatively simple anti-evasion technique. As part of the collection parsing process, input data is transformed based on rule signature; FreeWAF supports encoding and decoding base64 data, decoding HTML characters, and lowercasing input for non-regex operators (REGEX is case-insensitive by default). The provided XSS ruleset now performs HTML decoding transformation on all rules.
  • Persistent storage mechanism: Previously, FreeWAF’s analysis was entirely stateless. Individual transactions were parsed, single data collections were matched against rule signatures, and no concept of state or session tracking was possible, due to an inherently simple (and efficient) design. The addition of persistent storage dramatically improves the ability of FreeWAF to perform complex analysis, providing for functionality like long-term client behavioral analysis, brute force protection, tarpitting, and more. Nginx’s shared memory zone interface allows for a fast, simple method to store persistent data, and the introduction of a dynamic data key definition syntax allows users to store data based on granular, request-specific data, such as client IP and request URI (further development of this syntax will be a focus in future work).
  • Various bug fixes and performance improvements enhance the overall stability of the module; the inclusion of a new static resource whitelisting ruleset results in performance increases of 300% for static GET/HEAD media requests, and transactions with no request data. This can provide significant load reduction in high-concurrency environments.
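
As promised above, here’s a rough sketch of what wiring up remote UDP event logging might look like from an access_by_lua handler. The set_option names shown are assumptions for illustration (the exact option names may differ in v0.4), so check the README for the real knobs:

```lua
-- hypothetical configuration sketch, meant for an access_by_lua block;
-- the option names below are assumptions for illustration, not
-- necessarily the exact v0.4 API
local FreeWAF = require "FreeWAF.fw"
local fw = FreeWAF:new()

-- ship event logs to a remote UDP collector instead of the error_log
fw:set_option("event_log_target", "socket")
fw:set_option("event_log_target_host", "10.0.0.5")
fw:set_option("event_log_target_port", 9514)

-- trim the amount of data included in each event log entry
fw:set_option("event_log_verbosity", 1)

fw:exec()
```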

The inclusion of persistent storage really opens up possibilities for more advanced functionality like state tracking and brute force prevention. A simple ruleset to prevent brute force WordPress logins might look like this:
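
Something along these lines; this is just a sketch, and the field names, operators, and actions are assumed for illustration based on the description below rather than lifted verbatim from the shipped rulesets:

```lua
-- Sketch of a brute force prevention ruleset for /wp-login.php.
-- Field names, operators, and actions here are assumptions based on the
-- description below, not a verbatim copy of the FreeWAF rule syntax.
local wp_brute_force = {
    -- Rule 1: start a chain when /wp-login.php is requested
    {
        id     = 90001,
        var    = { type = "URI", operator = "EQUALS", pattern = "/wp-login.php" },
        action = "CHAIN",
    },
    -- Rule 2 (chained): on a POST, increment a persistent counter keyed
    -- on client IP + URI; the key expires 60 seconds after it is set
    {
        id     = 90002,
        var    = { type = "METHOD", operator = "EQUALS", pattern = "POST" },
        opts   = {
            chainchild = true,
            chainend   = true,
            setvar     = {
                key    = "%{IP}.%{URI}.hitcount",
                value  = "+1",
                expire = 60,
            },
        },
        action = "SETVAR",
    },
    -- Rule 3 (standalone): deny the request once the counter exceeds 5
    {
        id     = 90003,
        var    = {
            type     = "VAR",
            opts     = { key = "%{IP}.%{URI}.hitcount" },
            operator = "GREATER",
            pattern  = 5,
        },
        action = "DENY",
    },
}
```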

The first rule sets up a chain if /wp-login.php is requested. The second rule in the chain sets a persistent variable if the request was a POST. Persistent storage is a key/value store, so we define our key as the client’s IP address, the request URI, and the string ‘hitcount’, concatenated with periods. The syntax for dynamically setting the key string is of the form %{VAL}, where VAL is either the IP or the URI. Currently this dynamic assignment is only available for persistent variable keys, though I’m looking at expanding it to other portions of the rule syntax to make rule definitions more flexible. The value of the new variable is an increment operator: if the variable already exists, we increment it by 1; otherwise, we instantiate a new value as 1. The third rule is not part of the chain; it simply checks the value of the persistent variable we previously defined, and if that value exists and is greater than 5, the request is denied. Note also that when we set the hitcount variable, we provided an expire time. If the second rule in the chain is not triggered within the expire time, the key will be deleted from the storage zone.

Beyond adding new features, I wanted to keep FreeWAF a lightweight and efficient Lua module. A major focus during FreeWAF’s initial development was keeping the design as efficient as possible, and that focus continued during the development of these latest features. Using tools like flame graphs and nginx-systemtap-toolkit, we can easily profile how small changes affect our overall performance. Sometimes small, seemingly inconsequential changes have a major impact on performance. Take, for example, this commit, added during some relatively mindless code cleanup. The change took a global lookup table that decides how collection data should be matched against a rule’s signature pattern, and moved it into a function call. At the time it seemed innocuous, but I noticed that, following the change, overall runtime had jumped by around 350 microseconds, more than a 50% spike. Flame graphs quickly revealed the cause of the problem. Before, CPU usage was distributed across a number of function calls:

And following the change, we see a massive jump in the amount of time spent in the new function _do_match, where we moved the lookup table:

The jump in response time came directly from the latency introduced in the new function. Instead of being allocated globally at the module level, a new table (with the same contents) was now being created during every rule evaluation. This type of wasteful allocation of expensive resources in a hot loop is a performance killer, and reverting the change brought peace (and quicker runtimes) back to the world.
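
In simplified form, the difference boils down to something like this (an illustration of the pattern, not the actual FreeWAF code):

```lua
-- Illustration only, not the actual FreeWAF code. Wasteful version: a fresh
-- lookup table (and its closures) is allocated on every call, so every rule
-- evaluation pays the allocation and GC cost.
local function _do_match_slow(value, pattern, operator)
    local lookup = {
        REGEX  = function(v, p) return ngx.re.find(v, p, "oij") end,
        EQUALS = function(v, p) return v == p end,
        EXISTS = function(v, p) return v ~= nil end,
    }
    return lookup[operator](value, pattern)
end

-- Better: build the table once at module load time and close over it.
local lookup = {
    REGEX  = function(v, p) return ngx.re.find(v, p, "oij") end,
    EQUALS = function(v, p) return v == p end,
    EXISTS = function(v, p) return v ~= nil end,
}

local function _do_match(value, pattern, operator)
    return lookup[operator](value, pattern)
end
```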

Another focus in maintaining an efficient run loop is memoization of hot data. If you look at the most expensive rulesets (XSS and SQLi, which are nearly identical copies of the ModSecurity XSS/SQLi CRS rulesets), you’ll find that the var type and opts are all identical. Earlier performance testing showed that parsing the request data collection took a significant chunk of CPU time, so I added a key to the per-request context table to store parsed data. Instead of parsing the same data with the same options for each rule, we can simply check the run cache for that same parsed data. During the integration of data transforms, I initially elected not to add the transformation process to the collection parse cache. The first few rounds of testing showed that this didn’t incur any significant performance penalty, but adding in the expensive HTML decoding transformation and applying it to each rule in the XSS ruleset painted another picture:

This test was performed with request data that would more accurately simulate a real-world web application; generic session cookies and a few request params were added to tax the regex parser. However, nearly half of all sampled stacks were spent doing HTML decode transforms, the same transform over and over again. This was another example of needlessly wasting CPU cycles in a hot loop. Adding the transform process into the creation of the parse cache resulted in a significant improvement:

Now, the majority of CPU time is spent performing expensive regex matching. Leveraging JIT PCRE further optimizes performance.
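
Conceptually, the cached lookup ends up looking something like the sketch below; the helper names and the collection_cache field are made up for illustration and are not the actual FreeWAF internals:

```lua
-- Illustrative stand-ins for the real parse/transform steps (assumptions,
-- not the actual FreeWAF internals)
local function _parse_collection(ctx, col_type, col_opts)
    -- pretend this is the expensive, regex-heavy parse of the request data
    return ctx.raw_collections[col_type]
end

local function _do_transform(data, transform)
    -- pretend this applies e.g. the html_decode transform to each value
    return data
end

-- Sketch of the per-request parse/transform cache: parse (and transform)
-- a collection once, then reuse it for every rule that asks for the same
-- type, options, and transform (key construction simplified).
local function _get_collection(ctx, col_type, col_opts, transform)
    local cache_key = col_type .. "|" .. tostring(col_opts) .. "|" .. tostring(transform)

    local cached = ctx.collection_cache[cache_key]
    if cached then
        return cached
    end

    local parsed = _parse_collection(ctx, col_type, col_opts) -- expensive parse
    if transform then
        parsed = _do_transform(parsed, transform)             -- e.g. html_decode
    end

    ctx.collection_cache[cache_key] = parsed
    return parsed
end
```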

From here, I’m going to focus on further testing and development, building unit tests to verify the efficacy and stability of the included rulesets, as well as continuing to add new rule operators, collections, transform actions, and other shiny gadgets.
