FreeWAF: A High-Performance, Scalable, Open Web Firewall

I’ve spent the better part of the last six months reworking the project I wrote for my Master’s thesis. The idea behind the project was to explore the costs, risks and requirements associated with developing a cloud WAF infrastructure, similar to what commercial cloud security providers like Cloudflare and Incapsula offer, and then to provide that service free of charge. Totally unsustainable, of course, but as an academic exercise it was an incredibly educational experience. I’ve since decided to focus on releasing the source of the firewall engine powering the service, continuing to develop features and exploring new methods of detecting anomalous and malicious behavior.

Originally the project called for leveraging an existing open-source WAF project, which brought its own set of challenges. ModSecurity was the first candidate, and I even spent a few days reworking its logging routine to make logging and parsing audit data more palatable, but stability issues in the Nginx port forced me to look for other options (and a lack of scalability meant Apache wasn’t an option). I considered several Varnish projects, including VCF and security.vcl, but decided against those as well, largely because of a lack of available functionality and my limited familiarity with VCL and Varnish’s guts. Ultimately, I decided to develop my own engine on the Lua module for Nginx, using the OpenResty bundle.

FreeWAF relies heavily on Lua tables to do its heavy lifting: request context data, rule information, and configuration data are all represented via Lua tables. Lua table creation is inherently expensive, so we rely on passing a single table around as much as possible on hot code paths, and localize rule table data during each process action. The central rule processing logic of the module is fairly straightforward. Rulesets, developed based on the ModSecurity CRS, are represented as individual Lua modules, with each rule represented in a sub-table structure. This tabular design can result in rulesets that are not easily human-readable; eventually I would like to develop a separate, human-friendly rule syntax that will be parsed down into Lua tables. Below is a rule included in the project, based on a comparable ModSecurity SecRule definition:

Each rule contains a unique id, an opts table to define rule-specific data (such as logic regarding rule skipping or chaining, logging data, etc.), a description, and a separate table that defines the rule’s signature. The sub-table defines what portions of the request to match against a defined pattern, typically a string for comparison or a regular expression. Each signature is processed against a collection of data parsed out of the request, based on the rule’s type and options. In the case above, all values in the REQUEST_ARGS collection are used; the collection holds query string and POST body data. If the signature matches the data pulled from the transaction, the corresponding action is taken. In the example shown above, a positive rule match results in the transaction’s running anomaly score incrementing by 4; this score can then be compared to a predefined threshold in the post-processing phase to determine if the request should be rejected.
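As a rough illustration of the structure just described, here is a minimal sketch of a rule table and the loop that scores it. Every field name, the rule id, the pattern, and the helper functions are hypothetical and chosen for readability; they are not FreeWAF's actual schema. Plain Lua patterns stand in for a real regex engine, and rejecting when the score is greater than or equal to the threshold is an assumption.

```lua
-- Hypothetical rule in the shape described above (id, opts, description,
-- and a signature sub-table). Field names are illustrative only.
local rule = {
    id = 90001,
    opts = { score = 4 },
    description = "Example: SQL keywords in request arguments",
    var = {
        type = "REQUEST_ARGS",        -- which collection to inspect
        pattern = "union%s+select",   -- Lua pattern, standing in for a regex
    },
}

-- Flatten the string values of a collection into a list.
local function collect_values(collection)
    local values = {}
    for _, v in pairs(collection) do
        if type(v) == "string" then
            values[#values + 1] = v
        end
    end
    return values
end

-- Run one rule against the per-request context table; on a match,
-- add the rule's score to the running anomaly score.
local function process_rule(r, ctx)
    local collection = ctx.collections[r.var.type] or {}
    for _, value in ipairs(collect_values(collection)) do
        if string.find(value:lower(), r.var.pattern) then
            ctx.score = ctx.score + r.opts.score
            return true
        end
    end
    return false
end

-- Post-processing: compare the running score to a threshold
-- (assumed here to reject at or above the threshold).
local function should_reject(ctx, threshold)
    return ctx.score >= threshold
end

-- A single context table is created per request and passed around.
local ctx = {
    score = 0,
    collections = {
        REQUEST_ARGS = { id = "1 UNION  SELECT password FROM users", page = "2" },
    },
}

local matched = process_rule(rule, ctx)
```

With a threshold of 5 this single match would not be enough to reject the request; a second matching rule of the same weight would push the score over it.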

FreeWAF was designed to be as efficient as possible. The main module runs inside the access phase, avoiding context switches and I/O operations during its processing. This allows it to process thousands of simultaneous transactions, each with a runtime of around 300-500 microseconds, using a ruleset comparable to the ModSecurity CRS base ruleset. I tested performance initially using inline microtime comparisons, and then switched to the ngx-lua-exec-time systemtap macro provided by nginx-systemtap-toolkit:

Using a testing harness I’ve used in past benchmark analyses, I was able to max out performance at roughly 15,000 requests per second on a single E3-1230 CPU. Each worker process maintained roughly 7800 to 8900 KB of RSS memory usage, with no apparent change during runtime. In all fairness, this doesn’t attempt to accurately mimic real-world traffic (a lack of headers, no varying query strings, etc), but it’s a starting point in performance analysis. It also indicates a significant improvement in performance over the original module, by an order of magnitude (average runtime before I started rewriting the module was around 3-5 ms).
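For a sense of what an inline timing comparison looks like, here is a minimal sketch that runs outside Nginx. It uses os.clock() from the Lua standard library, which reports CPU time; inside OpenResty one would more likely compare ngx.now() readings after calling ngx.update_time(). The function names and the fake workload are assumptions made for illustration, not the harness used for the numbers above.

```lua
-- Average CPU time per call, in microseconds, over `iterations` runs.
local function time_per_call_us(fn, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        fn()
    end
    local elapsed = os.clock() - start   -- seconds of CPU time
    return (elapsed / iterations) * 1e6
end

-- Hypothetical workload standing in for one pass over a ruleset:
-- 200 plain-text substring searches against a synthetic query string.
local function fake_rule_pass()
    local hits = 0
    for i = 1, 200 do
        if string.find("id=" .. i .. "&q=select", "select", 1, true) then
            hits = hits + 1
        end
    end
    return hits
end

local avg_us = time_per_call_us(fake_rule_pass, 1000)
print(string.format("average: %.1f us per call", avg_us))
```

Averaging over many iterations matters here because a single run in the hundreds of microseconds is close to the resolution of the clock.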

Another incredibly effective tool in performance analysis and bottleneck identification was the use of backtraces visualized in flame graphs, which can identify hot spots in the code path. These hot spots can then be eliminated or optimized for increased performance. A flame graph of the original module used during the initial project showed a few such spots, one of which was the logging module, which contained the following code:

Two separate os.date() calls meant two separate syscalls during each log print. Even with debugging disabled, Lua’s eager evaluation meant that every string gets built as it’s passed to the log function, which gets very expensive very quickly (you can see many calls to vprintf and vfprintf in the flame graph). We can also see that a linear table search function was taking almost 13% of the CPU cycles; changing the design of the module’s config options to be saved as a table key (which can be searched much faster) eliminated this CPU waste. Further performance increases resulted from memoizing the results of the collection parser, though the overhead of additional table creation made gains hard to see before adding in the XSS and SQLi rulesets (which all use the same collection key). The final module results in a much cleaner and prettier flame graph:

FreeWAF v0.1
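The three fixes described before the flame graph (deferring log string construction, replacing a linear search with a keyed lookup, and memoizing the collection parser) can be sketched in a few lines of plain Lua. All names below, including the config table, the rule ids, and the helper functions, are hypothetical; this shows the general techniques, not the module's actual code.

```lua
-- All names here are hypothetical illustrations.
local config = { debug = false }
local sink = {}   -- stands in for the real log destination

-- 1. Deferred logging: callers pass a format string and raw values, so
--    no message is formatted and os.date() is never called when
--    debugging is off. When it is on, os.date() runs once per line.
local function log_debug(fmt, ...)
    if not config.debug then
        return
    end
    sink[#sink + 1] = os.date("%Y-%m-%d %H:%M:%S") .. " " .. string.format(fmt, ...)
end

-- 2. Linear search: walks the list on every check.
local function list_contains(list, needle)
    for i = 1, #list do
        if list[i] == needle then
            return true
        end
    end
    return false
end

--    Keyed lookup: build a set once, then each check is one table index.
local function to_set(list)
    local set = {}
    for i = 1, #list do
        set[list[i]] = true
    end
    return set
end

local ignored_rules = { 11001, 40294, 90002 }   -- made-up rule ids
local ignored_set = to_set(ignored_rules)

-- 3. Memoization: parse a collection once per request and cache the
--    result in the request context under its collection key.
local parse_calls = 0
local function parse_collection(request, key)
    parse_calls = parse_calls + 1
    local values = {}
    for _, v in pairs(request[key] or {}) do
        values[#values + 1] = v
    end
    return values
end

local function get_collection(ctx, key)
    local cached = ctx.cache[key]
    if cached == nil then
        cached = parse_collection(ctx.request, key)
        ctx.cache[key] = cached
    end
    return cached
end

local ctx = { request = { REQUEST_ARGS = { q = "hello", page = "2" } }, cache = {} }
local first = get_collection(ctx, "REQUEST_ARGS")
local second = get_collection(ctx, "REQUEST_ARGS")   -- served from the cache
```

The memoized version pays for one extra cache entry per collection, which is why the gain only shows up once many rules share the same collection key.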

So we can see that the majority of the time is spent in the rule processor, and that the collection parser and regex matcher are potential points of further optimization. At this point, though, my plan is to focus on adding features, including behavioral analysis and tracking. The project is available on GitHub, and my original written submission regarding the project that spawned this idea is available here. It’s been very satisfying to rework the original code into a stable, open-source project, and I’m looking forward to adding more to it.

4 thoughts on “FreeWAF: A High-Performance, Scalable, Open Web Firewall”

    1. There is no current way to do this. In creating the ruleset released with FreeWAF, I manually converted the ModSecurity CRS to FreeWAF table definitions (with the exception of the XSS and SQLi rulesets; because they are so large, I just parsed them using some Bash and Perl). ModSecurity rule syntax is complex and tricky to parse, so translating rules into a simple tabular structure is difficult. I am currently working on a script to automatically parse and convert rules into a JSON format that could be used for FreeWAF, and will write a post about it when I have a decent working version available.
