Recently in the Rust world, a new tool called cargo fuzz was released. Fuzzing is a technique to intelligently generate arbitrary input for a program in order to find bugs in it.
cargo fuzz promises a very simple way to fuzz a cargo project using LibFuzzer, a coverage-guided, evolutionary fuzzing engine.
Since I had just made a change to the parsing module of Rust’s regex crate, I thought I’d try fuzzing that and see what happens. This post goes through the steps to set it all up and what I found.
The first step is to install it:
cargo install cargo-fuzz
Then, in the crate directory (regex-syntax in my case), you create a subproject:
cargo fuzz init
This generates a fuzz directory with a Cargo.toml and some files.
There’s a subfolder that holds fuzzer scripts.
The generated one is in fuzz/fuzzers/fuzzer_script_1.rs and looks like this:
#![no_main]
extern crate libfuzzer_sys;
extern crate regex_syntax;

#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    // fuzzed code goes here
}
The idea is that the script is invoked repeatedly, providing different data in the byte slice as input.
So in our case, what we want to do is try to parse the input as a regex. The library is supposed to be able to handle any string as input and not crash. In case the input is an invalid regular expression, the parser should return an error result.
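To make that contract concrete, here is a minimal sketch using only the standard library. Note that parse_octal is a toy stand-in I made up for illustration, not part of regex-syntax, and panics is a hypothetical helper name. It shows the difference between reporting bad input as an error and panicking on it, which is exactly the difference a fuzzer is looking for.

```rust
use std::panic;

/// Toy stand-in for a parser entry point (NOT the regex-syntax API):
/// malformed input is reported as `Err`, never as a panic.
fn parse_octal(s: &str) -> Result<u32, String> {
    u32::from_str_radix(s, 8).map_err(|e| format!("invalid octal {:?}: {}", s, e))
}

/// Hypothetical helper: returns true if calling `f` panics.
fn panics<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}
```

With this, panics(|| { let _ = parse_octal("9"); }) is false because the bad digit comes back as an Err, while a call that unwraps the result on the same input would panic.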
Since the library only accepts a string, what we can do is convert it to a string first. So let’s change the script to do that:
#![no_main]
extern crate libfuzzer_sys;
extern crate regex_syntax;

use std::str;
use regex_syntax::Expr;

#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    if let Ok(s) = str::from_utf8(data) {
        Expr::parse(s);
    }
}
We pass the byte slice to the parser only if it is valid UTF-8; any other input is skipped.
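To see what that check lets through, here is a small standalone sketch of the same gate. The helper name as_parser_input is my own invention for illustration; the only real API used is str::from_utf8 from the standard library.

```rust
use std::str;

/// Hypothetical helper: returns the input as `&str` only when the bytes
/// are valid UTF-8, mirroring the `if let Ok(s)` check in the fuzzer script.
fn as_parser_input(data: &[u8]) -> Option<&str> {
    str::from_utf8(data).ok()
}
```

For example, as_parser_input(b"a|b") gives Some("a|b"), while as_parser_input(&[0xff, 0xfe]) gives None, so the parser never sees those bytes.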
Now we want to run the fuzzer. It currently requires nightly Rust, which we can select by adding +nightly to the command:
cargo +nightly fuzz run fuzzer_script_1
If we try running it on macOS, we get this (some parts omitted):
Updating git repository `https://github.com/rust-fuzz/libfuzzer-sys.git`
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling regex-syntax v0.4.0 (file:///Users/rstocker/Projects/rust/regex/regex-syntax)
Compiling gcc v0.3.45
...
error[E0463]: can't find crate for `std`
|
= note: the `x86_64-unknown-linux-gnu` target may not be installed
...
Turns out it currently also requires Linux. But nowadays, even if you don’t run Linux, there’s Docker. So you can use one of the existing Rust images which works just fine for cargo fuzz:
docker run -v "$PWD":/volume -w /volume -t clux/muslrust:nightly \
sh -c "cargo install cargo-fuzz && cargo fuzz run fuzzer_script_1"
This mounts the current directory into the Docker container as a volume, installs cargo fuzz and then runs it.
So after running for a few minutes, here’s what the output was:
...
thread '<unnamed>' panicked at 'valid octal number', /checkout/src/libcore/option.rs:785
note: Run with `RUST_BACKTRACE=1` for a backtrace.
==2160== ERROR: libFuzzer: deadly signal
#0 0x56032e5e18f9 (/volume/fuzz/target/x86_64-unknown-linux-gnu/debug/fuzzer_script_1+0x29c8f9)
...
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 4 ShuffleBytes-ChangeByte-ChangeBinInt-ShuffleBytes-; base unit: b6b52807ed22997123cb048f286c450e5bb1b395
0x6d,0x3a,0x28,0x3f,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x78,0x6d,0x73,0x29,0x6d,0x6d,0x6d,0x0,0x1,0x0,0x2e,0x0,0x2b,0x40,0x2d,0x0,0xa,0x0,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x27,0x10,0x2,0x5b,0x2d,0x2d,0xa,0x5b,0x0,0x0,0x24,0x5c,0xa,0x33,0x3,0x5b,0x0,0x3a,0x36,0x3,0x44,0x0,
m:(?xxxxxxxxxxxxxms)mmm\x00\x01\x00.\x00+@-\x00\x0a\x00\x10\x10\x10\x10\x10\x10\x10\x10'\x10\x02[--\x0a[\x00\x00$\\\x0a3\x03[\x00:6\x03D\x00
artifact_prefix='artifacts/'; Test unit written to artifacts/crash-a855ca46b72a30e34db264fc4f9968df1e6b4ddb
Base64: bTooP3h4eHh4eHh4eHh4eHhtcyltbW0AAQAuACtALQAKABAQEBAQEBAQJxACWy0tClsAACRcCjMDWwA6NgNEAA==
Lots of output there. The interesting bit for us is the input that it used. The bytes are printed in hex, but also as a string with escapes:
m:(?xxxxxxxxxxxxxms)mmm\x00\x01\x00.\x00+@-\x00\x0a\x00\x10\x10\x10\x10\x10\x10\x10\x10'\x10\x02[--\x0a[\x00\x00$\\\x0a3\x03[\x00:6\x03D\x00
If you squint your eyes and ignore a lot of the \x, that kinda looks like a regex.
After putting the string into a normal test and running it, it indeed panics with the valid octal number message!
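If you'd rather not retype the escapes by hand, something like the following sketch can turn the crash bytes into text you can paste between the quotes of a Rust string literal. The function name escape_bytes is hypothetical; it only relies on std::ascii::escape_default. One caveat: this works for inputs like the one above where every byte is below 0x80, since \x escapes for larger bytes are only allowed in byte string literals.

```rust
use std::ascii;

/// Hypothetical helper: renders bytes as the contents of a Rust string
/// literal, e.g. a newline becomes `\n` and a NUL byte becomes `\x00`.
/// Only suitable for pasting into a `"..."` literal when all bytes are
/// below 0x80.
fn escape_bytes(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| ascii::escape_default(b))
        .map(char::from)
        .collect()
}
```

Note that the output uses Rust's own escapes, so a newline shows up as \n rather than the \x0a that libFuzzer prints.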
Looks like we found an actual bug! Isn’t it exciting to find a bug sometimes?
Then we can try to simplify the string to narrow down the problem, by removing parts of it and checking whether it still crashes. Doing that, I found a minimal test case for the problem:
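I did this by hand, but the remove-and-retry idea can be sketched as a small helper. This is only an illustration under my own assumptions: minimize and the still_fails predicate are hypothetical names, and it does a single greedy pass deleting one character at a time, so it is not guaranteed to find the smallest possible input.

```rust
/// Hypothetical helper: tries to delete one character at a time and keeps
/// a deletion whenever the shortened input still triggers the failure.
/// A single greedy pass, so the result is not guaranteed to be minimal.
fn minimize<F>(input: &str, still_fails: F) -> String
where
    F: Fn(&str) -> bool,
{
    let mut current: Vec<char> = input.chars().collect();
    let mut i = 0;
    while i < current.len() {
        let mut candidate = current.clone();
        candidate.remove(i);
        let text: String = candidate.iter().collect();
        if still_fails(&text) {
            // The failure survives without this character: keep the
            // shorter input and retry at the same index.
            current = candidate;
        } else {
            // This character is needed to reproduce the failure.
            i += 1;
        }
    }
    current.into_iter().collect()
}
```

In practice the predicate would run the parser and report whether it panicked (for example via std::panic::catch_unwind). For trying the helper out on its own, any stand-in closure works, such as one that checks whether the string still contains a particular substring.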
#[test]
fn fuzz() {
    Expr::parse("(?x)\\\x0a3");
}
Or using a space instead of a newline, and a raw string to not have to escape the backslash:
#[test]
fn fuzz() {
    Expr::parse(r"(?x)\ 3");
}
The bug can also be seen in tools that rely on the regex crate, such as ripgrep.
After that, I looked at the code, found the problem and a solution. You can see that in the pull request for regex.
So that’s it, pretty simple! Let’s fuzz all the things and find all the bugs! Have a look at the LibFuzzer page for details about how it works.