In a previous entry we gave a brief introduction to the concept of fuzzing and why we use it. In this entry we’ll guide you through using a fuzzer on Linux to help identify bugs and vulnerabilities in Linux’s main archiving application “tar”.
A little about the fuzzer
“American Fuzzy Lop” (AFL), as well as being a variety of rabbit, is a well-designed and versatile file fuzzer. The software is built and maintained by Michal Zalewski and offered freely under an Apache 2.0 licence. As a file fuzzer, AFL targets issues in data parsing, particularly in binary file formats, whose parsers are error-prone and therefore strong candidates for fuzzing.
AFL has two main components: an instrumentation suite that gets our target application ready for fuzzing, and the fuzzer itself, which controls mutation of the input files, execution and monitoring of the target.
The instrumentation suite allows binaries to be prepared for fuzzing in one of two ways. The preferred way is to rebuild the target using the AFL compiler, which is stated to be “a drop-in replacement for gcc or clang”. This allows accurate injection of code that lets the fuzzer trace the path of execution that each file takes when run through the target. Where the source code isn’t available, however, AFL also has experimental support for instrumenting native binaries through QEMU. This is a more complex topic, and in this tutorial we’ll focus on instrumentation via source code.
The fuzzer itself is the real “brains” of the utility. It takes an initial set of samples to benchmark what is considered “normal” behaviour, reduces the input complexity and creates the “input queue”. It then begins fuzzing by taking an item from the input queue (1), using targeted algorithms to mutate the input (2), and then using these derived children as input to the target application (3). A monitor observes each execution (4), and where new behaviours are observed the derived input which caused them is added back into the “input queue” so it too can be further mutated (5). Note that different behaviour is not necessarily “bad”, just different. This cumulative mutation helps the fuzzer exercise the target more fully by achieving better “coverage”.
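To make the five numbered steps above more concrete, here is a toy model of the loop written in bash. It is only a sketch and not AFL's actual algorithm: `toy_target`, `mutate`, `fuzz_loop` and the “secret” string are all invented for illustration, the “execution path” is simply the number of leading characters that match the secret, and the mutation is a plain character substitution walked in a fixed order rather than anything AFL really does.

```shell
#!/usr/bin/env bash
# Toy model of a coverage-guided fuzzing loop. Everything here is hypothetical.

# Stand-in for the instrumented target: reports how many leading characters
# of the input match "FUZZ". That count plays the role of "execution path".
toy_target() {
    local input=$1 secret="FUZZ" i
    for ((i = 0; i < ${#secret}; i++)); do
        [[ "${input:i:1}" == "${secret:i:1}" ]] || break
    done
    echo "$i"
}

# Replace the character at position $2 of $1 with character $3.
mutate() {
    local input=$1 pos=$2 ch=$3
    echo "${input:0:pos}${ch}${input:pos+1}"
}

# Run the loop from a single seed and print the final input queue.
fuzz_loop() {
    local -a queue=("$1")
    local seen=" $(toy_target "$1") "       # paths observed so far
    local alphabet="FUZ" idx=0 item pos k child path
    while ((idx < ${#queue[@]})); do
        item=${queue[idx]}                                        # (1) take an item
        for ((pos = 0; pos < ${#item}; pos++)); do
            for ((k = 0; k < ${#alphabet}; k++)); do
                child=$(mutate "$item" "$pos" "${alphabet:k:1}")  # (2) mutate it
                path=$(toy_target "$child")                       # (3) run, (4) observe
                if [[ $seen != *" $path "* ]]; then               # (5) new behaviour?
                    seen+="$path "
                    queue+=("$child")                             # queue it for mutation
                fi
            done
        done
        idx=$((idx + 1))
    done
    printf '%s\n' "${queue[@]}"
}

fuzz_loop "AAAA"
```

Starting from the seed `AAAA`, the queue grows one step at a time (`FAAA`, `FUAA`, `FUZA`, `FUZZ`). Each of those inputs was kept only because it reached a path not seen before, which is the “cumulative mutation” described above.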
Downloading and building AFL
We’ll be working with AFL version 1.56b (the latest version available at this time) on a vanilla Ubuntu 14.10 virtual machine. The commands might differ slightly if you are using a different version or operating system, but should be similar. The fuzzer is built from source code, so our first task is to get and build this code. This can be done in the following manner:
What the steps actually do:
1. Get the “American Fuzzy Lop” source code and extract it on your local machine.
[cpp]
user@ubuntu:~$ wget -q http://lcamtuf.coredump.cx/afl/releases/afl-1.56b.tgz
user@ubuntu:~$ tar zxf afl-1.56b.tgz
[/cpp]
2. Build the fuzzer from the source code.
[cpp]
user@ubuntu:~$ cd afl-1.56b
user@ubuntu:~/afl-1.56b$ make -s
[*] Checking for the ability to compile x86 code...
[+] Everything seems to be working, ready to compile.
[*] Testing the CC wrapper and instrumentation output...
[+] All right, the instrumentation seems to be working!
[+] All done! Be sure to review README - it's pretty short and useful.
NOTE: If you can read this, your terminal probably uses white background.
This will make the UI hard to read. See docs/status_screen.txt for advice.
[/cpp]
Assuming everything was successful, you should now have all the tools built in the directory. These can be used to fuzz our target application.
Instrumenting a binary from source
We are going to test the Linux “tar” archive utility. To do this we’ll start by building a version of the target from source using the AFL compiler so that it can be instrumented. The instrumentation process injects code that communicates with American Fuzzy Lop’s monitoring system, allowing the fuzzer to accurately observe the path of execution through the target.
1. Start by getting the tar source code
[cpp]
user@ubuntu:~/afl-1.56b$ cd
user@ubuntu:~$ wget -q http://ftp.gnu.org/gnu/tar/tar-latest.tar.gz
user@ubuntu:~$ tar zxf tar-latest.tar.gz
[/cpp]
2. Next we need to build our target using AFL’s compiler. The correct way to do this can be complex and AFL’s README covers this topic in greater detail. For tar we can do the following:
[cpp]
user@ubuntu:~$ cd tar-1.28
user@ubuntu:~/tar-1.28$ CC=~/afl-1.56b/afl-gcc ./configure
[Output truncated for this blog article]
user@ubuntu:~/tar-1.28$ make
[Output truncated for this blog article]
[/cpp]
3. Assuming the build is successful this will result in an instrumented version of tar being built in ‘src’ as shown below:
[cpp]
user@ubuntu:~/tar-1.28$ file src/tar
src/tar: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=28d3095ca5633069721e79cb09bba7ff6d6110b7, not stripped
[/cpp]
Fuzzing commences
Finally, to the actual fuzzing. First, just a little more preparation. American Fuzzy Lop requires two directories: one for the inputs, and a working directory that will contain the current state and output information, the exact structure of which we’ll cover later. Let’s start by creating an input and output directory and the initial sample file for the fuzzer to work with (if you want to make more, that’s great). We’ll also create a third directory to collect any output from tar.
[cpp]
user@ubuntu:~/tar-1.28$ cd
user@ubuntu:~$ mkdir fuzz-state fuzz-input fuzz-garbage
user@ubuntu:~$ cd fuzz-input/
user@ubuntu:~/fuzz-input$ echo "Hello World" > file1
user@ubuntu:~/fuzz-input$ echo "Hello Fuzzer" > file2
user@ubuntu:~/fuzz-input$ tar cfJ sample1.tar.xz *
user@ubuntu:~/fuzz-input$ rm file*
[/cpp]
The above commands create three directories, one for input, one for output and one for any garbage thrown out by our instrumented version of tar. The input directory will contain a single xz archive containing two simple files. The final piece of preparation involves preventing core dump notifications being passed to apport, as this can hinder the fuzzer if the application actually crashes.
[cpp]
user@ubuntu:~$ echo core | sudo tee /proc/sys/kernel/core_pattern
[/cpp]
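Before launching the fuzzer it can be worth checking the preparation so far. The following is a minimal sketch, assuming bash and a system `tar` that can read the seed files. The function name `check_setup` is made up for this example, and treating a leading `|` in `core_pattern` as “core dumps are piped to a helper such as apport” is an assumption about how the kernel setting is laid out.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: not part of AFL.
# Usage: check_setup <seed-dir> [core-pattern-file]
check_setup() {
    local seed_dir=$1 pattern_file=${2:-/proc/sys/kernel/core_pattern}
    local seed status=0 found=0

    # Every regular file in the input directory should be listable by tar.
    for seed in "$seed_dir"/*; do
        [[ -f $seed ]] || continue
        found=1
        if ! tar tf "$seed" > /dev/null 2>&1; then
            echo "seed is not a readable archive: $seed"
            status=1
        fi
    done
    if ((found == 0)); then
        echo "no seed files found in $seed_dir"
        status=1
    fi

    # Assumption: a pattern starting with '|' means core dumps are piped
    # to an external helper, which is what we are trying to avoid.
    if [[ -r $pattern_file ]] && [[ $(head -c 1 "$pattern_file") == "|" ]]; then
        echo "core_pattern still pipes to a helper: $(cat "$pattern_file")"
        status=1
    fi
    return "$status"
}
```

Running `check_setup ~/fuzz-input` from the home directory should print nothing and return zero when the sample archive is readable and the `core_pattern` change above has taken effect.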
To start fuzzing:
[cpp]
user@ubuntu:~$ ./afl-1.56b/afl-fuzz -i fuzz-input/ -o fuzz-state/ -t 10000 ~/tar-1.28/src/tar xfJ @@ -C fuzz-garbage/ --force-local
[/cpp]
That's a bit of a mouthful, so let's break it down a little.
1. Run the fuzzing tool: ./afl-1.56b/afl-fuzz
a. Source input samples from ‘fuzz-input’: -i fuzz-input/
b. Use the following directory for state and output: -o fuzz-state/
c. Timeout of 10 seconds (usually this isn’t needed): -t 10000
d. Everything else is the command line to fuzz, in our case: ...
2. Run our instrumented “tar” binary: ~/tar-1.28/src/tar
a. Extract the contents of the following XZ archive file: xfJ
b. @@ is replaced by the fuzzer with a fuzzed file name: @@
c. Write output to fuzz-garbage: -C fuzz-garbage/
d. Required due to the naming convention used by AFL: --force-local
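The `@@` placeholder is easier to picture with a small dry run. The helper below is a hypothetical illustration in bash, not something shipped with AFL. It substitutes a file path for `@@` in a command line and prints the result, which is roughly what the target ends up being invoked with. The path `fuzz-state/.cur_input` is an assumption used purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show a command line with @@ replaced by a file path.
# Usage: expand_placeholder <input-path> <command> [args...]
expand_placeholder() {
    local input_path=$1
    shift
    local -a expanded=()
    local arg
    for arg in "$@"; do
        # Replace every occurrence of @@ inside this argument.
        expanded+=("${arg//@@/$input_path}")
    done
    printf '%s\n' "${expanded[*]}"
}

# Dry run using the tar command line from this article.
expand_placeholder fuzz-state/.cur_input \
    ~/tar-1.28/src/tar xfJ @@ -C fuzz-garbage/ --force-local
```

Nothing is executed here. The function only prints the expanded command, so it is safe to use for checking that the `@@` sits where the target expects a file name.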
Interpreting the output
The following is a brief overview of the UI. This will be displayed whilst the fuzzer is running.
1. Process timing: How long things have been running, and how long since we last saw results.
2. Cycle progress: How far through the input queue we are.
3. Stage progress: How far through mutating the current file we are.
4. Details about which methods of mutation are yielding the most new behaviours and results.
5. General overview of the fuzzer’s current state.
6. Coverage metrics (how much of the target we have found paths through).
7. Information about the number of execution paths, exceptions and hangs we have found.
8. Information about the execution paths we have found.
For each unique hang and error, the fuzzer will create a symbolic link to the file that caused the behaviour in the output directory at “fuzz-state/hangs” and “fuzz-state/crashes” respectively. The symbolic link will point to the file in the input queue, so beware of modifying or moving these directly while the fuzzer is running. Another interesting metric is “variable” (in “path geometry”, number 8 above), which indicates paths that are giving variable or unpredictable behaviour. This could be by design, via a call to rand() for example, or an indicator of uninitialised memory. Files with variable behaviour are symbolically linked from “fuzz-state/queue/.state/variable_behavior”.
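Once files start appearing in “fuzz-state/crashes”, a natural next step is to replay them against the target outside the fuzzer. The following is a minimal sketch in bash under a few assumptions: the function name `replay_crashes` is invented for this example, `@@` is used as the file placeholder in the same way as on the afl-fuzz command line, and an exit status above 128 is read as “killed by a signal”, following the usual shell convention.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: not part of AFL.
# Usage: replay_crashes <crash-dir> <command> [args... with @@ for the file]
replay_crashes() {
    local crash_dir=$1
    shift
    local -a cmd=("$@") run
    local crash arg rc

    for crash in "$crash_dir"/*; do
        [[ -f $crash ]] || continue          # follows symbolic links

        # Build the command line, swapping @@ for the current file.
        run=()
        for arg in "${cmd[@]}"; do
            if [[ $arg == "@@" ]]; then run+=("$crash"); else run+=("$arg"); fi
        done

        rc=0
        "${run[@]}" > /dev/null 2>&1 || rc=$?

        # Shell convention: 128 + N means the process died from signal N.
        if ((rc > 128)); then
            echo "signal $((rc - 128)): $crash"
        else
            echo "exit $rc: $crash"
        fi
    done
}
```

For the setup in this article the call would look something like `replay_crashes fuzz-state/crashes ~/tar-1.28/src/tar xfJ @@ -C fuzz-garbage/ --force-local`. Keeping `--force-local` matters if the crash file names contain colons, since tar would otherwise try to treat them as remote locations.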
Fuzzing in less than 3 minutes?
Feeling lazy? From a clean install of Ubuntu 14.10 you could be fuzzing in under 3 minutes.
[cpp]
user@ubuntu:~$ echo core | sudo tee /proc/sys/kernel/core_pattern
core
user@ubuntu:~$ wget -q -O - http://nettitude.com/scripts/fuzz-xz.sh | bash
[/cpp]