Fuzz testing, or fuzzing, is a way of stress testing services by sending them potentially unexpected input data. I remember being very impressed by one of the early descriptions of testing software this way (Miller, Barton P., Louis Fredriksen, and Bryan So. 1990. "An empirical study of the reliability of UNIX utilities". Communications of the ACM. 33 (12): 32-44), but had never tried the technique.
Recently, however, Jenny Toves spent some time extending VIAF's date parsing software to handle dates associated with people in WorldCat. As you might imagine, passing through a hundred million new date strings found some holes in the software. While we can't guarantee that the parsing always gives the right answer, we would like to be as sure as we can that it won't blow up and cause an exception.
So, I looked into fuzzing. Rather than sending random strings to the software, the normal techniques now used tend to generate them based on a specification or by fuzzing existing test cases. Although we do have something close to a specification based on the regular expressions the code uses, I decided to try making changes to the date strings we have that are derived from VIAF dates.
Most frameworks for fuzzing are quite loosely coupled, typically they pass the fuzzed strings to a separate process that is being tested. Rather than do that, I read in each of the strings, did some simple transformations on it and called the date parsing routine to see if it would cause an exception. Here's what I did for each test string, typically for as many times as the string was long. At each step the parsing is called
- Shuffle the string ('abc' might get replaced by 'acb')
- Change the integer value of character up or down (e.g. 'b' would get replaced by 'a' and then by 'c')
- Change each character to a random Unicode character
For our 384K test strings this resulted in 1.9M fuzzed strings. This took about an hour to run on my desktop machine.
While the testing didn't find all the bugs we knew about in the code, it did manage to tickle a couple of holes in it, so I think the rather minimal time taken (less than a day) was worth it, given the confidence it gives us that the code won't blow up on strange input.
The date parsing code in GitHub will be updated soon. Jenny is adding support for Thai dates (different calendar) and generally improving things.
Possibly the reason I thought of trying fuzzing was an amazing post on lcamtuf's blog Pulling JPEGs out of thin air. That post is really amazing. By instrumenting some JPEG software so that his fuzzing software could follow code paths at the assembly level, he was able to create byte strings representing valid JPEG images by sending in fuzzed strings, a truly remarkable achievement. My feeling on reading it was very similar to my reaction reading the original UNIX testing article cited earlier.
--Th
Now I'm curious how much work it would take to wire your date parser up so afl-fuzz can compile & run it. I've been testing it on some other libraries and the impressiveness is not marketing – it's pulled out a bunch of edge cases relatively quickly and did a good job of identifying unique crashing paths to avoid wasting too much time on the same failing path.
Reply: Seems as though you would want to watch it run at the Python interpreter level. I'm not sure how you would do that, but it should be possible. --Th
Posted by: Chris Adams | February 24, 2015 at 14:15