Welcome to Day 3 of the Advent of Radare2!
Today we will explore the functions within a binary and discuss ways to improve code coverage during static analysis.
Radare2 offers several useful commands for listing functions and gathering statistics, which are essential steps for understanding the function landscape. Different static analysis techniques offer varying levels of insight, which can help us identify potential dead or unreachable code, improve code coverage, or get better function boundaries.
Finally, the function listing and its statistics will help us understand how well the chosen analysis options work for our specific purposes.
Let’s start by loading the program. Right after opening it in radare2 we get an empty function listing:
$ r2 /bin/ls
[0x100003a58]> afl
[0x100003a58]>
It is a common misconception that functions should show up right away. By default radare2 does not analyze anything, and functions are discovered during analysis, so at this point the function list is empty. What you probably want here is to enumerate the symbols with the is command.
So, well, let’s do some basic analysis. In radare2 the -A command line flag maps to aaa, therefore appending another A results in running one more a after the aaa. The more a’s you add, the more analysis steps are performed. Note that these aa commands are the standard, generic code analysis commands. This means they are tested to work reasonably well for most common binaries, but they are not designed to be the most performant, and sometimes they produce false positives or cover less than we would like. We will learn more tricks to make the code analysis work best for our needs later on in this post.
Coming back to the afl command, we can now enumerate all the recognized functions within a binary. The listing contains the address, function size, number of basic blocks and name:
[0x04007000]> afl
0x00400700 16 1 sym.main
0x00400720 32 1 sym.func1
0x00400740 64 3 sym.func2
...
This command is excellent for a quick overview of all the functions Radare2 has recognized, giving you insight into function entry points, sizes, and basic block counts. You can also use the Vv command to navigate them all in visual mode.
If you need a CSV (comma-separated values) format for easier handling of the output, you can use:
[0x04007000]> afl,:csv
This format is useful if you want to import the function list into spreadsheet software or analyze it with command-line tools.
Let’s investigate some comma expressions and learn useful use cases for them:
[0x04007000]> , :help
RTableQuery> comma separated. 'c' stands for column name.
c/sort/inc sort rows by given colname
c/sortlen/inc sort rows by strlen()
c/cols/c1/c2 only show selected columns
c/gt/0x800 grep rows matching col0 > 0x800
c/lt/0x800 grep rows matching col0 < 0x800
c/eq/0x800 grep rows matching col0 == 0x800
c/ne/0x800 grep rows matching col0 != 0x800
*/uniq get the first row of each that col0 is unique
*/head/10 same as | head -n 10
*/skip/10 skip the first 10 rows
*/tail/10 same as | tail -n 10
*/page/1/10 show the first 10 rows (/page/2/10 will show the 2nd)
c/str/warn grep rows matching col(name).str(warn)
c/nostr/warn grep rows not matching col(name).str(warn)
c/strlen/3 grep rows matching strlen(col) == X
c/minlen/3 grep rows matching strlen(col) > X
c/maxlen/3 grep rows matching strlen(col) < X
c/sum sum all the values of given column
:r2 .tostring() == .tor2() # supports import/export
:csv .tostring() == .tocsv() # supports import/export
:tsv .tostring() == .totsv() # supports import/export
:fancy .tostring() == .tofancystring()
:html .tostring() == .tohtml()
:json .tostring() == .tojson()
:simple simple table output without lines
:sql .tostring() == .tosql() # export table contents in SQL statements
:header show column headers (see :quiet and :noheader)
:quiet do not print column names header
Dump the list of functions as tab-separated values to use it later with standard spreadsheet software:
[0x04007000]> afl,:tsv > functions.tsv
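Once the listing is saved, standard command-line tools can work on it too. A minimal sketch, assuming the first line of the dump is a header row with column names such as size and nbbs (the names used in the table listings of this post). The sum_column helper is hypothetical and the three sample rows are copied from the /bin/ls listing in this post:

```shell
#!/bin/sh
# Sum one numeric column of a TSV dump, looked up by its header name.
# Assumption: line 1 is a header row with tab-separated column names.
sum_column() {
  awk -F '\t' -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { total += $idx }
    END     { print total + 0 }
  ' "$2"
}

# Illustrative sample standing in for: afl,:tsv > functions.tsv
sample=$(mktemp)
printf 'addr\tsize\tname\tnbbs\n' > "$sample"
printf '0x100003a58\t3020\tmain\t135\n' >> "$sample"
printf '0x100004b44\t2220\tsym.func.100004b44\t124\n' >> "$sample"
printf '0x100005638\t1836\tsym.func.100005638\t82\n' >> "$sample"

sum_column size "$sample"   # total bytes covered by the listed functions
sum_column nbbs "$sample"   # total number of basic blocks
rm -f "$sample"
```

Looking the column up by name rather than by position keeps the helper working if the column order differs between radare2 versions.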
Get the top 10 functions with the most basic blocks:
[0x100003a58]> afl,nbbs/sort/dec,/head/10
addr size name noret nbbs nins refs xref axref calls cc file
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0x100003a58 3020 main 1 135 518 404 0 155 20 147
0x100004b44 2220 sym.func.100004b44 0 124 555 245 2 88 26 67
0x100005638 1836 sym.func.100005638 0 82 459 277 1 58 29 47
0x100005e00 804 sym.func.100005e00 0 42 201 120 1 37 12 24
0x100006f18 604 sym.func.100006f18 0 39 151 70 1 145 7 24
0x10000644c 820 sym.func.10000644c 0 38 205 103 1 25 7 23
0x1000047a8 744 sym.func.1000047a8 0 36 134 73 1 37 9 34
0x1000069ac 400 sym.func.1000069ac 0 21 100 67 1 12 4 16
0x100006b3c 332 sym.func.100006b3c 0 17 83 38 1 12 6 10
0x100006780 360 sym.func.100006780 0 17 90 47 1 14 3 8
[0x100003a58]>
Find which functions have the most xrefs:
[0x100003a58]> afl,xref/sort/dec,/head/5
addr size name noret nbbs nins refs xref axref calls cc file
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
0x100007708 16 sym.imp.putchar 0 1 4 3 28 28 0 1
0x1000076f8 16 sym.imp.printf 0 1 4 3 21 21 0 1
0x100007788 16 sym.imp.snprintf 0 1 4 3 11 11 0 1
0x1000075e8 16 sym.imp.getenv 0 1 4 3 10 10 0 1
0x100007798 16 sym.imp.strcmp 0 1 4 3 9 9 0 1
[0x100003a58]>
There are many ways to count the number of functions found. Typically we can just use afl~?, which is the same as afl | wc -l, but a shorter and faster way is aflc.
This gives you a quick function count, helping you estimate the scope of the binary’s coverage.
[0x100003a58]> afl~?
135
[0x100003a58]> aflc
135
Another interesting technique to spot more functions is to perform a linear scan over the regions specified by the anal.in variable (set with -e anal.in), which by default targets the regions with executable permissions.
The aap command analyzes function preludes, identifying entry points that might have been missed by the recursive analysis techniques.
Function preludes are the first instructions commonly found at the beginning of functions. Modern compilers tend to be more creative when generating code, but this technique is still useful because most of them use generic patterns. For example, on ARM64:
[0x100003a58]> pi 3
pacibsp
stp x28, x27, [sp, -0x60]!
stp x26, x25, [sp, 0x10]
or on X86 we can find:
[0x00005ae0]> pi 3
endbr64
xor ebp, ebp
mov r9, rdx
[0x00005ae0]>
or even the classic and well-known push ebp; mov ebp, esp:
[0x08048d60]> pi 3
push ebp
mov ebp, esp
push ebx
[0x08048d60]>
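The idea behind prelude matching can be sketched as a lookup on the first bytes of a function, given as a hex string in the format p8 prints. The prelude_name helper is hypothetical, and its pattern table is a small illustrative subset using the usual encodings of the instructions shown above; it is not the list radare2 ships with:

```shell
#!/bin/sh
# Name the prelude a function starts with, given its first bytes as hex.
# The patterns below are an illustrative subset, not radare2's own table.
prelude_name() {
  case "$1" in
    7f2303d5*) echo "arm64: pacibsp" ;;
    f30f1efa*) echo "x86-64: endbr64" ;;
    5589e5*)   echo "x86-32: push ebp; mov ebp, esp" ;;
    *)         echo "unknown" ;;
  esac
}

prelude_name 5589e553   # push ebp; mov ebp, esp; push ebx
prelude_name 7f2303d5   # pacibsp
prelude_name 90909090   # not a prelude in this table
```

A real scanner would slide this check over every aligned offset of the executable regions, which is why short or overly generic patterns tend to produce false positives.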
The default prelude patterns are provided by the RArch plugin. If we are analyzing a new architecture or compiler, we can provide our own custom patterns, which we can work out by running several analysis commands and then manually reviewing the results of this oneliner:
p8 16 @@F |sort -u
We can play with different lengths to determine which is the shortest common pattern used to start functions, for example on ARM64:
[0x100003a58]> p8 4 @@F | uniq -c | sort | tail -n 4
6 7f2303d5
18 110000b0
20 7f2303d5
66 110000b0
[0x100003a58]>
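A sketch of the same kind of count with a sort placed in front of uniq -c, so that each pattern gets a single total. The input lines are made up for illustration, and count_patterns is a hypothetical helper name, not a radare2 command:

```shell
#!/bin/sh
# Count how often each line on stdin occurs and print the N most common.
# The first sort groups identical lines so uniq -c totals each one once;
# the second sort orders the totals from highest to lowest.
count_patterns() {
  sort | uniq -c | sort -rn | head -n "${1:-4}"
}

# Made-up input standing in for the output of: p8 4 @@F
printf '%s\n' 7f2303d5 110000b0 7f2303d5 110000b0 110000b0 ff4300d1 |
  count_patterns 2
```

Inside an r2 session the equivalent would be to pipe p8 4 @@F through the same sort, uniq -c and sort -rn chain.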
Let’s analyze the p8 oneliner:
p8 4 - print 4 bytes, on arm64 this is 1 instruction
@@F - on every function found (assuming we ran aaa before)
uniq -c - count how many times each pattern is repeated; only adjacent duplicates are merged, which is why the same pattern shows up in more than one row above
sort | tail -n 4 - show the 4 most common instructions used to start a function
The aac command performs analysis on call destinations, exploring potential references within the binary to uncover additional code paths. It does a linear scan over the executable regions looking for CALL instructions, checks that each destination points to a valid area with code, and runs af on that destination.
There are additional options like anal.hasnext, which assumes that another function starts right where the previous one ends, or anal.calls, which gives us better code coverage and spots more functions with very few false positives.
To get an overview of code coverage and other statistics, Radare2 provides the a command, which displays analysis results, including the number of functions found, code coverage percentages, and analysis depth.
To use it, just run a, which is an alias for the aai command.
This command is a powerful way to review the effectiveness of the analysis, especially after applying aap, aac, and setting anal.hasnext. It allows you to adjust your analysis strategy based on coverage, helping you achieve a more complete understanding of the binary.
[0x100003a58]> aai
fcns 135
xrefs 2250
calls 330
strings 97
symbols 140
imports 84
covrage 17148
codesz 32768
percent 52%
[0x100003a58]>
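The percent field can be cross-checked from the two values above it. A minimal sketch, assuming percent is the covrage value times 100 divided by codesz and truncated to an integer; that formula matches the numbers shown in this post but is an assumption about how radare2 computes it. The coverage_percent helper is hypothetical:

```shell
#!/bin/sh
# Recompute the coverage percentage from aai-style key/value lines.
# Assumption: percent = covrage * 100 / codesz, truncated to an integer.
# Note: "covrage" is spelled the way the aai output prints it.
coverage_percent() {
  awk '
    $1 == "covrage" { cov = $2 }
    $1 == "codesz"  { size = $2 }
    END { if (size > 0) printf "%d%%\n", int(cov * 100 / size) }
  '
}

# Values copied from the aai output above.
printf 'covrage 17148\ncodesz 32768\n' | coverage_percent
```

Inside a session the same helper could read the live values, for example by piping the output of aai into it from the shell.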
With afl and its variants (afl, afll, afl, or aflc, to name some examples), you can learn from the output of different analysis commands and how they combine to achieve better code coverage.
To exercise this, we will write a shell script that runs different analysis commands and shows some statistics about each one:
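#!/bin/sh
# Run each analysis command with anal.hasnext off and on, and print the aai numbers
BINFILE="/usr/bin/awk"
CMDS="aa aaa aaaa aaaa aap aac aab afr"
for a in ${CMDS} ; do
  for b in anal.hasnext=false anal.hasnext=true ; do
    # aai~[1] keeps only the value column; tr joins the values on one line
    c=$(r2 -q -e "${b}" -c "${a}" -c 'aai~[1]' "${BINFILE}" 2> /dev/null | tr '\n' ' ')
    # printf expands \t in every shell, unlike echo
    printf '%s\t%s\t%s\n' "$a" "$b" "$c"
  done
done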
The output for this script is the following:
Columns: fcns xrefs calls strings symbols imports covrage codesz percent
-------------------------------------------------------------------------
aa anal.hasnext=false 262 3135 1243 481 284 74 85440 114688 74%
aa anal.hasnext=true 288 3186 1294 481 284 74 85544 114688 74%
aaa anal.hasnext=false 265 8198 1453 481 284 74 85676 114688 74%
aaa anal.hasnext=true 291 8249 1504 481 284 74 85780 114688 74%
aaaa anal.hasnext=false 265 8198 1453 481 284 74 85676 114688 74%
aaaa anal.hasnext=true 291 8249 1504 481 284 74 85780 114688 74%
aaaa anal.hasnext=false 265 8198 1453 481 284 74 85676 114688 74%
aaaa anal.hasnext=true 291 8249 1504 481 284 74 85780 114688 74%
aap anal.hasnext=false 346 3070 1006 481 284 74 84220 114688 73%
aap anal.hasnext=true 305 3187 1291 481 284 74 85704 114688 74%
aac anal.hasnext=false 206 2852 1414 481 284 74 68688 114688 59%
aac anal.hasnext=true 262 3201 1464 481 284 74 78444 114688 68%
aab anal.hasnext=false 276 0 0 481 284 74 67996 114688 59%
aab anal.hasnext=true 276 0 0 481 284 74 67996 114688 59%
afr anal.hasnext=false 66 1004 227 481 284 74 39036 114688 34%
afr anal.hasnext=true 215 1998 698 481 284 74 61940 114688 54%
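To read the effect of anal.hasnext out of a table like this, the two rows of each command can be subtracted. A small sketch, assuming the input has the command in the first column, the setting in the second and the function count in the third, as printed by the script above; hasnext_delta is a hypothetical helper and the rows are copied from the table:

```shell
#!/bin/sh
# For each analysis command, print how many functions anal.hasnext=true
# adds (or removes) compared to anal.hasnext=false.
# Input columns: command, setting, fcns (any further columns are ignored).
hasnext_delta() {
  awk '
    $2 == "anal.hasnext=false" { off[$1] = $3 }
    $2 == "anal.hasnext=true"  { if ($1 in off) printf "%s %d\n", $1, $3 - off[$1] }
  '
}

# Rows copied from the table above (first three columns only).
hasnext_delta <<'EOF'
aa anal.hasnext=false 262
aa anal.hasnext=true 288
aap anal.hasnext=false 346
aap anal.hasnext=true 305
afr anal.hasnext=false 66
afr anal.hasnext=true 215
EOF
```

In the table above, aap is the only command where enabling the option yields fewer functions, which is a good place to start looking for false positives.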
Obviously /usr/bin/awk is probably not the best target for this script, but it serves to show that by combining different analysis strategies like aap and aac, and enabling anal.hasnext, you can greatly enhance code coverage and function discovery in Radare2.
These techniques will help us discover hidden functions or missing xrefs, providing valuable insights into the binary’s behavior and code structure. We must also understand that finding more functions doesn’t mean having better results, because the order of analysis, the type propagation and other important details need further analysis than just plain statistics.
Now it’s your turn to test the script on your favourite binaries. Be careful when analyzing the results, and compare the differences in the functions discovered while looking for false positives. With luck you will spot some good new generic tips to improve the default analysis.
As an exercise for today, discuss, test and share your thoughts about all these techniques and which analysis behaviours work best for your personal use cases. You’ll surely find better ways to analyze your binaries and reduce processing times, which clearly makes a difference when working with large binaries at scale.
Stay tuned for tomorrow’s advent task, where we’ll explore further aspects of binary analysis with Radare2!
–pancake