Welcome to Day 7 of the Radare2 Advent of Code!
This is an advanced guide on leveraging Radare2 to analyze symbols and functions in binaries. It expands on essential techniques, including listing the reverse callgraph, identifying symbols as functions or data, and improving auto-naming with AI tools like r2ai.
In the context of binary analysis, symbols are identifiers for functions, global variables, and other entities within the binary. Symbols can represent internal or external functions, variables, or data. They play a crucial role in navigating and interpreting binaries, as they help label addresses with meaningful names instead of raw hexadecimal values.
In Radare2 we have several commands that unify the metadata from different file formats, so we don't need to dig into petools or objdump to check the symbol visibility, content types, etc.
From a global perspective, we want to find out at least three types of symbols:
is: Lists all symbols in the binary.
ii: Lists all imported symbols (functions and data).
iE: Lists all exported symbols.
Although all of these commands respond with named addresses, there are some subtle details that make them different.
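The exact rules depend on the file format. As a rough illustration, here is a minimal Python sketch of how an ELF symbol can be sorted into one of these buckets. Undefined symbols (section index SHN_UNDEF) are imports. Defined GLOBAL or WEAK symbols are exports. Everything else stays local. The Symbol class and the sample entries are hypothetical; real tools read these fields from the .symtab and .dynsym sections.

```python
from dataclasses import dataclass

# ELF uses section index 0 (SHN_UNDEF) for symbols defined in another module.
SHN_UNDEF = 0

@dataclass
class Symbol:
    name: str
    kind: str   # "FUNC" (function) or "OBJECT" (data)
    bind: str   # "LOCAL", "GLOBAL" or "WEAK"
    shndx: int  # section index where the symbol is defined

def classify(sym: Symbol) -> str:
    """Return which listing a symbol conceptually belongs to (simplified)."""
    if sym.shndx == SHN_UNDEF:
        return "import"  # resolved at load time from another module
    if sym.bind in ("GLOBAL", "WEAK"):
        return "export"  # defined here and visible to other modules
    return "local"       # defined here, only visible inside this binary

# Hypothetical sample entries (section indexes 14 and 24 are made up).
symbols = [
    Symbol("puts", "FUNC", "GLOBAL", SHN_UNDEF),
    Symbol("environ", "OBJECT", "GLOBAL", SHN_UNDEF),  # an imported *data* symbol
    Symbol("main", "FUNC", "GLOBAL", 14),
    Symbol("helper", "FUNC", "LOCAL", 14),
    Symbol("counter", "OBJECT", "LOCAL", 24),
]

for s in symbols:
    print(f"{classify(s):6} {s.kind:6} {s.name}")
```

Note that the function/data distinction (kind) is independent of the import/export one (bind and section).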
We should already know that most radare2 commands have a * variant (e.g., is*, ii*, iE*) that prints the listed addresses as radare2 commands which set flags when interpreted. Flags in radare2 act as named markers for specific addresses, making navigation and analysis easier.
For example, running is* lists all symbols as the flag commands that would create them. To interpret those commands instead of just printing them, prefix the command with a dot.
> f~sym # list flags containing sym
> f-sym* # remove all flags starting with sym
> .is* # set all the symbol flags again
As with anything in r2, we can also dump the output of a command into a file using the redirect operator >, like this:
is* > symbols.r2
Symbolication is especially needed when debugging, because radare2 is lazy here: it doesn't parse any file from disk and just uses the memory map names as a guide.
For those cases we can use the dmi command:
[0x7de3612bd540]> d?* | grep symb
| dmi [addr|libname] [symname] list symbols of target lib
| dmi* [addr|libname] [symname] list symbols of target lib in radare commands
| dmi. list closest symbol to the current address
| dmis [libname] same as .dmi* - import all symbols from given lib as flags
| dmiv show address of given symbol for given lib
[0x7de3612bd540]>
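Conceptually, dmi. answers the question "which symbol is this address in?". The following minimal Python sketch of that lookup assumes a sorted list of (address, name) pairs and ignores symbol sizes. It binary-searches for the last symbol at or below the address. It illustrates the idea and is not radare2's implementation; the symbols are made up.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical symbols of one library, sorted by address.
SYMBOLS = [
    (0x1000, "sym.init"),
    (0x1040, "sym.parse_args"),
    (0x10c0, "sym.main"),
]
ADDRS = [addr for addr, _ in SYMBOLS]

def closest_symbol(addr: int) -> Optional[str]:
    """Return 'name' or 'name+0xoff' for the nearest symbol at or below addr."""
    i = bisect_right(ADDRS, addr) - 1
    if i < 0:
        return None  # addr is below the first known symbol
    base, name = SYMBOLS[i]
    off = addr - base
    return name if off == 0 else f"{name}+0x{off:x}"

print(closest_symbol(0x10d4))  # sym.main+0x14
```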
Alternatively, use rabin2 to create a radare2 script that sets all the flags, rebased to the map we want, with a one-liner like this:
$ rabin2 -B <base_address> -r -s <binary_file>
This workflow is especially useful for creating scripts for binaries where the load address might differ (e.g., shared libraries or position-independent executables). It ensures that the symbols, imports, and exports remain correctly aligned to the new base address, simplifying subsequent analyses.
Feel free to pass -i or -E in addition to -s to get the full set.
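The rebasing done by -B is plain arithmetic: every address keeps its distance from the original base. Here is a minimal Python sketch of that idea. It emits flag commands in the `f name size @ addr` form. The symbol tuples are hypothetical, and the exact text that rabin2 -r prints may differ.

```python
def rebase_flags(symbols, old_base, new_base):
    """symbols: iterable of (name, vaddr, size), with vaddr relative to old_base.

    Returns radare2 flag commands placed at the same offsets from new_base.
    """
    lines = []
    for name, vaddr, size in symbols:
        new_addr = new_base + (vaddr - old_base)  # keep the offset, move the base
        lines.append(f"f sym.{name} {size} @ 0x{new_addr:x}")
    return lines

# Hypothetical PIE symbols (linked at base 0) loaded at a typical Linux PIE base.
syms = [("main", 0x1139, 42), ("helper", 0x1170, 16)]
for line in rebase_flags(syms, old_base=0x0, new_base=0x555555554000):
    print(line)
```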
Sometimes the symbols for a program are defined in external files, and each operating system handles this in a different way. Some even use symbol servers: given a hash of the binary, they provide the symbolication details for that specific build. This way users can run stripped builds while developers can still understand the unnamed backtraces from reported crashes.
On Linux and macOS we will probably find an ELF or Mach-O file containing only the DWARF sections; r2 will find it and load it automatically for you. But we can also use the rabin2 -rs trick explained above to load it by hand if needed.
On Windows we can find PDB files, which can be pulled from PDB servers using the idp commands. Check out the r2book for more details, but here is some basic help:
[0x00000000]> e~pdb
pdb.autoload = 0
pdb.extract = 1
pdb.server = https://msdl.microsoft.com/download/symbols
pdb.symstore = /home/pancake/.local/share/radare2/pdb
pdb.useragent = microsoft-symbol-server/6.11.0001.402
[0x00000000]> id?
Usage: idp Debug information
| id show DWARF source lines information
| idp [file.pdb] load pdb file information
| idpi [file.pdb] show pdb file information
| idpi* show symbols from pdb as flags (prefix with dot to import)
| idpd download pdb file on remote server
| idx display source files used via dwarf (previously known as iX)
[0x00000000]>
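Symbol servers like the one configured in pdb.server are commonly organized following Microsoft's SymSrv layout: `<server>/<pdb name>/<GUID><age>/<pdb name>`. The GUID and age come from the CodeView (RSDS) record embedded in the PE. The following Python sketch builds such a URL. We assume idpd follows this convention without having checked radare2's code, and the GUID used below is made up.

```python
import uuid

def pdb_url(server: str, pdb_name: str, guid_bytes_le: bytes, age: int) -> str:
    """Build a SymSrv-style download URL for a PDB (assumed layout).

    guid_bytes_le: the 16 raw GUID bytes as stored in the PE CodeView record,
    where the first three GUID fields are little-endian.
    """
    guid = uuid.UUID(bytes_le=guid_bytes_le)
    # Signature = 32 uppercase GUID hex digits followed by the age in hex.
    signature = guid.hex.upper() + f"{age:X}"
    return f"{server.rstrip('/')}/{pdb_name}/{signature}/{pdb_name}"

example_guid = uuid.UUID("12345678-9abc-def0-1122-334455667788")  # made-up GUID
print(pdb_url("https://msdl.microsoft.com/download/symbols",
              "ntdll.pdb", example_guid.bytes_le, 1))
```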
Additionally, Visual Studio generates vsmap files, which are no more than SQLite dump files and can be parsed with the vsmap.r2.js script.
r2frida handles the symbols of all the binaries loaded in memory, which can result in millions of flags being set when we probably only care about a few of them.
To minimize the pain, the r2frida :i subcommands only list the symbols, imports, classes, etc. available in the current binary in memory, so we need to seek to an address inside the library we are interested in:
[0x755f3a000000]> :il
0x0000650c45287000 0x0000650c452ab518 ls
0x00007ffd0c96f000 0x00007ffd0c9700ab linux-vdso.so.1
0x0000755f3a269000 0x0000755f3a2956c8 libselinux.so.1
0x0000755f3a000000 0x0000755f3a211d90 libc.so.6
0x0000755f39f66000 0x0000755f39fff310 libpcre2-8.so.0.11.2
0x0000755f3a2c7000 0x0000755f3a3002d8 ld-linux-x86-64.so.2
0x0000755f3a261000 0x0000755f3a265010 libdl.so.2
0x0000755f3a25c000 0x0000755f3a260010 librt.so.1
0x0000755f39e7d000 0x0000755f39f65018 libm.so.6
0x0000755f3a257000 0x0000755f3a25b010 libpthread.so.0
[0x755f3a000000]>
And then run .:iE* @ lib.libc.so.6 to set flags for all the exported symbols of the libc library in r2frida.
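Seeking "inside" a library just means picking an address within one of the ranges printed by :il. To illustrate that reasoning, this Python snippet parses the listing above and finds the module that contains a given address. It assumes the start, end, name column layout shown and treats the end address as exclusive.

```python
# The :il output from the session above: start, end, module name.
IL_OUTPUT = """\
0x0000650c45287000 0x0000650c452ab518 ls
0x00007ffd0c96f000 0x00007ffd0c9700ab linux-vdso.so.1
0x0000755f3a269000 0x0000755f3a2956c8 libselinux.so.1
0x0000755f3a000000 0x0000755f3a211d90 libc.so.6
0x0000755f39f66000 0x0000755f39fff310 libpcre2-8.so.0.11.2
0x0000755f3a2c7000 0x0000755f3a3002d8 ld-linux-x86-64.so.2
0x0000755f3a261000 0x0000755f3a265010 libdl.so.2
0x0000755f3a25c000 0x0000755f3a260010 librt.so.1
0x0000755f39e7d000 0x0000755f39f65018 libm.so.6
0x0000755f3a257000 0x0000755f3a25b010 libpthread.so.0
"""

def parse_il(text):
    """Turn each 'start end name' line into an (int, int, str) tuple."""
    modules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, name = line.split(maxsplit=2)
        modules.append((int(start, 16), int(end, 16), name))
    return modules

MODULES = parse_il(IL_OUTPUT)

def module_at(addr):
    """Return the name of the module whose [start, end) range holds addr."""
    for start, end, name in MODULES:
        if start <= addr < end:
            return name
    return None

print(module_at(0x755f3a000000))  # libc.so.6, the address in the prompt
```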
The aa command series in Radare2 helps automate the analysis of symbols, functions, and control flow. Each additional a (aa, aaa, aaaa) adds a layer of depth to the analysis.
Most of these analysis commands perform recursive function discovery, which means that even if the public symbols have names, the functions they call inside the binary don't necessarily have one.
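Here is a toy Python model of recursive function discovery. It starts from the named entry points, follows call targets, and gives every unnamed target an address-based default name, similar to the fcn.<address> names radare2 uses for unknown functions. The calls map stands in for what disassembly would reveal and is hypothetical.

```python
def discover(entries, calls):
    """entries: {addr: name} for known symbols.
    calls: {addr: [callee addrs]} as found by disassembling each function.
    Returns {addr: name} for every reachable function.
    """
    names = dict(entries)
    stack = list(entries)
    seen = set()
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue  # already walked (also breaks recursion cycles)
        seen.add(addr)
        for callee in calls.get(addr, []):
            # keep a real symbol name if we have one, else use a default name
            names.setdefault(callee, f"fcn.{callee:08x}")
            stack.append(callee)
    return names

calls = {0x1000: [0x1100, 0x1200], 0x1100: [0x1200, 0x1300]}
for addr, name in sorted(discover({0x1000: "main"}, calls).items()):
    print(hex(addr), name)
```

Only main keeps a meaningful name; everything reached only through calls ends up as fcn.<address>.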
This makes those inner functions harder to understand. There are ways to name them automatically, but first we need to learn about the importance of the order of the analysis.
Proper analysis order is essential for accurate function typing, name resolution, and reference propagation. The recommended sequence for optimal analysis starts from the external symbols and goes down to the lowest internal procedures.
After this, we probably want to run aac or aap to find all calls and function preludes, maybe also using -e anal.hasnext=true to enlarge the code coverage. Keep in mind that all the named functions and their type details should be exposed beforehand.
We can use the aflm command to list all the functions and the functions they call in "makefile" format. This is very readable and makes it easy to take a quick look at the callgraph.
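To show what "makefile format" means here, the following Python sketch renders a callgraph with each function as a target and its callees indented below it. The callgraph is hypothetical, and radare2's exact aflm layout may differ in details such as indentation.

```python
# Hypothetical callgraph: function name -> names of the functions it calls.
CALLS = {
    "main": ["sym.imp.puts", "fcn.00001100"],
    "fcn.00001100": ["sym.imp.strlen"],
}

def aflm_like(calls):
    """Render each function as a makefile-style target listing its callees."""
    lines = []
    for func, callees in calls.items():
        lines.append(f"{func}:")
        lines.extend(f"    {callee}" for callee in callees)
    return "\n".join(lines)

print(aflm_like(CALLS))
```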
The afla command, as noted in its help message, lists the functions in reverse call order:
[0x00006d30]> afl?~revers
| afla reverse call order (useful for afna and noret, also see afba)
Why does this order matter for analysis and for autonaming functions? For the same reasons as type propagation and function boundary detection, which we covered before: a function can only be typed and named accurately once the functions it calls have already been processed.
This command computes a callgraph, takes the nodes with no exits (functions that call nothing else), lists all the callers of each of them, and keeps doing that recursively until the whole graph is covered. It then prints the function offsets in that order, so we can use them in scripts or with the @@ foreach operator to name them all in the right order, like this:
.afna @@c:afla
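Any ordering where callees come before their callers serves this purpose. The following Python sketch obtains one with a depth-first post-order walk, which is a standard technique. The text above describes afla as starting from the leaves and walking up to the callers, so its exact output order may differ. The callgraph is hypothetical.

```python
def reverse_call_order(calls):
    """Return functions ordered so that callees come before their callers.

    calls: {func: [callees]}. Depth-first post-order walk; a function is
    marked visited on entry, so recursion cycles do not loop forever.
    """
    order, visited = [], set()
    funcs = set(calls) | {c for callees in calls.values() for c in callees}

    def visit(f):
        if f in visited:
            return
        visited.add(f)
        for callee in calls.get(f, []):
            visit(callee)
        order.append(f)  # emitted only after all of its callees

    for f in sorted(funcs):  # sorted just to make the output deterministic
        visit(f)
    return order

calls = {"main": ["parse", "run"], "run": ["log"], "parse": ["log"]}
print(reverse_call_order(calls))  # ['log', 'parse', 'run', 'main']
```

Naming functions in this order means that, by the time main is processed, parse, run, and log already have meaningful names to draw from.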
The @@? help also reveals that this useful action has an alias, @@F:
[0x00006d30]> @@?~recursive
| x @@F alias for @@c:afla - inverse recursive function list
Radare2 provides a basic autonaming algorithm, which relies on the symbols and strings referenced by a function to determine a better name that helps us navigate the function list and the callgraph.
The autonaming depends on two configuration variables. The first one makes the analysis run aan to name all the functions. The second chooses between a slow algorithm, which uses emulation, and a fast one, which uses just plain call references, to construct a unique name for each function.
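To make the idea concrete, here is a hypothetical toy heuristic in Python. It is not the algorithm in libr/core/canal.c. It prefers a distinctive imported call, then a word from a referenced string, and falls back to the address-based default name. It also appends a numeric suffix to keep every name unique. The BORING set and the naming patterns are arbitrary choices made for this sketch.

```python
import re

# Arbitrary list of imports too common to say much about a function.
BORING = {"malloc", "free", "memcpy", "memset", "strlen"}

def autoname(addr, imports_called, strings, taken):
    """Pick a name from calls and strings; 'taken' holds names already used."""
    candidate = None
    for imp in imports_called:
        if imp not in BORING:
            candidate = f"fcn.{imp}"  # e.g. a function calling socket -> fcn.socket
            break
    if candidate is None:
        for s in strings:
            words = re.findall(r"[A-Za-z]{4,}", s)  # first word of 4+ letters
            if words:
                candidate = f"fcn.str.{words[0].lower()}"
                break
    if candidate is None:
        candidate = f"fcn.{addr:08x}"  # nothing useful: keep the default name
    name, n = candidate, 1
    while name in taken:  # enforce uniqueness with a numeric suffix
        name = f"{candidate}_{n}"
        n += 1
    taken.add(name)
    return name

taken = set()
print(autoname(0x1000, ["malloc", "socket"], [], taken))        # fcn.socket
print(autoname(0x1200, [], ["Usage: %s file"], taken))          # fcn.str.usage
```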
Sadly, the built-in autonaming is rarely enough and still requires too much manual intervention.
The decai plugin from r2ai leverages the capabilities of language models to understand the logic of a function and find a good name that describes its decompilation, based on the function calls, referenced strings, type metadata, etc.
The following command applies these model-based suggestions to all the functions, walking them in reverse callgraph order:
decai -n @@F
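Whatever model is used, its answer is free-form text that must become a usable function name. This hypothetical Python helper is not part of decai. It keeps the first line of the answer and replaces any run of characters outside [A-Za-z0-9_] with an underscore, so the result is a conservative identifier. The set of characters radare2 actually accepts in flag names may be broader.

```python
import re

def sanitize_name(answer: str, max_len: int = 64) -> str:
    """Turn a free-form model answer into a conservative identifier."""
    text = answer.strip()
    first = text.splitlines()[0] if text else ""
    first = first.strip("`'\" ")  # drop quotes/backticks models like to add
    name = re.sub(r"[^A-Za-z0-9_]+", "_", first).strip("_")
    if not name:
        return "unnamed"
    if name[0].isdigit():
        name = "_" + name  # identifiers should not start with a digit
    return name[:max_len]

print(sanitize_name("`parse_config_file`\nThis function parses the config."))
```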
Analyzing symbols with Radare2 allows you to uncover the names hidden in the program and discover more functions.
As an exercise for today, I want you all to experiment with afna, aan and decai -n, and try to improve both the algorithm implemented in libr/core/canal.c (search for autoname_) and the prompt used by decai -n with local models (via ollama or the r2ai webserver), to get better final names in the listing of the afl command.
Do you have more symbolication tricks? Is there any symbol database or binary format not supported by radare2? Let us know!
Stay tuned for tomorrow’s challenge as we continue exploring Radare2’s features!