Welcome to Day 6 of the Radare2 Advent of Code!
Today, we’re diving into one of the most useful aspects of reverse engineering: finding cross-references (xrefs) to functions or strings within a binary.
By locating these references, you can spot the code that interacts with the strings or code you are interested in, giving you insight into how a binary operates.
Radare2 provides several commands to handle cross-references (xrefs). We’ll explore these commands for analyzing code and data, along with useful tricks to better understand the results and the relationships between different components.
Assuming the binary has been analysed (using r2 -A or aaa), we should have at least some xrefs to start playing with the ax subcommands. The axt command finds all “analysis xrefs to” a particular address.
For example, to find all references to the “puts” import symbol located in the PLT:
$ r2 -A /bin/ls
[0x100003a58]> axt sym.imp.puts
(nofunc) 0x100004940 [CALL:--x] bl sym.imp.puts
(nofunc) 0x100004984 [CALL:--x] bl sym.imp.puts
[0x100003a58]>
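If you want to post-process this kind of output in a script, a minimal sketch could look like the one below. It assumes the text layout shown above (function name, address, [TYPE:perm], disassembly); the Xref type and the parse_axt helper are hypothetical names made up for this example and are not part of radare2.

```python
import re
from typing import NamedTuple


class Xref(NamedTuple):
    function: str  # enclosing function name, or "(nofunc)"
    address: int   # address of the referencing instruction
    kind: str      # reference type, e.g. CALL
    perm: str      # permission string, e.g. --x
    disasm: str    # disassembly of the referencing instruction


# Layout assumed from the sample output: name, address, [TYPE:perm], disasm
AXT_LINE = re.compile(
    r"^(\S+)\s+(0x[0-9a-fA-F]+)\s+\[(\w+):([rwx-]{3})\]\s+(.*)$"
)


def parse_axt(text):
    """Parse axt-style text lines into Xref tuples, skipping anything else."""
    xrefs = []
    for line in text.splitlines():
        m = AXT_LINE.match(line.strip())
        if m is None:
            continue  # prompts and empty lines do not match
        fn, addr, kind, perm, disasm = m.groups()
        xrefs.append(Xref(fn, int(addr, 16), kind, perm, disasm))
    return xrefs


SAMPLE = """\
(nofunc) 0x100004940 [CALL:--x] bl sym.imp.puts
(nofunc) 0x100004984 [CALL:--x] bl sym.imp.puts
"""

if __name__ == "__main__":
    for x in parse_axt(SAMPLE):
        print(hex(x.address), x.kind, x.perm, x.disasm)
```

For real scripting, the JSON output of the commands is usually a more robust input than scraping text.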
Going the other way, axf lists the xrefs that originate from the current or given address.
While axt is great for finding xrefs to a specific place, ax lists all cross-references within the entire address space. This command is handy when you want an overview of the code’s structure or need to locate xrefs in bulk for analysis.
For clarity we will use the comma separated table output available in the ax, command:
[0x000019f0]> ax,:fancy
.-----------------------------------------------------------------------.
| from | to | size | type | perm | fromname |
)-----------------------------------------------------------------------(
| 0x1254 | 0x5ff0 | 4 | DATA | --x | section..plt.got + 4 |
| 0x1264 | 0x5ea8 | 4 | DATA | --x | section..plt.sec + 4 |
| 0x1274 | 0x5eb0 | 4 | DATA | --x | sym.imp.__errno + 4 |
| 0x14b5 | 0x2 | 4 | DATA | r-- | main + 37 |
| 0x14ee | 0x2f | 4 | DATA | r-- | main + 94 |
| 0x14f6 | 0x1320 | 0 | CALL | --x | main + 102 |
| 0x150d | 0x6 | 4 | DATA | r-- | main + 125 |
| 0x151c | 0x40b3 | 4 | DATA | r-- | main + 140 |
| 0x1523 | 0x1280 | 0 | CALL | --x | main + 147 |
| 0x1ab2 | 0x1250 | 0 | CALL | --x | entry.fini0 + 34 |
| 0x1ab7 | 0x1a20 | 0 | CALL | --x | entry.fini0 + 39 |
| 0x1abc | 0x6018 | 4 | DATA | -w- | entry.fini0 + 44 |
| 0x1ad4 | 0x1a50 | 0 | CODE | --x | entry.init0 + 4 |
`-----------------------------------------------------------------------'
The problem here is that this listing is usually pretty large, so you will need to filter it to get any insight. By default this command outputs every reference detected in the binary, categorized by type (e.g., call, data, jump). You can filter or search within the output to locate particular references of interest.
These are some of the common tricks for filtering:
ax~hello
ax~...
axlj
ax,:help
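The same kind of filtering can also be done outside r2. As a rough sketch, the snippet below parses a few rows copied from the fancy table above and keeps only the CALL references that originate from a given function. The helper names (parse_fancy_table, calls_from) are hypothetical, and the parser only assumes the pipe-separated layout printed above.

```python
TABLE = """\
.-----------------------------------------------------------------------.
| from | to | size | type | perm | fromname |
)-----------------------------------------------------------------------(
| 0x14b5 | 0x2 | 4 | DATA | r-- | main + 37 |
| 0x14f6 | 0x1320 | 0 | CALL | --x | main + 102 |
| 0x1523 | 0x1280 | 0 | CALL | --x | main + 147 |
| 0x1ab2 | 0x1250 | 0 | CALL | --x | entry.fini0 + 34 |
`-----------------------------------------------------------------------'
"""


def parse_fancy_table(text):
    """Turn the pipe-separated rows into a list of dicts keyed by column."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines carry no data
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe row is the column header
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def calls_from(rows, func):
    """Return the targets of CALL xrefs whose source lies in func."""
    return [
        int(r["to"], 16)
        for r in rows
        if r["type"] == "CALL" and r["fromname"].split(" + ")[0] == func
    ]


if __name__ == "__main__":
    rows = parse_fancy_table(TABLE)
    print([hex(t) for t in calls_from(rows, "main")])
```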
When analyzing binaries, you might encounter situations where references to specific functions, variables, or addresses are not immediately visible due to optimizations, indirect calls, or obfuscation. Radare2 provides powerful tools and techniques to uncover these “missing references” and understand program interactions better.
References can be created in many different ways:
As you can see from this list, the automatic analysis can easily miss some of the possible references, so it’s important to understand the nature of each kind of reference to know why some are not caught and why false positives appear during the analysis process.
An important aspect of how references are handled in radare2 is that they also store information about direction and permissions. This means you can determine whether the referenced address is being used for reading, writing, or executing.
When we run aaaa, the list of commands executed under the hood is shown:
[0x00000000]> aaaa
INFO: Analyze all flags starting with sym. and entry0 (aa)
INFO: Analyze imports (af@@@i)
INFO: Analyze symbols (af@@@s)
INFO: Analyze all functions arguments/locals (afva@@@F)
INFO: Analyze function calls (aac)
INFO: find and analyze function preludes (aap)
INFO: Analyze len bytes of instructions for references (aar)
INFO: Finding and parsing C++ vtables (avrr)
INFO: Analyzing methods (af @@ method.*)
INFO: Recovering local variables (afva@@@F)
INFO: Type matching analysis for all functions (aaft)
INFO: Propagate noreturn information (aanr)
INFO: Scanning for strings constructed in code (/azs)
INFO: Enable anal.types.constraint for experimental type propagation
So, as we have learned, the order and type of analysis performed affect not just the code analysis but also the references. It’s considered good practice in r2land not to assume the default analysis is the best option for all cases. Instead, we can run only the steps we need, understanding what each one does and cutting the analysis time down to fit our needs.
Take into account that sometimes running fewer commands will give better results, and some experimental analysis under aaaaa may end up creating false positives depending on the target arch and binary.
Radare2 also provides commands for searching references through immediate values and computed addresses. The /r and /re commands offer powerful ways to locate indirect and dynamic references.
These commands use the search.in boundaries instead of anal.in and are not triggered by the default analysis. As a rule of thumb, aar finds all the direct references in one go, while /r looks for direct references to a single address (and the same goes for aae and /re).
/r: Searching for Immediate References
The /r command searches for references by analyzing immediate values in instructions. This is useful when a function or data reference is passed directly within an instruction.
/r 0x00400620
Here, Radare2 searches for any instructions with an immediate value of 0x00400620, potentially leading to indirect calls or accesses of a function or variable at this address.
/re: Emulating Code to Find Computed References
The /re command takes this a step further by emulating code linearly, identifying references that might not appear in immediate values but are calculated at runtime.
/re 0x00400620
Radare2 tries to follow the flow of execution to determine if any instructions compute a reference to 0x00400620. This method is valuable for analyzing functions with computed or dynamically generated addresses.
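To make the difference between both approaches concrete, here is a toy sketch. It does not reflect how radare2 implements /r or /re internally; it uses a made-up three-field instruction format (mnemonic, register, immediate) just to show why a linear emulation can catch a reference that an immediate scan misses. All names and the program itself are hypothetical.

```python
TARGET = 0x00400620

# Hypothetical toy program: (address, mnemonic, register, immediate)
PROGRAM = [
    (0x1000, "mov", "r0", 0x00400620),  # direct: the immediate is the target
    (0x1004, "mov", "r1", 0x00400000),  # computed: high part of the address
    (0x1008, "add", "r1", 0x620),       # high + low part equals the target
    (0x100C, "mov", "r2", 0x1234),      # unrelated value
]


def immediate_refs(program, target):
    """Immediate scan: report instructions whose immediate is the target."""
    return [addr for addr, _op, _reg, imm in program if imm == target]


def emulated_refs(program, target):
    """Linear emulation: track register values, report when one hits target."""
    regs = {}
    hits = []
    for addr, op, reg, imm in program:
        if op == "mov":
            regs[reg] = imm
        elif op == "add":
            regs[reg] = regs.get(reg, 0) + imm
        if regs.get(reg) == target:
            hits.append(addr)
    return hits


if __name__ == "__main__":
    print([hex(a) for a in immediate_refs(PROGRAM, TARGET)])
    print([hex(a) for a in emulated_refs(PROGRAM, TARGET)])
```

The immediate scan only reports the first instruction, while the emulation also reports the add that completes the computed address.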
Sometimes it’s easier for us to just perform a full disassembly of the whole code section and grep for values or strings. And radare2 provides multiple ways to do that:
[0x100003a58]> e emu.str=true
[0x100003a58]> pD $SS @ $S ~ jailbreak
Maybe we just want to filter all the flags, commands, xrefs and functions in the visual hud with V_, but we also have a large set of low level commands to scan for instructions matching specific characteristics, for example those having an immediate that can be used to reference some data or a delta inside a struct.
Take a look at /ai:
[0x100003a58]> /ai 0x70
0x100004e38 eb3b40f9 ldr x11, [sp, 0x70]
0x100004e70 087140b9 ldr w8, [x8, 0x70]
0x1000052b4 e83b40f9 ldr x8, [sp, 0x70]
0x100005b1c 08c10191 add x8, x8, 0x70
0x1000067a0 fdc30191 add x29, sp, 0x70
0x1000074ac 31c20191 add x17, x17, 0x70
[0x100003a58]>
Check /a? for the help of all the assembly search subcommands to get an idea of the capabilities in case you need to go deeper.
In MIPS binaries or firmware files in general, finding references isn’t limited to instructions that directly access a function or address. In such cases, tools like aav, /v, and /V offer additional insights.
These commands help uncover references that may be embedded within data structures. This is especially useful for languages where function pointers or callbacks are stored dynamically, and when working on running processes via r2frida, remote gdb or the native debugger (r2 -d).
The aav command analyzes values within code sections, helping locate pointers or data that may reference functions or addresses indirectly.
aav
By running aav, Radare2 scans the code for potential dword addresses inlined as immediate values, assisting in identifying otherwise hard-to-find function references.
To begin with, we can try /v and its variants /v4 or /v8, which encode the given value into a 4 or 8 byte endian-aware pattern to search for. Such values are normally stored aligned for performance reasons, so we can use -e search.align=4 to make the scan 4 times faster and reduce false positives.
[0x00000000]> /V?
| /V[1248] min max look for an `cfg.bigendian` 32bit value in range
[0x00000000]>
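Conceptually, these value searches boil down to encoding a number with the right size and endianness and walking the buffer at the chosen alignment. The sketch below illustrates that idea in plain Python. It is not radare2 code: the function names are hypothetical, the sample buffer is made up, and treating the range bounds as inclusive is an assumption of this example.

```python
import struct

FORMATS = {1: "B", 2: "H", 4: "I", 8: "Q"}


def encode_value(value, size=4, big_endian=False):
    """Encode an integer as a size-byte pattern with the given endianness."""
    return struct.pack((">" if big_endian else "<") + FORMATS[size], value)


def search_value(buf, value, size=4, big_endian=False, align=1):
    """Return the offsets where the encoded value appears (like a /v scan)."""
    pattern = encode_value(value, size, big_endian)
    return [
        off
        for off in range(0, len(buf) - size + 1, align)
        if buf[off:off + size] == pattern
    ]


def search_range(buf, lo, hi, size=4, big_endian=False, align=1):
    """Return (offset, value) pairs with lo <= value <= hi (like a /V scan)."""
    fmt = (">" if big_endian else "<") + FORMATS[size]
    hits = []
    for off in range(0, len(buf) - size + 1, align):
        (val,) = struct.unpack_from(fmt, buf, off)
        if lo <= val <= hi:
            hits.append((off, val))
    return hits


# Made-up 16 byte blob: one aligned pointer (offset 4), one unaligned (offset 9)
FIRMWARE = (
    b"\x00\x00\x00\x00"
    + encode_value(0x40001000)
    + b"\xff"
    + encode_value(0x40002000)
    + b"\x00\x00\x00"
)

if __name__ == "__main__":
    print(search_value(FIRMWARE, 0x40001000))
    print(search_range(FIRMWARE, 0x40000000, 0x4000FFFF, align=4))
```

Note how the alignment trades coverage for speed: with align=4 the scan visits a quarter of the offsets, but the pointer stored at the unaligned offset 9 is no longer found.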
One practical use case is finding where the memory mapped devices are mapped in a firmware, because it’s usually easier for programs to load a dword holding the destination address than to compute a value which can’t fit in an immediate on 8 or 16bit microprocessors.
Once those values are found, you can check for xrefs in the given addresses using a command like /ar @@ hit*, but take into account that you can also manually register new xrefs using the ax command.
Programs in ARM Thumb (16-bit) mode can combine 16 and 32 bit wide instructions. This is a problem because the Thumb bit of the cpsr register can change at any time, which can be used for obfuscation, and the binary headers can report invalid details about the location of the thumb or non-thumb symbols, resulting in pretty bad analysis results.
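To see why mixed widths make the disassembly fragile, the following sketch splits a Thumb code buffer into instructions using the architectural rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32 bit instruction. The helper names and the sample bytes are made up for this example, and little-endian code is assumed.

```python
import struct


def thumb_insn_size(halfword):
    """Return 4 if this first halfword starts a 32 bit Thumb insn, else 2."""
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def split_thumb(code):
    """Walk little-endian Thumb code, returning (offset, size) per insn."""
    insns = []
    off = 0
    while off + 2 <= len(code):
        (hw,) = struct.unpack_from("<H", code, off)
        size = thumb_insn_size(hw)
        insns.append((off, size))
        off += size
    return insns


# Sample bytes: a 16 bit insn (0x2000), a 32 bit one (0xf000 0xf800),
# then another 16 bit insn (0x4770)
CODE = bytes.fromhex("002000f000f87047")

if __name__ == "__main__":
    print(split_thumb(CODE))      # decoding from the right offset
    print(split_thumb(CODE[4:]))  # starting mid-instruction desyncs the walk
```

Starting the walk in the middle of a 32 bit instruction, or with the wrong mode, yields a different instruction stream, which is why wrong bitness hints hurt the analysis so much.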
Radare2 uses the anal hints as a way to specify the bitness of a specific memory region with the ahb command.
[0x00000000]> ahb?
Usage: ahb [8|16|32|64] [@ addr] Define asm.bits hint at given address
| ahb 16 set asm.bits=16 in the given address
| ahb get asm.bits used in given addr (current seek)
| ahb-$$ delete all the hints in the given address
| ahb* show defined bits hints as r2 commands
[0x00000000]>
Some ELF binaries can become a big mess because the bin loader will set a lot of ahb 16 and ahb 32 hints here and there and cause the analysis to misbehave.
Also, note that aae and aaef can perform recursive control flow graph emulation, finding code patterns that modify the CPU mode and propagating the ahb hints over functions.
A quick solution to override this and reset to defaults consists in removing all those hints with the ahb-* command and starting over before running any analysis.
Scanning for computed references using aae after that will give much better results for that kind of binary.
Today’s challenge consists in using all this knowledge and your battery of binaries: analyze a binary and try to find the reason why there’s a missing or a false positive reference.
V_ command
str and choose a string you like
Once you spot one, it would be great if you could submit a PR fixing it by:
aaa default analysis
Don’t worry if you feel overwhelmed. If you’re not comfortable fixing this issue yourself, you can create a ticket and include the binary file along with reproduction steps. What’s most important right now is demonstrating your ability to research and analyze the references.
HINT: you are free to use other tools to find those references in case they are not available in r2. Comparing between tools is another good way to learn more about how the analysis and references are constructed.
Stay tuned for tomorrow’s Advent of Radare2!