06 - Finding References

Welcome to Day 6 of the Radare2 Advent of Code!

Today, we’re diving into one of the most useful aspects of reverse engineering: finding cross-references (xrefs) to functions or strings within a binary.

By locating these references, you can spot the code that interacts with the strings or functions you are interested in, giving you insight into how a binary operates.

Radare2 provides several commands to handle cross-references (xrefs). We’ll explore these commands for analyzing code and data, along with useful tricks to better understand the results and the relationships between different components.

The Basics

Assuming the binary has been analysed (using r2 -A or aaa), we should have at least some xrefs to start playing with the ax subcommands. The axt command finds all “analysis xrefs to” a particular address.

For example, to find all references to the “puts” import symbol located in the PLT:

$ r2 -A /bin/ls
[0x100003a58]> axt sym.imp.puts
(nofunc) 0x100004940 [CALL:--x] bl sym.imp.puts
(nofunc) 0x100004984 [CALL:--x] bl sym.imp.puts
[0x100003a58]>
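When the output of axt needs to be post-processed in a script, it helps to turn each line into a structured record. The following Python snippet is a minimal sketch, not an r2 API: the regular expression is modelled only on the two sample lines above, so other r2 versions or architectures may need adjustments, and the names Xref and parse_axt are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Xref(NamedTuple):
    function: Optional[str]  # None when r2 printed "(nofunc)"
    address: int             # address of the referencing instruction
    kind: str                # e.g. "CALL"
    perm: str                # e.g. "--x"
    disasm: str              # disassembly text shown by axt


# Assumption: the layout matches the sample axt lines shown above.
AXT_LINE = re.compile(
    r"^(?P<func>\S+)\s+(?P<addr>0x[0-9a-fA-F]+)\s+"
    r"\[(?P<kind>\w+):(?P<perm>[rwx-]{3})\]\s+(?P<disasm>.*)$"
)


def parse_axt(text: str) -> List[Xref]:
    """Parse axt-style text output, skipping prompts and unrelated lines."""
    refs = []
    for line in text.splitlines():
        m = AXT_LINE.match(line.strip())
        if m is None:
            continue
        func = None if m["func"] == "(nofunc)" else m["func"]
        refs.append(
            Xref(func, int(m["addr"], 16), m["kind"], m["perm"], m["disasm"].strip())
        )
    return refs
```

Feeding it the session above yields two records, both of kind CALL and both without an enclosing function.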

Conversely, axf lists the xrefs that originate from the current or given address.

Listing References

While axt is great for finding xrefs to a specific place, ax lists all cross-references within the entire address space. This command is handy when you want an overview of the code’s structure or need to locate xrefs in bulk for analysis.

For clarity we will use the comma separated table output available in the ax, command:

[0x000019f0]> ax,:fancy
.-----------------------------------------------------------------------.
| from    | to      | size  | type  | perm      | fromname              |
)-----------------------------------------------------------------------(
| 0x1254  | 0x5ff0  | 4     | DATA  | --x       | section..plt.got + 4  |
| 0x1264  | 0x5ea8  | 4     | DATA  | --x       | section..plt.sec + 4  |
| 0x1274  | 0x5eb0  | 4     | DATA  | --x       | sym.imp.__errno + 4   |
| 0x14b5  | 0x2     | 4     | DATA  | r--       | main + 37             |
| 0x14ee  | 0x2f    | 4     | DATA  | r--       | main + 94             |
| 0x14f6  | 0x1320  | 0     | CALL  | --x       | main + 102            |
| 0x150d  | 0x6     | 4     | DATA  | r--       | main + 125            |
| 0x151c  | 0x40b3  | 4     | DATA  | r--       | main + 140            |
| 0x1523  | 0x1280  | 0     | CALL  | --x       | main + 147            |
| 0x1ab2  | 0x1250  | 0     | CALL  | --x       | entry.fini0 + 34      |
| 0x1ab7  | 0x1a20  | 0     | CALL  | --x       | entry.fini0 + 39      |
| 0x1abc  | 0x6018  | 4     | DATA  | -w-       | entry.fini0 + 44      |
| 0x1ad4  | 0x1a50  | 0     | CODE  | --x       | entry.init0 + 4       |
`-----------------------------------------------------------------------'

The problem is that this listing is usually pretty large, so you will need to filter it to get any insight. By default this command outputs every reference detected in the binary, categorized by type (e.g., call, data, jump). You can filter or search within the output to locate particular references of interest.

These are some of the common tricks for filtering:
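This kind of filtering can also be sketched outside of r2. The Python snippet below is only an illustration, assuming the fancy table shown above has been saved as plain text; parse_fancy_table and filter_xrefs are hypothetical helper names, not r2 commands.

```python
from typing import Dict, List, Optional


def parse_fancy_table(text: str) -> List[Dict[str, str]]:
    """Turn the rows of an `ax,:fancy` style table into dictionaries."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with '.', ')' or '`'
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' line carries the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def filter_xrefs(
    rows: List[Dict[str, str]],
    type_: Optional[str] = None,
    perm: Optional[str] = None,
    fromname_contains: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Keep only the rows matching every criterion that was given."""
    out = []
    for row in rows:
        if type_ is not None and row["type"] != type_:
            continue
        if perm is not None and row["perm"] != perm:
            continue
        if fromname_contains is not None and fromname_contains not in row["fromname"]:
            continue
        out.append(row)
    return out
```

For example, filter_xrefs(rows, type_="CALL", fromname_contains="main") keeps only the calls made from main.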

Missing references

When analyzing binaries, you might encounter situations where references to specific functions, variables, or addresses are not immediately visible due to optimizations, indirect calls, or obfuscation. Radare2 provides powerful tools and techniques to uncover these “missing references” and understand program interactions better.

References can be created in many different ways:

As you can see by reading this list, it is easy for the automatic analysis to miss some of the possible references. It’s important to understand the nature of each reference to know why we are not catching some of them, or why some false positives appear during our analysis process.

An important aspect of how references are handled in radare2 is that they also store information about direction and permissions. This means you can determine whether the referenced address is being used for reading, writing, or executing.
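The permission column seen in the listings (for example r--, -w- or --x) can be decoded with a few lines of Python. This is a minimal sketch assuming the three-character rwx notation shown in the table above; describe_perm is a hypothetical helper name.

```python
from typing import List


def describe_perm(perm: str) -> List[str]:
    """Translate an xref permission string such as 'r--' into access names."""
    names = {"r": "read", "w": "write", "x": "execute"}
    return [names[c] for c in perm if c in names]
```

With it, the -w- reference at entry.fini0 + 44 in the table decodes as a write, while the r-- references from main decode as reads.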

When we run aaaa, the list of commands executed under the hood is shown:

[0x00000000]> aaaa
INFO: Analyze all flags starting with sym. and entry0 (aa)
INFO: Analyze imports (af@@@i)
INFO: Analyze symbols (af@@@s)
INFO: Analyze all functions arguments/locals (afva@@@F)
INFO: Analyze function calls (aac)
INFO: find and analyze function preludes (aap)
INFO: Analyze len bytes of instructions for references (aar)
INFO: Finding and parsing C++ vtables (avrr)
INFO: Analyzing methods (af @@ method.*)
INFO: Recovering local variables (afva@@@F)
INFO: Type matching analysis for all functions (aaft)
INFO: Propagate noreturn information (aanr)
INFO: Scanning for strings constructed in code (/azs)
INFO: Enable anal.types.constraint for experimental type propagation

So, as we have learned, the order and type of analysis performed affect not just the code analysis but also the references. It is considered good practice in r2land not to assume that the default analysis is the best option for every case. Instead, we can run only the steps we need, which helps us understand what each one does and reduces the analysis time.

Keep in mind that running fewer commands sometimes gives better results, and that some of the experimental analysis under aaaaa may end up creating false positives, depending on the target arch and binary.

Computed references

Radare2 also provides commands for searching references through immediate values and computed addresses. The /r and /re commands offer powerful ways to locate indirect and dynamic references.

These commands use the search.in boundaries instead of anal.in and are not triggered by the default analysis. As a rule of thumb, aar finds all the direct references, while /r finds only the direct references to one given address (and the same goes for aae and /re).

The /r command searches for references by analyzing immediate values in instructions. This is useful when a function or data reference is passed directly within an instruction.

/r 0x00400620

Here, Radare2 searches for any instructions with an immediate value of 0x00400620, potentially leading to indirect calls or accesses of a function or variable at this address.

The /re command takes this a step further by emulating code linearly, identifying references that might not appear in immediate values but are calculated at runtime.

/re 0x00400620

Radare2 tries to follow the flow of execution to determine if any instructions compute a reference to 0x00400620. This method is valuable for analyzing functions with computed or dynamically generated addresses.
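To see why an immediate search alone can miss a reference, consider an AArch64 adrp/add pair: neither instruction contains the full target address, which only exists once both are evaluated. The Python sketch below reproduces that arithmetic. It is an illustration of the idea rather than a description of how r2 implements its emulation, and the sample values in the usage note are made up.

```python
def adrp_target(pc: int, imm_pages: int) -> int:
    """Value produced by `adrp`: the 4 KiB page of pc plus imm_pages pages."""
    return (pc & ~0xFFF) + (imm_pages << 12)


def adrp_add_target(pc: int, imm_pages: int, add_imm: int) -> int:
    """Address computed by an `adrp xN, page` followed by `add xN, xN, add_imm`."""
    return adrp_target(pc, imm_pages) + add_imm
```

For instance, an adrp at 0x100004940 with a page delta of 4 followed by an add of 0x620 yields 0x100008620, a value that appears in neither instruction as an immediate.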

Immediates

Sometimes it’s easier to just perform a full disassembly of the whole code section and grep for values or strings, and radare2 provides multiple ways to do that:

[0x100003a58]> e emu.str=true
[0x100003a58]> pD $SS @ $S ~ jailbreak

We may also just want to filter all the flags, commands, xrefs and functions in the visual HUD with V_, but there is also a large set of low-level commands to scan for instructions matching specific characteristics, for example those having an immediate that can be used to reference some data or a delta inside a struct.

Take a look at /ai:

[0x100003a58]> /ai 0x70
0x100004e38             eb3b40f9  ldr x11, [sp, 0x70]
0x100004e70             087140b9  ldr w8, [x8, 0x70]
0x1000052b4             e83b40f9  ldr x8, [sp, 0x70]
0x100005b1c             08c10191  add x8, x8, 0x70
0x1000067a0             fdc30191  add x29, sp, 0x70
0x1000074ac             31c20191  add x17, x17, 0x70
[0x100003a58]>

Check /a? for the help of all the assembly search subcommands to get an idea of their capabilities in case you need to go deeper.

Inlined Values

In MIPS binaries or firmware files in general, finding references isn’t limited to instructions that directly access a function or address. In such cases, tools like aav, /v, and /V offer additional insights.

These commands help uncover references that may be embedded within data structures. This is especially useful in languages where function pointers or callbacks are stored dynamically, and when working on running processes via r2frida, remote gdb or the native debugger (r2 -d).

The aav command analyzes values within code sections, helping locate pointers or data that may reference functions or addresses indirectly.

aav

By running aav, Radare2 scans the code for potential dword addresses inlined as immediate values, assisting in identifying otherwise hard-to-find function references.

To begin with, we can try /v and its variants /v4 and /v8, which encode the given value into a 4- or 8-byte endian-aware pattern to search for. Those values are usually aligned for performance reasons, so we can use -e search.align=4 to make the scan 4 times faster and reduce false positives.

[0x00000000]> /V?
| /V[1248] min max  look for an `cfg.bigendian` 32bit value in range
[0x00000000]>

One practical use case for this is finding where the memory-mapped devices are mapped in a firmware image, because it is usually easier for programs to load a dword holding the destination address than to compute a value that can’t fit in an immediate on 8- or 16-bit microprocessors.
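The idea behind these value searches can be sketched in Python: encode the value with the chosen width and endianness, then scan the buffer at aligned offsets. This is a simplified model under stated assumptions, not r2's implementation. The function names are hypothetical, and the range search treats both bounds as inclusive, which is a choice made for this sketch.

```python
import struct
from typing import List, Tuple

_FORMATS = {1: "B", 2: "H", 4: "I", 8: "Q"}


def encode_value(value: int, size: int = 4, big_endian: bool = False) -> bytes:
    """Encode value as an unsigned integer of `size` bytes."""
    return struct.pack((">" if big_endian else "<") + _FORMATS[size], value)


def find_value(buf: bytes, value: int, size: int = 4,
               big_endian: bool = False, align: int = 1) -> List[int]:
    """Offsets (multiples of `align`) where the encoded value appears."""
    needle = encode_value(value, size, big_endian)
    return [off for off in range(0, len(buf) - size + 1, align)
            if buf[off:off + size] == needle]


def find_in_range(buf: bytes, lo: int, hi: int, size: int = 4,
                  big_endian: bool = False, align: int = 1) -> List[Tuple[int, int]]:
    """(offset, value) pairs whose decoded value falls within [lo, hi]."""
    fmt = (">" if big_endian else "<") + _FORMATS[size]
    hits = []
    for off in range(0, len(buf) - size + 1, align):
        (value,) = struct.unpack_from(fmt, buf, off)
        if lo <= value <= hi:
            hits.append((off, value))
    return hits
```

Scanning with align=4 visits a quarter of the offsets that align=1 does, which is where the speedup and the reduction of misaligned false positives come from.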

Once those values are found, you can check for xrefs in the given addresses using a command like /ar @@ hit*. Take into account that you can also manually register new xrefs using the ax command.

Thumb Code

Programs running in ARM Thumb (16-bit) mode can combine 16- and 32-bit wide instructions. This is a problem because the Thumb bit of the cpsr register can change at any time, which can be used for obfuscation. The binary headers can also report invalid details about the location of the Thumb and non-Thumb symbols, giving pretty bad results.

Radare2 uses the anal hints as a way to specify the bitness of a specific memory region with the ahb command.

[0x00000000]> ahb?
Usage: ahb [8|16|32|64] [@ addr]   Define asm.bits hint at given address
| ahb 16  set asm.bits=16 in the given address
| ahb     get asm.bits used in given addr (current seek)
| ahb-$$  delete all the hints in the given address
| ahb*    show defined bits hints as r2 commands
[0x00000000]>
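In ARM interworking, the lowest bit of a branch target selects the Thumb state, and ELF function symbols pointing to Thumb code have that bit set in their value. The Python sketch below uses this convention to build ahb commands in the syntax shown in the help above. It assumes a hypothetical list of raw symbol values and only illustrates the idea; it does not describe what the r2 bin loader does internally.

```python
def ahb_hint(raw_addr: int) -> str:
    """Build an `ahb` command from a raw ARM symbol or branch target value."""
    bits = 16 if raw_addr & 1 else 32  # bit 0 set means Thumb
    real_addr = raw_addr & ~1          # the instruction itself is at an even address
    return f"ahb {bits} @ 0x{real_addr:08x}"


def ahb_script(raw_addrs) -> str:
    """One hint per line, ready to be reviewed before running it in r2."""
    return "\n".join(ahb_hint(a) for a in raw_addrs)
```

Reviewing such a script before applying it is a cheap way to compare your own view of the Thumb regions against the hints reported by ahb*.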

Some ELF binaries can become a big mess because the bin loader will set a lot of ahb 16 and ahb 32 hints here and there and cause the analysis to misbehave.

Also, note that aae and aaef can perform recursive control flow graph emulation, finding code patterns that modify the CPU mode and propagate the ahb hints over functions.

A quick way to override this and reset to defaults is to remove all those hints with the ahb-* command and start over before running any analysis.

Scanning for computed references using aae after that gives much better results for this kind of binary.

Reference Challenge

Today’s challenge consists of using all this knowledge and your battery of binaries: analyze a binary and try to find the reason why there’s a missing or a false positive reference.

Once you spot one, it will be great if you could submit a PR fixing it by:

Don’t worry if you feel overwhelmed. If you’re not comfortable fixing this issue yourself, you can create a ticket and include the binary file along with reproduction steps. What’s most important right now is demonstrating your ability to research and analyze the references.

HINT: you are free to use other tools to find those references in case they are not available in r2. Comparing tools is another good way to learn more about how the analysis and references are constructed.

Stay tuned for tomorrow’s Advent of Radare2!