Welcome to Day 13 of the Radare2 Advent of Code!
Today, we will explore techniques for navigating through the disassembly in Radare2. Understanding the flow of code and the relationships between instructions, basic blocks, and functions is crucial for effective reverse engineering. Let’s dive into some commands and scripting capabilities that make this process intuitive.
Radare2 provides simple shortcuts for moving through disassembly one instruction at a time:
- so+1 to move to the next instruction.
- so-1 to move to the previous instruction.

Check the offset shown in the prompt to see where we land after seeking 1 and 2 instructions forward and backward.
[0x100003a58]> so+1
[0x100003a5c]> so-1
[0x100003a58]> so+2
[0x100003a60]> so-2
[0x100003a58]>
We can also disassemble backward and forward to inspect the instructions around the current offset:
[0x100003a58]> so+4
[0x100003a68]> pd-4
0x100003a58 7f2303d5 pacibsp
0x100003a5c fc6fbaa9 stp x28, x27, [sp, -0x60]!
0x100003a60 fa6701a9 stp x26, x25, [sp, 0x10]
0x100003a64 f85f02a9 stp x24, x23, [sp, 0x20]
[0x100003a68]> pd 4
0x100003a68 f65703a9 stp x22, x21, [sp, 0x30]
0x100003a6c f44f04a9 stp x20, x19, [sp, 0x40]
0x100003a70 fd7b05a9 stp x29, x30, [sp, 0x50]
0x100003a74 fd430191 add x29, sp, 0x50
[0x100003a68]>
A fancy trick to have both at the same time is to use the pd-- command, which disassembles N instructions forward and N instructions backward:
[0x100003a68]> pd--4
0x100003a58 7f2303d5 pacibsp
0x100003a5c fc6fbaa9 stp x28, x27, [sp, -0x60]!
0x100003a60 fa6701a9 stp x26, x25, [sp, 0x10]
0x100003a64 f85f02a9 stp x24, x23, [sp, 0x20]
0x100003a68 f65703a9 stp x22, x21, [sp, 0x30]
0x100003a6c f44f04a9 stp x20, x19, [sp, 0x40]
0x100003a70 fd7b05a9 stp x29, x30, [sp, 0x50]
0x100003a74 fd430191 add x29, sp, 0x50
[0x100003a68]>
When we are working with large addresses (kernel addresses or those exceeding the x86-64 compatibility mode), radare2 provides a helpful way to type partial addresses. Instead of typing the complete address, you can use relative addressing with the s.. command followed by the last digits of the target address.
For example:
[0x100003a68]> s..32
[0x100003a32]>
In this case, instead of typing the full address 0x100003a32, we just used s..32 to seek to that location. The s.. command maintains the higher bits of the current address and only changes the specified lower bits.
This feature is particularly useful when:
- Analyzing large executables or kernel code
- Debugging processes with high memory addresses
- Navigating through memory regions with similar address prefixes
- Reducing the chance of typing errors when entering long addresses
You can also use this shorthand notation with other radare2 commands that accept addresses as parameters.
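The way s.. merges the typed digits into the current offset can be sketched in a few lines of JavaScript. This is only an illustration: resolvePartialSeek is a hypothetical helper, not a radare2 API, and it assumes each typed hex digit replaces exactly four low bits of the current address.

```javascript
// Hypothetical helper (not part of radare2): emulates how `s..` merges
// the typed hex digits into the current offset.
function resolvePartialSeek(current, digits) {
  const bits = BigInt(digits.length * 4); // each hex digit covers 4 bits
  const mask = (1n << bits) - 1n;         // low bits that will be replaced
  const low = BigInt('0x' + digits);      // value typed after `s..`
  return (current & ~mask) | low;         // keep the high bits, swap the low ones
}

// Same values as the session above: current offset 0x100003a68, typed "32".
const target = resolvePartialSeek(0x100003a68n, '32');
console.log('0x' + target.toString(16)); // prints 0x100003a32
```

BigInt is used so that 64-bit kernel addresses do not lose precision.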
A basic block is like a piece of a puzzle in computer code - it’s a sequence of instructions that always run together, from start to finish, with no jumps or branches in between. Think of it as a straight path where once you start walking, you have to keep going until you reach the end. The only way in is at the beginning, and the only way out is at the end of the block.
These blocks are super helpful when you’re trying to understand how a program works. Imagine a flowchart where each box is a basic block, and the arrows between them show where the program might go next. Basic blocks make it easier to analyze code because you can focus on one chunk at a time, knowing that these instructions will always run together. They’re like the building blocks that help reverse engineers understand the bigger picture of how a program flows and what it does.
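To make the definition concrete, here is a minimal sketch of the textbook way to split a linear run of instructions into basic blocks. It is not how radare2 implements its analysis, and the instruction shape (addr, size, jump, ends) is made up for the example.

```javascript
// Simplified sketch: split a linear list of instructions into basic blocks.
// The instruction shape ({ addr, size, jump, ends }) is hypothetical.
function splitBasicBlocks(instrs) {
  const leaders = new Set();
  if (instrs.length > 0) leaders.add(instrs[0].addr); // the first instruction starts a block
  instrs.forEach((ins, i) => {
    const isBranch = ins.jump !== undefined;
    if (isBranch) leaders.add(ins.jump);              // a branch target starts a block
    if ((isBranch || ins.ends) && i + 1 < instrs.length) {
      leaders.add(instrs[i + 1].addr);                // so does whatever follows a branch or return
    }
  });
  const blocks = [];
  for (const ins of instrs) {
    if (leaders.has(ins.addr)) blocks.push({ addr: ins.addr, size: 0, ninstr: 0 });
    const bb = blocks[blocks.length - 1];
    bb.size += ins.size;
    bb.ninstr += 1;
  }
  return blocks;
}

const code = [
  { addr: 0x00, size: 4 },
  { addr: 0x04, size: 4, jump: 0x10 }, // conditional branch
  { addr: 0x08, size: 4 },
  { addr: 0x0c, size: 4, ends: true }, // return
  { addr: 0x10, size: 4 },
  { addr: 0x14, size: 4, ends: true }, // return
];
console.log(JSON.stringify(splitBasicBlocks(code)));
```

The six instructions end up in three blocks of two instructions each, starting at 0x00, 0x08 and 0x10.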
Radare2 exposes the addresses and numbers related to basic blocks under the $B numvars. You can list them by running ?$?~B. Here’s a breakdown:
- $BB: Start address of the current basic block.
- $BE: End address of the current basic block.
- $Bj: Jump address from the current basic block.
- $Bf: Fail/fall-through address from the current basic block.
- $Bi: Number of instructions in the current basic block.
- $BS: Size (in bytes) of the current basic block.
- $BC: Number of cases (e.g., in a switch statement) in the block.
- $BC:{#}: Address of the nth case in the current block.

Numvars are variables that radare2 defines and exposes internally. They are read-only: we cannot modify their values, but we can use them in any command that takes a math expression as an argument.
First of all we will analyze the function. Let’s open /bin/ls and run the classic aaaa (use r2 -A /bin/ls if you are short on lowercase a’s), then seek to the main symbol.
Jump to the beginning of the current basic block:
s $BB
Check the size of the block you’re currently in:
?v $BS
Disassemble the two basic blocks that are connected to the current one:
pdb @ $Bj
pdb @ $Bf
Basic blocks can usually have multiple input and output edges, so we can assume the following statements:
These variables are highly useful for programmatically analyzing and navigating code structure, and we can use them from r2js, r2pipe or any other scripting language we like.
You can also use afb to enumerate all the basic blocks of the current function and interpret the listing. Use the graph view in a separate terminal to verify your assumptions.
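One way to interpret such a listing is to count how many edges arrive at each block. The sketch below works on made-up data and assumes each entry carries addr, jump and fail fields, the names radare2 uses in its JSON output for basic blocks. countInputs and sampleBlocks are hypothetical names.

```javascript
// Made-up blocks; only the field names (addr, jump, fail) mirror r2's JSON.
const sampleBlocks = [
  { addr: 0x10, jump: 0x30, fail: 0x20 }, // conditional branch: two outputs
  { addr: 0x20, jump: 0x30 },             // unconditional jump: one output
  { addr: 0x30 },                         // no outputs: the function ends here
];

// Count the incoming edges of every block by walking all jump/fail targets.
function countInputs(blocks) {
  const inputs = new Map(blocks.map(bb => [bb.addr, 0]));
  for (const bb of blocks) {
    for (const dst of [bb.jump, bb.fail]) {
      if (dst !== undefined && inputs.has(dst)) {
        inputs.set(dst, inputs.get(dst) + 1);
      }
    }
  }
  return inputs;
}

for (const [addr, n] of countInputs(sampleBlocks)) {
  console.log('0x' + addr.toString(16), 'inputs:', n);
}
```

A block with zero inputs is usually the function entry, and a block with no jump or fail target is an exit.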
It’s often helpful to use visual mode to navigate through a function’s flow or to follow references across different functions or pointers of interest.
Learning the key combinations is essential to become comfortable with the interface. Here’s a quick guide you should practice with to solve the challenge.
- agfv - enter visual interactive function graph view
- V - enter visual mode (press p to switch to the disassembly)

Now you’ll encounter reference keystroke hints that look like this: ;[1] or ;[oe]. The text between the brackets represents the keys you need to type to make r2 jump to that location.
Here are some essential navigation keys you should know for keyboard-based navigation:
- u - undo to the previous seek (same as the s- command)
- U - redo the last undone seek (same as the s+ command)
- r - rotate between data, branch, call, computed reference hints
- t - follow the true output edge branch from the current basic block
- f - same for the false branch
- x - open the xrefs view and use the j/k keys to select the destination
- n/N - seek to the next/previous function (see -e scr.nkey to choose the target)

We can navigate through the disassembly instruction by instruction using the so command. In Visual mode we can use j and k to move down and up respectively.
However, radare2 also provides other numeric variables (numvars) that enable us to perform calculations or express navigation in more sophisticated ways.
radare2 provides a variety of numvars to help navigate instructions effectively. You can list all available numvars by running:
?$?~i
Here’s a breakdown of some useful instruction-related numvars:
- $in:{n}: Address of the nth instruction forward.
- $ip:{n}: Address of the nth instruction backward.
  Example: s $I1@$Fe jumps to the last instruction in a BB.
- $is[:{n}]: Size of the nth instruction.
- $ij: Jump address for instructions like jmp, jz.
- $ie: Returns 1 if it’s the end of a block, else 0.
- $if: Jump fail address for conditional jumps. Example: jz 0x10 will point to the next instruction.
- $ir: Pointer value referenced by the instruction (e.g., lea rax, [0x8010] -> 0x8010).
- $iv: Immediate value in the instruction (e.g., mov eax, 42 -> 42).

Quick usage examples:
Seek from the current offset to the 5th instruction forward:
s $in:5
Change the program counter to skip two instructions:
ar PC=$in:2
Check if the current instruction is the end of a basic block:
? $ie
Most commands in radare2 can display their information as structured JSON that we can easily use later for scripting. It’s just a matter of suffixing the command with j.
To visualize all the information of the current basic block in JSON format, we must use the abj command. This includes its size, jump, fail, and the instructions it contains. Here’s an example:
[0x100003a58]> abj~{}
[
{
"addr": 4294982232,
"size": 76,
"jump": 4294982312,
"fail": 4294982308,
"opaddr": 4294982232,
"inputs": 0,
"outputs": 2,
"ninstr": 19,
"instrs": [
4294982232, 4294982236, 4294982240, ...
],
"traced": 1
}
]
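The fields in this JSON line up with the $B numvars described earlier. The sketch below copies the values from the abj output above. The mapping is an assumption based on the field names, in particular that the end address is the start plus the size.

```javascript
// Block entry copied from the abj output above.
const bb = { addr: 4294982232, size: 76, jump: 4294982312, fail: 4294982308, ninstr: 19 };

// Assumed correspondence between JSON fields and $B numvars.
const numvars = {
  BB: bb.addr,           // start of the block
  BE: bb.addr + bb.size, // end of the block (assumed to be start + size)
  BS: bb.size,           // size in bytes
  Bj: bb.jump,           // jump target
  Bf: bb.fail,           // fall-through target
  Bi: bb.ninstr,         // number of instructions
};

const hex = n => '0x' + n.toString(16);
console.log(hex(numvars.BB), hex(numvars.BE), hex(numvars.Bf));
// prints 0x100003a58 0x100003aa4 0x100003aa4
```

In this block the computed end address equals the fail address, so the fall-through path starts right after the block. The size is also consistent with 19 fixed-width 4-byte instructions.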
If we want to see all the basic blocks of the current function we must use afbj instead:
[0x100003a58]> afbj~ninstr
"ninstr": 19,
"ninstr": 1,
"ninstr": 17,
"ninstr": 5,
"ninstr": 7,
"ninstr": 7,
"ninstr": 2,
...
"ninstr": 2,
"ninstr": 5,
There are some interesting metrics that we can extract from the basic blocks: their sizes, their edges, how many there are, and the types of instructions they contain. These computations can help us understand the purpose of a function, how it is constructed, whether it is obfuscated, or how complex it is.
These metrics are shown with the afi command. Let’s highlight the most relevant ones:
[0x100003a58]> afi
offset: 0x100003a58
name: main
size: 3020
is-pure: false
realsz: 2068
stackframe: 1696
cyclomatic-cost: 117
cyclomatic-complexity: 147
num-bbs: 135
num-instrs: 518
edges: 180
minbound: 0x100003a58
maxbound: 0x100004624
is-lineal: false
end-bbs: 5
trace-coverage: 62
maxbbins: 33
midbbins: 3.84
ratbbins: 8.60
noreturn: true
recursive: false
in-degree: 0
out-degree: 38
locals: 21
args: 2
I invite you to inspect how all these numbers are computed in the source code of radare2. Use your friend grep and look for end-bbs or any other string to locate the file and line that computes it:
$ git grep trace-coverage
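As a quick sanity check before reading the source, two of the values in the afi listing above can be reproduced from the others. The relationships below are inferred from the numbers alone, not taken from the radare2 code, so treat them as an assumption to verify with grep.

```javascript
// Values copied from the afi output above.
const afi = { numInstrs: 518, numBbs: 135, maxbbins: 33 };

// Assumed relationships, inferred from the numbers rather than from the source code.
const midbbins = afi.numInstrs / afi.numBbs; // average instructions per basic block
const ratbbins = afi.maxbbins / midbbins;    // largest block compared to the average

console.log(midbbins.toFixed(2), ratbbins.toFixed(2)); // prints 3.84 8.60
```

Both results match the midbbins and ratbbins lines of the listing.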
Using Radare2’s r2pipe JavaScript API, you can iterate through all functions and their basic blocks.
This script iterates over all basic blocks for each function, extracting essential information such as their size, jump destinations, and fall-through addresses. We can also perform RTable queries (the afl, and afb, commands, with a trailing comma) to filter and analyze the function information, including cross-references, function calls, and control flow graphs. The collected data can be used for various purposes, such as identifying code patterns, analyzing program flow, or detecting potential vulnerabilities. Additionally, the script can generate statistical data about basic block distribution and function complexity, which is valuable for program analysis and optimization.
Here’s an example script:
const blocks = [];
const functions = r2.cmdj('aflj'); // List functions in JSON
functions.forEach(func => {
  const bbs = r2.cmdj(`afbj @ ${func.offset}`); // List basic blocks
  bbs.forEach(bb => {
    blocks.push({
      block: bb.addr,
      size: bb.size,
      jump: bb.jump,
      fail: bb.fail
    });
  });
});
console.log(JSON.stringify(blocks));
Use r2 -i script.r2.js -A /bin/ls to run the script.r2.js script.
At this point we will notice how the script fails with this error:
[0x100003a58]> . a.r2.js
ERROR: SyntaxError: redeclaration of 'blocks'
ERROR: at <eval> (-:1:1)
ERROR: [uninitialized]
[0x100003a58]>
The reason for the error is that those constants are located in the global scope, and according to JavaScript rules, we cannot redefine them. To fix this issue, we must wrap the whole code inside an anonymous function (also known as an IIFE - Immediately Invoked Function Expression) and try again. This creates a new scope for our variables and prevents them from polluting the global namespace.
(function() {
  // Your code here will have its own scope
  const myConstant = 'value1';
  // More code...
})();

// In another file or section
(function() {
  // You can now use the same constant name
  const myConstant = 'value2';
  // More code...
})();
This pattern is commonly used in JavaScript modules and libraries to avoid naming conflicts and maintain clean, modular code. It’s particularly useful when working with multiple files or third-party libraries that might use similar variable names.
(function() {
  const blocks = [];
  const functions = r2.cmdj('aflj'); // List functions in JSON
  functions.forEach(func => {
    ...
  });
  console.log(JSON.stringify(blocks));
})();
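Once the blocks array is collected, simple statistics can be derived from it. The following sketch is one possible post-processing step, not part of the original script. summarizeBlocks is a hypothetical helper, the sample data is made up, and it assumes the jump and fail properties are simply missing when a block has no such edge.

```javascript
// Sketch: summarise an array shaped like the `blocks` built by the script above.
// Assumes `jump`/`fail` are undefined when a block has no such edge.
function summarizeBlocks(blocks) {
  const summary = { total: blocks.length, bytes: 0, conditional: 0, unconditional: 0, fallthrough: 0, exits: 0 };
  for (const b of blocks) {
    summary.bytes += b.size;
    const hasJump = b.jump !== undefined;
    const hasFail = b.fail !== undefined;
    if (hasJump && hasFail) summary.conditional++; // two outgoing edges
    else if (hasJump) summary.unconditional++;     // only a jump edge
    else if (hasFail) summary.fallthrough++;       // only a fall-through edge
    else summary.exits++;                          // no outgoing edges
  }
  return summary;
}

// Made-up sample with the same shape as the script output.
const sample = [
  { block: 0x10, size: 12, jump: 0x30, fail: 0x1c },
  { block: 0x1c, size: 8, jump: 0x30 },
  { block: 0x30, size: 4 },
];
console.log(JSON.stringify(summarizeBlocks(sample)));
```

Inside r2js, the same function could be called on the real blocks array before printing it.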
Sometimes we want to know who is calling a specific function, and while xref exploration can be helpful, this approach can be manual and tedious. This is why radare2 provides commands that help identify the shortest path a program needs to take to reach a destination.
The abp and abpf commands serve this purpose. The first one (abp) works only within the same function, while the second one (abpf) works across functions. For example, if you want to analyze a crash in a particular function, these commands can help determine the shortest path to reach that point.
[0x100003a58]> a?*~path
| abp[?] [addr] follow basic blocks paths from $$ to `addr`
Usage: abp [addr] [num] # find num paths from current offset to addr
| abp [addr] [num] find num paths from current offset to addr
| abpf [addr] same as /gg find the path between two addresses across functions and basic blocks
| abpj [addr] [num] display paths in JSON
| afco path open Calling Convention sdb profile from given path
| w [path] write to path or display graph image (see graph.gv.format)
[0x100003a58]>
The abp command computes paths between basic blocks, allowing you to find the basic blocks that need to be executed or emulated to reach the final address. This is interesting because we can use this information to colorize the graph and clarify which conditions must hold to reach the final point.
[0x100003a58]> abp [destination_address]
NOTE: See the abc command to colorize basic blocks.
But abp has a limitation: it only follows basic block references. It won’t find data references or indirect references via pointers or call instructions. That is, abp only works if you specify the beginning address of a basic block and you are inside the very same function.
To perform the same operation between functions we have abpf (formerly known as /gg), which builds a graph of basic blocks, calls, and references and walks all the nodes to find the shortest path to the destination starting from a different address.
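The idea of walking block edges to find a shortest path can be illustrated with a plain breadth-first search. This is a generic sketch, not radare2’s implementation of abp or abpf. shortestBlockPath and the graph are made up, and only the addr, jump and fail field names follow the JSON shown earlier.

```javascript
// Sketch of a breadth-first search over jump/fail edges.
// Not radare2's implementation; block shape follows the afbj field names.
function shortestBlockPath(blocks, from, to) {
  const byAddr = new Map(blocks.map(bb => [bb.addr, bb]));
  const prev = new Map([[from, null]]); // visited set plus back-pointers
  const queue = [from];
  while (queue.length > 0) {
    const addr = queue.shift();
    if (addr === to) {
      const path = [];
      for (let a = to; a !== null; a = prev.get(a)) path.unshift(a);
      return path;
    }
    const bb = byAddr.get(addr);
    if (!bb) continue;                  // the edge leaves the known blocks
    for (const next of [bb.jump, bb.fail]) {
      if (next !== undefined && !prev.has(next)) {
        prev.set(next, addr);
        queue.push(next);
      }
    }
  }
  return null;                          // destination not reachable
}

// Made-up control flow graph.
const graph = [
  { addr: 0x10, jump: 0x40, fail: 0x20 },
  { addr: 0x20, jump: 0x30 },
  { addr: 0x30, jump: 0x40 },
  { addr: 0x40, jump: 0x50 },
  { addr: 0x50 },
];
const path = shortestBlockPath(graph, 0x10, 0x50);
console.log(path.map(a => '0x' + a.toString(16)).join(' -> '));
// prints 0x10 -> 0x40 -> 0x50
```

Because the search expands blocks level by level, the first time it reaches the destination it has used the fewest possible edges.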
There are several things to improve here, so it would be great if someone spent some time reading and improving the code, because many cool features can be implemented on top of these commands. But first of all you may want to play a little with the different commands and solve today’s challenge.
Explore the binary winmain.exe in the testbins repository and respond to the following questions:
entrypoint to the largest basic block
Submit your results as a JSON file listing the path of addresses and post the result.
Happy reversing!