13 - Moving Around

Welcome to Day 13 of the Radare2 Advent of Code!

Today, we will explore techniques for navigating through the disassembly in Radare2. Understanding the flow of code and the relationships between instructions, basic blocks, and functions is crucial for effective reverse engineering. Let’s dive into some commands and scripting capabilities that make this process intuitive.

Radare2 provides simple shortcuts for moving through disassembly one instruction at a time:

Use so+1 to move to the next instruction.
Use so-1 to move to the previous instruction.

Watch the offset in the prompt: it shows where we land after seeking one and two instructions forward and backward. On ARM64 every instruction is 4 bytes long, so each step moves the offset by 4.

[0x100003a58]> so+1
[0x100003a5c]> so-1
[0x100003a58]> so+2
[0x100003a60]> so-2
[0x100003a58]>

We can also disassemble backward and forward to inspect the instructions around the current offset:

[0x100003a58]> so+4
[0x100003a68]> pd-4
 0x100003a58      7f2303d5       pacibsp
 0x100003a5c      fc6fbaa9       stp x28, x27, [sp, -0x60]!
 0x100003a60      fa6701a9       stp x26, x25, [sp, 0x10]
 0x100003a64      f85f02a9       stp x24, x23, [sp, 0x20]
[0x100003a68]> pd 4
 0x100003a68      f65703a9       stp x22, x21, [sp, 0x30]
 0x100003a6c      f44f04a9       stp x20, x19, [sp, 0x40]
 0x100003a70      fd7b05a9       stp x29, x30, [sp, 0x50]
 0x100003a74      fd430191       add x29, sp, 0x50
[0x100003a68]>

A fancy trick to get both at once is the pd-- command, which disassembles N instructions backward and N instructions forward:

[0x100003a68]> pd--4
 0x100003a58      7f2303d5       pacibsp
 0x100003a5c      fc6fbaa9       stp x28, x27, [sp, -0x60]!
 0x100003a60      fa6701a9       stp x26, x25, [sp, 0x10]
 0x100003a64      f85f02a9       stp x24, x23, [sp, 0x20]
 0x100003a68      f65703a9       stp x22, x21, [sp, 0x30]
 0x100003a6c      f44f04a9       stp x20, x19, [sp, 0x40]
 0x100003a70      fd7b05a9       stp x29, x30, [sp, 0x50]
 0x100003a74      fd430191       add x29, sp, 0x50
[0x100003a68]>

When working with large addresses (kernel addresses, or anything beyond the 32-bit range), radare2 provides a helpful way to type partial addresses. Instead of typing the complete address, you can use the s.. command followed by the last digits of the target address.

For example:

[0x100003a68]> s..32
[0x100003a32]>

In this case, instead of typing the full address 0x100003a32, we just used s..32 to seek to it. The s.. command keeps the higher bits of the current address and replaces only the lower digits we typed (here, the last two).

This feature is particularly useful when:

- Analyzing large executables or kernel code
- Debugging processes with high memory addresses
- Navigating through memory regions with similar address prefixes
- Reducing the chance of typing errors when entering long addresses

You can also use this shorthand notation with other radare2 commands that accept addresses as parameters.
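To make the s.. arithmetic concrete, here is a small JavaScript model of it. It is a sketch under one assumption: that the number of hex digits you type decides how many low digits of the current offset are replaced, which matches the s..32 example above. seekLowDigits is a hypothetical helper, not a radare2 API; BigInt keeps full 64-bit kernel addresses exact.

```javascript
// Hypothetical model of s..: keep the high bits of the current offset and
// replace as many low hex digits as were typed (assumption, see lead-in).
function seekLowDigits(current, typed) {
  if (!/^[0-9a-f]+$/i.test(typed)) {
    throw new Error(`not a hex string: ${typed}`);
  }
  // Each hex digit covers 4 bits, so the mask spans 4 * typed.length bits.
  const mask = (1n << BigInt(4 * typed.length)) - 1n;
  return (current & ~mask) | BigInt("0x" + typed);
}

console.log("0x" + seekLowDigits(0x100003a68n, "32").toString(16)); // 0x100003a32
// A kernel-style address: only the last four digits change.
console.log("0x" + seekLowDigits(0xfffffff007004000n, "8abc").toString(16)); // 0xfffffff007008abc
```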

Basic Blocks

A basic block is like a piece of a puzzle in computer code - it’s a sequence of instructions that always run together, from start to finish, with no jumps or branches in between. Think of it as a straight path where once you start walking, you have to keep going until you reach the end. The only way in is at the beginning, and the only way out is at the end of the block.

These blocks are super helpful when you’re trying to understand how a program works. Imagine a flowchart where each box is a basic block, and the arrows between them show where the program might go next. Basic blocks make it easier to analyze code because you can focus on one chunk at a time, knowing that these instructions will always run together. They’re like the building blocks that help reverse engineers understand the bigger picture of how a program flows and what it does.

Radare2 exposes the addresses and counts related to basic blocks as the $B numvars. You can list them by running ?$?~B. Here's a breakdown:

$BB : Start address of the current basic block.
$BE : End address of the current basic block.
$Bj : Jump address from the current basic block.
$Bf : Fail/fall-through address from the current basic block.
$Bi : Number of instructions in the current basic block.
$BS : Size (in bytes) of the current basic block.
$BC : Number of cases (e.g., in a switch statement) in the block.
$BC:{#} : Address of the nth case in the current block.
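One way to picture what these numvars hold is to derive them yourself from block data. The sketch below looks up the block containing an address in an afbj-style array and returns $B-like values. The field names (addr, size, jump, fail, ninstr) follow the abj output shown in the JSON section of this article; blockVars is a hypothetical helper, and treating $BE as addr + size is an assumption.

```javascript
// Sketch: derive $B-style values for an address from afbj-like JSON.
// Assumption: $BE is modelled as addr + size (one past the last byte).
function blockVars(blocks, offset) {
  const bb = blocks.find(b => offset >= b.addr && offset < b.addr + b.size);
  if (!bb) return null; // offset is not inside any known basic block
  return {
    BB: bb.addr,           // start of the block
    BE: bb.addr + bb.size, // end of the block (assumed definition)
    BS: bb.size,           // size in bytes
    Bi: bb.ninstr,         // number of instructions
    Bj: bb.jump,           // jump target (undefined if none)
    Bf: bb.fail,           // fail/fall-through target (undefined if none)
  };
}

// Values copied from the abj example in this article.
const sampleBlocks = [
  { addr: 0x100003a58, size: 76, jump: 0x100003aa8, fail: 0x100003aa4, ninstr: 19 },
];
const v = blockVars(sampleBlocks, 0x100003a68);
console.log(v.BB.toString(16), v.BE.toString(16), v.Bi); // 100003a58 100003aa4 19
```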

Using Numvars

Numvars are read-only variables that radare2 defines and updates internally. We cannot modify their values, but we can use them in any command that takes a math expression as an argument.

First of all, we need to analyze the binary. Open /bin/ls, run the classic aaaa (or start with r2 -A /bin/ls if you are short on lowercase a's), and then seek to the main symbol.

Jump to the beginning of the current basic block:

s $BB

Check the size of the block you’re currently in:

?v $BS

Disassemble the two basic blocks the current one can continue to (its jump and fail targets):

pdb @ $Bj
pdb @ $Bf

Basic blocks can have multiple input and output edges, and the edge counts give us hints about each block's role:

No input edge : entrypoint
No output edges : exit node
One output edge : split node, connecting two parts of the function
Two or more input edges : beginning of a loop (or a join point after a branch)
Two output edges : conditional branch
Multiple output edges : switch table
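The rules of thumb above are easy to apply mechanically, since abj and afbj report "inputs" and "outputs" counters for every block. The sketch below is a hypothetical classifier (blockRole is not a radare2 function); it returns heuristic labels, not guarantees.

```javascript
// Sketch: label an afbj-style block from its "inputs"/"outputs" counters,
// following the heuristics listed above. A block may get two labels.
function blockRole(bb) {
  const roles = [];
  if (bb.inputs === 0) roles.push("entrypoint");
  if (bb.inputs >= 2) roles.push("loop head or join point");
  if (bb.outputs === 0) roles.push("exit node");
  else if (bb.outputs === 1) roles.push("split node");
  else if (bb.outputs === 2) roles.push("conditional branch");
  else roles.push("switch table");
  return roles;
}

// The first block of main in the abj example has 0 inputs and 2 outputs.
console.log(blockRole({ inputs: 0, outputs: 2 })); // [ 'entrypoint', 'conditional branch' ]
```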

These variables are very useful for analyzing and navigating code structure programmatically, and we can use them from r2js, r2pipe or any other scripting language we like.

You can also use afb to enumerate all the basic blocks of the current function and interpret the listing. Use the graph view in a separate terminal to verify your assumptions.

It’s often helpful to use visual mode to navigate through a function’s flow or to follow references across different functions or pointers of interest.

Learning the key combinations is essential to become comfortable with the interface. Here’s a quick guide you should practice with to solve the challenge.

agfv - enter visual interactive function graph view
V - enter visual mode (press p to switch to the disassembly)

Now you’ll encounter reference keystroke hints that look like this: ;[1] or ;[oe]. The text between the brackets represents the keys you need to type to make r2 jump to that location.

Here are some essential navigation keys you should know for keyboard-based navigation:

u - undo to the previous seek (same as s- command)
U - redo last undone seek (same as s+ command)
r - rotate between data, branch, call, computed reference hints
t - follow the true output edge branch from the current basic block
f - same for the false branch
x - open the xrefs view and use j/k keys to select the destination
n/N - seek to the next/previous function (see -e scr.nkey to choose the target)

We can navigate through the disassembly instruction by instruction using the so command. In Visual mode we can use j and k to move down and up respectively.

However, radare2 also provides other numeric variables (numvars) that enable us to perform calculations or express navigation in more sophisticated ways.

You can list all the available numvars with ?$?, and filter the listing with ~ to focus on the instruction-related ones:

?$?~i

Here’s a breakdown of some useful instruction-related numvars:

$in:{n} : Address of the nth instruction forward.
$ip:{n} : Address of the nth instruction backward. Example: s $I1@$Fe jumps to the last instruction in a BB.
$is[:{n}] : Size of the nth instruction.
$ij : Jump address for instructions like jmp, jz.
$ie : Returns 1 if it’s the end of a block, else 0.
$if : Jump fail address for conditional jumps. Example: jz 0x10 will point to the next instruction.
$ir : Pointer value referenced by the instruction (e.g., lea rax, [0x8010] -> 0x8010).
$iv : Immediate value in the instruction (e.g., mov eax, 42 -> 42).

Quick usage examples:

Seek from the current offset to the 5th instruction forward:

s $in:5

Change the program counter to skip two instructions:

ar PC=$in:2

Check if the current instruction is the end of a basic block:

? $ie
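Conceptually, $in:{n} and $ip:{n} index into the sequence of instructions around the current offset. The sketch below models that with a plain array of instruction addresses, such as the "instrs" array printed by abj. It only illustrates the idea; radare2 computes these values by decoding instructions, and nthInstr is a hypothetical helper.

```javascript
// Sketch: model $in:{n} (n > 0) and $ip:{n} (n < 0) as array indexing.
function nthInstr(instrs, offset, n) {
  const i = instrs.indexOf(offset);
  if (i === -1) throw new Error("offset is not an instruction start");
  const j = i + n;
  return j >= 0 && j < instrs.length ? instrs[j] : undefined;
}

// ARM64 instructions are 4 bytes long, matching the pd listing above.
const armInstrs = [0x100003a58, 0x100003a5c, 0x100003a60, 0x100003a64, 0x100003a68];
console.log(nthInstr(armInstrs, 0x100003a58, 2).toString(16));  // 100003a60 (like $in:2)
console.log(nthInstr(armInstrs, 0x100003a68, -4).toString(16)); // 100003a58 (like $ip:4)
```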

Exploring the code using JSON

Most radare2 commands can display their output as structured JSON, which is easy to consume later from scripts. Just suffix the command with j.

To view all the information about the current basic block in JSON format, use the abj command.

This includes its size, jump and fail addresses, and the instructions it contains. Here's an example:

[0x100003a58]> abj~{}
[
 {
  "addr": 4294982232,
  "size": 76,
  "jump": 4294982312,
  "fail": 4294982308,
  "opaddr": 4294982232,
  "inputs": 0,
  "outputs": 2,
  "ninstr": 19,
  "instrs": [
    4294982232, 4294982236, 4294982240, ...
  ],
  "traced": 1
 }
]
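Note that abj prints addresses as decimal numbers. The short sketch below, using the values copied from the output above, converts them to hex so they can be compared with the prompt, and checks two properties of this particular block: its fall-through address is the byte right after it, and on ARM64 its size divided by 4 gives its instruction count.

```javascript
// Values copied from the abj output above.
const abjBlock = { addr: 4294982232, size: 76, jump: 4294982312, fail: 4294982308, ninstr: 19 };
const toHex = n => "0x" + n.toString(16);

console.log(toHex(abjBlock.addr), toHex(abjBlock.jump), toHex(abjBlock.fail));
// 0x100003a58 0x100003aa8 0x100003aa4

// The fail (fall-through) address is the byte right after the block.
console.log(abjBlock.addr + abjBlock.size === abjBlock.fail); // true
// ARM64 instructions are 4 bytes each, so size / 4 is the instruction count.
console.log(abjBlock.size / 4 === abjBlock.ninstr); // true
```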

If we want to see all the basic blocks of the current function we must use afbj instead:

[0x100003a58]> afbj~ninstr
    "ninstr": 19,
    "ninstr": 1,
    "ninstr": 17,
    "ninstr": 5,
    "ninstr": 7,
    "ninstr": 7,
    "ninstr": 2,
    ...
    "ninstr": 2,
    "ninstr": 5,

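Once the blocks are available as JSON, questions like "which block is the largest?" become a one-liner. The sketch below picks the largest block by size in bytes. Inside r2js the array would come from r2.cmdj('afbj'); here a small hypothetical array stands in so the snippet runs on its own, and largestBlock is an illustrative helper. On ties it keeps the first block found.

```javascript
// Sketch: return the largest basic block (by size in bytes), or null if empty.
function largestBlock(blocks) {
  return blocks.reduce((best, bb) => (best === null || bb.size > best.size ? bb : best), null);
}

// Hypothetical afbj-like data.
const fnBlocks = [
  { addr: 0x1000, size: 76, ninstr: 19 },
  { addr: 0x104c, size: 4, ninstr: 1 },
  { addr: 0x1050, size: 68, ninstr: 17 },
];
const big = largestBlock(fnBlocks);
console.log("0x" + big.addr.toString(16), big.size); // 0x1000 76
```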
Metrics

We can extract some interesting metrics from the basic block sizes, the edges, and the number and type of instructions the blocks contain. These metrics help us understand the purpose of the function, how it is constructed, whether it's obfuscated, or how complex it is.

These are shown with the afi command. Let’s highlight the most relevant ones:

[0x100003a58]> afi
offset: 0x100003a58
name: main
size: 3020
is-pure: false
realsz: 2068
stackframe: 1696
cyclomatic-cost: 117
cyclomatic-complexity: 147
num-bbs: 135
num-instrs: 518
edges: 180
minbound: 0x100003a58
maxbound: 0x100004624
is-lineal: false
end-bbs: 5
trace-coverage: 62
maxbbins: 33
midbbins: 3.84
ratbbins: 8.60
noreturn: true
recursive: false
in-degree: 0
out-degree: 38
locals: 21
args: 2

size : highest address minus the lowest address (maxbound - minbound: 0x100004624 - 0x100003a58 = 3020 above)
realsz : sum of the basic block sizes
is-pure : whether the function is free of dependencies on other functions
cyclomatic-cost : how expensive the function is to execute
maxbbins : instruction count of the basic block with the most instructions
midbbins : total number of instructions in all basic blocks divided by the number of basic blocks (518 / 135 ≈ 3.84 above)
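Several of these numbers can be recomputed from afbj-style data, which is a good way to check your understanding before reading the source. The sketch below follows the definitions above, approximating size as the end of the last block minus the start of the first. The ratbbins formula (maxbbins / midbbins) is an assumption inferred from the afi output above (33 / 3.84 ≈ 8.60), not something confirmed in the radare2 source; blockMetrics is a hypothetical helper.

```javascript
// Sketch: recompute a few afi-style metrics from afbj-like blocks.
function blockMetrics(blocks) {
  const starts = blocks.map(b => b.addr);
  const ends = blocks.map(b => b.addr + b.size);
  const counts = blocks.map(b => b.ninstr);
  const numInstrs = counts.reduce((sum, n) => sum + n, 0);
  const maxbbins = Math.max(...counts);
  const midbbins = numInstrs / blocks.length;
  return {
    size: Math.max(...ends) - Math.min(...starts),   // span of the function
    realsz: blocks.reduce((sum, b) => sum + b.size, 0), // bytes actually in blocks
    maxbbins,
    midbbins,
    ratbbins: maxbbins / midbbins, // assumed formula, see lead-in
  };
}

// Hypothetical function with an 8-byte gap (0x1018-0x1020) that no block covers,
// so size and realsz differ, just like 3020 vs 2068 in the afi output.
const demoBlocks = [
  { addr: 0x1000, size: 16, ninstr: 4 },
  { addr: 0x1010, size: 8, ninstr: 2 },
  { addr: 0x1020, size: 24, ninstr: 6 },
];
console.log(blockMetrics(demoBlocks));
// { size: 56, realsz: 48, maxbbins: 6, midbbins: 4, ratbbins: 1.5 }
```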

I invite you to check how each of these numbers is computed in the radare2 source code. Use your friend grep to look for end-bbs, or any other field name, to locate the file and line that computes it:

$ git grep trace-coverage

Scripting Basic Block Analysis

Using Radare2’s r2pipe JavaScript API, you can iterate through all functions and their basic blocks.

This script iterates over every function and its basic blocks, extracting essential information such as their size, jump destination and fall-through address. We can also use RTable queries (afl,, afb,) to filter and analyze function information, including cross-references, function calls and control flow graphs. The collected data can be used to identify code patterns, analyze program flow or look for potential vulnerabilities, and the script can be extended to produce statistics about basic block distribution and function complexity, which is valuable for program analysis and optimization.

Here’s an example script:

const blocks = [];
const functions = r2.cmdj('aflj'); // List functions in JSON
functions.forEach(func => {
    const bbs = r2.cmdj(`afbj @ ${func.offset}`); // List basic blocks
    bbs.forEach(bb => {
        blocks.push({
            block: bb.addr,
            size: bb.size,
            jump: bb.jump,
            fail: bb.fail
        });
    });
});
console.log(JSON.stringify(blocks));

Use r2 -i script.r2.js -A /bin/ls to run the script
Modify the script to adjust to your needs
Re-run the script inside r2 with just . script.r2.js

If we run the script a second time in the same session, it fails with this error:

[0x100003a58]> . a.r2.js
ERROR: SyntaxError: redeclaration of 'blocks'
ERROR:     at <eval> (-:1:1)

ERROR: [uninitialized]
[0x100003a58]>

The reason for the error is that those constants live in the global scope, and JavaScript does not allow redeclaring them. To fix this, wrap the whole code inside an anonymous function that is invoked immediately (an IIFE, Immediately Invoked Function Expression) and try again. This creates a new scope for our variables and keeps them out of the global namespace.

(function() {
    // Your code here will have its own scope
    const myConstant = 'value1';
    // More code...
})();

// In another file or section
(function() {
    // You can now use the same constant name
    const myConstant = 'value2';
    // More code...
})();

This pattern is commonly used in JavaScript modules and libraries to avoid naming conflicts and maintain clean, modular code. It’s particularly useful when working with multiple files or third-party libraries that might use similar variable names.

(function() {
  const blocks = [];
  const functions = r2.cmdj('aflj'); // List functions in JSON
  functions.forEach(func => {
    // ... (same loop body as in the script above)
  });
  console.log(JSON.stringify(blocks));
})();
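You can reproduce both the error and the fix outside radare2 with Node.js. The sketch below uses Node's vm module to run the same source twice in one global context, which we assume is analogous to loading a script again with . in the same r2 session. It runs in Node, not r2js.

```javascript
// Node.js sketch: re-running top-level const declarations in one global
// context fails, while the IIFE-wrapped version can be re-run freely.
const vm = require("vm");

const ctx = vm.createContext({});
const topLevel = "const blocks = []; blocks.push(1);";
const wrapped = "(function () { const blocks = []; blocks.push(1); })();";

function runTwice(source) {
  try {
    vm.runInContext(source, ctx);
    vm.runInContext(source, ctx);
    return "ok";
  } catch (e) {
    return e.name; // "SyntaxError" when a top-level const is redeclared
  }
}

console.log(runTwice(topLevel)); // SyntaxError
console.log(runTwice(wrapped));  // ok
```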

Finding Paths

Sometimes we want to know who is calling a specific function. Exploring xrefs helps, but it is a manual and tedious process. This is why radare2 provides commands that find the shortest path a program needs to take to reach a destination.

The abp and abpf commands serve this purpose. The first one (abp) works only within the same function, while the second one (abpf) works across functions. For example, if you want to analyze a crash in a particular function, these commands can help determine the shortest path to reach that point.

[0x100003a58]> a?*~path
| abp[?] [addr]        follow basic blocks paths from $$ to `addr`
Usage: abp  [addr] [num] # find num paths from current offset to addr
| abp [addr] [num]   find num paths from current offset to addr
| abpf [addr]        same as /gg find the path between two addresses across functions and basic blocks
| abpj [addr] [num]  display paths in JSON
| afco path       open Calling Convention sdb profile from given path
| w [path]                write to path or display graph image (see graph.gv.format)
[0x100003a58]>

The abp command computes paths between basic blocks, allowing you to find which basic blocks need to be executed or emulated to reach the final address. This is interesting because we can use this information to colorize the graph and make clear which conditions must hold to reach the final point.

[0x100003a58]> abp [destination_address]

NOTE: See the abc command to colorize basic blocks.
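The core idea of finding a shortest path between basic blocks can be illustrated with a breadth-first search over the jump/fail edges. This is a sketch of the general technique, not radare2's implementation of abp; the input mimics afbj output (addr, jump, fail), and shortestBlockPath and the sample graph are hypothetical.

```javascript
// Sketch: BFS over basic blocks following only jump/fail edges.
// Returns the list of block addresses from `from` to `to`, or null.
function shortestBlockPath(blocks, from, to) {
  const byAddr = new Map(blocks.map(b => [b.addr, b]));
  const prev = new Map([[from, null]]); // visited set + back-pointers
  const queue = [from];
  while (queue.length > 0) {
    const cur = queue.shift();
    if (cur === to) {
      const path = [];
      for (let a = to; a !== null; a = prev.get(a)) path.unshift(a);
      return path;
    }
    const bb = byAddr.get(cur);
    if (!bb) continue;
    for (const next of [bb.jump, bb.fail]) {
      if (next !== undefined && !prev.has(next)) {
        prev.set(next, cur);
        queue.push(next);
      }
    }
  }
  return null; // unreachable using basic-block edges alone
}

// Hypothetical function: 0x10 branches to 0x30 or 0x20, both reach 0x40.
const cfg = [
  { addr: 0x10, jump: 0x30, fail: 0x20 },
  { addr: 0x20, jump: 0x40 },
  { addr: 0x30, jump: 0x20, fail: 0x40 },
  { addr: 0x40 },
];
console.log(shortestBlockPath(cfg, 0x10, 0x40).map(a => a.toString(16))); // [ '10', '30', '40' ]
```

Because BFS explores blocks in order of distance, the first time it reaches the target it has found a path with the fewest blocks. Data references, pointers and calls are invisible to this graph, which is exactly the limitation described next.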

But abp has a limitation: it only follows basic block references. It can't find data references, or indirect references through pointers or call instructions. That is, abp only works if you specify the address where a basic block begins and you are inside the very same function.

To perform the same operation across functions we have abpf (formerly known as /gg), which builds a graph of basic blocks, calls and references, and walks its nodes to find the shortest path to the target starting from a different address.

There is still plenty to improve here, so it would be great if someone spent some time reading the code and improving it, because many cool features could be built on top of these commands. But first, play a little with the different commands and solve today's challenge.

Challenge

Explore the binary winmain.exe in the testbins repository and respond to the following questions:

What’s the address of the largest basic block?
Trace the path from the first instruction of the entrypoint to the largest basic block

Submit your results as a JSON file listing the addresses along the path, and post the result.
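If you are unsure how to shape the JSON, here is one possible layout as a sketch. The field names (largest, path) and the addresses are hypothetical placeholders, not the answer; the challenge only asks for the path of addresses.

```javascript
// Sketch: serialize the challenge answer with hex-formatted addresses.
// Field names are hypothetical; adapt them to whatever format you prefer.
function challengeResult(largestAddr, path) {
  const hex = a => "0x" + a.toString(16);
  return JSON.stringify({ largest: hex(largestAddr), path: path.map(hex) }, null, 2);
}

// Placeholder addresses only.
console.log(challengeResult(0x401000, [0x400000, 0x400800, 0x401000]));
```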

Happy reversing!