Welcome to the second day of the Advent of Radare2!
Today we’re exploring breakpoints: how to use the db command in the radare2 debugger, how to stop at the main function, what happens before main runs, and how to use advanced features like dcu main and the r2frida plugin to perform early instrumentation on a variety of programs.
Breakpoints are essential for debugging. They allow us to pause
program execution at specific points, inspect the current state, and
step through instructions. In Radare2, breakpoints are handled by the db command, which can set and delete breakpoints with simple commands.
There are several ways to interrupt the execution of a program and return control to the attached debugger. Some architectures support hardware breakpoints, which means that memory is not modified; instead, the CPU is configured to stop execution when a specific condition is met.
Software breakpoints modify the program code by replacing the instruction at the target address with a special one that triggers an exception. For example:
x86/x64: INT 3 (0xCC) is used.
ARM: the BKPT instruction is inserted.
ARM64: BRK.
RISC-V: the EBREAK opcode is used.
Note that a breakpoint instruction can also be any invalid instruction, not just those specially designed for this purpose. On certain architectures like ARM, the breakpoint instruction takes an immediate as argument, which software debuggers use to differentiate between types of interruption, making it easier for the analyst to understand why execution stopped.
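To make the encodings concrete, here is a minimal Python sketch that patches a breakpoint opcode into a byte buffer. The table holds the little-endian encodings with an immediate of 0; the patch_breakpoint helper and the dictionary name are made up for this example and are not part of radare2.

```python
# Little-endian encodings of common breakpoint instructions (immediate = 0).
BREAKPOINT_OPCODES = {
    "x86": bytes([0xCC]),                      # INT 3
    "arm": bytes([0x70, 0x00, 0x20, 0xE1]),    # BKPT #0 (32-bit ARM)
    "thumb": bytes([0x00, 0xBE]),              # BKPT #0 (16-bit Thumb)
    "arm64": bytes([0x00, 0x00, 0x20, 0xD4]),  # BRK #0
    "riscv": bytes([0x73, 0x00, 0x10, 0x00]),  # EBREAK
}


def patch_breakpoint(code: bytearray, offset: int, arch: str) -> bytes:
    """Overwrite code at offset with the breakpoint opcode; return the saved bytes."""
    opcode = BREAKPOINT_OPCODES[arch]
    if offset < 0 or offset + len(opcode) > len(code):
        raise ValueError("breakpoint does not fit in the buffer")
    saved = bytes(code[offset:offset + len(opcode)])
    code[offset:offset + len(opcode)] = opcode
    return saved


# x86-64 example: push rbp; mov rbp, rsp; ret
code = bytearray.fromhex("554889e5c3")
saved = patch_breakpoint(code, 1, "x86")
print(code.hex(), saved.hex())  # 55cc89e5c3 48
```

The saved bytes are what a debugger needs to keep around in order to restore the original instruction later.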
At the source level, radare2 provides the r_sys_breakpoint() API, which can be used to inject a breakpoint instruction directly in a portable way, letting developers drop into the debugger without having to type boring source debugging commands.
Some packers and binary obfuscators use these special instructions to disrupt execution, making the control flow harder to understand and debuggers harder to use. These are some of the reasons why it’s important for us as reverse engineers to understand the role and use of hardware and software breakpoints.
In radare2 we can use the dbg.swstep and dbg.hwbp configuration variables to choose whether we want to use software or hardware breakpoints for stepping or stopping the program execution by default.
To set a breakpoint at a specific address, we use the following command:
db <address>
Setting a breakpoint on a function, like main, can be as simple as using its flag name:
db sym.main
Note that any numeric argument taken by commands in r2 is parsed as an RNum expression, which means that you can use math operators as well as special dollar numbers (see the output of ?$?).
[0x00000000]> ?$?~B
| $S[:{name}] section offset (alias for $SB)
| $SB[:{name}] section begin
| $B base address (aligned lowest map address)
| $DB alias for $D
| $BB begin of basic block
| $BE end of basic block
| $Bj jump out address from basic block
| $Bf fail/fall out address from basic block
| $Bi basic block instructions
| $BS basic block size
| $BC cases count for this block
| $BC:# address of the nth case
| $F same as $FB
| $FB begin of function
| $MB alias for $M
[0x00000000]>
For example, if we want to add a breakpoint at the beginning of the current basic block we can use:
[0x00000000]> db $BB
And if we want to do it 5 instructions after main:
db $in:5 @ main
Here, sym.main refers to the main function’s symbol in Radare2, but we can use any other symbol, function or address that is flagged. In radare2 we use flags to name offsets, and all math operations can use named offsets as references. You can use f to list all the flags, afl for the functions or is for the symbols.
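After setting a breakpoint, you can start the program with dc (debug continue) and the program will pause when it hits the breakpoint. At this point we can run dc again to continue the execution after the breakpoint, or inspect the registers and memory. But I want you to understand what the debugger is doing behind the scenes. In short, for a software breakpoint the debugger saves the original bytes at the target address and writes the breakpoint instruction there; when the trap fires it restores the original bytes, rewinds the program counter, single-steps the original instruction and writes the breakpoint back before continuing.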
Note that throughout this process we assume that a breakpoint instruction has the size of the smallest possible instruction for the given architecture: 1 byte on Intel, 4 bytes on arm64. Why’s that? On Intel or arm32/Thumb it’s possible to jump to an unaligned address in the middle of an instruction, so a breakpoint larger than the smallest instruction could overwrite bytes that another code path executes.
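The usual life cycle of a software breakpoint can be sketched in a few lines of Python. This is a toy model, not radare2’s implementation: memory is a plain bytearray, ToyDebugger and its method names are invented for the example, and it assumes the x86 convention where the reported program counter points one byte past the INT 3.

```python
INT3 = 0xCC  # x86 software breakpoint opcode (1 byte)


class ToyDebugger:
    """Hypothetical model: 'memory' is a bytearray indexed by address."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original byte

    def insert(self, addr: int) -> None:
        # Save the original byte once, then patch in the trap opcode.
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def on_trap(self, pc_after_trap: int) -> int:
        """Handle a trap and return the pc to resume from."""
        addr = pc_after_trap - 1  # x86 reports the pc after the INT 3
        if addr not in self.saved:
            raise RuntimeError("trap not caused by one of our breakpoints")
        # Restore the original instruction so it can be single-stepped.
        self.memory[addr] = self.saved[addr]
        return addr

    def rearm(self, addr: int) -> None:
        # After stepping over the original instruction, patch the trap back.
        self.memory[addr] = INT3

    def remove(self, addr: int) -> None:
        # Put the original byte back and forget the breakpoint.
        self.memory[addr] = self.saved.pop(addr)


mem = bytearray.fromhex("554889e5c3")  # push rbp; mov rbp, rsp; ret
dbg = ToyDebugger(mem)
dbg.insert(1)
patched = mem.hex()
resume_pc = dbg.on_trap(2)  # trap raised by the INT 3 placed at address 1
restored = mem.hex()
dbg.rearm(1)
rearmed = mem.hex()
dbg.remove(1)
print(patched, restored, rearmed, mem.hex(), resume_pc)
```

A real debugger does the same dance through the operating system’s debugging API, reading and writing the memory of another process instead of a local buffer.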
To delete a breakpoint at a specific address or function, use the command below, where <address> can also be a flag name. Since we can only have one breakpoint per address, r2 uses the address as the breakpoint id. Use db to enumerate the registered breakpoints.
[0x1028cd074]> db-<address>
If we want to get rid of all of them at once, just run this one:
[0x1028cd074]> db-*
If we look carefully at the help message of db? we will find out that breakpoints have a couple of attributes, such as the enabled (E) and trace (T) bits listed by dbi.
Toggling the trace bit can be done with the dbite command (debug breakpoint, by index, trace enable). Also, we can use dbic to associate an r2 command to be triggered when a breakpoint is hit.
$ r2 -d ls /
[0x100e79048]> db main
[0x100e79048]> dbi
0 0x100b9fa58 E:1 T:0
[0x100e79048]> dbite 0
[0x1027a4638]> dbic 0 ?e hello world
[0x1027a4638]> dc
INFO: hit tracepoint at: 0x102723a58
hello world
bin etc usr proc sys
INFO: ==> Process finished
[0x102723a5c]>
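The session above shows the difference between a breakpoint and a tracepoint: the command runs and the program keeps going. A rough Python model of that decision could look like this. The Breakpoint class, its field names and the on_hit helper are assumptions made for the example; they only mirror the E and T columns printed by dbi and do not reflect radare2’s internal structures.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    addr: int
    enabled: bool = True   # the E column in the dbi listing
    trace: bool = False    # the T column: report the hit but do not stop
    command: str = ""      # command to run on hit (hypothetical field name)


def on_hit(bp: Breakpoint, run_command) -> bool:
    """Return True if execution should stop at this breakpoint."""
    if not bp.enabled:
        return False
    if bp.command:
        run_command(bp.command)
    # A tracepoint lets the program keep running after the hit.
    return not bp.trace


executed = []
tracepoint = Breakpoint(0x100b9fa58, trace=True, command="?e hello world")
stops = on_hit(tracepoint, executed.append)
print(stops, executed)  # False ['?e hello world']
```

Keeping the breakpoints in a dictionary keyed by address would match the one-breakpoint-per-address rule mentioned earlier.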
While db is useful, manually managing breakpoints can get tedious, especially if you only need to break once at a function. Radare2 offers a convenient command to handle this automatically: dcu.
The dcu (debug continue until) command sets a breakpoint at the specified function or address, continues execution until reaching it, and then removes the breakpoint once hit. This is especially useful for functions like main when we only want to pause there once:
dcu sym.main
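Conceptually, "continue until" is just a temporary breakpoint. The Python sketch below models it over a recorded list of executed addresses, which stands in for real execution. Both function names are invented, and leaving a pre-existing breakpoint in place on cleanup is a design choice of this sketch rather than documented r2 behaviour.

```python
def continue_until_breakpoint(trace, start, breakpoints):
    """Return the index in 'trace' where execution stops, or None.

    'trace' is a list of addresses the program would execute, in order.
    """
    for i in range(start, len(trace)):
        if trace[i] in breakpoints:
            return i
    return None


def continue_until(trace, start, breakpoints, target):
    """Temporary breakpoint at 'target', continue, then clean up."""
    temporary = target not in breakpoints
    breakpoints.add(target)
    try:
        return continue_until_breakpoint(trace, start, breakpoints)
    finally:
        # Only remove the breakpoint if this call was the one that added it.
        if temporary:
            breakpoints.discard(target)


trace = [0x1000, 0x1004, 0x2000, 0x2004]
breakpoints = set()
stop_index = continue_until(trace, 0, breakpoints, 0x2000)
print(stop_index, breakpoints)  # 2 set()
```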
When we run r2 -d ls the process stops as soon as possible. This means that the kernel forks to create a new process and runs execve, which in turn runs the program interpreter defined in the binary headers (on UNIX systems, ELF or Mach-O only):
$ rabin2 -I /bin/ls | grep intrp
intrp /usr/lib/dyld
$
If we inspect the memory maps, we may find out that libc is not even loaded in memory yet. So we have lots of room for messing around with the program before main is even hit. Traditional debuggers don’t let you dig that deep because they are designed as source debuggers, on the assumption that they are only useful for inspecting the target program. Low-level debuggers like radare2 are designed to work with assembly code, outside the boundaries of the operating system or common runtimes.
It’s a common misconception that main is the first function executed in a program. In most binary formats, main is actually called later in the program’s execution sequence, after the entrypoint code and the runtime setup have run.
To illustrate this, we can use Radare2 to examine the binary’s entrypoint and its references to main:
The pif command disassembles the instructions of the function at the given address (entry0). Within this code, you’ll typically find a call to main after the necessary runtime setup is completed.
NOTE: pif, pdf, pdr and all those commands need the function to be analyzed beforehand. You can use r2 -A or just run af.
Let’s inspect this entrypoint from a 32-bit Linux binary. Note that macOS and Windows binaries use a different way to initialize the program and prepare the memory to execute main.
NOTE: if you don’t have a Linux box, or you need binaries from other operating systems or architectures, check the radare2-testbins repository, which is cloned under the test/bins directory of radare2.
$ r2 /bin/ls
[0x08048420]> s entry0
[0x08048360]> af
[0x08048360]> pif
xor ebp, ebp
pop esi
mov ecx, esp
and esp, 0xfffffff0
push eax
push esp
push edx
push sym.__libc_csu_fini
push sym.__libc_csu_init
push ecx
push esi
push main
call sym.imp.__libc_start_main
Note that __libc_start_main takes several arguments from the stack. The most interesting ones here are the three function pointers passed to it:
sym.__libc_csu_init
main
sym.__libc_csu_fini
So we can inspect what the program will execute before the main:
[0x080484a0]> pd 10 @ sym.__libc_csu_init
; DATA XREF from entry0 @ 0x8048370(r)
┌ 99: sym.__libc_csu_init
│ 0x080484a0 push ebp
│ 0x080484a1 mov ebp, esp
│ 0x080484a3 push edi
│ 0x080484a4 push esi
│ 0x080484a5 xor esi, esi
│ 0x080484a7 push ebx
│ 0x080484a8 call sym.__i686.get_pc_thunk.bx
│ 0x080484ad add ebx, 0x1b47
│ 0x080484b3 sub esp, 0x1c
│ 0x080484b6 call sym._init
...
In the case of malware, or code hidden in the binary, we may want to patch the binary so that the executable constructor does nothing. (Note that libraries, not just programs, can also execute code at dlopen time.)
oo+ # reopen in read-write
wao ret @ sym.__libc_csu_init # write a ret in there
TLS stands for thread-local storage, which by definition is just an independent memory space reserved for each thread in a program.
On UNIX systems (read it as BSD, Linux, Darwin, ...) this is just allocated and filled with zeroes. But Windows takes this concept a step further and adds an array of functions to be executed when a new thread is created. And by thread I also mean the creation of the process.
This is achieved through TLS callbacks—functions that the operating system invokes during thread creation and termination. These callbacks are defined in the Portable Executable (PE) header and execute before the program’s main entry point, enabling initialization of thread-specific data structures.
Malware often exploits TLS callbacks to execute code prior to the main function, thereby evading detection by debuggers that typically break at the program’s entry point. To effectively debug such binaries using radare2, it’s crucial to identify and handle TLS callbacks. By setting breakpoints at these callbacks, analysts can gain control over the execution flow from the earliest stages, allowing for thorough examination of the program’s behavior.
Radare2 can enumerate those callbacks by listing them with the iee command, as they are considered alternative entrypoints (ie).
[0x00401000]> ie
paddr vaddr phaddr vhaddr type
―――――――――――――――――――――――――――――――――――――――――――――――――
0x00000200 0x00401000 0x00000068 0x0040068 program
Now using iee to enumerate constructors and destructors…
[0x00401000]> iee
paddr vaddr phaddr vhaddr type
―――――――――――――――――――――――――――――――――――――――――――――――――
0x00000220 0x00401020 0x00000384 0x00400384 tls
[0x00401000]>
Check out how rabin2 creates the flags to register this information inside the current radare2 session by combining the -r (radare2 commands output) and -e (entrypoint) flags:
$ rabin2 -re /bin/ls
'fs+symbols
'f entry0 1 0x100003a58
'f entry0_haddr 1 0x000005f0
's entry0
'fs-
$
We will learn more about flags in the future, but for now we need to know that:
' : single quote avoids evaluating special characters in the line
fs+fsname : create or set the current flag space (group of flags)
fs- : select no flagspace
f name size addr : creates a flag with name and size in the given address
s : seek to change the current offset to move to the entry0
Some architectures, like x86, support raising a trap before executing an instruction. This may let debuggers control single-stepping at the hardware level, with less need for exception handling. But under some circumstances, we may not be able to achieve this; for example, on MIPS or when using a JTAG to instrument firmware running on a raw development board.
In those cases we need to fall back to software stepping, which requires writing a breakpoint instruction at the very next instruction and resuming the execution.
Note that this operation may look as simple as checking the size of the instruction pointed to by PC and using it to determine where to place the breakpoint instruction. But we must take into consideration that MIPS will always execute 2 instructions on a branch (the branch and its delay slot), and different architectures behave differently, so this needs to be taken into account to determine all the possible control flow changes and place all the breakpoints needed to avoid missing either side of a conditional branch.
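To give an idea of the bookkeeping involved, here is a simplified Python sketch that computes where temporary breakpoints are needed to emulate a single step. The Insn class is a hypothetical, already-decoded instruction (this is not an r2 data structure), and indirect jumps whose target is only known at runtime are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Insn:
    """Hypothetical decoded instruction at the current program counter."""
    addr: int
    size: int
    jump: Optional[int] = None  # static branch target, if any
    conditional: bool = False   # True if the branch may also fall through
    delay_slot: int = 0         # bytes of delay-slot code run with the branch


def step_breakpoints(insn: Insn) -> Set[int]:
    """Addresses where temporary breakpoints emulate one step."""
    # The delay slot executes together with the branch, so skip past it.
    fallthrough = insn.addr + insn.size + insn.delay_slot
    if insn.jump is None:
        return {fallthrough}
    if insn.conditional:
        return {insn.jump, fallthrough}
    return {insn.jump}


# Plain 2-byte instruction, an x86-style conditional jump and a
# MIPS-style conditional branch with a 4-byte delay slot.
plain = step_breakpoints(Insn(0x1000, 2))
cond = step_breakpoints(Insn(0x1000, 2, jump=0x1010, conditional=True))
mips = step_breakpoints(
    Insn(0x400000, 4, jump=0x400020, conditional=True, delay_slot=4))
print(sorted(plain), sorted(cond), sorted(mips))
```

For a conditional branch both possible destinations get a breakpoint, and whichever one is hit first ends the step; the other is removed afterwards.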
Luckily for you, radare2 implements all the code analysis requirements to perform software stepping in r2 and r2frida with all the available debugger backends (dL) by just setting the e dbg.swstep = true configuration option.
[0x00000000]> dL
- bf BF debug plugin
- bochs bochs debug plugin
- esil esil debug plugin
- evm evm debugger backend
- gdb gdb debug plugin
- io io debug plugin
o native native debug plugin
- null null debug plugin (does nothing)
- qnx qnx debug plugin
- rap rap debug plugin
- rv32ima experimental riscv32ima emulator
- winkd winkd debug plugin
For early breakpoint placement or tracing on systems with complex initializations, r2frida is an ideal solution. r2frida uses Frida’s dynamic instrumentation to debug running applications and even spawn new ones for analysis. One of the key features here is the ability to attach breakpoints early using the frida://spawn method, allowing us to catch events that occur before main or other late-loading functions are reached.
NOTE: To install r2frida you don’t need to install frida at all. r2frida is self-contained, and you can install it and check that it’s installed with the following commands:
$ r2pm -ci r2frida
$ r2 -L | grep frida
$ r2 frida://0
r2frida can spawn (create the process without executing any instruction), launch (create and resume execution), or attach to an already existing process by specifying the process id.
$ r2 frida:///bin/ls
[0x00000000]> :db main
The :db main command places a breakpoint directly on main using Frida’s API. But r2frida also provides our new favourite way to stop in functions, which mimics the one in the radare2 debugger.
[0x00000000]> :dcu main
NOTE: At this point you have probably noticed that all r2frida commands start with a colon (:). This is because r2frida is an io plugin, and some io plugins provide an interface to execute commands through them by prefixing them with this character.
By following these steps, Radare2 will pause at the beginning of the
main function, allowing you to inspect registers, memory, as well as
modify the memory layout, load libraries or add extra hooks if needed.
Frida is not designed to work as a debugger, but we can achieve the same as the trace option of the breakpoints with the dtf command, which adds a tracepoint to the given function, associating an r2 command and showing function return, register or argument values.
Note that the latest versions of Frida provide native breakpoint and watchpoint APIs, so you can leverage all these actions not just from r2 or r2frida commands, but also from JS scripts that control the execution flow with precision.
We will cover r2frida in more detail in future posts, and we have messed around enough for today, so I won’t go into more detail for now.
Today we have covered a lot of stuff for just a simple debugger feature. It may feel a bit overwhelming, because we didn’t go into real practical examples like how to bypass anti-debugging tricks that abuse these features, but it’s important to understand the basics first.
So for today, the only challenge is just to practice the explained
commands and experiment with different binaries of your choice on
different operating systems. This will help you remember the theory and
get fluent with the tooling. Take some time to install r2frida and have fun inspecting binaries!
See you tomorrow for more knowledge bits!
Happy debugging!
–pancake