Welcome to Day 1 of the Advent of Radare!
Today, we’re diving into one of the foundational steps in binary analysis: identifying the specific CPU architecture.
Knowing a binary’s architecture is crucial, as it affects everything from disassembly to emulation. Radare2 provides various tools and commands to help us uncover this information. Most of the time it is autodetected or directly exposed by the binary headers, but sometimes it’s not straightforward.
Let’s look at how we can use Radare2 commands like rasm2, i, and asm.cpu settings to investigate architectures. We’ll also explore an advanced script to attempt automatic architecture detection for binaries that lack headers, such as firmware binaries.
One of the quickest ways to familiarize yourself with the architectures Radare2 supports is by running rasm2 -L. This command outputs a comprehensive list of architectures, along with the bits, endianness, and available CPU types associated with each.
The output of rasm2 -L will list architectures such as x86, arm, mips, powerpc, and more. Here’s a sample of what it looks like:
$ rasm2 -L
_de 8 6502 Disassembler for the 6502 microprocessor family (NES, c64, ..)
_de 8 6502.cs Capstone mos65xx 8 bit microprocessors
ade 8 16 8051 8051 microcontroller (also known as MCS-51)
_de 64 alpha ALPHA architecture disassembler based on GNU binutils
_de 32 amd29k AMD 29k decoder
a__ 16 32 64 any.as Use system's gnu/clang 'as' assembler
a__ 8 16 32 64 any.vasm Use asm.cpu=6502, 6809, c16x, jagrisc, m68k, pdp11, ppc,qnice, tr3200, vidcore, x86, z80
_de 16 32 arc ARC processor instruction decoder
a__ 16 32 64 arm.nz Custom thumb, arm32 and arm64 assembler
_de 16 32 64 arm Capstone ARM analyzer
_de 16 32 64 arm.gnu ARM code analysis plugin (asm.cpu=wd for winedbg disassembler)
_de 64 arm.v35 Vector35 ARM analyzer
ade 8 16 avr AVR microcontroller CPU by Atmel
ade 32 bf brainfuck architecture
ade 32 bpf.mr BPF the Berkeley Packet Filter bytecode
_de 32 64 bpf Capstone BPF bytecode
...
Note that for parsing purposes you can always append -j to get the output in JSON: rasm2 -jL.
The first column indicates the following information:
a: the plugin supports assembling instructions (encode)
d: the plugin supports disassembling instructions (decode)
e: the plugin supports emulating instructions (emulate)
If the architecture we are looking for is not listed there, we may want to use r2pm -s to search for 3rd party plugins and install them like this:
$ r2pm -ci hexagon
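The capability flags in the first column of the rasm2 -L listing lend themselves to simple scripting. The following is a minimal sketch that parses lines in the text format shown in the sample above. Note that parseRasm2Line is a hypothetical helper name, and the assumption that bit widths are always one of 8, 16, 32 or 64 is taken from that sample rather than from a documented format.

```javascript
// Hypothetical helper, not part of radare2: parses one line of the
// `rasm2 -L` text listing as it appears in the sample above.
// Assumption: bit widths are limited to 8, 16, 32 and 64, so that numeric
// plugin names such as "6502" or "8051" are not mistaken for bit widths.
const KNOWN_BITS = new Set(['8', '16', '32', '64']);

function parseRasm2Line(line) {
  const tokens = line.trim().split(/\s+/);
  const flags = tokens.shift() || '';
  const bits = [];
  // Consume the leading bit-width columns.
  while (tokens.length > 0 && KNOWN_BITS.has(tokens[0])) {
    bits.push(Number(tokens.shift()));
  }
  const name = tokens.shift() || '';
  return {
    name,
    bits,
    assemble: flags.includes('a'),    // a: encode
    disassemble: flags.includes('d'), // d: decode
    emulate: flags.includes('e'),     // e: emulate
    description: tokens.join(' '),
  };
}

// Three lines copied from the sample listing.
const sample = [
  '_de 8 6502 Disassembler for the 6502 microprocessor family (NES, c64, ..)',
  'ade 8 16 8051 8051 microcontroller (also known as MCS-51)',
  'a__ 16 32 64 arm.nz Custom thumb, arm32 and arm64 assembler',
];
const plugins = sample.map(parseRasm2Line);

// Keep only the plugins that can both disassemble and emulate.
const emulating = plugins
  .filter(p => p.disassemble && p.emulate)
  .map(p => p.name);
console.log(emulating.join(', ')); // prints "6502, 8051"
```

For anything beyond a quick filter, the JSON output of rasm2 -jL is probably the more robust input, since it does not depend on column layout.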
Most of the time we will be loading binaries with a structured header that specifies all this information.
The i command outputs basic information about the binary, often including arch, bits, endian, class, and machine. For instance:
$ r2 -qci /path/to/binary
arch x86
bits 64
endian little
This metadata helps you confirm whether radare2 has correctly detected the binary’s architecture. However, when working with raw binaries (like firmware images or memory dumps) that lack such headers, radare2 may default to an incorrect architecture or to that of the host, requiring manual intervention.
But there’s no need to load the entire binary inside radare2 to retrieve the architecture information. We can get the same output using just rabin2 from the shell like this:
$ rabin2 -I /path/to/binary
arch x86
bits 64
endian little
...
This information is also exposed in JSON format by just appending the -j flag:
$ rabin2 -j -I /bin/ls | jq .
{
"info": {
"arch": "arm",
"baddr": 4294967296,
"binsz": 89088,
"bintype": "mach0",
"bits": 64,
"canary": true,
"injprot": false,
"class": "MACH064",
...
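Once the information is available as JSON, a script can decide whether the headers were enough or whether manual investigation is needed. The sketch below works on a trimmed copy of the output shown above and only uses fields that appear in that sample. The describeBinary name is hypothetical, and treating a missing arch or bits field as "unknown" is an assumption of this example.

```javascript
// Trimmed copy of the `rabin2 -j -I` output shown above; only fields that
// appear in that sample are used here.
const rabinOutput = `{
  "info": {
    "arch": "arm",
    "bintype": "mach0",
    "bits": 64,
    "class": "MACH064"
  }
}`;

// Hypothetical helper: returns null when the JSON carries no arch/bits,
// which this sketch treats as "needs manual investigation".
function describeBinary(jsonText) {
  const info = JSON.parse(jsonText).info || {};
  if (!info.arch || !info.bits) {
    return null;
  }
  return {
    arch: info.arch,
    bits: info.bits,
    label: `${info.arch}-${info.bits} (${info.bintype})`,
  };
}

const summary = describeBinary(rabinOutput);
console.log(summary ? summary.label : 'no architecture in headers');
// prints "arm-64 (mach0)"
```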
To configure the CPU model for the given architecture we must use the asm.cpu variable. This can be essential when dealing with binaries optimized for specific processors, such as ARM Cortex-M or MIPS R3000.
To list valid asm.cpu options for the currently loaded architecture, use:
e asm.cpu=?
You can also list the CPUs for the arm.gnu plugin from the shell with the following one-liner:
$ r2 -a arm.gnu -b 32 -qc 'e asm.cpu=?' --
v2
v2a
v3M
v4
v5
v5t
v5te
v5j
XScale
ep9312
iWMMXt
iWMMXt2
Changing asm.cpu takes effect immediately on the next disassembly, which may help us discover what some invalid instructions are really doing. Note that inside the r2 shell you can also use -e (like the command-line flag of the very same tool):
-e asm.cpu=v5t
pd 10
Setting the asm.cpu appropriately can enhance disassembly accuracy by accounting for architecture-specific opcodes and behaviors, providing a more precise interpretation of the binary’s instructions.
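Trying each CPU model by hand gets tedious, so the same idea can be scripted. The following is only a sketch: compareCpus is a hypothetical helper that expects an object exposing cmd() and cmdj() in the same way the r2 object does in r2js, and it assumes undecodable instructions show up with the opcode "invalid". The fakeR2 object and its data are made up purely to exercise the logic outside of radare2.

```javascript
// Hypothetical helper: `r2` is any object exposing cmd() and cmdj().
function compareCpus(r2, count = 10) {
  // `e asm.cpu=?` prints one CPU model per line.
  const cpus = r2.cmd('e asm.cpu=?')
    .split('\n')
    .map(s => s.trim())
    .filter(Boolean);
  const results = [];
  for (const cpu of cpus) {
    r2.cmd(`e asm.cpu=${cpu}`);
    const ops = r2.cmdj(`pdj ${count}`) || [];
    // Assumption: undecodable instructions are reported as "invalid".
    const invalid = ops.filter(op => op.opcode === 'invalid').length;
    results.push({ cpu, invalid });
  }
  return results;
}

// Stand-in with made-up data, only for running the sketch outside r2.
const fakeR2 = {
  cpu: '',
  cmd(c) {
    if (c === 'e asm.cpu=?') return 'v4\nv5t\n';
    if (c.startsWith('e asm.cpu=')) this.cpu = c.slice('e asm.cpu='.length);
    return '';
  },
  cmdj() {
    // Pretend that only v5t can decode the second instruction.
    return this.cpu === 'v5t'
      ? [{ opcode: 'mov r0, r1' }, { opcode: 'blx r2' }]
      : [{ opcode: 'mov r0, r1' }, { opcode: 'invalid' }];
  },
};

const cpuResults = compareCpus(fakeR2);
console.log(JSON.stringify(cpuResults));
// prints [{"cpu":"v4","invalid":1},{"cpu":"v5t","invalid":0}]
```

Inside an r2js script the global r2 object would be passed instead of fakeR2. Note that this sketch leaves asm.cpu set to the last model tried, so a real script would want to set the preferred value afterwards.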
When a binary has no header information, architecture detection becomes a manual process. However, we can leverage Radare2’s flexibility with an r2js script that tries different architecture and bit configurations, analyzes the disassembly, and measures the ratio of valid to invalid instructions. This process can give us a strong indication of the correct architecture by narrowing down configurations that yield the fewest decoding errors.
The script below attempts various arch and bits combinations, performs a short disassembly (pd), and counts invalid instructions. The configuration with the least invalid instructions is likely the correct one.
This script is written in r2js, Radare2’s JavaScript interface, which allows for dynamic command execution and result parsing.
const architectures = [
  {arch: "arm", bits: [64, 32, 16]},
  {arch: "x86", bits: [64, 32, 16]},
  {arch: "mips", bits: [64, 32, 16]},
  {arch: "ppc", bits: [64, 32]}
];

let bestMatch = {arch: "", bits: 0, invalidCount: Infinity};

for (const config of architectures) {
  for (const bit of config.bits) {
    // Set architecture and bit width
    r2.cmd(`e asm.arch=${config.arch}`);
    r2.cmd(`e asm.bits=${bit}`);

    // Perform a short disassembly
    const disasm = r2.cmdj('pdj 80');
    let invalidCount = 0;

    // Count invalid instructions
    disasm.forEach(instruction => {
      if (instruction.opcode === 'invalid') {
        invalidCount++;
      }
    });

    console.log(`Testing ${config.arch}-${bit}: ${invalidCount} invalid instructions`);

    // Track the best configuration
    if (invalidCount < bestMatch.invalidCount) {
      bestMatch = {arch: config.arch, bits: bit, invalidCount};
    }
  }
}

console.log(`Best match: ${bestMatch.arch}-${bestMatch.bits}`);

// Set the best configuration
r2.cmd(`e asm.arch=${bestMatch.arch}`);
r2.cmd(`e asm.bits=${bestMatch.bits}`);
The script iterates through the different configurations, disassembles the first 80 instructions with each one, and counts how many of them can’t be decoded and are therefore considered invalid.
Best Match Selection: It tracks the configuration with the lowest count of invalid instructions, which is likely to be the correct one.
$ uname -m
arm64
$ r2 -i whatarch.r2.js /bin/ls
Testing arm-64: 0 invalid instructions
Testing arm-32: 12 invalid instructions
Testing arm-16: 3 invalid instructions
Testing x86-64: 3 invalid instructions
Testing x86-32: 1 invalid instructions
Testing x86-16: 0 invalid instructions
Testing mips-64: 18 invalid instructions
Testing mips-32: 2 invalid instructions
Testing mips-16: 18 invalid instructions
Testing ppc-64: 4 invalid instructions
Testing ppc-32: 4 invalid instructions
Best match: arm-64
-- Now with more better English!
[0x100003a58]>
This script makes several assumptions that can be improved, and it’s important to take them into consideration.
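One of them is visible in the sample run above: arm-64 and x86-16 both report 0 invalid instructions, and arm-64 only wins because it is tested first and the comparison is strict. A possible first step towards a better script is to report ties instead of silently picking one. The sketch below uses the counts copied from that run; bestCandidates is a hypothetical helper name, not something provided by radare2.

```javascript
// Counts copied from the sample run above.
const results = [
  { arch: 'arm', bits: 64, invalid: 0 },
  { arch: 'arm', bits: 32, invalid: 12 },
  { arch: 'arm', bits: 16, invalid: 3 },
  { arch: 'x86', bits: 64, invalid: 3 },
  { arch: 'x86', bits: 32, invalid: 1 },
  { arch: 'x86', bits: 16, invalid: 0 },
  { arch: 'mips', bits: 64, invalid: 18 },
  { arch: 'mips', bits: 32, invalid: 2 },
  { arch: 'mips', bits: 16, invalid: 18 },
  { arch: 'ppc', bits: 64, invalid: 4 },
  { arch: 'ppc', bits: 32, invalid: 4 },
];

// Hypothetical helper: returns every configuration tied for the lowest
// invalid count, instead of only the first one encountered.
function bestCandidates(list) {
  const lowest = Math.min(...list.map(r => r.invalid));
  return list.filter(r => r.invalid === lowest);
}

const best = bestCandidates(results);
const names = best.map(r => `${r.arch}-${r.bits}`).join(', ');
console.log(best.length > 1 ? `Ambiguous: ${names}` : `Best match: ${names}`);
// prints "Ambiguous: arm-64, x86-16"
```

Breaking such ties needs an extra signal beyond the invalid count, and choosing that signal is part of the exercise.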
For today, we challenge you to write a better version of this script that solves all the problems described in this post, and to share it, ideally as a pull request into the examples/ directory on GitHub!
If you are curious about radare2, I recommend checking out the following links:
Feel free to contribute and open tickets to improve the documentation from what you learned here!
Identifying a binary’s architecture is the foundation of effective reverse engineering, and Radare2 offers robust tools to assist in this process. By using commands like rasm2 -L, i, and asm.cpu, we can investigate architecture and bit options manually. In cases with limited metadata, scripts can automate the detection process, saving valuable time.
Hope you all learned something new and see you tomorrow in the second advent post of radare2!
–pancake