
18 - Hello Papi

Welcome to Day 18 of the Advent of Radare!

We’ve already heard about r2pipe and how easy it makes scripting and automating actions through a simple interface: call a command, capture its output and return it.

Today we will review all that and take a few steps further to understand the facilities provided by r2papi, the high-level, idiomatic API written on top of r2pipe.

Finally, we will get a quick overview of r2pipe2 and how it overcomes the limitations of the first protocol while keeping backward compatibility.

The Pipe

During my experience with reverse engineering crackmes, I encountered a significant limitation: the inability to effectively script debugging operations in GDB. At that time, GDB’s scripting capabilities were quite limited, which made automating complex debugging tasks particularly challenging. This limitation became a major bottleneck in my reverse engineering workflow, especially when dealing with sophisticated protection mechanisms that required repetitive debugging steps.

This challenge motivated me to implement native debugger support in radare2. I then designed and developed a straightforward scripting interface based on running commands and capturing their output.

Creating bindings is boring and tedious. It’s very error-prone and adds a huge amount of tech debt to the project, because every single change in the C APIs or the commands requires new tests, updated interfaces and redesigning the same thing several times to keep each binding idiomatic and clean to use.

Also, because creating bindings is so costly, support ends up restricted to very few languages, such as Python.

After considering how to avoid creating bindings for all internal APIs and defining structured objects for every data exchange between the scripting language and core, I realized the solution was straightforward: using strings and JSON.

Languages

Since every programming language has its limitations and drawbacks, I prefer not to become overly dependent on any single one.

Python would be the primary choice for many users, but it has many problems:

While I enjoyed writing Perl one-liners back in the day, creating bindings for it isn’t its strongest feature. Languages like Scheme or TCL offer clean, well-designed C interfaces, but their limited JSON support and smaller ecosystem of libraries can be challenging for modern development.

I had an idea: what if there was a way to make r2 communicate with any programming language without needing to create specific bindings for each one? After all, we could potentially reduce all the logic to a single function call:

func r2cmd(cmd: String) String;

This function accepts a command as its argument, communicates with radare2 to execute it, captures the output and returns it as a string.

Using this simple primitive I was able to implement bindings for more than 30 programming languages.

Communication Channels

There are multiple ways to establish a communication channel. Let’s mention some of them, because there’s no need to be tied to a single solution when we can choose the one that fits us best.

Native

Just calling r_core_cmd_str is enough to emulate r2cmd. So if the language supports dlopen, we can load the r2 core library, call r_core_cmd_str with the RCore pointer and the command, and convert the returned C string back into a native string for the user.

Pipes

This is probably the easiest and most common way to use r2pipe. When a process is spawned via the #!pipe rlang plugin, r2 creates two environment variables, R2PIPE_IN and R2PIPE_OUT, holding file descriptor numbers. On UNIX these descriptors are inherited by the child process, so it can simply write the command and read the result.
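A minimal Node.js sketch of the child side of this channel, assuming (as the r2pipe bindings appear to do) that the script writes each command to the descriptor named by R2PIPE_OUT and reads the NUL-terminated reply from R2PIPE_IN. The names openEnvPipe and readUntilNul are hypothetical and not part of any r2pipe package.

```javascript
const fs = require("fs");

// Collect chunks from readChunk() until one contains a NUL byte and
// return the bytes before it as a UTF-8 string (replies are NUL-terminated).
function readUntilNul(readChunk) {
  const parts = [];
  for (;;) {
    const chunk = readChunk();
    if (!chunk || chunk.length === 0) {
      throw new Error("pipe closed before the reply was terminated");
    }
    const nul = chunk.indexOf(0);
    if (nul !== -1) {
      parts.push(chunk.subarray(0, nul));
      return Buffer.concat(parts).toString("utf8");
    }
    parts.push(chunk);
  }
}

// Hypothetical helper for a script launched through #!pipe:
// commands go out on R2PIPE_OUT, replies come back on R2PIPE_IN.
function openEnvPipe(env = process.env) {
  const fdIn = Number.parseInt(env.R2PIPE_IN, 10);
  const fdOut = Number.parseInt(env.R2PIPE_OUT, 10);
  if (Number.isNaN(fdIn) || Number.isNaN(fdOut)) {
    throw new Error("R2PIPE_IN/R2PIPE_OUT not set: not running under #!pipe");
  }
  const buf = Buffer.alloc(4096);
  return {
    cmd(command) {
      fs.writeSync(fdOut, command); // send the command as-is
      return readUntilNul(() => {
        const n = fs.readSync(fdIn, buf, 0, buf.length, null);
        return Buffer.from(buf.subarray(0, n)); // copy, because buf is reused
      });
    },
  };
}
```

Launched from r2 with something like #!pipe node script.js, the script could then call openEnvPipe().cmd("?V"). Keeping the framing in readUntilNul separate makes it testable without a running r2.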

Stdio

Using the -q0 command-line flag, we instruct r2 to append a null byte at the end of each command’s output. Additionally, we will read a null-terminated string from stdin to capture the user-provided command to be executed.
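Because output arrives in arbitrary chunks, a client reading r2 -q0 from a stream has to reassemble each NUL-terminated reply itself. Here is a small, hypothetical NulFramer class sketching that step in Node.js; it is not taken from any r2pipe binding.

```javascript
// Incrementally split a byte stream into NUL-terminated messages, the way
// `r2 -q0` delimits the output of each command on stdout.
class NulFramer {
  constructor() {
    this.pending = []; // chunks of the message still being received
  }

  // Feed one chunk; returns every message completed by it (possibly none).
  push(chunk) {
    const messages = [];
    let start = 0;
    let nul;
    while ((nul = chunk.indexOf(0, start)) !== -1) {
      this.pending.push(chunk.subarray(start, nul));
      // Decode whole messages only, so UTF-8 sequences split across chunks survive.
      messages.push(Buffer.concat(this.pending).toString("utf8"));
      this.pending = [];
      start = nul + 1;
    }
    if (start < chunk.length) {
      this.pending.push(Buffer.from(chunk.subarray(start))); // keep a copy of the tail
    }
    return messages;
  }
}
```

A client would typically spawn r2 -q0 with child_process.spawn, write commands to its stdin and pass every stdout "data" chunk to push(), pairing the returned messages with commands in order. If r2 emits a NUL right after startup, that simply shows up as a first, empty message that can be discarded.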

HTTP

Radare2 comes with an embedded webserver, which can be started with the =h command. Then we can use curl to run the command and get the output in response.

curl http://localhost:8080/cmd/x
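The same request can be made from Node.js 18+ with the global fetch. This is a sketch: cmdUrl and httpCmd are hypothetical names, and it assumes the server URL-decodes the path after /cmd/, which is why the command is percent-encoded to survive spaces and special characters.

```javascript
// Build the URL for the /cmd/ endpoint of r2's embedded web server.
function cmdUrl(command, base = "http://localhost:8080") {
  return `${base}/cmd/${encodeURIComponent(command)}`;
}

// Run one command over HTTP and return its output as text.
async function httpCmd(command, base) {
  const res = await fetch(cmdUrl(command, base));
  if (!res.ok) {
    throw new Error(`r2 http server answered ${res.status}`);
  }
  return res.text();
}
```

With the server started via =h, httpCmd("?V").then(console.log) would print the r2 version. Since fetch returns a Promise, any API built on this backend is naturally asynchronous.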

HTTP is not the only network communication protocol provided by r2, there’s support for:

Structured Data

Having plain string output is functional but not ideal. While the UNIX design principle of working with simple text strings makes debugging easier, and performance isn’t a major concern (since payload size is minimal compared to command execution time), there are limitations.

Parsing command output can be challenging because it’s primarily designed for human readability, and the format may change between versions. This makes it less reliable for programmatic use. We need a more structured way to handle data that can be consistently processed by programming languages.

This is where JSON comes in. To address this need, I’ve added JSON output to almost every command: appending ‘j’ to a command makes it print JSON.

Let’s see an example:

[0x00006d30]> ij~{}
{
  "core": {
    "type": "DYN (Shared object file)",
    "file": "/bin/ls",
    "fd": 3,
    "size": 142312,
    "humansz": "139.0K",
    "iorw": false,
    "mode": "r-x",
    "block": 256,
    "format": "elf64"
  },
  "bin": {
    "arch": "x86",
    "baddr": 0,
    "binsz": 140327,
    "bintype": "elf",
    "bits": 64,
    "canary": true,
    "injprot": false,
    "class": "ELF64",
    "compiled": "",
    "compiler": "",
    "crypto": false,
    "dbg_file": "",
    "endian": "little",
    "havecode": true,
    "guid": "",
    "intrp": "/lib64/ld-linux-x86-64.so.2",
    "laddr": 0,
    "lang": "c",
    "linenum": false,
    "lsyms": false,
    "machine": "AMD x86-64 architecture",
    "nx": true,
    "os": "linux",
    "cc": "",
    "pic": true,
    "relocs": false,
    "relro": "full",
    "rpath": "NONE",
    "sanitize": false,
    "static": false,
    "stripped": true,
    "subsys": "linux",
    "va": true,
    "checksums": {
      
    }
  }
}
[0x00006d30]> 

Thanks to the JSON output available in almost every command, we can have an r2pipe wrapper function that looks like this:

func r2cmdj(cmd: String) Object {
    return JSON.parse(r2cmd(cmd))
}

This way we can write something as cool as this:

if r2.cmdj("ij").bin.arch == "x86" {
    println("Hey, this is an intel binary!")
}
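The wrapper above is language-neutral pseudocode. Rendered in JavaScript, it could look like the sketch below, where a stubbed command function with a canned ij reply stands in for a real r2pipe connection (withCmdj and fakeR2cmd are made up for illustration).

```javascript
// Wrap any string-returning command function with a JSON-parsing variant.
function withCmdj(cmd) {
  return {
    cmd,
    // Parse the command's output as JSON; empty output has nothing to parse.
    cmdj(command) {
      const out = cmd(command).trim();
      if (out === "") {
        throw new Error(`'${command}' returned no output`);
      }
      return JSON.parse(out);
    },
  };
}

// Canned reply instead of a live r2 session (illustration only).
const fakeR2cmd = (command) =>
  command === "ij" ? '{"bin":{"arch":"x86","bits":64}}\n' : "";

const r2 = withCmdj(fakeR2cmd);
if (r2.cmdj("ij").bin.arch === "x86") {
  console.log("Hey, this is an intel binary!");
}
```

Throwing on empty output, rather than letting JSON.parse fail with a generic syntax error, makes it clearer which command did not produce JSON.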

R2JS

Radare2 comes with a JavaScript interpreter, which can be started with the -j command-line flag or, from the r2 shell, with the -j command.

$ r2 -j
[r2js]>

To run r2js scripts from the r2 shell we can use:

Note that the three commands do exactly the same thing. The only difference is that the other two are aliases that detect which rlang plugin to use depending on the file extension (.js vs .r2.js), while the last one, -j, always uses the RLang.qjs plugin.

Higher Level API

While scripting commands directly is possible, it may not be easily readable for those unfamiliar with the radare2 shell or its output.

To address this, we can create a higher-level API that provides interface definitions for JSON representations and exposes clean, structured, and idiomatic APIs tailored to different programming languages.

This implementation is called r2papi (R2PIPE API) and is currently available for Python and JavaScript (specifically TypeScript).

Since r2 includes a JavaScript runtime based on QuickJS, we can execute JavaScript directly from the shell without installing additional runtimes like Node.js or Deno. This greatly enhances our automation capabilities.

The R2Papi library comes embedded within r2js, allowing us to use high-level idiomatic APIs either through global instances or by creating a new instance based on the global r2pipe instance linked to the current r2 session:

r2 = r2pipe.open(); // not necessary, r2 is already available in r2js
const r2p = new R2Papi(r2);
r2p.analyzeProgram();

The r2papi implementation is designed to be asynchronous to support backends like the HTTP one. However, when executed from within the r2js shell, it’s transpiled into a synchronous API for easier use, eliminating the need for boring await statements.
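The following is a generic JavaScript illustration, not r2papi’s actual code, of why an asynchronous API can serve both kinds of backend: await also accepts plain values, so the same async method works over a synchronous in-process cmd() and over a Promise-returning one such as HTTP. MiniPapi and both backends are hypothetical stand-ins.

```javascript
// A tiny async API layer in the spirit of r2papi (names are hypothetical).
class MiniPapi {
  constructor(backend) {
    this.backend = backend; // object with cmd(string): string | Promise<string>
  }

  async arch() {
    // `await` passes plain strings through, so sync backends work unchanged.
    const info = JSON.parse(await this.backend.cmd("ij"));
    return info.bin.arch;
  }
}

const syncBackend = { cmd: () => '{"bin":{"arch":"x86"}}' };
const asyncBackend = { cmd: async () => '{"bin":{"arch":"arm"}}' };

new MiniPapi(syncBackend).arch().then((a) => console.log("sync backend:", a));
new MiniPapi(asyncBackend).arch().then((a) => console.log("async backend:", a));
```

The cost of this design is that every call site must await, which is exactly the boilerplate the synchronous r2js transpilation removes.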

Nested R2Pipe

Another cool feature of r2pipe is that we can nest and use multiple parallel instances.

R2Pipe inside r2js can also be used to spawn new instances of r2, execute commands in them and bring the results back to the parent r2js session. This makes it possible to use r2 as a top-level programming language.

const r = r2pipe.open("/bin/ls");
const a = r.cmd("?e hello from the child");
const b = r2.cmd("?e hello from host");
console.log(a, b);

Shell

r2 implements the most common POSIX shell commands, and we can also execute system commands with the ! shell escape.

This allows us to use r2js to create portable shell scripts or programs that run seamlessly on Windows, Linux and even embedded systems that don’t have a POSIX shell installed.

console.log(r2.cmd("cat /etc/motd"));
const uname = r2.cmd("!!uname -a ").trim();

Compiling ESM Typescript

We mentioned previously that r2papi is implemented in TypeScript; therefore, we get type definitions and language server autocompletion in any IDE we like. Pick vim or Visual Studio Code and enjoy a more modern development experience when scripting r2.

Since JavaScript’s transition from require() to ES modules is currently a big mess, I decided to adopt the same solution used in Frida: packing all the resources into a single binary file that contains all the ESM modules together.

Thanks to this, r2 comes with the classic require, which loads and evaluates JS files from the local filesystem, but it can also load those binary packs, compiled with any of these:

Limitations and R2Pipe2

Unfortunately, working with plain strings and JSON doesn’t address all the challenges we face when scripting complex tools like radare2. While the original implementation is flexible, simple, and powerful, it comes with several limitations:

After careful consideration of these limitations, I’m excited to introduce r2pipe2!

The second version of the r2pipe protocol maintains full backward compatibility with the first version. Isn’t that great? Let’s dive in and explore how it works:

[0x00000000]> {?
Usage: {"cmd":"...","json":false,"trim":true} # `cmd` is required
[0x00000000]> {"cmd":"?e hello world"}
{"res":"hello world\n","error":false,"value":256,"code":0,"code":0}
[0x00000000]> 

The functionality is built upon the “{” command. When an input command passed to r2pipe begins with a brace, it’s interpreted as plain JSON, and r2 responds using the r2pipe2 structured format.

A notable advantage is the ability to prefix commands with a single quote to prevent command injection ('{...). In return, we receive comprehensive context information, including:

- Error codes
- Numeric values from the last math operation
- Error messages directed to stderr
- Results as strings within the JSON object
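As a sketch of how a client might wrap this, assuming only what the transcript above shows (a cmd field in the request; res, error, value and code in the reply) plus the single-quote prefix described here. r2pipe2Request and cmd2 are hypothetical helper names, and cmd can be any r2pipe-style function such as r2.cmd.

```javascript
// Build an r2pipe2 request: the command travels inside a JSON object, and the
// leading single quote keeps r2 from interpreting special characters in it.
function r2pipe2Request(command) {
  return "'" + JSON.stringify({ cmd: command });
}

// Send the request through an r2pipe-style cmd() function and unpack the reply.
function cmd2(cmd, command) {
  const reply = JSON.parse(cmd(r2pipe2Request(command)));
  if (reply.error) {
    throw new Error(`'${command}' failed with code ${reply.code}`);
  }
  return { output: reply.res, value: reply.value, code: reply.code };
}
```

Because JSON.stringify escapes quotes and backslashes inside the command, callers never have to hand-build the JSON string.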

This streamlined design was implemented in the following commit and released as part of r2-5.9.4:

commit 5f76b95bf3753aa1804b95a6f7cd981ee0381087
Author:     pancake <pancake@nowsecure.com>
AuthorDate: Mon May 27 11:57:13 2024 +0200

Work is currently underway to refine return codes, error messages, and JSON results across all commands to ensure r2pipe2 provides full transparency for all use cases.

Sample script

This script looks for potential Base64-encoded strings in a binary: it walks the flags prefixed with str., reads each string and attempts to decode it. This can be particularly useful in reverse engineering, malware analysis, or when trying to discover hidden information within executable files.

// r2js Script to decode and verify Base64 strings from flags prefixed with 'str.'
let flags = r2.cmdj('fj'); // Get all flags as JSON
let result = []; // Array to store results

// Iterate through the flags
flags.forEach(flag => {
    if (flag.name.startsWith('str.')) {
        let flagName = flag.name;
        let flagOffset = flag.offset;

        // Get the string data at the flag's offset
        let stringValue = r2.cmd(`ps @ ${flagOffset}`).trim();

        try {
            // Decode Base64 and check validity
            let decodedValue = r2.cmd(`?e base64:${stringValue}`).trim();
            if (decodedValue) {
                result.push({ flag: flagName, validBase64: true, decodedValue });
            } else {
                result.push({ flag: flagName, validBase64: false, decodedValue: null });
            }
        } catch (err) {
            result.push({ flag: flagName, validBase64: false, error: err.message });
        }
    }
});

// Output results
console.log('Base64 Decoding Results:');
result.forEach(entry => {
    console.log(`Flag: ${entry.flag}, Valid Base64: ${entry.validBase64}`);
    if (entry.validBase64) {
        console.log(`Decoded Value: ${entry.decodedValue}`);
    } else if (entry.error) {
        console.log(`Error: ${entry.error}`);
    }
});
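The script above hands every str. string to r2 regardless of its shape. One way to add a pattern check is a small pre-filter like the hypothetical looksLikeBase64 below, which only tests the standard Base64 alphabet, optional ‘=’ padding and a length that is a multiple of four.

```javascript
// Standard Base64: groups of 4 from the alphabet, with an optional padded tail.
const BASE64_RE = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

// Hypothetical pre-filter; minLength skips short strings that match by accident.
function looksLikeBase64(s, minLength = 8) {
  return s.length >= minLength && BASE64_RE.test(s);
}
```

Calling it right after reading stringValue, and returning early when it is false, avoids decoding obvious non-candidates. Expect false positives: any alphanumeric word whose length is a multiple of four, such as ABCDEFGH, also passes.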

Challenge

Today we have covered a lot about scripting and automating actions inside r2, but it was mostly theory. Now it’s time to put this knowledge into practice and get something done!

You can find some example scripts under the ./scripts directory of radare2 for inspiration.

To practice what you learned today, write an .r2.js script that uses the NativePointer API to parse some fields of an ELF or Mach-O header.

The NativePointer class is inspired by the implementation made by Ole Andre in Frida. It provides a clean, idiomatic way to work with 64-bit pointers, read data of different sizes and endianness, and move along the address space.

The global function ptr returns a new instance of that class, which makes it convenient to use. Here’s an example:

[0x00000000]> -j
[r2js]> G.p = ptr("$$"); // pointer to current address
[r2js]> G.p.read<TAB>
readByteArray        readPointer          readS32
readU32              readU64le            readCString
readRelativePointer  readS8               readU32be
readU8               readHexString        readS16
readU16              readU32le            readWideString
readInt32            readS16be            readU16be
readU64              readPascalString     readS16le
readU16le            readU64be         
[r2js]> G.p.readU16le()
0

Note that in the REPL we can use the G object, an alias of the global scope, to keep variables across multiple scripts or lines.
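To get started on the challenge, here is a plain JavaScript sketch of which ELF header fields to look at and where they live, with offsets taken from the ELF specification. It deliberately uses a DataView over a byte array instead of NativePointer, so porting it to ptr() and its read* methods (readU8, readU16le, readU64le and friends from the completion list above) is still your exercise. parseElfHeader is a hypothetical name.

```javascript
// Parse a few ELF header fields from raw bytes (needs at least 32 bytes).
function parseElfHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magic = [0x7f, 0x45, 0x4c, 0x46]; // "\x7fELF"
  if (!magic.every((b, i) => bytes[i] === b)) {
    throw new Error("not an ELF file");
  }
  const is64 = bytes[4] === 2; // EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
  const le = bytes[5] === 1; // EI_DATA: 1 = little endian, 2 = big endian
  return {
    bits: is64 ? 64 : 32,
    endian: le ? "little" : "big",
    type: view.getUint16(16, le), // e_type, e.g. 2 = EXEC, 3 = DYN
    machine: view.getUint16(18, le), // e_machine, e.g. 0x3e = x86-64
    // e_entry is 8 bytes wide in ELF64 and 4 bytes wide in ELF32.
    entry: is64 ? view.getBigUint64(24, le) : BigInt(view.getUint32(24, le)),
  };
}
```

Checking EI_CLASS and EI_DATA first matters because they decide how wide and in which byte order every later field must be read; the same applies when reading through NativePointer.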

Summary

This has been an intense and enlightening journey, showcasing just a fraction of what’s possible with radare2’s scripting features. While we’ve covered several powerful techniques and applications, this merely scratches the surface of radare2’s full potential. The framework offers numerous additional capabilities, including automated analysis, custom plugin development, and integration with other security tools.

Everything we learned today applies to local static analysis, native debugging and emulation, and even to scripting r2frida or automating complex processes that combine multiple plugins across multiple local and remote r2 instances.

Remember, mastering these tools is an ongoing journey, and there’s always more to learn and discover in the fascinating world of reverse engineering.

Stay tuned for tomorrow’s Radare2 challenge as we explore more interesting topics!