20 - Welcome Contributors

Welcome to the 20th day of the Advent of Radare!

Free software projects advance thanks to their surrounding communities. Contributions come in many forms: discussions, new projects, bug reports, tests and documentation all help them grow and evolve over time.

Radare2 is no exception, and while it has a reputation for having a huge and complex codebase, the reality is quite different when people actually engage with it. There have been numerous presentations at r2cons specifically aimed at helping new contributors overcome their initial hesitation and understand the fundamentals.

In fact, there are many ways to contribute, and in most cases it’s easier than you might think. Before moving on to the next section, remember that coding is not the only way to contribute to the project!

Introduction

There are some good resources online and offline to learn more about contributing, but I would encourage you to read the following files from the root of radare2:

DEVELOPERS.md guidelines
CONTRIBUTING.md instructions

Also, it’s good practice to use tig (git spelled backwards) to inspect the git history and find commits similar to the ones you want to make, so you can learn from previous contributors.

Tests

It’s generally good and recommended practice to write tests to ensure functionality won’t break in the future and maintain consistency. Even with a comprehensive test suite, code coverage is often far from 100%, so keep this in mind.

Don’t hesitate to submit patches focused solely on improving the test suite. You can use this as an opportunity to contribute even if you’re not modifying the main codebase.

The test suite we use is called r2r, a command-line tool that you can use to verify that your local build of radare2 passes all the tests. However, it’s common practice to delegate this task to the CI (Continuous Integration) and just keep the local copy.

This tool is shipped as part of the radare2 toolchain and is also used to test third-party plugins like r2yara, r2frida, etc.

There are five types of tests:

asm - test arch plugins to assemble and disassemble instructions
cmd - run commands and compare output
json - run cmds and verify output is valid json
fuzz - runs fuzzing checks on inputs for: fs, bin, cmd
unit - test behaviour and usage of various C APIs

The tool accepts 3 types of input arguments:

Directories containing test files
Individual test files (format described below)
C source files containing special comments like // R2R ... that specify which tests to run

The test files follow this format:

NAME=testname              # must be unique
FILE=malloc://1024         # file to open
BROKEN=0|1                 # is the test broken?
ARGS=-a x86                # optional
CMDS=<<EOF                 # all the commands
?e hello world
?ee errmsg
EOF
EXPECT=<<EOF               # command output
hello world
EOF
EXPECT_ERR=<<EOF           # optional
errmsg
EOF
RUN
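To get a feeling for how simple this format is, here is a minimal Python sketch that reads it into dictionaries. This is not how r2r itself is implemented: parse_r2r_tests is a hypothetical helper, and the sketch assumes that the trailing # annotations shown above are documentation only, so the sample leaves them out.

```python
def parse_r2r_tests(text):
    """Parse r2r-style test text into a list of dicts, one per RUN line."""
    tests, current = [], {}
    lines = iter(text.splitlines())
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments (assumption)
        if line == "RUN":
            tests.append(current)
            current = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unexpected line: {line!r}")
        if value.startswith("<<"):
            # Multi-line value: collect everything up to the terminator word
            terminator = value[2:]
            body = []
            for body_line in lines:
                if body_line == terminator:
                    break
                body.append(body_line)
            value = "\n".join(body)
        current[key] = value
    return tests


SAMPLE = """\
NAME=echo test
FILE=malloc://1024
CMDS=<<EOF
?e hello world
EOF
EXPECT=<<EOF
hello world
EOF
RUN
"""

tests = parse_r2r_tests(SAMPLE)
print(tests[0]["NAME"], "->", tests[0]["EXPECT"])
```

A parser like this is handy for small scripts, for example to list every test name in a directory or to find duplicated NAME values.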

Tests marked as broken will be reported as FIXED or BROKEN depending on whether they pass or not. Another good way to contribute is to spot the ones reported as FIXED and remove the BROKEN statement, or to fix the underlying bug in the ones that still fail.
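The reporting rule can be sketched in a few lines of Python. The FIXED and BROKEN labels come from the text above; the OK and FAILED labels for tests that are not marked as broken are placeholders chosen for this example, not necessarily what r2r prints.

```python
def report_status(passed, broken):
    """Return the label for a test given its result and its BROKEN flag."""
    if broken:
        # A broken test that passes is a candidate to drop its BROKEN line
        return "FIXED" if passed else "BROKEN"
    # Placeholder labels for regular tests (assumption)
    return "OK" if passed else "FAILED"


for passed in (True, False):
    for broken in (True, False):
        print(f"passed={passed} broken={broken} -> {report_status(passed, broken)}")
```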

The tool has several interesting flags worth explaining:

-i : interactively asks you to fix, edit, or patch broken tests; the flag will update the test output if requested
-j : runs tests in separate threads in parallel
-S : performs a shallow run of a percentage of tests (randomly skips some to speed up testing)
-o : specifies output JSON file containing all timestamps and test results

The resulting JSON file generated with -o is particularly useful for obtaining metrics and identifying bottlenecks or slow operations. There’s a set of tests under db/test/slow that contains the most time-consuming tests to run, and it’s recommended to use them with a profiler to optimize the r2 code.

The r2r test suite runs using the testbins repository. This repository contains binaries for different architectures and Operating Systems that are copyright-friendly, allowing them to be distributed and used as seeds for mutation-based fuzzing.

Third Party

Radare is self-contained, which means you can compile it with only libc as a dependency.

Traditionally, dependencies were packed inside the same repository using a monorepo approach. This was later changed to handcrafted git clones because git submodules are problematic for several reasons:

GitHub tarballs don’t include them
Release tarballs require git
Git requires network connectivity, preventing offline builds
Submodule commits are easily broken or changed

Recently, we began migrating all external dependencies to meson-wraps, which provide a more robust way to manage external source repositories. This approach resolves all the issues mentioned above, simplifies maintenance and updates, and reduces the overall codebase.

r2-6.0 will be the first version to implement this system.

While this is a meson-specific feature, wrap files are also supported by ACR, an autotools-replacement project created by pancake. ACR generates configure/make files that are 20 times smaller, faster to execute, and easier to maintain, with zero external dependencies. It has been tested on numerous UNIX systems.

For make builds, acr-wrap handles pulling external repositories and can be executed in the preconfigure script, enabling offline builds.

Build System

As mentioned before, radare2 supports make and meson for building its source. Both have their good and bad sides, and maintaining two build systems is totally not a problem, not even for a large project like this.

The make machinery is:

Easy to write and hack stuff in
Very portable and standard (GNU Make)
Can’t build if the path to your source tree contains spaces
Install with symlinks, so no need to reinstall
Can build only stuff in one directory

On the meson part we can say:

Requires a relatively modern version of Python
- muon/samu are C rewrites of meson (Python) and ninja (C++)
Not friendly with old unix systems
Build times and recompilations take longer than make
Works fine on Windows

Pick the one you like, but remember to update both if you submit changes that require new files or compilation options, which is not really common.

Stable ABI and Versioning

During winter, radare2 enters the ABI breaking season. This is due to a self-imposed rule aimed at improving stability and reducing user complaints by enforcing a stable binary interface for all modules and public APIs.

The CI check works by downloading the static build from the latest stable release and comparing it against the current build with the abidiff tool from Red Hat’s libabigail. It generates a comparison using the DWARF metadata to ensure all public functions and structures remain unchanged.

This check is only disabled when the last number of the version string is .9.

Let’s clarify how versioning works in radare2: while it looks similar to semver, it follows a different convention!

Version string is split into 3 numbers: X.Y.Z

X Major: increments when minor version changes from .9 to .0
Y Minor: indicates ABI stable versioning; releases with the same minor version are binary compatible
Z Patch: specifies ABI compatible releases
- even (0, 2, 4, …): stable release
- odd (1, 3, 5, …): git version
  - .9: ABI breaking season
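The rules above are easy to express in code. This is a small Python sketch, assuming a plain X.Y.Z string with no suffixes; describe_version is a name made up for this example and is not part of radare2.

```python
def describe_version(version):
    """Classify an X.Y.Z version string following the rules listed above."""
    major, minor, patch = (int(part) for part in version.split("."))
    if patch == 9:
        kind = "git version (ABI breaking season)"
    elif patch % 2 == 0:
        kind = "stable release"
    else:
        kind = "git version"
    return {
        "major": major,
        "minor": minor,
        "patch": patch,
        "kind": kind,
        # The CI ABI check is only disabled when the last number is 9
        "abi_check": patch != 9,
    }


for v in ("5.9.8", "5.9.7", "5.9.9"):
    info = describe_version(v)
    print(v, "->", info["kind"], "| abi_check:", info["abi_check"])
```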

Throughout the project’s evolution, we’ve used various versioning approaches. Given the project’s active development and numerous changes, adopting semver would have led to versions like 237823.3.1, which are difficult to read and interpret.

The current model makes it easy to determine a version’s age at a glance, as we aim to release every 200-300 commits, approximately every three months.

Sometimes, necessary ABI-breaking changes can hinder evolution. To address this, we provide the --with-new-abi configure flag, which enables these ABI-breaking changes while allowing private or external projects to use the new APIs with their custom builds.

Users should consider these builds experimental and not as thoroughly tested (despite running the test suite with these changes enabled), so caution is advised. These changes take effect during the ABI breaking season.

Linting and Indenting

One of the extra checks that happens in the CI on every commit is to execute the following script:

sys/lint.sh

This is an ugly shell script that git greps over all the source files and finds incorrect syntax or constructions that break our coding conventions:

Indent code with tabs
Indent comments with spaces
R_API function definitions
R_IPI for internal cross-object functions
No space before ( in function definitions
Add space when calling functions
Don’t define variables inside for statements
Don’t use newlines in R_LOG calls
Don’t use formatting functions with non-formatted strings
Don’t use dangerous APIs like strcpy
…
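To illustrate the idea of grep-style linting, here is a Python sketch that checks three of the rules above. It is not a port of sys/lint.sh: the regular expressions and the lint_source helper are assumptions made for this example, and they are far less complete than the real script.

```python
import re

# (pattern, message) pairs; each regex is a rough approximation of one rule
CHECKS = [
    (re.compile(r"\bfor\s*\(\s*(?:int|size_t|char|long|unsigned)\b"),
     "variable defined inside a for statement"),
    (re.compile(r"\bstrcpy\s*\("),
     "dangerous API: strcpy"),
    (re.compile(r"R_LOG_[A-Z]+\s*\(.*\\n"),
     "newline in R_LOG call"),
]


def lint_source(source):
    """Return a list of (line_number, message) for every rule violation."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, message in CHECKS:
            if pattern.search(line):
                findings.append((lineno, message))
    return findings


SAMPLE = r'''
static void demo(char *dst, const char *src) {
	for (int i = 0; i < 3; i++) {
		strcpy (dst, src);
	}
	R_LOG_ERROR ("failed\n");
}
'''

findings = lint_source(SAMPLE)
for lineno, message in findings:
    print(f"line {lineno}: {message}")
```

Line-based checks like these are crude, but they are fast and easy to extend, which is the same trade-off the shell script makes.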

We’ve tried different tools to enforce the coding style, but the fact is that all of them fail in some way or another, creating monstrous and unreadable codebases. This is why r2 requires contributors to indent the code by hand.

For some reason, young developers find that rule a bit annoying because they’re used to modern languages like Go, TypeScript, Rust, etc., that enforce these rules in the language itself. But for oldies like me, indenting C by hand brings some satisfaction in the sense that you have full control over what you write, and part of your style and personality gets transferred to the code you write.

To be honest, thanks to this I’m able to read a portion of code and tell you who wrote it without using git blame. That brings better satisfaction than enforcing automated indentation, in my humble opinion.

Grep is your Friend

One of the best ways to find things to fix or improve in the radare2 codebase isn’t by checking GitHub issues, but rather by grepping for specific keywords in the source code.

In the r2 ecosystem, we have established keywords and rules that help locate points of interest in the codebase. Using git grep is fast enough to make the workflow fluid and efficient. Here’s a list of these keywords:

TODO : stuff that needs to be done
XXX : this is known to be wrong
WIP : not finished, but kind of working
R2_600 : things to be done before r2-6.x
USE_NEW_ABI : only used during the abi stable seasons
R_API : find definition of api function
R_IPI : internal api function definition
R_TH_LOCAL : global variables to be eliminated

You can use the sys/counters.sh script to get some stats:

$ sys/counters.sh
XXX         1323
TODO        1927
GLOBALS      148
strcpy       227
sprintf      171
eUsage        21
f(char)        3
isdigit      199
f(void)       41
aPlugs         3
BROKEN       234
Cannot         9
http:/       134
strtok        12
R2_580         0
R2_590        89
R2_600       106
$
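If you want the same kind of numbers for your own subset of keywords, a few lines of Python are enough. This is only a sketch of the counting idea, not a reimplementation of sys/counters.sh: count_keywords is a hypothetical helper, it counts matching lines in .c and .h files only, and the demo runs on a temporary directory instead of the radare2 tree.

```python
import os
import re
import tempfile

KEYWORDS = ["TODO", "XXX", "WIP", "R2_600"]


def count_keywords(root, keywords=KEYWORDS, extensions=(".c", ".h")):
    """Count the lines containing each keyword under root (like grep -c)."""
    patterns = {k: re.compile(rf"\b{re.escape(k)}\b") for k in keywords}
    counts = {k: 0 for k in keywords}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for line in fh:
                    for keyword, pattern in patterns.items():
                        if pattern.search(line):
                            counts[keyword] += 1
    return counts


# Demo on a throwaway directory with a single made-up source file
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "demo.c"), "w") as fh:
        fh.write("// TODO: handle errors\n")
        fh.write("int x; // XXX wrong type\n")
        fh.write("// TODO R2_600 remove this\n")
    counts = count_keywords(root)

for keyword, total in counts.items():
    print(f"{keyword:<8}{total:>6}")
```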

Ideally, all the comments like R2_600 should be addressed BEFORE r2-6.0 is out. Note that 5.9.0 was released without addressing all the R2_590 ones, so those are also good targets to be solved or converted to R2_600.

DoubleHashtags

You may have noticed while reading the git log that some commits contain tags prefixed with two # characters (##). These are processed by the sys/release-notes.sh script, which groups those tags to create the text that’s attached to releases when a new tag is pushed to the master branch.

This CI custom machinery helps reduce the effort required to deliver new versions, so it’s important to use these tags wisely. Only use them if the commit is relevant enough to be included in the final changelog.

Some of the most commonly used tags are:

print : for p subcommands
analysis : related to code analysis
arch : cpu / architecture support
debug : the debugger capabilities
shell : commandline improvements
esil : esil expressions, engine
bin : binary parsing files
disasm : disassembly listings
io : related to the IO layer
crash : segfault, overflow, security related
fs : filesystems and partition tables
build : portability and build fixes
util : utility functions
search : / subcommands and r_search apis
visual : related to the visual mode
crypto : cryptographic and hash / checksums
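The grouping step can be pictured with a short Python sketch. It is not the actual sys/release-notes.sh logic: the group_by_tag helper, the assumption that the tag sits at the end of the commit subject, and the sample commit messages are all made up for this example.

```python
import re
from collections import defaultdict

# Assumes one ##tag at the end of the commit subject
TAG_RE = re.compile(r"\s*##(\w+)\s*$")


def group_by_tag(subjects):
    """Group commit subjects by their trailing ##tag; untagged ones are skipped."""
    groups = defaultdict(list)
    for subject in subjects:
        match = TAG_RE.search(subject)
        if match:
            groups[match.group(1)].append(subject[:match.start()])
    return dict(groups)


SUBJECTS = [
    "Fix null deref in the ELF parser ##bin",
    "Add missing test for the pd command ##print",
    "Handle truncated section headers ##bin",
    "Fix typo in a comment",
]

notes = group_by_tag(SUBJECTS)
for tag in sorted(notes):
    print(f"{tag}:")
    for subject in notes[tag]:
        print(f"  - {subject}")
```

Note how the untagged commit never reaches the changelog, which is why the tags should be reserved for changes that users care about.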

Digging Tickets

When searching for issues on GitHub, effectively utilizing labels and search functionality helps you find the ones that best match your interests.

Labels are typically used to specify modules (rbin, arch, core, debug, etc.) that the issues are associated with. Use them to find topics that align with your areas of interest if you want to work on a specific subject.

Addressing issues associated with upcoming milestones is crucial for maintaining project momentum and meeting delivery deadlines. Milestones help teams organize issues and pull requests into larger goals or releases, providing a clear roadmap for project development.

We strive to maintain a friendly approach in code reviews because experience has shown that receiving numerous changes and complaints for every small patch can be frustrating. Having large, interesting patches abandoned due to frustration is worse than having multiple smaller ones that help motivate the team, enable better testing, and encourage open discussions about future development directions.

By regularly reviewing and prioritizing issues linked to immediate milestones, teams can ensure they’re focusing on the most important tasks and tracking progress effectively. This practice helps prevent last-minute rushes, enables better resource allocation, and contributes to more successful project deliveries while maintaining transparency with stakeholders regarding project status and timeline adherence.

Challenge

Today’s challenge is to solve a GitHub issue, a TODO or an XXX comment by submitting a Pull Request.

Choose the one that best suits your skills and make it happen!

Closing

I hope you all feel motivated after reading this. Next time you find something confusing or miss a feature, try to fix it yourself before opening a ticket!

Check out the Developers Welcome presentation from #r2con2024 to learn more tips and hear testimonials from some of the main contributors!

See you tomorrow in another post of the advent of radare2!

–pancake