Welcome to Day 19 of the Advent of Radare!
Today’s focus is on extracting byte sequences from functions, which raises several questions:
This post will try to answer all these questions and provide you with key commands to create YARA rules and zignature files, as well as identify patterns for similar functions. We’ll also address the challenges of different code constructions and discuss how to eliminate parts that can vary between similar patterns.
Functions can be described as a consecutive list of instructions that have one entrypoint. The rest of the rules can vary depending on the way they are implemented.
As you can see, things that may look simple or easy can become a complete nightmare for the analyst, and even more so for anyone writing software that aims to reliably recover the real constructions behind the assembly.
So, with all these concepts in mind, we may want to know the best or easiest way to get a list of all the basic blocks.
Imagine a perfect world where all compilers generate a single entry point for every function, without reusing basic blocks, and place the implementation linearly below the entry point.
Imagine these functions have no data mixed with code within their boundaries, instead delegating data to the space between functions or into a separate rodata section within the binary.
In this imaginary world, we wouldn’t have many problems and could simply use commands like:
pD $FS @ $FB
Where:
* $FS: The linear size of the function
* $FB: The beginning address of the function
Alternatively, you can use p8 to view the byte sequence in hexadecimal pairs:
p8 $FS @ $FB
This linear disassembly is usually easily readable with commands like pdf or pD $FS. However, if you’ve used these commands for a while, you’ve probably noticed that sometimes the output is incomplete and contains disappointing “…” ellipses.
Unfortunately, this perfect world doesn’t exist.
Now let’s get into the harsh real-world problems. Following the rules we read before, we need a way to enumerate the basic blocks of a function: afb.
[0x100003a58]> afb~:0..10
0x100003a58 0x100003aa4 00:0000 76 j 0x100003aa8 f 0x100003aa4
0x100003aa4 0x100003aa8 00:0000 4 j 0x100003aa8
0x100003aa8 0x100003aec 00:0000 68 j 0x100003b1c f 0x100003aec
0x100003aec 0x100003b00 00:0000 20 j 0x100003b88 f 0x100003b00
0x100003b00 0x100003b1c 00:0000 28 j 0x100003b88
0x100003b1c 0x100003b38 00:0000 28 j 0x100003b40 f 0x100003b38
0x100003b38 0x100003b40 00:0000 8 j 0x100003b6c f 0x100003b40
0x100003b40 0x100003b60 00:0000 32 j 0x100003b80 f 0x100003b60
0x100003b60 0x100003b68 00:0000 8 j 0x100003b7c f 0x100003b68
0x100003b68 0x100003b6c 00:0000 4 j 0x100003b80
[0x100003a58]>
* afbq (the quiet version of afb) enumerates only the addresses.
* ~:0..10 is the same as | head -n 10
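To post-process a listing like the one above outside of r2, the text can be parsed into records. The following is a minimal sketch, assuming the column layout seen in the sample output (start address, end address, an unnamed third column that is ignored here, size in bytes, then optional `j`/`f` targets). The names `BasicBlock` and `parse_afb` are hypothetical helpers, not part of r2.

```python
from typing import List, NamedTuple, Optional


class BasicBlock(NamedTuple):
    start: int
    end: int
    size: int
    jump: Optional[int]  # target after "j", if present
    fail: Optional[int]  # target after "f", if present


def parse_afb(text: str) -> List[BasicBlock]:
    """Parse the plain-text afb listing shown above (layout is an assumption)."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip prompts and any line that does not start with two addresses.
        if len(fields) < 4:
            continue
        if not (fields[0].startswith("0x") and fields[1].startswith("0x")):
            continue
        jump = fail = None
        rest = fields[4:]
        # The remaining fields come in "key value" pairs.
        for key, value in zip(rest[::2], rest[1::2]):
            if key == "j":
                jump = int(value, 16)
            elif key == "f":
                fail = int(value, 16)
        blocks.append(
            BasicBlock(int(fields[0], 16), int(fields[1], 16), int(fields[3]), jump, fail)
        )
    return blocks


SAMPLE = """\
[0x100003a58]> afb~:0..10
0x100003a58 0x100003aa4 00:0000 76 j 0x100003aa8 f 0x100003aa4
0x100003aa4 0x100003aa8 00:0000 4 j 0x100003aa8
"""

for bb in parse_afb(SAMPLE):
    print(hex(bb.start), bb.size, bb.jump, bb.fail)
```

In a real script the JSON output of r2 is probably a sturdier input than the human-readable listing. Parsing the text is only convenient when all you have is a saved session log.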
Now that we have the list of basic block entrypoints of the current function we need to disassemble every basic block:
pdb @@= `afbq`
* pdb: disassemble the basic block (same as pD $BS @ $BB)
* @@=: foreach operator that takes a space-separated list of addresses for temporal seeking
* `afbq`: backticks replace the output of the command inside the same line
We said we wanted to get the bytes, right? So we may replace the pdb with:
* p8 $BS: show N hexpairs, where $BS is the numvar that specifies the size of the basic block.
Additionally, we may want to use @ $BB to force the temporal seek to start at the beginning of the basic block. But as we have learned, a single address can be owned by multiple basic blocks, so we must assume the afbq output is enough for us to determine each basic block address.
For parsing reasons, we can replace the following expression:
@@=`command`
With the non-backtick version: @@c:
This lets us create a one-liner that prints the bytes of all the basic blocks as a single hex string:
[0x100003a58]> echo `p8b@@c:afbq`|sed -e 's, ,,g'
7f2303d5fc6fbaa9fa6701a9f85f02a9f65703a9f44f04a9fd7b05...
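The same idea can be scripted. Below is a minimal sketch that walks the basic blocks and concatenates their bytes. It assumes an r2pipe-style object exposing `cmd()` and `cmdj()`, and that `afbj` returns a list of objects with `addr` and `size` fields. `function_hex` and `FakeR2` are hypothetical names, and the stub only exists so the example runs without r2 installed.

```python
def function_hex(r2) -> str:
    """Concatenate the hexpairs of every basic block of the current function."""
    blocks = r2.cmdj("afbj") or []
    parts = []
    for bb in blocks:
        # One p8 per basic block, seeking temporarily to its start address.
        parts.append(r2.cmd(f"p8 {bb['size']} @ {bb['addr']}").strip())
    return "".join(parts)


class FakeR2:
    """Stand-in for an r2pipe session; maps block addresses to hex bytes."""

    def __init__(self, blocks):
        self.blocks = blocks

    def cmdj(self, command):
        assert command == "afbj"
        return [{"addr": a, "size": len(h) // 2} for a, h in self.blocks.items()]

    def cmd(self, command):
        # Only understands "p8 <size> @ <addr>".
        _, size, _, addr = command.split()
        return self.blocks[int(addr)][: int(size) * 2] + "\n"


fake = FakeR2({0x1000: "7f2303d5", 0x1010: "fc6fbaa9"})
print(function_hex(fake))
```

With a real session, the `fake` object would be replaced by something like `r2pipe.open(path)` after running the analysis and seeking to the function of interest.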
To visualize how sparse the function code is, we can use the afb= command, which shows some nice ASCII art about it. (Yes, this ASCII art can be much better, and I’m open to suggestions and pull requests!)
[0x1000038fc]> afb=
0* 0x1000038fc ███―――――― 0x100003914
1 0x100003914 ――█―――――― 0x100003918
2 0x100003918 ――██――――― 0x100003928
3 0x100003928 ――――█―――― 0x100003930
4 0x100003930 ――――██――― 0x100003934
5 0x100003934 ―――――█――― 0x10000393c
6 0x10000393c ―――――████ 0x100003960
=> 0x1000038fc ^^^^^^^^^ 0x1000039fc
[0x1000038fc]>
Which command can we use instead of pdf to disassemble a sparse function?
Correct! It’s pdr, which stands for print-disasm-recursive. That command will probably not show the branch lines in the best possible way, but it will cover all the basic blocks, trying to enumerate them by jump and address location order.
The advantage of pdr is that it will show all the code of the function. Give it a try!
For non-linear functions, it’s essential to process each basic block individually to retrieve the complete byte sequence.
Functions can be sorted using the afls command, which affects the default listing from afl. However, we can always use afl, to create custom table queries to filter and reorder functions as needed.
[0x100003a58]> afls?
Usage: afls [afls] # sort function list
| afls same as aflsa
| aflsa sort by address (same as afls)
| aflss sort by size
| aflsn sort by name
| aflsb sort by number of basic blocks
[0x100003a58]>
Unlike afls, there’s no sorting command for afb (no afbs). This could be a potential contribution to the project. However, we can use afb, to filter basic block listings according to our requirements.
But what’s the correct order for sorting them? Can we simply use the entrypoint address as a numeric ordinal? Unfortunately, no. The appropriate approach depends on our specific needs.
If we’re reading code: It’s generally fine to follow each branch on every basic block until we’ve covered all basic blocks.
afbq > $bbs
p8 $BS @@c:cat $bbs
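The "follow each branch until every block is covered" idea can be sketched as a small graph walk. This is only an illustration under stated assumptions: the `edges` dictionary (block address mapped to its jump and fail targets) is a hypothetical structure you would fill from the basic block listing, and `reading_order` is not an r2 command.

```python
from typing import Dict, List, Optional, Tuple

# block address -> (jump target, fail target); None when there is no such edge
Edges = Dict[int, Tuple[Optional[int], Optional[int]]]


def reading_order(entry: int, edges: Edges) -> List[int]:
    """Depth-first walk from the entrypoint, visiting each block once."""
    order: List[int] = []
    seen = set()
    stack = [entry]
    while stack:
        addr = stack.pop()
        # Ignore blocks already visited and targets outside the function.
        if addr in seen or addr not in edges:
            continue
        seen.add(addr)
        order.append(addr)
        jump, fail = edges[addr]
        # Push the jump first so the fall-through (fail) block is visited first.
        for target in (jump, fail):
            if target is not None:
                stack.append(target)
    return order


# Hypothetical three-block function: a conditional branch that rejoins.
graph = {0x10: (0x30, 0x20), 0x20: (0x30, None), 0x30: (None, None)}
print([hex(a) for a in reading_order(0x10, graph)])
```

Visiting the fall-through block before the jump target is a design choice that keeps the output close to how the code reads linearly. Blocks unreachable from the entrypoint are not listed by this walk.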
For cases where we need a pattern that’s as linear as possible, we might want to sort them and fill gaps with masked bytes.
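One possible way to build such a linear pattern is sketched below: sort the blocks by address and emit `??` for every byte that falls in a gap between them. The input format (start address mapped to a hex string) and the name `linear_pattern` are assumptions made for this example. Overlapping blocks are handled by emitting the shared bytes only once.

```python
from typing import Dict


def linear_pattern(blocks: Dict[int, str]) -> str:
    """Join basic blocks in address order, masking gaps with '??' per byte."""
    pattern = []
    cursor = None  # address right after the last byte emitted so far
    for start in sorted(blocks):
        data = blocks[start]
        if cursor is None:
            cursor = start
        if start > cursor:
            # Bytes between two blocks are unknown or variable: mask them.
            pattern.append("??" * (start - cursor))
            cursor = start
        # If this block overlaps the previous one, drop the bytes already emitted.
        skip = cursor - start
        data = data[skip * 2:]
        pattern.append(data)
        cursor += len(data) // 2
    return "".join(pattern)


# Hypothetical function with a two-byte hole between its blocks.
print(linear_pattern({0x100: "aabb", 0x104: "cc"}))
```

The resulting string has one pair of characters per byte of the address range, so it can be compared position by position against a linear dump of the same range.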
Instead of using afbq (which sorts by offset), we can use afba, which mirrors the implementation of afla. This means taking all basic blocks and traversing them in reverse order, covering all basic blocks and code paths while defining the proper analysis order following the jumps.
[0x100003a58]> afb? | grep order
| afba[!] list basic blocks of current offset in analysis order (EXPERIMENTAL, see afla)
The only issue with this listing is that it’s reversed. We can solve this using tac (the reverse version of cat) to achieve our desired order.
aaaa # analyze all the things
afba > $bbs # list basic blocks in reverse analysis dependency order
tac $bbs # print the lines of that file in reverse order, entrypoint should be the first
Creating function signatures is useful for many reasons. While this post delves into the topic, we might be focusing too much on the technical requirements rather than the practical applications.
In r2land we call them “zignaturez”, yes; with ‘z’.
And as expected, radare2 ships its own implementation for signatures. You might wonder: “Why not just use FLIRT or whatever Ghidra provides?” The main reason is that IDA’s implementation is too simplistic and comes packed in a proprietary file format, while Ghidra’s FIDB is stored as Java serialization data, version 5.
I don’t really understand why such a complex topic was reduced to just “byte pattern + binary mask” and stored in a proprietary, non-standard file format. This explains why r2 has its own implementation, which is more flexible, configurable, powerful, and precise.
And the best part? As usual, hardly anyone knows about it! So you can feel even more exclusive when using these features.
The z command was chosen for this feature because s was already taken, and using z became a memorable pronunciation joke. We won’t go into much detail here since there’s an entire chapter in the r2book about it.
Let’s look at how to use it first, then we’ll explore the configuration options and metrics. Here’s a sample session:
aaaa # analyze all
zg # generate zignatures for all the functions
z* > z.r2 # dump them into an r2script
If we load a different file or a program with functions from the library we’ve analyzed, we must use the z/ command, which scans and compares every function with the loaded signatures. We can load multiple zignatures at once and manipulate them as needed.
To inspect what’s created in the file, we can simply read this z.r2 or dump it in JSON format.
[0x100003a58]> zj~{}|head -n 50
[
{
"name": "sym.imp.__tolower",
"bytes": "110000b031820091300240f9110a1fd7",
"mask": "ff000000ffffffffffffffffffffffff",
"graph": {
"cc": 1,
"nbbs": 1,
"edges": 0,
"ebbs": 1,
"bbsum": 16
},
"addr": 4294997000,
"next": "sym.imp.abort",
"types": "int __tolower (int c)",
"refs": [
],
"xrefs": [
"sym.func.100006780"
],
"collisions": [
],
"vars": [
],
"hash": {
"bbhash": "ceb84efacc7c830486ac15b8fa27a452ecb0f75f3d21eac04bd44bdcfc580a2e"
}
},
{
"name": "sym.imp.compat_mode",
"bytes": "110000b031220291300240f9110a1fd7",
"mask": "ff000000ffffffffffffffffffffffff",
"graph": {
"cc": 1,
"nbbs": 1,
"edges": 0,
"ebbs": 1,
"bbsum": 16
},
This generates a zignature, a function signature that uses bitmasks to exclude bits prone to variation.
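To make the role of the mask concrete, here is a minimal sketch of a byte-and-mask comparison using the two signatures from the JSON above. It is a plain illustration of the concept and not necessarily how r2 implements its matching. `masked_match` is a hypothetical helper.

```python
def masked_match(data: bytes, pattern: bytes, mask: bytes) -> bool:
    """True when data equals pattern on every bit that the mask keeps."""
    if len(pattern) != len(mask) or len(data) < len(pattern):
        return False
    return all((d & m) == (p & m) for d, p, m in zip(data, pattern, mask))


# Values copied from the zj output above.
tolower = bytes.fromhex("110000b031820091300240f9110a1fd7")
compat = bytes.fromhex("110000b031220291300240f9110a1fd7")
mask = bytes.fromhex("ff000000ffffffffffffffffffffffff")

# Hypothetical variant: only the three masked bytes differ from tolower.
variant = bytes.fromhex("11223344" + "31820091300240f9110a1fd7")

print(masked_match(variant, tolower, mask))  # masked bytes are ignored
print(masked_match(compat, tolower, mask))   # differs in unmasked bytes
```

The variant still matches because its differences fall under the `00` part of the mask, while the second signature differs in bytes the mask keeps.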
We can list only the byte patterns like this:
z~bytes[1]
To see the bitmask applied by the zignature (showing which bits are ignored), use:
z~mask[1]
This bitmask helps identify which parts of the instruction sequence are static and which are variable, making it easier to create robust YARA rules or other detection mechanisms that remain effective across different build variations.
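As a rough illustration of that last step, the sketch below turns a bytes/mask pair into a YARA hex string, using `??` for fully masked bytes and `?` for masked nibbles. The function names, the rule layout, and the `$code` identifier are assumptions for this example. Any mask byte other than `ff`, `f0`, or `0f` is conservatively turned into a full wildcard.

```python
import re


def yara_hex(pattern_hex: str, mask_hex: str) -> str:
    """Convert a zignature-style bytes/mask pair into a YARA hex string body."""
    pattern = bytes.fromhex(pattern_hex)
    mask = bytes.fromhex(mask_hex)
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    tokens = []
    for p, m in zip(pattern, mask):
        if m == 0xFF:
            tokens.append(f"{p:02X}")        # fixed byte
        elif m == 0xF0:
            tokens.append(f"{p >> 4:X}?")    # low nibble varies
        elif m == 0x0F:
            tokens.append(f"?{p & 0xF:X}")   # high nibble varies
        else:
            tokens.append("??")              # anything else: full wildcard
    return " ".join(tokens)


def yara_rule(name: str, pattern_hex: str, mask_hex: str) -> str:
    """Wrap the hex string in a minimal rule; the name is sanitized first."""
    ident = re.sub(r"\W", "_", name)
    if ident[:1].isdigit():
        ident = "_" + ident
    body = yara_hex(pattern_hex, mask_hex)
    return (
        f"rule {ident}\n{{\n"
        f"    strings:\n        $code = {{ {body} }}\n"
        f"    condition:\n        $code\n}}\n"
    )


print(yara_rule(
    "sym.imp.__tolower",
    "110000b031820091300240f9110a1fd7",
    "ff000000ffffffffffffffffffffffff",
))
```

Collapsing partial masks into `??` makes the rule slightly more permissive than the original mask, which trades a few possible false positives for simplicity.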
Other tools come with just a big button that makes things happen, but that’s not the r2 way. Here, we prefer to understand how things work and customize them to fit our specific use cases. While we strive to provide good defaults, sometimes they might not work well or haven’t been thoroughly tested for .. reasons.
The metrics used to generate signatures are the following:
We can see the configuration options to generate and match the metrics with the following command:
[0x100003a58]> e??zign.
zign.autoload: autoload all zignatures located in dir.zigns
zign.bytes: use bytes patterns for matching
zign.diff.bthresh: threshold for diffing zign bytes [0, 1] (see zc?)
zign.diff.gthresh: threshold for diffing zign graphs [0, 1] (see zc?)
zign.dups: allow duplicate zignatures
zign.graph: use graph metrics for matching
zign.hash: use Hash for matching
zign.mangled: use the manged name for zignatures (EXPERIMENTAL)
zign.maxsz: maximum zignature length
zign.mincc: minimum cyclomatic complexity for matching
zign.minsz: minimum zignature length for matching
zign.offset: use original offset for matching
zign.prefix: default prefix for zignatures matches
zign.refs: use references for matching
zign.threshold: minimum similarity required for inclusion in zb output
zign.types: use types for matching
[0x100003a58]>
R2Yara brings the power of YARA pattern matching into radare2, enabling efficient binary analysis and malware detection. This integration allows you to scan binaries for specific patterns using YARA rules directly within your r2 session.
Simply install using r2pm:
$ r2pm -ci r2yara
R2Yara provides two command sets: yara for full commands and yr for shorter alternatives. The key commands are:
yr <file>
yrl
yrs
yr-*
[0x00000000]> yr crypto.yara # Load crypto detection rules
[0x00000000]> yrs # Scan current binary
R2Yara automatically creates flags at matched locations, making it valuable for both automated analysis and manual investigation of suspicious binaries.
Common use cases for r2yara are:
I encourage you to watch “Uncovering more crypto secrets”, a presentation by Sylvain and Azox at #r2con2024, to learn more about practical use cases of YARA rules for cryptographic pattern detection.
With the scripting knowledge from yesterday’s post, I’m challenging you to create an r2js script (or Python using the r2pipe API) that creates a binary mask pattern for every function, similar to how the zignatures implementation works. Try to improve upon the existing implementation by:
$ for a in test/bins/**/* ; do r2 -qi script.r2.js $a ; done
After completing this, share your results by:
Additional questions:
ao output, rather than relying on the zignatures implementation to remove parameterized and immediate values from the disassembly and show the associated binary mask?
Radare2 provides robust tools for displaying a function’s byte sequence, whether the function is linear or divided into basic blocks.
Commands like p8 and pD make it easy to capture bytes for linear functions, while p8 $BS @@c:afbq captures bytes across multiple basic blocks. For reliable detection across builds, zignatures (zg) generate function signatures with bitmasks to ignore variable bits, helping you create accurate, flexible YARA rules.
Stay tuned for tomorrow’s Radare2 post as we continue exploring advanced analysis and reverse engineering techniques!