Welcome to Day 11 of the Advent of Radare!
Today, we’ll dive into a fascinating aspect of reverse engineering: carving binary formats from raw data or memory dumps using Radare2. This process involves identifying and extracting files, filesystems, or data structures hidden within larger binary blobs.
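In this post, we’ll cover:
- Using the /m command to search for known file headers.
- Extracting the identified data with the wtf command.
Carving is invaluable in many reverse engineering scenarios where files, filesystems or data structures are embedded inside larger blobs.
By scanning for known file headers, radare2 lets you pinpoint these hidden gems within your binaries.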
Many people will recognize these features from binwalk. Well, the thing is that radare2 has had them since 2006, four years before the very first version of binwalk. And as usual, despite being more portable, having more features, supporting remote instances and in-process scanning, and performing better, it was ignored at the time because Python. Historically, I took inspiration from PhotoRec and TestDisk, as those were the most reliable tools for data recovery when I was working as a forensic analyst.
This post aims to uncover these features so you can take advantage of them in your workflows and get handy with the interactive capabilities of the radare2 shell.
Radare2’s /m command scans for known file headers using libmagic. These headers, or “magic patterns,” allow you to identify specific file types embedded within binary data. Note that radare2 ships its own version of libmagic (based on a fork of the OpenBSD implementation), but it can also link dynamically against the system’s libmagic if you have it installed.
The benefits of using the shipped one are:
Let’s take a look at one of the magic files shipped with radare2:
$ cat libr/magic/d/default/cafebabe
0 beshort 0xcafe
>2 beshort 0xbabe
#>4 belong >30 compiled Java class data,
#>4 belong >30
# !:mime application/x-java-applet
>>6 beshort <20
>>>6 beshort x version %d.
>>>4 beshort x \b%d
>>>4 belong 1 Mach-O fat file with 1 architecture
>>>4 belong >1
>>>>4 belong <20 Mach-O fat file with %d architectures
>2 beshort 0xd00d JAR compressed with pack200
!:mime application/x-java-pack200
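Each rule in a magic file reads as offset, type, test value and message, and the number of leading > characters sets the nesting level: a rule is only evaluated when its parent matched. As a rough illustration, here is a hand-written Python sketch of what the fat Mach-O branch of this file checks. It is a simplification written for this post (check_cafebabe is a hypothetical name), not how libmagic actually evaluates rules.

```python
import struct
from typing import Optional


def check_cafebabe(buf: bytes) -> Optional[str]:
    """Simplified, hand-written take on the fat Mach-O branch of the rules above."""
    if len(buf) < 8:
        return None
    # "0 beshort 0xcafe": big-endian 16-bit value at offset 0
    if struct.unpack_from(">H", buf, 0)[0] != 0xCAFE:
        return None
    # ">2 beshort 0xbabe": only evaluated because the parent rule matched
    if struct.unpack_from(">H", buf, 2)[0] != 0xBABE:
        return None
    # "4 belong": big-endian 32-bit value at offset 4 (architecture count)
    narch = struct.unpack_from(">L", buf, 4)[0]
    if narch == 1:
        return "Mach-O fat file with 1 architecture"
    if 1 < narch < 20:
        return f"Mach-O fat file with {narch} architectures"
    # Java class files share the 0xcafebabe magic; read as a big-endian long,
    # their version field at offset 4 is at least 45, so the < 20 bound skips them
    return None


print(check_cafebabe(bytes.fromhex("cafebabe00000002")))  # Mach-O fat file with 2 architectures
```

The small upper bound on the architecture count is what keeps this shared magic value from being misreported, which is the kind of precision-over-recall trade-off the radare2 database makes.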
Let’s run these signatures over our favorite target (/bin/ls) and see what they can find:
$ r2 -n /bin/ls
[0x00000000]> /m
0x00000000 0 hit0_0 Fat-Mach-O
0x00004000 0 hit0_1 Mach-O
0x0000b390 0 hit0_2 MacOS Deteched Code Signature
0x00010000 0 hit0_3 Mach-O
0x00021260 0 hit0_4 MacOS Deteched Code Signature
[0x00000000]>
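A fat (universal) Mach-O starts with a big-endian header whose architecture table records where each slice begins, which likely explains the additional Mach-O hits above. The following Python sketch walks that table so the recorded offsets can be cross-checked against /m hits. It only handles the 32-bit fat header (each fat_arch entry is five big-endian 32-bit fields: cputype, cpusubtype, offset, size, align); fat_slices is a hypothetical helper and the header below is synthetic, not taken from /bin/ls.

```python
import struct
from typing import List, Tuple

FAT_MAGIC = 0xCAFEBABE


def fat_slices(buf: bytes) -> List[Tuple[int, int, int]]:
    """Return (cputype, offset, size) for each slice of a 32-bit fat Mach-O header."""
    magic, nfat_arch = struct.unpack_from(">II", buf, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat Mach-O header")
    slices = []
    for i in range(nfat_arch):
        # fat_arch entries follow the 8-byte header, 20 bytes each
        cputype, _subtype, offset, size, _align = struct.unpack_from(">5I", buf, 8 + i * 20)
        slices.append((cputype, offset, size))
    return slices


# Synthetic two-slice header (x86_64 + arm64), for illustration only
header = (struct.pack(">II", FAT_MAGIC, 2)
          + struct.pack(">5I", 0x01000007, 3, 0x4000, 0x1000, 14)
          + struct.pack(">5I", 0x0100000C, 2, 0x10000, 0x2000, 14))
for cputype, offset, size in fat_slices(header):
    print(f"cputype=0x{cputype:08x} offset=0x{offset:x} size=0x{size:x}")
```

Following the structure like this gives exact slice sizes, whereas a pure header scan only tells you where something starts.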
If you run this command on your machine, you will probably notice that it takes a while to finish. The reason is that every signature is tested at every offset of the search range, which you can narrow down with the search.in variable (run e search.in=? to list the available options). We can measure how long the search takes by using the ?t prefix like this:
[0x00000000]> ?t /m
...
3.525228
[0x00000000]>
The primary way to optimize the search is by tweaking the search.align configuration option, which makes radare2 only check offsets that are multiples of the given value:
[0x00000000]> e search.align=4
[0x00000000]> ?t /m
...
0.900765
[0x00000000]> e search.align=16
[0x00000000]> ?t /m
...
0.245257
[0x00000000]>
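Why does alignment help so much? Without it every offset is a candidate, while an alignment of N only tests offsets that are multiples of N, so the work drops roughly by a factor of N (the timings above go from about 3.5s to 0.9s and 0.25s). The flip side is that a header starting at an unaligned offset is skipped. This Python sketch illustrates the trade-off; SIGNATURES is a hypothetical two-entry table and magic_scan a hypothetical helper, not radare2’s database or code.

```python
from typing import Dict, List, Tuple

# Hypothetical mini signature table: magic bytes -> description
SIGNATURES: Dict[bytes, str] = {
    b"\x7fELF": "ELF",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def magic_scan(data: bytes, align: int = 1) -> Tuple[List[Tuple[int, str]], int]:
    """Test every signature at each aligned offset; return (hits, offsets probed)."""
    step = max(align, 1)
    hits = []
    probed = 0
    for off in range(0, len(data), step):
        probed += 1
        for magic, name in SIGNATURES.items():
            if data.startswith(magic, off):
                hits.append((off, name))
    return hits, probed


blob = bytearray(4096)
blob[0x400:0x404] = b"\x7fELF"              # aligned to 16
blob[0x803:0x80B] = b"\x89PNG\r\n\x1a\n"    # deliberately unaligned
blob = bytes(blob)

for align in (1, 4, 16):
    hits, probed = magic_scan(blob, align)
    print(f"align={align:<2} probed={probed:<4} hits={hits}")
# align=1 finds both headers after 4096 probes; align=4 and align=16 probe
# only 1024 and 256 offsets but miss the PNG header at 0x803
```

So pick an alignment that matches how the data you are hunting is laid out.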
Radare2’s default magic database prioritizes precision, focusing on reducing false positives. This subset is tailored for reverse engineering and differs from the broader magic databases found in tools like file(1). However, you can extend this functionality by loading your own magic files or leveraging /F for dynamic scans.
For hard drive images, where files usually start on sector boundaries, we can use a larger alignment value such as 512 or 1024, which reduces scan times considerably.
Here’s an overview of the /m capabilities:
[0x00000000]> /m?
| /m search for known magic patterns
| /m [file] same as above but using the given magic file
| /me like ?e similar to IRC's /me
| /mm search for known filesystems and mount them automatically
| /mb search recognized RBin headers
[0x00000000]>
You can extend the /m signature database by changing the signature directory defined by this variable:
[0x00000000]> e dir.magic
/usr/local/share/radare2/5.9.9/magic
Alternatively, pass a custom magic file as an argument, as in /m [file].
Once /m identifies a file header, the wtf command can extract and dump the corresponding data:
[0x100003a58]> wtf file.dump
Let’s inspect the help message to better understand what wtf and wtff can offer us:
[0x00000000]> wt?
Usage: wt[afs] [filename] [size] Write current block or [size] bytes from offset to file
| wta [filename] append to 'filename'
| wtf [filename] [size] write to file (see also 'wxf' and 'wf?')
| wtf! [filename] write to file from current address to eof (ignores given size)
| wtff [prefix] [size] write block from current seek to "<prefix>-<offset>"
| wts host:port [size] send data to remote socket at tcp://host:port
| NOTE: filename defaults to "<cfg.prefixdump>.<offset>"
[0x00000000]>
By default, the dump size is determined by the block size (b), which can be adjusted for larger or smaller files:
[0x100003a58]> b=1M
Setting b=1M adjusts the dump size to 1 MB, leaving enough room for larger files. To automate the dumping process for every hit, configure:
[0x100003a58]> e cmd.hit=wtff
[0x100003a58]> e search.align=4
[0x100003a58]> b=512K
[0x100003a58]> /m
...
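Conceptually, with cmd.hit=wtff every hit runs wtff, which writes one block starting at the hit into a file named after a prefix and the offset. The Python sketch below mimics that flow outside radare2; the carve function, the dump prefix and the hexadecimal file-name format are our own assumptions, not radare2’s exact naming. In this sketch a dump that would run past the end of the data is simply truncated.

```python
import os
import tempfile
from typing import Iterable, List


def carve(data: bytes, hits: Iterable[int], block_size: int, prefix: str) -> List[str]:
    """Write block_size bytes starting at each hit to '<prefix>-0x<offset>'."""
    paths = []
    for off in hits:
        chunk = data[off:off + block_size]  # truncated if it runs past the end
        path = f"{prefix}-0x{off:x}"
        with open(path, "wb") as fh:
            fh.write(chunk)
        paths.append(path)
    return paths


data = bytes(range(256)) * 16          # 4096 bytes of sample data
outdir = tempfile.mkdtemp()
files = carve(data, [0x100, 0xF80], block_size=512, prefix=os.path.join(outdir, "dump"))
for path in files:
    print(os.path.basename(path), os.path.getsize(path))  # dump-0x100 512, then dump-0xf80 128
```

A fixed block size usually over-captures, so carved files may carry trailing bytes from whatever follows them in the blob.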
Radare2 also includes subcommands that use the RBin and RFS plugins to identify known binaries and filesystems in memory:
- /mb: focuses on binary headers like ELF and PE.
- /mm: scans for filesystems and attempts to mount them.
Considering the number of plugins supported by radare2, and its ability to mount filesystems from memory using the m command or load binaries from memory using the oba command, these scans give us deeper insight into what’s inside a blob.
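RBin plugins do far more than compare a few bytes, but the core idea behind a header scan like /mb can be sketched in a few lines. This hypothetical Python helper (binary_kind is our name, not an r2 API) recognizes ELF, Mach-O and PE headers. For PE it follows the e_lfanew field at offset 0x3c to the PE\0\0 signature instead of trusting the two-byte MZ magic alone, which is one simple way to cut down false positives.

```python
import struct
from typing import Optional


def binary_kind(buf: bytes, off: int = 0) -> Optional[str]:
    """Guess the executable format of the header found at buf[off:]."""
    head = buf[off:off + 4]
    if head == b"\x7fELF":
        return "ELF"
    # MH_MAGIC / MH_MAGIC_64 as stored in little- and big-endian files
    if head in (b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe",
                b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf"):
        return "Mach-O"
    if head[:2] == b"MZ" and len(buf) >= off + 0x40:
        # e_lfanew: little-endian 32-bit offset of the PE header, relative to MZ
        (e_lfanew,) = struct.unpack_from("<I", buf, off + 0x3C)
        pe = off + e_lfanew
        if buf[pe:pe + 4] == b"PE\0\0":
            return "PE"
    return None


print(binary_kind(b"\x7fELF\x02\x01\x01\x00"))  # ELF
```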
Note that false positives are possible, so you may want to ignore any file types that make no sense for the target you are looking at. The binary formats supported by the RBin plugins can be listed with rabin2 -L:
$ rabin2 -L
bin any Dummy format r_bin plugin
bin art Android Runtime
bin avr ATmel AVR MCUs
bin bf brainfuck
bin bflt bFLT format r_bin plugin
bin bios BIOS bin plugin
bin bootimg Android Boot Image
bin cgc CGC format r_bin plugin
bin coff COFF format r_bin plugin
bin dex dex format bin plugin
bin dis Inferno Dis VM bin plugin
bin dmp64 Windows Crash Dump x64 r_bin plugin
bin dol Nintendo Dolphin binary format
bin dyldcache dyldcache bin plugin
..
From the r2 shell, we can count how many bin and fs plugins are shipped by default:
[0x00000000]> Li~?
73
[0x00000000]> Lm~?
20
[0x00000000]>
In previous posts we learned how to search for hexadecimal patterns and plaintext strings, but when looking for known patterns like magic headers, sometimes we need to cook our own files and scan memory for them.
The /F command is designed for this purpose: it takes a file as an argument (/F file.bin) and uses its contents as a signature pattern to search in memory. Its parameters also let you specify which portion of the file to use as the signature, making it convenient to script multiple pattern searches using a single reference file.
[0x00000000]> /F?
| /F file [off] [sz] search contents of file with offset and size
[0x00000000]>
[0x00000000]> /F file.bin
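The idea behind /F is straightforward: read a slice of a reference file and look for every occurrence of it in the target. Here is a Python sketch of that logic; find_file_pattern and its off/size parameters are hypothetical names that mirror the [off] [sz] arguments from the help above, not radare2 internals. Whether radare2 reports overlapping matches is not covered here; this sketch does.

```python
import tempfile
from typing import List, Optional


def find_file_pattern(haystack: bytes, path: str, off: int = 0,
                      size: Optional[int] = None) -> List[int]:
    """Return every offset in haystack where the file slice [off, off+size) occurs."""
    with open(path, "rb") as fh:
        fh.seek(off)
        pattern = fh.read() if size is None else fh.read(size)
    if not pattern:
        return []
    hits = []
    pos = haystack.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(pattern, pos + 1)  # +1 also reports overlapping matches
    return hits


# Reference file: a 7-byte prefix followed by the PNG signature
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER:\x89PNG\r\n\x1a\n")
    sig_path = tmp.name

memory = b"\x00" * 16 + b"\x89PNG\r\n\x1a\n" + b"\xff" * 8 + b"\x89PNG\r\n\x1a\n"
print(find_file_pattern(memory, sig_path, off=7, size=8))  # [16, 32]
```

Using the offset and size to pick only the interesting part of the file lets one reference file hold several signatures.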
There are many more search commands in radare2. I suggest taking a look at the /? help message and playing a little with /p, /e, /v and binary masks.
Combining what we learned in previous posts, today’s challenge consists of:
- Attaching to a running process (r2 -d <pid>).
- Listing its memory maps with the dm command.
- Using /m to scan for known file types like images, encryption keys, SQL statements, ...
Practice the performance techniques learned in this post and share your findings! #aor24
See you tomorrow in another advent post!
–pancake