
11 - Carving Data with BinLimp

Welcome to Day 11 of the Advent of Radare!

Today, we’ll dive into a fascinating aspect of reverse engineering: carving binary formats from raw data or memory dumps using Radare2. This process involves identifying and extracting files, filesystems, or data structures hidden within larger binary blobs.

In this post, we’ll cover:

Why Carving Matters

Carving is invaluable in many reverse engineering scenarios:

By scanning for known file headers, Radare2 enables you to pinpoint these hidden gems within your binaries.

Many people will recognize these features from binwalk. Well, the thing is that radare2 has had them since 2006, four years before the very first version of binwalk. As usual, despite being more portable, having more features, supporting remote instances and in-process scanning, and performing better, it was ignored at the time because, well, Python. Historically, I took inspiration from photorec and testdisk, as those were the most reliable tools for data recovery when I was working as a forensic analyst.

This post aims to uncover these features so you can take advantage of them in your workflows and get handy with the interactive capabilities of the radare2 shell.

Searching for Magic

Radare2’s /m command scans for known file headers using libmagic. These headers, or “magic patterns,” allow you to identify specific file types embedded within binary data. Note that radare2 ships its own version of libmagic (based on a fork of the OpenBSD implementation), but it also supports dynamically linking against the system’s libmagic if you have it installed.

The benefits of using the shipped one are:

Let’s take a look at one of the magic files shipped with radare2:

$ cat libr/magic/d/default/cafebabe
0   beshort     0xcafe
>2  beshort         0xbabe
#>4 belong      >30     compiled Java class data,
#>4 belong      >30
# !:mime    application/x-java-applet
>>6 beshort     <20
>>>6    beshort     x           version %d.
>>>4    beshort     x           \b%d
>>>4    belong      1       Mach-O fat file with 1 architecture
>>>4    belong      >1
>>>>4   belong      <20     Mach-O fat file with %d architectures
>2  beshort     0xd00d      JAR compressed with pack200
!:mime  application/x-java-pack200
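To make the rule syntax concrete, here is a minimal Python sketch that hand-codes a simplified subset of the rules above: offset 0 must hold the big-endian short 0xcafe, the short at offset 2 selects the branch (0xbabe or 0xd00d), and for 0xcafebabe the big-endian long at offset 4 is read as an architecture count. The helper names (`beshort`, `belong`, `match_cafebabe`) are hypothetical, the sketch skips the commented-out Java lines and the version-printing lines, and it is not how radare2's libmagic evaluates rules internally.

```python
import struct

def beshort(data: bytes, off: int) -> int:
    """Read a big-endian unsigned 16-bit value (magic type 'beshort')."""
    return struct.unpack_from(">H", data, off)[0]

def belong(data: bytes, off: int) -> int:
    """Read a big-endian unsigned 32-bit value (magic type 'belong')."""
    return struct.unpack_from(">I", data, off)[0]

def match_cafebabe(data: bytes):
    """Hypothetical helper: simplified subset of the cafebabe rules.

    Returns a description string, or None when nothing matches.
    """
    if len(data) < 4 or beshort(data, 0) != 0xCAFE:   # '0 beshort 0xcafe'
        return None
    second = beshort(data, 2)
    if second == 0xD00D:                               # '>2 beshort 0xd00d'
        return "JAR compressed with pack200"
    if second != 0xBABE or len(data) < 8:              # '>2 beshort 0xbabe'
        return None
    # '>>6 beshort <20': Java class files keep their major version (>= 45)
    # here, while a fat Mach-O keeps the low half of its arch count.
    if beshort(data, 6) >= 20:
        return None
    count = belong(data, 4)
    if count == 1:                                     # '>>>4 belong 1'
        return "Mach-O fat file with 1 architecture"
    if 1 < count < 20:                                 # '>>>4 belong >1' and '>>>>4 belong <20'
        return f"Mach-O fat file with {count} architectures"
    return None
```

The `>` prefixes in the magic file encode exactly this nesting: a deeper rule is only tried when its parent rule matched.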

Let’s run these signatures over our favorite target (/bin/ls) and see what they find:

$ r2 -n /bin/ls
[0x00000000]> /m
0x00000000 0 hit0_0 Fat-Mach-O
0x00004000 0 hit0_1 Mach-O
0x0000b390 0 hit0_2 MacOS Deteched Code Signature
0x00010000 0 hit0_3 Mach-O
0x00021260 0 hit0_4 MacOS Deteched Code Signature
[0x00000000]>
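The Mach-O hits that follow the Fat-Mach-O one are most likely the per-architecture slices of a universal binary, since the fat header at offset 0 lists where each slice starts. The Python sketch below parses that header (a big-endian `fat_header` followed by `fat_arch` entries of five 32-bit fields: cputype, cpusubtype, offset, size, align) so you can cross-check /m hits against it. `fat_slices` is a hypothetical helper, and it ignores the 64-bit `fat_arch_64` layout.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # big-endian magic of a universal (fat) Mach-O
MAX_ARCHS = 20          # same sanity bound as the '<20' cafebabe magic rule

def fat_slices(data: bytes):
    """Hypothetical helper: list (cputype, offset, size) for each fat_arch entry.

    Only the 32-bit fat_arch layout is handled; fat_arch_64 is ignored.
    """
    if len(data) < 8:
        return []
    magic, nfat = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC or nfat >= MAX_ARCHS:
        return []  # not a fat header, or more likely a Java class file
    slices = []
    for i in range(nfat):
        entry = 8 + 20 * i  # fat_arch entries follow the 8-byte fat_header
        if entry + 20 > len(data):
            break  # truncated header
        # fat_arch = cputype, cpusubtype, offset, size, align (5 x u32, big-endian)
        cputype, _subtype, offset, size, _align = struct.unpack_from(">5I", data, entry)
        slices.append((cputype, offset, size))
    return slices
```

Each returned offset should line up with one of the Mach-O hits reported by /m.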

Performance

If you run this command on your machine, you will probably notice that it takes a while to finish. This is because /m tries the magic signatures at every offset of the search range.

We can measure how long the search takes by prefixing it with ?t:

[0x00000000]> ?t /m
...
3.525228
[0x00000000]>

The primary way to optimize the search is by tweaking the search.align configuration option:

[0x00000000]> e search.align=4
[0x00000000]> ?t /m
...
0.900765
[0x00000000]> e search.align=16
[0x00000000]> ?t /m
...
0.245257
[0x00000000]>

Radare2’s default magic database prioritizes precision, focusing on reducing false positives. This subset is tailored for reverse engineering and differs from the broader magic databases found in tools like file. However, you can extend this functionality by loading your own magic files or leveraging /F for dynamic scans.

For hard drive images, where partitions and files usually start at sector boundaries, we can use a larger alignment value like 512 or 1024, which reduces scan times dramatically.
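Conceptually, an aligned scan only tests offsets that are multiples of the alignment, so the number of candidate positions shrinks by roughly that factor, at the cost of missing anything that starts at an unaligned offset. The Python sketch below models that trade-off with a single signature; `aligned_scan` is a hypothetical helper illustrating the idea, not radare2's implementation of search.align.

```python
def aligned_scan(data: bytes, magic: bytes, align: int = 1):
    """Hypothetical helper: return (hits, offsets_tested) checking only offsets % align == 0."""
    hits = []
    tested = 0
    for off in range(0, len(data) - len(magic) + 1, align):
        tested += 1
        if data[off:off + len(magic)] == magic:
            hits.append(off)
    return hits, tested


blob = bytearray(4096)
blob[0x100:0x104] = b"\x7fELF"  # header on a 16-byte boundary
blob[0x203:0x207] = b"\x7fELF"  # unaligned header, invisible with align=16
print(aligned_scan(bytes(blob), b"\x7fELF", 1))   # both hits, 4093 offsets tested
print(aligned_scan(bytes(blob), b"\x7fELF", 16))  # only 0x100, 256 offsets tested
```

The same arithmetic applies to disk images: with an alignment of 512, a 1 GiB image has 2^30 / 2^9 = 2,097,152 candidate offsets instead of 1,073,741,824.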

Magic Subcommands

Here’s an overview of /m’s capabilities:

[0x00000000]> /m?
| /m         search for known magic patterns
| /m [file]  same as above but using the given magic file
| /me        like ?e similar to IRC's /me
| /mm        search for known filesystems and mount them automatically
| /mb        search recognized RBin headers
[0x00000000]>

Custom Magic Files

You can extend the /m signature database by changing the signature directory defined by this variable:

[0x00000000]> e dir.magic
/usr/local/share/radare2/5.9.9/magic

Or by passing a custom magic file as an argument, as in /m [file].

Automating Data Extraction

Once /m identifies a file header, the wtf command can extract and dump the corresponding data:

[0x100003a58]> wtf file.dump

Let’s inspect the help message to better understand what wtf and wtff can offer:

[0x00000000]> wt?
Usage: wt[afs] [filename] [size]   Write current block or [size] bytes from offset to file
| wta [filename]         append to 'filename'
| wtf [filename] [size]  write to file (see also 'wxf' and 'wf?')
| wtf! [filename]        write to file from current address to eof (ignores given size)
| wtff [prefix] [size]   write block from current seek to "<prefix>-<offset>"
| wts host:port [size]   send data to remote socket at tcp://host:port
| NOTE:                  filename defaults to "<cfg.prefixdump>.<offset>"
[0x00000000]>
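Dumping a hit boils down to copying a range of bytes starting at the current offset into a new file. Here is a minimal Python sketch of that idea. The `carve` helper is hypothetical, and the `<prefix>-0x<offset>` file name is loosely modeled on the wtff help text, with the hexadecimal formatting being our assumption.

```python
from pathlib import Path

def carve(data: bytes, offset: int, size=None, prefix: str = "dump") -> Path:
    """Hypothetical helper: write data[offset:offset+size] to '<prefix>-0x<offset>'.

    size=None behaves like 'wtf!' (dump up to the end of the data); a size
    that runs past the end is clamped, since slicing never reads beyond EOF.
    """
    end = len(data) if size is None else offset + size
    out = Path(f"{prefix}-0x{offset:x}")  # assumed naming scheme
    out.write_bytes(data[offset:end])
    return out
```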

By default, the dump size is determined by the blocksize (b), which can be adjusted for larger or smaller files:

[0x100003a58]> b=1M

Setting b=1M adjusts the dump size to 1MB, leaving enough room for larger files. To automate the dumping process for every hit, configure:

[0x100003a58]> e cmd.hit=wtff
[0x100003a58]> e search.align=4
[0x100003a58]> b=512K
[0x100003a58]> /m
...
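The configuration above runs wtff at every hit, so each dump starts at the hit and is as long as the current block size. A rough Python model of that loop follows. `scan_and_dump` is a hypothetical name, the `hit-0x<offset>` naming is an assumption, and dumps are collected in a dictionary instead of being written to disk to keep the sketch free of side effects.

```python
def scan_and_dump(data: bytes, magics, align: int = 4, blocksize: int = 512 * 1024):
    """Hypothetical model of 'e cmd.hit=wtff': dump blocksize bytes at every aligned hit.

    Returns {name: bytes}; the 'hit-0x<offset>' naming is an assumption.
    """
    sigs = tuple(magics)
    dumps = {}
    for off in range(0, len(data), align):      # like e search.align=4
        if data.startswith(sigs, off):          # any signature matches here
            # like wtff: blocksize bytes from the hit, clamped at end of data
            dumps[f"hit-0x{off:x}"] = data[off:off + blocksize]
    return dumps
```

A block size that is too small truncates the carved files, while one that is too large makes each dump swallow whatever follows it; in practice, the real length often has to be read from the format's own header.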

Known Filesystems

Additionally, radare2 includes subcommands (/mb and /mm) that use the RBin and RFS plugins to identify known binaries and filesystems in memory.

Considering the number of plugins supported by radare2, and its ability to mount filesystems from memory using the m command or load binaries from memory using the oba command, these scans give us deeper insight into what’s inside a blob.

Note that false positives are possible, and you may want to ignore any file types that make no sense for the target you are looking at.

$ rabin2 -L
bin  any         Dummy format r_bin plugin
bin  art         Android Runtime
bin  avr         ATmel AVR MCUs
bin  bf          brainfuck
bin  bflt        bFLT format r_bin plugin
bin  bios        BIOS bin plugin
bin  bootimg     Android Boot Image
bin  cgc         CGC format r_bin plugin
bin  coff        COFF format r_bin plugin
bin  dex         dex format bin plugin
bin  dis         Inferno Dis VM bin plugin
bin  dmp64       Windows Crash Dump x64 r_bin plugin
bin  dol         Nintendo Dolphin binary format
bin  dyldcache   dyldcache bin plugin
..

We can count how many bin and fs plugins are shipped by default:

[0x00000000]> Li~?
73
[0x00000000]> Lm~?
20
[0x00000000]>
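Filesystem detection follows the same header-matching idea, except that the signature often sits at a fixed distance from the start of the filesystem rather than at offset 0. The Python sketch below checks three well-known locations: the ext2/3/4 superblock magic 0xEF53 (little-endian) at byte 1080, the ISO 9660 "CD001" identifier at 0x8001, and the little-endian SquashFS magic "hsqs" at offset 0. The signature table and `find_filesystems` are our own illustration, not the logic of radare2's RFS plugins.

```python
# (name, signature offset relative to the filesystem start, signature bytes)
# Illustrative selection only, not radare2's plugin list.
FS_SIGNATURES = [
    ("ext2/3/4", 1080, b"\x53\xef"),  # s_magic 0xEF53, superblock at 1024 + 56
    ("iso9660", 0x8001, b"CD001"),    # volume descriptor at sector 16 (2048-byte sectors)
    ("squashfs", 0, b"hsqs"),         # little-endian SquashFS magic
]

def find_filesystems(data: bytes, align: int = 512):
    """Hypothetical helper: yield (fs_start, name) for every matching candidate start.

    Short signatures such as the 2-byte ext magic will produce false
    positives on large blobs, so results need a sanity check.
    """
    for start in range(0, len(data), align):
        for name, rel, sig in FS_SIGNATURES:
            pos = start + rel
            if data[pos:pos + len(sig)] == sig:
                yield start, name
```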

Searching Dump

In previous posts we learned how to search for hexadecimal patterns and plaintext strings, but when looking for known patterns like magic headers, we sometimes need to cook our own files and scan memory for their contents.

The /F command is designed for this purpose, allowing you to pass a file as an argument (/F file.bin) and use its content as a signature pattern to search in memory. The command’s parameters let you also specify which portion of the file to use as a signature, making it convenient to script multiple pattern searches using a single reference file.

[0x00000000]> /F?
| /F file [off] [sz]  search contents of file with offset and size
[0x00000000]>
[0x00000000]> /F file.bin

There are many more search commands in radare2. I suggest taking a look at the /? help message and playing a little with /p, /e, /v and binary masks.
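Binary masks are worth a short illustration: a masked search compares only the bits selected by the mask, so at every position (byte & mask) must equal (pattern & mask). The Python sketch below shows the general technique; `masked_search` is a hypothetical helper and does not reproduce radare2's command syntax.

```python
def masked_search(data: bytes, pattern: bytes, mask: bytes):
    """Hypothetical helper: offsets where (data & mask) == (pattern & mask) for every byte."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    n = len(pattern)
    hits = []
    for off in range(len(data) - n + 1):
        if all((data[off + i] & mask[i]) == (pattern[i] & mask[i]) for i in range(n)):
            hits.append(off)
    return hits


# 0x50 masked with 0xf8 matches any one-byte x86 push opcode (0x50-0x57)
print(masked_search(b"\x90\x55\x48\x89\xe5\x53\x90", b"\x50", b"\xf8"))
# cafe0000/ffff0000 matches both the 0xcafebabe and 0xcafed00d headers
print(masked_search(b"\xca\xfe\xba\xbe\xca\xfe\xd0\x0d", b"\xca\xfe\x00\x00", b"\xff\xff\x00\x00"))
```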

Challenge

Combining what we learned in previous posts with today’s topics, the challenge for today consists of:

Practice the performance techniques learned in this post and share your findings! #aor24

See you tomorrow in another advent post!

–pancake