How To: Disassemble Binary Code Iteratively

Ahmed Garhy edited this page Jan 12, 2019 · 1 revision

Overview

Often, you will want to disassemble binary code iteratively, one instruction at a time. Iterative disassembly lets you work through a small range of binary code one instruction at a time, inspect each disassembled instruction, and decide whether to disassemble the next instruction or break out of the disassembly loop early without wasting resources.

Capstone supports this disassembly method, and Capstone.NET exposes it through a deferred enumerable, which is the accepted idiom for producing a lazy stream of data in C# and .NET. A deferred enumerable is lazily evaluated, meaning it does not produce data until the caller requests it. In this case, the data is the disassembled instructions. Let's look at an example:

using System.Collections.Generic;
using Gee.External.Capstone;
using Gee.External.Capstone.X86;

// ...
//
// Create an X86 disassembler in 32-bit mode.
using (CapstoneX86Disassembler disassembler = CapstoneDisassembler.CreateX86Disassembler(X86DisassembleMode.Bit32)) {
    // ...
    //
    // Enable details for disassembled instructions.
    disassembler.EnableInstructionDetails = true;

    // ...
    //
    // Just for fun, let's generate assembly code in MASM syntax.
    disassembler.DisassembleSyntax = DisassembleSyntax.Masm;

    var binaryCode = new byte[] {
        0x09, 0x00, 0x38, 0xd5, 0xbf, 0x40, 0x00, 0xd5, 0x0c, 0x05, 0x13, 0xd5, 0x20, 0x50, 0x02, 0x0e,
        0x20, 0xe4, 0x3d, 0x0f, 0x00, 0x18, 0xa0, 0x5f, 0xa2, 0x00, 0xae, 0x9e, 0x9f, 0x37, 0x03, 0xd5,
        0xbf, 0x33, 0x03, 0xd5, 0xdf, 0x3f, 0x03, 0xd5, 0x21, 0x7c, 0x02, 0x9b, 0x21, 0x7c, 0x00, 0x53,
        0x00, 0x40, 0x21, 0x4b, 0xe1, 0x0b, 0x40, 0xb9, 0x20, 0x04, 0x81, 0xda, 0x20, 0x08, 0x02, 0x8b,
        0x10, 0x5b, 0xe8, 0x3c
    };

    // ...
    //
    // Disassembled in full, this binary code yields 23 instructions. But because we are using the
    // iterative method, Capstone.NET does not disassemble anything up front and instead returns a
    // deferred enumerable. As we loop through the enumerable, 1 instruction is disassembled at a time
    // instead of all 23.
    IEnumerable<X86Instruction> instructions = disassembler.Iterate(binaryCode);
    foreach (X86Instruction instruction in instructions) {
        // ...
        //
        // With every iteration, only a single instruction will be disassembled.
    }
}

The above example is straightforward, and it is all you need to disassemble binary code using deferred, or lazy, execution. Because instructions are only disassembled 1 by 1 while enumerating the enumerable in the foreach loop, you can also change the disassembler's configuration in between instructions. Here is a trivial modification of the previous example, just for fun, that disables details and changes the syntax of the generated assembly code starting with the 5th disassembled instruction:

IEnumerable<X86Instruction> instructions = disassembler.Iterate(binaryCode);
var i = 0;
foreach (X86Instruction instruction in instructions) {
    i++;
    if (i == 4) {
        // ...
        //
        // Just for fun, let's disable details and generate assembly code in Intel syntax starting
        // with the 5th disassembled instruction. We are looking at the 4th instruction here, so the
        // new configuration applies from the next iteration onward.
        disassembler.EnableInstructionDetails = false;
        disassembler.DisassembleSyntax = DisassembleSyntax.Intel;
    }
    
    // ...
    //
    // Analyze instruction.
}
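
The laziness described above, and the ability to break out of the loop early, can be seen without Capstone.NET at all. The following is a minimal, self-contained sketch that assumes a hypothetical ToyDecoder class standing in for the disassembler. It pretends every instruction is exactly 4 bytes long and counts how many it has decoded. None of these names are part of Capstone.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a disassembler; not part of Capstone.NET.
public sealed class ToyDecoder {
    // Number of "instructions" decoded so far.
    public int DecodedCount { get; private set; }

    // Deferred enumerable: the body does not run until the caller starts
    // enumerating, and it pauses at every yield return.
    public IEnumerable<string> Iterate(byte[] code) {
        // Assumption for illustration only: every instruction is exactly 4 bytes.
        for (var offset = 0; offset + 4 <= code.Length; offset += 4) {
            this.DecodedCount++;
            yield return BitConverter.ToString(code, offset, 4);
        }
    }
}

public static class ToyDecoderDemo {
    // Enumerates lazily and stops at the first instruction whose first byte is 0xC3.
    // Returns how many instructions were actually decoded.
    public static int DecodeUntilStop(byte[] code) {
        var decoder = new ToyDecoder();
        IEnumerable<string> instructions = decoder.Iterate(code);

        // Nothing has been decoded yet: decoder.DecodedCount is still 0 here.
        foreach (string instruction in instructions) {
            if (instruction.StartsWith("C3", StringComparison.Ordinal)) {
                // Break out early; the remaining bytes are never decoded.
                break;
            }
        }

        return decoder.DecodedCount;
    }
}
```

Given a 12-byte input whose second 4-byte group starts with 0xC3, DecodeUntilStop returns 2, because the third group is never decoded.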

Caveats

Deferred, or lazy, disassembling of binary code sounds great on paper, but it is not without its caveats. More specifically, because instructions are disassembled 1 by 1, the lifetime of the disassembler must exceed the lifetime of the deferred enumerable. Internally, the deferred enumerable has a reference to the disassembler and it uses it to disassemble the next instruction in the sequence as it is being enumerated. If you dispose of the disassembler at any time before you finish enumerating the enumerable, you'll get a run-time exception! Let's look at an example:

using System.Collections.Generic;
using Gee.External.Capstone;
using Gee.External.Capstone.X86;

using (CapstoneX86Disassembler disassembler = CapstoneDisassembler.CreateX86Disassembler(X86DisassembleMode.Bit32)) {
    disassembler.EnableInstructionDetails = true;
    disassembler.DisassembleSyntax = DisassembleSyntax.Masm;

    var binaryCode = new byte[] {
        0x09, 0x00, 0x38, 0xd5, 0xbf, 0x40, 0x00, 0xd5, 0x0c, 0x05, 0x13, 0xd5, 0x20, 0x50, 0x02, 0x0e,
        0x20, 0xe4, 0x3d, 0x0f, 0x00, 0x18, 0xa0, 0x5f, 0xa2, 0x00, 0xae, 0x9e, 0x9f, 0x37, 0x03, 0xd5,
        0xbf, 0x33, 0x03, 0xd5, 0xdf, 0x3f, 0x03, 0xd5, 0x21, 0x7c, 0x02, 0x9b, 0x21, 0x7c, 0x00, 0x53,
        0x00, 0x40, 0x21, 0x4b, 0xe1, 0x0b, 0x40, 0xb9, 0x20, 0x04, 0x81, 0xda, 0x20, 0x08, 0x02, 0x8b,
        0x10, 0x5b, 0xe8, 0x3c
    };

    IEnumerable<X86Instruction> instructions = disassembler.Iterate(binaryCode);
    foreach (X86Instruction instruction in instructions) {
        // ...
        //
        // Oops, because we disposed of the disassembler, we will get a run-time exception on the next
        // iteration.
        disassembler.Dispose();
    }
}

The above example is, of course, quite ridiculous because no serious programmer would ever do this. But it highlights the important requirement that the lifetime of the disassembler must exceed the lifetime of the deferred enumerable. Let's take a look at a more realistic example where it's easy to forget this requirement until your process crashes from an unexpected run-time exception:

using System.Collections.Generic;
using Gee.External.Capstone;
using Gee.External.Capstone.X86;

public void AnalyzeInstructions(IEnumerable<X86Instruction> instructions) {
    foreach (X86Instruction instruction in instructions) {
        // ...
        //
        // Oops, a run-time exception is immediately thrown because this method was passed a broken
        // enumerable. The problem is there is no way for this method to know that beforehand!
    }
}

public IEnumerable<X86Instruction> DisassembleCode() {
    using (CapstoneX86Disassembler disassembler = CapstoneDisassembler.CreateX86Disassembler(X86DisassembleMode.Bit32)) {
        disassembler.EnableInstructionDetails = true;
        disassembler.DisassembleSyntax = DisassembleSyntax.Masm;

        var binaryCode = new byte[] {
            0x09, 0x00, 0x38, 0xd5, 0xbf, 0x40, 0x00, 0xd5, 0x0c, 0x05, 0x13, 0xd5, 0x20, 0x50, 0x02, 0x0e,
            0x20, 0xe4, 0x3d, 0x0f, 0x00, 0x18, 0xa0, 0x5f, 0xa2, 0x00, 0xae, 0x9e, 0x9f, 0x37, 0x03, 0xd5,
            0xbf, 0x33, 0x03, 0xd5, 0xdf, 0x3f, 0x03, 0xd5, 0x21, 0x7c, 0x02, 0x9b, 0x21, 0x7c, 0x00, 0x53,
            0x00, 0x40, 0x21, 0x4b, 0xe1, 0x0b, 0x40, 0xb9, 0x20, 0x04, 0x81, 0xda, 0x20, 0x08, 0x02, 0x8b,
            0x10, 0x5b, 0xe8, 0x3c
        };

        // ...
        //
        // Watch out, we are disposing of the disassembler and then returning a deferred enumerable. Our
        // caller is in trouble but there is no way for them to tell!
        IEnumerable<X86Instruction> instructions = disassembler.Iterate(binaryCode);
        return instructions;
    }
}

public void Main() {
    // ...
    //
    // This method returns a deferred enumerable. But because the enumerable was created using a disassembler
    // that is local to the method and disposed of as soon as the method returns, the enumerable is effectively
    // broken as soon as it is returned. The problem is there is no way to tell if it's broken until we start
    // enumerating it!
    IEnumerable<X86Instruction> instructions = this.DisassembleCode();

    // ...
    //
    // This method enumerates the enumerable we got from the previous step. But because the enumerable is
    // effectively broken, the method will throw a run-time exception as soon as it attempts to enumerate
    // the enumerable!
    this.AnalyzeInstructions(instructions);
}

In the above example, notice how the DisassembleCode() method returns a deferred enumerable after it disposes of the disassembler. When the Main() method calls DisassembleCode(), it has no way of knowing that. So when it blindly passes that deferred enumerable as an argument to the AnalyzeInstructions(IEnumerable<X86Instruction>) method, that method will immediately throw a run-time exception.

This is by far the most common scenario to trigger the "gotcha" of deferred, or lazy, disassembling of binary code. The lesson to learn here is that while deferred, or lazy, disassembling of binary code is a great feature, you have to be careful about how you use it to avoid unexpected run-time exceptions.
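
Two common ways to stay clear of this pitfall are to materialize the results before the disassembler goes out of scope, or to make the producing method itself an iterator so the disassembler lives as long as the enumeration does. The sketch below illustrates both with a hypothetical FakeDisassembler that throws ObjectDisposedException when used after disposal. It stands in for the real disassembler and is not Capstone.NET code; which exception type Capstone.NET actually throws is not assumed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a disposable disassembler; not part of Capstone.NET.
public sealed class FakeDisassembler : IDisposable {
    private bool _disposed;

    public void Dispose() {
        this._disposed = true;
    }

    // Deferred enumerable that needs the disassembler alive for every item.
    public IEnumerable<int> Iterate(byte[] code) {
        foreach (byte b in code) {
            if (this._disposed) {
                throw new ObjectDisposedException(nameof(FakeDisassembler));
            }
            yield return b;
        }
    }
}

public static class LifetimeExamples {
    // Broken: the using block disposes the disassembler before the caller enumerates.
    public static IEnumerable<int> Broken(byte[] code) {
        using (var disassembler = new FakeDisassembler()) {
            return disassembler.Iterate(code);
        }
    }

    // Fix 1: enumerate eagerly into a list while the disassembler is still alive.
    public static IReadOnlyList<int> Materialized(byte[] code) {
        using (var disassembler = new FakeDisassembler()) {
            return disassembler.Iterate(code).ToList();
        }
    }

    // Fix 2: make this method an iterator. The using block is only exited, and the
    // disassembler only disposed, when the caller finishes or disposes the enumeration.
    public static IEnumerable<int> Streamed(byte[] code) {
        using (var disassembler = new FakeDisassembler()) {
            foreach (int item in disassembler.Iterate(code)) {
                yield return item;
            }
        }
    }
}
```

The first fix gives up laziness in exchange for a result that is safe to pass around. The second keeps laziness, at the cost of keeping the disassembler alive until the caller has finished enumerating.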

Comparison with Capstone

Internally, Capstone.NET uses Capstone's cs_disasm_iter() API to perform deferred disassembling of binary code. However, you won't gain all the benefits of Capstone's cs_disasm_iter() API due to Capstone.NET's managed nature.

The intent of Capstone's cs_disasm_iter() API, as documented on the Capstone website, is to improve performance and use resources more efficiently on memory-sensitive systems. It does so by eliminating the repeated malloc() and realloc() calls made by the core Capstone engine; instead, the caller allocates memory once for a single disassembled instruction, and that memory is reused throughout the disassembly loop.

Internally, Capstone.NET follows this same principle, but only for unmanaged memory. When Capstone.NET creates a deferred enumerable, it allocates unmanaged memory for a single disassembled instruction and reuses it for the entire lifetime of the enumerable. The unmanaged memory is released when the enumerable is exhausted, disposed, or cleaned up by the garbage collector. However, when the unmanaged memory is marshaled to managed memory, a new managed object is created for every disassembled instruction. That object cannot be freed manually and is instead freed by the garbage collector at its discretion. So while you will benefit from the performance gains the Capstone website attributes to eliminating the repeated malloc() and realloc() calls for unmanaged memory, you will not benefit from optimal memory utilization due to Capstone.NET's managed nature.
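
The trade-off described above can be pictured with a small, self-contained sketch. It assumes a hypothetical Decode iterator in which one scratch buffer plays the role of the single reusable unmanaged instruction, and a freshly allocated byte array plays the role of the managed object created per instruction. It is an analogy only and does not reflect Capstone.NET's actual marshaling code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical analogy; not part of Capstone.NET.
public static class ReuseSketch {
    // Splits code into 4-byte items, reusing one scratch buffer for the whole
    // enumeration but handing the caller a new array for every item.
    public static IEnumerable<byte[]> Decode(byte[] code) {
        // Allocated once, like the single unmanaged instruction.
        var scratch = new byte[4];
        for (var offset = 0; offset + scratch.Length <= code.Length; offset += scratch.Length) {
            // "Native" step: overwrite the same scratch buffer in place.
            Array.Copy(code, offset, scratch, 0, scratch.Length);

            // "Marshal" step: a new managed object per item, reclaimed later
            // by the garbage collector.
            yield return (byte[]) scratch.Clone();
        }
    }
}
```

Every item the caller receives is a distinct array, even though only one scratch buffer was ever allocated inside the iterator.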