# Using the LLVM MC Disassembly API

In this post, I’ll walk through how to link an application against LLVM and show a simple usage of the LLVM MC disassembler API. It’s a little more complex than it seems, probably because there aren’t many good resources for using this API.

## Linking a program with LLVM

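The handy llvm-config utility, which comes with LLVM, can be used to determine the compiler/linker flags you need for LLVM. The relevant options are `--cxxflags` (compiler flags, including the include path), `--ldflags` (linker flags, including the library search path), and `--libs` (the LLVM libraries themselves). Run each one on its own to see what it prints; the output depends on how your copy of LLVM was installed.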

If you have a C++ file that includes LLVM headers, first compile it to a .o object file with the `-c` option of g++, which tells it not to run the linker. For a hypothetical app.cpp, that would be `g++ $(llvm-config --cxxflags) -c app.cpp -o app.o`.

Now, we want to link this .o file against LLVM.

The tricky thing is that you can’t just add llvm-config --ldflags --cxxflags --libs to g++, because the order of these flags matters. When linking libraries, the linker goes from left to right through the libraries, building up a list of missing symbols and resolving symbols as it encounters new libraries. However, it does not search backwards for symbols!

So, if you specify a library too early in the command, it will never get used, even when other libraries depend on it!
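To make the left-to-right behaviour concrete, here is a toy model of it in C++. This is only a sketch of the idea, not how a real linker is implemented: the `LinkInput` struct, the `unresolvedAfterLink` function, and the file and symbol names in the usage note are all made up for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of one entry on the link line: the symbols it
// defines and the symbols it still needs from somewhere else.
struct LinkInput {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
    bool isLibrary;  // object files are always linked, libraries only on demand
};

// Walk the link line once, left to right, and return the symbols that are
// still undefined at the end. Earlier entries are never revisited.
std::set<std::string> unresolvedAfterLink(const std::vector<LinkInput>& line) {
    std::set<std::string> defined;
    std::set<std::string> undefined;
    for (const LinkInput& in : line) {
        // A library is only pulled in if it defines something that is
        // currently missing.
        bool wanted = !in.isLibrary;
        for (const std::string& sym : in.defines) {
            if (undefined.count(sym)) wanted = true;
        }
        if (!wanted) continue;  // nothing needs it yet, so it is skipped for good

        for (const std::string& sym : in.defines) {
            defined.insert(sym);
            undefined.erase(sym);
        }
        for (const std::string& sym : in.needs) {
            if (!defined.count(sym)) undefined.insert(sym);
        }
    }
    return undefined;
}
```

With an object file that needs `foo`, a library libfoo.a that defines `foo` but needs `bar`, and a library libbar.a that defines `bar`, the order app.o, libfoo.a, libbar.a leaves nothing unresolved. Moving libbar.a to the front leaves `bar` unresolved, because the library was skipped before anything asked for it.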

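The correct way to link is to first specify the cxxflags and libs, then the ldflags, with your object file ahead of the libs. For a hypothetical object file app.o, that looks like `g++ $(llvm-config --cxxflags) app.o $(llvm-config --libs) $(llvm-config --ldflags) -o app`. On newer LLVM releases, the system libraries are reported separately by `llvm-config --system-libs`, which would go last for the same reason.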

Now, all the missing symbols are filled in by libraries further to the right, so the linker can work correctly!

## McDisassembly

The LLVM MC (Machine Code) library is well-suited to large-scale disassembly applications. Let’s see the most basic way to use it.

We start with a buffer of x86 machine code formatted as a std::string of hex characters: 89e5. We want to disassemble this to the mov ebp, esp instruction.

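You’ll need to include the LLVM C API headers `llvm-c/Target.h` and `llvm-c/Disassembler.h`.

Now, initialize everything. The target infos, target MCs, and disassemblers have to be registered before a disassembler can be created, for example with `LLVMInitializeAllTargetInfos()`, `LLVMInitializeAllTargetMCs()`, and `LLVMInitializeAllDisassemblers()`.

The disassembler functions used in the rest of this post (`LLVMCreateDisasm`, `LLVMSetDisasmOptions`, and `LLVMDisasmInstruction`) are defined in llvm/lib/MC/MCDisassembler/Disassembler.cpp.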

We need an LLVMDisasmContextRef handle for all future calls, and we can use LLVMCreateDisasm to make one. The first argument to LLVMCreateDisasm is a TripleName, which is formatted like archType-vendor-OS, optionally followed by an environment. I think the default vendor is "unknown". Some valid triples include x86_64-unknown-linux-gnu, i486--linux-gnu, etc.

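If you want to set Intel syntax, you need to use LLVMSetDisasmOptions(). Its second argument is a bitmask of option flags. The three options are LLVMDisassembler_Option_UseMarkup (1), LLVMDisassembler_Option_PrintImmHex (2), and LLVMDisassembler_Option_AsmPrinterVariant (4); newer versions of the header may define more.

By default, the asm printer uses AT&T syntax, so we need to set option flag 4, LLVMDisassembler_Option_AsmPrinterVariant, to get Intel syntax.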

### Getting disassembly

Finally, let’s disassemble our hex string. For this, we’ll use the LLVMDisasmInstruction function.

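This function takes the disassembler context, an input buffer of uint8_t and its length, a program counter PC, and an output buffer of chars and its size. It returns the number of bytes the instruction occupied, or 0 if the bytes did not decode to a valid instruction.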

Here’s a routine to convert a std::string to a uint8_t buffer.
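One possible version of that routine, as a sketch: the names `hexNibble` and `hexToBytes` are mine, and I chose to throw on malformed input (an odd number of digits or a non-hex character) rather than silently skip it.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one hex digit to its numeric value, or throw if it is not one.
int hexNibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Convert a string such as "89e5" into the bytes {0x89, 0xe5}.
std::vector<std::uint8_t> hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        // Two hex digits make one byte: high nibble first, then low nibble.
        int value = (hexNibble(hex[i]) << 4) | hexNibble(hex[i + 1]);
        bytes.push_back(static_cast<std::uint8_t>(value));
    }
    return bytes;
}
```

Returning a std::vector keeps the length together with the data, and `bytes.data()` and `bytes.size()` are the pointer and length that LLVMDisasmInstruction asks for.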

Now, we’re all set to use LLVMDisasmInstruction!

To see more usages of the LLVM McDisassembly API, check out the LLVM Project Blog.