In this post, I’ll walk through how to link an application against LLVM and show a simple use of the LLVM
MC disassembler API. It’s a little more complex than it seems, probably because there aren’t many good resources for using this API.
The llvm-config utility, which comes with LLVM, can be used to determine the compiler and linker flags you need for LLVM. The relevant options are --cxxflags, --ldflags, and --libs. Each one prints the corresponding flags for your LLVM installation.
If you have a C++ file that includes LLVM headers, first compile your application to a .o object file with the -c option of g++, which tells it not to run the linker.
Now, we want to link this .o file against LLVM.
The tricky thing is that you can’t just add llvm-config --ldflags --cxxflags --libs to g++, because the order of these flags matters. When linking libraries, the linker goes from left to right through the libraries, building up a list of missing symbols and resolving symbols as it encounters new libraries. However, it does not search backwards for symbols!
So, if you specify a library too early in the command, it will never get used, even when other libraries depend on it!
The correct way to link is to list the .o file first, then the output of llvm-config --libs, then the output of llvm-config --ldflags.
Now, all the missing symbols are filled in by libraries further to the right, so the linker can work correctly!
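To make the ordering rule concrete, here is a toy model of a single left-to-right pass over the link line. It is only a sketch of the behavior described above, not how a real linker is implemented: each library is treated as one unit, and LinkInput and unresolvedAfterLink are names I made up for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// One input on the link line (hypothetical type for this sketch).
struct LinkInput {
    std::string name;
    bool isArchive;                 // true for a library, false for a .o file
    std::set<std::string> defines;  // symbols this input provides
    std::set<std::string> needs;    // symbols this input references
};

// Walk the inputs once, left to right, and return the symbols still missing.
std::set<std::string> unresolvedAfterLink(const std::vector<LinkInput>& inputs) {
    std::set<std::string> defined;
    std::set<std::string> missing;
    for (const LinkInput& in : inputs) {
        if (in.isArchive) {
            // A library is only pulled in if it supplies something that is
            // missing right now; otherwise it is skipped and never revisited.
            bool useful = false;
            for (const std::string& sym : in.defines) {
                if (missing.count(sym)) { useful = true; break; }
            }
            if (!useful) continue;
        }
        for (const std::string& sym : in.defines) {
            defined.insert(sym);
            missing.erase(sym);
        }
        for (const std::string& sym : in.needs) {
            if (!defined.count(sym)) missing.insert(sym);
        }
    }
    return missing;
}
```

With an object file that needs a symbol from one library, which in turn needs a symbol from a second library, the pass only ends with nothing missing when the inputs are listed in dependency order: object file, first library, second library. Swap the two libraries and the second one is skipped before anything asks for it.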
The LLVM MC (Machine Code) library is well-suited to large-scale disassembly applications. Let’s see the most basic way to use it.
We start with a buffer of x86 machine code formatted as a std::string of hex characters: 89e5. We want to disassemble this to the mov ebp, esp instruction.
You’ll need to include these header files:
Now, initialize everything
These functions are defined in
We need a handle to an LLVMDisasmContextRef for all future functions, and we can use LLVMCreateDisasm to make one. The first argument to LLVMCreateDisasm is a TripleName, which is formatted like archType-vendor-OS. I think the default vendor is "unknown". Some valid triples include x86_64-unknown-linux-gnu, i486--linux-gnu, etc.
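To see how the pieces of a triple line up, here is a small string-splitting sketch. It is not LLVM's own parsing, which also validates and normalizes triples; TripleParts and splitTriple are hypothetical names, and I'm assuming an optional fourth field (the trailing gnu in the examples above) that I call the environment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the pieces of a triple string.
struct TripleParts {
    std::string arch, vendor, os, environment;
};

// Split "arch-vendor-os[-environment]" on the first three '-' characters.
// Illustration only: nothing is validated or normalized here.
TripleParts splitTriple(const std::string& triple) {
    TripleParts parts;
    std::string* fields[4] = {&parts.arch, &parts.vendor, &parts.os, &parts.environment};
    std::size_t start = 0;
    for (int i = 0; i < 4; ++i) {
        // The last field takes whatever is left, dashes included.
        std::size_t dash = (i < 3) ? triple.find('-', start) : std::string::npos;
        if (dash == std::string::npos) {
            *fields[i] = triple.substr(start);
            break;
        }
        *fields[i] = triple.substr(start, dash - start);
        start = dash + 1;
    }
    return parts;
}
```

Splitting i486--linux-gnu this way gives an empty vendor field, which is why the two dashes sit next to each other in that triple.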
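If you want to set Intel syntax, you need to use LLVMSetDisasmOptions(). Each flag you pass toggles one option; the flags declared in llvm-c/Disassembler.h include LLVMDisassembler_Option_UseMarkup (1), LLVMDisassembler_Option_PrintImmHex (2), and LLVMDisassembler_Option_AsmPrinterVariant (4). By default, the asm printer uses AT&T syntax, so we need to toggle option flag 4 for Intel.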
Finally, let’s disassemble our hex string. For this, we’ll use the LLVMDisasmInstruction function. This function takes an input buffer of uint8_t, an output buffer of chars, the lengths of both buffers, and a program counter.
Before disassembling, the std::string of hex characters has to be converted to a buffer of uint8_t.
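One way to do the conversion is sketched below. The names hexDigit and hexToBytes are my own, and I'm assuming the input contains only hex digits with no spaces or 0x prefix; anything else throws.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Value of one hex digit; throws if the character is not a hex digit.
static std::uint8_t hexDigit(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Convert a string such as "89e5" into the bytes {0x89, 0xe5}.
std::vector<std::uint8_t> hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        // Two hex digits make one byte: high nibble first, then low nibble.
        bytes.push_back(static_cast<std::uint8_t>(
            (hexDigit(hex[i]) << 4) | hexDigit(hex[i + 1])));
    }
    return bytes;
}
```

The vector's data() and size() members then give the uint8_t pointer and length that the disassembly call expects.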
Now, we’re all set to use LLVMDisasmInstruction.
To see more uses of the LLVM MC disassembler API, check out the LLVM Project Blog.