Position Independent Code on i386 -- what is it, why is it needed, how to write, how to exploit

In the old days of UNIX, a compiled binary (in a.out format) was built by linker (ld(1)) from series of object files (.o). Such a.out file was ready to be loaded to memory and run, because the code was pre-linked for some constant address by linker and didn't require any further processing. This meant that code of the same executable could easily be shared by processes running it, using the same subsystem as generic memory and file management.

However, soon a problem arose: compiled code could be shared, but only by processes running the same binary. This meant that two shell processes would be memory-efficient, but one csh and one sh process couldn't.

You'd think it isn't of much problem, since csh contains other code than sh. Well, not really. They both contain some parts of standard c library -- the thing that implements such functions as strcpy(3), printf(3), exit(3) and the like. As more and more features were added to it, it grew in size. And since it was used by almost all programs, all executable binaries grew in size as well, and lots of memory was wasted for duplicate libc code.

The obvious solution was to put libc into separate file, and have both libc and program binary be mapped to address space on program execution.

However, it turned up several problems:

This clearly means that we cannot write dynamic libraries as usual.


To counter first and second problem, the only solution is not to link libraries to any address at all, and choose it dynamically when loading program. Linking program at fixed address is allright, however -- after all, there is only one program binary per process.

This poses several problems if we want to retain the most important requirement -- binary code sharable across processes. This means that code cannot refer to direct address of anything. We need to somehow write a code that won't depend on its own position, and yet it somehow needs to be able to determine not only its own address, but address of every symbol in process. Sounds hard. And code that meets above requirements is called "Position-Independent Code". Now you probably know why.

Get your own address

Solutions to PIC problem vary very much across architectures, since they have very different capablilites concerning Program Counter manipulation. Rest of this tutorial will only reference i386 sequences.

So, let's say we have following code.

int foo() {
static int i = 0;
return ++i;
}

Compiled to normal (non-PIC) code, it looks like that:

.bss
foo.i:
.skip 4
.text
.global foo
foo:
incl foo.i
movl foo.i, %eax
ret

What's wrong with this code as PIC code? Well, it refers to address of foo.i symbol. This address is embedded directly in incl and movl instructions, so it resides in .text section. And it should be Position-independent. Bad.

So, how are we supposed to do it? Address of i of course isn't constant across processes. But since it resides in the same library as code we're writing, the distance from function code to foo.i symbol is constant. Why? Let's see.

Suppose foo is placed 0x120 bytes after start of library image, and foo.i is placed 0x1030 after its start. We then have:

Address of library image in memoryAddress of foo.iAddress of fooDistance foo.i-foo
Library file0x1030-0x120=0xf10
Process 10x1200000x121030-0x120120=0xf10
Process 20x800000000x80001030-0x80000120=0xf10
Process 30xb7fef0000xb7ff0030-0xb7fef120=0xf10

Coincidence? Of course no. Distances between symbols in the same library stay constant across processes. And can be used in PIC.

This means that we can get address of everything in our own file, by determining address of anything else in this file. How to do it and what adddress can we get easily?

Let's see. With i386 standard calling sequence, the only common register with fairly known value on entry is eip. It contains address of currently executing function. Which is our function foo. It resides in our library. This means we're home, if only we can extract value from eip.

However, i386 provides no direct way to do it -- you cannot just movl from eip. Sounds bad.

But searching for eip in intel manuals turns up interesting thing: call instruction. It pushes eip onto the stack. But it also jumps to given address. Fortunately, the address is given as distance from current eip to target. This sounds like something we can use in PIC.

So, the following function will return its own eip:

fun:
call .label	#this pushes current eip on stack, as return address...
.label:
popl %eax	#... which is (mis-)used here as return value from the function, by popping it directly.
ret

We can now determine address of something in our own library. The question is: what is it? Is it the address of function? Let's test.

#include <stdio.h>
/* test program for above code */

asm (
	".text\n"	/* make sure function is assembled to text section. */
	"fun:\n"
	"call .label\n"
	".label:\n"
	"popl %eax\n"
	"ret\n"
	".previous\n"
);

void *fun ();

int main() {
	printf("address of function: %p\n", &fun);
	printf("the function returns: %p\n", fun());
	return 0;
}

Results are a bit disappointing:

mwk@Heracles ~ $ ./a.out
address of function: 0x8048384
the function returns: 0x8048389

It returns something close to its own address, but not quite. Why? Let's look at i386 manual. Hm. It seems to say that not the address of call instruction is pushed on stack, but the address of the following instruction. Why? Because ret instruction simply pops eip from the stack. If call pushed its own address, ret would cause the call to be executed again. But if call pushes address of sollowing instruction, ret jumps to next instruction after call. This is good.

So, we determined that fun returns address of thing that follows call. What is it? It's the popl instruction. The same thing that we earlier called with label .label. So, we now just need to add another distance in code to result of pop to get our own address:

fun:
call .label
.label:
popl %eax
addl $fun-.label, %eax
ret

After changing assembly inline in our c tester, we get following results:

mwk@Heracles ~ $ ./a.out
address of function: 0x8048384
the function returns: 0x8048384

Got it!

Why it works? Well, look at the above table (the one with addresses of foo and foo.i in different processes), just change the names to fun and .label. If we know the distance between two symbols and address of one symbol, we can recreate address of second symbol by adding the difference to address of first one. This is exactly what the addl instruction does.

Let's try to re-write our eariler foo function with static i variable as PIC using this technique:

.bss
foo.i:
.skip 4
.text
.global foo
foo:
call .label
.label:
popl %ecx			#now ecx contains address of .label
addl $foo.i-.label, %ecx	#now ecx contains address of foo.i
incl (%ecx)
movl (%ecx), %eax
ret

You can check that it works, but looking at disassemblies is more interesting:

mwk@Heracles ~/pic $ objdump -rd a.o

a.o:     file format elf32-i386

Disassembly of section .text:

00000000 :
   0:   ff 05 00 00 00 00       incl   0x0
                        2: R_386_32     .bss
   6:   a1 00 00 00 00          mov    0x0,%eax
                        7: R_386_32     .bss
   b:   c3                      ret

mwk@Heracles ~/pic $ objdump -rd b.o

b.o:     file format elf32-i386

Disassembly of section .text:

00000000 :
   0:   e8 00 00 00 00          call   5 <.label>

00000005 <.label>:
   5:   59                      pop    %ecx
   6:   81 c1 03 00 00 00       add    $0x3,%ecx
                        8: R_386_PC32   .bss
   c:   ff 01                   incl   (%ecx)
   e:   8b 01                   mov    (%ecx),%eax
  10:   c3                      ret

Our first non-PIC example contained R_386_32 relocations, which are direct addresses of symbol, and cannot be used in PIC libs. But second one contains only R_386_PC32 one, which is distance to symbol from current place, which is PIC-ok. We won.

This technique works very good when referring to your own data. But what about changing global variable from another library (or main program)? What about calling function from other lib? We cannot calculate distance to them, since we can't predict load addresses in advance. This is gonna be hard.

Global Offset Table

Is there any possibility at all to refer to external symbols without sacrificing PICness? Well, there isn't. The address of external symbol has to be filled in runtime by dynamic linker. How should it tell it to library, then?

Three facts:

This concept is called Global Offset Table: There is big section called .got which resides in data segment and contains only zeroes on start, but dynamic linker will write addresses of all external symbols there after loading everything. Then, the code that wants address of external symbol will first access GOT in the same way as static variables, then use address read from GOT as address of the real symbol. This is something like pointer-to-pointer, but it's done automatically by compiler. For example, let's suppose libfoo.so refers to symbol sym defined in libbar.so.

      libfoo.so                     libbar.so
/--------------------------A-D-D-R-E-S-S---S-P-A-C-E---------------------------\
|    | code | data | got | ...... | code | data | sym | data | got | ......... |
\------------------------------------------------------------------------------/
      |              ^                            ^
      |              |\----->---------------->----/
      \-->------>----/  GOT entry filled by dynlinker
       known distance

How does it look in practice? Well, let's try to implement this tiny function, assuming sym is in other lib.

extern int sym;

int foo() {
	return sym;
}

The first thing such a code would need to do is determine address of the GOT. You'd expect it's the same as getting address of foo.i in above example. Sadly, not. Assembler designers for some weird and unclear reason decided that referencing _GLOBAL_OFFSET_TABLE_ will be special -- it will refer to GOT, but it isn't the address of GOT. It is defined as "distance from address of current instruction to address of GOT". Honestly, i have no idea why. Maybe they thought it'll be easier to use this way, but it isn't, since you need to do it relative to .label, not to address of add instuction. But well.

So, PIC code generated for above C function looks like that:

.global foo
foo:
call .label
.label:
popl %ecx		#ecx contains address of .label
addl $_GLOBAL_OFFSET_TABLE_+[.-.label], %ecx	#now, ecx contains address of the GOT.
#So far so good, but how do we determine where in GOT is located address of
#sym? We need to use special symbol reference, @got.
movl sym@got(%ecx), %edx	# get address of sym from GOT and store it to edx
movl (%edx), %eax		# since address of sym is now in edx, we can just read from it,
ret				# and return the result.

Done.

Procedure Linkage Table

But now, this was variable. How can we do function call? We could just use indirect jump with the above sequence, and it would work. But there is place for optimisation here.

What would happen if the above sequence was executed if GOT wasn't filled? It obviously wouldn't work, and would probably segfault. This basically means that all GOT entries have to be filled by dynlinker on start, since otherwise program will explode. But, dynlinker doesn't know how many of them will actually be used by program, so it can lead to significant time wastage. This is unavoidable with variables.

But for function calls, there is small trick. Functions are usually called in only one way -- using call instruction. This means that if, instead of real address, we fill the GOT entry with address of some other thing that is valid function, it will be executed instead and nothing bad will happen.

Why is it so useful? Because as the address of function we can place address of a fake function that will find out address of real function, put it into GOT instead of its own address, and then jump to real function, pretending it didn't ever happen. What's the point? Well, such a function would likely be the same for all functions in existance, so we have only one address to determine and we can quickly fill all GOT entries with it. Then, each time this function is called, it does the real job, finds out address of real function, puts it in GOT entry instead of own address, and jumps there. This means that function address is effectively resolved when it's needed first time. And it's done only once, since after first time, linkage function sets GOT entry to real thing.

This approach is called lazy binding and has two important features: it delays all the work, possibly not doing it at all for functions that are never used, and it doesn't slow down things much when function is called lots of times -- only the first time is slow. But it doesn't matter, since otherwise this would have to be done before the program is run.

To use this mechanism, Procedure Linkage Table was created. I won't describe here how it exactly works -- details can be found in psABI, but are not really needed. The important thing is that PLT contains one entry for each function you use. And the entry itself is a callable code that in turn calls needed function. It can be used in the same way you'd normally call a function. And since PLT is inside the library, you can use normal call instruction to call it. Just add @plt to tell assembler to use PLT entry, not the real function.

HOWEVER, there is one small thing you need to remember about. PLT entries obviously have to consult GOT. So they need to have its address passed somewhere. It is defined by ABI to be ebx. So remember: each time you call PLT entry, ensure ebx contains address of your GOT. And remember tht ebx is callee-saved register.

extern int bar();

int foo() {
	return bar();
}

Translates to

.global foo
foo:
pushl %ebx		#remember to save old value of ebx.
call .label
.label:
popl %ebx               #ebx contains address of .label
addl $_GLOBAL_OFFSET_TABLE_+[.-.label], %ebx    #now, ebx contains address of the GOT. as needed by PLT.
call bar@plt		# do the call
popl %ebx		#restore ebx
ret

Executable's point of view

Up to date, everything applied only to libraries. But what about code placed in main binary? Does it have to be PIC? Or can we just code in the same way as before shared libraries era?

The answer is: neither. The code doesn't need to be PIC -- after all, main executable is loaded at constant address, determined when compiling it. However, we cannot write executables in the same way as before -- program knows its own address. But it doesn't know addresses of libraries. It cannot refer to libraries' symbols directly.

But main executable has a very important property -- there is only one main executable per process.

There is only one way that will enable us to refer to all symbols directly from executable. It is putting them IN the executable. But doesn't that counter the very idea of shared libraries?

If done correctly, no.

All symbols in C can be divided into two categories:

First, variables. They are easy to put in executable, even if they're defined in shared libraries. If they are referenced in executable, they form public library interface and have known size that won't change in binary-compatible versions of library. Therefore, we can reserve some place in .bss section of executable and stuff the variable there. On runtime, its initial contents will be copied from shared library (that's what R_386_COPY is for), and address of variable will be considered to be the one inside executable's .bss section.

On the other hand, if they aren't referenced by executable, they don't have known size. But it doesn't matter, since if executable doesn't refer to it, it doesn't need to be in well-known place.

One note: this means that even if your variable is defined in your library, it doesn't mean it'll be there on execution -- it could have been moved to main executable. Therefore, you need the GOT approach to access it. This doesn't apply to static and hidden variables, though, as they won't be accessed by executable directly.

What about functions? Since functions have unknown length to producer of executable, they cannot be handled by copying their code to executable. Is there any solution at all to make it work?

There is.

The trick is to use wisely the fact that they contain executable code. This means that behavior of function won't change if instead of real function, we redirect all accesses to a stub that will check address of real function, and then jump there. We then can say that address of our stub function is the real value of symbol, to satisfy ANSI C99 requirement that all pointers to the same function compare equal.

This sounds alot like PLT used in shared libraries. And yes, this IS PLT. But it isn't the same thing -- in shared libs, you needed to explicitely write @plt to access PLT entry, since it was different from real value of symbol. However, in main executable PLT, real value of the symbol is defined to be address of function's PLT entry. However, the address of function that actually does the job (the one in appropriate library) is only known to main executable's GOT. This means that all accesses to this function go through this PLT entry.

Another difference is that main executable PLT doesn't need ebx to be set up before entry -- since executable isn't PIC, it can just have address of its GOT written diectly in PLT entries. This means that those PLT entries can be called without preparation, just like in normal PIC code.

And here is the answer: you can write old-style, non-PIC code in your main executable, and it'll be alright. There won't even be any performance hit. It'll be changed a bit to work right, but it's all done automatically for you.

Exploiting GOT

GOT is one of things that can be used by attacker to ease his task of gaining control over program, once he has discovered some security hole.

As written in the above section, all library function calls from executable go through PLT. PLT is itself in read-only section, so cannot be modified by exploits, buffer overflows, format strings, and the like. However, it consults GOT for addresses of function bodies to call. It calls anything whose address is in GOT directly, without any checking done. This means that modifying GOT entry will result in your code being called each time program tries to call appropriate function. And modifying code is pretty doable by eg. format string vulerabilities.

How to use it? Let's see. First thing to do is to determine address of GOT entry of some often-used function (this entry will only be consulted when the function is called, so choose something you're sure the program will use). How to do it? Use objdump, and search for PLT entry. Let's search my /bin/ls for entry concerning readdir64 (it's likely to be called in list-directory program, no?)

mwk@Heracles ~ $ objdump -xtrds /bin/ls|less

/bin/ls:     file format elf32-i386
/bin/ls
architecture: i386, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x08049a50

(...)

080494d4 <readdir64@plt>:
80494d4:       ff 25 90 a2 05 08       jmp    *0x805a290
80494da:       68 28 00 00 00          push   $0x28
80494df:       e9 90 ff ff ff          jmp    8049474 <_init+0x18>

What do we know now? Well, it seems like PLT entry for readdir is at address 0x080495d4. And it seems to call function whose address is stored in GOT at address 0x0805a290. So this is address of place which we can overwrite to take over control.

What's left now? Finding some vulnerability in ls. And somehow exploiting an ls process that runs as root or other superuser. But now we know that if we somehow find a way to write to memory, then we can get any code to be called by storing its address at address 0x805a290. How do we do it? Well, maybe stack overflow will enable us to overwrite some pointer which application uses. Maybe format string bug allows us to write over memory with %n. There are lots of ways you could be able to write over memory, but not be able to execute code. GOT overwrite is the technique that allows you to go from memory writing to code execution. And who owns program counter, owns the whole process. Happy exploting.