Runtime Dynamic Linking

Dynamically linked binaries usually resolve external function calls lazily through the Procedure Linkage Table (PLT). The PLT holds an entry for each external function the binary references. A call to such a function, say printf, actually jumps to the PLT entry for that function, which contains a few instructions. The first is an indirect jump through the function's entry in the Global Offset Table (GOT). Before the function has been resolved, that GOT entry holds the address of the instruction immediately following the jump, so the first call simply falls through to the rest of the PLT entry. This technique is commonly known as trampolining.

The next instruction pushes an identifier for the function onto the stack (an index or offset that tells the linker which symbol is being requested) and then jumps to the very first entry in the PLT, which calls into the dynamic linker's resolution function (_dl_runtime_resolve for ld.so).

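This call is simply another indirect jump through the GOT. The first three entries of the GOT are reserved. GOT[0] is the address of the program's .dynamic section (the contents of the PT_DYNAMIC segment) and is written by the static linker; the other two reserved entries are filled in by the dynamic linker on program startup. The .dynamic section is an array of tagged entries that point to other parts of the ELF file, such as the symbol table, the string table, and the relocation tables. It serves as a guide for the dynamic linker to navigate the ELF.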

GOT[1] is a pointer to a data structure that the dynamic linker manages (struct link_map in ld.so). This structure is a node in a linked list with one entry per loaded object, that is, the executable and each shared library linked with the program, and each node gives the linker access to that object's symbol table. When a symbol is to be resolved by the linker, this list is traversed to find the appropriate symbol. Using the LD_PRELOAD environment variable places your preload library ahead of every other shared library in this search order, so its definitions are found first.

Finally, GOT[2] is the address of the symbol resolution function within the dynamic linker. In ld.so, it contains the address of the function named _dl_runtime_resolve, which is an assembly stub that does some register/stack setup and calls into a C function called _dl_fixup(). _dl_fixup() is the workhorse that actually resolves the symbol in question. Once the symbol's address is found, the program's GOT entry for it must be patched. This is also the job of _dl_fixup(). Once _dl_fixup() has patched the GOT entry, later calls to the function still jump to the PLT entry, but this time the indirect jump there goes straight to the symbol's address instead of falling through to the following instruction.

This method of lazy symbol resolution avoids costly lookups for functions that are never called. You can force the linker to eagerly resolve all symbols on program startup by setting the LD_BIND_NOW environment variable to a non-empty value.