The Linux implementation provides a reflective ELF loader and ptrace-based process injection, supporting six architectures.
| Architecture | ELF Class | Pointer Size | Relocation Style | Entry Trampoline |
|---|---|---|---|---|
| x86_64 | ELF64 | 8 bytes | RELA | entry_x86_64.S |
| aarch64 | ELF64 | 8 bytes | RELA | entry_aarch64.S |
| i386 | ELF32 | 4 bytes | REL | entry_x86.S |
| ARM (LE/BE) | ELF32 | 4 bytes | REL | entry_arm.S |
| MIPS (LE/BE) | ELF32 | 4 bytes | REL | entry_mips.S |
| SPARC | ELF32 | 4 bytes | RELA | entry_sparc.S |
Architecture is selected at compile time via the compiler's predefined macros. A single build targets exactly one architecture.
The loader uses a compile-time type bridge to handle both ELF32 and ELF64 transparently:
#if __SIZEOF_POINTER__ == 8
typedef Elf64_Ehdr Elf_Ehdr;
typedef Elf64_Phdr Elf_Phdr;
typedef Elf64_Dyn Elf_Dyn;
typedef Elf64_Sym Elf_Sym;
typedef Elf64_Rela Elf_Rela;
typedef Elf64_Rel Elf_Rel;
typedef Elf64_Addr Elf_Addr;
// ...
#else
typedef Elf32_Ehdr Elf_Ehdr;
// ... (32-bit equivalents)
#endifThis means Elf_Addr is uint32_t on 32-bit targets and uint64_t on 64-bit, and all pointer-sized slots in relocations are handled correctly.
- Parse ELF header — verify magic, class (32/64), machine type
- Map LOAD segments —
mmapeachPT_LOADsegment at the correct virtual offset. ForET_DYN(shared objects), the base address is chosen by the kernel. ForET_EXEC, segments are mapped at their specified virtual addresses. - Process DYNAMIC section — walk
PT_DYNAMICto find:DT_SYMTAB,DT_STRTAB,DT_SYMENT— symbol tableDT_GNU_HASHorDT_HASH— hash table for symbol lookupDT_RELA/DT_REL— relocation tablesDT_JMPREL+DT_PLTREL— PLT relocations (may be REL or RELA)DT_INIT,DT_INIT_ARRAY— constructors
- Apply relocations — for each relocation entry:
- Classify as relative or import using per-architecture functions
- Relative:
*slot += base(adjusts internal pointers) - Import:
*slot = dlsym(RTLD_DEFAULT, name) + addend
- Flush instruction cache — on architectures with non-coherent I/D caches (ARM, MIPS, SPARC), issue
__builtin___clear_cache - Run constructors — call
DT_INIT, then each function inDT_INIT_ARRAY - Return handle — the context stores the load base, segment map, and dynamic tables for later
libload_symcalls
| Architecture | Relative | Global Data | Jump Slot | Absolute |
|---|---|---|---|---|
| x86_64 | R_X86_64_RELATIVE (8) |
R_X86_64_GLOB_DAT (6) |
R_X86_64_JUMP_SLOT (7) |
R_X86_64_64 (1) |
| aarch64 | R_AARCH64_RELATIVE (1027) |
R_AARCH64_GLOB_DAT (1025) |
R_AARCH64_JUMP_SLOT (1026) |
R_AARCH64_ABS64 (257) |
| i386 | R_386_RELATIVE (8) |
R_386_GLOB_DAT (6) |
R_386_JMP_SLOT (7) |
R_386_32 (1) |
| ARM | R_ARM_RELATIVE (23) |
R_ARM_GLOB_DAT (21) |
R_ARM_JUMP_SLOT (22) |
R_ARM_ABS32 (2) |
| MIPS | R_MIPS_REL32 (3)¹ |
R_MIPS_GLOB_DAT (51) |
R_MIPS_JUMP_SLOT (127) |
R_MIPS_32 (2) |
| SPARC | R_SPARC_RELATIVE (22) |
R_SPARC_GLOB_DAT (20) |
R_SPARC_JMP_SLOT (21) |
R_SPARC_32 (3) |
¹ MIPS uses R_MIPS_REL32 for both relative and import relocations. When sym == 0, it's treated as relative; when sym > 0, it's an import.
- RELA (used by x86_64, aarch64, SPARC): addend is stored in the relocation entry itself (
r_addendfield) - REL (used by i386, ARM, MIPS): addend is implicit — read from the slot before applying the relocation
The DT_PLTREL tag specifies whether the PLT relocation table (DT_JMPREL) uses REL or RELA format, which is dispatched at runtime.
When executing a complete ELF binary via libload_exec or libload_run, the loader must set up the initial process stack exactly as the kernel would. Each architecture has a .S file that:
- Receives the stack pointer and entry point as function arguments
- Switches to the prepared stack
- Jumps to the entry point
The initial stack layout follows the System V ABI:
[top of stack]
NULL (end of auxv)
auxv entries... (AT_PAGESZ, AT_PHDR, etc.)
NULL (end of envp)
envp[n-1]
...
envp[0]
NULL (end of argv)
argv[argc-1]
...
argv[0]
argc (integer)
[stack pointer →]
x86_64: Arguments in rdi (stack) and rsi (entry). SysV ABI.
aarch64: Arguments in x0 (stack) and x1 (entry). AAPCS.
i386: Arguments on the stack (cdecl). The trampoline pops them before switching.
ARM: Arguments in r0 (stack) and r1 (entry). Uses BX for Thumb interwork compatibility.
MIPS: Arguments in $a0 (stack) and $a1 (entry). Sets $t9 = entry for PIC convention (MIPS uses $t9 as the static chain register). Includes delay slot handling.
SPARC: Arguments in %o0 (stack) and %o1 (entry). Includes delay slot handling.
libload_sym looks up exported symbols in the loaded ELF's dynamic symbol table:
- If
DT_GNU_HASHis present, use the GNU hash algorithm (bloom filter + bucket chains) - Otherwise fall back to
DT_HASH(SysV hash) - Match the name in
DT_STRTAB - Return
base + sym.st_value
On architectures with separate instruction and data caches (ARM, MIPS, SPARC), the loader issues __builtin___clear_cache() after writing code. On x86/x86_64, this is a no-op since the caches are coherent.
The LL_FLUSH_ICACHE macro handles this:
#if defined(__arm__) || defined(__mips__) || defined(__sparc__)
#define LL_FLUSH_ICACHE(start, end) __builtin___clear_cache((char*)(start), (char*)(end))
#else
#define LL_FLUSH_ICACHE(start, end) ((void)0)
#endif- Only dynamic executables (those with a
PT_DYNAMICsegment) are supported forlibload_open. Statically-linked executables can be executed vialibload_exec/libload_runor converted to flat binaries vialltool elf2binand executed withlibload_exec_bin/libload_run_bin. libload_openrequires the ELF to be a shared object (ET_DYN) or position-independent executable- Thread-local storage (
PT_TLS) is not handled - MIPS
R_MIPS_REL32with non-zero symbol is treated as absolute import; more exotic MIPS relocation types are not supported
Three injection methods are available on Linux, all based on ptrace.
| Method | Requirement |
|---|---|
libload_inject |
ptrace access (Yama scope 0, CAP_SYS_PTRACE, or target is direct child) |
libload_inject_dylib |
Same |
libload_inject_spawn |
None |
Uses ptrace to proxy syscalls in the target: mmap (allocate RW), process_vm_writev (copy code), mprotect (RW→RX), clone (new thread at entry). The injector scans the target's executable memory for the architecture-specific syscall instruction to use as a gadget.
| Architecture | Syscall Instruction |
|---|---|
| x86_64 | syscall (0F 05) |
| aarch64 | svc #0 (01 00 00 D4) |
| i386 | int 0x80 (CD 80) |
| ARM | svc #0 (00 00 00 EF / 00 DF Thumb) |
| MIPS | syscall (0C 00 00 00) |
| SPARC | ta 0x10 (91 D0 20 10) |
Attaches via ptrace, hijacks thread registers to call dlopen(path, RTLD_NOW) in the target, catches the trap on return, restores registers. Resolves dlopen address by parsing /proc/<pid>/maps + libc symbol tables.
Forks, sets the LD_PRELOAD environment variable to the .so path, then execves the target. The dynamic linker loads the library before main().
Limitations: Does not work on statically-linked or setuid binaries.
libload_exec_bin and libload_run_bin execute flat binary images produced by lltool elf2bin. These are static-pie ELF executables flattened to a contiguous image with BSS trimmed. The loader reads the preserved ELF header for entry point and program header info, computes total memsz from LOAD segments, maps RWX memory, copies the image, builds the initial stack with auxv, and jumps to the entry point.
See elf2bin.md for format details and the C API.