contrib/llvm/tools/lld/docs/NewLLD.rst

   1 The ELF, COFF and Wasm Linkers
   2 ==============================
   3
   4 The ELF Linker as a Library
   5 ---------------------------
   6
   7 You can embed LLD to your program by linking against it and calling the linker's
   8 entry point function lld::elf::link.
   9
  10 The current policy is that it is your reponsibility to give trustworthy object
  11 files. The function is guaranteed to return as long as you do not pass corrupted
  12 or malicious object files. A corrupted file could cause a fatal error or SEGV.
  13 That being said, you don't need to worry too much about it if you create object
  14 files in the usual way and give them to the linker. It is naturally expected to
  15 work, or otherwise it's a linker's bug.
  16
  17 Design
  18 ======
  19
  20 We will describe the design of the linkers in the rest of the document.
  21
  22 Key Concepts
  23 ------------
  24
  25 Linkers are fairly large pieces of software.
  26 There are many design choices you have to make to create a complete linker.
  27
  28 This is a list of design choices we've made for ELF and COFF LLD.
  29 We believe that these high-level design choices achieved a right balance
  30 between speed, simplicity and extensibility.
  31
  32 * Implement as native linkers
  33
  34   We implemented the linkers as native linkers for each file format.
  35
  36   The linkers share the same design but share very little code.
  37   Sharing code makes sense if the benefit is worth its cost.
  38   In our case, the object formats are different enough that we thought the layer
  39   to abstract the differences wouldn't be worth its complexity and run-time
  40   cost.  Elimination of the abstract layer has greatly simplified the
  41   implementation.
  42
  43 * Speed by design
  44
  45   One of the most important things in archiving high performance is to
  46   do less rather than do it efficiently.
  47   Therefore, the high-level design matters more than local optimizations.
  48   Since we are trying to create a high-performance linker,
  49   it is very important to keep the design as efficient as possible.
  50
  51   Broadly speaking, we do not do anything until we have to do it.
  52   For example, we do not read section contents or relocations
  53   until we need them to continue linking.
  54   When we need to do some costly operation (such as looking up
  55   a hash table for each symbol), we do it only once.
  56   We obtain a handler (which is typically just a pointer to actual data)
  57   on the first operation and use it throughout the process.
  58
  59 * Efficient archive file handling
  60
  61   LLD's handling of archive files (the files with ".a" file extension) is
  62   different from the traditional Unix linkers and similar to Windows linkers.
  63   We'll describe how the traditional Unix linker handles archive files, what the
  64   problem is, and how LLD approached the problem.
  65
  66   The traditional Unix linker maintains a set of undefined symbols during
  67   linking.  The linker visits each file in the order as they appeared in the
  68   command line until the set becomes empty. What the linker would do depends on
  69   file type.
  70
  71   - If the linker visits an object file, the linker links object files to the
  72     result, and undefined symbols in the object file are added to the set.
  73
  74   - If the linker visits an archive file, it checks for the archive file's
  75     symbol table and extracts all object files that have definitions for any
  76     symbols in the set.
  77
  78   This algorithm sometimes leads to a counter-intuitive behavior.  If you give
  79   archive files before object files, nothing will happen because when the linker
  80   visits archives, there is no undefined symbols in the set.  As a result, no
  81   files are extracted from the first archive file, and the link is done at that
  82   point because the set is empty after it visits one file.
  83
  84   You can fix the problem by reordering the files,
  85   but that cannot fix the issue of mutually-dependent archive files.
  86
  87   Linking mutually-dependent archive files is tricky.  You may specify the same
  88   archive file multiple times to let the linker visit it more than once.  Or,
  89   you may use the special command line options, `--start-group` and
  90   `--end-group`, to let the linker loop over the files between the options until
  91   no new symbols are added to the set.
  92
  93   Visiting the same archive files multiple makes the linker slower.
  94
  95   Here is how LLD approaches the problem. Instead of memorizing only undefined
  96   symbols, we program LLD so that it memorizes all symbols.  When it sees an
  97   undefined symbol that can be resolved by extracting an object file from an
  98   archive file it previously visited, it immediately extracts the file and link
  99   it.  It is doable because LLD does not forget symbols it have seen in archive
 100   files.
 101
 102   We believe that the LLD's way is efficient and easy to justify.
 103
 104   The semantics of LLD's archive handling is different from the traditional
 105   Unix's.  You can observe it if you carefully craft archive files to exploit
 106   it.  However, in reality, we don't know any program that cannot link with our
 107   algorithm so far, so it's not going to cause trouble.
 108
 109 Numbers You Want to Know
 110 ------------------------
 111
 112 To give you intuition about what kinds of data the linker is mainly working on,
 113 I'll give you the list of objects and their numbers LLD has to read and process
 114 in order to link a very large executable. In order to link Chrome with debug
 115 info, which is roughly 2 GB in output size, LLD reads
 116
 117 - 17,000 files,
 118 - 1,800,000 sections,
 119 - 6,300,000 symbols, and
 120 - 13,000,000 relocations.
 121
 122 LLD produces the 2 GB executable in 15 seconds.
 123
 124 These numbers vary depending on your program, but in general,
 125 you have a lot of relocations and symbols for each file.
 126 If your program is written in C++, symbol names are likely to be
 127 pretty long because of name mangling.
 128
 129 It is important to not waste time on relocations and symbols.
 130
 131 In the above case, the total amount of symbol strings is 450 MB,
 132 and inserting all of them to a hash table takes 1.5 seconds.
 133 Therefore, if you causally add a hash table lookup for each symbol,
 134 it would slow down the linker by 10%. So, don't do that.
 135
 136 On the other hand, you don't have to pursue efficiency
 137 when handling files.
 138
 139 Important Data Structures
 140 -------------------------
 141
 142 We will describe the key data structures in LLD in this section.  The linker can
 143 be understood as the interactions between them.  Once you understand their
 144 functions, the code of the linker should look obvious to you.
 145
 146 * Symbol
 147
 148   This class represents a symbol.
 149   They are created for symbols in object files or archive files.
 150   The linker creates linker-defined symbols as well.
 151
 152   There are basically three types of Symbols: Defined, Undefined, or Lazy.
 153
 154   - Defined symbols are for all symbols that are considered as "resolved",
 155     including real defined symbols, COMDAT symbols, common symbols,
 156     absolute symbols, linker-created symbols, etc.
 157   - Undefined symbols represent undefined symbols, which need to be replaced by
 158     Defined symbols by the resolver until the link is complete.
 159   - Lazy symbols represent symbols we found in archive file headers
 160     which can turn into Defined if we read archieve members.
 161
 162   There's only one Symbol instance for each unique symbol name. This uniqueness
 163   is guaranteed by the symbol table. As the resolver reads symbols from input
 164   files, it replaces an existing Symbol with the "best" Symbol for its symbol
 165   name using the placement new.
 166
 167   The above mechanism allows you to use pointers to Symbols as a very cheap way
 168   to access name resolution results. Assume for example that you have a pointer
 169   to an undefined symbol before name resolution. If the symbol is resolved to a
 170   defined symbol by the resolver, the pointer will "automatically" point to the
 171   defined symbol, because the undefined symbol the pointer pointed to will have
 172   been replaced by the defined symbol in-place.
 173
 174 * SymbolTable
 175
 176   SymbolTable is basically a hash table from strings to Symbols
 177   with logic to resolve symbol conflicts. It resolves conflicts by symbol type.
 178
 179   - If we add Defined and Undefined symbols, the symbol table will keep the
 180     former.
 181   - If we add Defined and Lazy symbols, it will keep the former.
 182   - If we add Lazy and Undefined, it will keep the former,
 183     but it will also trigger the Lazy symbol to load the archive member
 184     to actually resolve the symbol.
 185
 186 * Chunk (COFF specific)
 187
 188   Chunk represents a chunk of data that will occupy space in an output.
 189   Each regular section becomes a chunk.
 190   Chunks created for common or BSS symbols are not backed by sections.
 191   The linker may create chunks to append additional data to an output as well.
 192
 193   Chunks know about their size, how to copy their data to mmap'ed outputs,
 194   and how to apply relocations to them.
 195   Specifically, section-based chunks know how to read relocation tables
 196   and how to apply them.
 197
 198 * InputSection (ELF specific)
 199
 200   Since we have less synthesized data for ELF, we don't abstract slices of
 201   input files as Chunks for ELF. Instead, we directly use the input section
 202   as an internal data type.
 203
 204   InputSection knows about their size and how to copy themselves to
 205   mmap'ed outputs, just like COFF Chunks.
 206
 207 * OutputSection
 208
 209   OutputSection is a container of InputSections (ELF) or Chunks (COFF).
 210   An InputSection or Chunk belongs to at most one OutputSection.
 211
 212 There are mainly three actors in this linker.
 213
 214 * InputFile
 215
 216   InputFile is a superclass of file readers.
 217   We have a different subclass for each input file type,
 218   such as regular object file, archive file, etc.
 219   They are responsible for creating and owning Symbols and InputSections/Chunks.
 220
 221 * Writer
 222
 223   The writer is responsible for writing file headers and InputSections/Chunks to
 224   a file.  It creates OutputSections, put all InputSections/Chunks into them,
 225   assign unique, non-overlapping addresses and file offsets to them, and then
 226   write them down to a file.
 227
 228 * Driver
 229
 230   The linking process is driven by the driver. The driver:
 231
 232   - processes command line options,
 233   - creates a symbol table,
 234   - creates an InputFile for each input file and puts all symbols within into
 235     the symbol table,
 236   - checks if there's no remaining undefined symbols,
 237   - creates a writer,
 238   - and passes the symbol table to the writer to write the result to a file.
 239
 240 Link-Time Optimization
 241 ----------------------
 242
 243 LTO is implemented by handling LLVM bitcode files as object files.
 244 The linker resolves symbols in bitcode files normally. If all symbols
 245 are successfully resolved, it then runs LLVM passes
 246 with all bitcode files to convert them to one big regular ELF/COFF file.
 247 Finally, the linker replaces bitcode symbols with ELF/COFF symbols,
 248 so that they are linked as if they were in the native format from the beginning.
 249
 250 The details are described in this document.
 251 http://llvm.org/docs/LinkTimeOptimization.html
 252
 253 Glossary
 254 --------
 255
 256 * RVA (COFF)
 257
 258   Short for Relative Virtual Address.
 259
 260   Windows executables or DLLs are not position-independent; they are
 261   linked against a fixed address called an image base. RVAs are
 262   offsets from an image base.
 263
 264   Default image bases are 0x140000000 for executables and 0x18000000
 265   for DLLs. For example, when we are creating an executable, we assume
 266   that the executable will be loaded at address 0x140000000 by the
 267   loader, so we apply relocations accordingly. Result texts and data
 268   will contain raw absolute addresses.
 269
 270 * VA
 271
 272   Short for Virtual Address. For COFF, it is equivalent to RVA + image base.
 273
 274 * Base relocations (COFF)
 275
 276   Relocation information for the loader. If the loader decides to map
 277   an executable or a DLL to a different address than their image
 278   bases, it fixes up binaries using information contained in the base
 279   relocation table. A base relocation table consists of a list of
 280   locations containing addresses. The loader adds a difference between
 281   RVA and actual load address to all locations listed there.
 282
 283   Note that this run-time relocation mechanism is much simpler than ELF.
 284   There's no PLT or GOT. Images are relocated as a whole just
 285   by shifting entire images in memory by some offsets. Although doing
 286   this breaks text sharing, I think this mechanism is not actually bad
 287   on today's computers.
 288
 289 * ICF
 290
 291   Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF).
 292
 293   ICF is an optimization to reduce output size by merging read-only sections
 294   by not only their names but by their contents. If two read-only sections
 295   happen to have the same metadata, actual contents and relocations,
 296   they are merged by ICF. It is known as an effective technique,
 297   and it usually reduces C++ program's size by a few percent or more.
 298
 299   Note that this is not an entirely sound optimization. C/C++ require
 300   different functions have different addresses. If a program depends on
 301   that property, it would fail at runtime.
 302
 303   On Windows, that's not really an issue because MSVC link.exe enabled
 304   the optimization by default. As long as your program works
 305   with the linker's default settings, your program should be safe with ICF.
 306
 307   On Unix, your program is generally not guaranteed to be safe with ICF,
 308   although large programs happen to work correctly.
 309   LLD works fine with ICF for example.