contrib/llvm/tools/lld/docs/Readers.rst

   1 .. _Readers:
   2
   3 Developing lld Readers
   4 ======================
   5
   6 Note: this document discusses the Mach-O port of LLD. For ELF and COFF,
   7 see :doc:`index`.
   8
   9 Introduction
  10 ------------
  11
  12 The purpose of a "Reader" is to take an object file in a particular format
  13 and create an `lld::File`:cpp:class: (which is a graph of Atoms)
  14 representing the object file.  A Reader inherits from
  15 `lld::Reader`:cpp:class: which lives in
  16 :file:`include/lld/Core/Reader.h` and
  17 :file:`lib/Core/Reader.cpp`.
  18
  19 The Reader infrastructure for an object format ``Foo`` requires the
  20 following pieces in order to fit into lld:
  21
  22 :file:`include/lld/ReaderWriter/ReaderFoo.h`
  23
  24    .. cpp:class:: ReaderOptionsFoo : public ReaderOptions
  25
  26       This Options class is the only way to configure how the Reader will
  27       parse any file into an `lld::File`:cpp:class: object.  This class
  28       should be declared in the `lld`:cpp:class: namespace.
  29
  30    .. cpp:function:: Reader *createReaderFoo(ReaderOptionsFoo &options)
  31
  32       This factory function configures and creates the Reader. This function
  33       should be declared in the `lld`:cpp:class: namespace.
  34
  35 :file:`lib/ReaderWriter/Foo/ReaderFoo.cpp`
  36
  37    .. cpp:class:: ReaderFoo : public Reader
  38
  39       This is the concrete Reader class which can be called to parse
  40       object files. It should be declared in an anonymous namespace or,
  41       if there is shared code with the `lld::WriterFoo`:cpp:class:, you
  42       can make a nested namespace (e.g. `lld::foo`:cpp:class:).
  43
  44 You may have noticed that :cpp:class:`ReaderFoo` is not declared in the
  45 ``.h`` file. An important design aspect of lld is that all Readers are
  46 created *only* through an object-format-specific
  47 :cpp:func:`createReaderFoo` factory function. The creation of the Reader is
  48 parametrized through a :cpp:class:`ReaderOptionsFoo` class. This options
  49 class is the one-and-only way to control how the Reader operates when
  50 parsing an input file into an Atom graph. For instance, you may want the
  51 Reader to only accept certain architectures. The options class can be
  52 instantiated from command line options or be programmatically configured.
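
The layout described above can be sketched in one self-contained translation
unit. This is only an illustration under assumptions: the ``Reader`` base class
here is a tiny stand-in rather than the real `lld::Reader`:cpp:class:, the
``allowArch``/``isAllowed``/``acceptsArch`` members are hypothetical, the
options class does not derive from ``ReaderOptions``, and the factory returns a
``std::unique_ptr`` instead of the raw pointer shown in the declaration above.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

namespace lld {

// Stand-in for lld::Reader; the real base class has a different interface.
class Reader {
public:
  virtual ~Reader() = default;
  virtual bool acceptsArch(const std::string &arch) const = 0;
};

// Would live in the public header (ReaderFoo.h).
class ReaderOptionsFoo {
public:
  // Hypothetical option: restrict the Reader to certain architectures.
  void allowArch(std::string arch) { _archs.insert(std::move(arch)); }
  bool isAllowed(const std::string &arch) const {
    return _archs.empty() || _archs.count(arch) != 0;
  }

private:
  std::set<std::string> _archs; // empty means "accept every architecture"
};

// The only public way to obtain a Foo Reader.
std::unique_ptr<Reader> createReaderFoo(const ReaderOptionsFoo &options);

} // namespace lld

// Would live in ReaderFoo.cpp: clients never see the concrete class.
namespace {
class ReaderFoo : public lld::Reader {
public:
  explicit ReaderFoo(const lld::ReaderOptionsFoo &options)
      : _options(options) {}
  bool acceptsArch(const std::string &arch) const override {
    return _options.isAllowed(arch);
  }

private:
  lld::ReaderOptionsFoo _options; // copied once, fixed for the Reader's life
};
} // namespace

std::unique_ptr<lld::Reader>
lld::createReaderFoo(const lld::ReaderOptionsFoo &options) {
  return std::make_unique<ReaderFoo>(options);
}
```

Because ``ReaderFoo`` sits in an anonymous namespace, the only way for a client
to influence its behavior in this sketch is through the options object passed
to the factory.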
  53
  54 Where to start
  55 --------------
  56
  57 The lld project already has a skeleton of source code for Readers for
  58 ``ELF``, ``PECOFF``, ``MachO``, and lld's native ``YAML`` graph format.
  59 If your file format is a variant of one of those, you should modify the
  60 existing Reader to support your variant. This is done by customizing the Options
  61 class for the Reader and making appropriate changes to the ``.cpp`` file to
  62 interpret those options and act accordingly.
  63
  64 If your object file format is not a variant of any existing Reader, you'll need
  65 to create a new Reader subclass with the organization described above.
  66
  67 Readers are factories
  68 ---------------------
  69
  70 The linker will usually only instantiate your Reader once.  That one Reader will
  71 have its ``loadFile()`` method called many times with different input files.
  72 To support multithreaded linking, the Reader may be parsing multiple input
  73 files in parallel. Therefore, there should be no parsing state in your Reader
  74 object.  Any parsing state should be in member variables of your File subclass
  75 or in some temporary object.
  76
  77 The key method to implement in a reader is::
  78
  79   virtual error_code loadFile(LinkerInput &input,
  80                               std::vector<std::unique_ptr<File>> &result);
  81
  82 It takes a linker input (whose memory buffer contains the contents of the
  83 object file being read) and returns, through ``result``, one or more
  84 instantiated lld::File objects, each a collection of Atoms. The result is a
  85 vector of File pointers (instead of simply a single File pointer) because some
  86 file formats allow multiple object "files" to be encoded in one file system file.
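
A minimal sketch of a stateless ``loadFile()`` follows. The ``LinkerInput`` and
``File`` types here are simplified stand-ins, not the real lld classes, and the
"format" being parsed (one atom name per line) is invented purely for
illustration. The point is only that all parsing state lives in the ``File``
being built, so the method can be ``const`` and safe to call concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical stand-in: a path plus the bytes of the object file.
struct LinkerInput {
  std::string path;
  std::string bytes;
};

// Hypothetical stand-in for a File subclass; it holds all parsing state.
class File {
public:
  explicit File(std::string path) : _path(std::move(path)) {}

  // Toy format: every non-empty line names one atom.
  std::error_code parse(const std::string &bytes) {
    if (bytes.empty())
      return std::make_error_code(std::errc::invalid_argument);
    std::size_t begin = 0;
    while (begin < bytes.size()) {
      std::size_t end = bytes.find('\n', begin);
      if (end == std::string::npos)
        end = bytes.size();
      if (end > begin)
        _atomNames.push_back(bytes.substr(begin, end - begin));
      begin = end + 1;
    }
    return std::error_code();
  }

  const std::string &path() const { return _path; }
  const std::vector<std::string> &atomNames() const { return _atomNames; }

private:
  std::string _path;
  std::vector<std::string> _atomNames;
};

class ReaderFoo {
public:
  // const and member-free: nothing is shared between concurrent calls.
  std::error_code loadFile(const LinkerInput &input,
                           std::vector<std::unique_ptr<File>> &result) const {
    auto file = std::make_unique<File>(input.path);
    if (std::error_code ec = file->parse(input.bytes))
      return ec; // on failure, result is left untouched
    result.push_back(std::move(file));
    return std::error_code();
  }
};
```

A format that packs several object "files" into one file system file would
simply push more than one ``File`` onto ``result`` in the same call.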
  87
  88
  89 Memory Ownership
  90 ----------------
  91
  92 Atoms are always owned by their File object. During core linking when Atoms
  93 are coalesced or stripped away, core linking does not delete them.
  94 Core linking just removes those unused Atoms from its internal list.
  95 The destructor of a File object is responsible for deleting all Atoms it
  96 owns, and if ownership of the MemoryBuffer was passed to it, the File
  97 destructor needs to delete that too.
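
The ownership rule can be illustrated with a small sketch. Everything here is
a simplified assumption: ``Atom`` and ``File`` are toy types, a
``std::string`` stands in for the MemoryBuffer, ``stripAtom`` stands in for
core linking, and ``liveCount`` exists only so the example can observe when
Atoms are destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Atom {
  explicit Atom(std::string n) : name(std::move(n)) { ++liveCount; }
  ~Atom() { --liveCount; }
  Atom(const Atom &) = delete;
  Atom &operator=(const Atom &) = delete;

  std::string name;
  static int liveCount; // instrumentation for this example only
};
int Atom::liveCount = 0;

class File {
public:
  // Ownership of the buffer is passed in, so ~File releases it too.
  explicit File(std::unique_ptr<std::string> buffer)
      : _buffer(std::move(buffer)) {}

  // The File keeps ownership; callers only get a non-owning pointer.
  const Atom *addAtom(std::string name) {
    _atoms.push_back(std::make_unique<Atom>(std::move(name)));
    return _atoms.back().get();
  }

private:
  std::unique_ptr<std::string> _buffer;      // stand-in for MemoryBuffer
  std::vector<std::unique_ptr<Atom>> _atoms; // destroyed with the File
};

// Stand-in for core linking: forget the Atom, but do not delete it.
void stripAtom(std::vector<const Atom *> &liveAtoms, const Atom *dead) {
  liveAtoms.erase(std::remove(liveAtoms.begin(), liveAtoms.end(), dead),
                  liveAtoms.end());
}
```

In this sketch a stripped Atom stays alive until its ``File`` is destroyed,
which mirrors the rule that only the File's destructor frees Atoms.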
  98
  99 Making Atoms
 100 ------------
 101
 102 The internal model of lld is purely Atom based.  But most object files do not
 103 have an explicit concept of Atoms; instead, most have "sections". The way
 104 to think of this is that a section is just a list of Atoms with common
 105 attributes.
 106
 107 The first step in parsing section-based object files is to cleave each
 108 section into a list of Atoms. The technique may vary by section type. For
 109 code sections (e.g. .text), there are usually symbols at the start of each
 110 function. Those symbol addresses are the points at which the section is
 111 cleaved into discrete Atoms.  Some file formats (like ELF) also include the
 112 length of each symbol in the symbol table. Otherwise, the length of each
 113 Atom is calculated to run to the start of the next symbol or the end of the
 114 section.
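
The symbol-based cleaving described above might look roughly like the
following sketch. The ``Symbol`` and ``AtomRange`` structs and the
``cleaveSection`` function are hypothetical names, offsets are assumed to be
relative to the start of the section, and a ``size`` of zero is assumed to
mean "the format does not record a length". Aliases (two symbols at the same
offset) are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Symbol {
  std::string name;
  std::uint64_t offset; // offset of the symbol within its section
  std::uint64_t size;   // 0 if the symbol table records no length
};

struct AtomRange {
  std::string name;
  std::uint64_t offset;
  std::uint64_t length;
};

// Cleave one section into atoms at each symbol's offset.
std::vector<AtomRange> cleaveSection(std::vector<Symbol> symbols,
                                     std::uint64_t sectionSize) {
  std::stable_sort(symbols.begin(), symbols.end(),
                   [](const Symbol &a, const Symbol &b) {
                     return a.offset < b.offset;
                   });
  std::vector<AtomRange> atoms;
  atoms.reserve(symbols.size());
  for (std::size_t i = 0; i < symbols.size(); ++i) {
    const Symbol &sym = symbols[i];
    // Without a recorded size, run to the next symbol or the section end.
    std::uint64_t next =
        (i + 1 < symbols.size()) ? symbols[i + 1].offset : sectionSize;
    std::uint64_t length = (sym.size != 0) ? sym.size : next - sym.offset;
    atoms.push_back({sym.name, sym.offset, length});
  }
  return atoms;
}
```

Sorting first means the symbol table does not have to be ordered by address,
which many formats do not guarantee.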
 115
 116 Other section types can be implicitly cleaved. For instance, C-string literals
 117 or unwind info (e.g. .eh_frame) can be cleaved by having the Reader look at
 118 the content of the section.  It is important to cleave sections into Atoms
 119 to remove false dependencies. For instance the .eh_frame section often
 120 has no symbols, but contains "pointers" to the functions for which it
 121 has unwind info.  If the .eh_frame section were not cleaved (but left as one
 122 big Atom), there would always be a reference (from the eh_frame Atom) to
 123 each function.  So the linker would be unable to coalesce or dead-strip
 124 the function Atoms.
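
As a small illustration of content-based cleaving, the sketch below splits a
C-string literal section at each NUL terminator. The function name
``cleaveCStrings`` is hypothetical, and ``std::string_view`` is used as a
standard-library stand-in for a non-owning view of the section bytes. Unwind
info would need a format-aware parser and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Split a C-string section into one atom per literal. Each returned view
// includes its terminating NUL and points into the original bytes.
std::vector<std::string_view> cleaveCStrings(std::string_view section) {
  std::vector<std::string_view> atoms;
  std::size_t begin = 0;
  while (begin < section.size()) {
    std::size_t nul = section.find('\0', begin);
    // A missing final terminator yields one last atom up to the section end.
    std::size_t end =
        (nul == std::string_view::npos) ? section.size() : nul + 1;
    atoms.push_back(section.substr(begin, end - begin));
    begin = end;
  }
  return atoms;
}
```

Since every literal becomes its own Atom, an unreferenced string can be
stripped without affecting its neighbors in the section.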
 125
 126 The lld Atom model also requires that a reference to an undefined symbol be
 127 modeled as a Reference to an UndefinedAtom. So the Reader also needs to
 128 create an UndefinedAtom for each undefined symbol in the object file.
 129
 130 Once all Atoms have been created, the second step is to create References
 131 (recall that Atoms are "nodes" and References are "edges"). Most References
 132 are created by looking at the "relocation records" in the object file. If
 133 a function contains a call to "malloc", there is usually a relocation record
 134 specifying the address in the section and the symbol table index. Your
 135 Reader will need to convert the address to an Atom and offset and the symbol
 136 table index into a target Atom. If "malloc" is not defined in the object file,
 137 the target Atom of the Reference will be an UndefinedAtom.
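
The relocation step can be sketched as follows. All types and names here
(``Atom``, ``Reference``, ``Relocation``, ``atomContaining``,
``addReferences``) are hypothetical and heavily simplified: a relocation is
reduced to a section address plus a symbol table index, and
``symbolToAtom[i]`` is assumed to already hold the Atom (defined or
undefined) created for symbol table entry ``i``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Atom;

struct Reference {
  std::uint64_t offsetInAtom; // where in the source atom the fixup applies
  const Atom *target;         // defined atom or undefined atom
};

struct Atom {
  std::string name;
  std::uint64_t offset; // section offset (unused for undefined atoms)
  std::uint64_t length;
  std::vector<Reference> references;
};

struct Relocation {
  std::uint64_t address;   // address within the section
  std::size_t symbolIndex; // index into the symbol table
};

// Linear search for clarity; a real Reader would likely binary-search.
Atom *atomContaining(std::vector<Atom> &atoms, std::uint64_t address) {
  for (Atom &atom : atoms)
    if (address >= atom.offset && address - atom.offset < atom.length)
      return &atom;
  return nullptr;
}

// Turn each relocation into a Reference on the atom that contains it.
bool addReferences(std::vector<Atom> &sectionAtoms,
                   const std::vector<const Atom *> &symbolToAtom,
                   const std::vector<Relocation> &relocations) {
  for (const Relocation &reloc : relocations) {
    Atom *source = atomContaining(sectionAtoms, reloc.address);
    if (!source || reloc.symbolIndex >= symbolToAtom.size())
      return false; // malformed input
    source->references.push_back(
        {reloc.address - source->offset, symbolToAtom[reloc.symbolIndex]});
  }
  return true;
}
```

In this sketch an undefined symbol such as "malloc" is just another entry in
``symbolToAtom``, so References to it need no special casing.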
 138
 139
 140 Performance
 141 -----------
 142 Once you have the above working to parse an object file into Atoms and
 143 References, you'll want to look at performance.  Some techniques that can
 144 help performance are:
 145
 146 * Use llvm::BumpPtrAllocator or pre-allocate one big vector<Reference> and then
 147   just have each Atom point to its subrange of References in that vector.
 148   This can be faster than allocating each Reference as a separate object.
 149 * Pre-scan the symbol table and determine how many Atoms are in each section,
 150   then allocate space for all the Atom objects at once.
 151 * Don't copy symbol names or section content to each Atom; instead, use
 152   StringRef and ArrayRef in each Atom to point to its name and content in the
 153   MemoryBuffer.
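
A rough sketch of these techniques, using only the standard library so that
it stays self-contained: ``std::string_view`` stands in for ``StringRef`` and
``ArrayRef``, a ``std::string`` stands in for the MemoryBuffer, and all class
and member names are hypothetical. Each Atom records only an index range into
one shared vector of References.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct Reference {
  std::size_t offsetInAtom;
  std::size_t targetIndex; // index of the target atom in the same File
};

struct Atom {
  std::string_view name;    // view into the buffer, no copy
  std::string_view content; // view into the buffer, no copy
  std::size_t firstRef;     // start of this atom's slice of references
  std::size_t refCount;
};

class File {
public:
  explicit File(std::string bytes) : _buffer(std::move(bytes)) {}
  File(const File &) = delete; // the views must not outlive the buffer
  File &operator=(const File &) = delete;

  // Counts come from a pre-scan, so each vector allocates once.
  void reserve(std::size_t atomCount, std::size_t refCount) {
    _atoms.reserve(atomCount);
    _references.reserve(refCount);
  }

  std::size_t addAtom(std::size_t nameOff, std::size_t nameLen,
                      std::size_t contentOff, std::size_t contentLen) {
    std::string_view all(_buffer);
    _atoms.push_back({all.substr(nameOff, nameLen),
                      all.substr(contentOff, contentLen),
                      _references.size(), 0});
    return _atoms.size() - 1;
  }

  // Appends to the most recently added atom so its slice stays contiguous.
  void addReference(Reference ref) {
    assert(!_atoms.empty());
    _references.push_back(ref);
    ++_atoms.back().refCount;
  }

  const Atom &atom(std::size_t i) const { return _atoms[i]; }
  const char *bufferData() const { return _buffer.data(); }
  const Reference *refsBegin(const Atom &a) const {
    return _references.data() + a.firstRef;
  }
  const Reference *refsEnd(const Atom &a) const {
    return refsBegin(a) + a.refCount;
  }

private:
  std::string _buffer; // stand-in for the MemoryBuffer
  std::vector<Atom> _atoms;
  std::vector<Reference> _references;
};
```

Storing indices rather than pointers into the shared vector keeps each Atom's
slice valid even if the vector reallocates while it is being filled.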
 154
 155
 156 Testing
 157 -------
 158
 159 We are still working on infrastructure to test Readers. The issue is that
 160 you don't want to check in binary files to the test suite. And the tools
 161 for creating your object file from assembly source may not be available on
 162 every OS.
 163
 164 We are investigating a way to use YAML to describe the sections, symbols,
 165 and content of a file, and then have some code write out an object
 166 file from that YAML description.
 167
 168 Once that is in place, you can write test cases that contain section/symbol
 169 YAML and are run through the linker to produce Atom/Reference-based YAML, which
 170 is then run through FileCheck to verify the Atoms and References are as
 171 expected.
 172
 173
 174