docs/SanitizerCoverage.rst

   1 =================
   2 SanitizerCoverage
   3 =================
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10
  11 Sanitizer tools have a very simple code coverage tool built in. It provides
  12 function-level, basic-block-level, and edge-level coverage at a very low
  13 cost.
  14
  15 How to build and run
  16 ====================
  17
  18 SanitizerCoverage can be used with :doc:`AddressSanitizer`,
  19 :doc:`LeakSanitizer`, :doc:`MemorySanitizer`,
  20 UndefinedBehaviorSanitizer, or without any sanitizer.  Pass one of the
  21 following compile-time flags:
  22
  23 * ``-fsanitize-coverage=func`` for function-level coverage (very fast).
  24 * ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  25   **extra** slowdown).
  26 * ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).
  27
  28 You may also specify ``-fsanitize-coverage=indirect-calls`` for
  29 additional `caller-callee coverage`_.
  30
  31 At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``,
  32 ``LSAN_OPTIONS``, ``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as
  33 appropriate. For the standalone coverage mode, use ``UBSAN_OPTIONS``.
  34
  35 To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
  36 to one of the above compile-time flags. At runtime, use
  37 ``*SAN_OPTIONS=coverage=1:coverage_counters=1``.
  38
  39 Example:
  40
  41 .. code-block:: console
  42
  43     % cat -n cov.cc
  44          1  #include <stdio.h>
  45          2  __attribute__((noinline))
  46          3  void foo() { printf("foo\n"); }
  47          4
  48          5  int main(int argc, char **argv) {
  49          6    if (argc == 2)
  50          7      foo();
  51          8    printf("main\n");
  52          9  }
  53     % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
  54     % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
  55     main
  56     -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  57     % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
  58     foo
  59     main
  60     -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  61     -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
  62
  63 Every time you run an executable instrumented with SanitizerCoverage,
  64 one ``*.sancov`` file is created during process shutdown.
  65 If the executable is dynamically linked against instrumented DSOs,
  66 one ``*.sancov`` file will also be created for every DSO.
  67
  68 Postprocessing
  69 ==============
  70
  71 The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic,
  72 either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of the
  73 magic defines the size of the offsets that follow (64-bit or 32-bit). The rest of
  74 the data is the offsets in the corresponding binary/DSO that were executed during the run.
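
The layout described above can be decoded with a few lines of code. The following is a minimal sketch, not the implementation used by ``sancov.py``: the function name ``DecodeSancov`` is hypothetical, and the code assumes that the file was written with the same byte order as the machine that reads it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Magic values quoted in the text above.
static const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
static const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Decodes the raw bytes of one .sancov file into a list of offsets.
// Assumption: the bytes use the byte order of the machine running this code.
std::vector<uint64_t> DecodeSancov(const std::vector<unsigned char> &bytes) {
  uint64_t magic = 0;
  if (bytes.size() < sizeof(magic))
    throw std::runtime_error("too short to hold the 8-byte magic");
  std::memcpy(&magic, bytes.data(), sizeof(magic));

  size_t width = 0;  // size of one offset, in bytes
  if (magic == kMagic64)
    width = 8;
  else if (magic == kMagic32)
    width = 4;
  else
    throw std::runtime_error("unknown magic");

  const size_t payload = bytes.size() - sizeof(magic);
  if (payload % width != 0)
    throw std::runtime_error("truncated offset at end of file");

  std::vector<uint64_t> offsets;
  for (size_t pos = sizeof(magic); pos < bytes.size(); pos += width) {
    uint64_t value = 0;
    if (width == 8) {
      std::memcpy(&value, &bytes[pos], sizeof(value));
    } else {
      uint32_t narrow = 0;
      std::memcpy(&narrow, &bytes[pos], sizeof(narrow));
      value = narrow;  // widen 32-bit offsets to a common type
    }
    offsets.push_back(value);
  }
  assert(offsets.size() == payload / width);
  return offsets;
}
```

Reading the whole file into a byte vector first keeps the decoder independent of how the file is obtained.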
  75
  76 A simple script
  77 ``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
  78 provided to dump these offsets.
  79
  80 .. code-block:: console
  81
  82     % sancov.py print a.out.22679.sancov a.out.22673.sancov
  83     sancov.py: read 2 PCs from a.out.22679.sancov
  84     sancov.py: read 1 PCs from a.out.22673.sancov
  85     sancov.py: 2 files merged; 2 PCs total
  86     0x465250
  87     0x4652a0
  88
  89 You can then filter the output of ``sancov.py`` through ``addr2line --exe
  90 ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
  91 numbers:
  92
  93 .. code-block:: console
  94
  95     % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
  96     cov.cc:3
  97     cov.cc:5
  98
  99 Sancov Tool
 100 ===========
 101
 102 A new experimental ``sancov`` tool is being developed to process coverage files.
 103 The tool is part of the LLVM project and is currently supported only on Linux.
 104 It can handle symbolization tasks autonomously without any extra support
 105 from the environment. You need to pass .sancov files (named
 106 ``<module_name>.<pid>.sancov``) and paths to all corresponding ELF binaries.
 107 Sancov matches these files using module names and binary file names.
 108
 109 .. code-block:: console
 110
 111     USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...
 112
 113     Action (required)
 114       -print                    - Print coverage addresses
 115       -covered-functions        - Print all covered functions.
 116       -not-covered-functions    - Print all not covered functions.
 117       -html-report              - Print HTML coverage report.
 118
 119     Options
 120       -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
 121       -demangle                   - Print demangled function name.
 122       -strip_path_prefix=<string> - Strip this prefix from file paths in reports
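
To illustrate the ``<module_name>.<pid>.sancov`` naming scheme mentioned above, here is a small sketch that splits such a file name into its parts. It is not how the ``sancov`` tool is implemented; ``SancovName`` and ``SplitSancovName`` are hypothetical names, and the sketch assumes the pid is the last dot-separated component before the ``.sancov`` suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct SancovName {
  bool valid;
  std::string module;  // e.g. "a.out"
  std::string pid;     // kept as text, e.g. "22679"
};

// Splits "<module_name>.<pid>.sancov" into its parts. Module names may
// contain dots themselves, so the pid is taken from the right.
SancovName SplitSancovName(const std::string &file_name) {
  SancovName result = {false, "", ""};
  const std::string suffix = ".sancov";
  if (file_name.size() <= suffix.size() ||
      file_name.compare(file_name.size() - suffix.size(), suffix.size(),
                        suffix) != 0)
    return result;
  const std::string stem =
      file_name.substr(0, file_name.size() - suffix.size());
  const size_t dot = stem.rfind('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == stem.size())
    return result;
  const std::string pid = stem.substr(dot + 1);
  if (pid.find_first_not_of("0123456789") != std::string::npos)
    return result;  // the pid component must be numeric
  result.valid = true;
  result.module = stem.substr(0, dot);
  result.pid = pid;
  assert(!result.module.empty() && !result.pid.empty());
  return result;
}
```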
 123
 124
 125 Automatic HTML Report Generation
 126 ================================
 127
 128 If ``*SAN_OPTIONS`` contains the ``html_cov_report=1`` option, an HTML
 129 coverage report is generated automatically alongside the coverage files.
 130 The ``sancov`` binary should be present in ``PATH``; alternatively, the
 131 ``sancov_path=<path_to_sancov>`` option can be used to specify the tool location.
 132
 133
 134 How good is the coverage?
 135 =========================
 136
 137 It is possible to find out which PCs are not covered by subtracting the covered
 138 set from the set of all instrumented PCs. The latter can be obtained by listing
 139 all callsites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
 140 can do this for you. Just supply the path to the binary and a list of covered PCs:
 141
 142 .. code-block:: console
 143
 144     % sancov.py print a.out.12345.sancov > covered.txt
 145     sancov.py: read 2 64-bit PCs from a.out.12345.sancov
 146     sancov.py: 1 file merged; 2 PCs total
 147     % sancov.py missing a.out < covered.txt
 148     sancov.py: found 3 instrumented PCs in a.out
 149     sancov.py: read 2 PCs from stdin
 150     sancov.py: 1 PCs missing from coverage
 151     0x4cc61c
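
The subtraction described above is an ordinary set difference. A minimal sketch, assuming both PC lists are already available in memory (``MissingPCs`` is a hypothetical helper, not part of ``sancov.py``):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the instrumented PCs that never appear in the covered list.
// Both arguments are taken by value so they can be sorted locally.
std::vector<uint64_t> MissingPCs(std::vector<uint64_t> instrumented,
                                 std::vector<uint64_t> covered) {
  std::sort(instrumented.begin(), instrumented.end());
  std::sort(covered.begin(), covered.end());
  std::vector<uint64_t> missing;
  std::set_difference(instrumented.begin(), instrumented.end(),
                      covered.begin(), covered.end(),
                      std::back_inserter(missing));
  assert(missing.size() <= instrumented.size());
  return missing;
}
```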
 152
 153 Edge coverage
 154 =============
 155
 156 Consider this code:
 157
 158 .. code-block:: c++
 159
 160     void foo(int *a) {
 161       if (a)
 162         *a = 0;
 163     }
 164
 165 It contains 3 basic blocks, let's name them A, B, C:
 166
 167 .. code-block:: none
 168
 169     A
 170     |\
 171     | \
 172     |  B
 173     | /
 174     |/
 175     C
 176
 177 If blocks A, B, and C are all covered, we know for certain that the edges A=>B
 178 and B=>C were executed, but we still don't know if the edge A=>C was executed.
 179 Such edges of the control flow graph are called
 180 `critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
 181 edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
 182 edges by introducing new dummy blocks and then instruments those blocks:
 183
 184 .. code-block:: none
 185
 186     A
 187     |\
 188     | \
 189     D  B
 190     | /
 191     |/
 192     C
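
To make the notion of a critical edge concrete, the sketch below finds such edges in a small graph using the usual definition: the source block has more than one successor and the destination block has more than one predecessor. It only illustrates the idea and is not the compiler's implementation; ``Edge`` and ``FindCriticalEdges`` are hypothetical names, and the edge list is assumed to contain no duplicates.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Edge;  // (source, destination)

// Returns the edges whose source has several successors and whose
// destination has several predecessors.
std::vector<Edge> FindCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> successors, predecessors;
  for (size_t i = 0; i < edges.size(); ++i) {
    ++successors[edges[i].first];
    ++predecessors[edges[i].second];
  }
  std::vector<Edge> critical;
  for (size_t i = 0; i < edges.size(); ++i) {
    if (successors[edges[i].first] > 1 && predecessors[edges[i].second] > 1)
      critical.push_back(edges[i]);
  }
  assert(critical.size() <= edges.size());
  return critical;
}
```

For the three-block example above, the only edge reported is A=>C, which is the one that receives the dummy block D.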
 193
 194 Bitset
 195 ======
 196
 197 When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
 198 dumped as a bitset (a text file with 1 for blocks that have been executed and 0
 199 for blocks that were not).
 200
 201 .. code-block:: console
 202
 203     % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
 204     % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
 205     main
 206     % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
 207     foo
 208     main
 209     % head *bitset*
 210     ==> a.out.38214.bitset-sancov <==
 211     01101
 212     ==> a.out.6128.bitset-sancov <==
 213     11011%
 214
 215 For a given executable the length of the bitset is always the same (well,
 216 unless dlopen/dlclose come into play), so the bitset coverage can be
 217 easily used for bitset-based corpus distillation.
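
One possible reading of bitset-based corpus distillation is a greedy pass that keeps a run only if its bitset covers a block that no previously kept run covered. The sketch below follows that reading; ``DistillCorpus`` is a hypothetical helper, and real distillation strategies may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Takes one bitset string per run (as dumped with coverage_bitset=1) and
// returns the indices of the runs that added at least one new '1'.
std::vector<size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
  std::vector<size_t> kept;
  if (bitsets.empty())
    return kept;
  std::string seen(bitsets[0].size(), '0');
  for (size_t i = 0; i < bitsets.size(); ++i) {
    // Same executable, so every bitset is expected to have the same length.
    assert(bitsets[i].size() == seen.size());
    bool adds_coverage = false;
    for (size_t b = 0; b < seen.size(); ++b) {
      if (bitsets[i][b] == '1' && seen[b] == '0') {
        seen[b] = '1';
        adds_coverage = true;
      }
    }
    if (adds_coverage)
      kept.push_back(i);
  }
  return kept;
}
```

The result depends on the order of the runs; sorting the inputs (for example, smallest first) before the pass is a common refinement.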
 218
 219 Caller-callee coverage
 220 ======================
 221
 222 (Experimental!)
 223 Every indirect function call is instrumented with a run-time function call that
 224 captures caller and callee.  At shutdown, the process dumps a separate
 225 file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
 226 pairs of lines (odd lines are callers, even lines are callees):
 227
 228 .. code-block:: console
 229
 230     a.out 0x4a2e0c
 231     a.out 0x4a6510
 232     a.out 0x4a2e0c
 233     a.out 0x4a87f0
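
A file in this format can be turned back into a caller-to-callees map by reading two lines at a time. This is a minimal sketch that assumes the file contents are already in a string and contain no blank lines; ``CallGraph`` and ``ParseCallerCallee`` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Maps "module address" of a caller to the set of its observed callees.
typedef std::map<std::string, std::set<std::string> > CallGraph;

CallGraph ParseCallerCallee(const std::string &text) {
  CallGraph graph;
  std::istringstream in(text);
  std::string caller, callee;
  // Odd lines are callers, even lines are callees.
  while (std::getline(in, caller) && std::getline(in, callee)) {
    assert(!caller.empty() && !callee.empty());  // sketch: no blank lines
    graph[caller].insert(callee);
  }
  return graph;
}
```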
 234
 235 Current limitations:
 236
 237 * Only the first 14 callees for every caller are recorded; the rest are silently
 238   ignored.
 239 * The output format is not very compact since caller and callee may reside in
 240   different modules and we need to spell out the module names.
 241 * The routine that dumps the output is not optimized for speed.
 242 * Only Linux x86_64 is tested so far.
 243 * Sandboxes are not supported.
 244
 245 Coverage counters
 246 =================
 247
 248 This experimental feature is inspired by
 249 `AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`__'s coverage
 250 instrumentation. With additional compile-time and run-time flags you can get
 251 more sensitive coverage information.  In addition to boolean values assigned to
 252 every basic block (edge) the instrumentation will collect imprecise counters.
 253 On exit, every counter will be mapped to an 8-bit bitset representing counter
 254 ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets will
 255 be dumped to disk.
 256
 257 .. code-block:: console
 258
 259     % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
 260     % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
 261     % ls -l *counters-sancov
 262     ... a.out.17110.counters-sancov
 263     % xxd *counters-sancov
 264     0000000: 0001 0100 01
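
The mapping from a counter value to its range bit can be sketched as follows. This is an illustration rather than the run-time's code: ``CounterToBitset`` is a hypothetical name, and the sketch assumes that the first range (``1``) corresponds to the least significant bit, which is consistent with the ``01`` bytes in the dump above.

```cpp
#include <cassert>
#include <cstdint>

// Maps a raw counter value to an 8-bit value in which bit i is set when the
// counter falls into the i-th range: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
// A counter of 0 (never executed) maps to 0.
uint8_t CounterToBitset(unsigned counter) {
  if (counter == 0) return 0;
  if (counter >= 128) return static_cast<uint8_t>(1u << 7);
  if (counter >= 32) return static_cast<uint8_t>(1u << 6);
  if (counter >= 16) return static_cast<uint8_t>(1u << 5);
  if (counter >= 8) return static_cast<uint8_t>(1u << 4);
  if (counter >= 4) return static_cast<uint8_t>(1u << 3);
  assert(counter <= 3);
  return static_cast<uint8_t>(1u << (counter - 1));  // 1, 2, 3 -> bits 0, 1, 2
}
```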
 265
 266 These counters may also be used for in-process coverage-guided fuzzers. See
 267 ``include/sanitizer/coverage_interface.h``:
 268
 269 .. code-block:: c++
 270
 271     // The coverage instrumentation may optionally provide imprecise counters.
 272     // Rather than exposing the counter values to the user we instead map
 273     // the counters to a bitset.
 274     // Every counter is associated with 8 bits in the bitset.
 275     // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
 276     // The i-th bit is set to 1 if the counter value is in the i-th range.
 277     // This counter-based coverage implementation is *not* thread-safe.
 278
 279     // Returns the number of registered coverage counters.
 280     uintptr_t __sanitizer_get_number_of_counters();
 281     // Updates the counter 'bitset', clears the counters and returns the number of
 282     // new bits in 'bitset'.
 283     // If 'bitset' is nullptr, only clears the counters.
 284     // Otherwise 'bitset' should be at least
 285     // __sanitizer_get_number_of_counters bytes long and 8-aligned.
 286     uintptr_t
 287     __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
 288
 289 Tracing basic blocks
 290 ====================
 291 Experimental support for basic block (or edge) tracing.
 292 With ``-fsanitize-coverage=trace-bb`` the compiler will insert
 293 ``__sanitizer_cov_trace_basic_block(s32 *id)`` before every function, basic block, or edge
 294 (depending on the value of ``-fsanitize-coverage=[func,bb,edge]``).
 295 Example:
 296
 297 .. code-block:: console
 298
 299     % clang -g -fsanitize=address -fsanitize-coverage=edge,trace-bb foo.cc
 300     % ASAN_OPTIONS=coverage=1 ./a.out
 301
 302 This will produce two files after the process exits:
 303 ``trace-points.PID.sancov`` and ``trace-events.PID.sancov``.
 304 The first file will contain a textual description of all the instrumented points in the program
 305 in a form that you can feed into ``llvm-symbolizer`` (e.g. ``a.out 0x4dca89``), one per line.
 306 The second file will contain the actual execution trace as a sequence of 4-byte integers;
 307 these integers are indices into the array of instrumented points (the first file).
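
Combining the two files yields a symbolizable trace. The sketch below is one way to do it, assuming both files have been read into memory, that the 4-byte integers use the byte order of the machine running the decoder, and that all indices are non-negative; ``DecodeTrace`` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// points_text: contents of trace-points.PID.sancov (one point per line).
// event_bytes: contents of trace-events.PID.sancov (4-byte indices).
// Returns the executed points in order, one string per event.
std::vector<std::string> DecodeTrace(
    const std::string &points_text,
    const std::vector<unsigned char> &event_bytes) {
  std::vector<std::string> points;
  std::istringstream in(points_text);
  std::string line;
  while (std::getline(in, line))
    if (!line.empty())
      points.push_back(line);

  if (event_bytes.size() % sizeof(uint32_t) != 0)
    throw std::runtime_error("trace-events size is not a multiple of 4");

  std::vector<std::string> trace;
  for (size_t pos = 0; pos < event_bytes.size(); pos += sizeof(uint32_t)) {
    uint32_t index = 0;
    std::memcpy(&index, &event_bytes[pos], sizeof(index));
    if (index >= points.size())
      throw std::runtime_error("event refers to an unknown point");
    trace.push_back(points[index]);
  }
  assert(trace.size() == event_bytes.size() / sizeof(uint32_t));
  return trace;
}
```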
 308
 309 Basic block tracing is currently supported only for single-threaded applications.
 310
 311
 312 Tracing PCs
 313 ===========
 314 An *experimental* feature similar to tracing basic blocks, but with a different API.
 315 With ``-fsanitize-coverage=trace-pc`` the compiler will insert
 316 ``__sanitizer_cov_trace_pc()`` on every edge.
 317 With the additional ``...=trace-pc,indirect-calls`` flag,
 318 ``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
 319 These callbacks are not implemented in the Sanitizer run-time and should be defined
 320 by the user. As a result, these flags do not require any other sanitizer to be used.
 321 This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller)
 322 and can be used with `AFL <http://lcamtuf.coredump.cx/afl>`__.
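
Since the callbacks must come from the user, a minimal definition might look like the sketch below. Only the two callback names and signatures come from the text above; the log arrays, ``kMaxRecords`` and ``NumRecordedEdgePCs`` are hypothetical, the code relies on the GCC/Clang builtin ``__builtin_return_address``, and it is not thread-safe. Compiling this file without the coverage flag is the conservative way to keep the callbacks themselves uninstrumented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size logs; not part of any sanitizer interface.
static const size_t kMaxRecords = 1 << 16;
static uintptr_t g_edge_pcs[kMaxRecords];
static size_t g_num_edge_pcs = 0;
static uintptr_t g_indirect_callees[kMaxRecords];
static size_t g_num_indirect_callees = 0;

// Called on every edge when building with -fsanitize-coverage=trace-pc.
// The return address identifies the instrumented location.
extern "C" void __sanitizer_cov_trace_pc() {
  if (g_num_edge_pcs < kMaxRecords)
    g_edge_pcs[g_num_edge_pcs++] =
        reinterpret_cast<uintptr_t>(__builtin_return_address(0));
}

// Called on every indirect call when indirect-calls is also enabled.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  if (g_num_indirect_callees < kMaxRecords)
    g_indirect_callees[g_num_indirect_callees++] =
        reinterpret_cast<uintptr_t>(callee);
}

// Number of edge PCs logged so far.
size_t NumRecordedEdgePCs() {
  assert(g_num_edge_pcs <= kMaxRecords);
  return g_num_edge_pcs;
}
```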
 323
 324 Tracing data flow
 325 =================
 326
 327 An *experimental* feature to support data-flow-guided fuzzing.
 328 With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
 329 around comparison instructions and switch statements.
 330 The fuzzer will need to define the following functions,
 331 which will be called by the instrumented code.
 332
 333 .. code-block:: c++
 334
 335   // Called before a comparison instruction.
 336   // SizeAndType is a packed value containing
 337   //   - [63:32] the Size of the operands of comparison in bits
 338   //   - [31:0] the Type of comparison (one of ICMP_EQ, ... ICMP_SLE)
 339   // Arg1 and Arg2 are arguments of the comparison.
 340   void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1, uint64_t Arg2);
 341
 342   // Called before a switch statement.
 343   // Val is the switch operand.
 344   // Cases[0] is the number of case constants.
 345   // Cases[1] is the size of Val in bits.
 346   // Cases[2:] are the case constants.
 347   void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);
 348
 349 This interface is subject to change.
 350 The current implementation is not thread-safe and thus can be safely used only for single-threaded targets.
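
As an illustration of how a fuzzer might consume these callbacks, the sketch below unpacks ``SizeAndType`` and walks the ``Cases`` array according to the layout given in the comments above. The ``CmpEvent`` struct and the two global vectors are hypothetical, and, like the interface itself, the code is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one observed comparison.
struct CmpEvent {
  uint32_t size_in_bits;  // bits [63:32] of SizeAndType
  uint32_t type;          // bits [31:0] of SizeAndType
  uint64_t arg1;
  uint64_t arg2;
};

static std::vector<CmpEvent> g_cmp_events;
static std::vector<uint64_t> g_switch_constants;

extern "C" void __sanitizer_cov_trace_cmp(uint64_t SizeAndType, uint64_t Arg1,
                                          uint64_t Arg2) {
  CmpEvent event;
  event.size_in_bits = static_cast<uint32_t>(SizeAndType >> 32);
  event.type = static_cast<uint32_t>(SizeAndType & 0xFFFFFFFFu);
  event.arg1 = Arg1;
  event.arg2 = Arg2;
  g_cmp_events.push_back(event);
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  (void)Val;  // a real fuzzer would compare Val against the constants
  assert(Cases != nullptr);
  const uint64_t num_cases = Cases[0];  // Cases[1] is the size of Val in bits
  for (uint64_t i = 0; i < num_cases; ++i)
    g_switch_constants.push_back(Cases[2 + i]);
}
```

Recording the operands gives the fuzzer candidate values to splice into future inputs.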
 351
 352 Output directory
 353 ================
 354
 355 By default, .sancov files are created in the current working directory.
 356 This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
 357
 358 .. code-block:: console
 359
 360     % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
 361     % ls -l /tmp/cov/*sancov
 362     -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
 363     -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
 364
 365 Sudden death
 366 ============
 367
 368 Normally, coverage data is collected in memory and saved to disk when the
 369 program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
 370 ``__sanitizer_cov_dump()`` is called.
 371
 372 If the program ends with a signal that ASan does not handle (or cannot handle
 373 at all, like SIGKILL), coverage data will be lost. This is a big problem on
 374 Android, where SIGKILL is a normal way of evicting applications from memory.
 375
 376 With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
 377 memory-mapped file as soon as it is collected.
 378
 379 .. code-block:: console
 380
 381     % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
 382     main
 383     % ls
 384     7036.sancov.map  7036.sancov.raw  a.out
 385     % sancov.py rawunpack 7036.sancov.raw
 386     sancov.py: reading map 7036.sancov.map
 387     sancov.py: unpacking 7036.sancov.raw
 388     writing 1 PCs to a.out.7036.sancov
 389     % sancov.py print a.out.7036.sancov
 390     sancov.py: read 1 PCs from a.out.7036.sancov
 391     sancov.py: 1 files merged; 1 PCs total
 392     0x4b2bae
 393
 394 Note that on 64-bit platforms, this method writes twice as much data as the default,
 395 because it stores full PC values instead of 32-bit offsets.
 396
 397 In-process fuzzing
 398 ==================
 399
 400 Coverage data can be useful for fuzzers, and sometimes it is preferable to run
 401 a fuzzer in the same process as the code being fuzzed (in-process fuzzer).
 402
 403 You can use ``__sanitizer_get_total_unique_coverage()`` from
 404 ``<sanitizer/coverage_interface.h>`` which returns the number of currently
 405 covered entities in the program. This will tell the fuzzer if the coverage has
 406 increased after testing every new input.
 407
 408 If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
 409 before exiting the process.  Use ``__asan_set_death_callback`` from
 410 ``<sanitizer/asan_interface.h>`` to do that.
 411
 412 An example of such a fuzzer can be found in `the LLVM tree
 413 <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.
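
The coverage-feedback loop described in this section can be sketched as follows. To keep the sketch independent of the sanitizer run-time, the coverage query is passed in as a callable; in an instrumented build it would wrap ``__sanitizer_get_total_unique_coverage()``. ``KeepInputsThatAddCoverage`` is a hypothetical helper and not the fuzzer from the LLVM tree.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Runs every input through the target and keeps those after which the
// cumulative coverage number grew.
// run_target:   executes the code under test on one input.
// get_coverage: returns the current cumulative coverage, e.g. a wrapper
//               around __sanitizer_get_total_unique_coverage().
std::vector<std::string> KeepInputsThatAddCoverage(
    const std::vector<std::string> &inputs,
    const std::function<void(const std::string &)> &run_target,
    const std::function<size_t()> &get_coverage) {
  std::vector<std::string> corpus;
  size_t best = get_coverage();
  for (size_t i = 0; i < inputs.size(); ++i) {
    run_target(inputs[i]);
    const size_t now = get_coverage();
    assert(now >= best);  // cumulative coverage is expected never to shrink
    if (now > best) {
      corpus.push_back(inputs[i]);
      best = now;
    }
  }
  return corpus;
}
```

A real fuzzer would also mutate the kept inputs and feed the results back into the same loop.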
 414
 415 Performance
 416 ===========
 417
 418 This coverage implementation is **fast**. With function-level coverage
 419 (``-fsanitize-coverage=func``) the overhead is not measurable. With
 420 basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
 421 between 0 and 25%.
 422
 423 ==============  =========  =========  =========  =========  =========  =========
 424      benchmark      cov0        cov1   diff 0-1       cov2   diff 0-2   diff 1-2
 425 ==============  =========  =========  =========  =========  =========  =========
 426  400.perlbench    1296.00    1307.00       1.01    1465.00       1.13       1.12
 427      401.bzip2     858.00     854.00       1.00    1010.00       1.18       1.18
 428        403.gcc     613.00     617.00       1.01     683.00       1.11       1.11
 429        429.mcf     605.00     582.00       0.96     610.00       1.01       1.05
 430      445.gobmk     896.00     880.00       0.98    1050.00       1.17       1.19
 431      456.hmmer     892.00     892.00       1.00     918.00       1.03       1.03
 432      458.sjeng     995.00    1009.00       1.01    1217.00       1.22       1.21
 433 462.libquantum     497.00     492.00       0.99     534.00       1.07       1.09
 434    464.h264ref    1461.00    1467.00       1.00    1543.00       1.06       1.05
 435    471.omnetpp     575.00     590.00       1.03     660.00       1.15       1.12
 436      473.astar     658.00     652.00       0.99     715.00       1.09       1.10
 437  483.xalancbmk     471.00     491.00       1.04     582.00       1.24       1.19
 438       433.milc     616.00     627.00       1.02     627.00       1.02       1.00
 439       444.namd     602.00     601.00       1.00     654.00       1.09       1.09
 440     447.dealII     630.00     634.00       1.01     653.00       1.04       1.03
 441     450.soplex     365.00     368.00       1.01     395.00       1.08       1.07
 442     453.povray     427.00     434.00       1.02     495.00       1.16       1.14
 443        470.lbm     357.00     375.00       1.05     370.00       1.04       0.99
 444    482.sphinx3     927.00     928.00       1.00    1000.00       1.08       1.08
 445 ==============  =========  =========  =========  =========  =========  =========
 446
 447 Why another coverage?
 448 =====================
 449
 450 Why did we implement yet another code coverage tool?
 451   * We needed something that is lightning fast, plays well with
 452     AddressSanitizer, and does not significantly increase the binary size.
 453   * Traditional coverage implementations based on global counters
 454     `suffer from contention on counters
 455     <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.