www/analyzer/checker_dev_manual.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2           "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <title>Checker Developer Manual</title>
   6   <link type="text/css" rel="stylesheet" href="menu.css">
   7   <link type="text/css" rel="stylesheet" href="content.css">
   8   <script type="text/javascript" src="scripts/menu.js"></script>
   9 </head>
  10 <body>
  11
  12 <div id="page">
  13 <!--#include virtual="menu.html.incl"-->
  14
  15 <div id="content">
  16
  17 <h3 style="color:red">This Page Is Under Construction</h3>
  18
  19 <h1>Checker Developer Manual</h1>
  20
  21 <p>The static analyzer engine performs path-sensitive exploration of the program and
  22 relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone who is interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="https://youtu.be/kdxlsP5QVPw">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
  32 </p>
  33
  34     <ul>
  35       <li><a href="#start">Getting Started</a></li>
  36       <li><a href="#analyzer">Static Analyzer Overview</a>
  37       <ul>
  38         <li><a href="#interaction">Interaction with Checkers</a></li>
  39         <li><a href="#values">Representing Values</a></li>
  40       </ul></li>
  41       <li><a href="#idea">Idea for a Checker</a></li>
  42       <li><a href="#registration">Checker Registration</a></li>
  43       <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
  44       <li><a href="#extendingstates">Custom Program States</a></li>
  45       <li><a href="#bugs">Bug Reports</a></li>
  46       <li><a href="#ast">AST Visitors</a></li>
  47       <li><a href="#testing">Testing</a></li>
  48       <li><a href="#commands">Useful Commands/Debugging Hints</a>
  49       <ul>
  50         <li><a href="#attaching">Attaching the Debugger</a></li>
  51         <li><a href="#narrowing">Narrowing Down the Problem</a></li>
  52         <li><a href="#visualizing">Visualizing the Analysis</a></li>
  53         <li><a href="#debugprints">Debug Prints and Tricks</a></li>
  54       </ul></li>
  55       <li><a href="#additioninformation">Additional Sources of Information</a></li>
  56       <li><a href="#links">Useful Links</a></li>
  57     </ul>
  58
  59 <h2 id=start>Getting Started</h2>
  60   <ul>
  61     <li>To check out the source code and build the project, follow steps 1-4 of
  62     the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  63   page.</li>
  64
  65     <li>The analyzer source code is located under the Clang source tree:
  66     <br><tt>
  67     $ <b>cd llvm/tools/clang</b>
  68     </tt>
  69     <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
  70      <tt>test/Analysis</tt>.</li>
  71
    <li>The analyzer regression tests can be executed from Clang's build
    directory:
  74     <br><tt>
  75     $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
  76     </tt></li>
  77
  78     <li>Analyze a file with the specified checker:
  79     <br><tt>
  80     $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
  81     </tt></li>
  82
  83     <li>List the available checkers:
  84     <br><tt>
  85     $ <b>clang -cc1 -analyzer-checker-help</b>
  86     </tt></li>
  87
  88     <li>See the analyzer help for different output formats, fine tuning, and
  89     debug options:
  90     <br><tt>
  91     $ <b>clang -cc1 -help | grep "analyzer"</b>
  92     </tt></li>
  93
  94   </ul>
  95
  96 <h2 id=analyzer>Static Analyzer Overview</h2>
  97   The analyzer core performs symbolic execution of the given program. All the
  98   input values are represented with symbolic values; further, the engine deduces
  99   the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive and every possible path through
  the program is explored. The explored execution traces are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
 104   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
 105   which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
 106   <p>
 107   <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
 108   represents the corresponding location in the program (or the CFG).
 109   <tt>ProgramPoint</tt> is also used to record additional information on
 110   when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt>
 111   kind means that the state is the result of purging dead symbols - the
 112   analyzer's equivalent of garbage collection.
 113   <p>
 114   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
 116   <ul>
 117     <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
 118     values
 119     <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any checker-defined state
 121   </ul>
 122
 123   <h3 id=interaction>Interaction with Checkers</h3>
 124
 125   <p>
 126   Checkers are not merely passive receivers of the analyzer core changes - they
 127   actively participate in the <tt>ProgramState</tt> construction through the
 128   <tt>GenericDataMap</tt> which can be used to store the checker-defined part
 129   of the state. Each time the analyzer engine explores a new statement, it
 130   notifies each checker registered to listen for that statement, giving it an
 131   opportunity to either report a bug or modify the state. (As a rule of thumb,
 132   the checker itself should be stateless.) The checkers are called one after another
 133   in the predefined order; thus, calling all the checkers adds a chain to the
 134   <tt>ExplodedGraph</tt>.
 135   </p>
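<p>
The chaining described above can be pictured with a small standalone model. This is only a sketch: <tt>ToyLog</tt>, <tt>ToyChecker</tt>, and <tt>notifyCheckers</tt> are hypothetical names invented for illustration and are not part of the analyzer API. Each toy checker receives the state produced by the previous one and returns a new state, so every callback contributes one more link to the chain.
</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy state (hypothetical): a log of the checkers that have seen the statement.
using ToyLog = std::vector<std::string>;
// A toy checker callback maps the incoming state to a new state.
using ToyChecker = std::function<ToyLog(const ToyLog &)>;

// Runs the checkers in registration order. Every result becomes one more
// element of the chain, so N checkers add N states after the starting one.
inline std::vector<ToyLog> notifyCheckers(const ToyLog &Start,
                                          const std::vector<ToyChecker> &Checkers) {
  std::vector<ToyLog> Chain{Start};
  for (const ToyChecker &Check : Checkers) {
    assert(Check && "checker callback must be callable");
    Chain.push_back(Check(Chain.back()));
  }
  return Chain;
}

// Builds a toy checker that appends its own name to the log it receives.
inline ToyChecker makeNamedChecker(std::string Name) {
  return [Name](const ToyLog &In) {
    ToyLog Out = In;  // the incoming state is never modified
    Out.push_back(Name);
    return Out;
  };
}
```

<p>
In this model the checkers themselves hold no mutable data; everything they learn travels in the state that is passed along the chain, which mirrors the rule of thumb that a checker should be stateless.
</p>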
 136
 137   <h3 id=values>Representing Values</h3>
 138
 139   <p>
 140   During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
 141   objects are used to represent the semantic evaluation of expressions.
 142   They can represent things like concrete
 143   integers, symbolic values, or memory locations (which are memory regions).
 144   They are a discriminated union of "values", symbolic and otherwise.
 145   If a value isn't symbolic, usually that means there is no symbolic
 146   information to track. For example, if the value was an integer, such as
 147   <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
 148   and the checker doesn't usually need to track any state with the concrete
 149   number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be
 150   a symbolic value. This happens when the analyzer cannot reason about something
 151   (yet). An example is floating point numbers. In such cases, the
 152   <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
 153   This represents a case that is outside the realm of the analyzer's reasoning
 154   capabilities. <tt>SVals</tt> are value objects and their values can be viewed
 155   using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
 156   symbols or regions.
 157   </p>
 158
 159   <p>
 160   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. A symbol represents
  an actual (immutable) value. We might not know what its specific value is, but
 163   we can associate constraints with that value as we analyze a path. For
 164   example, we might record that the value of a symbol is greater than
 165   <tt>0</tt>, etc.
 166   </p>
 167
 168   <p>
 169   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
 170   It is used to provide a lexicon of how to describe abstract memory. Regions can
 171   layer on top of other regions, providing a layered approach to representing memory.
 172   For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
 173   but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
 174   be used to represent the memory associated with a specific field of that object.
 175   So how do we represent symbolic memory regions? That's what
 176   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
 179   </p>
 180
 181   <p>
 182   Let's see how the analyzer processes the expressions in the following example:
 183   </p>
 184
 185   <p>
 186   <pre class="code_example">
 187   int foo(int x) {
 188      int y = x * 2;
 189      int z = x;
 190      ...
 191   }
 192   </pre>
 193   </p>
 194
 195   <p>
 196 Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
 197 we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in
 198 this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
 199 Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
 200 which references the value <b>currently bound</b> to <tt>x</tt>. That value is
 201 symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
 202 Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
 203 and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
 204 we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
 205 and create a new <tt>SVal</tt> that represents their multiplication (which in
 206 this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
 207 evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
 208 and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
 209 to the <tt>MemRegion</tt> in the symbolic store.
 210 <br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.
 214   </p>
 215
 216 <p>
 217 To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
 219 symbolic chunks of memory, and thus are also based on symbols. SVals are just
 220 references to values, and can reference either MemRegions, Symbols, or concrete
 221 values (e.g., the number 1).
 222 </p>
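<p>
The walk-through of <tt>foo</tt> can be replayed with a small standalone model. This is a sketch under simplifying assumptions, not the analyzer's implementation: all <tt>Toy*</tt> types and the functions <tt>load</tt>, <tt>multiply</tt>, and <tt>evaluateFoo</tt> are hypothetical, regions are keyed by variable name, and the product that the text calls <tt>$1</tt> is spelled out as <tt>($0 * 2)</tt>.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy stand-ins for the analyzer's value kinds; these are NOT the Clang classes.
struct ToyConcreteInt { int Value; };         // a known integer such as 2
struct ToySymbol      { std::string Name; };  // an unknown but named value
struct ToyRegion      { std::string Var; };   // the memory region of a variable
struct ToyUnknown     {};                     // a value the model cannot reason about

// A discriminated union of the value kinds above.
using ToySVal = std::variant<ToyConcreteInt, ToySymbol, ToyRegion, ToyUnknown>;

// Toy store: maps a region (keyed by variable name) to the value bound to it.
using ToyStore = std::map<std::string, ToySVal>;

// Lvalue-to-rvalue conversion: look up the value currently bound to a region.
inline ToySVal load(const ToyStore &Store, const ToyRegion &R) {
  auto It = Store.find(R.Var);
  return It == Store.end() ? ToySVal{ToyUnknown{}} : It->second;
}

// Multiplication: fold two constants, or build a new symbolic expression
// from a symbol and a constant; anything else is unknown in this model.
inline ToySVal multiply(const ToySVal &L, const ToySVal &R) {
  const auto *LI = std::get_if<ToyConcreteInt>(&L);
  const auto *RI = std::get_if<ToyConcreteInt>(&R);
  if (LI && RI)
    return ToyConcreteInt{LI->Value * RI->Value};
  const auto *LS = std::get_if<ToySymbol>(&L);
  if (LS && RI)
    return ToySymbol{"(" + LS->Name + " * " + std::to_string(RI->Value) + ")"};
  return ToyUnknown{};
}

// Replays `int y = x * 2; int z = x;` for foo(int x) and returns the store.
inline ToyStore evaluateFoo() {
  ToyStore Store;
  Store["x"] = ToySymbol{"$0"};                    // x starts out bound to $0
  ToySVal XVal = load(Store, ToyRegion{"x"});      // rvalue of x
  assert(std::holds_alternative<ToySymbol>(XVal));
  Store["y"] = multiply(XVal, ToyConcreteInt{2});  // bind x * 2 to y's region
  Store["z"] = load(Store, ToyRegion{"x"});        // bind $0 to z's region
  return Store;
}
```

<p>
After <tt>evaluateFoo()</tt> returns, the regions for <tt>x</tt> and <tt>z</tt> are both bound to the same symbol, which illustrates that two values may reference the same underlying symbol.
</p>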
 223
 224   <!--
 225   TODO: Add a picture.
 226   <br>
 227   Symbols<br>
 228   FunctionalObjects are used throughout.
 229   -->
 230
 231 <h2 id=idea>Idea for a Checker</h2>
 232   Here are several questions which you should consider when evaluating your
 233   checker idea:
 234   <ul>
 235     <li>Can the check be effectively implemented without path-sensitive
 236     analysis? See <a href="#ast">AST Visitors</a>.</li>
 237
    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in the existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range tracking based
    solver to model symbolic execution.</li>
 246
 247     <li>Consult the <a
 248     href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
 249     to get some ideas for new checkers and consider starting with improving/fixing
 250     bugs in the existing checkers.</li>
 251   </ul>
 252
 253 <p>Once an idea for a checker has been chosen, there are two key decisions that
 254 need to be made:
 255   <ul>
 256     <li> Which events the checker should be tracking. This is discussed in more
 257     detail in the section <a href="#events_callbacks">Events, Callbacks, and
 258     Checker Class Structure</a>.
 259     <li> What checker-specific data needs to be stored as part of the program
 260     state (if any). This should be minimized as much as possible. More detail about
 261     implementing custom program state is given in section <a
 262     href="#extendingstates">Custom Program States</a>.
 263   </ul>
 264
 265
 266 <h2 id=registration>Checker Registration</h2>
 267   All checker implementation files are located in
 268   <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
 269   how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
 270   stream APIs, was registered with the analyzer.
 271   Similar steps should be followed for a new checker.
 272 <ol>
 273   <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
 274   created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
 275   <li>The following registration code was added to the implementation file:
 276 <pre class="code_example">
 277 void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
 279 }
 280 </pre>
 281 <li>A package was selected for the checker and the checker was defined in the
 282 table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
 283 Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
 284 performs UNIX API checks, the correct package is "alpha.unix", and the following
 285 was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
 286 <pre class="code_example">
 287 let ParentPackage = UnixAlpha in {
 288 ...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
 292 ...
 293 } // end "alpha.unix"
 294 </pre>
 295
 296 <li>The source code file was made visible to CMake by adding it to
 297 <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
 298
 299 </ol>
 300
 301 After adding a new checker to the analyzer, one can verify that the new checker
 302 was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
 304
 305 <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
 306
 307 <p> All checkers inherit from the <tt><a
 308 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
 309 Checker</a></tt> template class; the template parameter(s) describe the type of
 310 events that the checker is interested in processing. The various types of events
 311 that are available are described in the file <a
 312 href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.
 314
 315 <p> For each event type requested, a corresponding callback function must be
 316 defined in the checker class (<a
 317 href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
 318 CheckerDocumentation.cpp</a> shows the
 319 correct function name and signature for each event type).
 320
 321 <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
 322 take action at the following times:
 323
 324 <ul>
 325 <li>Before making a call to a function, check if the function is <tt>fclose</tt>.
 326 If so, check the parameter being passed.
 327 <li>After making a function call, check if the function is <tt>fopen</tt>. If
 328 so, process the return value.
 329 <li>When values go out of scope, check whether they are still-open file
 330 descriptors, and report a bug if so. In addition, remove any information about
 331 them from the program state in order to keep the state as small as possible.
 332 <li>When file pointers "escape" (are used in a way that the analyzer can no longer
 333 track them), mark them as such. This prevents false positives in the cases where
 334 the analyzer cannot be sure whether the file was closed or not.
 335 </ul>
 336
<p>The events that will be used for each of these actions are, respectively, <a
 338 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
 339 <a
 340 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
 341 <a
 342 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
 343 and <a
 344 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
 345 The high-level structure of the checker's class is thus:
 346
 347 <pre class="code_example">
 348 class SimpleStreamChecker : public Checker&lt;check::PreCall,
 349                                            check::PostCall,
 350                                            check::DeadSymbols,
 351                                            check::PointerEscape&gt; {
 352 public:
 353
 354   void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
 355
 356   void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;
 357
 358   void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;
 359
 360   ProgramStateRef checkPointerEscape(ProgramStateRef State,
 361                                      const InvalidatedSymbols &amp;Escaped,
 362                                      const CallEvent *Call,
 363                                      PointerEscapeKind Kind) const;
 364 };
 365 </pre>
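<p>
To make the division of labor between the four callbacks concrete, here is a standalone toy model of the logic. It is a sketch only, not the code of <tt>SimpleStreamChecker</tt>: file symbols are plain integers, the "program state" is a <tt>std::map</tt>, and every name (<tt>StreamMap</tt>, <tt>postCallFopen</tt>, <tt>preCallFclose</tt>, <tt>deadSymbol</tt>, <tt>pointerEscape</tt>, <tt>runExample</tt>) is hypothetical. Each function takes a state and returns an updated copy, standing in for one of the real callbacks.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: symbols are plain ints and none of these are Clang types.
enum class StreamState { Opened, Closed };
using StreamMap = std::map<int, StreamState>;  // tracked state per file symbol
using Reports = std::vector<std::string>;      // diagnostics emitted so far

// PostCall on fopen: start tracking the returned file symbol as opened.
inline StreamMap postCallFopen(StreamMap S, int Sym) {
  S[Sym] = StreamState::Opened;
  return S;
}

// PreCall on fclose: report a double close, then record the file as closed.
inline StreamMap preCallFclose(StreamMap S, int Sym, Reports &Out) {
  auto It = S.find(Sym);
  if (It == S.end())
    return S;  // untracked symbol: nothing to check
  if (It->second == StreamState::Closed)
    Out.push_back("double close");
  It->second = StreamState::Closed;
  return S;
}

// DeadSymbols: report a leak if the file is still open, then drop the symbol.
inline StreamMap deadSymbol(StreamMap S, int Sym, Reports &Out) {
  auto It = S.find(Sym);
  if (It == S.end())
    return S;
  if (It->second == StreamState::Opened)
    Out.push_back("leak");
  S.erase(It);
  return S;
}

// PointerEscape: stop tracking, because the file may be closed elsewhere.
inline StreamMap pointerEscape(StreamMap S, int Sym) {
  S.erase(Sym);
  return S;
}

// One example path: file 1 leaks, file 2 is closed twice, file 3 escapes.
inline Reports runExample() {
  Reports Out;
  StreamMap S;
  for (int Sym : {1, 2, 3})
    S = postCallFopen(S, Sym);
  S = preCallFclose(S, 2, Out);
  S = preCallFclose(S, 2, Out);  // second close of the same file
  S = pointerEscape(S, 3);
  for (int Sym : {1, 2, 3})
    S = deadSymbol(S, Sym, Out);
  assert(S.empty() && "every symbol should have been dropped");
  return Out;
}
```

<p>
Note that the escaped file produces no report even though it was never closed on this path; dropping it from the map is what suppresses the false positive.
</p>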
 366
 367 <h2 id=extendingstates>Custom Program States</h2>
 368
 369 <p> Checkers often need to keep track of information specific to the checks they
 370 perform. However, since checkers have no guarantee about the order in which the
 371 program will be explored, or even that all possible paths will be explored, this
 372 state information cannot be kept within individual checkers. Therefore, if
 373 checkers need to store custom information, they need to add new categories of
 374 data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
 375 several macros designed for this purpose. They are:
 376
 377 <ul>
 378 <li><a
 379 href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
 380 Used when the state information is a single value. The methods available for
 381 state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
 382 <tt>remove</tt>.
 383 <li><a
 384 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
 385 Used when the state information is a list of values. The methods available for
 386 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 387 <tt>remove</tt>, and <tt>contains</tt>.
 388 <li><a
 389 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
 390 Used when the state information is a set of values. The methods available for
 391 state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
 392 <tt>remove</tt>, and <tt>contains</tt>.
 393 <li><a
 394 href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
 395 Used when the state information is a map from a key to a value. The methods
 396 available for state types declared with this macro are <tt>add</tt>,
 397 <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
 398 </ul>
 399
 400 <p>All of these macros take as parameters the name to be used for the custom
 401 category of state information and the data type(s) to be used for storage. The
 402 data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated on the name of the custom data type.
 405
 406 <p>For example, a common case is the need to track data associated with a
 407 symbolic expression; a map type is the most logical way to implement this. The
 408 key for this map will be a pointer to a symbolic expression
 409 (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
 410 expression is an integer, then the custom category of state information would be
 411 declared as
 412
 413 <pre class="code_example">
 414 REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
 415 </pre>
 416
The data would be accessed with the <tt>get</tt> method, which for a map returns
a pointer to the stored value (or a null pointer if the key has no entry):

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
 424 </pre>
 425
and set with the <tt>set</tt> method:
 427
 428 <pre class="code_example">
 429 ProgramStateRef state;
 430 SymbolRef Sym;
 431 int newValue;
 432 ...
 433 ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
 434 </pre>
 435
 436 <p>In addition, the macros define a data type used for storing the data of the
 437 new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the data type passed to the macro; for the other three macros, this will be a specialized
 440 version of the <a
 441 href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
 442 <a
 443 href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
 444 or <a
 445 href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
 446 templated class. For the <tt>ExampleDataType</tt> example above, the type
 447 created would be equivalent to writing the declaration:
 448
 449 <pre class="code_example">
 450 typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
 451 </pre>
 452
 453 <p>These macros will cover a majority of use cases; however, they still have a
 454 few limitations. They cannot be used inside namespaces (since they expand to
 455 contain top-level namespace references), and the data types that they define
 456 cannot be referenced from more than one file.
 457
 458 <p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
 459 one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
 461 analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
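<p>
The consequence of immutability is easy to miss, so here is a standalone toy model of it. This is a sketch with hypothetical names (<tt>ToyState</tt>, <tt>ToyStateRef</tt>, <tt>ToyContext</tt>, <tt>updateNeedsTransition</tt>); it copies the whole map on every update, whereas the real <tt>ProgramState</tt> uses persistent data structures. The point it illustrates is that <tt>set</tt> leaves the old state untouched and that the new state only takes effect once it is handed over via <tt>addTransition</tt>.
</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

// Toy immutable state holding one int-to-int map; NOT the Clang ProgramState.
class ToyState {
  std::map<int, int> Data;

public:
  // Returns a pointer to the value bound to Key, or nullptr if there is none.
  const int *get(int Key) const {
    auto It = Data.find(Key);
    return It == Data.end() ? nullptr : &It->second;
  }
  // Returns a NEW state with Key bound to Value; *this is left untouched.
  ToyStateRef set(int Key, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Key] = Value;
    return Copy;
  }
};

// Toy stand-in for CheckerContext: it only remembers the current state.
struct ToyContext {
  ToyStateRef Current;
  void addTransition(ToyStateRef S) { Current = std::move(S); }
};

// Returns true if an update stays invisible until addTransition is called.
inline bool updateNeedsTransition() {
  ToyContext C{std::make_shared<ToyState>()};
  ToyStateRef New = C.Current->set(1, 42);
  bool HiddenBefore = C.Current->get(1) == nullptr;  // old state is unchanged
  C.addTransition(New);
  const int *V = C.Current->get(1);
  assert(V && "value must be visible after the transition");
  return HiddenBefore && *V == 42;
}
```

<p>
A checker callback that computes a new state but forgets the final transition behaves like the first half of <tt>updateNeedsTransition</tt>: the update is silently lost.
</p>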
 462 <h2 id=bugs>Bug Reports</h2>
 463
 464
 465 <p> When a checker detects a mistake in the analyzed code, it needs a way to
 466 report it to the analyzer core so that it can be displayed. The two classes used
 467 to construct this report are <tt><a
 468 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
 469 and <tt><a
 470 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
 471 BugReport</a></tt>.
 472
 473 <p>
 474 <tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.
 478
 479 <P>
 480   The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
 481   the most common case, three parameters are used to form a <tt>BugReport</tt>:
 482 <ol>
 483 <li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
 484 <li>A short descriptive string. This is placed at the location of the bug in
 485 the detailed line-by-line output generated by scan-build.
 486 <li>The context in which the bug occurred. This includes both the location of
 487 the bug in the program and the program's state when the location is reached. These are
 488 both encapsulated in an <tt>ExplodedNode</tt>.
 489 </ol>
 490
 491 <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
 492 as to whether or not analysis can continue along the current path. This decision
 493 is based on whether the detected bug is one that would prevent the program under
 494 analysis from continuing. For example, leaking of a resource should not stop
 495 analysis, as the program can continue to run after the leak. Dereferencing a
 496 null pointer, on the other hand, should stop analysis, as there is no way for
 497 the program to meaningfully continue after such an error.
 498
 499 <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
 500 generated by the checker can be passed to the <tt>BugReport</tt> constructor
 501 without additional modification. This <tt>ExplodedNode</tt> will be the one
 502 returned by the most recent call to <a
 503 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
 504 If no transition has been performed during the current callback, the checker should call <a
 505 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
 506 and use the returned node for bug reporting.
 507
<p>If analysis cannot continue, then the current state should be transitioned
 509 into a so-called <i>sink node</i>, a node from which no further analysis will be
 510 performed. This is done by calling the <a
 511 href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
 512 CheckerContext::generateSink</a> function; this function is the same as the
 513 <tt>addTransition</tt> function, but marks the state as a sink node. Like
 514 <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
 515 state, which can then be passed to the <tt>BugReport</tt> constructor.
 516
 517 <p>
 518 After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
 519 by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
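<p>
The difference between a regular transition and a sink can be sketched with a standalone toy model. All names here (<tt>ToyNode</tt>, <tt>ToyReport</tt>, <tt>ToyEngine</tt>) are hypothetical, and the member functions merely borrow the names <tt>addTransition</tt>, <tt>generateSink</tt>, and <tt>emitReport</tt> to show which role each one plays; their signatures do not match the real <tt>CheckerContext</tt>. In this model a regular node is queued for further exploration, while a sink node is recorded but never explored again.
</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy exploded-graph node; NOT the Clang ExplodedNode class.
struct ToyNode {
  const ToyNode *Pred;  // node this one was derived from (nullptr for the root)
  bool IsSink;          // true if analysis must not continue past this node
};

// A report remembers the node at which the bug was detected.
struct ToyReport {
  std::string Message;
  const ToyNode *ErrorNode;
};

struct ToyEngine {
  std::vector<std::unique_ptr<ToyNode>> Nodes;  // owns every node
  std::vector<const ToyNode *> Worklist;        // nodes still to be explored
  std::vector<ToyReport> Reports;

  // Stand-in for addTransition: the new node stays on the worklist.
  const ToyNode *addTransition(const ToyNode *Pred) {
    return makeNode(Pred, /*IsSink=*/false);
  }
  // Stand-in for generateSink: the new node is never explored further.
  const ToyNode *generateSink(const ToyNode *Pred) {
    return makeNode(Pred, /*IsSink=*/true);
  }
  // Stand-in for emitReport: attach a message to the node where it occurred.
  void emitReport(std::string Message, const ToyNode *N) {
    Reports.push_back(ToyReport{std::move(Message), N});
  }

private:
  const ToyNode *makeNode(const ToyNode *Pred, bool IsSink) {
    assert((!Pred || !Pred->IsSink) && "cannot continue past a sink node");
    Nodes.push_back(std::make_unique<ToyNode>(ToyNode{Pred, IsSink}));
    if (!IsSink)
      Worklist.push_back(Nodes.back().get());
    return Nodes.back().get();
  }
};
```

<p>
A leak would be reported on a node obtained from <tt>addTransition</tt>, so the path keeps being explored; a null dereference would be reported on a node obtained from <tt>generateSink</tt>, which ends the path.
</p>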
 520
 521 <h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
 524   compiler warning. On the other hand, a check might not be acceptable as a compiler
 525   warning; for example, because of a relatively high false positive rate. In this
 526   situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
 527   <tt><b>checkASTCodeBody</b></tt> are your best friends.
 528
 529 <h2 id=testing>Testing</h2>
 530   Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
 532   execute the following from the <tt>clang</tt> build directory:
 533     <pre class="code">
 534     $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
 535     </pre>
 536
 537 <h2 id=commands>Useful Commands/Debugging Hints</h2>
 538
 539 <h3 id=attaching>Attaching the Debugger</h3>
 540
 541 <p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
 542 debugger to it directly:</p>
 543
 544 <pre class="code">
 545     $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
 546     $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
 547 </pre>
 548
 549 <p>
 550 Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
 551 the actual clang instance would be run in a separate process. In
 552 order to debug it, use the <tt><b>-###</b></tt> flag for obtaining
 553 the command line of the child process:
 554 </p>
 555
 556 <pre class="code">
 557     $ <b>clang --analyze test.c -\#\#\#</b>
 558 </pre>
 559
 560 <p>
 561 Below we describe a few useful command line arguments, all of which assume that
 562 you are running <tt><b>clang -cc1</b></tt>.
 563 </p>
 564
 565 <h3 id=narrowing>Narrowing Down the Problem</h3>
 566
 567 <p>While investigating a checker-related issue, instruct the analyzer to only
 568 execute a single checker:
 569 </p>
 570 <pre class="code">
 571     $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
 572 </pre>
 573
<p>If you are experiencing a crash while processing a large file, use the
<tt><b>-analyzer-display-progress</b></tt> option to see which function is
failing.</p>
 577
 578 <p>To selectively analyze only the given function, use the
 579 <tt><b>-analyze-function</b></tt> option:</p>
 580 <pre class="code">
 581     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
 582     ANALYZE (Syntax): test.c foo
 583     ANALYZE (Syntax): test.c bar
 584     ANALYZE (Path,  Inline_Regular): test.c bar
 585     ANALYZE (Path,  Inline_Regular): test.c foo
 586     $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
 587     ANALYZE (Syntax): test.c foo
 588     ANALYZE (Path,  Inline_Regular): test.c foo
 589 </pre>
 590
 591 <b>Note: </b> a fully qualified function name has to be used when selecting
 592 C++ functions and methods, Objective-C methods and blocks, e.g.:
 593
 594 <pre class="code">
    $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function="foo(int)"</b>
 596 </pre>
 597
 598 The fully qualified name can be found from the
 599 <tt><b>-analyzer-display-progress</b></tt> output.
 600
 601 <p>The bug reporter mechanism removes path diagnostics inside intermediate
 602 function calls that have returned by the time the bug was found and contain
 603 no interesting pieces. Usually it is up to the checkers to produce more
 604 interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
 605 However, you can disable path pruning while debugging with the
 606 <tt><b>-analyzer-config prune-paths=false</b></tt> option.
 607
 608 <h3 id=visualizing>Visualizing the Analysis</h3>
 609
<p>To dump the AST, which often helps in understanding how the program should
 611 behave:</p>
 612 <pre class="code">
 613     $ <b>clang -cc1 -ast-dump test.c</b>
 614 </pre>
 615
<p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
 617 checkers:</p>
 618 <pre class="code">
 619     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
 620 </pre>
 621
 622 <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
 623 visualized with another debug checker:</p>
 624 <pre class="code">
 625     $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
 626 </pre>
<p>Or, equivalently, with the <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
option, which does the same thing: it dumps the exploded graph in Graphviz
 629 <tt><b>.dot</b></tt> format.</p>
 630
 631 <p>You can convert <tt><b>.dot</b></tt> files into other formats - in
 632 particular, converting to <tt><b>.svg</b></tt> and viewing in your web
 633 browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
 634 <pre class="code">
 635     $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
 636 </pre>
 637
 638 <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
 639 leading to bug reports from the exploded graph dump. This is useful
 640 because exploded graphs are often huge and hard to navigate.</p>
 641
 642 <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
 643 the analyzer's false positives, because it gives comprehensive information
 644 on every decision made by the analyzer across all analysis paths.</p>
 645
 646 <p>There are more debug checkers available. To see all available debug checkers:
 647 </p>
 648 <pre class="code">
 649     $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
 650 </pre>
 651
 652 <h3 id=debugprints>Debug Prints and Tricks</h3>
 653
<p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p>
 656 <pre class="code">
 657     (gdb) <b>p ViewGraph(0)</b>
 658 </pre>
 659
<p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
 661 <pre class="code">
 662     (gdb) <b>p State->dump()</b>
 663 </pre>
 664
<p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
 666 pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
 667 source code.</p>
 668 <pre class="code">
 669     (gdb) <b>p E->dump()</b>
 670 </pre>
 671
<p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
 673 to:</p>
 674 <pre class="code">
 675     (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
 676 </pre>
 677
 678 <h2 id=additioninformation>Additional Sources of Information</h2>
 679
 680 Here are some additional resources that are useful when working on the Clang
 681 Static Analyzer:
 682
 683 <ul>
 684 <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
 685 up-to-date documentation about the APIs available in Clang. Relevant entries
 686 have been linked throughout this page. Also of use is the
 687 <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
 688 from LLVM.
 689 <li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">
 690 cfe-dev mailing list</a>. This is the primary mailing list used for
 691 discussion of Clang development (including static code analysis). The
 692 <a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
 693 a lot of information.
 694 <li> The "Building a Checker in 24 hours" presentation given at the <a
 695 href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
 696 meeting</a>. Describes the construction of SimpleStreamChecker. <a
 697 href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
 698 and <a
 699 href="https://youtu.be/kdxlsP5QVPw">video</a>
 700 are available.
 701 </ul>
 702
 703 <h2 id=links>Useful Links</h2>
 704 <ul>
 705 <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
 706 </ul>
 707
 708 </div>
 709 </div>
 710 </body>
 711 </html>