www/analyzer/checker_dev_manual.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2           "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <title>Checker Developer Manual</title>
   6   <link type="text/css" rel="stylesheet" href="menu.css">
   7   <link type="text/css" rel="stylesheet" href="content.css">
   8   <script type="text/javascript" src="scripts/menu.js"></script>
   9 </head>
  10 <body>
  11
  12 <div id="page">
  13 <!--#include virtual="menu.html.incl"-->
  14
  15 <div id="content">
  16
  17 <h1 style="color:red">This Page Is Under Construction</h1>
  18
  19 <h1>Checker Developer Manual</h1>
  20
<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
  32 </p>
  33
  34     <ul>
  35       <li><a href="#start">Getting Started</a></li>
  36       <li><a href="#analyzer">Analyzer Overview</a></li>
  37       <li><a href="#idea">Idea for a Checker</a></li>
  38       <li><a href="#registration">Checker Registration</a></li>
  39       <li><a href="#skeleton">Checker Skeleton</a></li>
  40       <li><a href="#node">Exploded Node</a></li>
  41       <li><a href="#bugs">Bug Reports</a></li>
  42       <li><a href="#ast">AST Visitors</a></li>
  43       <li><a href="#testing">Testing</a></li>
  44       <li><a href="#commands">Useful Commands</a></li>
  45     </ul>
  46
  47 <h2 id=start>Getting Started</h2>
  48   <ul>
  49     <li>To check out the source code and build the project, follow steps 1-4 of
  50     the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  51   page.</li>
  52
  53     <li>The analyzer source code is located under the Clang source tree:
  54     <br><tt>
  55     $ <b>cd llvm/tools/clang</b>
  56     </tt>
  57     <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
  58      <tt>test/Analysis</tt>.</li>
  59
    <li>The analyzer regression tests can be executed from Clang's build
    directory:
  62     <br><tt>
  63     $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
  64     </tt></li>
  65
  66     <li>Analyze a file with the specified checker:
  67     <br><tt>
  68     $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
  69     </tt></li>
  70
  71     <li>List the available checkers:
  72     <br><tt>
  73     $ <b>clang -cc1 -analyzer-checker-help</b>
  74     </tt></li>
  75
  76     <li>See the analyzer help for different output formats, fine tuning, and
  77     debug options:
  78     <br><tt>
  79     $ <b>clang -cc1 -help | grep "analyzer"</b>
  80     </tt></li>
  81
  82   </ul>
  83
  84 <h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive and every possible path through
  the program is explored. The explored execution traces are represented with
  an <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols, the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
 104   <ul>
 105     <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
 106     values
 107     <li><tt>Store</tt> - a mapping from memory locations to symbolic values
 108     <li><tt>GenericDataMap</tt> - constraints on symbolic values
 109   </ul>
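<p>The relationship between these pieces can be sketched with a small standalone
model. This is only an illustration: the structs below are toy stand-ins that
borrow the names used above, the string-keyed maps and the <tt>bindLoc</tt>
helper are assumptions made for this example, and none of it is the Clang API.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins; these are NOT the Clang classes of the same name.
struct ProgramPoint {
  std::string Kind;  // e.g. "PreStmt", "PostStmt", "PostPurgeDeadSymbols"
  unsigned CFGBlock; // which CFG block the point belongs to
};

struct ProgramState {
  std::map<std::string, std::string> Environment;    // expression -> value
  std::map<std::string, std::string> Store;          // memory location -> value
  std::map<std::string, std::string> GenericDataMap; // constraints, checker data
};

struct ExplodedNode {
  ProgramPoint Point;
  std::shared_ptr<const ProgramState> State; // states are never modified in place
  std::vector<const ExplodedNode *> Preds;   // edges back to predecessor nodes
};

// Hypothetical helper: derive a successor node by copying the predecessor's
// state, applying one store binding, and linking back to the predecessor.
inline ExplodedNode bindLoc(const ExplodedNode &Pred, ProgramPoint PP,
                            const std::string &Loc, const std::string &Val) {
  assert(Pred.State && "every node must carry a state");
  auto Next = std::make_shared<ProgramState>(*Pred.State);
  Next->Store[Loc] = Val;
  return ExplodedNode{PP, Next, {&Pred}};
}
```

<p>Because the successor owns a fresh copy of the state, the predecessor's state
is left untouched, which is what allows several paths to branch from one node.</p>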
 110
 111   <h3>Interaction with Checkers</h3>
 112   Checkers are not merely passive receivers of the analyzer core changes - they
 113   actively participate in the <tt>ProgramState</tt> construction through the
 114   <tt>GenericDataMap</tt> which can be used to store the checker-defined part
 115   of the state. Each time the analyzer engine explores a new statement, it
 116   notifies each checker registered to listen for that statement, giving it an
 117   opportunity to either report a bug or modify the state. (As a rule of thumb,
 118   the checker itself should be stateless.) The checkers are called one after another
 119   in the predefined order; thus, calling all the checkers adds a chain to the
 120   <tt>ExplodedGraph</tt>.
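<p>The following standalone sketch models that notification chain. It is a toy:
<tt>State</tt>, <tt>Node</tt>, <tt>Checker</tt> and <tt>runCheckers</tt> are
names invented for this example, and the state is a plain map standing in for
the checker-defined part of the <tt>GenericDataMap</tt>.</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the checker-defined part of the program state.
using State = std::map<std::string, int>;

struct Node {
  std::string Tag; // which checker produced this node
  State St;        // state after that checker ran
};

// A checker callback maps the current state to the next state.
// It keeps no mutable data of its own.
using Checker = std::function<State(const State &)>;

// Run the checkers in registration order, appending one node per checker,
// so that each checker sees the state left by the previous one.
inline std::vector<Node> runCheckers(
    const State &Initial,
    const std::vector<std::pair<std::string, Checker>> &Checkers) {
  std::vector<Node> Chain;
  Chain.push_back(Node{"entry", Initial});
  for (const auto &C : Checkers) {
    State Next = C.second(Chain.back().St);
    Chain.push_back(Node{C.first, std::move(Next)});
  }
  assert(Chain.size() == Checkers.size() + 1);
  return Chain;
}
```

<p>Keeping all data in the state rather than in the checker object means that
every node in the chain remains a valid snapshot of its own path.</p>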
 121
 122   <h3>Representing Values</h3>
 123   During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
 124   objects are used to represent the semantic evaluation of expressions.
 125   They can represent things like concrete
 126   integers, symbolic values, or memory locations (which are memory regions).
 127   They are a discriminated union of "values", symbolic and otherwise.
 128   If a value isn't symbolic, usually that means there is no symbolic
 129   information to track. For example, if the value was an integer, such as
 130   <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
 131   and the checker doesn't usually need to track any state with the concrete
 132   number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be
 133   a symbolic value. This happens when the analyzer cannot reason about something
 134   (yet). An example is floating point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
 136   This represents a case that is outside the realm of the analyzer's reasoning
 137   capabilities. <tt>SVals</tt> are value objects and their values can be viewed
 138   using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
 139   symbols or regions.
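<p>As a rough illustration of the "discriminated union" idea, here is a
standalone toy value type. <tt>ToySVal</tt>, <tt>text</tt> and
<tt>evalMul</tt> are hypothetical names made up for this sketch; the real
<tt>SVal</tt> hierarchy is considerably richer.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy discriminated union; NOT the real clang::ento::SVal.
struct ToySVal {
  enum Kind { Unknown, ConcreteInt, Symbol, Loc } K;
  long Int;         // meaningful only when K == ConcreteInt
  std::string Name; // symbol name (Symbol) or region name (Loc)

  static ToySVal unknown() { return {Unknown, 0, ""}; }
  static ToySVal concrete(long V) { return {ConcreteInt, V, ""}; }
  static ToySVal symbol(std::string N) { return {Symbol, 0, std::move(N)}; }
  static ToySVal loc(std::string Region) { return {Loc, 0, std::move(Region)}; }
};

// Textual form of an integer-like value, used to name symbolic expressions.
inline std::string text(const ToySVal &V) {
  assert(V.K == ToySVal::ConcreteInt || V.K == ToySVal::Symbol);
  if (V.K == ToySVal::ConcreteInt)
    return std::to_string(V.Int);
  return V.Name;
}

// Multiply two values: give up on anything we cannot reason about,
// fold two constants, and otherwise build a new symbolic expression.
inline ToySVal evalMul(const ToySVal &L, const ToySVal &R) {
  if (L.K == ToySVal::Unknown || R.K == ToySVal::Unknown ||
      L.K == ToySVal::Loc || R.K == ToySVal::Loc)
    return ToySVal::unknown();
  if (L.K == ToySVal::ConcreteInt && R.K == ToySVal::ConcreteInt)
    return ToySVal::concrete(L.Int * R.Int);
  return ToySVal::symbol("(" + text(L) + " * " + text(R) + ")");
}
```

<p>The unknown case propagates: once one operand is outside what the toy
evaluator understands, the result is unknown as well.</p>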
 140   <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  represents an abstract, but named, symbolic value. A symbol stands for
  an actual (immutable) value. We might not know what that specific value is, but
  we can associate constraints with it as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.
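<p>A minimal sketch of attaching range constraints to symbols along one path
follows. The <tt>Range</tt> and <tt>Constraints</tt> types and the two
<tt>assume</tt> helpers are assumptions made for this example. For brevity the
map is updated in place, whereas the analyzer keeps constraints inside the
immutable program state.</p>

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>

// Inclusive range of values a symbol may take on the current path.
struct Range {
  long Lo, Hi;
};

using Constraints = std::map<std::string, Range>; // symbol name -> range

// Look up the current range of a symbol; unconstrained symbols span all longs.
inline Range rangeOf(const Constraints &C, const std::string &Sym) {
  auto It = C.find(Sym);
  if (It != C.end())
    return It->second;
  return Range{LONG_MIN, LONG_MAX};
}

// Assume "Sym > Bound". Returns false, leaving C unchanged, if that
// contradicts what is already known (the path is infeasible).
inline bool assumeGreater(Constraints &C, const std::string &Sym, long Bound) {
  assert(Bound < LONG_MAX && "Bound + 1 must not overflow");
  Range R = rangeOf(C, Sym);
  if (Bound + 1 > R.Lo)
    R.Lo = Bound + 1;
  if (R.Lo > R.Hi)
    return false;
  C[Sym] = R;
  return true;
}

// Assume "Sym <= Bound", with the same feasibility check.
inline bool assumeLessOrEqual(Constraints &C, const std::string &Sym, long Bound) {
  Range R = rangeOf(C, Sym);
  if (Bound < R.Hi)
    R.Hi = Bound;
  if (R.Lo > R.Hi)
    return false;
  C[Sym] = R;
  return true;
}
```

<p>The symbol itself never changes; only the set of values it may still hold on
this path shrinks as more branch conditions are assumed.</p>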
  <p>
 150   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
 151   It is used to provide a lexicon of how to describe abstract memory. Regions can
 152   layer on top of other regions, providing a layered approach to representing memory.
 153   For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
 154   but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
 155   be used to represent the memory associated with a specific field of that object.
 156   So how do we represent symbolic memory regions? That's what
 157   <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
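<p>The layering of regions can be pictured with a small standalone model.
<tt>ToyRegion</tt>, <tt>baseRegion</tt> and <tt>describe</tt> are hypothetical
names for this sketch, and the printed format is invented here rather than
taken from the analyzer's own dumps.</p>

```cpp
#include <cassert>
#include <string>

// Toy region: each region optionally sits on top of a super-region.
struct ToyRegion {
  enum Kind { Var, Field, Symbolic } K;
  std::string Name;       // variable name, field name, or symbol name
  const ToyRegion *Super; // enclosing region, or nullptr for a base region
};

// Walk up the layers until the outermost (base) region is reached.
inline const ToyRegion *baseRegion(const ToyRegion *R) {
  assert(R && "region must not be null");
  while (R->Super)
    R = R->Super;
  return R;
}

// Render a region as a dotted path from its base region down to itself.
inline std::string describe(const ToyRegion &R) {
  std::string Self;
  if (R.K == ToyRegion::Symbolic)
    Self = "SymRegion{" + R.Name + "}"; // the symbol names the region
  else
    Self = R.Name;
  if (R.Super)
    return describe(*R.Super) + "." + Self;
  return Self;
}
```

<p>A field of a stack variable and a field reached through a symbolic pointer
share the same shape here; only the kind of the base region differs.</p>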
 160
 161   <P>
 162   Let's see how the analyzer processes the expressions in the following example:
 163   <p>
 164   <pre class="code_example">
 165   int foo(int x) {
 166      int y = x * 2;
 167      int z = x;
 168      ...
 169   }
 170   </pre>
 171   <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we go through the same
steps and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.
 190
 191 <p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
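<p>The walk-through of <tt>foo</tt> can be replayed with a toy store. This is a
simplified sketch, not the analyzer's algorithm: regions and values are plain
strings, <tt>load</tt> and <tt>evalFoo</tt> are hypothetical helpers, and the
product <tt>x*2</tt> is simply given the next fresh symbol name, mirroring the
<tt>$1</tt> used in the text.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

using Store = std::map<std::string, std::string>; // region name -> bound value

// Lvalue-to-rvalue conversion: load the value currently bound to a region.
// A region with no binding yet (such as a parameter) gets a fresh symbol.
inline std::string load(Store &S, const std::string &Region, unsigned &NextSym) {
  auto It = S.find(Region);
  if (It != S.end())
    return It->second;
  std::string Sym = "$" + std::to_string(NextSym++);
  S[Region] = Sym;
  return Sym;
}

// Replay the two statements of foo():  y = x * 2;  z = x;
inline Store evalFoo() {
  Store S;
  unsigned NextSym = 0;

  std::string X1 = load(S, "x", NextSym);            // rvalue of x: $0
  std::string Mul = "$" + std::to_string(NextSym++); // $1 stands for $0 * 2
  S["y"] = Mul;                                      // bind RHS to region of y

  std::string X2 = load(S, "x", NextSym);            // same steps, same symbol
  assert(X1 == X2 && "both reads of x must yield the same symbol");
  S["z"] = X2;                                       // bind RHS to region of z
  return S;
}
```

<p>After both statements, the regions for <tt>x</tt> and <tt>z</tt> are bound to
the same symbol, while <tt>y</tt> is bound to the symbol for the product.</p>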
 197
 198   <!--
 199   TODO: Add a picture.
 200   <br>
 201   Symbols<br>
 202   FunctionalObjects are used throughout.
 203   -->
 204 <h2 id=idea>Idea for a Checker</h2>
 205   Here are several questions which you should consider when evaluating your
 206   checker idea:
 207   <ul>
 208     <li>Can the check be effectively implemented without path-sensitive
 209     analysis? See <a href="#ast">AST Visitors</a>.</li>
 210
    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range-tracking-based
    solver to model symbolic execution.</li>
 219
 220     <li>Consult the <a
 221     href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
 222     to get some ideas for new checkers and consider starting with improving/fixing
 223     bugs in the existing checkers.</li>
 224   </ul>
 225
 226 <h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
  folder. Follow the steps below to register a new checker with the analyzer.
 229 <ol>
 230   <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
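<pre class="code_example">
// Declares ento::registerNewChecker (header name may differ between Clang versions).
#include "ClangSACheckers.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Minimal checker that is notified before every call expression.
class NewChecker : public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
} // end anonymous namespace

// Registers the checker with the checker manager.
void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre>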
 245
<li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
first be developed as experimental. Suppose our new checker performs
security-related checks; then we should add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre>
 260
 261 <li>Make the source code file visible to CMake by adding it to
 262 <tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
 263
<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
 266 </ol>
 267
 268
 269 <h2 id=skeleton>Checker Skeleton</h2>
 270   There are two main decisions you need to make:
 271   <ul>
 272     <li> Which events the checker should be tracking.
 273     See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
 274     for the list of available checker callbacks.</li>
 275     <li> What data you want to store as part of the checker-specific program
 276     state. Try to minimize the checker state as much as possible. </li>
 277   </ul>
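<p>To make the two decisions concrete, here is a standalone toy in the spirit
of a double-close stream check. It does not use the analyzer API:
<tt>StreamState</tt>, <tt>StreamMap</tt>, <tt>onOpen</tt> and
<tt>onClose</tt> are hypothetical names, the two functions stand in for the
events the checker would track, and the map stands in for the checker-specific
program state.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Decision 2: the smallest state that suffices, one enum value per stream symbol.
enum class StreamState { Opened, Closed };
using StreamMap = std::map<std::string, StreamState>; // symbol name -> state

struct Result {
  StreamMap State;                  // state after the event
  std::vector<std::string> Reports; // bug reports raised by the event
};

// Decision 1, event A: a call returned a freshly opened stream named Sym.
inline StreamMap onOpen(StreamMap M, const std::string &Sym) {
  assert(!Sym.empty() && "stream symbol must have a name");
  M[Sym] = StreamState::Opened;
  return M;
}

// Decision 1, event B: a call is about to close the stream named Sym.
// Takes the map by value, so the caller's state is left untouched.
inline Result onClose(StreamMap M, const std::string &Sym) {
  assert(!Sym.empty() && "stream symbol must have a name");
  Result R;
  auto It = M.find(Sym);
  if (It != M.end() && It->second == StreamState::Closed)
    R.Reports.push_back("double close of " + Sym);
  M[Sym] = StreamState::Closed;
  R.State = std::move(M);
  return R;
}
```

<p>Both handlers are free functions over a value, so nothing is remembered
between calls except what is returned in the new state.</p>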
 278
 279 <h2 id=bugs>Bug Reports</h2>
 280
 281 <h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning, for example because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.
 288
 289 <h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
 293     <pre class="code">
 294     $ <b>TESTDIRS=Analysis make test</b>
 295     </pre>
 296
 297 <h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State->dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
<br>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>
 363
 364 </div>
 365 </div>
 366 </body>
 367 </html>