1 ===============================================================
2 Tutorial for building tools using LibTooling and LibASTMatchers
3 ===============================================================
5 This document is intended to show how to build a useful source-to-source
6 translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7 explicitly aimed at people who are new to Clang, so all you should need
8 is a working knowledge of C++ and the command line.
10 In order to work on the compiler, you need some basic knowledge of the
11 abstract syntax tree (AST). To this end, the reader is incouraged to
12 skim the :doc:`Introduction to the Clang
13 AST <IntroductionToTheClangAST>`
15 Step 0: Obtaining Clang
16 =======================
18 As Clang is part of the LLVM project, you'll need to download LLVM's
19 source code first. Both Clang and LLVM are maintained as Subversion
20 repositories, but we'll be accessing them through the git mirror. For
21 further information, see the `getting started
22 guide <http://llvm.org/docs/GettingStarted.html>`_.
24 .. code-block:: console
26 mkdir ~/clang-llvm && cd ~/clang-llvm
27 git clone http://llvm.org/git/llvm.git
29 git clone http://llvm.org/git/clang.git
31 git clone http://llvm.org/git/clang-tools-extra.git extra
33 Next you need to obtain the CMake build system and Ninja build tool. You
34 may already have CMake installed, but current binary versions of CMake
35 aren't built with Ninja support.
37 .. code-block:: console
40 git clone https://github.com/martine/ninja.git
44 sudo cp ninja /usr/bin/
47 git clone git://cmake.org/stage/cmake.git
54 Okay. Now we'll build Clang!
56 .. code-block:: console
59 mkdir build && cd build
60 cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
62 ninja check # Test LLVM only.
63 ninja clang-test # Test Clang only.
68 All of the tests should pass, though there is a (very) small chance that
69 you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
70 in both the llvm and clang directories should fix any problems.
72 Finally, we want to set Clang as its own compiler.
74 .. code-block:: console
79 The second command will bring up a GUI for configuring Clang. You need
80 to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
81 advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
82 ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
83 configure, then ``'g'`` to generate CMake's files.
85 Finally, run ninja one last time, and you're done.
87 Step 1: Create a ClangTool
88 ==========================
90 Now that we have enough background knowledge, it's time to create the
91 simplest productive ClangTool in existence: a syntax checker. While this
92 already exists as ``clang-check``, it's important to understand what's
95 First, we'll need to create a new directory for our tool and tell CMake
96 that it exists. As this is not going to be a core clang tool, it will
97 live in the ``tools/extra`` repository.
99 .. code-block:: console
101 cd ~/clang-llvm/llvm/tools/clang
102 mkdir tools/extra/loop-convert
103 echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
104 vim tools/extra/loop-convert/CMakeLists.txt
106 CMakeLists.txt should have the following contents:
110 set(LLVM_LINK_COMPONENTS support)
112 add_clang_executable(loop-convert
115 target_link_libraries(loop-convert
121 With that done, Ninja will be able to compile our tool. Let's give it
122 something to compile! Put the following into
123 ``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
124 why the different parts are needed can be found in the `LibTooling
125 documentation <LibTooling.html>`_.
129 // Declares clang::SyntaxOnlyAction.
130 #include "clang/Frontend/FrontendActions.h"
131 #include "clang/Tooling/CommonOptionsParser.h"
132 #include "clang/Tooling/Tooling.h"
133 // Declares llvm::cl::extrahelp.
134 #include "llvm/Support/CommandLine.h"
136 using namespace clang::tooling;
137 using namespace llvm;
139 // Apply a custom category to all command-line options so that they are the
140 // only ones displayed.
141 static llvm::cl::OptionCategory MyToolCategory("my-tool options");
143 // CommonOptionsParser declares HelpMessage with a description of the common
144 // command-line options related to the compilation database and input files.
145 // It's nice to have this help message in all tools.
146 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
148 // A help message for this specific tool can be added afterwards.
149 static cl::extrahelp MoreHelp("\nMore help text...");
151 int main(int argc, const char **argv) {
152 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
153 ClangTool Tool(OptionsParser.getCompilations(),
154 OptionsParser.getSourcePathList());
155 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
158 And that's it! You can compile our new tool by running ninja from the
161 .. code-block:: console
163 cd ~/clang-llvm/build
166 You should now be able to run the syntax checker, which is located in
167 ``~/clang-llvm/build/bin``, on any source file. Try it!
169 .. code-block:: console
171 echo "int main() { return 0; }" > test.cpp
172 bin/loop-convert test.cpp --
174 Note the two dashes after we specify the source file. The additional
175 options for the compiler are passed after the dashes rather than loading
176 them from a compilation database - there just aren't any options needed
179 Intermezzo: Learn AST matcher basics
180 ====================================
182 Clang recently introduced the :doc:`ASTMatcher
183 library <LibASTMatchers>` to provide a simple, powerful, and
184 concise way to describe specific patterns in the AST. Implemented as a
185 DSL powered by macros and templates (see
186 `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
187 curious), matchers offer the feel of algebraic data types common to
188 functional programming languages.
190 For example, suppose you wanted to examine only binary operators. There
191 is a matcher to do exactly that, conveniently named ``binaryOperator``.
192 I'll give you one guess what this matcher does:
196 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
198 Shockingly, it will match against addition expressions whose left hand
199 side is exactly the literal 0. It will not match against other forms of
200 0, such as ``'\0'`` or ``NULL``, but it will match against macros that
201 expand to 0. The matcher will also not match against calls to the
202 overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
203 matcher to handle overloaded operators.
205 There are AST matchers to match all the different nodes of the AST,
206 narrowing matchers to only match AST nodes fulfilling specific criteria,
207 and traversal matchers to get from one kind of AST node to another. For
208 a complete list of AST matchers, take a look at the `AST Matcher
209 References <LibASTMatchersReference.html>`_
211 All matcher that are nouns describe entities in the AST and can be
212 bound, so that they can be referred to whenever a match is found. To do
213 so, simply call the method ``bind`` on these matchers, e.g.:
217 variable(hasType(isInteger())).bind("intvar")
219 Step 2: Using AST matchers
220 ==========================
222 Okay, on to using matchers for real. Let's start by defining a matcher
223 which will capture all ``for`` statements that define a new variable
224 initialized to zero. Let's start with matching all ``for`` loops:
230 Next, we want to specify that a single variable is declared in the first
231 portion of the loop, so we can extend the matcher to
235 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
237 Finally, we can add the condition that the variable is initialized to
242 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
243 hasInitializer(integerLiteral(equals(0))))))))
245 It is fairly easy to read and understand the matcher definition ("match
246 loops whose init portion declares a single variable which is initialized
247 to the integer literal 0"), but deciding that every piece is necessary
248 is more difficult. Note that this matcher will not match loops whose
249 variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
250 zero besides the integer 0.
252 The last step is giving the matcher a name and binding the ``ForStmt``
253 as we will want to do something with it:
257 StatementMatcher LoopMatcher =
258 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
259 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
261 Once you have defined your matchers, you will need to add a little more
262 scaffolding in order to run them. Matchers are paired with a
263 ``MatchCallback`` and registered with a ``MatchFinder`` object, then run
264 from a ``ClangTool``. More code!
266 Add the following to ``LoopConvert.cpp``:
270 #include "clang/ASTMatchers/ASTMatchers.h"
271 #include "clang/ASTMatchers/ASTMatchFinder.h"
273 using namespace clang;
274 using namespace clang::ast_matchers;
276 StatementMatcher LoopMatcher =
277 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
278 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
280 class LoopPrinter : public MatchFinder::MatchCallback {
282 virtual void run(const MatchFinder::MatchResult &Result) {
283 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
288 And change ``main()`` to:
292 int main(int argc, const char **argv) {
293 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory);
294 ClangTool Tool(OptionsParser.getCompilations(),
295 OptionsParser.getSourcePathList());
299 Finder.addMatcher(LoopMatcher, &Printer);
301 return Tool.run(newFrontendActionFactory(&Finder).get());
304 Now, you should be able to recompile and run the code to discover for
305 loops. Create a new file with a few examples, and test out our new
308 .. code-block:: console
310 cd ~/clang-llvm/llvm/llvm_build/
312 vim ~/test-files/simple-loops.cc
313 bin/loop-convert ~/test-files/simple-loops.cc
315 Step 3.5: More Complicated Matchers
316 ===================================
318 Our simple matcher is capable of discovering for loops, but we would
319 still need to filter out many more ourselves. We can do a good portion
320 of the remaining work with some cleverly chosen matchers, but first we
321 need to decide exactly which properties we want to allow.
323 How can we characterize for loops over arrays which would be eligible
324 for translation to range-based syntax? Range based loops over arrays of
327 - start at index ``0``
328 - iterate consecutively
329 - end at index ``N-1``
331 We already check for (1), so all we need to add is a check to the loop's
332 condition to ensure that the loop's index variable is compared against
333 ``N`` and another check to ensure that the increment step just
334 increments this same variable. The matcher for (2) is straightforward:
335 require a pre- or post-increment of the same variable declared in the
338 Unfortunately, such a matcher is impossible to write. Matchers contain
339 no logic for comparing two arbitrary AST nodes and determining whether
340 or not they are equal, so the best we can do is matching more than we
341 would like to allow, and punting extra comparisons to the callback.
343 In any case, we can start building this sub-matcher. We can require that
344 the increment step be a unary increment like this:
348 hasIncrement(unaryOperator(hasOperatorName("++")))
350 Specifying what is incremented introduces another quirk of Clang's AST:
351 Usages of variables are represented as ``DeclRefExpr``'s ("declaration
352 reference expressions") because they are expressions which refer to
353 variable declarations. To find a ``unaryOperator`` that refers to a
354 specific declaration, we can simply add a second condition to it:
358 hasIncrement(unaryOperator(
359 hasOperatorName("++"),
360 hasUnaryOperand(declRefExpr())))
362 Furthermore, we can restrict our matcher to only match if the
363 incremented variable is an integer:
367 hasIncrement(unaryOperator(
368 hasOperatorName("++"),
369 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
371 And the last step will be to attach an identifier to this variable, so
372 that we can retrieve it in the callback:
376 hasIncrement(unaryOperator(
377 hasOperatorName("++"),
378 hasUnaryOperand(declRefExpr(to(
379 varDecl(hasType(isInteger())).bind("incrementVariable"))))))
381 We can add this code to the definition of ``LoopMatcher`` and make sure
382 that our program, outfitted with the new matcher, only prints out loops
383 that declare a single variable initialized to zero and have an increment
384 step consisting of a unary increment of some variable.
386 Now, we just need to add a matcher to check if the condition part of the
387 ``for`` loop compares a variable against the size of the array. There is
388 only one problem - we don't know which array we're iterating over
389 without looking at the body of the loop! We are again restricted to
390 approximating the result we want with matchers, filling in the details
391 in the callback. So we start with:
395 hasCondition(binaryOperator(hasOperatorName("<"))
397 It makes sense to ensure that the left-hand side is a reference to a
398 variable, and that the right-hand side has integer type.
402 hasCondition(binaryOperator(
403 hasOperatorName("<"),
404 hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
405 hasRHS(expr(hasType(isInteger())))))
407 Why? Because it doesn't work. Of the three loops provided in
408 ``test-files/simple.cpp``, zero of them have a matching condition. A
409 quick look at the AST dump of the first for loop, produced by the
410 previous iteration of loop-convert, shows us the answer:
417 (IntegerLiteral 0x173afa8 'int' 0)")
419 (BinaryOperator 0x173b060 '_Bool' '<'
420 (ImplicitCastExpr 0x173b030 'int'
421 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
422 (ImplicitCastExpr 0x173b048 'int'
423 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
424 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
425 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
426 (CompoundStatement ...
428 We already know that the declaration and increments both match, or this
429 loop wouldn't have been dumped. The culprit lies in the implicit cast
430 applied to the first operand (i.e. the LHS) of the less-than operator,
431 an L-value to R-value conversion applied to the expression referencing
432 ``i``. Thankfully, the matcher library offers a solution to this problem
433 in the form of ``ignoringParenImpCasts``, which instructs the matcher to
434 ignore implicit casts and parentheses before continuing to match.
435 Adjusting the condition operator will restore the desired match.
439 hasCondition(binaryOperator(
440 hasOperatorName("<"),
441 hasLHS(ignoringParenImpCasts(declRefExpr(
442 to(varDecl(hasType(isInteger())))))),
443 hasRHS(expr(hasType(isInteger())))))
445 After adding binds to the expressions we wished to capture and
446 extracting the identifier strings into variables, we have array-step-2
449 Step 4: Retrieving Matched Nodes
450 ================================
452 So far, the matcher callback isn't very interesting: it just dumps the
453 loop's AST. At some point, we will need to make changes to the input
454 source code. Next, we'll work on using the nodes we bound in the
457 The ``MatchFinder::run()`` callback takes a
458 ``MatchFinder::MatchResult&`` as its parameter. We're most interested in
459 its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
460 class to represent contextual information about the AST, as the name
461 implies, though the most functionally important detail is that several
462 operations require an ``ASTContext*`` parameter. More immediately useful
463 is the set of matched nodes, and how we retrieve them.
465 Since we bind three variables (identified by ConditionVarName,
466 InitVarName, and IncrementVarName), we can obtain the matched nodes by
467 using the ``getNodeAs()`` member function.
469 In ``LoopConvert.cpp`` add
473 #include "clang/AST/ASTContext.h"
475 Change ``LoopMatcher`` to
479 StatementMatcher LoopMatcher =
480 forStmt(hasLoopInit(declStmt(
481 hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
482 .bind("initVarName")))),
483 hasIncrement(unaryOperator(
484 hasOperatorName("++"),
485 hasUnaryOperand(declRefExpr(
486 to(varDecl(hasType(isInteger())).bind("incVarName")))))),
487 hasCondition(binaryOperator(
488 hasOperatorName("<"),
489 hasLHS(ignoringParenImpCasts(declRefExpr(
490 to(varDecl(hasType(isInteger())).bind("condVarName"))))),
491 hasRHS(expr(hasType(isInteger())))))).bind("forLoop");
493 And change ``LoopPrinter::run`` to
497 void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
498 ASTContext *Context = Result.Context;
499 const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
500 // We do not want to convert header files!
501 if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
503 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
504 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
505 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");
507 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
509 llvm::outs() << "Potential array-based loop discovered.\n";
512 Clang associates a ``VarDecl`` with each variable to represent the variable's
513 declaration. Since the "canonical" form of each declaration is unique by
514 address, all we need to do is make sure neither ``ValueDecl`` (base class of
515 ``VarDecl``) is ``NULL`` and compare the canonical Decls.
519 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
520 return First && Second &&
521 First->getCanonicalDecl() == Second->getCanonicalDecl();
524 If execution reaches the end of ``LoopPrinter::run()``, we know that the
525 loop shell that looks like
529 for (int i= 0; i < expr(); ++i) { ... }
531 For now, we will just print a message explaining that we found a loop.
532 The next section will deal with recursively traversing the AST to
533 discover all changes needed.
535 As a side note, it's not as trivial to test if two expressions are the same,
536 though Clang has already done the hard work for us by providing a way to
537 canonicalize expressions:
541 static bool areSameExpr(ASTContext *Context, const Expr *First,
542 const Expr *Second) {
543 if (!First || !Second)
545 llvm::FoldingSetNodeID FirstID, SecondID;
546 First->Profile(FirstID, *Context, true);
547 Second->Profile(SecondID, *Context, true);
548 return FirstID == SecondID;
551 This code relies on the comparison between two
552 ``llvm::FoldingSetNodeIDs``. As the documentation for
553 ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
554 a description of a node in the AST, based on its properties, along with
555 those of its children. ``FoldingSetNodeID`` then serves as a hash we can
556 use to compare expressions. We will need ``areSameExpr`` later. Before
557 you run the new code on the additional loops added to
558 test-files/simple.cpp, try to figure out which ones will be considered
559 potentially convertible.