1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
2 | "http://www.w3.org/TR/html4/strict.dtd"> |
3 | <html> |
4 | <head> |
5 | <title>Checker Developer Manual</title> |
6 | <link type="text/css" rel="stylesheet" href="menu.css"> |
7 | <link type="text/css" rel="stylesheet" href="content.css"> |
8 | <script type="text/javascript" src="scripts/menu.js"></script> |
9 | </head> |
10 | <body> |
11 | |
12 | <div id="page"> |
13 | <!--#include virtual="menu.html.incl"--> |
14 | |
15 | <div id="content"> |
16 | |
17 | <h3 style="color:red">This Page Is Under Construction</h3> |
18 | |
19 | <h1>Checker Developer Manual</h1> |
20 | |
21 | <p>The static analyzer engine performs path-sensitive exploration of the program and |
22 | relies on a set of checkers to implement the logic for detecting and |
23 | constructing specific bug reports. Anyone who is interested in implementing their own |
24 | checker should check out the Building a Checker in 24 Hours talk |
25 | (<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>, |
26 | <a href="https://youtu.be/kdxlsP5QVPw">video</a>) |
27 | and refer to this page for additional information on writing a checker. The static analyzer is a |
28 | part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> |
29 | and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> |
30 | for developer guidelines, and send your questions and proposals to the |
31 | <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>. |
32 | </p> |
33 | |
34 | <ul> |
35 | <li><a href="#start">Getting Started</a></li> |
36 | <li><a href="#analyzer">Static Analyzer Overview</a> |
37 | <ul> |
38 | <li><a href="#interaction">Interaction with Checkers</a></li> |
39 | <li><a href="#values">Representing Values</a></li> |
40 | </ul></li> |
41 | <li><a href="#idea">Idea for a Checker</a></li> |
42 | <li><a href="#registration">Checker Registration</a></li> |
43 | <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li> |
44 | <li><a href="#extendingstates">Custom Program States</a></li> |
45 | <li><a href="#bugs">Bug Reports</a></li> |
46 | <li><a href="#ast">AST Visitors</a></li> |
47 | <li><a href="#testing">Testing</a></li> |
48 | <li><a href="#commands">Useful Commands/Debugging Hints</a> |
49 | <ul> |
50 | <li><a href="#attaching">Attaching the Debugger</a></li> |
51 | <li><a href="#narrowing">Narrowing Down the Problem</a></li> |
52 | <li><a href="#visualizing">Visualizing the Analysis</a></li> |
53 | <li><a href="#debugprints">Debug Prints and Tricks</a></li> |
54 | </ul></li> |
55 | <li><a href="#additioninformation">Additional Sources of Information</a></li> |
56 | <li><a href="#links">Useful Links</a></li> |
57 | </ul> |
58 | |
59 | <h2 id=start>Getting Started</h2> |
60 | <ul> |
61 | <li>To check out the source code and build the project, follow steps 1-4 of |
62 | the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> |
63 | page.</li> |
64 | |
65 | <li>The analyzer source code is located under the Clang source tree: |
66 | <br><tt> |
67 | $ <b>cd llvm/tools/clang</b> |
68 | </tt> |
69 | <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>, |
70 | <tt>test/Analysis</tt>.</li> |
71 | |
72 | <li>The analyzer regression tests can be executed from Clang's build |
73 | directory: |
74 | <br><tt> |
75 | $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b> |
76 | </tt></li> |
77 | |
78 | <li>Analyze a file with the specified checker: |
79 | <br><tt> |
80 | $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b> |
81 | </tt></li> |
82 | |
83 | <li>List the available checkers: |
84 | <br><tt> |
85 | $ <b>clang -cc1 -analyzer-checker-help</b> |
86 | </tt></li> |
87 | |
88 | <li>See the analyzer help for different output formats, fine tuning, and |
89 | debug options: |
90 | <br><tt> |
91 | $ <b>clang -cc1 -help | grep "analyzer"</b> |
92 | </tt></li> |
93 | |
94 | </ul> |
95 | |
96 | <h2 id=analyzer>Static Analyzer Overview</h2> |
97 | The analyzer core performs symbolic execution of the given program. All the |
98 | input values are represented with symbolic values; further, the engine deduces |
99 | the values of all the expressions in the program based on the input symbols |
100 | and the path. The execution is path sensitive, and the engine attempts to explore |
101 | every possible path through the program. The explored execution traces are represented with an |
102 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object. |
103 | Each node of the graph is an |
104 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, |
105 | which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>. |
106 | <p> |
107 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> |
108 | represents the corresponding location in the program (or the CFG). |
109 | <tt>ProgramPoint</tt> is also used to record additional information on |
110 | when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt> |
111 | kind means that the state is the result of purging dead symbols - the |
112 | analyzer's equivalent of garbage collection. |
113 | <p> |
114 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> |
115 | represents the abstract state of the program. It consists of: |
116 | <ul> |
117 | <li><tt>Environment</tt> - a mapping from source code expressions to symbolic |
118 | values |
119 | <li><tt>Store</tt> - a mapping from memory locations to symbolic values |
120 | <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any checker-defined state |
121 | </ul> |
122 | |
123 | <h3 id=interaction>Interaction with Checkers</h3> |
124 | |
125 | <p> |
126 | Checkers are not merely passive receivers of the analyzer core changes - they |
127 | actively participate in the <tt>ProgramState</tt> construction through the |
128 | <tt>GenericDataMap</tt> which can be used to store the checker-defined part |
129 | of the state. Each time the analyzer engine explores a new statement, it |
130 | notifies each checker registered to listen for that statement, giving it an |
131 | opportunity to either report a bug or modify the state. (As a rule of thumb, |
132 | the checker itself should be stateless.) The checkers are called one after another |
133 | in the predefined order; thus, calling all the checkers adds a chain to the |
134 | <tt>ExplodedGraph</tt>. |
135 | </p> |
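<p>
The chain described above can be sketched with a small toy model. This is only an
illustration under simplifying assumptions, not the analyzer's implementation:
<tt>ToyState</tt>, <tt>ToyNode</tt>, <tt>ToyChecker</tt>, and <tt>visitStmt</tt> are
hypothetical names, and the state is reduced to a plain map of checker-defined data.
</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for illustration; none of these are Clang classes.
using ToyState = std::map<std::string, int>; // checker-defined data only
struct ToyNode {
  std::string Point; // which statement the node belongs to
  ToyState State;    // state after one checker has run
};

// A stateless checker: reads a state and returns the successor state.
using ToyChecker =
    std::function<ToyState(const ToyState &, const std::string &)>;

// Visit one statement: run the checkers in their fixed order and add
// one node per checker, so the result is a chain of nodes.
std::vector<ToyNode> visitStmt(const ToyState &Entry, const std::string &Stmt,
                               const std::vector<ToyChecker> &Checkers) {
  std::vector<ToyNode> Chain;
  ToyState Current = Entry;
  for (const ToyChecker &Check : Checkers) {
    Current = Check(Current, Stmt); // the checker never mutates its input
    Chain.push_back({Stmt, Current});
  }
  return Chain;
}
```

<p>
Each checker in this sketch keeps no data of its own; everything it learns is
carried in the state it returns, which is the reason for the rule of thumb above.
</p>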
136 | |
137 | <h3 id=values>Representing Values</h3> |
138 | |
139 | <p> |
140 | During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> |
141 | objects are used to represent the semantic evaluation of expressions. |
142 | They can represent things like concrete |
143 | integers, symbolic values, or memory locations (which are memory regions). |
144 | They are a discriminated union of "values", symbolic and otherwise. |
145 | If a value isn't symbolic, usually that means there is no symbolic |
146 | information to track. For example, if the value was an integer, such as |
147 | <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, |
148 | and the checker doesn't usually need to track any state with the concrete |
149 | number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be |
150 | a symbolic value. This happens when the analyzer cannot reason about something |
151 | (yet). An example is floating point numbers. In such cases, the |
152 | <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>. |
153 | This represents a case that is outside the realm of the analyzer's reasoning |
154 | capabilities. <tt>SVals</tt> are value objects and their values can be viewed |
155 | using the <tt>.dump()</tt> method. Often they wrap persistent objects such as |
156 | symbols or regions. |
157 | </p> |
158 | |
159 | <p> |
160 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) |
161 | is meant to represent an abstract, but named, symbolic value. Symbols represent |
162 | an actual (immutable) value. We might not know what its specific value is, but |
163 | we can associate constraints with that value as we analyze a path. For |
164 | example, we might record that the value of a symbol is greater than |
165 | <tt>0</tt>, etc. |
166 | </p> |
167 | |
168 | <p> |
169 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol. |
170 | It is used to provide a lexicon of how to describe abstract memory. Regions can |
171 | layer on top of other regions, providing a layered approach to representing memory. |
172 | For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, |
173 | but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could |
174 | be used to represent the memory associated with a specific field of that object. |
175 | So how do we represent symbolic memory regions? That's what |
176 | <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> |
177 | is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the |
178 | symbol is unique and has a unique name, that symbol names the region. |
179 | </p> |
180 | |
181 | <p> |
182 | Let's see how the analyzer processes the expressions in the following example: |
183 | </p> |
184 | |
185 | <p> |
186 | <pre class="code_example"> |
187 | int foo(int x) { |
188 |   int y = x * 2; |
189 |   int z = x; |
190 |   ... |
191 | } |
192 | </pre> |
193 | </p> |
194 | |
195 | <p> |
196 | Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, |
197 | we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in |
198 | this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. |
199 | Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, |
200 | which references the value <b>currently bound</b> to <tt>x</tt>. That value is |
201 | symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. |
202 | Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, |
203 | and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When |
204 | we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, |
205 | and create a new <tt>SVal</tt> that represents their multiplication (which in |
206 | this case is a new symbolic expression, which we might call <tt>$1</tt>). When we |
207 | evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), |
208 | and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) |
209 | to the <tt>MemRegion</tt> in the symbolic store. |
210 | <br> |
211 | The second line is similar. When we evaluate <tt>x</tt> again, we do the same |
212 | dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt> |
213 | might reference the same underlying value. |
214 | </p> |
215 | |
216 | <p> |
217 | To summarize, MemRegions are unique names for blocks of memory. Symbols are |
218 | unique names for abstract symbolic values. Some MemRegions represent abstract |
219 | symbolic chunks of memory, and thus are also based on symbols. SVals are just |
220 | references to values, and can reference either MemRegions, Symbols, or concrete |
221 | values (e.g., the number 1). |
222 | </p> |
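<p>
The walkthrough above can be sketched as a toy model. The following is a minimal,
self-contained illustration, not the analyzer's implementation: <tt>ToySVal</tt>,
<tt>ToyStore</tt>, <tt>load</tt>, <tt>evalMul</tt>, and <tt>describe</tt> are
hypothetical names, and the real <tt>SVal</tt> hierarchy is considerably richer.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy stand-ins for the value kinds; these are NOT the Clang classes.
struct Unknown {};                      // outside the model's reasoning
struct ConcreteInt { long Value; };     // a known integer such as 2
struct Symbol { std::string Name; };    // an abstract named value such as $0
struct Region { std::string Name; };    // a named block of memory such as x

// A discriminated union over the kinds above.
using ToySVal = std::variant<Unknown, ConcreteInt, Symbol, Region>;

// The store maps memory regions (by name) to the values bound to them.
using ToyStore = std::map<std::string, ToySVal>;

// Lvalue-to-rvalue conversion: fetch the value currently bound to a region.
ToySVal load(const ToyStore &Store, const Region &R) {
  auto It = Store.find(R.Name);
  return It == Store.end() ? ToySVal(Unknown{}) : It->second;
}

// Multiply two values: fold constants, build a symbolic expression for
// symbol * constant, and give up on everything else.
ToySVal evalMul(const ToySVal &L, const ToySVal &R) {
  const auto *LI = std::get_if<ConcreteInt>(&L);
  const auto *RI = std::get_if<ConcreteInt>(&R);
  if (LI && RI)
    return ConcreteInt{LI->Value * RI->Value};
  const auto *LS = std::get_if<Symbol>(&L);
  if (LS && RI)
    return Symbol{"(" + LS->Name + " * " + std::to_string(RI->Value) + ")"};
  return Unknown{};
}

// Render a value for inspection, loosely in the spirit of dump().
std::string describe(const ToySVal &V) {
  if (const auto *I = std::get_if<ConcreteInt>(&V))
    return std::to_string(I->Value);
  if (const auto *S = std::get_if<Symbol>(&V))
    return S->Name;
  if (const auto *R = std::get_if<Region>(&V))
    return "&" + R->Name;
  return "Unknown";
}
```

<p>
In this sketch, binding <tt>$0</tt> to the region for <tt>x</tt> and then evaluating
<tt>x * 2</tt> loads <tt>$0</tt> from the store and produces a new symbolic expression,
while loading <tt>x</tt> a second time yields the same <tt>$0</tt> again.
</p>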
223 | |
224 | <!-- |
225 | TODO: Add a picture. |
226 | <br> |
227 | Symbols<br> |
228 | FunctionalObjects are used throughout. |
229 | --> |
230 | |
231 | <h2 id=idea>Idea for a Checker</h2> |
232 | Here are several questions which you should consider when evaluating your |
233 | checker idea: |
234 | <ul> |
235 | <li>Can the check be effectively implemented without path-sensitive |
236 | analysis? See <a href="#ast">AST Visitors</a>.</li> |
237 | |
238 | <li>How high is the false positive rate going to be? Looking at the occurrences |
239 | of the issue you want to write a checker for in the existing code bases might |
240 | give you some ideas. </li> |
241 | |
242 | <li>How will the current limitations of the analysis affect the false alarm |
243 | rate? Currently, the analyzer only reasons about one procedure at a time (no |
244 | inter-procedural analysis). Also, it uses a simple range-tracking-based |
245 | solver to model symbolic execution.</li> |
246 | |
247 | <li>Consult the <a |
248 | href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> |
249 | to get some ideas for new checkers and consider starting with improving/fixing |
250 | bugs in the existing checkers.</li> |
251 | </ul> |
252 | |
253 | <p>Once an idea for a checker has been chosen, there are two key decisions that |
254 | need to be made: |
255 | <ul> |
256 | <li> Which events the checker should be tracking. This is discussed in more |
257 | detail in the section <a href="#events_callbacks">Events, Callbacks, and |
258 | Checker Class Structure</a>. |
259 | <li> What checker-specific data needs to be stored as part of the program |
260 | state (if any). This should be minimized as much as possible. More detail about |
261 | implementing custom program state is given in section <a |
262 | href="#extendingstates">Custom Program States</a>. |
263 | </ul> |
264 | |
265 | |
266 | <h2 id=registration>Checker Registration</h2> |
267 | All checker implementation files are located in |
268 | the <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe |
269 | how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of |
270 | stream APIs, was registered with the analyzer. |
271 | Similar steps should be followed for a new checker. |
272 | <ol> |
273 | <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was |
274 | created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>. |
275 | <li>The following registration code was added to the implementation file: |
276 | <pre class="code_example"> |
277 | void ento::registerSimpleStreamChecker(CheckerManager &mgr) { |
278 |   mgr.registerChecker<SimpleStreamChecker>(); |
279 | } |
280 | </pre> |
281 | <li>A package was selected for the checker and the checker was defined in the |
282 | table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>. |
283 | Since all checkers should first be developed as "alpha", and the SimpleStreamChecker |
284 | performs UNIX API checks, the correct package is "alpha.unix", and the following |
285 | was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>: |
286 | <pre class="code_example"> |
287 | let ParentPackage = UnixAlpha in { |
288 | ... |
289 | def SimpleStreamChecker : Checker<"SimpleStream">, |
290 |   HelpText<"Check for misuses of stream APIs">, |
291 |   DescFile<"SimpleStreamChecker.cpp">; |
292 | ... |
293 | } // end "alpha.unix" |
294 | </pre> |
295 | |
296 | <li>The source code file was made visible to CMake by adding it to |
297 | <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>. |
298 | |
299 | </ol> |
300 | |
301 | After adding a new checker to the analyzer, one can verify that the new checker |
302 | was successfully added by seeing if it appears in the list of available checkers: |
303 | <br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt> |
304 | |
305 | <h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2> |
306 | |
307 | <p> All checkers inherit from the <tt><a |
308 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html"> |
309 | Checker</a></tt> template class; the template parameter(s) describe the type of |
310 | events that the checker is interested in processing. The various types of events |
311 | that are available are described in the file <a |
312 | href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html"> |
313 | CheckerDocumentation.cpp</a>. |
314 | |
315 | <p> For each event type requested, a corresponding callback function must be |
316 | defined in the checker class (<a |
317 | href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html"> |
318 | CheckerDocumentation.cpp</a> shows the |
319 | correct function name and signature for each event type). |
320 | |
321 | <p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to |
322 | take action at the following times: |
323 | |
324 | <ul> |
325 | <li>Before making a call to a function, check if the function is <tt>fclose</tt>. |
326 | If so, check the parameter being passed. |
327 | <li>After making a function call, check if the function is <tt>fopen</tt>. If |
328 | so, process the return value. |
329 | <li>When values go out of scope, check whether they are still-open file |
330 | descriptors, and report a bug if so. In addition, remove any information about |
331 | them from the program state in order to keep the state as small as possible. |
332 | <li>When file pointers "escape" (are used in a way that the analyzer can no longer |
333 | track them), mark them as such. This prevents false positives in the cases where |
334 | the analyzer cannot be sure whether the file was closed or not. |
335 | </ul> |
336 | |
337 | <p>The events that will be used for these actions are, respectively, <a |
338 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>, |
339 | <a |
340 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>, |
341 | <a |
342 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>, |
343 | and <a |
344 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>. |
345 | The high-level structure of the checker's class is thus: |
346 | |
347 | <pre class="code_example"> |
348 | class SimpleStreamChecker : public Checker<check::PreCall, |
349 |                                            check::PostCall, |
350 |                                            check::DeadSymbols, |
351 |                                            check::PointerEscape> { |
352 | public: |
353 | |
354 |   void checkPreCall(const CallEvent &Call, CheckerContext &C) const; |
355 | |
356 |   void checkPostCall(const CallEvent &Call, CheckerContext &C) const; |
357 | |
358 |   void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const; |
359 | |
360 |   ProgramStateRef checkPointerEscape(ProgramStateRef State, |
361 |                                      const InvalidatedSymbols &Escaped, |
362 |                                      const CallEvent *Call, |
363 |                                      PointerEscapeKind Kind) const; |
364 | }; |
365 | </pre> |
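<p>
To make the role of the template parameters more concrete, here is a self-contained
toy model of the idea, written under the assumption that each event type knows how to
subscribe a checker's callback with a manager. All names in it (<tt>ToyManager</tt>,
<tt>toycheck::PreCall</tt>, <tt>subscribeAll</tt>, and so on) are hypothetical and
are not part of Clang; the real <tt>Checker</tt> template differs in detail.
</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these are Clang classes.
using ToyLog = std::vector<std::string>;
using ToyCallback = std::function<void(const std::string &, ToyLog &)>;

struct ToyManager {
  std::vector<ToyCallback> PreCall, PostCall;

  // Model one function call: pre-call callbacks first, then post-call ones.
  void runCall(const std::string &Callee, ToyLog &Log) const {
    for (const ToyCallback &CB : PreCall)
      CB(Callee, Log);
    for (const ToyCallback &CB : PostCall)
      CB(Callee, Log);
  }
};

namespace toycheck {
// Each event type knows which callback name and signature it requires.
struct PreCall {
  template <typename CheckerT>
  static void subscribe(const CheckerT *C, ToyManager &M) {
    M.PreCall.push_back([C](const std::string &Callee, ToyLog &Log) {
      C->checkPreCall(Callee, Log);
    });
  }
};
struct PostCall {
  template <typename CheckerT>
  static void subscribe(const CheckerT *C, ToyManager &M) {
    M.PostCall.push_back([C](const std::string &Callee, ToyLog &Log) {
      C->checkPostCall(Callee, Log);
    });
  }
};
} // namespace toycheck

// The template parameters list the events; subscribing visits each one.
template <typename... Events> struct ToyChecker {
  template <typename CheckerT>
  static void subscribeAll(const CheckerT *C, ToyManager &M) {
    (Events::subscribe(C, M), ...); // C++17 fold expression
  }
};

struct ToyStreamChecker : ToyChecker<toycheck::PreCall, toycheck::PostCall> {
  void checkPreCall(const std::string &Callee, ToyLog &Log) const {
    if (Callee == "fclose")
      Log.push_back("pre:fclose");
  }
  void checkPostCall(const std::string &Callee, ToyLog &Log) const {
    if (Callee == "fopen")
      Log.push_back("post:fopen");
  }
};
```

<p>
In this sketch, a checker that lists an event but omits the matching callback fails
to compile at the point of subscription, which mirrors the requirement that a
callback be defined for every requested event type.
</p>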
366 | |
367 | <h2 id=extendingstates>Custom Program States</h2> |
368 | |
369 | <p> Checkers often need to keep track of information specific to the checks they |
370 | perform. However, since checkers have no guarantee about the order in which the |
371 | program will be explored, or even that all possible paths will be explored, this |
372 | state information cannot be kept within individual checkers. Therefore, if |
373 | checkers need to store custom information, they need to add new categories of |
374 | data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of |
375 | several macros designed for this purpose. They are: |
376 | |
377 | <ul> |
378 | <li><a |
379 | href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>: |
380 | Used when the state information is a single value. The methods available for |
381 | state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and |
382 | <tt>remove</tt>. |
383 | <li><a |
384 | href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>: |
385 | Used when the state information is a list of values. The methods available for |
386 | state types declared with this macro are <tt>add</tt>, <tt>get</tt>, |
387 | <tt>remove</tt>, and <tt>contains</tt>. |
388 | <li><a |
389 | href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>: |
390 | Used when the state information is a set of values. The methods available for |
391 | state types declared with this macro are <tt>add</tt>, <tt>get</tt>, |
392 | <tt>remove</tt>, and <tt>contains</tt>. |
393 | <li><a |
394 | href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>: |
395 | Used when the state information is a map from a key to a value. The methods |
396 | available for state types declared with this macro are <tt>add</tt>, |
397 | <tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>. |
398 | </ul> |
399 | |
400 | <p>All of these macros take as parameters the name to be used for the custom |
401 | category of state information and the data type(s) to be used for storage. The |
402 | data type(s) specified will become the parameter type and/or return type of the |
403 | methods that manipulate the new category of state information. Each of these |
404 | methods is templated with the name of the custom data type. |
405 | |
406 | <p>For example, a common case is the need to track data associated with a |
407 | symbolic expression; a map type is the most logical way to implement this. The |
408 | key for this map will be a pointer to a symbolic expression |
409 | (<tt>SymbolRef</tt>). If the data type to be associated with the symbolic |
410 | expression is an integer, then the custom category of state information would be |
411 | declared as |
412 | |
413 | <pre class="code_example"> |
414 | REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int) |
415 | </pre> |
416 | |
417 | The data would be accessed with the function |
418 | |
419 | <pre class="code_example"> |
420 | ProgramStateRef state; |
421 | SymbolRef Sym; |
422 | ... |
423 | const int *currentValue = state->get<ExampleDataType>(Sym); // null if Sym has no entry |
424 | </pre> |
425 | |
426 | and set with the function |
427 | |
428 | <pre class="code_example"> |
429 | ProgramStateRef state; |
430 | SymbolRef Sym; |
431 | int newValue; |
432 | ... |
433 | ProgramStateRef newState = state->set<ExampleDataType>(Sym, newValue); |
434 | </pre> |
435 | |
436 | <p>In addition, the macros define a data type used for storing the data of the |
437 | new data category; the name of this type is the name of the data category with |
438 | "Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply |
439 | be the data type that was passed; for the other three macros, this will be a specialized |
440 | version of the <a |
441 | href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>, |
442 | <a |
443 | href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>, |
444 | or <a |
445 | href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a> |
446 | templated class. For the <tt>ExampleDataType</tt> example above, the type |
447 | created would be equivalent to writing the declaration: |
448 | |
449 | <pre class="code_example"> |
450 | typedef llvm::ImmutableMap<SymbolRef, int> ExampleDataTypeTy; |
451 | </pre> |
452 | |
453 | <p>These macros will cover a majority of use cases; however, they still have a |
454 | few limitations. They cannot be used inside namespaces (since they expand to |
455 | contain top-level namespace references), and the data types that they define |
456 | cannot be referenced from more than one file. |
457 | |
458 | <p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing |
459 | one, functions that modify the state will return a copy of the previous state |
460 | with the change applied. This updated state must then be provided to the |
461 | analyzer core by calling the <tt>CheckerContext::addTransition</tt> function. |
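
<p>
The immutability rule can be illustrated with a toy model. This is a sketch under
simplifying assumptions rather than the analyzer's code: <tt>ToyState</tt>,
<tt>ToyStateRef</tt>, and <tt>ToyContext</tt> are hypothetical names, and the state
is copied wholesale instead of using the shared immutable containers mentioned above.
</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these are hypothetical types, not Clang classes.
class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

class ToyState {
  std::map<std::string, int> Data;

public:
  // Look up a key; returns nullptr when no value is recorded for it.
  const int *get(const std::string &Key) const {
    auto It = Data.find(Key);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Return a modified copy; the state this is called on is never changed.
  ToyStateRef set(const std::string &Key, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Key] = Value;
    return Copy;
  }
};

// Stands in for the context that hands new states back to the engine.
struct ToyContext {
  std::vector<ToyStateRef> Transitions;
  void addTransition(ToyStateRef S) { Transitions.push_back(std::move(S)); }
};
```

<p>
A state returned by <tt>set</tt> in this sketch has no effect until it is passed to
<tt>addTransition</tt>; dropping the return value silently loses the update, which is
the mistake the note above warns against.
</p>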
462 | <h2 id=bugs>Bug Reports</h2> |
463 | |
464 | |
465 | <p> When a checker detects a mistake in the analyzed code, it needs a way to |
466 | report it to the analyzer core so that it can be displayed. The two classes used |
467 | to construct this report are <tt><a |
468 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt> |
469 | and <tt><a |
470 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html"> |
471 | BugReport</a></tt>. |
472 | |
473 | <p> |
474 | <tt>BugType</tt>, as the name would suggest, represents a type of bug. The |
475 | constructor for <tt>BugType</tt> takes two parameters: the name of the bug |
476 | type, and the name of the category of the bug. These are used (e.g.) in the |
477 | summary page generated by the scan-build tool. |
478 | |
479 | <p> |
480 | The <tt>BugReport</tt> class represents a specific occurrence of a bug. In |
481 | the most common case, three parameters are used to form a <tt>BugReport</tt>: |
482 | <ol> |
483 | <li>The type of bug, specified as an instance of the <tt>BugType</tt> class. |
484 | <li>A short descriptive string. This is placed at the location of the bug in |
485 | the detailed line-by-line output generated by scan-build. |
486 | <li>The context in which the bug occurred. This includes both the location of |
487 | the bug in the program and the program's state when the location is reached. These are |
488 | both encapsulated in an <tt>ExplodedNode</tt>. |
489 | </ol> |
490 | |
491 | <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made |
492 | as to whether or not analysis can continue along the current path. This decision |
493 | is based on whether the detected bug is one that would prevent the program under |
494 | analysis from continuing. For example, leaking of a resource should not stop |
495 | analysis, as the program can continue to run after the leak. Dereferencing a |
496 | null pointer, on the other hand, should stop analysis, as there is no way for |
497 | the program to meaningfully continue after such an error. |
498 | |
499 | <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt> |
500 | generated by the checker can be passed to the <tt>BugReport</tt> constructor |
501 | without additional modification. This <tt>ExplodedNode</tt> will be the one |
502 | returned by the most recent call to <a |
503 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>. |
504 | If no transition has been performed during the current callback, the checker should call <a |
505 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a> |
506 | and use the returned node for bug reporting. |
507 | |
508 | <p>If analysis cannot continue, then the current state should be transitioned |
509 | into a so-called <i>sink node</i>, a node from which no further analysis will be |
510 | performed. This is done by calling the <a |
511 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0"> |
512 | CheckerContext::generateSink</a> function; this function is the same as the |
513 | <tt>addTransition</tt> function, but marks the state as a sink node. Like |
514 | <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated |
515 | state, which can then be passed to the <tt>BugReport</tt> constructor. |
516 | |
517 | <p> |
518 | After a <tt>BugReport</tt> is created, it should be passed to the analyzer core |
519 | by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>. |
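
<p>
The difference between the two kinds of node can be sketched with a toy model. The
code below is an illustration only and does not use the Clang API: <tt>ToyNode</tt>,
<tt>ToyReport</tt>, and <tt>ToyContext</tt> are hypothetical, and the member names
merely echo the functions described above.
</p>

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration; none of these are Clang classes.
struct ToyNode {
  std::string State;
  bool IsSink; // true: no further exploration from this node
};
struct ToyReport {
  std::string Message;
  const ToyNode *Where; // location and state of the bug
};

class ToyContext {
  std::deque<ToyNode> Nodes; // deque keeps element addresses stable on push_back
  std::vector<ToyReport> Reports;

public:
  // Record a successor from which exploration may continue.
  const ToyNode *addTransition(std::string State) {
    Nodes.push_back({std::move(State), false});
    return &Nodes.back();
  }
  // Same as addTransition, but the new node ends the path.
  const ToyNode *generateSink(std::string State) {
    Nodes.push_back({std::move(State), true});
    return &Nodes.back();
  }
  void emitReport(ToyReport R) { Reports.push_back(std::move(R)); }

  // Nodes the engine would keep exploring from.
  std::vector<const ToyNode *> frontier() const {
    std::vector<const ToyNode *> Out;
    for (const ToyNode &N : Nodes)
      if (!N.IsSink)
        Out.push_back(&N);
    return Out;
  }
  const std::vector<ToyReport> &reports() const { return Reports; }
};

// A leak does not stop the program, so the path stays alive.
void reportLeak(ToyContext &C, const std::string &State) {
  C.emitReport({"resource leak", C.addTransition(State)});
}

// A null dereference ends the path, so the report uses a sink node.
void reportNullDeref(ToyContext &C, const std::string &State) {
  C.emitReport({"null dereference", C.generateSink(State)});
}
```

<p>
Both helpers in this sketch emit a report, but only the node created for the leak
remains on the frontier; the sink node is kept solely as the location of its report.
</p>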
520 | |
521 | <h2 id=ast>AST Visitors</h2> |
522 | Some checks might not require path-sensitivity to be effective. A simple AST walk |
523 | might be sufficient. If that is the case, consider implementing a Clang |
524 | compiler warning. On the other hand, a check might not be acceptable as a compiler |
525 | warning; for example, because of a relatively high false positive rate. In this |
526 | situation, AST callbacks <tt><b>checkASTDecl</b></tt> and |
527 | <tt><b>checkASTCodeBody</b></tt> are your best friends. |
528 | |
529 | <h2 id=testing>Testing</h2> |
530 | Every patch should be well tested with Clang regression tests. The checker tests |
531 | live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, |
532 | execute the following from the <tt>clang</tt> build directory: |
533 | <pre class="code"> |
534 | $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b> |
535 | </pre> |
536 | |
537 | <h2 id=commands>Useful Commands/Debugging Hints</h2> |
538 | |
539 | <h3 id=attaching>Attaching the Debugger</h3> |
540 | |
541 | <p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the |
542 | debugger to it directly:</p> |
543 | |
544 | <pre class="code"> |
545 | $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b> |
546 | $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b> |
547 | </pre> |
548 | |
549 | <p> |
550 | Otherwise, if your command line contains <tt><b>--analyze</b></tt>, |
551 | the actual clang instance would be run in a separate process. In |
552 | order to debug it, use the <tt><b>-###</b></tt> flag for obtaining |
553 | the command line of the child process: |
554 | </p> |
555 | |
556 | <pre class="code"> |
557 | $ <b>clang --analyze test.c -\#\#\#</b> |
558 | </pre> |
559 | |
560 | <p> |
561 | Below we describe a few useful command line arguments, all of which assume that |
562 | you are running <tt><b>clang -cc1</b></tt>. |
563 | </p> |
564 | |
565 | <h3 id=narrowing>Narrowing Down the Problem</h3> |
566 | |
567 | <p>While investigating a checker-related issue, instruct the analyzer to only |
568 | execute a single checker: |
569 | </p> |
570 | <pre class="code"> |
571 | $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b> |
572 | </pre> |
573 | |
574 | <p>If the analyzer crashes while processing a large file, use the |
575 | <tt><b>-analyzer-display-progress</b></tt> option to see which function |
576 | was being analyzed at the time.</p> |
577 | |
578 | <p>To selectively analyze only the given function, use the |
579 | <tt><b>-analyze-function</b></tt> option:</p> |
580 | <pre class="code"> |
581 | $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b> |
582 | ANALYZE (Syntax): test.c foo |
583 | ANALYZE (Syntax): test.c bar |
584 | ANALYZE (Path, Inline_Regular): test.c bar |
585 | ANALYZE (Path, Inline_Regular): test.c foo |
586 | $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b> |
587 | ANALYZE (Syntax): test.c foo |
588 | ANALYZE (Path, Inline_Regular): test.c foo |
589 | </pre> |
590 | |
591 | <b>Note: </b> a fully qualified function name has to be used when selecting |
592 | C++ functions and methods, Objective-C methods and blocks, e.g.: |
593 | |
594 | <pre class="code"> |
595 | $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function='foo(int)'</b> |
596 | </pre> |
597 | |
598 | The fully qualified name can be found from the |
599 | <tt><b>-analyzer-display-progress</b></tt> output. |
600 | |
601 | <p>The bug reporter mechanism removes path diagnostics inside intermediate |
602 | function calls that have returned by the time the bug was found and contain |
603 | no interesting pieces. Usually it is up to the checkers to produce more |
604 | interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects. |
605 | However, you can disable path pruning while debugging with the |
<tt><b>-analyzer-config prune-paths=false</b></tt> option.</p>
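<p>As a rough illustration of path pruning, the sketch below analyzes a small
file twice, with and without pruning. The file name <tt>prune_demo.c</tt> and
its functions are hypothetical, and <tt>-analyzer-output=text</tt> is added
here only so that the path notes are printed to the terminal. The commands run
only if a <tt>clang</tt> binary is found on <tt>PATH</tt>.</p>

```shell
# Hypothetical input: touch() has already returned when the null dereference
# in deref() is reported, so its path pieces are candidates for pruning.
cat > prune_demo.c <<'EOF'
static void touch(int *p) { (void)p; }
static int deref(int *p) { return *p; }
int test(void) {
  int *p = 0;
  touch(p);
  return deref(p);
}
EOF

if command -v clang >/dev/null 2>&1; then
  # Default behavior: uninteresting intermediate calls are pruned.
  clang -cc1 -analyze -analyzer-checker=core -analyzer-output=text \
    prune_demo.c
  # Pruning disabled: notes inside such calls are kept.
  clang -cc1 -analyze -analyzer-checker=core -analyzer-output=text \
    -analyzer-config prune-paths=false prune_demo.c
fi
```

<p>Comparing the two outputs should show whether any notes about the call to
<tt>touch</tt> appear only in the second run; the exact notes depend on the
Clang version.</p>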
607 | |
608 | <h3 id=visualizing>Visualizing the Analysis</h3> |
609 | |
<p>To dump the AST, which often helps in understanding how the program should
behave:</p>
612 | <pre class="code"> |
613 | $ <b>clang -cc1 -ast-dump test.c</b> |
614 | </pre> |
615 | |
<p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or
<tt>debug.DumpCFG</tt> checker:</p>
618 | <pre class="code"> |
619 | $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b> |
620 | </pre> |
621 | |
622 | <p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be |
623 | visualized with another debug checker:</p> |
624 | <pre class="code"> |
625 | $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b> |
626 | </pre> |
<p>Equivalently, use the <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
option, which also dumps the exploded graph in Graphviz
<tt><b>.dot</b></tt> format.</p>
630 | |
<p>You can convert <tt><b>.dot</b></tt> files into other formats. In
particular, converting to <tt><b>.svg</b></tt> and viewing the result in your
web browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
634 | <pre class="code"> |
635 | $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b> |
636 | </pre> |
637 | |
638 | <p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those |
639 | leading to bug reports from the exploded graph dump. This is useful |
640 | because exploded graphs are often huge and hard to navigate.</p> |
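<p>The options above can be combined into a short workflow. The sketch below
assumes a Clang build that supports the <tt>-analyzer-dump-egraph=</tt> option
for writing the graph to a named file; the file names are hypothetical. Each
step runs only if the required tool is found on <tt>PATH</tt>.</p>

```shell
# Hypothetical input with a null dereference on the path where p is null.
cat > egraph_demo.c <<'EOF'
int test(int *p) {
  if (p)
    return 0;
  return *p;
}
EOF

if command -v clang >/dev/null 2>&1; then
  # Dump only the paths that lead to bug reports into egraph_demo.dot.
  clang -cc1 -analyze -analyzer-checker=core \
    -analyzer-dump-egraph=egraph_demo.dot -trim-egraph egraph_demo.c
fi

if command -v dot >/dev/null 2>&1 && [ -s egraph_demo.dot ]; then
  # Convert the dump to SVG for viewing in a web browser.
  dot -Tsvg egraph_demo.dot -o egraph_demo.svg
fi
```

<p>Dropping <tt>-trim-egraph</tt> from the first command produces the full
graph, which is usually much larger.</p>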
641 | |
642 | <p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding |
643 | the analyzer's false positives, because it gives comprehensive information |
644 | on every decision made by the analyzer across all analysis paths.</p> |
645 | |
646 | <p>There are more debug checkers available. To see all available debug checkers: |
647 | </p> |
648 | <pre class="code"> |
649 | $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b> |
650 | </pre> |
651 | |
652 | <h3 id=debugprints>Debug Prints and Tricks</h3> |
653 | |
<p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p>
656 | <pre class="code"> |
657 | (gdb) <b>p ViewGraph(0)</b> |
658 | </pre> |
659 | |
<p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
661 | <pre class="code"> |
662 | (gdb) <b>p State->dump()</b> |
663 | </pre> |
664 | |
<p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.</p>
668 | <pre class="code"> |
669 | (gdb) <b>p E->dump()</b> |
670 | </pre> |
671 | |
<p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
to:</p>
674 | <pre class="code"> |
675 | (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b> |
676 | </pre> |
677 | |
678 | <h2 id=links>Making Your Checker Better</h2> |
679 | <ul> |
<li>User-facing documentation is important for adoption! Make sure the
    <a href="/available_checks.html">checker list</a> on the analyzer's homepage
    is updated. Also ensure that the description in <tt>Checkers.td</tt> is clear to
    non-analyzer-developers.</li>
<li>Warning and note messages should be clear and easy to understand, even if a bit long.
  <ul>
    <li>Messages should start with a capital letter (unlike Clang warnings!) and should not
        end with <tt>.</tt>.</li>
    <li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> ->
        <tt>Dereference of null pointer</tt>.</li>
    <li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning
        to the user better. There are some existing visitors that might be useful for your check,
        e.g. <tt>trackNullOrUndefValue</tt>. For example, SimpleStreamChecker should highlight
        the event of opening the file when reporting a file descriptor leak.</li>
  </ul></li>
<li>If the check tracks anything in the program state, it needs to implement the
    <tt>checkDeadSymbols</tt> callback to clean the state up.</li>
<li>The check should conservatively assume that the program is correct when a tracked symbol
    is passed to a function that is unknown to the analyzer.
    The <tt>checkPointerEscape</tt> callback can help you handle that case.</li>
<li>Use safe and convenient APIs!
  <ul>
    <li>Always use <tt>CheckerContext::generateErrorNode</tt> and
        <tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports.
        Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li>
    <li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to
        <tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li>
    <li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li>
    <li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li>
  </ul></li>
<li>Common sources of crashes:
  <ul>
    <li><tt>CallEvent::getOriginExpr</tt> is nullable - for example, it returns null for an
        automatic destructor of a variable. The same applies to some values generated while the
        call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li>
    <li><tt>CallEvent::getDecl</tt> is nullable - for example, it returns null for a
        call through a symbolic function pointer.</li>
    <li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>, and
        <tt>generateErrorNode</tt> may return null because you can transition to a node that you have already visited.</li>
    <li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that
        return arguments crash when the argument index is out of bounds. Having checked the function name
        does not mean that the function has the expected number of arguments,
        which is why you should use <tt>CallDescription</tt>.</li>
    <li>Nullability of different entities within different kinds of symbols and regions is usually
        documented via assertions in their constructors.</li>
    <li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a single token,
        e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases.
        Note that this method is much slower and should be used sparingly, e.g. only when generating reports
        but not during analysis.</li>
    <li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the
        analyzer with the core checks disabled has never been supported. It might cause unexpected behavior and
        crashes. You should do all your testing with the core checks enabled.</li>
  </ul></li>
<li>Patterns that you should most likely avoid even if they're not technically wrong:
  <ul>
    <li><tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point
        to decide when to emit a note. It is much easier to determine that by observing changes in
        the program state.</li>
    <li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt>
        and the optional type argument is not specified, the checker may accidentally try to dereference a
        void pointer.</li>
    <li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>.
        It should be immediately obvious whether the <tt>SVal</tt> is a <tt>Loc</tt> or a
        <tt>NonLoc</tt> depending on the AST that is being checked. Checking whether a value
        is <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> or whether the value is
        <tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> is totally fine.</li>
    <li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>,
        unless they are of <tt>SymbolMetadata</tt> class tagged by the checker,
        or they represent newly created values such as the return value in <tt>evalCall</tt>.
        For modeling arithmetic/bitwise/comparison operations, <tt>SValBuilder</tt> should be used.</li>
    <li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually
        no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li>
  </ul></li>
<li>Checkers are encouraged to actively participate in the analysis by sharing
    their knowledge about the program state with the rest of the analyzer,
    but they should not disrupt the analysis unnecessarily:
  <ul>
    <li>If a checker splits program state, this must be based on knowledge that
        the newly appearing branches are definitely possible and worth exploring
        from the user's perspective. Otherwise the state split should be delayed
        until there's an indication that one of the paths is taken, or one of the
        paths needs to be dropped entirely. For example, it is fine to eagerly split
        paths while modeling <tt>isalpha(x)</tt> as long as <tt>x</tt> is constrained accordingly on
        each path. At the same time, it is not a good idea to split paths over the
        return value of <tt>printf()</tt> while modeling the call because nobody ever checks
        for errors in <tt>printf</tt>; at best, it'd just double the remaining analysis time.
    </li>
    <li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt>
        because it generates an independent transition, much like <tt>addTransition</tt>.
        It is easy to accidentally split paths while using it. Ideally, try to
        structure the code so that it is obvious that every <tt>addTransition</tt> or
        <tt>generateNonFatalErrorNode</tt> (or sequence of such if the split is intended) is
        immediately followed by a return from the checker callback.</li>
    <li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li>
    <li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state
        for either the true assumption or the false assumption (or both).</li>
    <li>Checkers shall not mutate values of expressions, i.e. use the <tt>ProgramState::BindExpr</tt> API,
        unless they are fully responsible for computing the value.
        Under no circumstances should they change non-<tt>Unknown</tt> values of expressions.
        Currently the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback.
        If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li>
  </ul></li>
</ul>
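<p>The advice above about keeping <tt>-analyzer-checker=core</tt> in every test
<tt>RUN:</tt> line can be checked mechanically. The following is only a
heuristic sketch: the helper name <tt>files_missing_core</tt> is hypothetical,
it only recognizes the <tt>-analyzer-checker=...</tt> spelling, and it treats
<tt>RUN:</tt> lines joined by a trailing backslash as one invocation.</p>

```shell
# Print each given test file that has an analyzer invocation in its RUN: lines
# which enables checkers but does not enable the whole "core" package.
files_missing_core() {
  for file in "$@"; do
    # 1. Keep only the text after "RUN:".
    # 2. Join lines that end with a backslash into a single invocation.
    # 3. Keep invocations that pass -analyzer-checker=...
    # 4. Succeed if any of them lacks "core" as a complete list item.
    if sed -n 's/^.*RUN:[[:space:]]*//p' "$file" |
       sed -e ':a' -e '/\\$/N; s/\\\n//; ta' |
       grep -e '-analyzer-checker=' |
       grep -qvE -e '-analyzer-checker=([^[:space:]]*,)?core(,|[[:space:]]|$)'
    then
      printf '%s\n' "$file"
    fi
  done
}
```

<p>Assuming an llvm-project checkout, it could be run as
<tt>files_missing_core clang/test/Analysis/*.c</tt>. Files that it prints are
only candidates for a closer look, since an invocation may enable the core
checks in a way this pattern does not recognize.</p>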
782 | |
783 | <h2 id=additioninformation>Additional Sources of Information</h2> |
784 | |
<p>Here are some additional resources that are useful when working on the Clang
Static Analyzer:</p>
787 | |
788 | <ul> |
<li><a href="http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf">Xu, Zhongxing &amp;
Kremenek, Ted &amp; Zhang, Jian. (2010). A Memory Model for Static Analysis of C
Programs.</a></li>
792 | <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/lib/StaticAnalyzer/README.txt"> |
793 | The Clang Static Analyzer README</a></li> |
794 | <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/RegionStore.txt"> |
795 | Documentation for how the Store works</a></li> |
796 | <li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/IPA.txt"> |
797 | Documentation about inlining</a></li> |
798 | <li> The "Building a Checker in 24 hours" presentation given at the <a |
799 | href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's |
800 | meeting</a>. Describes the construction of SimpleStreamChecker. <a |
801 | href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a> |
802 | and <a |
803 | href="https://youtu.be/kdxlsP5QVPw">video</a> |
804 | are available.</li> |
805 | <li> |
806 | <a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf"> |
Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide
808 | </a> (reading the previous items first might be a good idea)</li> |
809 | <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li> |
810 | <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains |
811 | up-to-date documentation about the APIs available in Clang. Relevant entries |
812 | have been linked throughout this page. Also of use is the |
813 | <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes |
814 | from LLVM.</li> |
815 | <li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev"> |
816 | cfe-dev mailing list</a>. This is the primary mailing list used for |
817 | discussion of Clang development (including static code analysis). The |
818 | <a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains |
819 | a lot of information.</li> |
820 | </ul> |
821 | |
822 | </div> |
823 | </div> |
824 | </body> |
825 | </html> |
826 | |