checker_dev_manual.html source code [clang_source_code/www/analyzer/checker_dev

1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2	"http://www.w3.org/TR/html4/strict.dtd">
3	<html>
4	<head>
5	<title>Checker Developer Manual</title>
6	<link type="text/css" rel="stylesheet" href="menu.css">
7	<link type="text/css" rel="stylesheet" href="content.css">
8	<script type="text/javascript" src="scripts/menu.js"></script>
9	</head>
10	<body>
11
12	<div id="page">
13	<!--#include virtual="menu.html.incl"-->
14
15	<div id="content">
16
17	<h3 style="color:red">This Page Is Under Construction</h3>
18
19	<h1>Checker Developer Manual</h1>
20
21	<p>The static analyzer engine performs path-sensitive exploration of the program and
22	relies on a set of checkers to implement the logic for detecting and
23	constructing specific bug reports. Anyone who is interested in implementing their own
24	checker should check out the Building a Checker in 24 Hours talk
25	(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
26	<a href="https://youtu.be/kdxlsP5QVPw">video</a>)
27	and refer to this page for additional information on writing a checker. The static analyzer is a
28	part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
29	and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
30	for developer guidelines, and send your questions and proposals to the
31	<a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
32	</p>
33
34	<ul>
35	<li><a href="#start">Getting Started</a></li>
36	<li><a href="#analyzer">Static Analyzer Overview</a>
37	<ul>
38	<li><a href="#interaction">Interaction with Checkers</a></li>
39	<li><a href="#values">Representing Values</a></li>
40	</ul></li>
41	<li><a href="#idea">Idea for a Checker</a></li>
42	<li><a href="#registration">Checker Registration</a></li>
43	<li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
44	<li><a href="#extendingstates">Custom Program States</a></li>
45	<li><a href="#bugs">Bug Reports</a></li>
46	<li><a href="#ast">AST Visitors</a></li>
47	<li><a href="#testing">Testing</a></li>
48	<li><a href="#commands">Useful Commands/Debugging Hints</a>
49	<ul>
50	<li><a href="#attaching">Attaching the Debugger</a></li>
51	<li><a href="#narrowing">Narrowing Down the Problem</a></li>
52	<li><a href="#visualizing">Visualizing the Analysis</a></li>
53	<li><a href="#debugprints">Debug Prints and Tricks</a></li>
54	</ul></li>
55	<li><a href="#additioninformation">Additional Sources of Information</a></li>
56	<li><a href="#links">Useful Links</a></li>
57	</ul>
58
59	<h2 id=start>Getting Started</h2>
60	<ul>
61	<li>To check out the source code and build the project, follow steps 1-4 of
62	the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
63	page.</li>
64
65	<li>The analyzer source code is located under the Clang source tree:
66	<br><tt>
67	$ <b>cd llvm/tools/clang</b>
68	</tt>
69	<br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
70	<tt>test/Analysis</tt>.</li>
71
72	<li>The analyzer regression tests can be executed from Clang's build
73	directory:
74	<br><tt>
75	$ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
76	</tt></li>
77
78	<li>Analyze a file with the specified checker:
79	<br><tt>
80	$ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
81	</tt></li>
82
83	<li>List the available checkers:
84	<br><tt>
85	$ <b>clang -cc1 -analyzer-checker-help</b>
86	</tt></li>
87
88	<li>See the analyzer help for different output formats, fine tuning, and
89	debug options:
90	<br><tt>
91	$ <b>clang -cc1 -help | grep "analyzer"</b>
92	</tt></li>
93
94	</ul>
95
96	<h2 id=analyzer>Static Analyzer Overview</h2>
97	The analyzer core performs symbolic execution of the given program. All the
98	input values are represented with symbolic values; further, the engine deduces
99	the values of all the expressions in the program based on the input symbols
100	and the path. The execution is path sensitive and every possible path through
101	the program is explored. The explored execution traces are represented with an
102	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
103	Each node of the graph is an
104	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
105	which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
106	<p>
107	<a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
108	represents the corresponding location in the program (or the CFG).
109	<tt>ProgramPoint</tt> is also used to record additional information on
110	when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt>
111	kind means that the state is the result of purging dead symbols - the
112	analyzer's equivalent of garbage collection.
113	<p>
114	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
115	represents the abstract state of the program. It consists of:
116	<ul>
117	<li><tt>Environment</tt> - a mapping from source code expressions to symbolic
118	values
119	<li><tt>Store</tt> - a mapping from memory locations to symbolic values
120	<li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any checker-defined state
121	</ul>
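<p>
The relationship between nodes, program points, and states can be sketched with a
small standalone model. This is only an illustration: the <tt>Toy*</tt> types below
are hypothetical stand-ins, not the analyzer's classes, and strings replace the real
expression, region, and value types. It shows one point that is easy to miss: the
same program point reached with two different states yields two distinct nodes.
</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>

// Toy stand-ins for illustration only; the real classes live in clang::ento.
struct ToyProgramPoint {
  std::string Location; // e.g. "test.c:3"
  std::string Kind;     // e.g. "PostStmt"
  bool operator<(const ToyProgramPoint &O) const {
    return std::tie(Location, Kind) < std::tie(O.Location, O.Kind);
  }
};

struct ToyProgramState {
  std::map<std::string, std::string> Environment; // expression -> value
  std::map<std::string, std::string> Store;       // memory location -> value
  std::map<std::string, std::string> GenericData; // constraints, checker data
  bool operator<(const ToyProgramState &O) const {
    return std::tie(Environment, Store, GenericData) <
           std::tie(O.Environment, O.Store, O.GenericData);
  }
};

// A node is identified by the pair (program point, program state).
typedef std::pair<ToyProgramPoint, ToyProgramState> ToyExplodedNode;

struct ToyExplodedGraph {
  std::set<ToyExplodedNode> Nodes;

  // Returns true if the node is new, false if it was already in the graph.
  bool addNode(const ToyProgramPoint &P, const ToyProgramState &S) {
    return Nodes.insert(std::make_pair(P, S)).second;
  }
};
```

<p>
Adding the same point twice with different <tt>Store</tt> contents creates two nodes
in this model, while adding an identical (point, state) pair a second time is a no-op.
</p>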
122
123	<h3 id=interaction>Interaction with Checkers</h3>
124
125	<p>
126	Checkers are not merely passive receivers of the analyzer core changes - they
127	actively participate in the <tt>ProgramState</tt> construction through the
128	<tt>GenericDataMap</tt> which can be used to store the checker-defined part
129	of the state. Each time the analyzer engine explores a new statement, it
130	notifies each checker registered to listen for that statement, giving it an
131	opportunity to either report a bug or modify the state. (As a rule of thumb,
132	the checker itself should be stateless.) The checkers are called one after another
133	in the predefined order; thus, calling all the checkers adds a chain to the
134	<tt>ExplodedGraph</tt>.
135	</p>
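<p>
As a rough sketch of the chain described above, assume a checker is modeled as a
stateless function from an input state to an output state. The names
<tt>ToyState</tt>, <tt>ToyChecker</tt>, and <tt>runCheckers</tt> are hypothetical and
exist only for this example; the real engine creates <tt>ExplodedNode</tt>s rather
than copying maps.
</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model: the "state" is a map, and a checker is a stateless function
// that receives a state and returns a (possibly updated) state.
typedef std::map<std::string, int> ToyState;
typedef std::function<ToyState(const ToyState &)> ToyChecker;

// Calls the checkers one after another in the given order. Every call
// appends one state to the chain, so N checkers produce N + 1 entries.
std::vector<ToyState> runCheckers(const ToyState &Initial,
                                  const std::vector<ToyChecker> &Checkers) {
  std::vector<ToyState> Chain;
  Chain.push_back(Initial);
  for (const ToyChecker &Check : Checkers) {
    ToyState Next = Check(Chain.back()); // the checker sees its predecessor
    Chain.push_back(Next);
  }
  return Chain;
}
```

<p>
Because each checker only sees the state it is handed and keeps nothing in its own
members, the result does not depend on which path the engine explored earlier.
</p>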
136
137	<h3 id=values>Representing Values</h3>
138
139	<p>
140	During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
141	objects are used to represent the semantic evaluation of expressions.
142	They can represent things like concrete
143	integers, symbolic values, or memory locations (which are memory regions).
144	They are a discriminated union of "values", symbolic and otherwise.
145	If a value isn't symbolic, usually that means there is no symbolic
146	information to track. For example, if the value was an integer, such as
147	<tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
148	and the checker doesn't usually need to track any state with the concrete
149	number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be
150	a symbolic value. This happens when the analyzer cannot reason about something
151	(yet). An example is floating point numbers. In such cases, the
152	<tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
153	This represents a case that is outside the realm of the analyzer's reasoning
154	capabilities. <tt>SVals</tt> are value objects and their values can be viewed
155	using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
156	symbols or regions.
157	</p>
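<p>
The "discriminated union" idea can be illustrated with a small tagged struct. This is
a simplified model, not the layout of the real <tt>SVal</tt>: <tt>ToySVal</tt>, its
factory functions, and <tt>hasSymbolicInfo</tt> are hypothetical names, and the text
produced by <tt>dump()</tt> here does not match the analyzer's output format.
</p>

```cpp
#include <cassert>
#include <string>

// Toy discriminated union: a kind tag plus the payload for that kind.
struct ToySVal {
  enum Kind { Unknown, ConcreteInt, Symbol, Loc };
  Kind K;
  long Value;       // meaningful only for ConcreteInt
  std::string Name; // symbol name (Symbol) or region name (Loc)

  static ToySVal make(Kind TheKind, long Value, const std::string &Name) {
    ToySVal V;
    V.K = TheKind;
    V.Value = Value;
    V.Name = Name;
    return V;
  }
  static ToySVal unknown() { return make(Unknown, 0, ""); }
  static ToySVal concreteInt(long V) { return make(ConcreteInt, V, ""); }
  static ToySVal symbol(const std::string &N) { return make(Symbol, 0, N); }
  static ToySVal loc(const std::string &Region) { return make(Loc, 0, Region); }

  // Renders the value as text, in the spirit of SVal's dump() method.
  std::string dump() const {
    switch (K) {
    case ConcreteInt: return std::to_string(Value);
    case Symbol:      return Name;
    case Loc:         return "&" + Name;
    case Unknown:     break;
    }
    return "Unknown";
  }
};

// A checker usually has nothing to track for concrete or unknown values.
bool hasSymbolicInfo(const ToySVal &V) {
  return V.K == ToySVal::Symbol || V.K == ToySVal::Loc;
}
```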
158
159	<p>
160	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
161	is meant to represent an abstract, but named, symbolic value. Symbols represent
162	an actual (immutable) value. We might not know what its specific value is, but
163	we can associate constraints with that value as we analyze a path. For
164	example, we might record that the value of a symbol is greater than
165	<tt>0</tt>, etc.
166	</p>
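<p>
To make "associating constraints with a symbol" concrete, here is a minimal sketch
that assumes each symbol is constrained to a single closed integer range. The helper
names (<tt>ToyRange</tt>, <tt>assumeGreaterThan</tt>, <tt>assumeAtMost</tt>) are
invented for this example and are not the analyzer's constraint manager API. An
assumption either narrows the range and produces a new constraint set, or is found
to be infeasible on this path.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <string>

struct ToyRange { int Lo; int Hi; }; // closed interval [Lo, Hi]
typedef std::map<std::string, ToyRange> ToyConstraints;

// An unconstrained symbol may hold any int value.
ToyRange rangeOf(const ToyConstraints &C, const std::string &Sym) {
  ToyConstraints::const_iterator It = C.find(Sym);
  if (It != C.end())
    return It->second;
  ToyRange Full = {INT_MIN, INT_MAX};
  return Full;
}

// Records "Sym > Bound". Returns false (leaving Out untouched) if that
// contradicts what is already known; the input set is never modified.
bool assumeGreaterThan(const ToyConstraints &In, const std::string &Sym,
                       int Bound, ToyConstraints &Out) {
  if (Bound == INT_MAX)
    return false; // nothing is greater than INT_MAX
  ToyRange R = rangeOf(In, Sym);
  R.Lo = std::max(R.Lo, Bound + 1);
  if (R.Lo > R.Hi)
    return false;
  Out = In;
  Out[Sym] = R;
  return true;
}

// Records "Sym <= Bound" with the same conventions as above.
bool assumeAtMost(const ToyConstraints &In, const std::string &Sym,
                  int Bound, ToyConstraints &Out) {
  ToyRange R = rangeOf(In, Sym);
  R.Hi = std::min(R.Hi, Bound);
  if (R.Lo > R.Hi)
    return false;
  Out = In;
  Out[Sym] = R;
  return true;
}
```

<p>
After recording that <tt>$0</tt> is greater than <tt>0</tt>, a later assumption that
<tt>$0</tt> is at most <tt>0</tt> fails in this model, which corresponds to a path
that cannot be taken.
</p>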
167
168	<p>
169	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
170	It is used to provide a lexicon of how to describe abstract memory. Regions can
171	layer on top of other regions, providing a layered approach to representing memory.
172	For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
173	but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
174	be used to represent the memory associated with a specific field of that object.
175	So how do we represent symbolic memory regions? That's what
176	<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
177	is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
178	symbol is unique and has a unique name, that symbol names the region.
179	</p>
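<p>
The layering of regions can be pictured as a chain of super-region pointers. The
sketch below is a toy model under that assumption: <tt>ToyRegion</tt>,
<tt>makeRegion</tt>, <tt>baseRegion</tt>, and <tt>describe</tt> are hypothetical
names, and the strings it prints are only loosely inspired by the analyzer's dumps.
</p>

```cpp
#include <cassert>
#include <string>

// Toy region: a kind, a name, and an optional enclosing (super) region.
struct ToyRegion {
  enum Kind { Var, Field, Symbolic };
  Kind K;
  std::string Name;       // variable, field, or symbol name
  const ToyRegion *Super; // enclosing region, or null for a base region
};

ToyRegion makeRegion(ToyRegion::Kind K, const std::string &Name,
                     const ToyRegion *Super) {
  ToyRegion R;
  R.K = K;
  R.Name = Name;
  R.Super = Super;
  return R;
}

// Walks up the layers until the outermost region is reached.
const ToyRegion *baseRegion(const ToyRegion *R) {
  while (R->Super)
    R = R->Super;
  return R;
}

// A field is described relative to the region it is layered on; a
// symbolic region is named by its symbol.
std::string describe(const ToyRegion &R) {
  switch (R.K) {
  case ToyRegion::Symbolic:
    return "SymRegion{" + R.Name + "}";
  case ToyRegion::Field:
    if (R.Super)
      return describe(*R.Super) + "." + R.Name;
    break;
  case ToyRegion::Var:
    break;
  }
  return R.Name;
}
```

<p>
A field layered on a variable <tt>s</tt> is described as <tt>s.x</tt> here, and the
same field layered on a symbolic region is described in terms of the symbol that
names that region.
</p>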
180
181	<p>
182	Let's see how the analyzer processes the expressions in the following example:
183	</p>
184
185	<p>
186	<pre class="code_example">
187	int foo(int x) {
188	int y = x * 2;
189	int z = x;
190	...
191	}
192	</pre>
193	</p>
194
195	<p>
196	Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
197	we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in
198	this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
199	Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
200	which references the value <b>currently bound</b> to <tt>x</tt>. That value is
201	symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
202	Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
203	and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
204	we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
205	and create a new <tt>SVal</tt> that represents their multiplication (which in
206	this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
207	evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
208	and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
209	to the <tt>MemRegion</tt> in the symbolic store.
210	<br>
211	The second line is similar. When we evaluate <tt>x</tt> again, we do the same
212	dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt>
213	might reference the same underlying values.
214	</p>
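<p>
The walk-through above can be replayed with a toy evaluator. It is a deliberately
simplified model: <tt>ToyEngine</tt> and its methods are hypothetical, values are
plain strings, and every product gets a fresh symbol name, which mirrors the
<tt>$1</tt> naming used in the text rather than the analyzer's internal
representation.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct ToyEngine {
  std::map<std::string, std::string> Store;       // region name -> bound value
  std::map<std::string, std::string> Definitions; // symbol -> expression
  int NextSymbol;

  ToyEngine() : NextSymbol(0) {}

  std::string freshSymbol() { return "$" + std::to_string(NextSymbol++); }

  // Lvalue-to-rvalue conversion: returns the value currently bound to the
  // region. A region that was never written (such as a parameter at
  // function entry) is bound to a fresh symbol on first use.
  std::string load(const std::string &Region) {
    std::map<std::string, std::string>::iterator It = Store.find(Region);
    if (It == Store.end())
      It = Store.insert(std::make_pair(Region, freshSymbol())).first;
    return It->second;
  }

  // Builds a new symbolic value that stands for LHS * RHS.
  std::string multiply(const std::string &LHS, const std::string &RHS) {
    std::string Sym = freshSymbol();
    Definitions[Sym] = LHS + " * " + RHS;
    return Sym;
  }

  // Binds a value to a region in the store (an assignment).
  void bind(const std::string &Region, const std::string &Value) {
    Store[Region] = Value;
  }
};
```

<p>
Evaluating <tt>int y = x * 2;</tt> followed by <tt>int z = x;</tt> with this model
binds <tt>y</tt> to <tt>$1</tt> and <tt>z</tt> to <tt>$0</tt>, so loading <tt>x</tt>
twice yields two values that reference the same symbol.
</p>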
215
216	<p>
217	To summarize, MemRegions are unique names for blocks of memory. Symbols are
218	unique names for abstract symbolic values. Some MemRegions represent abstract
219	symbolic chunks of memory, and thus are also based on symbols. SVals are just
220	references to values, and can reference either MemRegions, Symbols, or concrete
221	values (e.g., the number 1).
222	</p>
223
224	<!--
225	TODO: Add a picture.
226	<br>
227	Symbols<br>
228	FunctionalObjects are used throughout.
229	-->
230
231	<h2 id=idea>Idea for a Checker</h2>
232	Here are several questions which you should consider when evaluating your
233	checker idea:
234	<ul>
235	<li>Can the check be effectively implemented without path-sensitive
236	analysis? See <a href="#ast">AST Visitors</a>.</li>
237
238	<li>How high is the false positive rate going to be? Looking at the occurrences
239	of the issue you want to write a checker for in the existing code bases might
240	give you some ideas. </li>
241
242	<li>How will the current limitations of the analysis affect the false alarm
243	rate? Currently, the analyzer only reasons about one procedure at a time (no
244	inter-procedural analysis). Also, it uses a simple range tracking based
245	solver to model symbolic execution.</li>
246
247	<li>Consult the <a
248	href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
249	to get some ideas for new checkers and consider starting with improving/fixing
250	bugs in the existing checkers.</li>
251	</ul>
252
253	<p>Once an idea for a checker has been chosen, there are two key decisions that
254	need to be made:
255	<ul>
256	<li> Which events the checker should be tracking. This is discussed in more
257	detail in the section <a href="#events_callbacks">Events, Callbacks, and
258	Checker Class Structure</a>.
259	<li> What checker-specific data needs to be stored as part of the program
260	state (if any). This should be minimized as much as possible. More detail about
261	implementing custom program state is given in section <a
262	href="#extendingstates">Custom Program States</a>.
263	</ul>
264
265
266	<h2 id=registration>Checker Registration</h2>
267	All checker implementation files are located in the
268	<tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
269	how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
270	stream APIs, was registered with the analyzer.
271	Similar steps should be followed for a new checker.
272	<ol>
273	<li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
274	created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
275	<li>The following registration code was added to the implementation file:
276	<pre class="code_example">
277	void ento::registerSimpleStreamChecker(CheckerManager &mgr) {
278	mgr.registerChecker<SimpleStreamChecker>();
279	}
280	</pre>
281	<li>A package was selected for the checker and the checker was defined in the
282	table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
283	Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
284	performs UNIX API checks, the correct package is "alpha.unix", and the following
285	was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
286	<pre class="code_example">
287	let ParentPackage = UnixAlpha in {
288	...
289	def SimpleStreamChecker : Checker<"SimpleStream">,
290	HelpText<"Check for misuses of stream APIs">,
291	DescFile<"SimpleStreamChecker.cpp">;
292	...
293	} // end "alpha.unix"
294	</pre>
295
296	<li>The source code file was made visible to CMake by adding it to
297	<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
298
299	</ol>
300
301	After adding a new checker to the analyzer, one can verify that the new checker
302	was successfully added by seeing if it appears in the list of available checkers:
303	<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
304
305	<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>
306
307	<p> All checkers inherit from the <tt><a
308	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
309	Checker</a></tt> template class; the template parameter(s) describe the type of
310	events that the checker is interested in processing. The various types of events
311	that are available are described in the file <a
312	href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
313	CheckerDocumentation.cpp</a>.
314
315	<p> For each event type requested, a corresponding callback function must be
316	defined in the checker class (<a
317	href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
318	CheckerDocumentation.cpp</a> shows the
319	correct function name and signature for each event type).
320
321	<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
322	take action at the following times:
323
324	<ul>
325	<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
326	If so, check the parameter being passed.
327	<li>After making a function call, check if the function is <tt>fopen</tt>. If
328	so, process the return value.
329	<li>When values go out of scope, check whether they are still-open file
330	descriptors, and report a bug if so. In addition, remove any information about
331	them from the program state in order to keep the state as small as possible.
332	<li>When file pointers "escape" (are used in a way that the analyzer can no longer
333	track them), mark them as such. This prevents false positives in the cases where
334	the analyzer cannot be sure whether the file was closed or not.
335	</ul>
336
337	<p>The events that will be used for these actions are, respectively, <a
338	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
339	<a
340	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
341	<a
342	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
343	and <a
344	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
345	The high-level structure of the checker's class is thus:
346
347	<pre class="code_example">
348	class SimpleStreamChecker : public Checker<check::PreCall,
349	check::PostCall,
350	check::DeadSymbols,
351	check::PointerEscape> {
352	public:
353
354	void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
355
356	void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
357
358	void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;
359
360	ProgramStateRef checkPointerEscape(ProgramStateRef State,
361	const InvalidatedSymbols &Escaped,
362	const CallEvent *Call,
363	PointerEscapeKind Kind) const;
364	};
365	</pre>
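<p>
The bookkeeping behind these four callbacks can be sketched without any analyzer
headers. The functions below are a toy model, not the code of
<tt>SimpleStreamChecker</tt>: symbols are strings, the tracked data is a plain map
passed by value, and the bug messages are made up for the example. Each function
corresponds to one of the events listed above.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum ToyStreamState { Opened, Closed };
typedef std::map<std::string, ToyStreamState> ToyStreamMap;

// PostCall on fopen: start tracking the returned stream symbol.
ToyStreamMap onOpen(ToyStreamMap S, const std::string &Sym) {
  S[Sym] = Opened;
  return S;
}

// PreCall on fclose: closing an already closed stream is a bug.
ToyStreamMap onClose(ToyStreamMap S, const std::string &Sym,
                     std::vector<std::string> &Bugs) {
  ToyStreamMap::const_iterator It = S.find(Sym);
  if (It != S.end() && It->second == Closed)
    Bugs.push_back("double close: " + Sym);
  S[Sym] = Closed;
  return S;
}

// DeadSymbols: a stream that dies while still open is a leak. The entry
// is removed either way to keep the state small.
ToyStreamMap onDeadSymbol(ToyStreamMap S, const std::string &Sym,
                          std::vector<std::string> &Bugs) {
  ToyStreamMap::const_iterator It = S.find(Sym);
  if (It != S.end() && It->second == Opened)
    Bugs.push_back("leak: " + Sym);
  S.erase(Sym);
  return S;
}

// PointerEscape: stop tracking, since the stream may be closed elsewhere.
ToyStreamMap onEscape(ToyStreamMap S, const std::string &Sym) {
  S.erase(Sym);
  return S;
}
```

<p>
Note that every function returns an updated copy of the map instead of storing
anything in a checker object, which is the shape the real callbacks have to follow
as well.
</p>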
366
367	<h2 id=extendingstates>Custom Program States</h2>
368
369	<p> Checkers often need to keep track of information specific to the checks they
370	perform. However, since checkers have no guarantee about the order in which the
371	program will be explored, or even that all possible paths will be explored, this
372	state information cannot be kept within individual checkers. Therefore, if
373	checkers need to store custom information, they need to add new categories of
374	data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
375	several macros designed for this purpose. They are:
376
377	<ul>
378	<li><a
379	href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
380	Used when the state information is a single value. The methods available for
381	state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
382	<tt>remove</tt>.
383	<li><a
384	href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
385	Used when the state information is a list of values. The methods available for
386	state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
387	<tt>remove</tt>, and <tt>contains</tt>.
388	<li><a
389	href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
390	Used when the state information is a set of values. The methods available for
391	state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
392	<tt>remove</tt>, and <tt>contains</tt>.
393	<li><a
394	href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
395	Used when the state information is a map from a key to a value. The methods
396	available for state types declared with this macro are <tt>add</tt>,
397	<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
398	</ul>
399
400	<p>All of these macros take as parameters the name to be used for the custom
401	category of state information and the data type(s) to be used for storage. The
402	data type(s) specified will become the parameter type and/or return type of the
403	methods that manipulate the new category of state information. Each of these
404	methods is templated with the name of the custom data type.
405
406	<p>For example, a common case is the need to track data associated with a
407	symbolic expression; a map type is the most logical way to implement this. The
408	key for this map will be a pointer to a symbolic expression
409	(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
410	expression is an integer, then the custom category of state information would be
411	declared as
412
413	<pre class="code_example">
414	REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
415	</pre>
416
417	The data would be accessed with the function
418
419	<pre class="code_example">
420	ProgramStateRef state;
421	SymbolRef Sym;
422	...
423	const int *currentValue = state->get<ExampleDataType>(Sym); // null if Sym has no entry
424	</pre>
425
426	and set with the function
427
428	<pre class="code_example">
429	ProgramStateRef state;
430	SymbolRef Sym;
431	int newValue;
432	...
433	ProgramStateRef newState = state->set<ExampleDataType>(Sym, newValue);
434	</pre>
435
436	<p>In addition, the macros define a data type used for storing the data of the
437	new data category; the name of this type is the name of the data category with
438	"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
439	be the data type passed to the macro; for the other three macros, this will be a specialized
440	version of the <a
441	href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
442	<a
443	href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
444	or <a
445	href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
446	templated class. For the <tt>ExampleDataType</tt> example above, the type
447	created would be equivalent to writing the declaration:
448
449	<pre class="code_example">
450	typedef llvm::ImmutableMap<SymbolRef, int> ExampleDataTypeTy;
451	</pre>
452
453	<p>These macros will cover a majority of use cases; however, they still have a
454	few limitations. They cannot be used inside namespaces (since they expand to
455	contain top-level namespace references), and the data types that they define
456	cannot be referenced from more than one file.
457
458	<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
459	one, functions that modify the state will return a copy of the previous state
460	with the change applied. This updated state must then be provided to the
461	analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
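<p>
The immutability rule can be illustrated with a small persistent-style state class.
This is a sketch under simplifying assumptions: <tt>ToyState</tt>,
<tt>ToyStateRef</tt>, and <tt>initialState</tt> are hypothetical, the map is copied
on every update instead of sharing structure, and there is a single untemplated
<tt>get</tt>/<tt>set</tt> pair. What it does show is that <tt>set</tt> leaves the
original state untouched and hands back a new one.
</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ToyState;
typedef std::shared_ptr<const ToyState> ToyStateRef;

// Toy immutable state: every modification yields a new state object.
class ToyState {
  std::map<std::string, int> Data;

public:
  // Returns a pointer to the value bound to Sym, or null if there is none.
  const int *get(const std::string &Sym) const {
    std::map<std::string, int>::const_iterator It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Returns a copy of this state with Sym bound to Value.
  ToyStateRef set(const std::string &Sym, int Value) const {
    std::shared_ptr<ToyState> Next = std::make_shared<ToyState>(*this);
    Next->Data[Sym] = Value;
    return Next;
  }
};

ToyStateRef initialState() { return std::make_shared<ToyState>(); }
```

<p>
In a real checker the new state only takes effect once it is handed back to the
engine; simply computing it, as this model does, changes nothing about the path
being explored.
</p>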
462	<h2 id=bugs>Bug Reports</h2>
463
464
465	<p> When a checker detects a mistake in the analyzed code, it needs a way to
466	report it to the analyzer core so that it can be displayed. The two classes used
467	to construct this report are <tt><a
468	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
469	and <tt><a
470	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
471	BugReport</a></tt>.
472
473	<p>
474	<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
475	constructor for <tt>BugType</tt> takes two parameters: the name of the bug
476	type, and the name of the category of the bug. These are used, for example, in the
477	summary page generated by the scan-build tool.
478
479	<p>
480	The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
481	the most common case, three parameters are used to form a <tt>BugReport</tt>:
482	<ol>
483	<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
484	<li>A short descriptive string. This is placed at the location of the bug in
485	the detailed line-by-line output generated by scan-build.
486	<li>The context in which the bug occurred. This includes both the location of
487	the bug in the program and the program's state when the location is reached. These are
488	both encapsulated in an <tt>ExplodedNode</tt>.
489	</ol>
490
491	<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
492	as to whether or not analysis can continue along the current path. This decision
493	is based on whether the detected bug is one that would prevent the program under
494	analysis from continuing. For example, leaking of a resource should not stop
495	analysis, as the program can continue to run after the leak. Dereferencing a
496	null pointer, on the other hand, should stop analysis, as there is no way for
497	the program to meaningfully continue after such an error.
498
499	<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
500	generated by the checker can be passed to the <tt>BugReport</tt> constructor
501	without additional modification. This <tt>ExplodedNode</tt> will be the one
502	returned by the most recent call to <a
503	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
504	If no transition has been performed during the current callback, the checker should call <a
505	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
506	and use the returned node for bug reporting.
507
508	<p>If analysis cannot continue, then the current state should be transitioned
509	into a so-called <i>sink node</i>, a node from which no further analysis will be
510	performed. This is done by calling the <a
511	href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
512	CheckerContext::generateSink</a> function; this function is the same as the
513	<tt>addTransition</tt> function, but marks the state as a sink node. Like
514	<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
515	state, which can then be passed to the <tt>BugReport</tt> constructor.
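<p>
The difference between an ordinary transition and a sink can be modeled in a few
lines. The class below is a toy, not <tt>CheckerContext</tt>: the method names are
borrowed from the text for readability, nodes are identified by their index, and
<tt>statesToExplore</tt> is a hypothetical helper that stands in for the engine's
worklist.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyNode {
  std::string State;
  bool IsSink; // true: no further analysis continues from this node
};

struct ToyReport {
  std::string Message;
  std::size_t Node; // index of the node the report is attached to
};

class ToyContext {
  std::size_t addNode(const std::string &State, bool IsSink) {
    ToyNode N = {State, IsSink};
    Nodes.push_back(N);
    return Nodes.size() - 1;
  }

public:
  std::vector<ToyNode> Nodes;
  std::vector<ToyReport> Reports;

  // Analysis may continue from the returned node.
  std::size_t addTransition(const std::string &State) {
    return addNode(State, false);
  }

  // Same as addTransition, but the path ends at the returned node.
  std::size_t generateSink(const std::string &State) {
    return addNode(State, true);
  }

  void emitReport(const std::string &Message, std::size_t Node) {
    ToyReport R = {Message, Node};
    Reports.push_back(R);
  }

  // States the engine would keep exploring after this callback.
  std::vector<std::string> statesToExplore() const {
    std::vector<std::string> Result;
    for (std::size_t I = 0; I < Nodes.size(); ++I)
      if (!Nodes[I].IsSink)
        Result.push_back(Nodes[I].State);
    return Result;
  }
};
```

<p>
In this model a leak report attached to an <tt>addTransition</tt> node leaves one
state to explore, whereas a null dereference report attached to a
<tt>generateSink</tt> node leaves none.
</p>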
516
517	<p>
518	After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
519	by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
520
521	<h2 id=ast>AST Visitors</h2>
522	Some checks might not require path-sensitivity to be effective. A simple AST walk
523	might be sufficient. If that is the case, consider implementing a Clang
524	compiler warning. On the other hand, a check might not be acceptable as a compiler
525	warning; for example, because of a relatively high false positive rate. In this
526	situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
527	<tt><b>checkASTCodeBody</b></tt> are your best friends.
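<p>
For intuition, a check that needs no path-sensitivity boils down to a recursive walk
over the syntax tree. The sketch below uses a made-up tree type
(<tt>ToyAstNode</tt>) and a made-up helper (<tt>countCallsTo</tt>) rather than
Clang's AST classes; it counts calls to a given function regardless of which paths
can actually reach them.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy syntax tree node, e.g. {"CallExpr", "gets", {}}.
struct ToyAstNode {
  std::string Kind;                 // node kind, such as "CallExpr"
  std::string Name;                 // callee name for calls, else empty
  std::vector<ToyAstNode> Children; // sub-statements and sub-expressions
};

// Visits every node once and counts the calls to Callee. No program state
// or path information is involved.
int countCallsTo(const ToyAstNode &Node, const std::string &Callee) {
  int Count = (Node.Kind == "CallExpr" && Node.Name == Callee) ? 1 : 0;
  for (std::size_t I = 0; I < Node.Children.size(); ++I)
    Count += countCallsTo(Node.Children[I], Callee);
  return Count;
}
```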
528
529	<h2 id=testing>Testing</h2>
530	Every patch should be well tested with Clang regression tests. The checker tests
531	live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
532	execute the following from the <tt>clang</tt> build directory:
533	<pre class="code">
534	$ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
535	</pre>
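<p>
A regression test is typically a small source file that carries its own run line and
the diagnostics it expects. The file below is a sketch of what such a test might look
like for <tt>core.DivideZero</tt>. It assumes the <tt>%clang_cc1</tt> lit
substitution and the <tt>-verify</tt> mode used by Clang's tests, and it assumes the
warning text shown in the comment; check existing files in <tt>test/Analysis</tt>
for the exact conventions of your checkout.
</p>

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core.DivideZero -verify %s

// On the branch where y is known to be zero, the division is expected to
// be reported. The other division must not produce a warning.
int divide(int x, int y) {
  if (y == 0)
    return x / y; // expected-warning{{Division by zero}}
  return x / y;
}
```

<p>
With <tt>-verify</tt>, the test fails both when an expected warning is missing and
when an unexpected one appears, so a single file covers the positive and the
negative case.
</p>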
536
537	<h2 id=commands>Useful Commands/Debugging Hints</h2>
538
539	<h3 id=attaching>Attaching the Debugger</h3>
540
541	<p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
542	debugger to it directly:</p>
543
544	<pre class="code">
545	$ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
546	$ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
547	</pre>
548
549	<p>
550	Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
551	the actual clang instance would be run in a separate process. In
552	order to debug it, use the <tt><b>-###</b></tt> flag for obtaining
553	the command line of the child process:
554	</p>
555
556	<pre class="code">
557	$ <b>clang --analyze test.c -###</b>
558	</pre>
559
560	<p>
561	Below we describe a few useful command line arguments, all of which assume that
562	you are running <tt><b>clang -cc1</b></tt>.
563	</p>
564
565	<h3 id=narrowing>Narrowing Down the Problem</h3>
566
567	<p>While investigating a checker-related issue, instruct the analyzer to only
568	execute a single checker:
569	</p>
570	<pre class="code">
571	$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
572	</pre>
573
574	<p>If the analyzer crashes while processing a large file, use the
575	<tt><b>-analyzer-display-progress</b></tt> option to see which function
576	is failing.</p>
577
578	<p>To selectively analyze only the given function, use the
579	<tt><b>-analyze-function</b></tt> option:</p>
580	<pre class="code">
581	$ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
582	ANALYZE (Syntax): test.c foo
583	ANALYZE (Syntax): test.c bar
584	ANALYZE (Path, Inline_Regular): test.c bar
585	ANALYZE (Path, Inline_Regular): test.c foo
586	$ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
587	ANALYZE (Syntax): test.c foo
588	ANALYZE (Path, Inline_Regular): test.c foo
589	</pre>
590
591	<b>Note: </b> a fully qualified function name has to be used when selecting
592	C++ functions and methods, Objective-C methods and blocks, e.g.:
593
594	<pre class="code">
595	$ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function=foo(int)</b>
596	</pre>
597
598	The fully qualified name can be found from the
599	<tt><b>-analyzer-display-progress</b></tt> output.
600
601	<p>The bug reporter mechanism removes path diagnostics inside intermediate
602	function calls that have returned by the time the bug was found and contain
603	no interesting pieces. Usually it is up to the checkers to produce more
604	interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
605	However, you can disable path pruning while debugging with the
606	<tt><b>-analyzer-config prune-paths=false</b></tt> option.
607
608	<h3 id=visualizing>Visualizing the Analysis</h3>
609
610	<p>To dump the AST, which often helps in understanding how the program should
611	behave:</p>
612	<pre class="code">
613	$ <b>clang -cc1 -ast-dump test.c</b>
614	</pre>
615
616	<p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
617	checkers:</p>
618	<pre class="code">
619	$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
620	</pre>
621
622	<p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
623	visualized with another debug checker:</p>
624	<pre class="code">
625	$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
626	</pre>
627	<p>Or, equivalently, with <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
628	option, which does the same thing - dumps the exploded graph in graphviz
629	<tt><b>.dot</b></tt> format.</p>
630
631	<p>You can convert <tt><b>.dot</b></tt> files into other formats - in
632	particular, converting to <tt><b>.svg</b></tt> and viewing in your web
633	browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
634	<pre class="code">
635	$ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
636	</pre>
637
638	<p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
639	leading to bug reports from the exploded graph dump. This is useful
640	because exploded graphs are often huge and hard to navigate.</p>
641
642	<p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
643	the analyzer's false positives, because it gives comprehensive information
644	on every decision made by the analyzer across all analysis paths.</p>
645
646	<p>There are more debug checkers available. To see all available debug checkers:
647	</p>
648	<pre class="code">
649	$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
650	</pre>
651
652	<h3 id=debugprints>Debug Prints and Tricks</h3>
653
654	<p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
655	that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p>
656	<pre class="code">
657	(gdb) <b>p ViewGraph(0)</b>
658	</pre>
659
660	<p>To see the <tt>ProgramState</tt> while debugging, use the following command:</p>
661	<pre class="code">
662	(gdb) <b>p State->dump()</b>
663	</pre>
664
665	<p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
666	pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
667	source code.</p>
668	<pre class="code">
669	(gdb) <b>p E->dump()</b>
670	</pre>
671
672	<p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
673	to:</p>
674	<pre class="code">
675	(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
676	</pre>
677
678	<h2 id=links>Making Your Checker Better</h2>
679	<ul>
680	<li>User-facing documentation is important for adoption! Make sure the <a href="/available_checks.html">checker list</a> is updated
681	on the homepage of the analyzer. Also ensure the description in <tt>Checkers.td</tt> is clear to
682	non-analyzer-developers.</li>
683	<li>Warning and note messages should be clear and easy to understand, even if a bit long.</li>
684	<ul>
685	<li>Messages should start with a capital letter (unlike Clang warnings!) and should not
686	end with <tt>.</tt>.</li>
687	<li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> ->
688	<tt>Dereference of null pointer</tt>.</li>
689	<li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning
690	to the user better. There are some existing visitors that might be useful for your check,
691	e.g. <tt>trackNullOrUndefValue</tt>. For example, SimpleStreamChecker should highlight
692	the event of opening the file when reporting a file descriptor leak.</li>
693	</ul>
694	<li>If the check tracks anything in the program state, it needs to implement the
695	<tt>checkDeadSymbols</tt> callback to clean the state up.</li>
696	<li>The check should conservatively assume that the program is correct when a tracked symbol
697	is passed to a function that is unknown to the analyzer.
698	The <tt>checkPointerEscape</tt> callback can help you handle that case.</li>
699	<li>Use safe and convenient APIs!</li>
700	<ul>
701	<li>Always use <tt>CheckerContext::generateErrorNode</tt> and
702	<tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports.
703	Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li>
704	<li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to
705	<tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li>
706	<li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li>
707	<li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li>
708	</ul>
709	<li>Common sources of crashes:</li>
710	<ul>
711	<li><tt>CallEvent::getOriginExpr</tt> is nullable - for example, it returns null for an
712	automatic destructor of a variable. The same applies to some values generated while the
713	call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li>
714	<li><tt>CallEvent::getDecl</tt> is nullable - for example, it returns null for a
715	call through a symbolic function pointer.</li>
716	<li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>,
717	<tt>generateErrorNode</tt> are nullable because you can transition to a node that you have already visited.</li>
718	<li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that
719	return arguments crash when the argument index is out of bounds. Checking the function name
720	does not guarantee that the function has the expected number of arguments,
721	which is why you should use <tt>CallDescription</tt>.</li>
722	<li>Nullability of different entities within different kinds of symbols and regions is usually
723	documented via assertions in their constructors.</li>
724	<li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a single token,
725	e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases.
726	Note that this method is much slower and should be used sparingly, e.g. only when generating reports
727	but not during analysis.</li>
728	<li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the analyzer
729	with the core checks disabled has never been supported. It might cause unexpected behavior and
730	crashes. You should do all your testing with the core checks enabled.</li>
731	</ul>
732	</ul>
733	<li>Patterns that you should most likely avoid even if they're not technically wrong:</li>
734	<ul>
735	<li><tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point
736	to decide when to emit a note. It is much easier to determine that by observing changes in
737	the program state.</li>
738	<li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt>
739	and the optional type argument is not specified, the checker may accidentally try to dereference a
740	void pointer.</li>
741	<li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>.
742	It should be immediately obvious whether the <tt>SVal</tt> is a <tt>Loc</tt> or a
743	<tt>NonLoc</tt> depending on the AST that is being checked. Checking whether a value
744	is <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> or whether the value is
745	<tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt> is totally fine.</li>
746	<li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>,
747	unless they are of <tt>SymbolMetadata</tt> class tagged by the checker,
748	or they represent newly created values such as the return value in <tt>evalCall</tt>.
749	For modeling arithmetic/bitwise/comparison operations, <tt>SValBuilder</tt> should be used.</li>
750	<li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually
751	no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li>
752	</ul>
753	<li>Checkers are encouraged to actively participate in the analysis by sharing
754	their knowledge about the program state with the rest of the analyzer,
755	but they should not disrupt the analysis unnecessarily:</li>
756	<ul>
757	<li>If a checker splits program state, this must be based on knowledge that
758	the newly appearing branches are definitely possible and worth exploring
759	from the user's perspective. Otherwise the state split should be delayed
760	until there's an indication that one of the paths is taken, or one of the
761	paths needs to be dropped entirely. For example, it is fine to eagerly split
762	paths while modeling <tt>isalpha(x)</tt> as long as <tt>x</tt> is constrained accordingly on
763	each path. At the same time, it is not a good idea to split paths over the
764	return value of <tt>printf()</tt> while modeling the call because nobody ever checks
765	for errors in <tt>printf</tt>; at best, it'd just double the remaining analysis time.
766	</li>
767	<li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt>
768	because it generates an independent transition, much like <tt>addTransition</tt>.
769	It is easy to accidentally split paths while using it. Ideally, try to
770	structure the code so that it is obvious that every <tt>addTransition</tt> or
771	<tt>generateNonFatalErrorNode</tt> (or sequence of such if the split is intended) is
772	immediately followed by a return from the checker callback.</li>
773	<li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li>
774	<li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state
775	for either the true assumption or the false assumption (or both).</li>
776	<li>Checkers shall not mutate values of expressions, i.e. use the <tt>ProgramState::BindExpr</tt> API,
777	unless they are fully responsible for computing the value.
778	Under no circumstances should they change non-<tt>Unknown</tt> values of expressions.
779	Currently the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback.
780	If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li>
781	</ul>
782
783	<h2 id=additioninformation>Additional Sources of Information</h2>
784
785	Here are some additional resources that are useful when working on the Clang
786	Static Analyzer:
787
788	<ul>
789	<li><a href="http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf">Xu, Zhongxing &amp;
790	Kremenek, Ted &amp; Zhang, Jian. (2010). A Memory Model for Static Analysis of C
791	Programs.</a></li>
792	<li><a href="https://github.com/llvm/llvm-project/blob/master/clang/lib/StaticAnalyzer/README.txt">
793	The Clang Static Analyzer README</a></li>
794	<li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/RegionStore.txt">
795	Documentation for how the Store works</a></li>
796	<li><a href="https://github.com/llvm/llvm-project/blob/master/clang/docs/analyzer/IPA.txt">
797	Documentation about inlining</a></li>
798	<li> The "Building a Checker in 24 hours" presentation given at the <a
799	href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
800	meeting</a>. Describes the construction of SimpleStreamChecker. <a
801	href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
802	and <a
803	href="https://youtu.be/kdxlsP5QVPw">video</a>
804	are available.</li>
805	<li>
806	<a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf">
807	Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide
808	</a> (reading the previous items first might be a good idea)</li>
809	<li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
810	<li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
811	up-to-date documentation about the APIs available in Clang. Relevant entries
812	have been linked throughout this page. Also of use is the
813	<a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
814	from LLVM.</li>
815	<li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">
816	cfe-dev mailing list</a>. This is the primary mailing list used for
817	discussion of Clang development (including static code analysis). The
818	<a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
819	a lot of information.</li>
820	</ul>
821
822	</div>
823	</div>
824	</body>
825	</html>
826
