README.txt source code [clang_source_code/lib/StaticAnalyzer/README.txt]

1	//===----------------------------------------------------------------------===//
2	// Clang Static Analyzer
3	//===----------------------------------------------------------------------===//
4
5	= Library Structure =
6
7	The analyzer library has two layers: a (low-level) static analysis
8	engine (GRExprEngine.cpp and friends), and some static checkers
9	(*Checker.cpp). The latter are built on top of the former via the
10	Checker and CheckerVisitor interfaces (Checker.h and
11	CheckerVisitor.h). The Checker interface is designed to be minimal
12	and simple for checker writers, and attempts to isolate them from much
13	of the gore of the internal analysis engine.
14
15	= How It Works =
16
17	The analyzer is inspired by several foundational research papers ([1],
18	[2]). (FIXME: kremenek to add more links)
19
20	In a nutshell, the analyzer is basically a source code simulator that
21	traces out possible paths of execution. The state of the program
22	(values of variables and expressions) is encapsulated by the state
23	(ProgramState). A location in the program is called a program point
24	(ProgramPoint), and the combination of state and program point is a
25	node in an exploded graph (ExplodedGraph). The term "exploded" comes
26	from exploding the control-flow edges in the control-flow graph (CFG).
27
28	Conceptually the analyzer does a reachability analysis through the
29	ExplodedGraph. We start at a root node, which has the entry program
30	point and initial state, and then simulate transitions by analyzing
31	individual expressions. The analysis of an expression can cause the
32	state to change, resulting in a new node in the ExplodedGraph with an
33	updated program point and an updated state. A bug is found by hitting
34	a node that satisfies some "bug condition" (basically a violation of a
35	checking invariant).
36
37	The analyzer traces out multiple paths by reasoning about branches and
38	then bifurcating the state: on the true branch the conditions of the
39	branch are assumed to be true and on the false branch the conditions
40	of the branch are assumed to be false. Such "assumptions" create
41	constraints on the values of the program, and those constraints are
42	recorded in the ProgramState object (and are manipulated by the
43	ConstraintManager). If assuming the conditions of a branch would
44	cause the constraints to be unsatisfiable, the branch is considered
45	infeasible and that path is not taken. This is how we get
46	path-sensitivity. We reduce exponential blow-up by caching nodes. If
47	a new node with the same state and program point as an existing node
48	would get generated, the path "caches out" and we simply reuse the
49	existing node. Thus the ExplodedGraph is not a DAG; it can contain
50	cycles as paths loop back onto each other and cache out.
51
52	ProgramState and ExplodedNodes are basically immutable once created. Once
53	one creates a ProgramState, you need to create a new one to get a new
54	ProgramState. This immutability is key since the ExplodedGraph represents
55	the behavior of the analyzed program from the entry point. To
56	represent these efficiently, we use functional data structures (e.g.,
57	ImmutableMaps) which share data between instances.
58
59	Finally, individual Checkers work by also manipulating the analysis
60	state. The analyzer engine talks to them via a visitor interface.
61	For example, the PreVisitCallExpr() method is called by GRExprEngine
62	to tell the Checker that we are about to analyze a CallExpr, and the
63	checker is asked to check for any preconditions that might not be
64	satisfied. The checker can do nothing, or it can generate a new
65	ProgramState and ExplodedNode which contains updated checker state. If it
66	finds a bug, it can tell the BugReporter object about the bug,
67	providing it an ExplodedNode which is the last node in the path that
68	triggered the problem.
69
70	= Notes about C++ =
71
72	Since now constructors are seen before the variable that is constructed
73	in the CFG, we create a temporary object as the destination region that
74	is constructed into. See ExprEngine::VisitCXXConstructExpr().
75
76	In ExprEngine::processCallExit(), we always bind the object region to the
77	evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
78	corresponding lazy compound value if the variable is not a reference, and
79	bind the variable region to the lazy compound value. If the variable
80	is a reference, just use the object region as the initializer value.
81
82	Before entering a C++ method (or ctor/dtor), the 'this' region is bound
83	to the object region. In ctors, we synthesize 'this' region with
84	CXXRecordDecl*, which means we do not use type qualifiers. In methods, we
85	synthesize 'this' region with CXXMethodDecl*, which has getThisType()
86	taking type qualifiers into account. It does not matter we use qualified
87	'this' region in one method and unqualified 'this' region in another
88	method, because we only need to ensure the 'this' region is consistent
89	when we synthesize it and create it directly from CXXThisExpr in a single
90	method call.
91
92	= Working on the Analyzer =
93
94	If you are interested in bringing up support for C++ expressions, the
95	best place to look is the visitation logic in GRExprEngine, which
96	handles the simulation of individual expressions. There are plenty of
97	examples there of how other expressions are handled.
98
99	If you are interested in writing checkers, look at the Checker and
100	CheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look
101	at the files named *Checker.cpp for examples on how you can implement
102	these interfaces.
103
104	= Debugging the Analyzer =
105
106	There are some useful command-line options for debugging. For example:
107
108	$ clang -cc1 -help \| grep analyze
109	-analyze-function <value>
110	-analyzer-display-progress
111	-analyzer-viz-egraph-graphviz
112	...
113
114	The first allows you to specify only analyzing a specific function.
115	The second prints to the console what function is being analyzed. The
116	third generates a graphviz dot file of the ExplodedGraph. This is
117	extremely useful when debugging the analyzer and viewing the
118	simulation results.
119
120	Of course, viewing the CFG (Control-Flow Graph) is also useful:
121
122	$ clang -cc1 -help \| grep cfg
123	-cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
124	-cfg-add-initializers Add C++ initializers to CFGs for all analyses
125	-cfg-dump Display Control-Flow Graphs
126	-cfg-view View Control-Flow Graphs using GraphViz
127	-unoptimized-cfg Generate unoptimized CFGs for all analyses
128
129	-cfg-dump dumps a textual representation of the CFG to the console,
130	and -cfg-view creates a GraphViz representation.
131
132	= References =
133
134	[1] Precise interprocedural dataflow analysis via graph reachability,
135	T Reps, S Horwitz, and M Sagiv, POPL '95,
136	http://portal.acm.org/citation.cfm?id=199462
137
138	[2] A memory model for static analysis of C programs, Z Xu, T
139	Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf
140

Clang Project