=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

LLVM has a simple code coverage instrumentation built in (SanitizerCoverage).
It inserts calls to user-defined functions at the function, basic-block, and
edge levels. Default implementations of those callbacks are provided and
implement simple coverage reporting and visualization; however, if you only
need coverage visualization, you may want to use
:doc:`SourceBasedCodeCoverage <SourceBasedCodeCoverage>` instead.

Tracing PCs with guards
=======================

With ``-fsanitize-coverage=trace-pc-guard`` the compiler will insert the following code
on every edge:

.. code-block:: none

  __sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge will have its own ``guard_variable`` (``uint32_t``).

The compiler will also insert calls to a module constructor:

.. code-block:: c++

  // The guards are [start, stop).
  // This function will be called at least once per DSO and may be called
  // more than once with the same values of start/stop.
  __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.

The functions ``__sanitizer_cov_trace_pc_*`` should be defined by the user.

Example:

.. code-block:: c++

  // trace-pc-guard-cb.cc
  #include <stdint.h>
  #include <stdio.h>
  #include <sanitizer/coverage_interface.h>

  // This callback is inserted by the compiler as a module constructor
  // into every DSO. 'start' and 'stop' correspond to the
  // beginning and end of the section with the guards for the entire
  // binary (executable or DSO). The callback will be called at least
  // once per DSO and may be called multiple times with the same parameters.
  extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                      uint32_t *stop) {
    static uint64_t N;  // Counter for the guards.
    if (start == stop || *start) return;  // Initialize only once.
    printf("INIT: %p %p\n", start, stop);
    for (uint32_t *x = start; x < stop; x++)
      *x = ++N;  // Guards should start from 1.
  }

  // This callback is inserted by the compiler on every edge in the
  // control flow (some optimizations apply).
  // Typically, the compiler will emit the code like this:
  //    if(*guard)
  //      __sanitizer_cov_trace_pc_guard(guard);
  // But for large functions it will emit a simple call:
  //    __sanitizer_cov_trace_pc_guard(guard);
  extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;  // Duplicate the guard check.
    // If you set *guard to 0 this code will not be called again for this edge.
    // Now you can get the PC and do whatever you want:
    //   store it somewhere or symbolize it and print right away.
    // The values of `*guard` are as you set them in
    // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
    // and use them to dereference an array or a bit vector.
    void *PC = __builtin_return_address(0);
    char PcDescr[1024];
    // This function is a part of the sanitizer run-time.
    // To use it, link with AddressSanitizer or other sanitizer.
    __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
  }

.. code-block:: c++

  // trace-pc-guard-example.cc
  void foo() { }
  int main(int argc, char **argv) {
    if (argc > 1) foo();
  }

.. code-block:: console

  clang++ -g -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
  clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
  ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

.. code-block:: console

  INIT: 0x71bcd0 0x71bce0
  guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:2
  guard: 0x71bcd8 3 PC 0x4ecd9e in main trace-pc-guard-example.cc:3:7

.. code-block:: console

  ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out with-foo

.. code-block:: console

  INIT: 0x71bcd0 0x71bce0
  guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:3
  guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17
  guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14
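
The comments in ``trace-pc-guard-cb.cc`` suggest making the guard values
consecutive and using them to index a bit vector. The following is a minimal
sketch of that idea, not the sanitizer run-time's implementation. It assumes a
single DSO and a hypothetical capacity ``kMaxGuards``, and ``CoveredEdges()`` is
a hypothetical helper. Each edge is recorded once. ``*guard`` is then set to 0
so that the callback stops firing for that edge.

.. code-block:: c++

  #include <bitset>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Hypothetical upper bound on the number of guards (edges) in the program.
  constexpr std::size_t kMaxGuards = 1 << 16;

  static std::bitset<kMaxGuards> g_covered;  // Bit i set => guard value i was hit.
  static uint32_t g_num_guards = 0;          // Guard values handed out so far.

  extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                      uint32_t *stop) {
    // Same "initialize only once" check as trace-pc-guard-cb.cc. Caveat: since
    // the callback below zeroes guards, a repeated init call that arrives after
    // the first guard fired would assign fresh numbers to the whole range.
    if (start == stop || *start) return;
    for (uint32_t *x = start; x < stop; x++) {
      assert(g_num_guards + 1 < kMaxGuards && "too many guards");
      *x = ++g_num_guards;  // Consecutive values starting from 1.
    }
  }

  extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    if (!*guard) return;    // Already recorded (or never initialized).
    g_covered.set(*guard);  // The guard value doubles as a bit index.
    *guard = 0;             // Stop further callbacks for this edge.
  }

  // Hypothetical helper: number of distinct edges covered so far.
  std::size_t CoveredEdges() { return g_covered.count(); }

Compile a file like this without ``-fsanitize-coverage`` and link it with the
instrumented objects, in the same way that ``trace-pc-guard-cb.cc`` is built in
the example above.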

Inline 8bit-counters
====================

Experimental, may change or disappear in the future.

With ``-fsanitize-coverage=inline-8bit-counters`` the compiler will insert
inline counter increments on every edge.
This is similar to ``-fsanitize-coverage=trace-pc-guard`` but instead of a
callback the instrumentation simply increments a counter.

Users need to implement a single function to capture the counters at startup:

.. code-block:: c++

  extern "C"
  void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
    // [start,end) is the array of 8-bit counters created for the current DSO.
    // Capture this array in order to read/modify the counters.
  }
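
The following minimal sketch of such a function assumes a single instrumented
DSO and stores the counter range in globals. ``CountCoveredEdges()`` and
``ResetCounters()`` are hypothetical helpers. A fuzzer might, for example, reset
the counters between inputs. The counters are only 8 bits wide, so a heavily
executed edge may wrap around to 0, and this sketch would then report it as
uncovered.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>

  // Counter range captured at startup. A real implementation would keep one
  // range per DSO; this sketch keeps only the most recent one.
  static char *g_counters_beg = nullptr;
  static char *g_counters_end = nullptr;

  extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
    // [start, end) holds one 8-bit counter per instrumented edge.
    g_counters_beg = start;
    g_counters_end = end;
  }

  // Hypothetical helper: number of edges whose counter is non-zero.
  std::size_t CountCoveredEdges() {
    std::size_t covered = 0;
    for (char *c = g_counters_beg; c < g_counters_end; c++)
      if (*c) covered++;
    return covered;
  }

  // Hypothetical helper: zero all counters.
  void ResetCounters() {
    for (char *c = g_counters_beg; c < g_counters_end; c++) *c = 0;
  }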

PC-Table
========

Experimental, may change or disappear in the future.

Note: this instrumentation might be incompatible with dead code stripping
(``-Wl,-gc-sections``) for linkers other than LLD, thus resulting in a
significant binary size overhead. For more information, see
`Bug 34636 <https://bugs.llvm.org/show_bug.cgi?id=34636>`_.

With ``-fsanitize-coverage=pc-table`` the compiler will create a table of
instrumented PCs. It requires either ``-fsanitize-coverage=inline-8bit-counters`` or
``-fsanitize-coverage=trace-pc-guard``.

Users need to implement a single function to capture the PC table at startup:

.. code-block:: c++

  extern "C"
  void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                const uintptr_t *pcs_end) {
    // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
    // pairs [PC,PCFlags] for every instrumented block in the current DSO.
    // Capture this array in order to read the PCs and their Flags.
    // The number of PCs and PCFlags for a given DSO is the same as the number
    // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
    // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard).
    // A PCFlags describes the basic block:
    //  * bit0: 1 if the block is the function entry block, 0 otherwise.
  }
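
The following minimal sketch captures the table and reads it back, assuming a
single DSO. ``NumInstrumentedBlocks()``, ``NumFunctionEntries()``, and
``PCForIndex()`` are hypothetical helpers. Using ``PCForIndex()`` to map a
counter or guard to its PC also assumes that the i-th ``[PC,PCFlags]`` pair
describes the same block as the i-th counter or guard. The comment above only
states that their counts match.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Table captured at startup; a real implementation would keep one per DSO.
  static const uintptr_t *g_pcs_beg = nullptr;
  static const uintptr_t *g_pcs_end = nullptr;

  extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                           const uintptr_t *pcs_end) {
    g_pcs_beg = pcs_beg;
    g_pcs_end = pcs_end;
  }

  // Each instrumented block occupies two entries: PC, then PCFlags.
  std::size_t NumInstrumentedBlocks() {
    return static_cast<std::size_t>(g_pcs_end - g_pcs_beg) / 2;
  }

  // Counts blocks whose PCFlags has bit0 set (function entry blocks).
  std::size_t NumFunctionEntries() {
    std::size_t n = 0;
    for (std::size_t i = 0; i < NumInstrumentedBlocks(); i++)
      if (g_pcs_beg[2 * i + 1] & 1) n++;
    return n;
  }

  // Returns the PC of the i-th instrumented block.
  uintptr_t PCForIndex(std::size_t i) {
    assert(i < NumInstrumentedBlocks());
    return g_pcs_beg[2 * i];
  }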


Tracing PCs
===========

With ``-fsanitize-coverage=trace-pc`` the compiler will insert
``__sanitizer_cov_trace_pc()`` on every edge.
With an additional ``...=trace-pc,indirect-calls`` flag
``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call.
These callbacks are not implemented in the sanitizer run-time and should be defined
by the user.
This mechanism is used for fuzzing the Linux kernel
(https://github.com/google/syzkaller).
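
Because the user must supply these callbacks, the following minimal sketch
shows what a definition could look like. It is not the kernel's or syzkaller's
implementation, and it appends values to a fixed-size buffer. The buffer
``g_trace``, its capacity ``kMaxTrace``, and the helper ``Record()`` are
hypothetical, and the sketch is not thread-safe. Compile the file that defines
the callbacks without ``-fsanitize-coverage``, as ``trace-pc-guard-cb.cc`` is
compiled in the earlier example. Otherwise helpers such as ``Record()`` would
themselves be instrumented and re-enter the callback.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>

  // Hypothetical fixed-size trace buffer; values beyond capacity are dropped.
  constexpr std::size_t kMaxTrace = 4096;
  static uintptr_t g_trace[kMaxTrace];
  static std::size_t g_trace_len = 0;

  static void Record(uintptr_t value) {
    if (g_trace_len < kMaxTrace) g_trace[g_trace_len++] = value;
  }

  // Inserted on every edge. The return address identifies the instrumented
  // location that made the call.
  extern "C" void __sanitizer_cov_trace_pc() {
    Record(reinterpret_cast<uintptr_t>(__builtin_return_address(0)));
  }

  // Inserted on every indirect call (with indirect-calls); records the callee.
  extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
    Record(reinterpret_cast<uintptr_t>(callee));
  }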

Instrumentation points
======================

SanitizerCoverage offers different levels of instrumentation.

* ``edge`` (default): edges are instrumented (see below).
* ``bb``: basic blocks are instrumented.
* ``func``: only the entry block of every function will be instrumented.

Use these flags together with ``trace-pc-guard`` or ``trace-pc``,
like this: ``-fsanitize-coverage=func,trace-pc-guard``.

When ``edge`` or ``bb`` is used, some of the edges/blocks may still be left
uninstrumented (pruned) if such instrumentation is considered redundant.
Use ``no-prune`` (e.g. ``-fsanitize-coverage=bb,no-prune,trace-pc-guard``)
to disable pruning. This could be useful for better coverage visualization.


Edge coverage
-------------

Consider this code:

.. code-block:: c++

  void foo(int *a) {
    if (a)
      *a = 0;
  }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

  A
  |\
  | \
  |  B
  | /
  |/
  C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know if the edge A=>C was executed.
Such edges of the control flow graph are called
`critical <https://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_.
Edge-level coverage simply splits all critical edges by introducing new
dummy blocks and then instruments those blocks:

.. code-block:: none

  A
  |\
  | \
  D  B
  | /
  |/
  C
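
Stated precisely, an edge U=>V is critical when U has more than one successor
and V has more than one predecessor. The following small standalone sketch
applies that check to a hypothetical successor-list representation. ``CFG`` and
``CriticalEdges()`` are illustrative names, not LLVM APIs. The compiler performs
the splitting on its own IR.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <utility>
  #include <vector>

  // A CFG as successor lists: succ[b] lists the blocks that b may jump to.
  using CFG = std::vector<std::vector<int>>;

  // Returns every edge U=>V where U has 2+ successors and V has 2+ predecessors.
  std::vector<std::pair<int, int>> CriticalEdges(const CFG &succ) {
    std::vector<int> num_preds(succ.size(), 0);
    for (const auto &targets : succ)
      for (int v : targets) num_preds[v]++;

    std::vector<std::pair<int, int>> result;
    for (std::size_t u = 0; u < succ.size(); u++) {
      if (succ[u].size() < 2) continue;  // A single exit can't be critical.
      for (int v : succ[u])
        if (num_preds[v] > 1) result.emplace_back(static_cast<int>(u), v);
    }
    return result;
  }

For ``foo`` above, number the blocks A, B, and C as 0, 1, and 2. The successor
lists are then ``{{1, 2}, {2}, {}}``, and the only edge reported is A=>C. That
is the edge that receives the dummy block D.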

Tracing data flow
=================

Support for data-flow-guided fuzzing.
With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation
around comparison instructions and switch statements.
Similarly, with ``-fsanitize-coverage=trace-div`` the compiler will instrument
integer division instructions (to capture the right argument of division),
and with ``-fsanitize-coverage=trace-gep`` it will instrument
the `LLVM GEP instructions <https://llvm.org/docs/GetElementPtr.html>`_
(to capture array indices).

Unless the ``no-prune`` option is provided, some of the comparison instructions
will not be instrumented.

.. code-block:: c++

  // Called before a comparison instruction.
  // Arg1 and Arg2 are arguments of the comparison.
  void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
  void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
  void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
  void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

  // Called before a comparison instruction if exactly one of the arguments is constant.
  // Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
  // These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11.
  void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
  void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
  void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
  void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

  // Called before a switch statement.
  // Val is the switch operand.
  // Cases[0] is the number of case constants.
  // Cases[1] is the size of Val in bits.
  // Cases[2:] are the case constants.
  void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

  // Called before a division instruction.
  // Val is the second argument of division.
  void __sanitizer_cov_trace_div4(uint32_t Val);
  void __sanitizer_cov_trace_div8(uint64_t Val);

  // Called before a GetElementPtr (GEP) instruction
  // for every non-constant array index.
  void __sanitizer_cov_trace_gep(uintptr_t Idx);
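
The following minimal sketch defines two of these callbacks, assuming they are
used to collect constants that a fuzzer could try as inputs. The global
``g_dictionary`` is a hypothetical name. The switch callback decodes ``Cases``
as described in the comments above. An instrumented program may also reference
the other callbacks, so those may need definitions as well (possibly empty
ones).

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <set>

  // Hypothetical set of "interesting" constants seen at comparisons.
  static std::set<uint64_t> g_dictionary;

  // Arg1 is the compile-time constant, Arg2 the runtime value.
  extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2) {
    (void)Arg2;  // Unused in this sketch.
    g_dictionary.insert(Arg1);
  }

  extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
    const uint64_t num_cases = Cases[0];
    // Cases[1] is the bit width of Val; the case constants start at Cases[2].
    for (uint64_t i = 0; i < num_cases; i++)
      if (Cases[2 + i] != Val)  // Only remember cases this input did not hit.
        g_dictionary.insert(Cases[2 + i]);
  }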

Default implementation
======================

The sanitizer run-times (AddressSanitizer, MemorySanitizer, etc.) provide
default implementations of some of the coverage callbacks.
You may use these implementations to dump the coverage to disk at process
exit.

Example:

.. code-block:: console

  % cat -n cov.cc
       1  #include <stdio.h>
       2  __attribute__((noinline))
       3  void foo() { printf("foo\n"); }
       4
       5  int main(int argc, char **argv) {
       6    if (argc == 2)
       7      foo();
       8    printf("main\n");
       9  }
  % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
  % ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
  main
  SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
  24 a.out.7312.sancov
  % ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
  foo
  main
  SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
  24 a.out.7312.sancov
  32 a.out.7316.sancov

Every time you run an executable instrumented with SanitizerCoverage (with
``coverage=1`` set, as above), one ``*.sancov`` file is created during process
shutdown. If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file is also created for every DSO.

Sancov data format
------------------

The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic,
either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of the
magic defines the size of the following offsets (``0x64`` for 64-bit offsets,
``0x32`` for 32-bit offsets). The rest of the data consists of the offsets in
the corresponding binary/DSO that were executed during the run.
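
Applied to the example above, a 64-bit file holds an 8-byte magic followed by
8-byte offsets. 2 PCs therefore take 8 + 2 * 8 = 24 bytes and 3 PCs take
8 + 3 * 8 = 32 bytes, which matches the ``wc -c`` output. The following minimal
C++17 reader is a sketch, not the ``sancov`` tool's code. ``ParseSancov()`` and
``ReadLE()`` are hypothetical names, and the sketch assumes the file was written
by a little-endian host such as x86-64.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
  constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

  // Decodes an n-byte little-endian unsigned integer starting at p.
  static uint64_t ReadLE(const uint8_t *p, std::size_t n) {
    uint64_t v = 0;
    for (std::size_t i = 0; i < n; i++)
      v |= static_cast<uint64_t>(p[i]) << (8 * i);
    return v;
  }

  // Returns the offsets stored in a .sancov file, or std::nullopt if the magic
  // is unknown or the payload is not a whole number of offsets.
  std::optional<std::vector<uint64_t>> ParseSancov(const std::vector<uint8_t> &data) {
    if (data.size() < 8) return std::nullopt;
    const uint64_t magic = ReadLE(data.data(), 8);
    std::size_t width;  // Size of each offset in bytes.
    if (magic == kMagic64) width = 8;
    else if (magic == kMagic32) width = 4;
    else return std::nullopt;
    if ((data.size() - 8) % width != 0) return std::nullopt;

    std::vector<uint64_t> offsets;
    for (std::size_t pos = 8; pos < data.size(); pos += width)
      offsets.push_back(ReadLE(data.data() + pos, width));
    return offsets;
  }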

Sancov Tool
-----------

A simple ``sancov`` tool is provided to process coverage files.
The tool is part of the LLVM project and is currently supported only on Linux.
It can handle symbolization tasks autonomously without any extra support
from the environment. You need to pass ``.sancov`` files (named
``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files.
Sancov matches these files using module names and binary file names.

.. code-block:: console

  USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

  Action (required)
    -print                    - Print coverage addresses
    -covered-functions        - Print all covered functions.
    -not-covered-functions    - Print all not covered functions.
    -symbolize                - Symbolizes the report.

  Options
    -blacklist=<string>         - Blacklist file (sanitizer blacklist format).
    -demangle                   - Print demangled function name.
    -strip_path_prefix=<string> - Strip this prefix from file paths in reports


Coverage Reports
----------------

Experimental.

``.sancov`` files do not contain enough information to generate a source-level
coverage report. The missing information is contained
in the debug info of the binary. Thus the ``.sancov`` file has to be symbolized
to produce a ``.symcov`` file first:

.. code-block:: console

  sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The ``.symcov`` file can be browsed overlaid on the source code by
running the ``tools/sancov/coverage-report-server.py`` script, which will start
an HTTP server.

Output directory
----------------

By default, ``.sancov`` files are created in the current working directory.
This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:

.. code-block:: console

  % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
  % ls -l /tmp/cov/*sancov
  -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov