1 | ================= |
2 | SanitizerCoverage |
3 | ================= |
4 | |
5 | .. contents:: |
6 | :local: |
7 | |
8 | Introduction |
9 | ============ |
10 | |
11 | LLVM has a simple code coverage instrumentation built in (SanitizerCoverage). |
12 | It inserts calls to user-defined functions at the function, basic-block, and edge levels. |
13 | Default implementations of those callbacks are provided and implement |
14 | simple coverage reporting and visualization. |
15 | However, if you need *just* coverage visualization, you may want to use |
16 | :doc:`SourceBasedCodeCoverage <SourceBasedCodeCoverage>` instead. |
17 | |
18 | Tracing PCs with guards |
19 | ======================= |
20 | |
21 | With ``-fsanitize-coverage=trace-pc-guard`` the compiler will insert the following code |
22 | on every edge: |
23 | |
24 | .. code-block:: none |
25 | |
26 | __sanitizer_cov_trace_pc_guard(&guard_variable) |
27 | |
28 | Every edge will have its own ``guard_variable`` (``uint32_t``). |
29 | |
30 | The compiler will also insert calls to a module constructor: |
31 | |
32 | .. code-block:: c++ |
33 | |
34 | // The guards are [start, stop). |
35 | // This function will be called at least once per DSO and may be called |
36 | // more than once with the same values of start/stop. |
37 | __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop); |
38 | |
39 | With an additional ``...=trace-pc,indirect-calls`` flag |
40 | ``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call. |
41 | |
42 | The functions ``__sanitizer_cov_trace_pc_*`` should be defined by the user. |
43 | |
44 | Example: |
45 | |
46 | .. code-block:: c++ |
47 | |
48 | // trace-pc-guard-cb.cc |
49 | #include <stdint.h> |
50 | #include <stdio.h> |
51 | #include <sanitizer/coverage_interface.h> |
52 | |
53 | // This callback is inserted by the compiler as a module constructor |
54 | // into every DSO. 'start' and 'stop' correspond to the |
55 | // beginning and end of the section with the guards for the entire |
56 | // binary (executable or DSO). The callback will be called at least |
57 | // once per DSO and may be called multiple times with the same parameters. |
58 | extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, |
59 | uint32_t *stop) { |
60 | static uint64_t N; // Counter for the guards. |
61 | if (start == stop || *start) return; // Initialize only once. |
62 | printf("INIT: %p %p\n", start, stop); |
63 | for (uint32_t *x = start; x < stop; x++) |
64 | *x = ++N; // Guards should start from 1. |
65 | } |
66 | |
67 | // This callback is inserted by the compiler on every edge in the |
68 | // control flow (some optimizations apply). |
69 | // Typically, the compiler will emit the code like this: |
70 | // if(*guard) |
71 | // __sanitizer_cov_trace_pc_guard(guard); |
72 | // But for large functions it will emit a simple call: |
73 | // __sanitizer_cov_trace_pc_guard(guard); |
74 | extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) { |
75 | if (!*guard) return; // Duplicate the guard check. |
76 | // If you set *guard to 0 this code will not be called again for this edge. |
77 | // Now you can get the PC and do whatever you want: |
78 | // store it somewhere or symbolize it and print right away. |
79 | // The values of `*guard` are as you set them in |
80 | // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive |
81 | // and use them to dereference an array or a bit vector. |
82 | void *PC = __builtin_return_address(0); |
83 | char PcDescr[1024]; |
84 | // This function is a part of the sanitizer run-time. |
85 | // To use it, link with AddressSanitizer or other sanitizer. |
86 | __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr)); |
87 | printf("guard: %p %x PC %s\n", guard, *guard, PcDescr); |
88 | } |
89 | |
90 | .. code-block:: c++ |
91 | |
92 | // trace-pc-guard-example.cc |
93 | void foo() { } |
94 | int main(int argc, char **argv) { |
95 | if (argc > 1) foo(); |
96 | } |
97 | |
98 | .. code-block:: console |
99 | |
100 | clang++ -g -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c |
101 | clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address |
102 | ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out |
103 | |
104 | .. code-block:: console |
105 | |
106 | INIT: 0x71bcd0 0x71bce0 |
107 | guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:2 |
108 | guard: 0x71bcd8 3 PC 0x4ecd9e in main trace-pc-guard-example.cc:3:7 |
109 | |
110 | .. code-block:: console |
111 | |
112 | ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out with-foo |
113 | |
114 | |
115 | .. code-block:: console |
116 | |
117 | INIT: 0x71bcd0 0x71bce0 |
118 | guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:3 |
119 | guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17 |
120 | guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14 |
121 | |
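The comments in the callback above suggest making the guard values consecutive and using them to index an array or a bit vector. A minimal sketch of that idea follows. The table size ``kMaxGuards`` and the helper ``CountCoveredEdges`` are hypothetical names introduced only for illustration; they are not part of the sanitizer interface.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: a fixed-size table is enough for this sketch. A real tool
// would size it from the number of guards seen in the init callback.
static const size_t kMaxGuards = 1 << 16;
static uint8_t HitTable[kMaxGuards];  // HitTable[id] != 0: guard 'id' fired.
static uint32_t NumGuards;            // Number of guard ids handed out so far.

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Initialize only once.
  for (uint32_t *x = start; x < stop; x++)
    *x = ++NumGuards;  // Consecutive ids, starting from 1.
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  uint32_t id = *guard;
  if (!id) return;  // Duplicate the guard check.
  if (id < kMaxGuards) HitTable[id] = 1;  // The guard value is the index.
}

// Hypothetical helper: the number of distinct guards that fired so far.
size_t CountCoveredEdges() {
  size_t n = 0;
  for (size_t id = 1; id <= NumGuards && id < kMaxGuards; id++)
    n += HitTable[id];
  return n;
}
```

This sketch deliberately leaves the guards nonzero, so the "initialize only once" check in the init callback keeps working even after the first guard has fired.
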
122 | Inline 8bit-counters |
123 | ==================== |
124 | |
125 | **Experimental, may change or disappear in the future** |
126 | |
127 | With ``-fsanitize-coverage=inline-8bit-counters`` the compiler will insert |
128 | inline counter increments on every edge. |
129 | This is similar to ``-fsanitize-coverage=trace-pc-guard`` but instead of a |
130 | callback the instrumentation simply increments a counter. |
131 | |
132 | Users need to implement a single function to capture the counters at startup. |
133 | |
134 | .. code-block:: c++ |
135 | |
136 | extern "C" |
137 | void __sanitizer_cov_8bit_counters_init(char *start, char *end) { |
138 | // [start,end) is the array of 8-bit counters created for the current DSO. |
139 | // Capture this array in order to read/modify the counters. |
140 | } |
141 | |
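One way to fill in the body of this callback is to remember each ``[start, end)`` region and scan it later. The sketch below assumes a small fixed number of instrumented DSOs; ``CounterRegion``, ``CountNonZeroCounters`` and ``ResetCounters`` are hypothetical names, and the duplicate check is a defensive assumption rather than documented behavior.

```cpp
#include <cstddef>

// Hypothetical bookkeeping: one [start, end) region per instrumented DSO.
// Plain arrays are used instead of std::vector because the callback may run
// before C++ static constructors have finished.
struct CounterRegion {
  char *start;
  char *end;
};
static const size_t kMaxRegions = 64;
static CounterRegion Regions[kMaxRegions];
static size_t NumRegions;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // Defensive assumption: ignore a region that was already registered.
  for (size_t i = 0; i < NumRegions; i++)
    if (Regions[i].start == start) return;
  if (NumRegions < kMaxRegions) Regions[NumRegions++] = {start, end};
}

// Counts the counters that have been incremented at least once.
size_t CountNonZeroCounters() {
  size_t n = 0;
  for (size_t i = 0; i < NumRegions; i++)
    for (char *c = Regions[i].start; c < Regions[i].end; c++)
      n += (*c != 0);
  return n;
}

// Clears all counters, for example between two inputs of a fuzzer.
void ResetCounters() {
  for (size_t i = 0; i < NumRegions; i++)
    for (char *c = Regions[i].start; c < Regions[i].end; c++)
      *c = 0;
}
```
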
142 | PC-Table |
143 | ======== |
144 | |
145 | **Experimental, may change or disappear in the future** |
146 | |
147 | **Note:** this instrumentation might be incompatible with dead code stripping |
148 | (``-Wl,-gc-sections``) for linkers other than LLD, thus resulting in a |
149 | significant binary size overhead. For more information, see |
150 | `Bug 34636 <https://bugs.llvm.org/show_bug.cgi?id=34636>`_. |
151 | |
152 | With ``-fsanitize-coverage=pc-table`` the compiler will create a table of |
153 | instrumented PCs. This option requires either ``-fsanitize-coverage=inline-8bit-counters`` or |
154 | ``-fsanitize-coverage=trace-pc-guard``. |
155 | |
156 | Users need to implement a single function to capture the PC table at startup: |
157 | |
158 | .. code-block:: c++ |
159 | |
160 | extern "C" |
161 | void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg, |
162 | const uintptr_t *pcs_end) { |
163 | // [pcs_beg,pcs_end) is the array of ptr-sized integers representing |
164 | // pairs [PC,PCFlags] for every instrumented block in the current DSO. |
165 | // Capture this array in order to read the PCs and their Flags. |
166 | // The number of PCs and PCFlags for a given DSO is the same as the number |
167 | // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or |
168 | // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard). |
169 | // PCFlags describes the basic block: |
170 | // * bit0: 1 if the block is the function entry block, 0 otherwise. |
171 | } |
172 | |
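The table layout described in the comments above (pairs of ``[PC, PCFlags]``, with bit 0 of ``PCFlags`` marking function entry blocks) can be decoded as in the following sketch. It assumes a single instrumented DSO and simply keeps the last table it was given; ``NumInstrumentedBlocks`` and ``NumFunctionEntries`` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: only one DSO is instrumented, so the last table wins.
static const uintptr_t *PcsBeg;
static const uintptr_t *PcsEnd;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  PcsBeg = pcs_beg;
  PcsEnd = pcs_end;
}

// Every instrumented block occupies two pointer-sized slots: PC, PCFlags.
size_t NumInstrumentedBlocks() {
  return static_cast<size_t>(PcsEnd - PcsBeg) / 2;
}

// Counts the entries whose PCFlags has bit 0 set (function entry blocks).
size_t NumFunctionEntries() {
  size_t n = 0;
  for (size_t i = 0; i < NumInstrumentedBlocks(); i++) {
    uintptr_t flags = PcsBeg[2 * i + 1];
    n += flags & 1;
  }
  return n;
}
```
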
173 | |
174 | Tracing PCs |
175 | =========== |
176 | |
177 | With ``-fsanitize-coverage=trace-pc`` the compiler will insert |
178 | ``__sanitizer_cov_trace_pc()`` on every edge. |
179 | With an additional ``...=trace-pc,indirect-calls`` flag |
180 | ``__sanitizer_cov_trace_pc_indirect(void *callee)`` will be inserted on every indirect call. |
181 | These callbacks are not implemented in the Sanitizer run-time and should be defined |
182 | by the user. |
183 | This mechanism is used for fuzzing the Linux kernel |
184 | (https://github.com/google/syzkaller). |
185 | |
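Because these callbacks must be supplied by the user, a minimal sketch of one possible definition is shown here. It records return addresses in a fixed-size buffer; the buffer size and the variable names are hypothetical. The sketch assumes this file is compiled without ``-fsanitize-coverage``, as in the earlier ``trace-pc-guard-cb.cc`` example, and that the compiler supports ``__builtin_return_address`` (Clang and GCC do).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical storage for this sketch; a real tool would pick its own.
static const size_t kMaxPCs = 1 << 12;
static uintptr_t EdgePCs[kMaxPCs];
static size_t NumEdgePCs;
static uintptr_t LastCallee;
static size_t NumIndirectCalls;

// Inserted on every edge with -fsanitize-coverage=trace-pc.
extern "C" void __sanitizer_cov_trace_pc() {
  // The return address identifies the instrumented edge.
  uintptr_t pc = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
  if (NumEdgePCs < kMaxPCs) EdgePCs[NumEdgePCs++] = pc;
}

// Inserted on every indirect call with ...=trace-pc,indirect-calls.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  LastCallee = reinterpret_cast<uintptr_t>(callee);
  NumIndirectCalls++;
}
```
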
186 | Instrumentation points |
187 | ====================== |
188 | SanitizerCoverage offers different levels of instrumentation: |
189 | |
190 | * ``edge`` (default): edges are instrumented (see below). |
191 | * ``bb``: basic blocks are instrumented. |
192 | * ``func``: only the entry block of every function will be instrumented. |
193 | |
194 | Use these flags together with ``trace-pc-guard`` or ``trace-pc``, |
195 | like this: ``-fsanitize-coverage=func,trace-pc-guard``. |
196 | |
197 | When ``edge`` or ``bb`` is used, some of the edges/blocks may still be left |
198 | uninstrumented (pruned) if such instrumentation is considered redundant. |
199 | Use ``no-prune`` (e.g. ``-fsanitize-coverage=bb,no-prune,trace-pc-guard``) |
200 | to disable pruning. This could be useful for better coverage visualization. |
201 | |
202 | |
203 | Edge coverage |
204 | ------------- |
205 | |
206 | Consider this code: |
207 | |
208 | .. code-block:: c++ |
209 | |
210 | void foo(int *a) { |
211 | if (a) |
212 | *a = 0; |
213 | } |
214 | |
215 | It contains three basic blocks; let's name them A, B, and C: |
216 | |
217 | .. code-block:: none |
218 | |
219 | A |
220 | |\ |
221 | | \ |
222 | | B |
223 | | / |
224 | |/ |
225 | C |
226 | |
227 | If blocks A, B, and C are all covered we know for certain that the edges A=>B |
228 | and B=>C were executed, but we still don't know if the edge A=>C was executed. |
229 | Such edges of the control flow graph are called |
230 | `critical <https://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. |
231 | The edge-level coverage simply splits all critical edges by introducing new |
232 | dummy blocks and then instruments those blocks: |
233 | |
234 | .. code-block:: none |
235 | |
236 | A |
237 | |\ |
238 | | \ |
239 | D B |
240 | | / |
241 | |/ |
242 | C |
243 | |
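To make the difference concrete, the following sketch instruments ``foo`` by hand, once per basic block and once with the critical edge A=>C split by a dummy block D. It is only a model of the idea; the names ``foo_bb``, ``foo_edge`` and ``Coverage`` are hypothetical, and real instrumentation is inserted by the compiler.

```cpp
#include <bitset>

enum Block { kA, kB, kC, kD, kNumBlocks };
using Coverage = std::bitset<kNumBlocks>;

// Block-level model: only A, B and C are instrumented.
void foo_bb(int *a, Coverage &cov) {
  cov.set(kA);
  if (a) {
    cov.set(kB);
    *a = 0;
  }
  cov.set(kC);
}

// Edge-level model: the critical edge A=>C goes through dummy block D.
void foo_edge(int *a, Coverage &cov) {
  cov.set(kA);
  if (a) {
    cov.set(kB);
    *a = 0;
  } else {
    cov.set(kD);  // Executed only when A=>C is taken.
  }
  cov.set(kC);
}
```

Calling ``foo_bb`` with a non-null pointer covers A, B and C, which says nothing about A=>C. With ``foo_edge``, block D stays uncovered until the function is called with a null pointer.
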
244 | Tracing data flow |
245 | ================= |
246 | |
247 | SanitizerCoverage also supports data-flow-guided fuzzing. |
248 | With ``-fsanitize-coverage=trace-cmp`` the compiler will insert extra instrumentation |
249 | around comparison instructions and switch statements. |
250 | Similarly, with ``-fsanitize-coverage=trace-div`` the compiler will instrument |
251 | integer division instructions (to capture the right argument of division) |
252 | and with ``-fsanitize-coverage=trace-gep`` -- |
253 | the `LLVM GEP instructions <https://llvm.org/docs/GetElementPtr.html>`_ |
254 | (to capture array indices). |
255 | |
256 | Unless the ``no-prune`` option is provided, some of the comparison instructions |
257 | will not be instrumented. |
258 | |
259 | .. code-block:: c++ |
260 | |
261 | // Called before a comparison instruction. |
262 | // Arg1 and Arg2 are arguments of the comparison. |
263 | void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2); |
264 | void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2); |
265 | void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2); |
266 | void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2); |
267 | |
268 | // Called before a comparison instruction if exactly one of the arguments is constant. |
269 | // Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant. |
270 | // These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11 |
271 | void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2); |
272 | void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2); |
273 | void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2); |
274 | void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2); |
275 | |
276 | // Called before a switch statement. |
277 | // Val is the switch operand. |
278 | // Cases[0] is the number of case constants. |
279 | // Cases[1] is the size of Val in bits. |
280 | // Cases[2:] are the case constants. |
281 | void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases); |
282 | |
283 | // Called before a division instruction. |
284 | // Val is the second argument of division. |
285 | void __sanitizer_cov_trace_div4(uint32_t Val); |
286 | void __sanitizer_cov_trace_div8(uint64_t Val); |
287 | |
288 | // Called before a GetElementPtr (GEP) instruction |
289 | // for every non-constant array index. |
290 | void __sanitizer_cov_trace_gep(uintptr_t Idx); |
291 | |
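As an illustration of how a tool might consume these callbacks, the sketch below records the operands of 4-byte comparisons and unpacks the ``Cases`` array of a switch as described above. Treating each case constant as a comparison against a constant is a design choice of this sketch, not something the interface prescribes; ``CmpRecord``, ``Records`` and ``Record`` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical log of observed comparisons.
struct CmpRecord {
  uint64_t Arg1;
  uint64_t Arg2;
  bool Arg1IsConst;  // True if Arg1 is a compile-time constant.
};
static const size_t kMaxRecords = 256;
static CmpRecord Records[kMaxRecords];
static size_t NumRecords;

static void Record(uint64_t Arg1, uint64_t Arg2, bool Arg1IsConst) {
  if (NumRecords < kMaxRecords)
    Records[NumRecords++] = {Arg1, Arg2, Arg1IsConst};
}

extern "C" void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2) {
  Record(Arg1, Arg2, false);
}

extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1,
                                                 uint32_t Arg2) {
  Record(Arg1, Arg2, true);
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  uint64_t NumCases = Cases[0];  // Cases[1] (bit width of Val) is unused here.
  for (uint64_t i = 0; i < NumCases; i++)
    Record(Cases[2 + i], Val, true);  // One record per case constant.
}
```
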
292 | Default implementation |
293 | ====================== |
294 | |
295 | The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides |
296 | default implementations of some of the coverage callbacks. |
297 | You may use these implementations to dump the coverage to disk at process |
298 | exit. |
299 | |
300 | Example: |
301 | |
302 | .. code-block:: console |
303 | |
304 | % cat -n cov.cc |
305 | 1 #include <stdio.h> |
306 | 2 __attribute__((noinline)) |
307 | 3 void foo() { printf("foo\n"); } |
308 | 4 |
309 | 5 int main(int argc, char **argv) { |
310 | 6 if (argc == 2) |
311 | 7 foo(); |
312 | 8 printf("main\n"); |
313 | 9 } |
314 | % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard |
315 | % ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov |
316 | main |
317 | SanitizerCoverage: ./a.out.7312.sancov 2 PCs written |
318 | 24 a.out.7312.sancov |
319 | % ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov |
320 | foo |
321 | main |
322 | SanitizerCoverage: ./a.out.7316.sancov 3 PCs written |
323 | 24 a.out.7312.sancov |
324 | 32 a.out.7316.sancov |
325 | |
326 | Every time you run an executable instrumented with SanitizerCoverage, |
327 | one ``*.sancov`` file is created during process shutdown. |
328 | If the executable is dynamically linked against instrumented DSOs, |
329 | one ``*.sancov`` file will also be created for every DSO. |
330 | |
331 | Sancov data format |
332 | ------------------ |
333 | |
334 | The format of ``*.sancov`` files is very simple: the first 8 bytes are the magic number, |
335 | either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last byte of the |
336 | magic number defines the size of the following offsets (64-bit or 32-bit). The rest of the data is the |
337 | offsets in the corresponding binary/DSO that were executed during the run. |
338 | |
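A reader for this format can be sketched in a few lines. The function name ``ParseSancov`` is hypothetical, and the sketch assumes the file was written on a machine with the same byte order as the one reading it. This is not the implementation used by the ``sancov`` tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
static const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

// Parses the raw bytes of a .sancov file and appends the offsets to
// *Offsets. Returns false if the magic or the payload size is invalid.
bool ParseSancov(const std::vector<uint8_t> &Data,
                 std::vector<uint64_t> *Offsets) {
  uint64_t Magic;
  if (Data.size() < sizeof(Magic)) return false;
  std::memcpy(&Magic, Data.data(), sizeof(Magic));

  size_t Width;  // Size of each offset in bytes.
  if (Magic == kMagic64)
    Width = 8;
  else if (Magic == kMagic32)
    Width = 4;
  else
    return false;

  if ((Data.size() - sizeof(Magic)) % Width != 0) return false;
  for (size_t Pos = sizeof(Magic); Pos < Data.size(); Pos += Width) {
    if (Width == 8) {
      uint64_t Off;
      std::memcpy(&Off, Data.data() + Pos, sizeof(Off));
      Offsets->push_back(Off);
    } else {
      uint32_t Off;
      std::memcpy(&Off, Data.data() + Pos, sizeof(Off));
      Offsets->push_back(Off);
    }
  }
  return true;
}
```

With the 64-bit magic, a file holding two PCs is 8 + 2 * 8 = 24 bytes long, which matches the ``wc -c`` output in the example above.
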
339 | Sancov Tool |
340 | ----------- |
341 | |
342 | A simple ``sancov`` tool is provided to process coverage files. |
343 | The tool is part of the LLVM project and is currently supported only on Linux. |
344 | It can handle symbolization tasks autonomously without any extra support |
345 | from the environment. You need to pass ``.sancov`` files (named |
346 | ``<module_name>.<pid>.sancov``) and paths to all corresponding binary ELF files. |
347 | Sancov matches these files using module names and binary file names. |
348 | |
349 | .. code-block:: console |
350 | |
351 | USAGE: sancov [options] <action> (<binary file>|<.sancov file>)... |
352 | |
353 | Action (required) |
354 | -print - Print coverage addresses |
355 | -covered-functions - Print all covered functions. |
356 | -not-covered-functions - Print all not covered functions. |
357 | -symbolize - Symbolizes the report. |
358 | |
359 | Options |
360 | -blacklist=<string> - Blacklist file (sanitizer blacklist format). |
361 | -demangle - Print demangled function name. |
362 | -strip_path_prefix=<string> - Strip this prefix from file paths in reports |
363 | |
364 | |
365 | Coverage Reports |
366 | ---------------- |
367 | |
368 | **Experimental** |
369 | |
370 | ``.sancov`` files do not contain enough information to generate a source-level |
371 | coverage report. The missing information is contained |
372 | in debug info of the binary. Thus the ``.sancov`` has to be symbolized |
373 | to produce a ``.symcov`` file first: |
374 | |
375 | .. code-block:: console |
376 | |
377 | sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov |
378 | |
379 | The ``.symcov`` file can be browsed overlaid on the source code by |
380 | running the ``tools/sancov/coverage-report-server.py`` script, which starts |
381 | an HTTP server. |
382 | |
383 | Output directory |
384 | ---------------- |
385 | |
386 | By default, ``.sancov`` files are created in the current working directory. |
387 | This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``: |
388 | |
389 | .. code-block:: console |
390 | |
391 | % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo |
392 | % ls -l /tmp/cov/*sancov |
393 | -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov |
394 | -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov |
395 | |