===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). In non-leaf functions, it saves the function's
return address to a separately allocated 'shadow call stack' in the function
prolog and loads the return address back from the shadow call stack in the
function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies, so it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
higher memory consumption for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt, so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_), but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects against non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the shadow call stack and mark only that region as
read/write. This also somewhat mitigates processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745
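
The guard-region allocation described above might be sketched as follows. This
is only an illustration under stated assumptions, not the Android
implementation: the names ``scs_alloc``, ``SCS_SIZE`` and ``GUARD_SIZE`` and
the sizes are hypothetical, and ``rand()`` stands in for the secure source of
randomness that a real runtime would need.

.. code-block:: c

    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    /* Hypothetical sizes, chosen for illustration only. */
    #define SCS_SIZE   ((size_t)64 * 1024)         /* usable shadow call stack */
    #define GUARD_SIZE ((size_t)16 * 1024 * 1024)  /* inaccessible guard region */

    /* Reserves an inaccessible guard region and makes one randomly chosen
       slot inside it readable and writable. Returns the slot, or NULL on
       failure; *guard_out receives the start of the guard region. */
    static void *scs_alloc(void **guard_out) {
        void *guard = mmap(NULL, GUARD_SIZE, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guard == MAP_FAILED)
            return NULL;

        /* Placeholder randomness: rand() is predictable and must not be
           used for this purpose in a real runtime. */
        size_t slot = (size_t)rand() % (GUARD_SIZE / SCS_SIZE);
        char *scs = (char *)guard + slot * SCS_SIZE;

        /* Only the selected slot becomes accessible. */
        if (mprotect(scs, SCS_SIZE, PROT_READ | PROT_WRITE) != 0) {
            munmap(guard, GUARD_SIZE);
            return NULL;
        }
        *guard_out = guard;
        return scs;
    }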

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.
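
Deriving the shadow call stack address by masking out the lower bits of
``x18`` can be modeled in plain C as below. The sketch assumes a hypothetical
power-of-two ``SCS_SIZE``, a shadow call stack aligned to that size, and an
``x18`` value that still points inside the stack; ``scs_base_from_x18`` is an
illustrative name, not part of any runtime.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical size; must be a power of two, and every shadow call
       stack is assumed to be aligned to it. */
    #define SCS_SIZE ((uintptr_t)8 * 1024)

    /* Returns the start of the shadow call stack that x18_value points
       into, by clearing the bits that hold the offset within the stack. */
    static uintptr_t scs_base_from_x18(uintptr_t x18_value) {
        return x18_value & ~(SCS_SIZE - 1);
    }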

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49
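
The idea of storing only the low bits can be illustrated with a small C model.
The real code is aarch64 assembly that operates on ``x18`` directly; the names
``scs_jmpbuf_save``, ``scs_jmpbuf_restore``, ``SCS_SIZE`` and ``SCS_MASK`` are
hypothetical, and the sketch assumes that the thread's shadow call stack is
aligned to a power-of-two ``SCS_SIZE``.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical power-of-two size and the matching offset mask. */
    #define SCS_SIZE ((uintptr_t)8 * 1024)
    #define SCS_MASK (SCS_SIZE - 1)

    /* Models setjmp: keep only the offset within the shadow call stack, so
       the jmp_buf never contains the stack's address. */
    static uintptr_t scs_jmpbuf_save(uintptr_t x18_value) {
        return x18_value & SCS_MASK;
    }

    /* Models longjmp: take the high bits from the live x18 value and the
       low bits from the jmp_buf. */
    static uintptr_t scs_jmpbuf_restore(uintptr_t current_x18,
                                        uintptr_t saved_low_bits) {
        return (current_x18 & ~SCS_MASK) | saved_low_bits;
    }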

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack``
flag to both compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.
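
A minimal sketch of applying the attribute follows. The wrapper macro
``NO_SHADOW_CALL_STACK`` and the functions ``helper`` and
``uninstrumented_entry`` are hypothetical names used only for illustration;
the macro expands to nothing on compilers other than Clang so that the file
still builds there.

.. code-block:: c

    /* Hypothetical convenience macro around the attribute. */
    #if defined(__clang__)
    #  define NO_SHADOW_CALL_STACK __attribute__((no_sanitize("shadow-call-stack")))
    #else
    #  define NO_SHADOW_CALL_STACK
    #endif

    /* Hypothetical callee; it is instrumented as usual when the file is
       built with -fsanitize=shadow-call-stack. */
    static int helper(void) {
        return 41;
    }

    /* This non-leaf function is excluded from the instrumentation, so it
       neither pushes to nor pops from the shadow call stack. */
    NO_SHADOW_CALL_STACK
    int uninstrumented_entry(void) {
        return helper() + 1;
    }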

Example
=======

The following example code:

.. code-block:: c++

    int bar();  // defined elsewhere

    int foo() {
      return bar() + 1;
    }

generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret
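
The effect of the two added instructions can be modeled in C. This is a
software model for illustration only: real instrumentation operates on the
hardware registers, and ``struct scs_model``, ``scs_prolog``, ``scs_epilog``
and ``scs_demo`` are hypothetical names. The model shows why an overwritten
return address on the regular stack is ignored: the epilog reloads ``x30``
from the shadow call stack after the regular stack has been read.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical model of the registers involved: x18 is the shadow call
       stack pointer and x30 is the link register. */
    struct scs_model {
        uintptr_t *x18;
        uintptr_t x30;
    };

    /* Models "str x30, [x18], #8": store x30, then advance x18 one slot. */
    static void scs_prolog(struct scs_model *m) {
        *m->x18++ = m->x30;
    }

    /* Models "ldr x30, [x18, #-8]!": step x18 back one slot, then load x30. */
    static void scs_epilog(struct scs_model *m) {
        m->x30 = *--m->x18;
    }

    /* Runs one instrumented call in which the copy of the return address on
       the regular stack is overwritten, and returns the address that the
       final "ret" would branch to. */
    static uintptr_t scs_demo(uintptr_t return_address,
                              uintptr_t attacker_value) {
        uintptr_t shadow[4] = {0};
        struct scs_model m = { shadow, return_address };

        scs_prolog(&m);                  /* str x30, [x18], #8 */
        uintptr_t stack_slot = m.x30;    /* stp x29, x30, [sp, #-16]! */
        stack_slot = attacker_value;     /* simulated memory corruption */
        m.x30 = stack_slot;              /* ldp x29, x30, [sp], #16 */
        scs_epilog(&m);                  /* ldr x30, [x18, #-8]! */
        return m.x30;                    /* ret */
    }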