=======
ThinLTO
=======

.. contents::
   :local:

Introduction
============

*ThinLTO* compilation is a new type of LTO that is both scalable and
incremental. *LTO* (Link Time Optimization) achieves better
runtime performance through whole-program analysis and cross-module
optimization. However, monolithic LTO implements this by merging all
input into a single module, which is not scalable
in time or memory, and also prevents fast incremental compiles.

In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the
compile phase. The ThinLTO bitcode is augmented with a compact summary
of the module. During the link step, only the summaries are read and
merged into a combined summary index, which includes an index of function
locations for later cross-module function importing. Fast and efficient
whole-program analysis is then performed on the combined summary index.

However, all transformations, including function importing, occur
later when the modules are optimized in fully parallel backends.
By default, linkers_ that support ThinLTO are set up to launch
the ThinLTO backends in threads. The usage model is therefore unaffected,
because the distinction between the fast serial thin link step and the
backends is transparent to the user.
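
As a rough mental model of this split, the following Python sketch separates a
serial thin link, which reads only summaries and plans imports, from backends
that run in parallel. It is a toy model and not how LLVM is implemented: the
summary layout, the ``IMPORT_LIMIT`` threshold, and every name in the code are
assumptions made for illustration.

.. code-block:: python

  from concurrent.futures import ThreadPoolExecutor

  # Hypothetical summaries: module -> {function: (size, [callees])}.
  SUMMARIES = {
      "file1.o": {"main": (40, ["helper", "big"])},
      "file2.o": {"helper": (5, []), "big": (500, [])},
  }
  IMPORT_LIMIT = 100  # assumed threshold for this toy model, not an LLVM default


  def thin_link(summaries):
      """Serial step: read only the summaries and plan imports per module."""
      # Combined index: where each function is defined and how large it is.
      index = {
          name: (module, size)
          for module, functions in summaries.items()
          for name, (size, _callees) in functions.items()
      }
      plan = {module: set() for module in summaries}
      for module, functions in summaries.items():
          for _size, callees in functions.values():
              for callee in callees:
                  # Callees missing from the index are treated as local.
                  home, size = index.get(callee, (module, 0))
                  if home != module and size <= IMPORT_LIMIT:
                      plan[module].add(callee)
      return index, plan


  def backend(module, imports):
      """Parallel step: stands in for importing into and optimizing one module."""
      return module, sorted(imports)


  def thinlto(summaries):
      _index, plan = thin_link(summaries)
      with ThreadPoolExecutor() as pool:
          return dict(pool.map(lambda m: backend(m, plan[m]), summaries))

In this sketch, ``thinlto(SUMMARIES)`` plans to import ``helper`` into
``file1.o`` and leaves ``big`` in ``file2.o``, because ``big`` exceeds the
assumed limit.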

For more information on the ThinLTO design and current performance,
see the LLVM blog post `ThinLTO: Scalable and Incremental LTO
<http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_.
While tuning is still in progress, results in the blog post show that
ThinLTO already performs well compared to LTO, in many cases matching
the performance improvement.

Current Status
==============

Clang/LLVM
----------
.. _compiler:

The 3.9 release of clang includes ThinLTO support. However, ThinLTO
is under active development, and new features, improvements and bugfixes
are being added for the next release. For the latest ThinLTO support,
`build a recent version of clang and LLVM
<https://llvm.org/docs/CMake.html>`_.

Linkers
-------
.. _linkers:
.. _linker:

ThinLTO is currently supported for the following linkers:

- **gold (via the gold-plugin)**:
  Similar to monolithic LTO, this requires using
  a `gold linker configured with plugins enabled
  <https://llvm.org/docs/GoldPlugin.html>`_.
- **ld64**:
  Starting with `Xcode 8 <https://developer.apple.com/xcode/>`_.
- **lld**:
  Starting with r284050 for ELF, r298942 for COFF.

Usage
=====

Basic
-----

To use ThinLTO, add the ``-flto=thin`` option to both the compile and
link steps. For example:

.. code-block:: console

  % clang -flto=thin -O2 file1.c file2.c -c
  % clang -flto=thin -O2 file1.o file2.o -o a.out

When using lld-link, the ``-flto`` option need only be added to the compile
step:

.. code-block:: console

  % clang-cl -flto=thin -O2 -c file1.c file2.c
  % lld-link /out:a.exe file1.obj file2.obj

As mentioned earlier, by default the linkers will launch the ThinLTO backend
threads in parallel, passing the resulting native object files back to the
linker for the final native link. As such, the usage model is the same as
non-LTO.

With gold, if you see an error during the link of the form:

.. code-block:: console

  /usr/bin/ld: error: /path/to/clang/bin/../lib/LLVMgold.so: could not load plugin library: /path/to/clang/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory

then either gold was not configured with plugins enabled, or clang
was not built with ``-DLLVM_BINUTILS_INCDIR`` set properly. See
the instructions for the
`LLVM gold plugin <https://llvm.org/docs/GoldPlugin.html#how-to-build-it>`_.

Controlling Backend Parallelism
-------------------------------
.. _parallelism:

By default, the ThinLTO link step will launch as many
threads in parallel as there are cores. If the number of
cores can't be computed for the architecture, then it will launch
``std::thread::hardware_concurrency`` number of threads in parallel.
For machines with hyper-threading, this is the total number of
virtual cores. For some applications and machine configurations this
may be too aggressive, in which case the amount of parallelism can
be reduced to ``N`` via:

- gold:
  ``-Wl,-plugin-opt,jobs=N``
- ld64:
  ``-Wl,-mllvm,-threads=N``
- lld:
  ``-Wl,--thinlto-jobs=N``
- lld-link:
  ``/opt:lldltojobs=N``

Incremental
-----------
.. _incremental:

ThinLTO supports fast incremental builds through the use of a cache,
which currently must be enabled through a linker option.

- gold (as of LLVM 4.0):
  ``-Wl,-plugin-opt,cache-dir=/path/to/cache``
- ld64 (support in clang 3.9 and Xcode 8):
  ``-Wl,-cache_path_lto,/path/to/cache``
- ELF lld (as of LLVM 5.0):
  ``-Wl,--thinlto-cache-dir=/path/to/cache``
- COFF lld-link (as of LLVM 6.0):
  ``/lldltocache:/path/to/cache``
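
Because each linker spells these options differently, a build script may want
to keep them in one lookup table. The following is a minimal sketch of that
idea: the option strings are copied from the lists in this section and the
previous one, while ``LINKER_OPTIONS`` and ``thinlto_link_options`` are
hypothetical names that belong to no LLVM tool.

.. code-block:: python

  # Option spellings as listed in this document, keyed by linker.
  # Each entry is (jobs template, cache directory template).
  LINKER_OPTIONS = {
      "gold": ("-Wl,-plugin-opt,jobs={n}", "-Wl,-plugin-opt,cache-dir={path}"),
      "ld64": ("-Wl,-mllvm,-threads={n}", "-Wl,-cache_path_lto,{path}"),
      "lld": ("-Wl,--thinlto-jobs={n}", "-Wl,--thinlto-cache-dir={path}"),
      "lld-link": ("/opt:lldltojobs={n}", "/lldltocache:{path}"),
  }


  def thinlto_link_options(linker, jobs=None, cache_dir=None):
      """Return the extra link options for the given linker (hypothetical helper)."""
      jobs_template, cache_template = LINKER_OPTIONS[linker]
      options = []
      if jobs is not None:
          options.append(jobs_template.format(n=jobs))
      if cache_dir is not None:
          options.append(cache_template.format(path=cache_dir))
      return options

For example, ``thinlto_link_options("lld", jobs=4, cache_dir="/path/to/cache")``
returns the two options to append to the ``clang`` link command when linking
with ELF lld.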

Cache Pruning
-------------

To help keep the size of the cache under control, ThinLTO supports cache
pruning. Cache pruning is supported with gold, ld64, and both ELF and COFF lld,
but currently only gold and ELF and COFF lld allow you to control the policy
with a policy string. The cache policy must be specified with a linker option.

- gold (as of LLVM 6.0):
  ``-Wl,-plugin-opt,cache-policy=POLICY``
- ELF lld (as of LLVM 5.0):
  ``-Wl,--thinlto-cache-policy,POLICY``
- COFF lld-link (as of LLVM 6.0):
  ``/lldltocachepolicy:POLICY``

A policy string is a series of key-value pairs separated by ``:`` characters.
Possible key-value pairs are:

- ``cache_size=X%``: The maximum size for the cache directory is ``X`` percent
  of the available space on the disk. Set to 100 to indicate no limit, or to
  50 to indicate that pruning will not leave the cache larger than half the
  available disk space. A value over 100 is invalid. A value of 0 disables the
  percentage size-based pruning. The default is 75%.

- ``cache_size_bytes=X``, ``cache_size_bytes=Xk``, ``cache_size_bytes=Xm``,
  ``cache_size_bytes=Xg``:
  Sets the maximum size for the cache directory to ``X`` bytes (or KB, MB,
  GB respectively). A value over the amount of available space on the disk
  will be reduced to the amount of available space. A value of 0 disables
  the byte size-based pruning. The default is no byte size-based pruning.

  Note that ThinLTO will apply both size-based pruning policies simultaneously,
  and changing one does not affect the other. For example, a policy of
  ``cache_size_bytes=1g`` on its own will cause both the 1GB and default 75%
  policies to be applied unless the default ``cache_size`` is overridden.

- ``cache_size_files=X``:
  Sets the maximum number of files in the cache directory. Set to 0 to indicate
  no limit. The default is 1000000 files.

- ``prune_after=Xs``, ``prune_after=Xm``, ``prune_after=Xh``: Sets the
  expiration time for cache files to ``X`` seconds (or minutes, hours
  respectively). When a file hasn't been accessed for ``prune_after`` seconds,
  it is removed from the cache. A value of 0 disables the expiration-based
  pruning. The default is 1 week.

- ``prune_interval=Xs``, ``prune_interval=Xm``, ``prune_interval=Xh``:
  Sets the pruning interval to ``X`` seconds (or minutes, hours
  respectively). This is intended to be used to avoid scanning the directory
  too often. It does not impact the decision of which files to prune. A
  value of 0 forces the scan to occur. The default is every 20 minutes.
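
To make the policy format and the interaction of the two size limits concrete,
here is a small Python sketch. It is illustrative only: ``make_policy`` and
``effective_size_limit`` are hypothetical helpers, the available disk space is
taken as a plain parameter rather than measured, and the arithmetic simply
restates the rules listed above rather than reproducing the linker's code.

.. code-block:: python

  # Keys documented in the list above.
  KNOWN_KEYS = {
      "cache_size",
      "cache_size_bytes",
      "cache_size_files",
      "prune_after",
      "prune_interval",
  }


  def make_policy(**settings):
      """Join key=value pairs with ':' into a policy string."""
      unknown = sorted(set(settings) - KNOWN_KEYS)
      if unknown:
          raise ValueError(f"unknown policy keys: {unknown}")
      return ":".join(f"{key}={value}" for key, value in settings.items())


  def effective_size_limit(available_bytes, percent=75, size_bytes=0):
      """Return the smaller of the two size caps, or None if both are disabled."""
      if not 0 <= percent <= 100:
          raise ValueError("cache_size must be between 0% and 100%")
      caps = []
      if percent:  # 0 disables the percentage-based policy
          caps.append(available_bytes * percent // 100)
      if size_bytes:  # 0 disables the byte-size policy
          # A value above the available space is reduced to the available space.
          caps.append(min(size_bytes, available_bytes))
      return min(caps) if caps else None

For instance, ``make_policy(cache_size="50%", prune_after="48h")`` yields
``cache_size=50%:prune_after=48h``, which can be substituted for ``POLICY`` in
the linker options above. With the default ``percent=75``, a byte limit equal
to the available space is still capped at 75% of that space, matching the note
about both policies applying together.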

Clang Bootstrap
---------------

To bootstrap clang/LLVM with ThinLTO, follow these steps:

1. The host compiler_ must be a version of clang that supports ThinLTO.
#. The host linker_ must support ThinLTO (and in the case of gold, must be
   `configured with plugins enabled <https://llvm.org/docs/GoldPlugin.html>`_).
#. Use the following additional `CMake variables
   <https://llvm.org/docs/CMake.html#options-and-variables>`_
   when configuring the bootstrap compiler build:

   * ``-DLLVM_ENABLE_LTO=Thin``
   * ``-DCMAKE_C_COMPILER=/path/to/host/clang``
   * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang++``
   * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib``
   * ``-DCMAKE_AR=/path/to/host/llvm-ar``

   Or, on Windows:

   * ``-DLLVM_ENABLE_LTO=Thin``
   * ``-DCMAKE_C_COMPILER=/path/to/host/clang-cl.exe``
   * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang-cl.exe``
   * ``-DCMAKE_LINKER=/path/to/host/lld-link.exe``
   * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib.exe``
   * ``-DCMAKE_AR=/path/to/host/llvm-ar.exe``

#. To use additional linker arguments for controlling the backend
   parallelism_ or enabling incremental_ builds of the bootstrap compiler,
   after configuring the build, modify the resulting CMakeCache.txt file in the
   build directory. Specify any additional linker options after
   ``CMAKE_EXE_LINKER_FLAGS:STRING=``. Note that the configure step may fail if
   linker plugin options are instead specified directly in the previous step.
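
The CMakeCache.txt edit in the last step can also be scripted. The sketch below
shows one possible way to do it in Python, assuming the cache already contains
a ``CMAKE_EXE_LINKER_FLAGS:STRING=`` line. The function name
``add_linker_flags`` is hypothetical, and the function only transforms text, so
reading and writing the file is left to the caller.

.. code-block:: python

  PREFIX = "CMAKE_EXE_LINKER_FLAGS:STRING="


  def add_linker_flags(cache_text, flags):
      """Append flags to the CMAKE_EXE_LINKER_FLAGS entry in CMakeCache.txt text."""
      lines = cache_text.splitlines()
      for i, line in enumerate(lines):
          if line.startswith(PREFIX):
              # Keep any flags that are already present, then add the new ones.
              existing = line[len(PREFIX):].split()
              lines[i] = PREFIX + " ".join(existing + list(flags))
              break
      else:
          raise ValueError("CMAKE_EXE_LINKER_FLAGS entry not found")
      return "\n".join(lines) + "\n"

A caller would read CMakeCache.txt from the build directory, pass its contents
together with a list such as ``["-Wl,--thinlto-jobs=8"]``, and write the result
back before rebuilding.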

More Information
================

* From the LLVM project blog:
  `ThinLTO: Scalable and Incremental LTO
  <http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_