1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
2 | "http://www.w3.org/TR/html4/strict.dtd"> |
3 | <html> |
4 | <head> |
5 | <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
6 | <title>Clang - Features and Goals</title> |
7 | <link type="text/css" rel="stylesheet" href="menu.css"> |
8 | <link type="text/css" rel="stylesheet" href="content.css"> |
9 | <style type="text/css"> |
10 | </style> |
11 | </head> |
12 | <body> |
13 | |
14 | <!--#include virtual="menu.html.incl"--> |
15 | |
16 | <div id="content"> |
17 | |
18 | <!--*************************************************************************--> |
19 | <h1>Clang - Features and Goals</h1> |
20 | <!--*************************************************************************--> |
21 | |
22 | <p> |
23 | This page describes the <a href="index.html#goals">features and goals</a> of |
24 | Clang in more detail and gives a broader explanation of what we mean. |
25 | These features are: |
26 | </p> |
27 | |
28 | <p>End-User Features:</p> |
29 | |
30 | <ul> |
31 | <li><a href="#performance">Fast compiles and low memory use</a></li> |
32 | <li><a href="#expressivediags">Expressive diagnostics</a></li> |
33 | <li><a href="#gcccompat">GCC compatibility</a></li> |
34 | </ul> |
35 | |
36 | <p>Utility and Applications:</p> |
37 | |
38 | <ul> |
39 | <li><a href="#libraryarch">Library based architecture</a></li> |
40 | <li><a href="#diverseclients">Support diverse clients</a></li> |
41 | <li><a href="#ideintegration">Integration with IDEs</a></li> |
42 | <li><a href="#license">Use the LLVM 'BSD' License</a></li> |
43 | </ul> |
44 | |
45 | <p>Internal Design and Implementation:</p> |
46 | |
47 | <ul> |
48 | <li><a href="#real">A real-world, production quality compiler</a></li> |
49 | <li><a href="#simplecode">A simple and hackable code base</a></li> |
50 | <li><a href="#unifiedparser">A single unified parser for C, Objective C, C++, |
51 | and Objective C++</a></li> |
52 | <li><a href="#conformance">Conformance with C/C++/ObjC and their |
53 | variants</a></li> |
54 | </ul> |
55 | |
56 | <!--*************************************************************************--> |
57 | <h2><a name="enduser">End-User Features</a></h2> |
58 | <!--*************************************************************************--> |
59 | |
60 | |
61 | <!--=======================================================================--> |
62 | <h3><a name="performance">Fast compiles and Low Memory Use</a></h3> |
63 | <!--=======================================================================--> |
64 | |
65 | <p>A major focus of our work on clang is to make it fast, light and scalable. |
66 | The library-based architecture of clang makes it straightforward to time and |
67 | profile the cost of each layer of the stack, and the driver has a number of |
68 | options for performance analysis. Many detailed benchmarks can be found online.</p> |
69 | |
70 | <p>Compile-time performance is important, but when clang is used as an API, |
71 | memory use often matters even more: the less memory the code takes, the more |
72 | code you can fit into memory at a time (useful for whole-program analysis |
73 | tools, for example).</p> |
74 | |
75 | <p>In addition to being efficient when pitted head-to-head against GCC in batch |
76 | mode, clang is built with a <a href="#libraryarch">library-based |
77 | architecture</a> that makes it relatively easy to adapt it and build new tools |
78 | with it. This means that it is often possible to apply out-of-the-box thinking |
79 | and novel techniques to improve compilation in various ways.</p> |
80 | |
81 | |
82 | <!--=======================================================================--> |
83 | <h3><a name="expressivediags">Expressive Diagnostics</a></h3> |
84 | <!--=======================================================================--> |
85 | |
86 | <p>In addition to making Clang fast and functional, we aim to make it extremely |
87 | user friendly. For a command-line compiler, this basically boils down to |
88 | making the diagnostics (error and warning messages) generated by the compiler |
89 | as useful as possible. There are several ways that we do this, but the |
90 | most important are pinpointing exactly what is wrong in the program, |
91 | highlighting related information so that it is easy to understand at a glance, |
92 | and making the wording as clear as possible.</p> |
93 | |
94 | <p>Here is one simple example that illustrates the difference between a typical |
95 | GCC and Clang diagnostic:</p> |
96 | |
97 | <pre> |
98 | $ <b>gcc-4.9 -fsyntax-only t.c</b> |
99 | t.c: In function 'int f(int, int)': |
100 | t.c:7:39: error: invalid operands to binary + (have 'int' and 'struct A') |
101 |   return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X); |
102 |                                       ^ |
103 | $ <b>clang -fsyntax-only t.c</b> |
104 | t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A') |
105 | <span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span> |
106 | <span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span> |
107 | </pre> |
108 | |
109 | <p>Here you can see that you don't even need to see the original source code to |
110 | understand what is wrong based on the Clang error: because Clang prints a |
111 | caret, you know exactly <em>which</em> plus it is complaining about. The range |
112 | information highlights the left and right operands of the plus, which makes it |
113 | immediately obvious what the compiler is talking about. This is very useful for |
114 | cases involving precedence issues and many other situations.</p> |
115 | |
116 | <p>Clang diagnostics are very polished and have many features. For more |
117 | information and examples, please see the <a href="diagnostics.html">Expressive |
118 | Diagnostics</a> page.</p> |
119 | |
120 | <!--=======================================================================--> |
121 | <h3><a name="gcccompat">GCC Compatibility</a></h3> |
122 | <!--=======================================================================--> |
123 | |
124 | <p>GCC is currently the de facto standard open source compiler, and it |
125 | routinely compiles a huge volume of code. GCC supports a huge number of |
126 | extensions and features (many of which are undocumented) and a lot of |
127 | code and header files depend on these features in order to build.</p> |
128 | |
129 | <p>While it would be nice to be able to ignore these extensions and focus on |
130 | implementing the language standards to the letter, pragmatism forces us to |
131 | support the GCC extensions that see the most use. Many users just want their |
132 | code to compile; they don't care to argue about whether it is pedantically C99 |
133 | or not.</p> |
134 | |
135 | <p>As described under <a href="#conformance">conformance</a>, all |
136 | extensions are explicitly recognized as such and marked with extension |
137 | diagnostics, which can be mapped to warnings, errors, or just ignored. |
138 | </p> |
139 | |
140 | |
141 | <!--*************************************************************************--> |
142 | <h2><a name="applications">Utility and Applications</a></h2> |
143 | <!--*************************************************************************--> |
144 | |
145 | <!--=======================================================================--> |
146 | <h3><a name="libraryarch">Library Based Architecture</a></h3> |
147 | <!--=======================================================================--> |
148 | |
149 | <p>A major design concept for clang is its use of a library-based |
150 | architecture. In this design, various parts of the front-end can be cleanly |
151 | divided into separate libraries which can then be combined for different needs |
152 | and uses. In addition, the library-based approach encourages good interfaces |
153 | and makes it easier for new developers to get involved (because they only need |
154 | to understand small pieces of the big picture).</p> |
155 | |
156 | <blockquote><p> |
157 | "The world needs better compiler tools, tools which are built as libraries. |
158 | This design point allows reuse of the tools in new and novel ways. However, |
159 | building the tools as libraries isn't enough: they must have clean APIs, be as |
160 | decoupled from each other as possible, and be easy to modify/extend. This |
161 | requires clean layering, decent design, and keeping the libraries independent of |
162 | any specific client."</p></blockquote> |
163 | |
164 | <p> |
165 | Currently, clang is divided into the following libraries and tool: |
166 | </p> |
167 | |
168 | <ul> |
169 | <li><b>libsupport</b> - Basic support library, from LLVM.</li> |
170 | <li><b>libsystem</b> - System abstraction library, from LLVM.</li> |
171 | <li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction, |
172 | file system caching for input source files.</li> |
173 | <li><b>libast</b> - Provides classes to represent the C AST, the C type system, |
174 | builtin functions, and various helpers for analyzing and manipulating the |
175 | AST (visitors, pretty printers, etc).</li> |
176 | <li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma |
177 | handling, tokens, and macro expansion.</li> |
178 | <li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions' |
179 | provided by the client (e.g. libsema builds ASTs) but knows nothing about |
180 | ASTs or other client-specific data structures.</li> |
181 | <li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions |
182 | to build a standardized AST for programs.</li> |
183 | <li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code |
184 | generation.</li> |
185 | <li><b>librewrite</b> - Editing of text buffers (important for code rewriting |
186 | transformation, like refactoring).</li> |
187 | <li><b>libanalysis</b> - Static analysis support.</li> |
188 | <li><b>clang</b> - A driver program, client of the libraries at various |
189 | levels.</li> |
190 | </ul> |
191 | |
192 | <p>As an example of the power of this library-based design: if you want to |
193 | build a preprocessor, you would take the Basic and Lexer libraries. If you want |
194 | an indexer, you would take the previous two and add the Parser library and |
195 | some actions for indexing. If you want a refactoring, static analysis, or |
196 | source-to-source compiler tool, you would then add the AST building and |
197 | semantic analyzer libraries.</p> |
198 | |
199 | <p>For more information about the low-level implementation details of the |
200 | various clang libraries, please see the <a href="docs/InternalsManual.html"> |
201 | clang Internals Manual</a>.</p> |
202 | |
203 | <!--=======================================================================--> |
204 | <h3><a name="diverseclients">Support Diverse Clients</a></h3> |
205 | <!--=======================================================================--> |
206 | |
207 | <p>Clang is designed and built with many grand plans for how we can use it. The |
208 | driving force is the fact that we use C and C++ daily, and have to suffer due to |
209 | a lack of good tools available for them. We believe that the C and C++ tools |
210 | ecosystem has been significantly limited by how difficult it is to parse and |
211 | represent the source code for these languages, and we aim to rectify this |
212 | problem in clang.</p> |
213 | |
214 | <p>The problem with this goal is that different clients have very different |
215 | requirements. Consider code generation, for example: a simple front-end that |
216 | parses for code generation must analyze the code for validity and emit code |
217 | in some intermediate form to pass off to an optimizer or backend. Because |
218 | validity analysis and code generation can largely be done on the fly, there is |
219 | no hard requirement that the front-end actually build up a full AST for all |
220 | the expressions and statements in the code. TCC and GCC are examples of |
221 | compilers that either build no real AST (in the former case) or build a stripped |
222 | down and simplified AST (in the latter case) because they focus primarily on |
223 | codegen.</p> |
224 | |
225 | <p>On the opposite side of the spectrum, some clients (like refactoring) want |
226 | highly detailed information about the original source code and want a complete |
227 | AST to describe it with. Refactoring wants to have information about macro |
228 | expansions, the location of every paren expression '(((x)))' vs 'x', full |
229 | position information, and much more. Further, refactoring wants to look |
230 | <em>across the whole program</em> to ensure that it is making transformations |
231 | that are safe. Making this efficient and getting this right requires a |
232 | significant amount of engineering and algorithmic work that is simply |
233 | unnecessary for a simple static compiler.</p> |
234 | |
235 | <p>The beauty of the clang approach is that it does not restrict how you use it. |
236 | In particular, it is possible to use the clang preprocessor and parser to build |
237 | an extremely quick and light-weight on-the-fly code generator (similar to TCC) |
238 | that does not build an AST at all. As an intermediate step, clang supports |
239 | using the current AST generation and semantic analysis code and having a code |
240 | generation client free the AST for each function after code generation. Finally, |
241 | clang provides support for building and retaining fully-fledged ASTs, and even |
242 | supports writing them out to disk.</p> |
243 | |
244 | <p>Designing the libraries with clean and simple APIs allows these high-level |
245 | policy decisions to be determined in the client, instead of forcing "one true |
246 | way" in the implementation of any of these libraries. Getting this right is |
247 | hard, and we don't always get it right the first time, but we fix any problems |
248 | when we realize we made a mistake.</p> |
249 | |
250 | <!--=======================================================================--> |
251 | <h3 id="ideintegration">Integration with IDEs</h3> |
252 | <!--=======================================================================--> |
253 | |
254 | <p> |
255 | We believe that Integrated Development Environments (IDEs) are a great way |
256 | to pull together various pieces of the development puzzle, and aim to make clang |
257 | work well in such an environment. The chief advantage of an IDE is that it |
258 | typically has visibility across your entire project and is a long-lived |
259 | process, whereas stand-alone compiler tools are typically invoked on each |
260 | individual file in the project, and thus have limited scope.</p> |
261 | |
262 | <p>There are many implications of this difference, but a significant one has to |
263 | do with efficiency and caching: sharing an address space across different files |
264 | in a project means that you can use intelligent caching and other techniques to |
265 | dramatically reduce analysis/compilation time.</p> |
266 | |
267 | <p>A further difference between IDEs and batch compilers is that IDEs often |
268 | impose very different requirements on the front-end: they depend on high |
269 | performance in order to provide a "snappy" experience, and thus really want |
270 | techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs |
271 | have very different requirements than a code generator, often requiring |
272 | information that a codegen-only frontend can throw away. Clang is |
273 | specifically designed and built to capture this information. |
274 | </p> |
275 | |
276 | |
277 | <!--=======================================================================--> |
278 | <h3><a name="license">Use the LLVM 'BSD' License</a></h3> |
279 | <!--=======================================================================--> |
280 | |
281 | <p>We actively intend for clang (and LLVM as a whole) to be used for |
282 | commercial projects, not only as a stand-alone compiler but also as a library |
283 | embedded inside a proprietary application. The BSD license is the simplest way |
284 | to allow this. We feel that the license encourages contributors to pick up the |
285 | source and work with it, and believe that those individuals and organizations |
286 | will contribute back their work if they do not want to have to maintain a fork |
287 | forever (which is time-consuming and expensive when merges are involved). |
288 | Further, nobody makes money on compilers these days, but many people need them |
289 | to get bigger goals accomplished: it makes sense for everyone to work |
290 | together.</p> |
291 | |
292 | <p>For more information about the LLVM/clang license, please see the <a |
293 | href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License |
294 | Description</a>.</p> |
295 | |
296 | |
297 | |
298 | <!--*************************************************************************--> |
299 | <h2><a name="design">Internal Design and Implementation</a></h2> |
300 | <!--*************************************************************************--> |
301 | |
302 | <!--=======================================================================--> |
303 | <h3><a name="real">A real-world, production quality compiler</a></h3> |
304 | <!--=======================================================================--> |
305 | |
306 | <p> |
307 | Clang is designed and built by experienced compiler developers who |
308 | are increasingly frustrated with the problems that <a |
309 | href="comparison.html">existing open source compilers</a> have. Clang is |
310 | carefully and thoughtfully designed and built to provide the foundation of a |
311 | whole new generation of C/C++/Objective C development tools, and we intend for |
312 | it to be production quality.</p> |
313 | |
314 | <p>Being a production quality compiler means many things: it means being high |
315 | performance, being solid and (relatively) bug free, and it means eventually |
316 | being used and depended on by a broad range of people. While we are still in |
317 | the early development stages, we strongly believe that this will become a |
318 | reality.</p> |
319 | |
320 | <!--=======================================================================--> |
321 | <h3><a name="simplecode">A simple and hackable code base</a></h3> |
322 | <!--=======================================================================--> |
323 | |
324 | <p>Our goal is to make it possible for anyone with a basic understanding |
325 | of compilers and working knowledge of the C/C++/ObjC languages to understand and |
326 | extend the clang source base. A large part of this falls out of our decision to |
327 | make the AST mirror the languages as closely as possible: you have your friendly |
328 | if statement, for statement, parenthesis expression, structs, unions, etc, all |
329 | represented in a simple and explicit way.</p> |
330 | |
331 | <p>In addition to a simple design, we work to make the source base approachable |
332 | by commenting it well, including citations of the language standards where |
333 | appropriate, and designing the code for simplicity. Beyond that, clang offers |
334 | a set of AST dumpers, printers, and visualizers that make it easy to put code in |
335 | and see how it is represented.</p> |
336 | |
337 | <!--=======================================================================--> |
338 | <h3><a name="unifiedparser">A single unified parser for C, Objective C, C++, |
339 | and Objective C++</a></h3> |
340 | <!--=======================================================================--> |
341 | |
342 | <p>Clang is the "C Language Family Front-end", which means we intend to support |
343 | the most popular members of the C family. We are convinced that the right |
344 | parsing technology for this class of languages is a hand-built recursive-descent |
345 | parser. Because it is plain C++ code, a recursive-descent parser is very easy for |
346 | new developers to understand, easily supports the ad-hoc rules and other |
347 | strange hacks required by C/C++, and makes it straightforward to implement |
348 | excellent diagnostics and error recovery.</p> |
349 | |
350 | <p>We believe that implementing C/C++/ObjC in a single unified parser makes the |
351 | end result easier to maintain and evolve than separate C and C++ parsers, |
352 | which must be bug-fixed and maintained independently of each other.</p> |
353 | |
354 | <!--=======================================================================--> |
355 | <h3><a name="conformance">Conformance with C/C++/ObjC and their |
356 | variants</a></h3> |
357 | <!--=======================================================================--> |
358 | |
359 | <p>When you start work on implementing a language, you find out that there is a |
360 | huge gap between how the language works and how most people understand it to |
361 | work. This gap is the difference between a normal programmer and a (scary? |
362 | super-natural?) "language lawyer", who knows the ins and outs of the language |
363 | and can grok standardese with ease.</p> |
364 | |
365 | <p>In practice, being conformant with the languages means that we aim to support |
366 | the full language, including the dark and dusty corners (like trigraphs, |
367 | preprocessor arcana, C99 VLAs, etc). Where we support extensions above and |
368 | beyond what the standard officially allows, we make an effort to explicitly call |
369 | this out in the code and emit diagnostics about it (which are ignored by default, |
370 | but can optionally be mapped to either warnings or errors), allowing you to use |
371 | clang in "strict" mode if you desire.</p> |
372 | |
373 | <p>We also intend to support "dialects" of these languages, such as C89, K&amp;R |
374 | C, C++'03, Objective-C 2, etc.</p> |
375 | |
376 | </div> |
377 | </body> |
378 | </html> |
379 | |