GCC optimization

This guide provides an introduction to optimizing compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimizing in general.


What are CFLAGS and CXXFLAGS?

CFLAGS and CXXFLAGS are environment variables that are used to tell the GNU Compiler Collection (GCC) what kinds of switches to use when compiling source code. The CFLAGS variable is used for compiling code written in C, while the CXXFLAGS variable is for code written in C++.

They can be used to decrease the amount of debug messages for a program, increase warning levels, and, of course, optimize the code produced. The GCC manual maintains a complete list of the available options and their purposes.

How are they used?

CFLAGS and CXXFLAGS can be used in two ways. First, they can be used per-program with Makefiles generated by the automake program.

However, this should not be done when installing packages found in the Portage tree. Instead, for Gentoo-based machines, set the CFLAGS and CXXFLAGS variables in /etc/portage/make.conf. This way all packages will be compiled using the options specified in make.conf.

CODE Setting CFLAGS in /etc/portage/make.conf
CFLAGS="-march=athlon64 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
While it is possible to have multiple lines in USE flags, having multiple lines in CFLAGS can and will result in problems with programs such as cmake. Make sure the CFLAGS declaration is on a single line, with as little whitespace as possible to avoid issues. See bug #500034 as an example.

As seen in the example above, the CXXFLAGS variable is set to use all the options present in CFLAGS. Almost every system should be configured in this manner; additional options for CXXFLAGS are extremely rare in common use cases.

Misconceptions

While CFLAGS and CXXFLAGS can be very effective means of getting source code to produce smaller and/or faster binaries, they can also impair the function of the code, bloat its size, and slow down its execution. Setting them incorrectly can even cause compilation failures!

CFLAGS are not a magic bullet; they will not automatically make the system run faster or reduce the size of binaries on the disk. Adding too many flags in an attempt to optimize (or "rice") the system is a sure recipe for failure. The point of diminishing returns is reached rather quickly when dealing with CFLAGS.

Despite the boasts and brags found on the internet, aggressive CFLAGS and CXXFLAGS are far more likely to harm binaries than to do any good. Keep in mind the flags are designed to be used at specific places for specific purposes. Few flags work as intended globally.

Ready?

Being aware of the risks involved, take a look at some sane, safe optimizations. These will hold you in good stead and will endear you to developers the next time a problem is reported on Bugzilla. (Developers will usually ask the user to recompile a package with minimal CFLAGS to see if the problem persists. Remember: aggressive flags can ruin code!)


The basics

The goal behind CFLAGS and CXXFLAGS is to create code tailor-made to your system; it should function perfectly while being lean and fast, if possible. Sometimes these conditions are mutually exclusive, so this guide will stick to combinations known to work well. Ideally, they are the best available for any CPU architecture. For informational purposes, aggressive flag use will be covered later. Not every option listed in the GCC manual (there are hundreds) will be discussed, but the basic, most common flags will be reviewed.

When unaware of what a flag does, refer to the relevant chapter of the GCC manual. If still stumped after viewing the manual, try a search engine or check out the GCC mailing lists.


-march

The first and most important option is -march. It tells GCC to produce code for a certain kind of CPU, the system's processor architecture (or arch). Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag will instruct the compiler to produce specific code for the system's CPU, with all its capabilities, features, instruction sets, quirks, and so on.

Even though the CHOST variable in /etc/portage/make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for the system's specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.

What kind of CPU does the system have? To find out, run the following command:

user $ cat /proc/cpuinfo

To get more details, including march and mtune values, two commands can be used:

  • The first command skips linking (-c) and, instead of printing help text, reports whether each target-specific option is enabled or disabled (-Q):
user $ gcc -c -Q -march=native --help=target
  • The second command shows the compiler directives for building the header file without actually performing the steps, printing them on the screen instead (-###). The final output line is the command that holds all the optimization options and architecture selection:
user $ gcc -### -march=native /usr/include/stdlib.h
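
To pull just the resolved -march and -mtune values out of the first command's output, a small filter can help. The following is a minimal sketch, not part of GCC: the function name resolved_arch_flags is hypothetical, and it assumes the -Q --help=target listing prints each option in the first column and its value in the second, which may vary between GCC versions.

```shell
# resolved_arch_flags: read `gcc -Q --help=target` output on stdin and
# print the -march= and -mtune= settings as ready-to-use flags.
# Hypothetical helper; assumes lines shaped like "  -march=    <value>".
resolved_arch_flags() {
    awk '($1 == "-march=" || $1 == "-mtune=") && NF == 2 { print $1 $2 }'
}
```

It would typically be used as `gcc -c -Q -march=native --help=target | resolved_arch_flags`, and the printed values can then be written explicitly into make.conf.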

Now let's see -march in action. This example is for an older Pentium III chip:

FILE /etc/portage/make.conf Pentium III example
CFLAGS="-march=pentium3"
CXXFLAGS="${CFLAGS}"

Here is another one for a 64-bit AMD CPU:

FILE /etc/portage/make.conf AMD64 example
CFLAGS="-march=athlon64"
CXXFLAGS="${CFLAGS}"

If the type of CPU is undetermined, or if the user does not know what setting to choose, it is possible to use the -march=native setting. When this flag is used, GCC will attempt to detect the processor and automatically set appropriate flags for it. However, this should not be used when intending to compile packages for different CPUs!

Do not use -march=native or -mtune=native in the CFLAGS or CXXFLAGS variables of make.conf when compiling with distcc.

If compiling packages on one computer in order to run them on a different computer (such as when using a fast computer to build for an older, slower machine), then do not use -march=native. "Native" means that the code produced will run only on that type of CPU. The applications built with -march=native on an AMD Athlon 64 CPU will not be able to run on an old VIA C3 CPU.
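
Before reusing a set of flags on another build host or with distcc, it may be worth checking them for a native setting. This is a minimal sketch, assuming the flags are passed as one space-separated string; the function name uses_native is hypothetical.

```shell
# uses_native FLAGS: succeed if FLAGS contains -march=native or -mtune=native.
uses_native() {
    case " $1 " in
        *" -march=native "* | *" -mtune=native "*) return 0 ;;
        *) return 1 ;;
    esac
}

if uses_native "-march=athlon64 -O2 -pipe"; then
    echo "native flags found: replace them before building for another CPU"
else
    echo "flags name an explicit CPU and can be reused on another build host"
fi
```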

Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC does not treat these flags consistently from one architecture to the next.

On x86 and x86-64 CPUs, -march will generate code specifically for that CPU using its available instruction sets and the correct ABI; it will have no backwards compatibility for older/different CPUs. Consider using -mtune when generating code for older CPUs such as i386 and i486. -mtune produces more generic code than -march; though it will tune code for a certain CPU, it does not take into account available instruction sets and ABI. Do not use -mcpu on x86 or x86-64 systems, as it is deprecated for those arches.
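
As a sketch of how the two flags divide the work on x86, the fragment below fixes the instruction-set baseline with -march and the tuning target with -mtune. The CPU names i686 and pentium3 are only illustrative assumptions; substitute values that match the machines involved.

```shell
# /etc/portage/make.conf (illustrative values, not a recommendation)
# -march sets the oldest CPU the code must be able to run on;
# -mtune only tunes the generated code for the CPU expected to run it.
CFLAGS="-march=i686 -mtune=pentium3 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```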

Only non-x86/x86-64 CPUs (such as Sparc, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune / -mcpu will sometimes behave just like -march (on x86/x86-64) but with a different flag name. Again, GCC's behavior and flag naming are not consistent across architectures, so be sure to check the GCC manual to determine which one should be used.

For more suggested -march / -mtune / -mcpu settings, please read chapter 5 of the appropriate Gentoo Installation Handbook for the arch. Also, read the GCC manual's list of architecture-specific options, as well as more detailed explanations about the differences between -march, -mcpu, and -mtune.


-O

Next up is the -O option. It controls the overall level of optimization. Raising the level makes compilation take more time and use much more memory.

There are seven -O settings: -O0, -O1, -O2, -O3, -Os, -Og, and -Ofast. Only use one of them in /etc/portage/make.conf.

With the exception of -O0, each -O setting activates several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, along with explanations of what they do.

Let us examine each optimization level:

  • -O0: This level (that is the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.
  • -O1: the most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It is basic, but it should get the job done all the time.
  • -O2: A step up from -O1. The recommended level of optimization unless the system has special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time.
  • -O3: the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended.
  • -Os: optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.
  • -Og: In GCC 4.8, a new general optimization level, -Og, was introduced. It addresses the need for fast compilation and a better debugging experience while providing a reasonable level of runtime performance. The overall development experience should be better than with the -O0 optimization level. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.
  • -Ofast: New in GCC 4.7, it consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance and is not recommended for use.
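
To see which flags separate two levels in practice, the compiler can be asked directly: GCC's -Q --help=optimizers prints each optimization flag together with its state at the given level. A minimal sketch follows; the function name opt_level_diff is hypothetical, and the compiler is taken from the CC variable (defaulting to gcc) so that another GCC binary can be substituted.

```shell
# opt_level_diff A B: print the optimizer flags whose state differs
# between -OA and -OB (for example: opt_level_diff 2 3).
opt_level_diff() {
    cc=${CC:-gcc}
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    "$cc" -Q "-O$1" --help=optimizers > "$a"
    "$cc" -Q "-O$2" --help=optimizers > "$b"
    # Lines starting with "<" come from -OA, lines starting with ">" from -OB.
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

Running `opt_level_diff 2 3` should list the flags that -O3 adds on top of -O2 for the installed compiler; the exact list depends on the GCC version.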

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails while not using -O2, try rebuilding with that option. As a fallback option, try setting the CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 -g2 -ggdb (for error reporting and checking for possible problems).


-pipe

A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files between the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed; in those cases, do not use this flag.


-fomit-frame-pointer

This is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64), although the GCC manual does not list every such architecture. Elsewhere it has to be added to the flags explicitly: this applies to x86-32 with GCC up to version 4.6, and to -Os on x86-32 with any version of GCC. However, using -fomit-frame-pointer will make debugging hard or impossible.

In particular, it makes troubleshooting applications written in Java much harder, though Java is not the only code affected by using this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. When not doing software debugging and no other debugging-related CFLAGS such as -ggdb have been used, then try using -fomit-frame-pointer.

Do not combine -fomit-frame-pointer with the similar flag -momit-leaf-frame-pointer. Using the latter flag is discouraged, as -fomit-frame-pointer already does the job properly. Furthermore, -momit-leaf-frame-pointer has been shown to negatively impact code performance.

-msse, -msse2, -msse3, -mmmx, and -m3dnow

These flags enable the Streaming SIMD Extensions (SSE), SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. These are useful primarily in multimedia, gaming, and other floating point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.

Be sure to see if the CPU supports these instruction sets by running cat /proc/cpuinfo. The output will include any supported additional instruction sets. Note that pni is just a different name for SSE3.
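
The check described above can be scripted. This is a minimal sketch, assuming the x86 Linux layout of /proc/cpuinfo, where supported features appear on a line starting with "flags"; the function name has_cpu_flag is hypothetical.

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG is listed on the first
# "flags" line of FILE (default: /proc/cpuinfo, x86 Linux layout assumed).
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qxF -- "$1"
}
```

For example, `has_cpu_flag pni && echo "SSE3 is supported"` tests for SSE3 under the name the kernel reports. The flag is matched as a whole word, so asking for sse does not match sse2.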

Normally none of these flags need to be added to /etc/portage/make.conf, as long as the system is using the correct -march (for example, -march=nocona implies -msse3). Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these additional flags will need to be enabled where appropriate after checking /proc/cpuinfo.

Check the list of x86 and x86-64-specific flags to see which of these instruction sets are activated by the proper CPU type flag. If an instruction set is listed, then it does not need to be separately specified; it will be turned on by using the proper -march setting.

Optimization FAQs

But I get better performance with -funroll-loops -fomg-optimize!

No, you only think you do because someone has convinced you that more flags are better. Aggressive flags will only hurt applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops will make code larger and run more slowly. Yet for some reason, these two flags, along with -ffast-math, -fforce-mem, -fforce-addr, and similar flags, continue to be very popular among ricers who want the biggest bragging rights.

The truth of the matter is that these are dangerously aggressive flags. Take a good look around the Gentoo Forums and Bugzilla to see what those flags really do: nothing good!

You do not need to use those flags globally in CFLAGS or CXXFLAGS. They will only hurt performance. They may make you sound like you have a high-performance system running on the bleeding edge, but they don't do anything but bloat the code and get your bugs marked INVALID or WONTFIX.

You do not need dangerous flags like these. Do not use them! Stick to the basics: -march, -O, and -pipe.

What about -O levels higher than 3?

Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler may accept CFLAGS like -O4, but it actually doesn't do anything with them. It only performs the optimizations for -O3, nothing more.

Need more proof? Examine the source code:

CODE -O source code
if (optimize >= 3)
    {
      flag_inline_functions = 1;
      flag_unswitch_loops = 1;
      flag_gcse_after_reload = 1;
      /* Allow even more virtual operators.  */
      set_param_value ("max-aliased-vops", 1000);
      set_param_value ("avg-aliased-vops", 3);
    }

As can be seen, any value higher than 3 is treated as just -O3.

What about compiling outside the target machine?

Some readers might wonder if compiling outside the target machine with a strictly inferior CPU or GCC sub-architecture will result in inferior optimization results (compared to a native compilation). The answer is simple: No. Regardless of the actual hardware on which the compilation takes place and the CHOST for which GCC was built, as long as the same arguments are used (except for -march=native) and the same version of GCC is used (although minor version might be different), the resulting optimizations are strictly the same.

For example, if Gentoo is installed on a machine whose GCC's CHOST is i686-pc-linux-gnu, and a Distcc server is set up on another computer whose GCC's CHOST is i486-linux-gnu, then there is no need to be afraid that the results would be less optimal because of the strictly inferior sub-architecture of the remote compiler and/or hardware. The result would be as optimized as a native build, as long as the same options are passed to both compilers (and the -march parameter doesn't get a native argument). In this particular case the target architecture needs to be specified explicitly as explained in Distcc and -march=native.

The only difference in behavior between two GCC versions built targeting different sub-architectures is the implicit default argument for the -march parameter, which is derived from the GCC's CHOST when not explicitly provided in the command line.

What about redundant flags?

Oftentimes CFLAGS and CXXFLAGS that are turned on at various -O levels are specified redundantly in /etc/portage/make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.

Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because packages fail to compile at certain -O levels, or when the source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace -O with a different level.
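
Conceptually, replacing an -O level amounts to walking the flag list and swapping one token. The sketch below only illustrates that idea in plain shell; it is not the code Portage or its eclasses use, and the function name replace_opt_level is hypothetical.

```shell
# replace_opt_level FLAGS NEW: print FLAGS with every -O* token replaced by NEW.
replace_opt_level() {
    out=
    for flag in $1; do            # rely on word splitting: one flag per word
        case $flag in
            -O*) flag=$2 ;;
        esac
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

replace_opt_level "-march=athlon64 -O3 -pipe" "-O2"
# prints: -march=athlon64 -O2 -pipe
```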

The Gentoo Developer Manual explains when and how flag filtering/replacing takes place.

It is possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, like this:

CODE Specifying redundant CFLAGS
CFLAGS="-O3 -finline-functions -funswitch-loops"

However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile your whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, you shouldn't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done for your benefit! If an ebuild specifies alternative flags, then don't try to get around it.

You will probably continue to run into problems when building a package with unacceptable flags. When you report your troubles on Bugzilla, the flags you use in /etc/portage/make.conf will be readily visible, and you will be asked to recompile the package without those flags. Save yourself the trouble of recompiling by not using redundant flags in the first place! Do not automatically assume that you know better than the developers.

What about LDFLAGS?

The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so they do not need to be changed.

Can I use per-package flags?

Warning
Using per-package flags complicates debugging and support. Make sure to mention in your bug reports if you use this feature and what changes you have made.

Information on how to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
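
As a rough sketch of that mechanism, a set of conservative flags can live in its own file under /etc/portage/env/ and be mapped to a package in /etc/portage/package.env. The file name safe-cflags.conf, the flag values, and the package atom app-misc/example are hypothetical placeholders; the Handbook section remains the authoritative description.

```shell
# /etc/portage/env/safe-cflags.conf (hypothetical file name)
CFLAGS="-march=athlon64 -O1 -pipe"
CXXFLAGS="${CFLAGS}"

# The matching line in /etc/portage/package.env would then read:
#   app-misc/example safe-cflags.conf
```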


The following resources are of some help in further understanding optimization:

  • man make.conf

This article is based on a document formerly found on our main website gentoo.org.
The following people contributed to the original document: nightmorph
They are listed here as the Wiki history does not allow for any external attribution. If you edit the Wiki article, please do not add yourself here; your contributions are recorded on the history page.