GCC optimization
Revision as of 14:18, 21 February 2014
This guide provides an introduction to optimizing compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimizing in general.
- 1 Introduction
- 2 Optimizing
- 3 Optimization FAQs
- 4 Resources
- 5 Acknowledgements
What are CFLAGS and CXXFLAGS?
CFLAGS and CXXFLAGS are environment variables that tell the GNU Compiler Collection, GCC, what kinds of switches to use when compiling source code. CFLAGS are for code written in C, while CXXFLAGS are for code written in C++.
They can be used to decrease the amount of debug messages for a program, increase error warning levels, and, of course, to optimize the code produced. The GCC manual maintains a complete list of available options and their purposes.
How are they used?
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used per-program with Makefiles generated by automake.
However, this should not be done when installing packages found in the Portage tree. Instead, set your CFLAGS and CXXFLAGS in /etc/portage/make.conf. This way all packages will be compiled using the options you specify.
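A minimal sketch of what such settings might look like in /etc/portage/make.conf. The values are illustrative assumptions rather than a recommendation for every machine; the individual flags are discussed later in this guide.

```shell
# /etc/portage/make.conf (illustrative values only)
CFLAGS="-O2 -pipe"
# Reuse the C flags for C++ so the two always stay in sync.
CXXFLAGS="${CFLAGS}"
```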
As you can see, CXXFLAGS is set to use all the options present in CFLAGS. This is what you'll want almost without fail. You shouldn't ever need to specify additional options in CXXFLAGS.
While CFLAGS and CXXFLAGS can be very effective means of getting source code to produce smaller and/or faster binaries, they can also impair the function of your code, bloat its size, slow down its execution time, or even cause compilation failures!
CFLAGS are not a magic bullet; they will not automatically make your system run any faster or make your binaries take up less space on disk. Adding more and more flags in an attempt to optimize (or "rice") your system is a sure recipe for failure. There is a point at which you will reach diminishing returns.
Despite the bragging you'll find on the internet, aggressive CFLAGS and CXXFLAGS are far more likely to harm your programs than do them any good. Keep in mind that the flags exist in the first place because they are designed to be used at specific places for specific purposes. Just because one particular CFLAG is good for one bit of code doesn't mean that it is suited to compiling everything you will ever install on your machine!
Now that you're aware of some of the risks involved, let's take a look at some sane, safe optimizations for your computer. These will stand you in good stead and will endear you to developers the next time you report a problem on Bugzilla. (Developers will usually request that you recompile a package with minimal CFLAGS to see if the problem persists. Remember, aggressive flags can ruin code.)
The goal behind using CFLAGS and CXXFLAGS is to create code tailor-made to your system; it should function perfectly while being lean and fast, if possible. Sometimes these conditions are mutually exclusive, so we'll stick with combinations known to work well. Ideally, they are the best available for any CPU architecture. We'll mention the aggressive flags later so you know what to look out for. We won't discuss every option listed in the GCC manual (there are hundreds), but we'll cover the basic, most common flags.
The first and most important option is -march. This tells the compiler what code it should produce for your processor architecture (or arch); it says that it should produce code for a certain kind of CPU. Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag will instruct the compiler to produce code specifically for your CPU, with all its capabilities, features, instruction sets, quirks, and so on.
Even though the CHOST variable in /etc/portage/make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for your specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.
What kind of CPU do you have? To find out, run the following command:
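One common way to do this on Linux is to read /proc/cpuinfo. The sketch below assumes that file exists and contains a "model name" line, which holds on x86 but not on every architecture; the helper name cpu_model is made up for illustration.

```shell
# cpu_model [FILE]: print the first "model name" entry from FILE
# (default: /proc/cpuinfo). Prints nothing if no such line exists.
cpu_model() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" 2>/dev/null | head -n 1
}

cpu_model
```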
Now let's see -march in action. This example is for an older Pentium III chip:
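A sketch of the corresponding make.conf line; -march=pentium3 is the GCC name for this CPU family, and any other flags are left out for brevity.

```shell
# Target a Pentium III; the resulting code may not run on older x86 CPUs.
CFLAGS="-march=pentium3"
```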
Here's another one for a 64-bit AMD CPU:
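As a sketch, assuming the chip is an Athlon 64 (other 64-bit AMD processors use different -march names, which are listed in the GCC manual):

```shell
# Target an AMD Athlon 64.
CFLAGS="-march=athlon64"
```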
If you still aren't sure what kind of CPU you have, you may just want to use -march=native. When this flag is used, GCC will detect your processor and automatically set appropriate flags for it. However, this should not be used if you intend to compile packages for a different CPU!
So if you're compiling packages on one computer, but intend to run them on a different computer (such as when using a fast computer to build for an older, slower machine), then do not use -march=native. "Native" means that the code produced will run only on that type of CPU. The applications built with -march=native on an AMD Athlon 64 CPU will not be able to run on an old VIA C3 CPU.
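To see what -march=native resolves to on the build machine, GCC can be asked to print its target options. This is a sketch that assumes gcc is on the PATH and supports -Q --help=target; the function name show_native_march is hypothetical, and the output format varies between GCC versions.

```shell
# show_native_march: print the -march value GCC selects for -march=native.
# Prints a notice instead if gcc is missing or reports nothing.
show_native_march() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found"
        return 0
    fi
    gcc -march=native -Q --help=target 2>/dev/null | grep -- '-march=' \
        || echo "gcc did not report a -march value"
}

show_native_march
```

Running this on both the build machine and the target machine is a quick way to notice that the two would get different code.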
Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior isn't very consistent with how each flag behaves from one architecture to the next.
On x86 and x86-64 CPUs, -march will generate code specifically for that CPU using all its available instruction sets and the correct ABI; it will have no backwards compatibility for older/different CPUs. If you don't need to execute code on anything other than the system you're running Gentoo on, continue to use -march. You should only consider using -mtune when you need to generate code for older CPUs such as i386 and i486. -mtune produces more generic code than -march; though it will tune code for a certain CPU, it doesn't take into account available instruction sets and ABI. Don't use -mcpu on x86 or x86-64 systems, as it is deprecated for those arches.
Only non-x86/x86-64 CPUs (such as Sparc, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune or -mcpu will sometimes behave just like -march (on x86/x86-64)... but with a different flag name. Again, GCC's behavior and flag naming just isn't consistent across architectures, so be sure to check the GCC manual to determine which one you should use for your system.
Next up is the -O variable. This controls the overall level of optimization. Higher levels make compilation take somewhat more time and can use much more memory.
There are seven -O settings: -O0, -O1, -O2, -O3, -Os, -Og, and -Ofast. You should use only one of them in /etc/portage/make.conf.
With the exception of -O0, the -O settings each activate several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, as well as some explanations as to what they do.
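GCC can also report this itself. The sketch below assumes gcc is on the PATH and accepts -Q --help=optimizers, which prints each optimization flag together with its state at the given level; the helper name enabled_at is made up for this example.

```shell
# enabled_at LEVEL: list the optimization flags GCC marks as enabled
# at the given -O level, e.g. "enabled_at 2" for -O2.
enabled_at() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found" >&2
        return 0
    fi
    gcc -Q --help=optimizers "-O$1" 2>/dev/null | grep -F '[enabled]' || true
}

enabled_at 2
```

Comparing the output for two levels shows which flags the higher level adds.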
Let's examine each optimization level:
-O0: This level (that's the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.
-O1: This is the most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It's pretty basic, but it should get the job done all the time.
-O2: A step up from -O1. This is the recommended level of optimization unless you have special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time.
-O3: This is the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact in many cases can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Therefore, using -O3 is not recommended.
-Os: This option will optimize your code for size. It activates all -O2 options that don't increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or have CPUs with small cache sizes.
-Og: In GCC 4.8, a new general optimization level, -Og, has been introduced. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. Overall experience for development should be better than the default optimization level -O0. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.
-Ofast: New in GCC 4.7, consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance, and is not recommended for use.
As previously mentioned, -O2 is the recommended optimization level. If package compilation fails and you aren't using -O2, try rebuilding with that option. As a fallback option, try setting your CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 -g2 -ggdb (for error reporting and checking for possible problems).
A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files between the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed. In that case, do not use this flag.
-fomit-frame-pointer is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64), but you may need to activate it yourself by adding it to your flags. Though the GCC manual does not specify all architectures on which it is turned on by using -O, you will need to explicitly activate it on x86. However, using this flag will make debugging hard to impossible.
In particular, it makes troubleshooting applications written in Java much harder, though Java is not the only code affected by using this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. However, if you don't plan to do much software debugging and haven't added any other debugging-related CFLAGS such as -ggdb, then you can try using -fomit-frame-pointer.
-msse, -msse2, -msse3, -mmmx, -m3dnow
These flags enable the SSE, SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. These are useful primarily in multimedia, gaming, and other floating point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.
You normally don't need to add any of these flags to /etc/portage/make.conf as long as you are using the correct -march (for example, an -march value for an SSE3-capable CPU already implies -msse3). Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these you'll need to enable additional flags where appropriate after checking which instruction sets your CPU reports.
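One way to check what the CPU reports, sketched here for Linux on x86: the kernel lists supported instruction sets on the "flags" line of /proc/cpuinfo, where SSE3 appears under the name pni. The helper name has_cpu_flag is hypothetical, and other architectures label this line differently.

```shell
# has_cpu_flag FLAG [FILE]: succeed if FLAG is listed on the first
# "flags" line of FILE (default: /proc/cpuinfo).
has_cpu_flag() {
    grep -m 1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null \
        | tr ' \t' '\n\n' \
        | grep -qx -- "$1"
}

if has_cpu_flag pni; then
    echo "SSE3 is supported"
else
    echo "SSE3 not reported"
fi
```

The match is on whole words, so asking for sse does not accidentally match sse2.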
But I get better performance with -funroll-loops -fomg-optimize!
No, you only think you do because someone has convinced you that more flags are better. Aggressive flags will only hurt your applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops makes code larger and run more slowly. Yet for some reason, these two flags, along with -fforce-addr and similar flags, continue to be very popular among ricers who want the biggest bragging rights.
You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only hurt performance. They may make you sound like you have a high-performance system running on the bleeding edge, but they don't do anything but bloat your code and get your bugs marked INVALID or WONTFIX.
You don't need dangerous flags like these. Don't use them. Stick to the basics: -march, -O2, and -pipe.
What about -O levels higher than 3?
Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler may accept CFLAGS like -O4, but it actually doesn't do anything with them. It only performs the optimizations for -O3, nothing more.
Need more proof? The GCC source code treats any value higher than 3 as just -O3.
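This can also be checked empirically. The sketch below assumes gcc is on the PATH and supports -Q --help=optimizers; it compares the optimizer settings GCC reports for two levels. The helper name same_opts is hypothetical, and the comparison only covers what that report lists.

```shell
# same_opts A B: succeed if GCC reports identical optimizer settings
# for -OA and -OB; return 2 if gcc itself fails.
same_opts() {
    a=$(gcc -Q --help=optimizers "-O$1" 2>/dev/null) || return 2
    b=$(gcc -Q --help=optimizers "-O$2" 2>/dev/null) || return 2
    [ "$a" = "$b" ]
}

if ! command -v gcc >/dev/null 2>&1; then
    echo "gcc not found"
elif same_opts 3 9; then
    echo "-O9 enables the same optimizers as -O3"
else
    echo "reports differ (or gcc rejected an option)"
fi
```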
What about redundant flags?
Oftentimes CFLAGS and CXXFLAGS that are turned on at various -O levels are specified redundantly in /etc/portage/make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.
Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because packages fail to compile at certain -O levels, or when the source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace -O with a different level.
The Gentoo Developer Manual outlines where and how flag filtering/replacing works.
It's possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, that is, by spelling out the individual flags that level turns on alongside the -O option itself.
However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile your whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, you shouldn't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done for your benefit! If an ebuild specifies alternative flags, then don't try to get around it.
You will most likely continue to run into problems when you build a package with unacceptable flags. When you report your troubles on Bugzilla, the flags you use in /etc/portage/make.conf will be readily visible and you will be told to recompile without those flags. Save yourself the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume that you know better than the developers.
What about LDFLAGS?
The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so you don't need to change them.
Can I use per-package flags?
Information on how to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
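As a rough sketch of that mechanism: Portage reads per-package overrides from files under /etc/portage/env/, and /etc/portage/package.env maps packages to those files. The file name debug-flags.conf, the package atom, and the flag values below are arbitrary examples, not recommendations.

```conf
# /etc/portage/env/debug-flags.conf  (file name is an arbitrary example)
CFLAGS="-O1 -g -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/package.env
# <category/package>  <file under /etc/portage/env/>
app-editors/vim debug-flags.conf
```

This keeps the global flags in /etc/portage/make.conf conservative while letting a single package be rebuilt with different settings.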
The following resources are of some help in further understanding optimization:
- Chapter 5 of the Gentoo Installation Handbooks
- The Gentoo Forums
We would like to thank the following authors and editors for their contributions to this guide: