GCC optimization

This guide introduces how to optimize compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimization in general.

What are CFLAGS and CXXFLAGS?
CFLAGS and CXXFLAGS are environment variables that tell the GNU Compiler Collection (GCC) which switches to use when compiling source code. CFLAGS is for code written in C, and CXXFLAGS is for code written in C++.

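These variables can be used to reduce the amount of debug messages for a program, to raise error warning levels, and of course to control how heavily the produced code is optimized. The GCC manual maintains a complete list of the options available in these variables and their purposes.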

How are they used?
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used per program with Makefiles generated by automake.

However, this should not be done when installing packages found in the Portage tree. Instead, set CFLAGS and CXXFLAGS in /etc/portage/make.conf. This way, all packages are compiled using the options you specify.

CFLAGS in /etc/portage/make.conf

CXXFLAGS is normally set to reuse every option present in CFLAGS. This is what you will want almost without exception; you should rarely, if ever, need to specify additional options in CXXFLAGS.
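As a minimal sketch of such an entry (the specific flag values are placeholders chosen for illustration, not a recommendation for every machine), make.conf uses shell-style variable assignments:

```shell
# Hypothetical /etc/portage/make.conf excerpt; the flag values are illustrative only.
CFLAGS="-march=native -O2 -pipe"
# Reuse exactly the same options for C++ code.
CXXFLAGS="${CFLAGS}"
```

Defining CXXFLAGS in terms of CFLAGS keeps the two from drifting apart when the flags are edited later.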

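Misconceptions
While CFLAGS and CXXFLAGS can be very effective means of getting source code to produce smaller and faster binaries, they can also impair the function of your code, bloat its size, slow down its execution, or even cause compilation failures.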

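CFLAGS are not a magic bullet. They will not automatically make your system run faster or make your binaries take up less space on disk. Adding more and more flags in an attempt to optimize (or "rice") your system is a sure recipe for failure: there is a point at which you reach diminishing returns, and past it performance gets worse.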

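Despite the boasts you will find on the Internet, aggressive CFLAGS and CXXFLAGS are far more likely to harm your programs than to do them any good. Keep in mind that the flags exist in the first place because they are designed to be used at specific places for specific purposes. Just because a particular flag is good for one piece of code does not mean that it is suited to compiling everything you will ever install on your machine.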

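Ready?
Now that you are aware of some of the risks involved, let's take a look at some sane, safe optimizations for your computer. These will serve you well and will endear you to developers the next time you report a problem on Bugzilla. (Developers will usually ask you to recompile a package with minimal CFLAGS to see whether the problem persists. Remember, aggressive flags can ruin code.)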

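The basics
The goal of using CFLAGS and CXXFLAGS is to create code tailor-made for your system: it should function perfectly while being as lean and fast as possible. Sometimes these conditions are mutually exclusive, so this guide sticks to combinations known to work well. Ideally, they are the best available for any CPU architecture. The aggressive flags are covered later so you know what to look out for. Not every option listed in the GCC manual is discussed (there are hundreds), but the basic, most common flags are.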

-march
The first and most important option is -march. This tells the compiler what code it should produce for your processor architecture (or arch); it says that it should produce code for a certain kind of CPU. Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag will instruct the compiler to produce code specifically for your CPU, with all its capabilities, features, instruction sets, quirks, and so on.

Even though the CHOST variable in /etc/portage/make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for your specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.

What kind of CPU do you have? To find out, run the following command:
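On Linux, one common approach is to read /proc/cpuinfo. The helper below is a small sketch under that assumption; the function name cpu_model is made up for this example, and other operating systems expose CPU details differently.

```shell
# Print the CPU model string from a cpuinfo-style file.
# $1 is optional and defaults to /proc/cpuinfo (a Linux-specific assumption).
cpu_model() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | head -n 1
}
```

Calling cpu_model with no argument reports the model of the first listed CPU on a typical x86 Linux system.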

Now let's see -march in action. This example is for an older Pentium III chip:

/etc/portage/make.conf: Pentium III
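A sketch of what such an entry could look like, assuming GCC's pentium3 value for -march; other flags are left out to keep the focus on the architecture setting:

```shell
# Hypothetical make.conf excerpt for a Pentium III machine.
CFLAGS="-march=pentium3"
CXXFLAGS="${CFLAGS}"
```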

Here is another one for a 64-bit AMD CPU:

/etc/portage/make.conf: AMD64
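A comparable sketch for a 64-bit AMD processor, assuming GCC's athlon64 value for -march (the right value depends on the exact CPU model):

```shell
# Hypothetical make.conf excerpt for an AMD Athlon 64 machine.
CFLAGS="-march=athlon64"
CXXFLAGS="${CFLAGS}"
```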

If you still aren't sure what kind of CPU you have, you may just want to use -march=native. When this flag is used, GCC will detect your processor and automatically set appropriate flags for it. However, this should not be used if you intend to compile packages for a different CPU!

So if you're compiling packages on one computer, but intend to run them on a different computer (such as when using a fast computer to build for an older, slower machine), then do not use -march=native. "Native" means that the code produced will run only on that type of CPU. The applications built with -march=native on an AMD Athlon 64 CPU will not be able to run on an old VIA C3 CPU.
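To see which architecture the native setting resolves to on the build machine, GCC can be asked to print its target options. The sketch below assumes a GCC driver that supports -Q --help=target; the function name native_march and the use of the CC variable to pick the compiler are choices made for this example.

```shell
# Show the -march value that GCC selects for -march=native.
# CC defaults to gcc; override it to query a different GCC-compatible driver.
native_march() {
    "${CC:-gcc}" -march=native -Q --help=target 2>/dev/null |
        grep -E '^[[:space:]]*-march='
}
```

If the reported value names a CPU newer than the machine that will run the packages, the resulting binaries may fail there, which is the situation described above.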

Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior isn't very consistent with how each flag behaves from one architecture to the next.

On x86 and x86-64 CPUs, -march will generate code specifically for that CPU using all its available instruction sets and the correct ABI; it will have no backwards compatibility for older/different CPUs. If you don't need to execute code on anything other than the system you're running Gentoo on, continue to use -march. You should only consider using -mtune when you need to generate code for older CPUs such as i386 and i486. -mtune produces more generic code than -march; though it will tune code for a certain CPU, it doesn't take into account available instruction sets and ABI. Don't use -mcpu on x86 or x86-64 systems, as it is deprecated for those arches.

Only non-x86/x86-64 CPUs (such as Sparc, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune / -mcpu will sometimes behave just like -march (on x86/x86-64)... but with a different flag name. Again, GCC's behavior and flag naming just isn't consistent across architectures, so be sure to check the GCC manual to determine which one you should use for your system.

-O
Next up is the -O variable. This controls the overall level of optimization. This makes the code compilation take somewhat more time, and can take up much more memory, especially as you increase the level of optimization.

There are seven -O settings: -O0, -O1, -O2, -O3, -Os, -Og, and -Ofast. You should use only one of them in /etc/portage/make.conf.

With the exception of -O0, the -O settings each activate several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, as well as some explanations as to what they do.
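Besides the manual, the installed compiler can report which optimizer flags a given level switches on. This is a small sketch assuming a GCC driver that supports -Q --help=optimizers; the function name enabled_at_level and the CC override are conventions invented for this example.

```shell
# List the optimizer flags GCC reports as enabled at a given -O level.
# $1 is the level to inspect, for example -O2. CC defaults to gcc.
enabled_at_level() {
    "${CC:-gcc}" "$1" -Q --help=optimizers 2>/dev/null | grep -F '[enabled]'
}
```

Running enabled_at_level -O1 and enabled_at_level -O2 side by side gives a concrete picture of what each step up adds on your GCC version.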

Let's examine each optimization level:


 * -O0: This level (that's the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.


 * -O1: This is the most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It's pretty basic, but it should get the job done all the time.


 * -O2: A step up from -O1. This is the recommended level of optimization unless you have special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time.


 * -O3: This is the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact in many cases can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Therefore, using -O3 is not recommended.


 * -Os: This option will optimize your code for size. It activates all -O2 options that don't increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or have CPUs with small cache sizes.


 * -Og: In GCC 4.8, a new general optimization level, -Og, has been introduced. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. Overall experience for development should be better than the default optimization level -O0. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.


 * -Ofast: Introduced in GCC 4.6, it consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance, and is not recommended for use.

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails and you aren't using -O2, try rebuilding with that option. As a fallback option, try setting your CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 with debugging information enabled (for error reporting and checking for possible problems).

-pipe
A common flag is -pipe. This flag actually has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files during the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed. In that case, do not use this flag.

-fomit-frame-pointer
This is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64), but you may need to activate it yourself by adding it to your flags. The GCC manual does not list every architecture on which -O turns it on, but you will need to activate it explicitly on x86. However, using this flag will make debugging hard to impossible.

In particular, it makes troubleshooting applications written in Java much harder, though Java is not the only code affected by using this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. However, if you don't plan to do much software debugging and haven't added any other debugging-related CFLAGS, then you can try using -fomit-frame-pointer.

-msse, -msse2, -msse3, -mmmx, -m3dnow
These flags enable the SSE, SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. These are useful primarily in multimedia, gaming, and other floating point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.

You normally don't need to add any of these flags to /etc/portage/make.conf as long as you are using the correct -march, because the chosen -march value already implies the instruction sets of that CPU. Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these you'll need to enable additional flags where appropriate after checking the feature flags the CPU reports (on Linux, in /proc/cpuinfo).
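Checking for a single instruction set can be scripted. The sketch below assumes an x86 Linux system, where /proc/cpuinfo has a "flags" line and the kernel spells SSE3 as pni; the function name cpu_has_flag is made up for this example.

```shell
# Succeed if the CPU feature list contains the given flag as a whole word.
# $1 = flag name as the kernel spells it (for example pni for SSE3)
# $2 = cpuinfo-style file, defaulting to /proc/cpuinfo (Linux-specific assumption)
cpu_has_flag() {
    sed -n 's/^flags[[:space:]]*:[[:space:]]*//p' "${2:-/proc/cpuinfo}" |
        head -n 1 | tr ' ' '\n' | grep -qx -- "$1"
}
```

For instance, `cpu_has_flag pni && echo "SSE3 available"` prints the message only when the CPU advertises SSE3, which is the case where adding -msse3 next to a -march value that lacks it could make sense.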

But I get better performance with -funroll-loops -fomg-optimize!
No, you only think you do because someone has convinced you that more flags are better. Aggressive flags will only hurt your applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops makes code larger and run more slowly. Yet for some reason, these two flags, along with other similarly aggressive flags, continue to be very popular among ricers who want the biggest bragging rights.

The truth of the matter is that they are dangerously aggressive flags. Take a good look around the Gentoo Forums and Bugzilla to see what those flags do: nothing good!

You don't need to use those flags globally in CFLAGS or CXXFLAGS. They will only hurt performance. They may make you sound like you have a high-performance system running on the bleeding edge, but they don't do anything but bloat your code and get your bugs marked INVALID or WONTFIX.

You don't need dangerous flags like these. Don't use them. Stick to the basics: -march, -O, and -pipe.

What about -O levels higher than 3?
Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler may accept CFLAGS like -O4, but it actually doesn't do anything with them. It only performs the optimizations for -O3, nothing more.

Need more proof? Examine the GCC source code:

-O source code

As you can see, any value higher than 3 is treated as just -O3.
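The same point can be checked empirically by asking the compiler which optimizer flags it enables at two levels and comparing the answers. This is a sketch assuming a GCC driver that supports -Q --help=optimizers; the function name diff_opt_levels and the CC override are invented for this example.

```shell
# Print the differences between the optimizer settings of two -O levels.
# $1 and $2 are the levels to compare, for example -O3 and -O4. CC defaults to gcc.
# The body runs in a subshell so its temporary variables do not leak.
diff_opt_levels() (
    cc="${CC:-gcc}"
    tmp="$(mktemp -d)" || exit 1
    "$cc" "$1" -Q --help=optimizers > "$tmp/a" 2>/dev/null
    "$cc" "$2" -Q --help=optimizers > "$tmp/b" 2>/dev/null
    diff "$tmp/a" "$tmp/b"
    status=$?
    rm -rf "$tmp"
    exit "$status"
)
```

If the claim above holds for your GCC version, `diff_opt_levels -O3 -O4` should print nothing, while `diff_opt_levels -O2 -O3` should list the flags that -O3 adds.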

What about redundant flags?
Oftentimes CFLAGS and CXXFLAGS that are turned on at various -O levels are specified redundantly in /etc/portage/make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.

Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because packages fail to compile at certain -O levels, or when the source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace -O with a different level.

The Gentoo Developer Manual outlines where and how flag filtering/replacing works.

It's possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, by doing things like:

Specifying redundant CFLAGS
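As an illustration of the pattern only (not something to copy), a redundant setting might look like the sketch below. It assumes that -finline-functions and -funswitch-loops are among the flags the chosen level already enables, which depends on the GCC version.

```shell
# Anti-pattern, shown for illustration: repeating flags that -O3 already implies
# so that they survive if an ebuild replaces -O3 with a lower level.
CFLAGS="-O3 -finline-functions -funswitch-loops"
```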

However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile your whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, you shouldn't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done for your benefit! If an ebuild specifies alternative flags, then don't try to get around it.

You will most likely continue to run into problems when you build a package with unacceptable flags. When you report your troubles on Bugzilla, the flags you use in /etc/portage/make.conf will be readily visible and you will be told to recompile without those flags. Save yourself the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume that you know better than the developers.

What about LDFLAGS?
The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so you don't need to change them.

Can I use per-package flags?
Information on how to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
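As a rough sketch of that mechanism, Portage reads named sets of variables from files under /etc/portage/env/ and applies them to the packages listed in /etc/portage/package.env. The example below writes into a scratch directory instead of /etc/portage; the file name lower-opt.conf, the chosen flags, and the package atom are all illustrative assumptions.

```shell
# Scratch directory standing in for /etc/portage, so nothing on a live system is touched.
PORTAGE_DIR="$(mktemp -d)"
mkdir -p "$PORTAGE_DIR/env"

# env/lower-opt.conf (hypothetical file name): gentler flags for a troublesome package.
cat > "$PORTAGE_DIR/env/lower-opt.conf" <<'EOF'
CFLAGS="-march=native -O1 -pipe"
CXXFLAGS="${CFLAGS}"
EOF

# package.env maps a package atom to one or more files under env/.
echo "app-editors/vim lower-opt.conf" >> "$PORTAGE_DIR/package.env"
```

Keeping the override in its own file means the global flags in make.conf stay minimal, and the exception is easy to find and remove later.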

Resources
The following resources can help you understand optimization in more depth:


 * The GCC online documentation


 * Chapter 5 of the Gentoo Installation Handbook




 * Wikipedia


 * The Gentoo Forums

Acknowledgments
We would like to thank the following authors and editors for their contributions to this guide:


 * nightmorph