GCC optimization

This guide introduces how to optimize compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimizing in general.

What are CFLAGS and CXXFLAGS?
CFLAGS and CXXFLAGS are environment variables that tell the GNU Compiler Collection (GCC) which switches to use when compiling source code. CFLAGS is used for code written in C, CXXFLAGS for code written in C++.

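These variables can be used to reduce the amount of debug messages for a program, to raise error warning levels, and of course to control how heavily the produced code is optimized. The GCC manual maintains a complete list of the available options and their purposes.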

How are they used?
CFLAGS and CXXFLAGS can be used in two ways. First, they can be used per-program with Makefiles generated by the automake program.

However, this should not be done when installing packages found in the Portage tree. Instead, for Gentoo-based machines, set the CFLAGS and CXXFLAGS variables in /etc/portage/make.conf. This way all packages will be compiled using the options specified in make.conf.
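A minimal sketch of such a make.conf fragment follows. The flag values (-O2 -pipe) are illustrative assumptions based on the recommendations later in this guide, not settings prescribed for every system. Since make.conf uses shell-style assignments, CXXFLAGS can simply reuse CFLAGS:

```shell
# /etc/portage/make.conf (fragment)
# Illustrative values only; choose flags that suit the actual system.
CFLAGS="-O2 -pipe"
# Reuse exactly the same options for C++ code.
CXXFLAGS="${CFLAGS}"
```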

The CXXFLAGS variable is normally set to reuse all the options present in CFLAGS, for example with CXXFLAGS="${CFLAGS}". Nearly every system should be configured in this manner; additional options for CXXFLAGS are extremely rare in common use cases.

Misconceptions
While CFLAGS and CXXFLAGS can be very effective means of getting source code to produce smaller and/or faster binaries, they can also impair the function of the code, bloat its size, or slow down its execution time. Setting them incorrectly can even cause compilation failures!

CFLAGS are not a magic bullet; they will not automatically make the system run faster or reduce the size of binaries on the disk. Adding too many flags in an attempt to optimize (or "rice") the system is a sure recipe for failure. The point of diminishing returns is reached rather quickly when dealing with CFLAGS.

Despite the boasts and brags found on the internet, aggressive CFLAGS and CXXFLAGS are far more likely to harm binaries than to do any good. Keep in mind that the flags are designed to be used at specific places for specific purposes. Few flags work as intended globally.

Ready?
Being aware of the risks involved, take a look at some sane, safe optimizations. These will stand the system in good stead and will endear the user to developers the next time a problem is reported on Bugzilla. (Developers will usually ask the user to recompile a package with minimal CFLAGS to see if the problem persists. Remember: aggressive flags can ruin code!)

The basics
The goal behind CFLAGS and CXXFLAGS is to create code tailor-made to your system; it should function perfectly while being lean and fast, if possible. Sometimes these conditions are mutually exclusive, so this guide will stick to combinations known to work well. Ideally, they are the best available for any CPU architecture. For informational purposes, aggressive flag use will be covered later. Not every option listed in the GCC manual (there are hundreds) will be discussed, but the basic, most common flags will be reviewed.

-march
The first and most important option is -march. This tells the compiler what code it should produce for the system's processor architecture (or arch); it tells GCC that it should produce code for a certain kind of CPU. Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag instructs the compiler to produce code specific to the system's CPU, with all its capabilities, features, instruction sets, quirks, and so on.

Even though the CHOST variable in make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for the system's specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.

What kind of CPU does the system have? On Linux, the processor model and its feature flags can be read from /proc/cpuinfo, for example with cat /proc/cpuinfo.

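For more detail, including the values GCC would choose for -march and -mtune, GCC can be asked to print its target options with gcc -c -Q -march=native --help=target.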

Now let's see -march in action. For an older Pentium III chip, the setting is -march=pentium3.

For a 64-bit AMD CPU, a setting such as -march=athlon64 would be used instead.
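The two cases just described might look as follows in make.conf. This is a sketch: only the -march values matter here, and which line applies depends on the actual processor, so the second assignment is left commented out.

```shell
# /etc/portage/make.conf (fragment): choose ONE -march value.

# Older 32-bit Intel Pentium III:
CFLAGS="-march=pentium3"

# 64-bit AMD Athlon 64 (use instead of the line above):
# CFLAGS="-march=athlon64"

CXXFLAGS="${CFLAGS}"
```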

If the type of CPU is undetermined, or if the user does not know what setting to choose, it is possible to use the -march=native setting. When this flag is used, GCC will attempt to detect the processor and automatically set appropriate flags for it. However, this should not be used when intending to compile packages for different CPUs!

If compiling packages on one computer in order to run them on a different computer (such as when using a fast computer to build for an older, slower machine), then do not use -march=native. "Native" means that the code produced will run only on that type of CPU. Applications built with -march=native on an AMD Athlon 64 CPU will not be able to run on an old VIA C3 CPU.

Also available are the -mtune and -mcpu flags. These flags are normally only used when there is no available -march option; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior is not very consistent in how each flag behaves from one architecture to the next.

On x86 and x86-64 CPUs, -march will generate code specifically for that CPU using its available instruction sets and the correct ABI; it will have no backwards compatibility for older/different CPUs. Consider using -mtune when generating code for older CPUs such as i386 and i486. -mtune produces more generic code than -march; though it will tune code for a certain CPU, it does not take into account available instruction sets and ABI. Do not use -mcpu on x86 or x86-64 systems, as it is deprecated for those arches.

Only non-x86/x86-64 CPUs (such as Sparc, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune / -mcpu will sometimes behave just like -march (on x86/x86-64) but with a different flag name. Again, GCC's behavior and flag naming are not consistent across architectures, so be sure to check the GCC manual to determine which one should be used.

-O
Next up is the -O variable. This variable controls the overall level of optimization. Raising this value makes compilation take more time and use much more memory, especially at the higher optimization levels.

There are seven -O settings: -O0, -O1, -O2, -O3, -Os, -Og, and -Ofast. Only use one of them in /etc/portage/make.conf.

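With the exception of -O0, each -O setting activates several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level and what each of them does.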

Let us examine each optimization level:


 * -O0: This level (that is the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. This reduces compilation time and can improve debugging info, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.


 * -O1: The most basic optimization level. The compiler will try to produce faster, smaller code without taking much compilation time. It is basic, but it should get the job done all the time.


 * -O2: A step up from -O1. The recommended level of optimization unless the system has special needs. -O2 will activate a few more flags in addition to the ones activated by -O1. With -O2, the compiler will attempt to increase code performance without compromising on size, and without taking too much compilation time.


 * -O3: The highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage. -O3 is also known to break several packages. Using -O3 is not recommended.


 * -Os: Optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.


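 * -Og: GCC 4.8 introduced a new general optimization level, -Og. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. The overall experience for development should be better than with the default optimization level -O0. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.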


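 * -Ofast: New in GCC 4.7, consisting of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance and is not recommended for use.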

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails while not using -O2, try rebuilding with that option. As a fallback option, try setting CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 -g2 -ggdb (for error reporting and checking for possible problems).

-pipe
A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files during the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed. In those cases do not use this flag.

-fomit-frame-pointer
This is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64), but elsewhere it may need to be activated explicitly by adding it to the flags. Though the GCC manual does not specify all architectures, it is turned on by using the -O option. It is still necessary to explicitly enable the -fomit-frame-pointer option to activate it on x86-32 with GCC up to version 4.6, or when using -Os on x86-32 with any version of GCC. However, using -fomit-frame-pointer will make debugging hard or impossible.

In particular, it makes troubleshooting applications written in Java much harder, though Java is not the only code affected by using this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. When not doing software debugging and no other debugging-related CFLAGS such as -ggdb have been used, then try using -fomit-frame-pointer.

-msse, -msse2, -msse3, -mmmx, -m3dnow
These flags enable the Streaming SIMD Extensions (SSE), SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. These are useful primarily in multimedia, gaming, and other floating point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.

Normally none of these flags need to be added to make.conf, as long as the system is using the correct -march (for example, -march=nocona implies -msse3). Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these, additional flags will need to be enabled where appropriate after checking /proc/cpuinfo.
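One way to check which of these instruction sets the processor reports is to look at the flags line of /proc/cpuinfo. The helper below, has_cpu_flag, is a hypothetical name introduced for this sketch. It assumes an x86 Linux system, where the kernel lists SSE3 under the name pni.

```shell
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line of FILE
# (default: /proc/cpuinfo).
has_cpu_flag() {
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" 2>/dev/null |
        grep -q -w -- "$1"
}

# SSE3 shows up as "pni" in /proc/cpuinfo.
for flag in mmx sse sse2 pni 3dnow; do
    if has_cpu_flag "$flag"; then
        echo "$flag: reported"
    else
        echo "$flag: not reported"
    fi
done
```

Matching whole words matters here: a plain substring search for sse would also match sse2 or ssse3.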

But I get better performance with -funroll-loops -fomg-optimize!
No, you only think you do because someone has convinced you that more flags are better. Aggressive flags will only hurt applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops will make code larger and can make it run more slowly. Yet for some reason, these two flags, along with -ffast-math, -fforce-mem, -fforce-addr, and similar flags, continue to be very popular among ricers who want the biggest bragging rights.

The truth of the matter is that these are dangerously aggressive flags. Take a good look around the Gentoo Forums and Bugzilla to see what those flags do: nothing good!

You do not need to use those flags globally in CFLAGS or CXXFLAGS. They will only hurt performance. They may make you sound like you have a high-performance system running on the bleeding edge, but they don't do anything but bloat the code and get your bugs marked INVALID or WONTFIX.

You don't need dangerous flags like these. Don't use them. Stick to the basics: -march, -O, and -pipe.

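What about -O levels higher than 3?
Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler will accept CFLAGS like -O4, but it does not actually do anything with them. It only performs the optimizations for -O3, nothing more.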

Need more proof? GCC's source code only ever checks whether the optimization level is 3 or greater, so any value higher than 3 is treated as just -O3.

What about compiling outside the target machine?
Some readers might wonder if compiling outside the target machine with a strictly inferior CPU or GCC sub-architecture will result in inferior optimization results (compared to a native compilation). The answer is simple: No. Regardless of the actual hardware on which the compilation takes place and the CHOST for which GCC was built, as long as the same arguments are used (except for -march=native) and the same version of GCC is used (although the minor version might be different), the resulting optimizations are strictly the same.

To exemplify, if Gentoo is installed on a machine whose GCC's CHOST is i686-pc-linux-gnu, and a Distcc server is set up on another computer whose GCC's CHOST is i486-linux-gnu, then there is no need to be afraid that the results would be less optimal because of the strictly inferior sub-architecture of the remote compiler and/or hardware. The result would be as optimized as a native build, as long as the same options are passed to both compilers (and the -march parameter doesn't get a native argument). In this particular case the target architecture needs to be specified explicitly as explained in Distcc and -march=native.

The only difference in behavior between two GCC versions built targeting different sub-architectures is the implicit default argument for the -march parameter, which is derived from the GCC's CHOST when not explicitly provided on the command line.

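What about redundant flags?
Oftentimes, CFLAGS and CXXFLAGS that are already turned on at various -O levels are specified redundantly in make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.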

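Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because a package fails to compile at certain -O levels, or because its source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace the -O level with a different one.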

The Gentoo Developer Manual outlines where and how flag filtering/replacing works.

It's possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, by doing things like CFLAGS="-O3 -finline-functions -funswitch-loops".

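However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile the whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, don't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done for your benefit! If an ebuild specifies alternative flags, then don't try to get around it.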

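Building a package with unacceptable flags will most likely lead to problems. When the issue is reported on Bugzilla, the flags used in make.conf will be readily visible, and someone will ask for the package to be recompiled without those flags. Save the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume that you know better than the developers.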

What about LDFLAGS?
The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so they do not need to be changed.

Can I use per-package flags?
How to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
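As a rough sketch of that mechanism: Portage reads named sets of variables from files under /etc/portage/env and applies them to the packages listed in /etc/portage/package.env. The example below writes both files into a scratch directory rather than the real /etc. The file name debug-cflags, the package app-editors/nano, and the flag values are chosen purely for illustration.

```shell
# Scratch directory standing in for / so nothing on the real system changes.
root=$(mktemp -d)
mkdir -p "$root/etc/portage/env"

# 1. A named set of variable overrides (the file name is arbitrary).
cat > "$root/etc/portage/env/debug-cflags" <<'EOF'
CFLAGS="-O0 -ggdb -pipe"
CXXFLAGS="${CFLAGS}"
EOF

# 2. Map a package to that set: "<category>/<package> <env file name>".
cat > "$root/etc/portage/package.env" <<'EOF'
app-editors/nano debug-cflags
EOF

cat "$root/etc/portage/package.env"
```

Keeping overrides per package this way leaves the global CFLAGS in make.conf minimal and safe.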

Resources
The following resources can help with understanding optimization further:


 * GCC online documentation


 * Chapter 5 of the Gentoo Installation Handbook


 * man make.conf


 * Wikipedia


 * Gentoo Forums