GCC optimization

This guide provides an introduction to optimizing compiled code using safe, sane CFLAGS and CXXFLAGS. It also describes the theory behind optimization in general.

Default CFLAGS can be set in make.conf for Gentoo systems. CFLAGS can also be specified per-package.

See also
For more information see CFLAGS and CXXFLAGS in the Gentoo Handbook, and the safe CFLAGS article. See also the FAQ.

Introduction

What are CFLAGS and CXXFLAGS?

CFLAGS and CXXFLAGS are among the environment variables conventionally used to specify compiler options to a build system when compiling C and C++ code. While these variables are not standardized, their use is essentially ubiquitous, and any correctly written build should understand them for passing extra or custom options when it invokes the compiler. See the GNU make info page for a list of some of the commonly used variables in this category.

Because such a large proportion of the packages that make up a Gentoo system are written in C and C++, these are two variables administrators will definitely want to set correctly, as they greatly influence the way much of the system is built.

They can be used to decrease the amount of debug messages for a program, increase error warning levels and, of course, optimize the code produced. The GCC manual maintains a complete list of available options and their purposes.

How are they used?

Normally, CFLAGS and CXXFLAGS would be set in the environment when invoking a configure script or with makefiles generated by automake. On Gentoo-based systems, set the CFLAGS and CXXFLAGS variables in /etc/portage/make.conf. Variables set in this file are exported to the environment of programs invoked by Portage, so that all packages are compiled using these options as a base.

CODE Setting CFLAGS in /etc/portage/make.conf

CFLAGS="-march=skylake -O2 -pipe"
CXXFLAGS="${CFLAGS}"

Important
While it is possible to spread USE flags over multiple lines, doing so with CFLAGS can and will cause problems with programs such as cmake. Make sure the CFLAGS declaration is on a single line, with as little whitespace as possible, to avoid issues. See bug #500034 for an example.
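
One way to catch a multi-line value before it reaches a build system is to test it for embedded line breaks. The helper below is only a minimal sketch: the name is_single_line is invented for this example and is not part of Portage.

```shell
# Hypothetical helper (not a Portage function): succeeds only when the
# given value contains no embedded line break.
is_single_line() {
    case "$1" in
        *"
"*) return 1 ;;   # a newline was found inside the value
        *)  return 0 ;;
    esac
}

# Example values: the first is safe, the second spans two lines.
good="-march=skylake -O2 -pipe"
bad="-march=skylake -O2
-pipe"

is_single_line "$good" && echo "good: single line"
is_single_line "$bad"  || echo "bad: contains a line break"
```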

As seen in the example above, the CXXFLAGS variable is set to use all the options present in CFLAGS. Almost every system should be configured this way. Additional options for CXXFLAGS are less common and usually do not apply generally enough to deserve being set globally.

Tip
The safe CFLAGS article may help beginners start optimizing their systems.

Misconceptions

While the optimizations enabled by various CFLAGS can be an effective way of producing smaller and/or faster binaries, they can also impair the function of the code, bloat its size, slow down its execution time, or simply cause a build failure. The point of diminishing returns is reached rather quickly when dealing with CFLAGS. Do not set them arbitrarily.

Remember, the global CFLAGS configured in /etc/portage/make.conf will be applied to every package on the system, so administrators typically only set general, widely applicable options. Individual packages further modify these options, either in the ebuild or in the build system itself, to generate the final set of flags used when invoking the compiler.

Ready?

Being aware of the risks involved, take a look at some sane, safe optimizations. These will hold in good stead and will be endearing to developers the next time a problem is reported on Bugzilla. (Developers will usually ask the user to recompile a package with minimal CFLAGS to see if the problem persists. Remember: aggressive flags can ruin code!)

Optimizing

The basics

The goal behind CFLAGS and CXXFLAGS is to create code tailor-made for the system; it should function perfectly while being lean and fast, if possible. Sometimes these conditions are mutually exclusive, so this guide sticks to combinations known to work well. Ideally, they are the best available for any CPU architecture. For informational purposes, aggressive flags are covered later. Not every option listed in the GCC manual (there are hundreds) will be discussed, but the most common flags will be reviewed.

Note
When unsure what a flag does, refer to the relevant chapter of the GCC manual. If still stumped after reading the manual, try a search engine or check the GCC mailing lists.

-march

The first and most important option is -march. This tells the compiler what code it should produce for the system's processor architecture (or arch); it tells GCC that it should produce code for a certain kind of CPU. Different CPUs have different capabilities, support different instruction sets, and have different ways of executing code. The -march flag instructs the compiler to produce code specific to the system's CPU, with all its capabilities, features, instruction sets, quirks, and so on, provided that the source code is prepared to use them. For instance, to benefit from AVX instructions, the source code needs to be adapted to support them.

-march is an instruction set architecture (ISA) selection option; it tells the compiler that it may use the instructions from that ISA. On an Intel/AMD64 platform with -march=native -O2 or lower optimization levels, the code will likely end up with AVX instructions that use the shorter SSE XMM registers. To take full advantage of the AVX YMM registers, the -ftree-vectorize, -O3, or -Ofast options should be used as well^[1].

-ftree-vectorize is an optimization option (default at -O3 and -Ofast) that attempts to vectorize loops using the selected ISA where possible. It is not enabled at -O2 because it does not always improve code: it can make code slower, and it usually makes the code larger; the result depends heavily on the loop in question.

Even though the CHOST variable in /etc/portage/make.conf specifies the general architecture used, -march should still be used so that programs can be optimized for the system's specific processor. x86 and x86-64 CPUs (among others) should make use of the -march flag.

What kind of CPU does the system have? To find out, run the following command:

user $ cat /proc/cpuinfo

Alternatively, install app-portage/cpuid2cpuflags and add the available CPU-specific options to the /etc/portage/package.use/00cpuflags file, which the tool reports through, for example, the CPU_FLAGS_X86 variable:

user $ cpuid2cpuflags

CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3

root # echo "*/* $(cpuid2cpuflags)" >> /etc/portage/package.use/00cpuflags

For more details, including the march and mtune values, two commands can be used.

The first command tells the compiler not to do any linking (-c) and, instead of interpreting the --help option as a request to explain command-line options, to show whether certain options are enabled or disabled (-Q). In this case, the options shown are those enabled for the selected target:
user $ gcc -c -Q -march=native --help=target
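
The target listing is long, so it can help to pull out just the two values of interest. The filter below is a minimal sketch: the function name target_arch_values is invented here, and it assumes the listing prints each option name followed by its value in whitespace-separated columns. The sample text is illustrative, not captured output.

```shell
# Hypothetical helper: print the values reported for -march= and -mtune=.
# Reads `gcc -Q --help=target` style output on standard input.
target_arch_values() {
    awk '$1 == "-march=" || $1 == "-mtune=" { print $1 $2 }'
}

# Illustrative sample in the assumed column layout.
sample='  -march=                     skylake
  -mtune=                     skylake
  -mavx2                      [enabled]'

printf '%s\n' "$sample" | target_arch_values
```

On a real system the filter would be fed by the command above: gcc -c -Q -march=native --help=target | target_arch_values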

The second command shows the compiler directives for building the header file, but without actually performing the steps; they are printed to the screen instead (-###). The final output line is the command that holds all the optimization options and architecture selection:
user $ gcc -### -march=native /usr/include/stdlib.h

Note
The l2-cache-size option represents the processor's last-level cache (L2, or a higher level if present).^[2]
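
To turn that long command line into a readable list of machine options, the words starting with -m can be filtered out. This is a rough sketch: native_machine_flags is a made-up name, the filter assumes the compiler line contains the string cc1, and the sample line (including its path) is illustrative rather than real output.

```shell
# Hypothetical helper: list the machine (-m...) options found on the cc1
# command line of `gcc -###` style output read from standard input.
native_machine_flags() {
    grep 'cc1' |        # keep only the compiler proper's command line
        tr ' ' '\n' |   # one word per line
        tr -d '"' |     # drop the quoting added by -###
        grep -E '^-m'   # keep -march=, -mtune=, -mavx2, and so on
}

# Illustrative sample line; the path and options are assumptions.
sample=' /usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1 -quiet "-march=skylake" -mmmx -mavx2 "-mtune=skylake"'

printf '%s\n' "$sample" | native_machine_flags
```

Because -### writes to standard error, a real invocation needs a redirect: gcc -### -march=native /usr/include/stdlib.h 2>&1 | native_machine_flags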

The glibc-hwcaps feature (>=sys-libs/glibc-2.33) can be used to select -march for a more general processor architecture (for >=sys-devel/gcc-11):

user $ /lib64/ld-linux-x86-64.so.2 --help

...
Subdirectories of glibc-hwcaps directories, in priority order:
  x86-64-v4
  x86-64-v3 (supported, searched)
  x86-64-v2 (supported, searched)
  x86_64 (supported, searched)

In this example, the CPU supports the x86-64-v3 level of the x86-64 psABI, which can be selected with -march=x86-64-v3.
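
The highest supported level can also be picked out of the loader output automatically. The following is a small sketch under the assumption that the listing looks like the one shown above; best_hwcaps_level is a name invented for this example.

```shell
# Hypothetical helper: print the first (highest-priority) x86-64-vN level
# that the dynamic loader marks as supported. Reads the loader's --help
# output on standard input.
best_hwcaps_level() {
    awk '$1 ~ /^x86-64-v[0-9]+$/ && /supported/ { print $1; exit }'
}

# Sample input copied from the listing above.
printf '%s\n' \
    '  x86-64-v4' \
    '  x86-64-v3 (supported, searched)' \
    '  x86-64-v2 (supported, searched)' \
    '  x86_64 (supported, searched)' | best_hwcaps_level
```

With real loader output, the call would be: /lib64/ld-linux-x86-64.so.2 --help | best_hwcaps_level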

Now let's see -march in action. This example is for an older AMD Athlon 64 chip:

FILE /etc/portage/make.conf AMD64 example

CFLAGS="-march=athlon64"
CXXFLAGS="${CFLAGS}"

Here is another example, for a common Intel processor:

FILE /etc/portage/make.conf Intel Core example

CFLAGS="-march=skylake"
CXXFLAGS="${CFLAGS}"

If the type of CPU is undetermined, or if the user does not know which setting to choose, -march=native can be used. With this flag, GCC attempts to detect the processor and automatically set appropriate flags for it. However, this should not be used when the intent is to compile packages for different CPUs!

When compiling packages on one computer in order to run them on a different computer (such as when using a fast computer to build for an older, slower machine), do not use -march=native. "Native" means that the code produced will run only on that type of CPU. Applications built with -march=native on an Intel Core CPU will not be able to run on an old Intel Atom CPU.

Also available are the -mtune and -mcpu flags. These flags are normally only used when no -march option is available; certain processor architectures may require -mtune or even -mcpu. Unfortunately, GCC's behavior is not very consistent in how each flag behaves from one architecture to the next.

On x86 and x86-64 CPUs, -march generates code specifically for that CPU using its available instruction sets and the correct ABI; the code has no backwards compatibility with older or different CPUs. Consider using -mtune when generating code for older CPUs such as i386 and i486. -mtune produces more generic code than -march: although it tunes the code for a certain CPU, it does not take the available instruction sets and ABI into account. Do not use -mcpu on x86 or x86-64 systems, as it is deprecated for those architectures.
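
The difference between the two flags can be illustrated with a make.conf fragment that combines them. This is a sketch with assumed values: x86-64 and skylake are placeholders to be replaced with whatever suits the machines involved.

```shell
# Sketch of /etc/portage/make.conf entries (values are assumptions).
# -march=x86-64 restricts the code to the baseline x86-64 instruction set,
# so it runs on any x86-64 CPU; -mtune=skylake only adjusts scheduling and
# similar choices in favor of Skylake without using extra instructions.
CFLAGS="-march=x86-64 -mtune=skylake -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```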

Only non-x86/x86-64 CPUs (such as ARM, SPARC, Alpha, and PowerPC) may require -mtune or -mcpu instead of -march. On these architectures, -mtune / -mcpu sometimes behave just like -march does on x86/x86-64, but under a different flag name. Again, GCC's behavior and flag naming are not consistent across architectures, so be sure to check the GCC manual to determine which one should be used.

Note
For more suggested -march / -mtune / -mcpu settings, read chapter 5 of the appropriate Installation Handbook for the architecture. Also read the GCC manual's list of architecture-specific options, as well as its more detailed explanations of the differences between -march, -mcpu, and -mtune.

Warning
Do not use -march=native or -mtune=native in the CFLAGS or CXXFLAGS variables of make.conf when compiling with distcc.

-O

Warning
Using -O3 or -Ofast may cause some packages to fail to compile.

Note
To see which packages were built with particular CFLAGS/CXXFLAGS, use a command such as: grep Ofast /var/db/pkg/*/*/CFLAGS
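
The same lookup can be wrapped in a small function that prints package names instead of file paths. This is a minimal sketch: pkgs_built_with is an invented name, and the optional second argument exists only so the function can be tried against a scratch directory instead of /var/db/pkg.

```shell
# Hypothetical helper: list category/package-version entries whose recorded
# CFLAGS contain the given flag as a whole word.
#   $1  flag to look for, for example -Ofast
#   $2  package database directory (defaults to /var/db/pkg)
pkgs_built_with() {
    flag=$1
    db=${2:-/var/db/pkg}
    for f in "$db"/*/*/CFLAGS; do
        [ -f "$f" ] || continue
        if grep -q -w -F -e "$flag" "$f"; then
            pkg=${f%/CFLAGS}                  # strip the file name
            printf '%s\n' "${pkg#"$db"/}"     # strip the database prefix
        fi
    done
}
```

For example, pkgs_built_with -O3 would list every installed package whose recorded CFLAGS include -O3.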

Next up is the -O option. It controls the overall level of optimization. Changing this value makes compilation take more time and use more memory, especially as the level of optimization is increased.

There are eight -O settings: -O0, -O1, -O2, -O3, -Os, -Oz, -Og, and -Ofast. Only use one of them in /etc/portage/make.conf.

With the exception of -O0, each -O setting activates several additional flags, so be sure to read the GCC manual's chapter on optimization options to learn which flags are activated at each -O level, along with explanations of what they do.
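
Besides the manual, the compiler itself can report which optimizer flags a level switches on. The filter below is a sketch that assumes the gcc -Q --help=optimizers listing marks active options with [enabled] in the second column; enabled_optimizers is a made-up name and the sample lines are illustrative.

```shell
# Hypothetical helper: keep only the options marked [enabled] in
# `gcc -Q --help=optimizers` style output read from standard input.
enabled_optimizers() {
    awk '$2 == "[enabled]" { print $1 }'
}

# Illustrative sample in the assumed layout.
printf '%s\n' \
    '  -finline-functions          [enabled]' \
    '  -funroll-loops              [disabled]' \
    '  -fomit-frame-pointer        [enabled]' | enabled_optimizers
```

Feeding it real data would look like gcc -Q --help=optimizers -O2 | enabled_optimizers, and saving the result for two levels allows comparing them with diff.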

Let's examine each optimization level:

-O0: This level (the letter "O" followed by a zero) turns off optimization entirely and is the default if no -O level is specified in CFLAGS or CXXFLAGS. It reduces compilation time and can improve debugging information, but some applications will not work properly without optimization enabled. This option is not recommended except for debugging purposes.

-O1: The most basic optimization level. The compiler tries to produce faster, smaller code without taking much compilation time. It is basic, but it gets the job done.

-O2: A step up from -O1. This is the recommended level of optimization unless the system has special needs. -O2 activates a few more flags in addition to those activated by -O1. With -O2, the compiler attempts to increase code performance without compromising on size and without taking too much compilation time. SSE or AVX may be used at this level, but YMM registers will not be used unless -ftree-vectorize is also enabled.

-O3: The highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance; in many cases it can slow a system down because of larger binaries and increased memory usage. -O3 is also known to break several packages, so using it is not recommended. However, it also enables -ftree-vectorize, so that loops in the code get vectorized and use AVX YMM registers.

-Ofast: Available since GCC 4.7, it consists of -O3 plus -ffast-math, -fno-protect-parens, and -fstack-arrays. This option breaks strict standards compliance and is not recommended.

-Os: Optimizes code for size. It activates all -O2 options that do not increase the size of the generated code. It can be useful for machines that have extremely limited disk storage space and/or CPUs with small cache sizes.

-Oz: Introduced in GCC 12.1, this level optimizes for size more aggressively than -Os. Note that it can degrade runtime performance heavily compared with -O2, because it may increase the number of instructions executed when those instructions take fewer bytes to encode.

-Og: Available since GCC 4.8. It addresses the need for fast compilation and a superior debugging experience while providing a reasonable level of runtime performance. The overall development experience should be better than with the default optimization level, -O0. Note that -Og does not imply -g; it simply disables optimizations that may interfere with debugging.

As previously mentioned, -O2 is the recommended optimization level. If package compilation fails and -O2 is not in use, try rebuilding with that option. As a fallback, try setting CFLAGS and CXXFLAGS to a lower optimization level, such as -O1 or even -O0 -g2 -ggdb (for error reporting and checking for possible problems).

-pipe

A common flag is -pipe. This flag has no effect on the generated code, but it makes the compilation process faster. It tells the compiler to use pipes instead of temporary files during the different stages of compilation, which uses more memory. On systems with low memory, GCC might get killed. In those cases, do not use this flag.

-fomit-frame-pointer

This is a very common flag designed to reduce generated code size. It is turned on at all levels of -O (except -O0) on architectures where doing so does not interfere with debugging (such as x86-64), but it may need to be activated manually. In that case, add it to the flags. Though the GCC manual does not specify all architectures, it is turned on by using the -O option. It still has to be enabled explicitly on x86-32 with GCC versions up to 4.6, or when using -Os on x86-32 with any version of GCC. However, using -fomit-frame-pointer will make debugging hard or impossible.

In particular, it makes troubleshooting applications written in Java and compiled by gcj much harder, though Java is not the only code affected by this flag. So while the flag can help, it also makes debugging harder; backtraces in particular will be useless. When not doing software debugging, and when no other debugging-related CFLAGS such as -ggdb are in use, try using -fomit-frame-pointer.

Important
Do not combine -fomit-frame-pointer with the similar flag -momit-leaf-frame-pointer. Using the latter is discouraged, as -fomit-frame-pointer already does the job properly. Furthermore, -momit-leaf-frame-pointer has been shown to negatively impact code performance.

-msse, -msse2, -msse3, -mmmx, -m3dnow

These flags enable the Streaming SIMD Extensions (SSE), SSE2, SSE3, MMX, and 3DNow! instruction sets for x86 and x86-64 architectures. They are useful primarily in multimedia, gaming, and other floating-point-intensive computing tasks, though they also contain several other mathematical enhancements. These instruction sets are found in more modern CPUs.

Important
Be sure to check that the CPU supports these instruction sets by running cat /proc/cpuinfo. The output will include any supported additional instruction sets. Note that pni is just a different name for SSE3.
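
Scanning the flags line by eye is error-prone, so the check can be scripted. The function below is a sketch for x86 systems only: cpu_has is an invented name, it assumes the usual "flags : ..." line layout of /proc/cpuinfo, and it applies the pni alias mentioned above.

```shell
# Hypothetical helper: succeed if the named feature appears on the first
# "flags" line of /proc/cpuinfo style input. SSE3 is listed there as "pni".
cpu_has() {
    want=$1
    if [ "$want" = "sse3" ]; then
        want=pni
    fi
    awk -v want="$want" '
        $1 == "flags" {
            # Field 2 is the colon; the feature names start at field 3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
            exit
        }
        END { exit !found }
    '
}
```

A typical call reads the real file: cpu_has sse3 < /proc/cpuinfo && echo "SSE3 is supported"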

Normally none of these flags need to be added to /etc/portage/make.conf, as long as the system is using the correct -march (for example, -march=nocona implies -msse3). Some notable exceptions are newer VIA and AMD64 CPUs that support instructions not implied by -march (such as SSE3). For CPUs like these additional flags will need to be enabled where appropriate after checking /proc/cpuinfo.

Note
Check the list of x86 and x86-64-specific flags to see which of these instruction sets are activated by the proper CPU type flag. If an instruction is listed, then it does not need to be separately specified; it will be turned on by using the proper -march setting.

Hardening optimizations

Note
While it is possible to use a hardened profile, it certainly isn't necessary for adding some hardening flags to /etc/portage/make.conf on a different profile. Especially on a desktop system, the use of position independent code (PIC) and position independent executables (PIE) on a system-wide level may cause compilation failures.

Hardening an otherwise unhardened system, like when using a desktop profile, can be considered a GCC optimization as well, especially in the light of security vulnerabilities such as Meltdown and Spectre.

Some packages feature an individual hardened USE flag, enabling tested security enhancements (like CFLAGS/CXXFLAGS). It may be a good idea to set this system-wide in /etc/portage/make.conf.

Note
Reading section Do I need to pass any flags to LDFLAGS/CFLAGS in order to turn on hardened building? in the Hardened/FAQ is recommended for retrieving some basic hardened CFLAGS/CXXFLAGS.

Warning
Changing the CFLAGS/CXXFLAGS can cause problems and in some cases may even render a system unusable. Rebuilding the whole system with emerge -e @world may resolve the situation.

Overflow protection

Optimizing CFLAGS/CXXFLAGS for overflow protection can be a good idea if security concerns outweigh speed optimization. This may be the case on a daily-use desktop system, while e.g. on an optimized gaming PC it will be considered counterproductive.

For GCC version 12, package sys-devel/gcc, the USE flags default-stack-clash-protection and default-znow will automatically enable additional overflow protection.

Note
A lot of these flags are now applied internally through the GCC toolchain automatically under the hardened profile, and some even under the regular profile. See table at Hardened/Toolchain.

CFLAGS/CXXFLAGS	LDFLAGS	function
`-D_FORTIFY_SOURCE=2`		run-time buffer overflow detection
`-D_GLIBCXX_ASSERTIONS`		run-time bounds checking for C++ strings and containers
`-fstack-protector-strong`		stack smashing protector
`-fstack-clash-protection`		increased reliability of stack overflow detection
`-fcf-protection`		control flow integrity protection
	`-Wl,-z,defs`	detect and reject underlinking
	`-Wl,-z,now`	disable lazy binding
	`-Wl,-z,relro`	read-only segments after relocation
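
As an illustration of how a few of the flags from the table could be combined, here is a make.conf sketch. The selection and the HARDENING variable name are assumptions made for this example, and on many profiles the toolchain may already apply some of these flags by itself.

```shell
# Sketch of /etc/portage/make.conf entries; the flag selection is only an
# example, and HARDENING is a helper variable invented for readability.
COMMON_FLAGS="-O2 -pipe"
HARDENING="-D_FORTIFY_SOURCE=2 -fstack-protector-strong -fstack-clash-protection"

CFLAGS="${COMMON_FLAGS} ${HARDENING}"
# The libstdc++ assertions only make sense for C++ code.
CXXFLAGS="${CFLAGS} -D_GLIBCXX_ASSERTIONS"
# Extend, rather than replace, the linker flags set by the profile.
LDFLAGS="${LDFLAGS} -Wl,-z,now -Wl,-z,relro"
```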

ASLR

Address Space Layout Randomization (ASLR) is a measure to increase security by randomly placing each function and buffer in memory. This makes it harder for attack vectors to succeed.

PIE is enabled by default when it is safe to do so on all 17.0 profiles^[3]. PIC may also be enabled by default on executables for architectures that require it (like AMD64).

There is no need to set PIE or PIC manually in CFLAGS.

CFLAGS/CXXFLAGS	LDFLAGS	function
`-fpie`	`-Wl,-pie`	full ASLR for executables
`-fpic -shared`		no text relocations for shared libraries

Optimization FAQs

Higher version of GCC should mean better optimizations?

Not always, because of software regressions, where an optimization performed by an earlier version of GCC is no longer applied. A full list of regressions can be found at this link. Should this happen, please file a bug in Gentoo's Bugzilla and/or GCC's Bugzilla.

Is there a perfect optimizer?

No, because a perfect optimizer would solve the halting problem, that is, it could tell whether any given program stops or runs forever^[4].

What about optimizing GCC itself?

gcc has pgo and lto USE flags that enable Profile Guided Optimization and Link Time Optimization respectively. To build gcc itself with PGO and LTO enabled:

FILE /etc/portage/package.use/gcc

sys-devel/gcc pgo lto

In Gentoo, a 3-stage bootstrap of gcc is done, meaning it compiles itself three times^[5]. In stage1, gcc is compiled using an older gcc. In stage2, gcc is compiled using stage1 gcc. In stage3, gcc is compiled using stage2 gcc and is used to verify that stage2 gcc and stage3 gcc are the same. This is done because the result is tested more completely and performs better. The lto USE flag adds -flto to BOOT_CFLAGS. The pgo USE flag adds -fprofile-generate to stage2 gcc and adds -fprofile-use -fprofile-reproducible=parallel-runs to stage4 gcc.

gcc performance may improve via PGO, although it may as much as double the compile times.

But I get better performance with -funroll-loops -fomg-optimize!

No, people only think they do because someone has convinced them that more flags are better. Aggressive flags will only hurt applications when used system-wide. Even the GCC manual says that using -funroll-loops and -funroll-all-loops will make code larger and run more slowly. Yet for some reason, these two flags, along with -ffast-math, -fforce-mem, -fforce-addr, and similar flags, continue to be very popular among ricers who want the biggest bragging rights.

The truth of the matter is that they are dangerously aggressive flags. Take a good look around the Gentoo Forums and Bugzilla to see what those flags do: nothing good!

These flags are not needed globally in CFLAGS or CXXFLAGS. They will only hurt performance. They might bring on the idea of having a high-performance system running on the bleeding edge, but they don't do anything but bloat the code and get bugs marked INVALID or WONTFIX.

Dangerous flags like these are not needed. Don't use them. Stick to the basics: -march, -O, and -pipe.

What about -O levels higher than 3?

Some users boast about even better performance obtained by using -O4, -O9, and so on, but the reality is that -O levels higher than 3 have no effect. The compiler may accept CFLAGS like -O4, but it actually doesn't do anything with them. It only performs the optimizations for -O3, nothing more.

Need more proof? Examine the source code:

CODE -O source code

case OPT_LEVELS_3_PLUS:
    enabled = (level >= 3);
    break;
 
case OPT_LEVELS_3_PLUS_AND_SIZE:
    enabled = (level >= 3 || size);
    break;

As can be seen, any value higher than 3 is treated as just -O3.
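
The claim can also be checked against an installed compiler by comparing the optimizer settings it reports for two levels. The function below is a sketch: same_optimizers is an invented name, and the compiler is taken from the CC variable with gcc as the fallback.

```shell
# Hypothetical helper: succeed when two -O levels report identical optimizer
# settings. The compiler is taken from CC and falls back to gcc.
#   $1, $2  the levels to compare, given without the -O prefix
same_optimizers() {
    cc=${CC:-gcc}
    first=$("$cc" -Q --help=optimizers "-O$1") || return 2
    second=$("$cc" -Q --help=optimizers "-O$2") || return 2
    [ "$first" = "$second" ]
}
```

For example, same_optimizers 3 9 && echo "no difference" compares -O3 with -O9, while same_optimizers 2 3 is expected to report a difference.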

What about compiling outside the target machine?

Some readers might wonder if compiling outside the target machine with a strictly inferior CPU or GCC sub-architecture will result in inferior optimization results (compared to a native compilation). The answer is simple: No. Regardless of the actual hardware on which the compilation takes place and the CHOST for which GCC was built, as long as the same arguments are used (except for -march=native) and the same version of GCC is used (although minor version might be different), the resulting optimizations are strictly the same.

To exemplify, if Gentoo is installed on a machine whose GCC's CHOST is i686-pc-linux-gnu, and a Distcc server is setup on another computer whose GCC's CHOST is i486-linux-gnu, then there is no need to be afraid that the results would be less optimal because of the strictly inferior sub-architecture of the remote compiler and/or hardware. The result would be as optimized as a native build, as long as the same options are passed to both compilers (and the -march parameter doesn't get a native argument). In this particular case the target architecture needs to be specified explicitly as explained in Distcc.

The only difference in behavior between two GCC versions built targeting different sub-architectures is the implicit default argument for the -march parameter, which is derived from the GCC's CHOST when not explicitly provided in the command line.

What about redundant flags?

Oftentimes CFLAGS and CXXFLAGS that are turned on at various -O levels are specified redundantly in /etc/portage/make.conf. Sometimes this is done out of ignorance, but it is also done to avoid flag filtering or flag replacing.

Flag filtering/replacing is done in many of the ebuilds in the Portage tree. It is usually done because packages fail to compile at certain -O levels, or when the source code is too sensitive for any additional flags to be used. The ebuild will either filter out some or all CFLAGS and CXXFLAGS, or it may replace -O with a different level.

The Gentoo Developer Manual outlines where and how flag filtering/replacing works.
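
The effect of filtering and replacing on a flag string can be pictured with a few lines of shell. The functions below are a simplified imitation written for this page, not the code that ebuilds actually use; the names filter_flag and replace_olevel are invented.

```shell
# Simplified imitation of flag filtering; NOT the code ebuilds use, only a
# sketch of the effect on a flag string.
#   $1  flag to drop, remaining arguments: the flag string
filter_flag() {
    unwanted=$1; shift
    result=""
    for flag in $*; do              # rely on word splitting of the string
        [ "$flag" = "$unwanted" ] && continue
        result="${result:+$result }$flag"
    done
    printf '%s\n' "$result"
}

# Replace whatever -O level is present with the one given as $1.
replace_olevel() {
    new=$1; shift
    result=""
    for flag in $*; do
        case $flag in
            -O*) flag=$new ;;
        esac
        result="${result:+$result }$flag"
    done
    printf '%s\n' "$result"
}

CFLAGS="-march=skylake -O3 -pipe -ffast-math"
CFLAGS=$(filter_flag -ffast-math "$CFLAGS")
CFLAGS=$(replace_olevel -O2 "$CFLAGS")
printf '%s\n' "$CFLAGS"    # prints: -march=skylake -O2 -pipe
```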

It's possible to circumvent -O filtering by redundantly listing the flags for a certain level, such as -O3, by doing things like:

CODE Specifying redundant CFLAGS

CFLAGS="-O3 -finline-functions -funswitch-loops"

However, this is not a smart thing to do. CFLAGS are filtered for a reason! When flags are filtered, it means that it is unsafe to build a package with those flags. Clearly, it is not safe to compile the whole system with -O3 if some of the flags turned on by that level will cause problems with certain packages. Therefore, don't try to "outsmart" the developers who maintain those packages. Trust the developers. Flag filtering and replacing is done to ensure stability of the system and application! If an ebuild specifies alternative flags, then don't try to get around it.

Building packages with unacceptable flags will most likely lead to problems. When reporting problems on Bugzilla, the flags that are used in /etc/portage/make.conf will be readily visible and developers will ask to recompile without those flags. Save the trouble of recompiling by not using redundant flags in the first place! Don't just automatically assume to be more knowledgeable than the developers.

What about LDFLAGS?

The Gentoo developers have already set basic, safe LDFLAGS in the base profiles, so they do not need to be changed.

Can I use per-package flags?

Warning
Using per-package flags complicates debugging and support. Make sure to mention the use of this feature in the bug reports together with the changes made.

Information on how to use per-package environment variables (including CFLAGS) is described in the Gentoo Handbook, "Per-Package Environment Variables".
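
As a rough sketch of that mechanism, the commands below create an override file and map one package to it. PORTAGE_DIR is a variable made up for this example so that the commands can be tried in a scratch directory (on a real system the files live under /etc/portage), and app-misc/example and debug-flags.conf are placeholder names.

```shell
# Sketch: create a per-package environment override. PORTAGE_DIR is an
# assumption for this example; it defaults to a fresh scratch directory.
PORTAGE_DIR=${PORTAGE_DIR:-$(mktemp -d)}

mkdir -p "$PORTAGE_DIR/env"

# Flags applied only to packages that reference this file.
cat > "$PORTAGE_DIR/env/debug-flags.conf" <<'EOF'
CFLAGS="-O0 -g2 -ggdb"
CXXFLAGS="${CFLAGS}"
EOF

# Map a package to the override file; app-misc/example is a placeholder.
echo "app-misc/example debug-flags.conf" >> "$PORTAGE_DIR/package.env"
```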

Profile Guided Optimization (PGO)

Not to be confused with the packages.gentoo.org tool and website.

Profile guided optimization (PGO) consists of compiling and profiling a program to assess hot paths in the code. Optimizations are then applied based on this analysis. First, the code is compiled with -fprofile-generate, which instruments the code. Second, the instrumented code is run on typical workloads to collect profile information. Finally, using the profile data, the code is compiled again with -fprofile-use. To manually enable PGO for packages, see this link.
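
For a single C file, the three steps can be sketched as a small shell function. Everything about it is an assumption made for illustration: the name pgo_build, the use of -O2, and the choice to take the compiler from CC with gcc as the fallback. The output name is kept the same in both compilations so that the second one looks for the profile where the first one arranged for it to be written.

```shell
# Sketch of a manual PGO cycle for one C file; "pgo_build" and its
# arguments are invented for this example.
#   $1      C source file
#   $2      output program name (created in the current directory)
#   $3 ...  arguments for the training run
pgo_build() {
    src=$1; out=$2; shift 2
    cc=${CC:-gcc}

    # 1. Build an instrumented binary that records execution counts.
    "$cc" -O2 -fprofile-generate "$src" -o "$out" || return 1

    # 2. Run it on a representative workload to write the profile data.
    "./$out" "$@" || return 1

    # 3. Rebuild, letting the compiler use the recorded profile.
    "$cc" -O2 -fprofile-use "$src" -o "$out"
}
```

A call such as pgo_build prog.c prog input.txt would then build, train on input.txt, and rebuild prog; the quality of the result depends on how representative the training run is.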

Firefox also supports PGO although sometimes it may break the build.

Link Time Optimization (LTO)

Note
LTO heavily increases compile times, and if even one object file changes, LTO recompiles the whole code again. There is an ongoing GSoC project to make sure LTO only recompiles what it deems necessary.

LTO is still experimental. LTO may need to be disabled before reporting bugs because it is a common source of problems. The -flto flag is used with an optional argument: either auto (detects how many jobs to use) or an integer (the number of jobs to execute in parallel).

See the LTO article for more information on LTO on Gentoo.

External resources

The following resources are of some help in further understanding optimization:

GCC online documentation

References

[1] GNU GCC Bugzilla, AVX/AVX2 no ymm registers used in a trivial reduction, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57952#c8. Retrieved on 2017/07/18.
[2] GNU GCC Bugzilla, 'gcc -march=native' sets L2 cache size equal to L3 cache size on i7 and i5 CPU. Retrieved on 2022/08/14.
[3] New 17.0 profiles in the Gentoo repository
[4] https://en.wikipedia.org/wiki/Full-employment_theorem
[5] https://gcc.gnu.org/install/build.html

This page is based on a document formerly found on our main website gentoo.org.
