Talk:CFLAGS

From Gentoo Wiki

Dev-zero

Talk status
This discussion is done as of 2020-09-20.

Unless a reference is added which states that passing -fopenmp together with -floop-parallelize-all really enables OpenMP-based parallel loops, I am going to strip the recommendation of enabling OpenMP globally, since this is pure nonsense: if a package does not use OpenMP, there is afaik no way to enable it automatically. On the other hand, what would help generically is rewriting loops to use vectorization (see -ftree-vectorize, enabled with -O3). Rewriting loops for processor-cache awareness probably only gives a speed-up in HPC code, since modern CPUs have scatter/gather and the like (see also AVX2). -- The preceding unsigned comment was added by Dev-zero (talkcontribs)

Also, packages that are able to use OpenMP should support the respective USE flag and will, if OpenMP is to be used, add the corresponding compiler and linker options automatically. -- The preceding unsigned comment was added by Soulsource (talkcontribs)
Not sure if I agree with the approach in this article at all, but at least I corrected the OpenMP bit. Autoparallelization in GCC does indeed reuse their OpenMP library (libgomp) and some internal implementation details, so if you use the "-floop-parallelize" options, the "-lgomp" in LDFLAGS is necessary. But yes, OpenMP itself is basically a set of language extensions for C/C++/Fortran, so by definition it can only affect programs that are explicitly written to use it. If there's a package that can actually use OpenMP then it should have a USE flag, and "-fopenmp" is redundant. If it does not use OpenMP directives, adding "-fopenmp" does nothing. Either way, you should expect absolutely no benefit from putting it in CFLAGS. Quantheory (talk) 04:55, 24 November 2014 (UTC)
Nobody said anything more in six years. Time to put this discussion to bed. --Davidbryant (talk) 15:32, 20 September 2020 (UTC)
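
The point made above about -fopenmp can be illustrated with a minimal C sketch. The function names sum_plain and sum_with_directive are hypothetical and chosen only for this illustration. The first loop contains no OpenMP directive, so building with or without -fopenmp should make no difference to it. The second loop carries a directive, which gcc acts on only when -fopenmp is given and otherwise ignores.

```c
#include <stddef.h>

/* No OpenMP directive here: -fopenmp has nothing to act on in this loop. */
long sum_plain(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* The directive below is honored only when the file is built with -fopenmp.
 * Without that flag the pragma is ignored and the loop runs serially. */
long sum_with_directive(const int *v, size_t n)
{
    long total = 0;
    long i; /* signed loop index, accepted by all OpenMP versions */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)n; i++) {
        total += v[i];
    }
    return total;
}
```

Both functions return the same result either way. Only the second one can run in parallel, and only because its source asks for it, which is why a package that wants OpenMP is expected to add the flag itself.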

The performance gain by enabling auto parallelization system wide seems to be relatively small:

Talk status
This discussion is done as of 2020-09-20.

I've been experimenting with enabling graphite and auto-parallelization for selected packages with gcc-4.8. My findings indicate that auto-parallelization only works in very rare cases, namely when the loops to be parallelized run over arrays, not pointers+offsets. Since in the vast majority of cases loops run over pointers with offsets, auto-parallelization does not work. The CFLAGS I used for testing were CFLAGS="-O3 -ftree-parallelize-loops=6 -floop-parallelize-all -march=native -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block -pipe". Additional output on parallelization can be generated using "-fdump-tree-parloops-details -fdump-tree-graphite-all".

Let me clarify this: The following function will not be parallelized.

void sum(double *a, double *b, double *result, int n){
    int i;
    for(i=0;i<n;i++){
        result[i]=a[i]+b[i];
    }
}


Changing the variables to arrays, for instance by making them global arrays, allows graphite to parallelize the function:

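/* N is an illustrative size; file-scope arrays need a compile-time constant length */
#define N 1000000
double a[N], b[N], result[N];
void sum(void){
    int i;
    for(i=0;i<N;i++){
        result[i]=a[i]+b[i];
    }
}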

Considering the better readability of the first variant, the second variant is obviously not something encountered very often.

Another issue is that gcc does not check the profitability of auto-parallelization, so even in cases where it works, the added overhead might decrease overall performance. An example is the dotproduct code posted here, which of course has to be changed to make the loops run over arrays, not pointers, and to do the dot product instead of the trivial calculation.
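
A rough sketch of what such a modified test could look like is given here. This is not the code referred to above. The size N, the helper fill, and the function name dot are assumptions made for illustration. The loop runs over file-scope arrays with a compile-time bound, which is the form described earlier as the one graphite could handle. Whether a given gcc version actually parallelizes the reduction, and whether that pays off for such a small N, is not established by this sketch and would have to be checked with the dump options mentioned above.

```c
#include <stddef.h>

#define N 1000 /* hypothetical size, small enough that threading overhead may dominate */

/* File-scope arrays with a constant length, not pointers with offsets. */
static double a[N], b[N];

/* Fill the inputs with values whose dot product is easy to verify by hand. */
void fill(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = (double)i;
        b[i] = 2.0;
    }
}

/* Dot product over the global arrays; the accumulation into s is a reduction. */
double dot(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++) {
        s += a[i] * b[i];
    }
    return s;
}
```

Timing dot() with and without the parallelization flags for a few values of N would be one way to see where the overhead stops dominating.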

My conclusion is that the performance gains possible from auto-parallelization are pretty limited, except for packages intentionally written to use this gcc feature. On the other hand, there are a few packages that either do not compile or do not run with auto-parallelization enabled. Therefore, I would not recommend using auto-parallelization system wide, but rather enabling it selectively for packages known to benefit from it (CFLAGS can be set on a per-package basis using /etc/portage/env and /etc/portage/package.env). -- The preceding unsigned comment was added by Soulsource (talkcontribs) 04 March, 2014.

It's been six years. Time to say this "discussion" is kaput. --Davidbryant (talk) 15:43, 20 September 2020 (UTC)