Hardened/Textrels Guide

A guide for tracking down and fixing .text relocations (TEXTRELs)

Introduction
You should make sure to read the Introduction to Position Independent Code before tackling this guide.

This guide is x86-centric for now, because the majority of broken object files come from poorly written x86 assembly, which in turn stems from the simple fact that the x86 architecture has so few registers. Other architectures have a large enough register set that they can reserve a register as the "PIC register" without incurring a performance hit. Every architecture has to be mindful of PIC and its implications; x86 just happens to be the dominant architecture at the moment in the 'desktop' world of open source.

We will update for non-x86 as we acquire details and useful examples.

Finding broken object code
Before you can start fixing something, you have to make sure it is broken first, right? For this reason, we've developed a suite of tools named PaX Utilities. If you are not familiar with these utilities, you should read the PaX Utilities Guide now. Gentoo users can simply install the pax-utils package. Non-Gentoo users should be able to find a copy of the source tarball in the distfiles directory on a Gentoo mirror. Once you have the PaX Utilities set up on your system, we can start playing around with scanelf.

Keep in mind that although these utilities are named PaX Utilities, they certainly do not require PaX or anything else like that on your system. The name is a historical artifact and, for want of a better one, has stuck.

Let's see if your system has any broken files.

Ideally, scanelf should not display anything, but on an x86 system, this is rarely the case. Here we can see six libraries with TEXTRELs in them. To quickly find out what package these files come from, Gentoo users can install portage-utils and use qfile.

Now that we know the offenders, we have a choice. We can file a bug upstream (who generally don't care unless you can provide a fix), file a bug in the Gentoo Bugzilla (which is a nice lazy cop-out), or fix it ourselves (that is why you're reading this guide, right?). You should double check that the package version you have installed is the latest that upstream and your distro have to offer. Who knows, maybe you will get lucky and someone else has already fixed it. If you wish to get feedback on your work, feel free to contact the [mailto:hardened@gentoo.org Gentoo hardened team].

"False" Positives
Sometimes you may come across a package which contains a mountain of TEXTRELs with seemingly no relation to assembly code. This may simply be because the objects were not properly compiled with the appropriate PIC flag. The fix is quite simple: make sure every object file that is linked into the final shared object is compiled with the appropriate PIC flag (typically -fPIC).

For example, let's look at the silc-plugin package. It builds a few modules, but only some of the objects that are linked into the final libsilc_core.so module are compiled with -fPIC. The output of scanelf here is quite extensive!

A TEXTREL on glibc's fgetc function!? Either people are calling fgetc from assembly (and should be shot), or something else is going on. A good rule of thumb is that if it seems like just about every function/variable reference causes a TEXTREL and it is all done in C code, then the file was not built as PIC. Just review the build output and see if the command to compile it was invoked with -fPIC. If not, go fix the build system as you do not need to dig into the source. Dodged the bullet this time!
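
If the build log is not at hand, one way to double-check a suspect source file is sketched below. gcc and clang predefine __PIC__ when a translation unit is compiled with -fpic/-fPIC (and also with -fpie/-fPIE), so the result can be reported at run time. The names built_as_pic and report_pic_status are hypothetical helpers, not part of any package mentioned in this guide.

```c
#include <stdio.h>

/* Returns 1 when this translation unit was compiled as position
 * independent code (the compiler predefines __PIC__ in that case),
 * 0 otherwise. */
int built_as_pic(void)
{
#ifdef __PIC__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print the result for a named source file. */
void report_pic_status(const char *unit_name)
{
    printf("%s: %s\n", unit_name, built_as_pic() ? "PIC" : "non-PIC");
}
```

Calling report_pic_status("demo.c") from a small test program shows which way the file was actually built, independent of what the build system claims.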

Dissecting broken object code
So we have identified some broken libraries, and we want to fix them. The trouble is, shared libraries can be huge. They can have thousands of functions which come from thousands of object files and thousands of source code files which total megabytes in size (source code and compiled objects). Where the hell do we start!? Once again, Mighty Mouse^W^W scanelf is here to save the day. Before we dive into source code, let's check out a few libraries.

Dissect libsmpeg
The output here tells us that the cpu_flags and the IDCT_mmx functions are to blame for our TEXTRELs. The first field indicates that this is poor usage of memory references. Unfortunately, the symbolic name of the memory being referenced has not been retained in the object code (probably because the code is hand written assembly), so we need to do a little more digging. This is where the offset addresses come into play, along with the objdump utility from the binutils package. The first address (e.g. 0x2FB3C) is the offset of the TEXTREL while the second address is the offset of the function (e.g. 0x2FB10). Get used to this because the behavior of not retaining the symbol name is quite common.
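
To make the two offsets concrete, here is a small worked calculation using the example values quoted above. Subtracting the function offset from the TEXTREL offset gives the position of the relocated address field inside the function body, which is the number to look for in the disassembly. The function name textrel_offset_in_function is a hypothetical helper for illustration.

```c
#include <stdint.h>

/* Distance from the start of the function to the bytes that the
 * dynamic loader has to patch. Both arguments are file offsets as
 * printed next to the TEXTREL entry. */
uint32_t textrel_offset_in_function(uint32_t textrel_offset,
                                    uint32_t function_offset)
{
    return textrel_offset - function_offset;
}
```

With the example values, 0x2FB3C - 0x2FB10 = 0x2C, so the offending operand sits 0x2C bytes into the function.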

As you can see here, the two lines picked out in the body of cpu_flags have absolute memory references. In this case, they both refer to memory location 0x3d3d0. Since this object code may be loaded at any address, using an absolute reference obviously won't fly. That means every time libsmpeg is loaded into memory, the dynamic loader has to rewrite the 0x3d3d0 to the actual calculated address on the fly.
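
The rewriting step can be pictured with a small simulation. This is only a sketch: it assumes the relocation is of the base-relative kind (load base plus the value stored at link time, as with R_386_RELATIVE), and the load base used in the usage note is a made-up value. The function name apply_relative_fixup is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Simulate what the dynamic loader does for one base-relative text
 * relocation: read the 32-bit value stored at link time, add the
 * address the library was actually loaded at, and write it back
 * into the code image. */
void apply_relative_fixup(unsigned char *image, size_t offset,
                          uint32_t load_base)
{
    uint32_t value;

    memcpy(&value, image + offset, sizeof value);
    value += load_base;
    memcpy(image + offset, &value, sizeof value);
}
```

For a library loaded at the hypothetical base 0xb7d00000, the stored 0x3d3d0 becomes 0xb7d3d3d0. Because the patch is written into the code itself, the affected pages can no longer be shared unmodified between processes.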

Dissect libdv
Again, we can see that many functions (like dv_parse_ac_coeffs_pass0 and _dv_idct_block_mmx) have absolute memory references. We also see a bunch of functions which refer to variables by name. For example, dv_decode_vlc misuses the variable dv_vlc_class_index_mask while dv_parse_video_segment misuses the variable dv_parse_ac_coeffs. It is much easier to locate the problem in the source code when you have the symbol name.

Dissect libSDL
Doesn't seem to be anything new here. Poor memory usage in functions like _ConvertMMXpII32_24RGB888 and no symbol name which means it's probably pure hand written assembler. The SDL_SoftStretch function misuses the symbol _copy_row and since the symbol name has been retained, it's probably inline assembly code.

Finding the broken source code
We've identified the functions and sometimes the variables which are causing us such headaches. Before we can actually fix them though, we have to narrow down the source code to the offending lines. Since we know the function names and either the symbol name or a relative position in the function, we should be able to focus our efforts quite easily.

libsmpeg source dive
Let's start with libsmpeg. We know that both the cpu_flags and IDCT_mmx functions are broken. But where are they defined?

As we suspected, both the cpu_flags and the IDCT_mmx functions are written in pure assembly code. This makes tracking down the unnamed memory reference easier because the source code should closely match the output of objdump. If we review the output from earlier, we know the cpuid instruction is used. Since it isn't a common instruction, we search for it in the source code.

In GNU assembler, registers are prefixed with a % and constants are prefixed with a $, so that flags looks suspicious. It also lines up well with the objdump output from earlier. So what is flags?

Seems flags is a data variable local to the file, which functions access with absolute memory references rather than relative ones. Now we are pretty much done. That's all there is to it. We started with the library and tracked it back to the function cpu_flags and the variable flags in the assembly source file. That wasn't so hard now, was it? :)

If we analyze the IDCT_mmx function, we find a similar trend.

IDCT_mmx snippets

libSDL source dive
Again, before we jump into how to fix these, let's analyze a few more source files to get a better handle on identifying problematic code.

Broken _ConvertMMXpII32_24RGB888 in libSDL code

Simple enough, the _ConvertMMXpII32_24RGB888 function refers to the mmx32_rgb888_mask variable.

{{Code|Broken SDL_SoftStretch in libSDL code|
[SDL_SoftStretch is defined in src/video/SDL_stretch.c]
int SDL_SoftStretch(SDL_Surface *src, SDL_Rect *srcrect,
                    SDL_Surface *dst, SDL_Rect *dstrect)
{
...
#ifdef __GNUC__
          __asm__ __volatile__ (
           "call _copy_row"
           : "=&D" (u1), "=&S" (u2)
           : "0" (dstp), "1" (srcp)
           : "memory" );
#else
...
}}

Another straightforward bug: an absolute reference to the _copy_row variable in assembly. If we were to let gcc handle the _copy_row reference instead, though...

Rules of thumb
Now we know what broken code looks like. We can point out issues in code and confidently declare "that crap is broken". While this is a good thing, it certainly doesn't help much if no one knows how it's supposed to be written. Let's start with some rules of thumb.

General rules


 * Do not mix PIC and non-PIC object code
 * Shared libraries contain PIC objects
 * Static libraries contain non-PIC objects (normal/non-PIE systems only)
 * Let gcc figure out the details whenever possible (e.g. inline asm)
 * Use the stack for loading of large masks instead of variables
 * Do not clobber the PIC register when generating PIC objects

x86-specific rules


 * Use @GOT relocations when using external symbols
 * Use @GOTOFF relocations when using local symbols

Don't use the PIC register
If you come across code which uses the PIC register in some inline assembly, one fix may be to simply use a different register. For example, the x86 architecture has 6 general purpose registers (eax, ebx, ecx, edx, esi, edi). If the code uses just eax and ebx, just change all references to ebx to ecx and you're done!

A cleaner fix might be to just let gcc allocate the registers. If the inline assembly doesn't actually care which registers it uses, replace the hard-coded ebx with an "r" constraint in the operand list, and refer to the operand by number (%0, %1, and so on).

Or, if the assembly uses an instruction which always clobbers ebx (e.g. cpuid), simply hide the value in another register (like esi).

If all else fails, you can fall back to the slow push/pop ebx on the stack method.

Just don't use the PIC register
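
As a minimal sketch of the register swap, the routine below needs one scratch register besides eax. A hypothetical original might have picked ebx for that. Here ecx is used and listed as clobbered, so ebx is never touched. The function times_three is invented for illustration and does not come from any of the libraries discussed here. Non-x86 targets fall back to plain C.

```c
/* Multiply by three using eax plus one scratch register. The scratch
 * register is ecx rather than ebx, so the PIC register stays intact. */
unsigned int times_three(unsigned int x)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int out;

    __asm__ (
        "movl %1, %%eax\n\t"
        "movl %%eax, %%ecx\n\t"   /* scratch copy lives in ecx, not ebx */
        "addl %%ecx, %%eax\n\t"
        "addl %%ecx, %%eax\n\t"
        "movl %%eax, %0"
        : "=r" (out)
        : "r" (x)
        : "eax", "ecx", "cc");
    return out;
#else
    return x * 3u;                /* portable fallback */
#endif
}
```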

Let gcc allocate registers
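
A minimal sketch of handing register choice to gcc: no register is named in the assembly at all. The operands use "r" constraints and are referred to by number, so when building PIC on x86 gcc is free to avoid the PIC register on its own. The function add_u32 is a made-up example. Non-x86 targets fall back to plain C.

```c
/* Add two values in inline assembly without naming any register.
 * "+r" marks operand 0 as read and written, "r" lets gcc pick any
 * general purpose register for operand 1. */
unsigned int add_u32(unsigned int a, unsigned int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"
             : "+r" (a)
             : "r" (b)
             : "cc");
    return a;
#else
    return a + b;                 /* portable fallback */
#endif
}
```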

Hide the PIC register
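
The cpuid case can be sketched as follows. The instruction always overwrites ebx, so the sketch parks the original value in esi, runs cpuid, and then swaps the two registers back, leaving the cpuid result in esi and the PIC register restored. The function name cpuid_keep_ebx is hypothetical. The 64-bit variant uses the full-width registers so the upper half of rbx is preserved as well, and the code assumes a CPU that implements cpuid. On other architectures the function reports that it is unsupported.

```c
/* Run cpuid for the given leaf without letting gcc see ebx change.
 * regs receives eax, ebx, ecx, edx in that order.
 * Returns 1 on x86 targets, 0 elsewhere. */
int cpuid_keep_ebx(unsigned int leaf, unsigned int regs[4])
{
#if defined(__x86_64__)
    unsigned int a, b, c, d;

    __asm__ __volatile__ (
        "movq %%rbx, %%rsi\n\t"   /* hide rbx in rsi */
        "cpuid\n\t"
        "xchgq %%rbx, %%rsi"      /* restore rbx, result goes to rsi */
        : "=a" (a), "=S" (b), "=c" (c), "=d" (d)
        : "0" (leaf), "2" (0u));
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
    return 1;
#elif defined(__i386__)
    unsigned int a, b, c, d;

    __asm__ __volatile__ (
        "movl %%ebx, %%esi\n\t"   /* hide ebx in esi */
        "cpuid\n\t"
        "xchgl %%ebx, %%esi"      /* restore ebx, result goes to esi */
        : "=a" (a), "=S" (b), "=c" (c), "=d" (d)
        : "0" (leaf), "2" (0u));
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
    return 1;
#else
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = 0u;
    return 0;                     /* no cpuid on this architecture */
#endif
}
```

For leaf 0, regs[1], regs[3] and regs[2] together hold the vendor identification string.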

MMX/SSE masks
A lot of x86 MMX/SSE code loads bitmasks from local variables since they need to fill up a register which is larger (MMX/64bits or SSE/128bits) than the native bitsize (x86/32bits). They do this by defining the mask in consecutive bytes in memory and then having the cpu load the data from the memory region.

One way to get around this is by being creative with the stack. Rather than use an absolute memory reference for the mask, push a bunch of 32bit values onto the stack and use the address specified by the esp register. Once you're finished, just add a constant to esp rather than popping off since you don't care about the actual values once they are loaded into the MMX/SSE registers.

Load masks into registers via stack
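
The sketch below is a variation on the idea, not the hand-written push sequence itself: the mask is assembled in an automatic variable, which lives on the stack, and handed to the assembly as an "m" operand so that gcc emits the stack-relative address. Adjusting the stack pointer by hand inside inline assembly needs extra care because gcc may address other operands relative to it, so that form is not shown. The function name load_mask_from_stack is hypothetical, and the example uses an SSE register with SSE2 availability as a stated assumption. Other targets simply return the mask.

```c
#include <stdint.h>

/* Build a 64-bit mask from two 32-bit halves on the stack and move it
 * through xmm0. No global symbol is referenced, so no relocation is
 * needed to find the mask. */
uint64_t load_mask_from_stack(uint32_t lo, uint32_t hi)
{
    uint64_t mask = ((uint64_t)hi << 32) | lo;   /* automatic variable */
    uint64_t out = mask;

#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
    __asm__ (
        "movq %1, %%xmm0\n\t"     /* load the mask from the stack slot */
        "movq %%xmm0, %0"         /* store it back so C can inspect it */
        : "=m" (out)
        : "m" (mask)
        : "xmm0");
#endif
    return out;
}
```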

Let gcc worry about it
A lot of inline assembly is written with the symbol names placed right in the code. Rather than trying to write custom code to handle PIC in assembly, just let gcc worry about it. Pass in the symbol via the input operand list as a memory constraint ("m") and gcc will handle all the rest.

How to make gcc worry about it
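
A minimal sketch of the "m" constraint approach: the symbol name never appears inside the assembly string. It is passed as a memory input operand, and gcc emits whatever addressing the current build mode requires (PIC or not). The variable demo_mask and the function apply_demo_mask are hypothetical names made up for this example. Non-x86 targets fall back to plain C.

```c
/* Hypothetical global that hand-written code might have referenced
 * by name inside the assembly string. */
unsigned int demo_mask = 0x00ff00ffu;

/* AND a value with the global mask. gcc substitutes a suitable
 * memory reference for %1. */
unsigned int apply_demo_mask(unsigned int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("andl %1, %0"
             : "+r" (value)
             : "m" (demo_mask)
             : "cc");
    return value;
#else
    return value & demo_mask;     /* portable fallback */
#endif
}
```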

If you get a warning/error about one of the memory inputs needing to be an lvalue, then this usually means you're trying to pass in a pointer to an array/structure rather than the memory location itself. Fixing this may be as simple as dereferencing the variable in the constraint list rather than in the assembly itself.
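
The dereferencing fix might look like the sketch below. The pointer itself is not what the assembly should read, so the constraint names the pointed-to object with *table and gcc supplies the memory reference. The function first_word is a hypothetical example. Non-x86 targets fall back to plain C.

```c
/* Load the first element of a table through inline assembly. The
 * dereference happens in the constraint list ("m" (*table)), not in
 * the assembly string. */
unsigned int first_word(const unsigned int *table)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int out;

    __asm__ ("movl %1, %0"
             : "=r" (out)
             : "m" (*table));
    return out;
#else
    return *table;                /* portable fallback */
#endif
}
```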

Thunk it in assembly
Hand written assembly sometimes needs to access variables (whether they be local or global). Since none of the previous tricks will work, you just need to grit your teeth and dig in to write real PIC references yourself using the GOT. Make sure you keep in mind the first rule of thumb: Do not mix PIC and non-PIC object code. This will probably require the hand written assembly to be preprocessed before it is assembled, so an assembly source file with a .s suffix will not work. It needs to be .S.

Also keep in mind that using @GOTOFF will return the variable while using @GOT will return a pointer to the variable. So accessing a variable with @GOT will require two steps.

How to refer to variables via the GOT

Since we hide the PIC details behind the preprocessor define __PIC__, we know that the correct code will be generated for both the PIC and non-PIC cases.

The __i686.get_pc_thunk.bx function is a standard method for acquiring the address of the GOT at runtime and storing the result in ebx. The funky name is what gcc uses by convention when generating PIC objects, so we too use the same name. The @GOT and @GOTOFF notation tells the assembler where to find the variables in memory. The .section .gnu.linkonce.t is useful because it tells the linker to only include one instance of this function in the final object code. So if you have multiple files which declare this same function which are compiled and linked into the same final library, the linker will discard all duplicate instances of the function thus saving space (which is always a good thing).
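
The two access patterns can be sketched as follows, with several assumptions stated up front. The assembly is embedded in a C file as top-level asm instead of a separate .S file, it locates the GOT with an inline call/pop pair rather than the shared thunk function described above, and it is only assembled for 32-bit x86 ELF PIC builds. Every other configuration uses the plain C fallback. All names (pic_demo_counter, pic_demo_shared and the two reader functions) are hypothetical.

```c
/* Local symbol: hidden visibility keeps it inside the shared object,
 * so it can be reached with @GOTOFF. */
__attribute__((visibility("hidden"))) int pic_demo_counter = 42;

/* Symbol that may be resolved externally: reached through @GOT. */
int pic_demo_shared = 7;

int read_pic_demo_counter(void);
int read_pic_demo_shared(void);

#if defined(__i386__) && defined(__PIC__) && defined(__ELF__)
__asm__ (
    ".pushsection .text\n"
    ".globl read_pic_demo_counter\n"
    ".type read_pic_demo_counter, @function\n"
    "read_pic_demo_counter:\n"
    "    pushl %ebx\n"
    "    call 1f\n"                                   /* push our own address */
    "1:  popl %ebx\n"
    "    addl $_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx\n"  /* ebx = GOT address */
    "    movl pic_demo_counter@GOTOFF(%ebx), %eax\n"  /* one step: the value */
    "    popl %ebx\n"
    "    ret\n"
    ".size read_pic_demo_counter, .-read_pic_demo_counter\n"
    ".globl read_pic_demo_shared\n"
    ".type read_pic_demo_shared, @function\n"
    "read_pic_demo_shared:\n"
    "    pushl %ebx\n"
    "    call 1f\n"
    "1:  popl %ebx\n"
    "    addl $_GLOBAL_OFFSET_TABLE_+[.-1b], %ebx\n"
    "    movl pic_demo_shared@GOT(%ebx), %eax\n"      /* step one: the pointer */
    "    movl (%eax), %eax\n"                         /* step two: the value */
    "    popl %ebx\n"
    "    ret\n"
    ".size read_pic_demo_shared, .-read_pic_demo_shared\n"
    ".popsection\n"
);
#else
/* Non-PIC or non-x86 builds: let the compiler do the addressing. */
int read_pic_demo_counter(void) { return pic_demo_counter; }
int read_pic_demo_shared(void)  { return pic_demo_shared; }
#endif
```

Saving and restoring ebx around the sequence keeps the caller's PIC register intact, in line with the rule about not clobbering it.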

How to fix broken PIC (in practice)
So if the previous code snippets were broken, you may wonder what they should look like. Well, let's find out.

Fix libsmpeg
Fixing cpu_flags in libsmpeg by rewriting it

Fixing IDCT_mmx in libsmpeg by using relative addressing

Fix libSDL
Fixing _ConvertMMXpII32_24RGB888 in libSDL

Fixing SDL_SoftStretch in libSDL

Acknowledgements
We would like to thank the following authors and editors for their contributions to this guide:


 * Ned Ludd
 * The PaX team
 * Kevin F. Quinn