Why not bundle dependencies
The intent of this page is to collect information on dependency bundling and static linking as a reference to refer upstreams to, instead of explaining the same thing over and over again by e-mail. You are welcome to contribute to this page.
When code is bundled?
Say you develop and distribute a piece of software: a game, a library, anything.
Now, the code is bundled if any of the following conditions occur:
- you are statically linking against a system library,
- you are shipping and using your own copy of a library,
- you are including and (unconditionally) using snippets of code copied from a library.
In other words, code bundling occurs whenever your program or library ends up containing code that doesn't belong to it.
There are reasons why bundling dependencies and using static linking happens again and again: there are certain benefits to it! So why is it tempting to do such a thing?
Comforting non-Linux users
Especially in Windows, shipping dependencies can be a favor to your users to save them manually wiring things together. Without a package manager there is no real-solution to that on Windows anyway. So the temptation is: if you use bundled code on Windows, using it on GNU/Linux too feels consistent and is easy on your mind.
Easing up adoption despite odd dependencies
If your software P has some dependency D that is not yet packaged for major distributions, D makes it harder for P to get in as packaging P forces the new maintainer to package D him/herself or to wait for someone else to package it for him/her.
Bundling D hides the dependency on D in a way: if the packager is not paying close attention P may even get in despite and with the bundled dependency. (It's only a matter of time until someone noticed the bundling, though)
Dependencies with unstable APIs
If your program uses a library with a very unstable interface that often changes between minor versions, you are practically unable to produce packages which will work with each new release of that library.
Consider ffmpeg as a good example of that. Programs using it have to suffer major code changes and even rewrites due to changes in the library, and supporting more than a few successive versions is at least painful. Bundling an internal copy of ffmpeg with a particular release allows the developers to choose a single version which is known to be compatible.
Often, this problem is encountered if the developers of the library do not account for the difficulties which developers of programs using it will face.
In other cases, the particular package may be using private interfaces of the library which are not intended to be used by other packages — and thus they are not guaranteed to be 'stable' between releases.
If P uses a library D, the developers of P may wish to make some changes to D, for example to add a new feature, modify the API, or change the default behavior. If the developers of D for whatever reason are opposed to these changes, the developers of P may want to fork D. But publishing and properly maintaining a fork takes time and effort, so the developers of P could be tempted to take the easy road, bundle their patched version of D with P, and maybe occasionally update it for upstream D changes.
So why is bundling dependencies and static linking bad after all?
Let's consider you're a developer of foo and your foo uses libbar. Now, a very important security flaw has been found in libbar (say, remote privilege escalation). The problem is large enough that devs of libbar release fixed version right away, and distributions package it quickly to decrease the possibility of break-in to users' systems to minimum.
If a particular distribution has efficient security upgrade system, the patched library can get there in less than 24 hours. But that would be of no use to foo users which will still use the earlier vulnerable library.
Now, depending on how bad things are:
- if foo statically linked against libbar, then the users would either have to rebuild foo themselves to make it use the fixed library or distribution developers would have to make a new package for foo and make sure it gets to user systems along with libbar (assuming they are aware that the package is statically linked);
- if foo bundled local copy of libbar, then they would have to wait till you discover the vulnerability, update libbar sources, release the new version and distributions package the new version.
In the meantime, users probably even won't know they are running vulnerable application just because they won't know there's a vulnerable library statically linked into the executables.
Waste of hardware resources
Say a media player is bundling library libvorbis. If libvorbis is also installed system-wide this means that two copies of libvorbis
- occupy twice as much space on disk
- occupy (up to) twice as much RAM (of the page cache)
Waste of development time downstream
Due to the consequences of bundled dependencies, many hours of downstream developer time are wasted that could have been put to more useful work.
Potential for Symbol Collisions
If a program P uses a system-installed library A and also uses another library B which bundles library A, there is a potential for symbol collisions.
This means that P might use an interface, such as my_function() and that the my_function() symbol would be present in both A and the version of A bundled inside of library B.
If the system-installed copy of A and the copy of A compiled into library B are from different releases of library A, then the operation of the interface my_function() might behave differently in each copy of A.
Since the program P was compiled against the system-installed copy of A and for various other reasons, if P ends up using the my_function() interface from the version of A bundled in library B instead of the interface in the system-installed copy.
This can potentially result in crashes or strange unpredictable behavior.
This sort of problem can be prevented if library B uses symbol visibility tricks when it links against library A, which would cause library B not to export library A's interfaces.
When a bundled dependency is discovered downstream this has a number of bad consequences.
So there is a copy of libvorbis bundled with that media player. Which version is it? Has it been modified?
Separating forks from copies
Before the bundled dependency can be replaced by the system-widely installed one, we need to know if it has been modified: we have to know if it's a fork. If it is a fork it may or may not be replaced without breaking something. That's something to find out: more time wasted. If the code says which version it is we at least know what to run diff against; but that is not always the case.
If a bundled dependency doesn't tell its version we may have to find out ourselves. Mailing upstream could work, comparing against a number of tarball contents may work too. Lots of opportunities to waste time.
Once it is clear that a bundled dependency can be ripped out, a patch is written, applied and tested (more waste of time). If upstream is willing to co-operate the patch may be dropped later. If not the patch will need porting to each now version downstream.
What to do upstream
- (A) Remove bundled dependency
- At best, remove the bundle dependency and allow compilation against dependency D from either a system-wide installation of it or a local one at any user-defined location. That gives flexibility to users on systems without D packaged and makes it easy to compile against the system copy downstream: cool!
- (B) Keep bundled dependency, make usage completely optional
- With a build time option to disable use of the bundled dependency it is possible to bypass it downstream without patching: nice! When keeping dependency D bundled make sure to follow the upstream of D closely and update your copy to a recent version of D on every minor (and major) release to at least reduce the damage done to people using your bundled version a little. Also: Clearly document if a bundled dependency is a fork or an unmodified copy and which version of the bundled software we are dealing with.