Project:Python/Eclass design rationale

From Gentoo Wiki

This document explains the rationale behind some of the major eclass design decisions. It aims to make the original intentions clearer, in the hope that developers will either come up with ideas for improvement or reuse the same scheme in other eclasses.

Generic problems

Multiple package types

The python-r1 suite splits Python packages into three groups, which I will call single, multi and any for short. The existence of those groups is probably the most controversial and confusing aspect of the Python project. In fact, so far the developers have been unable to come up with good names that unambiguously explain the difference between them.

This split doesn't originate purely from the python-r1 suite. Rather, it cleans up and extends the original split used by python.eclass at the time python-r1 was designed. In particular, python.eclass had two operation modes:

  1. the traditional mode,
  2. and the SUPPORT_PYTHON_ABIS mode.

In the traditional mode, the eclass built the package with whatever Python was selected using eselect python.

In the SUPPORT_PYTHON_ABIS mode, the eclass built the package multiple times, once for each of several installed Python implementations. This became necessary when Python 3 was introduced, so that Python modules used as dependencies were available both to packages using Python 2 and to packages using Python 3.

Both modes used dumb dependencies that only required a certain Python version to be installed. They did not map the implementations actually used to build a package onto matching dependencies.

The python-r1 suite introduced a direct mapping between the Python version used during the build and the dependencies. It also made the choice of implementation explicit. Therefore, it became necessary to replace those two modes with three well-defined package types:

  1. any: packages that need some version of Python at build time, where it does not really matter which version is used.
  2. single: packages that are built for a single version of Python, either due to build system limitations or because there is no reason to install multiple copies of them (e.g. applications). Here it is important to give the user a choice of which implementation to use.
  3. multi: packages that are built for multiple versions of Python simultaneously. The user selects all the implementations that are to be enabled.

Each of those types uses a different kind of USE flags and dependencies suited for the particular task. In particular:

  1. any does not introduce any explicit means of selecting the Python implementation because the choice is not important for the end result. Instead, it uses any-of dependencies to try to satisfy the dependency in some way without requiring explicit user interaction.
  2. single requires the user to choose one Python implementation, and uses it at build time.
  3. multi allows the user to choose more than one Python implementation. Since the build needs to be repeated multiple times, this kind of ebuild is harder to write.
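
As a rough illustration of the three models, the fragments below show how the Python-related metadata of three separate, hypothetical ebuilds might look. They are ebuild fragments, not standalone scripts, and the implementation list in PYTHON_COMPAT is purely illustrative. The variable names (PYTHON_COMPAT, PYTHON_DEPS, PYTHON_REQUIRED_USE) are the ones the eclasses provide to the best of my knowledge; consult the eclassdoc before relying on the details.

```shell
# --- Fragment of a hypothetical "any" ebuild ---
PYTHON_COMPAT=( python2_7 python3_4 )   # illustrative list, not a recommendation
inherit python-any-r1
# No USE flags and no REQUIRED_USE: PYTHON_DEPS is expected to expand
# to an any-of ( || ( ... ) ) dependency on the supported interpreters.
DEPEND="${PYTHON_DEPS}"

# --- Fragment of a hypothetical "single" ebuild ---
PYTHON_COMPAT=( python2_7 python3_4 )
inherit python-single-r1
# The user has to pick one implementation via USE flags.
REQUIRED_USE="${PYTHON_REQUIRED_USE}"
RDEPEND="${PYTHON_DEPS}"
DEPEND="${RDEPEND}"

# --- Fragment of a hypothetical "multi" ebuild ---
PYTHON_COMPAT=( python2_7 python3_4 )
inherit python-r1
# The user may enable more than one implementation via USE flags.
REQUIRED_USE="${PYTHON_REQUIRED_USE}"
RDEPEND="${PYTHON_DEPS}"
DEPEND="${RDEPEND}"
```

Note that the any fragment is the only one without REQUIRED_USE, which reflects the absence of any user-visible implementation choice.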

To summarize:

  1. the need for explicit single and multi package types comes from the conflicting goals of providing a simple means of building the Python components of packages, and the necessity of building the same package for multiple Python implementations. Without the explicit multi variant, the user would be forced to choose between Python 2 and Python 3 in every package. This would mean that no package could really have both Python 2 and Python 3 rev-deps. On the other hand, without the explicit single variant, all packages would require complex changes to support building for multiple implementations, or would simply be impossible to use properly.
  2. the explicit any variant aims to improve the user experience. There are a number of packages that use Python at build time and depend on other Python modules. In this case, it is unnecessary to bother the user with selecting the best implementation for the build. Instead, the eclass tries to figure it out itself, hopefully achieving the goal without requiring explicit USE changes.


Multiple eclasses (vs variable control)

Another potentially controversial issue is the use of multiple eclasses to handle the different package types. This means three 'core' eclasses plus python-utils-r1.eclass to handle the common bits. Such a spread makes the suite look more complex and leaves the documentation fairly fragmented. As a result, it is easier to consult the Wiki to figure out which eclass to use than to read the eclassdocs.

The original python.eclass used a variable to control the eclass mode. While this may seem simpler, it had a few strong disadvantages. Most importantly, both modes were mixed in a single body of code, with conditionals both in the code and in the documentation. Some of the functions could only be used in one of the modes, some behaved differently depending on the mode, and the documentation needed to cover all of that. Anyone writing an ebuild had to filter the eclassdoc for the relevant bits and ignore the description of the other mode.

With the improvements in the python-r1 suite this would get even more confusing. Each of the package types is fairly different and requires different bits of code. python-single-r1 and python-any-r1 both only need to be set up for a single Python implementation, and therefore exporting pkg_setup is the simplest way of integrating them with ebuilds. python-r1, on the other hand, needs explicit iteration, and for that pkg_setup is completely useless, so it is not defined. Due to the specifics of any-of blocks, python-any-r1 has a different dependency specification model than the other two eclasses.
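
The difference between one-time setup and explicit iteration can be pictured with a toy Bash model. This is not eclass code: `setup_single`, `foreach_impl`, `build_step` and both implementation lists are hypothetical names invented for this sketch, and only the `EPYTHON` variable name is borrowed from the eclasses. It merely shows why a single setup call before the build is enough for one implementation, while several implementations require the environment to be switched around every build step.

```shell
#!/usr/bin/env bash
# Toy model only; none of these functions are part of the real eclasses.

# Hypothetical: the one implementation chosen for a "single" package.
single_impl="python2.7"
# Hypothetical: the implementations enabled for a "multi" package.
multi_impls=( "python2.7" "python3.4" )

# A build step that depends on whichever interpreter is currently set up.
build_step() {
    echo "building with ${EPYTHON}"
}

# single/any style: set the environment up once, before any build phase.
# This is the role an exported pkg_setup can play.
setup_single() {
    export EPYTHON="${single_impl}"
}

# multi style: repeat a command once per enabled implementation,
# switching the environment for each run.
foreach_impl() {
    local impl
    for impl in "${multi_impls[@]}"; do
        EPYTHON="${impl}" "$@" || return 1
    done
}

setup_single
single_out=$(build_step)
multi_out=$(foreach_impl build_step)

printf '%s\n' "single: ${single_out}" "multi:" "${multi_out}"
```

In the model, `setup_single` has no useful counterpart for the multi case: whatever it sets would have to be overridden for every implementation anyway, which is why the loop, not a setup phase, is the natural integration point there.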

Putting all that in a single eclass would mean a lot of conditionals and confusing documentation. Using three different eclasses and a fourth ‘common’ eclass was simply cleaner.