Project:Python/Namespace packages

From Gentoo Wiki

This page briefly describes what namespace packages are, how they work, and how to solve the problems they cause.

Regular packages vs namespace packages

Regular Python packages are structured hierarchically. A subpackage foo.bar needs to be located inside the parent package foo, which also needs to be a valid Python package (i.e. contain at least __init__.py). The subpackage search is done relative to the first parent package found in sys.path. That is, if two packages foo.bar and foo.baz are installed in different locations, each of them needs to install its own copy of the foo package, and the Python interpreter will only be able to load the copy that comes first in sys.path.

This can become a problem if two split Gentoo packages install subpackages of the same parent package. In this case, the build directory (used e.g. by tests) will shadow the system parent package, making it impossible to load the other subpackages.

The Python developers addressed this problem by introducing namespace packages. Unlike regular packages, namespace packages allow the subpackages to be split across different locations. In this case, the parent package serves as a kind of proxy: when loaded, it is not a pure regular package, e.g.:

CODE Example namespace package in Python 3.5
In [1]: import zope

In [2]: zope
Out[2]: <module 'zope' (namespace)>

With a namespace package foo, even if foo.bar and foo.baz are installed in different directories across sys.path, the Python interpreter will be able to locate and load both subpackages correctly.
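The difference between the two layouts can be reproduced with a small experiment. The following is a minimal sketch, not part of any Gentoo tooling: it creates two temporary directories, puts one submodule of the same parent package into each, and checks which submodules can be imported. The package name demo_parent and the helper names are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PARENT = "demo_parent"  # hypothetical package name, chosen to avoid clashes


def importable(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def split_install_demo(with_init):
    """Install PARENT.bar and PARENT.baz into two separate directories.

    with_init=True creates regular packages (each directory gets its own
    __init__.py); with_init=False leaves __init__.py out, which makes the
    parent an implicit namespace package on Python 3.3 and newer.
    """
    with tempfile.TemporaryDirectory() as tmp:
        locations = []
        for dirname, child in (("first", "bar"), ("second", "baz")):
            pkg_dir = Path(tmp, dirname, PARENT)
            pkg_dir.mkdir(parents=True)
            if with_init:
                (pkg_dir / "__init__.py").write_text("")
            (pkg_dir / (child + ".py")).write_text("")
            locations.append(str(pkg_dir.parent))
        sys.path[:0] = locations
        importlib.invalidate_caches()
        try:
            return importable(PARENT + ".bar"), importable(PARENT + ".baz")
        finally:
            # Undo the changes so the demo can be run repeatedly.
            for entry in locations:
                sys.path.remove(entry)
            for name in [m for m in sys.modules if m.split(".")[0] == PARENT]:
                del sys.modules[name]


print("regular:  ", split_install_demo(with_init=True))
print("namespace:", split_install_demo(with_init=False))
```

With regular packages only the submodule next to the first parent copy is importable, so the first call should report (True, False). Without __init__.py both directories contribute to the namespace and the second call should report (True, True).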

Implementation of namespace packages

There are currently multiple implementations of namespace packages.

Python 3.3 and newer implement implicit namespaces as specified in PEP 420. An implicit namespace is created when the parent package is not a valid regular package (i.e. does not have __init__.py). Unlike before this PEP, such packages are considered valid, and a namespace search is performed when they are used.

Namespace support for the older Python interpreters is provided via the pkgutil standard library module. In this case, special namespace support code is included in the parent package's __init__.py that alters package search when the parent package is (implicitly) loaded.

Setuptools also provides its own namespace package support.

The first two variants are compatible with each other and can be mixed within the same namespace. The setuptools variant, however, is incompatible with both of them and cannot be mixed with either within the same namespace.
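The compatibility of the first two variants can be checked with a short experiment. This is only a sketch under assumed names: the namespace demo_pkgutil_ns and the function mixed_namespace_demo are hypothetical. One portion of the namespace ships a pkgutil-style __init__.py, the other has no __init__.py at all (PEP 420 style), and both submodules are then imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NAMESPACE = "demo_pkgutil_ns"  # hypothetical namespace name
PKGUTIL_INIT = (
    "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
)


def mixed_namespace_demo():
    """Combine a pkgutil-style portion with a PEP 420 portion."""
    with tempfile.TemporaryDirectory() as tmp:
        # Portion 1: pkgutil-style, ships the namespace __init__.py.
        first = Path(tmp, "first", NAMESPACE)
        first.mkdir(parents=True)
        (first / "__init__.py").write_text(PKGUTIL_INIT)
        (first / "bar.py").write_text("")
        # Portion 2: PEP 420 style, no __init__.py at all.
        second = Path(tmp, "second", NAMESPACE)
        second.mkdir(parents=True)
        (second / "baz.py").write_text("")

        locations = [str(first.parent), str(second.parent)]
        sys.path[:0] = locations
        importlib.invalidate_caches()
        try:
            found = []
            for child in ("bar", "baz"):
                try:
                    importlib.import_module(NAMESPACE + "." + child)
                    found.append(child)
                except ImportError:
                    pass
            return found
        finally:
            # Undo the changes so the demo can be run repeatedly.
            for entry in locations:
                sys.path.remove(entry)
            for name in [m for m in sys.modules if m.split(".")[0] == NAMESPACE]:
                del sys.modules[name]


print(mixed_namespace_demo())
```

On Python 3.3 and newer both submodules should be found, because extend_path also picks up directories that lack __init__.py. The setuptools variant is left out on purpose, since it depends on pkg_resources being installed.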

The best source of information on namespace packages is the namespace package packaging guide.

Packaging namespace packages in Gentoo

PEP 420 namespace packages

PEP 420 namespaces are implicit and therefore require no specific code. This kind of package can be recognized by the fact that the namespace package does not contain an __init__.py file and no namespace_packages argument is passed to setuptools. PEP 420 is only supported by Python 3.3 and newer, so packages using it are incompatible with Python 2. If such compatibility is desired, a pkgutil-style namespace should be used instead.

Packaging PEP 420 namespaces does not require any specific effort. The namespace becomes established implicitly as soon as the first subpackage is installed.


pkgutil/setuptools-style namespace packages

The support for both pkgutil- and setuptools-style namespace packages relies on the namespace package's __init__.py file providing appropriate instructions to establish the namespace-compatible imports. Both of those methods support all Python implementations in Gentoo. However, they are not cross-compatible: only one of them can be used within a single namespace (including package sources and build trees). Therefore, you usually want to follow whichever standard upstream uses within their source packages.

When establishing support for a namespace, you need to take the following steps:

  1. Choose a single package to hold the namespace. All other packages in the namespace will depend on it. If the package set already contains such a common dependency (e.g. dev-python/logilab-common; or one package actually installs real Python modules into the top namespace package), then you can reuse that. Otherwise, you need to create a new package for the namespace, preferably using dev-python/namespace-* naming (e.g. dev-python/namespace-zope).
  2. The selected package needs to install an appropriate __init__.py file for the namespace. For pkgutil-style namespaces, the file should contain:
    FILE foo/__init__.py (pkgutil-style namespace support)
    __path__ = __import__('pkgutil').extend_path(__path__, __name__)
    For setuptools-style namespaces, the file should contain:
    FILE bar/__init__.py (setuptools-style namespace support)
    __import__('pkg_resources').declare_namespace(__name__)
    Note that the latter requires RDEPEND on dev-python/setuptools.
  3. All packages installing into the namespace must RDEPEND (+ DEPEND) on the namespace package. If they install any *-nspkg.pth files, those need to be removed. The eclass will output a warning if they are installed. The following snippet can be used for the removal:
    CODE *.pth file removal snippet
    python_install_all() {
      distutils-r1_python_install_all
      find "${ED}" -name '*.pth' -delete || die
    }
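When deciding which style upstream uses, it can help to inspect the namespace directory in the unpacked sources. The helper below is a rough sketch, not an existing tool: the function name namespace_style is hypothetical, and it relies on a plain text search for the two calls shown in the __init__.py snippets above. It does not check the namespace_packages argument in setup.py.

```python
import tempfile
from pathlib import Path


def namespace_style(package_dir):
    """Guess which namespace style a package directory uses.

    This is a text heuristic, not a parser. An __init__.py that mentions
    both calls (e.g. a try/except fallback) is reported as "setuptools".
    """
    init_py = Path(package_dir) / "__init__.py"
    if not init_py.exists():
        return "pep420"
    source = init_py.read_text()
    if "declare_namespace" in source:
        return "setuptools"
    if "extend_path" in source:
        return "pkgutil"
    return "regular"


# Exercise the helper on four throwaway directories.
samples = {
    "pkgutil": "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n",
    "setuptools": "__import__('pkg_resources').declare_namespace(__name__)\n",
    "regular": "",
    "pep420": None,  # no __init__.py at all
}
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for expected, source in samples.items():
        pkg_dir = Path(tmp, expected)
        pkg_dir.mkdir()
        if source is not None:
            (pkg_dir / "__init__.py").write_text(source)
        results[expected] = namespace_style(pkg_dir)

print(results)
```

Each sample directory is named after the style it is expected to be classified as, so every key in the printed dictionary should equal its value.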

File collisions between pkgutil-style packages

In packages using the pkgutil namespace style, it often happens that the namespace __init__.py file is installed by all packages using the namespace. This causes file collisions between the Gentoo packages (and the respective namespace package you are supposed to create per the instructions for pkgutil-style namespace packages).

A possible upstream solution would be to split the namespace __init__.py out into a separate package, add it as a dependency (install_requires), and stop installing it from all other packages using that namespace. However, the standard Python install layout (easy_install, pip) usually does not have those collisions (due to using wheels/eggs), and setup.py does not detect them. This makes it hard to convince package upstreams that the change is worthwhile.

The Gentoo packaging workaround is rather simple: remove the appropriate __init__.py file from the build tree just before calling distutils-r1_python_install. This ensures that the file is present during the tests (as necessary to load the packages correctly) but is not installed. Removing it prior to the install command saves you from chasing the compiled bytecode files.

CODE Removing colliding namespace __init__.py
python_install() {
  # remove backports/__init__.py from build tree
  rm "${BUILD_DIR}"/lib/backports/__init__.py || die
  # note: eclass may default to --skip-build in the future
  distutils-r1_python_install --skip-build
}