Corran Webster

Python Build Backends

2023-02-11

In which we look at the history of building Python packages, the new build backend systems, and a trick that lets you extend your chosen build backend when it doesn't quite do enough.

In the Beginning...

Once upon a time, back in the days of Python 1.x, if you wanted to distribute a Python library you just passed around the source code and manually put it into your Python path (usually in site-packages, but sometimes you just added the source directory to your list of paths). If it used a C extension then it probably used configure and make (but possibly cmake, or maybe some custom collection of shell scripts...) and if you were on Windows or Mac (OS 8 or 9, with no command-line) then good luck building it!

And then came distutils, which provided a unified way to describe not only the contents of your library, but also its C extensions and what they needed to compile, together with cross-platform backends for performing that compilation. And it was part of the standard library, so everyone had it. Batteries included! While getting Visual Studio configured just right to build for your version of Python could be tricky, it meant that pure Python libraries could be reliably distributed and installed, and simple C extensions were straightforward to share. And thus was born the setup.py.

And this is really the beginning of the Python ecosystem. The setup.py let you specify standard metadata and made the cheeseshop, aka the Python Package Index, possible. You could download a package and run python setup.py install and it would very likely install it into your (singular) Python environment. Because it was a Python file, it let you use Python code to cover the weird edge cases that complex packages inevitably ran into, and it was designed to be extensible.

And because no system is perfect (and, in particular, building complex software is Hard), and use-cases change, extensions and wrappers and tools like easy_install and EPD and setuptools and pip came along, and with them binary formats like eggs and wheels. But the way that people used Python changed too. You no longer had a single Python environment, or one per version of Python, but potentially many virtual environments. The community expanded, and now people had jobs writing substantial amounts of Python who had never compiled a line of C.

That caused problems, but it also opened up possibilities.

One enduring problem with setup.py and distutils is that the build runs in the same environment that it is installing into, because when distutils was written you only had one Python environment. That creates problems whenever you want to go beyond the Python standard library. Your package includes Cython code? Now your environment needs Cython installed. You want to use setuptools extensions to distutils? You need that installed too. Worse, because developers were writing setup.py files in their development environments, dependencies would creep in unexpectedly, leading to packages which could be installed on the developer's machine, but nowhere else. Particularly bad were cases where the setup.py would import from the package it was trying to install (which sort-of worked if you happened to be in the right directory).

And it led to efforts to replace setup.py with non-Python declarative files like setup.cfg and pyproject.toml - which of course couldn't come close to handling the corner cases that come up when compiling extension modules, and so basically punted on that. It also led to efforts to produce build systems outside the distutils/setup.py world, like bento, enscons, flit, poetry and meson.

You could get around this somewhat by doing your builds in dedicated build virtual environments and distributing eggs or wheels. But the combination of pip and venv makes it possible to automate this step entirely. And that's precisely what build does: create an isolated environment in which to build wheels and source distributions; pip takes a similar approach when building the things it installs.
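
For example, once you have build installed, a single command produces both artifacts from a clean, isolated environment (by default they end up in the dist/ directory):

$ python -m pip install build
$ python -m build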

A Build System API

But how do these tools know what they need in the build environment?

That's the core of PEP 518. Add this section to a pyproject.toml

[build-system]
requires = [...]

and pip and build will install those into the build environment (and then do the build in a clean temporary directory). So you can now do

[build-system]
requires = ['setuptools', 'cython']

and be confident that you can use those in your setup.py script. Build dependency problem solved.
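
For instance, with the requirements above it is now safe for setup.py to import Cython at build time. A minimal sketch (the package and file names here are purely illustrative):

# setup.py - Cython is guaranteed to be in the build environment
from setuptools import Extension, setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize([Extension("mypkg.fast", ["src/fast.pyx"])]),
)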

But there's a second part to this: PEP 517. It answers the question "How do I build it?"

The complete specification of the build-system section of pyproject.toml looks like:

[build-system]
requires = [...]
backend-path = [...]
build-backend = "..."

So for example, a fairly basic setuptools-based build would look like:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

while if you wanted to use the Meson build system, it would look like:

[build-system]
requires = ['meson-python']
build-backend = 'mesonpy'

The build-backend is a string which names a module (or an object within a module, using entry-point syntax) that satisfies a particular API specified in PEP 517. At a minimum it must provide the callables build_sdist(...) and build_wheel(...), but there are also hooks for specifying extra dependencies for each of these builds, and for preparing wheel metadata.
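
As a rough sketch, the two mandatory hooks look something like this (the signatures come from PEP 517; the bodies here are placeholders):

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Build a wheel, write it into wheel_directory, and
    # return the basename of the file created.
    ...

def build_sdist(sdist_directory, config_settings=None):
    # Build an sdist, write it into sdist_directory, and
    # return the basename of the file created.
    ...

A backend is just an object (usually a module) that exposes these callables; the frontend imports it inside the isolated build environment and calls them.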

This means that projects can choose to build using any build system that implements this API and still be compatible with standard tools like pip and build. This is a Good Thing.

But what about backend-path? This is a list of subdirectories of your project which are added to the Python path while looking for the build-backend - in other words, this is a hook that allows a project to ship its own build system. This is essentially the ultimate corner-case escape hatch: as long as you adhere to the API and produce compliant artifacts, you can do whatever you want. If you are interested in this, Python packaging the hard way explores the topic by building a minimal compliant PEP 517 build backend.
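
For example, a project shipping its own backend as a module at the top of its source tree might use something like this (the module name is hypothetical):

[build-system]
requires = []
backend-path = ["."]
build-backend = "my_backend"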

But for most projects, implementing a full build backend is unreasonable. In 90% of cases existing tools do what you want, and it's only when you're looking at the 1% of extreme cases like NumPy or SciPy that the effort of implementing your own build system might be less than the effort of adapting standard tools.

But there is that (smallish, but still there) middle ground: the packages where you need to do a little something more than build a simple C extension and bundle up some Python modules.

Extending a Build Backend

As an example of this, I recently had to wrap a third-party C library whose source code we didn't control. The library needed to be built into an object file that we could link our Cython extension against, but before that it needed patching, and it used a make-based build process.

Because I wanted things to be able to run under CI and also have other developers be able to work with the code, I wrote a Python script to automate the required steps, so getting a working test or development environment looked something like:

$ python -m prepare
$ python -m pip install -e .

which worked well.

But later we needed to be able to install the package with pip directly from a git repo, as a dependency of a second project, and in this case it didn't work - you can't tell pip to check out the git repo, call a custom script, and then continue with the install.
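
The dependency in question looked something like this (the name and URL are placeholders) - a direct reference that tells pip to clone and build the project itself, with no opportunity to run extra steps first:

wrapped-lib @ git+https://example.com/org/wrapped-lib.git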

So we had a case where setuptools was doing almost everything that we wanted, but not quite enough. The obvious fix was to fold the ci.prepare steps into the setup.py itself, but that has several downsides.

But the new build-system interface gives a third way: setuptools gives us almost everything we need, so why not just wrap its PEP 517 API with our own build backend that performs the extra step first? This turns out to be extremely simple:
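
A minimal sketch of the wrapper (assuming the prepare module from earlier exposes a main() callable that performs the patch-and-make step):

# build_backend.py - wrap setuptools' PEP 517 backend, running our
# prepare step before any build.
from setuptools import build_meta as _orig

import prepare  # the script from earlier; assumed to expose main()

# Hooks we don't need to customize pass straight through to setuptools.
get_requires_for_build_wheel = _orig.get_requires_for_build_wheel
get_requires_for_build_sdist = _orig.get_requires_for_build_sdist
prepare_metadata_for_build_wheel = _orig.prepare_metadata_for_build_wheel

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    prepare.main()  # patch and build the third-party library first
    return _orig.build_wheel(wheel_directory, config_settings, metadata_directory)

def build_sdist(sdist_directory, config_settings=None):
    prepare.main()
    return _orig.build_sdist(sdist_directory, config_settings)

with pyproject.toml pointing pip and build at the wrapper:

[build-system]
requires = ["setuptools", "wheel", "cython"]
backend-path = ["."]
build-backend = "build_backend"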

There are obvious improvements that could be made to this (in particular, putting the prepare and build_backend modules in a sub-directory rather than at the top level - the actual backend-path looked like backend-path = ["ci"]). But the basic idea Just Works.

What does this all mean? Basically, PEP 517 gives projects an easy way to extend an underlying build system while keeping the power that those build systems provide.

Where is This All Going?

Since the ideas of PEP 517 are fairly simple, I imagine that we will start to see additional wrappers around other build systems (for example, py-build-cmake wraps CMake) and stand-alone backends (such as Hatch and PDM). There are also proposals to extend the API, such as PEP 660.

But, to be blunt, any new build backend which is unable to, at a minimum, compile a self-contained C extension module is missing the point. Such backends may find a niche, but they won't displace setuptools, because they don't support the huge secret strength of Python as a glue language. From my point of view, if they can't handle what distutils handled 20+ years ago then they are not bringing a lot of value to the table.

The wrappers around older systems (e.g. CMake, SCons), or newer cross-language build systems (e.g. Meson, Pants), are much more interesting, because they potentially make it easier for libraries in other languages to expose a Python wrapper. For example, if you have a CMake-based C++ library and want to expose it to Python, you don't need to additionally package it with setuptools; you can just extend your existing build and use py-build-cmake.

I can see the utility of a library that helps you put together your own build backends, but there doesn't seem to be anything like that out there.

And there would be huge value in a simple build backend that could, out of the box, compile the kind of self-contained C and Cython extensions that distutils handled, while staying small and easy to extend.

That could potentially replace setuptools and setup.py: the NumPys and TensorFlows of the world won't use it, and that's fine - a more complete build system is probably a better fit in any case - but it would be invaluable for those libraries that wrap a small C/C++ library or need to move a bit of their code into C for speed.