Skip to content

Commit

Permalink
CEP for specifying the location of site-packages in repodata
Browse files Browse the repository at this point in the history
  • Loading branch information
jjhelmus committed Sep 13, 2024
1 parent 189d482 commit 2218dc0
Showing 1 changed file with 162 additions and 0 deletions.
162 changes: 162 additions & 0 deletions cep-999.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
<table>
<tr><td> Title </td><td> Optional python site-packages path in repodata </td>
<tr><td> Status </td><td> Proposed </td></tr>
<tr><td> Author(s) </td><td> Jonathan Helmus &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Sep 13, 2024</td></tr>
<tr><td> Updated </td><td> Sep 13, 2024</td></tr>
<tr><td> Discussion </td><td> NA </td></tr>
<tr><td> Implementation </td><td> NA </td></tr>
</table>

To avoid confusion this document will use a stylized `python` to indicate a conda package with that name.
Other uses of “Python” will be followed by a qualifier, like "programming language" or "interpreter".
Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy.

## Abstract

We propose adding a new optional field in repodata.json that `python` packages can use to specify the destination path when installing `noarch: python` packages.

## Background

In the conda ecosystem the package named `python` provides, either directly or via a dependency, an interpreter for the Python programming language.
This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy.

Tools like conda, mamba and pixie support installing `noarch: python` packages into conda environments.
Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of `noarch: python` packages are linked into the site-packages directory of the target environment.

In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is "Lib/site-packages" on Windows and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment.
These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython.
For example PyPy uses "lib/pypyX.Y/site-packages" on POSIX systems.

In these cases kludgy workarounds, [often a symlink](https://github.com/conda-forge/pypy3.6-feedstock/blob/ac4201e80432469971626d277718edf80b22d6ef/recipe/build.sh#L102), are used in the `python` package to support the installation of `noarch: python` packages.
In the next section a new field will be proposed that allows `python` package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds.

## Specification

An optional field `python_site_packages_path` may be included in a conda package’s info/index.json file as a string and in the corresponding repodata.json entry for the package.
When defined, this field specifies the path of the Python interpreter’s site-packages directory relative to the root of the environment.
If a `python` package that includes this field is installed into an environment, all `noarch: python` packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field.

A value of "null" can be used to indicate that this field is not specified.
It is acceptable to not include the field in repodata.json in this case.

If this field is missing or specified as "null" a default site-packages path will be used.
This path is "Lib/site-packages" if the environment targets a Windows platform and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment.
This matches the current behavior of conda, mamba and pixie.

This field should only be included in the metadata for `python` packages but tools must not fail if packages with other names include this field, rather the entry should be ignored.

To avoid unnecessarily increasing the size of repodata.json, it is highly recommended that this field only be included in `python` packages where it is required, that is where the site-packages directory is not the default value.
Packages with other names or `python` packages that use the default site-packages path should not include this field.

This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path).
If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user.

A possible way to check the validity of the `python_site_packages_path` field is with this function:

``` Python
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
```

When a package which includes this optional field is indexed the `python_site_packages_path` field will be included in the repodata entry for the package.
For example the repodata entry for a python-3.13.0rc1 package using the free-threading build configuration might look as follows:

```
"python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": {
"build": "haa6bb3f_0_cpython_cp313t",
"build_number": 0,
"depends": [
"bzip2 >=1.0.8,<2.0a0",
"expat >=2.6.2,<3.0a0",
"libffi >=3.4,<4.0a0",
"ncurses >=6.4,<7.0a0",
"openssl >=3.0.14,<4.0a0",
"readline >=8.1.2,<9.0a0",
"sqlite >=3.45.3,<4.0a0",
"tk >=8.6.14,<8.7.0a0",
"tzdata",
"xz >=5.4.6,<6.0a0",
"zlib >=1.2.13,<1.3.0a0"
],
"license": "PSF-2.0",
"license_family": "PSF",
"timestamp": 1722610680768,
"track_features": "free-threading",
"md5": "c09289eb86239e1221533457d861f1a3",
"name": "python",
"size": 16332720,
"subdir": osx-arm64,
"version": "3.13.0rc1",
"sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
"python_site_packages_path": "lib/python3.13t/site-packages"
},
```

This field will also be included in repodata derivatives, like current_repodata.json or sharded repodata when appropriate.

Because this field is present in repodata.json it can be hot-fixed to correct mistakes or omissions.
This allows existing `python` packages to retroactively specify the locations of their site-packages directory.

## Motivation

This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems.
Specifically, the free-threading build uses "lib/python3.13t/site-packages".

A symlink between these paths or logic in [sitecustomize.py](https://docs.python.org/3/library/site.html#module-sitecustomize) allows packages installed into the incorrect site-packages directory to operate but this is less than ideal.

For additional details and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations.
This includes both PyPy and GraalPy which use different paths for site-packages than CPython.
The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory.

## Security

Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails.

A primary security concern is that by allowing the `python` package to specify where files will be installed, a malicious package could write or overwrite files with malicious content.

If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment.

For example a malicious `python` package could specify an invalid site-packages path, in which case `noarch: python` packages installed in the environment would not work.
This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question.

The malicious package could also specify a site-packages path that would cause files from `noarch: python` packages to populate or replace key files in the environment.
But this type of attack is already possible, in fact the `python` package itself could include these files.

Of more concern is the possibility of a package writing files outside of the environment.
This could replace critical system files with malicious content.
Because of this risk, paths which are outside of the environment are invalid as `python_site_packages_path` entries.
Tools must check that the field is valid and error out when it is invalid.

## Backwards Compatibility

This change maintains backwards compatibility.
All existing `python` packages do not include this field.
The behavior when this field is not specified matches what is currently done by conda, mamba and pixi.
The only behavior change occurs when this field is included.

Because existing releases of conda, mamba and pixie do not support this field, `python` packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.

## Other sections

A number of alternatives were discussed and considered to address this problem. These include:

* Using symlinks or a `sitecustomize.py` file to allow linking into the incorrect site-packages directory.
This is a work-around for the problem, not a fix.
* Querying the interpreter to determine the site-packages directory.
This was rejected as it requires running the interpreter as part of the install step and requires that the `python` package be installed and operational before `noarch: python` packages can be linked.
* Using the build string of the `python` or `python_abi` package to determine the site-packages location.
This is an indirect method of obtaining information that can be more effectively specified explicitly.
* Storing this information is another metadata file, such as "about.json" or "paths.json".
The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until the `python` package is downloaded and unpacked, potentially limiting the order in which packages can be linked.

For more alternatives and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).

0 comments on commit 2218dc0

Please sign in to comment.