From 2218dc02d683e45669273d177d90d43e51796fdb Mon Sep 17 00:00:00 2001 From: Jonathan Helmus Date: Fri, 13 Sep 2024 11:24:10 -0500 Subject: [PATCH] CEP for specifying the location of site-packages in repodata --- cep-999.md | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 162 insertions(+) create mode 100644 cep-999.md diff --git a/cep-999.md b/cep-999.md new file mode 100644 index 0000000..03e07bb --- /dev/null +++ b/cep-999.md @@ -0,0 +1,162 @@ + + + + + + + + +
Title Optional python site-packages path in repodata
Status Proposed
Author(s) Jonathan Helmus <jjhelmus@gmail.com>
Created Sep 13, 2024
Updated Sep 13, 2024
Discussion NA
Implementation NA
+ +To avoid confusion this document will use a stylized `python` to indicate a conda package with that name. +Other uses of “Python” will be followed by a qualifier, like "programming language" or "interpreter". +Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy. + +## Abstract + +We propose adding a new optional field in repodata.json that `python` packages can use to specify the destination path when installing `noarch: python` packages. + +## Background + +In the conda ecosystem the package named `python` provides, either directly or via a dependency, an interpreter for the Python programming language. +This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy. + +Tools like conda, mamba and pixie support installing `noarch: python` packages into conda environments. +Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of `noarch: python` packages are linked into the site-packages directory of the target environment. + +In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is "Lib/site-packages" on Windows and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment. +These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython. +For example PyPy uses "lib/pypyX.Y/site-packages" on POSIX systems. + +In these cases kludgy workarounds, [often a symlink](https://github.com/conda-forge/pypy3.6-feedstock/blob/ac4201e80432469971626d277718edf80b22d6ef/recipe/build.sh#L102), are used in the `python` package to support the installation of `noarch: python` packages. +In the next section a new field will be proposed that allows `python` package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds. + +## Specification + +An optional field `python_site_packages_path` may be included in a conda package’s info/index.json file as a string and in the corresponding repodata.json entry for the package. +When defined, this field specifies the path of the Python interpreter’s site-packages directory relative to the root of the environment. +If a `python` package that includes this field is installed into an environment, all `noarch: python` packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field. + +A value of "null" can be used to indicate that this field is not specified. +It is acceptable to not include the field in repodata.json in this case. + +If this field is missing or specified as "null" a default site-packages path will be used. +This path is "Lib/site-packages" if the environment targets a Windows platform and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment. +This matches the current behavior of conda, mamba and pixie. + +This field should only be included in the metadata for `python` packages but tools must not fail if packages with other names include this field, rather the entry should be ignored. + +To avoid unnecessarily increasing the size of repodata.json, it is highly recommended that this field only be included in `python` packages where it is required, that is where the site-packages directory is not the default value. +Packages with other names or `python` packages that use the default site-packages path should not include this field. + +This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path). +If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user. + +A possible way to check the validity of the `python_site_packages_path` field is with this function: + +``` Python +def is_valid(target_prefix: str, python_site_packages_path: str) -> bool: + target_prefix = os.path.realpath(target_prefix) + full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path)) + test_prefix = os.path.commonpath((target_prefix, full_path)) + return test_prefix == target_prefix +``` + +When a package which includes this optional field is indexed the `python_site_packages_path` field will be included in the repodata entry for the package. +For example the repodata entry for a python-3.13.0rc1 package using the free-threading build configuration might look as follows: + +``` + "python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": { + "build": "haa6bb3f_0_cpython_cp313t", + "build_number": 0, + "depends": [ + "bzip2 >=1.0.8,<2.0a0", + "expat >=2.6.2,<3.0a0", + "libffi >=3.4,<4.0a0", + "ncurses >=6.4,<7.0a0", + "openssl >=3.0.14,<4.0a0", + "readline >=8.1.2,<9.0a0", + "sqlite >=3.45.3,<4.0a0", + "tk >=8.6.14,<8.7.0a0", + "tzdata", + "xz >=5.4.6,<6.0a0", + "zlib >=1.2.13,<1.3.0a0" + ], + "license": "PSF-2.0", + "license_family": "PSF", + "timestamp": 1722610680768, + "track_features": "free-threading", + "md5": "c09289eb86239e1221533457d861f1a3", + "name": "python", + "size": 16332720, + "subdir": osx-arm64, + "version": "3.13.0rc1", + "sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df", + "python_site_packages_path": "lib/python3.13t/site-packages" + }, +``` + +This field will also be included in repodata derivatives, like current_repodata.json or sharded repodata when appropriate. + +Because this field is present in repodata.json it can be hot-fixed to correct mistakes or omissions. +This allows existing `python` packages to retroactively specify the locations of their site-packages directory. + +## Motivation + +This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems. +Specifically, the free-threading build uses "lib/python3.13t/site-packages". + +A symlink between these paths or logic in [sitecustomize.py](https://docs.python.org/3/library/site.html#module-sitecustomize) allows packages installed into the incorrect site-packages directory to operate but this is less than ideal. + +For additional details and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053). + +Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations. +This includes both PyPy and GraalPy which use different paths for site-packages than CPython. +The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory. + +## Security + +Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails. + +A primary security concern is that by allowing the `python` package to specify where files will be installed, a malicious package could write or overwrite files with malicious content. + +If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment. + +For example a malicious `python` package could specify an invalid site-packages path, in which case `noarch: python` packages installed in the environment would not work. +This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question. + +The malicious package could also specify a site-packages path that would cause files from `noarch: python` packages to populate or replace key files in the environment. +But this type of attack is already possible, in fact the `python` package itself could include these files. + +Of more concern is the possibility of a package writing files outside of the environment. +This could replace critical system files with malicious content. +Because of this risk, paths which are outside of the environment are invalid as `python_site_packages_path` entries. +Tools must check that the field is valid and error out when it is invalid. + +## Backwards Compatibility + +This change maintains backwards compatibility. +All existing `python` packages do not include this field. +The behavior when this field is not specified matches what is currently done by conda, mamba and pixi. +The only behavior change occurs when this field is included. + +Because existing releases of conda, mamba and pixie do not support this field, `python` packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time. + +## Other sections + +A number of alternatives were discussed and considered to address this problem. These include: + +* Using symlinks or a `sitecustomize.py` file to allow linking into the incorrect site-packages directory. + This is a work-around for the problem, not a fix. +* Querying the interpreter to determine the site-packages directory. + This was rejected as it requires running the interpreter as part of the install step and requires that the `python` package be installed and operational before `noarch: python` packages can be linked. +* Using the build string of the `python` or `python_abi` package to determine the site-packages location. + This is an indirect method of obtaining information that can be more effectively specified explicitly. +* Storing this information is another metadata file, such as "about.json" or "paths.json". + The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until the `python` package is downloaded and unpacked, potentially limiting the order in which packages can be linked. + +For more alternatives and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053). + +## Copyright + +All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).