PyPI caching mirror

proxpi

PyPI caching proxy

  • Host a proxy PyPI mirror server with caching
    • Cache the index (project list and projects' file list)
    • Cache the project files
  • Proxy multiple package indices
  • Provide and consume both JSON and HTML APIs
  • Set index cache times-to-live (individually for each index)
  • Set files cache max-size on disk
  • Manually invalidate index cache

See Alternatives.

Usage

Start server

Run proxpi either inside a Docker container, if you want a known-working environment, or outside of one as a Python app, if you want more control over the environment (the instructions here use the Flask development server).

Note: the index cache and the management of the file cache run in memory and are not synchronised across processes, so use multiple threads instead of multiple processes. The cache is thread-safe.

Docker

Uses a Gunicorn WSGI server

docker run -p 5000:5000 epicwink/proxpi

Without arguments, runs with 2 threads. If passing arguments, make sure to bind to an address reachable from outside the container (or to all addresses with 0.0.0.0) on port 5000 (ie --bind 0.0.0.0:5000).

Compose

Alternatively, use Docker Compose

docker compose up

Local

Install
pip install proxpi

Install proxpi[pretty] instead to get coloured logging and tracebacks (disable by setting environment variable NO_COLOR=1).

Run server
FLASK_APP=proxpi.server flask run

See flask run --help for more information on address and port binding, and certificate specification to use HTTPS. Alternatively, bring your own WSGI server.

Use proxy

Use pip's --index-url flag to install packages via the proxy

pip install --index-url http://127.0.0.1:5000/index/ simplejson
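
The same install can be scripted. The following is a minimal sketch, assuming the proxy is listening on http://127.0.0.1:5000 as in the command above. The helper names `pip_install_args` and `pip_proxy_env` are hypothetical and not part of proxpi. The first builds the pip command line, the second sets the PIP_INDEX_URL environment variable, which pip reads in place of the flag.

```python
import os
import sys

# Assumed address of a locally running proxpi server (taken from the example above)
PROXY_INDEX_URL = "http://127.0.0.1:5000/index/"


def pip_install_args(package, index_url=PROXY_INDEX_URL):
    """Build a pip command line which installs `package` via the proxy."""
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


def pip_proxy_env(index_url=PROXY_INDEX_URL, base_env=None):
    """Return a copy of the environment with pip's index URL set to the proxy."""
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_INDEX_URL"] = index_url
    return env


# Usage (requires a running proxpi server):
#   import subprocess
#   subprocess.run(pip_install_args("simplejson"), check=True)
```

Setting the environment variable once is usually more convenient than passing the flag on every invocation, as no pip command needs to change.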

Cache invalidation

Either head to http://127.0.0.1:5000/ in the browser, or run:

curl -X DELETE http://127.0.0.1:5000/cache/simplejson
curl -X DELETE http://127.0.0.1:5000/cache/list

If you need to invalidate a locally cached file, restart the server: files should never change in a package index.
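
The curl commands above can also be issued from Python using only the standard library. This is a minimal sketch, assuming the server address used throughout this page. The function names are hypothetical, and only the two documented /cache/ endpoints are used.

```python
import urllib.request

PROXY_URL = "http://127.0.0.1:5000"  # assumed server address


def invalidation_request(project=None, proxy_url=PROXY_URL):
    """Build a DELETE request for one project's index cache, or for the project list."""
    target = "list" if project is None else project
    return urllib.request.Request(f"{proxy_url}/cache/{target}", method="DELETE")


def invalidate(project=None, proxy_url=PROXY_URL, timeout=10.0):
    """Send the request and return the HTTP status.

    Raises urllib.error.URLError if the server is unreachable, and
    urllib.error.HTTPError if it responds with an error status.
    """
    request = invalidation_request(project, proxy_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage (requires a running proxpi server):
#   invalidate("simplejson")  # one project's file list
#   invalidate()              # the project list
```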

Environment variables

  • PROXPI_INDEX_URL: index URL, default: https://pypi.org/simple/
  • PROXPI_INDEX_TTL: index cache time-to-live in seconds, default: 1800 (30 minutes). Disable index-cache by setting this to 0
  • PROXPI_EXTRA_INDEX_URLS: extra index URLs (comma-separated)
  • PROXPI_EXTRA_INDEX_TTLS: corresponding extra index cache times-to-live in seconds (comma-separated), default: 180 (3 minutes) each, cache disabled when 0
  • PROXPI_CACHE_SIZE: size of downloaded project files cache (bytes), default: 5GB. Disable files-cache by setting this to 0
  • PROXPI_CACHE_DIR: downloaded project files cache directory path, default: a new temporary directory
  • PROXPI_BINARY_FILE_MIME_TYPE=1: force file-response content-type to "application/octet-stream" instead of letting Flask guess it
  • PROXPI_DISABLE_INDEX_SSL_VERIFICATION=1: don't verify any index SSL certificates
  • PROXPI_DOWNLOAD_TIMEOUT: time (in seconds) before proxpi will redirect to the proxied index server for file downloads instead of waiting for the download, default: 0.9
  • PROXPI_CONNECT_TIMEOUT: time (in seconds) proxpi will wait for a socket to connect to the index server before requests raises a ConnectTimeout error to prevent indefinite blocking, default: none, or 3.1 if read-timeout provided
  • PROXPI_READ_TIMEOUT: time (in seconds) proxpi will wait for chunks of data from the index server before requests raises a ReadTimeout error to prevent indefinite blocking, default: none, or 20 if connect-timeout provided
  • PROXPI_LOGGING_LEVEL: Python logging level; default: INFO
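
As the variables above take raw seconds and bytes, it can help to compute them rather than type them out. The sketch below is illustrative only: `server_env` and `resolve_timeouts` are hypothetical helpers, not proxpi functions. `resolve_timeouts` is one reading of the documented connect/read timeout defaults, expressed as the `(connect, read)` tuple that requests accepts.

```python
import os


def server_env(index_ttl_minutes=None, cache_size_bytes=None, cache_dir=None, base_env=None):
    """Return an environment for the server process; unset options keep proxpi's defaults."""
    env = dict(os.environ if base_env is None else base_env)
    if index_ttl_minutes is not None:
        env["PROXPI_INDEX_TTL"] = str(int(index_ttl_minutes * 60))  # minutes to seconds
    if cache_size_bytes is not None:
        env["PROXPI_CACHE_SIZE"] = str(int(cache_size_bytes))
    if cache_dir is not None:
        env["PROXPI_CACHE_DIR"] = cache_dir
    return env


def resolve_timeouts(connect=None, read=None):
    """Apply the documented fallbacks: a default is used only when the other timeout is set."""
    if connect is None and read is None:
        return None  # no timeouts at all
    return (3.1 if connect is None else connect, 20.0 if read is None else read)


# Usage (assumes proxpi and Flask are installed):
#   import subprocess
#   env = server_env(index_ttl_minutes=10, cache_dir="/var/cache/proxpi")
#   env["FLASK_APP"] = "proxpi.server"
#   subprocess.run(["flask", "run"], env=env, check=True)
```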

Considerations with CI

proxpi was designed with three goals (particularly for continuous integration (CI)):

  • to reduce load on PyPI package serving
  • to reduce pip install times
  • to not require modification to the current workflow

Specifically, proxpi was designed to run for CI services such as Travis, Jenkins, GitLab CI, Azure Pipelines and GitHub Actions.

proxpi works by caching index requests (ie which versions, wheel-types, etc are available for a given project: the index cache) and the project files themselves (in a local directory: the package cache). This means only repeated requests are served from the cache: the first request for an index page or file always goes to the upstream index, so proxpi provides no benefit for a single pip install.
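
To illustrate why only repeated requests benefit, here is a minimal sketch of a thread-safe cache with a time-to-live. It is not proxpi's implementation: the `TTLCache` class, its `fetch` callback and the injectable `clock` are assumptions made for the example.

```python
import threading
import time


class TTLCache:
    """A tiny thread-safe cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, fetch, clock=time.monotonic):
        self.ttl = ttl
        self._fetch = fetch  # called on a miss, eg a request to the upstream index
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # key -> (expiry time, value)

    def get(self, key):
        # Holding the lock while fetching keeps the sketch simple, at the cost
        # of serialising upstream requests
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and now < entry[0]:
                return entry[1]  # hit: no upstream request
            value = self._fetch(key)  # miss or expired: ask upstream
            if self.ttl > 0:  # a time-to-live of 0 disables caching
                self._entries[key] = (now + self.ttl, value)
            return value

    def invalidate(self, key):
        """Forget a cached entry, if present."""
        with self._lock:
            self._entries.pop(key, None)
```

The first `get` for a key always calls `fetch`. Later calls within the time-to-live do not, which is where the saving comes from. The state lives in one process's memory, which is why such a cache is shared between threads but not between processes.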

Cache persistence

As a basic end-user of these services, you usually won't be able to keep a proxpi server running between invocations of your project's CI pipeline, as CI invocations are designed to be independent. This means the best you can do is start the cache for just the current job.

A more advanced user of these CI services can bring their own runner (my own need is for GitLab CI). This means you can run proxpi on a fully-controlled server (eg an EC2 instance), and proxy PyPI requests (during a pip command) through the local cache. See the instructions below.

Hopefully, in the future these CI services will all implement their own transparent caching for PyPI. For example, Azure already has Azure Artifacts which provides much more functionality than proxpi, but won't reduce pip install times for CI services not using Azure.

GitLab CI instructions

This implementation leverages pip's configurable index URL and Docker networks. It is to be run on a server you have console access to.

  1. Create a Docker bridge network

    docker network create gitlab-runner-network
    
  2. Start a GitLab CI Docker runner using their documentation

  3. Run the proxpi Docker container

    docker run \
      --detach \
      --network gitlab-runner-network \
      --volume proxpi-cache:/var/cache/proxpi \
      --env PROXPI_CACHE_DIR=/var/cache/proxpi \
      --name proxpi epicwink/proxpi:latest
    

    You don't need to expose a port (the -p flag) as we'll be using an internal Docker network.

  4. Set pip's index URL to the proxpi server by setting it in the runner environment. Set runners[0].docker.network_mode to gitlab-runner-network. Add PIP_INDEX_URL=http://proxpi:5000/index/ and PIP_TRUSTED_HOST=proxpi to runners.environment in the GitLab CI runner configuration TOML. For example, you may end up with the following configuration:

    [[runners]]
      name = "awesome-ci-01"
      url = "https://gitlab.com/"
      token = "SECRET"
      executor = "docker"
      environment = [
        "DOCKER_TLS_CERTDIR=/certs",
        "PIP_INDEX_URL=http://proxpi:5000/index/",
        "PIP_TRUSTED_HOST=proxpi",
      ]
    
    [runners.docker]
      network_mode = "gitlab-runner-network"
      ...
    

This is designed to not require any changes to the GitLab CI project configuration (ie .gitlab-ci.yml), unless it already sets the index URL for some reason (if that's the case, you're probably already using a cache).
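
To confirm a job actually picked up the runner settings, a small check can be run inside the job. This is a sketch under the assumptions of the configuration above (a proxpi container named proxpi, served over plain HTTP). `check_pip_proxy_env` is a hypothetical helper, not something provided by proxpi or GitLab.

```python
from urllib.parse import urlsplit


def check_pip_proxy_env(env, expected_host="proxpi"):
    """Return a list of problems with a job's pip settings (empty if all is well)."""
    index_url = env.get("PIP_INDEX_URL")
    if not index_url:
        return ["PIP_INDEX_URL is not set, so pip will use its default index"]
    problems = []
    parts = urlsplit(index_url)
    if parts.hostname != expected_host:
        problems.append(f"PIP_INDEX_URL points at {parts.hostname!r}, not {expected_host!r}")
    # pip reads multiple trusted hosts as a space-separated list, each
    # optionally including a port
    trusted = set(env.get("PIP_TRUSTED_HOST", "").split())
    if parts.scheme == "http" and not ({parts.hostname, parts.netloc} & trusted):
        problems.append(f"{parts.hostname!r} is served over HTTP but is not in PIP_TRUSTED_HOST")
    return problems


# Usage inside a CI job:
#   import os, sys
#   problems = check_pip_proxy_env(os.environ)
#   if problems:
#       sys.exit("\n".join(problems))
```

The trusted-host check matters because the proxy is reached over HTTP on the internal Docker network, which pip does not accept for an index unless the host is marked as trusted.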

Another option is to set up a proxy, but that's more effort than the above method.

Alternatives

  • simpleindex: routes URLs to multiple indices (including PyPI), supports local (or S3 with a plugin) directory of packages, no caching without custom plugins

  • bandersnatch: mirrors one index (eg PyPI), storing packages locally, or on S3 with a plugin. Manual update, no proxy

  • devpi: heavyweight, runs a full index (or multiple) in addition to mirroring (in place of proxying), supports proxying (with inheritance), supports package upload, server replication and fail-over

  • pypiserver: serves local directory of packages, proxy to PyPI when not-found, supports package upload, no caching

  • dumb-pypi: generates a static website of a package index pointing to statically-located files, no hosting (therefore no caching nor proxying unless configured in server)

  • PyPI Cloud: serves local or cloud-storage directory of packages, with redirecting/cached proxying to indexes, authentication and authorisation.

  • pypiprivate: serves local (or S3-hosted) directory of packages, no proxy to package indices (including PyPI)

  • Pulp: generic content repository, can host multiple ecosystems' packages. Python package index plugin supports local/S3 mirrors, package upload, proxying to multiple indices, no caching

  • pip2pi: manual syncing of specific packages, no proxy

  • nginx_pypi_cache: caching proxy using nginx, single index

  • Flask-Pypi-Proxy: unmaintained, no cache size limit, no caching index pages

  • http.server: standard-library, hosts directory exactly as laid out, no proxy to package indices (eg PyPI)

  • Apache with mod_rewrite: I'm not familiar with Apache, but it likely has the capability to proxy and cache (with eg mod_cache_disk)

  • Gemfury: hosted, managed. Private index is not free, documentation doesn't say anything about proxying

Download files

Source Distribution

proxpi-1.3.0rc0.tar.gz (43.2 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: c253767a721e27c6cc90f23973c1ce83a366a1872f75c73b87df89f3e804f477
  • MD5: 742f3a945cb63fbc9885407fb37efa7c
  • BLAKE2b-256: 69edd4fe4a4c4c808c586d6c6485bb8705bafcf0ae0914e26276b5c6be339010
  • Provenance: attestation bundle published by publish-python-package.yml on EpicWink/proxpi

Built Distribution

proxpi-1.3.0rc0-py3-none-any.whl (19.1 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: 3cfa9360442da44d74bb1d111eb0a840ec93bcc0239cb3a318ab81851ab90475
  • MD5: cd6fea2468f40415dfc2794a8d029104
  • BLAKE2b-256: 89413a745c715cfa451ecae4827f71e3a6db7596b90f3a2ef83342c6c318bab8
  • Provenance: attestation bundle published by publish-python-package.yml on EpicWink/proxpi