A caching proxy for package downloads

Package-cache is a simple caching proxy for package downloads. Just configure a list of upstream sources (with the --source option) and point clients at the package-cache server. The first time a package is requested, we download that package from one of the sources and cache it locally, while also streaming it to the client. Future requests for that package are streamed directly from the local cache. This helps reduce the load on the network and source servers, if you have a number of local clients that will repeatedly request the same files (e.g. Gentoo’s distfiles).
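The request flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not package-cache's actual code: the function name `serve`, the `fetch` callable, and the temporary-file handling are all assumptions made for the example.

```python
import os
import tempfile

CHUNK = 64 * 1024  # read size for cache hits (arbitrary choice)


def serve(name, cache_dir, fetch):
    """Yield the bytes of `name`, filling the cache on a miss.

    `fetch` is a hypothetical callable that takes a file name and
    returns an iterable of byte chunks from an upstream source.
    """
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: stream straight from the local copy.
        with open(path, 'rb') as f:
            yield from iter(lambda: f.read(CHUNK), b'')
        return
    # Cache miss: write to a temporary file while streaming to the
    # client, then rename it into place, so that an interrupted
    # download is never mistaken for a complete cached file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, 'wb') as f:
            for chunk in fetch(name):
                f.write(chunk)
                yield chunk
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # discard the partial download
        raise
```

A real server would also need to validate `name` (for example, rejecting path separators) and cope with two clients requesting the same uncached file at once; the sketch ignores both.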

We don’t do anything fancy with Cache-Control headers, since package source files should include the version stamp in the filename itself (e.g. my-package-0.1.2.tar.gz). Files are cached after the first request, and stored forever. This means that every package you’ve ever requested will still be there if you need it later. That’s nice, but it will end up consuming a fair amount of disk space. You might want to periodically cull the cache, using access times to see which files you are unlikely to want in the future.
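One way to approach that periodic cull is a small script that lists cached files whose access time is older than some threshold. The following is a minimal sketch, not something shipped with package-cache; the function name `stale_files` and the threshold are assumptions for illustration.

```python
import os
import time


def stale_files(cache_dir, max_age_days, now=None):
    """Return paths under cache_dir not accessed in max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_atime is the last access time recorded by the filesystem.
            if os.stat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

You could review the returned list and then pass each path to `os.remove`. This only works if the filesystem actually records access times; on a filesystem mounted with `noatime`, cached files that are still in regular use would eventually look stale.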

Package-cache is written in Python, and has no dependencies outside the standard library.

Running package-cache

By default, we’ll use Python’s reference WSGI implementation to run our application:

$ package-cache --source http://distfiles.gentoo.org/

For other command-line options, see:

$ package-cache --help

If you need a more performant backend, you might try Gunicorn.

Gentoo

There’s an OpenRC init script in contrib/openrc/init.d, and a package-cache ebuild in my wtk overlay. To use package-cache as a caching proxy for your distfiles downloads, add the wtk overlay to layman and run:

# emerge --ask --verbose net-proxy/package-cache
# rc-update add package-cache default
# rc-service package-cache start

You can tweak the parameters by setting variables in /etc/conf.d/package-cache (PC_USER, PC_GROUP, CACHE_DIR, HOST, PORT, SOURCES, and PC_OPTS). See the init script for default values.

Once you’ve set up your package-cache service, just point your local package manager to the cache server instead of the usual mirror. For Portage, that’s going to be something like:

GENTOO_MIRRORS="http://my-package-cache-server.net:4000/"

in your /etc/portage/make.conf.

If you don’t want to tweak your clients (perhaps there are many of them, or they are out of your direct control), you can add some firewall rules to your router to transparently proxy specific Gentoo mirrors. With eth1 as the internal interface and the proxy running on the internal address 192.168.0.11, that looks something like:

# CACHE_IP=192.168.0.11
# for SOURCE_IP in $(dig +short distfiles.gentoo.org);
> do
>   iptables --table nat --append PREROUTING --protocol tcp \
>     --in-interface eth1 ! --source "${CACHE_IP}" \
>     --destination "${SOURCE_IP}" \
>     --match tcp --destination-port 80 \
>     --jump DNAT --to-destination "${CACHE_IP}:4000" ;
> done

To remove those entries later, repeat the command with --delete instead of --append. You may need to list the SOURCE_IP values explicitly if the DNS entries have changed. Run:

# iptables --table nat --list PREROUTING --numeric

to list the entries. See iptables(8) and iptables-extensions(8) for more details.
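If the DNS entries have changed and you need to spell out the old addresses by hand, a short script can generate the matching --delete commands. This is a sketch under assumptions: the helper name `rule` is hypothetical, the defaults mirror the example above, and the addresses in the loop are documentation-range placeholders rather than real mirror IPs.

```python
import shlex


def rule(action, source_ip, cache_ip='192.168.0.11',
         interface='eth1', port=4000):
    """Build the argv for one DNAT rule; action is 'append' or 'delete'."""
    if action not in ('append', 'delete'):
        raise ValueError(action)
    return [
        'iptables', '--table', 'nat', '--' + action, 'PREROUTING',
        '--protocol', 'tcp',
        '--in-interface', interface, '!', '--source', cache_ip,
        '--destination', source_ip,
        '--match', 'tcp', '--destination-port', '80',
        '--jump', 'DNAT',
        '--to-destination', '{}:{}'.format(cache_ip, port),
    ]


# Placeholder addresses; substitute the destinations shown by
# "iptables --table nat --list PREROUTING --numeric".
for ip in ['203.0.113.10', '203.0.113.11']:
    print(shlex.join(rule('delete', ip)))
```

The script only prints the commands, so you can check them against the listed rules before running them as root.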
