
Mirroring tool that implements the client (mirror) side of PEP 381

Project description

This is a PyPI mirror client according to PEP 381.


Installation

The following instructions will place the bandersnatch executable in a virtualenv under bandersnatch/bin/bandersnatch.

pip

This installs the latest stable, released version.

$ virtualenv-2.7 bandersnatch
$ cd bandersnatch
$ bin/pip install -r https://bitbucket.org/pypa/bandersnatch/raw/stable/requirements.txt

zc.buildout

This installs the current development version. Use ‘hg up <version>’ and run buildout again to choose a specific release.

$ hg clone https://bitbucket.org/pypa/bandersnatch
$ cd bandersnatch
$ virtualenv-2.7 .
$ bin/python bootstrap.py
$ bin/buildout

Configuration

  • Run bandersnatch mirror. On its first run it creates a default configuration file for you in /etc/bandersnatch.conf.

  • Review /etc/bandersnatch.conf and adapt to your needs.

  • Run bandersnatch mirror again. It will populate your mirror with the current state of all PyPI packages: roughly 50 GiB at the time of writing.

  • Run bandersnatch mirror regularly to update your mirror with any intermediate changes.
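Before starting the first full run, it can help to sanity-check the values you adapted. The file is INI-style, so Python's standard configparser can read it. This is a minimal sketch: the `[mirror]` section and the `directory`, `master` and `workers` keys are assumptions about the generated file (compare them with what bandersnatch actually wrote), and `read_mirror_settings` is a hypothetical helper, not part of bandersnatch.

```python
import configparser

# Hypothetical excerpt of /etc/bandersnatch.conf; the section and key
# names are assumptions, check them against your generated file.
SAMPLE_CONF = """
[mirror]
directory = /srv/pypi
master = https://pypi.python.org
workers = 3
stop-on-error = false
"""


def read_mirror_settings(text):
    """Return (directory, master, workers) from a bandersnatch-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    mirror = parser["mirror"]
    return (
        mirror.get("directory"),                  # None if the key is missing
        mirror.get("master"),
        mirror.getint("workers", fallback=1),     # assume 1 worker if unset
    )


if __name__ == "__main__":
    directory, master, workers = read_mirror_settings(SAMPLE_CONF)
    print(f"mirroring {master} into {directory} with {workers} workers")
```

To check the real file, pass it the contents of /etc/bandersnatch.conf instead of `SAMPLE_CONF`.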

Webserver

Configure your webserver to serve the web/ sub-directory of the mirror. For nginx it should look something like this:

server {
    listen 127.0.0.1:80;
    server_name <mymirrorname>;
    root <path-to-mirror>/web;
    autoindex on;
    charset utf-8;
}

  • Have your webserver serve the HTML index files with UTF-8 as the charset (for nginx, the charset utf-8 directive above does this). The index pages still work without it, but non-ASCII characters will display incorrectly when humans look at the pages.

  • Make sure that the webserver uses UTF-8 to look up unicode path names. nginx does this by default; other webservers have not been verified and may need explicit configuration.
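Why the lookup encoding matters can be shown with a simplified model: a client requests a non-ASCII name with its UTF-8 bytes percent-encoded, and the server has to decode those bytes back into the name of the directory on disk. This sketch assumes the mirror's path names are stored as UTF-8; "zwéb" is a hypothetical package name, and `request_path_for` and `server_lookup` are illustrative helpers, not webserver or bandersnatch APIs.

```python
from urllib.parse import quote, unquote


def request_path_for(name):
    """Percent-encode a package directory name the way an HTTP client would."""
    # quote() encodes non-ASCII characters as UTF-8 bytes by default.
    return "/simple/" + quote(name) + "/"


def server_lookup(path, encoding):
    """Decode a request path back to a file name using `encoding`."""
    return unquote(path, encoding=encoding)


# "zwéb" is a hypothetical package name with a non-ASCII character.
path = request_path_for("zwéb")
print(path)                            # /simple/zw%C3%A9b/
print(server_lookup(path, "utf-8"))    # /simple/zwéb/  (matches the directory)
print(server_lookup(path, "latin-1"))  # /simple/zwÃ©b/ (no such directory)
```

Decoding with anything other than UTF-8 produces a name that does not exist on disk, so the request fails even though the package was mirrored correctly.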

Cron jobs

You need to set up one cron job to run the mirror itself. If you run a public mirror, then you need a second job that will create access statistics for aggregation on the master PyPI.

Here’s a sample that you could place in /etc/cron.d/bandersnatch:

LC_ALL=en_US.utf8
*/2 * * * * root bandersnatch mirror 2>&1 | logger -t 'bandersnatch[mirror]'
12 * * * * root bandersnatch update-stats 2>&1 | logger -t 'bandersnatch[update-stats]'

This assumes that you have a logger utility installed that will convert the output of the commands to syslog entries.
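To make the two schedules concrete, here is a minimal sketch of how the minute field in the sample is interpreted. `expand_minute_field` is a hypothetical helper written for this explanation; it is neither part of bandersnatch nor a complete cron parser (ranges such as 1-5 and named values are deliberately not handled).

```python
def expand_minute_field(field):
    """Expand a cron minute field ("*", "*/N", "N" or "a,b,...") into minutes.

    Only the forms used in the sample crontab are handled.
    """
    minutes = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif part.startswith("*/"):
            step = int(part[2:])
            minutes.update(range(0, 60, step))  # every step-th minute from 0
        else:
            minutes.add(int(part))
    return sorted(minutes)


# "*/2": the mirror job runs on every even minute, 30 times an hour.
mirror_minutes = expand_minute_field("*/2")
# "12": update-stats runs once an hour, at minute 12.
stats_minutes = expand_minute_field("12")
print(len(mirror_minutes), mirror_minutes[:3], stats_minutes)  # 30 [0, 2, 4] [12]
```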

Maintenance

bandersnatch does not keep much local state in addition to the mirrored data. In general, you can simply rerun bandersnatch mirror and it will fix errors.

If you delete the state files, the next run will check everything against the master PyPI:

  • delete the ./state and ./todo files in your mirror directory, if they exist

  • run bandersnatch mirror to get a full sync

Be aware that a full sync is likely to take hours, depending on PyPI's performance and your network latency and bandwidth.
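The reset steps above can be scripted. This is a minimal sketch that only deletes the two files named in those steps; `force_full_sync` is a hypothetical helper, and it assumes no bandersnatch process is running while it executes.

```python
from pathlib import Path

# File names taken from the maintenance steps; run this only while no
# bandersnatch process is active.
STATE_FILES = ("state", "todo")


def force_full_sync(mirror_dir):
    """Delete bandersnatch's state files so the next run checks everything.

    Returns the names of the files that were actually removed; the mirrored
    data (for example web/) is left untouched.
    """
    removed = []
    for name in STATE_FILES:
        path = Path(mirror_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Usage: call `force_full_sync("<path-to-mirror>")`, then run bandersnatch mirror as usual.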

Operational notes

Case-sensitive filesystem needed

You need to run bandersnatch on a case-sensitive filesystem.

OS X's default filesystem is case-insensitive (though case-preserving), yet bandersnatch works fine when running on OS X. However, tarring a bandersnatch data directory and moving it to a system with a case-sensitive filesystem, e.g. Linux, will lead to inconsistencies. You can fix those by deleting the state files and having bandersnatch run a full check on your data.
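Before moving a data directory, you can check whether the target filesystem distinguishes case. This is a minimal, standard-library sketch of the usual probe technique: create a file whose name contains lower-case letters and see whether the upper-cased name resolves to it. `is_case_sensitive` is a hypothetical helper, not part of bandersnatch.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if the filesystem holding `directory` distinguishes case."""
    # The probe file is deleted automatically when the with-block exits.
    with tempfile.NamedTemporaryFile(prefix="case-probe-", dir=directory) as probe:
        name = os.path.basename(probe.name)
        upper = os.path.join(os.path.dirname(probe.name), name.upper())
        # On a case-insensitive filesystem the upper-cased name finds the probe.
        return not os.path.exists(upper)
```

Usage: `is_case_sensitive("<path-to-mirror>")` returns False on a case-insensitive volume.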

Many sub-directories needed

PyPI has a very large number of packages, and bandersnatch keeps one sub-directory per package in a single flat directory. Filesystems with a low limit on the number of sub-directories per directory can run into a problem like this:

2013-07-09 16:11:33,331 ERROR: Error syncing package: zweb@802449
OSError: [Errno 31] Too many links: '../pypi/web/simple/zweb'

Specifically, we recommend avoiding ext3, which limits a directory to about 32k sub-directories. ext4 and newer do not have this limitation.
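The "Too many links" error comes from the parent directory's link count: each sub-directory's ".." entry adds a link to its parent. A minimal sketch for estimating how close a directory is to the limit, assuming ext3's link limit of 32000 (the "32k" above) and that a directory's own entry and "." account for two links; `count_subdirectories` and `ext3_headroom` are hypothetical helpers.

```python
import os

# Assumed ext3 link limit per inode. A directory's link count is
# 2 (its entry in the parent plus its own ".") + one per sub-directory.
EXT3_LINK_MAX = 32000
EXT3_MAX_SUBDIRS = EXT3_LINK_MAX - 2


def count_subdirectories(path):
    """Count the immediate sub-directories of `path` (symlinks excluded)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def ext3_headroom(path):
    """Return how many more sub-directories ext3 would allow under `path`."""
    return EXT3_MAX_SUBDIRS - count_subdirectories(path)
```

Usage: `ext3_headroom("<path-to-mirror>/web/simple")`; a negative value means the mirror no longer fits on ext3.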

Migrating from pep381client

  • remove old status files, but keep actual data (everything under web/)

  • create config file, port command parameters from old cronjobs

  • update cron jobs
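For the first step, a cautious approach is to list what sits next to web/ before deleting anything. This is a minimal dry-run sketch: it assumes the old mirror's data lives entirely under web/, it does not know the exact names of pep381client's status files, and `migration_candidates` is a hypothetical helper that only reports entries for manual review.

```python
from pathlib import Path


def migration_candidates(mirror_dir, keep=("web",)):
    """List top-level entries of an old mirror directory outside `keep`.

    Nothing is deleted; review the returned names before removing them.
    """
    root = Path(mirror_dir)
    return sorted(p.name for p in root.iterdir() if p.name not in keep)
```

Usage: `migration_candidates("<path-to-mirror>")` returns the names to review; everything under web/ is always kept out of the list.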

Contact

If you have questions or comments, please submit a bug report to http://bitbucket.org/pypa/bandersnatch/issues/new.

Kudos

This client is based on the original pep381client by Martin v. Loewis.

Richard Jones was very patient answering questions at PyCon 2013 and made the protocol more reliable by implementing some PyPI enhancements.

1.6.1 (2014-09-24)

  • Create a new generation to enforce a full sync when upgrading. This is required to get the canonical names for all packages.

1.6 (2014-09-24)

  • Implement canonical package directory names to support an upcoming PIP release and other tools. (Thanks to @dstufft)

  • Fix a race condition where workers could get stuck indefinitely waiting for another item in a depleted queue. (Thanks to hongqn)

1.5 (2014-07-21)

  • Delete broken tests that I forgot to remove.

  • Reduce the officially sanctioned maximum number of connections.

1.4 (2014-04-15)

  • Move towards replacing the XMLRPC API with JSON to make our requests cacheable. This also dramatically reduces the number of requests needed.

  • Remove apache stats script as this information is no longer being used anyway.

1.3 (2014-02-16)

  • Move to xmlrpc2 to get SSL verification on XML-RPC calls, too. (Fixes #40 and big thanks to @ewdurbin)

1.2 (2014-01-08)

  • Potential performance improvement: use requests’ session object to reuse HTTP connections (keep-alive). Thanks to Wouter Bolsterlee for the recommendation in #39.

1.1 (2013-11-26)

  • Made code Python 2.6 compatible. Thanks to @ewdurbin for the pull request.

1.0.5 (2013-07-25)

  • Refactor lock acquisition to avoid shadowing exceptions when creating the lockfile vs. acquiring the lock.

  • Move from distribute back to setuptools.

1.0.4 (2013-07-10)

  • Slight brownbag release: the requirements.txt accidentally included a development version of py.test due to my usage of mr.developer.

1.0.3 (2013-07-08)

  • Fix brownbag release with broken ‘stable’ tag and missing requirements.txt update.

1.0.2 (2013-07-08)

  • Generate the simple index page ourselves: it's not signed anyway, and this lets PyPI cache more aggressively.

  • Add a py.test plugin to actually show a green bar. Hopefully this will be integrated into py.test in the near future.

  • Fix handling of inconsistent todo files: empty files or files with an incorrect header are simply deleted, and processing resumes at the last known good state.

  • Mark up requirement of Python 2.7 (#19)

  • Fix handling of new CDN cache issues. Thanks to @dstufft for making PyPI support mirrors again.

  • Improve test coverage.

1.0.1 (2013-04-18)

  • Fix packaging: include default config file. (Thanks to Jannis Leidel)

1.0 (2013-04-09)

  • Update pip install documentation to use a URL that refers to requirements.txt directly.

  • Adjust buildout and jenkins job to stop fighting over the distribute version to install.

1.0rc6 (2013-04-09)

  • Hopefully fixed updating the stable tag when releasing.

1.0rc5 (2013-04-09)

  • Experiment with zest.releaser integration to automatically generate requirements.txt during release process.

1.0rc4 (2013-04-09)

  • Experiment with zest.releaser integration to automatically generate requirements.txt during release process.

1.0rc3 (2013-04-09)

  • Experiment with zest.releaser integration to automatically generate requirements.txt during release process.

1.0rc2 (2013-04-09)

  • Experiment with zest.releaser integration to automatically generate requirements.txt during release process.

1.0rc1 (2013-04-09)

  • Initial release. Massive rewrite of pep381client.

Download files

Source Distribution

bandersnatch-1.6.1.zip (27.1 kB)
