PyPIContents is an application that generates a Module Index from the Python Package Index (PyPI) and also from various versions of the Python Standard Library.
PyPIContents generates a configurable index in JSON format that serves as a database for applications like pipsalabim. It can be configured to process only a range of packages (by initial letter) and to stop when memory, time or log size limits are exceeded. It aims to do for the Python Package Index what the Contents file does for a Debian-based package repository.
This repository stores the application in the master branch. It also stores a Module Index in the contents branch, which is updated daily by a Travis CI cron job. Read below to learn how to use each one.
For more information, please read the full documentation.
Getting started
Installation
The pypicontents program is written in Python and hosted on PyPI. Therefore, you can use pip to install the stable version:
$ pip install --upgrade pypicontents
If you want to install the development version (not recommended), you can install it directly from GitHub like this:
$ pip install --upgrade https://github.com/CollageLabs/pypicontents/archive/master.tar.gz
Using the application
PyPIContents is divided into several commands.
pypicontents pypi
This command generates a JSON module index with information from PyPI. Read below for more information on how to use it:
$ pypicontents pypi --help
usage: pypicontents pypi [options]
General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.
Pypi Options:
  -l <level>, --loglevel <level>
                        Logger verbosity level (default: INFO). Must be one
                        of: DEBUG, INFO, WARNING, ERROR or CRITICAL.
  -f <path>, --logfile <path>
                        A path pointing to a file to be used to store logs.
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -R <letter/number>, --letter-range <letter/number>
                        An expression representing an alphanumeric range to
                        be used to filter packages from PyPI (default: 0-z).
                        You can use a single alphanumeric character like "0"
                        to process only packages beginning with "0". You can
                        use commas to define a list or dashes to define an
                        interval.
  -L <size>, --limit-log-size <size>
                        Stop processing if log size exceeds <size>
                        (default: 3M).
  -M <size>, --limit-mem <size>
                        Stop processing if process memory exceeds <size>
                        (default: 2G).
  -T <sec>, --limit-time <sec>
                        Stop processing if process time exceeds <sec>
                        (default: 2100).
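The -R and size options above accept small expression languages. The sketch below shows one way such values could be interpreted; it is not taken from the pypicontents source. The character ordering (digits before letters), the binary size multiples and the names `parse_letter_range`, `package_in_range` and `parse_size` are all assumptions made for illustration.

```python
import string

# Assumed ordering for intervals: digits first, then lowercase letters,
# so that the default expression "0-z" covers every alphanumeric character.
ALPHABET = string.digits + string.ascii_lowercase

# Assumed binary multiples for size suffixes such as "3M" or "2G".
SIZE_UNITS = {'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}


def _position(char):
    """Return the index of a single alphanumeric character in ALPHABET."""
    if len(char) != 1 or char not in ALPHABET:
        raise ValueError('not a single alphanumeric character: %r' % char)
    return ALPHABET.index(char)


def parse_letter_range(expr):
    """Expand expressions like '0', 'a,b,c' or 'a-f,x' into a set of characters."""
    selected = set()
    for part in expr.lower().split(','):
        part = part.strip()
        if '-' in part:
            # Interval: include every character from start to end, inclusive.
            start, end = part.split('-', 1)
            first, last = _position(start), _position(end)
            if first > last:
                raise ValueError('empty interval: %r' % part)
            selected.update(ALPHABET[first:last + 1])
        else:
            # Single character (validated by _position).
            selected.add(ALPHABET[_position(part)])
    return selected


def package_in_range(name, selected):
    """True if the package name starts with one of the selected characters."""
    return bool(name) and name[0].lower() in selected


def parse_size(size):
    """Convert a size such as '3M' or '2G' into a number of bytes."""
    size = size.strip().upper()
    if size and size[-1] in SIZE_UNITS:
        return int(size[:-1]) * SIZE_UNITS[size[-1]]
    return int(size)  # no suffix: plain bytes
```

Under these assumptions, `parse_letter_range('a-c,x')` yields `{'a', 'b', 'c', 'x'}` and `parse_size('3M')` yields 3145728 bytes.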
pypicontents stdlib
This command generates a JSON Module Index from the Python Standard Library. Read below for more information on how to use it:
$ pypicontents stdlib --help
usage: pypicontents stdlib [options]
General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.
Stdlib Options:
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the JSON Module Index (required).
  -p <version>, --pyver <version>
                        Python version to be used for the Standard Library
                        (default: 2.7).
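As a rough illustration of what a standard-library module index contains, the following sketch lists the top-level modules of the interpreter that runs it, using `sysconfig`, `pkgutil.iter_modules` and `sys.builtin_module_names`. It is an approximation under stated assumptions, not how the stdlib command works: it cannot target another version the way `--pyver` does, and the `lib-dynload` location of extension modules is a POSIX convention. The function name `stdlib_modules` is hypothetical. On Python 3.10 and later, `sys.stdlib_module_names` offers an official list.

```python
import os
import pkgutil
import sys
import sysconfig


def stdlib_modules():
    """Return the sorted top-level module names of the running interpreter's stdlib."""
    stdlib_dir = sysconfig.get_paths()['stdlib']
    # Extension modules (e.g. _json) usually live in lib-dynload on POSIX builds.
    search_path = [stdlib_dir, os.path.join(stdlib_dir, 'lib-dynload')]
    # Modules compiled into the interpreter have no file on disk.
    names = set(sys.builtin_module_names)
    # iter_modules yields .py modules, extension modules and packages
    # (directories with __init__.py), so site-packages itself is skipped.
    for _, name, _ in pkgutil.iter_modules([p for p in search_path if os.path.isdir(p)]):
        names.add(name)
    return sorted(names)
```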
pypicontents stats
This command gathers statistics from the logs generated by the pypi command. Read below for more information on how to use it:
$ pypicontents stats --help
usage: pypicontents stats [options]
General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.
Stats Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the statistics (required).
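The statistics format produced by the stats command is not described here, so the sketch below is purely hypothetical: it assumes the input files are module indexes shaped like pypi.json (described under About the Module Index) and counts how many packages have module or command-line data. The function names `summarize` and `summarize_directory` and the output keys are invented for this example.

```python
import glob
import json
import os


def summarize(index):
    """Count packages in one module index and how many have modules/cmdline data."""
    stats = {'packages': 0, 'with_modules': 0, 'with_cmdline': 0}
    for data in index.values():
        stats['packages'] += 1
        if data.get('modules'):
            stats['with_modules'] += 1
        if data.get('cmdline'):
            stats['with_cmdline'] += 1
    # Packages with partial or absent data (see Known Issues).
    stats['without_modules'] = stats['packages'] - stats['with_modules']
    return stats


def summarize_directory(inputdir):
    """Aggregate summarize() over every *.json file in inputdir."""
    totals = {}
    for path in sorted(glob.glob(os.path.join(inputdir, '*.json'))):
        with open(path) as f:
            for key, value in summarize(json.load(f)).items():
                totals[key] = totals.get(key, 0) + value
    return totals
```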
pypicontents errors
This command summarizes errors found in the logs generated by the pypi command. Read below for more information on how to use it:
$ pypicontents errors --help
usage: pypicontents errors [options]
General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.
Errors Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by the pypi command (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the errors (required).
pypicontents merge
This command searches for JSON files generated by the pypi or stdlib commands and combines them into one. Read below for more information on how to use it:
$ pypicontents merge --help
usage: pypicontents merge [options]
General Options:
  -V, --version         Print version and exit.
  -h, --help            Show this help message and exit.
Merge Options:
  -i <path>, --inputdir <path>
                        A path pointing to a directory containing JSON files
                        generated by pypi or stdlib commands (required).
  -o <path>, --outputfile <path>
                        A path pointing to a file that will be used to store
                        the merged JSON files (required).
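A minimal sketch of the merge step, assuming each input file is a JSON object keyed by package name like pypi.json. How pypicontents resolves a package that appears in more than one file is not documented here; in this hypothetical `merge_indexes` the last file read wins, and `merge_directory` is likewise an illustrative name.

```python
import glob
import json
import os


def merge_indexes(indexes):
    """Combine module-index dicts; a package seen twice keeps its last entry."""
    merged = {}
    for index in indexes:
        merged.update(index)
    return merged


def merge_directory(inputdir, outputfile):
    """Merge every *.json file in inputdir and write the result to outputfile."""
    paths = sorted(glob.glob(os.path.join(inputdir, '*.json')))
    # Skip the output file itself in case it lives inside inputdir.
    paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(outputfile)]
    indexes = []
    for path in paths:
        with open(path) as f:
            indexes.append(json.load(f))
    merged = merge_indexes(indexes)
    with open(outputfile, 'w') as f:
        json.dump(merged, f, indent=2, sort_keys=True)
    return merged
```

Sorting the paths keeps the "last one wins" rule deterministic across runs.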
About the Module Index
In the pypi.json file (located in the contents branch) you will find a dictionary with all the packages registered at the main PyPI instance, each one with the following information:
{
    "pkg_a": {
        "version": [
            "X.Y.Z"
        ],
        "modules": [
            "module_1",
            "module_2",
            "..."
        ],
        "cmdline": [
            "path_1",
            "path_2",
            "..."
        ]
    },
    "pkg_b": {
        "...": "..."
    },
    "...": {},
    "...": {}
}
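Because pypi.json is keyed by package, answering the Debian-style question "which package ships this module?" means inverting it. The sketch below, with a made-up `SAMPLE_INDEX` and a hypothetical `build_module_map` helper, builds that reverse mapping once so that repeated lookups avoid scanning every package.

```python
from collections import defaultdict

# A tiny, made-up index shaped like the pypi.json structure above.
SAMPLE_INDEX = {
    'pkg_a': {'version': ['1.0.0'], 'modules': ['module_1', 'module_2'], 'cmdline': []},
    'pkg_b': {'version': ['0.3.1'], 'modules': ['module_2'], 'cmdline': ['path_1']},
}


def build_module_map(contents):
    """Invert the index so each module name maps to the set of packages providing it."""
    module_map = defaultdict(set)
    for pkg, data in contents.items():
        # Entries with partial data may lack a 'modules' list.
        for module in data.get('modules', []):
            module_map[module].add(pkg)
    return dict(module_map)


module_map = build_module_map(SAMPLE_INDEX)
print(sorted(module_map['module_2']))  # ['pkg_a', 'pkg_b']
```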
This index is generated on Travis CI by executing the setup.py file of each package through a monkeypatch that captures the parameters passed to setup(). Check out pypicontents/api/process.py for more information.
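The general technique can be sketched as follows. This is not the code in pypicontents/api/process.py, which handles many more cases; it only replaces `setuptools` in `sys.modules` with a fake module whose `setup()` records its keyword arguments. `SAMPLE_SETUP` and `read_setup_args` are hypothetical, `distutils` is not patched, and a setup.py that calls other setuptools helpers such as `find_packages` would fail. The setup.py code still runs, so use this only on code you are prepared to execute.

```python
import sys
import types

# A made-up setup.py used only to demonstrate the technique.
SAMPLE_SETUP = """
from setuptools import setup
setup(name='example', version='0.1', packages=['example'], py_modules=['helper'])
"""


def read_setup_args(setup_source):
    """Execute setup.py source against a fake setuptools and return setup()'s kwargs."""
    captured = {}

    def fake_setup(**kwargs):
        # Record the arguments instead of building or installing anything.
        captured.update(kwargs)

    fake_setuptools = types.ModuleType('setuptools')
    fake_setuptools.setup = fake_setup

    saved = sys.modules.get('setuptools')
    sys.modules['setuptools'] = fake_setuptools  # imports now resolve to the fake
    try:
        code = compile(setup_source, 'setup.py', 'exec')
        exec(code, {'__name__': '__main__', '__file__': 'setup.py'})
    finally:
        # Restore whatever was imported before (or nothing at all).
        if saved is None:
            sys.modules.pop('setuptools', None)
        else:
            sys.modules['setuptools'] = saved
    return captured


args = read_setup_args(SAMPLE_SETUP)
modules = sorted(args.get('packages', []) + args.get('py_modules', []))
print(modules)  # ['example', 'helper']
```

Deriving a modules list from `packages` and `py_modules` is itself an assumption about how an index entry could be filled.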
Use cases
Find out which package (or packages) contain a Python module. This is useful for determining a project’s requirements.txt or install_requires.
import json
from pprint import pprint
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
pypic = 'https://raw.githubusercontent.com/CollageLabs/pypicontents/contents/pypi.json'
f = urlopen(pypic)
pypicontents = json.loads(f.read().decode('utf-8'))
def find_package(contents, module):
    for pkg, data in contents.items():
        # Some entries have partial data, so 'modules' may be missing.
        if module in data.get('modules', []):
            yield {pkg: data['modules']}
# Which package(s) contain the 'django' module?
pprint(list(find_package(pypicontents, 'django')))
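To go from a source file to candidate requirements, the imports can be collected with the standard `ast` module and looked up in the index. In this sketch the helpers `top_level_imports` and `suggest_requirements` and the small `contents` dictionary are illustrative; in practice you would pass the downloaded pypi.json data. Standard-library imports such as `os` simply find no provider in the PyPI index.

```python
import ast


def top_level_imports(source):
    """Collect the top-level names of absolute imports found in Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) refer to the project itself; skip them.
            names.add(node.module.split('.')[0])
    return names


def suggest_requirements(source, contents):
    """Map each imported module to the packages whose index entry lists it."""
    suggestions = {}
    for module in sorted(top_level_imports(source)):
        providers = sorted(pkg for pkg, data in contents.items()
                           if module in data.get('modules', []))
        if providers:
            suggestions[module] = providers
    return suggestions


# Made-up index entries and source file, used only for illustration.
contents = {
    'Django': {'version': ['1.11'], 'modules': ['django'], 'cmdline': []},
    'requests': {'version': ['2.18.4'], 'modules': ['requests'], 'cmdline': []},
}
source = "import os\nimport django.conf\nfrom requests import get\nfrom . import local\n"
print(suggest_requirements(source, contents))  # {'django': ['Django'], 'requests': ['requests']}
```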
Hint: Check out Pip Sala Bim.
Known Issues
Some packages have partial or completely absent data for one of the following reasons:
Some packages import other packages outside the standard library in their setup.py. We try to override these imports, but if setup.py depends heavily on them, processing still fails.
Some packages are broken and error out when executing setup.py.
Some packages are empty or have no releases.
If a package is updated on PyPI and the change introduces or removes modules, the change won’t be reflected until the next index rebuild. Check the version field to confirm that an entry matches the release you are using. If you need a more up-to-date index, feel free to download this software and build your own.
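One way to act on the version check is sketched below. It assumes network access to PyPI's JSON API (`https://pypi.org/pypi/<project>/json`, whose `info.version` field holds the latest release) and relies on the `version` field being a list, as in the structure shown in About the Module Index. Both helper names, `latest_pypi_version` and `is_index_current`, are hypothetical.

```python
import json
from urllib.request import urlopen


def latest_pypi_version(package):
    """Ask PyPI's JSON API for the latest released version (needs network access)."""
    url = 'https://pypi.org/pypi/%s/json' % package
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))['info']['version']


def is_index_current(contents, package, version):
    """True if the index entry for package already records the given version."""
    entry = contents.get(package)
    # A package missing from the index is treated as not current.
    return entry is not None and version in entry.get('version', [])
```

For example, passing the loaded index and `latest_pypi_version('requests')` to `is_index_current` returns False when the index predates the latest release of that package.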
Getting help
If you have any doubts or problems, subscribe to our Gitter Chat and ask for help. You can also ask your question on StackOverflow (tag it pypicontents) or drop me an email at luis@collagelabs.org.
Contributing
See CONTRIBUTING.rst for details.
Release history
See HISTORY.rst for details.
License
Copyright 2016-2017, PyPIContents Developers (read AUTHORS.rst for a full list of copyright holders).
Released under a GPL-3 License (read COPYING.rst for license details).
Made with :heart: and :hamburger:
Web collagelabs.org · GitHub @CollageLabs · Twitter @CollageLabs