strip out unnecessary pip packages from requirements
TLDR: requirements files without the unnecessary packages.
For the purpose of this exercise, an unnecessary pip package is any package that is not imported by YOUR own Python code, unless you've provided a configuration override stating that you want it.
Simplest example: initialize, scan and build in current directory
```shell
$ pipstripper --init --build
pip-stripper configuration generated @ sample/tst.seedworkdir01/pip-stripper.yaml
build phase - generating requirements at:
sample/tst.seedworkdir01/requirements.dev.txt
sample/tst.seedworkdir01/requirements.txt

$ cat requirements.txt
Django==2.2
Jinja2==2.10.1
celery==4.3.0
cx-Oracle==6.4.1
psycopg2==2.8.2
python-dateutil==2.8.0

$ cat requirements.dev.txt
pyquery==1.4.0
```
How it works
Lots of your packages may not be needed. Let's say you've installed black. A linter/autoformatter has no need to be on a server: pip-stripper most likely won't find import black anywhere in your code, so it will not put it in your requirements files.
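The core idea can be sketched in a few lines of Python. This is a simplified illustration, not pip-stripper's actual implementation: a package only counts as needed if its import name shows up somewhere in your sources.

```python
import re

# Match the first module name after "import" or "from" at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)", re.M)

def imported_names(source_text):
    """Collect top-level module names imported by a chunk of Python source."""
    return set(IMPORT_RE.findall(source_text))

src = "import os\nfrom pyquery import PyQuery\n"
print(sorted(imported_names(src)))  # ['os', 'pyquery']
```

If `black` never appears in the result of scanning your files, there is no reason to ship it to a server.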
```text
usage: pipstripper [-h] [--config CONFIG] [--noscan] [--build] [--init]
                   [--workdir WORKDIR] [--verbose]

optional arguments:
  -h, --help         show this help message and exit
  --config CONFIG    config file. if not provided will look for
                     pip-stripper.yaml in --workdir, current directory
  --noscan           don't scan to classify packages. build phase will re-use
                     existing pip-stripper.scan.yaml. [False]
  --build            read pip-stripper.scan.yaml to create
                     requirements.prod/dev.txt [False]
  --init             initialize the config file (as pip-stripper.yaml) if it
                     doesn't exist
  --workdir WORKDIR  work directory [defaults to config file's value or
                     current directory]
  --verbose          verbose mode. adds extra zzz_debug: entry to
                     pip-stripper.scan.yaml [False]
```
Initialization (defaults to False)
The first option, --init, will create pip-stripper.yaml, the configuration file for pip-stripper.
This is the only file you should edit manually!
Scan phase (runs unless you specify --noscan)
This will scan the Python source files in --workdir and use what it finds to create pip-stripper.scan.yaml.
This is the file that contains instructions for the build phase.
Don't edit pip-stripper.scan.yaml! If its contents are wrong:
- adjust the configuration in pip-stripper.yaml
- re-run the scan
Scanning also creates two work files, including tmp.pip-stripper.imports.rpt, which tracks pip packages and the scanner's best guesses at their Python import names.
Build (defaults to False)
--build takes what it finds in pip-stripper.scan.yaml and uses it to populate the requirements files.
If those requirements files don't suit you, you may need to edit pip-stripper.yaml.
This allows you to:
- specify which packages are workstation-level only and shouldn't go into any requirements file
- hardcode packages that need to go into a given requirements file
- associate your source directories with a given requirements file
You may have to enter pip-to-Python-import alias names manually (alias matching is something that needs work):
```yaml
hardcoded_aliases:
  PyYAML: yaml
```
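To see why that alias table exists, here is an illustrative sketch (a hypothetical helper, not pip-stripper's real matcher): naive name normalization often works, but fails whenever the pip name and the import name genuinely differ.

```python
# Configured aliases take priority over any guessing.
hardcoded_aliases = {"PyYAML": "yaml"}

def import_name(pip_name):
    """Guess a package's import name; configured aliases win."""
    if pip_name in hardcoded_aliases:
        return hardcoded_aliases[pip_name]
    # naive fallback: lowercase and swap dashes for underscores
    return pip_name.lower().replace("-", "_")

print(import_name("Django"))  # django -- the naive guess is fine here
print(import_name("PyYAML"))  # yaml   -- only correct thanks to the alias
```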
Hardcoding package to requirement mapping:
psycopg2 is typically never imported directly in a Django or SQLAlchemy context, but rather derived from the configuration, so you need to specify it yourself as below. Same thing with django-redis-cache, which is configured in Django's settings.py as a package path rather than an import.
```yaml
ClassifierPip:
  # the following are used to "hardcode" package names to given buckets
  buckets:
    prod:
      - psycopg2
      - django-redis-cache
    tests:
      - nose
      - pytest
    workstation:
      # that's a workstation-only package, so it's held back
      - black
```
Associating Python directories to requirements:
This is a typical regex-based configuration telling pip-stripper which buckets the directories count as. prod is the default outcome; first match wins, and here tests is the only pattern needed:
```yaml
ClassifierImport:
  regex_dirs:
    workstation:
    dev:
    tests:
      - "/tests/"
    prod:
  default_bucket: "prod"
```
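The first-match-wins logic can be sketched roughly like this (a hypothetical helper, assuming re.search semantics for the patterns; not the tool's internals):

```python
import re

# Mirrors the ClassifierImport:regex_dirs configuration above.
regex_dirs = {"workstation": [], "dev": [], "tests": ["/tests/"], "prod": []}
default_bucket = "prod"

def bucket_for_path(path):
    """Return the first bucket whose regex matches the file path."""
    for bucket, patterns in regex_dirs.items():
        if any(re.search(p, path) for p in patterns):
            return bucket
    return default_bucket  # prod, when nothing matched

print(bucket_for_path("./tests/helper_pyquery.py"))  # tests
print(bucket_for_path("./myserver/foobar.py"))       # prod
```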
case 1: hardcoding
Take a pip freeze line like psycopg2==2.7.7.
First, pip-stripper looks for a matching entry in ClassifierPip:buckets in pip-stripper.yaml (basically a hardcoded decision by the user about where to put it).
```yaml
ClassifierPip:
  buckets:
    workstation:
      - black
    prod:
      - psycopg2
```
This will result in
psycopg2==2.7.7 going into requirements.prod.txt (when needed, requirements lines are always copied from the
pip freeze output).
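Case 1 boils down to a dictionary lookup before any scanning happens. A rough sketch (hypothetical helper, not the tool's internals):

```python
# Mirrors the ClassifierPip:buckets configuration above.
buckets = {"workstation": ["black"], "prod": ["psycopg2"]}

def hardcoded_bucket(freeze_line):
    """Return (bucket, freeze_line) if the package was hardcoded, else None."""
    name = freeze_line.split("==")[0]
    for bucket, packages in buckets.items():
        if name in packages:
            return bucket, freeze_line
    return None

print(hardcoded_bucket("psycopg2==2.7.7"))   # ('prod', 'psycopg2==2.7.7')
print(hardcoded_bucket("requests==2.21.0"))  # None -- falls through to scanning
```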
case 2: import classification
grep-ing the Python code found this line:
```text
./tests/helper_pyquery.py:57: from pyquery import PyQuery
```
Each file path is run against the regexes specified by ClassifierImport:regex_dirs, so pyquery ends up in the tests bucket:
```yaml
ClassifierImport:
  regex_dirs:
    tests:
      - "/tests/"
    prod:
  default_bucket: "prod"
```
The --build pass looks at where buckets get mapped:
```yaml
Builder:
  req_mapper:
    dev:
      buckets:
        - dev
        - tests
    prod:
      buckets:
        - prod
```
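That mapping step can be sketched as follows (illustrative only; the version strings are taken from the earlier example output):

```python
# Mirrors Builder:req_mapper: each requirements file collects the
# freeze lines from the buckets mapped to it.
req_mapper = {"dev": ["dev", "tests"], "prod": ["prod"]}
pips_by_bucket = {"dev": [], "tests": ["pyquery==1.4.0"], "prod": ["psycopg2==2.8.2"]}

def build_requirements():
    """Return {requirements-file name: sorted freeze lines}."""
    out = {}
    for req, wanted in req_mapper.items():
        lines = []
        for bucket in wanted:
            lines.extend(pips_by_bucket.get(bucket, []))
        out[req] = sorted(lines)
    return out

print(build_requirements())
# {'dev': ['pyquery==1.4.0'], 'prod': ['psycopg2==2.8.2']}
```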
case 3: multiple imports
```text
./tests/helper_pyquery.py:57: from pyquery import PyQuery
./myserver/foobar.py:22: from pyquery import PyQuery
```
pyquery is put in both the tests and prod buckets: the tests path matched the tests regex, while myserver did not match any ClassifierImport:regex_dirs entry, meaning that default_bucket: "prod" was used.
Enter bucket precedence:
```yaml
ClassifierPip:
  bucket_precedence:
    - prod
    - tests
    - dev
    - workstation
```
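Precedence resolution is then just a first-hit walk down that list (a minimal sketch, not pip-stripper's actual code):

```python
# Mirrors ClassifierPip:bucket_precedence above.
bucket_precedence = ["prod", "tests", "dev", "workstation"]

def winning_bucket(found_in):
    """Pick the highest-precedence bucket among those a package matched."""
    for bucket in bucket_precedence:
        if bucket in found_in:
            return bucket
    return None

print(winning_bucket({"tests", "prod"}))  # prod
```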
With prod ahead of tests, pyquery ends up only in the prod bucket.
case 4: no import match was found and nothing was hardcoded
Babel will get left out of the requirements files. Which is not to say that it won't end up pip installed on your server if it is a dependency of some other package (pipdeptree can help you there):
```text
Babel==2.6.0
  - pytz [required: >=0a, installed: 2018.9]
```
Build phase - the result of the scan phase gets put into pip-stripper.scan.yaml:
Notice our friend
black? We've explicitly classified it as workstation, so the scan didn't label it as unknown.
```yaml
pips:
  buckets:
    dev:
      ....
    tests:
      - pyquery
    prod:
      - psycopg2
    unknown:
      - Babel
    workstation:
      - black
```
Look at the end for the
warnings: section. In this case,
repr was used with Python 2.7 but isn't necessary with Python 3, so I won't worry about it. A typical reason for a missing import is that automatic aliasing to link the pip name and import name didn't work.
```yaml
warnings:
  - missing import:repr
```
And that's it. The outcome?
my raw pip freeze weighs in at 158 packages:
```shell
$ wc -l requirements.freeze_raw.txt
158 requirements.freeze_raw.txt
```
my stripped down requirements ended up with 24 packages total:
```shell
$ wc -l requirements*txt | egrep 'prod|dev'
 6 requirements.dev.txt
18 requirements.prod.txt
```
On my test environment,
pip install -r requirements.prod.txt -r requirements.dev.txt got me 48 packages, after dependencies were pulled in.
```shell
$ pip freeze | wc -l
48
```
This is all very nice, but hopefully you have sufficient tests to allow you to be confident you didn't miss anything!
The way you can test is to create a new virtualenv, pip install both requirements files and then run your tests.
WARNING: be cautious in hardcoding package associations in dev/tests buckets rather than prod. Your tests could run to success, but production would still fail.
These defaults are pretty conservative, as nose and pytest are really only testing tools:
```yaml
ClassifierPip:
  buckets:
    tests:
      - nose
      - nose2
      - pytest
```