Skip to main content

Generate conda environment files from Python source code

Project description

Purpose

The goal of conda_deps is to generate a conda environment file as a result of the dependencies found in a repository. At the moment, it only translates Python and R dependencies but it would be great to have it working for other programming languages as well.

conda_deps translates import statements in Python source code like:

import numpy
import scipy

into a conda environment yaml file:

name: testenv

channels:
- conda-forge
- bioconda
- defaults

dependencies:
- python
- numpy
- scipy

For R it translates library imports like:

library(reshape2)
library(ggplot2)

into:

name: testenv

channels:
- conda-forge
- bioconda
- defaults

dependencies:
- r-base
- r-reshape2    
- r-ggplot2

Installation

conda_deps only works in Python 3 and will only scan properly Python 3 source code. There should be no restriction in the case of R.

conda_deps has been uploaded to conda-forge so you can install it with:

# if you don't have conda available:
curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p conda-install
source conda-install/etc/profile.d/conda.sh 
conda update --all --yes

# once conda is available:
conda create --name conda_deps --channel conda-forge conda_deps
conda activate conda_deps
conda_deps --help

Usage

This is how you scan a single Python or R file:

conda_deps </path/to/filename>

The script can also scan folders:

conda_deps </path/to/folder/>

In case you want to exclude one or more subfolders, use the --exclude-folder option one or more times:

conda_deps --exclude-folder </path/to/folder/folder1> </path/to/folder>

You may also want to scan additonal files of folders:

python conda_deps.py </path/to/folder> --include-files my-script.py --include-files </another/folder>

How it works

Python source code

The script uses Python's Abstract Syntax Trees to parse files ending in .py. It looks for import <module> statements, and discards the modules belonging to the Python Standard Library (e.g. import os). It assumes that <module> has a corresponding conda package with the same name (e.g. import numpy corresponds to conda install numpy). However, that is not always the case and you can provide a proper translation between the module name and its corresponding conda package (e.g. import yaml will require conda install pyyaml) via the python_deps.json file, which will be loaded into a dictionary at the beginning of the script. It looks like this:

{
    "Bio":"biopython",
    "Cython":"cython",
    "bs4":"beautifulsoup4",
    "bx":"bx-python",
    "lzo":"python-lzo",
    "pyBigWig":"pybigwig",
    "sklearn":"scikit-learn",
    "web":"web.py",
    "weblogolib":"python-weblogo",
    "yaml":"pyyaml"
}    

The dictionary key is the name in import <module> and the value is the name of the conda package.

The python_deps.json file is meant to be useful for generic use. However, it is possible to include additional json files specific to your project:

conda_deps --include-py-json my_project.json </path/to/project/>

The translations in my_project.json will take priority over those in python_deps.json.

If you find that there are missing translations in the general purpose python_deps.json file, please feel free to open a pull request to add more.

R source code

In the case of R files, it uses grep to look for library(name) regular expressions in files ending in .R. The same way we use a json file to detail translations for Python, we use the r_deps.json file which will be loaded into a dictionary at the beginning of the script. Here is how it looks like:

{
    "dplyr":"r-dplyr",
    "edgeR":"bioconductor-edger",
    "flashClust":"r-flashclust",
    "gcrma":"bioconductor-gcrma",
    "ggplot2":"r-ggplot2",
    "gplots":"r-gplots",
    "gridExtra":"r-gridextra",
    "grid":"r-gridbase",
    "gtools":"r-gtools",
    "hpar":"bioconductor-hpar",
    "knitr":"r-knitr",
    "limma":"bioconductor-limma",
    "maSigPro":"bioconductor-masigpro",
}

In this case the dictionary key is the name in library(name) and the value is the name of the conda package.

If you are missing a translation in r_deps.json you can either open a pull request to add it or include it in your own json file:

conda_deps --include-r-json my_project.json </path/to/project/>

Please note that the translations in my_project.json will take priority over those in r_deps.json.

Related tools

  • snakefood: a more comprehensive tool but it works only with Python 2.
  • pipreqs: does a similar job but for requirements.txt files and pip.

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

conda_deps-0.0.4.tar.gz (7.8 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page