
cinful: A fully automated pipeline to identify microcins with associated immunity proteins and export machinery

Project description

cinful

A fully automated pipeline to identify microcins along with their associated immunity proteins and export machinery

Installation

First, make sure to clone this repository:

git clone https://github.com/wilkelab/cinful.git

All software dependencies needed to run cinful are available through conda and are specified in cinful_conda.yml. The helper script scripts/build_conda_env.sh generates the cinful conda environment. To run this script, you will need to have conda installed, as well as mamba (which helps speed up installation). To install mamba, use the following command:

conda install mamba -c conda-forge
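Before building the environment, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch of such a check; `missing_tools` is a hypothetical helper written for this illustration and is not part of cinful.

```python
import shutil


def missing_tools(tools=("conda", "mamba")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("conda and mamba are both available")
```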

To build the environment, run

bash scripts/build_conda_env.sh

Once setup is complete, you can activate the environment with

conda activate cinful

How to use

cinful takes a directory containing genome assemblies as input. All assembly file names in the directory must end in .fna; cinful ignores files with any other extension.
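To see in advance which files would count as input under this rule, you could list them yourself. The sketch below only mirrors the documented .fna rule. It is not cinful's own file discovery code, and `find_assemblies` and `ignored_files` are hypothetical helper names.

```python
from pathlib import Path


def find_assemblies(directory):
    """Return sorted paths of files ending in .fna, searching nested directories too."""
    return sorted(p for p in Path(directory).rglob("*.fna") if p.is_file())


def ignored_files(directory):
    """Return sorted paths of files that do not end in .fna."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix != ".fna"
    )


if __name__ == "__main__":
    # Path to the test dataset mentioned in this README; adjust as needed.
    for path in find_assemblies("test/colcinV_Ecoli"):
        print(path)
```

Note that a compressed file such as genome.fna.gz has the suffix .gz, so this sketch reports it as ignored.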

Snakemake is the core workflow manager used by cinful. The main snakefile is located at cinful/Snakefile and calls subroutines located in cinful/rules.

If installed properly, running python cinful.py -h will produce the following output.

cinful

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Must be a directory containing uncompressed FASTA formatted genome assemblies with
                        .fna extension. Files within nested directories are fine
  -o OUTDIR, --outDir OUTDIR
                        This directory will contain all output files. It will be nested under the input
                        directory.
  -t THREADS, --threads THREADS
                        This specifies how many threads to allow snakemake to have access to for
                        parallelization

Example usage

A test dataset containing an E. coli genome assembly is provided under test/colcinV_Ecoli. You can run cinful on this dataset by running the following from the initial cinful directory:

python cinful/cinful.py -d test/colcinV_Ecoli -o <output_directory> -t <threads>
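If you want to launch the same command from a Python script, one possible sketch is shown below. It uses only the -d, -o, and -t options listed in the help output. The function name `build_cinful_command` and the output name `cinful_out` are assumptions made for illustration, and the script is assumed to run from the initial cinful directory with the cinful conda environment active.

```python
import subprocess
import sys


def build_cinful_command(directory, outdir, threads=1, script="cinful/cinful.py"):
    """Build the argument list for a cinful run using the documented options."""
    if int(threads) < 1:
        raise ValueError("threads must be at least 1")
    return [
        sys.executable, script,
        "-d", str(directory),
        "-o", str(outdir),
        "-t", str(int(threads)),
    ]


if __name__ == "__main__":
    # "cinful_out" is a hypothetical output directory name.
    cmd = build_cinful_command("test/colcinV_Ecoli", "cinful_out", threads=4)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```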

Workflow

The following workflow will be executed.

[Image: cinful workflow diagram]

Four output directories will be generated in your assembly directory, under a directory called cinfulOut.

  • 00_dbs
    • This is the initial location of the databases of verified microcins, CvaB, and immunity proteins.
  • 01_orf_homology
    • Prodigal will generate Open Reading Frame (ORF) predictions for the input assemblies
    • Those ORFs will be searched against the previously mentioned databases
  • 02_homology_results
    • The results from all the homology searches will be merged here
  • 03_best_hits
    • The top hits from the homology results will be placed here
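After a run, a quick way to confirm that every stage produced its directory is to check for the names listed above. This is a minimal sketch that assumes the output root is the directory containing those subdirectories; `missing_output_dirs` is a hypothetical helper, not part of cinful.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_SUBDIRS = (
    "00_dbs",
    "01_orf_homology",
    "02_homology_results",
    "03_best_hits",
)


def missing_output_dirs(output_root):
    """Return the expected output subdirectories that are absent under `output_root`."""
    root = Path(output_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed location of the output; adjust to your own run.
    missing = missing_output_dirs("test/colcinV_Ecoli/cinfulOut")
    print("Missing:", missing if missing else "none")
```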

Contributing

cinful currently exists as a wrapper around a series of snakemake subroutines, so adding functionality to it is as simple as adding subroutines. If you see a need for a new subroutine, feel free to raise an issue, and I will be glad to guide you through the process of making a pull request to add that feature.

Additionally, since cinful primarily works through snakemake, it can also be used by running the snakefiles separately. If additional configuration is needed, for example in the types of input files, it can probably be achieved that way.

Download files


Source Distribution

cinful-1.1.7.tar.gz (1.6 MB)

Uploaded Source

Built Distribution

cinful-1.1.7-py3-none-any.whl (1.6 MB)

Uploaded Python 3

File details

Details for the file cinful-1.1.7.tar.gz.

File metadata

  • Download URL: cinful-1.1.7.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cinful-1.1.7.tar.gz
Algorithm    Hash digest
SHA256       efc5f1c79a25c520e2dd8a7038a2784fb512ff7027f4efe806c2e8d4b34d96b8
MD5          7f10f04b290ed5bb2f39204bc1360e7c
BLAKE2b-256  e20aacc1796bdee9916d285775dfdbc7a613ff8d10ffa5126fae1149391e681e
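A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; the helper names and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed above for cinful-1.1.7.tar.gz.
EXPECTED_SHA256 = "efc5f1c79a25c520e2dd8a7038a2784fb512ff7027f4efe806c2e8d4b34d96b8"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    print(matches("cinful-1.1.7.tar.gz"))
```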


File details

Details for the file cinful-1.1.7-py3-none-any.whl.

File metadata

  • Download URL: cinful-1.1.7-py3-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cinful-1.1.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       a02a8a88155843d20d57b705f0d2ccaf9343d25c801413f3d2f8292e814087dc
MD5          bdebcf087cf08dd64b6ffd4bd9ad01ab
BLAKE2b-256  a147f2f6e28832fe9eaa0c30f10db7a2472bd90b669ebfb1647bb1bef331cbf8

