
cinful: A fully automated pipeline to identify microcins with associated immunity proteins and export machinery


cinful: an in-silico microcin identification pipeline

cinful reads a directory of genome data and identifies class IIb microcins using a combination of HMM and BLAST searches. It also identifies the associated export machinery (MFP and PCAT) and putative immunity proteins. Publication of this work is forthcoming and will be cited here.

cinful is developed by the Wilke lab at the Department of Integrative Biology in collaboration with the Davies lab at the Department of Molecular Biosciences, both at The University of Texas at Austin.

Installation

There are two installation methods: from PyPI with pip (recommended and more user-friendly) and from source.

Installation from PyPI (recommended)

The following includes steps to install dependencies.

Set up a conda environment (includes Python and pip):

$ conda create --name <your-env-name> python=3.8.13 pip
$ conda activate <your-env-name>

Install mamba and cinful, then run cinful_init to install the remaining dependencies:

$ conda install mamba -c conda-forge
$ pip install cinful
$ cinful_init

Dependencies installed by cinful_init:

  • seqkit=0.15.0
  • mafft=7.475
  • hmmer=3.3.1
  • blast=2.9.0
  • diamond=2.0.11
  • pandas=1.2.4
  • numpy=1.19.2
  • biopython=1.76
  • snakemake=6.3.0
  • prodigal=2.6.3
  • pyhmmer=0.3.0
PyPI dependencies:
  • pyTMHMM==1.3.2
  • seqhash==1.0.0
  • blake3==0.2.0
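Until an official installation test is available, a quick sanity check is to confirm that the pinned packages and command-line tools are present. The following is a minimal, unofficial sketch that uses only the Python standard library. The helper names (`check_dependencies`, `installed_version`) and the executable probed for each tool are assumptions, for example `hmmsearch` for hmmer and `blastp` for blast.

```python
import shutil
from importlib import metadata
from typing import Dict, List, Optional

# Python distributions and the versions pinned in the lists above.
PINNED_PACKAGES = {
    "pandas": "1.2.4",
    "numpy": "1.19.2",
    "biopython": "1.76",
    "snakemake": "6.3.0",
    "pyhmmer": "0.3.0",
    "pyTMHMM": "1.3.2",
    "seqhash": "1.0.0",
    "blake3": "0.2.0",
}

# Assumed executable names: one probe per command-line tool.
EXECUTABLES = ["seqkit", "mafft", "hmmsearch", "blastp", "diamond", "prodigal"]


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_dependencies(packages: Dict[str, str], executables: List[str]) -> Dict[str, str]:
    """Map each dependency name to a short status string."""
    report = {}
    for name, pinned in packages.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found != pinned:
            report[name] = f"found {found}, pinned {pinned}"
        else:
            report[name] = "ok"
    for exe in executables:
        # shutil.which returns None when the program is not on PATH.
        report[exe] = "ok" if shutil.which(exe) else "not on PATH"
    return report


if __name__ == "__main__":
    for dep, status in check_dependencies(PINNED_PACKAGES, EXECUTABLES).items():
        print(f"{dep:12s} {status}")
```

Run it inside the activated conda environment. A version mismatch is not necessarily fatal; it only flags where your environment differs from the pins.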

If installed properly, running cinful -h will produce the following output:

cinful

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Must be a directory containing uncompressed FASTA 
                        formatted genome assemblies with .fna extension. 
                        Files within nested directories are fine
  -o OUTDIR, --outDir OUTDIR
                        This directory will contain all output files. 
                        It will be nested under the input directory.
  -t THREADS, --threads THREADS
                        This specifies how many threads to allow snakemake 
                        to have access to for parallelization

Installation test

I am working on a test to verify installation. In the meantime, you can download a test genome that contains a microcin, MFP, PCAT, and immunity protein from https://github.com/wilkelab/cinful/blob/main/test/.

Once you've downloaded the test file, you can run cinful on the contents and compare the output to the results stored in the directory cinful_out.
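Comparing a fresh run against the stored cinful_out by eye can be tedious. The following is a minimal sketch that lists which files are missing, extra, or byte-for-byte different. The function names and the directory paths in the usage line are hypothetical. Some differences may be harmless, such as log files or files that embed absolute paths or timestamps, so treat the "differing" list as files to inspect rather than as failures.

```python
import filecmp
from pathlib import Path
from typing import List, Set, Tuple


def relative_files(root: Path) -> Set[Path]:
    """All regular files under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def compare_outputs(new_dir: str, reference_dir: str) -> Tuple[List[Path], List[Path], List[Path]]:
    """Compare a fresh cinful run against a reference output directory.

    Returns (missing, extra, differing): files only in the reference, files
    only in the new run, and files present in both whose contents differ.
    """
    new_root, ref_root = Path(new_dir), Path(reference_dir)
    new_files, ref_files = relative_files(new_root), relative_files(ref_root)
    missing = sorted(ref_files - new_files)
    extra = sorted(new_files - ref_files)
    differing = sorted(
        rel for rel in new_files & ref_files
        # shallow=False compares file contents instead of trusting os.stat() signatures
        if not filecmp.cmp(new_root / rel, ref_root / rel, shallow=False)
    )
    return missing, extra, differing
```

For example, `missing, extra, differing = compare_outputs("my_test/cinful_out", "reference/cinful_out")`, where both paths are placeholders for wherever you ran the test and saved the reference output.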

Usage notes

cinful takes a directory containing genome assemblies as input. All assemblies must use the .fna extension; files with any other extension are ignored.
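The -d help text asks for uncompressed FASTA files with the .fna extension, so gzip-compressed assemblies (commonly downloaded as *.fna.gz) need to be unpacked first. The following is a minimal sketch using only the standard library. `decompress_assemblies` is a hypothetical helper, not part of cinful. It writes each .fna next to its .fna.gz and skips files that already exist. The leftover .fna.gz files end in .gz, so by the rule above they should be ignored.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def decompress_assemblies(directory: str) -> List[Path]:
    """Write an uncompressed .fna next to every *.fna.gz under directory.

    Existing .fna files are left untouched. Returns the paths written.
    """
    written = []
    for gz_path in sorted(Path(directory).rglob("*.fna.gz")):
        out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        if out_path.exists():
            continue
        # Stream the data so large assemblies are never fully held in memory.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```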

Nested directories will be explored recursively, and all *.fna files will be analyzed by cinful. Nested directories can be a good way to organize output, because the directory tree is stored as part of the cinful_id in the output files.
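To preview which files a run will pick up, and how a directory tree can be folded into an identifier, here is a minimal sketch. The ID format used here (relative path without the .fna suffix) is purely illustrative. It is an assumption and not cinful's actual cinful_id format, and `find_assemblies` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def find_assemblies(directory: str) -> Dict[str, Path]:
    """Recursively collect *.fna files, keyed by a path-based identifier.

    The identifier only illustrates encoding the directory tree in an ID;
    it is NOT cinful's real cinful_id format.
    """
    root = Path(directory)
    assemblies = {}
    # "*.fna" must match the whole file name, so "x.fna.gz" or "x.fasta" are skipped.
    for path in sorted(root.rglob("*.fna")):
        if path.is_file():
            # e.g. root/ecoli/strainA.fna -> "ecoli/strainA"
            ident = path.relative_to(root).with_suffix("").as_posix()
            assemblies[ident] = path
    return assemblies
```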

Snakemake is the core workflow manager used by cinful. The main Snakefile is located at cinful/Snakefile, and it calls the subroutines located in cinful/rules.

cinful has been tested on Linux and MacOS.

Workflow

cinful executes the following workflow.

Four output directories will be generated inside your --directory <assembly_directory>, under a directory called cinful_out (or an --outDir of your choosing):

00_dbs

  • This is the initial location of the databases of verified microcins, CvaB, and immunity proteins.

01_orf_homology

  • Prodigal will generate Open Reading Frame (ORF) predictions for the input assemblies.
  • Those ORFs will be searched against the databases in 00_dbs.

02_homology_results

  • The results from all the homology searches will be merged here.

03_best_hits

  • The top hits from the homology results will be placed here.
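The step from 02_homology_results to 03_best_hits can be pictured as keeping, for each ORF, its highest-scoring match. The following sketch illustrates that idea on tabular hits. The `Hit` fields and the ranking rule (highest bitscore, ties broken by lowest e-value) are assumptions, not cinful's documented selection criteria, and the target names in the usage line are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hit:
    orf_id: str      # ORF predicted by Prodigal
    target: str      # matching database entry (microcin, CvaB, or immunity protein)
    bitscore: float
    evalue: float


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep one hit per ORF: highest bitscore, ties broken by lowest e-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.orf_id)
        # Negating the e-value makes "smaller is better" compare as "larger is better".
        if current is None or (hit.bitscore, -hit.evalue) > (current.bitscore, -current.evalue):
            best[hit.orf_id] = hit
    return best
```

For example, given `Hit("orf1", "microcin_A", 80.0, 1e-20)` and `Hit("orf1", "microcin_B", 60.0, 1e-10)`, `best_hits` keeps the microcin_A hit for orf1.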

Running from source (not recommended)

Clone this repository:

git clone https://github.com/wilkelab/cinful.git

All software dependencies needed to run cinful are available through conda and are specified in cinful_conda.yml. The helper script scripts/build_conda_env.sh generates the cinful conda environment. To run this script, you need conda installed, as well as mamba (which speeds up installation). To install mamba, use the following command:

conda install mamba -c conda-forge

To build the environment, run:

bash scripts/build_conda_env.sh

Once setup is complete, you can activate the environment with:

conda activate cinful

A test dataset containing an E. coli genome assembly is available under test/colcinV_Ecoli. To run cinful on this dataset (or on your own genomes), run the following from the top-level cinful directory:

python path/to/cinful.py -d <genomes_directory> -o <output_directory> -t <threads>

Contributing

cinful is currently a wrapper around a series of Snakemake subroutines, so adding functionality is as simple as adding more subroutines. If you see a need for a new subroutine, feel free to raise an issue, and I will be glad to guide you through making a pull request to add that feature.

Because cinful works primarily through Snakemake, you can also run the Snakefiles directly. If you need additional configuration, for example to support other types of input files, this is probably the way to achieve it.
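Running a Snakefile directly means calling the snakemake CLI yourself. The following is a minimal sketch that builds that command line using standard Snakemake flags (--snakefile, --cores, --config, --dry-run). The config keys in the example (`genome_dir`, `outdir`) are hypothetical placeholders. Check cinful/Snakefile for the keys it actually reads before relying on them.

```python
import shlex
from typing import Dict, List


def snakemake_command(snakefile: str, cores: int, config: Dict[str, str],
                      dry_run: bool = True) -> List[str]:
    """Build an argv list for running a cinful Snakefile directly."""
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if config:
        # A single --config flag takes any number of KEY=VALUE pairs.
        cmd += ["--config"] + [f"{key}={value}" for key, value in config.items()]
    if dry_run:
        cmd.append("--dry-run")  # list the jobs Snakemake would run, without running them
    return cmd


if __name__ == "__main__":
    # "genome_dir" and "outdir" are hypothetical config keys.
    cmd = snakemake_command("cinful/Snakefile", 4,
                            {"genome_dir": "my_genomes", "outdir": "cinful_out"})
    print(shlex.join(cmd))
```

Starting with a dry run is a cheap way to see which rules would fire. Once the plan looks right, pass the list (built with dry_run=False) to subprocess.run to execute it.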



Download files


Source Distribution

cinful-1.2.1.tar.gz (34.3 kB)


Built Distribution

cinful-1.2.1-py3-none-any.whl (37.0 kB)


File details

Details for the file cinful-1.2.1.tar.gz.

File metadata

  • Download URL: cinful-1.2.1.tar.gz
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cinful-1.2.1.tar.gz

  • SHA256: bfe71ad78b5632ae496357e62c9c213f3e4020a0ec84a5ea18ac808f39502c9d
  • MD5: 34f2651f879bc5034bd00519c891bb66
  • BLAKE2b-256: 90d44120f036019d09767b8b53ba8d2aa45d8e98c96f905f71bb1ee2122c8119


File details

Details for the file cinful-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: cinful-1.2.1-py3-none-any.whl
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for cinful-1.2.1-py3-none-any.whl

  • SHA256: 3d55b74a3b28f087ac1ddf00d4e25295ff62122ce4d5b3d6904464060d874d0d
  • MD5: bedd9a97776e6783b32acc513b3dacfc
  • BLAKE2b-256: 3fd156effe18425817bb4f4658d5a1bcae1a776fb9859cd15fde7848c1230249

