
cinful: A fully automated pipeline to identify microcins along with their associated immunity proteins and export machinery


cinful


A fully automated pipeline to identify microcins along with their associated immunity proteins and export machinery

Installation

First, make sure to clone this repository:

git clone https://github.com/tijeco/cinful.git

All software dependencies needed to run cinful are available through conda and are specified in cinful_conda.yml. The helper script scripts/build_conda_env.sh generates the cinful conda environment. To run this script, you need to have conda installed, as well as mamba (which helps speed up installation). To install mamba, use the following command:

conda install mamba -c conda-forge

Then simply run

bash scripts/build_conda_env.sh

to set up the cinful environment. You can activate the environment with

conda activate cinful
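Because the build script relies on both conda and mamba, a quick pre-flight check can confirm that both are on your PATH before you run it. The following is a minimal sketch and not part of cinful; the function name missing_tools is hypothetical, and it only checks that the executables can be found, not that they work.

```python
import shutil


def missing_tools(tools=("conda", "mamba"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    conda or mamba actually being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("conda and mamba were both found.")
```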

How to use

cinful takes a directory containing genome assemblies as input. Every assembly file in the directory must end in .fna; cinful ignores files with any other extension.
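Since files without the .fna extension are skipped, it can help to list which files in a directory would be picked up before starting a run. The sketch below is not part of cinful; split_assemblies is a hypothetical helper, and it assumes a case-sensitive match on the suffix and looks only at the top level of the directory, which may differ from how cinful itself collects inputs.

```python
from pathlib import Path


def split_assemblies(assembly_dir):
    """Return (picked_up, ignored) file names based only on the .fna suffix."""
    picked_up, ignored = [], []
    for path in sorted(Path(assembly_dir).iterdir()):
        if not path.is_file():
            continue  # subdirectories are not assemblies
        if path.name.endswith(".fna"):
            picked_up.append(path.name)
        else:
            ignored.append(path.name)
    return picked_up, ignored
```

Calling split_assemblies on your assembly directory returns two lists; anything in the second list would need to be renamed to end in .fna to be included.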

Snakemake is the workflow manager at the core of cinful. The main snakefile is located under cinful/Snakefile, which calls subroutines located in cinful/rules. To run cinful on your data set, run the following command:

snakemake -d <assembly_directory> --threads <core_nums> --snakefile path/to/cinful/Snakefile
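If you prefer to launch the run from Python, the same command can be assembled as an argument list and passed to subprocess. This is a minimal sketch, not the planned cinful package; build_cinful_command and run_cinful are hypothetical names, and only the three options shown in the command above are used.

```python
import subprocess


def build_cinful_command(assembly_dir, threads, snakefile):
    """Assemble the snakemake invocation shown above as an argument list."""
    return [
        "snakemake",
        "-d", str(assembly_dir),
        "--threads", str(threads),
        "--snakefile", str(snakefile),
    ]


def run_cinful(assembly_dir, threads, snakefile):
    """Run snakemake; raises CalledProcessError if it exits non-zero."""
    cmd = build_cinful_command(assembly_dir, threads, snakefile)
    return subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the call makes it easy to print or log the exact command before running it.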

Workflow

The following workflow will be executed. [Figure: cinful workflow]

Four output directories will be generated in your assembly_directory under a directory called cinfulOut.

  • 00_dbs
    • This is the initial location of the databases of verified microcins, CvaB, and immunity proteins.
  • 01_orf_homology
    • Prodigal will generate Open Reading Frame (ORF) predictions for the input assemblies
    • Those ORFs will be searched against the previously mentioned databases
  • 02_homology_results
    • The results from all the homology searches will be merged here
  • 03_best_hits
    • The top hits from the homology results will be placed here
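After a run, one way to confirm that every stage produced its directory is to check for the directories listed above under cinfulOut. The sketch below is not part of cinful; missing_stage_dirs is a hypothetical helper, and it only tests that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Stage directories as named in the list above.
STAGE_DIRS = ("00_dbs", "01_orf_homology", "02_homology_results", "03_best_hits")


def missing_stage_dirs(assembly_dir):
    """Return the stage directories not present under <assembly_dir>/cinfulOut."""
    out_root = Path(assembly_dir) / "cinfulOut"
    return [name for name in STAGE_DIRS if not (out_root / name).is_dir()]
```

An empty list means all stage directories are present.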

Example usage

A test dataset containing an E. coli genome assembly is available under test/colcinV_Ecoli. You can run cinful on this dataset with the following command:

snakemake -d test/colcinV_Ecoli --snakefile cinful/Snakefile

Todo

Currently, cinful is executed by issuing a snakemake command directly. In the future, I will create a Python package that acts as a wrapper for snakemake to make it easier to configure certain parameters within the workflow.

Also, the pipeline currently runs end to end, but in some cases the user may already have data for a certain part of the pipeline and want to plug it in. Snakemake makes this possible, so I will write a set of tutorials on how to do it through snakemake, and eventually the cinful Python package will have options for it as well.


Source Distribution

cinful-0.1.0.tar.gz (29.2 kB)
