a pipeline to construct a genome catalogue from metagenomics data
metapi
hello, metagenomics!
brother project
motivation
we all need a metagenomics pipeline for academic research.
principle
- bind intelligence together
- github
- why are we here?
- do not reinvent the wheel
- make full use of the pipeline execution engine
- make full use of awesome bioinformatics tools
- robust, modular, extensible, and updatable
- one rule, one module
- one module, one analysis
- PRs are welcome
design
- execution module

    # Snakefile
    include: "rules/step.smk"
    include: "rules/simulation.smk"
    include: "rules/fastqc.smk"
    include: "rules/trimming.smk"
    include: "rules/rmhost.smk"
    include: "rules/assembly.smk"
    include: "rules/alignment.smk"
    include: "rules/binning.smk"
    include: "rules/cobinning.smk"
    include: "rules/checkm.smk"
    include: "rules/dereplication.smk"
    include: "rules/classification.smk"
    include: "rules/annotation.smk"
    include: "rules/profilling.smk"

- analysis module
- raw data report
- quality control
- remove host sequences
- assembly
- assembly evaluation
- binning
- checkm
- dereplication
- bins profile
- taxonomy classification
- genome annotation
- function annotation
- test module
- execution test
- analysis test
install
- install dependencies
- snakemake
- pigz
- ncbi-genome-download
- InSilicoSeq
- OAFilter
- sickle
- fastp
- MultiQC
- bwa
- samtools
- spades
- idba
- megahit
- quast
- MetaBat
- MaxBin2
- CheckM
- drep
- prokka
- metaphlan2

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools \
        bbmap spades idba megahit maxbin2 prokka metabat2 drep quast checkm-genome
    pip install insilicoseq

    # in python2 environment
    conda install metaphlan2

    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest

    # activate the python3 environment where checkm is installed
    checkm data setRoot checkm_data_latest

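Before running the pipeline, it can help to confirm that the installed tools are actually visible on `PATH`. The following is a minimal sketch, not part of metapi: the helper name `find_missing` is hypothetical, and the list holds only a few executables whose command names match the package names above. Other tools may install under different command names, so extend the list for your environment.

```python
import shutil

# A subset of the dependencies listed above. The executable names are
# assumptions; adjust them to match what your conda environment installs.
REQUIRED_TOOLS = [
    "snakemake", "pigz", "fastp", "bwa",
    "samtools", "megahit", "prokka", "checkm",
]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("missing tools:", ", ".join(missing))
    else:
        print("all listed tools were found on PATH")
```

The `which` parameter exists only so the lookup can be swapped out in a test; in normal use the default `shutil.which` is enough.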
- install metapi

    # recommended
    git clone https://github.com/ohmeta/metapi

    # or (maybe not the latest)
    pip install metapi

example
- get to know snakemake :)

    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        threads: 8
        shell:
            '''
            bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | \
            samtools view -@{threads} -hbS - | \
            tee >(samtools flagstat -@{threads} - > {output.stat}) | \
            samtools sort -@{threads} -o {output.bam} -
            '''

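To see what the `{threads}`, `{input.r1}` and `{output.bam}` placeholders in the rule above turn into, here is a rough sketch that imitates the substitution with plain `str.format`. This is an illustration only: Snakemake's own formatting is more involved, the `render` helper is hypothetical, and the command template is shortened to two pipeline stages.

```python
from types import SimpleNamespace

# Values copied from the bwa_mem rule above.
rule_input = SimpleNamespace(
    r1="fastq/sample_1.fq.gz",
    r2="fastq/sample_2.fq.gz",
    ref="ref/ref.index",
)
rule_output = SimpleNamespace(bam="sample.sort.bam", stat="sample_flagstat.txt")

# Shortened version of the shell command in the rule.
TEMPLATE = (
    "bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | "
    "samtools sort -@{threads} -o {output.bam} -"
)


def render(template, **context):
    """Fill {name} and {name.attr} placeholders using str.format."""
    return template.format(**context)


command = render(TEMPLATE, threads=8, input=rule_input, output=rule_output)
print(command)
```

Running `snakemake -p -n` on a real workflow prints the actual expanded commands without executing them, which is the more reliable way to check a rule.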
- a simulated metagenomics data test (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test

    # look at the DAG
    snakemake --dag | dot -Tsvg > dat.svg

    # run on local
    snakemake

    # run on SGE cluster
    snakemake \
        --jobs 80 \
        --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project} -l vf={mem},p={cores} -binding linear:{cores}"

- a real world metagenomics data process (incomplete)

    # in metapipe directory
    # look at the DAG
    cd metapi
    snakemake --dag | dot -Tsvg > ../docs/dat.svg

    # run on local
    snakemake \
        --cores 8 \
        --snakefile metapi/Snakefile \
        --configfile metapi/metaconfig.yaml \
        --until all

    # run on SGE cluster
    snakemake \
        --snakefile metapi/Snakefile \
        --configfile metapi/metaconfig.yaml \
        --cluster-config metapi/metacluster.yaml \
        --jobs 80 \
        --cluster "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} -l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} -o {cluster.output} -e {cluster.error}" \
        --latency-wait 360 \
        --until all

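The `{cluster.queue}`, `{cluster.mem}` and similar placeholders in the `--cluster` string are filled from the file passed to `--cluster-config`, where per-rule entries override the `__default__` entry. The sketch below imitates that lookup in plain Python. The dictionary contents (queue name, project, memory values) are invented for illustration and are not taken from metapi's `metacluster.yaml`; the helper names are hypothetical, and the `-o`/`-e` options are left out for brevity.

```python
from types import SimpleNamespace

# Hypothetical cluster config, written as a dict instead of YAML.
CLUSTER_CONFIG = {
    "__default__": {"queue": "st.q", "project": "demo", "mem": "1G", "cores": 1},
    "bwa_mem": {"mem": "8G", "cores": 8},
}

# Shortened form of the --cluster string used above.
QSUB = (
    "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} "
    "-l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores}"
)


def cluster_settings(rule_name, config):
    """Start from __default__ and let rule-specific values override it."""
    settings = dict(config.get("__default__", {}))
    settings.update(config.get(rule_name, {}))
    return settings


def submit_command(rule_name, config=CLUSTER_CONFIG):
    """Render the qsub command line for one rule."""
    cluster = SimpleNamespace(**cluster_settings(rule_name, config))
    return QSUB.format(cluster=cluster)


print(submit_command("bwa_mem"))
print(submit_command("fastqc"))  # no specific entry, so defaults apply
```

Keeping resource requests in a separate file means the same Snakefile can run locally or on a cluster without edits.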
metapi command line interface
- init

    metapi --help

    usage: metapi [subcommand] [options]

    metapi, a metagenomics data process pipeline

    optional arguments:
      -h, --help     show this help message and exit
      -v, --version  print software version and exit

    available subcommands:

      init           a metagenomics project initialization
      simulation     a simulation on metagenomics data
      workflow       a workflow on real metagenomics data

please supply samples.tsv in the following format:

    id    fq1           fq2
    s1    s1.1.fq.gz    s1.2.fq.gz
    s2    s2.1.fq.gz    s2.2.fq.gz

    python /path/to/metapi/metapi/metapi.py init -d . -s samples.tsv -b raw -a metaspades

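A quick check of `samples.tsv` before running `init` can catch missing columns early. The sketch below assumes the file is tab-separated with the `id`, `fq1` and `fq2` columns shown above; the `read_samples` helper and the duplicate-id check are illustrative additions, not metapi's own parser.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "fq1", "fq2")


def read_samples(handle):
    """Parse a tab-separated sample sheet into {id: (fq1, fq2)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError("samples.tsv is missing columns: " + ", ".join(missing))
    samples = {}
    for row in reader:
        if row["id"] in samples:
            raise ValueError("duplicate sample id: " + row["id"])
        samples[row["id"]] = (row["fq1"], row["fq2"])
    return samples


# In-memory stand-in for the samples.tsv shown above.
EXAMPLE = (
    "id\tfq1\tfq2\n"
    "s1\ts1.1.fq.gz\ts1.2.fq.gz\n"
    "s2\ts2.1.fq.gz\ts2.2.fq.gz\n"
)
samples = read_samples(io.StringIO(EXAMPLE))
print(samples)
```

For a file on disk, pass an open handle instead, for example `read_samples(open("samples.tsv", newline=""))`.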
- list

    snakemake --snakefile /path/to/metapi/metapi/Snakefile --configfile metaconfig.yaml --list

- debug

    snakemake --snakefile /path/to/metapi/metapi/Snakefile \
        --configfile metaconfig.yaml \
        -p -r -n --debug-dag \
        --until checkm_lineage_wf

- simulation
- workflow