a metagenomics data processing pipeline to help research
Project description
metapi
hello, metagenomics!
brother project
motivation
we all need a metagenomics pipeline for academic research.
principle
- pool our intelligence together
- github
- why are we here?
- do not reinvent the wheel
- make full use of pipeline execution engine
- make full use of awesome bioinformatics tools
- robust, modular, extensible, and updatable
- one rule, one module
- one module, one analysis
- PRs are welcome
design
- execution module

    # Snakefile
    include: "rules/step.smk"
    include: "rules/simulation.smk"
    include: "rules/fastqc.smk"
    include: "rules/trimming.smk"
    include: "rules/rmhost.smk"
    include: "rules/assembly.smk"
    include: "rules/alignment.smk"
    include: "rules/binning.smk"
    include: "rules/checkm.smk"
    include: "rules/dereplication.smk"
    include: "rules/classification.smk"
    include: "rules/annotation.smk"
    include: "rules/profilling.smk"
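As a rough illustration of the "one rule, one module" layout, the sketch below reads the `include:` directives from the Snakefile text shown above and lists the module names in order. The helper names `included_rules` and `module_names` are hypothetical and not part of metapi; this is a plain regex scan, not how Snakemake itself resolves includes.

```python
import re
from pathlib import PurePosixPath

# Snakefile text copied from the listing above.
SNAKEFILE = '''
include: "rules/step.smk"
include: "rules/simulation.smk"
include: "rules/fastqc.smk"
include: "rules/trimming.smk"
include: "rules/rmhost.smk"
include: "rules/assembly.smk"
include: "rules/alignment.smk"
include: "rules/binning.smk"
include: "rules/checkm.smk"
include: "rules/dereplication.smk"
include: "rules/classification.smk"
include: "rules/annotation.smk"
include: "rules/profilling.smk"
'''

# Matches lines of the form: include: "some/path.smk"
INCLUDE_RE = re.compile(r'^\s*include:\s*"([^"]+)"', re.MULTILINE)


def included_rules(snakefile_text):
    """Return the rule files named by include: directives, in file order."""
    return INCLUDE_RE.findall(snakefile_text)


def module_names(snakefile_text):
    """Return the module name (file stem) of every included rule file."""
    return [PurePosixPath(path).stem for path in included_rules(snakefile_text)]


for name in module_names(SNAKEFILE):
    print(name)
```

A scan like this could be used to check that every included rule file exists before launching a run.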
- analysis module
- raw data report
- quality control
- remove host sequences
- assembly
- assembly evaluation
- binning
- checkm
- dereplication
- bins profile
- taxonomy classification
- genome annotation
- function annotation
- test module
- execution test
- analysis test
install
- install dependencies
- snakemake
- pigz
- ncbi-genome-download
- InSilicoSeq
- OAFilter
- sickle
- fastp
- MultiQC
- bwa
- samtools
- bbmap
- spades
- idba
- megahit
- quast
- MetaBat
- MaxBin2
- CheckM
- drep
- prokka
- metaphlan2

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools bbmap spades idba megahit maxbin2 prokka
    conda install -c ursky metabat2
    pip install drep insilicoseq

    # in python2 environment
    conda install quast checkm-genome metaphlan2

    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest

    # activate the python2 environment where checkm is installed
    checkm data setRoot checkm_data_latest
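After installing, it can help to confirm that the tools are actually on `PATH` in the active environment. The following is a minimal sketch using only the Python standard library. The executable names in `REQUIRED_TOOLS` are assumptions (the command a package installs may differ from its package name), and `find_missing` is a hypothetical helper, not part of metapi.

```python
import shutil

# Assumed executable names for a subset of the dependencies listed above.
# Adjust them to match what your environment really provides.
REQUIRED_TOOLS = [
    "snakemake",
    "pigz",
    "fastp",
    "bwa",
    "samtools",
    "megahit",
    "prokka",
]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


missing = find_missing(REQUIRED_TOOLS)
if missing:
    print("not found on PATH: " + ", ".join(missing))
else:
    print("all listed tools found")
```

Because the python2 and python3 tools live in separate environments, the check would need to be run once in each.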
- install metapi

    git clone https://github.com/ohmeta/metapi
example
- get to know snakemake :)

    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        params:
            bwa_t = 8,
            samtools_t = 8
        shell:
            "bwa mem -t {params.bwa_t} {input.ref} {input.r1} {input.r2} | "
            "samtools view -@{params.samtools_t} -hbS - | "
            "tee >(samtools flagstat -@{params.samtools_t} - > {output.stat}) | "
            "samtools sort -@{params.samtools_t} -o {output.bam} -"
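To see what the `shell` string of the `bwa_mem` rule turns into, the placeholder substitution can be imitated in plain Python. This is only an illustration with `str.format` and `SimpleNamespace`, not Snakemake's own implementation, and `expand_shell` is a hypothetical helper name.

```python
from types import SimpleNamespace

# Shell template copied from the bwa_mem rule above.
SHELL = (
    "bwa mem -t {params.bwa_t} {input.ref} {input.r1} {input.r2} | "
    "samtools view -@{params.samtools_t} -hbS - | "
    "tee >(samtools flagstat -@{params.samtools_t} - > {output.stat}) | "
    "samtools sort -@{params.samtools_t} -o {output.bam} -"
)


def expand_shell(template, **sections):
    """Fill {section.name} placeholders from one dict per section."""
    namespaces = {key: SimpleNamespace(**values) for key, values in sections.items()}
    return template.format(**namespaces)


cmd = expand_shell(
    SHELL,
    input={
        "r1": "fastq/sample_1.fq.gz",
        "r2": "fastq/sample_2.fq.gz",
        "ref": "ref/ref.index",
    },
    output={"bam": "sample.sort.bam", "stat": "sample_flagstat.txt"},
    params={"bwa_t": 8, "samtools_t": 8},
)
print(cmd)
```

The printed command is a single pipeline: alignments are streamed to `samtools view`, copied to `samtools flagstat` through `tee`, and sorted into the output BAM without intermediate files.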
- a simulated metagenomics data test (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test

    # look at the workflow DAG
    snakemake --dag | dot -Tsvg > dat.svg

    # run locally
    snakemake

    # run on SGE cluster
    snakemake --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
- a real-world metagenomics data process (incomplete)

    # in the metapi repository root
    # look at the workflow DAG
    cd metapi
    snakemake --dag | dot -Tsvg > ../docs/dat.svg

    # run locally
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml

    # run on SGE cluster
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml --cores 32 --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
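In the SGE commands above, `{queue}` and `{project_id}` stand for values you have to supply for your own cluster. A small sketch of assembling the full command with those values filled in is shown here. The function `cluster_command` and the example queue and project names are hypothetical, and only the flags that already appear above are used.

```python
import shlex


def cluster_command(queue, project_id,
                    snakefile="metapi/Snakefile",
                    configfile="metapi/metaconfig.yaml",
                    cores=32, jobs=80):
    """Build the snakemake argv for an SGE run, mirroring the command above."""
    qsub = f"qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--cores", str(cores),
        "--jobs", str(jobs),
        "--cluster", qsub,
    ]


# "example.q" and "example_project" are made-up placeholders.
argv = cluster_command("example.q", "example_project")

# shlex.join quotes the qsub string so the line can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the command as an argument list avoids quoting mistakes around the `--cluster` string if it is later passed to `subprocess.run`.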
Download files
Source Distribution
metapi-0.1.3.tar.gz (14.4 kB)
Built Distribution
metapi-0.1.3-py3-none-any.whl (17.9 kB)