
a metagenomics data processing pipeline to help research


metapi

hello, metagenomics!

brother project

motivation

we all need a metagenomics pipeline for academic research.

principle

design

  • execution module

    # Snakefile
        include: "rules/step.smk"
        include: "rules/simulation.smk"
        include: "rules/fastqc.smk"
        include: "rules/trimming.smk"
        include: "rules/rmhost.smk"
        include: "rules/assembly.smk"
        include: "rules/alignment.smk"
        include: "rules/binning.smk"
        include: "rules/checkm.smk"
        include: "rules/dereplication.smk"
        include: "rules/classification.smk"
        include: "rules/annotation.smk"
        include: "rules/profilling.smk"
    
  • analysis module

    • raw data report
    • quality control
    • remove host sequences
    • assembly
    • assembly evaluation
    • binning
    • checkm
    • dereplication
    • bins profile
    • taxonomy classification
    • genome annotation
    • function annotation
  • test module

    • execution test
    • analysis test

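The execution module is built entirely from `include:` directives, so the list of stages can be read back from the Snakefile text itself. A minimal sketch of that idea follows; the helper name `list_included_rules` is hypothetical and not part of metapi, and the excerpt only copies the first three includes shown above.

```python
import re

# Matches Snakemake include directives such as: include: "rules/step.smk"
INCLUDE_RE = re.compile(r'^\s*include:\s*["\']([^"\']+)["\']', re.MULTILINE)


def list_included_rules(snakefile_text):
    """Return the included rule files in the order they appear."""
    return INCLUDE_RE.findall(snakefile_text)


# Excerpt of the Snakefile shown in the design section (first three includes only).
SNAKEFILE_EXCERPT = '''
include: "rules/step.smk"
include: "rules/simulation.smk"
include: "rules/fastqc.smk"
'''

if __name__ == "__main__":
    for path in list_included_rules(SNAKEFILE_EXCERPT):
        print(path)
```

Reading the includes this way keeps the order of the file, which roughly follows the order of the pipeline stages.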
install

  • install dependencies*

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools bbmap spades idba megahit maxbin2 prokka
    conda install -c ursky metabat2
    pip install drep insilicoseq
    
    # in python2 environment
    conda install quast checkm-genome metaphlan2
    
    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest
    
    # activate the python2 environment where checkm is installed
    checkm data setRoot checkm_data_latest
    
  • install metapi

    git clone https://github.com/ohmeta/metapi
    
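After installing, it can help to confirm that the tools are actually reachable on `PATH`. The sketch below uses only the Python standard library. It assumes each listed conda package installs an executable with the same name, which is why only a subset of the dependencies is checked. The names `EXPECTED_TOOLS` and `missing_tools` are hypothetical and not part of metapi.

```python
import shutil

# Assumption: each of these conda packages installs an executable of the same name.
# Packages whose executables are named differently are left out on purpose.
EXPECTED_TOOLS = ["snakemake", "pigz", "fastp", "bwa", "samtools", "megahit", "prokka"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("not found on PATH: " + ", ".join(missing))
    else:
        print("all expected tools found")
```

Run it once in each conda environment, since the python2 and python3 environments hold different tools.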

example

  • get to know snakemake :)

    # align paired-end reads, write flagstat on the fly, and sort the BAM
    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        params:
            bwa_t = 8,
            samtools_t = 8
        shell:
            "bwa mem -t {params.bwa_t} {input.ref} {input.r1} {input.r2} | "
            "samtools view -@{params.samtools_t} -hbS - | "
            "tee >(samtools flagstat -@{params.samtools_t} - > {output.stat}) | "
            "samtools sort -@{params.samtools_t} -o {output.bam} -"
    
  • a test on simulated metagenomics data (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test
    
    # visualize the workflow DAG
    snakemake --dag | dot -Tsvg > dat.svg
    
    # run locally
    snakemake
    
    # run on SGE cluster (replace {queue} and {project_id} with your own values)
    snakemake --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
    
  • processing real-world metagenomics data (incomplete)

    # in the cloned metapi repository
    # visualize the workflow DAG
    cd metapi
    snakemake --dag | dot -Tsvg > ../docs/dat.svg
    
    # run locally
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml
    
    # run on SGE cluster (replace {queue} and {project_id} with your own values)
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml --cores 32 --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
    
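The shell string in the `bwa_mem` rule uses `{params.…}`, `{input.…}` and `{output.…}` placeholders, which Snakemake fills in before running the command. The sketch below imitates that substitution with Python's `str.format` and `SimpleNamespace` stand-ins. It is not Snakemake's own implementation, and the `render` helper is hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for the rule's input, output and params (values from the bwa_mem example).
rule_input = SimpleNamespace(
    r1="fastq/sample_1.fq.gz", r2="fastq/sample_2.fq.gz", ref="ref/ref.index"
)
rule_output = SimpleNamespace(bam="sample.sort.bam", stat="sample_flagstat.txt")
rule_params = SimpleNamespace(bwa_t=8, samtools_t=8)

# Same template as the rule's shell section; adjacent string literals are concatenated.
TEMPLATE = (
    "bwa mem -t {params.bwa_t} {input.ref} {input.r1} {input.r2} | "
    "samtools view -@{params.samtools_t} -hbS - | "
    "tee >(samtools flagstat -@{params.samtools_t} - > {output.stat}) | "
    "samtools sort -@{params.samtools_t} -o {output.bam} -"
)


def render(template, rule_input, rule_output, rule_params):
    """Fill the placeholders by attribute lookup, as str.format allows."""
    return template.format(input=rule_input, output=rule_output, params=rule_params)


if __name__ == "__main__":
    print(render(TEMPLATE, rule_input, rule_output, rule_params))
```

Printing the rendered command is a quick way to check the pipe chain before handing it to a cluster.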

current release: 0.1.3
