a metagenomics data processing pipeline to help research

metapi

hello, metagenomics!

sister project

motivation

we all need a metagenomics pipeline for academic research.

principle

design

  • execution module

    # Snakefile
        include: "rules/step.smk"
        include: "rules/simulation.smk"
        include: "rules/fastqc.smk"
        include: "rules/trimming.smk"
        include: "rules/rmhost.smk"
        include: "rules/assembly.smk"
        include: "rules/alignment.smk"
        include: "rules/binning.smk"
        include: "rules/checkm.smk"
        include: "rules/dereplication.smk"
        include: "rules/classification.smk"
        include: "rules/annotation.smk"
        include: "rules/profilling.smk"
    
  • analysis module

    • raw data report
    • quality control
    • remove host sequences
    • assembly
    • assembly evaluation
    • binning
    • checkm
    • dereplication
    • bins profile
    • taxonomy classification
    • genome annotation
    • function annotation
  • test module

    • execution test
    • analysis test
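
The Snakefile in the execution module pulls in one rule file per step through `include:` directives. As a minimal sketch (not part of metapi itself), the helper below lists those directives from a Snakefile's text and reports any rule file that is missing on disk. The function names and the shortened `EXAMPLE` text are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# Matches lines such as:  include: "rules/step.smk"
INCLUDE_RE = re.compile(r'^\s*include:\s*["\']([^"\']+)["\']', re.MULTILINE)


def list_includes(snakefile_text):
    """Return the rule files named by include: directives, in order."""
    return INCLUDE_RE.findall(snakefile_text)


def missing_includes(snakefile_text, base_dir):
    """Return included paths that do not exist relative to base_dir."""
    base = Path(base_dir)
    return [p for p in list_includes(snakefile_text) if not (base / p).is_file()]


# Shortened, hypothetical excerpt of the include list shown above.
EXAMPLE = '''
include: "rules/step.smk"
include: "rules/simulation.smk"
include: "rules/fastqc.smk"
'''

if __name__ == "__main__":
    for path in list_includes(EXAMPLE):
        print(path)
```

Running `missing_includes(Path("Snakefile").read_text(), ".")` from the directory that holds the Snakefile would list any rule file that has not been checked out or was renamed.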

install

  • install dependencies

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools bbmap spades idba megahit maxbin2 prokka
    conda install -c ursky metabat2
    pip install drep insilicoseq
    
    # in python2 environment
    conda install quast checkm-genome metaphlan2
    
    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest
    
    # activate the python2 environment where checkm is installed
    checkm data setRoot checkm_data_latest
    
  • install metapi

    git clone https://github.com/ohmeta/metapi
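
After installing the dependencies, it can help to confirm that the command-line tools are actually reachable from the active environment. The sketch below is one way to do that with the standard library. The executable names in `TOOLS` are an assumption (a conda package name does not always match the command it installs), and `find_missing` is a hypothetical helper, not something metapi provides.

```python
import shutil

# Assumed executable names for a few of the packages listed above.
TOOLS = ["snakemake", "pigz", "fastp", "bwa", "samtools", "megahit", "prokka"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(TOOLS)
    if missing:
        print("not found on PATH: " + ", ".join(missing))
    else:
        print("all listed tools found")
```

Because the python2 tools live in a separate environment, the check would need to be run once per environment with the matching tool list.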
    

example

  • a quick look at snakemake :)

    # align paired-end reads with bwa, then write flagstat output and a sorted BAM
    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        params:
            bwa_t = 8,
            samtools_t = 8
        shell:
            "bwa mem -t {params.bwa_t} {input.ref} {input.r1} {input.r2} | "
            "samtools view -@{params.samtools_t} -hbS - | "
            "tee >(samtools flagstat -@{params.samtools_t} - > {output.stat}) | "
            "samtools sort -@{params.samtools_t} -o {output.bam} -"
    
  • a test on simulated metagenomics data (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test
    
    # visualize the workflow DAG
    snakemake --dag | dot -Tsvg > dat.svg
    
    # run locally
    snakemake
    
    # run on SGE cluster
    snakemake --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
    
  • processing real-world metagenomics data (incomplete)

    # in the metapi directory
    cd metapi

    # visualize the workflow DAG
    snakemake --dag | dot -Tsvg > ../docs/dat.svg
    
    # run locally
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml
    
    # run on SGE cluster
    snakemake --snakefile metapi/Snakefile --configfile metapi/metaconfig.yaml --cores 32 --jobs 80 --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project_id} -l vf=8G,p=8"
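
The SGE commands above leave `{queue}` and `{project_id}` unfilled. Assuming these are placeholders for site-specific values, the sketch below shows one way to fill them in and assemble the full command line. The queue name `example.q` and project id `P00EXAMPLE` are made up, both helper functions are hypothetical, and the `--cores` option from the original command is left out for brevity.

```python
import shlex


def sge_submit_template(queue, project_id, mem="8G", procs=8):
    """Build the qsub string passed to snakemake --cluster."""
    return (
        f"qsub -S /bin/bash -cwd -q {queue} -P {project_id} "
        f"-l vf={mem},p={procs}"
    )


def snakemake_command(snakefile, configfile, queue, project_id, jobs=80):
    """Return the snakemake invocation as an argument list."""
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--jobs", str(jobs),
        "--cluster", sge_submit_template(queue, project_id),
    ]


# Hypothetical queue and project id, for illustration only.
cmd = snakemake_command(
    "metapi/Snakefile", "metapi/metaconfig.yaml", "example.q", "P00EXAMPLE"
)

if __name__ == "__main__":
    # shlex.join quotes the qsub string so it can be pasted into a shell
    print(shlex.join(cmd))
```

Keeping the command as an argument list means it can also be handed to `subprocess.run(cmd)` directly, which avoids shell quoting issues around the qsub string.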
    

Download files

Source Distribution

metapi-0.1.3.tar.gz (14.4 kB)

Uploaded Source

Built Distribution

metapi-0.1.3-py3-none-any.whl (17.9 kB)

Uploaded Python 3
