a pipeline to construct a genome catalogue from metagenomics data

metapi

hello, metagenomics!

brother project

motivation

we all need a metagenomics pipeline for academic research.

principle

design

  • execution module

    # Snakefile
        include: "rules/step.smk"
        include: "rules/simulation.smk"
        include: "rules/fastqc.smk"
        include: "rules/trimming.smk"
        include: "rules/rmhost.smk"
        include: "rules/assembly.smk"
        include: "rules/alignment.smk"
        include: "rules/binning.smk"
        include: "rules/cobinning.smk"
        include: "rules/checkm.smk"
        include: "rules/dereplication.smk"
        include: "rules/classification.smk"
        include: "rules/annotation.smk"
        include: "rules/profilling.smk"
    
  • analysis module

    • raw data report
    • quality control
    • remove host sequences
    • assembly
    • assembly evaluation
    • binning
    • checkm
    • dereplication
    • bins profile
    • taxonomy classification
    • genome annotation
    • function annotation
  • test module

    • execution test
    • analysis test

install

  • install dependencies*

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools \
                  bbmap spades idba megahit maxbin2 prokka metabat2 drep quast checkm-genome
    pip install insilicoseq
    
    # in python2 environment
    conda install metaphlan2
    
    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest
    
    # activate the python3 environment where checkm is installed
    checkm data setRoot checkm_data_latest
    
  • install metapi

    # recommended
    git clone https://github.com/ohmeta/metapi
    # or (maybe not latest)
    pip install metapi
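
After installing the dependencies listed above, it can help to confirm that the tools are actually visible on PATH before starting a run. The following is a minimal sketch, not part of metapi: the tool list is a hand-picked subset, and the executable names are assumptions, since a conda package name does not always match the command it installs.

```python
import shutil

# Assumed executable names for a subset of the dependencies; adjust to the
# tools your own environment is expected to provide.
REQUIRED_TOOLS = [
    "snakemake", "pigz", "fastp", "bwa", "samtools",
    "megahit", "prokka", "checkm",
]


def find_missing(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("missing from PATH: " + ", ".join(missing))
    else:
        print("all listed tools were found")
```

Run it once in each conda environment (python3 and python2), since the two environments expose different tools.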
    

example

  • get to know snakemake :)

    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        threads:
            8
        shell:
            '''
            bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | \
            samtools view -@{threads} -hbS - | \
            tee >(samtools flagstat -@{threads} - > {output.stat}) | \
            samtools sort -@{threads} -o {output.bam} -
            '''
    
  • a simulated metagenomics data test (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test
    
    # render the workflow DAG as an SVG
    snakemake --dag | dot -Tsvg > dat.svg
    
    # run on local
    snakemake
    
    # run on SGE cluster
    snakemake \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project} -l vf={mem},p={cores} -binding linear:{cores}"
    
  • real-world metagenomics data processing (incomplete)

    # in the metapi directory
    # render the workflow DAG as an SVG
    cd metapi
    snakemake --dag | dot -Tsvg > ../docs/dat.svg
    
    # run on local
    snakemake \
    --cores 8 \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --until all
    
    # run on SGE cluster
    snakemake \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --cluster-config metapi/metacluster.yaml \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} -l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} -o {cluster.output} -e {cluster.error}" \
    --latency-wait 360 \
    --until all
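
The `{cluster.*}` placeholders in the SGE command above are filled from the file passed to `--cluster-config`, where a `__default__` entry supplies values that individual rules may override. The sketch below only imitates that lookup to show the idea; it is not Snakemake's own code. The keys mirror the placeholders in the command, while the values and the rule name `assembly` are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical cluster config, written as the dict a YAML file would load to.
# Keys follow the {cluster.*} placeholders; all values are made up.
CLUSTER_CONFIG = {
    "__default__": {
        "queue": "example.q",
        "project": "example_project",
        "mem": "1G",
        "cores": 1,
        "output": "logs/cluster.o",
        "error": "logs/cluster.e",
    },
    # Hypothetical per-rule override for a memory-hungry step.
    "assembly": {"mem": "50G", "cores": 8},
}

QSUB_TEMPLATE = (
    "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} "
    "-l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} "
    "-o {cluster.output} -e {cluster.error}"
)


def resolve_cluster(rule_name, config):
    """Start from __default__ and apply the rule-specific overrides."""
    settings = dict(config.get("__default__", {}))
    settings.update(config.get(rule_name, {}))
    return settings


def render_submit_command(rule_name, config, template=QSUB_TEMPLATE):
    """Fill the {cluster.*} placeholders for one rule."""
    cluster = SimpleNamespace(**resolve_cluster(rule_name, config))
    return template.format(cluster=cluster)


if __name__ == "__main__":
    print(render_submit_command("assembly", CLUSTER_CONFIG))
    print(render_submit_command("some_other_rule", CLUSTER_CONFIG))
```

Keeping resource requests in a separate file means the same `--cluster` string can be reused across projects while only the per-rule memory and core values change.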
    

metapi command line interface

  • init

    metapi --help
    
    usage: metapi [subcommand] [options]
    
    metapi, a metagenomics data process pipeline
    
    optional arguments:
        -h, --help     show this help message and exit
        -v, --version  print software version and exit
    
    available subcommands:
    
        init         a metagenomics project initialization
        simulation   a simulation on metagenomics data
        workflow     a workflow on real metagenomics data
    

    please supply a samples.tsv file with the following columns:
    | id | fq1 | fq2 |
    |----|------------|------------|
    | s1 | s1.1.fq.gz | s1.2.fq.gz |
    | s2 | s2.1.fq.gz | s2.2.fq.gz |

    python /path/to/metapi/metapi/metapi.py init -d . -s samples.tsv -b raw -a metaspades
    
  • list

    snakemake --snakefile /path/to/metapi/metapi/Snakefile --configfile metaconfig.yaml --list
    
  • debug

    snakemake --snakefile /path/to/metapi/metapi/Snakefile \
        --configfile metaconfig.yaml \
        -p -r -n --debug-dag \
        --until checkm_lineage_wf
    
  • simulation

  • workflow
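
The samples.tsv file expected by `init` (see above) can be written and sanity-checked with a few lines of Python. This is a minimal sketch under two assumptions: the file is tab-separated, as the `.tsv` extension suggests, and the header is exactly `id`, `fq1`, `fq2`. The helper names `write_samples` and `read_samples` are illustrative and not part of metapi.

```python
import csv

EXPECTED_COLUMNS = ["id", "fq1", "fq2"]


def write_samples(path, rows):
    """Write (id, fq1, fq2) tuples as a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(EXPECTED_COLUMNS)
        writer.writerows(rows)


def read_samples(path):
    """Read the file back, checking the header and that ids are unique."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames != EXPECTED_COLUMNS:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        rows = list(reader)
    ids = [row["id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate sample ids")
    return rows


if __name__ == "__main__":
    write_samples("samples.tsv", [
        ("s1", "s1.1.fq.gz", "s1.2.fq.gz"),
        ("s2", "s2.1.fq.gz", "s2.2.fq.gz"),
    ])
    print(read_samples("samples.tsv"))
```

Checking for duplicate ids up front is cheap and avoids two samples silently writing to the same output paths later in the workflow.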

Files for metapi, version 0.6.4
  • metapi-0.6.4-py3-none-any.whl (43.1 kB), wheel, Python version py3
  • metapi-0.6.4.tar.gz (5.1 MB), source
