a pipeline to construct a genome catalogue from metagenomics data

metapi

hello, metagenomics!

brother project

motivation

we all need a metagenomics pipeline for academic research.

principle

design

  • execution module

    # Snakefile
        include: "rules/step.smk"
        include: "rules/simulation.smk"
        include: "rules/fastqc.smk"
        include: "rules/trimming.smk"
        include: "rules/rmhost.smk"
        include: "rules/assembly.smk"
        include: "rules/alignment.smk"
        include: "rules/binning.smk"
        include: "rules/cobinning.smk"
        include: "rules/checkm.smk"
        include: "rules/dereplication.smk"
        include: "rules/classification.smk"
        include: "rules/annotation.smk"
        include: "rules/profilling.smk"
    
  • analysis module

    • raw data report
    • quality control
    • remove host sequences
    • assembly
    • assembly evaluation
    • binning
    • checkm
    • dereplication
    • bins profile
    • taxonomy classification
    • genome annotation
    • function annotation
  • test module

    • execution test
    • analysis test
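
The execution module is a flat list of `include:` statements, so the order of the rule files can be read straight from the Snakefile. A minimal sketch, assuming each include sits on its own line with a quoted path as in the listing above; `list_includes` is a hypothetical helper and is not part of metapi or Snakemake.

```python
import re

# Matches lines such as:  include: "rules/step.smk"
INCLUDE_RE = re.compile(r'^\s*include:\s*["\'](?P<path>[^"\']+)["\']\s*$')


def list_includes(snakefile_text):
    """Return the included rule files in the order they appear."""
    paths = []
    for line in snakefile_text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            paths.append(match.group("path"))
    return paths


# A shortened copy of the Snakefile listing, used only as sample input.
example = '''
# Snakefile
    include: "rules/step.smk"
    include: "rules/fastqc.smk"
    include: "rules/trimming.smk"
'''

print(list_includes(example))
```

Comment lines and anything that is not an `include:` statement are skipped, so the same function can be pointed at the full Snakefile text.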

install

  • install dependencies*

    # in python3 environment
    conda install snakemake pigz ncbi-genome-download sickle-trim fastp bwa samtools \
                  bbmap spades idba megahit maxbin2 prokka metabat2 drep quast checkm-genome
    pip install insilicoseq
    
    # in python2 environment
    conda install metaphlan2
    
    # database configuration
    wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
    mkdir checkm_data
    cd checkm_data
    tar -xzvf ../checkm_data_2015_01_16.tar.gz
    cd ..
    ln -s checkm_data checkm_data_latest
    
    # activate the python3 environment where checkm is installed
    checkm data setRoot checkm_data_latest
    
  • install metapi

    # recommended
    git clone https://github.com/ohmeta/metapi
    # or (maybe not latest)
    pip install metapi
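
For scripted setups, the CheckM database steps above (extract the archive into its own directory, then point a `_latest` link at it) could be sketched in Python roughly as follows. `unpack_database` is a hypothetical helper, not part of metapi or CheckM. It assumes a POSIX system and an archive that has already been downloaded, and it writes an absolute link where the shell version writes a relative one.

```python
import os
import tarfile


def unpack_database(archive, target_dir, link_name):
    """Extract a .tar.gz archive into target_dir and point link_name at it."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    # Replace a stale link so the function can be re-run safely.
    if os.path.islink(link_name):
        os.remove(link_name)
    os.symlink(os.path.abspath(target_dir), link_name)
    return os.path.realpath(link_name)


# Intended call, mirroring the shell commands (not executed here):
# unpack_database("checkm_data_2015_01_16.tar.gz", "checkm_data", "checkm_data_latest")
```

`checkm data setRoot checkm_data_latest` still has to be run afterwards in the environment where checkm is installed.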
    

example

  • get to know snakemake :)

    rule bwa_mem:
        input:
            r1 = "fastq/sample_1.fq.gz",
            r2 = "fastq/sample_2.fq.gz",
            ref = "ref/ref.index"
        output:
            bam = "sample.sort.bam",
            stat = "sample_flagstat.txt"
        threads:
            8
        shell:
            '''
            bwa mem -t {threads} {input.ref} {input.r1} {input.r2} | \
            samtools view -@{threads} -hbS - | \
            tee >(samtools flagstat -@{threads} - > {output.stat}) | \
            samtools sort -@{threads} -o {output.bam} -
            '''
    
  • a simulated metagenomics data test (incomplete)

    # in metapi/example/basic_test directory
    cd example/basic_test
    
    # look at the workflow DAG
    snakemake --dag | dot -Tsvg > dat.svg
    
    # run on local
    snakemake
    
    # run on SGE cluster
    snakemake \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {queue} -P {project} -l vf={mem},p={cores} -binding linear:{cores}"
    
  • a real-world metagenomics data process (incomplete)

    # in metapi directory
    # look at the workflow DAG
    cd metapi
    snakemake --dag | dot -Tsvg > ../docs/dat.svg
    
    # run on local
    snakemake \
    --cores 8 \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --until all
    
    # run on SGE cluster
    snakemake \
    --snakefile metapi/Snakefile \
    --configfile metapi/metaconfig.yaml \
    --cluster-config metapi/metacluster.yaml \
    --jobs 80 \
    --cluster "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} -l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} -o {cluster.output} -e {cluster.error}" \
    --latency-wait 360 \
    --until all
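
The `{cluster.queue}`-style fields in the `--cluster` string are filled in per job from the file given to `--cluster-config`. The sketch below imitates that substitution with plain `str.format` to show what the final `qsub` line looks like. It is an illustration, not Snakemake's own code: `render_submit_command` is a hypothetical helper, and the values in `settings` are made up to stand in for one entry of `metacluster.yaml`.

```python
from types import SimpleNamespace

# Template copied from the SGE example above.
QSUB_TEMPLATE = (
    "qsub -S /bin/bash -cwd -q {cluster.queue} -P {cluster.project} "
    "-l vf={cluster.mem},p={cluster.cores} -binding linear:{cluster.cores} "
    "-o {cluster.output} -e {cluster.error}"
)


def render_submit_command(template, settings):
    """Fill {cluster.<key>} placeholders from a dict of settings."""
    return template.format(cluster=SimpleNamespace(**settings))


# Hypothetical values; real ones depend on the cluster and the project.
settings = {
    "queue": "example.q",
    "project": "example_project",
    "mem": "4G",
    "cores": 8,
    "output": "logs/job.o",
    "error": "logs/job.e",
}

print(render_submit_command(QSUB_TEMPLATE, settings))
```

A key that the template uses but `settings` lacks raises `AttributeError`, which makes a missing cluster setting easy to spot before any job is submitted.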
    

metapi command line interface

  • init

    metapi --help
    
    usage: metapi [subcommand] [options]
    
    metapi, a metagenomics data process pipeline
    
    optional arguments:
        -h, --help     show this help message and exit
        -v, --version  print software version and exit
    
    available subcommands:
    
        init         a metagenomics project initialization
        simulation   a simulation on metagenomics data
        workflow     a workflow on real metagenomics data
    

    Please supply a samples.tsv file in the following format:

    id    fq1           fq2
    s1    s1.1.fq.gz    s1.2.fq.gz
    s2    s2.1.fq.gz    s2.2.fq.gz

    Then initialize the project:

    python /path/to/metapi/metapi/metapi.py init -d . -s samples.tsv -b raw -a metaspades
    
  • list

    snakemake --snakefile /path/to/metapi/metapi/Snakefile --configfile metaconfig.yaml --list
    
  • debug

    snakemake --snakefile /path/to/metapi/metapi/Snakefile \
        --configfile metaconfig.yaml \
        -p -r -n --debug-dag \
        --until checkm_lineage_wf
    
  • simulation

  • workflow
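
Before running `init`, it can help to check that the sample sheet has the three expected columns. A minimal sketch, assuming the file is tab-separated (as the `.tsv` name suggests) with the header `id`, `fq1`, `fq2` shown above; `read_samples` is a hypothetical helper and not part of the metapi command line interface.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "fq1", "fq2")


def read_samples(handle):
    """Parse a tab-separated sample sheet into a list of (id, fq1, fq2)."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return [(row["id"], row["fq1"], row["fq2"]) for row in reader]


# In-memory copy of the example sheet; a real run would open samples.tsv.
sheet = io.StringIO(
    "id\tfq1\tfq2\n"
    "s1\ts1.1.fq.gz\ts1.2.fq.gz\n"
    "s2\ts2.1.fq.gz\ts2.2.fq.gz\n"
)

for sample_id, fq1, fq2 in read_samples(sheet):
    print(sample_id, fq1, fq2)
```

For a file on disk, `read_samples(open("samples.tsv", newline=""))` would be the equivalent call.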
