Trim lots of metagenomics samples all at once.

Motivation

We keep writing pipelines that start with read trimming. Rather than copy-pasting code each time, this standalone Snaketool handles our trimming needs. The tool will collect sample names and files from a directory or TSV file, optionally remove host reads, and trim with your favourite read trimmer. Read trimming methods supported so far:

  • Fastp
  • Prinseq++
  • BBtools for Round A/B viral metagenomics
  • Filtlong + Rasusa for long reads

Install

Trimnami is still in development but can be easily installed with pip:

Easy install

pip install trimnami

Developer install

git clone https://github.com/beardymcjohnface/Trimnami.git
cd Trimnami/
pip install -e .

Test

Trimnami comes with built-in tests that you can run to check that everything works.

# test fastp only (default method)
trimnami test

# test all short-read (SR) methods
trimnami test fastp prinseq roundAB

# test all short-read (SR) methods with host removal
trimnami testhost fastp prinseq roundAB

# test nanopore method (with host removal)
trimnami testnp

Usage

Trim reads with Fastp or Prinseq++

# Fastp (default)
trimnami run --reads reads/

# Prinseq++
trimnami run --reads reads/ prinseq

# Why not both!
trimnami run --reads reads/ fastp prinseq

Include host removal

trimnami run --reads reads/ --host host_genome.fasta

Long reads with host removal. Specify 'nanopore' as the target and pass the appropriate minimap preset with --minimap.

trimnami run \
    --reads reads/ \
    --host host_genome.fasta \
    --minimap map-ont \
    nanopore

Parsing samples with --reads

You can pass either a directory of reads or a TSV file to --reads.

  • Directory: Trimnami will infer sample names and _R1/_R2 pairs from the filenames.
  • TSV file: Trimnami expects 2 or 3 columns. Column 1 is the sample name, and columns 2 and 3 are the read files.
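
As an illustration of the TSV layout described above, the following sketch builds a samples file from a directory of reads. The file names (sampleA, sampleB), the _R1.fastq.gz/_R2.fastq.gz suffixes, and the absence of a header row are assumptions made for this example, not details confirmed by Trimnami; adjust them to match your data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo directory with empty placeholder files (not real reads).
reads_dir="$(mktemp -d)"
touch "$reads_dir/sampleA_R1.fastq.gz" "$reads_dir/sampleA_R2.fastq.gz" \
      "$reads_dir/sampleB_R1.fastq.gz" "$reads_dir/sampleB_R2.fastq.gz"

samples_tsv="$reads_dir/samples.tsv"
: > "$samples_tsv"   # start with an empty file

for r1 in "$reads_dir"/*_R1.fastq.gz; do
    # Sample name = file name without the assumed _R1.fastq.gz suffix.
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ -f "$r2" ]; then
        # Paired reads: three tab-separated columns.
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$samples_tsv"
    else
        # No R2 partner found: two tab-separated columns.
        printf '%s\t%s\n' "$sample" "$r1" >> "$samples_tsv"
    fi
done

cat "$samples_tsv"
```

The resulting file can then be passed to --reads in place of a directory, e.g. trimnami run --reads samples.tsv.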

More information and examples here

Configure trimming parameters

You can customise the trimming parameters via the config file. First, copy the default config file:

trimnami config

Then edit the config file trimnami.out/trimnami.config.yaml in your favourite text editor. Run Trimnami as normal, or point it to your custom config file if you've moved it.

trimnami run ... --configfile /my/awesome/config.yaml

Outputs

Trimmed reads are saved in subfolders of the output directory. For example, trimming with Fastp or Prinseq++ writes trimmed reads to trimnami.out/fastp/ or trimnami.out/prinseq/. Paired reads yield three files: the R1 and R2 paired reads, and any singletons from trimming or host removal. Subsampling produces extra files of subsampled trimmed reads. MultiQC/FastQC reports for each run are available in trimnami.out/reports/.
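
A quick way to sanity-check trimmed outputs is to count the reads in each file. The sketch below is not part of Trimnami: count_reads is a hypothetical helper, the demo file is a two-read placeholder created only for illustration, and the count assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo only: a tiny gzipped FASTQ standing in for a trimmed output file.
out_dir="$(mktemp -d)/trimnami.out/fastp"
mkdir -p "$out_dir"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' \
    | gzip > "$out_dir/demo.single.fastq.gz"

# Hypothetical helper: number of reads in a gzipped FASTQ file,
# assuming each record spans exactly four lines.
count_reads() {
    local lines
    lines="$(gzip -cd "$1" | wc -l)"
    echo $(( lines / 4 ))
}

# Print "file name <TAB> read count" for every trimmed file in the folder.
for fq in "$out_dir"/*.fastq.gz; do
    printf '%s\t%s\n' "$(basename "$fq")" "$(count_reads "$fq")"
done
```

To use this on real results, point out_dir at the relevant subfolder of your own output directory, such as trimnami.out/fastp or trimnami.out/prinseq.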

Example outputs

prinseq

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.S.fastq.gz
    └── A13-135-177-06_AGTTCC.single.fastq.gz

prinseq with fastqc reports

trimnami.out/
├── prinseq
│   ├── A13-04-182-06_TAGCTT.paired.R1.fastq.gz
│   ├── A13-04-182-06_TAGCTT.paired.R2.fastq.gz
│   ├── A13-04-182-06_TAGCTT.paired.S.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.R1.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.R2.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.S.fastq.gz
│   └── A13-135-177-06_AGTTCC.single.fastq.gz
└── reports
    ├── prinseq.fastqc.html
    └── untrimmed.fastqc.html

prinseq with host removal

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.fastq.gz
    └── A13-135-177-06_AGTTCC.host_rm.single.fastq.gz

prinseq with host removal and subsampling

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.subsampled.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.subsampled.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.subsampled.fastq.gz
    ├── A13-135-177-06_AGTTCC.host_rm.single.fastq.gz
    └── A13-135-177-06_AGTTCC.host_rm.single.subsampled.fastq.gz
