
Soft-clip primer sites for SAM/BAM files generated from amplicon sequencing runs

align_trim

Standalone version of ARTIC's fieldbioinformatics align_trim.py

Installation

From conda

conda install bioconda::align_trim 

From PyPI

pip install align_trim

From source

git clone https://github.com/artic-network/align_trim.git
cd align_trim
uv sync
uv run align_trim --help

Command Line Interface

Basic Usage

align_trim [OPTIONS] BEDFILE

The tool reads alignment data from either a SAM/BAM file or stdin and outputs trimmed alignments to stdout in SAM format by default.

Required Arguments

  • BEDFILE: BED file containing the amplicon primer scheme in v3 format.

Optional Arguments

Input/Output Options

  • --samfile, -s : Sorted SAM/BAM file containing the aligned reads. If not provided (or '-'), align_trim reads from stdin.
  • --output, -o : Output file path. The format is determined by the extension (.sam/.bam). If not provided (or '-'), SAM is written to stdout.

Processing Options

  • --normalise, -n : Normalise to target depth N per amplicon using a greedy per-read algorithm. Each read is kept only if it brings the amplicon depth closer to the target. Use 0 for no normalisation (default: 0)
  • --min-mapq, -m : Minimum mapping quality to keep an aligned read (default: 20)
  • --primer-match-threshold, -p : Add this many bases of padding to the 5' end of primer coordinates to allow fuzzy matching for reads with barcodes/adapters (default: 35)
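
The greedy per-read normalisation described for --normalise can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not align_trim's actual code: reads are reduced to half-open (start, end) intervals on a single amplicon, depth is a plain per-position list, and "closer to the target" is taken to mean that the summed distance |depth - target| goes down. The name greedy_normalise is hypothetical.

```python
def greedy_normalise(reads, amplicon_start, amplicon_end, target):
    """Keep a read only if adding it reduces the total distance to the target depth.

    reads: iterable of (start, end) half-open reference intervals (assumed layout).
    Returns the list of reads that were kept, in input order.
    """
    depth = [0] * (amplicon_end - amplicon_start)
    kept = []
    for start, end in reads:
        # Clip the read to the amplicon so list indexing stays in range.
        lo = max(start, amplicon_start) - amplicon_start
        hi = min(end, amplicon_end) - amplicon_start
        if lo >= hi:
            continue  # read does not overlap this amplicon
        # Change in sum(|depth - target|) if this read were added:
        # positions below target move closer (-1), all others move away (+1).
        delta = sum(-1 if depth[i] < target else 1 for i in range(lo, hi))
        if delta < 0:
            for i in range(lo, hi):
                depth[i] += 1
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Five identical full-length reads, target depth 3: the fourth and fifth are dropped.
    print(len(greedy_normalise([(0, 10)] * 5, 0, 10, target=3)))
```

Because the decision is made read by read in input order, the result depends on read order. The sketch also does not model the CLI's special case where 0 disables normalisation.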

Primer and Read Handling

  • --no-trim-primers : Do not trim primers from reads (by default, primers are trimmed)
  • --allow-incorrect-pairs : Allow reads to be assigned to amplicons even if primers are not correctly paired
  • --require-full-length : Require all reads to start and stop in primer sites (do not use with rapid barcoding)
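
To illustrate how the padding from --primer-match-threshold and the pairing check behind --allow-incorrect-pairs might fit together, here is a small sketch. Everything in it is an assumption made for illustration: the Primer class, the find_primer and classify_read helpers, the 0-based half-open coordinates, and the "first match wins" rule are hypothetical and are not taken from align_trim's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Primer:
    amplicon: int   # amplicon number this primer belongs to
    direction: str  # "LEFT" (forward strand) or "RIGHT" (reverse strand)
    start: int      # 0-based, half-open reference coordinates (assumed)
    end: int


def find_primer(pos: int, direction: str, primers, padding: int = 35) -> Optional[Primer]:
    """Return the first primer of the given direction whose padded span contains pos."""
    for p in primers:
        if p.direction != direction:
            continue
        # Pad only the 5' side: upstream of a LEFT primer, downstream of a RIGHT primer.
        lo = p.start - padding if direction == "LEFT" else p.start
        hi = p.end if direction == "LEFT" else p.end + padding
        if lo <= pos < hi:
            return p
    return None


def classify_read(read_start: int, read_end: int, primers, padding: int = 35):
    """Return (left_primer, right_primer, correctly_paired) for a read's alignment ends."""
    left = find_primer(read_start, "LEFT", primers, padding)
    right = find_primer(read_end - 1, "RIGHT", primers, padding)
    paired = left is not None and right is not None and left.amplicon == right.amplicon
    return left, right, paired


if __name__ == "__main__":
    scheme = [
        Primer(1, "LEFT", 30, 54), Primer(1, "RIGHT", 385, 410),
        Primer(2, "LEFT", 320, 342), Primer(2, "RIGHT", 704, 726),
    ]
    print(classify_read(10, 420, scheme)[2])   # both ends fall in amplicon 1
    print(classify_read(10, 730, scheme)[2])   # starts in amplicon 1, ends in amplicon 2
```

Under this model, a strict full-length filter would drop any read for which either primer is None, which is why such a filter does not suit fragmented reads from rapid barcoding.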

Output and Reporting

  • --report, -r : Output detailed report TSV to specified filepath
  • --amp-depth-report, -a : Output mean depth for each amplicon as TSV to specified filepath
  • --genome-coverage-report, -g : Output per-position genome coverage TSV(s) using the given prefix. Summary statistics (% genome covered at >=1x, >=10x, >=20x, >=100x) are printed to stderr. See Genome Coverage Report for details
  • --no-read-groups : Do not divide reads into pool-based read groups in SAM/BAM output

General Options

  • --verbose, -v : Enable debug mode with detailed logging to stderr
  • --version : Show version information
  • --help : Show help message

Examples

Basic trimming with primer removal

align_trim primers.bed --samfile input.bam --output trimmed.bam

Normalise coverage and generate reports

align_trim primers.bed --samfile input.bam --normalise 100 \
  --report alignment_report.tsv --amp-depth-report depth_report.tsv \
  --genome-coverage-report sample1 \
  --output normalized.bam

Process from stdin with verbose output

samtools view -h input.bam | align_trim primers.bed --verbose > trimmed.sam 2> verbose.out.txt

Strict full-length read filtering

align_trim primers.bed --samfile input.bam --require-full-length \
  --min-mapq 30 --output filtered.bam

Allow mismatched primer pairs with custom threshold

align_trim primers.bed --samfile input.bam --allow-incorrect-pairs \
  --primer-match-threshold 50 --output relaxed.bam

Output Formats

The output format is chosen from the extension of the --output path:

  • .sam - SAM format (text)
  • .bam - BAM format (binary, compressed)
  • Not provided or '-' - SAM format written to stdout
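
The extension rule above could be expressed along these lines. This is only a sketch: the helper name output_mode is hypothetical, the mode strings "w" and "wb" are the text/binary write modes used by pysam.AlignmentFile (whether align_trim opens its output this way is an assumption), and rejecting other extensions is a choice made here, not documented behaviour.

```python
from pathlib import Path
from typing import Optional, Tuple


def output_mode(path: Optional[str]) -> Tuple[str, str]:
    """Map an --output value to a (destination, write mode) pair.

    "-" as the destination stands for stdout; "w" means SAM text, "wb" means BAM.
    """
    if path is None or path == "-":
        return "-", "w"
    suffix = Path(path).suffix.lower()
    if suffix == ".bam":
        return path, "wb"
    if suffix == ".sam":
        return path, "w"
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"unsupported output extension: {suffix!r}")


if __name__ == "__main__":
    for candidate in (None, "-", "trimmed.sam", "trimmed.bam"):
        print(candidate, output_mode(candidate))
```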

Report Files

When using --report, a tab-separated file is generated with the following columns:

  • chrom: Reference chromosome/contig
  • QueryName: Read name
  • ReferenceStart/ReferenceEnd: Alignment coordinates
  • PrimerPair: Primer pair assignment
  • Primer1/Primer2: Individual primer information
  • CorrectlyPaired: Boolean indicating proper primer pairing
  • Additional alignment metrics
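
A report with these columns can be summarised with the standard library. The sketch below assumes the file has a header row using the column names listed above and that CorrectlyPaired is written as text such as "True" or "1"; neither detail is confirmed here, and the function name summarise_report and the demo rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarise_report(handle):
    """Return (total rows, correctly paired rows, reads per PrimerPair)."""
    per_pair = Counter()
    correctly_paired = 0
    total = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        total += 1
        per_pair[row["PrimerPair"]] += 1
        # Assumption: the boolean is serialised as "True"/"False" or "1"/"0".
        if row["CorrectlyPaired"].strip().lower() in {"true", "1"}:
            correctly_paired += 1
    return total, correctly_paired, per_pair


if __name__ == "__main__":
    # Invented rows; a real report contains more columns than shown here.
    demo = io.StringIO(
        "chrom\tQueryName\tPrimerPair\tCorrectlyPaired\n"
        "ref\tread1\tpair_1\tTrue\n"
        "ref\tread2\tpair_1\tFalse\n"
        "ref\tread3\tpair_2\tTrue\n"
    )
    total, ok, per_pair = summarise_report(demo)
    print(total, ok, dict(per_pair))
```

For a real file, pass an open handle instead, for example open("alignment_report.tsv", newline="").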

The --amp-depth-report generates a summary of coverage depth per amplicon.

Genome Coverage Report

When using --genome-coverage-report PREFIX, per-position genome coverage is written as tab-separated files with columns chrom, pos (1-based), and depth.

Without --normalise, a single file is produced:

  • PREFIX.pre-normalisation.coverage.tsv — coverage of all reads passing filtering and trimming

With --normalise, two files are produced:

  • PREFIX.pre-normalisation.coverage.tsv — coverage of all reads passing filtering and trimming (before normalisation subsampling)
  • PREFIX.post-normalisation.coverage.tsv — coverage of reads retained after normalisation

In both cases, a coverage summary is printed to stderr showing the percentage of genome positions covered at >=1x, >=10x, >=20x, >=100x, and >=1000x.
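
The same kind of summary can be recomputed from a coverage TSV. The sketch below assumes the file starts with a header row naming the chrom, pos, and depth columns and that every genome position is listed, including positions with zero depth; if either assumption does not hold for your files, the percentages will differ from those printed by align_trim. The function name coverage_summary is hypothetical.

```python
import csv
import io


def coverage_summary(handle, thresholds=(1, 10, 20, 100, 1000)):
    """Return {threshold: percent of listed positions with depth >= threshold}."""
    depths = [int(row["depth"]) for row in csv.DictReader(handle, delimiter="\t")]
    if not depths:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(d >= t for d in depths) / len(depths)
        for t in thresholds
    }


if __name__ == "__main__":
    # Invented four-position genome for illustration.
    demo = io.StringIO(
        "chrom\tpos\tdepth\n"
        "ref\t1\t0\n"
        "ref\t2\t5\n"
        "ref\t3\t15\n"
        "ref\t4\t150\n"
    )
    for threshold, percent in coverage_summary(demo).items():
        print(f">={threshold}x: {percent:.1f}%")
```

Running it on both the pre- and post-normalisation files shows how much breadth of coverage, if any, is lost to subsampling.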

