A command-line tool to extract and filter sequence reads from BAM and FASTQ files by ID, quality score, and length.

Project description

seq-miner

seq-miner is a lightweight tool that extracts and filters reads from BAM or FASTQ files based on:

  • Specific read IDs
  • Mean quality score threshold
  • Minimum read length

It also provides:

  • Multi-threaded filtering (FASTQ only)
  • A run summary, with JSON/CSV export planned
  • Automated GitHub release tagging and PyPI publishing

Installation

pip install seq-miner

Or clone from source:

git clone https://github.com/aeiwz/seq-miner.git
cd seq-miner
pip install .

Usage

Extract reads from BAM

seq-miner -i reads.bam -o filtered.bam -f bam -r read_ids.txt --min-qscore 10 --min-length 200

Filter FASTQ reads in parallel

seq-miner -i reads.fastq -o filtered.fastq -f fastq --min-qscore 15 --min-length 1000 --threads 4

Show version

seq-miner --version
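
For batch runs, the commands above can be assembled from Python. The following is a minimal sketch that uses only the flags shown in this README; the helper name build_command is hypothetical and is not part of seq-miner.

```python
from typing import List, Optional


def build_command(
    input_path: str,
    output_path: str,
    fmt: str,
    read_ids: Optional[str] = None,
    min_qscore: float = 0.0,
    min_length: int = 0,
    threads: Optional[int] = None,
) -> List[str]:
    """Build a seq-miner argument list (hypothetical helper, not part of the package)."""
    if fmt not in ("bam", "fastq"):
        raise ValueError("fmt must be 'bam' or 'fastq'")
    cmd = [
        "seq-miner",
        "-i", input_path,
        "-o", output_path,
        "-f", fmt,
        "--min-qscore", str(min_qscore),
        "--min-length", str(min_length),
    ]
    if read_ids is not None:
        cmd += ["-r", read_ids]
    # The README states that --threads is only used for FASTQ input.
    if threads is not None and fmt == "fastq":
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_command("reads.fastq", "filtered.fastq", "fastq",
                    min_qscore=15, min_length=1000, threads=4)
print(" ".join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it, provided seq-miner is installed and on the PATH.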

Command-line Options

Option            Description
-i, --input       Input BAM or FASTQ file
-o, --output      Output file for passed reads
-f, --format      File format: bam or fastq
-r, --read-ids    Optional file with read IDs (one per line)
--min-qscore      Minimum mean Q-score (default: 0.0)
--min-length      Minimum read length (default: 0)
--threads         Number of CPU threads (only used for FASTQ)
--verbose         Enable verbose logging
--version         Print the current version
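
To illustrate how the three filter options combine, here is a minimal pure-Python sketch for FASTQ records. It is not seq-miner's implementation. It assumes Phred+33 quality encoding, well-formed four-line records, and an arithmetic mean of per-base Phred scores; the tool's actual averaging method is not documented here, and some tools average error probabilities instead. All function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Record = Tuple[str, str, str]  # (read_id, sequence, quality string)


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores (assumption: Phred+33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from well-formed four-line FASTQ entries."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        # Read ID is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], seq, qual


def passes(
    record: Record,
    wanted_ids: Optional[Set[str]] = None,
    min_qscore: float = 0.0,
    min_length: int = 0,
) -> bool:
    """Apply the ID, length, and mean-quality filters in turn."""
    read_id, seq, qual = record
    if wanted_ids is not None and read_id not in wanted_ids:
        return False
    if len(seq) < min_length:
        return False
    return mean_qscore(qual) >= min_qscore


demo = ["@r1 sample\n", "ACGT\n", "+\n", "IIII\n",
        "@r2\n", "AC\n", "+\n", "!!\n"]
kept = [r[0] for r in parse_fastq(demo) if passes(r, min_qscore=10, min_length=3)]
print(kept)
```

With the defaults (0.0 and 0), every read passes the quality and length checks, which matches the defaults listed in the table.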

Output Summary

When finished, you'll see:

Summary:
Passed reads     : 12345
Low-quality reads : 54
Short reads      : 91

JSON and CSV export of this summary is planned (coming soon).
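
Until structured export is available, the printed summary can be converted by hand. The sketch below assumes the summary text has exactly the "label : count" layout shown above; parse_summary is a hypothetical helper, not part of seq-miner.

```python
import json
from typing import Dict


def parse_summary(text: str) -> Dict[str, int]:
    """Turn 'label : count' lines into a dict, skipping lines without a count."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        value = value.strip()
        # The 'Summary:' header has no number after the colon and is skipped.
        if sep and value.isdigit():
            counts[label.strip()] = int(value)
    return counts


summary_text = """Summary:
Passed reads     : 12345
Low-quality reads : 54
Short reads      : 91
"""

print(json.dumps(parse_summary(summary_text), indent=2))
```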

Auto Version + Release

  • The version is stored in seqminer/__version__.py
  • A release tag is created automatically by GitHub Actions on each push to main
  • The package is published to PyPI when a GitHub release is created

License

MIT License © Theerayut
See LICENSE for full text.

Contact

To report a problem, please open an issue on GitHub.

Download files

Source Distribution

seq_miner-1.2.0.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

seq_miner-1.2.0-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file seq_miner-1.2.0.tar.gz.

File metadata

  • Download URL: seq_miner-1.2.0.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for seq_miner-1.2.0.tar.gz
Algorithm      Hash digest
SHA256         80804de4db3c2855d4929d3ae7573a54b11ab45f5af2b5d3eea0c0d1f2801ed9
MD5            c23c037e4d2efaeacba347636b1e679f
BLAKE2b-256    9bd43d29ed54296bc9801edd1c676e6edb7d1e26aa8f33ac7ad4c0f3551adce0

Provenance

The following attestation bundles were made for seq_miner-1.2.0.tar.gz:

Publisher: python-publish.yml on aeiwz/seq-miner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file seq_miner-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: seq_miner-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for seq_miner-1.2.0-py3-none-any.whl
Algorithm      Hash digest
SHA256         c64ffbcfd35667b974c96fe968872547638ce3944526fdfd7a1b606479b98f69
MD5            54b4bc38f3b28e49760fb9b825f341fb
BLAKE2b-256    b43c82a59039e87cb3bbee98a66b6a0902c477870c7cd54e880926f7db9ccd42
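
The SHA256 digests listed above can be checked against a downloaded file using Python's standard hashlib module. This is a minimal sketch; the function names and the local file path are assumptions for illustration.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the wheel was downloaded to the current directory):
# matches("seq_miner-1.2.0-py3-none-any.whl",
#         "c64ffbcfd35667b974c96fe968872547638ce3944526fdfd7a1b606479b98f69")
```

pip can perform the same check at install time through its hash-checking mode (--require-hashes with a requirements file).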

Provenance

The following attestation bundles were made for seq_miner-1.2.0-py3-none-any.whl:

Publisher: python-publish.yml on aeiwz/seq-miner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
