
Sequence analysis and bioinformatics utilities in Python and Perl.


A hybrid Python + Perl bioinformatics toolkit for sequence alignment, Markov modeling, sequence analysis, and FM-index-based search, unified under a single Python CLI and API.




Quick Start

1. Clone

git clone https://github.com/bibymaths/bio-sea-pearl.git
cd bio-sea-pearl

2. Install Dependencies

This project uses Python ≥ 3.10 and is built with Hatch.

With uv (recommended)

uv pip install -e ".[dev]"

With pip

pip install -e ".[dev]"

Note: Some features (alignment, Markov generation) delegate to Perl scripts at runtime. Install Perl ≥ 5.26 if you need those features.


3. Run the CLI

The unified CLI entrypoint is:

biosea --help

CLI Usage

Alignment

biosea align seq1.fa seq2.fa \
  --matrix alignment/scoring/blosum62.mat \
  --mode global

⚠️ Notes:

  • Matrix filenames are case-sensitive
  • Use blosum62.mat, not BLOSUM62.mat
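For orientation, global alignment is conventionally scored with Needleman-Wunsch-style dynamic programming. The sketch below is an illustration only: the function name global_alignment_score and the flat match/mismatch/gap scores are assumptions, it does not read a matrix file such as blosum62.mat, and it is not taken from the project's Perl scripts.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Hypothetical helper for illustration; scores are flat values,
    not entries from a substitution matrix.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGGT"))  # 2: three matches, one mismatch
```

Swapping the flat scores for a lookup into a substitution matrix changes only the line that computes sub.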

Optionally, generate a dot plot in SVG format:

perl alignment/bin/dotplot.pl align.matrix.tsv dotplot.svg

Markov Chain

biosea markov \
  --fasta seq1.fa \
  --length 100 \
  --start A \
  --order 1 \
  --method alias

For higher-order models:

--order 2 --start AA

⚠️ Constraint:

  • start length must equal order
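The constraint follows from how an order-k model works: each next symbol is drawn conditioned on the previous k symbols, so the start state must supply exactly k of them. The sketch below illustrates this under stated assumptions: train and generate are hypothetical names, sampling uses random.choices rather than the alias method selected by --method alias, and the error text merely mirrors the message quoted in Troubleshooting.

```python
import random
from collections import defaultdict


def train(sequence: str, order: int) -> dict:
    """Count which symbol follows each length-`order` state (order >= 1)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence) - order):
        state = sequence[i:i + order]
        counts[state][sequence[i + order]] += 1
    return counts


def generate(counts: dict, start: str, order: int, length: int, seed: int = 0) -> str:
    """Extend `start` up to `length` symbols by sampling from the transition counts."""
    if len(start) != order:
        raise ValueError(f"Start state must be length {order}")
    rng = random.Random(seed)  # seeded so runs are reproducible
    out = start
    while len(out) < length:
        followers = counts.get(out[-order:])
        if not followers:
            break  # state never seen in training data; stop early
        symbols = list(followers)
        weights = [followers[s] for s in symbols]
        out += rng.choices(symbols, weights=weights)[0]
    return out


model = train("ACGTACGTACGT", order=1)
print(generate(model, start="A", order=1, length=20))
```

In this toy training string every state has a single possible successor, so the output simply repeats ACGT; real sequences give genuinely random draws.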

Sequence Utilities

# Hamming distance
biosea seqtools hamming ACGT AGGT

# Levenshtein distance
biosea seqtools levenshtein kitten sitting

# k-mer counts
biosea seqtools kmer ACGTACGT --k 3
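As a reference for what these three commands compute, here is a minimal pure-Python sketch. The function names are illustrative assumptions and are not the project's seqtools_py API.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(hamming("ACGT", "AGGT"))           # 1
print(levenshtein("kitten", "sitting"))  # 3
print(dict(kmer_counts("ACGTACGT", 3)))  # {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}
```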

BWT / FM-Index Search

biosea bwt search \
  --sequence ACGTACGT \
  --pattern CGT
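To show the idea behind FM-index search, the sketch below builds a Burrows-Wheeler transform from a naively sorted suffix array and runs backward search over it. It is a teaching sketch under stated assumptions: fm_search and bwt_and_suffix_array are hypothetical names, the construction is quadratic, and the project's bwt module may be organized quite differently.

```python
def bwt_and_suffix_array(text: str):
    """Return the BWT of text + '$' and the matching suffix array."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    return bwt, sa


def fm_search(text: str, pattern: str) -> list:
    """Return sorted start positions of pattern in text via backward search."""
    bwt, sa = bwt_and_suffix_array(text)
    alphabet = sorted(set(bwt))
    # first[c]: number of characters in the text that sort before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += bwt.count(c)
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    lo, hi = 0, len(bwt)  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


print(fm_search("ACGTACGT", "CGT"))  # [1, 5]
```

Each pattern character narrows the suffix-array range in constant time once the tables exist, which is why query cost depends on the pattern length rather than the text length.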

REST API

Start the FastAPI server:

uvicorn api.server:app --reload

Endpoints:

  • POST /align
  • POST /markov
  • POST /distance
  • POST /kmer
  • POST /bwt/search

Example:

curl -X POST http://localhost:8000/distance \
  -H "Content-Type: application/json" \
  -d '{"seq1": "kitten", "seq2": "sitting", "metric": "levenshtein"}'

Interactive API documentation is available at http://localhost:8000/docs.
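The same request can be sent from Python using only the standard library. This is a sketch: the helper names are assumptions, the payload fields are copied from the curl example, and because the response schema is not documented here the JSON is returned unparsed beyond decoding.

```python
import json
from urllib import request


def build_distance_request(seq1: str, seq2: str, metric: str = "levenshtein",
                           base_url: str = "http://localhost:8000") -> request.Request:
    """Build (but do not send) a POST /distance request."""
    payload = json.dumps({"seq1": seq1, "seq2": seq2, "metric": metric}).encode("utf-8")
    return request.Request(
        f"{base_url}/distance",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_distance(seq1: str, seq2: str, metric: str = "levenshtein"):
    """Send the request; requires the uvicorn server to be running locally."""
    req = build_distance_request(seq1, seq2, metric)
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling post_distance("kitten", "sitting") with the server running should return whatever JSON the /distance endpoint produces; check http://localhost:8000/docs for the exact fields.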


Docker

Build and start

./docker_up.sh

Stop and remove

./docker_down.sh

Interactive shell

./docker_interactive.sh

Manual Docker commands

docker compose up --build -d
docker compose exec biosea biosea --help
docker compose down

Project Structure

src/bio_sea_pearl/
├── cli.py                 # Unified CLI
├── api/                   # Clean Python API layer
├── perl_wrappers/         # Bridge to legacy Perl scripts
├── seqtools_py/           # Python ports of core algorithms
└── bwt/                   # Native Python FM-index

alignment/                 # Legacy alignment tools
markov/                    # Perl Markov models
seqtools/                  # Perl sequence utilities
api/server.py              # FastAPI layer
docs/                      # MkDocs documentation
tests/                     # Unit + integration tests

Architecture Overview

The system is layered:

CLI / API
   ↓
Python API Layer
   ↓
Wrappers (subprocess)
   ↓
Perl + Python legacy tools

This design:

  • preserves legacy code
  • enables gradual Python migration
  • provides production-ready interfaces
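The wrapper layer in the diagram above can be pictured as a thin function around subprocess. The sketch below is an assumption about the general pattern, not the project's perl_wrappers code: run_tool is a hypothetical name, and the demo launches the current Python interpreter so it runs without Perl installed.

```python
import subprocess
import sys


def run_tool(command: list, timeout: float = 60.0) -> str:
    """Run an external tool and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # inspect the return code ourselves for a clearer error
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in for a real tool invocation such as ["perl", "<script>", ...].
print(run_tool([sys.executable, "-c", "print('ok')"]))  # ok
```

Keeping the subprocess call in one place means a Perl script can later be replaced by a native Python function without changing the CLI or API callers.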

Running Tests

pytest

Building

pip install build
python -m build

This produces a source distribution and wheel in dist/.


Releasing

Releases are automated via GitHub Actions. Push a tag to trigger the workflow:

git tag v0.1.0
git push origin v0.1.0

This will:

  1. Run the test suite
  2. Create a GitHub release
  3. Build and publish the package to PyPI
  4. Build and push multi-arch Docker images to ghcr.io/bibymaths/bio-sea-pearl

Troubleshooting

Alignment fails

  • Check matrix path:

    alignment/scoring/blosum62.mat
    
  • Avoid uppercase filenames


Markov fails

Error:

Start state must be length N

Fix:

Pass a --start string of exactly N characters, where N is the value given to --order.

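CLI not found

Reinstall the package in editable mode so that the biosea entry point is registered, then verify it: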

pip install -e .
biosea --help

License

This project is licensed under the MIT License. See LICENSE for details.
