Lightweight MSA wrapper with position-aware slicing

MSAFrame

Lightweight Python library for working with Multiple Sequence Alignments. Query by residue position, not column index.

from msaframe import MSAFrame, read_fasta

msa = MSAFrame({
    "human":  "MALW--KTGV",
    "mouse":  "MSLWGAKTGV",
    "fish":   "M-LWGA-TGV",
})

# "What's at position 3 in human across all species?"
msa.at("human", 3)  # {"human": "W", "mouse": "W", "fish": "W"}

# Slice by residue positions in a reference (end is exclusive)
msa.slice("human", 2, 6)  # {"human": "LW--KT", "mouse": "LWGAKT", "fish": "LWGA-T"}

The Problem

Alignment columns don't match residue positions. A mutation at "position 150" in your protein means nothing in a gapped alignment—is it column 150? 167? 183? It depends on where gaps fall upstream.

MSAFrame maintains bidirectional mappings between ungapped positions and alignment columns, so you can work in the coordinate system that matters: your reference sequence.
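As a rough illustration of what such a mapping involves, the sketch below builds both lookup directions for a single gapped sequence in plain Python. It is not MSAFrame's actual implementation: the helper name `build_maps` and the 0-based indexing are assumptions made for this example.

```python
from __future__ import annotations


def build_maps(gapped: str, gap_char: str = "-") -> tuple[list[int], list[int | None]]:
    """Build both lookup tables for one gapped sequence (0-based indices).

    Hypothetical helper for illustration; not part of msaframe.
    """
    pos_to_col: list[int] = []         # ungapped position -> alignment column
    col_to_pos: list[int | None] = []  # alignment column -> ungapped position
    for col, char in enumerate(gapped):
        if char == gap_char:
            col_to_pos.append(None)    # a gap column has no residue position
        else:
            col_to_pos.append(len(pos_to_col))
            pos_to_col.append(col)
    return pos_to_col, col_to_pos


pos_to_col, col_to_pos = build_maps("MALW--KTGV")
print(pos_to_col)  # [0, 1, 2, 3, 6, 7, 8, 9]
print(col_to_pos)  # [0, 1, 2, 3, None, None, 4, 5, 6, 7]
```

Building the two lists takes one pass over the sequence, after which a lookup in either direction is a single list index.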

Installation

pip install msaframe

Or with uv:

uv pip install msaframe

Zero dependencies: core functionality needs only Python 3.10+. pandas and polars integration is optional and becomes available when those libraries are installed.

Usage

Creating an MSAFrame

From a dictionary:

from msaframe import MSAFrame

msa = MSAFrame({
    "seq1": "MALW--KTGV",
    "seq2": "MSLWGAKTGV",
    "seq3": "M-LWGA-TGV",
})

From a FASTA file:

from msaframe import MSAFrame, read_fasta

seqs = read_fasta("alignment.fasta")
msa = MSAFrame(seqs)
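For readers who want to see what an `{id: sequence}` dict built from aligned FASTA looks like, here is a minimal sketch. The function `parse_fasta_text` is hypothetical and is not msaframe's `read_fasta`: it takes text rather than a path, and it assumes the first whitespace-delimited token of each header line is the sequence ID.

```python
def parse_fasta_text(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {id: sequence} (illustrative only)."""
    chunks: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks[current] = []          # start a new record
        elif current is not None:
            chunks[current].append(line)  # sequences may wrap over several lines
    return {name: "".join(parts) for name, parts in chunks.items()}


example = """>human Homo sapiens
MALW--
KTGV
>mouse
MSLWGAKTGV
"""
seqs = parse_fasta_text(example)
widths = {len(seq) for seq in seqs.values()}
print(seqs)    # {'human': 'MALW--KTGV', 'mouse': 'MSLWGAKTGV'}
print(widths)  # {10}
```

A single value in `widths` is a quick sanity check that the file really is an alignment, since every row of an MSA has the same number of columns.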

Position-Aware Queries

The core API lets you query by ungapped position in a reference sequence:

# Get all residues at position 5 in the reference
msa.at("human", 5)  # {"human": "T", "mouse": "T", "fish": "T"}

# Slice positions 10-20 in the reference
msa.slice("human", 10, 20)

# Convert between positions and columns
col = msa.col("human", 5)   # ungapped position → alignment column
pos = msa.pos("human", col)  # alignment column → ungapped position (None if gap)
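One practical use of chaining the two conversions is translating a residue position from one sequence to another. The sketch below does this in plain Python so it runs without the library. The helpers `col_of`, `pos_of`, and `translate_position` are hypothetical names, and 0-based positions are an assumption.

```python
from __future__ import annotations


def col_of(gapped: str, pos: int, gap_char: str = "-") -> int:
    """Alignment column (0-based) holding the pos-th (0-based) residue."""
    seen = -1
    for col, char in enumerate(gapped):
        if char != gap_char:
            seen += 1
            if seen == pos:
                return col
    raise IndexError(f"position {pos} is out of range")


def pos_of(gapped: str, col: int, gap_char: str = "-") -> int | None:
    """Ungapped position (0-based) at an alignment column, or None for a gap."""
    if gapped[col] == gap_char:
        return None
    return sum(1 for char in gapped[:col] if char != gap_char)


def translate_position(seqs: dict[str, str], src: str, dst: str, pos: int) -> int | None:
    """Map a residue position in src to the aligned position in dst."""
    return pos_of(seqs[dst], col_of(seqs[src], pos))


seqs = {"human": "MALW--KTGV", "mouse": "MSLWGAKTGV", "fish": "M-LWGA-TGV"}
print(translate_position(seqs, "human", "mouse", 4))  # 6
print(translate_position(seqs, "human", "fish", 4))   # None
print(translate_position(seqs, "human", "fish", 5))   # 5
```

The `None` result for fish reflects that the human residue is aligned against a gap there, which is the same case in which the column-to-position conversion has no answer.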

Basic Operations

# Sequence access
msa["human"]                    # gapped sequence
msa.seq("human")                # gapped sequence
msa.seq("human", ungapped=True) # without gaps

# Metadata
msa.ids           # list of sequence IDs
msa.width         # alignment width (columns)
len(msa)          # number of sequences
msa.length("human")  # ungapped length of a sequence

# Column access (by alignment column index)
msa.column(42)    # {"human": "K", "mouse": "R", ...}

DataFrame Export

Convert to pandas or polars for further analysis:

df = msa.to_pandas()  # rows = sequences, columns = positions
df = msa.to_polars()

Each method raises ImportError if the corresponding library isn't installed, so neither pandas nor polars is a hard dependency.
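A common way to get this behavior is to import the optional library only when the export is called. The sketch below shows that pattern in general terms. `optional_import` is a hypothetical helper, not part of msaframe, and a standard-library module stands in for pandas or polars so the example runs anywhere.

```python
import importlib
from types import ModuleType


def optional_import(name: str) -> ModuleType:
    """Import a module on first use, with an actionable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{name} is needed for this export; try `pip install {name}`"
        ) from exc


# A standard-library module stands in for pandas/polars here.
json = optional_import("json")
print(json.dumps({"ok": True}))  # {"ok": true}

# A module that does not exist triggers the friendly error.
try:
    optional_import("surely_missing_module_for_demo")
except ImportError as exc:
    message = str(exc)
    print(message)
```

Because the import happens inside the function, installing the package never pulls in the optional library, and users who never call the export never see the error.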

CLI

MSAFrame includes a colorful terminal interface for exploring alignments interactively:

msaframe alignment.fasta

Or query directly from the command line:

# Show position 150 in BRCA1
msaframe alignment.fasta brca1:150

# Show positions 100-120
msaframe alignment.fasta brca1:100-120

# Multiple positions (useful for viewing mutations)
msaframe alignment.fasta brca1:95,130,185
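To make the three query shapes concrete, here is a small sketch of how a `name:positions` argument could be split apart. The `parse_query` function and its return format are assumptions for illustration, not msaframe's actual CLI code, and the sketch does not decide whether a range end is inclusive.

```python
def parse_query(query: str) -> tuple[str, str, list[int]]:
    """Split 'name:150', 'name:100-120', or 'name:95,130,185' (illustrative only)."""
    # rpartition splits on the last colon, so IDs containing ':' still work.
    name, sep, spec = query.rpartition(":")
    if not sep or not name or not spec:
        raise ValueError(f"expected NAME:POSITIONS, got {query!r}")
    if "-" in spec:
        start, end = spec.split("-", 1)
        return name, "range", [int(start), int(end)]
    return name, "positions", [int(part) for part in spec.split(",")]


print(parse_query("brca1:150"))         # ('brca1', 'positions', [150])
print(parse_query("brca1:100-120"))     # ('brca1', 'range', [100, 120])
print(parse_query("brca1:95,130,185"))  # ('brca1', 'positions', [95, 130, 185])
```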

Interactive Mode

MSA Explorer
MSAFrame(150 seqs × 892 cols)

no ref> ref brca1
Reference: BRCA1_HUMAN (863 aa)

BRCA1_HUMAN> 150
BRCA1_HUMAN position 150 (column 167):
  BRCA1_HUMAN: K

  K (142): BRCA1_MOUSE, BRCA1_RAT, BRCA1_BOVIN, ...
  R (5): BRCA1_XENLA, BRCA1_XENTR, ...
  - (3): BRCA1_DANRE, ...

BRCA1_HUMAN> 140-160
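The per-residue summary in the session above can be reproduced from any `{id: residue}` dict, such as the ones the position and column queries return. The sketch below is one way to do it. `group_by_residue` is a hypothetical helper and the sample IDs are made up for the example.

```python
from collections import defaultdict


def group_by_residue(column: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Group sequence IDs by residue, most common residue first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, residue in column.items():
        groups[residue].append(name)
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


column = {"human": "K", "mouse": "K", "rat": "K", "frog": "R", "zebrafish": "-"}
for residue, names in group_by_residue(column):
    print(f"{residue} ({len(names)}): {', '.join(names)}")
# K (3): human, mouse, rat
# R (1): frog
# - (1): zebrafish
```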

Commands:

  • ref <name> — Set reference sequence (fuzzy matching)
  • <pos> or pos <n> — Show residues at position
  • <start>-<end> or slice <s> <e> — Show a range
  • col <n> — Show alignment column by index
  • seq <name> — Show full sequence
  • find <query> — Search sequence IDs
  • list — List all sequences
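The README does not say how the fuzzy matching in `ref <name>` works, so the following is only a sketch of one plausible approach: exact match first, then case-insensitive substring, then the closest match from the standard library's `difflib`. The function `resolve_id` and its tie-breaking rule (shortest ID, then alignment order) are assumptions.

```python
from __future__ import annotations

import difflib


def resolve_id(query: str, ids: list[str]) -> str | None:
    """Resolve a loosely typed name to a sequence ID (illustrative only)."""
    q = query.lower()
    for seq_id in ids:
        if seq_id.lower() == q:
            return seq_id                      # exact, case-insensitive
    partial = [seq_id for seq_id in ids if q in seq_id.lower()]
    if partial:
        return min(partial, key=len)           # shortest ID containing the query
    lowered = {seq_id.lower(): seq_id for seq_id in ids}
    close = difflib.get_close_matches(q, list(lowered), n=1, cutoff=0.6)
    return lowered[close[0]] if close else None


ids = ["BRCA1_HUMAN", "BRCA1_MOUSE", "BRCA2_HUMAN"]
print(resolve_id("brca1", ids))        # BRCA1_HUMAN
print(resolve_id("brca1_mouse", ids))  # BRCA1_MOUSE
print(resolve_id("xyz", ids))          # None
```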

The display uses Clustal X coloring: hydrophobic (blue), polar (green), positive (red), negative (magenta), and special residues (G=orange, P=yellow, C=pink).
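A lookup table is enough to express the color classes listed above. In the sketch below, the residue membership of each class follows the commonly published Clustal X groups and is an assumption, as is the name `color_of`. The real Clustal X scheme also depends on column conservation, which this simplified version ignores.

```python
from __future__ import annotations

# Residue groups are assumed from the standard Clustal X scheme.
COLOR_GROUPS = [
    ("AILMFWV", "blue"),     # hydrophobic
    ("NQST", "green"),       # polar
    ("KR", "red"),           # positive
    ("DE", "magenta"),       # negative
    ("G", "orange"),
    ("P", "yellow"),
    ("C", "pink"),
]
COLOR_BY_RESIDUE = {
    residue: color for residues, color in COLOR_GROUPS for residue in residues
}


def color_of(residue: str) -> str | None:
    """Color name for a residue, or None for gaps and unclassified residues."""
    return COLOR_BY_RESIDUE.get(residue.upper())


print(color_of("K"))  # red
print(color_of("g"))  # orange
print(color_of("-"))  # None
```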

Design Philosophy

BYOMSA (Bring Your Own MSA): MSAFrame doesn't align sequences—use MAFFT, Clustal, MUSCLE, or whatever you prefer. MSAFrame works with the result.

Zero Dependencies: The core library is a single file with no required dependencies. pandas and polars are optional extras for DataFrame export.

Position-Aware: It's often easier to think in residue positions ("mutation at position 150") rather than alignment columns. MSAFrame bridges that gap with efficient bidirectional mappings.

API Reference

MSAFrame

Method                        Description
MSAFrame(seqs, gap_char="-")  Create from dict of {id: sequence}
at(ref, pos)                  Get all residues at ungapped position pos in ref
slice(ref, start, end)        Slice by ungapped positions [start, end) in ref
col(ref, pos)                 Convert ungapped position → alignment column
pos(ref, col)                 Convert alignment column → ungapped position (or None)
column(col)                   Get all residues at alignment column col
seq(id, ungapped=False)       Get sequence, optionally without gaps
length(id)                    Ungapped length of a sequence
to_pandas()                   Export as pandas DataFrame
to_polars()                   Export as polars DataFrame

Properties

Property  Description
ids       List of sequence identifiers
width     Alignment width (number of columns)

Helpers

Function          Description
read_fasta(path)  Parse FASTA file → dict[str, str]

License

MIT
