Lightweight MSA wrapper with position-aware slicing
MSAFrame
Lightweight Python library for working with Multiple Sequence Alignments. Query by residue position, not column index.
from msaframe import MSAFrame, read_fasta
msa = MSAFrame({
"human": "MALW--KTGV",
"mouse": "MSLWGAKTGV",
"fish": "M-LWGA-TGV",
})
# "What's at position 3 in human across all species?"
msa.at("human", 3) # {"human": "W", "mouse": "W", "fish": "W"}
# Slice by residue positions in a reference
msa.slice("human", 2, 6) # {"human": "LW--KT", "mouse": "LWGAKT", "fish": "LWGA-T"}
The Problem
Alignment columns don't match residue positions. A mutation at "position 150" in your protein means nothing in a gapped alignment—is it column 150? 167? 183? It depends on where gaps fall upstream.
MSAFrame maintains bidirectional mappings between ungapped positions and alignment columns, so you can work in the coordinate system that matters: your reference sequence.
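To make the mapping idea concrete, here is a minimal sketch of how a bidirectional index could be built for a single gapped sequence. It is not MSAFrame's actual implementation: `build_maps` is a hypothetical helper, and 0-based positions and columns are an assumption.

```python
from typing import Optional


def build_maps(gapped: str, gap_char: str = "-") -> tuple[list[int], list[Optional[int]]]:
    """Return (pos_to_col, col_to_pos) for one gapped sequence (0-based, assumed)."""
    pos_to_col: list[int] = []            # ungapped position -> alignment column
    col_to_pos: list[Optional[int]] = []  # alignment column -> ungapped position or None
    for col, char in enumerate(gapped):
        if char == gap_char:
            col_to_pos.append(None)  # gap columns have no residue position
        else:
            col_to_pos.append(len(pos_to_col))
            pos_to_col.append(col)
    return pos_to_col, col_to_pos


pos_to_col, col_to_pos = build_maps("MALW--KTGV")
print(pos_to_col)  # [0, 1, 2, 3, 6, 7, 8, 9]
print(col_to_pos)  # [0, 1, 2, 3, None, None, 4, 5, 6, 7]
```

With both lists precomputed, a lookup in either direction is a single list index.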
Installation
pip install msaframe
Or with uv:
uv pip install msaframe
Zero required dependencies: core functionality needs only Python 3.10+. pandas and polars integration is optional and available when those packages are installed.
Usage
Creating an MSAFrame
From a dictionary:
from msaframe import MSAFrame
msa = MSAFrame({
"seq1": "MALW--KTGV",
"seq2": "MSLWGAKTGV",
"seq3": "M-LWGA-TGV",
})
From a FASTA file:
from msaframe import MSAFrame, read_fasta
seqs = read_fasta("alignment.fasta")
msa = MSAFrame(seqs)
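The parsing rules of `read_fasta` are not described here, so the following is only a sketch of what a FASTA-to-dict step might look like. `parse_fasta` and `check_aligned` are hypothetical helpers, and both the choice to use the first word of the header as the ID and the equal-width check are assumptions, not documented MSAFrame behavior.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining wrapped sequence lines."""
    parts: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""  # assumption: ID is the first word
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def check_aligned(seqs: dict[str, str]) -> int:
    """Return the shared alignment width, or raise if the rows differ in length."""
    widths = {len(seq) for seq in seqs.values()}
    if len(widths) != 1:
        raise ValueError(f"expected one alignment width, found {sorted(widths)}")
    return widths.pop()


example = ">seq1 first record\nMALW--\nKTGV\n>seq2\nMSLWGAKTGV\n"
seqs = parse_fasta(example)
print(seqs)                 # {'seq1': 'MALW--KTGV', 'seq2': 'MSLWGAKTGV'}
print(check_aligned(seqs))  # 10
```

Checking the width before building an alignment object catches unaligned input early, which is a common mistake when a raw sequence file is passed instead of aligner output.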
Position-Aware Queries
The core API lets you query by ungapped position in a reference sequence:
# Get all residues at position 5 in the reference
msa.at("human", 5) # {"human": "T", "mouse": "T", "fish": "T"}
# Slice positions 10-20 in the reference
msa.slice("human", 10, 20)
# Convert between positions and columns
col = msa.col("human", 5) # ungapped position → alignment column
pos = msa.pos("human", col) # alignment column → ungapped position (None if gap)
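As an illustration of what these queries compute, the sketch below reimplements them on a plain dict. It is not the library's code: `at_position` and `slice_positions` are hypothetical names, positions are assumed to be 0-based, and the slice is assumed to be half-open, `[start, end)`, as the API reference describes.

```python
def ungapped_cols(gapped: str, gap_char: str = "-") -> list[int]:
    """Alignment column of each ungapped residue, in order."""
    return [col for col, char in enumerate(gapped) if char != gap_char]


def at_position(seqs: dict[str, str], ref: str, pos: int) -> dict[str, str]:
    """Residues of every sequence in the column holding position `pos` of `ref`."""
    col = ungapped_cols(seqs[ref])[pos]
    return {seq_id: seq[col] for seq_id, seq in seqs.items()}


def slice_positions(seqs: dict[str, str], ref: str, start: int, end: int) -> dict[str, str]:
    """Columns spanning positions [start, end) of `ref`, gaps in between included."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    cols = ungapped_cols(seqs[ref])
    first, last = cols[start], cols[end - 1]
    return {seq_id: seq[first:last + 1] for seq_id, seq in seqs.items()}


seqs = {
    "human": "MALW--KTGV",
    "mouse": "MSLWGAKTGV",
    "fish": "M-LWGA-TGV",
}
print(at_position(seqs, "human", 3))
# {'human': 'W', 'mouse': 'W', 'fish': 'W'}
print(slice_positions(seqs, "human", 3, 5))
# {'human': 'W--K', 'mouse': 'WGAK', 'fish': 'WGA-'}
```

The slice covers every column between the first and last requested residue, so insertions in other sequences relative to the reference stay visible.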
Basic Operations
# Sequence access
msa["human"] # gapped sequence
msa.seq("human") # gapped sequence
msa.seq("human", ungapped=True) # without gaps
# Metadata
msa.ids # list of sequence IDs
msa.width # alignment width (columns)
len(msa) # number of sequences
msa.length("human") # ungapped length of a sequence
# Column access (by alignment column index)
msa.column(42) # {"human": "K", "mouse": "R", ...}
DataFrame Export
Convert to pandas or polars for further analysis:
df = msa.to_pandas() # rows = sequences, columns = positions
df = msa.to_polars()
Each method raises ImportError if the corresponding library (pandas or polars) isn't installed; neither is a hard dependency.
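One common way to keep such exports optional is to import the library lazily, inside the method that needs it. The sketch below shows that pattern under stated assumptions: `require` and `to_pandas_sketch` are hypothetical names, and the error message and DataFrame layout are illustrative rather than MSAFrame's own.

```python
import importlib


def require(module_name: str):
    """Import an optional dependency on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is needed for this export; try 'pip install {module_name}'"
        ) from exc


def to_pandas_sketch(seqs: dict[str, str]):
    """Build a DataFrame with one row per sequence and one column per alignment column."""
    pd = require("pandas")  # only imported when the export is actually requested
    ids = list(seqs)
    return pd.DataFrame([list(seqs[seq_id]) for seq_id in ids], index=ids)


# A missing module surfaces as ImportError only at call time.
try:
    require("no_such_module_msaframe_sketch")
except ImportError as err:
    print(err)
```

Because the import happens inside the function, importing the core module never fails on a machine without pandas or polars.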
CLI
MSAFrame includes a colorful terminal interface for exploring alignments interactively:
msaframe alignment.fasta
Or query directly from the command line:
# Show position 150 in BRCA1
msaframe alignment.fasta brca1:150
# Show positions 100-120
msaframe alignment.fasta brca1:100-120
# Multiple positions (useful for viewing mutations)
msaframe alignment.fasta brca1:95,130,185
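The three query forms above share one shape: a sequence name, a colon, then comma-separated positions or ranges. As a rough sketch of how such a spec could be parsed (the CLI's real parser is not shown here, and `parse_query` is a hypothetical helper):

```python
def parse_query(spec: str) -> tuple[str, list[tuple[int, int]]]:
    """Parse 'name:150', 'name:100-120' or 'name:95,130,185'.

    Returns (name, [(start, end), ...]) with the bounds exactly as typed;
    a single position n becomes (n, n).
    """
    name, sep, positions = spec.rpartition(":")
    if not sep or not name or not positions:
        raise ValueError(f"expected '<name>:<positions>', got {spec!r}")
    ranges = []
    for part in positions.split(","):
        start, dash, end = part.partition("-")
        ranges.append((int(start), int(end) if dash else int(start)))
    return name, ranges


print(parse_query("brca1:150"))         # ('brca1', [(150, 150)])
print(parse_query("brca1:100-120"))     # ('brca1', [(100, 120)])
print(parse_query("brca1:95,130,185"))  # ('brca1', [(95, 95), (130, 130), (185, 185)])
```

Splitting on the last colon lets sequence IDs that themselves contain a colon pass through unchanged.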
Interactive Mode
MSA Explorer
MSAFrame(150 seqs × 892 cols)
no ref> ref brca1
Reference: BRCA1_HUMAN (863 aa)
BRCA1_HUMAN> 150
BRCA1_HUMAN position 150 (column 167):
BRCA1_HUMAN: K
K (142): BRCA1_MOUSE, BRCA1_RAT, BRCA1_BOVIN, ...
R (5): BRCA1_XENLA, BRCA1_XENTR, ...
- (3): BRCA1_DANRE, ...
BRCA1_HUMAN> 140-160
Commands:
- `ref <name>`: Set reference sequence (fuzzy matching)
- `<pos>` or `pos <n>`: Show residues at position
- `<start>-<end>` or `slice <s> <e>`: Show a range
- `col <n>`: Show alignment column by index
- `seq <name>`: Show full sequence
- `find <query>`: Search sequence IDs
- `list`: List all sequences
The display uses Clustal X coloring: hydrophobic (blue), polar (green), positive (red), negative (magenta), and special residues (G=orange, P=yellow, C=pink).
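The per-position summary in the session above (residue, count, then the sequence IDs that carry it) amounts to grouping one alignment column by residue. A minimal sketch of that grouping, assuming a column is a dict of `{id: residue}` like the ones `column()` and `at()` are shown returning; `group_column` is a hypothetical helper, not part of the documented API:

```python
from collections import defaultdict


def group_column(column: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Group sequence IDs by residue, most common residue first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for seq_id, residue in column.items():
        groups[residue].append(seq_id)
    # Sort by descending group size, then by residue character for stable ties.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


column = {"human": "K", "mouse": "K", "frog": "R", "fish": "-"}
for residue, ids in group_column(column):
    print(f"{residue} ({len(ids)}): {', '.join(ids)}")
# K (2): human, mouse
# - (1): fish
# R (1): frog
```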
Design Philosophy
BYOMSA (Bring Your Own MSA): MSAFrame doesn't align sequences—use MAFFT, Clustal, MUSCLE, or whatever you prefer. MSAFrame works with the result.
Zero Dependencies: The core library is a single file with no required dependencies. pandas and polars are optional extras for DataFrame export.
Position-Aware: It's often easier to think in residue positions ("mutation at position 150") rather than alignment columns. MSAFrame bridges that gap with efficient bidirectional mappings.
API Reference
MSAFrame
| Method | Description |
|---|---|
| `MSAFrame(seqs, gap_char="-")` | Create from dict of {id: sequence} |
| `at(ref, pos)` | Get all residues at ungapped position pos in ref |
| `slice(ref, start, end)` | Slice by ungapped positions [start, end) in ref |
| `col(ref, pos)` | Convert ungapped position → alignment column |
| `pos(ref, col)` | Convert alignment column → ungapped position (or None) |
| `column(col)` | Get all residues at alignment column col |
| `seq(id, ungapped=False)` | Get sequence, optionally without gaps |
| `length(id)` | Ungapped length of a sequence |
| `to_pandas()` | Export as pandas DataFrame |
| `to_polars()` | Export as polars DataFrame |
Properties
| Property | Description |
|---|---|
| `ids` | List of sequence identifiers |
| `width` | Alignment width (number of columns) |
Helpers
| Function | Description |
|---|---|
| `read_fasta(path)` | Parse FASTA file → dict[str, str] |
License
MIT