fastatacular
Pure-Python library for reading and writing FASTA sequence files, with optional parsing of UniProt-style description keys (OS=, OX=, GN=, PE=, SV=) and pipe-delimited identifiers (sp|P12345|EX_HUMAN, gi|12345|ref|NP_000001.1|).
It is the plain-FASTA companion to pefftacular and follows the same read_* / *Reader / write_* API layout.
Install
pip install fastatacular
Dev install:
just install
Quick start
read_fasta — load everything into memory at once:
from fastatacular import read_fasta
entries = read_fasta("proteins.fasta")
for entry in entries:
    print(entry.identifier, len(entry.sequence))
FastaReader — iterate lazily without loading the full file:
from fastatacular import FastaReader
with FastaReader("proteins.fasta") as reader:
    for entry in reader:
        process(entry)  # process() stands in for your own per-entry handler
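To illustrate what lazy iteration means here, the following is a minimal sketch of a generator that yields one record at a time from an open text handle. It is not the library's implementation: `iter_fasta_records` is a hypothetical helper, it returns plain `(header, sequence)` tuples rather than `SequenceEntry` objects, and it silently skips sequence lines that appear before the first header.

```python
import io
from typing import Iterator, Optional, TextIO


def iter_fasta_records(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time (hypothetical helper)."""
    header: Optional[str] = None
    chunks: list[str] = []
    for line in handle:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = stripped[1:]
            chunks = []
        elif header is not None:
            # Drop any whitespace inside the sequence line.
            chunks.append("".join(stripped.split()))
    if header is not None:
        yield header, "".join(chunks)


demo = io.StringIO(">seq1 first\nMKTI\nIALS\n>seq2\nYIFC\n")
records = list(iter_fasta_records(demo))
```

Because only the current record's lines are held in memory, this pattern suits files that are too large to load with a single call.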
Data model
Each entry is a SequenceEntry:
| Field | Type | Description |
|---|---|---|
| `identifier` | `str` | Token immediately after `>` (e.g. `sp\|P12345\|EX_HUMAN`) |
| `sequence` | `str` | Concatenated sequence with whitespace stripped |
| `prefix` | `str \| None` | Database prefix (`sp`, `tr`, `gi`, ...) when the id is pipe-delimited |
| `accession` | `str \| None` | First pipe field after the prefix (e.g. `P12345`) |
| `entry_name` | `str \| None` | Third pipe field on UniProt ids (e.g. `EX_HUMAN`) |
| `description` | `str \| None` | Free text after the identifier |
| `pname` | `str \| None` | Protein name (description text, minus `KEY=value` pairs) |
| `gname` | `str \| None` | Gene name (`GN=`) |
| `os_name` | `str \| None` | Organism name (`OS=`) |
| `ncbi_tax_id` | `int \| None` | NCBI taxonomy ID (`OX=`) |
| `pe` | `int \| None` | Protein existence level (`PE=`) |
| `sv` | `int \| None` | Sequence version (`SV=`) |
| `extra` | `dict[str, str]` | Any other `KEY=value` pairs found in the header |
| `raw_header` | `str` | The original header line (without leading `>`) |
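The relationship between identifier, prefix, accession, and entry_name can be sketched as a simple split on `|`. This is an illustration only: `split_identifier` is a hypothetical helper, and restricting entry_name to the `sp` and `tr` prefixes is an assumption about what "UniProt ids" means here. The library may treat other prefixes, such as `gi`, differently.

```python
from typing import Optional

# Assumption: only these prefixes are treated as UniProt-style identifiers.
UNIPROT_PREFIXES = {"sp", "tr"}


def split_identifier(
    identifier: str,
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (prefix, accession, entry_name) for a pipe-delimited identifier."""
    if "|" not in identifier:
        # Plain identifiers carry no structured fields.
        return None, None, None
    fields = identifier.split("|")
    prefix = fields[0] or None
    accession = fields[1] if len(fields) > 1 and fields[1] else None
    entry_name = None
    if prefix in UNIPROT_PREFIXES and len(fields) > 2 and fields[2]:
        entry_name = fields[2]
    return prefix, accession, entry_name


uniprot_fields = split_identifier("sp|P12345|EX_HUMAN")
plain_fields = split_identifier("seq1")
```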
UniProt-style headers
from fastatacular import read_fasta
[entry] = read_fasta("one.fasta")
# >sp|P12345|EX_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=2
entry.prefix # "sp"
entry.accession # "P12345"
entry.entry_name # "EX_HUMAN"
entry.pname # "Example protein"
entry.os_name # "Homo sapiens"
entry.ncbi_tax_id # 9606
entry.gname # "EXMP"
entry.pe # 1
entry.sv # 2
Non-standard KEY=value pairs are captured in entry.extra. Headers with no KEY=value tokens leave description and pname populated and extra empty.
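One way to picture the split between the protein name and the KEY=value pairs is the regular-expression sketch below. It is a stand-alone illustration under stated assumptions, not the library's parser: `parse_description` and `KEY_PATTERN` are hypothetical names, and the rule that a key starts with an uppercase letter and is preceded by whitespace is an assumption.

```python
import re

# Assumption: a key is whitespace, then an uppercase-initial token, then "=".
KEY_PATTERN = re.compile(r"(?:^|\s)([A-Z][A-Za-z0-9_]*)=")


def parse_description(description: str) -> tuple[str, dict[str, str]]:
    """Split a description into (protein name, {KEY: value}) (hypothetical helper)."""
    matches = list(KEY_PATTERN.finditer(description))
    if not matches:
        # No KEY=value tokens: the whole text is the protein name.
        return description.strip(), {}
    name = description[: matches[0].start()].strip()
    pairs: dict[str, str] = {}
    for current, following in zip(matches, matches[1:] + [None]):
        # Each value runs until the next key, or to the end of the line.
        end = following.start() if following is not None else len(description)
        pairs[current.group(1)] = description[current.end():end].strip()
    return name, pairs


name, pairs = parse_description(
    "Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=2"
)
known_keys = {"OS", "OX", "GN", "PE", "SV"}
extra = {key: value for key, value in pairs.items() if key not in known_keys}
```

Letting each value run up to the next key is what allows multi-word values such as `Homo sapiens` to survive intact.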
Writing
Construct entries and write them out:
from fastatacular import SequenceEntry, write_fasta
entries = [
    SequenceEntry(
        identifier="sp|P12345|EX_HUMAN",
        sequence="MKTIIALSYIFCLVFA",
        pname="Example protein",
        os_name="Homo sapiens",
        ncbi_tax_id=9606,
        gname="EXMP",
        pe=1,
        sv=2,
    ),
]
write_fasta(entries, "output.fasta")
The dest argument accepts a path string, a pathlib.Path, or a text-mode file object.
Sequence lines wrap at 60 characters by default. Override with line_width= (pass 0 to disable wrapping):
write_fasta(entries, "output.fasta", line_width=80)
write_fasta(entries, "single-line.fasta", line_width=0)
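The wrapping rule described above can be sketched with plain string slicing. `wrap_sequence` is a hypothetical helper written for illustration, not part of the library, and treating negative widths the same as 0 is an assumption.

```python
def wrap_sequence(sequence: str, line_width: int = 60) -> list[str]:
    """Split a sequence into lines of at most line_width characters."""
    if line_width <= 0:
        # Assumption: 0 (or any non-positive width) disables wrapping.
        return [sequence]
    return [
        sequence[start:start + line_width]
        for start in range(0, len(sequence), line_width)
    ]


wrapped = wrap_sequence("A" * 130)
unwrapped = wrap_sequence("A" * 130, line_width=0)
```

With the default width, a 130-residue sequence becomes three lines of 60, 60, and 10 characters.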
If raw_header is set on an entry (as it is on every entry produced by read_fasta), the writer round-trips it verbatim. Otherwise the header is rebuilt from the structured fields.
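The choice between round-tripping and rebuilding can be sketched as follows. `build_header` is a hypothetical function, and the rebuilt layout (identifier, protein name, then OS, OX, GN, PE, SV in the usual UniProt order) is an assumption about how the writer formats its output.

```python
from typing import Optional


def build_header(
    identifier: str,
    pname: Optional[str] = None,
    os_name: Optional[str] = None,
    ncbi_tax_id: Optional[int] = None,
    gname: Optional[str] = None,
    pe: Optional[int] = None,
    sv: Optional[int] = None,
    raw_header: Optional[str] = None,
) -> str:
    """Return a FASTA header line, preferring raw_header when it is set."""
    if raw_header is not None:
        # Round-trip the original header verbatim.
        return ">" + raw_header
    parts = [identifier]
    if pname:
        parts.append(pname)
    # Assumed key order, following the usual UniProt header layout.
    keyed = (("OS", os_name), ("OX", ncbi_tax_id), ("GN", gname), ("PE", pe), ("SV", sv))
    parts.extend(f"{key}={value}" for key, value in keyed if value is not None)
    return ">" + " ".join(parts)


rebuilt = build_header(
    "sp|P12345|EX_HUMAN",
    pname="Example protein",
    os_name="Homo sapiens",
    ncbi_tax_id=9606,
    gname="EXMP",
    pe=1,
    sv=2,
)
verbatim = build_header("ignored", raw_header="seq1 kept exactly as read")
```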
Error handling
Parse errors raise FastaParseError:
from fastatacular import FastaParseError, read_fasta
try:
    entries = read_fasta("malformed.fasta")
except FastaParseError as e:
    print(e.line)     # offending line number
    print(e.context)  # surrounding line content
Write errors raise FastaWriteError.
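For readers who want to see how an exception can carry a line number and context, here is a stand-alone sketch. `DemoParseError` and `check_starts_with_header` are hypothetical names that mimic the `line` and `context` attributes shown above. Whether the library itself rejects sequence data before the first header is an assumption made for the example.

```python
class DemoParseError(Exception):
    """Hypothetical stand-in for an error that records where parsing failed."""

    def __init__(self, message: str, line: int, context: str) -> None:
        super().__init__(f"line {line}: {message}")
        self.line = line
        self.context = context


def check_starts_with_header(lines: list[str]) -> None:
    """Raise DemoParseError if the first non-blank line is not a header."""
    for number, text in enumerate(lines, start=1):
        if not text.strip():
            continue  # blank lines before the first record are tolerated
        if not text.startswith(">"):
            raise DemoParseError(
                "sequence data before first header", number, text.rstrip("\n")
            )
        return


try:
    check_starts_with_header(["\n", "MKTI\n", ">seq1\n"])
except DemoParseError as err:
    failure = (err.line, err.context)
```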
Development
just install # install dependencies
just test # run tests
just test-v # run tests (verbose)
just cov # run tests with coverage
just lint # ruff lint
just format # ruff format
just check # lint + type check + test
just build # build the package
just clean # remove cache files
License