
A pure-Python library for reading and writing FASTA sequence files.


fastatacular


Pure-Python library for reading and writing FASTA sequence files, with optional parsing of UniProt-style description keys (OS=, OX=, GN=, PE=, SV=) and pipe-delimited identifiers (sp|P12345|EX_HUMAN, gi|12345|ref|NP_000001.1|).

It is the plain-FASTA companion to pefftacular and exposes the same read_* / *Reader / write_* API shape.

Install

pip install fastatacular

Dev install:

just install

Quick start

read_fasta — load everything into memory at once:

from fastatacular import read_fasta

entries = read_fasta("proteins.fasta")
for entry in entries:
    print(entry.identifier, len(entry.sequence))

FastaReader — iterate lazily without loading the full file:

from fastatacular import FastaReader

with FastaReader("proteins.fasta") as reader:
    for entry in reader:
        print(entry.identifier, len(entry.sequence))  # replace with your own per-entry processing
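Lazy iteration of this kind is usually built on a generator that holds only the current record in memory. The sketch below is a hypothetical, standalone illustration of that idea, not fastatacular's implementation: `iter_fasta` and its (header, sequence) tuples are assumptions made for the example. It joins sequence lines and strips whitespace, matching the `sequence` field described in the data model.

```python
from collections.abc import Iterable, Iterator


def iter_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time (hypothetical helper)."""
    header: str | None = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append("".join(line.split()))  # drop interior whitespace too
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        yield header, "".join(chunks)  # flush the final record
```

Because `iter_fasta` accepts any iterable of strings, an open file handle can be passed directly and is consumed line by line.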

Data model

Each entry is a SequenceEntry:

Field         Type              Description
identifier    str               Token immediately after > (e.g. sp|P12345|EX_HUMAN)
sequence      str               Concatenated sequence with whitespace stripped
prefix        str | None        Database prefix (sp, tr, gi, ...) when the id is pipe-delimited
accession     str | None        Second pipe field, after the prefix (e.g. P12345)
entry_name    str | None        Third pipe field on UniProt ids (e.g. EX_HUMAN)
description   str | None        Free text after the identifier
pname         str | None        Protein name (description text, minus KEY=value pairs)
gname         str | None        Gene name (GN=)
os_name       str | None        Organism name (OS=)
ncbi_tax_id   int | None        NCBI taxonomy ID (OX=)
pe            int | None        Protein existence level (PE=)
sv            int | None        Sequence version (SV=)
extra         dict[str, str]    Any other KEY=value pairs found in the header
raw_header    str               The original header line (without leading >)

UniProt-style headers

from fastatacular import read_fasta

# one.fasta holds a single record with this header:
# >sp|P12345|EX_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=2
[entry] = read_fasta("one.fasta")

entry.prefix         # "sp"
entry.accession      # "P12345"
entry.entry_name     # "EX_HUMAN"
entry.pname          # "Example protein"
entry.os_name        # "Homo sapiens"
entry.ncbi_tax_id    # 9606
entry.gname          # "EXMP"
entry.pe             # 1
entry.sv             # 2

Non-standard KEY=value pairs are captured in entry.extra. Headers with no KEY=value tokens leave description and pname populated and extra empty.

Writing

Construct entries and write them out:

from fastatacular import SequenceEntry, write_fasta

entries = [
    SequenceEntry(
        identifier="sp|P12345|EX_HUMAN",
        sequence="MKTIIALSYIFCLVFA",
        pname="Example protein",
        os_name="Homo sapiens",
        ncbi_tax_id=9606,
        gname="EXMP",
        pe=1,
        sv=2,
    ),
]

write_fasta(entries, "output.fasta")

The dest argument of write_fasta (the output target) accepts a path string, a pathlib.Path, or a text-mode file object.

Sequence lines wrap at 60 characters by default. Override with line_width= (pass 0 to disable wrapping):

write_fasta(entries, "output.fasta", line_width=80)
write_fasta(entries, "single-line.fasta", line_width=0)
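The wrapping rule (fixed-width chunks, with 0 meaning no wrapping) can be expressed in a few lines. This hypothetical `wrap_sequence` helper is a sketch of the rule as described, not the writer's actual code; rejecting negative widths is an added assumption.

```python
def wrap_sequence(sequence: str, line_width: int = 60) -> list[str]:
    """Split a sequence into lines of at most line_width characters (sketch).

    line_width=0 disables wrapping and returns the sequence as a single line.
    """
    if line_width < 0:
        raise ValueError("line_width must be >= 0")  # assumption: negatives are invalid
    if not sequence:
        return []
    if line_width == 0:
        return [sequence]
    # Step through the sequence in fixed-size slices; the last one may be shorter.
    return [sequence[i : i + line_width] for i in range(0, len(sequence), line_width)]
```

Joining the result with "\n" gives the sequence block that follows each header line.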

If raw_header is set on an entry (as it is on every entry produced by read_fasta), the writer round-trips it verbatim. Otherwise the header is rebuilt from the structured fields.

Error handling

Parse errors raise FastaParseError:

from fastatacular import FastaParseError, read_fasta

try:
    entries = read_fasta("malformed.fasta")
except FastaParseError as e:
    print(e.line)     # offending line number
    print(e.context)  # surrounding line content

Write errors raise FastaWriteError.
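An exception that carries a line number and context is straightforward to model. The sketch below uses a hypothetical `SketchParseError` in place of FastaParseError and checks two assumed failure cases (sequence data before any header, and an empty header); which inputs fastatacular actually rejects is not specified here, and this simplified context holds only the offending line rather than surrounding lines.

```python
class SketchParseError(ValueError):
    """Hypothetical stand-in for FastaParseError, carrying line and context."""

    def __init__(self, message: str, line: int, context: str) -> None:
        super().__init__(f"{message} (line {line}): {context!r}")
        self.line = line  # 1-based number of the offending line
        self.context = context  # simplified: just the offending line's text


def check_fasta(lines: list[str]) -> None:
    """Raise SketchParseError for two assumed problems; otherwise return None."""
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # blank lines are tolerated
        if text.startswith(">"):
            if not text[1:].strip():
                raise SketchParseError("empty header", number, text)
            seen_header = True
        elif not seen_header:
            raise SketchParseError("sequence data before first header", number, text)


caught = None
try:
    check_fasta(["ACGT\n", ">seq1\n"])
except SketchParseError as err:
    caught = err
print(caught.line, caught.context)
```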

Development

just install      # install dependencies
just test         # run tests
just test-v       # run tests (verbose)
just cov          # run tests with coverage
just lint         # ruff lint
just format       # ruff format
just check        # lint + type check + test
just build        # build the package
just clean        # remove cache files

License

MIT


Download files


Source Distribution

fastatacular-0.1.0.tar.gz (10.3 kB)

Uploaded Source

Built Distribution


fastatacular-0.1.0-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file fastatacular-0.1.0.tar.gz.

File metadata

  • Download URL: fastatacular-0.1.0.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastatacular-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9f140817f1666bf9e857fbb253ba607925825cae03922ff3b639bd150cf56a56
MD5 170fd5e9db7aa287d6789ebc305dcd26
BLAKE2b-256 5376cc0ddb65b0e120d41ba3a294dffbbc0813551b3d62ad60acffc930a32324


Provenance

The following attestation bundles were made for fastatacular-0.1.0.tar.gz:

Publisher: python-publish.yml on tacular-omics/fastatacular

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastatacular-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fastatacular-0.1.0-py3-none-any.whl
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fastatacular-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a851bc34274dacf279bd04c6524d7cbacbd98077eb1c316fec7c1effe10311a0
MD5 7fe7e2f2e3f25bbab21083bd74a72b6f
BLAKE2b-256 6e3da086e9ddc70a5df9e9cda76bd04899fcac017a35f5af9252f212464eca37


Provenance

The following attestation bundles were made for fastatacular-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on tacular-omics/fastatacular

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
