fastapy

A lightweight Python package to read and write sequence records in FASTA format.

The design was inspired by the utility of BioPython’s SeqIO, which supports many sequence formats. This repo focuses only on FASTA records. It is faster than BioPython, can handle compressed FASTA files (gzip, bzip2, zip), and has no Python package dependencies.

Requirements

Python >= 3.8

Installation

You can install fastapy from PyPI:

pip install fastapy

or directly from GitHub:

pip install "git+https://github.com/aziele/fastapy.git"

You can also use fastapy without installation since it doesn't have any dependencies. Simply clone or download this repository and you're ready to use it.

git clone https://github.com/aziele/fastapy.git
cd fastapy
python
>>> import fastapy
>>> fastapy.__doc__
'A lightweight Python module to read and write FASTA sequence records'

Quick Start

Typical usage is to read a FASTA file and loop over its sequence records.

import fastapy

for record in fastapy.parse('tests/test.fasta'):
    print(record.id, len(record), record.seq[:10], record.desc)

Output:

NP_002433.1  362   METDAPQPGL   RNA-binding protein Musashi homolog 1 [Homo sapiens]
ENO94161.1    79   MKLLISGLGP   RRM domain-containing RNA-binding protein
sequence     292   MKLSKIALMM

Usage

This module contains the Record class, which represents a FASTA sequence record, and the parse(), read(), and to_dict() functions for reading FASTA records from a file and indexing them.

Record object

Record is an object that contains information on a FASTA sequence record, including id, description, and the sequence itself.

import fastapy

record = fastapy.Record(
    id='NP_950171.2', 
    seq='MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFASICRKRQE',
    desc='APITD1-CORT protein isoform 2 [Homo sapiens]'
)

print(record.id)            # NP_950171.2
print(record.desc)          # APITD1-CORT protein isoform 2 [Homo sapiens]
print(record.seq)           # MEEEAE..
print(record.description)   # >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
print(len(record))          # 77
print('EEEA' in record)     # True

By default, printing a record wraps the sequence at 70 characters per line. To use a different line length, pass wrap to the format() method. Use zero (or None) for no wrapping.

print(record)
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFAS
# ICRKRQE

print(record.format(wrap=30))
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCL
# CEEVALDKEMQFSKQTIAAISELTFRQCEN
# FAKDLEMFASICRKRQE

print(record.format(wrap=None))
# >NP_950171.2 APITD1-CORT protein isoform 2 [Homo sapiens]
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFASICRKRQE
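
The wrapping behavior shown above can be reproduced in plain Python by slicing the sequence into fixed-width chunks. This is a minimal sketch, not fastapy's own implementation; the helper name `wrap_sequence` is hypothetical and exists only for illustration.

```python
def wrap_sequence(seq, width=70):
    """Split seq into lines of at most `width` characters.

    A falsy width (0 or None) returns the sequence on a single line.
    Hypothetical helper; not part of the fastapy API.
    """
    if not width:
        return seq
    return '\n'.join(seq[i:i + width] for i in range(0, len(seq), width))


seq = 'MEEEAETEEQQRFSYQQRLKAAVHYTVGCLCEEVALDKEMQFSKQTIAAISELTFRQCENFAKDLEMFASICRKRQE'

print(wrap_sequence(seq, 30))
# MEEEAETEEQQRFSYQQRLKAAVHYTVGCL
# CEEVALDKEMQFSKQTIAAISELTFRQCEN
# FAKDLEMFASICRKRQE
```

Treating both 0 and None as "no wrapping" mirrors the convention described for format().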

parse

The parse() function is a generator to read FASTA records as Record objects one by one from a file (plain FASTA or compressed using gzip or bzip2). Because only one record is created at a time, very little memory is required.

import fastapy

for record in fastapy.parse('tests/test.fasta.gz'):
    print(record.id)
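
To illustrate why a generator keeps memory use low, here is a stdlib-only sketch of a lazy FASTA reader that also handles gzip and bzip2 input. It is an illustration under stated assumptions, not fastapy's actual code: the names `open_text`, `iter_fasta`, and `demo` are hypothetical, and detecting compression from the file's leading bytes is just one possible approach.

```python
import bz2
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain, gzip, or bzip2 file in text mode (hypothetical helper)."""
    with open(path, 'rb') as fh:
        magic = fh.read(3)
    if magic[:2] == b'\x1f\x8b':   # gzip magic number
        return gzip.open(path, 'rt')
    if magic == b'BZh':            # bzip2 magic number
        return bz2.open(path, 'rt')
    return open(path, 'rt')


def iter_fasta(path):
    """Yield (id, description, sequence) tuples, one record at a time."""
    header, chunks = None, []
    with open_text(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith('>'):
                if header is not None:
                    rec_id, _, desc = header.partition(' ')
                    yield rec_id, desc, ''.join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    # Emit the last record, which has no following header to trigger it.
    if header is not None:
        rec_id, _, desc = header.partition(' ')
        yield rec_id, desc, ''.join(chunks)


def demo():
    """Write a tiny gzip-compressed FASTA file and parse it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.fasta.gz')
        with gzip.open(path, 'wt') as fh:
            fh.write('>seq1 first record\nMKLL\nISGL\n>seq2\nMETD\n')
        return list(iter_fasta(path))


print(demo())
# [('seq1', 'first record', 'MKLLISGL'), ('seq2', '', 'METD')]
```

Only the lines of the current record are held in memory at any time, so the cost scales with the longest record rather than with the file size.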

For some tasks you may need to access the records more than once. In that case, use Python's built-in list() function to turn the iterator into a list:

import fastapy

records = list(fastapy.parse('tests/test.fasta.gz'))
print(records[0].id)   # First record
print(records[-1].id)  # Last record

Another common task is to index your records by sequence identifier. Use to_dict() to turn a Record iterator (or list) into a dictionary.

import fastapy

records = fastapy.to_dict(fastapy.parse('tests/test.fasta.gz'))
print(records['NP_002433.1'])   # Use any record id
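
Conceptually, indexing by identifier amounts to building a dictionary keyed on each record's id. The sketch below shows the idea with a stand-in record type. The names `Rec` and `index_by_id` are hypothetical, and raising an error on duplicate ids is an assumption made for this example; this README does not say how to_dict() treats duplicates.

```python
from collections import namedtuple

# Stand-in for a FASTA record; this is not the fastapy.Record class.
Rec = namedtuple('Rec', ['id', 'seq'])


def index_by_id(records):
    """Build a dict mapping record id to record from any iterable of records.

    Raises ValueError on a repeated id (an assumption for this sketch).
    """
    index = {}
    for rec in records:
        if rec.id in index:
            raise ValueError(f'duplicate record id: {rec.id}')
        index[rec.id] = rec
    return index


records = [Rec('NP_002433.1', 'METDAPQPGL'), Rec('ENO94161.1', 'MKLLISGLGP')]
index = index_by_id(iter(records))   # works with an iterator or a list
print(index['ENO94161.1'].seq)       # MKLLISGLGP
```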

read

The read() function reads only the first FASTA record from a file. It does not read any subsequent records in the file.

import fastapy

seq_record = fastapy.read('tests/test.fasta')
print(seq_record.id)           # NP_002433.1
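
Reading only the first record corresponds to taking a single item from a lazy iterator and stopping. The sketch below demonstrates that pattern with a stand-in generator. The names `first_item` and `fake_records` are hypothetical, and raising ValueError on empty input is an assumption; this README does not describe how read() handles an empty file.

```python
def first_item(iterable):
    """Return the first item of an iterable without consuming the rest."""
    iterator = iter(iterable)
    try:
        return next(iterator)
    except StopIteration:
        raise ValueError('no records found') from None


produced = []  # tracks which items the generator actually produced


def fake_records():
    """Stand-in for a lazy record parser."""
    for rec_id in ('NP_002433.1', 'ENO94161.1', 'sequence'):
        produced.append(rec_id)
        yield rec_id


first = first_item(fake_records())
print(first)      # NP_002433.1
print(produced)   # ['NP_002433.1']
```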

Test

You can run tests to ensure that the module works as expected.

python -m unittest discover

License

GNU General Public License, version 3

Release history

1.0.5 (Nov 17, 2024)
1.0.4 (Sep 30, 2024)
1.0.3 (Mar 19, 2023)
1.0.2 (Mar 12, 2023)
1.0.1 (Mar 11, 2023)
