miniFASTA

A simple FASTA read and write toolbox for small to medium size projects.

FASTA files are text-based files for storing nucleotide or amino acid sequences. Reading such files is not particularly difficult, yet most off-the-shelf packages come with heavy or unusual dependencies.

miniFASTA offers an alternative to this and brings many useful functions without relying on third party packages.

Installation

Using pip / pip3:

pip install miniFasta

Or from source with pip:

git clone git@github.com:not-a-feature/miniFASTA.git
cd miniFASTA
pip install .

Or with conda:

conda install -c conda-forge minifasta

How to use

miniFASTA offers easy-to-use functions for FASTA handling. The five main parts are:

  • read()
  • write()
  • fasta_object()
    • toAmino()
    • toRevComp()
    • valid()
    • len() / str() / eq() / iter()
  • translate_seq()
  • reverse_comp()

Reading FASTA files

read() is a FASTA reader that handles both compressed and uncompressed files. The following compression formats are supported: zip, tar, tar.gz, gz. If an archive contains multiple files, all of them are read. The function returns an iterator of fasta_objects. To return only the sequences, set the argument seq=True. By default, entries are converted to upper case; set read("path.fasta", upper=False) to disable this.

import miniFasta as mf

# Read fasta_objects
fos = mf.read("dolphin.fasta") # Iterator of fasta_objects.
fos = list(fos) # Casts the iterator to list of fasta_objects

# Read only the sequence
fasta_strings = mf.read("dolphin.fasta", seq=True) # Iterator of strings.
fasta_strings = [fo.body for fo in mf.read("dolphin.fasta")] # Alternative

# Options and compressed files
fos = mf.read("mouse.fasta", upper=False) # The entries won't be cast to upper case.
fos = mf.read("reads.tar.gz") # Is able to handle compressed files.
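To make the reading behaviour concrete, here is a minimal standard-library sketch of a FASTA reader. It is not miniFASTA's implementation: the name read_fasta_sketch is hypothetical, it yields plain (head, body) tuples instead of fasta_objects, and it only covers plain and gz files, not zip or tar archives.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_sketch(path: str, upper: bool = True) -> Iterator[Tuple[str, str]]:
    """Yield (head, body) pairs from a plain or gzip-compressed FASTA file."""
    # Pick the opener from the file extension (a simplification).
    opener = gzip.open if path.endswith(".gz") else open
    head, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous entry, if there was one.
                if head is not None:
                    yield head, "".join(chunks)
                head, chunks = line, []
            else:
                # Sequence lines of one entry are joined into a single body.
                chunks.append(line.upper() if upper else line)
    # Emit the last entry once the file is exhausted.
    if head is not None:
        yield head, "".join(chunks)
```

Because the function is a generator, entries are produced lazily; wrap the call in list() to materialize them, as in the examples above.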

Writing FASTA files

write() is a basic FASTA writer. It takes a single fasta_object or a list of fasta_objects and writes them to the given path.

By default, an existing file is overwritten. Set write(fo, "path.fasta", mode="a") to append to the file instead.

fos = mf.read("dolphin.fasta") # Iterator of fasta entries
fos = list(fos) # Materialize
mf.write(fos, "new.fasta")
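The difference between overwriting and appending can be illustrated with a small standard-library sketch. The function write_fasta_sketch is hypothetical and works on (head, body) tuples; it only mirrors the mode="w" / mode="a" behaviour described above and makes no claim about how miniFASTA formats its output.

```python
from typing import Iterable, Tuple


def write_fasta_sketch(
    entries: Iterable[Tuple[str, str]], path: str, mode: str = "w"
) -> None:
    """Write (head, body) pairs to path; "w" overwrites, "a" appends."""
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' or 'a'")
    with open(path, mode) as handle:
        for head, body in entries:
            # One header line followed by one sequence line per entry.
            handle.write(f"{head}\n{body}\n")
```

Calling it twice with the default mode leaves only the second batch in the file, while mode="a" keeps both.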

fasta_object()

The core component of miniFASTA is the fasta_object(). This object represents a FASTA entry and consists of a head and a body.

import miniFasta as mf
fo = mf.fasta_object(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC", stype="DNA")
fo.getHead() # or fo.head
# >Atlantic dolphin

fo.getSeq() # or fo.body
# CGGCCTTCTATCTTCTTC

### Following functions are defined on a fasta_object():

str(fo) # will return:
# >Atlantic dolphin
# CGGCCTTCTATCTTCTTC

# Body length
len(fo) # will return 18, the length of the body

# Equality
fo == fo # True

fo_b = mf.fasta_object(">Same Body", "CGGCCTTCTATCTTCTTC")
fo == fo_b # True

fo_c = mf.fasta_object(">Different Body", "ZZZZAGCTAG")
fo == fo_c # False

for s in fo:
    # Iterates through the sequence of fo.
    print(s)
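The behaviour listed above (string form, body length, body-based equality, iteration) maps onto Python's special methods. The class below is a hypothetical stand-in written for illustration, not miniFASTA's fasta_object; in particular, iterating character by character is an assumption.

```python
class EntrySketch:
    """Hypothetical stand-in for a FASTA entry; not miniFASTA's class."""

    def __init__(self, head: str, body: str) -> None:
        self.head = head
        self.body = body

    def __str__(self) -> str:
        # Header line followed by the sequence line.
        return f"{self.head}\n{self.body}"

    def __len__(self) -> int:
        # Length of the body only; the header is not counted.
        return len(self.body)

    def __eq__(self, other: object) -> bool:
        # Two entries are equal when their bodies match, whatever the header.
        if not isinstance(other, EntrySketch):
            return NotImplemented
        return self.body == other.body

    def __hash__(self) -> int:
        # Keep hashing consistent with the body-based equality.
        return hash(self.body)

    def __iter__(self):
        # Assumption: iteration walks the body one character at a time.
        return iter(self.body)


a = EntrySketch(">Atlantic dolphin", "CGGCCTTCTATCTTCTTC")
b = EntrySketch(">Same Body", "CGGCCTTCTATCTTCTTC")
print(len(a))      # 18
print(a == b)      # True
print("".join(a))  # CGGCCTTCTATCTTCTTC
```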

fasta_object(...).valid()

Checks if the body contains invalid characters. stype of fasta_object needs to be set in order to check for illegal characters in its body.

stype is one of:

  • ANY : [default] Allows all characters.
  • NA : Allows all Nucleic Acid Codes (DNA & RNA).
  • DNA : Allows all IUPAC DNA Codes.
  • RNA : Allows all IUPAC RNA Codes.
  • PROT: Allows all IUPAC amino acid codes.

Optional: allowedChars can be set to override the default character set of the chosen stype.

# The default object allows all characters.
# True
mf.fasta_object(">valid", "Ä'_**?.asdLLA").valid()

# Only if stype is specified, valid can check for illegal characters.
# True
mf.fasta_object(">valid", "ACGTUAGTGU", stype="NA").valid()

# False, as W is not allowed for DNA/RNA
mf.fasta_object(">invalid", "ACWYUOTGU", stype="NA").valid()

# True
mf.fasta_object(">valid", "AGGATTA", stype="ANY").valid(allowedChars = "AGTC")

# True, as stype is ignored if allowedChars is set.
mf.fasta_object(">valid", "WYU", stype="DNA").valid(allowedChars = "WYU")
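The precedence rule (an explicit allowedChars wins over stype, and ANY accepts everything) can be sketched with plain sets. The function is_valid_sketch and the alphabets below are assumptions made for illustration; the alphabets are deliberately reduced and are not the character sets miniFASTA ships.

```python
from typing import Optional

# Simplified alphabets for illustration only; the library's own sets may differ.
ALPHABETS_SKETCH = {
    "DNA": set("ACGT"),
    "RNA": set("ACGU"),
    "NA": set("ACGTU"),
}


def is_valid_sketch(
    body: str, stype: str = "ANY", allowed_chars: Optional[str] = None
) -> bool:
    """Return True if every character of body is allowed."""
    if allowed_chars is not None:
        # An explicit character set takes precedence over stype.
        return set(body) <= set(allowed_chars)
    if stype == "ANY":
        return True  # no restriction at all
    # Otherwise every character must belong to the alphabet of stype.
    return set(body) <= ALPHABETS_SKETCH[stype]
```

The subset test on sets keeps the check independent of sequence length beyond a single pass over the body.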

fasta_object(...).toAmino(translation_dict)

Translates the body to an amino acid sequence. See translate_seq() for more details.

fo.toAmino()
fo.body # Will return RPSIFF
d = {"CCG": "Z", "CTT": "A" ...}
fo.toAmino(d)
fo.body # Will return ZA...

fasta_object(...).toRevComp(complement_dict)

Converts the body to its reverse complement. See reverse_comp() for more details.

fo.toRevComp()
fo.body # Will return GAAGAAGATAGAAGGCCG

Sequence translation

translate_seq() translates a sequence starting at position 0. Unless translation_dict is provided, the standard bacterial code is used. A codon that is not found in the dictionary is replaced by a ~. Trailing bases that do not fit into a codon are ignored.

mf.translate_seq("CGGCCTTCTATCTTCTTC") # Will return RPSIFF

d = {"CGG": "Z", "CTT": "A"}
mf.translate_seq("CGGCTT", d) # Will return ZA.
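The three rules above (start at position 0, unknown codons become ~, trailing bases are dropped) can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: translate_sketch is a hypothetical name, and PARTIAL_TABLE contains only the five codons needed for the example rather than a full genetic code.

```python
from typing import Dict, Optional

# Partial codon table covering only the codons used below (an assumption made
# for brevity; a complete table has 64 entries).
PARTIAL_TABLE = {"CGG": "R", "CCT": "P", "TCT": "S", "ATC": "I", "TTC": "F"}


def translate_sketch(seq: str, table: Optional[Dict[str, str]] = None) -> str:
    """Translate seq codon by codon, starting at position 0."""
    table = PARTIAL_TABLE if table is None else table
    # Stop before any trailing bases that do not fill a whole codon.
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Codons missing from the table are replaced by "~".
    return "".join(table.get(codon, "~") for codon in codons)


print(translate_sketch("CGGCCTTCTATCTTCTTC"))  # RPSIFF
print(translate_sketch("CGGAAAT"))             # R~
```

In the second call, AAA is absent from the partial table and the final T does not complete a codon, so the result is R followed by ~.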

Reverse Complement

reverse_comp() converts a sequence to its reverse complement. Unless complement_dict is provided, the standard complement is used. If no complement is found for a nucleotide, it remains unchanged.

mf.reverse_comp("CGGCCTTCTATCTTCTTC") # Will return GAAGAAGATAGAAGGCCG

d = {"C": "Z", "T": "Y"}
mf.reverse_comp("TC", d) # Will return ZY
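A reverse complement with the fallback described above (nucleotides without a known complement stay unchanged) might look like the following sketch. The name reverse_comp_sketch and the four-letter DEFAULT_COMPLEMENT are assumptions for illustration and may not match the default dictionary miniFASTA uses.

```python
from typing import Dict, Optional

# Minimal DNA complement map (an assumption; ambiguity codes are left out).
DEFAULT_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_comp_sketch(
    seq: str, complement: Optional[Dict[str, str]] = None
) -> str:
    """Return the reverse complement of seq."""
    complement = DEFAULT_COMPLEMENT if complement is None else complement
    # Walk the sequence backwards; keep characters without a known complement.
    return "".join(complement.get(base, base) for base in reversed(seq))


print(reverse_comp_sketch("CGGCCTTCTATCTTCTTC"))        # GAAGAAGATAGAAGGCCG
print(reverse_comp_sketch("TC", {"C": "Z", "T": "Y"}))  # ZY
```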

License

Copyright (C) 2022 by Jules Kreuer - @not_a_feature
This piece of software is published under the GNU General Public License v3.0.
TLDR:

| Permissions      | Conditions                   | Limitations |
| ---------------- | ---------------------------- | ----------- |
| ✓ Commercial use | Disclose source              | ✕ Liability |
| ✓ Distribution   | License and copyright notice | ✕ Warranty  |
| ✓ Modification   | Same license                 |             |
| ✓ Patent use     | State changes                |             |
| ✓ Private use    |                              |             |

Go to LICENSE.md to see the full version.

Dependencies

In addition to packages included in Python 3, this piece of software uses 3rd-party software packages for development purposes that are not required in the published version. Go to DEPENDENCIES.md to see all dependencies and licenses.
