Read, write, and analyze biological sequences

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

sequencelib

Using the classes and methods in sequencelib.py, you can read and write text files containing DNA or protein sequences (aligned or unaligned), and analyze or manipulate these sequences in various ways.

Note: much of the functionality in sequencelib is also available through the command-line tool seqconverter.

Availability

The sequencelib.py module is available on GitHub: https://github.com/agormp/sequencelib and on PyPI: https://pypi.org/project/sequencelib/

Installation

python3 -m pip install sequencelib

Upgrading to the latest version:

python3 -m pip install --upgrade sequencelib

Quick Start Tutorial for sequencelib

Note: this tutorial is under construction. The current version was mostly generated using ChatGPT, with some editing.

This quick start guide introduces some basic functionalities of sequencelib.

Loading Sequences

sequencelib supports several file formats: fasta, nexus, clustal, phylip, stockholm, raw, tab, how, and genbank. The file format is detected automatically:

Reading Unaligned Sequences

import sequencelib as sq

seqfile = sq.Seqfile("seqfilename.fasta")
seqset = seqfile.read_seqs()

Iterate over sequences

for seq in seqset:
    print(seq.name, len(seq))

Reading Aligned Sequences

import sequencelib as sq

seqfile = sq.Seqfile("alignment.fasta")
alignment = seqfile.read_alignment()

print("Number of sequences:", len(alignment))
print("Alignment length:", alignment.alignlen())

Find Columns with More than 50% Gaps

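import copy

nseqs = len(alignment)
gapcols = []
for i in range(alignment.alignlen()):
    col = alignment.getcolumn(i)          # residues in column i
    gapfrac = col.count("-") / nseqs      # fraction of sequences with a gap here
    if gapfrac > 0.5:                     # strictly more than 50% gaps
        gapcols.append(i)

# Build a subalignment containing only the gappy columns. indexfilter() is
# assumed to filter the alignment in place, so it is applied to a copy.
subalignment = copy.deepcopy(alignment)
subalignment.indexfilter(gapcols)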

Export Alignment to File

with open("gapcols.fasta", "w") as f:
    f.write(subalignment.fasta())

with open("gapcols.nexus", "w") as f:
    f.write(subalignment.nexus())

with open("gapcols.clustal", "w") as f:
    f.write(subalignment.clustal())

Analyzing Individual Columns

Directly access columns and analyze their conservation:

column = subalignment.getcolumn(0)
if len(set(column)) > 1:
    print("This column is not conserved")
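To classify every column at once, the same test can be looped over all positions. The sketch below is plain Python working on a list of equal-length strings, not sequencelib code; it only illustrates what `conscols()` and `varcols()` in the reference are described as listing. The function name is hypothetical, and treating a gap as just another symbol is an assumption.

```python
def classify_columns(aligned_seqs):
    """Return (conserved, variable) lists of 0-based column indices.

    aligned_seqs: list of equal-length strings. A column counts as conserved
    here when every sequence has the same symbol; gaps are ordinary symbols.
    """
    alignlen = len(aligned_seqs[0])
    if any(len(s) != alignlen for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    conserved, variable = [], []
    for i in range(alignlen):
        column = [s[i] for s in aligned_seqs]
        if len(set(column)) == 1:
            conserved.append(i)
        else:
            variable.append(i)
    return conserved, variable

aln = ["ACGT-A", "ACGTTA", "AGGT-A"]
cons, var = classify_columns(aln)
print(cons, var)  # [0, 2, 3, 5] [1, 4]
```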

Mapping Sequence and Alignment Positions

Map a position in the ungapped sequence to the corresponding alignment position. Both calls refer to the 42nd residue of seq1:

alignpos_0index = alignment.seqpos2alignpos("seq1", 41)                    # indices start at 0 (default)
alignpos_1index = alignment.seqpos2alignpos("seq1", 42, slicesyntax=False)  # indices start at 1

Convert from alignment position to sequence position:

seqpos, gapstatus = alignment.alignpos2seqpos("seq1", 153)
if gapstatus:
    print(f"Alignment position is a gap; closest preceding residue is at sequence position {seqpos}")

Working with Individual Sequences

Each sequence object has multiple attributes and methods:

seq = seqset[0]
print(seq.name)
print(len(seq))
print(seq.fasta())

shuffled_seq = seq.shuffle()
protein_seq = seq.translate()  # only available for DNA sequences

Window Iteration

Iterate through sequence windows:

for seqwindow in seq.windows(wsize=30):
    print(seqwindow.fasta())
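Conceptually, window iteration slides a fixed-size frame along the sequence. The generator below is a plain-Python sketch with a hypothetical name that covers only `wsize` and `stepsize`; it does not reproduce the overhang and padding options of `windows()`, and stopping at the last complete window is an assumption about that method's default behaviour.

```python
def sliding_windows(seq, wsize, stepsize=1):
    """Yield (start, window) pairs for every complete window of length wsize."""
    if wsize < 1 or stepsize < 1:
        raise ValueError("wsize and stepsize must be positive")
    for start in range(0, len(seq) - wsize + 1, stepsize):
        yield start, seq[start:start + wsize]

for start, win in sliding_windows("ACGTACGTAC", wsize=4, stepsize=3):
    print(start, win)   # 0 ACGT, then 3 TACG, then 6 GTAC
```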

More Features

The sequencelib library has many additional features, such as:

- Calculating pairwise sequence distances
- Removing conserved or ambiguous columns
- Reverse complementing DNA sequences
- Handling complex alignments with partitions

sequencelib: Class and Method Reference

Class: `Sequence`

Base class representing a biological sequence (DNA, protein, or other types).

Constructor

Sequence(name, seq, annotation='', comments='', check_alphabet=False, degap=False)

name: Identifier for the sequence.
seq: The actual biological sequence string.
annotation: Annotation information for each residue.
comments: Additional metadata or notes.
check_alphabet: If True, checks the sequence against the allowed alphabet symbols.
degap: If True, removes gap characters (-).
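The effect of the `check_alphabet` and `degap` flags can be pictured with a small helper. This is a plain-Python sketch with a hypothetical name, not the `Sequence` constructor; the DNA alphabet used (IUPAC nucleotide codes plus the gap symbol) and raising `ValueError` on illegal symbols are assumptions for illustration.

```python
# Assumed alphabet for this sketch: IUPAC nucleotide codes plus the gap symbol.
DNA_ALPHABET = set("ACGTURYMKWSBDHVN-")

def prepare_seq(seq, check_alphabet=False, degap=False, alphabet=DNA_ALPHABET):
    """Optionally validate the symbols in seq and strip gap characters."""
    if check_alphabet:
        illegal = sorted(set(seq) - alphabet)
        if illegal:
            raise ValueError(f"illegal symbols: {''.join(illegal)}")
    if degap:
        seq = seq.replace("-", "")
    return seq

print(prepare_seq("ACG-T", check_alphabet=True, degap=True))  # ACGT
```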

Methods

__len__(): Returns the length of the sequence.
__getitem__(index): Allows indexing and slicing of the sequence.
__setitem__(index, residue): Modifies residue at a given index.
__str__(): Returns FASTA-formatted string.
copy_seqobject(): Returns a deep copy of the sequence object.
rename(newname): Changes the sequence name.
subseq(start, stop, slicesyntax=True, rename=False): Extracts subsequence between start and stop positions.
subseqpos(poslist, namesuffix=None): Creates subsequence from specified positions.
appendseq(other): Appends another sequence at the end.
prependseq(other): Prepends another sequence at the start.
windows(wsize, stepsize=1, l_overhang=0, r_overhang=0, padding="X", rename=False): Iterates over windows of the sequence.
remgaps(): Removes gaps from the sequence.
shuffle(): Randomly shuffles the sequence residues.
indexfilter(keeplist): Keeps only residues at specified positions.
seqdiff(other, zeroindex=True): Lists differences between two sequences.
hamming(other): Computes the Hamming distance (number of differing sites).
hamming_ignoregaps(other): Computes the Hamming distance, ignoring gaps.
pdist(other): Computes the p-distance (proportion of differing sites).
pdist_ignoregaps(other): Computes the p-distance, ignoring gaps.
pdist_ignorechars(other, igchars): Computes the p-distance, ignoring the characters in igchars.
residuecounts(): Counts residues and returns a dictionary.
composition(ignoregaps=True, ignoreambig=False): Calculates composition as frequencies.
findgaps(): Identifies gap positions.
fasta(width=60, nocomments=False): Returns FASTA format representation.
how(width=80, nocomments=False): Returns HOW format representation.
gapencoded(): Encodes gaps as a binary (1/0) string.
tab(nocomments=False): Returns TAB format representation.
raw(): Returns sequence in raw format.
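The distance methods above differ mainly in which sites they count. The sketch below restates three of them in plain Python for two equal-length strings; the function names mirror the method names for readability but are hypothetical stand-ins, and skipping any site where either sequence has a gap is an assumed reading of "ignoring gaps".

```python
def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def pdist(a, b):
    """Proportion of differing sites (Hamming distance divided by length)."""
    return hamming(a, b) / len(a)

def pdist_ignoregaps(a, b, gap="-"):
    """Proportion of differing sites among sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

s1, s2 = "ACGT-A", "ACCTTA"
print(hamming(s1, s2), pdist(s1, s2), pdist_ignoregaps(s1, s2))
```

Here the gap at site 5 counts as a difference for `pdist` (2 of 6 sites) but is excluded by `pdist_ignoregaps` (1 of 5 sites).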

Class: `DNA_sequence(Sequence)`

Specialized sequence class for DNA sequences. Has access to all the methods in its base class (Sequence) in addition to the ones listed here.

Methods

revcomp(): Returns reverse complement.
translate(reading_frame=1): Translates DNA to protein sequence.
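For readers new to these operations, the sketch below shows reverse complementing and translation in plain Python. It is not sequencelib's code: it handles only unambiguous uppercase A, C, G, T, uses the standard genetic code, and treats `reading_frame` as a 1-based offset on the forward strand (frames 1 to 3), which is an assumption about that parameter's meaning.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code: codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def revcomp(dna):
    """Reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna, reading_frame=1):
    """Translate from a 1-based forward frame; a trailing partial codon is dropped."""
    if reading_frame not in (1, 2, 3):
        raise ValueError("reading_frame must be 1, 2 or 3")
    start = reading_frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

print(revcomp("ATGGCC"))        # GGCCAT
print(translate("ATGGCCTAA"))   # MA* (stop codons shown as *)
```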

Class: `Protein_sequence(Sequence)`

Specialized sequence class for protein sequences. Has access to all the methods in its base class (Sequence).

Class: `Sequences_base`

Abstract base class for sequence collections. Should not be instantiated directly. All methods here can be used on both Seq_alignment and Seq_set objects.

Methods

__len__(): Returns the number of sequences.
__getitem__(index): Accesses sequences via indexing or slicing.
__setitem__(index, value): Sets sequences by integer index.
__eq__(other): Checks equality with another sequence collection.
__ne__(other): Checks inequality with another sequence collection.
__str__(): Returns FASTA format of the collection.
sortnames(reverse=False): Alphabetically sorts sequences by name.
addseq(seq, silently_discard_dup_name=False): Adds a sequence object.
addseqset(other, silently_discard_dup_name=False): Adds sequences from another collection.
remseq(name): Removes sequence by name.
remseqs(namelist): Removes multiple sequences.
changeseqname(oldname, newname, fix_dupnames=False): Renames a sequence.
getseq(name): Retrieves sequence by name.
subset(namelist): Extracts a subset by names.
subsample(samplesize): Randomly selects a subset.
subseq(start, stop, slicesyntax=True, rename=True, aln_name=None, aln_name_number=False): Extracts the region between start and stop positions from all sequences.
getnames(): Returns a list of sequence names.
range(rangefrom, rangeto): In-place subset of sequences.
removedupseqs(): Removes duplicate sequences.
group_identical_seqs(): Groups identical sequences.
residuecounts(): Counts residues across all sequences.
composition(ignoregaps=True, ignoreambig=False): Computes frequency composition.
clean_names(illegal=":;,()[]", rep="_"): Cleans illegal characters from names.
rename_numbered(basename, namefile=None): Renames sequences numerically.
rename_regexp(old_regex, new_string, namefile=None): Renames sequences using regex.
transname(namefile): Renames sequences using a mapping file.
revcomp(): Reverse complements all sequences.
translate(reading_frame=1): Translates all sequences (DNA only).
fasta(width=60, nocomments=False): FASTA format.
how(width=60, nocomments=False): HOW format.
tab(nocomments=False): TAB format.
raw(): RAW format.
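As an illustration of what `residuecounts()` and `composition()` compute across a collection, the sketch below pools residues from several plain strings. The function name is hypothetical, dropping only the `-` symbol when `ignoregaps=True` is an assumption, and the sketch does not implement `ignoreambig`.

```python
from collections import Counter

def pooled_composition(seqs, ignoregaps=True, gap="-"):
    """Return {residue: frequency} pooled over all sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s)            # count every symbol in this sequence
    if ignoregaps:
        counts.pop(gap, None)       # drop the gap symbol before normalizing
    total = sum(counts.values())
    if total == 0:
        return {}
    return {res: n / total for res, n in sorted(counts.items())}

print(pooled_composition(["AAC-", "GT-A"]))
```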

Class: `Seq_alignment(Sequences_base)`

Represents aligned sequences. This class also has access to all methods defined in its base class (Sequences_base).

Methods

alignlen(): Length of the alignment.
getcolumn(i): Retrieves column by index.
columns(): Iterates over columns.
samplecols(samplesize): Randomly samples columns.
conscols(): Lists conserved columns.
varcols(): Lists variable columns.
gappycols(): Lists columns with gaps.
site_summary(): Summarizes alignment sites.
indexfilter(keeplist): Keeps columns by indices.
remcols(discardlist): Removes columns by indices.
remambigcol(): Removes ambiguous columns.
remfracambigcol(frac): Removes columns with high ambiguity fraction.
remgapcol(): Removes columns with gaps.
remfracgapcol(frac): Removes columns with high gap fraction.
remendgapcol(frac=0.5): Removes end-gap columns.
remconscol(): Removes conserved columns.
findgaps(): Identifies gap positions.
gap_encode(): Binary encodes gaps.
seqpos2alignpos(seqname, seqpos, slicesyntax=True): Maps sequence to alignment position.
alignpos2seqpos(seqname, alignpos, slicesyntax=True): Maps alignment to sequence position.
shannon(countgaps=True): Computes Shannon entropy.
consensus(): Generates consensus sequence.
phylip(width=60): PHYLIP format.
clustal(width=60): CLUSTAL format.
nexus(width=60, print_partitioned=False): NEXUS format.
charsetblock(): Generates MrBayes charset block for partitioned analyses.
mbpartblock(): Generates detailed MrBayes block (charset, partitions, models, MCMC) for partitioned analyses.
bestblock(): Generates MrBayes BEST block for species-tree analyses (taxsets, charsets, BEST parameters).
nexuspart(): Generates Nexus-formatted MrBayes block with partition and model specifications.
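Column-wise entropy and a simple majority-rule consensus are easy to state in plain Python. This sketch is not sequencelib's code and its function names are hypothetical: it uses base-2 logarithms, treats `countgaps=False` as dropping gaps from a column before computing frequencies, and breaks consensus ties alphabetically; all three choices are assumptions.

```python
import math
from collections import Counter

def column_entropy(column, countgaps=True, gap="-"):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    symbols = [c for c in column if countgaps or c != gap]
    if not symbols:
        return 0.0
    n = len(symbols)
    # H = sum over symbols of p * log2(1/p)
    return sum((k / n) * math.log2(n / k) for k in Counter(symbols).values())

def consensus(aligned_seqs):
    """Most frequent symbol per column; ties go to the alphabetically first symbol."""
    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        best = max(counts.values())
        result.append(min(c for c, k in counts.items() if k == best))
    return "".join(result)

aln = ["ACGT", "ACGA", "AGGA", "TCGA"]
print([round(column_entropy(col), 3) for col in zip(*aln)])
print(consensus(aln))  # ACGA
```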

Class: `Seqfile_reader`

Base class for reading sequence files. Typically, you do not instantiate this class directly.

Methods:

makeseq(name, seq, annotation="", comments="")
- Description: Creates and returns a sequence object based on provided type information.
readseq()
- Description: Reads a single sequence from a file and returns it as a sequence object.
read_seqs(silently_discard_dup_name=False)
- Description: Reads all sequences and returns a Seq_set object.
read_alignment(silently_discard_dup_name=False)
- Description: Reads aligned sequences, returning a Seq_alignment object.

Class: `Fastafilehandle`

Class for handling FASTA files.

Methods:

__init__(filename, seqtype="autodetect", check_alphabet=False, degap=False, nameishandle=False)
- Description: Initializes a FASTA file reader and performs format checks.
__next__()
- Description: Parses and returns the next sequence as a sequence object.
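The one-record-per-call reading that `__next__()` provides can be sketched as a generator over lines. This plain-Python version is a hypothetical stand-in, not the sequencelib class: it yields `(name, comments, seq)` tuples, takes the first word of each header as the name and the rest as comments (an assumption), and reads from any iterable of lines so it can be tried without a file.

```python
import io

def read_fasta(lines):
    """Yield (name, comments, seq) for each record in an iterable of FASTA lines."""
    name, comments, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue                                  # skip blank lines
        if line.startswith(">"):
            if name is not None:                      # emit the previous record
                yield name, comments, "".join(chunks)
            header = line[1:].split(maxsplit=1)
            if not header:
                raise ValueError("FASTA header without a name")
            name = header[0]
            comments = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data before first '>' header")
        else:
            chunks.append(line)                       # sequence may span lines
    if name is not None:
        yield name, comments, "".join(chunks)

handle = io.StringIO(">seq1 first record\nACGT\nAC\n>seq2\nGGTT\n")
for record in read_fasta(handle):
    print(record)
```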

Class: `Howfilehandle`

Class for reading HOW-formatted files.

Methods:

__init__(...)
__next__()

Class: `Genbankfilehandle`

Class for reading GenBank files.

Methods:

__init__(...)
__next__()
find_LOCUS()
read_metadata()
extract_annotation(metadata)
extract_name(metadata)
read_genbankseq()

Class: `Tabfilehandle`

Handles tab-delimited sequence files.

Methods:

__init__(...)
__next__()

Class: `Rawfilehandle`

Handles raw-format sequence files.

Methods:

__init__(...)
__next__()

Class: `Alignfile_reader`

Base class for alignment files.

Methods:

makeseq(name, seq, annotation="", comments="")
read_seqs(silently_discard_dup_name=False)

Class: `Clustalfilehandle`

Reads Clustal-formatted alignment files.

Methods:

__init__(...)
read_alignment(silently_discard_dup_name=False)

Class: `Phylipfilehandle`

Handles Phylip-formatted alignment files.

Methods:

__init__(...)
read_alignment(silently_discard_dup_name=False)

Class: `Nexusfilehandle`

Handles Nexus-formatted alignment files.

Methods:

__init__(...)
read_alignment(silently_discard_dup_name=False)

Class: `Stockholmfilehandle`

Reads Stockholm-formatted alignment files.

Methods:

__init__(...)
read_alignment(silently_discard_dup_name=False)

Class: `Seqfile`

Factory class to autodetect file formats and instantiate the correct file handler.

Methods:

__new__(klass, filename, filetype="autodetect", ...)

Automatically selects the appropriate sequence or alignment reader based on file contents or explicitly provided file type.
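A rough idea of how content-based format detection can work: inspect the first non-blank line for a characteristic signature. The function name and every rule below are hypothetical heuristics chosen for illustration; they are not the checks `Seqfile` actually performs, and formats without an obvious signature (such as raw or tab) simply return `None` here.

```python
def guess_format(text):
    """Guess a sequence file format from the first non-blank line (heuristic)."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            break
    else:
        return None  # empty or all-blank input
    upper = line.upper()
    if line.startswith(">"):
        return "fasta"
    if upper.startswith("#NEXUS"):
        return "nexus"
    if upper.startswith("CLUSTAL"):
        return "clustal"
    if upper.startswith("# STOCKHOLM"):
        return "stockholm"
    if upper.startswith("LOCUS"):
        return "genbank"
    fields = line.split()
    if len(fields) == 2 and all(f.isdigit() for f in fields):
        return "phylip"  # header line: number of sequences, alignment length
    return None

print(guess_format("#NEXUS\nbegin data;\n"))  # nexus
```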

Release history

- 2.23.4 (this version): Mar 12, 2026
- 2.23.3: Feb 11, 2026
- 2.23.2: Apr 10, 2025
- 2.23.1: Mar 12, 2025
- 2.23.0: Oct 12, 2024
- 2.22.0: Oct 5, 2024
- 2.21.5: Sep 23, 2024
- 2.21.4: Sep 5, 2024
- 2.21.3: Sep 4, 2024
- 2.21.2: Aug 31, 2024
- 2.21.1: Aug 31, 2024
- 2.21.0: Aug 30, 2024
- 2.20.1: Aug 30, 2024
- 2.20.0: Dec 20, 2023
- 2.19.0: Dec 20, 2023
- 2.18.5: Aug 30, 2023
- 2.18.4: Aug 30, 2023
- 2.18.3: Aug 29, 2023
- 2.18.2: Aug 4, 2023
- 2.18.1: Jul 7, 2023
- 2.17.1: Jul 7, 2023
- 2.16.1: Jul 7, 2023
- 2.15.1: Jul 3, 2023
- 2.14.1: Mar 30, 2023
- 2.14.0: Mar 30, 2023
- 2.13.0: Mar 29, 2023
- 2.12.0: Dec 19, 2022
- 2.11.1: Dec 13, 2022
- 2.11.0: Dec 13, 2022
- 2.10.4: Nov 25, 2022
- 2.10.3: Nov 22, 2022
- 2.10.2: Nov 21, 2022
- 2.10.1: Nov 21, 2022
- 2.10.0: Nov 17, 2022
- 2.9.0: Nov 14, 2022
- 2.8.3: Jun 7, 2022
- 2.8.2: Mar 10, 2022
- 2.8.1: Mar 10, 2022
- 2.8.0: Mar 9, 2022
- 2.7.1: Mar 8, 2022
- 2.7.0: Mar 8, 2022
- 2.6.0: Mar 7, 2022
- 2.5.2: Feb 21, 2022
- 2.5.1: Nov 24, 2021
- 0.1.0: Nov 17, 2022

Download files

Source Distribution

sequencelib-2.23.4.tar.gz (87.0 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

sequencelib-2.23.4-py3-none-any.whl (55.9 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file sequencelib-2.23.4.tar.gz.

File metadata

Download URL: sequencelib-2.23.4.tar.gz
Upload date: Mar 12, 2026
Size: 87.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for sequencelib-2.23.4.tar.gz
Algorithm	Hash digest
SHA256	`05c198dd94301997ac93abbed38a88fa8fe5ae4682f0210c027b35ed938be578`
MD5	`2d1f74615064dc6b7c6cf16ce5754e8a`
BLAKE2b-256	`f6bdd4474f9b6bb26cdd384c499f39e1c17a4102b4c2fff6d5ac1c4847b370e6`


File details

Details for the file sequencelib-2.23.4-py3-none-any.whl.

File metadata

Download URL: sequencelib-2.23.4-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 55.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for sequencelib-2.23.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e6a6d88f2ba91a285f50bfc17c038cad061ca91464c2607d9133a9986a128ed7`
MD5	`9ff69253f9ca2efcb963125eaacb5520`
BLAKE2b-256	`518e4cc48123b0580c1c551684873acd26a0be1fb6a8d6e02c3f17887e95fbf7`
