High-performance biological sequence analysis library with GPU acceleration

Maintainers

pritampkp15

Project description

Seqcore

High-performance biological sequence analysis library for Python.

A unified, GPU-accelerated library for genomics, proteomics, structural biology, and drug design.

Installation

pip install seqcore

With GPU support:

pip install seqcore[gpu]

With all optional dependencies:

pip install seqcore[full]

Quick Start

import seqcore as sc

# DNA sequences - efficient 2-bit encoding
dna = sc.DNAArray("ACGTACGTACGT" * 1_000_000)

# Batch operations
sequences = sc.DNAArray([
    "ACGTACGT",
    "TGCATGCA",
    "GGGGCCCC",
])

# Vectorized operations
gc = sc.gc_content(sequences)
lengths = sc.length(sequences)
rev_comp = sc.reverse_complement(sequences)

# Translation
proteins = sc.translate(sequences)
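
The Quick Start mentions 2-bit encoding without showing what it means. The following is a minimal NumPy sketch of the idea, assuming the code order A=0, C=1, G=2, T=3 and four bases per byte. `pack_2bit` and `unpack_2bit` are hypothetical helpers written for illustration; they are not Seqcore's API and may not match its internal layout.

```python
import numpy as np

# Assumed mapping for this sketch only; Seqcore's real encoding may differ.
_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack_2bit(seq: str) -> np.ndarray:
    """Pack an ACGT-only string into bytes, four bases per byte."""
    codes = np.fromiter((_CODES[b] for b in seq), dtype=np.uint8, count=len(seq))
    # Pad with zeros to a multiple of four so the codes reshape into rows of four.
    padded = np.zeros((len(seq) + 3) // 4 * 4, dtype=np.uint8)
    padded[: len(seq)] = codes
    quads = padded.reshape(-1, 4)
    # The first base of each group occupies the two highest bits of the byte.
    packed = (quads[:, 0] << 6) | (quads[:, 1] << 4) | (quads[:, 2] << 2) | quads[:, 3]
    return packed.astype(np.uint8)


def unpack_2bit(packed: np.ndarray, length: int) -> str:
    """Recover the original string; `length` trims the zero padding."""
    shifts = np.array([6, 4, 2, 0], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 3
    return "".join(_BASES[c] for c in codes.ravel()[:length])


seq = "ACGTACGTAC"
packed = pack_2bit(seq)
print(packed.nbytes, "bytes for", len(seq), "bases")
print(unpack_2bit(packed, len(seq)) == seq)
```

Because the padding is indistinguishable from a run of `A`, the original length has to be stored alongside the packed bytes.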

Features

Sequence Operations

# Transcription and translation
rna = sc.transcribe(dna)
protein = sc.translate(dna, frame=0)

# GC content and molecular weight
gc = sc.gc_content(dna)
mw = sc.molecular_weight(protein)

# K-mer operations
kmers = sc.extract_kmers(sequences, k=21)
kmer_counts = sc.count_kmers(sequences, k=21)
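
For readers who want to check results on small inputs, here is a pure-Python reference for GC content and k-mer counting. It is a sketch of the standard definitions only; `naive_gc_content` and `naive_count_kmers` are hypothetical names and say nothing about how Seqcore implements these operations.

```python
from collections import Counter


def naive_gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def naive_count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


print(naive_gc_content("GGGGCCCC"))        # 1.0
print(naive_count_kmers("ACGTACGT", 4)["ACGT"])  # 2
```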

Sequence Alignment

# Pairwise alignment
result = sc.align(query, reference)
print(result.score, result.identity, result.cigar)

# Distance matrices
dm = sc.pairwise_distance(sequences, metric="edit")

# Pattern matching
matches = sc.find_pattern(sequences, "ATG[ACGT]{30,100}TAA")
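
The `metric="edit"` option above presumably refers to edit (Levenshtein) distance. As a reference for what such a matrix contains, the sketch below computes it with the textbook dynamic-programming recurrence. The function names are hypothetical, and nothing here reflects Seqcore's actual algorithm or return type.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


def pairwise_edit_distance(seqs: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of edit distances with a zero diagonal."""
    n = len(seqs)
    dm = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = edit_distance(seqs[i], seqs[j])
    return dm


print(edit_distance("kitten", "sitting"))  # 3
print(pairwise_edit_distance(["ACGT", "AGGT", "ACG"]))
```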

File I/O

# Auto-detect format
data = sc.read("sequences.fasta")
data = sc.read("structure.pdb")
data = sc.read("reads.fastq.gz")

# Streaming for large files
for batch in sc.read_stream("huge.fastq.gz", batch_size=100_000):
    results = process(batch)

# Database fetching
seq = sc.fetch("NP_000509")      # NCBI/UniProt
structure = sc.fetch("1ABC")     # PDB
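
To illustrate what batch streaming of a large FASTQ file involves, here is a standard-library sketch that yields fixed-size batches without loading the whole file. It assumes strict four-line FASTQ records (no wrapped sequences), and `read_fastq_batches` is a hypothetical helper, not the implementation behind `sc.read_stream`.

```python
import gzip
from itertools import islice
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq_batches(path: str, batch_size: int) -> Iterator[List[Record]]:
    """Yield lists of up to `batch_size` records from a plain or gzipped FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            # Pull four lines per record; only one batch is held in memory.
            lines = list(islice(handle, 4 * batch_size))
            if not lines:
                return
            batch = []
            # A trailing incomplete record (fewer than four lines) is skipped.
            for i in range(0, len(lines) - 3, 4):
                header, seq, _plus, qual = (l.rstrip("\n") for l in lines[i : i + 4])
                batch.append((header[1:], seq, qual))  # drop the leading "@"
            yield batch
```

A caller would iterate over it the same way as in the streaming example above, for instance `for batch in read_fastq_batches("reads.fastq.gz", 100_000): ...`.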

Structural Biology

# Load structure
structure = sc.read("protein.pdb")

# Access data
print(structure.chains)      # ['A', 'B']
print(structure.n_residues)  # 265

# Distance matrix
dm = sc.distance_matrix(structure, selection="CA")

# Find contacts
contacts = sc.find_contacts(structure, cutoff=4.0)

# RMSD calculation
rmsd = sc.rmsd(structure1, structure2, align=True)

# Surface analysis
sasa = sc.sasa(structure)
surface = sc.surface_residues(structure, threshold=25.0)

# Binding pockets
pockets = sc.find_pockets(structure)
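
An RMSD with `align=True` usually means the two coordinate sets are optimally superimposed first. The sketch below shows one common way to do that, the Kabsch algorithm, in NumPy. It is an assumption that Seqcore uses this approach; `rmsd_sketch` is a hypothetical function operating on plain N x 3 coordinate arrays rather than Seqcore structure objects.

```python
import numpy as np


def rmsd_sketch(P, Q, align: bool = True) -> float:
    """RMSD between two N x 3 coordinate arrays with matching atom order."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    if align:
        # Remove translation by centering both sets on their centroids.
        P = P - P.mean(axis=0)
        Q = Q - Q.mean(axis=0)
        # Kabsch: SVD of the covariance matrix gives the optimal rotation.
        U, _, Vt = np.linalg.svd(P.T @ Q)
        # Flip the last axis if needed so the result is a proper rotation.
        d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        P = P @ R.T
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
coords = rng.normal(size=(20, 3))
shifted = coords + np.array([1.0, -2.0, 0.5])
print(rmsd_sketch(coords, shifted, align=False))  # nonzero: pure translation
print(rmsd_sketch(coords, shifted, align=True))   # close to zero after superposition
```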

Drug Design

# Small molecules
mol = sc.Molecule.from_smiles("CCO")

# Molecular properties
mw = sc.molecular_weight(molecules)
logp = sc.logp(molecules)
hbd = sc.h_bond_donors(molecules)

# ADMET filters
passes_lipinski = sc.lipinski_filter(molecules)
bbb_permeable = sc.bbb_filter(molecules)

# Fingerprints and similarity
fps = sc.morgan_fingerprint(molecules, radius=2)
similarity = sc.tanimoto_similarity(fps)

# Substructure search
matches = sc.substructure_search(molecules, "c1ccccc1")
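
The filter and similarity calls above rest on two simple definitions that can be written out directly. The sketch below applies Lipinski's rule of five to precomputed properties and computes Tanimoto similarity on sets of "on" fingerprint bits. The `Props` class, the function names, the one-violation allowance, and the example property values are assumptions for illustration, not Seqcore's API or defaults.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Props:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mw: float    # molecular weight in Da
    logp: float  # octanol-water partition coefficient
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def lipinski_violations(p: Props) -> int:
    """Count rule-of-five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def passes_lipinski(p: Props, max_violations: int = 1) -> bool:
    # Allowing one violation is a common convention; set 0 for a strict filter.
    return lipinski_violations(p) <= max_violations


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A & B| / |A | B| on sets of set-bit indices; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


small = Props(mw=46.07, logp=-0.31, hbd=1, hba=1)  # illustrative values
large = Props(mw=900.0, logp=7.0, hbd=6, hba=12)   # illustrative values
print(passes_lipinski(small), passes_lipinski(large))
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```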

Phylogenetics

# Tree construction
tree = sc.neighbor_joining(sequences)
tree = sc.upgma(sequences)

# Tree operations
print(tree.newick())
dist = tree.distance("Species_A", "Species_B")
subtree = tree.prune(["A", "B", "C"])
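
To show what UPGMA tree construction does with a distance matrix, here is a compact sketch that repeatedly merges the closest pair of clusters and emits a Newick string. It takes a precomputed distance matrix rather than sequences, and `upgma_newick` is a hypothetical function; Seqcore's `sc.upgma` may differ in input, tie-breaking, and output.

```python
from typing import Dict, List, Sequence, Tuple


def upgma_newick(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Build a UPGMA tree from a symmetric distance matrix; return Newick."""
    n = len(names)
    # cluster id -> (newick subtree, number of leaves, height above leaves)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # Distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = dab / 2  # the new node sits halfway between the merged clusters
        newick = f"({na}:{h - ha:g},{nb}:{h - hb:g})"
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average keeps distances equal to the mean over leaves.
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        del d[(a, b)]
        clusters[next_id] = (newick, sa + sb, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


matrix = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
print(upgma_newick(["A", "B", "C"], matrix))  # (C:3,(A:1,B:1):2);
```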

Population Genetics

# Variant analysis
variants = sc.read("variants.vcf")
af = sc.allele_frequency(variants)
maf = sc.minor_allele_frequency(variants)

# Population statistics
fst = sc.fst(pop1, pop2)
pi = sc.nucleotide_diversity(sequences)
d = sc.tajimas_d(sequences)

# Linkage disequilibrium
ld = sc.linkage_disequilibrium(variants)
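
Two of the statistics above have short textbook definitions that are easy to verify by hand. The sketch below computes nucleotide diversity (mean pairwise differences per site) for aligned sequences and minor allele frequency for a single biallelic site. The function names and input formats are hypothetical and chosen for illustration; they do not describe Seqcore's signatures.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity_sketch(seqs: Sequence[str]) -> float:
    """Average number of differing sites per pair, divided by alignment length."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and equal length")
    total_diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


def minor_allele_frequency_sketch(alleles: Sequence[int]) -> float:
    """MAF at one biallelic site; `alleles` holds 0 (ref) or 1 (alt) per chromosome."""
    if not alleles:
        raise ValueError("no alleles given")
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


print(nucleotide_diversity_sketch(["ACGT", "ACGA", "ACGT"]))
print(minor_allele_frequency_sketch([0, 1, 1, 1]))  # 0.25
```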

GPU Acceleration

# Check GPU availability
if sc.gpu_available():
    print(sc.gpu_info())

# Device context
with sc.device("cuda:0"):
    result = sc.align(sequences, reference)

# Memory management
sc.set_memory_limit("8GB")
sc.clear_gpu_cache()

# Timing
with sc.timer() as t:
    result = sc.align(sequences, reference)
print(f"Completed in {t.elapsed:.2f}s")
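
A timing context manager like the one used above can be written in a few lines with the standard library. The sketch below is one plausible shape for it, assuming an `elapsed` attribute in seconds as in the example; `Timer` is a hypothetical class and not Seqcore's `sc.timer`. Note that when timing GPU work, the device generally has to be synchronized before the clock is read, which this sketch does not do.

```python
import time


class Timer:
    """Measure wall-clock time of a `with` block using a monotonic clock."""

    def __enter__(self) -> "Timer":
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions raised inside the block


with Timer() as t:
    total = sum(range(1_000_000))
print(f"Completed in {t.elapsed:.4f}s")
```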

Interoperability

# NumPy
arr = sequences.to_numpy()
sequences = sc.DNAArray.from_numpy(arr)

# pandas
df = sequences.to_dataframe()
df = structure.to_dataframe()

# Biopython
bio_seq = sequences[0].to_biopython()
sc_seq = sc.DNAArray.from_biopython(bio_seq)

# RDKit
rdkit_mol = molecule.to_rdkit()
sc_mol = sc.Molecule.from_rdkit(rdkit_mol)

Performance

In the benchmarks below, Seqcore runs 56x to 120x faster than Biopython on common sequence operations:

Operation	Biopython	Seqcore	Speedup
GC Content (1M seqs)	45s	0.8s	56x
Reverse Complement	12s	0.1s	120x
Translation	38s	0.5s	76x
K-mer Counting	89s	1.2s	74x

Benchmarks were run on an AMD Ryzen 9 5900X with 32GB RAM. GPU benchmarks show an additional 10-50x speedup.
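
The Speedup column is the baseline time divided by the Seqcore time, and readers can reproduce this kind of comparison on their own hardware. The sketch below recomputes the ratios from the table and times a pure-Python GC-content loop against a vectorized NumPy version with `timeit`. It does not use Seqcore or Biopython and will not reproduce the figures above; all function names are hypothetical.

```python
import timeit

import numpy as np


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def gc_python(seqs):
    """Per-sequence GC fraction with a plain Python loop."""
    return [(s.count("G") + s.count("C")) / len(s) for s in seqs]


def gc_numpy(seqs):
    """Vectorized GC fraction; assumes all sequences have the same length."""
    raw = "".join(seqs).encode("ascii")
    arr = np.frombuffer(raw, dtype=np.uint8).reshape(len(seqs), -1)
    return ((arr == ord("G")) | (arr == ord("C"))).mean(axis=1)


# Ratios implied by the table rows (times in seconds).
for name, base, fast in [("GC Content", 45, 0.8), ("K-mer Counting", 89, 1.2)]:
    print(f"{name}: {speedup(base, fast):.0f}x")

seqs = ["ACGTACGT", "TGCATGCA", "GGGGCCCC"] * 10_000
t_py = timeit.timeit(lambda: gc_python(seqs), number=5)
t_np = timeit.timeit(lambda: gc_numpy(seqs), number=5)
print(f"python {t_py:.3f}s, numpy {t_np:.3f}s, ratio {speedup(t_py, t_np):.1f}x")
```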

Requirements

Python 3.9+
NumPy 1.21+

Optional:

CuPy (GPU acceleration)
Biopython (interoperability)
RDKit (molecular operations)
MDAnalysis (structure analysis)

Contributing

Contributions welcome. See CONTRIBUTING.md.

License

MIT License. See LICENSE.

Author

Dr. Pritam Kumar Panda
Stanford University
Email: pritam@stanford.edu

Citation

If you use Seqcore in your research, please cite:

@software{seqcore,
  author = {Panda, Pritam Kumar},
  title = {Seqcore: High-performance biological sequence analysis},
  url = {https://github.com/pritampanda15/seqcore},
  version = {0.3.0},
  year = {2025},
  institution = {Stanford University}
}

Release history

0.3.0 (this version): Dec 31, 2025
0.2.0: Dec 31, 2025
0.1.0: Dec 31, 2025

Download files

Source Distribution

seqcore-0.3.0.tar.gz (43.8 kB), uploaded Dec 31, 2025, Source

Built Distribution

seqcore-0.3.0-py3-none-any.whl (49.4 kB), uploaded Dec 31, 2025, Python 3

File details

Details for the file seqcore-0.3.0.tar.gz.

File metadata

Download URL: seqcore-0.3.0.tar.gz
Upload date: Dec 31, 2025
Size: 43.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for seqcore-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`590e34ffc700b332c50923ad70750121006fbcfe3debdc1446c1d18f6739b1a8`
MD5	`7110c57f819d67a808c5c016c04a0033`
BLAKE2b-256	`0e9cd397f8fc32a6a5dc35d3f58b5988aa743a05981140f4e4c5cb16d8e3fc1c`

Provenance

The following attestation bundles were made for seqcore-0.3.0.tar.gz:

Publisher: publish.yml on pritampanda15/Seqcore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seqcore-0.3.0.tar.gz
- Subject digest: 590e34ffc700b332c50923ad70750121006fbcfe3debdc1446c1d18f6739b1a8
- Sigstore transparency entry: 784267221
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: pritampanda15/Seqcore@7022df3359954a05832ebd8a360d781baf32cd18
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pritampanda15
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7022df3359954a05832ebd8a360d781baf32cd18
- Trigger Event: release

File details

Details for the file seqcore-0.3.0-py3-none-any.whl.

File metadata

Download URL: seqcore-0.3.0-py3-none-any.whl
Upload date: Dec 31, 2025
Size: 49.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for seqcore-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`332a2949c4383845a362d45374140460198b116ee319ac48a13942845c610e55`
MD5	`24ac58729bd369de3c711e69ccb3f6db`
BLAKE2b-256	`beaa7b4636fdd9e81ad64b80a867c2e9f43b6e5ca03b6187df9ec6f47cb11a0d`

Provenance

The following attestation bundles were made for seqcore-0.3.0-py3-none-any.whl:

Publisher: publish.yml on pritampanda15/Seqcore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seqcore-0.3.0-py3-none-any.whl
- Subject digest: 332a2949c4383845a362d45374140460198b116ee319ac48a13942845c610e55
- Sigstore transparency entry: 784267290
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: pritampanda15/Seqcore@7022df3359954a05832ebd8a360d781baf32cd18
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/pritampanda15
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7022df3359954a05832ebd8a360d781baf32cd18
- Trigger Event: release
