Seqcore
High-performance biological sequence analysis library for Python.
A unified, GPU-accelerated library for genomics, proteomics, structural biology, and drug design.
Installation
pip install seqcore
With GPU support:
pip install seqcore[gpu]
With all optional dependencies:
pip install seqcore[full]
Quick Start
import seqcore as sc
# DNA sequences - efficient 2-bit encoding
dna = sc.DNAArray("ACGTACGTACGT" * 1_000_000)
# Batch operations
sequences = sc.DNAArray([
    "ACGTACGT",
    "TGCATGCA",
    "GGGGCCCC",
])
# Vectorized operations
gc = sc.gc_content(sequences)
lengths = sc.length(sequences)
rev_comp = sc.reverse_complement(sequences)
# Translation
proteins = sc.translate(sequences)
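To make the "2-bit encoding" comment concrete, here is a minimal pure-Python sketch of packing four bases into each byte. It is only an illustration of the idea: the names `pack_2bit` and `unpack_2bit` are hypothetical and do not describe Seqcore's actual internal layout.

```python
# Illustrative 2-bit packing of DNA (not Seqcore's internal representation).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an ACGT string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Each base occupies two bits; base i goes to bit offset 2 * (i % 4).
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed: bytes, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "ACGTACGTACGT"
packed = pack_2bit(seq)
print(len(seq), "bases stored in", len(packed), "bytes")
```

The length has to be stored alongside the buffer, because a final partially filled byte is indistinguishable from trailing `A` bases.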
Features
Sequence Operations
# GC content
gc = sc.gc_content(dna)
# Transcription and translation
rna = sc.transcribe(dna)
protein = sc.translate(dna, frame=0)
# Molecular weight of the translated protein
mw = sc.molecular_weight(protein)
# K-mer operations
kmers = sc.extract_kmers(sequences, k=21)
kmer_counts = sc.count_kmers(sequences, k=21)
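For checking results on small inputs, the operations above can be written in a few lines of standard-library Python. This is a reference sketch only; the function names mirror the Seqcore calls for readability, but they operate on plain strings and make no claim about how Seqcore implements them.

```python
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(gc_content("ACGTACGT"))
print(reverse_complement("AACG"))
print(count_kmers("ACGTACGT", 3))
```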
Sequence Alignment
# Pairwise alignment
result = sc.align(query, reference)
print(result.score, result.identity, result.cigar)
# Distance matrices
dm = sc.pairwise_distance(sequences, metric="edit")
# Pattern matching
matches = sc.find_pattern(sequences, "ATG[ACGT]{30,100}TAA")
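The `metric="edit"` option presumably refers to edit (Levenshtein) distance. A minimal dynamic-programming sketch of that metric and of a symmetric distance matrix follows; the helper names are hypothetical, and Seqcore's own implementation may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def pairwise_edit_distances(seqs):
    """Symmetric matrix of edit distances with a zero diagonal."""
    n = len(seqs)
    dm = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = edit_distance(seqs[i], seqs[j])
    return dm


dm = pairwise_edit_distances(["ACGTACGT", "TGCATGCA", "GGGGCCCC"])
for row in dm:
    print(row)
```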
File I/O
# Auto-detect format
data = sc.read("sequences.fasta")
data = sc.read("structure.pdb")
data = sc.read("reads.fastq.gz")
# Streaming for large files
for batch in sc.read_stream("huge.fastq.gz", batch_size=100_000):
    results = process(batch)  # process() stands in for your own function
# Database fetching
seq = sc.fetch("NP_000509") # NCBI/UniProt
structure = sc.fetch("1ABC") # PDB
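The streaming pattern above can be sketched with the standard library alone. The version below assumes simple four-line FASTQ records (no wrapped sequence lines); `open_fastq` and `fastq_batches` are hypothetical helpers written for illustration, not part of Seqcore.

```python
import gzip
import io
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_batches(handle, batch_size):
    """Yield lists of (name, sequence, quality) tuples, batch_size at a time."""
    while True:
        # Read at most batch_size records (four lines each) without
        # loading the whole file into memory.
        lines = [line.rstrip("\n") for line in islice(handle, 4 * batch_size)]
        if not lines:
            return
        if len(lines) % 4 != 0:
            raise ValueError("truncated FASTQ record")
        batch = []
        for i in range(0, len(lines), 4):
            header, seq, _sep, qual = lines[i:i + 4]
            if not header.startswith("@"):
                raise ValueError(f"bad FASTQ header: {header!r}")
            batch.append((header[1:], seq, qual))
        yield batch


# In-memory demo; for a real file, pass open_fastq("reads.fastq.gz") instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n")
batches = list(fastq_batches(demo, batch_size=2))
print([len(b) for b in batches])
```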
Structural Biology
# Load structure
structure = sc.read("protein.pdb")
# Access data
print(structure.chains) # ['A', 'B']
print(structure.n_residues) # 265
# Distance matrix
dm = sc.distance_matrix(structure, selection="CA")
# Find contacts
contacts = sc.find_contacts(structure, cutoff=4.0)
# RMSD calculation
rmsd = sc.rmsd(structure1, structure2, align=True)
# Surface analysis
sasa = sc.sasa(structure)
surface = sc.surface_residues(structure, threshold=25.0)
# Binding pockets
pockets = sc.find_pockets(structure)
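As a rough illustration of two of the calculations above, the sketch below computes RMSD and distance-based contacts on plain coordinate tuples. It deliberately omits superposition, so it corresponds to comparing structures that are already aligned; the function names and inputs are assumptions for this example, not Seqcore's API.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def find_contacts(coords, cutoff):
    """Index pairs (i, j), i < j, whose points lie within `cutoff` of each other."""
    return [
        (i, j)
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
        if math.dist(coords[i], coords[j]) <= cutoff
    ]


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(a, b))
print(find_contacts([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)], cutoff=4.0))
```

The all-pairs loop is quadratic in the number of atoms; large structures usually call for a spatial index such as a cell list or k-d tree.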
Drug Design
# Small molecules
mol = sc.Molecule.from_smiles("CCO")
# Molecular properties
mw = sc.molecular_weight(molecules)
logp = sc.logp(molecules)
hbd = sc.h_bond_donors(molecules)
# ADMET filters
passes_lipinski = sc.lipinski_filter(molecules)
bbb_permeable = sc.bbb_filter(molecules)
# Fingerprints and similarity
fps = sc.morgan_fingerprint(molecules, radius=2)
similarity = sc.tanimoto_similarity(fps)
# Substructure search
matches = sc.substructure_search(molecules, "c1ccccc1")
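Two of the ideas above are easy to state in plain Python. The sketch below treats a fingerprint as a set of "on" bit indices for Tanimoto similarity and applies Lipinski's rule-of-five thresholds to precomputed descriptors. The function names, the set representation, and the one-violation allowance are assumptions made for this example; the ethanol descriptor values are approximate.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices.

    Two empty fingerprints are defined here as having similarity 0.0.
    """
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count rule-of-five violations from precomputed descriptors."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def passes_lipinski(mw, logp, hbd, hba, max_violations=1):
    """Common reading of the rule: at most one violation is tolerated."""
    return lipinski_violations(mw, logp, hbd, hba) <= max_violations


print(tanimoto({1, 2, 3}, {2, 3, 4}))
# Approximate descriptors for ethanol (SMILES "CCO").
print(passes_lipinski(mw=46.07, logp=-0.31, hbd=1, hba=1))
```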
Phylogenetics
# Tree construction
tree = sc.neighbor_joining(sequences)
tree = sc.upgma(sequences)
# Tree operations
print(tree.newick())
dist = tree.distance("Species_A", "Species_B")
subtree = tree.prune(["A", "B", "C"])
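For readers unfamiliar with UPGMA, here is a compact sketch of the algorithm operating on a precomputed distance table and returning a Newick string. The input format (a dict keyed by `frozenset` pairs) and the function signature are assumptions for this example; the Seqcore call above takes sequences and may work differently internally.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    """
    # Each cluster key maps to (newick text, number of leaves, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (text_a, size_a, h_a) = clusters.pop(a)
        (text_b, size_b, h_b) = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        merged = f"({a}|{b})"  # unique key for the new cluster
        newick = f"({text_a}:{height - h_a:g},{text_b}:{height - h_b:g})"
        # Distance to every other cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = (newick, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"


distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
tree_text = upgma(["A", "B", "C"], distances)
print(tree_text)
```

UPGMA assumes a constant rate of evolution (an ultrametric tree); neighbor joining relaxes that assumption.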
Population Genetics
# Variant analysis
variants = sc.read("variants.vcf")
af = sc.allele_frequency(variants)
maf = sc.minor_allele_frequency(variants)
# Population statistics
fst = sc.fst(pop1, pop2)
pi = sc.nucleotide_diversity(sequences)
d = sc.tajimas_d(sequences)
# Linkage disequilibrium
ld = sc.linkage_disequilibrium(variants)
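The simpler statistics above have short textbook definitions, sketched here on plain Python data. The input shapes (genotypes as tuples of allele indices at one biallelic site, sequences as equal-length aligned strings) are assumptions for this example and are not Seqcore's data model.

```python
from itertools import combinations


def allele_frequency(genotypes):
    """Alternate-allele frequency at one site.

    genotypes: per-sample allele tuples, e.g. (0, 1) for a heterozygote,
    where 0 is the reference allele.
    """
    alleles = [allele for gt in genotypes for allele in gt]
    return sum(1 for allele in alleles if allele != 0) / len(alleles)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele, assuming a biallelic site."""
    af = allele_frequency(genotypes)
    return min(af, 1 - af)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


site = [(0, 0), (0, 1), (1, 1), (1, 1)]
print(allele_frequency(site), minor_allele_frequency(site))
print(nucleotide_diversity(["ACGT", "ACGA", "ACTT"]))
```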
GPU Acceleration
# Check GPU availability
if sc.gpu_available():
    print(sc.gpu_info())
# Device context
with sc.device("cuda:0"):
    result = sc.align(sequences, reference)
# Memory management
sc.set_memory_limit("8GB")
sc.clear_gpu_cache()
# Timing
with sc.timer() as t:
    result = sc.align(sequences, reference)
print(f"Completed in {t.elapsed:.2f}s")
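A timing context manager with an `elapsed` attribute, like the one used above, can be written in a few lines. The `Timer` class below is a hypothetical stand-in built on `time.perf_counter`, shown only to clarify the pattern; it is not Seqcore's implementation and does not account for asynchronous GPU execution.

```python
import time


class Timer:
    """Context manager that records wall-clock time in `elapsed` (seconds)."""

    def __enter__(self):
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        return False  # do not swallow exceptions raised inside the block


with Timer() as t:
    total = sum(range(100_000))
print(f"Completed in {t.elapsed:.4f}s")
```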
Interoperability
# NumPy
arr = sequences.to_numpy()
sequences = sc.DNAArray.from_numpy(arr)
# pandas
df = sequences.to_dataframe()
df = structure.to_dataframe()
# Biopython
bio_seq = sequences[0].to_biopython()
sc_seq = sc.DNAArray.from_biopython(bio_seq)
# RDKit
rdkit_mol = molecule.to_rdkit()
sc_mol = sc.Molecule.from_rdkit(rdkit_mol)
Performance
Seqcore provides significant speedups over traditional libraries:
| Operation | Biopython | Seqcore | Speedup |
|---|---|---|---|
| GC Content (1M seqs) | 45s | 0.8s | 56x |
| Reverse Complement | 12s | 0.1s | 120x |
| Translation | 38s | 0.5s | 76x |
| K-mer Counting | 89s | 1.2s | 74x |
Benchmarks were run on an AMD Ryzen 9 5900X with 32 GB RAM. GPU benchmarks show an additional 10-50x speedup.
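The Speedup column is the Biopython time divided by the Seqcore time, rounded to the nearest integer. The short check below reproduces it from the timings in the table; it uses only the reported numbers and does not rerun any benchmark.

```python
# (Biopython seconds, Seqcore seconds) as reported in the table above.
timings = {
    "GC Content (1M seqs)": (45.0, 0.8),
    "Reverse Complement": (12.0, 0.1),
    "Translation": (38.0, 0.5),
    "K-mer Counting": (89.0, 1.2),
}

# Speedup = baseline time / accelerated time.
speedups = {op: round(base / fast) for op, (base, fast) in timings.items()}

for op, factor in speedups.items():
    print(f"{op}: {factor}x")
```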
Requirements
- Python 3.9+
- NumPy 1.21+
Optional:
- CuPy (GPU acceleration)
- Biopython (interoperability)
- RDKit (molecular operations)
- MDAnalysis (structure analysis)
Contributing
Contributions welcome. See CONTRIBUTING.md.
License
MIT License. See LICENSE.
Author
Dr. Pritam Kumar Panda
Stanford University
Email: pritam@stanford.edu
Citation
If you use Seqcore in your research, please cite:
@software{seqcore,
  author = {Panda, Pritam Kumar},
  title = {Seqcore: High-performance biological sequence analysis},
  url = {https://github.com/pritampanda15/seqcore},
  version = {0.1.0},
  year = {2025},
  institution = {Stanford University}
}