Comprehensive bioinformatics utilities for sequence analysis, alignment, annotation, and molecular biology workflows
🧬 Bioutils Collection
Production-ready bioinformatics toolkit - 77+ optimized functions for sequence analysis, alignment, annotation, and molecular biology workflows.
✨ Highlights
- 🚀 77+ specialized functions across 13 bioinformatics domains
- 🔒 Fully typed with complete type hints (mypy strict)
- 📊 Research-grade algorithms - Needleman-Wunsch, Smith-Waterman, and more
- ⚡ Performance optimized for large-scale genomic data
- ✅ Extensively tested - 670+ tests with comprehensive coverage
- 📝 Self-documenting - NumPy-style docstrings with examples
📦 Installation
pip install bioutils-collection
Requirements: Python 3.10+ with numpy, scipy, and scikit-learn
🎯 Quick Start
from bioutils_collection import (
    reverse_complement,
    gc_content,
    needleman_wunsch,
    parse_fasta,
    translate_dna_to_protein,
)
# Sequence manipulation
seq = "ATCGATCG"
rev_comp = reverse_complement(seq) # "CGATCGAT"
# Calculate GC content
gc = gc_content("ATCGATCG") # 0.5
# Global sequence alignment
seq1, seq2 = "GATTACA", "GCATGCU"
aligned1, aligned2, score = needleman_wunsch(seq1, seq2)
# Parse FASTA files
for header, sequence in parse_fasta("genome.fasta"):
    print(f"{header}: {len(sequence)} bp")
# Translate DNA to protein
protein = translate_dna_to_protein("ATGGCCTAA") # "MA*"
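For readers curious about what the global alignment call above computes, here is a minimal textbook Needleman-Wunsch sketch in pure Python. It is not the package's implementation: the function name `nw_align`, the linear gap penalty, and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, and `needleman_wunsch` may use different defaults, tie-breaking, or return conventions.

```python
def nw_align(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> tuple[str, str, int]:
    """Global alignment of a and b with a linear gap penalty (textbook Needleman-Wunsch).

    Hypothetical helper for illustration; scoring values are assumptions.
    """

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning two residues
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Because ties are broken in favor of the diagonal, other co-optimal alignments with the same score may exist; a library implementation can legitimately return a different one.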
🧬 Modules
Core Sequence Operations
- alignment_functions - Pairwise & multiple sequence alignment (Needleman-Wunsch, Smith-Waterman, BLAST score ratio)
- sequence_operations - Reverse complement, ORF finding, CpG islands, low-complexity filtering
- translation_functions - DNA↔RNA transcription, translation with custom codon tables
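The sequence and translation operations listed above can be illustrated with a small pure-Python sketch: a reverse complement, a translator built from the standard genetic code (NCBI translation table 1), and a naive ORF scan over all six reading frames. The helper names (`revcomp`, `translate`, `find_orfs_sketch`) are hypothetical, and the ORF rules (ATG start, first in-frame stop, length counted in nucleotides including the stop codon) are assumptions; the package's `find_orfs` may define ORFs, coordinates, or `min_length` differently.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    s = seq.upper()
    return "".join(CODON_TABLE.get(s[i:i + 3], "X") for i in range(0, len(s) - 2, 3))


def find_orfs_sketch(seq: str, min_length: int = 300) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs of at least min_length nt.

    Hypothetical helper. Coordinates are 0-based, end-exclusive, and include the
    stop codon; for the '-' strand they index into revcomp(seq). Nested in-frame
    ATGs are not reported separately, and ORFs without an in-frame stop are dropped.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, matching the `translate_dna_to_protein` call in the Quick Start.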
Sequence Analysis & Statistics
- gc_functions - GC content, GC skew, windowed GC profiling
- sequence_statistics - Codon usage (CAI, ENC, RSCU), melting temp, isoelectric point, amino acid composition
- data_validation - DNA/RNA/protein sequence validation
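Several of the composition statistics listed above reduce to simple base counts. The sketch below assumes GC content as a fraction in [0, 1] (consistent with the `0.5` in the Quick Start), GC skew as (G - C) / (G + C), windows advanced by a step size, and the Wallace rule for melting temperature. All helper names are hypothetical, and the package's `melting_temperature` may well use a nearest-neighbor thermodynamic model instead, so treat these as approximations.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases (0.0 for an empty sequence)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def gc_windows(seq: str, window: int, step: int) -> list[float]:
    """GC fraction of each full-length window, starting every `step` bases."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius.

    Only a rough estimate for short oligonucleotides; it ignores salt and
    primer concentration.
    """
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))
```

Under the Wallace rule, the 12-mer `"ATCGATCGATCG"` (six A/T, six G/C) comes out at 2 * 6 + 4 * 6 = 36 °C.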
File I/O & Parsing
- fasta_misc - FASTA parsing, writing, filtering, splitting, concatenation, primer generation
- annotation_functions - BED/GFF/GTF/VCF parsing and conversion, annotation statistics
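A FASTA reader with the same `(header, sequence)` shape as the `parse_fasta` loop in the Quick Start can be written as a generator over lines. This is a sketch, not the package's parser: the name `iter_fasta` is hypothetical, and it assumes headers should be returned without the leading `>`, wrapped sequence lines should be joined, and blank lines or any text before the first header can be skipped.

```python
from collections.abc import Iterable, Iterator


def iter_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: str | None = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)  # emit the last record
```

Because it accepts any iterable of strings, an open file handle such as `open("genome.fasta")` works as the input, and records are streamed one at a time rather than loaded all at once.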
Pattern & Motif Discovery
- motif_functions - Motif search, consensus generation, pattern matching
- repeat_functions - Tandem repeat finder, palindrome detection
- restriction_functions - Restriction enzyme site identification
- clustering_functions - Motif clustering and grouping
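Restriction site identification and palindrome detection can be sketched with exact string matching. The helpers below (`revcomp`, `is_palindromic`, `find_sites`, all hypothetical) assume an unambiguous recognition sequence such as EcoRI's GAATTC; real enzyme databases also use IUPAC degenerate codes and cut offsets, which this sketch ignores. A site is a biological palindrome when it equals its own reverse complement, so only non-palindromic sites need a second search for the reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when a site reads the same on both strands (equals its reverse complement)."""
    return site.upper() == revcomp(site)


def find_sites(seq: str, site: str) -> list[int]:
    """0-based forward-strand start positions where `site` occurs on either strand.

    Overlapping matches are reported; only exact A/C/G/T sites are supported.
    """
    s = seq.upper()
    patterns = {site.upper(), revcomp(site)}  # a single pattern if the site is palindromic
    positions: set[int] = set()
    for pattern in patterns:
        i = s.find(pattern)
        while i != -1:
            positions.add(i)
            i = s.find(pattern, i + 1)  # advance by one to allow overlaps
    return sorted(positions)
```

For instance, `find_sites("GAATTCAAGAATTC", "GAATTC")` reports the two EcoRI sites at positions 0 and 8.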
🔬 Use Cases
Genomic Analysis
from bioutils_collection import find_orfs, gc_content_windows, find_cpg_islands
# Find all ORFs in a sequence
orfs = find_orfs(dna_sequence, min_length=300)
# Sliding window GC analysis
gc_pro
# Protein properties
pi = calculate_isoelectric_point(protein_seq)
composition = amino_acid_composition(protein_seq)
# Primer design
tm = melting_temperature("ATCGATCGATCG")
Development
```bash
# Clone repository
git clone https://github.com/MForofontov/bioutils-collection.git
cd bioutils-collection
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run specific test categories
pytest -m alignment
pytest -m fasta
pytest -m translation
# Type checking
mypy bioutils_collection
# Linting
ruff check .
# Coverage report
pytest --cov=bioutils_collection --cov-report=html
```
📚 API Documentation
All functions include:
- ✅ Complete type hints for static analysis
- 📖 NumPy-style docstrings with parameter descriptions
- 💡 Usage examples in docstrings
- ⚠️ Complexity notes for performance-critical code
- 📎 Algorithm references where applicable
Example:
from bioutils_collection import needleman_wunsch
help(needleman_wunsch)  # Print the full docstring
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Add comprehensive tests
- Ensure all tests pass (pytest)
- Add type hints and docstrings
- Submit a pull request
Development Guidelines:
- Follow the existing code style (ruff formatting)
- Add tests for all new functions
- Update documentation
- Keep functions focused and single-purpose
🔗 Related Projects
- Biopython - Comprehensive biological computation library
- scikit-bio - Scientific Python library for bioinformatics
- pyutils-collection - General Python utilities (sister project)
📄 License
MIT License - see LICENSE for details.
📊 Project Stats
- 77+ Functions across 13 specialized modules
- 670+ Tests with comprehensive coverage
- Type-safe with mypy strict mode
- Python 3.10+ with modern type hints
📮 Contact & Support
- Author: Mykyta Forofontov
- Repository: github.com/MForofontov/bioutils-collection
- Issues: Report bugs or request features at github.com/MForofontov/bioutils-collection/issues
- PyPI: pypi.org/project/bioutils-collection
⭐ Star this repo if you find it useful!