A collection of (semi) useful bioinformatics functions
biogl
A collection of useful small bioinformatics functions for working with genomic data formats.
Features
- GFF3/GTF Parsing: Comprehensive parser with type hints and opt-in GFF3 spec compliance
- FASTA Utilities: Functions for parsing and manipulating FASTA files
- Coordinate Operations: Tools for working with genomic coordinates
- Type Safety: Full type hints for better IDE support and static analysis
- Well Tested: Comprehensive test suite with real genomic data
Installation
Using pip (recommended)
python3 -m pip install biogl
Development Installation
git clone https://github.com/grahamlarue/biogl.git
cd biogl
pip install -e ".[dev]"
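Because the 3.0.0 release changes GxfParse behaviour, it can be useful to confirm which version ended up installed. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of biogl.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; not part of biogl.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("biogl")
print(f"biogl version: {found}" if found else "biogl is not installed")
```

Comparing the reported version against 3.0.0 tells you whether the breaking changes described later in this README apply to your environment.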
Quick Start
import biogl
# Parse a GFF3/GTF line
from biogl import GxfParse
line = "chr1\tEnsembl\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=TEST"
parsed = GxfParse(line, line_number=1)
print(parsed.region) # chr1
print(parsed.start) # 1000
print(parsed.stop) # 2000
print(parsed.strand) # 1 (normalized)
print(parsed.name) # gene1
print(parsed.attributes) # {'ID': 'gene1', 'Name': 'TEST'}
# Enable GFF3 spec compliance features
parsed = GxfParse(
    line,
    line_number=1,
    url_decode=True,  # Decode percent-encoded values
    case_sensitive=True,  # Case-sensitive attribute matching
    strict_coordinates=True,  # Validate start <= end
)
# Parse FASTA files
from biogl import fasta_parse
for header, sequence in fasta_parse("genome.fa"):
    print(f"{header}: {len(sequence)} bp")
# Work with coordinates
from biogl import coord_overlap
overlap = coord_overlap(100, 200, 150, 250) # Returns 50
# Reverse complement
from biogl import rev_comp
seq = "ATCG"
rc = rev_comp(seq) # Returns "CGAT"
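To make the url_decode and strict_coordinates options above more concrete, here is a rough standard-library sketch of the kind of work they refer to: splitting the ninth GFF3 column into key/value pairs, optionally percent-decoding them, and checking that start does not exceed end. This is not biogl's implementation; the function name parse_gff3_attributes and the sample Name value are assumptions made for illustration.

```python
from typing import Dict
from urllib.parse import unquote


def parse_gff3_attributes(column9: str, url_decode: bool = False) -> Dict[str, str]:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict.

    Hypothetical helper, not biogl's parser. When url_decode is True,
    percent-encoded characters such as %3B are decoded.
    """
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        key, _, value = field.partition("=")
        if url_decode:
            key, value = unquote(key), unquote(value)
        attrs[key] = value
    return attrs


# Sample line with a percent-encoded semicolon in the Name value (made up).
line = "chr1\tEnsembl\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=TEST%3B1"
columns = line.split("\t")

# Coordinate sanity check in the spirit of strict_coordinates.
start, stop = int(columns[3]), int(columns[4])
if start > stop:
    raise ValueError(f"start ({start}) is greater than end ({stop})")

print(parse_gff3_attributes(columns[8], url_decode=True))
```

Decoding happens after splitting on the semicolon, so an encoded %3B inside a value is not mistaken for a field separator.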
Key Modules
- GxfParse: Parse GFF3/GTF annotation files with full type hints
- fasta_parse: Efficient FASTA file parsing
- coord_overlap: Calculate coordinate overlaps
- rev_comp: Reverse complement DNA sequences
- translate: Translate DNA to protein
- flex_open: Open compressed or uncompressed files transparently
- window: Sliding window generator
- And more...
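For readers unfamiliar with these operations, the sketch below shows what reverse complementing, coordinate overlap, and sliding windows typically compute, using only the standard library. These are not biogl's functions: the names reverse_complement, overlap_length, and sliding_windows are hypothetical, and the half-open interval convention used for the overlap is an assumption rather than something this README states.

```python
from typing import Iterator

# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(a_start: int, a_stop: int, b_start: int, b_stop: int) -> int:
    """Length shared by two intervals, assuming half-open [start, stop) coordinates."""
    return max(0, min(a_stop, b_stop) - max(a_start, b_start))


def sliding_windows(seq: str, size: int, step: int = 1) -> Iterator[str]:
    """Yield consecutive substrings of length `size`, advancing by `step`."""
    for i in range(0, len(seq) - size + 1, step):
        yield seq[i:i + size]


print(reverse_complement("ATCG"))            # CGAT
print(overlap_length(100, 200, 150, 250))    # 50
print(list(sliding_windows("ATCGA", 3)))     # ['ATC', 'TCG', 'CGA']
```

Under an inclusive coordinate convention the overlap would be one larger, so check the library's own docstrings with help() before relying on either interpretation.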
Breaking Changes in v3.0.0
Important: Version 3.0.0 introduces breaking changes to GxfParse:
- Features without explicit IDs now return None instead of 'feature_None' strings
- Migration: Change if name == 'exon_None' to if name is None
See CHANGELOG.md for full details.
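Code that has to run against both pre-3.0 and 3.0+ releases during a migration could use a small transitional check like the one sketched here. The helper is_unnamed is hypothetical, and it assumes that legacy placeholder names always end in "_None", which is inferred from the 'exon_None' example above rather than confirmed by the changelog.

```python
from typing import Optional


def is_unnamed(name: Optional[str]) -> bool:
    """Return True when a parsed feature has no explicit ID.

    Handles the v3 behaviour (name is None) and, as an assumption,
    legacy placeholder strings such as 'exon_None'.
    """
    if name is None:
        return True
    return name.endswith("_None")


for candidate in (None, "exon_None", "gene1"):
    print(repr(candidate), is_unnamed(candidate))
```

Once every environment is on 3.0.0 or later, the string check can be dropped in favour of a plain name is None test.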
Documentation
For detailed documentation on each module, use Python's built-in help:
from biogl import GxfParse
help(GxfParse)
Requirements
- Python 3.10+
- smart_open
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details.
Author
Graham Larue (egrahamlarue@gmail.com)
File details
Details for the file biogl-3.0.1.tar.gz.
File metadata
- Download URL: biogl-3.0.1.tar.gz
- Upload date:
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 897384eb309b537c9b1eefcc98180c15272b5a40f8514ed92984f44db0912c24 |
| MD5 | 99b8cbfed1732741e6578c6aa8f54d96 |
| BLAKE2b-256 | b7c62922a71d9cab67691a120431bbc71942ea18698cb6df176a238077dd3816 |
File details
Details for the file biogl-3.0.1-py3-none-any.whl.
File metadata
- Download URL: biogl-3.0.1-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1d48c61c9eda5758625e1d6acd852396e7b44b11c4150fc90e889612d776058b |
| MD5 | beedfccd8baac90e944e1d9d8a090226 |
| BLAKE2b-256 | 1be383258ffa0a1606be804aab5c37eb402d2e4fce88841d69420881da7f3389 |