A collection of (semi) useful bioinformatics functions

biogl

Python 3.10+ | License: MIT

A collection of small, useful bioinformatics functions for working with genomic data formats.

Features

  • GFF3/GTF Parsing: Comprehensive parser with type hints and opt-in GFF3 spec compliance
  • FASTA Utilities: Functions for parsing and manipulating FASTA files
  • Coordinate Operations: Tools for working with genomic coordinates
  • Type Safety: Full type hints for better IDE support and static analysis
  • Well Tested: Comprehensive test suite with real genomic data

Installation

Using pip (recommended)

python3 -m pip install biogl

Development Installation

git clone https://github.com/grahamlarue/biogl.git
cd biogl
pip install -e ".[dev]"

Quick Start

import biogl

# Parse a GFF3/GTF line
from biogl import GxfParse

line = "chr1\tEnsembl\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=TEST"
parsed = GxfParse(line, line_number=1)

print(parsed.region)      # chr1
print(parsed.start)       # 1000
print(parsed.stop)        # 2000
print(parsed.strand)      # 1 (normalized)
print(parsed.name)        # gene1
print(parsed.attributes)  # {'ID': 'gene1', 'Name': 'TEST'}

# Enable GFF3 spec compliance features
parsed = GxfParse(
    line,
    line_number=1,
    url_decode=True,        # Decode percent-encoded values
    case_sensitive=True,    # Case-sensitive attribute matching
    strict_coordinates=True,  # Validate start <= end
)

# Parse FASTA files
from biogl import fasta_parse

for header, sequence in fasta_parse("genome.fa"):
    print(f"{header}: {len(sequence)} bp")

# Work with coordinates
from biogl import coord_overlap

overlap = coord_overlap(100, 200, 150, 250)  # Returns 50

# Reverse complement
from biogl import rev_comp

seq = "ATCG"
rc = rev_comp(seq)  # Returns "CGAT"
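
To make the url_decode option above more concrete, here is a minimal standard-library sketch of what percent-decoding a GFF3 attributes column involves. The helper parse_gff3_attributes is hypothetical and is not part of biogl; it only illustrates the general technique and may differ from how GxfParse handles attributes internally.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str, url_decode: bool = False) -> dict[str, str]:
    """Split a GFF3 attributes column into a dict.

    Hypothetical helper for illustration only; not biogl's implementation.
    """
    attributes: dict[str, str] = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = field.partition("=")
        if url_decode:
            # Decode only after splitting, so an encoded ';' (%3B)
            # inside a value is not mistaken for a field separator.
            key, value = unquote(key), unquote(value)
        attributes[key] = value
    return attributes


line = "chr1\tEnsembl\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=TEST%3Bv2"
columns = line.split("\t")
raw = parse_gff3_attributes(columns[8])
decoded = parse_gff3_attributes(columns[8], url_decode=True)

print(raw["Name"])      # TEST%3Bv2
print(decoded["Name"])  # TEST;v2
```

Splitting on ";" before decoding is the reason the GFF3 format asks for reserved characters to be percent-encoded in the first place: the encoded form survives the split, and the literal character is restored afterwards.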

Key Modules

  • GxfParse: Parse GFF3/GTF annotation files with full type hints
  • fasta_parse: Efficient FASTA file parsing
  • coord_overlap: Calculate coordinate overlaps
  • rev_comp: Reverse complement DNA sequences
  • translate: Translate DNA to protein
  • flex_open: Open compressed or uncompressed files transparently
  • window: Sliding window generator
  • And more...
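
As an illustration of the sliding-window idea listed above, the following is a small standard-library sketch. The function sliding_window and its signature are assumptions made for this example; the actual window function in biogl may take different arguments or return a different type, so check help(biogl.window) before relying on it.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice


def sliding_window(items: Iterable, size: int) -> Iterator[tuple]:
    """Yield overlapping tuples of `size` consecutive items.

    Hypothetical helper for illustration only; not biogl's implementation.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # Fill the first window; if the input is shorter than `size`, yield nothing.
    current = deque(islice(iterator, size), maxlen=size)
    if len(current) == size:
        yield tuple(current)
    for item in iterator:
        current.append(item)  # maxlen drops the oldest item automatically
        yield tuple(current)


kmers = ["".join(w) for w in sliding_window("ATCGA", 3)]
print(kmers)  # ['ATC', 'TCG', 'CGA']
```

Because it is a generator over any iterable, this approach works on sequences too large to slice in memory, at the cost of yielding tuples rather than substrings.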

Breaking Changes in v3.0.0

Important: Version 3.0.0 introduces breaking changes to GxfParse:

  • Features without an explicit ID now have a name of None instead of a placeholder string such as 'exon_None'
  • Migration: Change if name == 'exon_None' to if name is None

See CHANGELOG.md for full details.

Documentation

For detailed documentation on each function or class, use Python's built-in help:

from biogl import GxfParse
help(GxfParse)

Requirements

  • Python 3.10+
  • smart_open

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Author

Graham Larue (egrahamlarue@gmail.com)
