An object-oriented Python library for working with genomic data.
GenomeUtils
A Python library for working with annotated genomes. We developed GenomeUtils as an alternative to, and replacement for, pyensembl, which is no longer maintained.
It provides an object-oriented model for representing genomic features: genomes, chromosomes, genes, transcripts, and exons.
Features
- Object model: Genome > Chromosome > Gene > Transcript > Exon (+ Locus)
- Builder workflow: GenomeBuilder assembles a Genome from FASTA (DNA, cDNA) and GTF files
- Indexed lookups, optional scaffold separation, streaming and gzip handling
- Downloader utilities: fetch Ensembl DNA, cDNA, and GTF assets with EnsemblGenomeDownloader
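The containment hierarchy and the indexed lookups listed above can be pictured with a few plain dataclasses. This is a hypothetical, simplified sketch for illustration only, not GenomeUtils' actual classes (which also carry coordinates, strand, sequences, back-references, and a Locus type); the `_gene_index` field and its dict-based design are assumptions about how such an index might work.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, stripped-down stand-ins for Genome > Chromosome > Gene > Transcript > Exon.
@dataclass
class Exon:
    id: str
    start: int
    end: int


@dataclass
class Transcript:
    id: str
    exons: list[Exon] = field(default_factory=list)


@dataclass
class Gene:
    id: str
    name: str
    transcripts: list[Transcript] = field(default_factory=list)


@dataclass
class Chromosome:
    id: str
    genes: list[Gene] = field(default_factory=list)


@dataclass
class Genome:
    id: str
    chromosomes: list[Chromosome] = field(default_factory=list)
    _gene_index: dict[str, Gene] = field(default_factory=dict, repr=False)

    def index(self) -> None:
        """Build a flat id -> Gene map so lookups avoid walking the whole tree."""
        self._gene_index = {g.id: g for c in self.chromosomes for g in c.genes}

    def gene_by_id(self, gene_id: str) -> Gene | None:
        """Return the gene with this id, or None if it is not indexed."""
        return self._gene_index.get(gene_id)


# Usage: build a one-gene toy tree, index it, then look the gene up directly.
exon = Exon("EXON001", 5, 15)
gene = Gene("GENE001", "MYGENE", [Transcript("TRANSCRIPT001", [exon])])
genome = Genome("toy", [Chromosome("chr1", [gene])])
genome.index()
print(genome.gene_by_id("GENE001").name)  # MYGENE
```

The index trades a little memory for constant-time lookups, which is the usual reason to index once after building rather than searching every chromosome on each query.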
Installation
You can install GenomeUtils via pip with the following command:
pip install GenomeUtils
Requires Python >= 3.10. pip installs the following dependencies automatically: biopython, gffutils, requests, and gget.
Quickstart
1) Download and build a genome (complete workflow)
from pathlib import Path
from GenomeUtils.Downloaders import EnsemblGenomeDownloader
from GenomeUtils.Genome import GenomeBuilder
# Download Ensembl assets
downloader = EnsemblGenomeDownloader(
assembly_id="GRCh38",
ensembl_release=109,
species="homo_sapiens",
genomes_root_dir=Path("./data/genomes"),
)
files = downloader.download()
print(files) # { 'dna': Path(...), 'cdna': Path(...), 'annotation': Path(...) }
# Build genome from downloaded files
# The builder automatically uses species-appropriate chromosomes:
# Human: 1-22,X,Y,M,MT | Mouse: 1-19,X,Y,M,MT
genome, scaffold_genome = (
GenomeBuilder(id="GRCh38", species="Homo sapiens", name="Human")
.with_dna_fasta(files['dna'])
.with_cdna_fasta(files['cdna'])
.with_gtf_file(files['annotation'])
.build()
)
# For other species:
# mouse_genome = GenomeBuilder(id="GRCm39", species="Mus musculus", name="Mouse")...
# Access features
chromosome = genome.chromosome_by_id("1")
first_gene = chromosome.genes[0]
print(first_gene.id, first_gene.name)
# Fast lookups (after build() the genome is indexed)
print(genome.gene_by_id(first_gene.id))
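For reference, Ensembl publishes these three assets under a predictable FTP layout, so the URLs can be derived from the same parameters passed to EnsemblGenomeDownloader. The sketch below is an assumption about which files the downloader fetches, in particular the `primary_assembly` DNA file, which not every species provides (`toplevel` is the usual fallback). The helper name `ensembl_asset_urls` is hypothetical.

```python
from dataclasses import dataclass

ENSEMBL_FTP = "https://ftp.ensembl.org/pub"


@dataclass(frozen=True)
class EnsemblAssetUrls:
    dna: str
    cdna: str
    annotation: str


def ensembl_asset_urls(
    species: str, assembly_id: str, release: int, dna_kind: str = "primary_assembly"
) -> EnsemblAssetUrls:
    """Build Ensembl FTP URLs for DNA, cDNA, and GTF files (species like 'homo_sapiens')."""
    base = f"{ENSEMBL_FTP}/release-{release}"
    # Ensembl file names capitalize the genus: homo_sapiens -> Homo_sapiens.
    prefix = f"{species.capitalize()}.{assembly_id}"
    return EnsemblAssetUrls(
        dna=f"{base}/fasta/{species}/dna/{prefix}.dna.{dna_kind}.fa.gz",
        cdna=f"{base}/fasta/{species}/cdna/{prefix}.cdna.all.fa.gz",
        annotation=f"{base}/gtf/{species}/{prefix}.{release}.gtf.gz",
    )


urls = ensembl_asset_urls("homo_sapiens", "GRCh38", 109)
print(urls.annotation)
# https://ftp.ensembl.org/pub/release-109/gtf/homo_sapiens/Homo_sapiens.GRCh38.109.gtf.gz
```

The field names mirror the `dna`, `cdna`, and `annotation` keys printed by `downloader.download()`, which makes it easy to compare the two if a download fails.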
2) Build a genome from existing files
from pathlib import Path
from GenomeUtils.Genome import GenomeBuilder
# Prepare input files (can be .gz):
dna_fasta = Path("/path/to/genome.dna.fa.gz")
cdna_fasta = Path("/path/to/genome.cdna.fa.gz")
gtf_file = Path("/path/to/annotations.gtf.gz")
builder = GenomeBuilder(
id="hg38",
species="Homo sapiens",
name="Human Reference Genome",
separate_scaffolds=False, # set True to split non-main scaffolds
)
# Optional: limit to specific chromosomes (must be called before with_dna_fasta)
builder.set_chromosome_filter(["chr1", "chr2", "chrX"]) # or ["1", "2", "X"], matching the IDs in your FASTA headers
genome, _ = (
builder
.with_dna_fasta(dna_fasta)
.with_cdna_fasta(cdna_fasta)
.with_gtf_file(gtf_file)
.build()
)
# Access features
chromosome = genome.chromosome_by_id("chr1")
first_gene = chromosome.genes[0]
print(first_gene.id, first_gene.name)
# Fast lookups (after build() the genome is indexed)
print(genome.gene_by_id(first_gene.id))
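Chromosome filter entries presumably have to match the sequence IDs in the DNA FASTA headers, which differ between UCSC-style (`chr1`) and Ensembl-style (`1`) files. The stdlib-only helpers below are hypothetical, not part of GenomeUtils; they list the IDs in a plain or gzipped FASTA so you can choose the right naming before calling `set_chromosome_filter`.

```python
import gzip
from pathlib import Path


def fasta_ids(path: Path) -> list[str]:
    """Return sequence IDs (first word after '>') from a plain or gzipped FASTA file."""
    # Detect gzip by its two-byte magic number instead of trusting the file extension.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    ids = []
    # Stream line by line so large genomes are never loaded into memory at once.
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids


def uses_ucsc_names(ids: list[str]) -> bool:
    """True if any ID looks UCSC-style ('chr1') rather than Ensembl-style ('1')."""
    return any(seq_id.startswith("chr") for seq_id in ids)
```

For example, `ids = fasta_ids(dna_fasta)` followed by `["chr1", "chr2", "chrX"] if uses_ucsc_names(ids) else ["1", "2", "X"]` picks filter names in the file's own convention. Scanning headers reads the whole file once, so for a full genome it is worth caching the result.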
3) Minimal toy example (no files)
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from GenomeUtils.Genome import Genome, Chromosome, Gene, Transcript, Exon
# Create a tiny in-memory genome
genome = Genome(id="toy", species="Test species", name="Toy Genome")
chr1_seq = SeqRecord(Seq("AGCATGATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC"), id="chr1")
chromosome = Chromosome("chr1", seq_index={"chr1": chr1_seq}, genome=genome, length=len(chr1_seq.seq))
genome.add_chromosome(chromosome)
gene = Gene(id="GENE001", chr=chromosome.id, name="MYGENE", start=5, end=35, strand='+', genome=genome, chromosome=chromosome)
chromosome.add_gene(gene)
transcript = Transcript(
id="TRANSCRIPT001",
chr=chromosome.id,
start=5,
end=35,
strand='+',
sequence=Seq("CATGATGCATGCATGCATGCATGCATGC"),
gene=gene,
genome=genome,
)
gene.add_transcript(transcript)
Exon(id="EXON001", chr=chromosome.id, start=5, end=15, strand='+', gene=gene, genome=genome).add_to_transcript(transcript)
Exon(id="EXON002", chr=chromosome.id, start=25, end=35, strand='+', gene=gene, genome=genome).add_to_transcript(transcript)
genome.index()
print(genome.gene_by_id("GENE001").name)
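In the toy example the transcript sequence and the exon coordinates are supplied separately. To derive a spliced transcript sequence from exon coordinates yourself, a sketch like the following works, assuming 1-based, inclusive coordinates as used in GTF files. Whether GenomeUtils stores coordinates this way is an assumption, and `spliced_sequence` is a hypothetical helper, not a library function.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def spliced_sequence(chrom_seq: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Concatenate exon sequences given 1-based, inclusive (GTF-style) coordinates."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Slice each exon from the forward strand, in genomic order; start-1 converts to 0-based.
    seq = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        # Minus-strand transcripts read the reverse complement of the joined exons.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


chr1 = "AGCATGATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC"
print(spliced_sequence(chr1, [(5, 15), (25, 35)], "+"))
# TGATGCATGCAGCATGCATGCA
```

Sorting by start before joining keeps the result correct even if exons are listed out of order; for minus-strand genes the reverse complement is applied only after joining, so the first exon in transcript order comes out first.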
4) Species-specific examples
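from pathlib import Path
from GenomeUtils.Genome import GenomeBuilder
# Placeholder paths: point these at your own files (e.g. those returned by EnsemblGenomeDownloader.download())
human_dna, human_gtf = Path("/path/to/human.dna.fa.gz"), Path("/path/to/human.gtf.gz")
mouse_dna, mouse_gtf = Path("/path/to/mouse.dna.fa.gz"), Path("/path/to/mouse.gtf.gz")
custom_dna, custom_gtf = Path("/path/to/custom.dna.fa.gz"), Path("/path/to/custom.gtf.gz")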
# Human genome (uses chromosomes 1-22, X, Y, M, MT)
human_genome, _ = GenomeBuilder(
id="GRCh38",
species="Homo sapiens",
name="Human Reference Genome"
).with_dna_fasta(human_dna).with_gtf_file(human_gtf).build()
# Mouse genome (uses chromosomes 1-19, X, Y, M, MT)
mouse_genome, _ = GenomeBuilder(
id="GRCm39",
species="Mus musculus",
name="Mouse Reference Genome"
).with_dna_fasta(mouse_dna).with_gtf_file(mouse_gtf).build()
# Override default chromosomes if needed
custom_genome, _ = GenomeBuilder(
id="custom",
species="Custom species",
name="Custom Genome",
main_chromosomes=["chr1", "chr2", "chrX"] # Only these chromosomes
).with_dna_fasta(custom_dna).with_gtf_file(custom_gtf).build()
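The per-species defaults in the comments above follow a simple rule: numbered autosomes plus X, Y, M, and MT, with every other sequence treated as a scaffold. The sketch below restates that rule with hypothetical helper names (`default_main_chromosomes`, `split_main_and_scaffolds`); how GenomeUtils matches names internally, for example `chr1` versus `1`, is not documented here, so treat it as an illustration only.

```python
from __future__ import annotations

# Autosome counts taken from the defaults listed above (human 1-22, mouse 1-19).
AUTOSOMES = {"homo sapiens": 22, "mus musculus": 19}


def default_main_chromosomes(species: str) -> list[str]:
    """Return numbered autosomes plus X, Y, M, MT for a supported species."""
    n = AUTOSOMES.get(species.lower())
    if n is None:
        raise ValueError(f"no default chromosome set for {species!r}; pass an explicit list")
    return [str(i) for i in range(1, n + 1)] + ["X", "Y", "M", "MT"]


def split_main_and_scaffolds(
    seq_ids: list[str], main_chromosomes: list[str]
) -> tuple[list[str], list[str]]:
    """Partition sequence IDs into main chromosomes and scaffolds, preserving input order."""
    main = set(main_chromosomes)
    primary = [s for s in seq_ids if s in main]
    scaffolds = [s for s in seq_ids if s not in main]
    return primary, scaffolds


# "KI270728.1" stands in for an unplaced scaffold ID.
human_main = default_main_chromosomes("Homo sapiens")
primary, scaffolds = split_main_and_scaffolds(["1", "X", "MT", "KI270728.1"], human_main)
print(primary, scaffolds)  # ['1', 'X', 'MT'] ['KI270728.1']
```

Passing an explicit list such as `["chr1", "chr2", "chrX"]` in place of `human_main` mirrors the `main_chromosomes` override shown above.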
Technical Documentation
Technical documentation is linked from the project page. APIs may still evolve.
Contributing
Issues and PRs are welcome.
Copyright 2025, Alexander Schliep
Download files
File details
Details for the file genomeutils-0.1.1.tar.gz.
File metadata
- Download URL: genomeutils-0.1.1.tar.gz
- Upload date:
- Size: 28.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cbce72f6c58955d54eefdc1aaafe93c12b34cd9e81e5d60c9b9730302da1d05 |
| MD5 | 04b1903eedc63826625d203e94fb1e6d |
| BLAKE2b-256 | 2788282d6b0375c926a0f5ee96f380972d9987cb0d0f2370f2de2a53d48ef495 |
File details
Details for the file genomeutils-0.1.1-py3-none-any.whl.
File metadata
- Download URL: genomeutils-0.1.1-py3-none-any.whl
- Upload date:
- Size: 36.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e923bc5997179fcdc813096e0f3c1cdbdf29ec79a6f0cb1e67deb8a070beafed |
| MD5 | 35d7c6e929543e33868c5e6f64dfa5c0 |
| BLAKE2b-256 | b8fc99e223c6ce6f0ea0ee93c8f7faf6dc0c0143772b245b19cf91bb3ef894b4 |