Pre- and postprocessing tools for genome annotation.

bricks2marble

  • Python structures for nucleotide sequences and genome annotations.
  • TensorFlow implementation of an HMM used for finding genes.
  • Pre- and postprocessing tools for deep learning genome annotation models.
  • Python interfaces for common bioinformatics tools and file format converters.

Installation

Install bricks2marble via pip.

$ python -m pip install bricks2marble

For development purposes, clone the repository and install it locally inside your virtual environment.

$ git clone https://github.com/gaius-augustus/bricks2marble
$ cd bricks2marble
$ python -m pip install -e .

To use the TensorFlow part of bricks2marble, request it as an optional dependency when installing. This also installs hidten.

$ python -m pip install "bricks2marble[tf]"
# or, from a local clone (quotes keep shells such as zsh from expanding the brackets)
$ python -m pip install -e ".[tf]"

For plotting support, install the plot extra with python -m pip install "bricks2marble[plot]".

Overview

Below are some use cases of bricks2marble. All methods have docstrings that explain their behaviour and several optional arguments in detail.

Reading and writing

Loading large fasta files is implemented efficiently using bytearray translation tables (~11 seconds for the human genome). Additionally, mmap is used for indexing large fasta files (~4 seconds for the human genome).

import bricks2marble as b2m

fasta = b2m.io.load_fasta("genome.fa.gz") # load everything into memory
sequence = fasta["chr1"].positions(0, 100)

fasta = b2m.io.indexed_fasta("genome.fa") # build a sequence index
sequence = fasta.fetch("chr1", (0, 100)) # load only required parts
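
The two techniques mentioned above can be illustrated with the standard library alone. The following is a minimal sketch, not the bricks2marble implementation: the names TABLE, normalize, build_index and fetch are hypothetical, the file is assumed to start with a header line, and a production index would also store line widths so that only the requested window is read.

```python
import mmap
import os
import tempfile

# Translation table (assumption for this sketch): keep A, C, G, T, N,
# fold lowercase (soft-masked) bases to uppercase, map anything else to N.
TABLE = bytearray(b"N" * 256)
for base in b"ACGTN":
    TABLE[base] = base
    TABLE[base + 32] = base  # ASCII lowercase is uppercase + 32


def normalize(raw: bytes) -> bytes:
    """Drop line breaks and normalize bases in a single C-level pass."""
    return raw.translate(bytes(TABLE), b"\r\n")


def build_index(path):
    """Map each record name to the (start, end) byte offsets of its sequence."""
    index = {}
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            header = mm.find(b">")
            while header != -1:
                line_end = mm.find(b"\n", header)
                name = mm[header + 1:line_end].split()[0].decode()
                next_header = mm.find(b"\n>", line_end)
                seq_end = len(mm) if next_header == -1 else next_header
                index[name] = (line_end + 1, seq_end)
                header = -1 if next_header == -1 else next_header + 1
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) of one record, reading only that record."""
    seq_start, seq_end = index[name]
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return normalize(mm[seq_start:seq_end])[start:end]


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.fa")
    with open(toy_path, "wb") as handle:
        handle.write(b">chr1 toy\nacgtAC\nGTnn\n>chr2\nTTTT\n")
    index = build_index(toy_path)
    window = fetch(toy_path, index, "chr1", 2, 8)

print(window)  # b'GTACGT'
```

Building the index only scans for header lines, which is why it can be cheaper than loading and translating every sequence.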

For some specific cases, external tools are used. For example, indexing .fa.gz files requires pyfaidx.

$ python -m pip install pyfaidx

fasta = b2m.io.indexed_fasta("genome.fa.gz") # build a sequence index for compressed files
sequence = fasta.fetch("chr1", (0, 100)) # load only required parts

Additionally, .gp (.genepred) files can be loaded and are internally sorted for optimized access.

anno = b2m.io.load_annotation("reference.gp")
anno.classify(1062, "chr1") # labels per strand: ("intergenic", "CDS")
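
Why sorting helps with lookups like the one above can be shown with a binary search over feature start positions. This is a sketch under stated assumptions, not the library's internal data structure: the feature list is hypothetical toy data for a single strand, with sorted, non-overlapping, half-open intervals.

```python
from bisect import bisect_right

# Hypothetical features on one strand: (start, end, label), sorted by start.
features = [(100, 200, "CDS"), (200, 350, "intron"), (350, 500, "CDS")]
starts = [start for start, _, _ in features]


def classify(position):
    """Return the label covering `position`, or "intergenic" if none does."""
    # Index of the last feature whose start is <= position.
    i = bisect_right(starts, position) - 1
    if i >= 0:
        start, end, label = features[i]
        if start <= position < end:
            return label
    return "intergenic"


print(classify(150))  # CDS
print(classify(200))  # intron
print(classify(50))   # intergenic
```

Each query costs O(log n) comparisons instead of a scan over all features.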

Tools

The subpackage bricks2marble.tools contains a number of interfaces to common external tools related to genome file formats. Download the external tools yourself and tell bricks2marble where they can be found locally. Optionally, you can add them to your system path, so bricks2marble can find them automatically.
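
The lookup order described above (explicit path first, system path as fallback) can be sketched with the standard library. The helper resolve_tool is hypothetical and only illustrates the idea; it is not part of bricks2marble.

```python
import shutil
from pathlib import Path


def resolve_tool(name, configured=None):
    """Return a path to the executable `name`, or None if it cannot be found."""
    if configured is not None:
        # An explicitly configured location takes precedence.
        path = Path(configured).expanduser()
        return str(path) if path.is_file() else None
    # Otherwise search the directories listed in PATH.
    return shutil.which(name)


# Prints a path if gffcompare is on PATH, otherwise None.
print(resolve_tool("gffcompare"))
```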

Example: Comparing genome annotations

Download gffcompare and use the bricks2marble interface for extracting metrics.

import bricks2marble as b2m

b2m.tools.configure(gffcompare="path/to/gffcompare")
comparison = b2m.tools.compare(
    ["my_annotation.gp", "other_annotation.gtf"],
    "reference.gff",
    e=3,
)
print(comparison[0].locus.sensitivity)
fig = b2m.tools.plot.comparison(
    comparison,
    labels=["My", "Other"],
    table=True,
)
fig.show()
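
For readers unfamiliar with the metrics, sensitivity and precision follow the usual definitions over true positives (TP), false negatives (FN) and false positives (FP). The counts below are made-up toy numbers, and whether bricks2marble reports fractions or percentages is not assumed here.

```python
def sensitivity(tp, fn):
    """Share of reference features that were recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Share of predicted features that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Toy example: 10 reference loci, 16 predicted loci, 8 of them matching.
print(sensitivity(tp=8, fn=2))  # 0.8
print(precision(tp=8, fp=8))    # 0.5
```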

Example: Converting files

Convert various file formats for genome annotations. The internal bricks2marble representation of annotation files is closely related to the genepred format. Conversions to gtf and gff3 are implemented directly. Conversions from these formats to genepred are handled by the corresponding external tools from UCSC, like gtfToGenePred.

import bricks2marble as b2m

b2m.tools.configure(gtfToGenePred="path/to/gtfToGenePred")

with b2m.tools.Converter("my_annotation.gtf", "gp") as tmp_file_path:
    # gp file created using Python's tempfile
    annotation = b2m.io.load_annotation(tmp_file_path)
# gp file deleted, annotation loaded into memory

b2m.tools.convert(annotation, "my_annotation.gff", source="MyTool")
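
The temporary-file pattern used by the converter above can be reproduced with contextlib and tempfile. This is only a sketch of the pattern: temporary_output is a hypothetical helper, and the write inside the block stands in for a call to an external conversion tool.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_output(suffix):
    """Yield the path of a fresh temporary file and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the tool opens the file itself
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


with temporary_output(".gp") as tmp_path:
    with open(tmp_path, "w") as handle:
        handle.write("placeholder for converted output\n")
    existed_inside = os.path.exists(tmp_path)
existed_after = os.path.exists(tmp_path)

print(existed_inside, existed_after)  # True False
```

Because cleanup happens in the finally clause, the file is removed even if loading the converted annotation raises an exception.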

License

This project is licensed under the MIT license.

Projects using bricks2marble
