Pre- and postprocessing tools for genome annotation.
bricks2marble
- Python structures for nucleotide sequences and genome annotations.
- TensorFlow implementation of an HMM for gene finding.
- Pre- and postprocessing tools for deep learning genome annotation models.
- Python interfaces for common bioinformatics tools and file format converters.
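The HMM itself is not described in this README. As a rough, hypothetical illustration of what HMM decoding does in gene finding, the sketch below runs the Viterbi algorithm on a toy two-state model (intergenic vs. coding) in plain Python rather than TensorFlow. The states, probabilities, and the "coding is GC-rich" assumption are invented for illustration and are not bricks2marble's model.

```python
import math

# Toy two-state HMM; all probabilities are invented for illustration only.
STATES = ("intergenic", "coding")
START = {"intergenic": 0.9, "coding": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.9, "coding": 0.1},
    "coding": {"intergenic": 0.1, "coding": 0.9},
}
# Assumption of this toy model: coding regions are GC-rich.
EMIT = {
    "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most likely state path for a non-empty sequence (log space)."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            # Choose the predecessor that maximises the path score into s.
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            prev_best[s] = best
            new_score[s] = (score[best] + math.log(TRANS[best][s])
                            + math.log(EMIT[s][base]))
        backpointers.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(backpointers):
        state = prev_best[state]
        path.append(state)
    return path[::-1]


print(viterbi("ATATGCGCGCGCATAT"))
```

Working in log space avoids numerical underflow on long sequences, which matters at genome scale.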
Installation
Install bricks2marble via pip:
$ python -m pip install bricks2marble
For development purposes, clone the repository and install it locally inside your virtual environment.
$ git clone https://github.com/gaius-augustus/bricks2marble
$ cd bricks2marble
$ python -m pip install -e .
If you need the TensorFlow part of bricks2marble, install the optional tf extra. This also installs hidten.
$ python -m pip install "bricks2marble[tf]"
# or
$ python -m pip install -e ".[tf]"
For plotting, install the optional plot extra:
$ python -m pip install "bricks2marble[plot]"
Overview
Below are some use cases of bricks2marble. All methods have docstrings that explain their behaviour and optional arguments in detail.
Reading and writing
Loading large FASTA files is implemented efficiently with bytearray translation tables (about 11 seconds for the human genome). Indexing large FASTA files uses mmap (about 4 seconds for the human genome).
import bricks2marble as b2m
fasta = b2m.io.load_fasta("genome.fa.gz") # load everything into memory
sequence = fasta["chr1"].positions(0, 100)
fasta = b2m.io.indexed_fasta("genome.fa") # build a sequence index
sequence = fasta.fetch("chr1", (0, 100)) # load only required parts
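The translation-table idea can be illustrated with the standard library alone. The sketch below is a hypothetical simplification, not bricks2marble's loader: a 256-entry table maps nucleotides to small integer codes, and the same bytes.translate call deletes line breaks, so each record is converted in a single C-level pass. The encoding (A=0, C=1, G=2, T=3, anything else=4) and the function names are assumptions.

```python
import gzip

# 256-entry lookup table; every byte maps to 4 ("unknown") by default.
_table = bytearray([4] * 256)
for code, base in enumerate(b"ACGT"):
    _table[base] = code          # uppercase A, C, G, T -> 0..3
    _table[base + 32] = code     # lowercase (soft-masked) a, c, g, t -> 0..3
TABLE = bytes(_table)


def parse_fasta_bytes(data: bytes) -> dict[str, bytes]:
    """Split raw FASTA bytes into {record name: encoded sequence}."""
    records = {}
    # Each chunk after the first split starts with a header line.
    for chunk in data.split(b">")[1:]:
        header, _, body = chunk.partition(b"\n")
        name = header.split()[0].decode()
        # Map every byte through TABLE and drop line breaks in one pass.
        records[name] = body.translate(TABLE, delete=b"\r\n")
    return records


def load_fasta(path: str) -> dict[str, bytes]:
    """Read a plain or gzip-compressed FASTA file fully into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return parse_fasta_bytes(handle.read())
```

With this sketch, load_fasta("genome.fa.gz")["chr1"][0:100] returns the first 100 encoded bases of chr1 as bytes.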
Some cases rely on external packages. For example, indexing .fa.gz files requires pyfaidx:
$ python -m pip install pyfaidx
fasta = b2m.io.indexed_fasta("genome.fa.gz") # build a sequence index for compressed files
sequence = fasta.fetch("chr1", (0, 100)) # load only required parts
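For uncompressed files, mmap-based indexing can be sketched roughly as follows. This is an assumption about the general approach, not bricks2marble's code: the class scans the memory-mapped file once to record where each sequence starts and how wide its lines are, then fetch reads only the bytes for a 0-based, half-open range. Like samtools faidx, it assumes all sequence lines in a record have the same width (except the last) and "\n" line endings; the mapping is never closed here.

```python
import mmap


class MmapFasta:
    """Minimal mmap-backed FASTA index (hypothetical sketch)."""

    def __init__(self, path: str):
        with open(path, "rb") as handle:
            # The mapping stays valid after the file object is closed.
            self._mm = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        # name -> (first sequence byte, record end, bases per line, bytes per line)
        self.index = {}
        pos = self._mm.find(b">")
        while pos != -1:
            header_end = self._mm.find(b"\n", pos)
            name = self._mm[pos + 1:header_end].split()[0].decode()
            seq_start = header_end + 1
            next_record = self._mm.find(b">", seq_start)
            record_end = len(self._mm) if next_record == -1 else next_record
            line_end = self._mm.find(b"\n", seq_start, record_end)
            if line_end == -1:  # single sequence line without trailing newline
                line_end = record_end
            per_line = line_end - seq_start
            self.index[name] = (seq_start, record_end, per_line, per_line + 1)
            pos = next_record

    def fetch(self, name: str, start: int, end: int) -> bytes:
        """Return bases [start, end) of record `name`."""
        seq_start, record_end, per_line, line_bytes = self.index[name]

        def byte_offset(i: int) -> int:
            # Skip one newline byte for every complete line before base i.
            return seq_start + (i // per_line) * line_bytes + i % per_line

        lo = min(byte_offset(start), record_end)
        hi = min(byte_offset(end), record_end)
        return self._mm[lo:hi].replace(b"\n", b"")
```

Only the pages touched by a fetch are read from disk, which is why building such an index is much cheaper than loading the whole genome.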
Annotations in .gp (genePred) format can also be loaded; they are sorted internally for fast access.
anno = b2m.io.load_annotation("reference.gp")
anno.classify(1062, "chr1") # labels per strand: ("intergenic", "CDS")
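Sorting pays off because a position lookup becomes a binary search instead of a scan over all transcripts. The sketch below is a hypothetical illustration, not bricks2marble's implementation: it reads the first ten genePred columns, sorts transcripts per chromosome by start, and classifies a 0-based position per strand with bisect. The class name, the dict return value, the label set, and the precedence CDS > UTR > intron > intergenic are all assumptions.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass

# Higher value wins when several transcripts overlap a position (assumption).
PRIORITY = {"intergenic": 0, "intron": 1, "UTR": 2, "CDS": 3}


@dataclass
class Transcript:
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: list[tuple[int, int]]  # 0-based, half-open

    def label(self, pos: int) -> str:
        if not self.tx_start <= pos < self.tx_end:
            return "intergenic"
        for start, end in self.exons:
            if start <= pos < end:
                return "CDS" if self.cds_start <= pos < self.cds_end else "UTR"
        return "intron"


class SortedAnnotation:
    def __init__(self, lines):
        by_chrom = defaultdict(list)
        for line in lines:
            # genePred: name chrom strand txStart txEnd cdsStart cdsEnd
            #           exonCount exonStarts exonEnds [...]
            f = line.rstrip("\n").split("\t")
            starts = [int(x) for x in f[8].split(",") if x]
            ends = [int(x) for x in f[9].split(",") if x]
            by_chrom[f[1]].append(Transcript(
                f[2], int(f[3]), int(f[4]), int(f[5]), int(f[6]),
                list(zip(starts, ends))))
        self.transcripts, self.starts, self.max_len = {}, {}, {}
        for chrom, txs in by_chrom.items():
            txs.sort(key=lambda t: t.tx_start)
            self.transcripts[chrom] = txs
            self.starts[chrom] = [t.tx_start for t in txs]
            self.max_len[chrom] = max(t.tx_end - t.tx_start for t in txs)

    def classify(self, pos: int, chrom: str) -> dict[str, str]:
        labels = {"+": "intergenic", "-": "intergenic"}
        if chrom not in self.transcripts:
            return labels
        starts = self.starts[chrom]
        # Only transcripts starting in (pos - max_len, pos] can contain pos.
        lo = bisect.bisect_right(starts, pos - self.max_len[chrom])
        hi = bisect.bisect_right(starts, pos)
        for t in self.transcripts[chrom][lo:hi]:
            label = t.label(pos)
            if PRIORITY[label] > PRIORITY[labels[t.strand]]:
                labels[t.strand] = label
        return labels
```

Usage: with open("reference.gp") as fh: anno = SortedAnnotation(fh), then anno.classify(1062, "chr1").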
Tools
The subpackage bricks2marble.tools provides interfaces to common external tools for genome file formats. Download these tools yourself and tell bricks2marble where to find them locally. Alternatively, add them to your system PATH so that bricks2marble can find them automatically.
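A "configured path first, then PATH" lookup like the one described above can be sketched with shutil.which. The configure and locate helpers here are hypothetical and only mirror the idea; bricks2marble's own b2m.tools.configure may behave differently.

```python
import os
import shutil

# Explicitly configured tool locations (hypothetical registry).
_TOOL_PATHS: dict[str, str] = {}


def configure(**paths: str) -> None:
    """Remember explicit locations, e.g. configure(gffcompare="/opt/bin/gffcompare")."""
    _TOOL_PATHS.update(paths)


def locate(tool: str) -> str:
    """Return an executable path for `tool`: configured path first, then PATH."""
    configured = _TOOL_PATHS.get(tool)
    if configured is not None:
        if os.path.isfile(configured) and os.access(configured, os.X_OK):
            return configured
        raise FileNotFoundError(f"{tool} configured at {configured!r} is not executable")
    found = shutil.which(tool)
    if found is None:
        raise FileNotFoundError(f"{tool} is not configured and not found on PATH")
    return found
```

Failing loudly on a bad configured path, instead of silently falling back to PATH, makes it obvious when a typo points at the wrong binary.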
Example: Comparing genome annotations
Download gffcompare and use the bricks2marble interface to extract comparison metrics.
import bricks2marble as b2m
b2m.tools.configure(gffcompare="path/to/gffcompare")
comparison = b2m.tools.compare(
    ["my_annotation.gp", "other_annotation.gtf"],
    "reference.gff",
    e=3,
)
print(comparison[0].locus.sensitivity)
fig = b2m.tools.plot.comparison(
    comparison,
    labels=["My", "Other"],
    table=True,
)
fig.show()
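gffcompare reports sensitivity and precision (as percentages) at several levels, from bases and exons up to transcripts and loci. As a simplified, hypothetical illustration of what these numbers mean, the sketch below computes exon-level values by exact matching of (chrom, start, end, strand) tuples. gffcompare's real matching rules are more lenient in places, for example its -e tolerance on the outer ends of terminal exons, so its numbers will generally differ.

```python
def sensitivity_precision(predicted, reference):
    """Exact-match sensitivity TP/(TP+FN) and precision TP/(TP+FP)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up exons: one exact match, one with a shifted end, one missed.
reference = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")}
predicted = {("chr1", 100, 200, "+"), ("chr1", 300, 410, "+")}
sn, pr = sensitivity_precision(predicted, reference)
print(f"exon Sn={sn:.2f} Pr={pr:.2f}")  # exon Sn=0.33 Pr=0.50
```

Sensitivity penalises missed reference features, while precision penalises predicted features that have no match in the reference.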
Example: Converting files
Convert between genome annotation file formats. The internal bricks2marble representation of annotations closely follows the genePred format. Conversions to GTF and GFF3 are implemented directly; conversions from these formats to genePred are delegated to the corresponding external UCSC tools, such as gtfToGenePred.
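Because the internal representation is close to genePred, writing GTF is mostly coordinate bookkeeping: genePred uses 0-based, half-open intervals while GTF uses 1-based, closed ones, and every CDS feature needs a frame computed in transcription order. The sketch below is a hypothetical, simplified converter, not bricks2marble's writer: the function name and arguments are assumptions, gene_id simply reuses the transcript name, and it does not emit separate start_codon or stop_codon features as GTF2.2 expects.

```python
def genepred_to_gtf(name, chrom, strand, cds_start, cds_end, exons, source="sketch"):
    """Return GTF lines for one genePred-style transcript.

    exons: (start, end) pairs, 0-based half-open, sorted by start.
    """
    attrs = f'gene_id "{name}"; transcript_id "{name}";'  # gene_id = name (assumption)
    rows = []
    for start, end in exons:
        # 0-based half-open [start, end) -> 1-based closed [start + 1, end]
        rows.append((start, "exon", start + 1, end, "."))
    # Clip exons to the CDS; keep only non-empty pieces.
    cds = [(max(s, cds_start), min(e, cds_end)) for s, e in exons
           if max(s, cds_start) < min(e, cds_end)]
    # Frames depend on transcription order, which is reversed on the minus strand.
    ordered = cds if strand == "+" else list(reversed(cds))
    done = 0  # coding bases already emitted, in transcription order
    for s, e in ordered:
        frame = (3 - done % 3) % 3
        rows.append((s, "CDS", s + 1, e, str(frame)))
        done += e - s
    # Genomic order, with each exon before the CDS piece that starts at the same base.
    rows.sort(key=lambda row: (row[0], row[1] != "exon"))
    return [
        "\t".join([chrom, source, feature, str(a), str(b), ".", strand, frame, attrs])
        for _, feature, a, b, frame in rows
    ]
```

For a transcript with exons [(100, 200), (300, 400)] and a CDS from 150 to 350 on the plus strand, the first CDS piece gets frame 0 and the second frame 1, because 50 coding bases precede it.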
import bricks2marble as b2m
b2m.tools.configure(gtfToGenePred="path/to/gtfToGenePred")
with b2m.tools.Converter("my_annotation.gtf", "gp") as tmp_file_path:
    # gp file created using Python's tempfile
    annotation = b2m.io.load_annotation(tmp_file_path)
# gp file deleted, annotation loaded into memory
b2m.tools.convert(annotation, "my_annotation.gff", source="MyTool")
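The Converter pattern above, a temporary output file that exists only inside the with block, can be sketched with contextlib and tempfile. The temporary_conversion helper and its convert callback are hypothetical; gtf_to_gp shows how such a callback might call the UCSC gtfToGenePred binary (input GTF, output genePred) via subprocess.run, assuming the binary is on PATH.

```python
import contextlib
import os
import subprocess
import tempfile


@contextlib.contextmanager
def temporary_conversion(src, suffix, convert):
    """Yield a temp file produced by convert(src, tmp_path); delete it afterwards."""
    fd, tmp_path = tempfile.mkstemp(suffix=f".{suffix}")
    os.close(fd)  # the converter opens the path itself
    try:
        convert(src, tmp_path)
        yield tmp_path
    finally:
        # Runs on normal exit and when the with block raises.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def gtf_to_gp(src, dst):
    """Example converter: run gtfToGenePred (assumed to be on PATH)."""
    subprocess.run(["gtfToGenePred", src, dst], check=True)
```

Usage: with temporary_conversion("my_annotation.gtf", "gp", gtf_to_gp) as path: followed by loading path inside the block. Cleanup in finally guarantees the file is removed even if loading fails.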
License
This project is licensed under the MIT license.
Projects using bricks2marble
Download files
Source Distribution: bricks2marble-0.0.4.tar.gz
Built Distribution: bricks2marble-0.0.4-py3-none-any.whl
File details
Details for the file bricks2marble-0.0.4.tar.gz.
File metadata
- Download URL: bricks2marble-0.0.4.tar.gz
- Upload date:
- Size: 104.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7999e42635a2f26451a1b1b13912ab6aece358edb53f9f4b8ab6cc728d240ed7 |
| MD5 | eb9f8e8339f0ba65458983d42b45cc61 |
| BLAKE2b-256 | 54a6e40c40d890d65f13ae55e221b9b5de9e57f1cd0e1a0c6fd97867bb7a403a |
File details
Details for the file bricks2marble-0.0.4-py3-none-any.whl.
File metadata
- Download URL: bricks2marble-0.0.4-py3-none-any.whl
- Upload date:
- Size: 54.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae357229eef757a377ac842e5294040d38dec9cb3300ecee38afed162eac700d |
| MD5 | b744f8ba67d7e8a8f622d429a4824745 |
| BLAKE2b-256 | 83689fdbf76e2e413fb156608cee0601bbaf020d1a3febddf519d56e905c5ffa |