ProkBERT
Project description
The ProkBERT model family
The ProkBERT model family is a transformer-based, encoder-only architecture built on BERT. Using transfer learning and self-supervised pretraining, ProkBERT models capitalize on the abundance of available sequence data and adapt well across diverse scenarios. The models' learned representations align with established biological understanding, shedding light on phylogenetic relationships. With the novel Local Context-Aware (LCA) tokenization, the ProkBERT family overcomes the context-size limitations of traditional transformer models without sacrificing performance or the information-rich local context.

ProkBERT models excel in bioinformatics tasks such as promoter prediction and phage identification. For promoter prediction, the best-performing model achieved an MCC of 0.74 on E. coli and 0.62 in mixed-species contexts. In phage identification, ProkBERT models consistently outperformed tools such as VirSorter2 and DeepVirFinder, reaching an MCC of 0.85. Compact yet powerful, the ProkBERT models are efficient, generalizable, and fast.
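The core idea behind LCA tokenization is to split a sequence into overlapping k-mers, where the shift between consecutive k-mers controls how much local context each token retains. The following is a minimal illustrative sketch of that idea, not the library's actual implementation; the function name and parameters are assumptions for demonstration only:

```python
def lca_tokenize(sequence, k=6, shift=1):
    """Split a DNA sequence into overlapping k-mers.

    Illustrative sketch of the idea behind Local Context-Aware (LCA)
    tokenization: consecutive k-mers overlap by k - shift bases, so
    each token keeps its local context while fewer tokens are needed
    as the shift grows. This is NOT ProkBERT's actual tokenizer API.
    """
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]

# shift=1 yields maximally overlapping k-mers:
print(lca_tokenize("ATGCGTAC", k=6, shift=1))  # ['ATGCGT', 'TGCGTA', 'GCGTAC']
# a larger shift covers the same sequence with fewer tokens:
print(lca_tokenize("ATGCGTAC", k=6, shift=2))  # ['ATGCGT', 'GCGTAC']
```

Varying `k` and `shift` trades off vocabulary size, sequence length after tokenization, and the effective context window, which is how the LCA scheme extends usable context without discarding local information.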
Download files
Hashes for prokbert-0.0.37-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 95e10d5181690df8824e47c6a51b78d53e40545df30c03a3ccd705d0b29055e2
MD5 | 74b2b4fe121f1825806e1d27d2f9bcc2
BLAKE2b-256 | 89b657ea3f29e8f9cb5390c60fb0b81c3d6f4d0f7d4500b100e32fe498cad0bc