Open-source collection of biology datasets and pre-trained embeddings.
bio-datasets
Description
bio-datasets is a collaborative framework that lets users fetch publicly available, sequence-based protein datasets. Pre-trained contextual embeddings are also available for these datasets.
Installation
Install the package with pip:
pip install biodatasets
How it works
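from biodatasets import list_datasets, load_dataset
# Print the names of the datasets available in the collection
print(list_datasets())
# Load a dataset by name ('test' in this example)
my_dataset = load_dataset('test')
# Get the 'peptide' column as inputs X and the 'target' column as labels y, as NumPy arrays
X, y = my_dataset.to_npy_arrays(input_names=['peptide'], target_names=['target'])
# Fetch the pre-trained ProtBert embeddings (CLS type) for the 'peptide' variable
embeddings = my_dataset.get_embeddings(variable_name="peptide", model_name="protbert", embeddings_type="cls")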
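As a hedged illustration of one way the embeddings might be used downstream, the sketch below finds the most similar sequence to a given one by cosine similarity. It assumes (this is not confirmed by the project) that embeddings_type="cls" yields one fixed-length vector per sequence, in the same row order as X. The helpers cosine_similarity and most_similar are hypothetical and not part of biodatasets, and the toy vectors are placeholders rather than real ProtBert outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query_index: int, embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the row most similar to row `query_index`, excluding itself.

    Assumption (hypothetical): each row is one per-sequence (CLS) embedding.
    """
    best_index, best_score = -1, -math.inf
    for i, row in enumerate(embeddings):
        if i == query_index:
            continue  # skip comparing the query with itself
        score = cosine_similarity(embeddings[query_index], row)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    # Toy 3-dimensional placeholder vectors; real CLS embeddings are typically much wider.
    toy_embeddings = [
        [1.0, 0.0, 0.0],
        [0.9, 0.1, 0.0],
        [0.0, 1.0, 0.0],
    ]
    print(most_similar(0, toy_embeddings))  # -> 1
```

With real data, each row would come from the embeddings returned above; for large datasets a vectorized NumPy version would be faster than this pure-Python loop.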
Download files
Source Distribution
bio-datasets-0.0.2.tar.gz (4.8 kB)
Built Distribution
bio_datasets-0.0.2-py3-none-any.whl

Hashes for bio_datasets-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 348c3d8fdfb83f0949733f9c9374b323452dc7bb64e13bc4b543d38260676ed0
MD5 | b5ff13546cad7c71d67f8d39e27c7148
BLAKE2b-256 | e280d66fdd1a8d6289e755e516794bc3a21be2e4b3591a8c51e510ba15027988
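To check a downloaded wheel against the SHA256 digest listed above, a minimal standard-library sketch could look like the following. The helper names sha256_of_file and verify_sha256, and the command-line usage, are illustrative only and not part of bio-datasets.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digest listed above for bio_datasets-0.0.2-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "348c3d8fdfb83f0949733f9c9374b323452dc7bb64e13bc4b543d38260676ed0"


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest matches `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical usage: python verify.py path/to/bio_datasets-0.0.2-py3-none-any.whl
    wheel = Path(sys.argv[1])
    print("OK" if verify_sha256(wheel, EXPECTED_WHEEL_SHA256) else "MISMATCH")
```

pip can also perform this check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.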