A small package to manage biological data.
Bio Dataset Manager: easily encode biological sequences into tensors
Authors:
- Fabio Bove | fabio.bove.dr@gmail.com
- Eugenio Bertolini
What is it?
Bio Dataset Manager is a Python project for managing and processing biological sequence data, including DNA, proteins, and SMILES strings. It encodes these sequences into tensors, which can then be used as input for AI computations and model implementations.
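As a rough illustration of what encoding a sequence into a tensor can mean, the sketch below one-hot encodes a DNA string using only the standard library. The `one_hot_dna` helper and the `ACGT` alphabet ordering are assumptions made for this example; they are not taken from the package, whose actual encoding may differ.

```python
# Hypothetical helper, not part of bio-dataset-manager.
DNA_ALPHABET = "ACGT"  # assumed ordering, chosen for this example


def one_hot_dna(sequence: str) -> list[list[int]]:
    """Return a len(sequence) x 4 one-hot matrix as nested lists."""
    index = {base: i for i, base in enumerate(DNA_ALPHABET)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(DNA_ALPHABET)
        row[index[base]] = 1  # raises KeyError on symbols outside the alphabet
        matrix.append(row)
    return matrix


encoded = one_hot_dna("ACGT")
```

The nested list can be passed to `torch.tensor(encoded)` to obtain a tensor of shape `(4, 4)`.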
Project Structure
- bio_data_manager/: Contains core modules for bioinformatics sequence processing and management.
- bio_sequences/: Handles various operations related to biological sequences such as DNA and protein.
Installation
- Install it as a library
  - Using CPU: pip install bio-dataset-manager
  - Using CUDA: pip install bio-dataset-manager[cuda] -f https://download.pytorch.org/whl/torch_stable.html
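To confirm that the installation succeeded, the installed version can be read back from the package metadata. This is a minimal sketch using the standard library; the `installed_version` helper is a name introduced here for illustration, and it only assumes that the distribution name matches the one used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("bio-dataset-manager"))
```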
Usage
Examples of the code can be found in the examples folder.
- import the modules
import torch
from bio_dataset_manager.bio_dataloader import BioDataloader
from bio_dataset_manager.bio_dataset import BioDataset
from bio_sequences.dna_sequence import DnaSequence
- create the dataset and dataloader
dataset = BioDataset(
    dataset_folder="path/to/dataset",
    sequences_limit=10,
    randomize_choice=True,
    pad_same_len=False,
    window_size=1,
    sequence_info=DnaSequence(),
    sequences=None,
)
dataloader = BioDataloader(
    dataset=dataset,
    batch_size=5,
    shuffle=True,
    collate_fn=dataset.collate_fn,
    split_ratio=0.5,
    use_gpu=torch.cuda.is_available(),  # use the GPU only when CUDA is available
)
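The `split_ratio` argument suggests that the dataloader divides the dataset into a training part and a held-out part. How `BioDataloader` does this internally is not documented here; the sketch below shows one common way to perform such a split, using the hypothetical helper `split_indices` and assuming that the ratio is the fraction of samples assigned to training.

```python
import random


def split_indices(n_samples: int, split_ratio: float, seed: int = 0) -> tuple[list[int], list[int]]:
    """Shuffle the indices 0..n_samples-1 and split them into two disjoint lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG keeps the split reproducible
    cut = int(n_samples * split_ratio)
    return indices[:cut], indices[cut:]


# With 10 samples and a ratio of 0.5, each part receives 5 indices.
train_idx, val_idx = split_indices(10, 0.5)
```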
- training loop example
import time
from tqdm import tqdm

epochs = 5
for epoch in range(epochs):
    with tqdm(total=len(dataloader.training_dataloader), desc=f"Epoch {epoch + 1}/{epochs}", unit="batch") as pbar:
        for batch in dataloader.training_dataloader:
            y_real, lengths = dataloader.process_batch(batch)
            time.sleep(0.1)  # placeholder for the actual training step
            pbar.update(1)
            # placeholder loss values; replace them with the real losses
            pbar.set_postfix(
                loss_gen="0.0",
                loss_dis="0.0"
            )
            pbar.refresh()
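In the loop above, `process_batch` returns a batch together with `lengths`, which suggests the usual pattern of padding sequences of different lengths to a common size while remembering their original lengths. The following is a sketch of that pattern, not the package's `collate_fn`: the `pad_batch` helper and the padding value `0` are assumptions made for illustration.

```python
def pad_batch(sequences: list[list[int]], pad_value: int = 0) -> tuple[list[list[int]], list[int]]:
    """Right-pad every sequence to the longest one and return the original lengths."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths, default=0)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Three encoded sequences of lengths 3, 1 and 2 become a rectangular 3 x 3 batch.
padded, lengths = pad_batch([[1, 2, 3], [4], [5, 6]])
```

Keeping the original lengths alongside the padded batch makes it possible to mask out the padded positions later, for example when computing a loss.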
Contributing
Feel free to submit issues or pull requests if you'd like to contribute to this project.
License
File details
Details for the file bio_dataset_manager-0.1.4.tar.gz.
File metadata
- Download URL: bio_dataset_manager-0.1.4.tar.gz
- Upload date:
- Size: 47.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ba4a1dcc8627c7dfd15b87dcb31f1f4254564ca70e748959c6f9ebb3510e50e |
| MD5 | 93b7ae95d7a9b87ac7247937ef7f8cc0 |
| BLAKE2b-256 | 5d98489800ee38cd48c401eb3d0a8d7c25669b7d5308ff256fd7ba585c20fa04 |
File details
Details for the file bio_dataset_manager-0.1.4-py3-none-any.whl.
File metadata
- Download URL: bio_dataset_manager-0.1.4-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 152e85a87926038b62baca0b7fcd8b9d902b0b39b13b9c5b11d84c7d79027cbd |
| MD5 | 184bf76aa9f1ae07c9041d9dc04313e1 |
| BLAKE2b-256 | 90c6f35905b28b243a9bb607e569b4d51b0fb9f0aa6c34e7af2302c608f6738c |