A small package to manage biological data.

Project description

Bio Dataset Manager: easily encode biological sequences into tensors


Powered by TaccLab

Authors:


- Fabio Bove | fabio.bove.dr@gmail.com
- Eugenio Bertolini

What is it?

Bio Dataset Manager is a Python project for managing and processing biological sequence data, including DNA, proteins, and SMILES strings. It encodes these sequences into tensors that can then be fed to machine-learning models.
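
To make "encoding sequences into tensors" concrete, here is a minimal sketch of one common scheme, one-hot encoding, applied to a DNA string. It is an illustration only and not necessarily the encoding Bio Dataset Manager uses: the alphabet order and the `one_hot_dna` helper are assumptions made for this example, and plain Python lists stand in for a tensor so the snippet runs without PyTorch.

```python
DNA_ALPHABET = "ACGT"  # assumed symbol order, chosen only for this example


def one_hot_dna(sequence: str) -> list[list[int]]:
    """Return one row per base, with a single 1 in the column of that base."""
    index = {base: i for i, base in enumerate(DNA_ALPHABET)}
    rows = []
    for base in sequence.upper():
        if base not in index:
            raise ValueError(f"unsupported base: {base!r}")
        row = [0] * len(DNA_ALPHABET)
        row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot_dna("GATTA")
print(encoded)
```

Passing the nested list to `torch.tensor(...)` would give a tensor with one row per base and one column per symbol.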


Project Structure

  • bio_dataset_manager/: Contains the core modules for loading and managing datasets of biological sequences.
  • bio_sequences/: Handles operations on biological sequences such as DNA and proteins.

Installation

  1. Install it as a library
    • Using CPU:
      pip install bio-dataset-manager
      
    • Using CUDA:
      pip install "bio-dataset-manager[cuda]" -f https://download.pytorch.org/whl/torch_stable.html
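
After installing, it can help to confirm whether the CUDA variant is actually usable on the machine. The sketch below relies only on the standard library and PyTorch's `torch.cuda.is_available()`; the `pick_device` helper is a hypothetical name for this example and is not part of the package.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed at all, so only the CPU can be used.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```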
      

Usage

Examples of the code can be found in the examples folder.

  1. Import the modules
import torch
from bio_dataset_manager.bio_dataloader import BioDataloader
from bio_dataset_manager.bio_dataset import BioDataset
from bio_sequences.dna_sequence import DnaSequence
  2. Create the dataset and dataloader
dataset = BioDataset(
    dataset_folder="path/to/dataset",  # placeholder; point this at your own data
    sequences_limit=10,
    randomize_choice=True,
    pad_same_len=False,
    window_size=1,
    sequence_info=DnaSequence(),
    sequences=None,
)

dataloader = BioDataloader(
    dataset=dataset,
    batch_size=5,
    shuffle=True,
    collate_fn=dataset.collate_fn,
    split_ratio=0.5,
    use_gpu=torch.cuda.is_available(),  # use the GPU only when one is present
)
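  3. Training loop example
import time

from tqdm import tqdm

epochs = 5
for epoch in range(epochs):
    # The with block closes the progress bar at the end of each epoch.
    with tqdm(total=len(dataloader.training_dataloader), desc=f"Epoch {epoch + 1}/{epochs}", unit="batch") as pbar:
        for batch in dataloader.training_dataloader:
            y_real, lengths = dataloader.process_batch(batch)
            time.sleep(0.1)  # placeholder for the real training step
            pbar.update(1)
            # Placeholder losses; replace with the values from your model.
            pbar.set_postfix(loss_gen="0.0", loss_dis="0.0")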
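
The loop above unpacks each batch into `y_real` and `lengths`. One plausible reason for carrying lengths alongside the data is that sequences of different lengths have to be padded to a common length before they can be stacked into a single tensor, and the true lengths are needed later to ignore the padding. The sketch below shows that general idea with plain Python lists. `pad_batch`, `length_mask`, `PAD`, and the integer encoding are assumptions for illustration and do not describe what `dataset.collate_fn` or `process_batch` actually do.

```python
PAD = 0  # assumed padding value for this example


def pad_batch(batch: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    """Right-pad every sequence to the longest one and return the true lengths."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths, default=0)
    padded = [seq + [PAD] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


def length_mask(lengths: list[int]) -> list[list[bool]]:
    """True where a position holds real data, False where it is padding."""
    max_len = max(lengths, default=0)
    return [[i < n for i in range(max_len)] for n in lengths]


padded, lengths = pad_batch([[1, 2, 3], [4], [2, 2]])
mask = length_mask(lengths)
print(padded)
print(lengths)
print(mask)
```

A mask like this is typically applied when computing a loss, so that padded positions do not contribute to it.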

Contributing

Feel free to submit issues or pull requests if you'd like to contribute to this project.


License
