
BioFit: Bioinformatics Machine Learning Framework

Project description



Biofit is a machine learning library designed for bioinformatics datasets. It provides tools for transforming data, extracting features, and training and evaluating machine learning models on biomedical data. It also offers automatic data preprocessing, visualization, and configurable processing pipelines. The main features of Biofit are:

  • Automatic Data Preprocessing: Preprocess biomedical datasets with built-in preprocessing steps.
  • Automatic Visualization: Visualize data with built-in methods geared towards biomedical data.
  • Configurable Processing Pipelines: Define and customize data processing pipelines.
  • Data Handling Flexibility: Supports a wide range of data formats.
  • Machine Learning Models: Supports a wide range of machine learning models.
  • Caching and Reuse: Caches intermediate results with Apache Arrow so they can be reused efficiently.
  • Batch Processing and Multiprocessing: Uses batch processing and multiprocessing to handle large-scale data efficiently.
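The batch-processing idea can be illustrated with a small standard-library sketch: split the rows into fixed-size batches, apply a function to each batch (optionally in parallel worker processes), and concatenate the results in input order. The helpers `split_into_batches` and `map_in_batches` and their parameters are hypothetical and are not part of Biofit's API; they only model the general technique.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


def map_in_batches(
    func: Callable[[List[T]], List[R]],
    items: Sequence[T],
    batch_size: int = 1000,
    num_proc: int = 1,
    use_processes: bool = True,
) -> List[R]:
    """Hypothetical helper: apply func per batch and flatten results in input order."""
    batches = split_into_batches(items, batch_size)
    if num_proc <= 1:
        results = [func(batch) for batch in batches]
    else:
        # Processes sidestep the GIL for CPU-bound work but need a picklable,
        # module-level func; threads are a lighter fallback.
        executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
        with executor_cls(max_workers=num_proc) as executor:
            # Executor.map yields results in submission order.
            results = list(executor.map(func, batches))
    return [row for batch_result in results for row in batch_result]


def square_batch(batch: List[int]) -> List[int]:
    """Example batch function: square every value."""
    return [x * x for x in batch]


print(map_in_batches(square_batch, list(range(10)), batch_size=4))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

When using worker processes (`num_proc > 1` with `use_processes=True`), call the helper from under an `if __name__ == "__main__":` guard so platforms that spawn workers do not re-run the script's top-level code.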

Installation

You can install Biofit via pip:

pip install biofit

Quick Start

Preprocessing Data

Biofit provides preprocessing capabilities tailored for omics data. You can use built-in classes to load preprocessing steps based on the experiment type or create custom preprocessing pipelines. The preprocessing pipeline in Biofit uses a syntax similar to sklearn and supports distributed processing.

Using a Preprocessor

Biofit allows you to fit and transform your data in a few lines, similar to sklearn. For example, you can use the LogTransformer to apply a log transformation to your data:

from biofit.preprocessing import LogTransformer
import pandas as pd

dataset = pd.DataFrame({"feature1": [1, 2, 3, 4, 5]})
log_transformer = LogTransformer()
preprocessed_data = log_transformer.fit_transform(dataset)
# Applying log transformation: 100%|█████████████████████████████| 5/5 [00:00<00:00, 7656.63 examples/s]
print(preprocessed_data)
#    feature1
# 0  0.000000
# 1  0.693147
# 2  1.098612
# 3  1.386294
# 4  1.609438
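The printed values match the natural logarithm (base e) of the inputs. A quick standard-library check reproduces them; rounding to 6 decimals mirrors how pandas displays floats by default. Whether LogTransformer accepts a different base or an offset for zeros is not covered here.

```python
import math

values = [1, 2, 3, 4, 5]
# Natural logarithm of each value, rounded to pandas' default display precision.
expected = [round(math.log(v), 6) for v in values]
print(expected)
# [0.0, 0.693147, 1.098612, 1.386294, 1.609438]
```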

Auto Preprocessing

To apply a standard set of preprocessing steps, specify the experiment type. Biofit then loads steps tailored to that kind of data, such as "otu", "asv", "snp", or "maldi":

from biofit.preprocessing import AutoPreprocessor

# Override min_prevalence for the first step; None keeps the second step's defaults
preprocessor = AutoPreprocessor.for_experiment("snp", [{"min_prevalence": 0.1}, None])
print(preprocessor)
# [('min_prevalence_row', MinPrevalencFilter(min_prevalence=0.1)),
#  ('min_prevalence', MinPrevalenceFeatureSelector(min_prevalence=0.01))]

# Fit and transform the dataset using the preprocessor
preprocessed_data = preprocessor.fit_transform(dataset)
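The printed steps suggest how the second argument of for_experiment behaves: one entry per step, where a dict overrides that step's parameters and None keeps its defaults (both defaults print as 0.01 in the Biosets example that follows). The sketch below models only that merge logic. `DEFAULT_STEPS` and `resolve_steps` are hypothetical names, and the length check is an assumption rather than documented Biofit behavior.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical per-experiment defaults, mirroring the printed "snp" steps.
DEFAULT_STEPS: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {
    "snp": [
        ("min_prevalence_row", {"min_prevalence": 0.01}),
        ("min_prevalence", {"min_prevalence": 0.01}),
    ],
}


def resolve_steps(
    experiment_type: str,
    overrides: Optional[List[Optional[Dict[str, Any]]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Merge per-step overrides into the defaults; a None entry keeps that step's defaults."""
    steps = DEFAULT_STEPS[experiment_type]
    overrides = overrides or [None] * len(steps)
    if len(overrides) != len(steps):
        raise ValueError(f"expected {len(steps)} override entries, got {len(overrides)}")
    resolved = []
    for (name, defaults), override in zip(steps, overrides):
        params = dict(defaults)  # copy so the defaults are never mutated
        if override is not None:
            params.update(override)
        resolved.append((name, params))
    return resolved


print(resolve_steps("snp", [{"min_prevalence": 0.1}, None]))
# [('min_prevalence_row', {'min_prevalence': 0.1}), ('min_prevalence', {'min_prevalence': 0.01})]
```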

Biofit is designed with Biosets in mind. Instead of an experiment-type string, you can pass a dataset loaded with Biosets, and the preprocessors are chosen from the dataset's experiment type:

from biofit.preprocessing import AutoPreprocessor
from biosets import load_dataset

dataset = load_dataset("csv", data_files="my_file.csv", experiment_type="snp")

preprocessor = AutoPreprocessor.for_experiment(dataset)
print(preprocessor)
# [('min_prevalence_row', MinPrevalencFilter(min_prevalence=0.01)),
#  ('min_prevalence', MinPrevalenceFeatureSelector(min_prevalence=0.01))]
preprocessed_data = preprocessor.fit_transform(dataset)
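The min_prevalence parameter of the steps above is commonly understood as the fraction of samples in which a feature is present. The sketch below assumes that "present" means non-zero and that features at or above the threshold are kept; both are assumptions about MinPrevalenceFeatureSelector, not confirmed details. `feature_prevalence` and `select_by_min_prevalence` are hypothetical helpers.

```python
from typing import List, Sequence


def feature_prevalence(rows: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of samples (rows) in which each feature (column) is non-zero."""
    n_samples = len(rows)
    n_features = len(rows[0]) if rows else 0
    return [sum(1 for row in rows if row[j] != 0) / n_samples for j in range(n_features)]


def select_by_min_prevalence(rows: Sequence[Sequence[float]], min_prevalence: float) -> List[int]:
    """Indices of the features whose prevalence is at least min_prevalence (assumed inclusive)."""
    return [j for j, p in enumerate(feature_prevalence(rows)) if p >= min_prevalence]


# 4 samples x 3 features; feature 0 is never observed.
counts = [[0, 1, 3], [0, 0, 2], [0, 4, 0], [0, 0, 1]]
print(feature_prevalence(counts))             # [0.0, 0.5, 0.75]
print(select_by_min_prevalence(counts, 0.6))  # [2]
```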

Custom Preprocessing Pipeline

Biofit allows you to create custom preprocessing pipelines using the PreprocessorPipeline class. This allows chaining multiple preprocessing steps from sklearn and Biofit in a single operation:

from biofit import load_dataset
from biofit.preprocessing import LogTransformer, PreprocessorPipeline
from sklearn.preprocessing import StandardScaler

# Load the dataset
dataset = load_dataset("csv", data_files="my_file.csv")

# Define a custom preprocessing pipeline.
# Log-transform first: StandardScaler outputs negative values, which a log cannot handle.
pipeline = PreprocessorPipeline(
    [("log_transformer", LogTransformer()), ("scaler", StandardScaler())]
)

# Fit and transform the dataset using the pipeline
preprocessed_data = pipeline.fit_transform(dataset.to_pandas())
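Step order matters in a chain like this: each step is fit on the previous step's output, and standardized data always contains negative values, which a plain logarithm cannot take. The toy classes below (`LogStep`, `ScaleStep`, `run_pipeline`) are hypothetical stand-ins written with the standard library, not Biofit or scikit-learn classes. They only mimic the fit-then-transform chaining.

```python
import math
from statistics import mean, pstdev


class LogStep:
    """Natural log of each value; raises ValueError for values <= 0."""

    def fit(self, values):
        return self

    def transform(self, values):
        return [math.log(v) for v in values]


class ScaleStep:
    """Standardize to zero mean and unit (population) standard deviation."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


def run_pipeline(steps, values):
    """Fit each step on the previous step's output, then transform with it."""
    for step in steps:
        values = step.fit(values).transform(values)
    return values


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(run_pipeline([LogStep(), ScaleStep()], data))  # log first, then scale: works
try:
    run_pipeline([ScaleStep(), LogStep()], data)  # scaling first yields values <= 0
except ValueError as err:
    print("scale-then-log failed:", err)
```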

For further details, see the advanced usage documentation.

License

Biofit is licensed under the Apache 2.0 License. See the LICENSE file for more information.

Contributing

If you would like to contribute to Biofit, please read the CONTRIBUTING guidelines.


Download files


Source Distribution

biofit-0.0.1.tar.gz (208.0 kB)

Uploaded Source

Built Distribution

biofit-0.0.1-py3-none-any.whl (268.5 kB)

Uploaded Python 3

File details

Details for the file biofit-0.0.1.tar.gz.

File metadata

  • Download URL: biofit-0.0.1.tar.gz
  • Upload date:
  • Size: 208.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for biofit-0.0.1.tar.gz
Algorithm Hash digest
SHA256 8578e289df6773d1a3ef60a262bc1fa957a14b02d0ed47355c07aa30f1e8fe1e
MD5 49f37ce45db98dde9cf53e4bc1c4bb02
BLAKE2b-256 f4d20077210f703999578e2a5827e69a8db4f197bfdebfa5c2d0523d87d04d29


File details

Details for the file biofit-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: biofit-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 268.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for biofit-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 afadb1674f28d8aa7979a7ab043418b9a40c6571e45c19b26a7cfcac8fb64066
MD5 dda36965d22fa26a1a1418852796a362
BLAKE2b-256 609d2b81cc24c6c2cc814ba11de571c27dd084e662ceab828d51eed1956f5495
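To check a downloaded file against the SHA256 digests listed above, you can hash it with Python's hashlib. This is a minimal sketch; `sha256_of`, `verify`, and `EXPECTED` are illustrative names, and the file is read in chunks so large archives need not fit in memory. pip's own `pip hash FILE` command prints a comparable digest.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED = {
    "biofit-0.0.1.tar.gz": "8578e289df6773d1a3ef60a262bc1fa957a14b02d0ed47355c07aa30f1e8fe1e",
    "biofit-0.0.1-py3-none-any.whl": "afadb1674f28d8aa7979a7ab043418b9a40c6571e45c19b26a7cfcac8fb64066",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the digest listed for its file name."""
    path = Path(path)
    return sha256_of(path) == EXPECTED[path.name]


# Example, after downloading the wheel into the current directory:
# verify("biofit-0.0.1-py3-none-any.whl")
```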

