BEDMess attribute standardizer for metadata attribute standardization

BEDMS

BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as ENCODE, FAIRTRACKS, and BEDBASE. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model on a custom schema (CUSTOM), so that attributes are standardized according to their specific research requirements.

Installation

To install bedms from PyPI, use this command:

pip install bedms

or install the latest version from the GitHub repository:

pip install git+https://github.com/databio/bedms.git
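
After installing, it can be useful to confirm which version ended up in the current environment. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and is not part of BEDMS.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "bedms") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"bedms version: {found}" if found else "bedms is not installed")
```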

Usage

Standardizing based on available schemas

To choose a schema to standardize against, refer to the HuggingFace repository. Use the schema design .yaml files there to select the schema that best represents your attributes. The example below uses the encode schema.

from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")

assert results

Training custom schemas

To train a model on your custom schema with BEDMS, you need two things:

  1. Training Sets
  2. training_config.yaml

To instantiate the AttrStandardizerTrainer class:

from bedms.train import AttrStandardizerTrainer

trainer = AttrStandardizerTrainer("training_config.yaml")

To load the datasets and encode them:

train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()

To train the custom model:

trainer.train()

To test the custom model:

test_results_dict = trainer.test()

To generate visualizations such as learning curves, confusion matrices, and ROC curves:

acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()

Here, acc_fig is the accuracy curve figure object, loss_fig is the loss curve figure object, conf_fig is the confusion matrix figure object, and roc_fig is the ROC curve figure object.
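
The four figure objects returned by plot_visualizations() can be written to disk for later inspection. The following is a minimal sketch, assuming each object exposes a Matplotlib-style savefig(path) method; that assumption is not confirmed by this README, and save_figures is a hypothetical helper, not part of BEDMS.

```python
from pathlib import Path
from typing import Any, Dict, List


def save_figures(figures: Dict[str, Any], out_dir: str = "figures") -> List[Path]:
    """Write each figure to <out_dir>/<name>.png and return the written paths.

    Assumption: every value has a Matplotlib-style savefig(path) method.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        path = target / f"{name}.png"
        fig.savefig(path)
        written.append(path)
    return written
```

Under that assumption, a call might look like save_figures({"accuracy": acc_fig, "loss": loss_fig, "confusion": conf_fig, "roc": roc_fig}).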

Standardizing based on custom schema

To standardize against a custom schema, your model must be hosted on HuggingFace, and the repository's directory structure should follow the instructions given on HuggingFace.

from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")

print(results)  # Dictionary of suggested predictions with their confidence, e.g. {'attr_1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
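
Because the results dictionary maps each attribute to several suggestions with confidences, a common next step is to keep only the strongest suggestion per attribute. The sketch below works on a plain dictionary in the shape shown above; top_predictions and its min_confidence threshold are hypothetical helpers for illustration, not part of the BEDMS API.

```python
from typing import Dict, Optional, Tuple

Predictions = Dict[str, Dict[str, float]]


def top_predictions(
    results: Predictions, min_confidence: float = 0.5
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Pick the highest-confidence suggestion for each attribute.

    Attributes with no suggestions, or whose best suggestion scores below
    `min_confidence`, map to None so they can be reviewed by hand.
    """
    best: Dict[str, Optional[Tuple[str, float]]] = {}
    for attribute, suggestions in results.items():
        if not suggestions:
            best[attribute] = None
            continue
        label, score = max(suggestions.items(), key=lambda item: item[1])
        best[attribute] = (label, score) if score >= min_confidence else None
    return best


example = {"attr_1": {"prediction_1": 0.70, "prediction_2": 0.30}}
print(top_predictions(example))
# {'attr_1': ('prediction_1', 0.7)}
```

Returning None for low-confidence attributes, rather than dropping them, keeps every original attribute visible in the output.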
