BEDMess attribute standardizer for metadata attribute standardization
BEDMS
BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as ENCODE, FAIRTRACKS, and BEDBASE. BEDMS ensures consistency and FAIRness of metadata across different platforms. Additionally, users can train their own standardizer model using a custom schema (CUSTOM), so that attributes are standardized according to their specific research requirements.
Installation
To install bedms, use this command:
pip install bedms
or install the latest version from the GitHub repository:
pip install git+https://github.com/databio/bedms.git
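To confirm which version ended up in the environment after either command, a short standard-library check can help. This is a minimal sketch; the helper name installed_version is ours for illustration and is not part of bedms.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("bedms")
    print(f"bedms version: {found}" if found else "bedms is not installed")
```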
Usage
Standardizing based on available schemas
To choose a schema to standardize against, refer to the HuggingFace repository. The schema design .yaml files there show which schema best represents your attributes. The example below uses the encode schema.
from bedms import AttrStandardizer
model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")
assert results
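The results map each input attribute to candidate labels with confidence scores (the shape printed in the custom-schema example later in this README). One way to turn that into a single suggestion per attribute is sketched below. The helper top_predictions and the 0.5 threshold are illustrative assumptions, not part of bedms, and the sample data is made up.

```python
from typing import Dict, Optional, Tuple

Predictions = Dict[str, Dict[str, float]]


def top_predictions(
    results: Predictions, min_confidence: float = 0.5
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Map each attribute to its best (label, confidence), or None if too weak."""
    chosen: Dict[str, Optional[Tuple[str, float]]] = {}
    for attribute, candidates in results.items():
        if not candidates:
            chosen[attribute] = None
            continue
        # Pick the candidate label with the highest confidence.
        label, confidence = max(candidates.items(), key=lambda item: item[1])
        chosen[attribute] = (label, confidence) if confidence >= min_confidence else None
    return chosen


# Hypothetical data in the documented shape, not real model output.
example = {
    "attr_1": {"prediction_1": 0.70, "prediction_2": 0.30},
    "attr_2": {"prediction_1": 0.40, "prediction_2": 0.35},
}
print(top_predictions(example))
# {'attr_1': ('prediction_1', 0.7), 'attr_2': None}
```

Attributes that map to None are the ones worth reviewing by hand before renaming anything.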
Training custom schemas
Training a model on your custom schema with BEDMS takes two inputs:
- Training sets
- training_config.yaml
To instantiate the AttrStandardizerTrainer class:
from bedms.train import AttrStandardizerTrainer
trainer = AttrStandardizerTrainer("training_config.yaml")
To load the datasets and encode them:
train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()
To train the custom model:
trainer.train()
To test the custom model:
test_results_dict = trainer.test()
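To keep the test results alongside the trained model, the dictionary can be written to disk. The sketch below assumes the dictionary has string keys and mostly numeric values. The README does not document its contents, so the fallback converter handles values that JSON cannot encode directly (for example NumPy scalars). The helper names are ours, not part of bedms.

```python
import json
from pathlib import Path
from typing import Any, Mapping


def _to_jsonable(value: Any) -> Any:
    """Fallback for values json cannot encode: try float, else use str."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def save_results(results: Mapping[str, Any], path: str) -> Path:
    """Write `results` as indented JSON, creating parent folders as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(dict(results), indent=2, default=_to_jsonable))
    return target
```

A call such as save_results(test_results_dict, "results/test_metrics.json") would then store the metrics; the output path is only an example.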
To generate visualizations such as the learning curves, the confusion matrix, and the ROC curve:
acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()
Here acc_fig is the Accuracy Curve figure object, loss_fig is the Loss Curve figure object, conf_fig is the Confusion Matrix figure object, and roc_fig is the ROC Curve figure object.
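The figure objects can be saved for later inspection. The sketch below assumes they are Matplotlib figures, or at least expose a Matplotlib-style savefig method; the README does not state the figure type, so treat this as an assumption. The helper save_figures is ours, not part of bedms.

```python
from pathlib import Path
from typing import Any, Dict, List


def save_figures(figures: Dict[str, Any], out_dir: str, dpi: int = 150) -> List[Path]:
    """Write each figure to <out_dir>/<name>.png and return the paths."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, figure in figures.items():
        path = directory / f"{name}.png"
        # Assumes a Matplotlib-style savefig(fname, dpi=..., bbox_inches=...).
        figure.savefig(path, dpi=dpi, bbox_inches="tight")
        written.append(path)
    return written
```

For example, save_figures({"accuracy": acc_fig, "loss": loss_fig, "confusion": conf_fig, "roc": roc_fig}, "plots") would write four PNG files into a plots directory.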
Standardizing based on custom schema
To standardize based on a custom schema, your model must be hosted on HuggingFace. The directory structure should follow the instructions given on HuggingFace.
from bedms import AttrStandardizer
model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")
print(results)  # Dictionary of suggested predictions with their confidence, e.g. {'attr_1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
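For review in a spreadsheet, the nested dictionary can be flattened into one row per suggestion. This is a minimal sketch that relies only on the dictionary shape shown in the comment above. The helper predictions_to_csv and the sample data are illustrative, not part of bedms.

```python
import csv
import io
from typing import Dict


def predictions_to_csv(results: Dict[str, Dict[str, float]]) -> str:
    """Return CSV text with one row per (attribute, prediction) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["attribute", "prediction", "confidence"])
    for attribute, candidates in results.items():
        # Highest-confidence suggestion first within each attribute.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        for label, confidence in ranked:
            writer.writerow([attribute, label, f"{confidence:.2f}"])
    return buffer.getvalue()


# Hypothetical data in the documented shape, not real model output.
sample = {"attr_1": {"prediction_2": 0.30, "prediction_1": 0.70}}
print(predictions_to_csv(sample))
```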
File details
Details for the file bedms-0.2.0.tar.gz.
File metadata
- Download URL: bedms-0.2.0.tar.gz
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 501dae38ec5a9b172a7cd1fd753ad3add6037f2e2d969e5be8b68add3d298cd5
MD5 | 68fe5f1ab98b1cc673d230573ff95c86
BLAKE2b-256 | 6c5cae5ad7992b6541a3d7434b94bf3269b676d3b36148729baf1f1607e4758f
Provenance
The following attestation bundles were made for bedms-0.2.0.tar.gz:
Publisher: python-publish.yml on databio/bedms
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bedms-0.2.0.tar.gz
- Subject digest: 501dae38ec5a9b172a7cd1fd753ad3add6037f2e2d969e5be8b68add3d298cd5
- Sigstore transparency entry: 153133083
File details
Details for the file bedms-0.2.0-py3-none-any.whl.
File metadata
- Download URL: bedms-0.2.0-py3-none-any.whl
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1099a82a166c91cf6cf5a868cfee927454bfb50f6af50e3f7bc5bb357808c685
MD5 | 0a6a50b1648a65fd76f20614e5e3a75c
BLAKE2b-256 | d5622519073f6c41aadeaa2b75d1b3e421db7b349c0dc46e8803be4d0d409991
Provenance
The following attestation bundles were made for bedms-0.2.0-py3-none-any.whl:
Publisher: python-publish.yml on databio/bedms
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bedms-0.2.0-py3-none-any.whl
- Subject digest: 1099a82a166c91cf6cf5a868cfee927454bfb50f6af50e3f7bc5bb357808c685
- Sigstore transparency entry: 153133086