Benchmarking vectorizer performance on Alzheimer's disease concept definitions

Alzheimer's Disease Harmonization Text Embedding Benchmark

About

ADHTEB (Alzheimer’s Disease Harmonization Text Embedding Benchmark) is a Python package designed to evaluate the performance of text-embedding models in harmonizing variable descriptions from diverse cohorts in the context of Alzheimer’s disease.

General-purpose benchmarks often lack domain-specific evaluation for clinical data, so ADHTEB specifically targets the harmonization and clustering of data descriptions in a clinical setting.

Installation

pip install adhteb

Usage

Import a model

Models published on Hugging Face can be imported directly using the HuggingFaceVectorizer class.

from adhteb import HuggingFaceVectorizer
from sentence_transformers import SentenceTransformer

# Pass a model name, or a SentenceTransformer instance if you need additional parameters
vectorizer = HuggingFaceVectorizer(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

# Alternative: pass a preconfigured SentenceTransformer instance
sentence_transformer = SentenceTransformer(...)
vectorizer = HuggingFaceVectorizer(sentence_transformer)

Alternatively, you can create your own vectorizer by subclassing the Vectorizer base class and implementing its get_embedding method.

from adhteb import Vectorizer

class MyVectorizer(Vectorizer):
    def get_embedding(self, text: str) -> list[float]:
        # Implement your embedding logic here
        my_vector = []
        return my_vector
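
To illustrate what a get_embedding implementation might look like, here is a minimal, dependency-free sketch based on the hashing trick. The class name HashingVectorizerSketch, the dim parameter, and the tokenization rule are assumptions made for this example and are not part of adhteb. The class is deliberately standalone so that the snippet runs without the package installed; to use it with the benchmark it would presumably derive from Vectorizer, as in the template above.

```python
import hashlib
import math
import re


class HashingVectorizerSketch:
    """Toy bag-of-words embedding using the hashing trick (hypothetical example)."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def get_embedding(self, text: str) -> list[float]:
        vector = [0.0] * self.dim
        # Lowercase the text and split it into alphanumeric tokens.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # Hash each token to a stable bucket index and count it there.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vector[bucket] += 1.0
        # L2-normalize so that dot products behave like cosine similarities.
        norm = math.sqrt(sum(v * v for v in vector))
        if norm == 0.0:
            return vector  # no tokens found: return the all-zero vector
        return [v / norm for v in vector]


sketch = HashingVectorizerSketch(dim=64)
embedding = sketch.get_embedding("Mini-Mental State Examination total score")
print(len(embedding))
```

Such a vectorizer carries no semantic knowledge and is unlikely to score well; it only shows the expected shape of the method: one string in, one fixed-length list of floats out.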

Running the benchmark

You can run the benchmark and display the results using only a few lines of code.

from adhteb import Benchmark

benchmark = Benchmark(vectorizer=vectorizer)
benchmark.run()
print(benchmark.results_summary())
+------------------+-------+--------------------+
|                  | AUPRC | Zero-shot Accuracy |
+------------------+-------+--------------------+
|      GERAS       | 0.35  |        0.65        |
| PREVENT Dementia | 0.19  |        0.48        |
|    PREVENT AD    | 0.22  |        0.39        |
|       EMIF       | 0.29  |        0.54        |
+------------------+-------+--------------------+
Aggregate Score: 0.39
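
The text above does not state how the aggregate score is computed. As a sanity check only, the printed value is consistent with the unweighted mean of all eight cells in the example table; the sketch below reproduces that arithmetic and should not be read as the package's actual formula. The results dictionary simply re-types the example output.

```python
# Values copied from the example output above: (AUPRC, zero-shot accuracy).
results = {
    "GERAS": (0.35, 0.65),
    "PREVENT Dementia": (0.19, 0.48),
    "PREVENT AD": (0.22, 0.39),
    "EMIF": (0.29, 0.54),
}

# Flatten both metrics of every cohort into one list and average them.
values = [score for pair in results.values() for score in pair]
aggregate = sum(values) / len(values)
print(f"Mean of all table cells: {aggregate:.2f}")
```

The eight values sum to 3.11, so the mean is about 0.389, which rounds to the 0.39 shown in the example output.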

Publishing your results

You can check how your results compare to other models on the public leaderboard here: https://adhteb.scai.fraunhofer.de

You can also publish your benchmark results together with metadata about your tested model:

from adhteb import Benchmark, ModelMetadata

model_name = "my-model-name"
url = "https://huggingface.co/my-model-name"

model_metadata = ModelMetadata(model_name=model_name, url=url)
benchmark.publish(model_metadata=model_metadata)

Private Cohorts

Because some of the cohort metadata in this benchmark is not publicly available, private metadata is encrypted by default. We are also working on extending the benchmark with open, publicly available data, which the benchmark uses by default. If you want to include results from the private cohorts as well, you can either:

  1. Open a new issue to benchmark a specific model on private cohort data. We will run the benchmark on the non-public data for you and report the results based on your issue (publicly or privately).
  2. Get access to the individual cohorts from the data holders and contact us to get a decryption key for the benchmark. You can then run the private benchmark cohorts alongside the public ones by using the following flag and providing the key:
from adhteb import Benchmark

benchmark = Benchmark(vectorizer=vectorizer, include_private=True, decryption_key=KEY_STRING)
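
KEY_STRING above is a placeholder for the key you receive. One common way to avoid hard-coding such a secret in a script is to read it from an environment variable; the sketch below shows that pattern. The variable name ADHTEB_DECRYPTION_KEY and the helper load_decryption_key are hypothetical and are not defined by adhteb.

```python
import os


def load_decryption_key(env_var: str = "ADHTEB_DECRYPTION_KEY") -> str:
    """Read the decryption key from an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of passing an empty key along.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your decryption key."
        )
    return key
```

The returned string could then be passed as decryption_key=load_decryption_key() in the Benchmark call above.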
