B-Cubed Metrics

A simple Python package to calculate B-Cubed precision, recall, and F1-score for clustering evaluation.

What are B-Cubed Metrics

The B-Cubed algorithm was introduced by Bagga and Baldwin (1998) in their paper "Entity-Based Cross-Document Coreferencing Using the Vector Space Model". It compares a predicted clustering with a ground truth (gold standard) clustering through element-wise precision and recall scores: for each element, the predicted cluster and the ground truth cluster containing that element are compared, and the scores are then averaged over all elements. Because it operates element by element, unlike macro-averaged metrics, B-Cubed is useful for evaluating unsupervised techniques, where cluster labels are not available.
Following the paper, precision and recall for the predicted clustering are calculated with two simple equations, and an F-score combines them:

$$ \text{Precision} = \frac{1}{\sum \text{elements}} \sum_{i=1}^{n} \frac{(\text{count of element}_i)^2}{\text{count of all elements in cluster}} $$

$$ \text{Recall} = \frac{1}{\sum \text{elements}} \sum_{i=1}^{n} \frac{(\text{count of element}_i)^2}{\text{count of total elements from this category}} $$

$$ \text{F-score} = \frac{1}{k} \sum_{j=1}^{k} \frac{2 \times \text{Precision}(C)_j \times \text{Recall}(C)_j}{\text{Precision}(C)_j + \text{Recall}(C)_j} $$

where $n$ denotes the number of categories in a cluster, $\text{count of element}_i$ is the number of elements of category $i$ in that cluster, and $k$ is the number of predicted clusters. $\text{Precision}(C)_j$ and $\text{Recall}(C)_j$ are the 'partial' precision and recall for cluster $j$.
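As a rough illustration of the precision and recall equations, the sketch below computes both scores directly from category counts. It is not the package's implementation: the function name `bcubed_precision_recall` is hypothetical, and it assumes that the per-cluster sums are accumulated across all predicted clusters before dividing by the total number of elements, which is the usual reading of B-Cubed.

```python
def bcubed_precision_recall(predicted_clusters, ground_truth_counts):
    """Return (precision, recall) from per-cluster category counts.

    Hypothetical helper for illustration only.
    predicted_clusters: list of dicts, one per cluster, mapping category -> count.
    ground_truth_counts: dict mapping category -> total count of that category.
    """
    total_elements = sum(ground_truth_counts.values())
    precision_sum = 0.0
    recall_sum = 0.0
    for cluster in predicted_clusters:
        cluster_size = sum(cluster.values())
        for category, count in cluster.items():
            # count^2 / cluster size: precision contribution of this category
            precision_sum += count ** 2 / cluster_size
            # count^2 / category total: recall contribution of this category
            recall_sum += count ** 2 / ground_truth_counts[category]
    return precision_sum / total_elements, recall_sum / total_elements


predicted = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3},
]
ground_truth = {'blue': 8, 'red': 11, 'green': 7}

precision, recall = bcubed_precision_recall(predicted, ground_truth)
print(f"{precision:.3f} {recall:.3f}")  # prints "0.465 0.340"
```

For the first cluster, for example, the precision contribution is (4² + 2² + 1²) / 7 = 3, and the contributions of all four clusters are summed before dividing by the 26 elements in total.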

Installation and Use

Install the package from PyPI in any terminal using:
pip install bcubed-metrics
To use the Bcubed class, import it and provide two inputs: a list of dictionaries for the predicted clustering (one dictionary per cluster, mapping each category to its count in that cluster) and a single dictionary for the ground truth clustering (the actual labels, mapping each category to its total count):
from bcubed_metrics.bcubed import Bcubed

# One dictionary per predicted cluster: category -> number of elements
predicted_clustering = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3},
]

# Total number of elements per category in the ground truth
ground_truth_clustering = {'blue': 8, 'red': 11, 'green': 7}

bcubed = Bcubed(
    predicted_clustering=predicted_clustering,
    ground_truth_clustering=ground_truth_clustering,
)

metrics = bcubed.get_metrics()  # returns all metrics as a dictionary

bcubed.print_metrics()  # prints all metrics
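Clustering results often come as one cluster id and one true label per element instead of pre-counted dictionaries. The sketch below shows one way to build the two inputs in the shape used in the example above. The helper `to_count_dicts` is hypothetical and not part of the package, and it assumes that cluster ids can be sorted (for example, integers).

```python
from collections import Counter, defaultdict


def to_count_dicts(cluster_ids, true_labels):
    """Convert per-element assignments into count dictionaries.

    Hypothetical helper for illustration only.
    cluster_ids: predicted cluster id of each element.
    true_labels: ground truth category of each element, in the same order.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")

    # Count how many elements of each category fall into each cluster
    per_cluster = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, true_labels):
        per_cluster[cluster_id][label] += 1

    # One plain dict per cluster, ordered by cluster id
    predicted = [dict(per_cluster[cid]) for cid in sorted(per_cluster)]
    # Total count of each category across all elements
    ground_truth = dict(Counter(true_labels))
    return predicted, ground_truth


cluster_ids = [0, 0, 0, 1, 1, 2]
true_labels = ['blue', 'blue', 'red', 'red', 'green', 'green']
predicted_clustering, ground_truth_clustering = to_count_dicts(cluster_ids, true_labels)
```

The resulting `predicted_clustering` and `ground_truth_clustering` have the same structure as the hand-written inputs in the example above.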
