A package to calculate BCUBED precision, recall, and F1-score for clustering evaluation.


B-Cubed Metrics

A simple Python package to calculate B-Cubed precision, recall, and F1-score for clustering evaluation.

What are B-Cubed Metrics

The B-Cubed algorithm was first introduced by Bagga, A. and Baldwin, B. (1998) in their paper "Entity-Based Cross-Document Coreferencing Using the Vector Space Model". It compares a predicted clustering with a ground truth (or gold standard) clustering through element-wise precision and recall scores. For each element, the predicted cluster and the ground truth cluster that contain it are compared, and the scores are then averaged over all elements. Because it works element by element, unlike macro-averaged metrics, B-Cubed is useful for evaluating unsupervised techniques where cluster labels are not available.
Based on the paper, the following equations calculate the precision, recall, and F-score of the predicted clustering:

$$ \text{Precision} = \frac{1}{\sum \text{elements}} \sum_{i=1}^{n} \frac{(\text{count of element})^2}{\text{count of all elements in cluster}} $$

$$ \text{Recall} = \frac{1}{\sum \text{elements}} \sum_{i=1}^{n} \frac{(\text{count of element})^2}{\text{count of total elements from this category}} $$

$$ \text{F-score} = \frac{1}{k} \sum_{i=1}^{n} \frac{2 \times \text{Precision}(C)_k \times \text{Recall}(C)_k}{\text{Precision}(C)_k + \text{Recall}(C)_k} $$

where $n$ denotes the number of categories in a cluster and $k$ is the number of predicted clusters. $Precision(C)_k$ and $Recall(C)_k$ are the 'partial' precision and recall of each cluster.
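As a rough illustration of the precision and recall equations, the sketch below evaluates them in plain Python. It is not the package's implementation: the function name `bcubed_precision_recall` is hypothetical, and it assumes the per-cluster sums are accumulated over all predicted clusters before dividing by the total number of elements. The F-score is left out because its per-cluster averaging is not reproduced here.

```python
def bcubed_precision_recall(predicted_clustering, ground_truth_clustering):
    """Hypothetical helper: B-Cubed precision and recall from category counts.

    predicted_clustering: list of dicts, one per predicted cluster,
        mapping category -> number of elements of that category in the cluster.
    ground_truth_clustering: dict mapping category -> total number of elements.
    """
    total_elements = sum(ground_truth_clustering.values())
    precision_sum = 0.0
    recall_sum = 0.0
    for cluster in predicted_clustering:
        cluster_size = sum(cluster.values())
        for category, count in cluster.items():
            # (count of element)^2 / count of all elements in cluster
            precision_sum += count ** 2 / cluster_size
            # (count of element)^2 / count of total elements from this category
            recall_sum += count ** 2 / ground_truth_clustering[category]
    # Assumption: sums over all clusters are normalised by the element total.
    return precision_sum / total_elements, recall_sum / total_elements


# Illustrative counts: four predicted clusters over three categories.
predicted_clustering = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3},
]
ground_truth_clustering = {'blue': 8, 'red': 11, 'green': 7}

precision, recall = bcubed_precision_recall(
    predicted_clustering, ground_truth_clustering
)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

Under these assumptions the example counts give a precision of about 0.465 and a recall of about 0.340; values reported by the package may differ if it normalises differently.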

Installation and Use

Install the package from a terminal using:
pip install bcubed-metrics
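To use the Bcubed class, import it and provide two inputs: a list of dictionaries for the predicted clustering (one dictionary per predicted cluster, mapping each category to its element count in that cluster) and a single dictionary for the ground truth clustering (the actual labels, mapping each category to its total element count):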
from bcubed_metrics.bcubed import Bcubed

predicted_clustering = [
    {'blue': 4, 'red': 2, 'green': 1},
    {'blue': 2, 'red': 2, 'green': 3},
    {'blue': 1, 'red': 5},
    {'blue': 1, 'red': 2, 'green': 3}
]

ground_truth_clustering = {'blue': 8, 'red': 11, 'green': 7}

bcubed = Bcubed(predicted_clustering=predicted_clustering, ground_truth_clustering=ground_truth_clustering)

metrics = bcubed.get_metrics() # returns all metrics as dictionary

bcubed.print_metrics() # prints all metrics
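Clustering results often arrive as one predicted cluster id and one true label per element rather than as count dictionaries. The sketch below shows one way to build the two inputs from such data. The helper `build_bcubed_inputs` is hypothetical and not part of the package; it assumes both sequences are aligned element by element and that the cluster ids can be sorted.

```python
from collections import Counter, defaultdict


def build_bcubed_inputs(predicted_ids, true_labels):
    """Hypothetical helper: turn per-element labels into count dictionaries."""
    if len(predicted_ids) != len(true_labels):
        raise ValueError("predicted_ids and true_labels must have the same length")

    # Count how many elements of each true category fall in each predicted cluster.
    per_cluster = defaultdict(Counter)
    for cluster_id, label in zip(predicted_ids, true_labels):
        per_cluster[cluster_id][label] += 1

    # One dictionary per predicted cluster, ordered by cluster id.
    predicted_clustering = [dict(per_cluster[cid]) for cid in sorted(per_cluster)]
    # Total number of elements per true category.
    ground_truth_clustering = dict(Counter(true_labels))
    return predicted_clustering, ground_truth_clustering


# Six elements assigned to two predicted clusters (ids 0 and 1).
predicted_ids = [0, 0, 0, 1, 1, 1]
true_labels = ['blue', 'blue', 'red', 'red', 'green', 'green']

pred_counts, truth_counts = build_bcubed_inputs(predicted_ids, true_labels)
print(pred_counts)
print(truth_counts)
```

The two returned values have the same shape as `predicted_clustering` and `ground_truth_clustering` in the example above, so they can be passed to `Bcubed` in the same way.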

