Simple extended BCubed implementation in Python for clustering evaluation


python-bcubed

Simple extended BCubed implementation in Python for (non-)overlapping clustering evaluation.

More information on BCubed and details of the algorithm can be found in the following publication:

Amigó, Enrique, et al.: A comparison of Extrinsic Clustering Evaluation Metrics based on Formal Constraints. In: Information Retrieval 12.4 (2009): 461-486.
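For quick reference, the extended BCubed precision P and recall R from that paper can be summarized as follows. Here C(e) is the set of clusters assigned to item e (its cdict entry in the terms used below) and L(e) is its set of ground-truth categories (its ldict entry). The outer average runs over every item e. The inner average runs over every item e', including e itself, that shares at least one cluster with e (for P) or at least one category with e (for R).

```latex
\mathrm{MP}(e,e') = \frac{\min\bigl(|C(e)\cap C(e')|,\ |L(e)\cap L(e')|\bigr)}{|C(e)\cap C(e')|}
\qquad
\mathrm{MR}(e,e') = \frac{\min\bigl(|C(e)\cap C(e')|,\ |L(e)\cap L(e')|\bigr)}{|L(e)\cap L(e')|}

P = \operatorname*{avg}_{e}\Bigl[\operatorname*{avg}_{e' :\, C(e)\cap C(e')\neq\emptyset} \mathrm{MP}(e,e')\Bigr]
\qquad
R = \operatorname*{avg}_{e}\Bigl[\operatorname*{avg}_{e' :\, L(e)\cap L(e')\neq\emptyset} \mathrm{MR}(e,e')\Bigr]
```

When every item has exactly one cluster and one category, each multiplicity term is either 1 or 0, and these reduce to the original (non-overlapping) BCubed precision and recall.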

Installation

You can simply use pip (or any similar package manager) for installation:

pip install bcubed

or, if you prefer a local user installation:

pip install --user bcubed

Usage

To evaluate a clustering output, you need ground-truth data (also called gold-standard data), which we call the ldict. The ground truth is represented as a dictionary whose keys are the items in the gold standard and whose values are the sets of categories annotated for those items. For example:

ldict = {
    "item1": set(["gray", "black"]),
    "item2": set(["gray", "black"]),
    "item3": set(["gray"]),
    "item4": set(["black"]),
    "item5": set(["black"]),
    "item6": set(["dashed"]),
    "item7": set(["dashed"]),
}

In the example above, item1 is assigned two categories in the ground truth: gray and black. Items item6 and item7 are both assigned the single category dashed. Because each item maps to a set, this representation supports both overlapping (an item in several categories) and non-overlapping ground-truth data.
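If your annotations come as a flat list of (item, category) pairs, for example rows read from a CSV file, they can be grouped into this set-valued dictionary with a few lines of standard Python. build_label_dict is a hypothetical helper, not part of the bcubed package; the same function can build a cdict from (item, cluster) pairs.

```python
from collections import defaultdict


def build_label_dict(pairs):
    """Group (item, label) pairs into {item: set_of_labels} (hypothetical helper)."""
    grouped = defaultdict(set)
    for item, label in pairs:
        grouped[item].add(label)
    return dict(grouped)


annotations = [
    ("item1", "gray"), ("item1", "black"),
    ("item2", "gray"), ("item2", "black"),
    ("item3", "gray"),
    ("item4", "black"), ("item5", "black"),
    ("item6", "dashed"), ("item7", "dashed"),
]
ldict = build_label_dict(annotations)

# Items with more than one category make the ground truth overlapping.
overlapping = sorted(item for item, cats in ldict.items() if len(cats) > 1)
print(overlapping)  # ['item1', 'item2']
```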

The clustering output to be evaluated is called the cdict and is represented as a dictionary in the same way as the ldict: the keys are the items in the clustering output and the values are the sets of clusters assigned to those items. For example:

cdict = {
    "item1": set(["A", "B"]),
    "item2": set(["A", "B"]),
    "item3": set(["A"]),
    "item4": set(["B"]),
    "item5": set(["B"]),
    "item6": set(["C"]),
    "item7": set(["C"]),
}

Note that the cluster names (or IDs) do not need to match the category names in the ground-truth data: the algorithm only considers the groupings and does not try to match cluster names to ground-truth categories.
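Because each item's clusters are compared against its ground-truth categories, it can help to check before scoring that the cdict and ldict cover the same items and that no item has an empty set. We assume the metrics expect exactly this; find_input_problems is a hypothetical helper, not part of the bcubed package, and we have not checked how the package itself reports such mismatches.

```python
def find_input_problems(cdict, ldict):
    """Return a list of problems that would make cdict and ldict hard to compare.

    Hypothetical helper (not part of bcubed). An empty list means both dicts
    cover the same items and every item has at least one cluster and one category.
    """
    problems = []
    only_in_c = sorted(set(cdict) - set(ldict))
    only_in_l = sorted(set(ldict) - set(cdict))
    if only_in_c:
        problems.append(f"items missing from ldict: {only_in_c}")
    if only_in_l:
        problems.append(f"items missing from cdict: {only_in_l}")
    for item in sorted(set(cdict) & set(ldict)):
        if not cdict[item]:
            problems.append(f"{item} has no cluster")
        if not ldict[item]:
            problems.append(f"{item} has no category")
    return problems


# Usage: a clustering that forgot item7 and left item6 unassigned.
cdict = {"item1": {"A"}, "item6": set()}
ldict = {"item1": {"gray"}, "item6": {"dashed"}, "item7": {"dashed"}}
for problem in find_input_problems(cdict, ldict):
    print(problem)
# items missing from cdict: ['item7']
# item6 has no cluster
```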

Once you have defined the ldict (ground-truth data) and the cdict (clustering output to evaluate), you can obtain the extended BCubed precision and recall values as follows:

import bcubed

precision = bcubed.precision(cdict, ldict)
recall = bcubed.recall(cdict, ldict)
fscore = bcubed.fscore(precision, recall)
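To make concrete what precision and recall measure, here is a from-scratch sketch of the extended BCubed definitions from Amigó et al. It is our own re-implementation for illustration, not the bcubed package's source, so extended_precision and extended_recall are hypothetical names. It assumes Python 3 (true division), the same keys in both dictionaries, and non-empty sets. On the example above both scores come out as 1.0, because the cdict is the ldict with gray, black and dashed renamed to A, B and C, and renaming the clusters again leaves the scores unchanged. The toy case at the end shows the usual trade-off: one all-encompassing cluster keeps recall at 1.0 but drops precision to 5/9.

```python
# From-scratch sketch of extended BCubed (Amigó et al., 2009); not the bcubed package source.

def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def extended_precision(cdict, ldict):
    """Average over items e of the mean multiplicity precision against every
    item e2 (including e itself) that shares at least one cluster with e."""
    def multiplicity_precision(e, e2):
        shared_clusters = len(cdict[e] & cdict[e2])
        shared_categories = len(ldict[e] & ldict[e2])
        return min(shared_clusters, shared_categories) / shared_clusters

    return _mean(
        _mean(multiplicity_precision(e, e2) for e2 in cdict if cdict[e] & cdict[e2])
        for e in cdict
    )


def extended_recall(cdict, ldict):
    """Like extended_precision, but over items sharing at least one category."""
    def multiplicity_recall(e, e2):
        shared_clusters = len(cdict[e] & cdict[e2])
        shared_categories = len(ldict[e] & ldict[e2])
        return min(shared_clusters, shared_categories) / shared_categories

    return _mean(
        _mean(multiplicity_recall(e, e2) for e2 in ldict if ldict[e] & ldict[e2])
        for e in ldict
    )


# The README example: cdict is ldict with gray/black/dashed renamed to A/B/C.
ldict = {
    "item1": {"gray", "black"}, "item2": {"gray", "black"}, "item3": {"gray"},
    "item4": {"black"}, "item5": {"black"}, "item6": {"dashed"}, "item7": {"dashed"},
}
cdict = {
    "item1": {"A", "B"}, "item2": {"A", "B"}, "item3": {"A"},
    "item4": {"B"}, "item5": {"B"}, "item6": {"C"}, "item7": {"C"},
}
# Prefixing every cluster ID changes the names but not the groupings.
renamed = {item: {"cluster_" + c for c in clusters} for item, clusters in cdict.items()}

print(extended_precision(cdict, ldict), extended_recall(cdict, ldict))      # 1.0 1.0
print(extended_precision(renamed, ldict), extended_recall(renamed, ldict))  # 1.0 1.0

# Toy case: putting every item into one cluster keeps recall perfect but hurts precision.
toy_ldict = {"a": {"x"}, "b": {"x"}, "c": {"y"}}
toy_cdict = {"a": {1}, "b": {1}, "c": {1}}
print(round(extended_precision(toy_cdict, toy_ldict), 3), extended_recall(toy_cdict, toy_ldict))  # 0.556 1.0
```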

An F-score (also called F-measure) function is also included for convenience. It accepts non-default values for the beta parameter if needed:

fscore = bcubed.fscore(precision, recall, beta=2.0)  # weights recall higher
fscore = bcubed.fscore(precision, recall, beta=0.5)  # weights precision higher
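The F-beta score is the weighted harmonic mean F = (1 + beta²) · P · R / (beta² · P + R), so beta > 1 gives recall more weight and beta < 1 gives precision more weight. We assume bcubed.fscore follows this standard definition but have not checked its source; the stand-alone f_beta below is a hypothetical helper, and its guard against a zero denominator is our own choice.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    Hypothetical stand-alone helper; beta > 1 favours recall, beta < 1 favours precision.
    """
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # undefined 0/0 case (e.g. precision = recall = 0): return 0 by convention
    return (1.0 + b2) * precision * recall / denominator


# With recall higher than precision, raising beta raises the score.
p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 4))
# 0.5 0.5556
# 1.0 0.6667
# 2.0 0.8333
```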

A complete example, using the examples from the source publication, can be found in the included example.py file.

License

This software is licensed under the Apache License 2.0.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Project details


Version 1.5, released Jan 17, 2019.

Download files


Source Distribution

bcubed-1.5.tar.gz (3.6 kB), uploaded Jan 17, 2019

Built Distribution

bcubed-1.5-py2.py3-none-any.whl (8.7 kB), uploaded Jan 17, 2019, for Python 2 and Python 3

File details

Details for the file bcubed-1.5.tar.gz.

File metadata

Download URL: bcubed-1.5.tar.gz
Upload date: Jan 17, 2019
Size: 3.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.2

File hashes

Hashes for bcubed-1.5.tar.gz
Algorithm	Hash digest
SHA256	`3dc67a6e4e925d4493ce6bc9747a64a48f84398f31b573f73776c9b836de6ae9`
MD5	`9034816a3c699925bcdfc90248fc3376`
BLAKE2b-256	`4efdbc5455e5cf3e3aa7cf36f9f7c5cf3c81c0a4fb36c4bff8982599e78f8458`


File details

Details for the file bcubed-1.5-py2.py3-none-any.whl.

File metadata

Download URL: bcubed-1.5-py2.py3-none-any.whl
Upload date: Jan 17, 2019
Size: 8.7 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.2

File hashes

Hashes for bcubed-1.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b465d0e27527f3834d0b053503868ac09b986853a4d8c14a1fa52a4cefd3ac29`
MD5	`a7ff42a5b9ef44f5244ec802b902add4`
BLAKE2b-256	`56c4f06199a7de2236e92d3894672b1f8530d4fc1c02d7453d6c30e379fbce41`
