
Calculate precision, recall, or F1 score on large-scale datasets

Project description

Example

# Macro-averaged precision, recall and F1, computed chunk by chunk and checked against sklearn
import random

from sklearn.metrics import f1_score, precision_score, recall_score
from umetrics import MacroMetrics

# 10,000 random true and predicted labels in the range 0..10
y_trues = [random.randint(0, 10) for _ in range(10000)]
y_preds = [random.randint(0, 10) for _ in range(10000)]

# Evaluate only the labels that occur in both lists
labels = list(set(y_preds) & set(y_trues))

# Reference scores from sklearn, which needs the full lists in memory
p_score = round(precision_score(y_trues, y_preds, labels=labels, zero_division=0, average='macro'), 5)
r_score = round(recall_score(y_trues, y_preds, labels=labels, zero_division=0, average='macro'), 5)
f_score = round(f1_score(y_trues, y_preds, labels=labels, zero_division=0, average='macro'), 5)

# Split the data into chunks of 3 to simulate a stream of mini-batches
y_trues_chunk = [y_trues[i:i + 3] for i in range(0, len(y_trues), 3)]
y_preds_chunk = [y_preds[i:i + 3] for i in range(0, len(y_preds), 3)]

# Feed the chunks to MacroMetrics one at a time
m = MacroMetrics(labels=labels)
for y_true_chunk, y_pred_chunk in zip(y_trues_chunk, y_preds_chunk):
    m.step(y_trues=y_true_chunk, y_preds=y_pred_chunk)

# The chunked scores match sklearn's to 5 decimal places
assert p_score == round(m.precision_score(), 5)
assert r_score == round(m.recall_score(), 5)
assert f_score == round(m.f1_score(), 5)
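The `step` interface above only needs a small amount of state. As a hedged illustration of how a streaming macro average can be computed, the sketch below keeps per-label true-positive, false-positive and false-negative counts and averages the per-label scores at the end, treating undefined ratios as 0 in the spirit of the `zero_division=0` sklearn calls above. The class `StreamingMacroPRF` is hypothetical; it is not the umetrics implementation, whose internals are not described here.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List


class StreamingMacroPRF:
    """Hypothetical streaming accumulator for macro-averaged precision, recall and F1.

    Only per-label counts are stored, so memory grows with the number of
    labels rather than with the number of samples seen.
    """

    def __init__(self, labels: Iterable[Hashable]) -> None:
        self.labels: List[Hashable] = list(labels)
        self._label_set = set(self.labels)
        self.tp: Counter = Counter()  # true positives per label
        self.fp: Counter = Counter()  # false positives per label
        self.fn: Counter = Counter()  # false negatives per label

    def step(self, y_trues: Iterable[Hashable], y_preds: Iterable[Hashable]) -> None:
        """Update the counts with one chunk of (true, predicted) labels."""
        for t, p in zip(y_trues, y_preds):
            if t == p:
                if t in self._label_set:
                    self.tp[t] += 1
            else:
                # A miss is a false positive for the predicted label
                # and a false negative for the true label.
                if p in self._label_set:
                    self.fp[p] += 1
                if t in self._label_set:
                    self.fn[t] += 1

    @staticmethod
    def _div(num: float, den: float) -> float:
        # zero_division=0 convention: an undefined ratio counts as 0
        return num / den if den else 0.0

    def _macro(self, per_label: Callable[[Hashable], float]) -> float:
        # Unweighted mean over the evaluated labels
        if not self.labels:
            return 0.0
        return sum(per_label(c) for c in self.labels) / len(self.labels)

    def precision_score(self) -> float:
        return self._macro(lambda c: self._div(self.tp[c], self.tp[c] + self.fp[c]))

    def recall_score(self) -> float:
        return self._macro(lambda c: self._div(self.tp[c], self.tp[c] + self.fn[c]))

    def f1_score(self) -> float:
        # Per-label F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R) when defined
        return self._macro(
            lambda c: self._div(2 * self.tp[c], 2 * self.tp[c] + self.fp[c] + self.fn[c])
        )


# Usage: two chunks of a tiny three-class problem
m = StreamingMacroPRF(labels=[0, 1, 2])
m.step([0, 1, 2], [0, 2, 2])
m.step([1, 0], [1, 0])
print(round(m.precision_score(), 4), round(m.recall_score(), 4), round(m.f1_score(), 4))
# prints 0.8333 0.8333 0.7778
```

Because the result depends only on the accumulated counts, splitting the same data into different chunk sizes gives identical scores, which is the property the chunked example above relies on.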

Purpose of This Package

sklearn already provides good, mature functions for these metrics, for example:

  1. precision_score
  2. recall_score
  3. f1_score
  4. classification_report

So why write this project?

For example, when evaluating predictions:

from sklearn.metrics import precision_score
y_true = []  # ground-truth labels for the whole dataset
y_pred = []  # predicted labels for the whole dataset
precision_score(y_true=y_true, y_pred=y_pred)

If y_true or y_pred is very large, merely storing them already consumes a lot of memory, let alone computing the metrics on them.
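One way around this is to consume the labels as a stream and keep only aggregate counts. The sketch below is a hypothetical illustration in plain Python, not the umetrics API: `iter_batches` and `micro_prf_stream` are made-up names. It reads (true, predicted) pairs from any iterable, such as a generator over a large file, in fixed-size batches and keeps three integers for micro-averaged precision, recall and F1 over a chosen label set, treating undefined ratios as 0.

```python
from itertools import islice
from typing import Hashable, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def iter_batches(pairs: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield lists of at most batch_size pairs; only one batch is held at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(pairs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def micro_prf_stream(
    pairs: Iterable[Pair], labels: Iterable[Hashable], batch_size: int = 1024
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over `labels`, undefined ratios -> 0."""
    label_set: Set[Hashable] = set(labels)
    tp = fp = fn = 0  # the only state kept, regardless of dataset size
    for batch in iter_batches(pairs, batch_size):
        for t, p in batch:
            if t == p:
                if t in label_set:
                    tp += 1
            else:
                if p in label_set:
                    fp += 1
                if t in label_set:
                    fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Usage: the generator yields pairs lazily, so full label lists are never built
pairs = ((i % 3, (i * 2) % 3) for i in range(9))
print(micro_prf_stream(pairs, labels=[0, 1, 2], batch_size=4))
# each score is 1/3 for this data (3 TP, 6 FP, 6 FN)
```

In practice `pairs` could be a generator that parses one `true,pred` line at a time from a large file (a hypothetical format), so memory use stays bounded by the batch size rather than the dataset size.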

Release

Build a wheel with:

python setup.py bdist_wheel

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

u_umetrics-1.0.11-py3-none-any.whl (9.7 kB)

Uploaded Python 3

File details

Details for the file u_umetrics-1.0.11-py3-none-any.whl.

File metadata

  • Download URL: u_umetrics-1.0.11-py3-none-any.whl
  • Upload date:
  • Size: 9.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for u_umetrics-1.0.11-py3-none-any.whl
  • SHA256: 501837f14f2ad4b3e90b539879cba1d642539c0fc9b67ba12088f5106efb0d27
  • MD5: 0951cfc83c4144288e2cf8018d37cbff
  • BLAKE2b-256: 8713eca150f54938d3b74be2c73f63917bf36c8b8d640e0026465a255f6e0d92
