A Python package for row matching and F1 score calculations.

Tamarix Analytics

A Python package for row matching and F1 score calculations using the Hungarian algorithm.

Installation

pip install tamarix-analytics

Usage

from tamarix_analytics import match_rows, f1_score_unordered, f1_score_ordered, get_row_score

Methods

1. def match_rows(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> Sequence[Tuple[int, int]]

Finds the optimal assignment of rows between two lists of objects using the Hungarian algorithm. The assignment does not depend on the order of the objects in either list, so two tables can be compared even when their rows are in a different order or differ in number.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects to match tentative_data against.

Output: A list of 2-tuples where the first item is the index of an object in the ground truth list and the second item is the index of an object in the tentative list.
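
To illustrate the output format and the order invariance, here is a minimal sketch that finds an optimal assignment by brute force over permutations. It is not the package's implementation: the Hungarian algorithm reaches the same optimum in polynomial time, while this search is only practical for a handful of rows. The Row dataclass, the similarity measure (number of equal fields) and the name brute_force_match are assumptions made for the example.

```python
from dataclasses import dataclass, fields
from itertools import permutations


@dataclass
class Row:
    """Hypothetical row type standing in for a pydantic BaseModel."""
    name: str
    amount: float


def similarity(a: Row, b: Row) -> int:
    """Assumed similarity measure: number of fields with equal values."""
    return sum(getattr(a, f.name) == getattr(b, f.name) for f in fields(a))


def brute_force_match(tentative, ground_truth):
    """Return (ground_truth_index, tentative_index) pairs with maximal total similarity."""
    gt_is_shorter = len(ground_truth) <= len(tentative)
    short, long = sorted((len(ground_truth), len(tentative)))
    best_pairs, best_score = [], -1
    # Try every way of assigning the shorter list's rows to distinct rows of the longer one.
    for perm in permutations(range(long), short):
        pairs = [(i, j) if gt_is_shorter else (j, i) for i, j in enumerate(perm)]
        score = sum(similarity(tentative[t], ground_truth[g]) for g, t in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs


ground_truth = [Row("a", 1.0), Row("b", 2.0), Row("c", 3.0)]
tentative = [Row("c", 3.0), Row("a", 1.0)]  # reordered, one row missing
pairs = brute_force_match(tentative, ground_truth)
print(sorted(pairs))  # [(0, 1), (2, 0)]
```

Ground truth row 1 has no counterpart in the tentative list, so it does not appear in any pair.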

2. def f1_score_unordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

Calculates the F1 score between two lists of arbitrary objects without penalizing differences in the order of the objects. Internally uses the match_rows function to find the best mapping between the rows.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: F1 score as float
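
How a row mapping can turn into an F1 score is sketched below. The package does not document what it counts as a true positive, so the sketch assumes field-level counting: a field is a true positive when its value is equal in both rows of a matched pair. Rows are plain dicts here instead of BaseModel instances, and f1_from_pairs is a hypothetical name.

```python
def f1_from_pairs(tentative, ground_truth, pairs):
    """Field-level F1 given (ground_truth_index, tentative_index) pairs.

    Assumption: a true positive is a field whose value is equal in both
    rows of a matched pair; all rows are dicts with the same keys.
    """
    tp = sum(
        tentative[t][key] == value
        for g, t in pairs
        for key, value in ground_truth[g].items()
    )
    predicted = sum(len(row) for row in tentative)     # all predicted fields
    expected = sum(len(row) for row in ground_truth)   # all expected fields
    if tp == 0:
        return 0.0
    precision = tp / predicted
    recall = tp / expected
    return 2 * precision * recall / (precision + recall)


ground_truth = [{"name": "a", "amount": 1}, {"name": "b", "amount": 2}]
tentative = [{"name": "b", "amount": 5}, {"name": "a", "amount": 1}]
pairs = [(0, 1), (1, 0)]  # a mapping such as a row matcher might return
score = f1_from_pairs(tentative, ground_truth, pairs)
print(score)  # 0.75
```

Three of the four fields agree after matching, so precision and recall are both 0.75 even though the rows are in a different order.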

3. def f1_score_ordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

Calculates the F1 score, taking the structure and order of the compared tables into account.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: F1 score as float
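
One way to read "order-sensitive" is that row i of the tentative list is only ever compared with row i of the ground truth. The sketch below follows that reading; it is an assumption, not the package's documented behavior. Rows are plain dicts, true positives are counted per field, and f1_by_position is a hypothetical name.

```python
def f1_by_position(tentative, ground_truth):
    """Field-level F1 where rows are paired strictly by index.

    Assumption: all rows are dicts with the same keys; rows without a
    counterpart at the same index contribute no true positives.
    """
    tp = sum(
        t_row[key] == value
        for t_row, g_row in zip(tentative, ground_truth)
        for key, value in g_row.items()
    )
    predicted = sum(len(row) for row in tentative)     # TP + FP
    expected = sum(len(row) for row in ground_truth)   # TP + FN
    # F1 = 2*TP / (2*TP + FP + FN), which equals 2*TP / (predicted + expected)
    return 2 * tp / (predicted + expected) if tp else 0.0


ground_truth = [{"name": "a", "amount": 1}, {"name": "b", "amount": 2}]
same_order = [{"name": "a", "amount": 1}, {"name": "b", "amount": 2}]
swapped = [{"name": "b", "amount": 2}, {"name": "a", "amount": 1}]

print(f1_by_position(same_order, ground_truth))  # 1.0
print(f1_by_position(swapped, ground_truth))     # 0.0
```

The swapped table contains exactly the right rows, yet scores 0.0 under this reading because no field matches at the same position.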

4. def row_score(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

The ratio of the number of items in ground_truth to the number of items in tentative_data.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: Row score ratio as float
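
The ratio described above amounts to a single division. The sketch below uses the hypothetical name row_count_ratio; raising an error for an empty tentative list is an assumption, since the description does not say how that case is handled.

```python
def row_count_ratio(tentative_data, ground_truth):
    """Number of ground truth rows divided by number of tentative rows."""
    if not tentative_data:
        # Assumption: the ratio is treated as undefined for an empty tentative list.
        raise ValueError("tentative_data is empty, so the ratio is undefined")
    return len(ground_truth) / len(tentative_data)


ratio = row_count_ratio(["t1", "t2", "t3", "t4"], ["g1", "g2"])
print(ratio)  # 0.5
```

A value of 1.0 means both lists have the same number of rows, a value below 1.0 means the tentative list has more rows than the ground truth, and a value above 1.0 means it has fewer.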
