A Python package for row matching and F1 score calculations.

Tamarix Analytics

A Python package for row matching and F1 score calculations using the Hungarian algorithm.

Installation

pip install tamarix-analytics

Usage

from tamarix_analytics import match_rows, f1_score_unordered, f1_score_ordered, get_row_score

Methods

1. def match_rows(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> Sequence[Tuple[int, int]]

Finds the optimal assignment of rows between two lists of objects using the Hungarian algorithm. The optimal assignment is invariant to the order of the objects in the lists, so two tables can be compared even if the order or number of their rows differs.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects to match tentative_data against.

Output: A list of 2-tuples where the first item is the index of an object in the ground truth list and the second item is the index of an object in the tentative list.
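To make the idea of an optimal assignment concrete, here is a minimal sketch that does not use the package. It assumes that the similarity of two rows is the number of fields with equal values, and it replaces the Hungarian algorithm with a brute-force search over permutations, which finds an assignment of the same optimal total on small inputs. The Row class and the field_matches and brute_force_match helpers are hypothetical names for illustration and are not part of tamarix_analytics.

```python
from dataclasses import dataclass, asdict
from itertools import permutations


@dataclass
class Row:
    """Hypothetical row type standing in for a pydantic BaseModel."""
    name: str
    amount: float


def field_matches(a, b):
    """Count the fields whose values are equal in both rows (assumed similarity)."""
    da, db = asdict(a), asdict(b)
    return sum(1 for key, value in da.items() if key in db and db[key] == value)


def brute_force_match(tentative, ground_truth):
    """Return (ground_truth_index, tentative_index) pairs that maximize the
    total number of matching fields. Exhaustive search: small inputs only."""
    n_gt, n_t = len(ground_truth), len(tentative)
    if n_gt <= n_t:
        # Pick one distinct tentative row for every ground truth row.
        candidates = (list(zip(range(n_gt), perm))
                      for perm in permutations(range(n_t), n_gt))
    else:
        # Pick one distinct ground truth row for every tentative row.
        candidates = (sorted(zip(perm, range(n_t)))
                      for perm in permutations(range(n_gt), n_t))
    best_pairs, best_score = [], -1
    for pairs in candidates:
        score = sum(field_matches(ground_truth[g], tentative[t]) for g, t in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs


ground_truth = [Row("a", 1.0), Row("b", 2.0), Row("c", 3.0)]
tentative = [Row("c", 3.0), Row("a", 1.0)]  # shuffled, and one row missing
pairs = brute_force_match(tentative, ground_truth)
print(pairs)
```

The ground truth row "b" has no counterpart in the tentative list, so it is left out of the returned pairs. The pair order follows the output convention described above: ground truth index first, tentative index second.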

2. def f1_score_unordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

Calculates the F1 score between two lists of arbitrary objects without penalizing a wrong order of objects in the lists. Internally uses the match_rows function to find the best mapping between the rows.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: F1 score as float
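As a worked illustration of an order-independent F1 score, the sketch below computes precision, recall, and F1 from a row matching that is already known. It assumes that a true positive is a field whose value is equal in a matched pair of rows, that precision is taken over all tentative fields, and that recall is taken over all ground truth fields. The package may count cells differently. Rows are plain dictionaries here, and count_matching_cells and f1_from_counts are hypothetical helper names.

```python
def count_matching_cells(tentative, ground_truth, pairs):
    """Count fields with equal values across matched (gt_index, tentative_index) pairs."""
    return sum(
        1
        for gt_i, t_i in pairs
        for key, value in ground_truth[gt_i].items()
        if key in tentative[t_i] and tentative[t_i][key] == value
    )


def f1_from_counts(true_positives, n_predicted, n_actual):
    """Standard F1 = 2 * P * R / (P + R), returning 0.0 when it is undefined."""
    if true_positives == 0 or n_predicted == 0 or n_actual == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_actual
    return 2 * precision * recall / (precision + recall)


ground_truth = [
    {"name": "a", "amount": 1.0},
    {"name": "b", "amount": 2.0},
    {"name": "c", "amount": 3.0},
]
tentative = [
    {"name": "c", "amount": 3.0},
    {"name": "a", "amount": 1.0},
]
# Matching given as (ground_truth_index, tentative_index), as match_rows is documented to return.
pairs = [(0, 1), (2, 0)]

tp = count_matching_cells(tentative, ground_truth, pairs)
n_predicted = sum(len(row) for row in tentative)    # 4 tentative fields
n_actual = sum(len(row) for row in ground_truth)    # 6 ground truth fields
f1 = f1_from_counts(tp, n_predicted, n_actual)
print(tp, round(f1, 3))
```

With 4 matching fields, precision is 4/4 and recall is 4/6, which gives an F1 of 0.8 even though the tentative rows are in a different order.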

3. def f1_score_ordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

Calculates the F1 score with values compared position by position, so the structure and order of the tables being compared affect the result.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: F1 score as float
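For contrast with the unordered variant, the following sketch compares rows strictly by position. It is an illustration under assumptions, not the package's implementation: a true positive is taken to be a field with equal values in the rows at the same index, and rows beyond the length of the shorter list contribute no true positives. Rows are plain dictionaries and f1_ordered_sketch is a hypothetical name.

```python
def f1_ordered_sketch(tentative, ground_truth):
    """Position-wise F1: row i of tentative is compared only with row i of ground_truth."""
    # zip stops at the shorter list, so unpaired rows add no true positives.
    true_positives = sum(
        1
        for t_row, gt_row in zip(tentative, ground_truth)
        for key, value in gt_row.items()
        if key in t_row and t_row[key] == value
    )
    if true_positives == 0:
        return 0.0
    n_predicted = sum(len(row) for row in tentative)
    n_actual = sum(len(row) for row in ground_truth)
    precision = true_positives / n_predicted
    recall = true_positives / n_actual
    return 2 * precision * recall / (precision + recall)


ground_truth = [
    {"name": "a", "amount": 1.0},
    {"name": "b", "amount": 2.0},
    {"name": "c", "amount": 3.0},
]
# Row "b" is missing, so row "c" sits at index 1 instead of index 2.
tentative = [
    {"name": "a", "amount": 1.0},
    {"name": "c", "amount": 3.0},
]
ordered_f1 = f1_ordered_sketch(tentative, ground_truth)
print(round(ordered_f1, 3))
```

Only the first row lines up, so 2 of 4 tentative fields and 2 of 6 ground truth fields count as correct, for an F1 of 0.4. An order-independent comparison of the same data would credit row "c" as well.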

4. def row_score(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float

The ratio of the number of items in ground_truth to the number of items in tentative_data.

Input:

  • tentative_data: list[BaseModel] - list of arbitrary objects for comparison
  • ground_truth: list[BaseModel] - ground truth list of objects.

Output: Row score ratio as float
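The ratio described above can be written out in a few lines. This is a sketch of the stated formula only. How the package handles an empty tentative_data list is not documented here, so raising ValueError in that case is an assumption, and row_ratio_sketch is a hypothetical name.

```python
def row_ratio_sketch(tentative_data, ground_truth):
    """Number of ground truth rows divided by number of tentative rows."""
    if not tentative_data:
        # Assumption: the package's behavior for an empty list is unknown.
        raise ValueError("tentative_data must contain at least one row")
    return len(ground_truth) / len(tentative_data)


ratio = row_ratio_sketch(["row c", "row a"], ["row a", "row b", "row c"])
print(ratio)
```

A value above 1 means the tentative list has fewer rows than the ground truth, a value below 1 means it has more, and exactly 1 means the row counts agree.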

Download files

Source Distribution

tamarix_analytics-0.0.7.tar.gz (3.2 kB view details)

Uploaded Source

Built Distribution

tamarix_analytics-0.0.7-py3-none-any.whl (3.8 kB view details)

Uploaded Python 3
