A Python package for row matching and F1 score calculations.
Tamarix Analytics
A Python package for row matching and F1 score calculations using the Hungarian algorithm.
Installation
pip install tamarix-analytics
Usage
from tamarix_analytics import match_rows, f1_score_unordered, f1_score_ordered, get_row_score
Methods
1. def match_rows(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> Sequence[Tuple[int, int]]
Finds the optimal assignment of rows between two lists of objects using the Hungarian algorithm. The assignment is invariant to the order of the objects in the lists, which makes it possible to compare tables even when the order or number of rows differs.
Input:
- tentative_data: list[BaseModel] - list of arbitrary objects for comparison
- ground_truth: list[BaseModel] - ground truth list of objects to match tentative_data against.
Output: A list of 2-tuples where the first item is the index of an object in the ground truth list and the second item is the index of an object in the tentative list.
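To illustrate what an optimal, order-invariant assignment looks like, here is a minimal sketch. It is not the package's implementation: it replaces the Hungarian algorithm with an exhaustive search over permutations (only practical for a handful of rows), uses plain dicts instead of BaseModel objects, and assumes that similarity between two rows is the number of fields with equal values. The names field_matches and match_rows_sketch are hypothetical.

```python
from itertools import permutations


def field_matches(a: dict, b: dict) -> int:
    """Count the fields whose values are equal in both rows (assumed similarity measure)."""
    return sum(1 for key, value in a.items() if key in b and b[key] == value)


def match_rows_sketch(tentative: list[dict], ground_truth: list[dict]) -> list[tuple[int, int]]:
    """Return (ground_truth_index, tentative_index) pairs with the highest total match count.

    Exhaustive search stands in for the Hungarian algorithm here.
    """
    n_gt, n_tent = len(ground_truth), len(tentative)
    best_pairs: list[tuple[int, int]] = []
    best_score = -1
    if n_gt <= n_tent:
        # Give every ground-truth row a distinct tentative row.
        candidates = (list(enumerate(perm)) for perm in permutations(range(n_tent), n_gt))
    else:
        # Give every tentative row a distinct ground-truth row.
        candidates = (
            sorted((gt_idx, tent_idx) for tent_idx, gt_idx in enumerate(perm))
            for perm in permutations(range(n_gt), n_tent)
        )
    for pairs in candidates:
        score = sum(field_matches(ground_truth[g], tentative[t]) for g, t in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs


ground_truth = [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}, {"name": "c", "qty": 3}]
tentative = [{"name": "c", "qty": 3}, {"name": "a", "qty": 1}]

print(match_rows_sketch(tentative, ground_truth))  # [(0, 1), (2, 0)]
```

Reversing the tentative list changes only the indices in the result, not which rows are paired. The unmatched ground-truth row (index 1) does not appear in the output.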
2. def f1_score_unordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float
Calculates the F1 score between two lists of arbitrary objects without penalizing a different order of objects in the lists. Internally uses the match_rows function to find the best mapping between the rows.
Input:
- tentative_data: list[BaseModel] - list of arbitrary objects for comparison
- ground_truth: list[BaseModel] - ground truth list of objects.
Output: F1 score as float
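The following sketch shows one way an F1 score could be derived from a row mapping. The package does not document how it counts true positives, so the definitions here are assumptions: a true positive is a field whose value is equal in a matched pair of rows, precision divides by the total number of fields in the tentative rows, and recall divides by the total number of fields in the ground truth rows. The function name f1_from_pairs is hypothetical, rows are plain dicts, and the pairs are supplied by hand in the (ground_truth_index, tentative_index) order described for match_rows.

```python
def f1_from_pairs(
    tentative: list[dict],
    ground_truth: list[dict],
    pairs: list[tuple[int, int]],
) -> float:
    """F1 over field values, given (ground_truth_index, tentative_index) pairs."""
    # Assumption: a true positive is a field with equal values in a matched pair.
    true_positives = sum(
        1
        for gt_idx, tent_idx in pairs
        for key, value in ground_truth[gt_idx].items()
        if key in tentative[tent_idx] and tentative[tent_idx][key] == value
    )
    if true_positives == 0:
        return 0.0
    predicted_fields = sum(len(row) for row in tentative)
    actual_fields = sum(len(row) for row in ground_truth)
    precision = true_positives / predicted_fields
    recall = true_positives / actual_fields
    return 2 * precision * recall / (precision + recall)


ground_truth = [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}, {"name": "c", "qty": 3}]
tentative = [{"name": "c", "qty": 3}, {"name": "a", "qty": 1}]
pairs = [(0, 1), (2, 0)]  # ground truth row 0 <-> tentative row 1, and so on

print(round(f1_from_pairs(tentative, ground_truth, pairs), 3))  # 0.8
```

In this example precision is 4/4 and recall is 4/6, because the ground truth row without a counterpart contributes two missed fields.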
3. def f1_score_ordered(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float
Calculates the F1 score where values are checked with respect to the structure and order of the tables being compared.
Input:
- tentative_data: list[BaseModel] - list of arbitrary objects for comparison
- ground_truth: list[BaseModel] - ground truth list of objects.
Output: F1 score as float
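For contrast with the unordered variant, here is a small sketch of an order-sensitive score. It assumes that "ordered" means row i of the tentative list is compared only with row i of the ground truth list, and it reuses the same assumed field-level definition of precision and recall. The package may define this differently; f1_ordered_sketch is a hypothetical name and rows are plain dicts.

```python
def f1_ordered_sketch(tentative: list[dict], ground_truth: list[dict]) -> float:
    """F1 over field values, comparing rows strictly position by position."""
    # zip stops at the shorter list, so surplus rows only count as misses.
    true_positives = sum(
        1
        for gt_row, tent_row in zip(ground_truth, tentative)
        for key, value in gt_row.items()
        if key in tent_row and tent_row[key] == value
    )
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(len(row) for row in tentative)
    recall = true_positives / sum(len(row) for row in ground_truth)
    return 2 * precision * recall / (precision + recall)


ground_truth = [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}]
same_order = [{"name": "a", "qty": 1}, {"name": "b", "qty": 2}]
swapped = [{"name": "b", "qty": 2}, {"name": "a", "qty": 1}]

print(f1_ordered_sketch(same_order, ground_truth))  # 1.0
print(f1_ordered_sketch(swapped, ground_truth))     # 0.0
```

The swapped list contains exactly the right rows, so an unordered score would rate it as perfect; the positional comparison gives it zero.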
4. def row_score(tentative_data: list[BaseModel], ground_truth: list[BaseModel]) -> float
Returns the ratio of the number of items in the ground_truth list to the number of items in tentative_data.
Input:
- tentative_data: list[BaseModel] - list of arbitrary objects for comparison
- ground_truth: list[BaseModel] - ground truth list of objects.
Output: Row score ratio as float
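The ratio described above can be sketched in a few lines. The handling of an empty tentative list is an assumption (the description does not say what the package returns in that case), and row_score_sketch is a hypothetical name.

```python
def row_score_sketch(tentative: list, ground_truth: list) -> float:
    """Number of ground truth rows divided by number of tentative rows."""
    if not tentative:
        # Assumption: avoid division by zero by returning 0.0.
        return 0.0
    return len(ground_truth) / len(tentative)


print(row_score_sketch(["r1", "r2"], ["g1", "g2", "g3"]))  # 1.5
```

A value above 1 means the tentative list has fewer rows than the ground truth, and a value below 1 means it has more.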
File details
Details for the file tamarix_analytics-0.0.7.tar.gz.
File metadata
- Download URL: tamarix_analytics-0.0.7.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f17f914e11c6bcbec95765c562364aa4811c34f9ea3a5f5ace4865c4370da13b |
| MD5 | 49d4bdb844aa4c1b5aaf8debb5722aeb |
| BLAKE2b-256 | 2b9a9cd12d8ee626c547880555a02f7facf2d3bbc680d754185c7a5e089e591a |
File details
Details for the file tamarix_analytics-0.0.7-py3-none-any.whl.
File metadata
- Download URL: tamarix_analytics-0.0.7-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 592f9717ae33fa50b661e059164887f9f58696060e228ceab0daefb701ed837f |
| MD5 | e3c3eb7b5eca83cfc6dd630c38f90be0 |
| BLAKE2b-256 | f27ce81732d6c7d9f5057059ba1c590e9b7055b1fdb0a2a1ce23b4878ac92641 |