Fit & Predict Utils

fitxf

Simple math utility library

Basic math that doesn't exist in numpy as a single function call
Simple math for finding the optimal number of clusters, which is not in sklearn
Simple graph wrappers
Simple transform wrappers to simplify tensor transforms or compression via clustering (Euclidean distance or cosine) or PCA
allows searches using the transformed data
allows saving/loading the transform model to/from a string, and fine-tuning

pip install fitxf

Basic Math

Cosine or dot similarity between multi-dim vectors

import numpy as np
from fitxf import TensorUtils
ts = TensorUtils()
x = np.random.rand(5,3)
y = np.random.rand(10,3)

# Cosine similarity, find closest matches of each vector in x
# with all vectors in ref
# For Euclidean distance, just replace with "similarity_distance"
ts.similarity_cosine(x=x, ref=x, return_tensors='np')
ts.similarity_cosine(x=y, ref=y, return_tensors='np')
matches, dotsim = ts.similarity_cosine(x=x, ref=y, return_tensors='np')
print("matches",matches)
print("dot similarities",dotsim)
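For readers who want to see what such a call computes, here is a minimal numpy sketch of cosine matching. It is not fitxf's implementation; the helper name `cosine_matches` is hypothetical, and it assumes every row is non-zero.

```python
import numpy as np

def cosine_matches(x, ref):
    """Rank rows of `ref` by cosine similarity to each row of `x` (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Normalise every row to unit length so the dot product equals the cosine
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    ref_unit = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sims = x_unit @ ref_unit.T                 # shape (len(x), len(ref))
    order = np.argsort(-sims, axis=1)          # indices into ref, best match first
    return order, np.take_along_axis(sims, order, axis=1)

x = np.array([[1.0, 0.0], [1.0, 2.0]])
ref = np.array([[0.0, 2.0], [3.0, 0.0], [2.0, 2.0]])
matches, sims = cosine_matches(x, ref)
print("matches", matches)
print("similarities", sims)
```

Each row of `matches` lists indices into `ref` from most to least similar, and `sims` holds the corresponding cosine values in the same order.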

Clustering

Automatic clustering into an optimal number of clusters n, chosen heuristically

Case 1: All clusters (think towns) are almost-equally spaced

In this case, suppose the optimal number of cluster centers is n (think salesmen).
If the number of clusters k<n, then each salesman needs to cover a larger area, and their average distance from each other is smaller.
If the number of clusters k>n, then things become a bit crowded, with more than 1 salesman covering a single town.
Thus at the transition from n --> n+1 clusters, the average distance between cluster centers will decrease.
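The Case 1 reasoning can be tried out with a small numpy-only sketch. This is a simplification and not the library's algorithm: it runs a plain k-means for several k and picks the k with the largest mean distance between centers, rather than looking for turning points. The helpers `farthest_point_init`, `kmeans` and `mean_center_distance` are hypothetical names, and the seeding is deterministic only to make the example reproducible.

```python
import numpy as np

def farthest_point_init(x, k):
    """Deterministic seeding: start at x[0], then keep adding the point farthest from the seeds."""
    chosen = [0]
    while len(chosen) < k:
        d = np.linalg.norm(x[:, None, :] - x[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(np.argmax(d)))
    return x[chosen].copy()

def kmeans(x, k, max_iter=100):
    """Plain Lloyd iterations; returns the final centers."""
    centers = farthest_point_init(x, k)
    for _ in range(max_iter):
        labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Recompute each center as the mean of its points (keep the old one if empty)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def mean_center_distance(centers):
    """Average Euclidean distance over all pairs of centers."""
    i, j = np.triu_indices(len(centers), k=1)
    return float(np.linalg.norm(centers[i] - centers[j], axis=1).mean())

x = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5], [2, 1, 8], [0, 2, 6],
], dtype=float)
scores = {k: mean_center_distance(kmeans(x, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores)
print("best k", best_k)
```

On this toy data (three well-separated groups) the mean center distance peaks at k=3 and drops at k=4, which is the decrease described above.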

Case 2: Some clusters are spaced much farther apart

In this case, there will be multiple turning points, and we may take either an earlier or a later turning point

Optimal cluster by Euclidean Distance

import numpy as np
from fitxf import Cluster
x = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5], [2, 1, 8], [0, 2, 6],
])
obj = Cluster()
obj.kmeans_optimal(
    x = x,
    estimate_min_max = True,
)

Optimal cluster by cosine distance

import numpy as np
from fitxf import ClusterCosine
x = np.random.rand(20,3)
ClusterCosine().kmeans_optimal(x=x)
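As background on how cosine and Euclidean clustering relate: for unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so ranking neighbours by one is equivalent to ranking by the other. The check below verifies that identity with numpy. Whether `ClusterCosine` relies on it internally is not stated here and should be treated as an open assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((20, 3))

# Project every row onto the unit sphere
unit = x / np.linalg.norm(x, axis=1, keepdims=True)

cos_sim = unit @ unit.T
sq_dist = ((unit[:, None, :] - unit[None, :, :]) ** 2).sum(axis=2)

# For unit vectors a, b: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b = 2 - 2 cos(a, b)
identity_holds = np.allclose(sq_dist, 2 - 2 * cos_sim)
print(identity_holds)
```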

Graph Wrappers

Simple directed graph with Dijkstra or simple path.

from fitxf import GraphUtils
gu = GraphUtils()
G = gu.create_multi_graph(
    edges = [
        {'key': 'plane', 'u': 'Shanghai', 'v': 'Tokyo', 'distance': 10},
        {'key': 'ship', 'u': 'Shanghai', 'v': 'Tokyo', 'distance': 100},
        {'key': 'plane', 'u': 'Tokyo', 'v': 'Shanghai', 'distance': 22},
        {'key': 'plane', 'u': 'Tokyo', 'v': 'Seoul', 'distance': 5},
        {'key': 'plane', 'u': 'Seoul', 'v': 'Tokyo', 'distance': 6},
        {'key': 'ship', 'u': 'Seoul', 'v': 'Tokyo', 'distance': 60},
    ],
    col_weight = 'distance',
    directed = True,
)
# Shanghai-->Tokyo-->Seoul, total weight 15
print(gu.get_paths(G=G, source="Shanghai", target="Seoul", method="dijkstra"))
# Shanghai-->Tokyo-->Seoul, total weight 105
print(gu.get_paths(G=G, source="Shanghai", target="Seoul", method="simple", agg_weight_by="max"))
# Seoul-->Tokyo-->Shanghai, total weight 28
print(gu.get_paths(G=G, source="Seoul", target="Shanghai", method="dijkstra"))
# Seoul-->Tokyo-->Shanghai, total weight 82
print(gu.get_paths(G=G, source="Seoul", target="Shanghai", method="simple", agg_weight_by="max"))
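The totals in the comments above can be checked by hand with a standard-library Dijkstra. This sketch is not how `GraphUtils` works internally; it assumes that parallel edges between the same pair of nodes are collapsed into one weight (minimum for the shortest path, maximum for the `agg_weight_by="max"` case), which is one reading that reproduces the four totals. The function `shortest_path` and the tuple layout of `edges` are hypothetical.

```python
import heapq
from collections import defaultdict

# (key, u, v, distance), same data as the example above
edges = [
    ("plane", "Shanghai", "Tokyo", 10),
    ("ship", "Shanghai", "Tokyo", 100),
    ("plane", "Tokyo", "Shanghai", 22),
    ("plane", "Tokyo", "Seoul", 5),
    ("plane", "Seoul", "Tokyo", 6),
    ("ship", "Seoul", "Tokyo", 60),
]

def shortest_path(edges, source, target, agg=min):
    """Dijkstra on a directed multigraph; parallel u->v edges are collapsed with `agg`."""
    grouped = defaultdict(list)
    for _key, u, v, w in edges:
        grouped[(u, v)].append(w)
    adj = defaultdict(list)
    for (u, v), weights in grouped.items():
        adj[u].append((v, agg(weights)))

    best = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj[node]:
            cand = dist + w
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from target to source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]

print(shortest_path(edges, "Shanghai", "Seoul"))            # total weight 15
print(shortest_path(edges, "Shanghai", "Seoul", agg=max))   # total weight 105
print(shortest_path(edges, "Seoul", "Shanghai"))            # total weight 28
print(shortest_path(edges, "Seoul", "Shanghai", agg=max))   # total weight 82
```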

Fit Transform

Convenient wrapper

fit a set of vectors into a compressed representation (PCA, clusters, etc.)
predict arbitrary vectors via cosine similarity or Euclidean distance
fine-tune

Sample code for basic training to transform data -

from fitxf import FitXformPca, FitXformCluster
import numpy as np
x = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5], [2, 1, 8], [0, 2, 6],
])
user_labels = [
    'a', 'a', 'a',
    'b', 'b', 'b',
    'c', 'c', 'c',
]
pca = FitXformPca()
res_fit_pca = pca.fit_optimal(X=x, X_labels=user_labels)
print('X now reduced to\n',res_fit_pca['X_transform'])

cls = FitXformCluster()
res_fit_cls = cls.fit_optimal(X=x, X_labels=user_labels)
print('X now reduced to\n',res_fit_cls['X_transform'])

pca.predict(X=x+np.random.rand(9,3))
cls.predict(X=x+np.random.rand(9,3))

From the above code sample, you will see that the original X of 3 dimensions was reduced to 2.

X now reduced to
 [[ 3.31282121 -0.0595825 ]
 [ 5.50813438  0.41970015]
 [ 4.04773021 -1.54803569]
 [-1.35557188  3.11843837]
 [-1.16795985  4.47103187]
 [-2.95485944  3.21899561]
 [-1.77834736 -2.52015183]
 [-2.23431234 -4.68080139]
 [-3.37763491 -2.41959458]]

whereas for clustering

X now reduced to
 [2 2 2 0 0 0 1 1 1]

where each point in 3 dimensions is represented by only a scalar center label.
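To make the PCA reduction concrete, the following numpy sketch projects the same 9 points onto their top 2 principal directions using an SVD. It is an illustration only: `pca_reduce` is a hypothetical helper, it centers the data first (which `FitXformPca` may or may not do), and the signs and exact values need not match the output printed above.

```python
import numpy as np

def pca_reduce(x, n_components):
    """Project x onto its top principal directions (illustrative, centers the data first)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_components]
    return (x - mean) @ components.T, components, mean

x = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5], [2, 1, 8], [0, 2, 6],
])
x_2d, components, mean = pca_reduce(x, n_components=2)
x_back = x_2d @ components + mean   # approximate reconstruction in 3 dimensions

print("reduced shape", x_2d.shape)
print("max reconstruction error", np.abs(x - x_back).max())
```

The reconstruction error shows how much information the dropped third component carried for this data.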

Save Model to String & Load Back

Sample code to save and load model -

# Save this Base64 string somewhere
model_save = pca.model_to_b64json(numpy_to_base64_str=True, dump_to_b64json_str=True)

# Load back into new instance
new = FitXformPca()
new.load_model_from_b64json(model_b64json=model_save)
new.predict(X=x+np.random.rand(9,3))
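The general idea of storing numpy model parameters as a Base64 JSON string can be sketched with the standard library. The format below is an assumption made for illustration and is not fitxf's actual serialization; `arrays_to_b64json` and `arrays_from_b64json` are hypothetical helpers.

```python
import base64
import io
import json

import numpy as np

def arrays_to_b64json(arrays):
    """Encode a dict of numpy arrays as one Base64 string wrapping a JSON document."""
    payload = {}
    for name, arr in arrays.items():
        buf = io.BytesIO()
        np.save(buf, arr, allow_pickle=False)   # .npy bytes keep dtype and shape
        payload[name] = base64.b64encode(buf.getvalue()).decode("ascii")
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")

def arrays_from_b64json(text):
    """Inverse of arrays_to_b64json."""
    payload = json.loads(base64.b64decode(text).decode("utf-8"))
    return {
        name: np.load(io.BytesIO(base64.b64decode(b64)), allow_pickle=False)
        for name, b64 in payload.items()
    }

model = {
    "centers": np.array([[6.0, 1.0, 1.5], [1.0, 6.0, 1.5]]),
    "labels": np.array([0, 0, 1, 1]),
}
saved = arrays_to_b64json(model)
restored = arrays_from_b64json(saved)
print(type(saved).__name__, len(saved))
print(restored["centers"])
```

Because the result is a plain ASCII string, it can be kept in a database column or a text file and loaded back later.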

Fine Tuning

After saving models & loading back, one can fine tune with new points.

Test fine tune with same centers and data

cls = FitXformCluster()
res_fit_cls = cls.fit_optimal(X=x, X_labels=user_labels)
centers = res_fit_cls["centers"]

res = cls.fine_tune(
   X = x,
   X_labels = user_labels,
   n_components = 3,
)
for k, v in res.items():
    print(k, v)
print('Expect 1 iteration, got ', res["n_iter"])

print('old centers',centers)
print('new centers',res['centers'])

Fine tune with new points

x = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    # remove last 2 points
    [1, 1, 5], # [2, 1, 8], [0, 2, 6],
    # new points
    [4, 2, 1], [0, 6, 2], [1, 1, 7],
])
user_labels = [
    'a', 'a', 'a',
    'b', 'b', 'b',
    # remove last 2 points
    'c', # 'c', 'c',
    'a', 'b', 'c',
]
res_fit_new = cls.fine_tune(X=x, X_labels=user_labels, n_components=3)

print('old centers',centers)
print('new centers',res_fit_new['centers'])
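One way to picture fine-tuning of cluster centers is to restart the k-means iterations from the previously fitted centers instead of from scratch. The sketch below does this with numpy; it is an assumption about the general technique, not a description of `fine_tune` itself, and `fine_tune_centers` is a hypothetical helper.

```python
import numpy as np

def fine_tune_centers(x, centers, max_iter=100):
    """Run Lloyd iterations starting from existing centers; returns (centers, labels, n_iter)."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for n_iter in range(1, max_iter + 1):
        labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels, n_iter

x_old = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5], [2, 1, 8], [0, 2, 6],
], dtype=float)
# Centers of the three groups 'a', 'b', 'c' (three consecutive rows each)
old_centers = x_old.reshape(3, 3, 3).mean(axis=1)

# Same data and same centers: nothing moves, so one iteration is enough
_, _, n_iter_same = fine_tune_centers(x_old, old_centers)

x_new = np.array([
    [5, 1, 1], [8, 2, 1], [6, 0, 2],
    [1, 5, 1], [2, 7, 1], [0, 6, 2],
    [1, 1, 5],
    [4, 2, 1], [0, 6, 2], [1, 1, 7],
], dtype=float)
new_centers, new_labels, n_iter_new = fine_tune_centers(x_new, old_centers)

print("iterations with same data", n_iter_same)
print("iterations with new data", n_iter_new)
print("old centers", old_centers)
print("new centers", new_centers)
```

Starting from the old centers keeps the cluster indices stable, so labels assigned before fine-tuning remain meaningful afterwards.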

Tensor DB Models

The math is simple, but the technical implementation is orders of magnitude more complicated. Transformed data needs to be written to storage for subsequent searches, the model and data need to be kept in sync, etc.

To be updated.

Miscellaneous

Project details

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

Version	Release date
1.2.5 (this version)	Jun 2, 2025
1.2.4	May 29, 2025
1.2.3	May 29, 2025
1.2.2	Apr 25, 2025
1.2.1	Apr 9, 2025
1.2.0	Apr 2, 2025
1.1.4	Mar 13, 2025
1.1.3	Mar 10, 2025
1.1.2	Mar 5, 2025
1.1.1	Feb 21, 2025
1.1.0	Feb 19, 2025
1.0.9	Feb 8, 2025
1.0.8	Dec 30, 2024
1.0.6	Dec 7, 2024
1.0.5	Nov 16, 2024
1.0.4	Nov 2, 2024
0.3.1	Sep 30, 2024
0.3.0	Sep 29, 2024
0.2.57	Sep 8, 2024
0.2.56	Sep 2, 2024
0.2.54	Aug 26, 2024
0.2.53	Aug 26, 2024
0.2.52	Aug 26, 2024
0.2.50	Aug 20, 2024
0.2.46	Jun 28, 2024
0.2.44	Jun 28, 2024
0.2.29	Jun 20, 2024

Download files

Source Distribution

fitxf-1.2.5.tar.gz (341.6 kB)

Uploaded Jun 2, 2025 Source

Built Distribution

fitxf-1.2.5-py3-none-any.whl (376.3 kB)

Uploaded Jun 2, 2025 Python 3

File details

Details for the file fitxf-1.2.5.tar.gz.

File metadata

Download URL: fitxf-1.2.5.tar.gz
Upload date: Jun 2, 2025
Size: 341.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for fitxf-1.2.5.tar.gz
Algorithm	Hash digest
SHA256	`5bcb44716b2ceb2b66db2a6fbb1431240866736a18d6b5f0d12e182493eec614`
MD5	`597cec28e2d838e303fe89aca5123cff`
BLAKE2b-256	`61be37db8204c12e23b707ffe65ec87bf41342d6672acbff88c9266fee39fbc7`

File details

Details for the file fitxf-1.2.5-py3-none-any.whl.

File metadata

Download URL: fitxf-1.2.5-py3-none-any.whl
Upload date: Jun 2, 2025
Size: 376.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for fitxf-1.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`47db6b16f8bde45af3960b4796f07f11d8019392f46fda6a3933a1a3e43880db`
MD5	`a3f64f6e02f00ad1e67eb9dceb35a9fe`
BLAKE2b-256	`d57b245ba4ad8977710a763f748eea0a615eea33466284eb91fda6e30dbc4219`
