A simple tool for benchmarking and tracking machine learning models and experiments.

Xetrack

xetrack is a lightweight package for tracking experiment and benchmark data using duckdb. It looks and feels like pandas and is very easy to use.

Each instance of the tracker has a "track_id" which is a unique identifier for a single run.

Features

Simple
Embedded
Fast
Pandas-like
SQL-like
Multiprocessing reads and writes
Loguru text integration for file monitoring

Installation

pip install xetrack

Quickstart

from xetrack import Tracker

tracker = Tracker('database.db',
                  params={'model': 'resnet18'}
                  )
tracker.log(accuracy=0.9, loss=0.1, epoch=1)
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 1, 'model': 'resnet18', 'timestamp': '18-08-2023 11:02:35.162360',
 'track_id': 'cd8afc54-5992-4828-893d-a4cada28dba5'}

tracker.latest
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 1, 'model': 'resnet18', 'timestamp': '18-08-2023 11:02:35.162360',
 'track_id': 'cd8afc54-5992-4828-893d-a4cada28dba5'}


tracker.to_df(all=True)  # as dataframe
                    timestamp                              track_id     model  loss  epoch  accuracy
0  26-09-2023 12:17:00.342814  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      1       0.9
1  26-09-2023 12:17:29.771021  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      2       0.9

Params are values which are added to every future row:

tracker.set_params({'model': 'resnet18', 'dataset': 'cifar10'})
tracker.log(accuracy=0.9, loss=0.1, epoch=2)
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 2, 'model': 'resnet18', 'dataset': 'cifar10', 
 'timestamp': '26-09-2023 12:18:40.151756', 'track_id': '398c985a-dc15-42da-88aa-6ac6cbf55794'}
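
The effect of params can be pictured as a dictionary merged into every row logged from that point on. The sketch below is not xetrack's implementation: `MiniTracker` is a hypothetical stand-in that uses only the standard library, written to illustrate the semantics described above (one track_id per run, and params that apply to future rows while earlier rows stay untouched).

```python
import uuid
from datetime import datetime


class MiniTracker:
    """Hypothetical stand-in for illustration; not part of xetrack."""

    def __init__(self, params=None):
        self.track_id = str(uuid.uuid4())  # one identifier per run
        self.params = dict(params or {})
        self.rows = []

    def set_params(self, params):
        # Only rows logged after this call pick up the new params.
        self.params.update(params)

    def log(self, **values):
        row = {**values, **self.params,
               "timestamp": datetime.now().isoformat(),
               "track_id": self.track_id}
        self.rows.append(row)
        return row


mini = MiniTracker(params={"model": "resnet18"})
first = mini.log(accuracy=0.9, loss=0.1, epoch=1)
mini.set_params({"dataset": "cifar10"})
second = mini.log(accuracy=0.9, loss=0.1, epoch=2)

print("dataset" in first)  # False
print(second["dataset"])   # cifar10
```

Both rows share the same track_id because they come from the same instance, which is what makes them one run.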

You can also set a value for an entire run, including rows that were already logged, with set_value ("back in time"):

tracker.set_value('test_accuracy', 0.9)
tracker.to_df()

                    timestamp                              track_id     model  loss  epoch  accuracy  dataset  test_accuracy
0  26-09-2023 12:17:00.342814  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      1       0.9      NaN            0.9
2  26-09-2023 12:18:40.151756  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      2       0.9  cifar10            0.9
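
Conceptually, setting a value "back in time" resembles adding a column and updating every existing row that belongs to one run. The following is a minimal sketch of that idea using Python's built-in sqlite3 module. The table layout and the run names are made up for illustration, and this is not necessarily how xetrack implements set_value.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE events (track_id TEXT, epoch INTEGER, accuracy REAL)")
demo.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("run-a", 1, 0.9), ("run-a", 2, 0.9), ("run-b", 1, 0.7)],
)

# Add the new column, then fill it for every existing row of a single run.
demo.execute("ALTER TABLE events ADD COLUMN test_accuracy REAL")
demo.execute("UPDATE events SET test_accuracy = ? WHERE track_id = ?", (0.9, "run-a"))

rows = demo.execute(
    "SELECT track_id, epoch, test_accuracy FROM events ORDER BY track_id, epoch"
).fetchall()
print(rows)  # [('run-a', 1, 0.9), ('run-a', 2, 0.9), ('run-b', 1, None)]
```

Rows from the other run keep an empty value in the new column, which matches the per-run scope described above.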

Track functions

You can track any function.

The return value is logged before it is returned to the caller.

tracker = Tracker('database.db', log_system_params=True, log_network_params=True, measurement_interval=0.1)
image = tracker.track(read_image, *args, **kwargs)
tracker.latest
{'result': 571084, 'name': 'read_image', 'time': 0.30797290802001953, 'error': '', 'disk_percent': 0.6,
 'p_memory_percent': 0.496507, 'cpu': 0.0, 'memory_percent': 32.874608, 'bytes_sent': 0.0078125,
 'bytes_recv': 0.583984375}

Or with a wrapper:

@tracker.wrap(params={'name':'foofoo'})
def foo(a: int, b: str):
    return a + len(b)
result = foo(1, 'hello')
tracker.latest
{'function_name': 'foo', 'args': "[1, 'hello']", 'kwargs': '{}', 'error': '', 'function_time': 4.0531158447265625e-06, 
 'function_result': 6, 'name': 'foofoo', 'disk_percent': 0, 'p_memory_percent': 0, 'cpu': 0, 'memory_percent': 0, 
 'bytes_sent': 0.0, 'bytes_recv': 0.0, 'model': 'resnet18', 'dataset': 'cifar10', 'timestamp': '26-09-2023 12:21:02.200245', 
 'track_id': '398c985a-dc15-42da-88aa-6ac6cbf55794'}
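
To make the wrapper idea concrete, here is a rough standard-library sketch of a decorator that times a call and records its arguments, result and error before handing the result back. The names `wrap` and `records` are hypothetical stand-ins, the field names are copied from the sample output above, and re-raising the exception is an assumption of this sketch rather than documented xetrack behaviour.

```python
import functools
import time

records = []  # hypothetical in-memory store standing in for the database


def wrap(params=None):
    """Hypothetical decorator factory; mirrors the idea, not xetrack's code."""
    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            error, result = "", None
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise  # assumption: errors are recorded, then propagated
            finally:
                # The record is stored before the value reaches the caller.
                records.append({
                    "function_name": func.__name__,
                    "args": str(list(args)),
                    "kwargs": str(kwargs),
                    "error": error,
                    "function_time": time.perf_counter() - start,
                    "function_result": result,
                    **(params or {}),
                })
            return result
        return inner
    return decorator


@wrap(params={"name": "foofoo"})
def foo(a: int, b: str):
    return a + len(b)


print(foo(1, "hello"))               # 6
print(records[-1]["function_name"])  # foo
```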

Tips and tricks

Tracker(Tracker.IN_MEMORY) lets you run entirely in memory, without a database file.

Pandas-like

print(tracker)
                                    _id                              track_id                 date    b    a  accuracy
0  48154ec7-1fe4-4896-ac66-89db54ddd12a  fd0bfe4f-7257-4ec3-8c6f-91fe8ae67d20  16-08-2023 00:21:46  2.0  1.0       NaN
1  8a43000a-03a4-4822-98f8-4df671c2d410  fd0bfe4f-7257-4ec3-8c6f-91fe8ae67d20  16-08-2023 00:24:21  NaN  NaN       1.0

tracker['accuracy'] # get accuracy column
tracker.to_df() # get pandas dataframe of current run

SQL-like

You can filter the data with SQL through duckdb.

The sqlite database is attached as db, and the table is named events:

tracker.conn.execute("SELECT * FROM db.events WHERE accuracy > 0.8").fetchall()
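
When the threshold comes from a variable, it is usually safer to bind it as a query parameter than to format it into the SQL string. The sketch below shows the pattern with Python's built-in sqlite3 module so that it runs without extra dependencies. The events schema and rows are made up for illustration; in a real tracker database the columns depend on what was logged.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")  # stand-in for a tracker database file
demo_conn.execute("CREATE TABLE events (track_id TEXT, model TEXT, accuracy REAL)")
demo_conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("run-a", "resnet18", 0.9), ("run-b", "resnet18", 0.75), ("run-c", "resnet50", 0.85)],
)

threshold = 0.8
# Bind the threshold as a parameter instead of formatting it into the SQL text.
good_runs = demo_conn.execute(
    "SELECT track_id, accuracy FROM events WHERE accuracy > ? ORDER BY accuracy DESC",
    (threshold,),
).fetchall()
print(good_runs)  # [('run-a', 0.9), ('run-c', 0.85)]
```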

Logger integration

This is very useful in an environment where you can use normal logs and don't want to manage a separate logger or file.
One great use case is model monitoring.

pip install xetrack[loguru]

logs_stdout=True prints every tracked event to stdout.
logs_path='logs' writes the logs to a file.

$ Tracker(db=Tracker.IN_MEMORY, logs_path='logs', logs_stdout=True).log(accuracy=0.9)
2023-12-14 21:46:55.290 | TRACKING | xetrack.logging:log:69!📁!{"a": 1, "b": 2, "timestamp": "2023-12-14 21:46:55.290098", "track_id": "marvellous-stork-4885"}

$ Reader.read_logs(path='logs')
   accuracy                   timestamp                track_id
0       0.9  2023-12-14 21:47:48.375258  unnatural-polecat-1380
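
Reader.read_logs is the intended way to load these files. Purely as an illustration of the line format, the sketch below parses one tracking line by hand. The delimiter is inferred from the sample line above and should be treated as an assumption, and `parse_tracking_line` is a hypothetical helper, not part of xetrack.

```python
import json

DELIMITER = "!📁!"  # inferred from the sample log line; an assumption

line = ('2023-12-14 21:46:55.290 | TRACKING | xetrack.logging:log:69!📁!'
        '{"a": 1, "b": 2, "timestamp": "2023-12-14 21:46:55.290098", '
        '"track_id": "marvellous-stork-4885"}')


def parse_tracking_line(text):
    """Return the JSON payload of a tracking line, or None for other lines."""
    prefix, sep, payload = text.partition(DELIMITER)
    if not sep or "TRACKING" not in prefix:
        return None
    return json.loads(payload)


event = parse_tracking_line(line)
print(event["track_id"])  # marvellous-stork-4885
```

Lines without the delimiter, such as ordinary application logs in the same file, are skipped by returning None.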

Analysis

To get the data of all runs in the database, use Reader. The resulting dataframe can be used for further analysis and plotting.

This works even while another process is writing to the database.

from xetrack import Reader
df = Reader('database.db').to_df()

Model Monitoring

Here is how we can save logs on any server and monitor them with xetrack:
We want to print logs to a file or stdout to be captured normally.
We save memory by not inserting the data into the database (although inserting it is also fine). Later we can read the logs and do fancy visualisation, online/offline analysis, build dashboards, etc.

tracker = Tracker(db=Tracker.SKIP_INSERT, logs_path='logs',logs_stdout=True)
tracker.logger.monitor("<dict or pandas DataFrame>") # -> write to logs in a structured way, consistent by schema, no database file needed


df = Reader.read_logs(path='logs')
"""
Run drift analysis and outlier detection on your logs: 
"""

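As one very simple example of outlier detection on a logged metric, the sketch below flags values that lie far from the mean in units of sample standard deviation. The accuracy values are made up, `zscore_outliers` is a hypothetical helper, and the threshold of 2.0 is an arbitrary choice. With real logs the values would come from a column of the dataframe returned by Reader.read_logs.

```python
from statistics import mean, stdev

# Made-up accuracy values standing in for a column read from the logs.
accuracies = [0.90, 0.91, 0.89, 0.92, 0.90, 0.55, 0.91]


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample std devs from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


print(zscore_outliers(accuracies))  # [(5, 0.55)]
```
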
ML tracking

tracker.logger.experiment(<model evaluation and params>) # -> prettily write to logs

df = Reader.read_logs(path='logs')
"""
Run fancy visualisation, online/offline analysis, build dashboards etc.
"""

Merge two databases

If you have two databases, and you want to merge them into one, you can use the copy function:

python -c 'from xetrack import copy; copy(source="db1.db", target="db2.db")'
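
For intuition, merging two database files can be pictured as attaching the source to the target and appending its rows. The sketch below does this with Python's built-in sqlite3 module on two temporary files. It assumes both files share an identical, made-up events schema and is not a description of how xetrack's copy works internally.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "db1.db")
target_path = os.path.join(workdir, "db2.db")

# Build two small databases with the same (made-up) schema.
for path, sample_rows in [(source_path, [("run-a", 0.9)]), (target_path, [("run-b", 0.8)])]:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE events (track_id TEXT, accuracy REAL)")
    db.executemany("INSERT INTO events VALUES (?, ?)", sample_rows)
    db.commit()
    db.close()

# Attach the source file to the target connection and append its rows.
target = sqlite3.connect(target_path)
target.execute("ATTACH DATABASE ? AS src", (source_path,))
target.execute("INSERT INTO events SELECT * FROM src.events")
target.commit()

merged = target.execute("SELECT track_id FROM events ORDER BY track_id").fetchall()
print(merged)  # [('run-a',), ('run-b',)]
target.close()
```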
