A simple tool for benchmarking and tracking machine learning models and experiments.

Xetrack

xetrack is a lightweight package for tracking experiment and benchmark data using duckdb. Its interface resembles pandas and is easy to use.

Each instance of the tracker has a "track_id" which is a unique identifier for a single run.

Features

  • Simple
  • Embedded
  • Fast
  • Pandas-like
  • SQL-like
  • Multiprocessing reads and writes

Installation

pip install xetrack

Quickstart

from xetrack import Tracker

tracker = Tracker('database.db',
                  params={'model': 'resnet18'}
                  )
tracker.log(accuracy=0.9, loss=0.1, epoch=1)
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 1, 'model': 'resnet18', 'timestamp': '18-08-2023 11:02:35.162360',
 'track_id': 'cd8afc54-5992-4828-893d-a4cada28dba5'}

tracker.latest
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 1, 'model': 'resnet18', 'timestamp': '18-08-2023 11:02:35.162360',
 'track_id': 'cd8afc54-5992-4828-893d-a4cada28dba5'}


tracker.to_df(all=True)  # as dataframe
                    timestamp                              track_id     model  loss  epoch  accuracy
0  26-09-2023 12:17:00.342814  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      1       0.9
1  26-09-2023 12:17:29.771021  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      2       0.9

Params are values which are added to every future row:

tracker.set_params({'model': 'resnet18', 'dataset': 'cifar10'})
tracker.log(accuracy=0.9, loss=0.1, epoch=2)
{'accuracy': 0.9, 'loss': 0.1, 'epoch': 2, 'model': 'resnet18', 'dataset': 'cifar10', 
 'timestamp': '26-09-2023 12:18:40.151756', 'track_id': '398c985a-dc15-42da-88aa-6ac6cbf55794'}
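The behaviour of params can be pictured with a small stand-in class. This is a minimal sketch for illustration only: `ToyTracker` is a hypothetical name and the code is not xetrack's implementation. It only shows the idea that params are merged into every row logged after they are set, and that all rows of one tracker instance share a track_id.

```python
import uuid
from datetime import datetime


class ToyTracker:
    """Hypothetical stand-in for illustration; not xetrack's implementation."""

    def __init__(self, params=None):
        self.track_id = str(uuid.uuid4())  # one identifier per run
        self.params = dict(params or {})
        self.rows = []

    def set_params(self, params):
        # Only rows logged after this call carry the new params.
        self.params.update(params)

    def log(self, **values):
        row = {
            **values,
            **self.params,
            "timestamp": datetime.now().isoformat(),
            "track_id": self.track_id,
        }
        self.rows.append(row)
        return row


toy = ToyTracker(params={"model": "resnet18"})
first = toy.log(accuracy=0.9, loss=0.1, epoch=1)
toy.set_params({"dataset": "cifar10"})
second = toy.log(accuracy=0.9, loss=0.1, epoch=2)

print("dataset" in first)  # False
print(second["dataset"])   # cifar10
```

The first row has no dataset value because it was logged before the param was set, which matches the NaN shown for earlier rows in the dataframe output.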

You can also set a value for an entire run with set_value; it is applied "back in time" to rows that were already logged:

tracker.set_value('test_accuracy', 0.9)
tracker.to_df()

                    timestamp                              track_id     model  loss  epoch  accuracy  dataset  test_accuracy
0  26-09-2023 12:17:00.342814  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      1       0.9      NaN            0.9
2  26-09-2023 12:18:40.151756  398c985a-dc15-42da-88aa-6ac6cbf55794  resnet18   0.1      2       0.9  cifar10            0.9
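One way to picture a "back in time" update is as a column added to the table and filled for every row of a single run. The sketch below uses the standard-library sqlite3 module on a toy in-memory table; the `set_value` helper, the `run-a`/`run-b` ids, and the schema are assumptions made for illustration and are not taken from xetrack's source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (track_id TEXT, epoch INTEGER, accuracy REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("run-a", 1, 0.9), ("run-a", 2, 0.9), ("run-b", 1, 0.7)],
)


def set_value(conn, track_id, column, value):
    """Add `column` if it is missing, then fill it for every row of one run."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(events)")}
    if column not in existing:
        # Column names cannot be bound as parameters, so the identifier is quoted.
        conn.execute(f'ALTER TABLE events ADD COLUMN "{column}"')
    conn.execute(
        f'UPDATE events SET "{column}" = ? WHERE track_id = ?',
        (value, track_id),
    )


set_value(conn, "run-a", "test_accuracy", 0.9)
rows = conn.execute(
    "SELECT track_id, epoch, test_accuracy FROM events ORDER BY track_id, epoch"
).fetchall()
print(rows)  # [('run-a', 1, 0.9), ('run-a', 2, 0.9), ('run-b', 1, None)]
```

Rows from other runs keep an empty value in the new column, so the update stays scoped to one track_id.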

Track functions

You can track any function.

  • The return value is logged before it is returned
tracker = Tracker('database.db', log_system_params=True, log_network_params=True, measurement_interval=0.1)
image = tracker.track(read_image, *args, **kwargs)
tracker.latest
{'result': 571084, 'name': 'read_image', 'time': 0.30797290802001953, 'error': '', 'disk_percent': 0.6,
 'p_memory_percent': 0.496507, 'cpu': 0.0, 'memory_percent': 32.874608, 'bytes_sent': 0.0078125,
 'bytes_recv': 0.583984375}

Or with a wrapper:

@tracker.wrap(params={'name':'foofoo'})
def foo(a: int, b: str):
    return a + len(b)
result = foo(1, 'hello')
tracker.latest
{'function_name': 'foo', 'args': "[1, 'hello']", 'kwargs': '{}', 'error': '', 'function_time': 4.0531158447265625e-06, 
 'function_result': 6, 'name': 'foofoo', 'disk_percent': 0, 'p_memory_percent': 0, 'cpu': 0, 'memory_percent': 0, 
 'bytes_sent': 0.0, 'bytes_recv': 0.0, 'model': 'resnet18', 'dataset': 'cifar10', 'timestamp': '26-09-2023 12:21:02.200245', 
 'track_id': '398c985a-dc15-42da-88aa-6ac6cbf55794'}
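The general pattern behind a tracking wrapper can be sketched with a plain decorator. This is an illustration under stated assumptions, not xetrack's code: the `wrap` function and the in-memory `records` list below are hypothetical, and system metrics such as cpu or memory are left out.

```python
import functools
import time

records = []  # hypothetical in-memory store standing in for the database


def wrap(params=None):
    """Decorator factory that records one row per call of the wrapped function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            row = {
                "function_name": func.__name__,
                "args": str(list(args)),
                "kwargs": str(kwargs),
                "error": "",
                **(params or {}),
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                row["function_result"] = result
                return result
            except Exception as exc:
                row["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call leaves a row.
                row["function_time"] = time.perf_counter() - start
                records.append(row)

        return wrapper

    return decorator


@wrap(params={"name": "foofoo"})
def foo(a: int, b: str):
    return a + len(b)


result = foo(1, "hello")
print(result, records[-1]["function_name"], records[-1]["args"])
```

Recording the row in a `finally` block means a failing call is still logged with its error before the exception propagates.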

Logger integration

import logging
logging.basicConfig(level=logging.INFO)

logger = logging.getLogger() # or loguru.logger
tracker = Tracker(db='database.db', logger=logger)
info = tracker.log(x='x')

INFO:root:x=x	timestamp=26-09-2023 12:26:36.564740	track_id=beb17e36-b646-4049-aff1-fd0e1574eb9e

Pandas-like

print(tracker)
                                    _id                              track_id                 date    b    a  accuracy
0  48154ec7-1fe4-4896-ac66-89db54ddd12a  fd0bfe4f-7257-4ec3-8c6f-91fe8ae67d20  16-08-2023 00:21:46  2.0  1.0       NaN
1  8a43000a-03a4-4822-98f8-4df671c2d410  fd0bfe4f-7257-4ec3-8c6f-91fe8ae67d20  16-08-2023 00:24:21  NaN  NaN       1.0

tracker['accuracy'] # get accuracy column
tracker.to_df() # get pandas dataframe of current run

SQL-like

You can filter the data with SQL through duckdb:

  • The sqlite database is attached as db and the table is events
tracker.conn.execute("SELECT * FROM db.events WHERE accuracy > 0.8").fetchall()
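Since the data is stored in a table named events, the same style of query extends to aggregation. The sketch below runs against a toy in-memory table using the standard-library sqlite3 module, so the table is referenced as `events` rather than `db.events`; the schema, run ids, and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # toy stand-in for 'database.db'
conn.execute("CREATE TABLE events (track_id TEXT, epoch INTEGER, accuracy REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("run-a", 1, 0.75), ("run-a", 2, 0.9), ("run-b", 1, 0.85), ("run-b", 2, 0.6)],
)

threshold = 0.8
best_per_run = conn.execute(
    """
    SELECT track_id, MAX(accuracy) AS best_accuracy
    FROM events
    WHERE accuracy > ?
    GROUP BY track_id
    ORDER BY track_id
    """,
    (threshold,),  # bound as a parameter instead of formatted into the string
).fetchall()
print(best_per_run)  # [('run-a', 0.9), ('run-b', 0.85)]
```

Binding the threshold as a parameter avoids building the SQL string by hand when the value comes from a variable.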

Analysis

To load the data of all runs in the database for further analysis and plotting:

  • This works even while another process is writing to the database.
from xetrack import Reader
df = Reader('database.db').to_df() 

Merge two databases

To merge two databases into one, use the copy function:

python -c 'from xetrack import copy; copy(source="db1.db", target="db2.db")'
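Conceptually, merging appends the rows of one database to the events table of another. The sketch below shows that idea with the standard-library sqlite3 module and `ATTACH DATABASE`; it is an illustration that assumes both files share an identical, made-up schema, and it does not reproduce how xetrack's copy function works internally.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "db1.db")
target_path = os.path.join(workdir, "db2.db")

# Build two toy databases with the same hypothetical schema.
for path, rows in [(source_path, [("run-a", 0.9)]), (target_path, [("run-b", 0.8)])]:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE events (track_id TEXT, accuracy REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

# Attach the source file to the target connection and append its rows.
target = sqlite3.connect(target_path)
target.execute("ATTACH DATABASE ? AS src", (source_path,))
target.execute("INSERT INTO events SELECT * FROM src.events")
target.commit()
target.execute("DETACH DATABASE src")

merged = target.execute(
    "SELECT track_id, accuracy FROM events ORDER BY track_id"
).fetchall()
target.close()
print(merged)  # [('run-a', 0.9), ('run-b', 0.8)]
```

If the two tables had different columns, the `SELECT *` would need to be replaced by an explicit column list, which is one reason a dedicated copy function is convenient.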
