Traintrack is a modular MLOps platform designed to manage and monitor the full lifecycle of machine learning workflows — from dataset tracking to deployment — with a robust Go-based backend and an easy-to-use Python SDK.

Project description

🚅 Traintrack

Note, this is still very much in active development so nothing is stable. Use at your own risk.

🚀 Features

Python SDK Simplifies dataset management and future ML pipeline interactions. Ideal for data scientists and ML engineers.
Dataset Tracking Track versions, metadata, and lineage of datasets across your team or projects.
Model Tracking Track versions, metadata, and lineage of models across your team or projects. The code used to train the model and details of the environment are stored to ensure reproducability further down the track.
Designed for Multi-tenant Systems Secure and flexible, with SCIM auto-provisioning on the roadmap.

🌍 Set up environment

Install the CLI tool:

$ brew tap heldtogether/tools
$ brew install traintrack

Configure:

First, tell it where your instance can be found:

$ traintrack set-instance <url>

Then log in:

$ traintrack login

After adding the config, the SDK will take over and ensure that tokens are refreshed.

🐍 Python SDK Usage

Install the library:

$ pip install traintrack

Create a dataset

from traintrack import Dataset
import pandas as pd

# Create a dataset
data = {
    "Bedrooms": [2, 3, 3, 4, 2, None, 4, 3, 5, 2, 4],
    "Bathrooms": [1, 2, 1, 3, 1, 2, 2, 2, 3, 1, 2],
    "Sqft": [900, 1500, 1200, 2000, 950, 1200, 1850, 1400, 2500, 1000, 2100],
    "Age": [30, 15, 20, 5, 40, 15, 10, 12, 4, 35, 8],  # years
    "Price": [200_000, 340_000, 275_000, 500_000, 210_000, None, 480_000, 320_000, 600_000, 205_000, 520_000],
}

df = pd.DataFrame(data)

X = df[["Bedrooms", "Bathrooms", "Sqft", "Age"]]
y = df[["Price"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

dataset = Dataset(None, "house_prices", "1.0.0", "Raw data")
dataset.set_artefact("input_features_train", X_train)
dataset.set_artefact("output_train", y_train)
dataset.set_artefact("input_features_test", X_test)
dataset.set_artefact("output_test", y_test)
dataset.save()
# dataset = <Dataset house_prices:1.0.0>

Up-version a dataset

from traintrack import list_datasets
imports pandas as pd

# List all datasets
datasets = list_datasets()
dataset = datasets.latest_version("house_prices")
# dataset = <Dataset house_prices:1.0.0> 

# Manipulate the dataset and create a new version
new_dataset = dataset.transform(name="house_prices", description="drop NaNs", version="1.0.1")

X = new_dataset.get_artefact("input_features_train")
y = new_dataset.get_artefact("output_train")

# Concatenate so we can drop rows with NaNs in either
combined = pd.concat([X, y], axis=1)
combined_clean = combined.dropna()

# Split back into X and y
X_clean = combined_clean[X.columns]
y_clean = combined_clean[y.columns]

# Save cleaned artefacts back to the dataset
new_dataset.set_artefact("input_features_train", X_clean)
new_dataset.set_artefact("output_train", y_clean)
new_dataset.save()
# new_dataset = <Dataset house_prices:1.0.1>

Train a model

from traintrack import Model

# Prepare the functions we need to orchestrate the model training
def setup_model(dataset, config):
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(n_estimators=config['n_estimators'])

def train_model(model_obj, dataset):
    X, y = dataset.artefacts["input_features_train"], dataset.artefacts["output_train"]
    model_obj.fit(X, y)
    return model_obj

def eval_model(model_obj, dataset):
    from sklearn.metrics import mean_absolute_error, mean_squared_error, root_mean_squared_error, r2_score
    X = dataset.artefacts["input_features_test"]
    y_true = dataset.artefacts["output_test"]
    y_pred = model_obj.predict(X)
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    rmse = root_mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    return {
        "mae": mae,
        "mse": mse,
        "rmse": rmse,
        "r2": r2,
    }

model = Model(None, "house_price_regressor", "1.0.0", "initial model", dataset=dataset, config={'n_estimators': 100})

model.setup(setup_model)
model.train(train_model)
model.eval(eval_model)
print(model.evaluation)
# {
#   'mae': 12500.0,
#   'mse': 212500000.0,
#   'rmse': 14577.379737113251,
#   'r2': 0.953360768175583
# }

model.save()
# model = <Model house_price_regressor:1.0.0>

Fetch a model

from traintrack import list_models

models = list_models()
loaded_model = models.latest_version('house_price_regressor')
eval = eval_model(loaded_model.trained_model, dataset) # same func as above
print(eval)
# {
#   'mae': 12500.0,
#   'mse': 212500000.0,
#   'rmse': 14577.379737113251,
#   'r2': 0.953360768175583
# }

>_ Other Tools

Using the traintrack cli, a number of commands are provided to explore and understand your MLOps.

See a list of available commands:

$ traintrack help

List datasets:

$ traintrack datasets

* 9f9b8055 - house_prices 1.0.0: Raw data
* 7b755226 - house_prices 1.0.1: Drop NaNs
|\
| * cd564c17 - house_prices 2.0.0: Change column: years -> months
| * 6d302698 - house_prices 2.1.0: Add more rows of same data type
* 1bbfbdf4 - house_prices 1.1.0: Add classification column

List models:

$ traintrack models

* 58f5849c - house_classifier 0.0.1: initial model
---                                                                                         
* 9120834a - house_price_regressor 1.0.0: initial model

📦 Run the Backplane (API, data stores, file stores, etc)

$ traintrack serve

Environment variables:

TRAINTRACK_DATABASE_URL – PostgreSQL connection string
TRAINTRACK_AUTH_NAME - The audience claim for the JWT, typically the API name/url you registered in your OIDC provider.
TRAINTRACK_CLIENT_ID - The client ID given by your OIDC provider.
TRAINTRACK_AUTH_URL - The base URL for auth'ing against your OIDC provider.

🧱 Backend Architecture

Architecture diagram

📅 Roadmap

Dataset tracking (Go backend + Python SDK)
Model versioning
Pipeline tracking and DAG visualization
Role-based access control
SCIM integration
Web UI

🤝 Contributing

Traintrack is early-stage and open to collaborators. Start by opening an issue or submitting a small PR.

📄 License

MIT

Project details

Release history Release notifications | RSS feed

This version

0.0.1.post2

Jun 27, 2025

0.0.1.post1

Jun 27, 2025

0.0.1.post0

Jun 27, 2025

0.0.1

Jun 27, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

traintrack_mlops-0.0.1.post2.tar.gz (11.4 kB view details)

Uploaded Jun 27, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

traintrack_mlops-0.0.1.post2-py3-none-any.whl (11.0 kB view details)

Uploaded Jun 27, 2025 Python 3

File details

Details for the file traintrack_mlops-0.0.1.post2.tar.gz.

File metadata

Download URL: traintrack_mlops-0.0.1.post2.tar.gz
Upload date: Jun 27, 2025
Size: 11.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for traintrack_mlops-0.0.1.post2.tar.gz
Algorithm	Hash digest
SHA256	`b8b2605b8d57082cc30f84c63f1e1b51e1be70fcd15ba8b4492e84e49c11b6d6`
MD5	`f6cc4db74193a26c4b67a5ac7e0d9f9e`
BLAKE2b-256	`bcbe5dd6fdec20fd6f62413c580a78c4e3f290cba2047dd14c79fe4e84f0e90e`

See more details on using hashes here.

Provenance

The following attestation bundles were made for traintrack_mlops-0.0.1.post2.tar.gz:

Publisher: publish-python-sdk.yml on heldtogether/traintrack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: traintrack_mlops-0.0.1.post2.tar.gz
- Subject digest: b8b2605b8d57082cc30f84c63f1e1b51e1be70fcd15ba8b4492e84e49c11b6d6
- Sigstore transparency entry: 253211804
- Sigstore integration time: Jun 27, 2025
Source repository:
- Permalink: heldtogether/traintrack@8b09be6e2ec15fc7488c88687cc423cdfc494617
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/heldtogether
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python-sdk.yml@8b09be6e2ec15fc7488c88687cc423cdfc494617
- Trigger Event: push

File details

Details for the file traintrack_mlops-0.0.1.post2-py3-none-any.whl.

File metadata

Download URL: traintrack_mlops-0.0.1.post2-py3-none-any.whl
Upload date: Jun 27, 2025
Size: 11.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for traintrack_mlops-0.0.1.post2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b377abd01974f01161167501093e1fbe28d5392c3ee5db473cc85608cd8d0a92`
MD5	`6b1e6dd142cb794ecc58447028836d79`
BLAKE2b-256	`efb638aded65f7666e563d214b874ff8055d5117c1844963f293d4fd8bfc8cac`

See more details on using hashes here.

Provenance

The following attestation bundles were made for traintrack_mlops-0.0.1.post2-py3-none-any.whl:

Publisher: publish-python-sdk.yml on heldtogether/traintrack

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: traintrack_mlops-0.0.1.post2-py3-none-any.whl
- Subject digest: b377abd01974f01161167501093e1fbe28d5392c3ee5db473cc85608cd8d0a92
- Sigstore transparency entry: 253211807
- Sigstore integration time: Jun 27, 2025
Source repository:
- Permalink: heldtogether/traintrack@8b09be6e2ec15fc7488c88687cc423cdfc494617
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/heldtogether
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-python-sdk.yml@8b09be6e2ec15fc7488c88687cc423cdfc494617
- Trigger Event: push

traintrack-mlops 0.0.1.post2

Navigation

Verified details

Maintainers

Unverified details

Project description

🚅 Traintrack

🚀 Features

🌍 Set up environment

🐍 Python SDK Usage

Create a dataset

Up-version a dataset

Train a model

Fetch a model

>_ Other Tools

📦 Run the Backplane (API, data stores, file stores, etc)

🧱 Backend Architecture

📅 Roadmap

🤝 Contributing

📄 License

Project details

Verified details

Maintainers

Unverified details

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance