ML experiment lineage tracker with snapshot-based reproducibility.

AIline

Documentation

AIline captures the exact code that produced an experiment — including uncommitted changes — alongside DVC-managed data and MLflow run metadata, so past experiments can be inspected and (eventually) re-run with confidence.

Install (editable / development)

poetry install
# or, with pip
pip install -e .

Quick start (your own project)

pip install ailine-core                             # or: poetry add --group dev ailine-core
cd /path/to/your/repo
ailine init-workspace                               # writes default .ailine.yml + .ailineignore
ailine doctor                                       # green-light all checks
ailine track -- python train.py --epochs 5          # run + record
ailine status --verbose                             # see what was captured

Releases are published to PyPI when a v*.*.* tag is pushed (see .github/workflows/release.yml).

ailine track -- is the primary, no-magic interface. It snapshots dirty state, records DVC linkage + environment fingerprint + the exact argv, runs your command in the repo root, and propagates the exit code. Your training script keeps full ownership of MLflow runs (track.mlflow.mode: inherit). See docs/track-contract.md for the schema.
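The "run in the repo root and propagate the exit code" part of that contract can be sketched in a few lines of Python. This is an illustration only, not AIline's implementation: run_tracked and its parameters are hypothetical names, and the snapshot and recording steps are left out.

```python
import subprocess
import sys
from pathlib import Path


def run_tracked(argv: list[str], repo_root: Path) -> int:
    """Run argv verbatim from repo_root and return the child's exit code.

    Hypothetical helper: a real tracker would snapshot and record
    before and after this call.
    """
    # No shell is involved, so the argv reaches the child unmodified.
    completed = subprocess.run(argv, cwd=repo_root)
    return completed.returncode
```

A wrapper built this way would end with sys.exit(run_tracked(command, repo_root)), so that CI jobs and shell scripts see the same exit status the training command produced.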

Demo / tutorial flow

The legacy "clone a sample repo and pretend-train" flow is still available behind explicit *-demo commands:

ailine init-demo <git_repo_url>     # clone into ./repo
ailine run --script train.py        # demo: wraps in MLflow, records snapshot
ailine reset-demo                   # remove ./repo, DB, mlruns/

CLI command summary

Command
    Purpose

ailine init-workspace [--force]
    Bootstrap the pip-install workflow: write a default .ailine.yml and ensure state directories. No clone.

ailine doctor [--json] [--strict] [--config PATH]
    Validate .ailine.yml and the local environment. The single source of truth for "is my setup OK".

ailine track [--config PATH] [--run-name NAME] [--name NAME] -- <argv...>
    Run a command under AIline tracking. The argv after -- is executed verbatim from the repo root. The lineage row is published with status=in_progress before the child starts (and the MLflow run id, in wrap mode, is printed alongside) so live runs are visible in ailine status and the web UI from second zero. Snapshot location is configured via snapshot.storage_dir in .ailine.yml (or AILINE_STORAGE_DIR).

ailine restore <snapshot_id> [--config PATH] [--dry-run] [--force]
    Restore the worktree to the exact state captured by <snapshot_id> (strict sync: extra files in scope are removed; .git and .ailine are always preserved). Aborts on a dirty worktree unless --force; --dry-run previews the write/delete plan without touching the filesystem.

ailine status [--verbose]
    List recorded runs: default output includes full record_id and parent lines (copy/paste for restore); --verbose dumps all fields. Errors clearly when the DB does not exist yet.

ailine serve
    Start the MLflow UI subprocess and the Flask app together (ports 5001 and 5000).

ailine remove <id> [--with-mlflow true|false] [--dry-run] [--config PATH]
    Delete one lineage record and its on-disk fan-out (manifest, metadata, diff, plus content-addressed objects only this row owned). --with-mlflow overrides cleanup.remove.with_mlflow from .ailine.yml (default false). --dry-run prints the plan without changes.

ailine purge [--dry-run] [--config PATH]
    Remove all AIline state and workspace config from the project: .ailine/, .ailine.yml, .ailineignore, plus any non-default snapshot storage_dir configured outside .ailine/. Leaves mlruns/ and repo/ untouched. Asks Confirm? [y/N] before deleting; --dry-run skips the prompt and prints the plan only.

ailine init-demo <repo_url>
    Clone a sample repo into ./repo and persist the URL in ailine_config.txt (tutorial flow).

ailine run --script <s> [--dataset <d>] [--dvc-add] [--name NAME]
    Demo wrapper around track that hard-codes ./repo and forces mlflow.mode=wrap.

ailine reset-demo
    Delete demo artifacts (./repo, DB, mlruns/, default snapshot dir, temp_*).
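The strict-sync behaviour of ailine restore can be pictured as computing a write/delete plan, which is roughly what --dry-run previews. The sketch below is an assumption about the shape of that plan, not AIline's code: plan_restore and is_preserved are hypothetical names, files are represented as a mapping from relative path to content hash, and .ailineignore scoping is ignored.

```python
PRESERVED = (".git", ".ailine")  # never touched, per the restore description


def is_preserved(rel_path: str) -> bool:
    """True when the path lives under a directory restore must not modify."""
    first = rel_path.split("/", 1)[0]
    return first in PRESERVED


def plan_restore(snapshot: dict[str, str], worktree: dict[str, str]):
    """Return (to_write, to_delete) as sorted lists of relative paths.

    Both arguments map a relative path to a content hash.
    """
    # Write every snapshot file that is missing or differs in the worktree.
    to_write = sorted(
        p for p, digest in snapshot.items()
        if not is_preserved(p) and worktree.get(p) != digest
    )
    # Strict sync: files that exist only in the worktree are removed.
    to_delete = sorted(
        p for p in worktree
        if not is_preserved(p) and p not in snapshot
    )
    return to_write, to_delete
```

Separating the plan from its execution is what makes a dry run cheap: the same function feeds both the preview and the real restore.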

By default MLflow writes runs to a local file store under ./mlruns (no tracking server required). Override with AILINE_MLFLOW_URI if you use a remote or local REST tracking server.

For the Flask UI plus MLflow UI together (localhost tracking API on port 5001):

export AILINE_MLFLOW_URI=http://localhost:5001
ailine serve    # starts the MLflow UI (:5001) and the Flask app (:5000) with one command

Then open http://localhost:5000/ for AIline (the unified Lineage dashboard) and http://localhost:5001 for MLflow.

The legacy paths http://localhost:5000/commits and http://localhost:5000/experiments now redirect (302) to / for backward compatibility.

Code browser (commit / snapshot views)

The /commit/<id> and /snapshot/<id> pages render a left-hand file tree with a single-file blob view on the right. Use ?path=<rel/path> to deep-link to a specific file. Blobs and patches are capped at the first 512 KiB; oversized or binary files are flagged in the header. Snapshots add a ?view=diff tab that renders the stored unified patch (diff_path) against the parent commit, split into one card per file (split on diff --git headers) for readability.
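Two of the behaviours described above, the 512 KiB cap and the per-file split of a stored patch, are easy to sketch in Python. The function names are hypothetical and the code is an illustration of the technique rather than what the web UI actually runs.

```python
import re

MAX_BYTES = 512 * 1024  # 512 KiB cap described above


def truncate_blob(data: bytes, limit: int = MAX_BYTES) -> tuple[bytes, bool]:
    """Return the first `limit` bytes and whether anything was cut off."""
    return data[:limit], len(data) > limit


def split_patch(patch: str) -> list[str]:
    """Split a unified patch into one chunk per file at 'diff --git' headers."""
    # The zero-width lookahead keeps each header at the start of its own chunk.
    chunks = re.split(r"(?m)^(?=diff --git )", patch)
    return [c for c in chunks if c.strip()]
```

Because the split is zero-width, joining the chunks back together reproduces the original patch, so nothing is lost by rendering one card per file.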

Configuration

Env var
    Purpose

AILINE_MLFLOW_URI
    MLflow tracking backend (default: file://…/mlruns under the project)

AILINE_MLFLOW_UI_BASE
    Base URL for Run ID links in the AIline web UI (default: http://127.0.0.1:5001). When unset and tracking is http(s), same scheme/host as AILINE_MLFLOW_URI is used.

Run links only work if an MLflow UI is reachable at that base URL (for example mlflow ui --backend-store-uri "$(pwd)/mlruns" --host 127.0.0.1 --port 5001).
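The fallback rules for the Run ID link base can be read as a small resolution function. The sketch below is one interpretation, with two assumptions flagged: resolve_ui_base is a hypothetical name, and "same scheme/host" is taken to include the port.

```python
from urllib.parse import urlsplit

DEFAULT_UI_BASE = "http://127.0.0.1:5001"


def resolve_ui_base(env: dict[str, str]) -> str:
    """Pick the base URL for run links from environment-style settings."""
    explicit = env.get("AILINE_MLFLOW_UI_BASE")
    if explicit:
        return explicit.rstrip("/")
    parts = urlsplit(env.get("AILINE_MLFLOW_URI", ""))
    # Assumption: reuse scheme and host:port of an http(s) tracking URI.
    if parts.scheme in ("http", "https") and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}"
    # File-based or unset tracking URIs fall back to the documented default.
    return DEFAULT_UI_BASE
```

Calling resolve_ui_base(dict(os.environ)) from a shell where neither variable is set would return the default, which only yields working links if an MLflow UI is actually listening there.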

Project-level behaviour lives in .ailine.yml at the repository root (large-file policy, DVC linkage settings, environment fingerprint packages, run-capture toggle, plus the project: and track: blocks for the ailine track -- workflow). Snapshot ignore patterns are configured separately in .ailineignore (gitignore syntax) — see docs/track-contract.md.

AIline's own auto-generated artifacts (lineage DB, log file, demo bookkeeping) live under .ailine/ next to .ailine/snapshots/ so the project root stays clean. User-owned paths (mlruns/, repo/, .ailine.yml, .ailineignore) are never relocated. On first run inside an older checkout AIline transparently moves any legacy root-level artifacts (ailine_tree.db, ailine.log, ailine_config.txt) into .ailine/.

Cleanup commands

ailine remove <id> deletes one lineage record and its on-disk fan-out:

  • the lineage row in .ailine/tree.db;
  • <id>.manifest.json, <id>.metadata.json, <id>.diff.patch in the storage dir;
  • any content-addressed objects under <storage_dir>/objects/ that only this row referenced — shared objects survive.
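The "shared objects survive" rule amounts to a set difference over object references. A minimal sketch, assuming each record's manifest can be reduced to a set of object digests (objects_to_delete and the manifests mapping are hypothetical, not AIline's data model):

```python
def objects_to_delete(manifests: dict[str, set[str]], record_id: str) -> set[str]:
    """Objects referenced by record_id and by no other record."""
    owned = manifests[record_id]
    still_used: set[str] = set()
    for other_id, digests in manifests.items():
        if other_id != record_id:
            still_used |= digests
    # Only objects no surviving record points at are safe to remove.
    return owned - still_used
```

With manifests {"r1": {"aa", "bb"}, "r2": {"bb", "cc"}}, removing r1 would delete only object aa, because bb is still referenced by r2.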

By default the linked MLflow run is not deleted. Override with the CLI or with a project-level default in .ailine.yml:

cleanup:
  remove:
    with_mlflow: false   # default; set to true to also delete linked MLflow runs

Resolution order: explicit --with-mlflow true|false on the CLI wins, then cleanup.remove.with_mlflow in .ailine.yml, then the built-in default false. Use ailine remove <id> --dry-run to preview without changes.
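That resolution order can be expressed as a short function. This is a sketch under stated assumptions: resolve_with_mlflow is a hypothetical name, cli_value is None when the flag was not passed, and config is the parsed .ailine.yml as a plain dict.

```python
from typing import Optional


def resolve_with_mlflow(cli_value: Optional[bool], config: dict) -> bool:
    """CLI flag first, then cleanup.remove.with_mlflow, then False."""
    if cli_value is not None:
        return cli_value
    # Tolerate missing or empty blocks in the parsed YAML.
    cleanup = config.get("cleanup") or {}
    remove = cleanup.get("remove") or {}
    configured = remove.get("with_mlflow")
    if configured is not None:
        return bool(configured)
    return False
```

Using None rather than False for "flag not given" is what lets an explicit --with-mlflow false override a project default of true.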

ailine purge is the project-wide reset: it removes .ailine/, .ailine.yml, .ailineignore, and any non-default snapshot storage_dir configured outside .ailine/. mlruns/ and repo/ are intentionally left alone (those belong to the user). Before deleting anything, purge prints the prompt "All AIline files listed above will be removed. Confirm? [y/N]". Pass --dry-run to print the plan and skip the prompt entirely.

Limitations

Real-time MLflow linking (track.mlflow.link_strategy)

AIline links each lineage row to the user's MLflow run without requiring any import ailine in the training script. The default mechanism is a deterministic correlation tag:

  1. ailine track generates a per-invocation AILINE_CORRELATION_ID (UUID) and exports it to the child process.
  2. AIline ships a tiny MLflow plugin (AilineRunContextProvider, auto-discovered via the mlflow.run_context_provider entry point) that tags every run started in that child with ailine.correlation_id=<uuid>.
  3. The session loop polls MLflow (default cadence: track.mlflow.link_poll_seconds=3.0) for that tag. The first match wins and the lineage row's mlflow_run column is updated mid-flight.
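The three steps above can be sketched without MLflow itself by injecting the lookup as a callable. Everything here is illustrative: child_env, poll_for_run, find_run_ids, and max_polls are hypothetical names, and the bounded loop stands in for a session loop that would normally poll for as long as the child is running.

```python
import os
import time
import uuid
from typing import Callable, Optional

TAG_KEY = "ailine.correlation_id"


def child_env(base: Optional[dict] = None) -> tuple[dict, str]:
    """Return a child environment carrying a fresh correlation id."""
    correlation_id = str(uuid.uuid4())
    env = dict(os.environ if base is None else base)
    env["AILINE_CORRELATION_ID"] = correlation_id
    return env, correlation_id


def poll_for_run(
    find_run_ids: Callable[[str, str], list],
    correlation_id: str,
    poll_seconds: float = 3.0,
    max_polls: int = 10,
) -> Optional[str]:
    """Poll until a run tagged with correlation_id appears; first match wins."""
    for attempt in range(max_polls):
        matches = find_run_ids(TAG_KEY, correlation_id)
        if matches:
            return matches[0]
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return None
```

In a real setup find_run_ids would wrap a tag-filtered MLflow run search. Keeping it as a parameter makes the polling logic testable with a fake lookup.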

Strategies live under track.mlflow.link_strategy in .ailine.yml:

  • tag (default) — the flow above. Zero client code changes, no run id ownership. Requires AIline to be installed in the same Python venv as your training script so MLflow loads the plugin.
  • prelink — legacy: AIline pre-creates the MLflow run and exports MLFLOW_RUN_ID. Brittle when the configured experiment is missing or deleted; kept for users who explicitly want AIline to own the run id.
  • none — skip live linking entirely; AIline still falls back to a best-effort post-hoc lookup at the end of the run.

Troubleshooting

  • Empty MLflow column even after the run finishes: AIline must be installed in the same venv as the training script so its run_context_provider plugin is auto-loaded by MLflow. From that venv, python -c "import ailine.integrations.mlflow_plugin" should succeed.
  • Wrong tracking server: ailine init-workspace now prints the resolved tracking URI / UI base / storage dir with their source labels and a copy-pasteable export AILINE_MLFLOW_URI=... snippet. Pin those in your shell rc so AIline and your script always talk to the same server.
  • Multiple AIline-launched runs against one MLflow server: each carries its own correlation id, so links stay deterministic regardless of concurrency.
  • prelink users seeing INVALID_PARAMETER_VALUE: experiment ... is deleted: the legacy prelink strategy fails when the resolved MLflow experiment is in a deleted state. Switch to link_strategy: tag (the default) or set MLFLOW_EXPERIMENT_NAME to an active experiment.
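The import check from the first troubleshooting item can also be scripted, for example in a pre-flight step of a training job. A minimal sketch using only the standard library (module_available is a hypothetical helper name):

```python
import importlib.util


def module_available(name: str) -> bool:
    """True when `name` can be located by the current interpreter."""
    try:
        # For dotted names find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False
```

Running module_available("ailine.integrations.mlflow_plugin") with the interpreter of the training venv answers the same question as the python -c one-liner, without raising on failure.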

Layout

ailine/
  cli/             # Click entry point + terminal formatters
  config/          # .ailine.yml loaders + defaults + path constants
  fingerprint/     # environment fingerprint
  integrations/    # MLflow UI subprocess, git URL helpers
  linkage/         # DVC discovery + linkage classification
  persistence/     # SQLite schema, migrations, repository facade
  run/             # CLI run-command capture
  snapshot/        # repo scan, manifest, content-addressed objects
  web/             # Flask app factory + route modules + templates

Releasing

The package version is derived from the latest git tag via poetry-dynamic-versioning. There is no manual version = ... bump in pyproject.toml; the tag is the version.

One-time, on each developer machine:

poetry self add "poetry-dynamic-versioning[plugin]"

Local dry-run before tagging (runs tests, builds sdist + wheel, smoke-tests the wheel in a throwaway venv):

bash scripts/release-check.sh

Cut a release:

git tag v0.2.0
git push origin v0.2.0

Pushing a v*.*.* tag triggers .github/workflows/release.yml, which runs the test suite, calls poetry build, and publishes a GitHub Release with the dist/*.tar.gz and dist/*.whl attached and auto-generated notes.

Pre-releases follow PEP 440 (matched by the configured tag pattern): v0.2.0a1, v0.2.0b2, v0.2.0rc1.
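To check a tag name before pushing, a pattern along these lines covers the examples above. The regular expression is an assumption made for illustration and not the pattern configured in pyproject.toml; version_from_tag is a hypothetical helper.

```python
import re

# Hypothetical pattern: v<major>.<minor>.<patch> plus an optional
# PEP 440 pre-release suffix (aN, bN, or rcN).
TAG_PATTERN = re.compile(r"^v(?P<base>\d+\.\d+\.\d+)(?P<pre>(?:a|b|rc)\d+)?$")


def version_from_tag(tag: str):
    """Return the version string for a release tag, or None if it does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return match.group("base") + (match.group("pre") or "")
```

Under this pattern, v0.2.0rc1 maps to 0.2.0rc1, while a tag such as v0.2.0-beta is rejected.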
