RAKAM SYSTEMS CLI

Rakam Eval CLI

A CLI for running LLM evaluations and tracking quality over time.


Quick Start

A typical workflow is:

  1. Write an evaluation function:
     edit eval/my_eval.py
  2. Run the evaluations:
     rakam eval run
  3. View the results:
     rakam eval show

Installation

pip install rakam-systems-cli

Writing Evaluations

Create an eval/ directory in your project. Each evaluation function must:

  • Be decorated with @eval_run
  • Return an EvalConfig object

# eval/examples.py
from rakam_systems_cli.decorators import eval_run
from rakam_systems_tools.evaluation.schema import (
    EvalConfig,
    TextInputItem,
    ClientSideMetricConfig,
    ToxicityConfig,
)

@eval_run
def test_simple_text_eval():
    """A simple text evaluation showcasing a basic client-side metric."""
    return EvalConfig(
        component="text_component_1",
        label="demo_simple_text",
        data=[
            TextInputItem(
                id="txt_001",
                input="Hello world",
                output="Hello world",
                expected_output="Hello world",
                metrics=[ClientSideMetricConfig(name="relevance", score=1)],
            )
        ],
        metrics=[ToxicityConfig(name="toxicity_demo", include_reason=False)],
    )
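
To make the role of the decorator more concrete, here is a minimal sketch of how a marker decorator of this kind could work in general. It is not the rakam_systems_cli implementation: mark_eval, the _is_eval attribute and collect_evals are hypothetical names invented for this illustration, and a plain dict stands in for an EvalConfig.

```python
import types
from typing import Callable, Dict, List


def mark_eval(func: Callable) -> Callable:
    """Hypothetical stand-in for @eval_run: flag the function so a runner can find it."""
    func._is_eval = True  # assumption: a simple attribute marks the function
    return func


def collect_evals(namespace: Dict[str, object]) -> List[Callable]:
    """Return every marked function in `namespace`, sorted by name."""
    found = [
        obj
        for obj in namespace.values()
        if isinstance(obj, types.FunctionType) and getattr(obj, "_is_eval", False)
    ]
    return sorted(found, key=lambda f: f.__name__)


@mark_eval
def test_simple_text_eval():
    # A plain dict stands in for the EvalConfig object here.
    return {"label": "demo_simple_text"}


def helper():
    # Not decorated, so a runner would ignore it.
    return None


namespace = {"test_simple_text_eval": test_simple_text_eval, "helper": helper}
for func in collect_evals(namespace):
    print(func.__name__, func())
```

Only the decorated function is collected and called; the undecorated helper is skipped, which mirrors the rule that every evaluation function must carry the decorator.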

User Guide

Listing evaluations

rakam eval list evals

This shows all functions decorated with @eval_run in the eval/ directory.
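
As an illustration of what scanning a directory for decorated functions involves, the sketch below finds them statically with the standard-library ast module. This is an assumption about one possible approach, not a description of how the CLI discovers evaluations; find_decorated and _decorator_name are hypothetical helpers.

```python
import ast
from pathlib import Path
from typing import List, Tuple


def _decorator_name(node: ast.expr) -> str:
    """Return the bare name of a decorator: @name, @pkg.name or @name(...)."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def find_decorated(
    directory: str, decorator: str = "eval_run", recursive: bool = False
) -> List[Tuple[str, str]]:
    """List (file name, function name) pairs for functions using `decorator`."""
    pattern = "**/*.py" if recursive else "*.py"
    results = []
    for path in sorted(Path(directory).glob(pattern)):
        # Parse the source without importing or executing it.
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if any(_decorator_name(d) == decorator for d in node.decorator_list):
                    results.append((path.name, node.name))
    return results


if Path("eval").is_dir():
    for file_name, func_name in find_decorated("eval"):
        print(f"{file_name}: {func_name}")
```

A static scan never runs the files it inspects, so it is safe on untrusted code, but it only matches the decorator by name and would miss aliases such as an import under a different name.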

Listing runs

This shows all runs hosted on the evaluation server.

rakam eval list runs

Comparing runs

Compare two runs to see what changed:

# Compare by IDs
rakam eval compare --id 42 --id 45

# Save comparison to file
rakam eval compare --id 42 --id 45 -o comparison.json
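
To show what a unified diff between two runs looks like, here is a minimal sketch using the standard-library difflib module. The shape of the run data (an id and a metrics mapping) and the scores are invented for the example, and unified_run_diff is a hypothetical helper, not part of the CLI.

```python
import difflib
import json
from typing import List


def unified_run_diff(
    run_a: dict, run_b: dict, name_a: str = "run_a", name_b: str = "run_b"
) -> List[str]:
    """Return unified-diff lines between two runs serialized as sorted JSON."""
    a = json.dumps(run_a, indent=2, sort_keys=True).splitlines()
    b = json.dumps(run_b, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b, lineterm=""))


# Assumed, illustrative run payloads; real run results may look different.
run_42 = {"id": 42, "metrics": {"toxicity_demo": 0.1}}
run_45 = {"id": 45, "metrics": {"toxicity_demo": 0.05}}

for line in unified_run_diff(run_42, run_45, "run_42", "run_45"):
    print(line)
```

Serializing with sort_keys=True keeps the key order stable, so the diff reflects changed values rather than incidental ordering differences.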

Command Reference

rakam eval list evals

Usage: rakam eval list evals [OPTIONS] [DIRECTORY]

 List evaluations (functions decorated with @eval_run).

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive  -r        Recursively search for Python files                   │
│ --help                 Show this message and exit.                           │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval list runs

Usage: rakam eval list runs [OPTIONS]

 List runs (newest first).

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --limit   -l      INTEGER  Max number of runs [default: 20]                  │
│ --offset          INTEGER  Pagination offset [default: 0]                    │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯
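
The --limit and --offset options follow the usual pagination pattern. The sketch below shows that pattern on a plain list of run IDs sorted newest first; it assumes conventional slice semantics and does not describe how the evaluation server implements them. The page helper and the run IDs are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page(items: Sequence[T], limit: int = 20, offset: int = 0) -> List[T]:
    """Return at most `limit` items, skipping the first `offset` items."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    return list(items[offset : offset + limit])


# Illustrative run IDs, newest first.
run_ids = [45, 44, 43, 42, 41]

print(page(run_ids, limit=2))            # first page
print(page(run_ids, limit=2, offset=2))  # second page
```

With these assumptions, increasing the offset by the limit on each call walks through the runs one page at a time.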

rakam eval run

Usage: rakam eval run [OPTIONS] [DIRECTORY]

 Execute evaluations (functions decorated with @eval_run).

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive   -r            Recursively search for Python files              │
│ --dry-run                   Only list functions without executing them       │
│ --save-runs                 Save each run result to a JSON file              │
│ --output-dir          PATH  Directory where run results are saved            │
│                             [default: eval_runs]                             │
│ --help                      Show this message and exit.                      │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval show

Usage: rakam eval show [OPTIONS]

 Show a run by ID or tag. Without arguments, shows the most recent run.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --id    -i      INTEGER  Run ID                                              │
│ --tag   -t      TEXT     Run tag                                             │
│ --raw                    Print raw JSON instead of formatted output          │
│ --help                   Show this message and exit.                         │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval compare

Usage: rakam eval compare [OPTIONS]

 Compare two evaluation runs.

 Default: unified git diff

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --tag           -t      TEXT     Run tag                                     │
│ --id            -i      INTEGER  Run ID                                      │
│ --summary                        Show summary diff only                      │
│ --side-by-side                   Show side-by-side diff (git)                │
│ --help                           Show this message and exit.                 │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval tag

Usage: rakam eval tag [OPTIONS]

 Assign a tag to a run or delete a tag.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --id      -i      INTEGER  Run ID                                            │
│ --tag     -t      TEXT     Tag to assign to the run                          │
│ --delete          TEXT     Delete a tag                                      │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval metrics list

Usage: rakam eval metrics list [OPTIONS] [DIRECTORY]

 List all metric types used by loaded eval configs.

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive  -r        Recursively search for Python files                   │
│ --help                 Show this message and exit.                           │
╰──────────────────────────────────────────────────────────────────────────────╯

Documentation

For the full user guide, see the official documentation.

License

See the main project LICENSE file.
