RAKAM SYSTEMS CLI

Rakam Eval CLI

A CLI for running LLM evaluations and tracking quality over time.


Quick Start

A typical workflow is:

  1. Write an evaluation function
     edit eval/my_eval.py
  2. Run the evaluations
     rakam eval run
  3. View the results
     rakam eval show

Installation

pip install rakam-systems-cli

Writing Evaluations

Create an eval/ directory in your project. Each evaluation function must:

  • Be decorated with @eval_run
  • Return an EvalConfig object

# eval/examples.py
from rakam_systems_cli.decorators import eval_run
from rakam_systems_tools.evaluation.schema import (
    EvalConfig,
    TextInputItem,
    ClientSideMetricConfig,
    ToxicityConfig,
)

@eval_run
def test_simple_text_eval():
    """A simple text evaluation showcasing a basic client-side metric."""
    return EvalConfig(
        component="text_component_1",
        label="demo_simple_text",
        data=[
            TextInputItem(
                id="txt_001",
                input="Hello world",
                output="Hello world",
                expected_output="Hello world",
                metrics=[ClientSideMetricConfig(name="relevance", score=1)],
            )
        ],
        metrics=[ToxicityConfig(name="toxicity_demo", include_reason=False)],
    )

User Guide

Listing evaluations

rakam eval list evals

This shows all functions decorated with @eval_run in the eval/ directory.
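
To make the discovery step concrete, here is a minimal sketch of how functions decorated with @eval_run could be found by scanning Python files. It is an illustration only, not the CLI's actual implementation: the helper name find_eval_functions is hypothetical, and matching the decorator purely by its name is an assumption.

```python
import ast
from pathlib import Path


def find_eval_functions(directory="eval", recursive=False):
    """Return (file, function name) pairs for functions decorated with eval_run.

    Hypothetical helper: it parses source files without importing them.
    """
    root = Path(directory)
    pattern = "**/*.py" if recursive else "*.py"
    found = []
    for path in sorted(root.glob(pattern)):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            for decorator in node.decorator_list:
                # Accept both @eval_run and @eval_run(...)
                target = decorator.func if isinstance(decorator, ast.Call) else decorator
                # Accept both a bare name and dotted access such as module.eval_run
                if isinstance(target, ast.Name):
                    name = target.id
                else:
                    name = getattr(target, "attr", None)
                if name == "eval_run":
                    found.append((str(path), node.name))
    return found
```

Calling find_eval_functions("eval") scans only the top level of the directory, while recursive=True also descends into subdirectories, mirroring the documented --recursive option.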

Listing runs

This shows all runs hosted on the evaluation server.

rakam eval list runs

Comparing runs

Compare two runs to see what changed:

# Compare by IDs
rakam eval compare --id 42 --id 45

# Save comparison to file
rakam eval compare --id 42 --id 45 -o comparison.json
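
As a rough illustration of what a unified diff between two runs looks like, the sketch below diffs two run results with Python's standard difflib module. The run payloads and the helper name unified_run_diff are hypothetical; the real run schema and the CLI's own diff logic are not documented here.

```python
import difflib
import json


def unified_run_diff(run_a, run_b, label_a="run_a", label_b="run_b"):
    """Return a unified diff of two run results given as plain dicts."""
    # Serialise with sorted keys so key order never shows up as a change.
    text_a = json.dumps(run_a, indent=2, sort_keys=True).splitlines()
    text_b = json.dumps(run_b, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(text_a, text_b, fromfile=label_a, tofile=label_b, lineterm="")
    )


# Hypothetical payloads, used only to show the shape of the output.
before = {"label": "demo_simple_text", "score": 0.8}
after = {"label": "demo_simple_text", "score": 0.9}
print(unified_run_diff(before, after, "run 42", "run 45"))
```

Lines prefixed with - come from the first run and lines prefixed with + from the second, so a changed score appears as one removed line followed by one added line.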

Command Reference

rakam eval list evals

Usage: rakam eval list evals [OPTIONS] [DIRECTORY]

 List evaluations (functions decorated with @eval_run).

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive  -r        Recursively search for Python files                   │
│ --help                 Show this message and exit.                           │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval list runs

Usage: rakam eval list runs [OPTIONS]

 List runs (newest first).

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --limit   -l      INTEGER  Max number of runs [default: 20]                  │
│ --offset          INTEGER  Pagination offset [default: 0]                    │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯
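
The --limit and --offset options can be combined to page through a long run history. The sketch below only builds the command lines for each page; it assumes that --offset counts the number of runs to skip, which the help text does not state explicitly, and the helper names are hypothetical.

```python
def page_offsets(total, limit=20):
    """Return the offset of each page needed to cover `total` runs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total, limit))


def list_runs_commands(total, limit=20):
    """Build one `rakam eval list runs` command line per page."""
    return [
        ["rakam", "eval", "list", "runs", "--limit", str(limit), "--offset", str(offset)]
        for offset in page_offsets(total, limit)
    ]


# Covering 45 runs with the default page size takes three commands.
for argv in list_runs_commands(45):
    print(" ".join(argv))
```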

rakam eval run

Usage: rakam eval run [OPTIONS] [DIRECTORY]

 Execute evaluations (functions decorated with @eval_run).

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive   -r            Recursively search for Python files              │
│ --dry-run                   Only list functions without executing them       │
│ --save-runs                 Save each run result to a JSON file              │
│ --output-dir          PATH  Directory where run results are saved            │
│                             [default: eval_runs]                             │
│ --help                      Show this message and exit.                      │
╰──────────────────────────────────────────────────────────────────────────────╯
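
When driving rakam eval run from a script, for example in CI, it can help to assemble the command line programmatically. The sketch below uses only the options listed above; the function name build_run_command is hypothetical, and placing the options before the directory argument is an assumption about what the CLI accepts.

```python
import shlex


def build_run_command(directory="eval", recursive=False, dry_run=False,
                      save_runs=False, output_dir=None):
    """Build the argument list for `rakam eval run` from documented options."""
    argv = ["rakam", "eval", "run"]
    if recursive:
        argv.append("--recursive")
    if dry_run:
        argv.append("--dry-run")
    if save_runs:
        argv.append("--save-runs")
    if output_dir is not None:
        argv += ["--output-dir", str(output_dir)]
    argv.append(directory)
    return argv


# Preview which functions would run without executing them.
print(shlex.join(build_run_command(dry_run=True)))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues for paths that contain spaces.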

rakam eval show

Usage: rakam eval show [OPTIONS]

 Show a run by ID or tag. Without arguments, shows the most recent run.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --id    -i      INTEGER  Run ID                                              │
│ --tag   -t      TEXT     Run tag                                             │
│ --raw                    Print raw JSON instead of formatted output          │
│ --help                   Show this message and exit.                         │
╰──────────────────────────────────────────────────────────────────────────────╯
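
The --raw option makes the output easier to consume from other tools. The following sketch shows one way to load a run into Python, assuming that --raw writes a single JSON document to standard output and nothing else; that behaviour, the helper name load_run, and the injectable runner parameter are all assumptions made for illustration.

```python
import json
import subprocess


def load_run(run_id=None, tag=None, runner=subprocess.run):
    """Run `rakam eval show --raw` and parse its output as JSON.

    `runner` defaults to subprocess.run and can be swapped out in tests.
    """
    argv = ["rakam", "eval", "show", "--raw"]
    if run_id is not None:
        argv += ["--id", str(run_id)]
    if tag is not None:
        argv += ["--tag", tag]
    # check=True raises CalledProcessError if the command exits with an error.
    completed = runner(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

With no arguments, load_run() requests the most recent run, matching the default behaviour described above.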

rakam eval compare

Usage: rakam eval compare [OPTIONS]

 Compare two evaluation runs.

 Default: unified git diff

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --tag           -t      TEXT     Run tag                                     │
│ --id            -i      INTEGER  Run ID                                      │
│ --summary                        Show summary diff only                      │
│ --side-by-side                   Show side-by-side diff (git)                │
│ --help                           Show this message and exit.                 │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval tag

Usage: rakam eval tag [OPTIONS]

 Assign a tag to a run or delete a tag.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --id      -i      INTEGER  Run ID                                            │
│ --tag     -t      TEXT     Tag to assign to the run                          │
│ --delete          TEXT     Delete a tag                                      │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯

rakam eval metrics list

Usage: rakam eval metrics list [OPTIONS] [DIRECTORY]

 List all metric types used by loaded eval configs.

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│   directory      [DIRECTORY]  Directory to scan (default: ./eval)            │
│                               [default: eval]                                │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --recursive  -r        Recursively search for Python files                   │
│ --help                 Show this message and exit.                           │
╰──────────────────────────────────────────────────────────────────────────────╯

Documentation

For the full user guide, see the official documentation.

License

See the main project LICENSE file.
