
LLMLog Engine

A high-performance columnar scan engine for LLM logs stored as JSONL. Built in C++ with SIMD-friendly data structures, exposed via Python bindings.

Overview

LLMLog Engine is a specialized, embedded columnar database designed specifically for analyzing LLM application logs. It provides:

  • Fast JSONL ingestion into columnar format
  • Efficient filtering on numeric and string columns
  • Group-by aggregations (COUNT, SUM, AVG, MIN, MAX)
  • Dictionary encoding for low-cardinality string columns
  • SIMD-friendly memory layout for future performance optimization

The core is implemented in C++17 with columnar storage, while the user-facing API is clean Python with pandas integration.

Installation

From Source (Development)

git clone <repo>
cd llmlog_engine
pip install -e .

Requires:

  • Python 3.8+
  • C++17 compiler
  • cmake 3.15+
  • pybind11 (installed via pip)

Quick Start

from llmlog_engine import LogStore

# Load JSONL logs
store = LogStore.from_jsonl("logs.jsonl")

# Create a query
result = (store.query()
    .filter(model="gpt-4.1", min_latency_ms=1000)
    .aggregate(
        by=["model", "route"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "avg_tokens_out": "avg(tokens_output)"
        }
    ))

print(result)

Supported Fields

The engine expects JSONL records with these fields:

Field          Type    Notes
ts             string  Timestamp (ISO 8601 or custom format)
session_id     string  Session identifier
model          string  Model name (dictionary-encoded)
latency_ms     int     Response latency in milliseconds
tokens_input   int     Input token count
tokens_output  int     Output token count
route          string  API route/endpoint (dictionary-encoded)
status         string  Response status: "ok", "error", etc. (dictionary-encoded)
error_type     string  Error category (optional)
tags           array   Metadata tags (future support)

All fields are optional with sensible defaults.
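A single record with these fields might look like the following (a hypothetical example; all values are illustrative):

```python
import json

# Illustrative JSONL record with the fields above (values are made up)
record = {
    "ts": "2024-05-01T12:34:56Z",
    "session_id": "sess-0042",
    "model": "gpt-4.1",
    "latency_ms": 1203,
    "tokens_input": 512,
    "tokens_output": 214,
    "route": "chat",
    "status": "ok",
}

line = json.dumps(record)   # one line of the JSONL file
parsed = json.loads(line)
print(parsed["model"], parsed["latency_ms"])
```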

API Reference

LogStore

Main table class for columnar storage.

LogStore.from_jsonl(path: str) -> LogStore

Load a JSONL file into the store.

store = LogStore.from_jsonl("logs.jsonl")

row_count() -> int

Get number of loaded rows.

n = store.row_count()

basic_stats() -> dict

Get basic statistics (min, max, avg latency; cardinalities).

stats = store.basic_stats()
print(stats["latency_ms_min"])

query() -> Query

Create a new query builder.

q = store.query()

Query

Query builder for filtering and aggregation.

filter(**kwargs) -> Query

Add filter predicates. All filters are combined with AND logic.

Supported filter parameters:

  • model (str): Exact match on model name
  • route (str): Exact match on route
  • status (str): Exact match on status
  • min_latency_ms (int): Minimum latency
  • max_latency_ms (int): Maximum latency
  • min_tokens_input (int): Minimum input tokens
  • max_tokens_input (int): Maximum input tokens
  • min_tokens_output (int): Minimum output tokens
  • max_tokens_output (int): Maximum output tokens

q = store.query().filter(
    model="gpt-4.1",
    min_latency_ms=1000,
    route="chat"
)

aggregate(by: list[str], metrics: dict[str, str]) -> pd.DataFrame

Compute aggregations grouped by specified columns.

Metric expressions:

  • "count" — Row count
  • "sum(column)" — Sum of numeric column
  • "avg(column)" — Average of numeric column
  • "min(column)" — Minimum value
  • "max(column)" — Maximum value

result = q.aggregate(
    by=["model", "route"],
    metrics={
        "count": "count",
        "avg_latency": "avg(latency_ms)",
        "max_latency": "max(latency_ms)",
        "total_output": "sum(tokens_output)"
    }
)
# Returns pandas DataFrame

If by is omitted or empty, aggregates over all matched rows:

result = store.query().aggregate(
    metrics={"count": "count", "avg_latency": "avg(latency_ms)"}
)

Example Usage

Filter and Group by Model

from llmlog_engine import LogStore

store = LogStore.from_jsonl("production_logs.jsonl")

# Analyze slow responses by model
slow_by_model = (store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "min_latency": "min(latency_ms)",
            "max_latency": "max(latency_ms)"
        }
    ))

print(slow_by_model)

Multi-Dimension Analysis

# Analyze error rates by model and route
errors_by_model_route = (store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"}
    ))

print(errors_by_model_route)

Summary Statistics

# Overall stats
stats = store.basic_stats()
print(f"Total rows: {stats['row_count']}")
print(f"Avg latency: {stats['latency_ms_avg']:.1f}ms")
print(f"Max latency: {stats['latency_ms_max']}ms")
print(f"Unique models: {stats['model_cardinality']}")

Performance

Architecture Optimizations

  1. Columnar Storage: Data organized by column, not row. Enables:

    • Efficient filtering on single columns
    • Better CPU cache utilization
    • Vectorization opportunities

  2. Dictionary Encoding: Low-cardinality string columns (model, route, status) mapped to int32 IDs:

    • Faster equality comparisons
    • Smaller memory footprint
    • Consistent performance regardless of string length

  3. Contiguous Numeric Arrays: int32_t columns stored as dense vectors:

    • SIMD-friendly layout
    • Efficient range filtering
    • Minimal memory overhead
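As a toy illustration of point 3, a range filter over a dense numeric column reduces to one tight pass over contiguous values, which is what makes the layout SIMD-friendly (pure-Python sketch; the values are made up):

```python
latency_ms = [423, 1203, 512, 987, 1500]  # contiguous numeric column

# Boolean mask for the rows matching min_latency_ms=1000
mask = [v >= 1000 for v in latency_ms]
print(mask)             # [False, True, False, False, True]
print(sum(mask), "rows match")
```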

Benchmark Results

On a 100,000-row log file:

Pure Python:     0.8234s
C++ Engine:      0.1205s
Speedup:         6.8x faster

Query: Filter by model + latency, group by route, compute 6 metrics.
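For a sense of what the pure-Python baseline computes, the benchmark query is roughly equivalent to the following pandas expression (a sketch on tiny synthetic data, not the benchmark script itself; column values are made up):

```python
import pandas as pd

# Synthetic stand-in for the benchmark data (real runs use 100,000 rows)
df = pd.DataFrame({
    "model":      ["gpt-4.1", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1"],
    "route":      ["chat", "rag", "chat", "chat"],
    "latency_ms": [1203, 1500, 423, 987],
})

# Filter by model + latency, group by route, compute metrics
filtered = df[(df["model"] == "gpt-4.1") & (df["latency_ms"] >= 1000)]
result = filtered.groupby("route")["latency_ms"].agg(["count", "mean", "max"])
print(result)
```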

Architecture

User Code
    │
    ├─ Python API (LogStore, Query)
    │  └─ pandas DataFrame output
    │
    └─ C++ Core (_llmlog_engine module)
       ├─ DictionaryColumn (strings + int32 IDs)
       ├─ NumericColumn<T> (contiguous arrays)
       └─ LogStore (main engine)
          ├─ ingest_from_jsonl()
          ├─ apply_filter() → boolean mask
          └─ aggregate() → grouped metrics

Memory Layout

Columnar format (after ingestion):

Column: model       [0, 1, 0, 2, 0, ...]  (int32 IDs)
Column: route       [0, 1, 0, 1, 0, ...]  (int32 IDs)
Column: latency_ms  [423, 1203, 512, ...]  (int32)
Column: tokens_out  [921, 214, 512, ...]   (int32)

Dictionary: model   {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
Dictionary: route   {0: "chat", 1: "rag"}

Limitations (v0)

  • In-memory only (no persistence or external storage yet)
  • No SQL-like expression parser (use Python kwargs for filters)
  • No support for complex data types (arrays, nested objects)
  • Single-threaded query execution
  • No distributed processing

Future Enhancements

  1. On-disk columnar format (memory-mapped access)
  2. Query expression parser for string-based filters
  3. Parallel scan/aggregation with thread pool
  4. SIMD micro-optimizations for filter loops
  5. Compression for numeric columns
  6. Support for timestamp parsing and range filters
  7. Approximate aggregations for large datasets

Development

Build from Source

mkdir build && cd build
cmake ..
make

Run Tests

pytest tests/test_basic.py -v

Run Benchmarks

python tests/test_bench.py

Implementation Notes

Dictionary Encoding

String columns like model, route, and status are dictionary-encoded:

  1. First occurrence of "gpt-4.1" gets ID 0, second occurrence also uses ID 0
  2. Comparisons done on int32 IDs (much faster)
  3. String storage is deduplicated

This is transparent to the user:

# You write:
store.query().filter(model="gpt-4.1")

# The engine internally:
# 1. Looks up "gpt-4.1" in dictionary → ID 1
# 2. Compares integer column against 1
# 3. Returns matching rows
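The encoding step itself can be sketched in a few lines of pure Python (a toy model of the idea, not the engine's C++ implementation; dict_encode is a hypothetical helper):

```python
def dict_encode(values):
    """Map each distinct string to a small integer ID, in first-seen order."""
    ids, dictionary = [], {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next unused ID
        ids.append(dictionary[v])
    return ids, dictionary

ids, dictionary = dict_encode(["gpt-4.1", "gpt-4.1-mini", "gpt-4.1", "gpt-4-turbo"])
print(ids)         # [0, 1, 0, 2]
print(dictionary)  # {'gpt-4.1': 0, 'gpt-4.1-mini': 1, 'gpt-4-turbo': 2}
```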

Filter Predicates

Filters are applied using a boolean mask:

std::vector<bool> mask(row_count_, true);  // initially every row matches
for (const auto& predicate : predicates) {
    for (size_t i = 0; i < row_count_; ++i) {
        // Clear the bit for any row that fails this predicate
        mask[i] = mask[i] && predicate.matches(i);
    }
}
// Now mask[i] == true only if row i matches ALL predicates (AND logic)
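The same AND-combination, expressed as a pure-Python sketch (the rows and predicates are illustrative):

```python
rows = [
    {"model_id": 1, "latency_ms": 1203},
    {"model_id": 0, "latency_ms": 423},
    {"model_id": 1, "latency_ms": 512},
]
predicates = [
    lambda r: r["model_id"] == 1,       # model == "gpt-4.1" after dictionary lookup
    lambda r: r["latency_ms"] >= 1000,  # min_latency_ms=1000
]

mask = [True] * len(rows)               # initially every row matches
for pred in predicates:
    mask = [m and pred(r) for m, r in zip(mask, rows)]
print(mask)  # [True, False, False]
```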

Aggregation

Once the mask is computed, aggregations scan only matching rows:

for (const auto& [group_key, row_indices] : groups) {
    for (size_t idx : row_indices) {
        // Sum, average, min, max operations
    }
}
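In pure-Python terms, group construction plus the per-group metric scan looks roughly like this (toy sketch with made-up data):

```python
from collections import defaultdict

routes     = [0, 1, 0, 1]                # dictionary-encoded group column
latency_ms = [423, 1203, 512, 900]
mask       = [True, True, True, False]   # rows surviving the filter step

# Bucket matching row indices by group key
groups = defaultdict(list)
for i, key in enumerate(routes):
    if mask[i]:
        groups[key].append(i)

# Scan only the matching rows in each group
for key, row_indices in groups.items():
    values = [latency_ms[i] for i in row_indices]
    print(key, len(values), sum(values) / len(values), max(values))
```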

License

MIT

Contributing

Pull requests welcome! Please include:

  • Tests for new features
  • Updated documentation
  • Benchmark results for performance changes

Contact

Questions or issues? Open a GitHub issue.
