
LLMLog Engine

A high-performance columnar scan engine for LLM logs stored as JSONL. Built in C++ with SIMD-friendly data structures, exposed via Python bindings.

Overview

LLMLog Engine is a specialized, embedded columnar database designed specifically for analyzing LLM application logs. It provides:

  • Fast JSONL ingestion into columnar format
  • Efficient filtering on numeric and string columns
  • Group-by aggregations (COUNT, SUM, AVG, MIN, MAX)
  • Dictionary encoding for low-cardinality string columns
  • SIMD-friendly memory layout for future performance optimization

The core is implemented in C++17 with columnar storage, while the user-facing API is clean Python with pandas integration.

Installation

From Source (Development)

git clone <repo>
cd llmlog_engine
pip install -e .

Requires:

  • Python 3.8+
  • C++17 compiler
  • cmake 3.15+
  • pybind11 (installed via pip)

Quick Start

from llmlog_engine import LogStore

# Load JSONL logs
store = LogStore.from_jsonl("logs.jsonl")

# Create a query
result = (store.query()
    .filter(model="gpt-4.1", min_latency_ms=1000)
    .aggregate(
        by=["model", "route"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "avg_tokens_out": "avg(tokens_output)"
        }
    ))

print(result)

Supported Fields

The engine expects JSONL records with these fields:

Field          Type    Notes
ts             string  Timestamp (ISO 8601 or custom format)
session_id     string  Session identifier
model          string  Model name (dictionary-encoded)
latency_ms     int     Response latency in milliseconds
tokens_input   int     Input token count
tokens_output  int     Output token count
route          string  API route/endpoint (dictionary-encoded)
status         string  Response status: "ok", "error", etc. (dictionary-encoded)
error_type     string  Error category (optional)
tags           array   Metadata tags (future support)

All fields are optional with sensible defaults.
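A single record with these fields might look like the following (a hypothetical example; all values are illustrative):

```python
import json

# Illustrative JSONL record with the fields above (values are made up)
record = {
    "ts": "2024-05-01T12:34:56Z",
    "session_id": "sess-0042",
    "model": "gpt-4.1",
    "latency_ms": 1203,
    "tokens_input": 512,
    "tokens_output": 214,
    "route": "chat",
    "status": "ok",
}

line = json.dumps(record)   # one line of the JSONL file
parsed = json.loads(line)
print(parsed["model"], parsed["latency_ms"])
```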

API Reference

LogStore

Main table class for columnar storage.

LogStore.from_jsonl(path: str) -> LogStore

Load a JSONL file into the store.

store = LogStore.from_jsonl("logs.jsonl")

row_count() -> int

Get number of loaded rows.

n = store.row_count()

basic_stats() -> dict

Get basic statistics (min, max, avg latency; cardinalities).

stats = store.basic_stats()
print(stats["latency_ms_min"])

query() -> Query

Create a new query builder.

q = store.query()

Query

Query builder for filtering and aggregation.

filter(**kwargs) -> Query

Add filter predicates. All filters are combined with AND logic.

Supported filter parameters:

  • model (str): Exact match on model name
  • route (str): Exact match on route
  • status (str): Exact match on status
  • min_latency_ms (int): Minimum latency
  • max_latency_ms (int): Maximum latency
  • min_tokens_input (int): Minimum input tokens
  • max_tokens_input (int): Maximum input tokens
  • min_tokens_output (int): Minimum output tokens
  • max_tokens_output (int): Maximum output tokens

q = store.query().filter(
    model="gpt-4.1",
    min_latency_ms=1000,
    route="chat"
)

aggregate(by: list[str], metrics: dict[str, str]) -> pd.DataFrame

Compute aggregations grouped by specified columns.

Metric expressions:

  • "count" — Row count
  • "sum(column)" — Sum of numeric column
  • "avg(column)" — Average of numeric column
  • "min(column)" — Minimum value
  • "max(column)" — Maximum value

result = q.aggregate(
    by=["model", "route"],
    metrics={
        "count": "count",
        "avg_latency": "avg(latency_ms)",
        "max_latency": "max(latency_ms)",
        "total_output": "sum(tokens_output)"
    }
)
# Returns pandas DataFrame

If by is omitted or empty, aggregates over all matched rows:

result = store.query().aggregate(
    metrics={"count": "count", "avg_latency": "avg(latency_ms)"}
)

Example Usage

Filter and Group by Model

from llmlog_engine import LogStore

store = LogStore.from_jsonl("production_logs.jsonl")

# Analyze slow responses by model
slow_by_model = (store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "min_latency": "min(latency_ms)",
            "max_latency": "max(latency_ms)"
        }
    ))

print(slow_by_model)

Multi-Dimension Analysis

# Analyze error rates by model and route
errors_by_model_route = (store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"}
    ))

print(errors_by_model_route)

Summary Statistics

# Overall stats
stats = store.basic_stats()
print(f"Total rows: {stats['row_count']}")
print(f"Avg latency: {stats['latency_ms_avg']:.1f}ms")
print(f"Max latency: {stats['latency_ms_max']}ms")
print(f"Unique models: {stats['model_cardinality']}")

Performance

Architecture Optimizations

  1. Columnar Storage: Data organized by column, not row. Enables:

    • Efficient filtering on single columns
    • Better CPU cache utilization
    • Vectorization opportunities

  2. Dictionary Encoding: Low-cardinality string columns (model, route, status) mapped to int32 IDs:

    • Faster equality comparisons
    • Smaller memory footprint
    • Consistent performance regardless of string length

  3. Contiguous Numeric Arrays: int32_t columns stored as dense vectors:

    • SIMD-friendly layout
    • Efficient range filtering
    • Minimal memory overhead
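As a toy illustration of point 3, a range filter over a dense numeric column reduces to one tight pass over contiguous values, which is what makes the layout SIMD-friendly (pure-Python sketch; the values are made up):

```python
latency_ms = [423, 1203, 512, 987, 1500]  # contiguous numeric column

# Boolean mask for the rows matching min_latency_ms=1000
mask = [v >= 1000 for v in latency_ms]
print(mask)             # [False, True, False, False, True]
print(sum(mask), "rows match")
```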

Benchmark Results

On a 100,000-row log file:

Pure Python:     0.8234s
C++ Engine:      0.1205s
Speedup:         6.8x faster

Query: Filter by model + latency, group by route, compute 6 metrics.
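For a sense of what the pure-Python baseline computes, the benchmark query is roughly equivalent to the following pandas expression (a sketch on tiny synthetic data, not the benchmark script itself; column values are made up):

```python
import pandas as pd

# Synthetic stand-in for the benchmark data (real runs use 100,000 rows)
df = pd.DataFrame({
    "model":      ["gpt-4.1", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1"],
    "route":      ["chat", "rag", "chat", "chat"],
    "latency_ms": [1203, 1500, 423, 987],
})

# Filter by model + latency, group by route, compute metrics
filtered = df[(df["model"] == "gpt-4.1") & (df["latency_ms"] >= 1000)]
result = filtered.groupby("route")["latency_ms"].agg(["count", "mean", "max"])
print(result)
```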

Architecture

User Code
    │
    ├─ Python API (LogStore, Query)
    │  └─ pandas DataFrame output
    │
    └─ C++ Core (_llmlog_engine module)
       ├─ DictionaryColumn (strings + int32 IDs)
       ├─ NumericColumn<T> (contiguous arrays)
       └─ LogStore (main engine)
          ├─ ingest_from_jsonl()
          ├─ apply_filter() → boolean mask
          └─ aggregate() → grouped metrics

Memory Layout

Columnar format (after ingestion):

Column: model       [0, 1, 0, 2, 0, ...]  (int32 IDs)
Column: route       [0, 1, 0, 1, 0, ...]  (int32 IDs)
Column: latency_ms  [423, 1203, 512, ...]  (int32)
Column: tokens_out  [921, 214, 512, ...]   (int32)

Dictionary: model   {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
Dictionary: route   {0: "chat", 1: "rag"}

Limitations (v0)

  • In-memory only (no persistence or external storage yet)
  • No SQL-like expression parser (use Python kwargs for filters)
  • No support for complex data types (arrays, nested objects)
  • Single-threaded query execution
  • No distributed processing

Future Enhancements

  1. On-disk columnar format (memory-mapped access)
  2. Query expression parser for string-based filters
  3. Parallel scan/aggregation with thread pool
  4. SIMD micro-optimizations for filter loops
  5. Compression for numeric columns
  6. Support for timestamp parsing and range filters
  7. Approximate aggregations for large datasets

Development

Build from Source

mkdir build && cd build
cmake ..
make

Run Tests

pytest tests/test_basic.py -v

Run Benchmarks

python tests/test_bench.py

Implementation Notes

Dictionary Encoding

String columns like model, route, and status are dictionary-encoded:

  1. First occurrence of "gpt-4.1" gets ID 0, second occurrence also uses ID 0
  2. Comparisons done on int32 IDs (much faster)
  3. String storage is deduplicated

This is transparent to the user:

# You write:
store.query().filter(model="gpt-4.1")

# The engine internally:
# 1. Looks up "gpt-4.1" in dictionary → ID 1
# 2. Compares integer column against 1
# 3. Returns matching rows
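The encoding step itself can be sketched in a few lines of pure Python (a toy model of the idea, not the engine's C++ implementation; dict_encode is a hypothetical helper):

```python
def dict_encode(values):
    """Map each distinct string to a small integer ID, in first-seen order."""
    ids, dictionary = [], {}
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next unused ID
        ids.append(dictionary[v])
    return ids, dictionary

ids, dictionary = dict_encode(["gpt-4.1", "gpt-4.1-mini", "gpt-4.1", "gpt-4-turbo"])
print(ids)         # [0, 1, 0, 2]
print(dictionary)  # {'gpt-4.1': 0, 'gpt-4.1-mini': 1, 'gpt-4-turbo': 2}
```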

Filter Predicates

Filters are applied using a boolean mask:

std::vector<bool> mask(row_count_, true);  // initially every row matches
for (const auto& predicate : predicates) {
    for (size_t i = 0; i < row_count_; ++i) {
        // Clear the bit for any row that fails this predicate
        mask[i] = mask[i] && predicate.matches(i);
    }
}
// Now mask[i] == true only if row i matches ALL predicates (AND logic)
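The same AND-combination, expressed as a pure-Python sketch (the rows and predicates are illustrative):

```python
rows = [
    {"model_id": 1, "latency_ms": 1203},
    {"model_id": 0, "latency_ms": 423},
    {"model_id": 1, "latency_ms": 512},
]
predicates = [
    lambda r: r["model_id"] == 1,       # model == "gpt-4.1" after dictionary lookup
    lambda r: r["latency_ms"] >= 1000,  # min_latency_ms=1000
]

mask = [True] * len(rows)               # initially every row matches
for pred in predicates:
    mask = [m and pred(r) for m, r in zip(mask, rows)]
print(mask)  # [True, False, False]
```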

Aggregation

Once the mask is computed, aggregations scan only matching rows:

for (const auto& [group_key, row_indices] : groups) {
    for (size_t idx : row_indices) {
        // Sum, average, min, max operations
    }
}
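In pure-Python terms, group construction plus the per-group metric scan looks roughly like this (toy sketch with made-up data):

```python
from collections import defaultdict

routes     = [0, 1, 0, 1]                # dictionary-encoded group column
latency_ms = [423, 1203, 512, 900]
mask       = [True, True, True, False]   # rows surviving the filter step

# Bucket matching row indices by group key
groups = defaultdict(list)
for i, key in enumerate(routes):
    if mask[i]:
        groups[key].append(i)

# Scan only the matching rows in each group
for key, row_indices in groups.items():
    values = [latency_ms[i] for i in row_indices]
    print(key, len(values), sum(values) / len(values), max(values))
```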

License

MIT

Contributing

Pull requests welcome! Please include:

  • Tests for new features
  • Updated documentation
  • Benchmark results for performance changes

Contact

Questions or issues? Open a GitHub issue.
