LLMLog Engine
A high-performance columnar scan engine for LLM logs stored as JSONL. Built in C++ with SIMD-friendly data structures, exposed via Python bindings.
Overview
LLMLog Engine is a specialized, embedded columnar database designed specifically for analyzing LLM application logs. It provides:
- Fast JSONL ingestion into columnar format
- Efficient filtering on numeric and string columns
- Group-by aggregations (COUNT, SUM, AVG, MIN, MAX)
- Dictionary encoding for low-cardinality string columns
- SIMD-friendly memory layout for future performance optimization
The core is implemented in C++17 with columnar storage, while the user-facing API is clean Python with pandas integration.
Installation
From Source (Development)
```
git clone <repo>
cd llmlog_engine
pip install -e .
```
Requires:
- Python 3.8+
- C++17 compiler
- cmake 3.15+
- pybind11 (installed via pip)
Quick Start
```python
from llmlog_engine import LogStore

# Load JSONL logs
store = LogStore.from_jsonl("logs.jsonl")

# Create a query
result = (store.query()
    .filter(model="gpt-4.1", min_latency_ms=1000)
    .aggregate(
        by=["model", "route"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "avg_tokens_out": "avg(tokens_output)"
        }
    ))

print(result)
```
Supported Fields
The engine expects JSONL records with these fields:
| Field | Type | Notes |
|---|---|---|
| `ts` | string | Timestamp (ISO 8601 or custom format) |
| `session_id` | string | Session identifier |
| `model` | string | Model name (dictionary-encoded) |
| `latency_ms` | int | Response latency in milliseconds |
| `tokens_input` | int | Input token count |
| `tokens_output` | int | Output token count |
| `route` | string | API route/endpoint (dictionary-encoded) |
| `status` | string | Response status: "ok", "error", etc. (dictionary-encoded) |
| `error_type` | string | Error category (optional) |
| `tags` | array | Metadata tags (future support) |
All fields are optional with sensible defaults.
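For reference, a single JSONL record with these fields might look like the sketch below. The values are invented for illustration, not taken from a real log:

```python
import json

# One illustrative log record using the fields above (values are invented).
record = {
    "ts": "2025-01-15T10:32:00Z",
    "session_id": "sess-001",
    "model": "gpt-4.1",
    "latency_ms": 1203,
    "tokens_input": 512,
    "tokens_output": 214,
    "route": "chat",
    "status": "ok",
}

# JSONL stores one JSON object per line; round-trip to check the shape.
line = json.dumps(record)
parsed = json.loads(line)
```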
API Reference
LogStore
Main table class for columnar storage.
`LogStore.from_jsonl(path: str) -> LogStore`

Load a JSONL file into the store.

```python
store = LogStore.from_jsonl("logs.jsonl")
```

`row_count() -> int`

Get the number of loaded rows.

```python
n = store.row_count()
```

`basic_stats() -> dict`

Get basic statistics (min, max, avg latency; column cardinalities).

```python
stats = store.basic_stats()
print(stats["latency_ms_min"])
```

`query() -> Query`

Create a new query builder.

```python
q = store.query()
```
Query
Query builder for filtering and aggregation.
`filter(**kwargs) -> Query`

Add filter predicates. All filters are combined with AND logic.

Supported filter parameters:

- `model` (str): Exact match on model name
- `route` (str): Exact match on route
- `status` (str): Exact match on status
- `min_latency_ms` (int): Minimum latency
- `max_latency_ms` (int): Maximum latency
- `min_tokens_input` (int): Minimum input tokens
- `max_tokens_input` (int): Maximum input tokens
- `min_tokens_output` (int): Minimum output tokens
- `max_tokens_output` (int): Maximum output tokens
```python
q = store.query().filter(
    model="gpt-4.1",
    min_latency_ms=1000,
    route="chat"
)
```
`aggregate(by: list[str], metrics: dict[str, str]) -> pd.DataFrame`

Compute aggregations grouped by the specified columns.

Metric expressions:

- `"count"`: Row count
- `"sum(column)"`: Sum of a numeric column
- `"avg(column)"`: Average of a numeric column
- `"min(column)"`: Minimum value
- `"max(column)"`: Maximum value
```python
result = q.aggregate(
    by=["model", "route"],
    metrics={
        "count": "count",
        "avg_latency": "avg(latency_ms)",
        "max_latency": "max(latency_ms)",
        "total_output": "sum(tokens_output)"
    }
)
# Returns a pandas DataFrame
```
If `by` is omitted or empty, the query aggregates over all matched rows:

```python
result = store.query().aggregate(
    metrics={"count": "count", "avg_latency": "avg(latency_ms)"}
)
```
Example Usage
Filter and Group by Model
```python
from llmlog_engine import LogStore

store = LogStore.from_jsonl("production_logs.jsonl")

# Analyze slow responses by model
slow_by_model = (store.query()
    .filter(min_latency_ms=500)
    .aggregate(
        by=["model"],
        metrics={
            "count": "count",
            "avg_latency": "avg(latency_ms)",
            "min_latency": "min(latency_ms)",
            "max_latency": "max(latency_ms)"
        }
    ))

print(slow_by_model)
```
Multi-Dimension Analysis
```python
# Count errors by model and route
errors_by_model_route = (store.query()
    .filter(status="error")
    .aggregate(
        by=["model", "route"],
        metrics={"count": "count"}
    ))

print(errors_by_model_route)
```
Summary Statistics
```python
# Overall stats
stats = store.basic_stats()
print(f"Total rows: {stats['row_count']}")
print(f"Avg latency: {stats['latency_ms_avg']:.1f}ms")
print(f"Max latency: {stats['latency_ms_max']}ms")
print(f"Unique models: {stats['model_cardinality']}")
```
Performance
Architecture Optimizations
- Columnar Storage: Data is organized by column, not by row. This enables:
  - Efficient filtering on single columns
  - Better CPU cache utilization
  - Vectorization opportunities
- Dictionary Encoding: Low-cardinality string columns (`model`, `route`, `status`) are mapped to int32 IDs:
  - Faster equality comparisons
  - Smaller memory footprint
  - Consistent performance regardless of string length
- Contiguous Numeric Arrays: `int32_t` columns are stored as dense vectors:
  - SIMD-friendly layout
  - Efficient range filtering
  - Minimal memory overhead
Benchmark Results
On a 100,000-row log file:
```
Pure Python: 0.8234s
C++ Engine:  0.1205s
Speedup:     6.8x faster
```
Query: Filter by model + latency, group by route, compute 6 metrics.
Architecture
```
User Code
│
├─ Python API (LogStore, Query)
│   └─ pandas DataFrame output
│
└─ C++ Core (_llmlog_engine module)
    ├─ DictionaryColumn (strings + int32 IDs)
    ├─ NumericColumn<T> (contiguous arrays)
    └─ LogStore (main engine)
        ├─ ingest_from_jsonl()
        ├─ apply_filter() → boolean mask
        └─ aggregate() → grouped metrics
```
Memory Layout
Columnar format (after ingestion):
```
Column: model       [0, 1, 0, 2, 0, ...]   (int32 IDs)
Column: route       [0, 1, 0, 1, 0, ...]   (int32 IDs)
Column: latency_ms  [423, 1203, 512, ...]  (int32)
Column: tokens_out  [921, 214, 512, ...]   (int32)

Dictionary: model {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
Dictionary: route {0: "chat", 1: "rag"}
```
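The same layout can be modeled in plain Python to show how a logical row is reassembled from column vectors plus dictionaries. The values below mirror the sketch above; the real engine does this in C++:

```python
# Columnar data modeled as parallel Python lists (illustrative values).
model_ids = [0, 1, 0, 2, 0]
route_ids = [0, 1, 0, 1, 0]
latency_ms = [423, 1203, 512, 987, 256]

model_dict = {0: "gpt-4.1-mini", 1: "gpt-4.1", 2: "gpt-4-turbo"}
route_dict = {0: "chat", 1: "rag"}

def decode_row(i: int) -> dict:
    """Reassemble logical row i from the column vectors."""
    return {
        "model": model_dict[model_ids[i]],
        "route": route_dict[route_ids[i]],
        "latency_ms": latency_ms[i],
    }

row1 = decode_row(1)
```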
Limitations (v0)
- In-memory only (no persistence or external storage yet)
- No SQL-like expression parser (use Python kwargs for filters)
- No support for complex data types (arrays, nested objects)
- Single-threaded query execution
- No distributed processing
Future Enhancements
- On-disk columnar format (memory-mapped access)
- Query expression parser for string-based filters
- Parallel scan/aggregation with thread pool
- SIMD micro-optimizations for filter loops
- Compression for numeric columns
- Support for timestamp parsing and range filters
- Approximate aggregations for large datasets
Development
Build from Source
```
mkdir build && cd build
cmake ..
make
```
Run Tests
```
pytest tests/test_basic.py -v
```
Run Benchmarks
```
python tests/test_bench.py
```
Implementation Notes
Dictionary Encoding
String columns like `model`, `route`, and `status` are dictionary-encoded:

- The first occurrence of "gpt-4.1" gets ID 0; every subsequent occurrence reuses ID 0
- Comparisons are done on int32 IDs, which is much faster than string comparison
- String storage is deduplicated
This is transparent to the user:
```python
# You write:
store.query().filter(model="gpt-4.1")

# The engine internally:
# 1. Looks up "gpt-4.1" in the dictionary → ID 1
# 2. Compares the integer column against 1
# 3. Returns matching rows
```
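A minimal pure-Python sketch of the encoding and the integer-comparison filter path (the real engine implements this in C++ over int32 vectors; IDs here depend on input order, not on the engine's actual assignments):

```python
def dict_encode(values):
    """Assign each distinct string an ID on first occurrence."""
    dictionary = {}  # string -> int ID
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return ids, dictionary

models = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1", "gpt-4.1"]
ids, dictionary = dict_encode(models)

# Filtering model == "gpt-4.1" becomes one dictionary lookup
# followed by integer comparisons over the ID column.
target = dictionary["gpt-4.1"]
matches = [i for i, x in enumerate(ids) if x == target]
```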
Filter Predicates
Filters are applied using a boolean mask:

```cpp
std::vector<bool> mask(row_count_, true);  // Initially all rows match

for (const auto& predicate : predicates) {
    // For each row, evaluate the predicate and AND it into the mask:
    // mask[i] = mask[i] && matches_predicate(row_i)
}

// Now mask[i] == true iff row_i matches ALL predicates (AND logic)
```
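The same AND-mask logic, sketched in Python over columnar lists. The two predicates and all values are illustrative:

```python
# Columnar data (illustrative). model ID 1 stands in for "gpt-4.1"
# after the dictionary lookup described above.
latency_ms = [423, 1203, 512, 1890, 256]
model_ids = [0, 1, 1, 1, 0]

mask = [True] * len(latency_ms)  # initially all rows match

# Predicate 1: model ID == 1
mask = [m and (model_ids[i] == 1) for i, m in enumerate(mask)]

# Predicate 2: latency_ms >= 1000
mask = [m and (latency_ms[i] >= 1000) for i, m in enumerate(mask)]

# mask[i] is True iff row i passed ALL predicates (AND logic)
matching_rows = [i for i, m in enumerate(mask) if m]
```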
Aggregation
Once the mask is computed, aggregations scan only matching rows:
```cpp
for (const auto& [group_key, row_indices] : groups) {
    for (size_t idx : row_indices) {
        // Accumulate sum / avg / min / max for this group
    }
}
```
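In Python terms, the grouped aggregation over a filter mask looks roughly like this. Group keys are dictionary-encoded model IDs; the data and mask are illustrative:

```python
from collections import defaultdict

# Illustrative columnar data and a mask produced by the filter stage.
model_ids = [0, 1, 0, 1, 1]
latency_ms = [423, 1203, 512, 1890, 990]
mask = [True, True, False, True, True]

# Build group key -> row indices, keeping only rows that passed the filter.
groups = defaultdict(list)
for i, keep in enumerate(mask):
    if keep:
        groups[model_ids[i]].append(i)

# Compute count and avg(latency_ms) per group.
result = {
    key: {
        "count": len(rows),
        "avg_latency": sum(latency_ms[i] for i in rows) / len(rows),
    }
    for key, rows in groups.items()
}
```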
License
MIT
Contributing
Pull requests welcome! Please include:
- Tests for new features
- Updated documentation
- Benchmark results for performance changes
Contact
Questions or issues? Open a GitHub issue.