GrandJury SDK — submit LLM traces for human evaluation + analytics client
grandjury
Get human feedback on your AI in 3 lines of Python.
from grandjury import GrandJury
gj = GrandJury() # reads GRANDJURY_API_KEY from env
gj.trace(name="chat", input=prompt, output=response, model="gpt-4o")
Then open your Jupyter notebook:
df = gj.results() # traces with human votes — as a DataFrame
print(f"Pass rate: {df['pass_rate'].mean():.1%}")
Patent Pending.
What is GrandJury?
HumanJudge is the platform: it connects your AI to a community of human reviewers who evaluate your model's outputs. GrandJury is its Python SDK — it sends traces and retrieves human evaluation results.
Write path: Log AI calls from your app → traces appear in your developer dashboard. Read path: Fetch evaluation results (votes, pass rates, reviewer feedback) into DataFrames for analysis.
Installation
pip install grandjury
Optional performance dependencies:
pip install grandjury[performance] # msgspec, pyarrow, polars
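The extras are optional speed-ups. A quick, generic way to check which of the three are importable in your environment (plain Python, not part of the SDK):

```python
import importlib.util

# The optional performance extras named above; each is used only if present.
extras = ["msgspec", "pyarrow", "polars"]
available = {name: importlib.util.find_spec(name) is not None for name in extras}
print(available)  # e.g. {'msgspec': False, 'pyarrow': True, 'polars': False}
```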
Quick Start
1. Register your model
Go to humanjudge.com/projects/new, register your AI, and copy the secret key.
export GRANDJURY_API_KEY=gj_sk_live_...
2. Log traces from your app
from grandjury import GrandJury
gj = GrandJury() # zero-config — reads from env
# Option A: Direct call
gj.trace(name="chat", input="What is ML?", output="Machine learning is...", model="gpt-4o")
# Option B: Decorator — auto-captures input/output/latency
@gj.observe(name="chat", model="gpt-4o")
def call_llm(prompt: str) -> str:
    return openai.chat(prompt)
# Option C: Context manager
with gj.span("chat", input=prompt) as s:
    response = call_llm(prompt)
    s.set_output(response)
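To make the decorator option concrete, here is a minimal local stand-in that captures input, output, and latency the way @gj.observe is described as doing. This is an illustrative sketch, not the SDK's implementation; `observe` and `last_trace` below are local names invented for the example:

```python
import functools
import time

def observe(name, model):
    """Sketch of an observe-style decorator: records input, output,
    and latency for each call. Not the actual SDK internals."""
    def wrapper(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            # A real implementation would send this dict as a trace
            # instead of stashing it on the function object.
            inner.last_trace = {
                "name": name,
                "model": model,
                "input": args[0] if args else kwargs,
                "output": result,
                "latency_ms": latency_ms,
            }
            return result
        return inner
    return wrapper

@observe(name="chat", model="gpt-4o")
def call_llm(prompt: str) -> str:
    return f"echo: {prompt}"

call_llm("What is ML?")
print(call_llm.last_trace["output"])  # echo: What is ML?
```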
3. Get human evaluation results
Once reviewers vote on your traces:
# Trace-level summary
df = gj.results()
# trace_id | input | output | model | pass_count | flag_count | total_votes | pass_rate
# Individual votes with reviewer identity
df_votes = gj.results(detail='votes')
# trace_id | voter_id | voter_name | verdict | flag_category | feedback | created_at
# Filter by benchmark
df_benchmark = gj.results(evaluation='marketing-benchmark')
# Export
df.to_parquet('evaluation_results.parquet')
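With the summary columns above, aggregate metrics are one-liners. A sketch on mock data in the documented shape (the DataFrame here is fabricated for illustration; real data comes from gj.results()):

```python
import pandas as pd

# Mock of the trace-level summary shape documented above.
df = pd.DataFrame({
    "trace_id": ["t1", "t2", "t3"],
    "pass_count": [4, 1, 0],
    "flag_count": [1, 4, 2],
})
df["total_votes"] = df["pass_count"] + df["flag_count"]
df["pass_rate"] = df["pass_count"] / df["total_votes"]

# The simple mean treats every trace equally; the vote-weighted rate
# gives heavily reviewed traces more influence.
simple = df["pass_rate"].mean()
weighted = df["pass_count"].sum() / df["total_votes"].sum()
print(f"simple: {simple:.1%}, weighted: {weighted:.1%}")
```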
4. Run analytics
Works on both live platform data and offline datasets:
# Auto-fetch from platform
gj.analytics.vote_histogram()
gj.analytics.population_confidence(voter_list=[...])
# Or pass your own data
import pandas as pd
df = pd.read_csv("my_votes.csv")
gj.analytics.vote_histogram(df)
gj.analytics.votes_distribution(df)
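For intuition, here is roughly what a votes-per-inference computation looks like in plain pandas, using mock data with the documented vote columns (the SDK's votes_distribution may differ in detail):

```python
import pandas as pd

# Mock individual-vote data in the documented shape
# (trace_id | voter_id | verdict | ...).
votes = pd.DataFrame({
    "trace_id": ["t1", "t1", "t1", "t2", "t2", "t3"],
    "voter_id": ["v1", "v2", "v3", "v1", "v2", "v1"],
    "verdict": ["pass", "pass", "flag", "flag", "pass", "pass"],
})

# Votes per inference: how many human votes each trace received.
per_trace = votes.groupby("trace_id").size()
print(per_trace.to_dict())  # {'t1': 3, 't2': 2, 't3': 1}

# Distribution of those counts: how many traces got k votes.
distribution = per_trace.value_counts().sort_index()
print(distribution.to_dict())  # {1: 1, 2: 1, 3: 1}
```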
Enroll in Benchmarks
List and enroll your model in open benchmarks programmatically:
# Browse available benchmarks
benchmarks = gj.benchmarks.list()
# Enroll with endpoint config
gj.benchmarks.enroll(
    benchmark_id="...",
    model_id="...",
    endpoint_config={
        "endpoint": "https://api.myapp.com/v1/chat/completions",
        "apiKey": "sk-...",
        "request_template": '{"model":"gpt-4o","messages":[{"role":"user","content":"{{prompt}}"}]}',
        "response_path": "choices[0].message.content"
    }
)
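The {{prompt}} placeholder and response_path suggest simple template substitution plus path extraction. A hypothetical sketch of that mechanism (`render_request` and `extract` are illustrative names, not SDK or platform functions):

```python
import json

# The template and path from the enroll example above.
request_template = '{"model":"gpt-4o","messages":[{"role":"user","content":"{{prompt}}"}]}'
response_path = "choices[0].message.content"

def render_request(template: str, prompt: str) -> dict:
    # JSON-escape the prompt so quotes and newlines keep the body valid;
    # json.dumps adds surrounding quotes, which [1:-1] strips.
    return json.loads(template.replace("{{prompt}}", json.dumps(prompt)[1:-1]))

def extract(response: dict, path: str):
    """Walk a dotted path with [i] index segments, e.g. choices[0].message.content."""
    value = response
    for part in path.split("."):
        if "[" in part:
            key, idx = part[:-1].split("[")
            value = value[key][int(idx)]
        else:
            value = value[part]
    return value

body = render_request(request_template, 'Explain "RAG" briefly')
fake_response = {"choices": [{"message": {"content": "RAG is..."}}]}
print(extract(fake_response, response_path))  # RAG is...
```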
Analytics Methods
All analytics methods work on both platform data (gj.results(detail='votes')) and offline data (pandas/polars/CSV/parquet):
| Method | Description |
|---|---|
| `gj.analytics.evaluate_model()` | Decay-adjusted scoring |
| `gj.analytics.vote_histogram()` | Vote time distribution |
| `gj.analytics.vote_completeness()` | Completeness per voter |
| `gj.analytics.population_confidence()` | Confidence metrics |
| `gj.analytics.majority_good_votes()` | Threshold analysis |
| `gj.analytics.votes_distribution()` | Votes per inference |
Privacy
- gj.results() only returns traces with at least 1 human vote (privacy gate)
- Zero-vote traces are invisible to the SDK — only visible on the web dashboard
- Reviewer identity is public (consistent with the platform's public profile/leaderboard model)
API Reference
gj = GrandJury(
    api_key=None,  # reads GRANDJURY_API_KEY from env if not provided
    base_url="https://grandjury-server.onrender.com",
    timeout=5.0,
)
# Write
gj.trace(name, input, output, model, latency_ms, metadata, gj_inference_id)
await gj.atrace(...) # async version (requires httpx)
gj.observe(name, model, metadata) # decorator
gj.span(name, input, model, metadata) # context manager
# Read
gj.results(detail=None, evaluation=None) # returns DataFrame or list[dict]
# Browse
gj.models.list()
gj.models.get(model_id)
gj.benchmarks.list()
gj.benchmarks.enroll(benchmark_id, model_id, endpoint_config)
# Analytics
gj.analytics.evaluate_model(...)
gj.analytics.vote_histogram(data=None, ...)
gj.analytics.vote_completeness(data=None, voter_list=None, ...)
gj.analytics.population_confidence(data=None, voter_list=None, ...)
gj.analytics.majority_good_votes(data=None, ...)
gj.analytics.votes_distribution(data=None, ...)
Contributing
See CONTRIBUTING.md for development setup, testing, and PR guidelines.
License
See LICENSE.
File details
Details for the file grandjury-2.1.3.tar.gz.
File metadata
- Download URL: grandjury-2.1.3.tar.gz
- Upload date:
- Size: 219.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 196d124574ba6d6e11133b53a88718eef39d2eceaee0168303ea2afd52c7b176 |
| MD5 | 0c6716d0ff965ae5517899afa9ac2612 |
| BLAKE2b-256 | 3ea0ded4880b0165832542aa62c3e34d9d6389dc278fa93c714be6b4fd105776 |
File details
Details for the file grandjury-2.1.3-py3-none-any.whl.
File metadata
- Download URL: grandjury-2.1.3-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc7e08b538740267a4503359df4c785fb3baf1c8ffd94f7feef4f197bb2f1d68 |
| MD5 | d209970c96309ec006ccc6cfdfaf7e64 |
| BLAKE2b-256 | 82e5a3f2abe73483d99e3f14289513aef1cb35f92272113eff11bbb9fd5b608e |