Multi-agent codebase evaluation and reliability optimization.
Project description
OneCode - Agentic Codebase Evaluation
Quantify agent and GenAI reliability • Analyze, search, and refactor codebases • Run and debug code using natural language • Intelligent code retrieval via semantic knowledge graphs • Track agent improvements over time.
Quick Navigation
- Evaluation Metrics
- Example: Evaluation Output
- Installation
- Setup
- How to run
- Example queries
- Development
- License
Evaluation Metrics
OneCode evaluates agents using industry-standard metrics:
Core GenAI Reliability
- Faithfulness - How faithful the output is to the provided context
- Hallucination - How far the output diverges from the context (lower is better)
- Answer Accuracy - How closely the answer matches the ground truth
Agent-Specific
- Agent Goal Accuracy - Did the agent achieve its intended objective?
- Tool Call F1 - Precision and recall of tool invocations
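For reference, Tool Call F1 can be derived from precision and recall over the sets of expected and actual tool invocations. This is a generic sketch of the standard formula, not OneCode's internal implementation, and the function name is our own:

```python
def tool_call_f1(expected, actual):
    """F1 over tool invocations.

    precision = correct calls / calls actually made
    recall    = correct calls / calls that should have been made
    Generic sketch; not OneCode's internal implementation.
    """
    expected_set, actual_set = set(expected), set(actual)
    correct = len(expected_set & actual_set)
    if correct == 0:
        return 0.0
    precision = correct / len(actual_set)
    recall = correct / len(expected_set)
    return 2 * precision * recall / (precision + recall)
```

For example, if the agent was expected to call `search` and `read_file` but called `search` and `run_tests`, precision and recall are both 0.5, giving an F1 of 0.5.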
Quality & Coherence
- Answer Relevancy - How relevant the output is to the input question
- Response Groundedness - How grounded the response is in retrieved context
Retrieval Quality
- Context Precision - Ratio of relevant to total retrieved context chunks
- Context Recall - Ratio of retrieved to total relevant context chunks
- Context Relevance - How relevant the retrieved context is to the question
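The two ratio metrics above follow the standard precision/recall pattern over context chunks. A minimal sketch, using set overlap as a stand-in for whatever relevance judgment the evaluator actually makes:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant.
    Set-overlap sketch; the real evaluator judges relevance with an LLM."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    if not retrieved_set:
        return 0.0
    return len(retrieved_set & relevant_set) / len(retrieved_set)


def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were actually retrieved."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    if not relevant_set:
        return 0.0
    return len(retrieved_set & relevant_set) / len(relevant_set)
```

With `retrieved = ["a", "b", "c", "d"]` and `relevant = ["a", "b"]`, precision is 0.5 (half the retrieved chunks mattered) while recall is 1.0 (everything relevant was found).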
Context-Aware Datasets
OneCode automatically generates test datasets tailored to each module by analyzing its purpose and code. These datasets are:
- Automatically refreshed when module code changes
- Reused consistently across evaluation runs for reliable trend tracking
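One common way to detect "module code changes" is to fingerprint the module source and invalidate the cached dataset when the fingerprint moves. This sketch assumes content hashing; OneCode's actual change-detection mechanism may differ:

```python
import hashlib
from pathlib import Path


def module_fingerprint(path):
    """SHA256 of a module's source bytes, usable as a cache key.
    Illustrative sketch; not OneCode's actual mechanism."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def dataset_is_stale(path, cached_fingerprint):
    """True when the module changed since the dataset was generated."""
    return module_fingerprint(path) != cached_fingerprint
```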
Example: Evaluation Output
You: evaluate the summarizer agent
Here is the complete evaluation report for the Summarizer Agent
(agents/summarizer.py):
Metric Scores (5 samples)
- Hallucination: 0.90 ✗ Critical (lower is better)
- Answer Accuracy: 0.45 ⚠ Needs Improvement
- Context Precision: 0.27 ✗ Critical
- Answer Relevancy: 0.39 ✗ Critical
- Faithfulness: 0.10 ✗ Critical
- Response Groundedness: 0.10 ✗ Critical
Root Cause Analysis:
Faithfulness (0.10) & Response Groundedness (0.10) — The agent is
largely fabricating content rather than grounding summaries in the
provided input. This is a fundamental failure for a summarizer.
Comparison with Prior Run (3 days ago):
- Faithfulness: 0.10 (no change)
- Answer Accuracy: 0.45 (↑ +0.05 improvement)
- Context Precision: 0.27 (↓ -0.12 regression)
Recommendations:
1. Add input validation to reject malformed text
2. Implement a grounding constraint that requires citations
3. Test with diverse document types
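The ✓/⚠/✗ labels in the report above can be reproduced with a small scoring helper. The thresholds here are illustrative guesses, not OneCode's actual cutoffs:

```python
def metric_status(score, lower_is_better=False):
    """Map a 0-1 metric score to a report label.
    Thresholds (0.7 / 0.4) are illustrative guesses, not OneCode's cutoffs."""
    if lower_is_better:
        score = 1.0 - score  # e.g. Hallucination: a high raw score is bad
    if score >= 0.7:
        return "✓ Good"
    if score >= 0.4:
        return "⚠ Needs Improvement"
    return "✗ Critical"
```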
Accountability & Comparative Analysis
Track agent improvements over time and compare across versions:
You: how does this agent compare to last week's version?
→ Shows metrics side-by-side with delta (+/- changes)
You: which agents regressed in the last evaluation?
→ Flags agents with metric drops and explains why
You: show me the evaluation history for the coder agent
→ Displays trend chart showing faithfulness, accuracy over time
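The regression check can be sketched as a diff over two runs' metric dictionaries. `find_regressions` is a hypothetical helper for illustration, not part of the onecode CLI:

```python
def find_regressions(current, previous, tol=0.0):
    """Return metrics whose score dropped by more than `tol` between runs.
    Hypothetical helper; OneCode tracks history internally."""
    return {
        name: round(current[name] - previous[name], 3)
        for name in current
        if name in previous and current[name] - previous[name] < -tol
    }
```

Given the sample report, only Context Precision (0.39 → 0.27) would be flagged; an unchanged Faithfulness and an improved Answer Accuracy are left out.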
Installation
From PyPI:
```shell
pip install onecode-cli
```
Setup
1. Configure environment
Provide API keys using one of two methods:
Method A: Create a .env file. Add API keys to .env in your project or home directory; OPENAI_API_KEY is always required (used for embeddings):

```shell
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...  # only needed for Claude models
```
Method B: Export environment variables:

```shell
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...  # only needed for Claude models
```
How to run
After installation, the onecode command is available globally:
```shell
# Default model (claude-sonnet-4-6) with an explicit path
onecode ~/path/to/project

# Use the current directory (default if no path is specified)
onecode

# Specify a different model
onecode --model gpt-4o

# From within the codebase directory (same as above)
cd ~/myproject
onecode
```
First installation check:

```shell
onecode --help
```
First run — output looks like this:
```
$ onecode ~/myproject
OneCode - Codebase Analyzer
----------------------------------------
Model: claude-sonnet-4-6
Indexing: /Users/you/myproject
Ready: 42 nodes (class:12, file:18, function:12) | 42 embeddings
Type a question or task (or 'exit' to quit).
----------------------------------------
You:
```
Example queries
Evaluate code quality with RAGAS metrics
You: evaluate the codebase
You: what is the accuracy of the coder agent?
You: compare this run with the previous evaluation
Understand the codebase
You: what does this codebase do?
You: explain the authentication flow
You: what agents/modules are in this project?
Find specific code
You: search for all calls to connect_db
You: where is the retry logic implemented?
You: find all async functions
Write and modify code
You: add input validation to the login function
You: write a utility function that validates emails
You: refactor the parse_config function to handle missing keys gracefully
Write, run, and debug
You: create a function that reverses a string, write a test for it, and run the test
You: add a health check endpoint and run the server to verify it starts
You: debug why the executor agent is failing on error handling
File management
You: rename src/helpers.py to src/utils.py
You: delete the tmp/ directory
You: move all test files into a tests/ directory
Git operations
You: show git status
You: show the diff of uncommitted changes
You: commit all staged files with message "add retry logic"
Project details
File details
Details for the file onecode_cli-0.1.6.tar.gz.
File metadata
- Download URL: onecode_cli-0.1.6.tar.gz
- Upload date:
- Size: 61.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `13dbfc6f53516de458981a247a5ff4298dfd1bb0ead969e5baa4947e727001be` |
| MD5 | `f2bd43ded772bb6c0733a6ae61bda81c` |
| BLAKE2b-256 | `aa5d410378f922a92931c65450214807d4c66093df694003ccee8706dceb7e73` |
File details
Details for the file onecode_cli-0.1.6-py3-none-any.whl.
File metadata
- Download URL: onecode_cli-0.1.6-py3-none-any.whl
- Upload date:
- Size: 62.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d71897549061ffd986a01c90f473f0e8c8035c5571e0b60b048679eb6c316c3c` |
| MD5 | `204693acb7d4a922142452d1d5d98a32` |
| BLAKE2b-256 | `da5dfa67b2f618e7dba738f1343a1351191325858415bbaf3c13f8cca04bbacd` |
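To verify a downloaded artifact against the hashes published above, compute its SHA256 locally and compare. `sha256_of` is a plain stdlib helper, not part of OneCode:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its SHA256 hex digest,
    for comparison against the published hash table."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# expected = "<SHA256 from the table above>"
# assert sha256_of("onecode_cli-0.1.6.tar.gz") == expected
```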