
OneCode

Evaluate, analyze, and refactor multi-agent codebases in natural language.

OneCode is an AI system designed specifically for evaluating agentic workflows. Measure agent reliability, test coverage, and output quality with 10 RAGAS metrics. Then analyze, refactor, and improve your code—all through natural conversation.

Point OneCode at any project folder, and ask it anything:

  • 📊 "Evaluate the faithfulness of my coding agent"
  • 🤔 "Which agent has the least tool calling accuracy?"
  • 🔍 "Evaluate the entire workflow"
  • ✏️ "Add input validation to the login function"
  • 🧪 "Write a test for this function and run it"
  • 📁 "Move all test files to a tests/ directory"

No complex commands. No context switching. Just natural conversation.


Why OneCode?

Problem                               Solution
Measuring agent reliability           Automatic RAGAS evaluation (faithfulness, accuracy, hallucination)
Understanding unfamiliar codebases    Semantic search + AI analysis
Time-consuming refactoring            AI-powered code modification
Manual testing & debugging            Automated test generation & self-correction
Context switching between tools       Single natural language interface
Code maintenance at scale             Intelligent file operations & git integration

Evaluation Metrics

OneCode evaluates agents using industry-standard metrics:

Core Reliability (Critical)

  • Faithfulness — How faithful the output is to the provided context (foundational)
  • Hallucination — How far the output diverges from the context; lower is better
  • Answer Accuracy — How closely the answer matches the ground truth

Agent-Specific (Critical for agentic systems)

  • Agent Goal Accuracy — Did the agent achieve its intended objective?
  • Tool Call F1 — Precision and recall of tool invocations, combined into one score (critical for multi-agent workflows; see the sketch after this list)
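
Tool Call F1 is the harmonic mean of tool-call precision and recall. A minimal sketch of the computation, assuming expected and actual calls are compared as (tool name, arguments) pairs; the helper below is illustrative, not OneCode's implementation:

def tool_call_f1(expected: set, actual: set) -> float:
    """Harmonic mean of precision and recall over (tool, args) pairs."""
    if not expected or not actual:
        return 0.0
    true_positives = len(expected & actual)
    precision = true_positives / len(actual)    # correct calls / calls made
    recall = true_positives / len(expected)     # correct calls / calls expected
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 2 of 3 actual calls match, covering 2 of 4 expected calls,
# so precision = 2/3, recall = 1/2, F1 ≈ 0.57.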

Quality & Coherence

  • Answer Relevancy — How relevant the output is to the input question
  • Response Groundedness — How grounded the response is in retrieved context

Retrieval Quality (Diagnostic)

  • Context Precision — Ratio of relevant chunks to total retrieved chunks
  • Context Recall — Ratio of relevant chunks retrieved to all relevant chunks in the corpus (see the worked example after this list)
  • Context Relevance — How relevant the retrieved context is to the question
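
A worked example with illustrative numbers (not OneCode output): suppose the retriever returns 5 chunks, 4 of which are relevant, while the corpus contains 8 relevant chunks in total.

retrieved_total = 5        # chunks returned by the retriever
relevant_retrieved = 4     # of those, chunks that are actually relevant
relevant_total = 8         # relevant chunks available in the corpus

context_precision = relevant_retrieved / retrieved_total   # 4/5 = 0.8
context_recall = relevant_retrieved / relevant_total       # 4/8 = 0.5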

Evaluation Speed:

  • Use "quick" for fast evaluation (2 samples)
  • Use "comprehensive" for detailed analysis (10 samples)
  • Default: 5 samples

Note: Evaluation uses gpt-4o-mini for reliable metrics computation, independent of your chosen model.


Installation

From PyPI:

pip install onecode-cli

Setup

1. Configure environment

Provide API keys using one of two methods:

Method A: Create a .env file

Add API keys to .env in your project or home directory. OPENAI_API_KEY is always required (used for embeddings):

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...   # only needed for Claude models

Method B: Export environment variables

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...   # only needed for Claude models
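
To confirm the variables are visible before launching, a quick check from the same shell (a sanity-check sketch, not an OneCode command):

import os

# Print whether the keys OneCode needs are present in the environment.
print("OPENAI_API_KEY set:   ", bool(os.environ.get("OPENAI_API_KEY")))
print("ANTHROPIC_API_KEY set:", bool(os.environ.get("ANTHROPIC_API_KEY")))  # Claude models only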

How to run

After installation, the onecode command is available globally:

# Default model (claude-sonnet-4-6) with explicit path
onecode ~/path/to/project

# Use current directory (default if no path specified)
onecode

# Specify a different model
onecode --model gpt-4o

# From within the codebase directory (same as running onecode with no path)
cd ~/myproject
onecode

To verify the installation:

onecode --help

First run — output looks like this:

$ onecode ~/myproject
OneCode - Codebase Analyzer
----------------------------------------
Model:    claude-sonnet-4-6
Indexing: /Users/you/myproject
Ready:    42 nodes (class:12, file:18, function:12) | 42 embeddings

Type a question or task (or 'exit' to quit).
----------------------------------------

You: 

Subsequent runs produce the same output.

Development

Setup for development

pip install -e ".[dev]"

This installs OneCode in development mode with test dependencies.

Run tests

pytest tests/              # Run all tests
pytest tests/ -v           # Verbose output with details
pytest tests/ -v --tb=short  # With short error tracebacks

Test coverage

The test suite validates the evaluation system:

  • Query parsing (6 tests) — Natural language query parsing for target module, metrics, and sample count
    • Extracts agent names and metric aliases correctly
    • Handles sample count keywords (quick=2, comprehensive=10); see the sketch after this list
  • Target selection (4 tests) — Module matching and filtering logic
    • Exact filename matching (prevents false positives)
    • Substring fallback for flexibility
    • "codebase" target returns all modules
  • Metrics filtering (3 tests) — Metrics display selection
    • Shows only requested metrics when specified
    • Shows all metrics when none specified
    • Gracefully handles missing metric values
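
A hedged sketch of such a test, using a hypothetical parse_query helper (the import path, function name, and result shape are assumptions, not OneCode's actual API):

import pytest

from onecode.evaluation import parse_query  # hypothetical import path

@pytest.mark.parametrize("query, expected_samples", [
    ("quick evaluation of reader agent", 2),          # "quick" keyword
    ("comprehensive evaluation of all modules", 10),  # "comprehensive" keyword
    ("evaluate the codebase", 5),                     # default sample count
])
def test_sample_count_keywords(query, expected_samples):
    parsed = parse_query(query)
    assert parsed.sample_count == expected_samples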

Example queries

Evaluate code quality with RAGAS metrics

You: evaluate the codebase
You: quick evaluation of reader agent focusing on faithfulness
You: what is the accuracy of the coder agent?
You: comprehensive evaluation of all modules
You: evaluate the code writing agent
You: which agent has the best answer relevancy?

Understand the codebase

You: what does this codebase do?
You: explain the authentication flow
You: what classes exist and what are their responsibilities?
You: how does the database connection work?
You: analyze the modules in this codebase
You: what agents/modules are in this project?

Find specific code

You: search for all calls to connect_db
You: search for TODO comments
You: where is the retry logic implemented?
You: find all async functions

Write and modify code

You: add input validation to the login function
You: write a utility function that paginates a list and add it to utils.py
You: refactor the parse_config function to handle missing keys gracefully

Write, run, and self-correct

You: create a function that reverses a string, write a test for it, and run the test
You: add a health check endpoint and run the server to verify it starts
You: write a script that counts lines of code per file and run it

File management

You: rename src/helpers.py to src/utils.py
You: delete the tmp/ directory
You: move all test files into a tests/ directory

Git operations

You: show git status
You: show the diff of uncommitted changes
You: commit all staged files with message "add retry logic"
You: show the last 5 commits
