
CodeCoverage CLI

AI-powered test generation and documentation for Python codebases.

CodeCoverage analyses your project, learns your existing test style, and generates tests and documentation using an LLM agent — function by function or across a whole module.


Features

  • generate — Generate a pytest/unittest test for any Python function. The agent reads your existing test style first and replicates it exactly. Decision-point extraction (every if/elif/else/try/except branch) ensures exhaustive branch coverage.
  • diff-test — Generate or update tests for functions that changed in a git diff (uncommitted, last commit, since a branch). Use --trace-impact to walk the resolved call graph backwards and also test everything transitively affected by your changes.
  • document — Write FLOWS.md (entry points + call chains) and SUMMARY.md (per-function docs) from your codebase.
  • doc-from-tests — Derive TEST_DOCS.md from existing test files using pure static analysis (no LLM). Maps each production function to the tests that cover it, computes a coverage score, and lists uncovered functions.
  • serve — Browse your API as a live Swagger UI.

Requirements

  • Python 3.10+
  • One of:
    • Anthropic API key (ANTHROPIC_API_KEY)
    • OpenAI API key (OPENAI_API_KEY) + pip install langchain-openai
    • Cursor IDE with an API key (CURSOR_API_KEY)

Installation

From PyPI

pip install codecoverage-cli

For OpenAI provider support:

pip install codecoverage-cli langchain-openai

Local development install

git clone https://github.com/AbhigyaShridhar/codecoverage-cli
cd codecoverage-cli
pip install -e ".[dev]"

Quick Start

1. Initialise your project

cd /path/to/your/project
codecoverage init

This creates .codecoverage.toml with sensible defaults. Do not commit API keys — use environment variables instead (see Configuration).

2. Generate tests

# Generate tests for every function in the project (bulk mode)
codecoverage generate

# Limit bulk generation to a subdirectory
codecoverage generate --dir payments/interface_layer/

# Generate tests for every function in one file
codecoverage generate --file payments/gateway.py

# Generate a test for a single function
codecoverage generate -f my_function --file src/module.py

# Steer the agent with extra instructions
codecoverage generate --dir payments/ -x "focus on celery tasks and APIs"

# Preview what would be generated without making LLM calls
codecoverage generate --dir payments/ --dry-run

The agent parses the codebase once, then makes one LLM call per function, keeping context focused. For each function it:

  1. Scans existing tests in that module to learn the style (framework, fixtures, mock library)
  2. Reads the source function and its dependencies
  3. Writes a test file, auto-placing it in your existing test root
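The style-learning pass in step 1 can be done with pure static analysis. A minimal sketch of the idea — the function name and heuristics here are illustrative, not the tool's actual code:

```python
import ast

def detect_test_style(test_source: str) -> dict:
    """Guess the framework and mock library used by an existing test file."""
    tree = ast.parse(test_source)
    modules: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return {
        # Heuristic: a plain `import unittest` signals the unittest framework;
        # otherwise assume pytest, the more common default.
        "framework": "unittest" if "unittest" in modules else "pytest",
        "mock_library": "unittest.mock" if "unittest.mock" in modules else None,
    }
```

A real implementation would also look at fixtures and decorators, but import inspection alone already distinguishes most codebases.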

3. Generate tests from a git diff

# Test uncommitted changes (default)
codecoverage diff-test

# Test what the last commit changed
codecoverage diff-test --last-commit

# Test everything changed since branching off main
codecoverage diff-test --since main

# Preview without calling the LLM
codecoverage diff-test --dry-run
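Mapping a git diff to changed functions typically starts from the unified-diff hunk headers. A sketch of that first step, assuming `git diff -U0` output is already in hand (this is an illustration, not the tool's actual implementation):

```python
import re

# Hunk header: @@ -old_start,old_count +new_start,new_count @@
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_line_ranges(diff_text: str) -> list[tuple[int, int]]:
    """Extract (start, end) line ranges on the new side of a unified diff."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            count = int(m.group(2) or 1)  # a missing count means one line
            if count:  # count == 0 is a pure deletion on the new side
                ranges.append((start, start + count - 1))
    return ranges
```

Intersecting these ranges with per-function line spans from the parsed AST then yields the set of changed functions.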

4. Document from existing tests (no LLM)

# Build TEST_DOCS.md from every test file in the project
codecoverage doc-from-tests

# Limit to a subdirectory
codecoverage doc-from-tests --dir payments/

# Write to a custom location
codecoverage doc-from-tests --output docs/

This command requires no API key. It statically parses test files to produce a markdown file where test descriptions are the documentation.


5. Generate documentation

# Render FLOWS.md and SUMMARY.md from already-cached docs
codecoverage document

# Enrich a module with LLM-generated summaries, then render
codecoverage document --enrich payments/gateway/

# Write to a custom directory
codecoverage document --output docs/

6. Browse your API

codecoverage serve
# Opens Swagger UI at http://localhost:8080

codecoverage serve --port 9000 --no-browser

Configuration

codecoverage init creates .codecoverage.toml in the project root:

[project]
name = "my-project"

[parsing]
ignore_patterns = [
    "venv/", "env/", ".venv/",
    "node_modules/", ".git/",
    "__pycache__/", "*.pyc",
    "dist/", "build/", "migrations/",
    "static/",
]

[llm]
provider    = "anthropic"        # anthropic | openai | cursor
model       = "claude-sonnet-4-6"
temperature = 0.0

# API keys — prefer environment variables instead of hardcoding here
# anthropic_api_key = "sk-ant-..."
# openai_api_key    = "sk-proj-..."
# cursor_api_key    = "crsr_..."

[generation]
max_retries = 3

Environment variables

Variable            Used by
ANTHROPIC_API_KEY   --provider anthropic
OPENAI_API_KEY      --provider openai
CURSOR_API_KEY      --provider cursor

Environment variables always take priority over values in the config file.


LLM Providers

Provider    --provider flag   Default model       Requires
Anthropic   anthropic         claude-sonnet-4-6   ANTHROPIC_API_KEY
OpenAI      openai            gpt-4o              OPENAI_API_KEY + pip install langchain-openai
Cursor      cursor            sonnet-4.6          Cursor IDE installed, CURSOR_API_KEY

Override per-run with --provider and --model:

codecoverage generate -f my_fn --file src/foo.py --provider openai --model gpt-4o-mini
codecoverage generate -f my_fn --file src/foo.py --provider cursor --model opus-4.6

Command Reference

codecoverage init

Creates .codecoverage.toml in the current directory with interactive prompts.

codecoverage init [--path DIR]

codecoverage generate

Generate tests for functions. Works in three modes:

Invocation                                         What happens
codecoverage generate                              Generate tests for every function in the project
codecoverage generate --file src/mod.py            Generate tests for every function in that file
codecoverage generate -f my_fn --file src/mod.py   Generate a test for one specific function

The codebase is parsed once at startup. The agent is then called once per function so context stays focused and token use stays manageable.

Options:
  -f, --function TEXT      Function name. If omitted, all functions are processed.
  --file TEXT              Source file (relative to project root). If omitted, all files are processed.
  --dir DIR                Limit bulk generation to this subdirectory (relative to project root).
                           Ignored when --file is specified.
  --path PATH              Project root  [default: .]
  --provider CHOICE        anthropic | openai | cursor
  --model TEXT             Model name (overrides config default)
  -o, --output PATH        Output file path (single-function mode only; default: auto-detected)
  --dry-run                Show which functions would be processed without making LLM calls
  -x, --extra-context TEXT Extra instructions passed verbatim to the agent for every function.
  --overwrite              Regenerate tests even for files that already have a test file.
  --config PATH            Config file path

Mode summary:

Flags                                   Scope
(none)                                  Every function in the entire project
--dir payments/                         Every function under payments/
--file payments/gateway.py              Every function in that file
-f process --file payments/gateway.py   One specific function

Existing test detection (bulk and per-file modes)

In bulk mode (codecoverage generate) and per-file mode (codecoverage generate --file ...), the tool checks whether a test file already exists for each source file before making any LLM call. Files with existing tests are skipped and reported as a count at the start of the run:

⊘  42 function(s) in files with existing tests skipped — pass --overwrite to regenerate.

Pass --overwrite to regenerate tests for all files regardless. Single-function mode (-f --file) always proceeds and is unaffected by this check.

Agent skip decisions

Even for files without existing tests, the agent is allowed to decline to generate a test. It does this when a function is not worth testing — Django/framework entry points (manage.py, wsgi.py), migration files, trivial one-liner passthroughs, dunder methods such as __str__/__repr__ with no real logic, etc. Skipped functions are shown in yellow with a reason:

⊘  main: Django entry point with no testable logic

Skips are reported separately from failures in the final summary:

Done.  12 succeeded  8 skipped  0 failed

The agent also respects --extra-context when deciding whether to skip. If you say -x "focus on celery tasks, APIs and models", functions clearly outside those areas will be skipped rather than generating empty or unhelpful tests.

--extra-context / -x

Passes additional instructions verbatim to the agent for every function. Use this to steer how tests are written, not to scope which files are processed (use --file for that):

# Focus on edge cases, skip boilerplate assertions
codecoverage generate --file payments/gateway.py -x "skip boilerplate, focus on edge cases"

# Let the agent decide whether a legacy module is worth testing
codecoverage generate --file legacy/utils.py -x "decide whether this module is worth testing"

# Enforce a specific style
codecoverage generate -f process -x "use hypothesis for property-based tests"

Applies equally in single-function, per-file, and bulk modes.

Output path auto-detection (when -o is not specified):

  1. Scans the project for all test_*.py and *_test.py files (excludes venv/, env/, etc.)
  2. Walks each test file's ancestor directories to find the nearest directory named tests, unit_tests, specs, etc.
  3. Picks the most frequently occurring such directory as the test root
  4. Mirrors the source file's path under that root:
    • payments/gateway/views.py → unit_tests/tests/payments/gateway/test_views.py
  5. Falls back to <source_dir>/tests/test_<name>.py if no existing test layout is found

codecoverage diff-test

Generate or update tests for functions that changed in git.

Options:
  --working              [default] Diff uncommitted changes vs HEAD
  --last-commit          Diff the most recent commit against its parent
  --last-merge           Diff the most recent merge commit
  --since REF            Diff HEAD against a branch, tag, or commit SHA
  --trace-impact         Walk the resolved call graph backwards from each changed
                         function and add all transitively impacted callers to the
                         work plan. Shows an "Impacted Functions" table. Works with
                         --dry-run.
  --dry-run              Show plan without making LLM calls
  --output-dir DIR       Write generated tests here (mirrors source structure)
  --provider / --model   (same as generate)
  --path PATH            Project root
  --config PATH          Config file path
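--trace-impact is essentially a breadth-first walk over the reverse call graph. A sketch, assuming the parser exposes reverse edges keyed by file:func IDs (the data shapes here are illustrative):

```python
from collections import deque

def transitive_callers(reverse_calls: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Walk the reverse call graph breadth-first from each changed function."""
    impacted: set[str] = set()
    queue = deque(changed)
    while queue:
        fn = queue.popleft()
        for caller in reverse_calls.get(fn, set()):
            if caller not in impacted and caller not in changed:
                impacted.add(caller)
                queue.append(caller)
    return impacted
```

The result is what would populate the "Impacted Functions" table alongside the directly changed functions.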

Actions per changed function:

Change type   Action
Added         Generate a fresh test; if a test file already exists for that module, update it instead
Modified      Overwrite the existing test with minimal changes
Deleted       Report orphaned tests for manual review — no auto-delete

diff-test always overwrites existing tests for modified functions — that is its purpose. A .py.bak backup of the original test file is created beside it before overwriting. This is intentionally different from generate, which skips files with existing tests by default and requires --overwrite to proceed.
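The backup step can be as simple as a copy beside the original. A sketch (the function name is hypothetical):

```python
import shutil
from pathlib import Path

def backup_then_write(test_file: Path, new_content: str) -> None:
    """Create a .py.bak copy beside the test file before overwriting it."""
    if test_file.exists():
        # test_gateway.py -> test_gateway.py.bak, in the same directory
        shutil.copy2(test_file, test_file.with_name(test_file.name + ".bak"))
    test_file.write_text(new_content)
```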


codecoverage document

Write codebase documentation to markdown files.

Options:
  --path PATH          Project root  [default: .]
  --output PATH        Output directory  [default: .codecoverage/docs/]
  --enrich DIR         Run LLM doc generation for all functions under DIR
                       (skips already-cached). DIR is relative to --path.
  --working            Update docs for uncommitted changes vs HEAD
  --last-commit        Update docs for functions changed in the last commit
  --last-merge         Update docs for functions changed in the last merge commit
  --since REF          Update docs for everything changed since a branch/tag/SHA
  --dry-run            Show which functions would be re-documented (no LLM calls)
  --provider / --model (same as generate)
  --config PATH        Config file path

Three enrichment modes:

Mode                                      When to use
--enrich DIR                              First-time documentation of a module
--working / --last-commit / --since REF   Incremental update after code changes
(none)                                    Re-render FLOWS.md and SUMMARY.md from the existing cache

The diff modes mirror diff-test exactly: added/modified functions are re-documented, deleted functions are removed from the cache.

Output files:

  • FLOWS.md — All HTTP endpoints, Celery tasks, signal handlers and management commands, each with its full call chain and decoupled flows. LLM summaries are shown as block quotes.
  • SUMMARY.md — All LLM-documented functions grouped by source file, with test coverage links.

Function summaries are cached in .codecoverage/doc_cache.json so re-running document is fast — only new or changed functions are re-enriched.
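A cache like this typically keys on a hash of each function's source, so unchanged code is never re-sent to the LLM. A sketch of the invalidation check — the cache-file layout shown is an assumption, not the tool's documented schema:

```python
import hashlib
import json
from pathlib import Path

def needs_enrichment(cache_path: Path, qualified_name: str, source: str) -> bool:
    """Re-enrich only if the function's source hash differs from the cached one."""
    digest = hashlib.sha256(source.encode()).hexdigest()
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    # Hypothetical schema: {"file.py:func": {"source_hash": "...", "summary": "..."}}
    return cache.get(qualified_name, {}).get("source_hash") != digest
```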


codecoverage doc-from-tests

Build TEST_DOCS.md from existing test files using pure static analysis — no LLM, no API key required.

Options:
  --path PATH          Project root  [default: .]
  --dir DIR            Limit analysis to this subdirectory
  --output PATH        Output directory  [default: .codecoverage/docs/]
  --config PATH        Config file path

What it produces:

For every production function the tool finds at least one test reference for, it emits a section containing:

  • A list of test names and their docstrings
  • Mock targets (signals intent about dependencies)
  • Assertion count per test
  • A coverage score: none / minimal / partial / good / thorough

Functions with no test coverage are collected in an Uncovered Functions appendix.

Coverage score thresholds:

Score      Criteria
none       No tests found
minimal    One test with few assertions
partial    Tests exist, but fewer than the number of decision points
good       Tests cover most decision points
thorough   At least as many tests as decision points
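These buckets could be computed roughly as follows; the exact cut-offs here are illustrative, not the tool's documented thresholds:

```python
def coverage_score(test_count: int, assertion_count: int, decision_points: int) -> str:
    """Bucket coverage by comparing test count against a function's decision points."""
    if test_count == 0:
        return "none"
    if test_count == 1 and assertion_count <= 2:
        return "minimal"
    if test_count >= decision_points:
        return "thorough"
    if test_count >= max(1, decision_points // 2):  # "most" decision points
        return "good"
    return "partial"
```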

codecoverage serve

Start a local Swagger UI server to browse your project's API.

Options:
  --path PATH         Project root  [default: .]
  --port INT          Port number  [default: 8080]
  --no-browser        Don't auto-open the browser

How the Codebase is Analysed

CodeCoverage parses Python source files using the standard ast module (no runtime import required). It extracts:

  • Module-level and class-level functions with full decorator details
  • Class inheritance chains (for DRF/Django CBV HTTP method inference)
  • Decorator arguments (@app.route("/path", methods=["POST"]), @shared_task(bind=True), etc.)
  • Resolved call graph — every call edge is resolved to a fully-qualified file:func ID using the module's import map, with same-file/same-class priority. Stdlib and builtins are skipped. This makes get_transitive_callers() and get_impact_chain() reliable across files.
  • Decision points: every if/elif/else/try/except branch per function (used for exhaustive branch coverage and doc-from-tests scoring)
  • Decoupled flows: functions invoked by the framework via decorators (signals, pre/post transitions, Celery tasks)

The parsed representation is kept in memory — no database required.


Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run the test suite
pytest

# Run without coverage (faster)
pytest --no-cov

# Lint
ruff check src/

# Type check
mypy src/codecoverage/

Limitations

  • Cursor provider requires Cursor IDE to be installed locally. It uses cursor agent --print as a subprocess (no API endpoint). Prompts with very large source files may hit OS argument length limits.
  • Generated tests are not auto-run. The tool writes test files to disk; you should run pytest to validate them. Import paths in generated tests may need adjustment for unusual project layouts.
  • Full codebase parse on every invocation. There is no incremental parse cache — the entire project is re-parsed on each generate / document / diff-test call.
  • Python 3.10+ required to run the CLI itself. Generated tests can target older Python versions if your LLM config or project conventions guide the agent accordingly.
  • OpenAI provider requires pip install langchain-openai separately (not bundled to keep the base install lightweight).
