Automated evaluation framework for AI-generated code quality

Maintainers

kamrynohly

MicroEvals

Automated evaluation framework for AI-generated code quality and best practices.

MicroEvals is a collection of focused, automated tests that evaluate whether AI-generated code (or any codebase) follows framework-specific best practices and avoids common anti-patterns. Each evaluation uses Claude to analyze your codebase against specific criteria.

What Are MicroEvals?

MicroEvals are micro-evaluations: small, focused tests that check for specific patterns or anti-patterns in your code. Unlike traditional linters, which check syntax, MicroEvals use an LLM as a judge to understand context and evaluate architectural decisions.

Example Use Cases:

Verify Next.js App Router best practices (server components, data fetching)
Catch React anti-patterns (missing dependencies, incorrect hooks usage)
Validate Supabase security (RLS policies, proper auth setup)
Check TypeScript type safety (unsafe assertions, missing null checks)
Ensure proper shadcn/ui integration
Audit deployment configurations

Quick Start

Installation

Option 1: Install from PyPI (Recommended)

pip install microevals

Option 2: Install from Source (For Development)

# Clone the repository
git clone https://github.com/Design-Arena/MicroEvals
cd MicroEvals

# Install in development mode
pip install -e .

Prerequisites

Python 3.8+ installed

Claude CLI installed and authenticated:

# Install Claude CLI (if not already installed)
# See: https://docs.anthropic.com/en/docs/build-with-claude/cli

# Verify installation
claude --version

# If command not found, add Claude to your PATH:
export PATH="$PATH:/path/to/claude"
# Add the export line to your ~/.bashrc or ~/.zshrc to make it permanent

Git installed (for remote repositories)

Run Your First Eval

# Navigate to your project
cd your-nextjs-app

# Run evaluations on current directory
microeval --category nextjs

# Check the results
cat results/*.json

🔒 Safety Note: When running on local directories, your code is copied to a temporary directory before evaluation. Your original files are never modified or deleted. The framework has 6 independent safety checks to prevent accidental file deletion.

Alternative: Run Against Remote Repository

# Run against a GitHub repository
microeval --repo https://github.com/user/app --category nextjs

Available Eval Categories

Category	Count	Description
nextjs	20+	Next.js App Router patterns, server/client components, routing
react	7+	React hooks, state management, component patterns
supabase	17+	Supabase auth, database, storage, RLS security
tailwind	4+	Tailwind CSS configuration and usage
typescript	2+	TypeScript type safety and best practices
vercel	3+	Vercel deployment and configuration
shadcn	7+	shadcn/ui component library integration

See all available evals:

# List all evals (recommended)
microeval --list

# List evals in a specific category
microeval --list --category nextjs

# Or using Python module
python -m microevals.eval_registry --list

Running Evals

Local Directory (Recommended)

Run evaluations on your current project:

# Using the microeval command (recommended)
microeval --category nextjs

# Or using Python module directly
python -m microevals.eval_runner --category nextjs

More examples:

# Run a specific eval
microeval --eval evals/nextjs/001-server-component.yaml

# Run all evals
microeval --all

# Run with batch mode for speed
microeval --category nextjs --batch-size 10

Remote Repository

Run evaluations against a GitHub repository:

# Using the microeval command
microeval --repo https://github.com/user/app --category nextjs

# Or using Python module directly
python -m microevals.eval_runner --repo https://github.com/user/app --category nextjs

More examples:

# Run specific eval
microeval --repo https://github.com/user/app --eval evals/nextjs/001-server-component.yaml

# Run all evals
microeval --repo https://github.com/user/app --all

# Run with batch mode
microeval --repo https://github.com/user/app --all --batch-size 15

Specific Eval IDs

Run evaluations by their IDs:

# Using microeval command
microeval --ids nextjs_server_component_001 react_missing_useeffect_dependencies_001

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --ids nextjs_server_component_001 react_missing_useeffect_dependencies_001

Multiple Specific Evals

Run multiple specific eval files:

# Using microeval command
microeval --evals evals/nextjs/001-server-component.yaml evals/react/001_missing_useeffect_dependencies.yaml

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --evals evals/nextjs/001-server-component.yaml evals/react/001_missing_useeffect_dependencies.yaml

Advanced Usage

Runtime Input Overrides

Override default values from eval YAML files:

# Using microeval command
microeval --eval evals/supabase/001_client_setup.yaml \
  --input supabase_url "https://xyz.supabase.co" \
  --input supabase_anon_key "your_key_here"

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --eval evals/supabase/001_client_setup.yaml \
  --input supabase_url "https://xyz.supabase.co" \
  --input supabase_anon_key "your_key_here"
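The override behaviour can be pictured as a simple merge. The sketch below is an illustration only, not the package's code: `resolve_inputs` and the default values are hypothetical, and it assumes each `--input NAME VALUE` pair replaces the default of the same name from the eval file's `inputs` section.

```python
def resolve_inputs(defaults, overrides):
    """Merge (name, value) override pairs over the defaults from an eval file.

    Hypothetical helper: assumes a later override for the same name wins.
    """
    resolved = dict(defaults)  # copy so the defaults are left untouched
    for name, value in overrides:
        resolved[name] = value
    return resolved


# Hypothetical defaults, standing in for the `inputs:` section of an eval YAML file.
defaults = {"supabase_url": "https://example.supabase.co", "supabase_anon_key": ""}

# The two --input pairs from the command above.
overrides = [
    ("supabase_url", "https://xyz.supabase.co"),
    ("supabase_anon_key", "your_key_here"),
]

print(resolve_inputs(defaults, overrides))
```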

Parallel Execution

Run multiple evals in parallel (faster but uses more resources):

# Using microeval command
microeval --category nextjs --parallel 3

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --category nextjs \
  --parallel 3

Batch Mode

Run multiple evals in a single Claude session (most efficient):

# Using microeval command - Run 5 evals per Claude session
microeval --category tailwind --batch-size 5

# Run all evals in large batches
microeval --all --batch-size 15

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --category tailwind \
  --batch-size 5

Batch mode benefits:

Faster execution (single context for multiple evals)
More efficient Claude usage
Better for related evaluations

Preview batch prompt before running:

microeval --category tailwind --batch-size 3 --print-prompt

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --category tailwind \
  --batch-size 3 \
  --print-prompt
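To get a feel for how `--batch-size` affects the number of Claude sessions, here is a rough sketch. It assumes evals are grouped into consecutive chunks of at most `batch_size`, which this README does not confirm; `chunk` and the eval names are hypothetical.

```python
def chunk(evals, batch_size):
    """Split a list of evals into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [evals[i:i + batch_size] for i in range(0, len(evals), batch_size)]


# 12 hypothetical eval names, batched 5 at a time.
evals = [f"eval_{n:03d}" for n in range(1, 13)]
batches = chunk(evals, 5)

print(len(batches))               # prints 3
print([len(b) for b in batches])  # prints [5, 5, 2]
```

Under that assumption, 12 evals with `--batch-size 5` would need 3 sessions instead of 12.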

Custom Timeout

Increase timeout for slower evaluations:

# Using microeval command
microeval --eval evals/nextjs/030_app_router_migration.yaml --timeout 600

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --eval evals/nextjs/030_app_router_migration.yaml \
  --timeout 600  # 10 minutes

Custom Output Directory

Save results to a specific directory:

# Using microeval command
microeval --category nextjs --output-dir my_results

# Or using Python module
python -m microevals.eval_runner \
  --repo https://github.com/user/app \
  --category nextjs \
  --output-dir my_results

Understanding Results

Score System

Each eval returns a score:

Score	Status	Meaning
1.0	PASS	Code follows best practices, no issues found
0.0	FAIL	Anti-pattern detected or criteria not met
-1.0	N/A	Pattern/feature not present in codebase
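When post-processing results, the table above can be expressed as a small lookup. This is a minimal sketch; `status_for_score` is a hypothetical helper, not part of the package, and it treats any score outside the three documented values as an error.

```python
def status_for_score(score):
    """Map a MicroEvals score to the status label from the score table."""
    if score == 1.0:
        return "PASS"
    if score == 0.0:
        return "FAIL"
    if score == -1.0:
        return "N/A"
    # Other values are not documented in this README, so reject them.
    raise ValueError(f"Unexpected score: {score!r}")


print(status_for_score(1.0))   # prints PASS
print(status_for_score(-1.0))  # prints N/A
```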

Result Output

Results are saved to results/ as JSON files:

{
  "passed": true,
  "score": 1.0,
  "summary": "Server components properly use async/await for data fetching",
  "evidence": [
    "app/page.tsx:15 - Correct async server component implementation",
    "app/posts/page.tsx:20 - Proper await on fetch and response.json()"
  ],
  "issues": [],
  "metadata": {
    "eval_id": "nextjs_server_component_001",
    "eval_name": "Server Component Data Fetching",
    "repo_url": "https://github.com/user/app",
    "timestamp": "2025-11-10T10:30:45",
    "evaluator": "claude"
  }
}
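The JSON files can be read back with a few lines of Python, for example to list failing evals. The sketch below assumes each file in `results/` holds a single result object shaped like the example above (batch runs may differ); `load_results` and `failed_evals` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_results(results_dir="results"):
    """Read every *.json file in results_dir into a list of dicts."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(results_dir).glob("*.json"))
    ]


def failed_evals(results):
    """Return (eval_id, summary) for each result whose score is 0.0 (FAIL)."""
    return [
        (r.get("metadata", {}).get("eval_id", "<unknown>"), r.get("summary", ""))
        for r in results
        if r.get("score") == 0.0
    ]


if __name__ == "__main__":
    # Prints nothing when the results directory is missing or has no failures.
    for eval_id, summary in failed_evals(load_results()):
        print(f"FAIL {eval_id}: {summary}")
```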

Terminal Output

Live results show in terminal with color coding:

Running evaluations for: https://github.com/user/my-app
================================================================================

[1/5] Running 001-server-component.yaml...
PASS     nextjs/001-server-component.yaml                    12.3s
    Server components properly use async/await for data fetching

[2/5] Running 002-client-component.yaml...
FAIL     nextjs/002-client-component.yaml                     8.7s
    Found 'use client' components with hooks that should be server components

[3/5] Running 003-cookies.yaml...
N/A      nextjs/003-cookies.yaml                              5.2s
    No cookie usage found in codebase

================================================================================
SUMMARY
================================================================================
Total evaluations:  5
Passed:            3
Failed:            1
Not Applicable:    1
Timeouts:          0
Errors:            0
Total duration:    45.2s
Pass rate:         75.0% (excluding N/A)
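The pass rate in the summary leaves N/A results out of the denominator: 3 passed out of 4 applicable evals gives 75.0%. A minimal sketch of that arithmetic follows. `pass_rate` is a hypothetical helper, and because timeouts and errors are zero in this run, it makes no claim about how the tool counts them.

```python
def pass_rate(passed, failed):
    """Percentage of passing evals among those that were applicable.

    N/A results are excluded, matching the "(excluding N/A)" note in the summary.
    """
    applicable = passed + failed
    if applicable == 0:
        return 0.0  # assumption: report 0.0 when no eval was applicable
    return 100.0 * passed / applicable


print(f"{pass_rate(passed=3, failed=1):.1f}%")  # prints 75.0%
```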

Project Structure

MicroEvals/
├── microevals/                     # Main package
│   ├── __init__.py                 # Package initialization
│   ├── eval_runner.py              # Main CLI for running evals
│   ├── eval_registry.py            # Registry and discovery of evals
│   └── utils.py                    # Utility functions
│
├── evals/                          # Evaluation definitions
│   ├── nextjs/                     # Next.js-specific evals
│   │   ├── 001-server-component.yaml
│   │   ├── 002-client-component.yaml
│   │   └── ...
│   ├── react/                      # React-specific evals
│   ├── supabase/                   # Supabase-specific evals
│   ├── tailwind/                   # Tailwind-specific evals
│   ├── typescript/                 # TypeScript-specific evals
│   ├── vercel/                     # Vercel-specific evals
│   └── shadcn/                     # shadcn/ui-specific evals
│
├── config/                         # Configuration files
│   ├── judge_system_prompt.yaml    # Claude judge prompt templates
│   └── example_repos.json          # Example repositories
│
├── results/                        # Evaluation results (auto-generated)
│   └── *.json                      # Result files
│
├── requirements.txt                # Python dependencies
├── CONTRIBUTING.md                 # Contribution guidelines
├── LICENSE                         # License file
└── README.md                       # This file

Creating Custom Evals

Want to add your own evaluations? See CONTRIBUTING.md for:

Eval template and format
Naming conventions
Testing guidelines
Submission process

Quick template:

eval_id: category_descriptive_name_001
name: "Human-Readable Name"
description: "What this eval checks"
category: nextjs  # or react, supabase, etc.

# Optional runtime inputs
inputs:
  custom_variable: "default_value"

criteria: |
  You have access to the entire codebase. Evaluate [what to check].
  
  WHAT TO LOOK FOR:
  - [Specific patterns to search for]
  
  ANTI-PATTERN (mark as failed):
  - [Bad pattern 1]
  - [Bad pattern 2]
  
  CORRECT PATTERN (mark as passed):
  - [Good pattern 1]
  - [Good pattern 2]
  
  MARK AS N/A if:
  - [Condition for not applicable]
  
  Return JSON with: passed, score, summary, evidence, issues
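Before submitting a new eval, it can help to sanity-check the definition once it has been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The sketch below is not the project's validator. It assumes the five top-level fields in the template are required and that `category` is one of the categories listed in this README. `check_eval_definition` is a hypothetical helper.

```python
# Fields taken from the quick template; treating them as required is an assumption.
REQUIRED_KEYS = ("eval_id", "name", "description", "category", "criteria")

# Categories listed under "Available Eval Categories".
KNOWN_CATEGORIES = {
    "nextjs", "react", "supabase", "tailwind", "typescript", "vercel", "shadcn",
}


def check_eval_definition(definition):
    """Return a list of problems found in a parsed eval definition (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not definition.get(key):
            problems.append(f"missing or empty field: {key}")
    category = definition.get("category")
    if category and category not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {category}")
    inputs = definition.get("inputs") or {}
    if not isinstance(inputs, dict):
        problems.append("inputs must be a mapping of name to default value")
    return problems


example = {
    "eval_id": "nextjs_descriptive_name_001",
    "name": "Human-Readable Name",
    "description": "What this eval checks",
    "category": "nextjs",
    "inputs": {"custom_variable": "default_value"},
    "criteria": "You have access to the entire codebase. Evaluate ...",
}
print(check_eval_definition(example))  # prints []
```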

Use Cases

1. CI/CD Integration

Add to your CI pipeline to catch anti-patterns:

# .github/workflows/evals.yml
name: Code Quality Evals
on: [push, pull_request]

jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run MicroEvals
        # Assumes the Claude CLI is installed and authenticated on the runner (see Prerequisites)
        run: |
          pip install microevals
          python -m microevals.eval_runner \
            --category nextjs \
            --batch-size 10

2. Audit Existing Projects

Evaluate multiple repositories:

#!/bin/bash
repos=(
  "https://github.com/org/app1"
  "https://github.com/org/app2"
  "https://github.com/org/app3"
)

for repo in "${repos[@]}"; do
  echo "Evaluating $repo..."
  python -m microevals.eval_runner --repo "$repo" --all --batch-size 20
done

3. Pre-deployment Checks

Validate before deploying to production:

# Check production-critical patterns
python -m microevals.eval_runner \
  --repo https://github.com/org/production-app \
  --category vercel \
  --category supabase \
  --input deployment_url "https://app.vercel.app"

Troubleshooting

Claude CLI Not Found

# Ensure Claude CLI is installed and in PATH
which claude

# If not installed, see: https://docs.anthropic.com/en/docs/build-with-claude/cli

Rate Limiting

If you hit Claude rate limits:

# Use batch mode to reduce API calls
python -m microevals.eval_runner --repo URL --all --batch-size 15

# Or add delays with single eval mode (automatic 2s delay)
python -m microevals.eval_runner --repo URL --all --parallel 1

Timeout Issues

For large codebases, increase timeout:

python -m microevals.eval_runner \
  --repo URL \
  --all \
  --timeout 600 \
  --batch-size 10

Contributing

We welcome contributions! See CONTRIBUTING.md for:

How to submit new evals
Testing requirements
PR guidelines

Quick contribution:

Fork the repo
Create new eval in evals/[category]/
Test locally: python -m microevals.eval_runner --eval your-eval.yaml --repo test-repo
Submit PR

License

MicroEvals is released under the MIT License. See LICENSE for details.

Support

Issues: https://github.com/Design-Arena/MicroEvals/issues
Email: contact@designarena.ai

Built for better agent code quality. See more and try the evals live at designarena.ai/evals.

Release history

Version 0.1.0 (Nov 10, 2025)

Download files

Source Distribution

microevals-0.1.0.tar.gz (48.7 kB), uploaded Nov 10, 2025, Source

Built Distribution

microevals-0.1.0-py3-none-any.whl (74.3 kB), uploaded Nov 10, 2025, Python 3

File details

Details for the file microevals-0.1.0.tar.gz.

File metadata

Download URL: microevals-0.1.0.tar.gz
Upload date: Nov 10, 2025
Size: 48.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for microevals-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d1d7ebc1566c0242a23d5d00940138ec926742cec9aee26a85f06534d9265b31`
MD5	`1a46a55150ebc55a170efd08a7687f7f`
BLAKE2b-256	`37fab410a700d45f50d56550e6ca1834ebac16a544ae3f075167c5afd510a8b3`

Provenance

The following attestation bundles were made for microevals-0.1.0.tar.gz:

Publisher: python-publish.yml on Design-Arena/MicroEvals

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: microevals-0.1.0.tar.gz
- Subject digest: d1d7ebc1566c0242a23d5d00940138ec926742cec9aee26a85f06534d9265b31
- Sigstore transparency entry: 689672843
- Sigstore integration time: Nov 10, 2025
Source repository:
- Permalink: Design-Arena/MicroEvals@87982b3cb63181239d69134092ffa4984141d876
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Design-Arena
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@87982b3cb63181239d69134092ffa4984141d876
- Trigger Event: release

File details

Details for the file microevals-0.1.0-py3-none-any.whl.

File metadata

Download URL: microevals-0.1.0-py3-none-any.whl
Upload date: Nov 10, 2025
Size: 74.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for microevals-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`116d24ac66a906a74de402bc1e113ef478041ffd688b7c7d14e8592d261b39d8`
MD5	`ab08ec555b9b34f16499bfb2e5cdc03b`
BLAKE2b-256	`fa4d283c53d6945ab970c60b7176644de88793c889e47d0b32ff6f3a781befc8`

Provenance

The following attestation bundles were made for microevals-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on Design-Arena/MicroEvals

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: microevals-0.1.0-py3-none-any.whl
- Subject digest: 116d24ac66a906a74de402bc1e113ef478041ffd688b7c7d14e8592d261b39d8
- Sigstore transparency entry: 689672900
- Sigstore integration time: Nov 10, 2025
Source repository:
- Permalink: Design-Arena/MicroEvals@87982b3cb63181239d69134092ffa4984141d876
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Design-Arena
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@87982b3cb63181239d69134092ffa4984141d876
- Trigger Event: release
