LLM Chinese Couplet Generation Evaluation Framework

DuiZhang (对仗)

DuiZhang evaluates and compares large language models' Chinese language capabilities through traditional couplet (对联) generation tasks.

Features

  • Rule-based Evaluation: 6 core metrics (POS matching, structure, rhythm, tone, content relevance, imagery correspondence) + 2 base metrics (length match, no duplicate)
  • LLM Self-Evaluation: Model self-assessment across all 6 dimensions for meta-cognitive analysis
  • PDF RAG Pipeline: Build knowledge bases from academic PDFs and generate literature reviews
  • Multi-Model Comparison: Evaluate multiple Ollama models side by side with radar chart visualization
  • CLI & Python API: Full command-line interface and programmatic access
  • Offline Design: All processing runs locally via Ollama — no external API calls required

Requirements

  • Python 3.10+
  • Ollama running locally
  • Chat model (e.g., qwen2.5:7b)
  • Embedding model (e.g., nomic-embed-text)

Installation

# From PyPI
pip install duizhang

# From source
git clone https://github.com/cycleuser/DuiZhang.git
cd DuiZhang
pip install -e .

Quick Start

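# Start the Ollama server (runs in the foreground; use a separate terminal)
ollama serve

# Pull a chat model and an embedding model
ollama pull qwen2.5:7b
ollama pull nomic-embed-text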

# Evaluate a model
duizhang eval --model qwen2.5:7b --samples 10

# Quick generation
duizhang run --model qwen2.5:7b --input "春风送暖入屠苏"

# Process PDFs
duizhang pdf --force-rebuild

# List available models
duizhang models

# Show configuration
duizhang config show

Python API

from duizhang import evaluate_couplet, index_documents, ToolResult

# Evaluate a single couplet
result = evaluate_couplet(
    input_line="春风送暖",
    expected="冬雪飘香",
    generated="秋月寒江",
)
print(result.metrics)

# Index PDF documents
result = index_documents(["paper1.pdf", "paper2.pdf"])

Evaluation Metrics

Rule-based Metrics

Metric                  Range  Description
POS Match               0-1    Part-of-speech correspondence between lines
Structure Match         0-1    Punctuation position alignment
Rhythm Match            0-1    Word segmentation length pattern matching
Tone Match              0-1    Ping/Ze (平仄) tone opposition
Content Relevance       0-1    TF-IDF semantic similarity
Imagery Correspondence  0-1    Noun quantity and reflection matching
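To make two of these metrics concrete, here is a rough sketch of how Tone Match and Structure Match could be scored. It is not DuiZhang's implementation: the function names, the use of modern Mandarin tones (tones 1 and 2 as ping, 3 and 4 as ze), the Jaccard overlap for punctuation positions, and the tiny hand-written tone table are all assumptions made for illustration.

```python
# Illustrative only: Mandarin tone numbers for the characters used in the example.
# A real implementation would need a full pronunciation dictionary.
TONES = {"春": 1, "风": 1, "送": 4, "暖": 3, "冬": 1, "雪": 3, "飘": 1, "香": 1}

# Assumed punctuation set; extend as needed.
PUNCTUATION = set("，。、；：！？,.;:!?")


def ping_ze(char: str) -> str:
    """Classify a character as 'ping' (tones 1, 2) or 'ze' (tones 3, 4)."""
    return "ping" if TONES[char] in (1, 2) else "ze"


def tone_match(first: str, second: str) -> float:
    """Fraction of aligned positions whose tone classes are opposite."""
    if not first or len(first) != len(second):
        return 0.0
    opposed = sum(ping_ze(a) != ping_ze(b) for a, b in zip(first, second))
    return opposed / len(first)


def punctuation_positions(line: str) -> set[int]:
    """Indices at which the line contains a punctuation mark."""
    return {i for i, ch in enumerate(line) if ch in PUNCTUATION}


def structure_match(first: str, second: str) -> float:
    """Jaccard overlap of punctuation positions; 1.0 when neither line has any."""
    a, b = punctuation_positions(first), punctuation_positions(second)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# 春风送暖 is ping-ping-ze-ze and 冬雪飘香 is ping-ze-ping-ping,
# so three of the four positions are opposed.
print(tone_match("春风送暖", "冬雪飘香"))  # 0.75
```

Both functions return values in the 0-1 range used by the table, with 1.0 meaning perfect opposition or alignment.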

Base Metrics

Metric        Range  Description
Length Match  0-1    Generated vs expected length alignment
No Duplicate  0-1    Character non-repetition between lines
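The two base metrics are simple enough to sketch in a few lines. The version below is one plausible reading of the descriptions above rather than the package's actual formulas: it assumes Length Match is the ratio of the shorter to the longer line, and that No Duplicate is the share of generated characters that do not already appear in the input line. The function names are hypothetical.

```python
def length_match(expected: str, generated: str) -> float:
    """Ratio of the shorter length to the longer one (1.0 when equal)."""
    longest = max(len(expected), len(generated))
    if longest == 0:
        return 1.0
    return min(len(expected), len(generated)) / longest


def no_duplicate(input_line: str, generated: str) -> float:
    """Share of generated characters that do not repeat a character of the input line."""
    if not generated:
        return 0.0
    seen = set(input_line)
    fresh = sum(ch not in seen for ch in generated)
    return fresh / len(generated)


print(length_match("冬雪飘香", "秋月寒江"))  # 1.0
print(no_duplicate("春风送暖", "春月寒江"))  # 0.75 (春 is repeated)
```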

LLM Self-Evaluation

The model evaluates its own output across all 6 rule-based dimensions, enabling meta-cognitive analysis by comparing algorithm scores vs self-assessment scores.
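The comparison between algorithm scores and self-assessment scores can be sketched as a per-dimension gap. This is an illustration under stated assumptions, not the package's code: the dimension keys and function names are hypothetical, and both score sets are assumed to be dictionaries of values in the 0-1 range.

```python
# Hypothetical keys for the six rule-based dimensions.
DIMENSIONS = ("pos", "structure", "rhythm", "tone", "content", "imagery")


def self_assessment_gap(rule_scores: dict[str, float],
                        self_scores: dict[str, float]) -> dict[str, float]:
    """Self-assessed score minus rule-based score; positive means overconfidence."""
    return {d: self_scores[d] - rule_scores[d] for d in DIMENSIONS}


def mean_absolute_gap(gaps: dict[str, float]) -> float:
    """Average size of the disagreement across all dimensions."""
    return sum(abs(g) for g in gaps.values()) / len(gaps)


# Made-up scores for illustration only.
rule = {"pos": 0.5, "structure": 1.0, "rhythm": 0.75,
        "tone": 0.25, "content": 0.5, "imagery": 0.5}
own = {"pos": 0.75, "structure": 1.0, "rhythm": 0.5,
       "tone": 0.75, "content": 0.5, "imagery": 1.0}

gaps = self_assessment_gap(rule, own)
print(gaps["tone"])             # 0.5
print(mean_absolute_gap(gaps))  # 0.25
```

A model whose gaps are consistently positive rates its own couplets higher than the rules do, which is the kind of signal the meta-cognitive analysis looks for.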

Project Structure

DuiZhang/
├── duizhang/
│   ├── __init__.py          # Version & public API
│   ├── __main__.py          # python -m duizhang entry
│   ├── api.py               # Unified Python API
│   ├── cli.py               # Entry point routing
│   ├── cli_app.py           # CLI application
│   ├── tools.py             # OpenAI function-calling tools
│   ├── core/
│   │   ├── __init__.py      # Core module exports
│   │   ├── config.py        # Configuration dataclass
│   │   ├── constants.py     # Application constants
│   │   ├── errors.py        # Custom exception hierarchy
│   │   ├── ollama_client.py # Ollama API client
│   │   ├── pdf_processor.py # PDF text extraction
│   │   ├── kb_builder.py    # FAISS knowledge base builder
│   │   └── summarizer.py    # Document summarization
│   ├── evaluator/
│   │   ├── __init__.py
│   │   ├── metrics.py       # Rule-based metrics
│   │   ├── llm_metrics.py   # LLM self-evaluation
│   │   ├── data_loader.py   # Couplet data loading
│   │   ├── evaluator.py     # Main evaluation engine
│   │   └── visualizer.py    # Radar chart visualization
│   ├── pdf_rag/
│   │   ├── __init__.py
│   │   ├── pipeline.py      # PDF processing pipeline
│   │   └── report_generator.py
│   ├── templates/           # Web UI templates
│   └── static/              # CSS/JS assets
├── tests/                   # Test suite
├── data/                    # Runtime data (auto-created)
└── images/                  # Screenshots

Testing

pip install -e ".[dev]"
pytest -v
pytest -v --cov=duizhang

License

GPL-3.0-or-later. See LICENSE for details.
