LLM Chinese Couplet Generation Evaluation Framework
DuiZhang (对仗)
DuiZhang evaluates and compares the Chinese language capabilities of large language models by having them generate traditional couplets (对联).
Features
- Rule-based Evaluation: 6 core metrics (POS matching, structure, rhythm, tone, content relevance, imagery correspondence) + 2 base metrics (length match, no duplicate)
- LLM Self-Evaluation: Model self-assessment across all 6 dimensions for meta-cognitive analysis
- PDF RAG Pipeline: Build knowledge bases from academic PDFs and generate literature reviews
- Multi-Model Comparison: Evaluate multiple Ollama models side by side with radar chart visualization
- CLI & Python API: Full command-line interface and programmatic access
- Offline Design: All processing runs locally via Ollama — no external API calls required
Requirements
- Python 3.10+
- Ollama running locally
- Chat model (e.g., qwen2.5:7b)
- Embedding model (e.g., nomic-embed-text)
Installation
# From PyPI
pip install duizhang
# From source
git clone https://github.com/cycleuser/DuiZhang.git
cd DuiZhang
pip install -e .
Quick Start
# Start Ollama
ollama serve
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
# Evaluate a model
duizhang eval --model qwen2.5:7b --samples 10
# Quick generation
duizhang run --model qwen2.5:7b --input "春风送暖入屠苏"
# Process PDFs
duizhang pdf --force-rebuild
# List available models
duizhang models
# Show configuration
duizhang config show
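Before running an evaluation, it can help to confirm that both required models have been pulled. The following is a minimal sketch, not part of DuiZhang: it assumes Ollama is listening on its default local address and reads the model list from Ollama's `/api/tags` endpoint. The helper names `normalize`, `missing_models`, and `fetch_tags` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama listens on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def normalize(name: str) -> str:
    """Append Ollama's implicit ':latest' tag when a name has no tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags payload."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in installed]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the local Ollama server for its installed models."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Offline demonstration with a hand-written payload in the /api/tags shape.
sample = {"models": [{"name": "qwen2.5:7b"}, {"name": "llama3:8b"}]}
print(missing_models(sample, ["qwen2.5:7b", "nomic-embed-text"]))
```

With a running server, `missing_models(fetch_tags(), ["qwen2.5:7b", "nomic-embed-text"])` would return the models that still need an `ollama pull`.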
Python API
from duizhang import evaluate_couplet, index_documents, ToolResult
# Evaluate a single couplet
result = evaluate_couplet(
input_line="春风送暖",
expected="冬雪飘香",
generated="秋月寒江",
)
print(result.metrics)
# Index PDF documents
result = index_documents(["paper1.pdf", "paper2.pdf"])
Evaluation Metrics
Rule-based Metrics
| Metric | Range | Description |
|---|---|---|
| POS Match | 0-1 | Part-of-speech correspondence between lines |
| Structure Match | 0-1 | Punctuation position alignment |
| Rhythm Match | 0-1 | Word segmentation length pattern matching |
| Tone Match | 0-1 | Ping/Ze (平仄) tone opposition |
| Content Relevance | 0-1 | TF-IDF semantic similarity |
| Imagery Correspondence | 0-1 | Noun quantity and reflection matching |
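To make two of these metrics concrete, here is a rough sketch of how tone opposition and rhythm matching could be scored. It is an illustration under stated assumptions, not the package's actual implementation: the tone table is hand-entered for the example characters and uses modern Mandarin tones (a real implementation might look tones up with a library such as pypinyin), the rhythm function expects lines that are already segmented into words, and the function names are hypothetical.

```python
# Hand-entered modern Mandarin tones (1-4) for the characters used below.
# Assumption: tones 1 and 2 count as ping (level), 3 and 4 as ze (oblique).
TONES = {
    "春": 1, "风": 1, "送": 4, "暖": 3,
    "秋": 1, "月": 4, "寒": 2, "江": 1,
}


def tone_class(char: str) -> str:
    """Map a character to 'ping' or 'ze' using the table above."""
    return "ping" if TONES[char] in (1, 2) else "ze"


def tone_opposition(upper: str, lower: str) -> float:
    """Fraction of aligned positions whose tone classes differ."""
    if not upper or len(upper) != len(lower):
        return 0.0
    opposed = sum(tone_class(a) != tone_class(b) for a, b in zip(upper, lower))
    return opposed / len(upper)


def rhythm_match(upper_words: list[str], lower_words: list[str]) -> float:
    """Fraction of aligned segments with equal length; input is pre-segmented."""
    upper_pattern = [len(w) for w in upper_words]
    lower_pattern = [len(w) for w in lower_words]
    longest = max(len(upper_pattern), len(lower_pattern))
    if longest == 0:
        return 0.0
    same = sum(a == b for a, b in zip(upper_pattern, lower_pattern))
    return same / longest


print(tone_opposition("春风送暖", "秋月寒江"))            # 0.75
print(rhythm_match(["春风", "送暖"], ["秋月", "寒江"]))  # 1.0
```

In this example the first characters (春 and 秋) are both ping, so only three of the four positions are opposed, which gives 0.75.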
Base Metrics
| Metric | Range | Description |
|---|---|---|
| Length Match | 0-1 | Generated vs expected length alignment |
| No Duplicate | 0-1 | Character non-repetition between lines |
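The two base metrics are simple enough to sketch in a few lines. The formulas below are one plausible reading of the descriptions in the table, not the package's confirmed scoring rules. In particular, treating "between lines" as the input line versus the generated line is an assumption, and the function names are hypothetical.

```python
def length_match(generated: str, expected: str) -> float:
    """1.0 for equal lengths, shrinking with the relative length difference."""
    longest = max(len(generated), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - abs(len(generated) - len(expected)) / longest


def no_duplicate(input_line: str, generated: str) -> float:
    """Fraction of generated characters that do not appear in the input line."""
    if not generated:
        return 0.0
    input_chars = set(input_line)
    reused = sum(ch in input_chars for ch in generated)
    return 1.0 - reused / len(generated)


print(length_match("秋月寒江", "冬雪飘香"))   # 1.0
print(length_match("秋月", "冬雪飘香"))       # 0.5
print(no_duplicate("春风送暖", "秋月寒江"))   # 1.0
print(no_duplicate("春风送暖", "春月寒江"))   # 0.75
```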
LLM Self-Evaluation
The model evaluates its own output across all 6 rule-based dimensions, enabling meta-cognitive analysis by comparing algorithm scores vs self-assessment scores.
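The comparison between algorithm scores and self-assessment scores could be summarized as a per-dimension gap. The sketch below assumes both score sets are dictionaries on the same 0-1 scale; the dimension keys, the function name, and the scores are made up for illustration and are not output from any real model.

```python
# Hypothetical keys for the six rule-based dimensions.
DIMENSIONS = ["pos", "structure", "rhythm", "tone", "relevance", "imagery"]


def self_assessment_gap(rule_scores: dict, self_scores: dict):
    """Return per-dimension gaps (self minus rule) and their mean absolute value."""
    gaps = {d: self_scores[d] - rule_scores[d] for d in DIMENSIONS}
    mean_abs = sum(abs(g) for g in gaps.values()) / len(gaps)
    return gaps, mean_abs


# Illustrative values only, not measured results.
rule = {"pos": 0.5, "structure": 1.0, "rhythm": 1.0,
        "tone": 0.75, "relevance": 0.25, "imagery": 0.5}
own = {"pos": 0.75, "structure": 1.0, "rhythm": 0.75,
       "tone": 1.0, "relevance": 0.5, "imagery": 0.5}

gaps, mean_abs = self_assessment_gap(rule, own)
print(gaps["tone"])        # 0.25, the model rates its tone higher than the rules do
print(round(mean_abs, 3))  # 0.167
```

A positive gap means the model rates itself higher than the rule-based metric does, and a negative gap means it rates itself lower.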
Project Structure
DuiZhang/
├── duizhang/
│ ├── __init__.py # Version & public API
│ ├── __main__.py # python -m duizhang entry
│ ├── api.py # Unified Python API
│ ├── cli.py # Entry point routing
│ ├── cli_app.py # CLI application
│ ├── tools.py # OpenAI function-calling tools
│ ├── core/
│ │ ├── __init__.py # Core module exports
│ │ ├── config.py # Configuration dataclass
│ │ ├── constants.py # Application constants
│ │ ├── errors.py # Custom exception hierarchy
│ │ ├── ollama_client.py # Ollama API client
│ │ ├── pdf_processor.py # PDF text extraction
│ │ ├── kb_builder.py # FAISS knowledge base builder
│ │ └── summarizer.py # Document summarization
│ ├── evaluator/
│ │ ├── __init__.py
│ │ ├── metrics.py # Rule-based metrics
│ │ ├── llm_metrics.py # LLM self-evaluation
│ │ ├── data_loader.py # Couplet data loading
│ │ ├── evaluator.py # Main evaluation engine
│ │ └── visualizer.py # Radar chart visualization
│ ├── pdf_rag/
│ │ ├── __init__.py
│ │ ├── pipeline.py # PDF processing pipeline
│ │ └── report_generator.py
│ ├── templates/ # Web UI templates
│ └── static/ # CSS/JS assets
├── tests/ # Test suite
├── data/ # Runtime data (auto-created)
└── images/ # Screenshots
Testing
pip install -e ".[dev]"
pytest -v
pytest -v --cov=duizhang
License
GPL-3.0-or-later. See LICENSE for details.