Automated narrative quality assessment for Chinese autobiographical memories — hybrid rule-based + LLM-enhanced scoring
CittaVerse Narrative Scorer v0.7.0 🧠
Transform digital reminiscence therapy with precise, automated scoring of Chinese autobiographical memory narratives. 🎯
Designed for clinicians, researchers, and developers building next-gen mental health interventions. 🤝
- ✨ 6-Dimension Assessment: Event richness, temporal/causal coherence, emotional depth, identity integration, information density
- 🇨🇳 Chinese NLP Optimized: 75-marker lexicon for elderly speech patterns, dialect-aware
- 📊 Instant Feedback: <15ms per 1000 chars, ~60 narratives/sec, JSON + letter grade output
- 🔬 Clinical Evaluation: Pilot RCT (N=50, 2-week intervention) in preparation
📄 Paper: Technical report v1.1 ready for arXiv submission (cs.HC + cs.CL, 52 BibTeX references, weighted 6-dimension scoring). Submission tarball available in pipeline repo.
🏥 Clinical Study: Pilot RCT (N=50) in preparation — screening questionnaire v1.1 complete (14 questions, full skip-logic coverage, PIPL-compliant data protection).
🤖 v0.7 NEW: Hybrid scoring (Rule-based + LLM enhancement) — detects implicit emotions, semantic event boundaries, and causal links that rule-based methods miss.
Overview
This tool scores narrative quality across six dimensions:
- Event Richness (事件丰富度) - Internal/external detail count — weight: 0.15
- Temporal Coherence (时间连贯性) - Time markers and sequence clarity — weight: 0.15
- Causal Coherence (因果连贯性) - Cause-effect reasoning — weight: 0.15
- Emotional Depth (情感深度) - Emotion word density — weight: 0.20
- Identity Integration (自我认同整合) - Self-reference frequency — weight: 0.15
- Information Density Distribution (信息密度分布) - Central vs. peripheral balance — weight: 0.20
Emotional Depth and Information Density receive higher weights based on their stronger association with therapeutic outcomes in reminiscence therapy (Westerhof & Bohlmeijer, 2024; Kensinger & Gutchess, 2026).
Installation
pip install -r requirements.txt
Usage
Command Line
# Score a text directly
python src/scorer.py "我记得那是一个春天的下午,阳光明媚..."
# Run demo with sample text
python src/scorer.py --demo
Python API
from src.scorer import score_narrative
text = "我记得那是一个春天的下午,阳光明媚..."
result = score_narrative(text)
print(f"Composite Score: {result.composite_score}")
print(f"Letter Grade: {result.letter_grade}")
print(f"Feedback: {result.feedback}")
# Access individual dimensions
print(f"Event Richness: {result.event_richness}")
print(f"Temporal Coherence: {result.temporal_coherence}")
# ... etc
LLM-Enhanced Scoring (v0.7+)
Enable LLM augmentation for implicit feature detection:
import os
from src.scorer import score_narrative
from src.llm_feature_extractor import LLMConfig
text = "那天之后,一切都变了..." # Implicit emotion, no explicit emotion words
# Rule-only (v0.6 behavior)
result_rule = score_narrative(text)
# Hybrid (Rule + LLM) — requires DASHSCOPE_API_KEY
llm_config = LLMConfig(
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen-plus",
use_emotion_detection=True,
use_event_boundary_detection=True,
use_causal_detection=True
)
result_hybrid = score_narrative(text, llm_config=llm_config)
print(f"Rule-only emotional_depth: {result_rule.emotional_depth}")
print(f"Hybrid emotional_depth: {result_hybrid.emotional_depth}") # Expected to be higher when the LLM detects implicit emotion
LLM Enhancement Benefits:
- Detects implicit emotions (e.g., "那天之后,一切都变了" → sadness/loss)
- Semantic event boundaries (topic transitions, not just sentence boundaries)
- Implicit causal links (reasoning beyond explicit markers)
- Graceful degradation: Falls back to rule-only if LLM API fails
Cost Estimate: ~¥0.00084 per narrative (200 input + 100 output tokens @ qwen-plus)
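The graceful degradation listed above can be pictured with a small wrapper. This is a minimal sketch, not the package's implementation: `score_with_fallback`, `rule_scorer`, and `llm_scorer` are hypothetical names, and the scorers are plain callables standing in for the real rule-based and LLM-backed paths.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def score_with_fallback(
    text: str,
    rule_scorer: Callable[[str], float],
    llm_scorer: Optional[Callable[[str], float]] = None,
) -> tuple[float, str]:
    """Try the LLM-backed scorer first; fall back to the rule-only scorer on any failure.

    Returns the score together with the mode that produced it.
    """
    if llm_scorer is not None:
        try:
            return llm_scorer(text), "hybrid"
        except Exception as exc:  # e.g. network error or a 401 from the API
            logger.warning("LLM scoring failed (%s); falling back to rule-only", exc)
    return rule_scorer(text), "rule-only"


# Stand-in scorers for illustration only.
def fake_rule_scorer(text: str) -> float:
    return 50.0


def failing_llm_scorer(text: str) -> float:
    raise ConnectionError("LLM API returned error (status: 401)")


print(score_with_fallback("那天之后,一切都变了...", fake_rule_scorer, failing_llm_scorer))
```

Returning the mode alongside the score lets a caller record which narratives were scored without LLM enhancement.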
Web UI (Gradio)
Launch the interactive web interface:
# Install Gradio (one-time)
pip install gradio
# Start the web server
python src/gradio_ui.py
Then open http://localhost:7860 in your browser.
Features:
- 📝 Text input with example loading
- 🚀 One-click scoring
- 📊 Visual score breakdown with letter grades
- 💬 Natural language feedback in Chinese
- 📄 JSON output for programmatic use
JSON Output
{
"event_richness": 75.5,
"temporal_coherence": 82.3,
"causal_coherence": 68.0,
"emotional_depth": 71.2,
"identity_integration": 85.0,
"information_density": 90.0,
"central_count": 6,
"peripheral_count": 4,
"central_ratio": 0.6,
"total_events": 10,
"time_markers_count": 5,
"causal_markers_count": 3,
"self_references_count": 8,
"emotion_words_count": 4,
"composite_score": 78.5,
"letter_grade": "B",
"feedback": "这是一段不错的叙事,有一些亮点可以继续加强。特别突出的是信息密度分布(90 分)。建议加强因果连贯性(68 分)。"
}
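For programmatic use, the JSON above can be consumed with the standard library alone. The sketch below picks out the weakest and strongest dimensions from such a payload; the helper name `weakest_and_strongest` is hypothetical, and the keys are taken from the sample output.

```python
import json

# Dimension keys as they appear in the sample output above.
DIMENSIONS = [
    "event_richness",
    "temporal_coherence",
    "causal_coherence",
    "emotional_depth",
    "identity_integration",
    "information_density",
]


def weakest_and_strongest(payload: str) -> tuple[str, str]:
    """Return the lowest- and highest-scoring dimension names from a JSON result."""
    data = json.loads(payload)
    scores = {name: data[name] for name in DIMENSIONS}
    return min(scores, key=scores.get), max(scores, key=scores.get)


# Dimension scores copied from the sample output.
sample = json.dumps({
    "event_richness": 75.5,
    "temporal_coherence": 82.3,
    "causal_coherence": 68.0,
    "emotional_depth": 71.2,
    "identity_integration": 85.0,
    "information_density": 90.0,
})

print(weakest_and_strongest(sample))
```

For the sample values this selects causal_coherence as the weakest and information_density as the strongest dimension, the same two dimensions named in the feedback string.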
Example
See examples/ directory for sample inputs and outputs.
# Run with example file
python src/scorer.py "$(cat examples/sample_input.txt)"
Scoring Algorithm
Event Extraction
- Splits text by Chinese sentence boundaries (。!?)
- Classifies sentences as central (specific details) or peripheral (reflections)
- Extracts time markers from temporal vocabulary list
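The first and third steps above can be sketched with regular expressions. This is an illustration under assumptions, not the code in src/scorer.py: the marker list here is a small hypothetical sample, the pattern accepts both full-width and ASCII punctuation, and the central/peripheral classification step is left out.

```python
import re

# Sentence-ending punctuation; both full-width and ASCII forms are accepted (an assumption).
SENTENCE_END = re.compile(r"[。!?!?]")

# Hypothetical sample of temporal vocabulary; the real list lives in src/scorer.py.
TIME_MARKERS = ["那天", "后来", "然后", "小时候"]

# Four-digit years such as "1985年".
YEAR_PATTERN = re.compile(r"\d{4}年")


def split_sentences(text: str) -> list[str]:
    """Split text at Chinese sentence boundaries and drop empty fragments."""
    return [part.strip() for part in SENTENCE_END.split(text) if part.strip()]


def find_time_markers(text: str) -> list[str]:
    """Collect vocabulary-based markers first, then year expressions."""
    found = [marker for marker in TIME_MARKERS if marker in text]
    found.extend(YEAR_PATTERN.findall(text))
    return found


text = "1985年我在上海工作。后来我们搬到了北京!你还记得吗?"
print(split_sentences(text))
print(find_time_markers(text))
```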
Dimension Scoring
Each dimension is scored 0-100 based on:
- Event Richness: Weighted events per 100 chars (central=1.0, peripheral=0.4) + count bonus + central bonus — v0.6.2: prevents all-reflective narratives from scoring high
- Temporal Coherence: Log-scaled marker density + time coverage — v0.6.2: single-event cap at 25, prevents short-text inflation
- Causal Coherence: Causal marker density (negation-aware since v0.5.1)
- Emotional Depth: Log-scaled emotion density + count bonus — v0.6.2: text length floor at 60 chars
- Identity Integration: Log-scaled self-reference density — v0.6.1: prevents universal saturation
- Information Density: Distance from optimal 60/40 central-peripheral ratio
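As a worked illustration of the Event Richness bullet, the weighted event density can be computed as follows. Only the weights (central=1.0, peripheral=0.4) and the "per 100 chars" normalization come from the description above; the function name is hypothetical, and the count bonus, central bonus, and scaling to 0-100 are not specified here and are omitted.

```python
CENTRAL_WEIGHT = 1.0
PERIPHERAL_WEIGHT = 0.4


def weighted_event_density(central: int, peripheral: int, text_length: int) -> float:
    """Weighted events per 100 characters; bonuses and 0-100 scaling are not modeled."""
    if text_length <= 0:
        return 0.0
    weighted_events = central * CENTRAL_WEIGHT + peripheral * PERIPHERAL_WEIGHT
    return weighted_events / text_length * 100


# 6 central + 4 peripheral events in a hypothetical 200-character narrative.
mixed = weighted_event_density(central=6, peripheral=4, text_length=200)

# The same number of events, all reflective (peripheral).
reflective = weighted_event_density(central=0, peripheral=10, text_length=200)

print(round(mixed, 2), round(reflective, 2))
```

With equal event counts, the all-reflective narrative gets roughly half the density of the mixed one, which is the effect the v0.6.2 note describes.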
Composite Score
Weighted average with default weights:
- Event Richness: 15%
- Temporal Coherence: 15%
- Causal Coherence: 15%
- Emotional Depth: 20%
- Identity Integration: 15%
- Information Density: 20%
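The weighted average can be written out directly. In this sketch the weights are the defaults listed above, while the function name and the example scores are hypothetical; dividing by the total weight is an assumption that keeps the result on the 0-100 scale even if custom weights do not sum exactly to 1.

```python
DEFAULT_WEIGHTS = {
    "event_richness": 0.15,
    "temporal_coherence": 0.15,
    "causal_coherence": 0.15,
    "emotional_depth": 0.20,
    "identity_integration": 0.15,
    "information_density": 0.20,
}


def composite_score(scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the six dimension scores."""
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Hypothetical scores: every dimension at 80 except emotional depth at 100.
example_scores = {name: 80.0 for name in DEFAULT_WEIGHTS}
example_scores["emotional_depth"] = 100.0

print(round(composite_score(example_scores), 2))
```

Raising one dimension by 20 points moves the composite by 20 times that dimension's weight, so the higher-weighted dimensions (Emotional Depth and Information Density) shift it by 4 points rather than 3.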
Letter Grades
- S: ≥90 (Excellent)
- A: ≥80 (Very Good)
- B: ≥70 (Good)
- C: ≥60 (Fair)
- D: ≥50 (Poor)
- F: <50 (Needs Improvement)
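The grade boundaries above translate into a simple threshold lookup. This is a sketch with a hypothetical function name; it assumes the composite score is compared against the thresholds without rounding first.

```python
# (minimum score, grade) pairs in descending order, taken from the list above.
GRADE_THRESHOLDS = [(90, "S"), (80, "A"), (70, "B"), (60, "C"), (50, "D")]


def letter_grade(score: float) -> str:
    """Map a 0-100 composite score to its letter grade."""
    for minimum, grade in GRADE_THRESHOLDS:
        if score >= minimum:
            return grade
    return "F"


for value in (95.0, 78.5, 49.9):
    print(value, letter_grade(value))
```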
Customization
Custom Weights
custom_weights = {
"event_richness": 0.20,
"temporal_coherence": 0.20,
"causal_coherence": 0.20,
"emotional_depth": 0.15,
"identity_integration": 0.15,
"information_density": 0.10
}
result = score_narrative(text, weights=custom_weights)
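Before passing custom weights, it can help to check them. The helper below is a hypothetical pre-flight check, not part of the package, and it assumes the weights are expected to cover exactly the six dimensions and sum to 1.0; whether score_narrative enforces this itself is not stated here.

```python
import math

EXPECTED_KEYS = {
    "event_richness",
    "temporal_coherence",
    "causal_coherence",
    "emotional_depth",
    "identity_integration",
    "information_density",
}


def validate_weights(weights: dict[str, float]) -> None:
    """Raise ValueError if keys are missing or unknown, a weight is negative, or the sum is not 1.0."""
    keys = set(weights)
    if keys != EXPECTED_KEYS:
        missing = sorted(EXPECTED_KEYS - keys)
        unknown = sorted(keys - EXPECTED_KEYS)
        raise ValueError(f"missing keys: {missing}, unknown keys: {unknown}")
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")


custom_weights = {
    "event_richness": 0.20,
    "temporal_coherence": 0.20,
    "causal_coherence": 0.20,
    "emotional_depth": 0.15,
    "identity_integration": 0.15,
    "information_density": 0.10,
}

validate_weights(custom_weights)  # passes silently for the weights shown above
```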
Extend Vocabulary
Edit src/scorer.py to add more markers:
- TIME_MARKERS: Temporal connectives
- CAUSAL_MARKERS: Causal connectives
- SELF_MARKERS: Self-reference words
- EMOTION_WORDS: Emotion vocabulary
Integrations
- nlg-metricverse: Available as a plug-in metric — PR #11
- awesome-dementia-detection: Listed as a narrative evaluation tool ✅ Merged
Community Recognition
| List | Stars | Status |
|---|---|---|
| awesome-dementia-detection | 42+ | ✅ Merged |
| Awesome-LLM-Eval | 548+ | ⏳ PR #23 Open |
| awesome-ai-eval | 69+ | ⏳ PR #6 Open |
| nlg-metricverse | 94+ | ⏳ PR #11 Open |
Applications
- Reminiscence Therapy: Assess narrative quality in older adults
- MCI Screening: Detect cognitive decline through narrative patterns
- Research: Quantify narrative changes over time
- Clinical Practice: Track therapy progress
Benchmark Results
v0.7 Extended Benchmark (25 Samples, 5 Categories)
| Category | Sample IDs | Theme | Key Validation |
|---|---|---|---|
| Positive | v07-p01 to v07-p05 | Achievement, warmth, growth, gratitude, joy | LLM enhances explicit emotions |
| Negative | v07-n01 to v07-n05 | Failure, rejection, burnout, regret, anger | LLM detects implicit negative emotions |
| Neutral | v07-u01 to v07-u05 | Daily routine, factual, procedural, travel, work | Low false positives (no hallucination) |
| Reflective | v07-r01 to v07-r05 | Life lessons, self-examination, values, meaning | High identity_integration expected |
| Traumatic | v07-t01 to v07-t05 | Loss, accident, betrayal, discrimination, divorce | High emotional_depth expected |
Test Coverage:
- TestV07CategoryDistribution (5 tests, requires LLM API): Validates LLM enhancement per category
- TestV07MockedBenchmark (4 tests, no API key): Schema validation, score ranges, category distribution
85 tests in 0.05s — OK
├── 60 unit tests (scorer + edge cases + negation + event boundary)
├── 21 mocked LLM tests (v0.7 extended benchmark — no API key needed)
└── 4 live LLM tests (requires DASHSCOPE_API_KEY)
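The mocked tests need no API key because the LLM call is replaced with a stub. The sketch below shows the general technique with unittest.mock; it does not reproduce the project's test code. The extractor's return shape (a dict with an "implicit_emotions" list) and the helper `hybrid_emotion_count` are assumptions made for illustration.

```python
from unittest.mock import MagicMock


def hybrid_emotion_count(text: str, extractor, rule_count: int) -> int:
    """Add LLM-detected implicit emotions to the rule-based emotion count (hypothetical helper)."""
    features = extractor.extract(text)
    return rule_count + len(features["implicit_emotions"])


# Stub standing in for the LLM-backed extractor; no network call is made.
mock_extractor = MagicMock()
mock_extractor.extract.return_value = {"implicit_emotions": ["sadness"]}

text = "那天之后,一切都变了..."
count = hybrid_emotion_count(text, mock_extractor, rule_count=0)

# The stub records how it was called, so the interaction itself can be checked.
mock_extractor.extract.assert_called_once_with(text)
print(count)
```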
v0.6 Legacy Benchmark (15 Samples)
See tests/test_benchmark.py for the original 15-sample benchmark (90/90 dimension accuracy).
Limitations (v0.7.0)
- LLM API dependency: Hybrid scoring requires DASHSCOPE_API_KEY (graceful degradation to rule-only)
- Latency: LLM enhancement adds ~500-1500ms per narrative (vs <100ms rule-only)
- Cost: ~¥0.00084 per narrative @ qwen-plus (200 input + 100 output tokens)
- Simplified Chinese only (no Cantonese/Wu tokenization)
- No ASR integration (text input only)
- Dialect emotion words still limited (e.g., "急" in Wu dialect not recognized)
Troubleshooting
LLM API Returns 401 Authentication Error
Symptom: LLM API returned error (status: 401) in logs
Cause: DASHSCOPE_API_KEY is invalid, expired, or revoked
Resolution:
- Visit https://dashscope.console.aliyun.com/
- Navigate to API Key management
- Check key status (Active/Revoked/Expired)
- If expired/revoked: Create new API key
- Update environment variable:
export DASHSCOPE_API_KEY=sk-xxxxx
- Re-run scoring; it should now succeed
Workaround: Package automatically falls back to rule-only mode (v0.6.4 behavior) when LLM API fails. All core scoring features remain functional.
Verification:
python3 -c "from src.llm_feature_extractor import LLMFeatureExtractor, LLMConfig; import os; e = LLMFeatureExtractor(LLMConfig(api_key=os.environ['DASHSCOPE_API_KEY'])); print(e.extract('测试'))"
Expected: LLMFeatures(...) with features extracted
If 401: Fallback mode activated, rule-only scoring used
Roadmap
v0.7.0 (Current — 2026-04 Target Release)
| Feature | Status | Details |
|---|---|---|
| Hybrid scoring (Rule + LLM) | ✅ Complete | llm_feature_extractor.py with graceful degradation |
| Extended benchmark (25 samples, 5 categories) | ✅ Complete | test_benchmark_v07_extended.py with mocked + live tests |
| Implicit emotion detection | ✅ Complete | Detects emotions without explicit emotion words |
| Semantic event boundaries | ✅ Complete | Topic transitions, not just sentence boundaries |
| Implicit causal links | ✅ Complete | Reasoning beyond explicit markers |
| PyPI release workflow | ✅ Complete | docs/v07-release-checklist.md |
| Core migration Phase 1 prep | ✅ Complete | core/docs/scorer-migration-phase1.md |
Future (v0.8+)
| Feature | Target | Status |
|---|---|---|
| Multi-dialect support (Cantonese, Wu) | Q3 2026 | 🔜 Planned |
| Human-AI agreement validation (ICC) | Q4 2026 | ⏳ Blocked on RCT |
| FastAPI production server | Q3 2026 | 🔜 Planned |
Completed
- v0.7.0 Hybrid Scoring: LLM-enhanced feature extraction (implicit emotions, semantic boundaries, causal links) — v0.7.0
- Extended Benchmark: 25 samples across 5 categories (positive/negative/neutral/reflective/traumatic) — v0.7.0
- Mocked LLM Tests: CI validation without API key — 21 tests — v0.7.0
- Release Workflow: Complete PyPI release checklist + cost analysis — v0.7.0
- Emotion vocabulary expansion (30 → 78 words: trauma, social, dialect) — v0.6.3
- Year/date temporal recognition (\d{4}年,\d+ 月,lunar calendar, ages) — v0.6.3
- 15-sample benchmark suite (90/90 dimension accuracy) — v0.6.2
- Dimension calibration: event_richness, temporal_coherence, emotional_depth — v0.6.2
- LLM-as-Judge architecture research (3 options evaluated, Option C recommended) — v0.6.2
- nlg-metricverse plugin integration — PR #11 submitted — v0.6.0
- First external list merge: awesome-dementia-detection — v0.6.0
- Event boundary detection v2 — topic-transition-aware splitting, short-clause merging, enhanced classification — v0.6.0
- GitHub Actions CI (Python 3.9-3.12 matrix) — v0.6.0
- Test expansion: 11 → 36 → 46 → 60 → 72 test cases — v0.6.2
- Negation detection (不/没有/未/并不/从不 etc.) — v0.5.1
- Negation-aware causal & emotion counting — v0.5.1
- Web UI (Gradio) — v0.5
- Weighted scoring rationale — v0.5
- arXiv technical report — v1.1 ready
Citation
If you use this tool in your research, please cite:
@software{cittaverse_narrative_scorer,
title = {CittaVerse Narrative Scorer: Automated Assessment of Chinese Autobiographical Memory Quality},
author = {Hulk and CittaVerse Team},
year = {2026},
url = {https://github.com/cittaverse/narrative-scorer}
}
License
MIT License - see LICENSE file
Contact
- GitHub: https://github.com/cittaverse/narrative-scorer
- Issues: https://github.com/cittaverse/narrative-scorer/issues
Part of CittaVerse - AI-Assisted Reminiscence Therapy for Older Adults