Automated narrative quality assessment for Chinese autobiographical memories — hybrid rule-based + LLM-enhanced scoring


CittaVerse Narrative Scorer v0.7.0 🧠

Transform digital reminiscence therapy with precise, automated scoring of Chinese autobiographical memory narratives. 🎯

Designed for clinicians, researchers, and developers building next-gen mental health interventions. 🤝

✨ 6-Dimension Assessment: Event richness, temporal/causal coherence, emotional depth, identity integration, information density
🇨🇳 Chinese NLP Optimized: 75-marker lexicon for elderly speech patterns, dialect-aware
📊 Instant Feedback: <15ms per 1000 chars, ~60 narratives/sec, JSON + letter grade output
🔬 Clinical Evaluation Planned: Pilot RCT (N=50, 2-week intervention) in preparation; human-AI agreement validation is pending on it

🚀 Quick Start | 📄 Paper | 🏥 Clinical Study

📄 Paper: Technical report v1.1 ready for arXiv submission (cs.HC + cs.CL, 52 BibTeX references, weighted 6-dimension scoring). Submission tarball available in pipeline repo.
🏥 Clinical Study: Pilot RCT (N=50) in preparation — screening questionnaire v1.1 complete (14 questions, full skip-logic coverage, PIPL-compliant data protection).
🤖 v0.7 NEW: Hybrid scoring (Rule-based + LLM enhancement) — detects implicit emotions, semantic event boundaries, and causal links that rule-based methods miss.

Overview

This tool scores narrative quality across six dimensions:

Event Richness (事件丰富度) - Internal/external detail count — weight: 0.15
Temporal Coherence (时间连贯性) - Time markers and sequence clarity — weight: 0.15
Causal Coherence (因果连贯性) - Cause-effect reasoning — weight: 0.15
Emotional Depth (情感深度) - Emotion word density — weight: 0.20
Identity Integration (自我认同整合) - Self-reference frequency — weight: 0.15
Information Density Distribution (信息密度分布) - Central vs. peripheral balance — weight: 0.20

Emotional Depth and Information Density receive higher weights based on their stronger association with therapeutic outcomes in reminiscence therapy (Westerhof & Bohlmeijer, 2024; Kensinger & Gutchess, 2026).

Installation

pip install -r requirements.txt

Usage

Command Line

# Score a text directly
python src/scorer.py "我记得那是一个春天的下午，阳光明媚..."

# Run demo with sample text
python src/scorer.py --demo

Python API

from src.scorer import score_narrative

text = "我记得那是一个春天的下午，阳光明媚..."
result = score_narrative(text)

print(f"Composite Score: {result.composite_score}")
print(f"Letter Grade: {result.letter_grade}")
print(f"Feedback: {result.feedback}")

# Access individual dimensions
print(f"Event Richness: {result.event_richness}")
print(f"Temporal Coherence: {result.temporal_coherence}")
# ... etc

LLM-Enhanced Scoring (v0.7+)

Enable LLM augmentation for implicit feature detection:

import os

from src.scorer import score_narrative
from src.llm_feature_extractor import LLMConfig

text = "那天之后，一切都变了..."  # Implicit emotion, no explicit emotion words

# Rule-only (v0.6 behavior)
result_rule = score_narrative(text)

# Hybrid (Rule + LLM) — requires DASHSCOPE_API_KEY
llm_config = LLMConfig(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-plus",
    use_emotion_detection=True,
    use_event_boundary_detection=True,
    use_causal_detection=True
)
result_hybrid = score_narrative(text, llm_config=llm_config)

print(f"Rule-only emotional_depth: {result_rule.emotional_depth}")
print(f"Hybrid emotional_depth: {result_hybrid.emotional_depth}")  # Expected to be higher when implicit emotion is detected

LLM Enhancement Benefits:

Detects implicit emotions (e.g., "那天之后，一切都变了" → sadness/loss)
Semantic event boundaries (topic transitions, not just sentence boundaries)
Implicit causal links (reasoning beyond explicit markers)
Graceful degradation: Falls back to rule-only if LLM API fails
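The graceful-degradation behaviour in the last bullet can be pictured with a small wrapper. This is a minimal sketch, not the package's code: `score_with_fallback`, `rule_scorer`, and `llm_scorer` are hypothetical names, and the stub scorers only stand in for real rule-based and LLM-backed functions.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("fallback_sketch")

Scores = Dict[str, float]


def score_with_fallback(
    text: str,
    rule_scorer: Callable[[str], Scores],
    llm_scorer: Optional[Callable[[str, Scores], Scores]] = None,
) -> Scores:
    """Return LLM-enhanced scores, or the rule-only scores if the LLM step fails."""
    base = rule_scorer(text)  # rule-based scores are always computed first
    if llm_scorer is None:
        return base
    try:
        return llm_scorer(text, base)
    except Exception as exc:  # e.g. a network error or an HTTP 401
        logger.warning("LLM enhancement failed (%s); using rule-only scores", exc)
        return base


# Stub scorers for illustration only (hypothetical values).
def stub_rule_scorer(text: str) -> Scores:
    return {"emotional_depth": 40.0}


def failing_llm_scorer(text: str, base: Scores) -> Scores:
    raise RuntimeError("status: 401")


result = score_with_fallback("那天之后，一切都变了...", stub_rule_scorer, failing_llm_scorer)
print(result)
```

Computing the rule-based result before calling the LLM means a failed API call costs only the wasted request; the caller still receives a complete score.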

Cost Estimate: ~¥0.00084 per narrative (200 input + 100 output tokens @ qwen-plus)
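For budgeting, the per-narrative figure scales linearly with volume. The helper below is a hypothetical convenience function, not part of the package; it simply multiplies the quoted estimate and assumes every narrative uses roughly the stated token counts.

```python
COST_PER_NARRATIVE_CNY = 0.00084  # estimate quoted above for qwen-plus


def estimate_llm_cost(num_narratives: int,
                      cost_per_narrative: float = COST_PER_NARRATIVE_CNY) -> float:
    """Estimated LLM cost in CNY for scoring `num_narratives` narratives."""
    if num_narratives < 0:
        raise ValueError("num_narratives must be non-negative")
    return round(num_narratives * cost_per_narrative, 5)


for n in (1, 1_000, 100_000):
    print(f"{n:>7} narratives: ~¥{estimate_llm_cost(n):.5f}")
```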

Web UI (Gradio)

Launch the interactive web interface:

# Install Gradio (one-time)
pip install gradio

# Start the web server
python src/gradio_ui.py

Then open http://localhost:7860 in your browser.

Features:

📝 Text input with example loading
🚀 One-click scoring
📊 Visual score breakdown with letter grades
💬 Natural language feedback in Chinese
📄 JSON output for programmatic use

JSON Output

{
  "event_richness": 75.5,
  "temporal_coherence": 82.3,
  "causal_coherence": 68.0,
  "emotional_depth": 71.2,
  "identity_integration": 85.0,
  "information_density": 90.0,
  "central_count": 6,
  "peripheral_count": 4,
  "central_ratio": 0.6,
  "total_events": 10,
  "time_markers_count": 5,
  "causal_markers_count": 3,
  "self_references_count": 8,
  "emotion_words_count": 4,
  "composite_score": 78.5,
  "letter_grade": "B",
  "feedback": "这是一段不错的叙事，有一些亮点可以继续加强。特别突出的是信息密度分布（90 分）。建议加强因果连贯性（68 分）。"
}
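A downstream consumer might read this output as follows. The snippet is an illustrative sketch using only the standard library; the payload is a subset of the sample above, and the variable names are our own.

```python
import json

payload = """
{
  "event_richness": 75.5,
  "temporal_coherence": 82.3,
  "causal_coherence": 68.0,
  "emotional_depth": 71.2,
  "identity_integration": 85.0,
  "information_density": 90.0,
  "central_count": 6,
  "peripheral_count": 4,
  "central_ratio": 0.6,
  "total_events": 10,
  "letter_grade": "B"
}
"""

DIMENSIONS = (
    "event_richness", "temporal_coherence", "causal_coherence",
    "emotional_depth", "identity_integration", "information_density",
)

result = json.loads(payload)
dimension_scores = {name: result[name] for name in DIMENSIONS}

# Pick out the strongest and weakest dimensions, as the feedback string does.
strongest = max(dimension_scores, key=dimension_scores.get)
weakest = min(dimension_scores, key=dimension_scores.get)

# Sanity check: central_ratio should equal central_count / total_events.
ratio_ok = abs(result["central_count"] / result["total_events"] - result["central_ratio"]) < 1e-9

print(strongest, weakest, ratio_ok)
```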

Example

See examples/ directory for sample inputs and outputs.

# Run with example file
python src/scorer.py "$(cat examples/sample_input.txt)"

Scoring Algorithm

Event Extraction

Splits text by Chinese sentence boundaries (。！？)
Classifies sentences as central (specific details) or peripheral (reflections)
Extracts time markers from temporal vocabulary list
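The first and third steps can be sketched in a few lines. This is an assumption-laden illustration rather than the code in src/scorer.py: `split_sentences`, `find_time_markers`, and `SAMPLE_TIME_MARKERS` are hypothetical names, and the three-word marker list is only a stand-in for the real temporal vocabulary.

```python
import re

# Matches a run of non-terminal characters plus an optional closing 。！ or ？
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")

# Illustrative markers only; the package keeps its own list (TIME_MARKERS).
SAMPLE_TIME_MARKERS = ("春天", "下午", "后来")


def split_sentences(text: str) -> list:
    """Split text at Chinese sentence-final punctuation, keeping the mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def find_time_markers(text: str, markers=SAMPLE_TIME_MARKERS) -> list:
    """Return the markers that occur in the text, in vocabulary order."""
    return [m for m in markers if m in text]


sample = "我记得那是一个春天的下午。阳光明媚！你还记得吗？"
sentences = split_sentences(sample)
found = find_time_markers(sample)
print(sentences)
print(found)
```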

Dimension Scoring

Each dimension is scored 0-100 based on:

Event Richness: Weighted events per 100 chars (central=1.0, peripheral=0.4) + count bonus + central bonus — v0.6.2: prevents all-reflective narratives from scoring high
Temporal Coherence: Log-scaled marker density + time coverage — v0.6.2: single-event cap at 25, prevents short-text inflation
Causal Coherence: Causal marker density (negation-aware since v0.5.1)
Emotional Depth: Log-scaled emotion density + count bonus — v0.6.2: text length floor at 60 chars
Identity Integration: Log-scaled self-reference density — v0.6.1: prevents universal saturation
Information Density: Distance from optimal 60/40 central-peripheral ratio
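Several dimensions above are described as log-scaled densities. One way such a mapping could look is shown below; the formula, the function name, and the `saturation` parameter are assumptions made for illustration and are not calibrated to reproduce the package's scores.

```python
import math


def log_scaled_density_score(marker_count: int, text_length: int,
                             saturation: float = 5.0) -> float:
    """Map markers per 100 characters onto 0-100 with diminishing returns.

    `saturation` (hypothetical) is the density at which the score reaches 100.
    """
    if marker_count <= 0 or text_length <= 0:
        return 0.0
    density = marker_count / text_length * 100  # markers per 100 chars
    score = 100 * math.log1p(density) / math.log1p(saturation)
    return min(100.0, score)


for count in (0, 1, 2, 5, 10):
    print(count, round(log_scaled_density_score(count, 100), 1))
```

The logarithm makes the first few markers count for more than later ones, which is one way to keep a marker-dense narrative from saturating the scale too quickly.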

Composite Score

Weighted average with default weights:

Event Richness: 15%
Temporal Coherence: 15%
Causal Coherence: 15%
Emotional Depth: 20%
Identity Integration: 15%
Information Density: 20%
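As a worked example of the weighted average, the sketch below applies the default weights to a set of made-up dimension scores. `composite_score` and `DEFAULT_WEIGHTS` are hypothetical names; dividing by the weight total is our assumption, and any rounding or clamping the package applies may differ.

```python
DEFAULT_WEIGHTS = {
    "event_richness": 0.15,
    "temporal_coherence": 0.15,
    "causal_coherence": 0.15,
    "emotional_depth": 0.20,
    "identity_integration": 0.15,
    "information_density": 0.20,
}


def composite_score(dimension_scores: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the dimension scores, rounded to two decimals."""
    total_weight = sum(weights.values())
    weighted_sum = sum(dimension_scores[name] * w for name, w in weights.items())
    return round(weighted_sum / total_weight, 2)


# Made-up scores, chosen so the arithmetic is easy to follow:
# 12 + 10.5 + 9 + 18 + 7.5 + 20 = 77
example_scores = {
    "event_richness": 80.0,
    "temporal_coherence": 70.0,
    "causal_coherence": 60.0,
    "emotional_depth": 90.0,
    "identity_integration": 50.0,
    "information_density": 100.0,
}
composite = composite_score(example_scores)
print(composite)
```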

Letter Grades

S: ≥90 (Excellent)
A: ≥80 (Very Good)
B: ≥70 (Good)
C: ≥60 (Fair)
D: ≥50 (Poor)
F: <50 (Needs Improvement)
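The thresholds translate directly into a lookup. The function name `letter_grade` is ours; only the cut-offs come from the table above.

```python
GRADE_THRESHOLDS = [(90, "S"), (80, "A"), (70, "B"), (60, "C"), (50, "D")]


def letter_grade(score: float) -> str:
    """Return the first grade whose cut-off the score meets, else F."""
    for cutoff, grade in GRADE_THRESHOLDS:
        if score >= cutoff:
            return grade
    return "F"


for s in (95, 78.5, 60, 49.9):
    print(s, letter_grade(s))
```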

Customization

Custom Weights

custom_weights = {
    "event_richness": 0.20,
    "temporal_coherence": 0.20,
    "causal_coherence": 0.20,
    "emotional_depth": 0.15,
    "identity_integration": 0.15,
    "information_density": 0.10
}

result = score_narrative(text, weights=custom_weights)
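Before passing custom weights, it can help to check them. The validator below is a hypothetical helper, not part of the package; whether `score_narrative` itself requires all six keys or a total of 1.0 is not stated here, so treat these checks as conservative assumptions.

```python
import math

EXPECTED_DIMENSIONS = {
    "event_richness", "temporal_coherence", "causal_coherence",
    "emotional_depth", "identity_integration", "information_density",
}


def validate_weights(weights: dict) -> None:
    """Raise ValueError unless all six dimensions are present and sum to 1."""
    missing = EXPECTED_DIMENSIONS - weights.keys()
    extra = weights.keys() - EXPECTED_DIMENSIONS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")


checked_weights = {
    "event_richness": 0.20,
    "temporal_coherence": 0.20,
    "causal_coherence": 0.20,
    "emotional_depth": 0.15,
    "identity_integration": 0.15,
    "information_density": 0.10,
}
validate_weights(checked_weights)  # passes silently
print("weights OK")
```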

Extend Vocabulary

Edit src/scorer.py to add more markers:

TIME_MARKERS: Temporal connectives
CAUSAL_MARKERS: Causal connectives
SELF_MARKERS: Self-reference words
EMOTION_WORDS: Emotion vocabulary

Integrations

nlg-metricverse: Available as a plug-in metric — PR #11
awesome-dementia-detection: Listed as a narrative evaluation tool ✅ Merged

Community Recognition

List	Stars	Status
awesome-dementia-detection	42+	✅ Merged
Awesome-LLM-Eval	548+	⏳ PR #23 Open
awesome-ai-eval	69+	⏳ PR #6 Open
nlg-metricverse	94+	⏳ PR #11 Open

Applications

Reminiscence Therapy: Assess narrative quality in older adults
MCI Screening: Detect cognitive decline through narrative patterns
Research: Quantify narrative changes over time
Clinical Practice: Track therapy progress

Benchmark Results

v0.7 Extended Benchmark (25 Samples, 5 Categories)

Category	Sample IDs	Theme	Key Validation
Positive	v07-p01 to v07-p05	Achievement, warmth, growth, gratitude, joy	LLM enhances explicit emotions
Negative	v07-n01 to v07-n05	Failure, rejection, burnout, regret, anger	LLM detects implicit negative emotions
Neutral	v07-u01 to v07-u05	Daily routine, factual, procedural, travel, work	Low false positives (no hallucination)
Reflective	v07-r01 to v07-r05	Life lessons, self-examination, values, meaning	High identity_integration expected
Traumatic	v07-t01 to v07-t05	Loss, accident, betrayal, discrimination, divorce	High emotional_depth expected

Test Coverage:

TestV07CategoryDistribution (5 tests, requires LLM API): Validates LLM enhancement per category
TestV07MockedBenchmark (4 tests, no API key): Schema validation, score ranges, category distribution

85 tests in 0.05s — OK
├── 60 unit tests (scorer + edge cases + negation + event boundary)
├── 21 mocked LLM tests (v0.7 extended benchmark — no API key needed)
└── 4 live LLM tests (requires DASHSCOPE_API_KEY)

v0.6 Legacy Benchmark (15 Samples)

See tests/test_benchmark.py for the original 15-sample benchmark (90/90 dimension accuracy).

Limitations (v0.7.0)

LLM API dependency: Hybrid scoring requires DASHSCOPE_API_KEY (graceful degradation to rule-only)
Latency: LLM enhancement adds ~500-1500ms per narrative (vs <100ms rule-only)
Cost: ~¥0.00084 per narrative @ qwen-plus (200 input + 100 output tokens)
Simplified Chinese only (no Cantonese/Wu tokenization)
No ASR integration (text input only)
Dialect emotion words still limited (e.g., "急" in Wu dialect not recognized)

Troubleshooting

LLM API Returns 401 Authentication Error

Symptom: LLM API returned error (status: 401) in logs

Cause: DASHSCOPE_API_KEY is invalid, expired, or revoked

Resolution:

Visit https://dashscope.console.aliyun.com/
Navigate to API Key management
Check key status (Active/Revoked/Expired)
If expired/revoked: Create new API key
Update environment variable: export DASHSCOPE_API_KEY=sk-xxxxx
Re-run scoring — should now succeed

Workaround: Package automatically falls back to rule-only mode (v0.6.4 behavior) when LLM API fails. All core scoring features remain functional.

Verification:

python3 -c "from src.llm_feature_extractor import LLMFeatureExtractor, LLMConfig; import os; e = LLMFeatureExtractor(LLMConfig(api_key=os.environ['DASHSCOPE_API_KEY'])); print(e.extract('测试'))"

Expected: LLMFeatures(...) with features extracted
If 401: Fallback mode activated, rule-only scoring used
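A quick pre-flight check can tell you which mode to expect before any API call is made. `select_scoring_mode` is a hypothetical helper; it only tests whether the variable is set and cannot detect an expired or revoked key.

```python
import os
from typing import Mapping


def select_scoring_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "hybrid" if DASHSCOPE_API_KEY is set and non-empty, else "rule-only"."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    return "hybrid" if key else "rule-only"


print(select_scoring_mode({}))                                 # rule-only
print(select_scoring_mode({"DASHSCOPE_API_KEY": "sk-xxxxx"}))  # hybrid
```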

Roadmap

v0.7.0 (Current — 2026-04 Target Release)

Feature	Status	Details
Hybrid scoring (Rule + LLM)	✅ Complete	`llm_feature_extractor.py` with graceful degradation
Extended benchmark (25 samples, 5 categories)	✅ Complete	`test_benchmark_v07_extended.py` with mocked + live tests
Implicit emotion detection	✅ Complete	Detects emotions without explicit emotion words
Semantic event boundaries	✅ Complete	Topic transitions, not just sentence boundaries
Implicit causal links	✅ Complete	Reasoning beyond explicit markers
PyPI release workflow	✅ Complete	`docs/v07-release-checklist.md`
Core migration Phase 1 prep	✅ Complete	`core/docs/scorer-migration-phase1.md`

Future (v0.8+)

Feature	Target	Status
Multi-dialect support (Cantonese, Wu)	Q3 2026	🔜 Planned
~~Negation & context awareness~~	~~Q2 2026~~	✅ v0.5.1
~~Event boundary detection v2~~	~~Q2 2026~~	✅ v0.6.0
~~CI/CD (GitHub Actions)~~	~~Q2 2026~~	✅ v0.6.0
~~Test suite expansion (8 → 50+)~~	~~Q2 2026~~	✅ 72 tests
~~Dimension calibration~~	~~Q2 2026~~	✅ v0.6.2
~~15-sample benchmark~~	~~Q2 2026~~	✅ v0.6.2
~~Year/date temporal recognition~~	~~Q2 2026~~	✅ v0.6.3
~~Expanded emotion vocabulary~~	~~Q2 2026~~	✅ v0.6.3
Human-AI agreement validation (ICC)	Q4 2026	⏳ Blocked on RCT
FastAPI production server	Q3 2026	🔜 Planned

Completed

v0.7.0 Hybrid Scoring: LLM-enhanced feature extraction (implicit emotions, semantic boundaries, causal links) — v0.7.0
Extended Benchmark: 25 samples across 5 categories (positive/negative/neutral/reflective/traumatic) — v0.7.0
Mocked LLM Tests: CI validation without API key — 21 tests — v0.7.0
Release Workflow: Complete PyPI release checklist + cost analysis — v0.7.0
Emotion vocabulary expansion (30 → 78 words: trauma, social, dialect) — v0.6.3
Year/date temporal recognition (\d{4}年，\d+ 月，lunar calendar, ages) — v0.6.3
15-sample benchmark suite (90/90 dimension accuracy) — v0.6.2
Dimension calibration: event_richness, temporal_coherence, emotional_depth — v0.6.2
LLM-as-Judge architecture research (3 options evaluated, Option C recommended) — v0.6.2
nlg-metricverse plugin integration — PR #11 submitted — v0.6.0
First external list merge: awesome-dementia-detection — v0.6.0
Event boundary detection v2 — topic-transition-aware splitting, short-clause merging, enhanced classification — v0.6.0
GitHub Actions CI (Python 3.9-3.12 matrix) — v0.6.0
Test expansion: 11 → 36 → 46 → 60 → 72 test cases — v0.6.2
Negation detection (不/没有/未/并不/从不 etc.) — v0.5.1
Negation-aware causal & emotion counting — v0.5.1
Web UI (Gradio) — v0.5
Weighted scoring rationale — v0.5
arXiv technical report — v1.1 ready

Citation

If you use this tool in your research, please cite:

@software{cittaverse_narrative_scorer,
  title = {CittaVerse Narrative Scorer: Automated Assessment of Chinese Autobiographical Memory Quality},
  author = {Hulk and CittaVerse Team},
  year = {2026},
  url = {https://github.com/cittaverse/narrative-scorer}
}

License

MIT License - see LICENSE file

Contact

GitHub: https://github.com/cittaverse/narrative-scorer
Issues: https://github.com/cittaverse/narrative-scorer/issues

Part of CittaVerse - AI-Assisted Reminiscence Therapy for Older Adults

Project details


This version

0.7.0

Mar 30, 2026

Download files


Source Distribution

cittaverse_narrative_scorer-0.7.0.tar.gz (65.7 kB view details)

Uploaded Mar 30, 2026 Source

Built Distribution


cittaverse_narrative_scorer-0.7.0-py3-none-any.whl (30.4 kB view details)

Uploaded Mar 30, 2026 Python 3

File details

Details for the file cittaverse_narrative_scorer-0.7.0.tar.gz.

File metadata

Download URL: cittaverse_narrative_scorer-0.7.0.tar.gz
Upload date: Mar 30, 2026
Size: 65.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.7

File hashes

Hashes for cittaverse_narrative_scorer-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`ed99d7ceb6ae7ba002ed37bcb16bafceb731960a420647b890fa771a295cb1f1`
MD5	`eee75ee72c99ef43797841e5db11befe`
BLAKE2b-256	`6b24761fb262efbd2ecd149c005a452eb15aaf891bdbdc165e1a1bf30fd26ae9`


File details

Details for the file cittaverse_narrative_scorer-0.7.0-py3-none-any.whl.

File metadata

Download URL: cittaverse_narrative_scorer-0.7.0-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 30.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.7

File hashes

Hashes for cittaverse_narrative_scorer-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4eb9a7f0606be68f075ed7f51df1769bfdbee4d23027d37bd4a85ba16d5fa8f9`
MD5	`f2bf4230c5ef3b037f91b8bc717b495e`
BLAKE2b-256	`0e750c1927e067461cad6a92f55b47dd6ed65cb0115152a5a088acb0dd4e680a`
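A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names below are our own; the file name and digest are the wheel values listed above, and the final check runs only if the file is present in the working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


WHEEL = Path("cittaverse_narrative_scorer-0.7.0-py3-none-any.whl")
PUBLISHED_SHA256 = "4eb9a7f0606be68f075ed7f51df1769bfdbee4d23027d37bd4a85ba16d5fa8f9"

if WHEEL.exists():
    print("hash matches:", matches_published_hash(WHEEL, PUBLISHED_SHA256))
else:
    print(f"{WHEEL} not found; download it first")
```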

