A multi-dimensional Natural Language Processing (NLP) framework for analyzing and assessing the quality of therapeutic conversations
Therapeutic Conversation Quality Assessment Framework
🌟 Overview
This project evaluates the quality of therapeutic conversations using NLP feature extraction and machine learning classifiers. The framework measures conversational dynamics (turn-taking, semantic coherence, sentiment, and questioning behavior) and uses them to distinguish high-quality from low-quality therapeutic interactions, with the aim of supporting review by mental health professionals.
🔑 Key Features
- Multi-dimensional Analysis: Evaluates conversations across four key dimensions:
  - Conversation Analytics (turn-taking, word usage patterns)
  - Semantic Analysis (topic coherence and flow)
  - Sentiment Analysis (emotional context)
  - Question Detection (engagement patterns)
- Advanced ML Classification: Implements multiple classifiers including Random Forest, CatBoost, and SVM, achieving up to 97% accuracy with optimized parameters
- Robust Data Processing:
  - Handles imbalanced datasets using SMOTE-Tomek
  - Comprehensive outlier detection
  - Feature normalization and preprocessing
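
The data processing steps listed above can be illustrated with a minimal, standard-library sketch. The README does not say which outlier rule or scaling method the package uses, so the Tukey IQR fences and z-score scaling below are assumptions, and the function names `iqr_outlier_mask` and `zscore` are hypothetical rather than part of the package API.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical feature column: average words per turn for six conversations.
words_per_turn = [12.0, 14.0, 13.0, 15.0, 11.0, 90.0]

mask = iqr_outlier_mask(words_per_turn)
kept = [v for v, is_outlier in zip(words_per_turn, mask) if not is_outlier]
scaled = zscore(kept)
```

For the class-imbalance step, SMOTE-Tomek is available in the imbalanced-learn library as `imblearn.combine.SMOTETomek`; whether this package calls it directly is not stated here.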
System Architecture
📊 Performance Highlights
| Classifier | Accuracy | Precision | Recall | F1 Score | AUC Score |
|---|---|---|---|---|---|
| SVM | 0.9717 | 0.9775 | 0.9667 | 0.9715 | 0.9874 |
| CatBoost | 0.9600 | 0.9606 | 0.9600 | 0.9600 | 0.9912 |
| Random Forest | 0.9533 | 0.9487 | 0.9600 | 0.9539 | 0.9893 |
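
As a reminder of what the first four columns measure, here is a small sketch of the standard binary-classification definitions computed from predicted and true labels. The labels are made up for illustration, `binary_metrics` is a hypothetical helper, and the README does not state how the reported scores were averaged (for example across classes or cross-validation folds), so this is not a reproduction of the table. AUC is left out because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = high-quality conversation, 0 = low-quality.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
```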
🛠 Technical Implementation
Feature Extraction Pipeline
- Conversation Analytics:
  - Words per turn analysis
  - Turn-taking patterns
  - Statistical measures (std dev, skewness, kurtosis)
- Semantic Analysis:
  - Utilizes multiple embedding models:
    - PromCSE
    - Sentence-BERT
    - SAKIL sentence similarity
  - Both overall and turn-order-aware analysis
- Sentiment Analysis:
  - Twitter-roBERTa-base model
  - Sentiment transition tracking
  - Weighted certainty scores
- Question Detection:
  - Syntactic pattern recognition
  - Bi-gram analysis
  - Speaker-specific question tracking
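
To make two of these feature groups concrete, the sketch below computes words-per-turn statistics and a simple per-speaker question count using only the standard library. It is an illustration under stated assumptions, not the package's implementation: the toy conversation, the `QUESTION_STARTS` word list, and the helper names `moments`, `is_question`, and `split_sentences` are all hypothetical, and the question check is a deliberately crude heuristic.

```python
import math
import re
from collections import Counter

# Hypothetical toy conversation as (speaker, utterance) pairs.
conversation = [
    ("therapist", "How have you been feeling this week?"),
    ("client", "Mostly tired. Work has been a lot."),
    ("therapist", "That sounds exhausting. What helps you rest?"),
    ("client", "Walking helps, I think."),
]

# Assumed list of interrogative and auxiliary words that often open a question.
QUESTION_STARTS = {"how", "what", "why", "when", "where", "who", "which",
                   "do", "does", "did", "is", "are", "can", "could", "would"}


def moments(values):
    """Population std dev, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0, 0.0
    return math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def split_sentences(text):
    """Split on whitespace that follows '.', '?', or '!'."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def is_question(sentence):
    """Crude syntactic check: trailing '?' or a question-like first word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sentence.strip().endswith("?") or (bool(words) and words[0] in QUESTION_STARTS)


# Conversation analytics: words per turn and their distribution.
words_per_turn = [len(text.split()) for _, text in conversation]
std_dev, skewness, kurtosis = moments(words_per_turn)

# Question detection: per-speaker counts and opening bi-grams of questions.
questions_by_speaker = Counter()
opening_bigrams = Counter()
for speaker, text in conversation:
    for sentence in split_sentences(text):
        if is_question(sentence):
            questions_by_speaker[speaker] += 1
            words = re.findall(r"[a-z']+", sentence.lower())
            if len(words) >= 2:
                opening_bigrams[(words[0], words[1])] += 1
```

The first-word heuristic will misfire on imperatives such as "Do it.", which is one reason a real pipeline would likely combine several syntactic cues rather than rely on one.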
📦 Requirements
- Python 3.8+
- scikit-learn (includes SVM and Random Forest classifiers)
- transformers
- torch
- pandas
- numpy
- catboost
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
This research was supported by:
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- New Frontiers in Research Fund
- LeaCros
📬 Contact
For questions and feedback, please contact Niloy Roy.