Digital Interface for Aggregate Analysis of Dialog

DIAAD — Digital Interface for Aggregate Analysis of Dialog

DIAAD is a small toolkit for batched dialog analysis that includes workflows for analyzing digital conversation turns and POWERS coding. It complements (and imports) the monologic speech analysis system RASCAL.

Overview (more details below)

Digital Conversation Turns Analysis
- tracking turn-taking in dialogs can reveal meaningful linguistic and psychosocial patterns (Tuomenoksa et al., 2020)
- recording turns with a sequence of digits enables analysis of tallies and transition probabilities (see below)
POWERS Coding
- Profile of Word Errors and Retrieval in Speech (POWERS) is an aphasiological coding system for analyzing dialogic speech (Herbert et al., 2013)
- DIAAD functionalities:
  - generates coder workbooks, automating most fields
  - summarizes coding and reports ICC2 values between coders
  - evaluates and optionally reselects reliability coding

Web App

You can use DIAAD in your browser — no installation required:

👉 Launch the DIAAD Web App

Installation

We recommend installing DIAAD into a dedicated virtual environment using Anaconda:

1. Create and activate your environment:

conda create --name diaad python=3.12
conda activate diaad

2. Install from GitHub:

pip install git+https://github.com/nmccloskey/diaad.git@main

Setup

To prepare for running DIAAD, complete the following steps:

1. Create your working directory:

We recommend creating a fresh project directory where you'll run your analysis.

Example structure:

your_project/
├── config.yaml           # Configuration file (see below)
└── diaad_data/
    └── input/            # Place your .cha or .xlsx files here
                          # (DIAAD will make an output directory)

2. Provide a `config.yaml` file

This file specifies the directories, coders, reliability settings, and tier structure.

You can download the example config file from the repo or create your own like this:

input_dir: diaad_data/input
output_dir: diaad_data/output
reliability_fraction: 0.2
automate_POWERS: true
exclude_participants:
coders:
- '1'
- '2'
- '3'
tiers:
  time:
    values:
    - PreTx
    - PostTx
    blind: true
  client_id:
    values: \d+
  setting:
    values:
    - LargeGroup
    - SmallGroup

See RASCAL for more information about the tier system for organizing data based on .cha file names.
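
As a rough illustration of how tier values like those in the config could be matched against file names, here is a minimal sketch. It is an assumption about the general idea, not RASCAL's actual implementation: the `TIERS` dictionary mirrors the example config by hand, the helper names `tier_pattern` and `label_file` are hypothetical, and the file name `PreTx_104_SmallGroup.cha` is made up.

```python
import re

# Hand-copied from the example config above; a list means "one of these
# literal values", a string is treated as a regular expression.
TIERS = {
    "time": ["PreTx", "PostTx"],
    "client_id": r"\d+",
    "setting": ["LargeGroup", "SmallGroup"],
}


def tier_pattern(values):
    """Build a regex for one tier (hypothetical helper, not a RASCAL function)."""
    if isinstance(values, str):
        return re.compile(values)
    return re.compile("|".join(re.escape(v) for v in values))


def label_file(file_name, tiers=TIERS):
    """Return the first match for each tier in the file name, or None."""
    labels = {}
    for name, values in tiers.items():
        match = tier_pattern(values).search(file_name)
        labels[name] = match.group(0) if match else None
    return labels


# Hypothetical file name, used only for illustration.
print(label_file("PreTx_104_SmallGroup.cha"))
# {'time': 'PreTx', 'client_id': '104', 'setting': 'SmallGroup'}
```

Consult the RASCAL documentation for the actual matching rules, including how the `blind` option is applied.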

Quickstart — Command Line

DIAAD exposes a concise CLI with subcommands:

# Analyze digital conversation turns
diaad turns

# POWERS workflow
diaad powers make       # prepare POWERS coding files
diaad powers analyze    # analyze completed POWERS coding
diaad powers evaluate   # evaluate completed POWERS reliability coding
diaad powers reselect   # randomly reselect reliability subset

Digital Conversation Turns (DCT) Protocol

DIAAD includes a lightweight system for analyzing digital conversational turns in group treatment sessions with people with aphasia.
Instead of simple tallies, the DCT protocol records the sequence of turns compactly, enabling analysis of turn-taking dynamics and engagement, with optional markers for capturing turn qualities (e.g., length/substantiveness).

Coding Procedure

1. Speaker Assignment

0 = Clinician(s) (all individuals not receiving treatment collapsed under this code)
1 = Participant 1
2 = Participant 2
Continue incrementing (3, 4, …) as needed.

2. Turn Entry with Markers

For each conversational turn, enter the assigned digit for the speaker (e.g., 0, 1, 2).

Marking system:

Each digit is followed by one dot . (mark1), two dots .. (mark2), or no dots.
Recommended usage:
- Add . if the turn is substantial (contains an independent clause)
- Add .. if the turn is monologic (contains at least two independent clauses)
- Add no dots if the turn is minimal (brief/no full idea)
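
The marking scheme above can be read back into structured data with a short regular expression. The following is a minimal sketch, not DIAAD's own parser; the function name `parse_turns` is hypothetical, and clamping runs of more than two dots to mark2 is this sketch's assumption rather than documented behavior.

```python
import re

# One speaker digit followed by any run of dots (the turn marker).
TURN_PATTERN = re.compile(r"(\d)(\.*)")


def parse_turns(turn_string):
    """Return (speaker, mark) pairs, where mark is 0, 1, or 2.

    Assumption: a run of more than two dots is clamped to mark2.
    """
    return [
        (int(digit), min(len(dots), 2))
        for digit, dots in TURN_PATTERN.findall(turn_string)
    ]


print(parse_turns("0.12..10"))
# [(0, 1), (1, 0), (2, 2), (1, 0), (0, 0)]
```

Each pair keeps the speaker code and the marker separate, so tallies can be computed with or without regard to turn length.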

3. Input Coding Table Format

Turns are entered sequentially as a continuous string of digits and dots.
Bins are recommended for some temporal granularity (e.g., six 10-minute bins for a 1-hour conversation treatment session).

Example: Digital Conversation Turns Coding Input

site	session	group	coder	bin	turns
TU	12	Dyad1	NM	1	`212012.02121210.10101.210.12.021212121210.210.2.1.010121.010.110.2102.12.`
TU	12	Dyad1	NM	2	`0202.121212101.011101.2.12.120201.212101020202.10.21212.02.12010212.`
TU	12	Dyad1	NM	3	`12..121.212.1212.0202.12120.201.210101..2012121.2121.2..1212.12.020.2.0`
TU	12	Dyad1	NM	4	`010202.02121021020212101.01012101210010102.1210101010101010101010121020.1.`
TU	12	Dyad1	NM	5	`0.121210.1010102120.102.02120212.0.2.020212121202121212.120.21010101212121`
TU	12	Dyad1	NM	6	`2120210101212121212.10121202.12.02.1212010202.02.02.0202.020201202020.22.02012102002.012102`
TU	4	LgGroup	NM	1	`4.24.242424.0640.4.206.434343430606.060436.3706.0406.76760.602.502.326207.07.67.06767.3737.17.0701270606.06.54321007`
TU	4	LgGroup	NM	2	`763670.50505620507102..02404676.70101...010.707057574767.6..76717.01.7010141.4..1014.3401.671..61016161.721.77414.0`
TU	4	LgGroup	NM	3	`2.0.2.0.3.13.23.01313535737037.0.7.137314.`
TU	4	LgGroup	NM	4	`4.0.5.35.05.0.5..7575404.53436..40575754..24242..575.4375.45705.20.6.`
TU	4	LgGroup	NM	5	`06.007070767676050.21627.17.106063434607571270101.61.01016.161.2.0.1.01`
TU	4	LgGroup	NM	6	`0.607.2707.07.06..06.06.4603403212607201202..2702760276..020.1212606016..70.701702.1.70731313510.`
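
To show how a table in this layout could be summarized, here is a minimal sketch that tallies turns per speaker across bins using only the standard library. The function name `tally_turns` is hypothetical, and the two sample rows reuse the first nine characters of the bin 1 and bin 2 strings above so the counts are easy to check by hand.

```python
import csv
import io
import re
from collections import Counter

# Tab-separated rows in the layout shown above; turn strings are truncated
# prefixes of the Dyad1 bin 1 and bin 2 entries.
SAMPLE_TSV = (
    "site\tsession\tgroup\tcoder\tbin\tturns\n"
    "TU\t12\tDyad1\tNM\t1\t212012.02\n"
    "TU\t12\tDyad1\tNM\t2\t0202.1212\n"
)


def tally_turns(tsv_text):
    """Count turns per (session, group, speaker), summed over bins."""
    tallies = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        turns = row["turns"].strip("`")  # tolerate stray backticks
        for speaker in re.findall(r"\d", turns):
            tallies[(row["session"], row["group"], int(speaker))] += 1
    return tallies


tallies = tally_turns(SAMPLE_TSV)
total = sum(tallies.values())
for key in sorted(tallies):
    print(key, tallies[key], round(tallies[key] / total, 2))
# ('12', 'Dyad1', 0) 4 0.25
# ('12', 'Dyad1', 1) 4 0.25
# ('12', 'Dyad1', 2) 8 0.5
```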

Analytic Opportunities

This richer symbolic format enables:

Turn counts & proportions per participant
Substantial vs. monologic turn ratios
Transitions (e.g., clinician → participant, participant → participant)
Speaker dominance indices
Engagement rates between participants
Distribution metrics (e.g., Gini index, entropy)
Transition matrices & dyadic graphs
Temporal trends (with optional bins)
Reliability: inter-coder sequence comparisons (e.g., Levenshtein distance)
Correlation with treatment outcome measures (e.g., ACOM, WAB) for longitudinal studies
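
Two of the measures listed above, transition probabilities and entropy of the turn distribution, can be sketched in a few lines. This is an illustrative implementation under simple assumptions (markers are ignored, and repeated digits such as "00" count as a self-transition); the function names are hypothetical and not part of DIAAD's API.

```python
import math
import re
from collections import Counter, defaultdict


def speakers(turn_string):
    """Extract the speaker sequence, ignoring dot markers."""
    return [int(d) for d in re.findall(r"\d", turn_string)]


def transition_probabilities(turn_string):
    """Estimate P(next speaker | previous speaker) from adjacent turns."""
    seq = speakers(turn_string)
    counts = defaultdict(Counter)
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def turn_entropy(turn_string):
    """Shannon entropy (bits) of the distribution of turns over speakers."""
    tallies = Counter(speakers(turn_string))
    total = sum(tallies.values())
    return -sum((c / total) * math.log2(c / total) for c in tallies.values())


print(transition_probabilities("0.102"))
# {0: {1: 0.5, 2: 0.5}, 1: {0: 1.0}}
print(turn_entropy("0.102"))
# 1.5
```

Higher entropy indicates turns spread more evenly across speakers; a session dominated by one speaker yields a value near zero.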

Limitations

Turn overlap: the current system assumes strictly sequential turns, an assumption that is often violated in group settings.
Subjectivity: coder judgment is needed for speaker boundaries and substantiality; calibration is recommended.
Binary turn length: mark1 vs. mark2 is coarse; future versions may refine the scale.
Scalability: single digits cannot cover more than 9 participants; for larger groups, codes like P1, P2, C may be adopted.
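
Because coder judgment affects the recorded sequence, comparing two coders' strings for the same bin is one way to calibrate. The sketch below implements the standard Levenshtein (edit) distance mentioned under Analytic Opportunities, plus a length-normalized similarity; the normalization and the function names are assumptions of this example, not DIAAD's documented reliability procedure.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """1.0 for identical strings, lower as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Two coders disagree only on whether the first turn was substantial.
print(levenshtein("0.12.", "012."))
# 1
```

Dots are compared as ordinary characters here, so disagreements about markers and about speakers both count as edits.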

Profile of Word Errors and Retrieval in Speech (POWERS) coding

Measures

The POWERS coding system addresses the need to assess language abilities in conversation for people with aphasia. DIAAD facilitates quantification of the following subset of POWERS variables for both the clinician and client (see the POWERS manual for full details):

filled pauses - disfluencies like "um", "uh", "er", etc.
speech units - these more or less map onto tokens excluding filled pauses
content words - nouns (including proper nouns), non-auxiliary verbs, adjectives, -ly-terminal adverbs, and numerals
nouns - a subset of content words
number of turns - a turn is a verbal contribution to the conversation and takes one of three types:
- substantial turn - contains at least one content word
- minimal turn - hands the turn back to the other conversation partner
- subminimal turn (a nonce, non-canonical term) - not classifiable as either type above
collaborative repair - sequences of turns devoted to overcoming communicative error/difficulty

Automation (reliability details pending)

DIAAD automates as much as possible. Below are descriptions of automatability and ICC2 utterance-level reliability metrics on a stratified (by study site, mild/severe aphasia profile, and pre-/post-tx test) random selection of XX samples (XX utterances).

fully automated with regex and spaCy (en_core_web_trf):
- filled pauses:
- speech units:
- content words:
- noun count:
semi-automated with a computational first pass followed by manual checks:
- turn type:
fully manual given the rich contextual dependencies:
- collaborative repair
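
As an illustration of the regex side of the automation, the sketch below counts filled pauses and approximates speech units as word tokens that are not filled pauses. It is not DIAAD's implementation: the filler list contains only the three examples named above ("um", "uh", "er"), the tokenization is deliberately crude, and the function names are hypothetical. Content-word and noun counts would need a part-of-speech tagger and are not attempted here.

```python
import re

# Only the fillers named in the text; a production list would be longer.
FILLED_PAUSE = re.compile(r"\b(?:um+|uh+|er)\b", re.IGNORECASE)
# Crude word tokenizer: runs of letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def count_filled_pauses(utterance):
    """Number of filled pauses in the utterance."""
    return len(FILLED_PAUSE.findall(utterance))


def count_speech_units(utterance):
    """Word tokens that are not filled pauses (a rough approximation)."""
    return sum(
        1 for token in WORD.findall(utterance)
        if not FILLED_PAUSE.fullmatch(token)
    )


utterance = "Um, I went uh to the er store"
print(count_filled_pauses(utterance), count_speech_units(utterance))
# 3 5
```

The word boundaries keep words such as "number" or "her" from being counted as fillers.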

Typical Workflow

1. Tabularize utterances (if needed)
If *Utterances*.xlsx files aren't present, DIAAD will call RASCAL to read .cha files and tabularize utterances, assigning samples unique identifiers at the utterance and transcript levels.
2. Prepare POWERS coding files
diaad powers make creates the full dataset plus reliability coding workbooks, with most coding automated.
3. Human coding
Coders complete POWERS annotations in the generated spreadsheets.
4. Analyze
diaad powers analyze aggregates and reports POWERS metrics at the turn, speaker, and dialog levels.
5. Reliability evaluation
diaad powers evaluate matches reliability files and runs ICC2 evaluation.
6. Reliability subset (optional)
diaad powers reselect reselects the reliability coding subset if ICC2 measures fail to meet the threshold (0.7 is a typical minimum).
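
For readers unfamiliar with the reliability statistic, here is a minimal sketch of ICC(2,1), the two-way random-effects, absolute-agreement, single-rater intraclass correlation. It assumes that "ICC2" in the workflow refers to this form; how DIAAD computes it internally is not shown here, and the function name `icc2_1` is hypothetical. Rows are coded units (e.g., utterances) and columns are coders.

```python
def icc2_1(ratings):
    """ICC(2,1) from a complete n x k table (n units, k coders)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up counts: the second coder is always one higher than the first.
offset = [[1, 2], [2, 3], [3, 4]]
print(round(icc2_1(offset), 3))
# 0.667
```

Because this form measures absolute agreement, a constant offset between coders lowers the value even though their rankings agree perfectly; in this toy example the result falls just below the 0.7 threshold mentioned above.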

🧪 Testing

This project uses pytest for its testing suite.
All tests are located under the tests/ directory, organized by module/function.

Running Tests

To run the full suite:

pytest

Run "quietly":

pytest -q

Run a specific test file:

pytest tests/test_samples/test_digital_convo_turns_analyzer.py

Status and Contact

I warmly welcome feedback, feature suggestions, or bug reports. Feel free to reach out by:

Submitting an issue through the GitHub Issues tab
Emailing me directly at: nsm [at] temple.edu

Thanks for your interest and collaboration!

Citation & Acknowledgments

Full details of the POWERS coding system can be found in the manual:

Herbert, R., Best, W., Hickin, J., Howard, D., & Osborne, F. (2013). POWERS: Profile of Word Errors and Retrieval in Speech: An assessment tool for use with people with communication impairment. CQUniversity.

If DIAAD supports your work, please cite the repo:

McCloskey, N. (2025). DIAAD: Digital Interface for Aggregate Analysis of Dialog. GitHub repository. https://github.com/nmccloskey/diaad

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
