Aggregate Linguistic Analysis of Speech Transcripts for Research
Project description
ALASTR – Aggregate Linguistic Analysis of Speech Transcripts for Research
Status: Active development (early-stage, version 0.0.1a1).
Stability: APIs, module layout, and CLI interfaces are subject to change.
Audience: Researchers and clinicians working with clinical aphasiology and SLP discourse data.
ALASTR is a Python toolkit for scalable, scriptable analysis of clinical speech and language transcripts, with an emphasis on aphasia-focused workflows. It is designed to complement existing CHAT/CLAN-based pipelines by adding reproducible batch processing, richer linguistic feature extraction, and integration with downstream statistical analyses.
While ALASTR draws on concepts and components piloted in earlier prototypes (e.g., CLATR), it is being developed as the lab-facing, aphasiology-specialized system, with a clearer focus on clinical narratives, paraphasias, disfluencies, and other discourse-level phenomena relevant to treatment and outcomes research.
Core Aims
- Scalability: Process many transcripts in batch (across participants, timepoints, or conditions) with consistent configuration and logging.
- Clinical relevance: Target metrics and summaries that are meaningful for aphasiology and speech–language pathology.
- Interoperability with CHAT/CLAN: Leverage automation to populate tiers (e.g.,morphology) in CHAT-formatted (.cha) transcripts, enabling semi-automated workflows.
- Integration with other tools: Provide hooks for metrics and outputs from systems such as RASCAL (monologic discourse analysis) and DIAAD (dialogue analysis).
High-Level Functionality (Planned / Emerging)
-
Transcript ingestion and organization
- Read, validate, and organize transcripts (e.g., by group, site, timepoint).
- Support CHAT-formatted transcripts, with planned adapters for other formats.
-
Linguistic feature extraction
- Token-level and utterance-level features using spaCy and related NLP libraries.
- Tier-aware processing (e.g., mapping CHAT tiers into structured tables).
- Preliminary support for paraphasia and disfluency-related annotations.
-
Batch summarization and export
- Participant-level and group-level summary tables (e.g., lexical, syntactic, discourse measures).
- Integration points for CoreLex counts (via RASCAL) and other domain metrics.
- Consistent output schemas suitable for downstream statistics in R, Python, or other tools.
Installation (Early Preview)
From Github:
git clone https://github.com/nmccloskey/ALASTR.git
cd ALASTR
pip install -e .
From PyPI:
pip install alastr
You may wish to create and activate a dedicated virtual environment or conda environment before installing.
Usage (Very Early Sketch)
CLI and API interfaces are still evolving. A minimal example of the intended usage pattern might eventually look like:
alastr run \
--config path/to/config.yaml \
--input-transcripts path/to/cha/files \
--output-dir path/to/output
or, in Python:
from alastr.pipeline import run_pipeline
run_pipeline(
config_path="path/to/config.yaml",
input_root="path/to/cha/files",
output_root="path/to/output",
)
Exact function names and options are likely to change as the design stabilizes.
Project Status and Roadmap
ALASTR is under active development and not yet recommended for routine clinical or research deployment. Near-term goals include:
- Stabilizing the package layout and configuration system.
- Implementing an end-to-end demo pipeline on a small aphasia dataset.
- Adding basic tests and continuous integration.
- Documenting example workflows and key metrics for clinical researchers.
Citation and Contributions
A formal citation will be provided once an ALASTR methods paper is available. Until then, if you use concepts or code from this repository in academic work, please:
- Cite the GitHub repository URL, and
- Acknowledge ALASTR as an early-stage tool under development.
Issues, suggestions, and (well-scoped) pull requests are welcome, with the understanding that the codebase is still evolving.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file alastr-0.0.1a1.tar.gz.
File metadata
- Download URL: alastr-0.0.1a1.tar.gz
- Upload date:
- Size: 64.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ff88a22c035c9f3910a3bc47cf0c7dc400406c5db0d9c13c699669b9c7a4dc29
|
|
| MD5 |
e2f488f6439e564183fb829e17f90304
|
|
| BLAKE2b-256 |
c807baa68f0ae8d1761a1529192d8366e60d83bdd213b8a34d4a2fb472d116dc
|
File details
Details for the file alastr-0.0.1a1-py3-none-any.whl.
File metadata
- Download URL: alastr-0.0.1a1-py3-none-any.whl
- Upload date:
- Size: 74.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cca723ed2cd0b977fcac4a9d2ae5dc41bc4dd7242e010bcef1880601bb52f978
|
|
| MD5 |
769b10a8b20abc713e82916082f8d337
|
|
| BLAKE2b-256 |
b564acc4c5d67cc0068797d1252e0622c1022fd303d73a234abff92abc35a33d
|