Aggregate Linguistic Analysis of Speech Transcripts for Research

Project description

ALASTR – Aggregate Linguistic Analysis of Speech Transcripts for Research

Status: Active development (early-stage, version 0.0.1a1).
Stability: APIs, module layout, and CLI interfaces are subject to change.
Audience: Researchers and clinicians working with clinical aphasiology and SLP discourse data.

ALASTR is a Python toolkit for scalable, scriptable analysis of clinical speech and language transcripts, with an emphasis on aphasia-focused workflows. It is designed to complement existing CHAT/CLAN-based pipelines by adding reproducible batch processing, richer linguistic feature extraction, and integration with downstream statistical analyses.

While ALASTR draws on concepts and components piloted in earlier prototypes (e.g., CLATR), it is being developed as the lab-facing, aphasiology-specialized system, with a clearer focus on clinical narratives, paraphasias, disfluencies, and other discourse-level phenomena relevant to treatment and outcomes research.

Core Aims

Scalability: Process many transcripts in batch (across participants, timepoints, or conditions) with consistent configuration and logging.
Clinical relevance: Target metrics and summaries that are meaningful for aphasiology and speech–language pathology.
Interoperability with CHAT/CLAN: Leverage automation to populate tiers (e.g., morphology) in CHAT-formatted (.cha) transcripts, enabling semi-automated workflows.
Integration with other tools: Provide hooks for metrics and outputs from systems such as RASCAL (monologic discourse analysis) and DIAAD (dialogue analysis).

High-Level Functionality (Planned / Emerging)

Transcript ingestion and organization
- Read, validate, and organize transcripts (e.g., by group, site, timepoint).
- Support CHAT-formatted transcripts, with planned adapters for other formats.
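
As a rough illustration of the ingestion step, the sketch below runs a few basic structural checks on CHAT text and groups transcripts by the group field of the participant's @ID header. It is not ALASTR's implementation: the function names (validate_chat, read_id_headers, group_transcripts), the sample transcripts, and the group labels are hypothetical, and the @ID field order is assumed from the CHAT manual and should be checked against your own files.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed order of the pipe-separated @ID fields (per the CHAT manual).
ID_FIELDS = ["language", "corpus", "code", "age", "sex",
             "group", "ses", "role", "education", "custom"]


def validate_chat(text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    stripped = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if "@Begin" not in stripped:
        problems.append("missing @Begin")
    if not stripped or stripped[-1] != "@End":
        problems.append("missing final @End")
    if not any(ln.startswith("*") for ln in stripped):
        problems.append("no main tiers")
    return problems


def read_id_headers(text: str) -> List[Dict[str, str]]:
    """Parse each @ID header into a dict keyed by the assumed field names."""
    records = []
    for ln in text.splitlines():
        if ln.startswith("@ID:"):
            values = ln[len("@ID:"):].strip().split("|")
            records.append(dict(zip(ID_FIELDS, values)))
    return records


def group_transcripts(transcripts: Dict[str, str],
                      speaker_code: str = "PAR") -> Dict[str, List[str]]:
    """Map group label -> transcript names, skipping transcripts that fail validation."""
    groups = defaultdict(list)
    for name, text in transcripts.items():
        if validate_chat(text):
            continue  # hypothetical policy: drop anything with structural problems
        for rec in read_id_headers(text):
            if rec.get("code") == speaker_code:
                groups[rec.get("group") or "unknown"].append(name)
    return dict(groups)


# Hypothetical in-memory transcripts standing in for .cha files on disk.
SAMPLE = {
    "p01.cha": "@UTF8\n@Begin\n@ID:\teng|demo|PAR|57;|male|Broca||Participant|||\n"
               "*PAR:\tthe boy kick the ball .\n@End\n",
    "p02.cha": "@UTF8\n@Begin\n@ID:\teng|demo|PAR|61;|female|Control||Participant|||\n"
               "*PAR:\tthe boy is kicking the ball .\n@End\n",
}

if __name__ == "__main__":
    print(group_transcripts(SAMPLE))
```

Returning a list of problems rather than raising keeps a batch run going when one transcript is malformed, which fits the batch-oriented aims described above.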
Linguistic feature extraction
- Token-level and utterance-level features using spaCy and related NLP libraries.
- Tier-aware processing (e.g., mapping CHAT tiers into structured tables).
- Preliminary support for paraphasia and disfluency-related annotations.
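
To make the idea of tier-aware processing concrete, here is a minimal sketch of mapping CHAT main tiers (lines starting with `*`) and their dependent tiers (lines starting with `%`) into one row per utterance. It uses only the standard library and is not ALASTR's parser: parse_chat_tiers is a hypothetical name, and the %mor content in the sample is illustrative rather than real CLAN output.

```python
from typing import Dict, List, Optional


def parse_chat_tiers(text: str) -> List[Dict[str, str]]:
    """Turn CHAT main tiers and their dependent tiers into one row per utterance."""
    rows: List[Dict[str, str]] = []
    current_key: Optional[str] = None  # field that a continuation line extends
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("@"):
            # Skip blank lines and @-headers such as @Begin or @Participants.
            current_key = None
            continue
        if raw.startswith("\t"):
            # Continuation lines begin with a tab and extend the previous tier.
            if rows and current_key:
                rows[-1][current_key] += " " + raw.strip()
            continue
        label, _, content = raw.partition(":")
        content = content.strip()
        if label.startswith("*"):
            # Main tier: start a new utterance row.
            rows.append({"speaker": label[1:], "utterance": content})
            current_key = "utterance"
        elif label.startswith("%") and rows:
            # Dependent tier: attach to the most recent utterance.
            current_key = label[1:]
            rows[-1][current_key] = content
        else:
            current_key = None
    return rows


# Illustrative transcript fragment; the %mor line is made up for the example.
SAMPLE_CHAT = (
    "@Begin\n"
    "*PAR:\tthe boy kick the ball .\n"
    "%mor:\tdet|the n|boy v|kick det|the n|ball .\n"
    "*INV:\tmhm .\n"
    "@End\n"
)

if __name__ == "__main__":
    for row in parse_chat_tiers(SAMPLE_CHAT):
        print(row)
```

Each row is a plain dict, so the result can be handed to csv.DictWriter or a data-frame library without further reshaping.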
Batch summarization and export
- Participant-level and group-level summary tables (e.g., lexical, syntactic, discourse measures).
- Integration points for CoreLex counts (via RASCAL) and other domain metrics.
- Consistent output schemas suitable for downstream statistics in R, Python, or other tools.
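
The participant-level summaries described above could be sketched as follows. The measures shown (utterance count, token count, mean length of utterance in words, type-token ratio) and the column names are illustrative choices, not ALASTR's output schema, and the whitespace tokenization is deliberately naive: it does not handle CHAT codes or fillers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

COLUMNS = ["participant", "n_utterances", "n_tokens", "mlu_words", "ttr"]


def summarize_by_participant(utterances: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Aggregate utterance rows into one summary row per participant."""
    tokens_by_pid = defaultdict(list)
    n_utts = defaultdict(int)
    for utt in utterances:
        pid = utt["participant"]
        # Keep whitespace-separated items that contain a letter or digit,
        # which drops bare punctuation such as the utterance terminator.
        words = [w.lower() for w in utt["utterance"].split()
                 if any(ch.isalnum() for ch in w)]
        tokens_by_pid[pid].extend(words)
        n_utts[pid] += 1
    summary = []
    for pid in sorted(n_utts):
        toks = tokens_by_pid[pid]
        summary.append({
            "participant": pid,
            "n_utterances": n_utts[pid],
            "n_tokens": len(toks),
            "mlu_words": round(len(toks) / n_utts[pid], 3),
            "ttr": round(len(set(toks)) / len(toks), 3) if toks else 0.0,
        })
    return summary


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize summary rows with a fixed column order for downstream tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical utterance rows; participant IDs are placeholders.
SAMPLE_UTTERANCES = [
    {"participant": "P01", "utterance": "the boy kick the ball ."},
    {"participant": "P01", "utterance": "he is running ."},
    {"participant": "P02", "utterance": "ball ."},
]

if __name__ == "__main__":
    print(to_csv(summarize_by_participant(SAMPLE_UTTERANCES)))
```

Fixing the column order in one place (COLUMNS) is one simple way to keep output consistent across batches, so that tables from different runs can be stacked in R or Python without renaming.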

Installation (Early Preview)

From GitHub:

git clone https://github.com/nmccloskey/ALASTR.git
cd ALASTR
pip install -e .

From PyPI:

pip install alastr

You may wish to create and activate a dedicated virtual environment or conda environment before installing.

Usage (Very Early Sketch)

CLI and API interfaces are still evolving. A minimal example of the intended usage pattern might eventually look like:

alastr run \
  --config path/to/config.yaml \
  --input-transcripts path/to/cha/files \
  --output-dir path/to/output

or, in Python:

from alastr.pipeline import run_pipeline

run_pipeline(
    config_path="path/to/config.yaml",
    input_root="path/to/cha/files",
    output_root="path/to/output",
)

Exact function names and options are likely to change as the design stabilizes.

Project Status and Roadmap

ALASTR is under active development and not yet recommended for routine clinical or research deployment. Near-term goals include:

- Stabilizing the package layout and configuration system.
- Implementing an end-to-end demo pipeline on a small aphasia dataset.
- Adding basic tests and continuous integration.
- Documenting example workflows and key metrics for clinical researchers.

Citation and Contributions

A formal citation will be provided once an ALASTR methods paper is available. Until then, if you use concepts or code from this repository in academic work, please:

- Cite the GitHub repository URL, and
- Acknowledge ALASTR as an early-stage tool under development.

Issues, suggestions, and (well-scoped) pull requests are welcome, with the understanding that the codebase is still evolving.

Project details

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

This version

0.0.1a1 pre-release

Jan 31, 2026

Download files

Source Distribution

alastr-0.0.1a1.tar.gz (64.1 kB)

Uploaded Jan 31, 2026 Source

Built Distribution

alastr-0.0.1a1-py3-none-any.whl (74.7 kB)

Uploaded Jan 31, 2026 Python 3

File details

Details for the file alastr-0.0.1a1.tar.gz.

File metadata

Download URL: alastr-0.0.1a1.tar.gz
Upload date: Jan 31, 2026
Size: 64.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for alastr-0.0.1a1.tar.gz
Algorithm	Hash digest
SHA256	`ff88a22c035c9f3910a3bc47cf0c7dc400406c5db0d9c13c699669b9c7a4dc29`
MD5	`e2f488f6439e564183fb829e17f90304`
BLAKE2b-256	`c807baa68f0ae8d1761a1529192d8366e60d83bdd213b8a34d4a2fb472d116dc`
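
One way to use these digests is to compare the SHA256 of a downloaded file against the value listed above. The sketch below uses only the standard library; sha256_of and matches_expected are hypothetical helper names, and the file path assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for alastr-0.0.1a1.tar.gz.
EXPECTED_SHA256 = "ff88a22c035c9f3910a3bc47cf0c7dc400406c5db0d9c13c699669b9c7a4dc29"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = Path("alastr-0.0.1a1.tar.gz")  # assumed download location
    if sdist.exists():
        print("OK" if matches_expected(sdist, EXPECTED_SHA256) else "MISMATCH")
```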

File details

Details for the file alastr-0.0.1a1-py3-none-any.whl.

File metadata

Download URL: alastr-0.0.1a1-py3-none-any.whl
Upload date: Jan 31, 2026
Size: 74.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for alastr-0.0.1a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cca723ed2cd0b977fcac4a9d2ae5dc41bc4dd7242e010bcef1880601bb52f978`
MD5	`769b10a8b20abc713e82916082f8d337`
BLAKE2b-256	`b564acc4c5d67cc0068797d1252e0622c1022fd303d73a234abff92abc35a33d`
