Preprocessing and Extraction of Linguistic Information for Computational Analysis
Project description
pelican_nlp stands for “Preprocessing and Extraction of Linguistic Information for Computational Analysis - Natural Language Processing”. This package enables the creation of standardized and reproducible language processing pipelines, extracting linguistic features from various tasks like discourse, fluency, and image descriptions.
Installation
Create conda environment
conda create --name pelican-nlp --channel defaults python=3.10
Activate environment
conda activate pelican-nlp
Install the package using pip:
pip install pelican-nlp
Usage
To run pelican_nlp, you need a configuration.yml file in your main project directory. This file defines the settings and parameters used for your project.
Sample configuration files are available here: https://github.com/ypauli/pelican_nlp/tree/main/examples
Adapt a sample configuration to your needs.
Save your personalized configuration.yml in the root of your project directory.
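As a rough illustration of what such a file might look like, here is a minimal sketch. The key names below are hypothetical placeholders, not the actual pelican_nlp schema; consult the sample configurations linked above for the real options.

```yaml
# Illustrative sketch only: these key names are hypothetical,
# not the real pelican_nlp configuration schema.
task: fluency            # which task pipeline to run
language: en             # language of the input transcripts
cleaning:
  remove_timestamps: true
  lowercase: true
```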
Running pelican_nlp
You can run pelican_nlp via the command line or a Python script.
From the command line:
Navigate to your project directory (must contain your participants/ folder and configuration.yml), then run:
conda activate pelican-nlp
pelican-run
To optimize performance, close other programs and limit GPU usage during language processing.
Data Format Requirements: LPDS
For reliable operation, your data must follow the Language Processing Data Structure (LPDS), inspired by brain imaging data structures like BIDS.
Main Concepts (Quick Guide)
Project Root: Contains a participants/ folder plus optional files like participants.tsv, dataset_description.json, and README.
Participants: Each participant has a folder named part-<ID> (e.g., part-01).
Sessions (Optional): For longitudinal studies, use ses-<ID> subfolders inside each participant folder.
Tasks/Contexts: Each session (or directly in the participant folder for non-longitudinal studies) includes subfolders for specific tasks (e.g., interview, fluency, image-description).
Data Files: Named with structured metadata, e.g.: part-01_ses-01_task-fluency_cat-semantic_acq-baseline_transcript.txt
Filename Structure
Filenames follow this format:
part-<id>[_ses-<id>]_task-<label>[_<key>-<value>...][_suffix].<extension>
Required Entities: part, task
Optional Entities Examples: ses, cat, acq, proc, metric, model, run, group, param
Suffix Examples: transcript, audio, embeddings, logits, annotations
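The filename grammar above can be sketched as a small parser. This is a hedged illustration based only on the pattern and examples shown here, not a utility shipped with pelican_nlp:

```python
import re

# Regex mirroring the LPDS filename pattern:
# part-<id>[_ses-<id>]_task-<label>[_<key>-<value>...][_suffix].<extension>
LPDS_PATTERN = re.compile(
    r"^part-(?P<part>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<ses>[A-Za-z0-9]+))?"
    r"_task-(?P<task>[A-Za-z0-9]+)"
    r"(?P<extras>(?:_[a-z]+-[A-Za-z0-9]+)*)"   # optional key-value entities
    r"(?:_(?P<suffix>[A-Za-z0-9]+))?"          # optional suffix (no hyphen)
    r"\.(?P<ext>[A-Za-z0-9.]+)$"
)

def parse_lpds_filename(name):
    """Split an LPDS-style filename into entities, suffix, and extension."""
    m = LPDS_PATTERN.match(name)
    if m is None:
        raise ValueError(f"Not a valid LPDS filename: {name}")
    entities = {"part": m.group("part"), "task": m.group("task")}
    if m.group("ses"):
        entities["ses"] = m.group("ses")
    extras = m.group("extras")
    for pair in (extras.lstrip("_").split("_") if extras else []):
        key, _, value = pair.partition("-")
        entities[key] = value
    return entities, m.group("suffix"), m.group("ext")
```

For example, `parse_lpds_filename("part-01_ses-01_task-fluency_cat-semantic_acq-baseline_transcript.txt")` yields the entity dictionary, the suffix `transcript`, and the extension `txt`.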
Example Project Structure
```
my_project/
├── participants/
│   ├── part-01/
│   │   └── ses-01/
│   │       └── interview/
│   │           └── part-01_ses-01_task-interview_transcript.txt
│   └── part-02/
│       └── fluency/
│           └── part-02_task-fluency_audio.wav
├── configuration.yml
├── dataset_description.json
├── participants.tsv
└── README.md
```
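For illustration, the example layout above could be scaffolded with Python's pathlib. This is a sketch, not a pelican_nlp utility; the placeholder files are created empty:

```python
from pathlib import Path

def scaffold_project(root):
    """Create the minimal LPDS example layout with empty placeholder files."""
    root = Path(root)
    files = [
        "participants/part-01/ses-01/interview/"
        "part-01_ses-01_task-interview_transcript.txt",
        "participants/part-02/fluency/part-02_task-fluency_audio.wav",
        "configuration.yml",
        "dataset_description.json",
        "participants.tsv",
        "README.md",
    ]
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # build directory tree
        path.touch()                                    # empty placeholder file
    return root
```

Running `scaffold_project("my_project")` produces exactly the tree shown above, ready for your real transcripts and audio files.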
Features
- Feature 1: Cleaning text files
Handles whitespace, timestamps, punctuation, special characters, and letter casing.
- Feature 2: Linguistic Feature Extraction
Extracts semantic embeddings, logits, distance from optimality, perplexity, and semantic similarity.
- Feature 3: Acoustic Feature Extraction
Extracts prosogram and openSMILE features.
Examples
Example setups are available in the examples folder of the GitHub repository: https://github.com/ypauli/pelican_nlp/tree/main/examples
Contributing
Contributions are welcome! Please check out the contributing guide.
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license. See the LICENSE file for details.
Citation
If you use this project, please cite:
Pauli Y, Marsman J-B, Rabe F, et al. Standardising the NLP Workflow: A Framework for Reproducible Linguistic Analysis. arXiv preprint arXiv:2511.15512 [cs.CL] 2025. https://doi.org/10.48550/arXiv.2511.15512