
Preprocessing and Extraction of Linguistic Information for Computational Analysis


pelican_nlp stands for “Preprocessing and Extraction of Linguistic Information for Computational Analysis - Natural Language Processing”. This package enables the creation of standardized and reproducible language processing pipelines, extracting linguistic features from various tasks like discourse, fluency, and image descriptions.


Installation

Install the package using pip:

pip install pelican_nlp

For the latest development version:

pip install https://github.com/ypauli/pelican_nlp/releases/tag/v0.1.2-alpha

Usage

To use the pelican_nlp package:

Adapt your configuration file to your needs. ALWAYS change the specified project folder location.

Save the configuration file to your main project directory.

Run from command line:

Navigate to your main project directory on the command line and enter the following command. Note: the folder must contain your subjects folder and your configuration.yml file.

pelican-run
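Before launching a run, it can help to confirm that the project directory contains what the note above requires. The following is a minimal sketch and not part of pelican_nlp: the function name missing_project_items is hypothetical, and the literal folder name "subjects" is an assumption based on the wording above.

```python
from pathlib import Path


def missing_project_items(project_dir, subjects_folder="subjects",
                          config_name="configuration.yml"):
    """Return the names of required items that are absent from project_dir.

    Assumption: the subjects folder is literally named "subjects"; pass a
    different name if your project uses another one.
    """
    root = Path(project_dir)
    missing = []
    # The subjects folder must be a directory.
    if not (root / subjects_folder).is_dir():
        missing.append(subjects_folder)
    # The configuration file must be a regular file.
    if not (root / config_name).is_file():
        missing.append(config_name)
    return missing


if __name__ == "__main__":
    # Check the current working directory, i.e. where pelican-run would start.
    problems = missing_project_items(Path.cwd())
    if problems:
        print("Missing from project directory:", ", ".join(problems))
    else:
        print("Project directory looks complete.")
```

An empty list means both items were found; anything else names what still has to be added before running pelican-run.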

Run with python script:

Create a Python file in an IDE of your choice (e.g., Visual Studio Code, PyCharm) and copy the following code into it:

from pelican_nlp.main import Pelican

configuration_file = "/path/to/your/config/file.yml"
pelican = Pelican(configuration_file)
pelican.run()

Replace “/path/to/your/config/file.yml” with the path to the configuration file in your main project folder.

For reliable operation, data must be stored in the Language Processing Data Structure (LPDS) format, which is inspired by the conventions of the Brain Imaging Data Structure (BIDS).

Text and audio files should follow this naming convention:

[subjectID]_[sessionID]_[task]_[task-supplement]_[corpus].[extension]

  • subjectID: ID of subject (e.g., sub-01), mandatory

  • sessionID: ID of session (e.g., ses-01), if available

  • task: task used for file creation, mandatory

  • task-supplement: additional information regarding the task, if available

  • corpus: label identifying files that belong to the same group (e.g., healthy-control / patient), mandatory

  • extension: file extension (e.g., txt / pdf / docx / rtf), mandatory

Example filenames:

  • sub-01_interview_schizophrenia.rtf

  • sub-03_ses-02_fluency_semantic_animals.docx
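The naming convention above can be checked programmatically. The following is a minimal sketch and not part of pelican_nlp: LPDSName and parse_lpds_filename are hypothetical names. It assumes that session IDs start with "ses-" (as in the example) and that anything between the task and the corpus is the task supplement.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LPDSName:
    subject: str
    session: Optional[str]
    task: str
    task_supplement: Optional[str]
    corpus: str
    extension: str


def parse_lpds_filename(filename: str) -> LPDSName:
    """Split a filename into the fields of the naming convention."""
    path = Path(filename)
    extension = path.suffix.lstrip(".")
    parts = path.stem.split("_")
    # Subject, task, and corpus are mandatory, as is the extension.
    if len(parts) < 3 or not extension:
        raise ValueError(f"not a valid LPDS filename: {filename!r}")
    subject = parts.pop(0)
    # Assumption: an optional session ID is recognizable by its "ses-" prefix.
    session = parts.pop(0) if parts[0].startswith("ses-") else None
    if len(parts) < 2:
        raise ValueError(f"task or corpus missing in: {filename!r}")
    task = parts.pop(0)
    corpus = parts.pop()
    # Whatever remains between task and corpus is the optional supplement.
    task_supplement = "_".join(parts) if parts else None
    return LPDSName(subject, session, task, task_supplement, corpus, extension)


if __name__ == "__main__":
    for name in ("sub-01_interview_schizophrenia.rtf",
                 "sub-03_ses-02_fluency_semantic_animals.docx"):
        print(parse_lpds_filename(name))
```

Because the session and task supplement are optional, the parser reads the mandatory fields from both ends of the name and treats the middle as the supplement. A task name that itself contains an underscore would be split incorrectly.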

To optimize performance, close other programs and avoid other GPU-intensive workloads while language processing is running.

Features

  • Feature 1: Cleaning text files
    • Handles whitespace, timestamps, punctuation, special characters, and letter case.

  • Feature 2: Linguistic Feature Extraction
    • Extracts semantic embeddings, logits, distance from optimality, and semantic similarity.
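To make the cleaning steps listed under Feature 1 concrete, here is a minimal, standard-library sketch of that kind of normalization. It is not pelican_nlp's implementation: clean_text is a hypothetical helper, and the timestamp pattern (12:34 or 01:23:45, optionally in brackets or parentheses) is an assumption about how transcripts are annotated.

```python
import re
import string

# Assumed timestamp formats: 12:34 or 01:23:45, optionally wrapped in [] or ().
TIMESTAMP = re.compile(r"[\[\(]?\b\d{1,2}:\d{2}(?::\d{2})?\b[\]\)]?")


def clean_text(text: str, lowercase: bool = True) -> str:
    """Remove timestamps, punctuation, and special characters; normalize
    whitespace; optionally lowercase the result."""
    # Replace timestamps with a space so neighboring words do not merge.
    text = TIMESTAMP.sub(" ", text)
    # Delete ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace remaining characters that are neither word characters nor
    # whitespace (e.g. typographic ellipses) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    if lowercase:
        text = text.lower()
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("[00:01:23]  Hello,   World!  "))
```

In the package itself, the cleaning behavior is controlled through the configuration file rather than through a function like this one.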

Examples

You can find example setups in the [examples/](https://github.com/ypauli/pelican_nlp/tree/main/examples) folder. ALWAYS change the path to the project folder specified in the configuration file to your specific project location.

Contributing

Contributions are welcome! Please check out the contributing guide.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). See the LICENSE file for details.

