
Preprocessing and Extraction of Linguistic Information for Computational Analysis


pelican_nlp stands for “Preprocessing and Extraction of Linguistic Information for Computational Analysis - Natural Language Processing”. This package enables the creation of standardized and reproducible language processing pipelines, extracting linguistic features from various tasks like discourse, fluency, and image descriptions.


Installation

Create conda environment

conda create -n pelican-nlp -c defaults python=3.10

Activate environment

conda activate pelican-nlp

Install the package using pip:

pip install pelican_nlp

For the latest development version:

pip install git+https://github.com/ypauli/pelican_nlp.git@v0.1.2-alpha

Usage

To run pelican_nlp, you need a configuration.yml file in your project directory that specifies the settings for your project. Sample configuration files are available in the pelican_nlp GitHub repository: https://github.com/ypauli/pelican_nlp/tree/main/sample_configuration_files

Adapt your configuration file to your needs and save your personal configuration.yml file to your main project directory.

Running pelican_nlp with your configurations can be done directly from the command line interface or via Python script.

Run from command line:

Navigate to your main project directory on the command line and enter the following commands. Note: the folder must contain your subjects folder and your configuration.yml file.

conda activate pelican-nlp
pelican-run
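
Before running the command, it can help to confirm that the project directory has the two required items. The following is a minimal sketch, not part of pelican_nlp: the function name `missing_project_items` is hypothetical, and the folder name `subjects` is an assumption that you should replace with the actual name of your subjects folder.

```python
from pathlib import Path


def missing_project_items(project_dir, subjects_folder="subjects"):
    """Return the required items that are absent from project_dir."""
    root = Path(project_dir)
    missing = []
    # The configuration file must sit directly in the project directory.
    if not (root / "configuration.yml").is_file():
        missing.append("configuration.yml")
    # "subjects" is an assumed folder name; pass your own if it differs.
    if not (root / subjects_folder).is_dir():
        missing.append(subjects_folder)
    return missing


if __name__ == "__main__":
    problems = missing_project_items(Path.cwd())
    if problems:
        print("Missing from project directory:", ", ".join(problems))
    else:
        print("Project directory looks ready for pelican-run.")
```

Run the script from the main project directory; an empty result means both items were found.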

Run with Python script:

Create a Python file in the IDE of your choice (e.g. Visual Studio Code, PyCharm) and make sure the project uses the previously created conda environment 'pelican-nlp'.

Run the following Python code:

.. code-block:: python

    from pelican_nlp.main import Pelican

    configuration_file = "/path/to/your/config/file.yml"
    pelican = Pelican(configuration_file)
    pelican.run()

Replace "/path/to/your/config/file.yml" with the path to the configuration file located in your main project folder.

For reliable operation, data must be stored in the Language Processing Data Structure (LPDS) format, which is inspired by the Brain Imaging Data Structure (BIDS) conventions.

Text and audio files should follow this naming convention:

[subjectID]_[sessionID]_[task]_[task-supplement]_[corpus].[extension]

  • subjectID: ID of subject (e.g., sub-01), mandatory

  • sessionID: ID of session (e.g., ses-01), if available

  • task: task used for file creation, mandatory

  • task-supplement: additional information regarding the task, if available

  • corpus: group the file belongs to (e.g., healthy-control / patient); files of the same group share the same corpus name, mandatory

  • extension: file extension (e.g., txt / pdf / docx / rtf), mandatory
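
As an illustration of the convention above, a filename can be assembled from its parts, leaving out the optional ones. This is a minimal sketch and not part of pelican_nlp: the function name `build_lpds_filename` is hypothetical, and the rule that a field may not contain an underscore is an assumption made here because the underscore separates fields.

```python
def build_lpds_filename(subject, task, corpus, extension,
                        session=None, task_supplement=None):
    """Join the fields in the order the naming convention lists them."""
    parts = [subject]
    if session:
        parts.append(session)          # optional
    parts.append(task)
    if task_supplement:
        parts.append(task_supplement)  # optional
    parts.append(corpus)

    for part in parts:
        # Assumption: fields are non-empty and contain no underscore,
        # since the underscore is the field separator.
        if not part or "_" in part:
            raise ValueError(f"invalid field: {part!r}")

    return "_".join(parts) + "." + extension.lstrip(".")


print(build_lpds_filename("sub-01", "interview", "schizophrenia", "rtf"))
```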

Example filenames:

  • sub-01_interview_schizophrenia.rtf

  • sub-03_ses-02_fluency_semantic_animals.docx
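
The example filenames can be split back into their fields with a few lines of Python, which is a quick way to check that files follow the convention before processing. This is a sketch under stated assumptions rather than pelican_nlp's own parser: `LpdsName` and `parse_lpds_filename` are hypothetical names, and the session field is recognised by the `ses-` prefix seen in the examples.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class LpdsName(NamedTuple):
    subject: str
    session: Optional[str]
    task: str
    task_supplement: Optional[str]
    corpus: str
    extension: str


def parse_lpds_filename(filename):
    """Split a filename into the fields of the naming convention."""
    path = Path(filename)
    extension = path.suffix.lstrip(".")
    fields = path.stem.split("_")
    # At least subject, task and corpus must be present.
    if not extension or len(fields) < 3 or not all(fields):
        raise ValueError(f"not a valid LPDS filename: {filename!r}")

    subject = fields.pop(0)
    # Assumption: a session field, when present, starts with "ses-".
    session = fields.pop(0) if fields[0].startswith("ses-") else None
    # What remains is task, optional task-supplement, corpus.
    if not 2 <= len(fields) <= 3:
        raise ValueError(f"not a valid LPDS filename: {filename!r}")
    corpus = fields.pop()
    task = fields.pop(0)
    task_supplement = fields[0] if fields else None
    return LpdsName(subject, session, task, task_supplement, corpus, extension)


for name in ("sub-01_interview_schizophrenia.rtf",
             "sub-03_ses-02_fluency_semantic_animals.docx"):
    print(parse_lpds_filename(name))
```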

To optimize performance, close other programs and limit GPU usage during language processing.

Features

  • Feature 1: Cleaning text files
    • Handles whitespace, timestamps, punctuation, special characters, and letter case.

  • Feature 2: Linguistic Feature Extraction
    • Extracts semantic embeddings, logits, distance from optimality, and semantic similarity.
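
To make the two features more concrete, the sketch below shows what a basic cleaning step and a similarity measure could look like in plain Python. It is an illustration only and does not reproduce pelican_nlp's implementation: the function names are hypothetical, the timestamp pattern (`[00:01:23]` or `01:23`) is an assumed format, and cosine similarity is used here as one common way to compare embedding vectors.

```python
import math
import re

# Assumed timestamp formats: "[00:01:23]", "00:01:23" or "01:23".
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?\b\]?")


def clean_text(text):
    """Remove timestamps and punctuation, lowercase, collapse whitespace."""
    text = TIMESTAMP.sub(" ", text)
    text = text.lower()
    # Replace anything that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


print(clean_text("[00:01:23]  Hello,   World!"))
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

In the package itself, these steps are controlled through the configuration file rather than called by hand.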

Examples

You can find example setups in the examples folder of the GitHub repository.

Contributing

Contributions are welcome! Please check out the contributing guide.

License

This project is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See the LICENSE file for details.

