Preprocessing and Extraction of Linguistic Information for Computational Analysis
Project description
pelican_nlp stands for “Preprocessing and Extraction of Linguistic Information for Computational Analysis - Natural Language Processing”. This package enables the creation of standardized and reproducible language processing pipelines, extracting linguistic features from various tasks like discourse, fluency, and image descriptions.
Installation
Create conda environment
conda create -n pelican-nlp -c defaults python=3.10
Activate environment
conda activate pelican-nlp
Install the package using pip:
pip install pelican_nlp
For the latest development version:
pip install git+https://github.com/ypauli/pelican_nlp.git@v0.1.2-alpha
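After installation, it can be useful to confirm that the package is visible in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of pelican_nlp.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="pelican_nlp"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pelican_nlp is not installed in this environment")
    else:
        print("pelican_nlp version:", found)
```

If the script reports that the package is missing, check that the 'pelican-nlp' conda environment is activated.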
Usage
To run pelican_nlp, you need a configuration.yml file in your main project directory that specifies the settings used for your project. Sample configuration files are available in the pelican_nlp GitHub repository: https://github.com/ypauli/pelican_nlp/tree/main/sample_configuration_files
Adapt your configuration file to your needs and save your personal configuration.yml file to your main project directory.
Running pelican_nlp with your configurations can be done directly from the command line interface or via Python script.
Run from command line:
Navigate to your main project directory on the command line and enter the following commands (note: the folder must contain your subjects folder and your configuration.yml file):
conda activate pelican-nlp
pelican-run
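Before running the command, the project directory can be checked for the two required items mentioned above. This is only a sketch: the helper `missing_project_items` is hypothetical, and the folder name `subjects` is an assumption, since the text only says "your subjects folder". Adjust the name to match your own layout.

```python
from pathlib import Path


def missing_project_items(project_dir, subjects_folder="subjects",
                          config_name="configuration.yml"):
    """List required items that are absent from the project directory.

    Assumption: the subjects folder is literally named "subjects".
    """
    root = Path(project_dir)
    missing = []
    if not (root / config_name).is_file():
        missing.append(config_name)
    if not (root / subjects_folder).is_dir():
        missing.append(subjects_folder)
    return missing


if __name__ == "__main__":
    problems = missing_project_items(Path.cwd())
    if problems:
        print("Missing from project directory:", ", ".join(problems))
    else:
        print("Project directory contains a configuration file and a subjects folder")
```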
Run with Python script:
Create a Python file with the IDE of your choice (e.g., Visual Studio Code, PyCharm) and make sure it uses the previously created conda environment 'pelican-nlp'. Then copy the following code into the file and run it:

.. code-block:: python

    from pelican_nlp.main import Pelican

    configuration_file = "/path/to/your/config/file.yml"
    pelican = Pelican(configuration_file)
    pelican.run()

Replace "/path/to/your/config/file.yml" with the path to the configuration file located in your main project folder.
For reliable operation, data must be stored in the Language Processing Data Structure (LPDS) format, which is inspired by the Brain Imaging Data Structure (BIDS) conventions.
Text and audio files should follow this naming convention:
[subjectID]_[sessionID]_[task]_[task-supplement]_[corpus].[extension]
subjectID: ID of subject (e.g., sub-01), mandatory
sessionID: ID of session (e.g., ses-01), if available
task: task used for file creation, mandatory
task-supplement: additional information regarding the task, if available
corpus: group that the file belongs to (e.g., healthy-control / patient); files of the same group share the same corpus name, mandatory
extension: file extension (e.g., txt / pdf / docx / rtf), mandatory
Example filenames:
sub-01_interview_schizophrenia.rtf
sub-03_ses-02_fluency_semantic_animals.docx
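The naming convention above can be checked programmatically before a run. The sketch below is illustrative only and is not pelican_nlp's own parser: the names `LpdsName` and `parse_lpds_filename` are hypothetical, and it assumes that session IDs start with `ses-` (as in the example) and that individual fields contain no underscores.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class LpdsName(NamedTuple):
    subject: str
    session: Optional[str]
    task: str
    task_supplement: Optional[str]
    corpus: str
    extension: str


def parse_lpds_filename(filename: str) -> LpdsName:
    """Split a file name into the fields of the naming convention."""
    path = Path(filename)
    extension = path.suffix.lstrip(".")
    if not extension:
        raise ValueError(f"missing file extension: {filename!r}")

    parts = path.stem.split("_")
    # Assumption: a second field starting with "ses-" is the optional session ID.
    has_session = len(parts) > 1 and parts[1].startswith("ses-")
    session = parts[1] if has_session else None
    rest = parts[2:] if has_session else parts[1:]

    # The remaining fields are task, optional task-supplement, and corpus.
    if len(rest) == 2:
        task, corpus = rest
        supplement = None
    elif len(rest) == 3:
        task, supplement, corpus = rest
    else:
        raise ValueError(f"unexpected number of fields: {filename!r}")
    return LpdsName(parts[0], session, task, supplement, corpus, extension)


if __name__ == "__main__":
    for name in ("sub-01_interview_schizophrenia.rtf",
                 "sub-03_ses-02_fluency_semantic_animals.docx"):
        print(parse_lpds_filename(name))
```

Applied to the two example filenames, the first has no session or task-supplement, while the second yields session `ses-02`, task `fluency`, task-supplement `semantic`, and corpus `animals`.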
To optimize performance, close other programs and limit GPU usage during language processing.
Features
- Feature 1: Cleaning text files
Handles whitespace, timestamps, punctuation, special characters, and case sensitivity.
- Feature 2: Linguistic Feature Extraction
Extracts semantic embeddings, logits, distance from optimality, and semantic similarity.
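To give a rough idea of what the cleaning step in Feature 1 involves, here is a small standard-library sketch. It is not pelican_nlp's implementation: the function `clean_text` and the timestamp pattern are assumptions made for illustration, and real transcripts may use other timestamp formats.

```python
import re
import string

# Assumption: timestamps look like 00:12, 00:01:23, or [00:12].
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?\b\]?")

# Translation table that deletes ASCII punctuation only.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text, lowercase=True, strip_punctuation=True):
    """Remove timestamps, optionally drop punctuation and case, collapse whitespace."""
    text = TIMESTAMP.sub(" ", text)
    if strip_punctuation:
        text = text.translate(PUNCTUATION_TABLE)
    if lowercase:
        text = text.lower()
    # split() without arguments splits on any run of whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "[00:12] Hello,   World!  00:01:23 Bye."
    print(clean_text(raw))  # hello world bye
```

In pelican_nlp itself, the corresponding options are set in configuration.yml rather than passed as function arguments.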
Examples
You can find example setups in the examples folder of the GitHub repository.
Contributing
Contributions are welcome! Please check out the contributing guide.
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). See the LICENSE file for details.
File details
Details for the file pelican_nlp-0.3.0.tar.gz.
File metadata
- Download URL: pelican_nlp-0.3.0.tar.gz
- Upload date:
- Size: 321.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ba9ee02eb242b27a1af64ed304c246afb1b7ef3e6c46741f77dae3a8fe41da3 |
| MD5 | 59fa7556c268f1047e9e97f2ad5b34f9 |
| BLAKE2b-256 | 6dcbe5e0fbb468f1e6ae7ccbc6affce9a09746fee219c381a83f60835706cdd5 |
File details
Details for the file pelican_nlp-0.3.0-py3-none-any.whl.
File metadata
- Download URL: pelican_nlp-0.3.0-py3-none-any.whl
- Upload date:
- Size: 310.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbf2ba107b6b5bc02ee22e12aae6b49a9c838c475d7bc6c6f9fc22759b5b87cb |
| MD5 | 72badc93769474b4e34fdb701399cfac |
| BLAKE2b-256 | 95b2c4d6b7378334be23f0a9dc1341aa71eb45dccc5259e13be3a8132d3a7492 |