A brief description of your package

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries :: Python Modules

Project description

DanGam

DanGam is a Python package designed for advanced emotion analysis in text, mainly focused on the Korean language.
DanGam provides insights into the emotional tone of texts, aiming for more accurate and context-aware sentiment analysis.
The name DanGam is an abbreviation of the Korean for "Word-Emotion" (단어-감정).

[!IMPORTANT] The latest version of the model is 0.0.133.
GPU support is enabled from version 0.0.130.

Installation

DanGam can be easily installed via pip.
Simply run the following command in your terminal:

pip install DanGam

[!TIP]
If you run into issues with other packages, try running the following command in your terminal before installing DanGam.
If you still have problems installing DanGam, please report them to me.

pip install torch numpy pandas tqdm transformers scipy regex

Features

Sentence Emotion Segmentation: DanGam segments sentences and identifies their overarching emotional tone (positive, negative, or neutral).
Word-Level Emotion Analysis: It goes deeper into the emotional analysis by evaluating the sentiment of individual words within the context of their sentences.
Customizability: Flexible configuration options allow users to tailor the analysis to specific requirements.
Support for Korean Language: Designed specifically for Korean-language text, aiming for more reliable results than general-purpose sentiment analysis tools.

Quick Start

from dangam import DanGam

# Initialize DanGam
dangam = DanGam()
# Pass a configuration dictionary if needed;
# see the Configuration section below for details.

# Example text
text = "나는 방금 먹은 마라탕이 너무 좋다. 적당한 양념에 알싸한 마라향이 미쳤다. 그런데 고수는 진짜 싫다!"
original_emotion = "positive"
default_emotion = "good food"
normalized_specific_emotion = "satisfied"

# Analyze the emotion of the sentence
emotion, specified_emotion = dangam.get_emotion(text, original_emotion, default_emotion, normalized_specific_emotion)

print("Sentence Emotion:", emotion)
print("Specified Emotion:", specified_emotion)
# Sentence Emotion: positive
# Specified Emotion: satisfied

# Analyze the emotion of each word

words_emotion = dangam.word_emotions(text, emotion, specified_emotion)
print(words_emotion)
# {'나는': 1.0,
# '방금': 0.8419228076866834,
# '먹은': 1.0,
# '마라탕이': 0.8522973110543406,
# '너무': 1.0,
# '좋다': 1.0,
# '적당한': 0.965806179144829,
# '양념에': 0.7151325862316465,
# '알싸한': 0.4678710873322536,
# '마라향이': 0.328179239525493,
# '미쳤다': 0.34263925379014165,
# '그런데': -0.07491504014905744,
# '고수는': -0.7992964009024587,
# '진짜': -0.9295882226863167,
# '싫다': -0.9120299268217638}
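
The returned dictionary can be post-processed with plain Python. The sketch below reuses a few of the scores printed above and splits the words by sign. The helper name `split_by_sign` and the cutoff of 0.0 are illustrative assumptions and not part of DanGam.

```python
# Illustrative only: a subset of the scores printed in the Quick Start output.
words_emotion = {
    "좋다": 1.0,
    "알싸한": 0.4678710873322536,
    "그런데": -0.07491504014905744,
    "고수는": -0.7992964009024587,
    "싫다": -0.9120299268217638,
}


def split_by_sign(scores, cutoff=0.0):
    """Split words into those above the cutoff and those at or below it.

    `cutoff` is a hypothetical parameter chosen for this sketch.
    """
    aligned = {w: s for w, s in scores.items() if s > cutoff}
    contrasting = {w: s for w, s in scores.items() if s <= cutoff}
    return aligned, contrasting


aligned, contrasting = split_by_sign(words_emotion)

# Words ordered from the lowest score upward.
most_contrasting = sorted(contrasting, key=contrasting.get)
print(most_contrasting)
```

Here the words that contrast with the positive sentence emotion are the ones tied to the dislike of coriander in the last sentence.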

Configuration

DanGam aims to offer a wide range of customization.
You can modify various settings like model names, column names, etc., to fit your specific needs.

Initialization:

When you first instantiate DanGam, you can pass the configuration settings as a dictionary.
```
cfg = {"model_name": "hf/some_model"}  # cfg must be a dict
dangam = DanGam(cfg)
```

The dictionary should have the following format:

{"model_name":"hf/some_model", "sub_model_name":"hf/some_model", ...}

You can override only part of the configuration; any key you leave out falls back to its default value.
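
The fallback behaviour can be pictured as a dictionary merge. This is a minimal sketch under stated assumptions: `DEFAULT_CFG`, its values, and `merge_config` are hypothetical names made up for illustration, and DanGam's real defaults and internals may differ.

```python
# Hypothetical defaults for illustration; DanGam's real default values may differ.
DEFAULT_CFG = {
    "model_name": "hf/default_model",
    "sub_model_name": "hf/default_sub_model",
    "truncation": True,
    "max_length": 512,
}


def merge_config(user_cfg=None):
    """Return a new dict in which user-supplied keys override the defaults."""
    merged = dict(DEFAULT_CFG)      # copy, so the defaults are never mutated
    merged.update(user_cfg or {})   # keys the user omitted keep their defaults
    return merged


cfg = merge_config({"model_name": "hf/some_model"})
print(cfg["model_name"])      # the overridden value
print(cfg["sub_model_name"])  # the default value
```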

config_info():
Prints DanGam's current configuration.
Includes details about the models used, text and emotion column names, and other settings.

check_default():
Outputs the default configuration values for reference.

check_config():
Returns the current configuration of DanGam as a dictionary.

update_config(config):
Updates the configuration of DanGam and reinitializes components as necessary.

List of modifiable configurations

- model_name
  - The model that will run through the first loop of the sentence segmentation.

- sub_model_name
  - The model that will run through the second loop of the sentence segmentation.

- word_senti_model_name
  - The model that will run through the loop of the word segmentation.

- text_col
  - The name of the text column whose emotion you want to segment.

- default_emotion_column
  - Pre-labeled emotion by user.

- original_emotion_column
  - Pre-segmented emotions by user.
  - Performs the best if this section is segmented into 'positive', 'negative', 'neutral'.
  - Used for accuracy evaluation.

- normalized_emotion_column
  - Normalized pre-labeled emotion.
  - Performs the best if this section is in English.
  - Used directly from the second loop, since it only segments into positive, negative, and neutral, not into 60 different emotions.

- sentence_emotion_column
  - The column name of sentence emotion (pos/neg/neut) you want this module to set.

- sentence_specific_emotion_column
  - The column name of the sentence's specific emotion (one of the 60 emotions) you want this module to set.

- truncation
  - Turns truncation on or off throughout the module.

- max_length
  - Maximum length for chunk_text.

- emotion_threshold
  - The similarity threshold that decides how the emotion and specific emotion embeddings are weighted when refining the combined embedding.

- alignment_threshold
  - The threshold for the cosine similarity between the combined sentence-emotion embedding and each individual word embedding.

- emotion_weight_reach_threshold
  - The weight to be multiplied on emotion embedding when similarity exceeds the threshold.

- emotion_weight_not_reach_threshold
  - The weight to be multiplied on emotion embedding when similarity doesn't exceed the threshold.

- specific_weight_reach_threshold
  - The weight to be multiplied on specific emotion embedding when similarity exceeds the threshold.

- specific_weight_not_reach_threshold
  - The weight to be multiplied on specific emotion embedding when similarity doesn't exceed the threshold.

- noun_threshold
  - The threshold for deciding the emotion segment of a word.
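
One way to read the threshold and weight options above is as a threshold-gated weighting of the emotion embeddings. The sketch below is an assumption made for illustration: the function names, the element-wise combination, and every numeric value are hypothetical and do not come from DanGam's source.

```python
def pick_weight(similarity, threshold, weight_reach, weight_not_reach):
    """Choose a weight depending on whether the similarity exceeds the threshold."""
    return weight_reach if similarity > threshold else weight_not_reach


def combine(sentence_vec, emotion_vec, specific_vec, emotion_w, specific_w):
    """Add the weighted emotion vectors to the sentence vector, element by element."""
    return [
        s + emotion_w * e + specific_w * p
        for s, e, p in zip(sentence_vec, emotion_vec, specific_vec)
    ]


# Hypothetical settings; the key names follow the list above, the values do not.
cfg = {
    "emotion_threshold": 0.3,
    "emotion_weight_reach_threshold": 0.5,
    "emotion_weight_not_reach_threshold": 2.0,
    "specific_weight_reach_threshold": 0.1,
    "specific_weight_not_reach_threshold": 0.25,
}

emotion_similarity = 0.6   # assumed: exceeds the threshold
specific_similarity = 0.2  # assumed: does not exceed the threshold

emotion_w = pick_weight(
    emotion_similarity,
    cfg["emotion_threshold"],
    cfg["emotion_weight_reach_threshold"],
    cfg["emotion_weight_not_reach_threshold"],
)
specific_w = pick_weight(
    specific_similarity,
    cfg["emotion_threshold"],
    cfg["specific_weight_reach_threshold"],
    cfg["specific_weight_not_reach_threshold"],
)

# Toy 3-dimensional embeddings stand in for real model output.
combined = combine([1.0, 0.0, 2.0], [0.0, 2.0, 0.0], [4.0, 0.0, 0.0], emotion_w, specific_w)
print(emotion_w, specific_w, combined)
```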

Core Functionality

The primary objective of word_segmentator is to assign sentiment scores to each word in a given sentence.
These scores are not just arbitrary numbers; they represent how closely each word aligns with the overall emotional tone of the sentence.
This process involves several steps, starting from embedding extraction to sentiment score normalization.

get_emotion(sentence, original_emotion, default_specific_emotion, normalized_emotion):
Determines the overall emotion of a given sentence by analyzing it in chunks.
Considers both the general and specific emotions to enhance accuracy.

Arguments:
- sentence (str): The sentence to extract the emotions from.
- original_emotion (str) -> optional: The pre-segmented emotion (positive, negative, neutral).
- default_specific_emotion (str) -> optional: The pre-segmented specific emotion (love, thrilled, happy, sad, etc.).
- normalized_emotion (str) -> optional: Normalized user input emotion (good food, bad person, lovely day, etc.).

Returns:
- emotion (str): A string of the overall emotion of the sentence (positive, neutral, negative).
- specific_emotion (str): A string of the specific emotion of the sentence (one out of 60 emotions).
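
Analyzing a sentence "in chunks" can be pictured as splitting the text, labeling each piece, and aggregating the labels. The sketch below is an illustration under loud assumptions: character-based chunking, the keyword-based `classify_chunk` stand-in for a model, and the majority vote are all hypothetical and are not taken from DanGam's implementation.

```python
from collections import Counter


def chunk_text(text, max_length):
    """Split text into consecutive pieces of at most max_length characters."""
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


def classify_chunk(chunk):
    """Toy stand-in for a sentiment model: keyword lookup only."""
    if "좋다" in chunk:
        return "positive"
    if "싫다" in chunk:
        return "negative"
    return "neutral"


def overall_emotion(text, max_length):
    """Label every chunk and return the most frequent label."""
    labels = [classify_chunk(c) for c in chunk_text(text, max_length)]
    if not labels:  # empty input: nothing to vote on
        return "neutral"
    return Counter(labels).most_common(1)[0][0]


print(overall_emotion("좋다 좋다 싫다", max_length=3))
```

A real pipeline would most likely chunk by tokens instead of characters, so that a word is never cut in half at a chunk boundary.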

[!WARNING]
Deprecated as of version 0.0.133
~~match_rate_calc(df):~~
~~Calculates the accuracy of emotion predictions in a dataframe by comparing predicted emotions with their original annotations.~~

word_emotions(sentence, emotion, specific_emotion):
Segments a sentence and assigns emotions to each word based on the overall sentence emotion and specific emotion.
Args:
- sentence (str): The sentence for segmentation.
- emotion (str) -> Optional: The general emotion of the sentence.
- specific_emotion (str) -> Optional: The specific emotion of the sentence.

Returns:
- dict: A dictionary mapping each word in the sentence to its assigned emotion.

noun_emotions(sentence, noun_list, count):
Analyzes emotions associated with specific nouns within a sentence.
Args:
- sentence (str): The sentence containing the nouns for emotion analysis.
- emotion (str) -> Optional: The general emotion of the sentence.
- specific_emotion (str) -> Optional: The specific emotion of the sentence.
- noun_list (list): A list of nouns to analyze within the sentence.
- count (bool): Whether to count the number of nouns in each segment.

Returns:
- dict: A dictionary categorizing nouns into positive, neutral, and negative based on their associated emotions.
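
The categorization step can be sketched without the library, starting from word scores like those in the Quick Start output. Everything here is an assumption for illustration: the function `categorize_nouns`, the substring matching rule, and the default threshold of 0.3 are hypothetical and may not match how noun_emotions works internally.

```python
def categorize_nouns(word_scores, noun_list, noun_threshold=0.3, count=False):
    """Group nouns into positive / neutral / negative buckets by word score.

    Assumed matching rule: a noun matches the first scored word that contains
    it, so "고수" matches "고수는". Nouns with no matching word are skipped.
    """
    buckets = {"positive": [], "neutral": [], "negative": []}
    for noun in noun_list:
        for word, score in word_scores.items():
            if noun in word:
                if score > noun_threshold:
                    buckets["positive"].append(noun)
                elif score < -noun_threshold:
                    buckets["negative"].append(noun)
                else:
                    buckets["neutral"].append(noun)
                break  # use the first matching word only
    if count:
        return {label: len(nouns) for label, nouns in buckets.items()}
    return buckets


# Scores copied from the Quick Start output.
word_scores = {
    "마라탕이": 0.8522973110543406,
    "양념에": 0.7151325862316465,
    "고수는": -0.7992964009024587,
}

print(categorize_nouns(word_scores, ["마라탕", "고수"]))
print(categorize_nouns(word_scores, ["마라탕", "고수"], count=True))
```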

Embedding Extraction and Analysis

The function begins by extracting embeddings for each word in the sentence, as well as for the sentence as a whole.
Embeddings are essentially numerical representations that capture the semantic essence of words and sentences.
For a more nuanced analysis, it also considers specific emotion embeddings, which are representations of predefined emotional states or tones.
By comparing word embeddings with the sentence and emotion embeddings, the function can gauge the degree of emotional congruence or divergence each word has with the overall sentence sentiment.

Sentiment Score Calculation

The core of word_segmentator lies in calculating these sentiment scores.
It does so by measuring the cosine similarity between the combined sentence and emotion embeddings and individual word embeddings.
This similarity metric is then adjusted to account for dissimilarities.
The function implements a threshold-based mechanism to enhance the accuracy of these calculations, ensuring that the scores genuinely reflect whether each word shares or contrasts with the sentence's emotional tone.
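
The scoring idea can be sketched with plain Python lists standing in for embeddings. The cosine similarity itself is standard; the adjustment rule (similarities below the threshold become a negative dissimilarity), the function name `raw_word_scores`, the default threshold, and the toy vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def raw_word_scores(combined_vec, word_vecs, alignment_threshold=0.4):
    """Score each word against the combined sentence-emotion embedding.

    Assumed rule: a similarity at or above the threshold is kept as a positive
    score; anything below it becomes a negative dissimilarity, -(1 - similarity).
    """
    scores = {}
    for word, vec in word_vecs.items():
        sim = cosine_similarity(combined_vec, vec)
        scores[word] = sim if sim >= alignment_threshold else -(1.0 - sim)
    return scores


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
combined = [1.0, 0.0]
word_vecs = {"좋다": [2.0, 0.0], "싫다": [0.0, 3.0]}
scores = raw_word_scores(combined, word_vecs)
print(scores)
```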

Normalization and Interpretation

Post-calculation, the sentiment scores undergo normalization, a crucial step to ensure that the scores are within a consistent range (typically -1 to 1).
This normalization helps in interpreting the scores uniformly across different sentences and contexts.
A score closer to 1 indicates a strong alignment with the sentence's emotion, whereas a score near -1 suggests a contrast.
Scores around 0 imply neutrality or a lack of strong emotional alignment.
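
A minimal sketch of such a normalization, assuming scaling by the largest absolute score, is shown below. The scaling method, the `interpret` helper, and its band of 0.3 are illustrative assumptions; the source does not state which normalization DanGam applies.

```python
def normalize_scores(scores):
    """Scale scores by the largest absolute value so they fall within [-1, 1]."""
    peak = max((abs(s) for s in scores.values()), default=0.0)
    if peak == 0.0:  # empty input or all zeros: nothing to scale
        return dict(scores)
    return {word: s / peak for word, s in scores.items()}


def interpret(score, band=0.3):
    """Map a normalized score to a coarse label; `band` is a hypothetical width."""
    if score > band:
        return "aligned"
    if score < -band:
        return "contrasting"
    return "neutral"


raw = {"a": 2.0, "b": -4.0, "c": 0.5}
normalized = normalize_scores(raw)
labels = {word: interpret(s) for word, s in normalized.items()}
print(normalized)
print(labels)
```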

Contributing

Contributions to DanGam are welcome!
Whether it's feature requests, bug reports, or code contributions, please feel free to contribute.
If you are interested in hiring me, please feel free to contact jason.heesang.lee96@gmail.com

License

DanGam is released under the MIT License, making it suitable for both personal and commercial use.

Release history

0.0.137

Dec 19, 2023

0.0.136

Dec 19, 2023

0.0.135

Dec 18, 2023

This version

0.0.134

Dec 13, 2023

0.0.133

Dec 13, 2023

0.0.132

Dec 13, 2023

0.0.131

Dec 13, 2023

0.0.130

Dec 12, 2023

0.0.129

Dec 11, 2023

0.0.127

Dec 11, 2023

0.0.126

Dec 11, 2023

0.0.125

Dec 11, 2023

0.0.124

Dec 11, 2023

0.0.123

Dec 11, 2023

0.0.122

Dec 11, 2023

0.0.121

Dec 11, 2023

0.0.12

Dec 11, 2023

0.0.11

Dec 10, 2023

0.0.10

Dec 10, 2023

Download files

Source Distribution

DanGam-0.0.134.tar.gz (15.1 kB)

Uploaded Dec 13, 2023 Source

Built Distribution

DanGam-0.0.134-py3-none-any.whl (16.5 kB)

Uploaded Dec 13, 2023 Python 3

Hashes for DanGam-0.0.134.tar.gz

Algorithm	Hash digest
SHA256	`b1b38b5ca38be52e4e27ccec08e31391a2e4a5e1d0ab2cc6f96635a3423ac534`
MD5	`6c9e65ed592cebbf6ae2f160495a0465`
BLAKE2b-256	`377ae7e16012401eeceb15915af031b1d08500723f2e912b51c457ce3fda177c`

Hashes for DanGam-0.0.134-py3-none-any.whl

Algorithm	Hash digest
SHA256	`03712f2285438b391e8de7343f715faf54f85f86239b2004e5979d8e4a209635`
MD5	`5c04a4c320ee4ba9f40a8c13d0362b24`
BLAKE2b-256	`cfdf1793a18dec826cc760342d10d2db492b9c5797f517f591ef98f6eb49f4c6`