Generates word statistics
Project description
Monogatari
Installation
To install:
pip install monogatari
Quick start
General usage
Any dictionary file defined with the format:
%
Category_key_1 Category_value_1
Category_key_2 Category_value_2
Category_key_3 Category_value_3
Category_key_4 Category_value_4
%
Word_key_1 Category_key_1 Category_key_2
Word_key_2 Category_key_1
Word_key_3 Category_key_2
Word_key_4 Category_key_1 Category_key_3
Word_key_5 Category_key_1 Category_key_2 Category_key_4
Word_key_6 Category_key_4
Where:
- Each category key must be a number.
- Each word_key must be separated by a
tab
in between category keys. - The match is not case sensitive, that is, uppercase or lowercase won't make a difference in word_keys.
- Word_keys can contain whitespaces (ex: "kind of"), that's why it's important to separate the word_key and category_keys with a
tab
.
from monogatari import DictCounter
counter = DictCounter("dictionary_file.dic") # Load the dictionary
list_of_words = ["My", "list", "of", "words"]
counter.count(list_of_words) # This will count the words in each category
counter.top(100) # This will return a list of the top 100 categories and their count
# {("category_A", 12), ("category_B", 9), ...}
counter.top_normalized(100) # This will return a list of the top 100 categories and their normalized value
# {("category_A", 0.014), ("category_B", 0.008), ...}
counter.words_found() # This will return all categories and for each, a list of words belonging to that category
# {"category_A": ["I", "am", "here"], "category_B": ["something"], ...}
counter.reset() # This will reset the count, so that you can count again using another list of words
Pre-loaded MFD dictionary
The following example uses a pre-loaded MFD dictionary (from https://www.moralfoundations.org/).
from monogatari import MFDCounter
counter = MFDCounter()
list_of_words = ['私', 'コーヒー', '好き', '友達']
counter.count(list_of_words)
counter.top(100) # List top N categories, ordered by number of words
counter.top_normalized(100) # List top N categories, ordered by number of words normalized by the total number of words
Pre-loaded JMFD dictionary
The following example uses a pre-loaded JMFD dictionary (from https://github.com/soramame0518/j-mfd).
from monogatari import JMFDCounter
counter = JMFDCounter()
list_of_words = ['私', 'コーヒー', '好き', '友達']
counter.count(list_of_words)
counter.top(100) # List top N categories, ordered by number of words
counter.top_normalized(100) # List top N categories, ordered by number of words normalized by the total number of words
LIWC example
The library is also compatible with the LIWC dictionary.
from monogatari import DictCounter
counter = DictCounter("liwc_dictionary_file.dic") # Load the dictionary
list_of_words = ["My", "list", "of", "words"]
counter.count(list_of_words) # This will count the words in each category
counter.top(100) # This will return a list of the top 100 categories and their count
# {("category_A", 12), ("category_B", 9), ...}
counter.top_normalized(100) # This will return a list of the top 100 categories and their normalized value
# {("category_A", 0.014), ("category_B", 0.008), ...}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
monogatari-2.1.1.tar.gz
(11.5 kB
view hashes)
Built Distribution
monogatari-2.1.1-py3-none-any.whl
(12.1 kB
view hashes)
Close
Hashes for monogatari-2.1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ab1c9940d1575a0681935fa51a3f100a8466d2acc05cb1369faadbf2853b17ca |
|
MD5 | aab6487534cd8c35f0f19d76c93400db |
|
BLAKE2b-256 | edc9dd7884c436767eba71040bb2dede51cc136c4ed11b97d8b29981f1757835 |