Analysis of n-grams in a set of messages
Project description
Take NGram
TakeNGram is a tool to provide analysis of n-grams in a dataset of messages.
The recommended usage is with the InsightExtractor Cloud CSV output.
The analysis consists of creating a dictionary of the n-grams found in all messages, with their respective frequencies, and of generating a word cloud of those n-grams.
Every analysis can also be restricted to groups of sentences that share a subject, which is most useful with the Insight Extractor output.
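To make the idea of an n-gram frequency dictionary concrete, here is a minimal sketch using only the Python standard library. It is not the package's implementation: the function name `ngram_frequencies`, the whitespace tokenization, and the sample messages are assumptions made for illustration.

```python
from collections import Counter


def ngram_frequencies(messages, n=2):
    """Count every n-gram (as a tuple of words) across all messages."""
    counts = Counter()
    for message in messages:
        # Assumption: lowercase, whitespace-separated words.
        words = message.lower().split()
        # Slide a window of size n over the words of one message.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return dict(counts)


# Hypothetical messages, for illustration only.
messages = ["quero pagar minha fatura", "quero pagar meu plano"]
frequencies = ngram_frequencies(messages, n=2)
```

A word cloud can then be drawn from such a dictionary by sizing each n-gram according to its count.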
Overview
Installation
The take_ngram package can be installed from PyPI.
pip install take_ngram
Usage
The input file must be a CSV file.
All the examples are based on the Insight Extractor output.
- Creating a BiGram of the sentences and getting the WordCloud.
from take_ngram import NGram
bigram = NGram('file.csv',
               'Structured Message')
bigram.get_word_cloud()
- Creating a BiGram of the sentences and saving the WordCloud.
from take_ngram import NGram
bigram = NGram('teste.csv',
               'Structured Message')
bigram.get_word_cloud(file_path='image.png')
- Adding stop words
from take_ngram import NGram
bigram = NGram('file.csv',
               'Structured Message',
               stop_words=['segunda'])
bigram.get_word_cloud(file_path='image.png')
- Removing prepositions from the stop words.
- By default, prepositions are added to the stop words; passing remove_prepositions=False leaves them out of the stop word list.
from take_ngram import NGram
bigram = NGram('file.csv',
               'Structured Message',
               remove_prepositions=False)
bigram.get_word_cloud(file_path='image.png')
- Making n-grams for some specific subjects.
from take_ngram import NGram
bigram = NGram('file.csv',
               'Structured Message',
               subject_column='Groups',
               subject_list=['fatura', 'plano'])
bigram.get_word_cloud(file_path='image.png')
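The subject and stop-word options used in the examples above can be pictured with a small standard-library sketch. It is an illustration under stated assumptions, not the package's code: the helper `bigrams_for_subjects` is hypothetical, the CSV rows are made up (only the column names 'Structured Message' and 'Groups' come from the examples), and dropping stop words before building the bigrams is an assumed order of operations.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content; only the column names come from the examples.
CSV_TEXT = """Structured Message,Groups
quero pagar minha fatura,fatura
quero mudar de plano,plano
quero falar com atendente,atendimento
"""


def bigrams_for_subjects(csv_file, text_column, subject_column,
                         subject_list, stop_words=()):
    """Count bigrams only in rows whose subject is in subject_list."""
    stop = set(stop_words)
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row[subject_column] not in subject_list:
            continue  # skip messages that belong to other subjects
        # Assumption: stop words are dropped before the bigrams are built.
        words = [w for w in row[text_column].lower().split() if w not in stop]
        counts.update(zip(words, words[1:]))
    return counts


counts = bigrams_for_subjects(io.StringIO(CSV_TEXT), "Structured Message",
                              "Groups", ["fatura", "plano"],
                              stop_words=["de"])
```

Here the third row is ignored because its subject is not in the list, and removing the stop word "de" turns "mudar de plano" into the bigram ("mudar", "plano").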
Author
Take Blip Data&Analytics Research