Visualization Module for Natural Language Processing
nlplot
Description
nlplot makes it easy to visualize natural language processing results, so text data can be analyzed more quickly.
You can draw the following graphs:
- N-gram bar chart
- N-gram treemap
- Histogram of the word count
- Word cloud
- Co-occurrence network
- Sunburst chart
- pyLDAvis
(Tested in English and Japanese)
Requirement
Install
pip install nlplot
I've written a blog post about specific usage (in Japanese).
Sample code is also available as a Kaggle kernel (in English).
Usage
sample df
df.head()
| | text |
|---|---|
| 0 | Think rich look poor |
| 1 | When you come to a roadblock, take a detour |
| 2 | When it is dark enough, you can see the stars |
| 3 | Never let your memories be greater than your dreams |
| 4 | Victory is sweetest when you’ve known defeat |
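The sample DataFrame shown above can be recreated with pandas. The column is named `text` so that it matches the `target_col` passed to `nlplot.NLPlot` in the usage code; the quotes are the ones in the table.

```python
import pandas as pd

# One short quote per row, stored in a column named "text".
df = pd.DataFrame({
    "text": [
        "Think rich look poor",
        "When you come to a roadblock, take a detour",
        "When it is dark enough, you can see the stars",
        "Never let your memories be greater than your dreams",
        "Victory is sweetest when you’ve known defeat",
    ]
})
print(df.head())
```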
import nlplot
# target_col: the text column, holding either a list of tokens or a space-separated string.
npt = nlplot.NLPlot(df, target_col='text')
# Build a stopword list from word frequencies in the corpus (top_n sets how many of the most frequent words are included).
stopwords = npt.get_stopword(top_n=30, min_freq=0)
# 1. N-gram bar chart
npt.bar_ngram(title='uni-gram', ngram=1, top_n=50, stopwords=stopwords)
npt.bar_ngram(title='bi-gram', ngram=2, top_n=50, stopwords=stopwords)
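As a rough illustration of what the stopword and N-gram steps compute, the sketch below counts N-grams in plain Python. It is not nlplot's implementation: the helpers `to_tokens`, `top_words` and `count_ngrams` are hypothetical, the sketch only mimics the `top_n` part of `get_stopword` (it ignores `min_freq`), and it assumes the text is already split into space-separated tokens, with no lowercasing or punctuation stripping.

```python
from collections import Counter


def to_tokens(value):
    """Hypothetical helper: accept a list of tokens or a space-separated string."""
    if isinstance(value, list):
        return value
    return value.split()


def top_words(docs, top_n):
    """Return the top_n most frequent words (simplified stand-in for get_stopword)."""
    counts = Counter(word for doc in docs for word in to_tokens(doc))
    return [word for word, _ in counts.most_common(top_n)]


def count_ngrams(docs, n, top_n, stopwords=()):
    """Count n-grams after removing stopwords and return the top_n most common."""
    stop = set(stopwords)
    counts = Counter()
    for doc in docs:
        words = [w for w in to_tokens(doc) if w not in stop]
        # Slide a window of length n over the remaining words.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top_n)


docs = [
    "the cat sat on the mat",
    ["the", "cat", "ran"],  # a pre-tokenized document
    "a dog sat on a log",
]
stopwords = top_words(docs, top_n=1)
print(stopwords)  # ['the']
print(count_ngrams(docs, n=2, top_n=1, stopwords=stopwords))  # [('sat on', 2)]
```

Because this sketch removes stopwords before the window slides, "on the mat" yields the bigram "on mat"; whether nlplot filters before or after forming N-grams is an open assumption here.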
# 2. N-gram tree Map
npt.treemap(title='Tree of Most Common Words', ngram=1, top_n=30, stopwords=stopwords)
# 3. Histogram of the word count
npt.word_distribution(title='words distribution')
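The word-count histogram shows how many words each document contains. A minimal sketch of that per-document count on the sample quotes, assuming plain whitespace tokenization (nlplot's exact counting rule is not stated here); `word_count_histogram` and its `bin_width` parameter are hypothetical.

```python
from collections import Counter


def word_count_histogram(docs, bin_width=5):
    """Count words per document and bucket the counts into bins of bin_width."""
    lengths = [len(doc.split()) for doc in docs]
    # Map each length to the lower edge of its bin, e.g. 7 -> 5 when bin_width=5.
    bins = Counter((length // bin_width) * bin_width for length in lengths)
    return lengths, dict(sorted(bins.items()))


docs = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat",
]
lengths, histogram = word_count_histogram(docs)
print(lengths)    # [4, 9, 10, 9, 7]
print(histogram)  # {0: 1, 5: 3, 10: 1}
```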
# 4. wordcloud
npt.wordcloud(stopwords=stopwords, colormap='tab20_r')
# 5. co-occurrence networks
npt.build_graph(stopwords=stopwords, min_edge_frequency=10)
# build_graph prints the number of nodes and edges that will be plotted.
# If these numbers are too large, plotting takes a long time, so tune min_edge_frequency accordingly.
>> node_size:70, edge_size:166
npt.co_network(title='Co-occurrence network')
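`build_graph` reports node and edge counts, and `min_edge_frequency` decides which word pairs are kept. The sketch below shows one common way to build such a network, assuming two words co-occur when they appear in the same document and an edge is kept only if the pair co-occurs at least `min_edge_frequency` times. nlplot's exact rule may differ, and `build_cooccurrence` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs, min_edge_frequency=1, stopwords=()):
    """Return (nodes, edges) of a word co-occurrence network."""
    stop = set(stopwords)
    pair_counts = Counter()
    for doc in docs:
        # Unique, sorted words so each unordered pair is counted once per document.
        words = sorted({w for w in doc.split() if w not in stop})
        pair_counts.update(combinations(words, 2))
    # Keep only pairs that co-occur often enough; these become the edges.
    edges = {pair: n for pair, n in pair_counts.items() if n >= min_edge_frequency}
    # Nodes are the words that still take part in at least one edge.
    nodes = {word for pair in edges for word in pair}
    return nodes, edges


docs = [
    "data science with python",
    "python for data analysis",
    "deep learning with python",
]
nodes, edges = build_cooccurrence(docs, min_edge_frequency=2)
print(f"node_size:{len(nodes)}, edge_size:{len(edges)}")  # node_size:3, edge_size:2
print(edges)  # {('data', 'python'): 2, ('python', 'with'): 2}
```

Raising the threshold shrinks both counts, which is why tuning `min_edge_frequency` keeps plotting fast. Passing `stopwords=["with"]` here would also drop the ("python", "with") edge.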
# 6. sunburst chart
npt.sunburst(title='sunburst chart', colorscale=True)
# 7. pyLDAvis
# To display the result in a Jupyter notebook, import pyLDAvis and call enable_notebook() first.
import pyLDAvis
pyLDAvis.enable_notebook()
npt.ldavis(num_topics=5, passes=5, save=False)
Document
TBD
Test
TBD
Other
- Plotly is used to plot the figures.
- co-occurrence networks is used as the reference for calculating the co-occurrence network.
- The following is used to plot pyLDAvis.
- The word cloud uses the following fonts.
Download files
Source Distribution
nlplot-1.2.0.tar.gz (968.6 kB)
Built Distribution
nlplot-1.2.0-py3-none-any.whl (968.4 kB)