Skip to main content

Visualization Module for Natural Language Processing

Project description

📝 nlplot

nlplot: Analysis and visualization module for Natural Language Processing 📈

Description

Facilitates the visualization of natural language processing and provides quicker analysis

You can draw the following graph

  1. N-gram bar chart
  2. N-gram tree Map
  3. Histogram of the word count
  4. wordcloud
  5. co-occurrence networks
  6. sunburst chart

(Tested in English and Japanese)

Requirement

Installation

pip install nlplot

I've posted on this blog about the specific use. (Japanese)

And, The sample code is also available in the kernel of kaggle. (English)

Quick start - Data Preparation

The column to be analyzed must be a space-delimited string

# sample data
target_col = "text"
texts = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat"
    ]
df = pd.DataFrame({target_col: texts})
df.head()
text
0 Think rich look poor
1 When you come to a roadblock, take a detour
2 When it is dark enough, you can see the stars
3 Never let your memories be greater than your dreams
4 Victory is sweetest when you’ve known defeat

Quick start - Python API

import nlplot
import pandas as pd
import plotly
from plotly.subplots import make_subplots
from plotly.offline import iplot
import matplotlib.pyplot as plt

%matplotlib inline

# target_col as a list type or a string separated by a space.
npt = nlplot.NLPlot(df, target_col='text')

# Stopword calculations can be performed.
stopwords = npt.get_stopword(top_n=30, min_freq=0)

# 1. N-gram bar chart
fig_unigram = npt.bar_ngram(
    title='uni-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=1,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_unigram.show()

fig_bigram = npt.bar_ngram(
    title='bi-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=2,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_bigram.show()


# 2. N-gram tree Map
fig_treemap = npt.treemap(
    title='Tree map',
    ngram=1,
    top_n=50,
    width=1300,
    height=600,
    stopwords=stopwords,
    verbose=False,
    save=False
)
fig_treemap.show()


# 3. Histogram of the word count
fig_histgram = npt.word_distribution(
    title='word distribution',
    xaxis_label='count',
    yaxis_label='',
    width=1000,
    height=500,
    color=None,
    template='plotly',
    bins=None,
    save=False,
)
fig_histgram.show()


# 4. wordcloud
fig_wc = npt.wordcloud(
    width=1000,
    height=600,
    max_words=100,
    max_font_size=100,
    colormap='tab20_r',
    stopwords=stopwords,
    mask_file=None,
    save=False
)
plt.figure(figsize=(15, 25))
plt.imshow(fig_wc, interpolation="bilinear")
plt.axis("off")
plt.show()


# 5. co-occurrence networks
npt.build_graph(stopwords=stopwords, min_edge_frequency=10)
# The number of nodes and edges to which this output is plotted.
# If this number is too large, plotting will take a long time, so adjust the [min_edge_frequency] well.
# >> node_size:70, edge_size:166
fig_co_network = npt.co_network(
    title='Co-occurrence network',
    sizing=100,
    node_size='adjacency_frequency',
    color_palette='hls',
    width=1100,
    height=700,
    save=False
)
iplot(fig_co_network)


# 6. sunburst chart
fig_sunburst = npt.sunburst(
    title='sunburst chart',
    colorscale=True,
    color_continuous_scale='Oryel',
    width=1000,
    height=800,
    save=False
)
fig_sunburst.show()


# other
# The original data frame of the co-occurrence network can also be accessed
display(
    npt.node_df.head(), npt.node_df.shape,
    npt.edge_df.head(), npt.edge_df.shape
)

Document

TBD

Test

cd tests
pytest

Other

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlplot-1.6.0.tar.gz (968.8 kB view details)

Uploaded Source

Built Distribution

nlplot-1.6.0-py3-none-any.whl (967.9 kB view details)

Uploaded Python 3

File details

Details for the file nlplot-1.6.0.tar.gz.

File metadata

  • Download URL: nlplot-1.6.0.tar.gz
  • Upload date:
  • Size: 968.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for nlplot-1.6.0.tar.gz
Algorithm Hash digest
SHA256 c69ca2149f7b46011cb20674e152906939d00ce0bdc5a8cf5604c88d08cdd079
MD5 db79fdce82cd6bf0c5e834b2aa49a4f6
BLAKE2b-256 ea92482a573afbf45b07ace7dd35584e7b6d19c618811e11fc893353cdbe9dd2

See more details on using hashes here.

File details

Details for the file nlplot-1.6.0-py3-none-any.whl.

File metadata

  • Download URL: nlplot-1.6.0-py3-none-any.whl
  • Upload date:
  • Size: 967.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for nlplot-1.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bb4d060802bef2459b02cfda6307b3ae038a36122dc0d513d0b2d733db153f5d
MD5 2f4bf279e0bfee29951cfedddf20b880
BLAKE2b-256 8c99775fdfc2911cf1411ac183dd92259f1efe14304312bb93f60a0d6f7a5c7e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page