NLP Profiler


A simple NLP library allows profiling datasets with one or more text columns.

Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.

In short: think of it as calling the pandas DataFrame.describe() function or running Pandas Profiling on your data frame, but for datasets containing text columns rather than the usual columnar datasets.

What do you get from the library?

  • Pass a Pandas dataframe series as the input parameter.
  • You get back a new dataframe with various features about the parsed text per row.
    • high-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, etc.
    • low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.
  • Descriptive statistics can then be drawn from the numerical data in the resulting dataframe by calling the pandas DataFrame.describe() function on it.
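To make the idea of per-row granular features more concrete, here is a minimal sketch using only the Python standard library. It is NOT the library's implementation: the function `granular_features` and the feature names (`characters_count`, `words_count`, `exclamation_marks_count`) are hypothetical, chosen only for illustration. The summary at the end mimics, in a very reduced form, the kind of figures that describe() reports.

```python
import statistics


def granular_features(text):
    """Return a few low-level counts for one piece of text (illustrative only)."""
    words = text.split()  # naive whitespace tokenisation, an assumption of this sketch
    return {
        "characters_count": len(text),
        "words_count": len(words),
        "exclamation_marks_count": text.count("!"),
    }


# One entry per row of a (hypothetical) text column.
rows = [
    "I love this library!",
    "Not bad.",
    "Could be faster, could be better!",
]

# One dictionary of features per row.
features = [granular_features(row) for row in rows]

# Column-wise summary, similar in spirit to what describe() reports.
summary = {
    name: {
        "min": min(f[name] for f in features),
        "max": max(f[name] for f in features),
        "mean": statistics.mean(f[name] for f in features),
    }
    for name in features[0]
}

for name, stats in summary.items():
    print(name, stats)
```

The library itself returns these kinds of features as columns of a new dataframe, so in practice the summary step is a single describe() call on that dataframe.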

See screenshots under the Jupyter section and also under Screenshots for further illustrations.

Under the hood it makes use of a number of libraries that are popular in the AI and ML communities, but we can extend its functionality by replacing or adding other libraries as well.

A simple notebook has been provided to illustrate the usage of the library.

Note: this is a new endeavour and it may have rough edges, i.e. it is probably NOT capable of doing many things at the moment. Many of these gaps are opportunities we can work on and plug as we go along using it. Please provide constructive feedback to help improve this library. One such gap, scaling to larger datasets, was plugged recently.


  • Python 3.6.x or higher.
  • Dependencies described in the requirements.txt.
  • For high-level profiling, including grammar checks:
    • faster processor
    • higher RAM capacity
    • working disk-space of 1 to 3 GBytes (depending on the dataset size)
  • (Optional)
    • Jupyter Lab (on your local machine).
    • Google Colab account.
    • Kaggle account.
    • Grammar check functionality:
      • Internet access
      • Java 8 or higher

Getting started


Look at a short demo of the NLP Profiler library at one of these:

  • Demo of the NLP Profiler library (Abhishek talks #6), or you can find the rest of the talk here
  • Demo of the NLP Profiler library (NLP Zurich talk), or you can find the rest of the talk here


From PyPI:

pip install nlp_profiler

From the GitHub repo:

pip install git+

From the source (only for development purposes), see Developer guide


import nlp_profiler.core as nlpprof

new_text_column_dataset = nlpprof.apply_text_profiling(dataset, 'text_column')


from nlp_profiler.core import apply_text_profiling

new_text_column_dataset = apply_text_profiling(dataset, 'text_column')
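The snippets above assume that a `dataset` dataframe already exists. As a rough end-to-end sketch, the example below builds a tiny dataframe and passes it to `apply_text_profiling` with the same two arguments shown above. The helper `profile_text_column` and the sample sentences are hypothetical and only there for illustration. The helper returns `None` when pandas or nlp_profiler is not installed, so the sketch can be run safely in any environment.

```python
import importlib.util


def profile_text_column(rows, column="text_column"):
    """Profile `column` of a small dataframe built from `rows`.

    Returns None if pandas or nlp_profiler is not installed
    (hypothetical helper, not part of the library).
    """
    required = ("pandas", "nlp_profiler")
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    if missing:
        print("Missing packages:", ", ".join(missing))
        return None

    import pandas as pd
    from nlp_profiler.core import apply_text_profiling

    # Build a one-column dataframe holding the text to profile.
    dataset = pd.DataFrame({column: rows})

    # Same call as in the usage snippets: dataframe first, column name second.
    return apply_text_profiling(dataset, column)


profiled = profile_text_column(["I love this library!", "Not bad."])
if profiled is not None:
    # Descriptive statistics over the numerical feature columns.
    print(profiled.describe())
```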

See Notebooks section for further illustrations.

Developer guide

See the Developer guide to learn how to build, test, and contribute to the library.


After successful installation of the library, RESTART Jupyter kernels or Google Colab runtimes for the changes to take effect.

See Notebooks for usage and further details.


See Screenshots

Credits and supporters





Refer to the licensing (and warranty) policy.


Contributions are Welcome!

Please have a look at the CONTRIBUTING guidelines.

Please share it with the wider community (and get credited for it)!

Go to the NLP page

Files for nlp-profiler, version 0.0.2
  • nlp_profiler-0.0.2-py2.py3-none-any.whl (39.9 kB): file type Wheel, Python version py2.py3
  • nlp_profiler-0.0.2.tar.gz (21.4 kB): file type Source, Python version None