
Project description

NLP Profiler


A simple NLP library that allows profiling datasets with one or more text columns.

Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.

In short: think of it as running the pandas.describe() function or Pandas Profiling on your data frame, but for datasets containing text columns rather than the usual columnar datasets.

What do you get from the library?

  • Pass in a Pandas dataframe series as the input parameter.
  • You get back a new dataframe with various features about the parsed text, one per row.
    • High-level: sentiment analysis, objectivity/subjectivity analysis, spelling quality check, grammar quality check, ease-of-readability check, etc.
    • Low-level/granular: number of characters in the sentence, number of words, number of emojis, etc.
  • From the numerical data in the resulting dataframe, descriptive statistics can be drawn using pandas.describe().
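To illustrate the granular features and the describe() step, here is a minimal pandas sketch that computes two of the low-level statistics by hand (the library computes many more, and its actual column names may differ):

```python
import pandas as pd

# A tiny dataset with one text column, mirroring the library's input shape.
df = pd.DataFrame({"text": ["Hello world!", "NLP Profiler is neat.", "One"]})

# Two granular features computed by hand: character and word counts per row.
df["characters_count"] = df["text"].str.len()
df["words_count"] = df["text"].str.split().str.len()

# Descriptive statistics over the numerical feature columns.
summary = df[["characters_count", "words_count"]].describe()
print(summary)
```

The library automates this step for a much richer set of features, but the resulting dataframe can be summarised in exactly the same way.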

See screenshots under the Jupyter section and also under Screenshots for further illustrations.

Under the hood it makes use of a number of libraries that are popular in the AI and ML communities, but its functionality can be extended by replacing or adding other libraries.

A simple notebook has been provided to illustrate the usage of the library.

Please join the Gitter.im community to say "hello" to us, share your feedback, and have fun with us.

Note: this is a new endeavour and it may have rough edges, i.e. NLP_Profiler in its current version is probably NOT capable of doing many things. Many of these gaps are opportunities we can work on and plug as we go along using it. Please provide constructive feedback to help with the improvement of this library; scaling to larger datasets was recently improved in just this way.

Requirements

  • Python 3.6.x or higher.
  • Dependencies described in the requirements.txt.
  • For high-level analysis, including grammar checks:
    • a faster processor
    • more RAM
    • 1 to 3 GB of working disk space (depending on the dataset size)
  • (Optional)
    • Jupyter Lab (on your local machine).
    • Google Colab account.
    • Kaggle account.
    • Grammar check functionality:
      • Internet access
      • Java 8 or higher
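Whether the grammar-check prerequisite (a Java runtime on the PATH) is met can be checked with the standard library alone; a small sketch:

```python
import shutil

def java_available() -> bool:
    """Return True if a `java` executable is on the PATH (needed for grammar checks)."""
    return shutil.which("java") is not None

print("Java found:", java_available())
```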

Getting started

Demo and presentations

Look at a short demo of the NLP Profiler library at one of these:

  • Demo of the NLP Profiler library (Abhishek talks #6); the rest of the talk is here, and the slides are here.
  • Demo of the NLP Profiler library (NLP Zurich talk); the rest of the talk is here, and the slides are here.

Installation

For Conda/Miniconda environments:

conda config --set pip_interop_enabled True
pip install "spacy >= 2.3.0,<3.0.0"         # in case spacy is not present
python -m spacy download en_core_web_sm

# now perform any of the below pathways/options

From PyPi:

pip install -U nlp_profiler

From the GitHub repo:

pip install -U git+https://github.com/neomatrix369/nlp_profiler.git@master

From the source (only for development purposes), see Developer guide
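After installing via any of the options above, a quick sanity check can confirm the pieces are importable; a best-effort sketch (it only reports readiness, and assumes the en_core_web_sm model was downloaded as shown earlier):

```python
import importlib.util

def installation_ready() -> bool:
    """Best-effort check that nlp_profiler, spaCy, and the English model are usable."""
    for module in ("nlp_profiler", "spacy"):
        if importlib.util.find_spec(module) is None:
            return False
    import spacy
    try:
        spacy.load("en_core_web_sm")  # downloaded via `python -m spacy download`
    except OSError:
        return False
    return True

print("Installation ready:", installation_ready())
```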

Usage

import nlp_profiler.core as nlpprof

new_text_column_dataset = nlpprof.apply_text_profiling(dataset, 'text_column')

or

from nlp_profiler.core import apply_text_profiling

new_text_column_dataset = apply_text_profiling(dataset, 'text_column')

See Notebooks section for further illustrations.

Developer guide

See Developer guide to know how to build, test, and contribute to the library.

Notebooks

After successful installation of the library, RESTART Jupyter kernels or Google Colab runtimes for the changes to take effect.

See Notebooks for usage and further details.

Screenshots

See Screenshots

Credits and supporters

See CREDITS_AND_SUPPORTERS.md

Changes

See CHANGELOG.md

License

Refer to the licensing (and warranty) policy.

Contributing

Contributions are welcome!

Please have a look at the CONTRIBUTING guidelines.

Please share it with the wider community (and get credited for it)!




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp_profiler-0.0.3.tar.gz (1.9 MB)

Uploaded Source

Built Distribution

nlp_profiler-0.0.3-py2.py3-none-any.whl (49.3 kB)

Uploaded Python 2 Python 3

File details

Details for the file nlp_profiler-0.0.3.tar.gz.

File metadata

  • Download URL: nlp_profiler-0.0.3.tar.gz
  • Upload date:
  • Size: 1.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.2

File hashes

Hashes for nlp_profiler-0.0.3.tar.gz

  • SHA256: c5b032dbf984c930ba5e5ae627c20ee170f02bf6f6c5831938cfc53622ae3550
  • MD5: 2d00dcac2af58549269a5861a1f18aad
  • BLAKE2b-256: 0e58b4dfbb5ce0e390c80063809f372fa1be61e34e2ceb7812b65144ae767a42

See more details on using hashes here.

File details

Details for the file nlp_profiler-0.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: nlp_profiler-0.0.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 49.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.2

File hashes

Hashes for nlp_profiler-0.0.3-py2.py3-none-any.whl

  • SHA256: cdbc022993ae78570c8c3d7404093d034c93b47fe9a0ce6a19192ea05a1f06c4
  • MD5: 0f8be2bf29b567b6001c9f04de32373d
  • BLAKE2b-256: f0f70410e2a430dd83b844b6b120331aead5908467b34464fc23ccbfffd1e28f

See more details on using hashes here.
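The published digests above can be checked locally before installing from a downloaded file; a sketch using Python's hashlib (the filename comes from this page, the local path is an assumption):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file in chunks and return its hex SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Expected digest for nlp_profiler-0.0.3.tar.gz, from this page:
expected = "c5b032dbf984c930ba5e5ae627c20ee170f02bf6f6c5831938cfc53622ae3550"
# path = "nlp_profiler-0.0.3.tar.gz"  # wherever you downloaded the file
# assert sha256_of(path) == expected, "hash mismatch - do not install"
```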
