Skip to main content

Natural Language Processing Tools

Project description

nlp_toolbox

nlp_toolbox is an open-source GitHub repository that provides a collection of tools for natural language processing tasks. The repository provides functions for loading text from multiple sources such as the web and ebooks. Additionally, it includes functions for summarizing text, OCR, interacting with the OpenAI GPT API, and generating word clouds.

Version Download Badge Commit Badge License Badge

II. REFERENCES

2.1. How to use this package?

  • Install the stable version: pip install nlp_toolbox
  • You can install the latest nlp_toolbox version from source with the following command: pip install git+https://github.com/thinh-vu/nlp_toolbox.git@main

(*) You might need to insert a ! before your command when running terminal commands on Google Colab.

  • To start using functions, you need to import them: from nlp_toolbox import *

III. DEPENDENCIES

ChatGPT API

ChatGPT API simplifies NLP tasks by allowing you to send a request in a prompt format and receive a response without the need for dependent packages.

To send a request to ChatGPT API endpoint, you will need an OpenAI API key that can be obtained from OpenAI.

spacy

Download pre-trained files using terminal.

python -m spacy download en_core_web_lg
python -m spacy download en_core_web_sm

pytesseract

Quick installation guide

On Linux

sudo apt-get update
sudo apt-get install libleptonica-dev tesseract-ocr tesseract-ocr-dev libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn

On Mac brew install tesseract

On Windows

pip install tesseract
pip install tesseract-ocr

Reference: Installation guide

  • For Windows: Specific the location of the pytesseract by adding this code to your python project pytesseract.pytesseract.tesseract_cmd = r'C:\Users\mrthi\AppData\Local\Tesseract-OCR\tesseract.exe'

  • Add pre-trained languages data:

    • Visit github and download the data file: tessdata, eg: vie.traineddata for Vietnamese
    • Copy the downloaded file to the tessdata folder, Eg C:\Users\YOUR-USER-NAME\AppData\Local\Tesseract-OCR\tessdata

IV. 🙋‍♂️ CONTACT INFORMATION

You can contact me at one of my social network profiles:


If you find value in my open-source projects and would like to support their development, you can donate via Paypal or Momo e-wallet (VN). Your contribution will help me maintain my blog hosting fee and continue to create high-quality content. Thank you for your support!

momo-qr

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp_toolbox-0.0.1.tar.gz (7.0 kB view hashes)

Uploaded Source

Built Distribution

nlp_toolbox-0.0.1-py3-none-any.whl (8.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page