Skip to main content

Natural Language Processing Tools

Project description

nlp_toolbox

nlp_toolbox is an open-source GitHub repository that provides a collection of tools for natural language processing tasks. The repository provides functions for loading text from multiple sources such as the web and ebooks. Additionally, it includes functions for summarizing text, OCR, interacting with the OpenAI GPT API, and generating word clouds.

Version Download Badge Commit Badge License Badge

II. REFERENCES

2.1. How to use this package?

  • Install the stable version: pip install nlp_toolbox
  • You can install the latest nlp_toolbox version from source with the following command: pip install git+https://github.com/thinh-vu/nlp_toolbox.git@main

(*) You might need to insert a ! before your command when running terminal commands on Google Colab.

  • To start using functions, you need to import them: from nlp_toolbox import *

III. DEPENDENCIES

ChatGPT API

ChatGPT API simplifies NLP tasks by allowing you to send a request in a prompt format and receive a response without the need for dependent packages.

To send a request to ChatGPT API endpoint, you will need an OpenAI API key that can be obtained from OpenAI.

spacy

Download pre-trained files using terminal.

python -m spacy download en_core_web_lg
python -m spacy download en_core_web_sm

pytesseract

Quick installation guide

On Linux

sudo apt-get update
sudo apt-get install libleptonica-dev tesseract-ocr tesseract-ocr-dev libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn

On Mac brew install tesseract

On Windows

pip install tesseract
pip install tesseract-ocr

Reference: Installation guide

  • For Windows: Specific the location of the pytesseract by adding this code to your python project pytesseract.pytesseract.tesseract_cmd = r'C:\Users\mrthi\AppData\Local\Tesseract-OCR\tesseract.exe'

  • Add pre-trained languages data:

    • Visit github and download the data file: tessdata, eg: vie.traineddata for Vietnamese
    • Copy the downloaded file to the tessdata folder, Eg C:\Users\YOUR-USER-NAME\AppData\Local\Tesseract-OCR\tessdata

IV. 🙋‍♂️ CONTACT INFORMATION

You can contact me at one of my social network profiles:


If you find value in my open-source projects and would like to support their development, you can donate via Paypal or Momo e-wallet (VN). Your contribution will help me maintain my blog hosting fee and continue to create high-quality content. Thank you for your support!

momo-qr

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp_toolbox-0.0.3.tar.gz (9.6 kB view details)

Uploaded Source

Built Distribution

nlp_toolbox-0.0.3-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

File details

Details for the file nlp_toolbox-0.0.3.tar.gz.

File metadata

  • Download URL: nlp_toolbox-0.0.3.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.8

File hashes

Hashes for nlp_toolbox-0.0.3.tar.gz
Algorithm Hash digest
SHA256 485f8f54d140379eab6ad67264c079612c6c8802471ede7feea48b270a67d459
MD5 32b224e11fd398e5e2186afea44cac48
BLAKE2b-256 503469e0060a5c9f853416e48b4ff77cc0f648f27ca531c2dcc36634f474e616

See more details on using hashes here.

File details

Details for the file nlp_toolbox-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: nlp_toolbox-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.8

File hashes

Hashes for nlp_toolbox-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 53bf6ef2af98ccf039ce78637cc10bb566c9fa45ffe1b5a645c7db83d314e29c
MD5 1ba9c0fdd0e4f8b23920b289f3b8931e
BLAKE2b-256 a781ccb1340d595a745a447a59058dbbcf8ae75b602aaea47e1b70443e517cf1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page