PyNLP Lib is an open source Python NLP library that provides functionality for both web and local development

PyNLP Lib (or PyNLPL, pronounced "pineapple")

PyNLP Lib is an open source Python Natural Language Processing library that provides functionality for both web and local development. It covers text analysis, audio transcription, and (planned) language generation. It also goes by "PyNLPL" (a name that was already taken too), "PyNLP-L", "PyNLP-Lib", "PyNLPLib" (the official package name), and "PyNLP Library". The package would have been named PyNLP if that name had not already been taken by a third-party wrapper library for Stanford NLP.

IF YOU ARE LOOKING FOR THE STANFORD NLP PACKAGE GO TO THE OFFICIAL STANFORD NLP PYTHON PACKAGE - Stanza.

PyNLP Library Installation

PyNLP-Lib can be installed with pip:

pip install pynlp-lib

PyNLP-L Usage

Even though the package is installed as pynlp-lib, it must be imported as pynlpl. Dashes are not allowed in Python import names.
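
The difference between the install name and the import name can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper names installed_version and is_importable are made up for illustration and are not part of PyNLP Lib.

```python
from importlib import metadata, util

DIST_NAME = "pynlp-lib"   # name used with pip
IMPORT_NAME = "pynlpl"    # name used in import statements


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(import_name):
    """Report whether a top-level module can be found on the import path."""
    return util.find_spec(import_name) is not None


version = installed_version(DIST_NAME)
if version is None:
    print(f"{DIST_NAME} is not installed; run: pip install {DIST_NAME}")
else:
    print(f"{DIST_NAME} {version} is installed")
    print(f"import {IMPORT_NAME} works: {is_importable(IMPORT_NAME)}")
```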

The following code snippets assume you keep the API keys for the online backends in a .env file under the key names shown: the Deepgram key under deepgram_key and The Text API key under textapi_key.
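
python-dotenv reads lines of the form KEY=value, so the .env file would hold one line for deepgram_key and one for textapi_key. os.getenv returns None when a key is missing, which can surface later as a confusing authentication error. A small guard like the one below fails early instead. This is only a sketch: require_env is a hypothetical helper, not something PyNLP Lib provides, and it is meant to be called after load_dotenv().

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing {name}; add a line like {name}=<your key> to your .env file"
        )
    return value


# Intended use in the snippets that follow (after load_dotenv()):
#   deepgram = audio.Deepgram(require_env("deepgram_key"))
#   text_API = text.TheTextAPI(require_env("textapi_key"))
```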

Transcribing an Audio File on the Web with Deepgram:

from pynlpl.web import audio
from dotenv import load_dotenv
import os, asyncio, json

load_dotenv()
# Initializes the Deepgram SDK
deepgram = audio.Deepgram(os.getenv("deepgram_key"))

async def main():
    # Open the audio file of https://www.youtube.com/watch?v=sQuFl0PSoXo
    # (downloaded with the youtube_dl script found on GitHub here:
    # https://gist.github.com/ytang07/9b8317f268ffcf97cd47950aa7f94282)
    # The handle is named audio_file so it does not shadow the imported audio module
    with open("./tests/Watch a professional software engineer live code a web scraper.mp3", 'rb') as audio_file:
        # Replace the mimetype as appropriate for your file
        source = {'buffer': audio_file, 'mimetype': 'audio/mp3'}
        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

asyncio.run(main())
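
The snippet above prints the whole JSON response. To pull out only the transcript text, something like the following could work. It is a sketch that assumes the response is a dict in the layout Deepgram documents for prerecorded transcription (results, then channels, then alternatives, then transcript). The helper name extract_transcript and the sample response are hypothetical.

```python
def extract_transcript(response, channel=0):
    """Return the top transcript for one channel, or "" if the layout differs."""
    try:
        return response["results"]["channels"][channel]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""


# Hypothetical, heavily trimmed response used only to exercise the helper
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Hello, world.", "confidence": 0.98}]}
        ]
    }
}

print(extract_transcript(sample_response))
```

Returning an empty string rather than raising keeps the helper safe to call on error responses; raise instead if a missing transcript should stop your program.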

Online Text Analysis with The Text API:

from pynlpl.web import text
from dotenv import load_dotenv
import os

load_dotenv()
text_API = text.TheTextAPI(os.getenv("textapi_key"))

test_text = """The Text API is a comprehensive text processing and sentiment analysis API created by Yujian Tang. PyNLP-Lib or PyNLPL is an open source NLP library for Python. PyNLP-L aims to coalesce many different NLP backend tools and offer a high level API to use them. This test example shows how we can use the online text processing capabilities of PyNLP-L."""

def most_common_phrases_test():
    res = text_API.most_common_phrases(text=test_text)
    assert "most common phrases" in res

def least_common_phrases_test():
    res = text_API.least_common_phrases(text=test_text)
    assert "least common phrases" in res

def ner_test():
    res = text_API.ner(text=test_text)
    assert "ner" in res

def most_positive_sentences_test():
    res = text_API.most_positive_sentences(text=test_text)
    assert "most positive sentences" in res

def most_negative_sentences_test():
    res = text_API.most_negative_sentences(text=test_text)
    assert "most negative sentences" in res

def summarize_test():
    res = text_API.summarize(text=test_text)
    assert "summary" in res

def kw_test():
    res = text_API.sentences_with_keywords(kws=["PyNLP"], text=test_text)
    assert '"PyNLP":' in res

def similarity_by_sentences_test():
    res = text_API.similarity_by_sentences(texts=[test_text, test_text])
    assert any(x in res for x in ["doc1 cleaned", "doc2 cleaned", "repeat sentences"])

def test():
    summarize_test()
    kw_test()
    most_positive_sentences_test()
    most_negative_sentences_test()
    similarity_by_sentences_test()
    most_common_phrases_test()
    least_common_phrases_test()
    ner_test()

test()
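
Most of the checks above share one shape: call a method with text=... and look for an expected key in the result. A table-driven version of that pattern might look like the sketch below. The method names and expected keys come from the tests above; run_checks and make_stub are hypothetical helpers, and the stub stands in for text.TheTextAPI so the example runs without an API key. The two methods that take different arguments (sentences_with_keywords and similarity_by_sentences) are left out for brevity.

```python
import json
from types import SimpleNamespace

# (method name, text expected somewhere in the result)
CHECKS = [
    ("summarize", "summary"),
    ("ner", "ner"),
    ("most_common_phrases", "most common phrases"),
    ("least_common_phrases", "least common phrases"),
    ("most_positive_sentences", "most positive sentences"),
    ("most_negative_sentences", "most negative sentences"),
]


def run_checks(client, text, checks=CHECKS):
    """Call each method on the client and return the names of the failed checks."""
    failures = []
    for method_name, expected in checks:
        result = getattr(client, method_name)(text=text)
        if expected not in result:
            failures.append(method_name)
    return failures


def make_stub(broken=()):
    """Build an offline stand-in whose methods return JSON strings (an assumption)."""
    methods = {}
    for method_name, expected in CHECKS:
        key = "unexpected" if method_name in broken else expected
        methods[method_name] = lambda _key=key, **kwargs: json.dumps({_key: []})
    return SimpleNamespace(**methods)


print(run_checks(make_stub(), "some text"))
print(run_checks(make_stub(broken=["ner"]), "some text"))
```

With a real client, run_checks(text_API, test_text) would return the names of any methods whose response lacks the expected key.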

External Documentation for PyNLP Lib

This section includes external documentation for the tools used in PyNLP Lib.

The Text API

Included in Beta

Resources:

Example Projects:

Deepgram

Included in Beta

Resources for the SDK:

Example Projects:

TorchAudio

Coming in 2023

Resources:

Example Projects:

spaCy

Coming late 2022

NLTK

Coming late 2022

Stanford NLP/Stanza

Coming in 2023 (flex add)

DeepSpeech

Coming late 2022

Resources:

Example Projects:

Microsoft Text

Coming in 2023

Microsoft Audio

Coming in 2023

Google Text

Coming in 2023

Google Audio

Coming in 2023

Amazon Text

Coming in 2023

Amazon Audio

Coming in 2023

PyNLP-Lib functionality

PyNLPL is a comprehensive module for NLP in Python: an open source NLP module with multiple backends. PyNLP Lib is currently maintained by the team at The Text API.

As of the August 2022 release, PyNLP Lib includes functionality for online text and audio processing. See Roadmap for planned future functionality. Ideally, we will add Natural Language Generation, Natural Language Understanding, Optical Character Recognition, and Conversational AI backends as well as additional backends for the existing text/audio features through 2023.

PyNLP-L module breakdown

PyNLP Lib has two high-level modules: web and local. The web module provides access to the web APIs that serve as backends for PyNLPL. The local module provides access to tools that let you do NLP on your own device.

Inside the modules are individual backends. As of the beta release (0.1.0), the web module contains text and audio submodules. Each of these submodules contains classes for different backends. web.text currently has The Text API, and web.audio currently has Deepgram. Both are planned to be extended with Google, Amazon, and Microsoft cloud products.
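
The layout described above can be written out as a small data structure. This is only an illustration of the module breakdown, using the class names from the usage examples earlier in this README; BACKENDS and dotted_paths are hypothetical names and are not part of the package.

```python
# Backends described in this README for the 0.1.0 beta,
# keyed by high-level module and then by submodule.
BACKENDS = {
    "web": {
        "text": ["TheTextAPI"],
        "audio": ["Deepgram"],
    },
    "local": {},  # local backends are planned, none listed as released
}


def dotted_paths(backends):
    """Flatten the nested mapping into dotted import paths."""
    paths = []
    for module, submodules in backends.items():
        for submodule, classes in submodules.items():
            for class_name in classes:
                paths.append(f"pynlpl.{module}.{submodule}.{class_name}")
    return paths


for path in dotted_paths(BACKENDS):
    print(path)
```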

PyNLP Lib Online/Web API Backends

The current online backends are Deepgram (audio) and The Text API (text).

Planned online backends include Google Cloud, Amazon, and Microsoft Azure.

Local Backends for PyNLPL

Planned local backends include spaCy, NLTK, Stanford NLP (Stanza), DeepSpeech, and TorchAudio.

Roadmap for PyNLP Lib Development

This roadmap assumes no one helps add to this open source library! However, we'd LOVE help, so please feel free to contribute!

  • August 2022 - Initial Public Beta Release (0.1.0)
  • September 2022 - Add DeepSpeech for local audio transcription (0.2.0)
  • October 2022 - Add spaCy backend for local text analysis (0.3.0)
  • November 2022 - Add NLTK backend for local text analysis (0.4.0)
  • December 2022 - Add TorchAudio for local audio transcription (0.5.0)
  • January 2023 - Add Google Cloud Natural Language AI for online text analysis (0.6.0)
  • February 2023 - Add Azure Text Analysis for online text analysis (0.7.0)
  • March 2023 - Add an online Translation API (0.8.0)
  • April 2023 - Add Google Online Speech Transcription for audio transcription (0.9.0)
  • May 2023 - Add an online Text Generation API (0.10.0)
  • June 2023 - Add Amazon Transcribe for online audio transcription (0.11.0)
  • July 2023 - Add an online Conversational AI API (0.12.0)
  • August 2023 - Add an online OCR API, upgrade version for official release (1.0.0)

Timeline for PyNLP Lib Development so far

  • August 2022 - Initial Beta Release

Contribution Guidelines

  • Remember to update requirements in pyproject.toml
  • Remember to update version in pyproject.toml

How to Test Locally

  • Run python -m build in the folder that contains pyproject.toml
    • This produces a dist folder containing a .whl file; copy its relative path
  • Run pip install <path to whl>
    • Run pip uninstall pynlp-lib between tests of each update so the previously installed build is removed first
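
The uninstall and install steps above can be scripted. The sketch below only builds the pip command lines and finds the newest wheel; it assumes the wheel lands in a dist folder next to pyproject.toml, as described above. The function names newest_wheel and reinstall_commands are hypothetical.

```python
import sys
from pathlib import Path


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl file in dist_dir, or None."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


def reinstall_commands(wheel, package="pynlp-lib"):
    """Build the pip commands that swap the installed package for a wheel."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "--yes", package],  # remove the old build first
        pip + ["install", str(wheel)],          # then install the new wheel
    ]
```

After running python -m build, each command returned by reinstall_commands(newest_wheel()) can be passed to subprocess.run(cmd, check=True).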

How to Create a Pull Request
