PyNLP Lib is an open source Python NLP library that provides functionality for both web and local development
PyNLP Lib (or PyNLPL, pronounced "pineapple")
PyNLP Lib is an open source Python Natural Language Processing library that provides functionality for both web and local development. It offers a wide range of functionality, from text analysis to audio transcription to (planned) language generation. It is also called "PyNLPL" (that name was already taken too), "PyNLP-L", "PyNLP-Lib", "PyNLPLib" (the official package name), or "PyNLP Library". This package would have been named PyNLP if that name had not been taken by a third-party wrapper library for Stanford NLP.
IF YOU ARE LOOKING FOR THE STANFORD NLP PACKAGE GO TO THE OFFICIAL STANFORD NLP PYTHON PACKAGE - Stanza.
PyNLP Lib README Navigation
- PyNLP Library Installation
- Usage for PyNLPL
- External Docs for PyNLP-Lib Tooling
- PyNLP-Lib Functionality
- PyNLP-L module breakdown
- PyNLP Lib Online/Web API Backends
- Local Backends for PyNLPL
- Roadmap for PyNLP Lib Development
- Timeline for PyNLP Lib so far
- Contribution Guidelines
- How to Test Locally
PyNLP Library Installation
PyNLP-Lib can be installed with pip:
pip install pynlp-lib
PyNLP-L Usage
NOTE: even though we install pynlp-lib, we must import pynlpl. Dashes are forbidden in Python import names.
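The difference between the installed name (pynlp-lib) and the imported name (pynlpl) can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper name describe_install is hypothetical and not part of PyNLPL.

```python
from importlib import metadata, util


def describe_install(dist_name="pynlp-lib", module_name="pynlpl"):
    """Report whether a distribution is installed and its module is importable.

    describe_install is a hypothetical helper, not part of PyNLPL.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment
    # find_spec locates the module without importing it
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    print(describe_install())
```

If the version is None, the package has not been installed in the active environment.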
The following code snippets assume you are using a .env file with your API keys for these online backends stored there under the key names shown: the Deepgram key under deepgram_key and The Text API key under textapi_key.
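Because a missing key only shows up later as a failed API call, it can help to check for both keys up front. This is a rough sketch, assuming the key names given above; missing_keys is a hypothetical helper, not part of PyNLPL.

```python
import os

# Key names taken from the README; the helper itself is hypothetical.
REQUIRED_KEYS = ("deepgram_key", "textapi_key")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Call load_dotenv() from python-dotenv first so .env values reach os.environ.
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
```

Passing a plain dict instead of os.environ makes the check easy to try without touching real credentials.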
Transcribing an Audio File on the Web with Deepgram:
from pynlpl.web import audio
from dotenv import load_dotenv
import os, asyncio, json

load_dotenv()

# Initializes the Deepgram SDK
deepgram = audio.Deepgram(os.getenv("deepgram_key"))

async def main():
    # Open the audio file of https://www.youtube.com/watch?v=sQuFl0PSoXo
    # download with youtube_dl script, found on
    # github here: https://gist.github.com/ytang07/9b8317f268ffcf97cd47950aa7f94282
    # (the handle is named audio_file so it does not shadow the imported audio module)
    with open("./tests/Watch a professional software engineer live code a web scraper.mp3", 'rb') as audio_file:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio_file, 'mimetype': 'audio/mp3'}
        response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

asyncio.run(main())
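The example above prints the whole JSON response. To pull out just the transcript text, a sketch like the following may help. It assumes the wrapper returns Deepgram's pre-recorded response unchanged (results, then channels, then alternatives, then transcript); extract_transcript and the sample dictionary are illustrative and not part of PyNLPL.

```python
def extract_transcript(response):
    """Return the first transcript in a Deepgram-style response, or "" if absent."""
    try:
        return response["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""


# Hand-written sample that mimics the response shape; not real API output.
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Hello world.", "confidence": 0.99}]}
        ]
    }
}

if __name__ == "__main__":
    print(extract_transcript(sample_response))
```

Returning an empty string on an unexpected shape keeps the caller simple; raising an error instead would be a reasonable alternative if silent failures are a concern.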
Online Text Analysis with The Text API:
from pynlpl.web import text
from dotenv import load_dotenv
import os

load_dotenv()

text_API = text.TheTextAPI(os.getenv("textapi_key"))
test_text = """The Text API is a comprehensive text processing and sentiment analysis API created by Yujian Tang. PyNLP-Lib or PyNLPL is an open source NLP library for Python. PyNLP-L aims to coalesce many different NLP backend tools and offer a high level API to use them. This test example shows how we can use the online text processing capabilities of PyNLP-L."""

# Each check calls one endpoint and asserts that the expected key is in the response

def most_common_phrases_test():
    res = text_API.most_common_phrases(text=test_text)
    assert "most common phrases" in res

def least_common_phrases_test():
    res = text_API.least_common_phrases(text=test_text)
    assert "least common phrases" in res

def ner_test():
    res = text_API.ner(text=test_text)
    assert "ner" in res

def most_positive_sentences_test():
    res = text_API.most_positive_sentences(text=test_text)
    assert "most positive sentences" in res

def most_negative_sentences_test():
    res = text_API.most_negative_sentences(text=test_text)
    assert "most negative sentences" in res

def summarize_test():
    res = text_API.summarize(text=test_text)
    assert "summary" in res

def kw_test():
    res = text_API.sentences_with_keywords(kws=["PyNLP"], text=test_text)
    assert "\"PyNLP\":" in res

def similarity_by_sentences_test():
    res = text_API.similarity_by_sentences(texts=[test_text, test_text])
    assert any(x in res for x in ["doc1 cleaned", "doc2 cleaned", "repeat sentences"])

def test():
    summarize_test()
    kw_test()
    most_positive_sentences_test()
    most_negative_sentences_test()
    similarity_by_sentences_test()
    most_common_phrases_test()
    least_common_phrases_test()
    ner_test()

test()
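The test() function above stops at the first failing assertion. A small runner that reports the outcome of every check could look like the sketch below. run_checks is a hypothetical helper, and the two stand-in checks replace the real API calls so the example runs offline.

```python
def run_checks(checks):
    """Run each zero-argument check and map its name to "ok" or the failure text."""
    results = {}
    for check in checks:
        try:
            check()
            results[check.__name__] = "ok"
        except AssertionError as err:
            results[check.__name__] = f"failed: {err}"
    return results


# Stand-in checks for illustration; real ones would call text_API as shown above.
def passing_check():
    assert "summary" in {"summary": "..."}


def failing_check():
    assert "ner" in {}, "key 'ner' missing"


if __name__ == "__main__":
    for name, status in run_checks([passing_check, failing_check]).items():
        print(f"{name}: {status}")
```

Only AssertionError is caught here, so network or authentication errors would still surface immediately.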
External Documentation for PyNLP Lib
This section includes external documentation for the tools used in PyNLP Lib.
The Text API
Included in Beta
Resources:
- Build an AI Text Summarizer
- Build an AI Content Moderation System
- Text Sentiment Analysis
- Best Way to do Named Entity Recognition (NER) Python
- NLP: What is Text Polarity?
- NLP Stop Words and How to Use Them
Example Projects:
- What are the Most Common Phrases on YouTube?
- Black Friday: How Does Twitter Feel?
- Using NLP to Analyze Obama Headlines
- Use NLP to get Insights from Twitter
Deepgram
Included in Beta
Resources for the SDK:
Example Projects:
TorchAudio
Coming in 2023
Resources:
Example Projects:
spaCy
Coming late 2022
NLTK
Coming late 2022
Stanford NLP/Stanza
Coming in 2023 (flex add)
DeepSpeech
Coming late 2022
Resources:
Example Projects:
Microsoft Text
Coming in 2023
Microsoft Audio
Coming in 2023
Google Text
Coming in 2023
Google Audio
Coming in 2023
Amazon Text
Coming in 2023
Amazon Audio
Coming in 2023
PyNLP-Lib functionality
PyNLPL aims to be a comprehensive module for NLP in Python. It is an open source NLP module with multiple backends. Currently, PyNLP Lib is maintained by the team at The Text API.
As of the August 2022 release, PyNLP Lib includes functionality for online text and audio processing. See Roadmap for planned future functionality. Ideally, we will add Natural Language Generation, Natural Language Understanding, Optical Character Recognition, and Conversational AI backends as well as additional backends for the existing text/audio features through 2023.
PyNLP-L module breakdown
PyNLP Lib has two high level modules: web and local. The web module provides access to the web APIs that are used as the backend of PyNLPL. The local module provides access to tools that allow you to do NLP on your device.
Inside of the modules are individual backends. As of the beta release (0.1.0), the web module contains text and audio submodules. Each of these submodules contains classes for different backends. web.text currently has The Text API, with future plans to extend to include Google, Amazon, and Microsoft Cloud products. web.audio currently has Deepgram, with future plans to extend to include Google, Amazon, and Microsoft Cloud products.
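The layout described above can be written down as a small lookup table. This is only an illustrative sketch: the class names come from the usage examples in this README, the local entry is left empty because no local backends are listed as released, and LAYOUT and import_path are hypothetical names, not part of PyNLPL.

```python
# Module layout for the 0.1.0 beta as described in this README (an assumption,
# not read from the package itself).
LAYOUT = {
    "web": {"text": ["TheTextAPI"], "audio": ["Deepgram"]},
    "local": {},
}


def import_path(module, submodule, backend):
    """Build the dotted path for a backend class, e.g. pynlpl.web.audio.Deepgram."""
    backends = LAYOUT.get(module, {}).get(submodule, [])
    if backend not in backends:
        raise ValueError(f"{backend!r} is not listed under {module}.{submodule}")
    return f"pynlpl.{module}.{submodule}.{backend}"


if __name__ == "__main__":
    print(import_path("web", "text", "TheTextAPI"))
    print(import_path("web", "audio", "Deepgram"))
```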
PyNLP Lib Online/Web API Backends
Current online backends are Deepgram (audio) and The Text API (text).
Planned online backends include: Google Cloud, Amazon, and Microsoft Azure.
Local Backends for PyNLPL
Planned local backends include: spaCy, NLTK, Stanford NLP, DeepSpeech, and TorchAudio.
Roadmap for PyNLP Lib Development
This roadmap assumes no one helps add to this open source library! However, we'd LOVE help, so please feel free to contribute!
- August 2022 - Initial Public Beta Release (0.1.0)
- September 2022 - Add DeepSpeech for local audio transcription (0.2.0)
- October 2022 - Add spaCy backend for local text analysis (0.3.0)
- November 2022 - Add NLTK backend for local text analysis (0.4.0)
- December 2022 - Add TorchAudio for local audio transcription (0.5.0)
- January 2023 - Add Google Cloud Natural Language AI for online text analysis (0.6.0)
- February 2023 - Add Azure Text Analysis for online text analysis (0.7.0)
- March 2023 - Add an online Translation API (0.8.0)
- April 2023 - Add Google Online Speech Transcription for audio transcription (0.9.0)
- May 2023 - Add an online Text Generation API (0.10.0)
- June 2023 - Add Amazon Transcribe for online audio transcription (0.11.0)
- July 2023 - Add an online Conversational AI API (0.12.0)
- August 2023 - Add an online OCR API, upgrade version for official release (1.0.0)
Timeline for PyNLP Lib Development so far
- August 2022 - Initial Beta Release
Contribution Guidelines
- Remember to update requirements in pyproject.toml
- Remember to update version in pyproject.toml
How to Test Locally
- Run python -m build in the folder that contains pyproject.toml
  - this will produce a dist folder with a .whl file, copy the relative path
- Run pip install <path to whl>
  - You have to run pip uninstall pynlp-lib between tests of each update to remove it from the cache
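The uninstall and reinstall steps above are easy to forget between builds, so they can be scripted. The sketch below only assembles the commands and prints them; newest_wheel and reinstall_commands are hypothetical helpers, and the dist folder name assumes the default output of python -m build.

```python
import glob
import os
import sys


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified .whl in dist_dir, or None if there is none."""
    wheels = glob.glob(os.path.join(dist_dir, "*.whl"))
    return max(wheels, key=os.path.getmtime) if wheels else None


def reinstall_commands(wheel_path, dist_name="pynlp-lib"):
    """Commands that uninstall the old build, then install the given wheel."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", dist_name],
        pip + ["install", wheel_path],
    ]


if __name__ == "__main__":
    wheel = newest_wheel()
    if wheel is None:
        print("No wheel found; run `python -m build` first.")
    else:
        for command in reinstall_commands(wheel):
            print(" ".join(command))  # pass each list to subprocess.run to execute
```

Using sys.executable with -m pip keeps the commands tied to the interpreter that runs the script, which avoids installing into a different environment by accident.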
How to Create a Pull Request