Python extension for the NLP++ text analysis engine

NLPPlus

NLP++ lets you build fully customized text analyzers using the NLP++ VSCode language extension, giving you 100% visibility into — and complete control over — every rule and decision your analyzer makes. Unlike other NLP packages that are statistical black boxes you cannot inspect or change, every NLP++ analyzer is glass-box code you own and can tailor to your exact needs.

NLP++ Textbook

First Textbook on the NLP++ Programming Language

The first textbook on NLP++ is now available worldwide from BPB Online. NLP++ can replace LLMs when used in agentic flows. As with any other programming language, the code must be written by a human, and this book facilitates that process. NLP++ is not a statistical system that needs training. It relies on the ingenuity of the programmer to create a program that parses text and extracts information in a deterministic way.

The NLPPlus Python Package

The NLPPlus Python package lets Python scripts call text and NLP analyzers created using NLP++. The package uses the C++ libraries of the NLP Engine, which makes calls more efficient than using the NLP++ Python class that invokes the command-line version of the NLP Engine, "nlp.exe".

The major advantage of NLPPlus over other NLP packages is that it is 100% rule-based and modifiable, allowing any programmer, even one without a linguistics background, to create text analyzers 100% tailored to their needs.

Analyzers can be run in two modes: interpreted (the default, runs straight from the .nlp source) or compiled (analyzer code is compiled to a native shared library once and loaded at runtime). See Compiled Mode below for the cloud_compile() one-call build path.

Long-Term, Open-Source, Glass-Box Project

NLP++ allows any programmer to write text and NLP programs that can be shared by everyone. It represents the first universal programming language for text and NLP. As the community grows, the number of open-source solutions including dictionaries, knowledge bases, and analyzers will grow - all of which can be modified by any programmer using the NLP++ Language Extension for VSCode.

READ FIRST

It is important to understand that the NLPPlus package for Python is very different from ALL other NLP packages in a very important and practical way.

Current NLP python packages have the "intention" of being plug-and-play systems that perform natural language tasks without modification. The problem is that when these systems ultimately fail in critical situations, coders are left with no real way to fix these systems and they are quickly abandoned.

The problem is that almost all of these packages rely on statistical methods such as machine learning or neural networks or, in the simpler cases, on Regex. Statistical systems cannot be corrected by logical means, and Regex is extremely limited, unreadable, and impossible to maintain or extend. Moreover, these systems offer little if any means of modification, even though every NLP task is slightly different in important ways.

The NLPPlus Python Package is different from all other NLP Python packages. All its analyzers are 100% human-readable and modifiable code, which allows any non-NLP coder to become an NLP programmer using the NLP++ VSCode Language Extension, appropriately called "VisualText". The VisualText extension allows for the visualization of any NLP process. Coders can "see" the syntactic parse tree at each step of the process, see rule matches directly in the text, and print out the knowledge base at any point in the process. In addition, dictionaries and knowledge bases are human readable, unlike JSON files or databases.

NLPPlus comes with five starter analyzers: telephone numbers, links, emails, addresses, and a full English parser. And because NLP++ is a glass box, all analyzers can easily be modified by any coder.

If, for example, the telephone number analyzer is not working properly for your application, you can use the NLP++ VSCode extension to edit and test the NLP++ code, and then use the updated code immediately. Universities around the world are starting to use NLP++ to write human digital readers for many different applications.

Learn More About NLP++

Requirements

  • Python 3.10 or newer

Installation

The NLPPlus Python package is published on PyPI (pypi.org) and can be installed using pip:

pip install nlpplus

Installing By Downloading the Package Manually

You can find the installable "wheel" files under each release on the Releases page. Choose the correct version for your platform and Python version based on the filename. For instance, wheels for Python 3.12 on MacOS have cp312 and macos in the filename, those for Windows have cp312 and win, and those for Linux have cp312 and linux. These files can be installed with pip on the command line, for example:

pip install nlpplus-0.1.2-cp310-cp310-win_amd64.whl

For the most recent version you can also download them from the GitHub actions page. Click on the link at the top of the list of "workflow run results" under "Build and upload to PyPI". After scrolling to the bottom of the page, you should see a section marked "Artifacts". Click on the appropriate link for your platform:

  • For Linux: cibw-wheels-linux
  • For MacOS 11 and later: cibw-wheels-macos
  • For Windows 10 and later: cibw-wheels-windows

This will download a ZIP file containing installation files for each supported version of Python on your platform. The version number is shown in the filename, for instance, for Python 3.10 on Windows you will see a file with a name like nlpplus-0.1.dev1+g55d691d-cp310-cp310-win_amd64.whl - the cp310 means Python 3.10. For Python 3.12 it would be cp312, and so forth.
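As a rough aid for picking the right file, the tags can be read off a wheel filename and compared with the running interpreter. This is a minimal sketch using only the standard library. The helper names wheel_tags and current_python_tag are illustrative and not part of NLPPlus, and the interpreter check assumes CPython.

```python
import sys


def wheel_tags(filename: str) -> dict:
    """Split a wheel filename into its Python, ABI, and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are python tag, ABI tag, platform tag.
    _, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    return {"python": python_tag, "abi": abi_tag, "platform": platform_tag}


def current_python_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = wheel_tags("nlpplus-0.1.2-cp310-cp310-win_amd64.whl")
print(tags["python"], tags["platform"])  # cp310 win_amd64
# True only when this script runs under the Python version the wheel targets.
print(tags["python"] == current_python_tag())
```

A wheel is a candidate for your machine when its Python tag equals current_python_tag() and its platform tag names your operating system.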

For specific instructions on setting up Python on your platform please consult the Python documentation.

If your platform is not supported you can also compile it from source, which will require a working C++ compiler. See the platform specific instructions below for the requirements to build.

Why Use NLP++?

There are many reasons to consider using NLP++. Whether you want to write Regex-like rule patterns, modify 100% of the NLP code, or visualize the NLP analyzer in an intuitive way, NLP++ belongs in every coder's and programmer's toolkit.

To put it simply, NLP++ turns any coder or programmer into an NLP engineer.

1000 Times Better than Regex

For matching patterns in text, NLP++ is a Regex killer. The rule matching system in NLP++ is human readable and is performed by calling rules in a sequence, making creating and debugging rule-based patterns a breeze. Along with

100% Modifiable

The main reason to use NLP++ is to engineer an NLP system for a specific task. Almost all extraction or understanding tasks in NLP require specific processing that is never included in "generic" systems. NLP++ allows for the creation or modification of any NLP++ system.

It must be emphasized that what separates NLPPlus from all the other NLP packages in Python is the fact that all parsers are 100% modifiable using the VSCode NLP++ Language Extension. Other NLP packages use regex patterns which are impossible to modify or use trained machine learning or neural network systems which cannot be fixed when

VisualText Editor

Writing an NLP system from scratch is thought to be only for those in computational linguistics. But VisualText, NLP++, and the conceptual Grammar change all that.

Taking full advantage of the familiar VSCode environment, the NLP++ language extension makes NLP a visual and logical process that is easy to understand.

Using the NLPPlus Python Package

Very basic usage, which runs the default parser for US English and returns parsing results as XML:

import NLPPlus
xml = NLPPlus.analyze("Hello world.")

This may be less useful than using a domain-specific analyzer. Several of these are included with the module:

  • address-parser: Extract addresses from text
  • emailaddress: Extract email addresses from text
  • links: Extract hyperlinks from text
  • telephone: Extract telephone numbers from text

In contrast to the default analyzer these do not return any text by default. You will have to use the extended API to get the parse tree or JSON output from them:

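import NLPPlus

# Run the bundled email analyzer through the extended engine API
results = NLPPlus.engine.analyze("Reach me at hello@example.com", "emailaddress")
# Parsed output.json content; take the first extracted address
parsed_address = results.output["email_address"][0]
# The final parse tree, in the NLP++ tree format
parse_tree = results.final_tree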
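Indexing with [0] fails when the analyzer finds nothing. A small defensive helper is sketched below on the assumption that results.output behaves like a dictionary mapping keys to lists, as the email_address lookup suggests. The name first_value is hypothetical and not part of NLPPlus, and a plain dictionary stands in for the real results here.

```python
from typing import Any, Mapping, Optional


def first_value(output: Mapping[str, Any], key: str) -> Optional[Any]:
    """Return the first item stored under key, or None if absent or empty."""
    values = output.get(key) or []
    return values[0] if values else None


# Stand-in for results.output; the real content depends on the analyzer.
sample_output = {"email_address": ["hello@example.com"]}
print(first_value(sample_output, "email_address"))  # hello@example.com
print(first_value(sample_output, "telephone"))      # None
```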

NLPPlus Engine Functions

These are the current functions that come with the NLPPlus package.

set_analyzer_folder(analyzer_folder_path: str)

This is used to set the folder where your analyzers are located.

analyze(text: str, parser: str = "parse-en-us", develop: bool = False, compiled: bool = False): str

This calls one of the analyzers in the analyzer folder on the text. If the analyzer folder was not set, it will use the library analyzers that come with NLPPlus. If you are planning to modify the library analyzers, it is recommended that you use the function copy_library_analyzers to copy the analyzers to avoid having them overwritten when a new version of NLPPlus is installed.

If compiled=True, the engine loads the analyzer's compiled shared libraries (bin/run.<ext> and bin/kb.<ext>) instead of running interpreted from the .nlp source. See compile() and cloud_compile() below for producing those libraries.

The analyze function returns a results object that makes the analyzer output files easily accessible from Python (see NLPPlus Engine Results below).

compile(analyzer: str = "parse-en-us", develop: bool = False, kb_only: bool = False)

Generates C++ source files for the analyzer by running the engine in -COMPILE mode. The output lands under <analyzer>/run/*.cpp and <analyzer>/kb/*.cpp (or just <analyzer>/kb/*.cpp if kb_only=True). The generated files still need to be built into shared libraries before analyze(..., compiled=True) can load them — see cloud_compile() for the one-call end-to-end path.

cloud_compile(analyzer: str = "parse-en-us", dispatcher_url: Optional[str] = None, kb_only: bool = False, develop: bool = False, poll_interval: float = 2.0, timeout: float = 1800, skip_local_compile: bool = False)

End-to-end compile via the public nlp-compile-service cloud build: runs compile() to produce the C++ trees, tars them up, submits to a Cloudflare-Worker dispatcher, polls the GitHub-Actions runner build, downloads the resulting shared library and stages it into <analyzer>/bin/ as run.<ext> + runu.<ext> + kb.<ext> + kbu.<ext> (or just kb.<ext> + kbu.<ext> for kb_only=True). After it returns, analyze(..., compiled=True) will pick up the staged libraries.

dispatcher_url defaults to the same public Cloudflare-Worker the VSCode NLP++ extension uses; override per-call to point at a self-hosted deployment. timeout caps the wait for the runner build (default 30 minutes — GitHub-Actions Windows free-tier queues can stall 5-10 minutes before the build even starts).

copy_library_analyzers(self, to_dir: str, overwrite: bool=True)

This function copies the NLPPlus library analyzers into a safe folder where they cannot be overwritten by newer versions of the NLPPlus package. This allows coders to edit and modify the analyzers to their liking. Remember to call set_analyzer_folder if you want to run your versions of these library analyzers through the NLPPlus package.
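The copy-then-redirect workflow might look like the following sketch. It assumes both functions are reachable on the same engine object, which is passed in as a parameter so that nothing is assumed about where it lives. The helper name use_private_analyzers is hypothetical, and the behavior of overwrite=False (leaving existing files alone) is an assumption.

```python
from pathlib import Path


def use_private_analyzers(engine, to_dir: str) -> Path:
    """Copy the library analyzers, then point the engine at the copy."""
    target = Path(to_dir).expanduser()
    # Assumption: overwrite=False keeps analyzers you have already edited.
    engine.copy_library_analyzers(str(target), overwrite=False)
    engine.set_analyzer_folder(str(target))
    return target


# Usage (assumption: the engine object is exposed as NLPPlus.engine):
#   import NLPPlus
#   use_private_analyzers(NLPPlus.engine, "~/my-analyzers")
```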

input_text(analyzer_name: str, file_name: str)

When developing or editing NLP++ analyzers and calling them from Python, it is convenient to test your Python code on the text you used to develop your analyzer in the NLP++ VisualText extension for VSCode. This function retrieves the text from a file in the analyzer's input directory for easy access while developing your Python code in conjunction with an NLP++ analyzer.
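A sketch of that round trip follows. It assumes input_text returns the file contents as a string and that the engine's analyze takes the analyzer name as its second argument, as in the usage example. The helper analyze_input_file and the file name sample.txt are hypothetical.

```python
def analyze_input_file(engine, analyzer_name: str, file_name: str):
    """Run an analyzer on one of its own development input files."""
    # Fetch the text from the analyzer's input directory.
    text = engine.input_text(analyzer_name, file_name)
    # Run the same analyzer on that text.
    return engine.analyze(text, analyzer_name)


# Usage (assumption: the engine object is exposed as NLPPlus.engine):
#   import NLPPlus
#   results = analyze_input_file(NLPPlus.engine, "telephone", "sample.txt")
#   print(results.output)
```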

NLPPlus Engine Results

output

This returns a JSON object parsed from the output.json file produced by the analyzer. The analyzer has to construct the output.json file deliberately for this to work.

output.json

The output file produced by the analyzer, returned as a string rather than a JSON object. This file must be created explicitly by the analyzer.

final.tree

All analyzers output a final tree of the text that is being processed. This file is in the NLP++ tree format.

Compiled Mode

Analyzers normally run interpreted from their .nlp source. That is fine for development, but it is slower on large inputs and always reflects the current sources (i.e., you can't ship a "frozen" version without bundling the sources). NLPPlus now supports compiled mode: generate native shared libraries from the analyzer's .nlp files once, then load them at analyze time. Source edits after the build don't change the output until you re-compile.

The simplest path is one call to cloud_compile, which uses the public nlp-compile-service to build the right shared library for your platform:

import NLPPlus

# Generate run/*.cpp + kb/*.cpp, ship to the cloud builder, download
# the .so/.dylib/.dll, stage into <analyzer>/bin/.
NLPPlus.cloud_compile("parse-en-us")

# Now run with the compiled artifacts instead of the interpreter.
xml = NLPPlus.analyze("Hello world.", compiled=True)

The cloud build takes anywhere from ~1 minute (small analyzer, cache hit) up to ~10 minutes (parse-en-us, cold Windows runner queue). The first build for a given source hash is the slow one — subsequent builds against the same code hit the dispatcher's cache.

If you'd rather generate the C++ trees and build them yourself (e.g. air-gapped, custom toolchain), use compile() for the codegen step and run cmake against the engine's published compile-libs to produce the shared library, then stage the result as <analyzer>/bin/run.<ext> and <analyzer>/bin/kb.<ext>. See the nlp-compile-service emit-cmake.sh for the exact CMake invocation the cloud uses.
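After a manual build, it can help to confirm that the staged files are in place before passing compiled=True. The sketch below assumes the extension follows the usual platform convention (.so on Linux, .dylib on macOS, .dll on Windows) and that checking run.<ext> and kb.<ext> is sufficient. The helpers shared_lib_ext and compiled_libs_present are hypothetical, as is the example path.

```python
import sys
from pathlib import Path


def shared_lib_ext(platform: str = sys.platform) -> str:
    """Map a sys.platform value to the conventional shared-library extension."""
    if platform.startswith("win"):
        return "dll"
    if platform == "darwin":
        return "dylib"
    return "so"


def compiled_libs_present(analyzer_dir: str, platform: str = sys.platform) -> bool:
    """Check that <analyzer>/bin/run.<ext> and <analyzer>/bin/kb.<ext> exist."""
    ext = shared_lib_ext(platform)
    bin_dir = Path(analyzer_dir) / "bin"
    return all((bin_dir / f"{name}.{ext}").is_file() for name in ("run", "kb"))


# Hypothetical analyzer location; adjust to your own analyzer folder.
print(compiled_libs_present("analyzers/parse-en-us"))
```

The result can be passed straight through as the compiled argument of analyze, so the interpreter is used whenever the libraries are missing.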

NLP++ Development

By default the NLPPlus module will create a temporary working directory with the default parser and the small set of analyzers mentioned above. If you are developing NLP++ code, you can also point it at an existing working folder using set_working_folder:

import NLPPlus
NLPPlus.set_working_folder("somewhere/else")

This working folder is expected to contain the directories analyzers and data. If you wish to initialize a new working folder with the default analyzers and data, you can pass initialize=True:

import NLPPlus
NLPPlus.set_working_folder("somewhere/else", initialize=True)
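Since the working folder is expected to contain the analyzers and data directories, a quick check can decide whether initialization is needed. This is a sketch using only the standard library. The helper needs_initialize is hypothetical, and the usage comment assumes that initialize=False is accepted and is the default.

```python
from pathlib import Path


def needs_initialize(working_folder: str) -> bool:
    """Return True if the folder lacks the analyzers/ or data/ directory."""
    root = Path(working_folder)
    return not all((root / name).is_dir() for name in ("analyzers", "data"))


# Usage:
#   import NLPPlus
#   folder = "somewhere/else"
#   NLPPlus.set_working_folder(folder, initialize=needs_initialize(folder))
```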

Module Development

This module is built using scikit-build-core and nanobind. To set up for development, make sure you have a C++ compiler that works, and clone the source with:

git clone --recurse-submodules https://github.com/VisualText/py-package-nlpengine.git

For development it is convenient to disable build isolation, so install the necessary build dependencies. We suggest doing this in a virtual environment:

cd py-package-nlpengine
python -m venv venv
. venv/bin/activate
pip install -r requirements-dev.txt

Linux Setup

On Linux, generally, you can simply install the ICU development libraries system-wide:

# On Ubuntu / Debian / etc
sudo apt install libicu-dev
# On CentOS / RHEL / etc
sudo yum install libicu-devel

Now you can build the module as an "editable" install, which will allow you to test changes as you make them:

pip install --no-build-isolation -ve .

MacOS and other Unix Setup

If you were not able to install ICU above (such as on MacOS), you have to use vcpkg:

git clone --depth 1 https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh

Additionally, on MacOS, you'll probably need a whole lot of other things to use vcpkg:

brew install autoconf-archive autoconf automake pkg-config

Now you can install with this somewhat more complicated command:

pip install --no-build-isolation \
    -C cmake.args=-DCMAKE_TOOLCHAIN_FILE=./nlp-engine/vcpkg/scripts/buildsystems/vcpkg.cmake \
    -ve .

Windows Setup

On Windows, everything is vastly more complicated for a number of reasons:

  • The ICU library on which NLP++ depends is built as DLLs, and these have to be included with the package
  • Python won't load arbitrary DLLs from the current directory, unlike the rest of Windows (this is a good thing)
  • Builds take 10x longer on Windows than on reasonable operating systems, so you will wait a long time to find out that the module you built actually doesn't work

For this reason "editable" installs (the -e option to pip install) do not work on Windows and can't be expected to work. Instead it is necessary to build a wheel file and "repair" it with delvewheel to package the DLLs correctly, then install that wheel.

If that sounds like too much trouble, then just install from PyPI or the wheel files as described above.

Testing

Verify that it works:

python -m unittest discover -s tests

Note that you might get undefined C++ symbols if you are using Python from miniconda on Linux. In this case, please use the system Python instead.

Making a release

For developer reference: the release process is managed using GitHub actions. To make a release from the main branch, make an annotated tag (with -m and -a, this is important) of the form vX.Y or vX.Y.Z (e.g. v0.1.3) and push the tag and the branch:

git tag -m 'Release 0.1.3' -a v0.1.3
git push --follow-tags
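The tag format can be checked before pushing. The following is a sketch in Python of the vX.Y / vX.Y.Z pattern described above. The release workflow may apply its own rules, and is_release_tag is a hypothetical helper.

```python
import re

# "v", then two or three dot-separated numeric fields: vX.Y or vX.Y.Z.
_TAG_RE = re.compile(r"v\d+\.\d+(\.\d+)?")


def is_release_tag(tag: str) -> bool:
    """Return True if tag has the form vX.Y or vX.Y.Z."""
    return _TAG_RE.fullmatch(tag) is not None


print(is_release_tag("v0.1.3"))  # True
print(is_release_tag("0.1.3"))   # False
```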
