Skip to main content

Wowool NLP Toolkit

Project description

The Wowool NLP Toolkit

install

Install the main sdk.

pip install wowool-sdk

Installing languages.

pip install wowool-[language]

Quick Start

Just create a document and pipeline, pass your document trough the Pipeline, and your done.

from wowool.sdk import Pipeline
from wowool.document import Document

document = Document("Mark Van Den Berg works at Omega Pharma.")
# Create an analyzer for a given language and options
process = Pipeline("english,entity")
# Process the data
document = process(document)
print(document)

API

Examples

You will need to install the english language module to run the sample. pip install wowool-english

Create a pipeline.

This script demonstrates how to use the UUID component to create a pipeline.

from wowool.sdk import Pipeline
from wowool.common.pipeline import UUID

process = Pipeline(
    [
        UUID("english", options={"anaphora": False}),
        UUID("entity"),
        UUID("topics.app", {"count": 3}),
    ]
)
document = process("Mark Janssens works at Omega Pharma.")
print(document)

Custom domain

The script identifies the word "car" as a Vehicle entity in the sentence "I have a car." using custom domain rules and language processing.

For more info on how to write rules see: https://www.wowool.com/docs/nlp/matching-&-capturing

from wowool.sdk import Language, Domain
from wowool.document import Document

english = Language("english")
vehicle = Domain(source="rule:{ 'car'} = Vehicle;")
doc = vehicle(english(Document("I have a car.")))
for entity in doc.entities:
    print(entity)

Using the language identifier

This script demonstrates how to use the LanguageIdentifier to detect the language of a document.

from wowool.sdk import LanguageIdentifier

document = """
Un été de tous les records de chaleur en France.
Record de chaleur battu dans une cinquantaine de villes en France

"""
# Initialize a language identification engine
lid = LanguageIdentifier()
# Process the data
doc = lid(document)
print(doc.language)

Extract dutch entities

This script demonstrates how to perform basic entity analysis on a Dutch sentence using the Wowool SDK.

Install first the dutch language model pip install wowool-dutch

from wowool.sdk import Pipeline
from wowool.document import Document

entities = Pipeline("dutch,entity")
document = entities(Document("Mark Van Den Berg werkte als hoofdarts bij Omega Pharma."))
for sentence in document.sentences:
    for entity in sentence.entities:
        print(entity)

Using the language identifier

This script demonstrates how to use the LanguageIdentifier to detect the different language sections in a text multi-language document.

from wowool.sdk import LanguageIdentifier

document = """
La juventud no es más que un estado de ánimo.

Record de chaleur battu dans une cinquantaine de villes en France

"""
# Initialize a language identification engine
lid = LanguageIdentifier(sections=True, section_data=True)
# Process the data
doc = lid(document)
if lid_results := doc.lid:
    for section in doc.lid.sections:
        assert section.text
        print(f"({section.begin_offset},{section.end_offset}): language= {section.language} text={section.text[:20].strip('\n')}...")

License

In both cases you will need to acquirer a license file at https://www.wowool.com

Non-Commercial

This library is licensed under the GNU AGPLv3 for non-commercial use.  
For commercial use, a separate license must be purchased.  

Commercial license Terms

1. Grants the right to use this library in proprietary software.  
2. Requires a valid license key  
3. Redistribution in SaaS requires a commercial license.  

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

wowool_sdk-3.5.4.dev1-cp313-cp313-macosx_11_0_arm64.whl (69.4 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

wowool_sdk-3.5.4.dev1-cp312-cp312-macosx_11_0_arm64.whl (69.4 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file wowool_sdk-3.5.4.dev1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for wowool_sdk-3.5.4.dev1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bae3b32c426a9ceffccc604eb9e461252c4f44c1d8ea04b5f2e5a560476911ca
MD5 7aaa124d3dd46d8d670a163ae647c424
BLAKE2b-256 e18e5e1eb839b378bc47ab6b041cf8216c4dae79b8d324109b6a41edcb78b69e

See more details on using hashes here.

File details

Details for the file wowool_sdk-3.5.4.dev1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for wowool_sdk-3.5.4.dev1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 508400caa6596509e512568e2d9df510e7b85a7ff88e0563202ab694ef83da5d
MD5 69927f3339f341b0c36458567487411d
BLAKE2b-256 3805817eda350304fce0adff6a0704e209b4a2adaf025282f0ebdef701e23575

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page