Wowool NLP Toolkit
The Wowool NLP Toolkit
Install
Install the main SDK:
pip install wowool-sdk
Install the language modules you need, replacing [language] with the language name (for example, english or dutch):
pip install wowool-[language]
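The naming pattern above can also be checked from Python before running a pipeline. The following is a minimal sketch that uses only the standard library; `language_package` and `is_installed` are hypothetical helper names for illustration and are not part of the Wowool SDK.

```python
from importlib import metadata


def language_package(language: str) -> str:
    """Return the pip package name for a language (wowool-[language] pattern)."""
    return f"wowool-{language.strip().lower()}"


def is_installed(package: str) -> bool:
    """Report whether a distribution with this name is installed in this environment."""
    try:
        metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return True


for language in ("english", "dutch"):
    package = language_package(language)
    status = "installed" if is_installed(package) else f"missing, run: pip install {package}"
    print(f"{package}: {status}")
```

This only checks that the package is present; it does not verify the license file.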
Quick Start
Create a document and a pipeline, pass the document through the pipeline, and you're done.
from wowool.sdk import Pipeline
from wowool.document import Document
document = Document("Mark Van Den Berg works at Omega Pharma.")
# Create an analyzer for a given language and options
process = Pipeline("english,entity")
# Process the data
document = process(document)
print(document)
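The pipeline in the Quick Start is described by a comma-separated string such as "english,entity". When the language is chosen at runtime, that string can be assembled from its parts. This is a small sketch; `pipeline_spec` is a hypothetical helper, not an SDK function.

```python
def pipeline_spec(language: str, *components: str) -> str:
    """Join a language and component names into a comma-separated pipeline string."""
    parts = [part.strip() for part in (language, *components)]
    if not all(parts):
        raise ValueError("language and component names must not be empty")
    return ",".join(parts)


spec = pipeline_spec("english", "entity")
print(spec)  # english,entity
```

The resulting string can then be passed to `Pipeline(spec)` exactly as in the Quick Start.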
API
Examples
To run these examples, first install the English language module: pip install wowool-english
Create a pipeline.
This script demonstrates how to use the UUID component to create a pipeline.
from wowool.sdk import Pipeline
from wowool.common.pipeline import UUID
process = Pipeline(
[
UUID("english", options={"anaphora": False}),
UUID("entity"),
UUID("topics.app", {"count": 3}),
]
)
document = process("Mark Janssens works at Omega Pharma.")
print(document)
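When pipeline steps come from configuration, it can help to normalize them before constructing `UUID` objects. The sketch below is an assumption-laden illustration: `normalize_steps` and `build_pipeline` are hypothetical helpers, and `build_pipeline` only uses the `UUID(name)` and `UUID(name, options)` call forms shown in the example above. The SDK is imported lazily so the normalization step can run without it.

```python
from typing import Any, Iterable, Union

Step = Union[str, tuple[str, dict[str, Any]]]


def normalize_steps(steps: Iterable[Step]) -> list[tuple[str, dict[str, Any]]]:
    """Turn bare names and (name, options) pairs into uniform (name, options) pairs."""
    normalized = []
    for step in steps:
        if isinstance(step, str):
            normalized.append((step, {}))
        else:
            name, options = step
            normalized.append((name, dict(options)))
    return normalized


def build_pipeline(steps: Iterable[Step]):
    """Build a Pipeline from the steps (needs wowool-sdk, language modules, and a license)."""
    # Imported here so normalize_steps stays usable without the SDK installed.
    from wowool.sdk import Pipeline
    from wowool.common.pipeline import UUID

    components = [
        UUID(name, options) if options else UUID(name)
        for name, options in normalize_steps(steps)
    ]
    return Pipeline(components)


steps = ["english", "entity", ("topics.app", {"count": 3})]
print(normalize_steps(steps))
```

With the SDK installed, `build_pipeline(steps)` should give a pipeline comparable to the one above, apart from the `anaphora` option, which this step list omits.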
Custom domain
The script identifies the word "car" as a Vehicle entity in the sentence "I have a car." using custom domain rules and language processing.
For more information on writing rules, see https://www.wowool.com/docs/nlp/matching-&-capturing
from wowool.sdk import Language, Domain
from wowool.document import Document
english = Language("english")
vehicle = Domain(source="rule:{ 'car'} = Vehicle;")
doc = vehicle(english(Document("I have a car.")))
for entity in doc.entities:
    print(entity)
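The rule source in this example follows a simple pattern: a quoted literal in braces, then the concept name. As a hedged sketch, the string can be generated for other words. `literal_rule` is a hypothetical helper, and the restrictions it applies (no quotes in the word, identifier-like concept names) are conservative assumptions rather than documented limits of the rule language.

```python
import re


def literal_rule(word: str, concept: str) -> str:
    """Build a rule source that tags one literal word with a concept name."""
    if "'" in word or not word.strip():
        raise ValueError(f"unsupported literal: {word!r}")
    # Assumption: keep concept names to plain identifiers to stay on the safe side.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", concept):
        raise ValueError(f"unsupported concept name: {concept!r}")
    return f"rule:{{ '{word}'}} = {concept};"


print(literal_rule("car", "Vehicle"))  # rule:{ 'car'} = Vehicle;
```

The returned string has the same form as the `source` argument passed to `Domain` in the example above.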
Using the language identifier
This script demonstrates how to use the LanguageIdentifier to detect the language of a document.
from wowool.sdk import LanguageIdentifier
document = """
Un été de tous les records de chaleur en France.
Record de chaleur battu dans une cinquantaine de villes en France
"""
# Initialize a language identification engine
lid = LanguageIdentifier()
# Process the data
doc = lid(document)
print(doc.language)
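A common follow-up is to pick a pipeline based on the detected language. The sketch below assumes that `doc.language` yields a language name in the same form used in pipeline strings (for example "dutch"); this is an assumption and should be checked against the value actually printed above. `PIPELINE_BY_LANGUAGE` and `pipeline_for` are hypothetical names.

```python
from typing import Optional

# Hypothetical mapping from detected language names to pipeline strings.
PIPELINE_BY_LANGUAGE = {
    "english": "english,entity",
    "dutch": "dutch,entity",
}


def pipeline_for(language: Optional[str]) -> Optional[str]:
    """Return the pipeline string for a detected language, or None if unsupported."""
    if not language:
        return None
    return PIPELINE_BY_LANGUAGE.get(language.strip().lower())


for detected in ("dutch", "french", None):
    print(detected, "->", pipeline_for(detected))
```

Returning None for unsupported languages lets the caller decide whether to skip the document or fall back to a default pipeline.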
Extract Dutch entities
This script demonstrates how to perform basic entity analysis on a Dutch sentence using the Wowool SDK.
First install the Dutch language module: pip install wowool-dutch
from wowool.sdk import Pipeline
from wowool.document import Document
entities = Pipeline("dutch,entity")
document = entities(Document("Mark Van Den Berg werkte als hoofdarts bij Omega Pharma."))
for sentence in document.sentences:
    for entity in sentence.entities:
        print(entity)
Using the language identifier on multi-language documents
This script demonstrates how to use the LanguageIdentifier to detect the different language sections in a multi-language text document.
from wowool.sdk import LanguageIdentifier
document = """
La juventud no es más que un estado de ánimo.
Record de chaleur battu dans une cinquantaine de villes en France
"""
# Initialize a language identification engine
lid = LanguageIdentifier(sections=True, section_data=True)
# Process the data
doc = lid(document)
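if lid_results := doc.lid:
    for section in lid_results.sections:
        assert section.text
        # Strip newlines outside the f-string: backslashes inside f-string expressions are a syntax error before Python 3.12
        snippet = section.text[:20].strip("\n")
        print(f"({section.begin_offset},{section.end_offset}): language={section.language} text={snippet}...")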
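Once the sections are available, they can be grouped by language, for instance to send each language to its own pipeline. This sketch relies only on the `language` and `text` attributes used in the script above. The `SimpleNamespace` objects are stand-ins so the example runs without the SDK, and the language values "spanish" and "french" are assumed, not taken from real output. `text_by_language` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace


def text_by_language(sections):
    """Collect the stripped text of each section under its detected language."""
    grouped = defaultdict(list)
    for section in sections:
        grouped[section.language].append(section.text.strip())
    return dict(grouped)


# Stand-in sections; with the SDK these would come from doc.lid.sections.
sections = [
    SimpleNamespace(language="spanish", text="La juventud no es más que un estado de ánimo.\n"),
    SimpleNamespace(language="french", text="Record de chaleur battu dans une cinquantaine de villes en France\n"),
]

for language, texts in text_by_language(sections).items():
    print(language, texts)
```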
License
For both non-commercial and commercial use, you need to acquire a license file at https://www.wowool.com
Non-Commercial
This library is licensed under the GNU AGPLv3 for non-commercial use.
For commercial use, a separate license must be purchased.
Commercial License Terms
1. Grants the right to use this library in proprietary software.
2. Requires a valid license key.
3. Redistribution in SaaS requires a commercial license.