Lightweight NLP components for semantic processing of domain-specific content.

Description

Structured data in technical domains (e.g. engineering, meteorology) often contains specialized terminology, measurement units, parameter specifications, and symbolic values. These elements pose a challenge for purely embedding-based similarity methods, which offer limited semantic resolution for such fine-grained, symbolic content.

This package therefore follows a hybrid approach: rule-based processing, NLP-based filtering, and embeddings can be combined so that domain-specific entities are identified and organized across multiple levels of abstraction. This enables interpretable and reproducible retrieval workflows.

The package provides lightweight components that integrate into existing NLP pipelines. These components are designed to work without relying on large language models (LLMs) and to structure relevant data using deterministic, auditable mechanisms.

Additional modules are planned to support structured query generation, including:

  • Semantic Logic Composer: Parses natural-language input and produces a logical structure enriched with extracted entities. This structure can be used as a basis for formats such as SQL, JSON or YAML.
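Since this module is still planned, the following is purely a hypothetical sketch of what such an entity-enriched logical structure could look like once serialized; all field names are invented for illustration and are not the module's actual schema:

```python
import json

# Hypothetical logical structure for a query like
# "planets with a radius above 6000 km".
# Field names ("select", "where", "field", ...) are invented, not the
# Semantic Logic Composer's actual output format.
logical_structure = {
    "select": "planets",
    "where": [
        {"field": "radius", "operator": ">", "value": 6000, "unit": "km"},
    ],
}

# A structure like this can be serialized to exchange formats such as JSON,
# or translated into SQL or YAML by downstream tooling.
print(json.dumps(logical_structure, indent=2))
```

The point is that the logical structure, not the natural-language input, becomes the stable interface for downstream query generation.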

Structured NLP Workflow

The following figures illustrate the core motivation and design focus of this package. They outline the typical stages of a structured NLP pipeline and highlight the specific components where this package provides support.

[Figure: Retrieval Process]

This conceptual overview serves as a foundation for understanding the individual components, which are detailed in the next section.

License Agreement

Seanox Software Solutions is an open-source project, hereinafter referred to as Seanox.

This software is licensed under the Apache License, Version 2.0.

Copyright (C) 2025 Seanox Software Solutions

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

System Requirement

  • Python 3.10 or higher

Installation & Setup

pip install seanox-ai-nlp

Packages & Modules

units

The units module applies rule-based, deterministic pattern recognition to identify numerical expressions and measurement units in text. It is designed for integration into lightweight NLP pipelines and does not rely on large language models (LLMs). Its language-agnostic architecture and flexible formatting support a broad range of use cases, including general, semi-technical and semi-academic content.

The module can be integrated with tools such as spaCy’s EntityRuler, enabling annotation, filtering, and token alignment workflows. It produces structured output suitable for downstream analysis, without performing semantic interpretation.

Features

  • Pattern-based extraction
    Identifies constructs like 5 km, -20 °C, or 1000 hPa using regular expressions and token patterns -- no training required.
  • Language-independent architecture
    Operates at token and character level; applicable across multilingual content.
  • Support for compound expressions
    Recognizes unit combinations (km/h, kWh/m², g/cm³) and numerical constructs involving signs and operators: ±, ×, ·, :, /, ^, – and more.
  • Integration-ready output
    Returns structured entities compatible with tools like spaCy’s EntityRuler.
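The module's actual pattern set is considerably more elaborate; as a rough, stand-alone sketch of the rule-based idea -- with an invented pattern and output format, not the package's own -- extraction might look like:

```python
import re

# Minimal illustrative pattern: optional sign, number with optional decimal
# part, then a unit token that may contain slashes, middle dots, or exponents
# (km/h, g/cm³). This is NOT the package's actual pattern set, just a sketch.
UNIT_PATTERN = re.compile(
    r"(?P<value>[+-]?\d+(?:[.,]\d+)?)\s*"
    r"(?P<unit>[A-Za-zµ°Ω%]+(?:[/·^][A-Za-z0-9²³]+)*)"
)

def extract_units(text):
    """Return (value, unit, span) tuples found by the sketch pattern."""
    return [
        (m.group("value"), m.group("unit"), m.span())
        for m in UNIT_PATTERN.finditer(text)
    ]

print(extract_units("Cruising speed is approximately 900 km/h at -20 °C."))
```

Because the matching is purely rule-based, the same input always yields the same entities and character spans, which is what makes this approach auditable.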

Quickstart

from seanox_ai_nlp.units import units

# Detect numeric values and measurement units in the text
text = "The cruising speed of the Boeing 747 is approximately 900 km/h (559 mph)."
for entity in units(text):
    print(entity)

synthetics

The synthetics module generates annotated natural language from structured input data -- such as records from databases or knowledge graphs. It uses template-based, rule-driven methods to produce controlled and annotated sentences. Designed for deterministic NLP pipelines, it avoids large language models (LLMs) and supports reproducible generation.

Features

  • Template-Based Text Generation
    Produces natural-language output from structured input using YAML-defined Jinja2 templates. Template selection is context-sensitive.
  • Stochastic Variation
    Filters such as random_set, random_range, and random_range_join_phrase introduce lexical and syntactic diversity from identical data structures.
  • Domain-Specific Annotation
    Annotates entities with structured markers for precise extraction and control.
  • Rule-Based Span Detection
    Identifies semantic spans using regular expressions, independent of tokenization or parsing.
  • Interpretation-Free Generation
    Output is deterministic and reproducible; no semantic analysis is performed.
  • NLP Pipeline compatibility
    The Synthetic object includes raw and annotated text, entity spans and regex-based semantic spans. Compatible with spaCy-style frameworks for fine-tuning, evaluation, and augmentation.
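The package's YAML/Jinja2 template format is not reproduced here; as a minimal stdlib-only sketch of the underlying idea -- context-sensitive template selection plus seeded, reproducible variation -- with invented templates and field names:

```python
import random
from string import Template

# Invented templates and record fields -- not the package's actual
# YAML/Jinja2 template format.
TEMPLATES = [
    Template("$name orbits the Sun at a mean distance of $distance_au AU."),
    Template("With a mean distance of $distance_au AU, $name circles the Sun."),
]

def render(record, seed):
    """Pick a template deterministically from the seed, then fill it in."""
    rng = random.Random(seed)      # seeded RNG -> reproducible variation
    template = rng.choice(TEMPLATES)
    return template.substitute(record)

record = {"name": "Mars", "distance_au": "1.52"}
print(render(record, seed=42))
print(render(record, seed=42) == render(record, seed=42))  # prints True
```

Varying the seed varies the surface form while the underlying data stays fixed, which is the property that makes generated training data both diverse and reproducible.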

Quickstart

from seanox_ai_nlp.synthetics import synthetics
import json

# Load structured records (e.g. planet data) from a JSON file
with open("synthetics-planets_en.json", encoding="utf-8") as file:
    records = json.load(file)

# Generate an annotated sentence for each record using the YAML template
for record in records:
    synthetic = synthetics(".", "synthetics_en_annotate.yaml", record)
    print(synthetic)

Changes

1.3.0 20251001

BF: Python: Corrections/optimizations of dependencies
BF: synthetics: Correction for empty templates / missing segments
BF: synthetics: Consistent use of the parameter pattern for RegEx in spans
CR: Python: Increased the requirement to Python 3.10 or higher
CR: synthetics: Added schema and validation for template YAML
CR: synthetics: Added custom filters for template rendering
CR: synthetics: Template section span-regex: added support for labels

Contact

  • Issues
  • Requests
  • Mail

Download files

Download the file for your platform.

Source Distribution

seanox_ai_nlp-1.3.0.tar.gz (203.8 kB)

Uploaded Source

Built Distribution

seanox_ai_nlp-1.3.0-py3-none-any.whl (72.5 kB)

Uploaded Python 3

File details

Details for the file seanox_ai_nlp-1.3.0.tar.gz.

File metadata

  • Download URL: seanox_ai_nlp-1.3.0.tar.gz
  • Upload date:
  • Size: 203.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for seanox_ai_nlp-1.3.0.tar.gz

  • SHA256: 525427a99390e4ba66cafc8513200873634d471dacbc746d7f52bdfc6f56b3d4
  • MD5: 0870dee82ce79bce68b190b2aed553b2
  • BLAKE2b-256: 0fc2bb6c223b3e83e1cc5f53f719a77e160c37cebc61e8ad033a14ab91cd1ee8

File details

Details for the file seanox_ai_nlp-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: seanox_ai_nlp-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 72.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for seanox_ai_nlp-1.3.0-py3-none-any.whl

  • SHA256: 0e8109f38405bceda34b8e0aaa3f48c9688f9aed06bb6e80807ae6e95ba34a65
  • MD5: 7898c5dcd8fa89218de8cc9899a8817c
  • BLAKE2b-256: 86152a22a81053ba831d49a486165708de76516683534783e8adb52c3cbc4281
