Semantic NLP intelligence toolkit — encoding, embeddings, GPU/CPU device handling, and reusable inference interfaces.
defenx-nlp
Lightweight semantic NLP building blocks for Python.
defenx-nlp gives you one interface for text embeddings, semantic retrieval,
prototype-based inference, and simple end-to-end NLP pipelines. It is designed
for developers who want production-friendly primitives without rewriting the same
boilerplate in every project.
What It Does
The package currently covers four layers:
- SemanticEncoder: a backend-driven embedding facade for local transformer models.
- SemanticSearchEngine: semantic indexing and retrieval over embedded documents.
- PrototypeInferenceEngine: lightweight embedding-based classification and scoring.
- NLPipeline: preprocessing -> encode -> infer orchestration with structured output.
This makes the project useful for:
- support ticket routing
- internal knowledge search
- FAQ and help center retrieval
- anomaly or incident scoring
- semantic deduplication and clustering
- retrieval-augmented backends
Who Uses It
This is primarily a developer library, not a direct end-user application.
Typical users are:
- Python backend developers
- ML engineers building semantic features
- support tooling teams
- security/SOC teams experimenting with event similarity
- teams building internal search or classification workflows
End users would normally interact with it indirectly inside:
- a FastAPI or Flask service
- a chatbot or RAG system
- a support desk platform
- an admin dashboard
- a data processing or analytics job
Architecture
See docs/architecture.png for a diagram of how the encoder, retrieval, inference, and pipeline layers fit together.
Installation
Standard install
```shell
pip install defenx-nlp
```
This installs the package and its core dependencies for a normal CPU workflow.
CUDA install
If you want a CUDA-enabled PyTorch build, reinstall torch with the matching
wheel after installing the package:
```shell
pip install defenx-nlp
pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu128
```
Development install
```shell
git clone https://github.com/defenx-sec/defenx-nlp.git
cd defenx-nlp
pip install -e ".[dev]"
```
Quick Start
1. Encode text
```python
from defenx_nlp import SemanticEncoder

enc = SemanticEncoder()

embedding = enc.encode("Neural networks are useful for semantic search.")
print(embedding.shape)  # (384,)

embeddings = enc.encode_batch(["hello", "goodbye", "help me"])
print(embeddings.shape)  # (3, 384)
```
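Because the encoder returns float32 NumPy arrays, comparing two embeddings reduces to plain NumPy math. A minimal sketch of the cosine similarity that the bundled `cosine_similarity` helper computes (toy vectors stand in for real embeddings; this is an illustration, not the library's implementation):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy vectors standing in for real 384-dimensional embeddings.
a = np.array([1.0, 0.0, 1.0], dtype=np.float32)
b = np.array([1.0, 1.0, 0.0], dtype=np.float32)
print(cosine_similarity(a, b))  # 0.5 for these toy vectors
```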
2. Semantic retrieval
```python
from defenx_nlp import SemanticEncoder, SemanticSearchEngine

enc = SemanticEncoder()
search = SemanticSearchEngine(enc)

search.index(
    [
        "Reset your password",
        "Check your latest invoice",
        "Troubleshoot login issues",
    ]
)

results = search.search("I cannot sign in to my account", top_k=2)
for match in results:
    print(match.rank, round(match.score, 3), match.document.text)
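Under the hood, a NumPy-backed index like NumpyVectorIndex typically ranks documents by cosine similarity against the query embedding. The library's exact implementation isn't shown here; a self-contained sketch of that ranking step, assuming L2-normalized dot products:

```python
import numpy as np

def top_k(query: np.ndarray, index: np.ndarray, k: int = 2):
    """Rank indexed rows by cosine similarity to the query; return (row, score) pairs."""
    # L2-normalize rows and query so the dot product equals cosine similarity.
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = index_n @ query_n
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Three toy document embeddings, then a query leaning toward the first axis.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype=np.float32)
print(top_k(np.array([1.0, 0.1], dtype=np.float32), docs))
```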
3. Prototype-based classification
```python
from defenx_nlp import SemanticEncoder, PrototypeInferenceEngine

enc = SemanticEncoder()

engine = PrototypeInferenceEngine.from_texts(
    enc,
    {
        "support": ["reset password", "cannot log in", "account help"],
        "billing": ["charged twice", "refund request", "invoice issue"],
    },
)

prediction = engine.infer(enc.encode("please help me reset my login"))
print(prediction.label)
print(prediction.score)
```
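Prototype-based classification is conceptually simple: average each label's example embeddings into one prototype vector, then score new inputs by cosine similarity against the prototypes. This sketch illustrates the idea with toy 2-D vectors; the function names and exact scoring are assumptions, not the library's code:

```python
import numpy as np

def build_prototypes(examples: dict) -> dict:
    """One prototype per label: the mean of its example embeddings."""
    return {label: vecs.mean(axis=0) for label, vecs in examples.items()}

def classify(embedding: np.ndarray, prototypes: dict):
    """Return (label, cosine score) of the nearest prototype."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {label: cos(embedding, p) for label, p in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

protos = build_prototypes({
    "support": np.array([[1.0, 0.0], [0.9, 0.1]], dtype=np.float32),
    "billing": np.array([[0.0, 1.0], [0.1, 0.9]], dtype=np.float32),
})
print(classify(np.array([0.8, 0.2], dtype=np.float32), protos))
```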
4. Run a simple pipeline
```python
from defenx_nlp import (
    NLPipeline,
    PreprocessingConfig,
    PrototypeInferenceEngine,
    SemanticEncoder,
)

enc = SemanticEncoder()

inference = PrototypeInferenceEngine.from_texts(
    enc,
    {
        "support": ["reset password", "login problem"],
        "billing": ["refund request", "invoice problem"],
    },
)

pipeline = NLPipeline(
    enc,
    inference_engine=inference,
    preprocessing_config=PreprocessingConfig(lowercase=True),
)

result = pipeline.run("HELP! I cannot access my account.")
print(result.processed_text)
print(result.prediction.label)
```
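The preprocessing step is the pipeline's first stage. The real clean_text/truncate helpers may differ; as an illustration only, a stand-in with the same lowercase/truncation knobs could look like this (the `Preprocess` class and `max_chars` field are assumptions, not the actual PreprocessingConfig):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Preprocess:
    """Illustrative stand-in for the pipeline's preprocessing step."""
    lowercase: bool = True
    max_chars: Optional[int] = None

    def __call__(self, text: str) -> str:
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        if self.lowercase:
            text = text.lower()
        if self.max_chars is not None:
            text = text[: self.max_chars]  # naive character truncation
        return text

pre = Preprocess(lowercase=True)
print(pre("HELP!   I cannot access my account."))  # help! i cannot access my account.
```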
Why Use This Instead Of Raw sentence-transformers?
You can absolutely use sentence-transformers directly. This project becomes
helpful when you want a cleaner application-facing layer around embeddings.
| Problem | Raw sentence-transformers | defenx-nlp |
|---|---|---|
| Device selection | You handle CPU/CUDA/MPS decisions yourself | get_device() is built in |
| Service-friendly facade | Model code leaks into app logic | SemanticEncoder keeps a stable interface |
| Retrieval layer | You wire indexing and ranking yourself | SemanticSearchEngine is ready to use |
| Simple classifier | You build your own prototype scoring | PrototypeInferenceEngine is included |
| End-to-end flow | You orchestrate each step manually | NLPipeline returns structured results |
| Output consistency | Mix of tensors/arrays depending on flags | Returns float32 NumPy arrays |
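The exact logic of get_device() isn't reproduced here, but a typical preference order for such a helper is CUDA, then Apple MPS, then CPU. A hedged sketch (falls back to CPU when torch is absent):

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU; CPU if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```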
API Summary
| Symbol | Description |
|---|---|
| SemanticEncoder | Main embedding facade |
| SemanticSearchEngine | Document indexing and semantic retrieval |
| NumpyVectorIndex | NumPy-based cosine similarity index |
| FaissVectorIndex | Optional FAISS-backed vector index |
| PrototypeInferenceEngine | Prototype-based classifier/scoring engine |
| NLPipeline | Preprocess -> encode -> infer pipeline |
| EncoderConfig | Backend configuration object |
| PreprocessingConfig | Cleaning/truncation config for the pipeline |
| DocumentRecord | Structured retrieval document |
| SearchResult | Ranked retrieval result |
| Prediction | Structured inference output |
| PipelineResult | Structured pipeline output |
| clean_text, batch_clean, truncate | Preprocessing helpers |
| cosine_similarity, batch_cosine_similarity | Similarity helpers |
| normalize_embedding, normalize_batch | L2 normalization helpers |
Full API docs: docs/api_reference.md
Backends
The default backend is sentence-transformers.
The package also exports backend contracts for future extension:
- SentenceTransformerBackend: implemented and production-usable
- OnnxEncoderBackend: interface stub, not implemented yet
- APIEncoderBackend: interface stub, not implemented yet
The ONNX and remote API backends are interface stubs; treat them as experimental until they perform real inference.
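The backend contract in interfaces.py isn't reproduced here; a minimal sketch of what such a contract typically looks like, with the protocol name, method signature, and dummy backend all being illustrative assumptions:

```python
from typing import Protocol, Sequence
import numpy as np

class EncoderBackend(Protocol):
    """Hypothetical backend contract: texts in, float32 vectors out."""
    def encode(self, texts: Sequence[str]) -> np.ndarray: ...

class DummyBackend:
    """Toy backend returning fixed-size vectors; a real backend wraps a model."""
    dim = 4

    def encode(self, texts: Sequence[str]) -> np.ndarray:
        # Deterministic toy embedding built from each text's leading byte values.
        out = np.zeros((len(texts), self.dim), dtype=np.float32)
        for i, t in enumerate(texts):
            codes = np.frombuffer(t.encode("utf-8"), dtype=np.uint8)
            out[i, : min(self.dim, len(codes))] = codes[: self.dim]
        return out

backend: EncoderBackend = DummyBackend()
print(backend.encode(["hi", "hello"]).shape)  # (2, 4)
```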
Examples
```shell
python examples/basic_usage.py
python examples/batch_encoding.py
python examples/v2_pipeline.py
```
Testing
```shell
pytest tests -v
```
The test suite contains both:
- pure local unit tests for retrieval, inference, and pipeline logic
- integration-style encoder tests that require the default model to be locally available or downloadable
If the environment cannot reach Hugging Face and the model is not cached, the integration tests skip instead of failing the entire local test run.
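The suite's exact skip mechanism isn't documented here; one common pattern is to probe connectivity and mark the integration module with pytest's skipif. A sketch, where the probe function and its use are assumptions:

```python
import socket

def hf_reachable(host: str = "huggingface.co", timeout: float = 1.0) -> bool:
    """Best-effort connectivity probe used to decide whether to skip model tests."""
    try:
        socket.create_connection((host, 443), timeout=timeout).close()
        return True
    except OSError:
        return False

# In an integration test module one might then write (pytest assumed):
# pytestmark = pytest.mark.skipif(
#     not hf_reachable(), reason="model not cached and Hugging Face unreachable"
# )
print(hf_reachable())
```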
Project Structure
```
defenx-nlp/
|-- defenx_nlp/
|   |-- __init__.py
|   |-- backends.py
|   |-- device.py
|   |-- encoder.py
|   |-- inference.py
|   |-- interfaces.py
|   |-- pipeline.py
|   |-- preprocessing.py
|   |-- retrieval.py
|   |-- schemas.py
|   `-- utils.py
|-- docs/
|   |-- api_reference.md
|   `-- architecture.png
|-- examples/
|   |-- basic_usage.py
|   |-- batch_encoding.py
|   `-- v2_pipeline.py
|-- tests/
|   |-- test_encoder.py
|   `-- test_v2.py
|-- pyproject.toml
`-- README.md
```
Roadmap
Good next milestones for the project:
- implement the ONNX backend
- implement a real API embedding backend
- add persistence helpers for vector indexes
- add FastAPI service examples
- expand benchmark coverage for CPU vs CUDA vs FAISS
- publish hosted documentation
License
MIT. See LICENSE.