Natural Language Question Answering Toolkit (Hybrid NLP + GenAI)

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

anirbansarkarq

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

NLQcat

Natural Language Question Answering Toolkit

Bridging the gap between Linguistic Analysis, Semantic Search, and Generative AI.

📖 Overview

NLQcat is a production-ready, hybrid NLP + GenAI library designed to unify classic linguistic analysis (spaCy) with modern semantic search (Vector Databases) and Large Language Models (LLMs).

Unlike purely generative frameworks, NLQcat offers a grounded approach where linguistic structure (POS tagging, NER) informs and refines semantic retrieval, leading to more accurate and context-aware RAG (Retrieval-Augmented Generation) pipelines.

Whether you are building a local document Q&A bot, a complex semantic search engine, or an intelligent agent, NLQcat provides the modular building blocks to get you there fast.

✨ Features

🧠 Hybrid Intelligence: Seamlessly blends symbolic NLP (spaCy) with neural embeddings (SentenceTransformers).
🚀 Unified Pipeline: A single Pipeline class to manage ingestion, analysis, retrieval, and generation.
🔌 Plug-and-Play Vector Stores: Integrated support for ChromaDB, with extensible interfaces for FAISS and Pinecone.
🤖 LLM Agnostic: Built-in support for OpenAI GPT models, with a flexible LLMBase for easy integration of HuggingFace or local LLMs.
🔍 Deep Linguistic Analysis: Extract entities, linguistic tokens, and POS tags to filter or re-rank semantic search results.
🛠️ Production Ready: Type-hinted, modular architecture designed for scalability and maintainability.

📦 Installation

Install NLQcat via pip:

pip install nlqcat

Download the required spaCy model (default):

python -m spacy download en_core_web_sm

🚀 Quick Start

Get a RAG pipeline running in three steps:

from nlqcat.core.pipeline import Pipeline

# 1. Initialize Pipeline with Vector Store (ChromaDB default)
pipe = Pipeline(vector_store_type="chroma")

# 2. Add some knowledge
pipe.add_documents([
    "NLQcat combines linguistic NLP with semantic RAG.",
    "It supports ChromaDB, FAISS, and OpenAI integration."
])

# 3. Ask a question! (Retrieval Only)
result = pipe.query("What does NLQcat support?")
print(result['retrieved_docs'])

Want Generative Answers? Configure an LLM:

from nlqcat.core.pipeline import Pipeline
from nlqcat.models.openai_llm import OpenAILLM

# Initialize LLM
llm = OpenAILLM(api_key="your-openai-key")

# Attach to Pipeline
pipe = Pipeline(vector_store_type="chroma", llm=llm)

# Query
answer = pipe.query("Explain NLQcat's architecture.")['answer']
print(answer)

📚 Full Usage Guide

1. The Core Pipeline

The Pipeline class is the heart of NLQcat. It orchestrates the flow of data between the NLP analyzer, Vector Store, and LLM.

from nlqcat.core.pipeline import Pipeline

pipe = Pipeline(
    enable_spacy=True,          # Enable linguistic analysis
    vector_store_type="chroma", # 'chroma', 'faiss', 'pinecone' or None
    vector_store_path="./db",   # Persistence path
    llm=my_llm_instance         # Optional: an LLM instance created beforehand (e.g. OpenAILLM)
)

2. Working with Vector Stores

NLQcat supports modular vector stores. If you need a specific configuration, instantiate the store directly or let the pipeline handle it.

Supported Stores:

ChromaStore (Default, excellent for local dev & prod)
FaissStore (Fast, in-memory)
PineconeStore (Managed cloud vector DB)

from nlqcat.core.pipeline import Pipeline

# Automatic (Recommended)
pipe = Pipeline(vector_store_type="chroma", vector_store_path="./my_chroma_db")

# Manual
from nlqcat.vector_store.chroma_store import ChromaStore
store = ChromaStore(path="./custom_db")
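To make the adapter idea concrete, here is a minimal sketch of what a pluggable store could look like. The `VectorStore` protocol and `InMemoryStore` class are hypothetical names invented for this example and are not NLQcat's actual interface; the sketch only shows the general shape of "add vectors, query by similarity".

```python
import math
from typing import List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical adapter interface; NLQcat's real base class may differ."""

    def add(self, texts: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...

    def query(self, vector: Sequence[float], k: int = 3) -> List[Tuple[str, float]]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force store: keeps every vector in a list and scans it on query."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add(self, texts, vectors) -> None:
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        self._rows.extend((t, list(v)) for t, v in zip(texts, vectors))

    def query(self, vector, k=3):
        scored = [(text, _cosine(vector, row)) for text, row in self._rows]
        # Most similar first, truncated to the k best matches.
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store: VectorStore = InMemoryStore()
store.add(["about cats", "about code"], [[1.0, 0.0], [0.0, 1.0]])
print(store.query([0.9, 0.1], k=1))
```

Any class with the same two methods satisfies the protocol, which is the property that lets a pipeline swap one backend for another.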

3. Linguistic Analysis (NLP)

Access standard spaCy features conveniently through the unified NLP class.

# Initialize
from nlqcat.core.pipeline import Pipeline

pipe = Pipeline(enable_spacy=True)
doc = pipe.nlp.analyze("Apple is looking at buying U.K. startup for $1 billion")

# 1. Tokens & POS Tags
print(doc.tokens)   # ['Apple', 'is', 'looking', ...]
print(doc.pos_tags) # [('Apple', 'PROPN'), ('is', 'AUX'), ...]

# 2. Named Entities (NER)
print(doc.entities) 
# [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

# 3. Dependency Parsing
for dep in doc.dependencies:
    print(f"{dep['text']} --[{dep['dep']}]--> {dep['head']}")
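The feature list mentions using entities to re-rank search results. The sketch below shows one simple way that could work, assuming entities arrive as `(text, label)` tuples like the `doc.entities` output above and that each candidate already has a retrieval score. `rerank_by_entities` and its `boost` parameter are hypothetical and not part of NLQcat.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, str]  # (text, label), mirroring the doc.entities shape shown above


def rerank_by_entities(
    query_entities: Set[Entity],
    candidates: List[Tuple[str, float, Set[Entity]]],
    boost: float = 0.1,
) -> List[Tuple[str, float]]:
    """Add `boost` to a candidate's score for every entity it shares with the query.

    Hypothetical helper for illustration; not part of the NLQcat API.
    """
    rescored = [
        (text, score + boost * len(query_entities & entities))
        for text, score, entities in candidates
    ]
    # Highest adjusted score first; sorted() is stable, so ties keep input order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


query_entities = {("Apple", "ORG")}
candidates = [
    ("Fruit prices rose this quarter.", 0.62, set()),
    ("Apple is buying a U.K. startup.", 0.58, {("Apple", "ORG"), ("U.K.", "GPE")}),
]
ranked = rerank_by_entities(query_entities, candidates)
print(ranked[0][0])
```

The second candidate starts with a lower semantic score but moves to the top because it shares the `ORG` entity with the query.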

4. Semantic Features & GenAI

NLQcat makes complex semantic operations simple.

Embeddings & Similarity

Generate vector embeddings and calculate cosine similarity.

from nlqcat.core.pipeline import Pipeline

pipe = Pipeline()

# Generate Embeddings
text = "Artificial Intelligence is transforming the world."
emb = pipe.nlp.embed(text)
print(f"Dimensions: {emb.shape}")

# Calculate Similarity
score = pipe.nlp.similarity("I love coding", "Programming is my passion")
print(f"Similarity Score: {score:.4f}") # High score (e.g., 0.85)
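For readers who want to see what a cosine similarity score measures, here is a dependency-free sketch of the formula, dot(a, b) / (|a| · |b|). It is a generic illustration using plain Python lists; how NLQcat computes the score internally is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(theta) between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors, close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors, 0
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other vectors.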

Text Summarization

Built-in abstractive/extractive summarization (defaulting to simple heuristics or configurable models).

long_text = "Deep learning is part of a broader family of machine learning methods..."
summary = pipe.nlp.summarize(long_text)
print(summary)
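As an illustration of what a "simple heuristic" extractive summarizer can look like, the sketch below scores each sentence by the average frequency of its words and keeps the best ones. This is a generic frequency-based technique chosen for the example; `extractive_summary` is a hypothetical function, and NLQcat's own heuristic may work differently.

```python
import re
from collections import Counter
from typing import List


def extractive_summary(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences: List[str] = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)


sample = (
    "Neural networks learn patterns. "
    "Deep neural networks stack many layers of neural units. "
    "The weather was nice."
)
print(extractive_summary(sample))
```

Extractive methods like this only select existing sentences; they never generate new wording, which is the job of an abstractive model.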

Clustering

Cluster sentences based on semantic meaning using K-Means.

sentences = [
    "The cat sits on the mat.", "Dogs are great pets.", # Animals
    "Python is a language.", "Java is verbose."         # Coding
]

clusters = pipe.nlp.cluster(sentences)
# Returns: {0: ['The cat...', 'Dogs...'], 1: ['Python...', 'Java...']}
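The grouping step can be sketched without any dependencies. The example below runs K-Means (Lloyd's algorithm) on toy 2-D vectors that stand in for real sentence embeddings; the `kmeans` function and its farthest-point initialization are choices made for this illustration, not NLQcat's implementation.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster label for each point using Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


sentences = [
    "The cat sits on the mat.", "Dogs are great pets.",
    "Python is a language.", "Java is verbose.",
]
# Toy 2-D vectors standing in for real sentence embeddings.
vectors = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
labels = kmeans(vectors, k=2)

clusters: Dict[int, List[str]] = {}
for sentence, label in zip(sentences, labels):
    clusters.setdefault(label, []).append(sentence)
print(clusters)
```

K-Means is sensitive to its starting centroids, which is why the sketch uses a deterministic initialization instead of picking the first `k` points.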

🧩 Architecture

The NLQcat architecture follows a clean Layered Pattern:

Core Layer (nlqcat.core): Contains the Pipeline orchestrator and RAG logic.
Semantic Layer (nlqcat.semantic): Handles Embeddings (SentenceTransformers) and Similarity calculations.
Vector Store Layer (nlqcat.vector_store): Adapters for different vector databases.
Model Layer (nlqcat.models): Wrappers for LLMs (OpenAI, etc.).

flowchart TD
    U[User Query] --> P[Pipeline]

    P --> S[spaCy NLP<br/>Tokens / POS / NER]
    P --> E[Embedder<br/>Sentence Transformers]

    E --> V[(Vector Store)]
    V --> R[Retrieved Context]

    P --> L[LLM]

    R --> L
    S --> F[Entity / Metadata Filters]
    F --> V

    L --> A[Final Answer]
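The flow in the diagram can be traced with a toy, dependency-free stand-in. `TinyPipeline`, the bag-of-words `embed` function, and the stub LLM below are all hypothetical and exist only to show the order of operations (embed, retrieve, build a prompt, generate); they are not NLQcat's `Pipeline`. The result keys `retrieved_docs` and `answer` follow the Quick Start examples.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TinyPipeline:
    """Hypothetical stand-in that mirrors the flowchart, not NLQcat's Pipeline."""

    def __init__(self, llm: Optional[Callable[[str], str]] = None) -> None:
        self.llm = llm
        self.docs: List[str] = []

    def add_documents(self, docs: List[str]) -> None:
        self.docs.extend(docs)

    def query(self, question: str, k: int = 1) -> Dict[str, object]:
        q = embed(question)
        # Retrieval: rank stored documents by similarity to the question.
        ranked = sorted(self.docs, key=lambda d: cosine(q, embed(d)), reverse=True)
        result: Dict[str, object] = {"retrieved_docs": ranked[:k]}
        if self.llm is not None:
            # Generation: hand the retrieved context and the question to the LLM.
            context = "\n".join(ranked[:k])
            prompt = f"Context:\n{context}\n\nQuestion: {question}"
            result["answer"] = self.llm(prompt)
        return result


pipe = TinyPipeline(llm=lambda prompt: f"(stub answer from {len(prompt)} prompt chars)")
pipe.add_documents(["Chroma stores vectors on disk.", "spaCy tags parts of speech."])
out = pipe.query("Where are vectors stored?")
print(out["retrieved_docs"])
```

Without an `llm`, the same `query` call returns only the retrieved documents, which matches the retrieval-only mode shown in the Quick Start.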

⚙️ Configuration

NLQcat respects standard environment variables.

Variable	Description
`OPENAI_API_KEY`	Required if using `OpenAILLM` without passing key explicitly.
`HUGGINGFACE_TOKEN`	Required for some gated HuggingFace models (future support).
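The "explicit key first, environment variable second" behavior described in the table can be sketched as follows. `resolve_api_key` is a hypothetical helper written for this example; it is not a function exported by NLQcat.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicitly passed key, then fall back to OPENAI_API_KEY."""
    env = os.environ if env is None else env
    key = explicit or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one explicitly or set OPENAI_API_KEY")
    return key


# A plain dict stands in for the process environment in this demo.
print(resolve_api_key(env={"OPENAI_API_KEY": "sk-from-env"}))
```

Reading the key from the environment keeps secrets out of source files, which is generally safer than hard-coding them.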

🧪 Advanced Concepts

Custom LLMs

You can plug in any LLM by inheriting from LLMBase.

from nlqcat.core.pipeline import Pipeline
from nlqcat.models.llm_base import LLMBase

class MyCustomLLM(LLMBase):
    def generate(self, prompt: str, **kwargs) -> str:
        return "This is a dummy response based on " + prompt

pipe = Pipeline(llm=MyCustomLLM())

Hybrid Filtering (Roadmap)

Future versions will allow using spaCy entities to automatically filter vector search results (e.g., "Show me documents about Elon Musk" -> Filter metadata person="Elon Musk").
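Since this feature is still on the roadmap, the following is only a sketch of the idea, not a preview of the eventual API. It assumes entities are `(text, label)` tuples as produced by the linguistic analysis step and that each document carries a `metadata` dict; `LABEL_TO_FIELD`, `build_filter`, and `apply_filter` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from spaCy entity labels to metadata field names.
LABEL_TO_FIELD = {"PERSON": "person", "ORG": "org", "GPE": "place"}


def build_filter(entities: List[Tuple[str, str]]) -> Dict[str, str]:
    """Turn recognized entities into an exact-match metadata filter."""
    return {
        LABEL_TO_FIELD[label]: text
        for text, label in entities
        if label in LABEL_TO_FIELD
    }


def apply_filter(docs: List[dict], where: Dict[str, str]) -> List[dict]:
    """Keep only documents whose metadata matches every filter field."""
    return [
        d for d in docs
        if all(d["metadata"].get(field) == value for field, value in where.items())
    ]


# Entities as they might be extracted from "Show me documents about Elon Musk".
entities = [("Elon Musk", "PERSON")]
where = build_filter(entities)

docs = [
    {"text": "Interview about rockets.", "metadata": {"person": "Elon Musk"}},
    {"text": "Quarterly fruit report.", "metadata": {"org": "Apple"}},
]
matches = apply_filter(docs, where)
print(where, [d["text"] for d in matches])
```

In a real system the filter would be passed to the vector store so that only matching documents are scored at all.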

❓ FAQ

Q: Can I use a different embedding model?
A: Yes. Modify the Embedder class, or watch for the upcoming config update that will allow custom model names in Pipeline.

Q: Is this thread-safe?
A: Pipeline is generally thread-safe, but be cautious with ChromaDB's SQLite backend when many threads write concurrently.
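One common way to stay safe under concurrent writes is to serialize them behind a lock. The sketch below shows the pattern with a plain list standing in for the store; `SerializedWriter` is a hypothetical wrapper, and whether NLQcat needs it in your deployment is an assumption to verify.

```python
import threading
from typing import List


class SerializedWriter:
    """Wrap a store so that writes from many threads run one at a time."""

    def __init__(self, store: List[str]) -> None:
        self._store = store  # stand-in for a real vector store
        self._lock = threading.Lock()

    def add_documents(self, docs: List[str]) -> None:
        # Only one thread at a time may enter this block.
        with self._lock:
            self._store.extend(docs)


backing: List[str] = []
writer = SerializedWriter(backing)
threads = [
    threading.Thread(target=writer.add_documents, args=([f"doc-{i}"],))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(backing))
```

Reads can usually stay lock-free; the lock only needs to guard operations that modify the store.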

🗺️ Roadmap

v0.2.0: Integration with LangChain tools.
v0.3.0: Advanced RAG (HyDE, MMR Re-ranking).
v0.4.0: Cloud Deployment Blueprints (Docker, AWS Lambda).
Documentation: Sphinx/MkDocs site generation.

🤝 Contributing

We welcome contributions! Please follow these steps:

Fork the repository.
Create a feature branch (git checkout -b feature/amazing-feature).
Commit your changes (git commit -m 'Add amazing feature').
Push to the branch (git push origin feature/amazing-feature).
Open a Pull Request.

Please ensure you run tests before submitting:

python -m pytest tests/

📄 License

Distributed under the MIT License. See LICENSE for more information.

👥 Credits

Author: Anirban Sarkar
Maintainer: AnirbansarkarS

Built with ❤️ by Anirban-QuantumCAT.


Release history

0.1.3 (this version): Dec 8, 2025
0.1.2: Dec 8, 2025
0.1.0: Dec 7, 2025

Download files

Download the file for your platform.

Source Distribution

nlqcat-0.1.3.tar.gz (29.3 kB)

Uploaded Dec 8, 2025 Source

Built Distribution

nlqcat-0.1.3-py3-none-any.whl (31.0 kB)

Uploaded Dec 8, 2025 Python 3

File details

Details for the file nlqcat-0.1.3.tar.gz.

File metadata

Download URL: nlqcat-0.1.3.tar.gz
Upload date: Dec 8, 2025
Size: 29.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlqcat-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`d6d9b22bd64b8acc9c05ce1e43eb59658a6c869de96169894abfb617cd9c37af`
MD5	`571ac51aed336acb66173c20d63ee846`
BLAKE2b-256	`90f95998e224680eadb83f418082bf0d473a7fee0b41872fae2578489acde7f8`
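A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the local file path is an assumption, and the check is skipped if the file is not present.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest published for nlqcat-0.1.3.tar.gz.
EXPECTED = "d6d9b22bd64b8acc9c05ce1e43eb59658a6c869de96169894abfb617cd9c37af"
archive = Path("nlqcat-0.1.3.tar.gz")  # assumed to be in the current directory
if archive.exists():
    print("OK" if verify(archive, EXPECTED) else "MISMATCH")
```

pip can perform the same check automatically when a requirements file pins hashes and the install is run with `--require-hashes`.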


Provenance

The following attestation bundles were made for nlqcat-0.1.3.tar.gz:

Publisher: workflow.yml on AnirbansarkarS/NLqcat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlqcat-0.1.3.tar.gz
- Subject digest: d6d9b22bd64b8acc9c05ce1e43eb59658a6c869de96169894abfb617cd9c37af
- Sigstore transparency entry: 748679359
- Sigstore integration time: Dec 8, 2025
Source repository:
- Permalink: AnirbansarkarS/NLqcat@da8e766a825a2df2a4a8b1a6e9dd9927b118ba89
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/AnirbansarkarS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@da8e766a825a2df2a4a8b1a6e9dd9927b118ba89
- Trigger Event: release

File details

Details for the file nlqcat-0.1.3-py3-none-any.whl.

File metadata

Download URL: nlqcat-0.1.3-py3-none-any.whl
Upload date: Dec 8, 2025
Size: 31.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nlqcat-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d2b05be49ad728e409c0236cdfc275cd419c1dc07b8a36ffdb97e1a305e3f79`
MD5	`83dc9fae018c3a02ac388904c5da449d`
BLAKE2b-256	`eb4b801ccbc3ea3000de59e8e259a373d3dbd96538e975dcb6e6ed7c98206a25`


Provenance

The following attestation bundles were made for nlqcat-0.1.3-py3-none-any.whl:

Publisher: workflow.yml on AnirbansarkarS/NLqcat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nlqcat-0.1.3-py3-none-any.whl
- Subject digest: 9d2b05be49ad728e409c0236cdfc275cd419c1dc07b8a36ffdb97e1a305e3f79
- Sigstore transparency entry: 748679362
- Sigstore integration time: Dec 8, 2025
Source repository:
- Permalink: AnirbansarkarS/NLqcat@da8e766a825a2df2a4a8b1a6e9dd9927b118ba89
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/AnirbansarkarS
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@da8e766a825a2df2a4a8b1a6e9dd9927b118ba89
- Trigger Event: release
