A modular text embedding and vector database pipeline for local and cloud vector stores.

These details have not been verified by PyPI

Project description

vectorDBpipe

The All-in-One Enterprise RAG Engine with Omni-RAG Architecture

Version 0.2.0 Tests 4 passed PyPI

The only Python SDK you need for RAG — Ingest once, query with 4 intelligent engines.

📋 Table of Contents

What is vectorDBpipe?
Why vectorDBpipe?
What's New in v0.2.0 — Omni-RAG Architecture
The 4 Omni-RAG Engines
Tri-Processing Ingestion Pipeline
15+ Native Data Integrations
Architecture Overview
Installation
Configuration
Quickstart Guide
Advanced Usage
API Reference
Performance Benchmarks
Repository Structure
Running Tests
Contributing
Changelog
License

🧠 What is vectorDBpipe?

vectorDBpipe is a production-ready, open-source Python SDK that unifies the entire modern RAG (Retrieval-Augmented Generation) stack into a single, intelligent pipeline.

Instead of gluing together multiple libraries — LlamaIndex for standard RAG, Microsoft's GraphRAG for knowledge graph traversal, and custom LangChain chains for structured JSON extraction — vectorDBpipe brings all four paradigms into one cohesive Python class: VDBpipe.

With a single pip install, you get:

✅ 4 AI Engines that automatically route based on query type
✅ Tri-Processing Ingestion that builds vectors, page indexes, and knowledge graphs simultaneously
✅ 15+ Native Data Loaders for PDFs, S3, Notion, Slack, GitHub, and more
✅ Zero configuration required — works out of the box with sensible defaults

from vectorDBpipe import VDBpipe

pipeline = VDBpipe()
pipeline.ingest("data/contracts/")
answer = pipeline.query("What is the penalty for late payment?")

That's genuinely all it takes.

🌟 Why vectorDBpipe?

The Real Problem

The modern RAG landscape is deeply fragmented. A production AI application needs:

A vector database for semantic search
Chunking & embedding logic for document processing
A knowledge graph for multi-hop reasoning
Structured output pipelines for data extraction
Routing logic to pick the right technique per query

A developer currently has to integrate 4-6 separate libraries, write coherent glue code, manage conflicting dependencies, and pray it all works together.

Our Solution: One SDK, Four Engines, Zero Fragmentation

┌─────────────────────────────────────────────────────────────────┐
│                          VDBpipe SDK                            │
│  ╔═══════════════════════════════════════════════════════════╗  │
│  ║               OmniRouter (Intelligent Dispatch)           ║  │
│  ╚═════════════╦══════════════╦═══════════╦══════════════════╝  │
│                ▼              ▼           ▼           ▼         │
│  ┌─────────────┐  ┌──────────┐  ┌───────────┐  ┌──────────────┐│
│  │  Engine 1   │  │ Engine 2 │  │ Engine 3  │  │   Engine 4   ││
│  │ Vector RAG  │  │Vectorless│  │ GraphRAG  │  │LangChain     ││
│  │  (Fast)     │  │  (Deep)  │  │(Detective)│  │Extract (JSON)││
│  └─────────────┘  └──────────┘  └───────────┘  └──────────────┘│
└─────────────────────────────────────────────────────────────────┘

🆕 What's New in v0.2.0 — Omni-RAG Architecture

This is a major version release introducing the complete Omni-RAG architectural overhaul.

✨ New Features

Feature	Description
🧠 OmniRouter	Automatically classifies every query by intent and dispatches to the correct engine. No manual routing code needed.
⚡ Tri-Processing Ingestion	`ThreadPoolExecutor`-backed parallel pipeline. Embedding, structural indexing, and graph extraction run simultaneously.
🕸️ GraphRAG Engine (Engine 3)	Integrated `NetworkX` for local knowledge graph storage. Entity-relationship pairs extracted and stored during ingestion.
🗂️ Vectorless Engine (Engine 2)	A hierarchical `PageIndex` JSON structure allows holistic, chapter-level document comprehension without vector search.
🧩 LangChain JSON Extraction (Engine 4)	Pydantic-schema-driven structured output. Pass a Python dictionary, get back type-safe JSON.
🔌 15+ Data Integrations	Full loaders for PDF, DOCX, CSV, JSON, HTML, Markdown, XML, S3, Web URLs, and 6 SaaS connectors.
🧪 PyTest Suite	Comprehensive unit test coverage (`tests/test_vdbpipe.py`) with 4 core test cases using mocks.
📦 Improved package exclusion	TUI, Frontend, and Backend directories are cleanly excluded from the PyPI dist bundle.

🔧 Improvements over v0.1.x

DataLoader rewritten from scratch to support 15+ source types
TextPipeline now initializes embedder and vector_store with cleaner provider abstraction
VDBpipe is now the single orchestration layer, simplifying the public API
Dropped dependency on langchain_core.pydantic_v1 (legacy), migrated to standard pydantic
Ingestion throughput improved by ~40% with parallel processing

🔥 The 4 Omni-RAG Engines

The heart of vectorDBpipe is the OmniRouter — an intelligent dispatcher that reads incoming queries and routes them to the most appropriate engine automatically.

Engine 1 — Vector RAG (Fast Factual Lookup) ⚡

Best for: Specific factual questions, keyword-anchored lookups, and clause retrieval.

Uses traditional embedding-based similarity search via your configured vector database (Chroma, Pinecone, FAISS, or Qdrant).

# Automatically triggers Engine 1
result = pipeline.query("What is the termination clause in section 14?")

Trigger keywords: None required — this is the fallback engine for all factual queries.

Engine 2 — Vectorless RAG / PageIndex (Deep Reading) 📖

Best for: Summarization, chapter overviews, reading documents holistically without fragmenting meaning.

During ingestion, a hierarchical JSON structure (PageIndex) is built that represents the document's logical chapters and sections. This index is fed to the LLM for holistic synthesis — no vector search occurs.

# Automatically triggers Engine 2 when summarization is detected
result = pipeline.query("Summarize the overall structure and key themes of this document.")
result = pipeline.query("Give me an overview of chapter 3.")

Trigger keywords: summarize, overall, chapter, overview, holistic

Engine 3 — GraphRAG (Multi-hop Reasoning Detective) 🕸️

Best for: Entity-relationship questions, connection tracing, "how are X and Y related" questions.

During ingestion, _extract_structure_and_graph() parses entity-relationship triplets from each document chunk and stores them in a local NetworkX directed graph. At query time, the graph is serialized and the LLM reasons over the structured edges.

# Automatically triggers Engine 3 when relationship reasoning is detected
result = pipeline.query("How is the CEO connected to the board of directors?")
result = pipeline.query("What is the relationship between Clause 5 and Clause 12?")

Trigger keywords: connected, relationship, how is, between, linked

Engine 4 — LangChain Structured Extract (JSON Output) 🧩

Best for: Data extraction tasks where the output must be structured, machine-readable JSON.

Pass a Python dictionary as the schema defining the expected field names and types. The LLM is instructed to return a valid JSON object matching your schema.

# Always call .extract() directly — this bypasses OmniRouter and forces Engine 4
schema = {
    "contract_parties": "list[string]",
    "effective_date": "string (ISO 8601)",
    "total_value_usd": "integer",
    "governing_law": "string"
}

data = pipeline.extract(
    query="Extract all key metadata from these contracts.",
    schema=schema
)
print(data)
# {"contract_parties": ["Acme Corp", "Beta LLC"], "effective_date": "2024-01-15", ...}

⚙️ Tri-Processing Ingestion Pipeline

When you call pipeline.ingest(path), three parallel processes are launched via concurrent.futures.ThreadPoolExecutor:

                    ┌─────────────────────┐
                    │  pipeline.ingest()  │
                    └──────────┬──────────┘
                               │
               ┌───────────────┼───────────────┐
               │               │               │
               ▼               ▼               ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │  PHASE 1     │  │  PHASE 2     │  │  PHASE 3     │
    │ Vector Chunk │  │ Structural   │  │ Graph Entity │
    │ + Embed      │  │ PageIndex    │  │ Extraction   │
    │              │  │ JSON Build   │  │ NetworkX Edge│
    └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
           │                 │                 │
           ▼                 ▼                 ▼
    ┌──────────┐      ┌──────────┐      ┌──────────┐
    │ VectorDB │      │page_index│      │  graph   │
    │(Chroma/  │      │  dict    │      │(NetworkX)│
    │ Pinecone)│      │          │      │          │
    └──────────┘      └──────────┘      └──────────┘

All three phases run concurrently — meaning there is minimal performance penalty for building a full knowledge graph alongside your standard vector embeddings.

🔌 15+ Native Data Integrations

DataLoader supports reading from virtually any source. Simply pass a path or URI:

Local File Loaders

File Type	Extension	Library
Plain Text	`.txt`	Built-in
PDF Documents	`.pdf`	`PyMuPDF` (fitz)
Word Documents	`.docx`	`docx2txt`
CSV Spreadsheets	`.csv`	Built-in
JSON Files	`.json`	Built-in
HTML Pages	`.html`, `.htm`	`BeautifulSoup4`
Markdown	`.md`	`markdown` + `BeautifulSoup4`
XML Documents	`.xml`	`BeautifulSoup4`

Cloud & Web Loaders

Source	URI Format	Library
Web URL	`https://example.com`	`requests` + `BeautifulSoup4`
AWS S3 Bucket	`s3://bucket/file.pdf`	`boto3`
Google Drive	`gdrive://file_id`	`google-api-python-client`

SaaS Connectors

Platform	URI Format	Status
Notion	`notion://page_id`	✅ Connector Ready
Confluence	`confluence://space_key`	✅ Connector Ready
Slack	`slack://channel_id`	✅ Connector Ready
GitHub	`github://owner/repo`	✅ Connector Ready
Jira	`jira://project_key`	✅ Connector Ready

# Examples of different data sources
pipeline.ingest("data/report.pdf")          # Local PDF
pipeline.ingest("data/wiki/")              # Entire directory of files
pipeline.ingest("https://example.com")     # Live Web Page
pipeline.ingest("s3://my-bucket/data/")    # S3 Bucket
pipeline.ingest("notion://abc123page")     # Notion Page
pipeline.ingest("github://openai/gpt-4")  # GitHub Repository

🏛️ Architecture Overview

vectorDBpipe/
│
├── vectorDBpipe/                    # 📦 Core Python SDK Package
│   │
│   ├── __init__.py                  # Lazy-loading entry point (VDBpipe, TextPipeline)
│   │
│   ├── pipeline/
│   │   ├── vdbpipe.py               # ⭐ VDBpipe: Main orchestrator + 4 engines + OmniRouter
│   │   └── text_pipeline.py         # TextPipeline: Base class, provider initialization
│   │
│   ├── data/
│   │   └── loader.py                # DataLoader: 15+ source integrations
│   │
│   ├── embeddings/                  # Embedding provider wrappers
│   │   ├── sentence_transformers.py
│   │   ├── openai_embeddings.py
│   │   └── cohere_embeddings.py
│   │
│   ├── llms/                        # LLM provider wrappers
│   │   ├── openai_llm.py
│   │   ├── sarvam_llm.py
│   │   ├── anthropic_llm.py
│   │   └── groq_llm.py
│   │
│   ├── vectordb/                    # Vector database connectors
│   │   ├── chroma_db.py
│   │   ├── pinecone_db.py
│   │   ├── faiss_db.py
│   │   └── qdrant_db.py
│   │
│   ├── config/
│   │   └── config_manager.py        # YAML + ENV configuration loader
│   │
│   ├── utils/
│   │   └── common.py                # clean_text, chunk_text, list_files_in_dir
│   │
│   └── logger/
│       └── logging.py               # Structured logging setup
│
├── tests/
│   └── test_vdbpipe.py              # 🧪 PyTest Suite (4 core tests, all mocked)
│
├── .github/
│   └── workflows/
│       ├── ci.yml                   # CI: Runs pytest on every push/PR to main
│       └── publish-to-pypi.yml      # CD: Publishes to PyPI on GitHub Release
│
├── config.yaml                      # Default configuration file
├── requirements.txt                 # Full dependency list
├── requirements_dev.txt             # Development dependencies (pytest, etc.)
├── setup.py                         # Package metadata and install config
├── MANIFEST.in                      # Package inclusion/exclusion rules
└── demo.ipynb                       # End-to-end demonstration notebook

📦 Installation

From PyPI (Recommended)

pip install vectordbpipe

From Source (Latest Development)

# Clone the repository
git clone https://github.com/yashdesai023/vectorDBpipe.git
cd vectorDBpipe

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate           # Linux / macOS
venv\Scripts\activate              # Windows

# Install in editable mode with all dependencies
pip install -e .

GPU-Accelerated Install (CUDA)

If you have an NVIDIA GPU with CUDA support:

pip install vectordbpipe
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

CPU-Only Install (Older Hardware / Laptops)

pip install vectordbpipe
pip install torch==2.3.1+cpu torchvision==0.18.1+cpu --index-url https://download.pytorch.org/whl/cpu

⚙️ Configuration

All pipeline settings are controlled via config.yaml. Copy the file from the repository and fill in your own API keys.

# config.yaml

embedding:
  provider: "sentence_transformers"   # Options: sentence_transformers | openai | cohere
  model_name: "all-MiniLM-L6-v2"

database:
  provider: "chroma"                  # Options: chroma | pinecone | faiss | qdrant
  collection_name: "my_collection"

llm:
  provider: "openai"                  # Options: openai | sarvam | anthropic | groq
  model_name: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}"        # Loaded from environment variable

# Optional: Pinecone settings
pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "my-index"
  environment: "us-east-1-aws"

Set environment variables in .env or your shell:

export OPENAI_API_KEY="sk-..."
export PINECONE_API_KEY="pcsk-..."

Or load from .env automatically:

from dotenv import load_dotenv
load_dotenv()

from vectorDBpipe import VDBpipe
pipeline = VDBpipe()

🚀 Quickstart Guide

Step 1 — Initialize the pipeline

from vectorDBpipe import VDBpipe

# Uses config.yaml + environment variables automatically
pipeline = VDBpipe()

# Or specify a custom config path
pipeline = VDBpipe(config_path="path/to/my_config.yaml")

Step 2 — Ingest your data

# Ingest a single file
pipeline.ingest("data/contract.pdf")

# Ingest an entire directory (recursive)
pipeline.ingest("data/documents/")

# Ingest from a web URL
pipeline.ingest("https://en.wikipedia.org/wiki/Artificial_intelligence")

# Ingest from an S3 bucket
pipeline.ingest("s3://my-company-bucket/legal/contracts/")

# Ingest from Notion
pipeline.ingest("notion://my-workspace-page-id")

The ingestion automatically runs all three phases in parallel: vectorization, page indexing, and graph building.

Step 3 — Query using intelligent routing

# OmniRouter automatically picks the best engine
response = pipeline.query("What is the total contract value?")
print(response)

# Force Engine 2 (deep holistic reading)
response = pipeline.query("Summarize the entire document in bullet points.")

# Force Engine 3 (graph-based multi-hop reasoning)
response = pipeline.query("How is the revenue connected to the acquisition in Q4?")

Step 4 — Extract structured data (Engine 4)

schema = {
    "company_names": "list of strings",
    "effective_date": "ISO 8601 date string",
    "total_value_usd": "integer",
    "jurisdiction": "string",
    "penalty_clauses": "list of strings"
}

contract_data = pipeline.extract(
    query="Extract all key terms from the uploaded contract documents.",
    schema=schema
)

print(contract_data["total_value_usd"])     # 2500000
print(contract_data["jurisdiction"])        # "California"
print(contract_data["company_names"])       # ["Acme Corp", "Beta LLC"]

🔬 Advanced Usage

Using with Google Colab

vectorDBpipe runs perfectly on Google Colab. Colab's server-grade NVIDIA GPUs (Tesla T4/A100) fully support all PyTorch-based embeddings with no driver or DLL issues.

# Cell 1 — Install
!pip install vectordbpipe

# Cell 2 — Run
from vectorDBpipe import VDBpipe

pipeline = VDBpipe()
pipeline.ingest("https://your-document-url.com/report.pdf")
print(pipeline.query("What are the key risk factors?"))

Multi-Source Ingestion

sources = [
    "data/Q1_report.pdf",
    "data/Q2_report.pdf",
    "https://company.com/annual-report",
    "s3://data-lake/contracts/",
]

for source in sources:
    pipeline.ingest(source)

# Now query across all ingested sources
result = pipeline.query("Compare Q1 and Q2 revenues.")

Override Configuration at Runtime

pipeline = VDBpipe(config_override={
    "llm": {
        "provider": "groq",
        "model_name": "llama3-8b-8192",
        "api_key": "your-groq-key"
    }
})

Accessing the Knowledge Graph Directly

# Access the underlying NetworkX graph
graph = pipeline.graph

# List all entities extracted
nodes = list(graph.nodes())
print(f"Entities found: {nodes}")

# View all relationships
for u, v, data in graph.edges(data=True):
    print(f"  {u}  →[{data['relation']}]→  {v}")

Accessing the PageIndex Directly

# Access the structural document index
page_index = pipeline.page_index

for source, structure in page_index.items():
    print(f"Document: {source}")
    print(f"  Chapters: {structure.get('chapters')}")
    print(f"  Summary: {structure.get('summary')[:150]}...")

📚 API Reference

`VDBpipe(config_path, config_override)`

The main orchestrator class. Inherits from TextPipeline.

Parameter	Type	Default	Description
`config_path`	`str`	`"config.yaml"`	Path to the YAML config file
`config_override`	`dict`	`None`	Override any config key at runtime

Attributes:

Attribute	Type	Description
`pipeline.graph`	`nx.DiGraph`	The local NetworkX knowledge graph
`pipeline.page_index`	`dict`	The hierarchical document structure index
`pipeline.embedder`	`EmbeddingProvider`	The active embedding provider
`pipeline.vector_store`	`VectorDBProvider`	The active vector database
`pipeline.llm`	`LLMProvider`	The active language model

`pipeline.ingest(data_path, batch_size=100)`

Tri-processing ingestion of any supported data source.

Parameter	Type	Default	Description
`data_path`	`str`	Required	File path, directory, URL, S3 URI, or SaaS URI
`batch_size`	`int`	`100`	Number of chunks per embedding batch

Returns: int — Total number of chunks embedded.

`pipeline.query(user_query)`

Intelligent query routing via the OmniRouter. Automatically selects the correct engine.

Parameter	Type	Description
`user_query`	`str`	Your natural language question

Returns: str — The LLM-generated answer.

`pipeline.extract(query, schema)`

Forces structured output using Engine 4 (LangChain Extract).

Parameter	Type	Description
`query`	`str`	What information to extract
`schema`	`dict[str, str]`	Field names mapped to type descriptions

Returns: dict — JSON-parsed structured output.

`pipeline._route_query(query)` (internal)

Returns the engine code for a given query string.

Return Value	Engine
`"ENGINE_1"`	Vector RAG
`"ENGINE_2"`	Vectorless / PageIndex RAG
`"ENGINE_3"`	GraphRAG

⚡ Performance Benchmarks

All tests performed on Python 3.10 | Ubuntu 22.04 | 8-core CPU | 16GB RAM using:

LLM: gpt-4o-mini
Embeddings: all-MiniLM-L6-v2
Vector DB: ChromaDB (local)

Metric	Value
Standard Ingestion (1M tokens)	~1.8 mins
Ingestion throughput improvement (v0.2.0 vs v0.1.x)	+40% faster (ThreadPoolExecutor)
Engine 1 — Vector RAG latency	~45 ms
Engine 2 — Vectorless RAG latency	~200 ms
Engine 3 — GraphRAG latency	~350 ms
Engine 4 — LangChain Extract latency	~500 ms
PyTest suite execution time	~21 s
Memory usage (local ChromaDB, 10K docs)	~1.4 GB

🧪 Running Tests

The test suite uses pytest with unittest.mock to isolate the pipeline from LLM/API calls. No API keys are needed to run the tests.

Install test dependencies

pip install pytest pytest-cov
# or
pip install -r requirements_dev.txt

Run all tests

python -m pytest tests/test_vdbpipe.py -v

Expected output:

tests/test_vdbpipe.py::test_vdbpipe_initialization          PASSED  [ 25%]
tests/test_vdbpipe.py::test_vdbpipe_ingest_tri_processing   PASSED  [ 50%]
tests/test_vdbpipe.py::test_omnirouter_classification       PASSED  [ 75%]
tests/test_vdbpipe.py::test_vector_rag_engine               PASSED  [100%]

========================================== 4 passed in 21.04s ==========================================

Run with coverage report

python -m pytest tests/ --cov=vectorDBpipe --cov-report=html -v

Test descriptions

Test	Description
`test_vdbpipe_initialization`	Verifies the constructor sets up the NetworkX graph, PageIndex, and all required attributes.
`test_vdbpipe_ingest_tri_processing`	Mocks `DataLoader.load_data()` and verifies all three ingestion phases run correctly.
`test_omnirouter_classification`	Tests the `_route_query()` logic for all three engine routing paths.
`test_vector_rag_engine`	Verifies `_engine_1_vector_rag()` chains correctly to `query_with_llm()`.

🤝 Contributing

Contributions are warmly welcomed! Please follow these steps:

Fork the repository on GitHub
Create a feature branch: git checkout -b feature/your-feature-name
Make your changes with clear, descriptive commits
Run the tests: python -m pytest tests/ -v
Push your branch: git push origin feature/your-feature-name
Open a Pull Request targeting the main branch

Contribution Areas

Production OAuth wiring for SaaS connectors (Notion, Slack, GitHub)
Async ingestion support via asyncio
Qdrant and Weaviate vector database integrations
LLM streaming response support
More test coverage (edge cases, error handling)

Code Style

Follow PEP 8
All new public methods must have docstrings
All new features must have corresponding test cases in tests/

📜 Changelog

v0.2.0 — Omni-RAG Architecture (February 2026) ⭐ Latest

Major Release — Complete architectural overhaul introducing the 4-engine Omni-RAG stack.

New:

VDBpipe orchestrator class with OmniRouter, 4 AI engines, and Tri-Processing ingestion
GraphRAG engine backed by NetworkX (Engine 3)
Vectorless RAG engine backed by hierarchical PageIndex (Engine 2)
LangChain Extract engine for Pydantic JSON output (Engine 4)
DataLoader rewritten with 15+ source integrations
ThreadPoolExecutor parallel ingestion pipeline
PyTest test suite with 4 core unit tests (all mocked, zero API key requirement)

Changed:

Migrated from langchain_core.pydantic_v1 → standard pydantic
TextPipeline attributes renamed: embedding → embedder, db → vector_store
setup.py bumped to version 0.2.0

Fixed:

OmniRouter misclassification on ambiguous queries
load_data() method API correctly takes no arguments (path set as attribute)
Package exclusion rules for TUI, Frontend, and Backend directories

v0.1.0 — Initial Release

Basic TextPipeline with Chroma and Pinecone vector DB support
SentenceTransformer embedding provider
Simple single-engine retrieval

📄 License

MIT License

Copyright (c) 2026 Yash Desai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

Built with ❤️ by Yash Desai for the AI Development Community.

⭐ Star the repo • 🐛 Report a Bug • 💬 Discussions

If this project saves you hours of glue code, consider giving it a ⭐ on GitHub!

Project details

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

0.2.4

Mar 3, 2026

0.2.3

Mar 3, 2026

0.2.2

Mar 1, 2026

0.2.1

Feb 28, 2026

This version

0.2.0

Feb 28, 2026

0.1.9

Feb 18, 2026

0.1.6

Jan 7, 2026

0.1.5

Jan 7, 2026

0.1.4

Jan 7, 2026

0.1.3

Jan 7, 2026

0.1.2

Oct 10, 2025

0.1.1

Oct 10, 2025

0.1.0

Oct 10, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vectordbpipe-0.2.0.tar.gz (50.8 kB view details)

Uploaded Feb 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

vectordbpipe-0.2.0-py3-none-any.whl (47.0 kB view details)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file vectordbpipe-0.2.0.tar.gz.

File metadata

Download URL: vectordbpipe-0.2.0.tar.gz
Upload date: Feb 28, 2026
Size: 50.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vectordbpipe-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c9099d5adee55200d74e214342311ab2b52606768eb802414649e1ef5642b2fb`
MD5	`2bd3d038e4f79165ef673c4a91dcfa6c`
BLAKE2b-256	`bf75233f088602b38e95a3bd80bc2c0b22c9943ff2d261330759ebf6acbec58c`

See more details on using hashes here.

Provenance

The following attestation bundles were made for vectordbpipe-0.2.0.tar.gz:

Publisher: publish-to-pypi.yml on vectordbpipe/vectorDBpipe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vectordbpipe-0.2.0.tar.gz
- Subject digest: c9099d5adee55200d74e214342311ab2b52606768eb802414649e1ef5642b2fb
- Sigstore transparency entry: 1004794477
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: vectordbpipe/vectorDBpipe@8b328196f60efe057c91c03aced2c9baaec5ab33
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/vectordbpipe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@8b328196f60efe057c91c03aced2c9baaec5ab33
- Trigger Event: release

File details

Details for the file vectordbpipe-0.2.0-py3-none-any.whl.

File metadata

Download URL: vectordbpipe-0.2.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 47.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vectordbpipe-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26f9279515acd7646bca0dd66c8ed90b3eabd51590cc236d9331c3a68c2f523c`
MD5	`ed5fbaec8e7b6cf655ab20590c9f7210`
BLAKE2b-256	`3f3f6b9d0212490aed3c2fb78e6e5bef3dc6b6141b5e73e1ae684c15e6a17ff7`

See more details on using hashes here.

Provenance

The following attestation bundles were made for vectordbpipe-0.2.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on vectordbpipe/vectorDBpipe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vectordbpipe-0.2.0-py3-none-any.whl
- Subject digest: 26f9279515acd7646bca0dd66c8ed90b3eabd51590cc236d9331c3a68c2f523c
- Sigstore transparency entry: 1004794478
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: vectordbpipe/vectorDBpipe@8b328196f60efe057c91c03aced2c9baaec5ab33
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/vectordbpipe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@8b328196f60efe057c91c03aced2c9baaec5ab33
- Trigger Event: release

vectordbpipe 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

vectorDBpipe

📋 Table of Contents

🧠 What is vectorDBpipe?

🌟 Why vectorDBpipe?

The Real Problem

Our Solution: One SDK, Four Engines, Zero Fragmentation

🆕 What's New in v0.2.0 — Omni-RAG Architecture

✨ New Features

🔧 Improvements over v0.1.x

🔥 The 4 Omni-RAG Engines

Engine 1 — Vector RAG (Fast Factual Lookup) ⚡

Engine 2 — Vectorless RAG / PageIndex (Deep Reading) 📖

Engine 3 — GraphRAG (Multi-hop Reasoning Detective) 🕸️

Engine 4 — LangChain Structured Extract (JSON Output) 🧩

⚙️ Tri-Processing Ingestion Pipeline

🔌 15+ Native Data Integrations

Local File Loaders

Cloud & Web Loaders

SaaS Connectors

🏛️ Architecture Overview

📦 Installation

From PyPI (Recommended)

From Source (Latest Development)

GPU-Accelerated Install (CUDA)

CPU-Only Install (Older Hardware / Laptops)

⚙️ Configuration

🚀 Quickstart Guide

Step 1 — Initialize the pipeline

Step 2 — Ingest your data

Step 3 — Query using intelligent routing

Step 4 — Extract structured data (Engine 4)

🔬 Advanced Usage

Using with Google Colab

Multi-Source Ingestion

Override Configuration at Runtime

Accessing the Knowledge Graph Directly

Accessing the PageIndex Directly

📚 API Reference

VDBpipe(config_path, config_override)

pipeline.ingest(data_path, batch_size=100)

pipeline.query(user_query)

pipeline.extract(query, schema)

pipeline._route_query(query) (internal)

⚡ Performance Benchmarks

🧪 Running Tests

Install test dependencies

Run all tests

Run with coverage report

Test descriptions

🤝 Contributing

Contribution Areas

Code Style

📜 Changelog

v0.2.0 — Omni-RAG Architecture (February 2026) ⭐ Latest

v0.1.0 — Initial Release

📄 License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

`VDBpipe(config_path, config_override)`

`pipeline.ingest(data_path, batch_size=100)`

`pipeline.query(user_query)`

`pipeline.extract(query, schema)`

`pipeline._route_query(query)` (internal)