Lightweight, no-database document search engine using quantized numpy vectors with BM25 and definition-aware ranking.

DBless

Lightweight, no-database PDF search engine — pure Python, no servers, no setup.

💡 What is DBless?

DBless is a lightweight document search engine that runs entirely in memory: no database, no server, and no dependencies beyond numpy and pymupdf. Point it at a PDF and start searching in seconds.

Why DBless?

🚀 Zero setup — no database to install or configure
🎯 Definition-aware — understands "What is X?" style queries
🪶 Lightweight — only requires numpy and pymupdf
📄 Multi-domain — works on legal, medical, corporate, and technical PDFs
⚡ Fast — sub-100ms search on CPU

📚 Documentation Contents

Quick Start — Up and running in 2 minutes
Installation — Install via pip or from source
API Reference — Full API for DBlessEngine
CLI Usage — Command-line interface
How It Works — Architecture overview
Contributing — How to contribute

🚀 Quick Start

from dbless.engine import DBlessEngine

# 1. Load and index a PDF
engine = DBlessEngine.from_pdf(
    "document.pdf",
    chunk_size=100,   # words per chunk
    overlap=20        # overlap between chunks
)

# 2. Search
results = engine.search("What is machine learning?", k=5)

# 3. Print results
for result in results:
    print(f"Score : {result['score']:.2f}")
    print(f"Snippet: {result['snippet']}")
    print("---")

📦 Installation

pip install dbless

Or install from source:

git clone https://github.com/rahulreddy9725/Dbless.git
cd Dbless
pip install -e .

🔧 API Reference

`DBlessEngine.from_pdf(path, chunk_size, overlap, vector_dim, factor_rank)`

Loads a PDF, chunks it, embeds it, and builds the search index.

Parameter	Type	Default	Description
`path`	`str / Path`	required	Path to the PDF file
`chunk_size`	`int`	`600`	Number of words per chunk
`overlap`	`int`	`150`	Word overlap between consecutive chunks
`vector_dim`	`int`	`512`	Hash vector dimensionality
`factor_rank`	`int`	`128`	SVD factorization rank

Returns: DBlessEngine instance
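
To make `chunk_size` and `overlap` concrete, here is a minimal sketch of overlapping word-based chunking. The helper name `chunk_words` is hypothetical and this is not DBless's actual chunker; it only illustrates how a window of `chunk_size` words advances by `chunk_size - overlap` words at each step.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 600, overlap: int = 150) -> List[str]:
    """Split text into overlapping word-based chunks (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(250))
    pieces = chunk_words(sample, chunk_size=100, overlap=20)
    print(len(pieces), [len(p.split()) for p in pieces])
```

A larger `overlap` reduces the chance that an answer is split across a chunk boundary, at the cost of more chunks to index.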

`engine.search(query, k)`

Search the indexed PDF for the most relevant chunks.

Parameter	Type	Default	Description
`query`	`str`	required	Natural language query
`k`	`int`	`5`	Number of results to return

Returns: list[dict] — each result contains:

Key	Description
`text`	Full chunk text
`snippet`	Best-matching sentence(s) from the chunk
`score`	Relevance score (0–100)
`page`	Page number in the PDF
`chunk_id`	Chunk index

Example:

results = engine.search("What are exceptions?", k=3)
for r in results:
    print(r["snippet"])   # best answer sentence
    print(r["score"])     # relevance score 0-100
    print(r["page"])      # page number
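
Because each result is a plain dict, post-processing needs no special API. The sketch below groups snippets by page and drops low-scoring hits. The helper `group_by_page`, the `min_score` threshold, and the sample data are all hypothetical; only the keys `snippet`, `score`, and `page` come from the table above.

```python
from collections import defaultdict


def group_by_page(results, min_score=0.0):
    """Group snippets by page, best score first, keeping scores >= min_score."""
    pages = defaultdict(list)
    for r in sorted(results, key=lambda item: item["score"], reverse=True):
        if r["score"] >= min_score:
            pages[r["page"]].append(r["snippet"])
    return dict(pages)


if __name__ == "__main__":
    # Stand-in data shaped like the documented return value of engine.search().
    sample_results = [
        {"snippet": "Use try and except blocks to handle exceptions.", "score": 74.5, "page": 12},
        {"snippet": "An exception is an event that disrupts normal flow.", "score": 91.0, "page": 12},
        {"snippet": "Loops repeat a block of code.", "score": 18.0, "page": 3},
    ]
    for page, snippets in group_by_page(sample_results, min_score=50).items():
        print(page, snippets)
```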

🖥️ CLI Usage

DBless ships with a command-line tool:

# Index a PDF and show chunk count
dbless index document.pdf

# Query a PDF and get top results
dbless query document.pdf "What is machine learning?" -k 5

⚙️ How It Works

Step	Description
1. Chunk	PDF is split into overlapping word-based chunks
2. Embed	Each chunk is hashed into a sparse numpy vector
3. IDF Weight	Term counts reweighted by inverse document frequency across all chunks
4. SVD Compress	Dimensionality reduced via matrix factorization
5. Quantize	Vectors quantized to Int8 for memory efficiency
6. Search	Query vector matched via dot product + BM25 boosting
7. Re-rank	Definition-style queries get special re-ranking
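
Steps 2 through 6 can be sketched with numpy alone. This is a rough illustration under stated assumptions, not DBless's implementation: the function names, the use of `zlib.crc32` as the hash, the smoothed IDF formula, and the single global quantization scale are all choices made here for clarity. BM25 boosting and definition re-ranking (the rest of step 6 and step 7) are omitted.

```python
import re
import zlib

import numpy as np


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_vector(tokens, dim):
    """Feature hashing: each token increments one of `dim` buckets."""
    vec = np.zeros(dim, dtype=np.float32)
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def build_index(chunks, dim=512, rank=128):
    counts = np.stack([hash_vector(tokenize(c), dim) for c in chunks])
    # IDF per hash bucket: buckets seen in fewer chunks get more weight.
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + len(chunks)) / (1 + df)) + 1.0
    weighted = counts * idf
    # Truncated SVD: keep at most `rank` right singular vectors as a projection.
    _, _, vt = np.linalg.svd(weighted, full_matrices=False)
    proj = vt[:rank].T  # shape (dim, r) with r <= rank
    reduced = weighted @ proj
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    reduced = reduced / np.maximum(norms, 1e-12)
    # Symmetric int8 quantization with one scale for the whole matrix.
    scale = max(float(np.abs(reduced).max()), 1e-12) / 127.0
    vectors = np.round(reduced / scale).astype(np.int8)
    return {"dim": dim, "idf": idf, "proj": proj, "scale": scale, "vectors": vectors}


def search(index, query, k=5):
    """Return (chunk index, score) pairs ranked by dot product."""
    q = hash_vector(tokenize(query), index["dim"]) * index["idf"]
    q = q @ index["proj"]
    scores = (index["vectors"].astype(np.float32) @ q) * index["scale"]
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    docs = [
        "Machine learning is a field of study that lets computers learn from data.",
        "The contract shall terminate upon written notice by either party.",
        "Patients should take the medication twice daily with food.",
    ]
    idx = build_index(docs)
    print(search(idx, "machine learning", k=2))
```

Storing int8 instead of float32 cuts vector memory to a quarter, and the rounding error it introduces is small relative to the score differences between relevant and irrelevant chunks.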

🧪 Testing

pytest tests/

Test coverage includes:

✅ PDF ingestion and chunking
✅ Vector quantization accuracy
✅ Phrase and keyword search
✅ End-to-end engine from PDF to results

📊 Performance

Metric	Value
Memory per 100-page PDF	~10–50 MB
Search speed	< 100ms on CPU
Top-3 accuracy (definitions)	85–95%
Python support	3.8 – 3.12
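
Latency figures depend on hardware and document size, so it is worth measuring on your own PDFs. The helper below is a hypothetical sketch, not part of DBless: it times any callable that takes a query string and reports the median latency in milliseconds.

```python
import statistics
import time


def measure_latency_ms(search_fn, queries, repeats=5):
    """Return the median wall-clock latency in milliseconds across all calls."""
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in search function; swap in a real engine call to benchmark it.
    def fake_search(query):
        return sorted(query.split())

    median_ms = measure_latency_ms(fake_search, ["what is machine learning"])
    print(f"median latency: {median_ms:.3f} ms")
```

With a loaded engine, the same helper could be called as `measure_latency_ms(lambda q: engine.search(q, k=5), queries)`.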

🤝 Contributing

Contributions are welcome!

1. Fork the repository
2. Create a feature branch: git checkout -b feature/my-feature
3. Commit your changes: git commit -m "Add my feature"
4. Push and open a Pull Request

Please open an issue first for major changes.

📄 License

MIT License — see LICENSE for details.

Built with ❤️ by Rahul Reddy

Version 0.1.0, released May 6, 2026
