Azure Cosmos DB object lookup provider for NLWeb - enriches search results with full documents
nlweb-cosmos-object-db
Azure Cosmos DB object lookup provider for NLWeb.
Overview
This provider enables NLWeb to enrich vector search results with full documents from Azure Cosmos DB. When vector databases return truncated content, this provider fetches the complete documents from Cosmos DB using document IDs.
Installation
pip install nlweb-core nlweb-cosmos-object-db
For a complete setup with vector search:
pip install nlweb-core nlweb-azure-vectordb nlweb-cosmos-object-db
Configuration
Create config.yaml:
object_storage:
  type: cosmos
  enabled: true
  endpoint_env: AZURE_COSMOS_ENDPOINT
  database_name: your-database
  container_name: your-container
  partition_key: /"@id"
  import_path: nlweb_cosmos_object_db.cosmos_lookup
  class_name: CosmosObjectLookup
Authentication
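This provider authenticates with Azure AD through DefaultAzureCredential, which picks up a Managed Identity when running in Azure and falls back to other sources, such as an Azure CLI login, during local development. No API keys are required.
Set the environment variable: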
export AZURE_COSMOS_ENDPOINT=https://your-account.documents.azure.com:443/
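Note that `endpoint_env` in config.yaml holds the name of an environment variable, not the endpoint URL itself. A minimal sketch of that indirection follows; `resolve_endpoint` is a hypothetical helper written for illustration and is not part of this package, whose own lookup may differ.

```python
import os


def resolve_endpoint(config: dict) -> str:
    """Return the Cosmos DB endpoint named by config["endpoint_env"].

    Hypothetical helper for illustration only.
    """
    env_name = config["endpoint_env"]
    endpoint = os.environ.get(env_name)
    if not endpoint:
        # Fail early with a message that names the missing variable.
        raise RuntimeError(f"Environment variable {env_name} is not set")
    return endpoint
```

Failing early with the variable name in the message makes a missing `export` easier to diagnose than a connection error later on.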
Azure AD Setup
Ensure your Azure identity has appropriate Cosmos DB permissions:
- Cosmos DB Built-in Data Reader role
- Or custom role with Microsoft.DocumentDB/databaseAccounts/readMetadata and read permissions
Usage
The provider automatically enriches search results when configured:
import asyncio

import nlweb_core

# Initialize with config
nlweb_core.init(config_path="./config.yaml")

from nlweb_core import retriever


async def main():
    # Search with automatic enrichment
    results = await retriever.search(
        query="example query",
        site="example.com",
        num_results=10,
        enrich_from_storage=True,  # Enable Cosmos DB enrichment
    )

    # Results now contain full documents from Cosmos DB
    for result in results:
        print(result.content)  # Full content instead of truncated text


# await is only valid inside a coroutine, so run the search from an event loop
asyncio.run(main())
How It Works
- Vector Search: NLWeb queries the vector database (e.g., Azure AI Search) and gets IDs + truncated content
- ID Extraction: Document IDs are extracted from vector search results
- Cosmos DB Lookup: Provider queries Cosmos DB by @id field to fetch full documents
- Content Enrichment: Full documents replace truncated content in search results
- Ranking: LLM ranks the enriched results
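The ID extraction, lookup, and enrichment steps above can be sketched with in-memory stand-ins. This is an illustration under assumptions, not the provider's code: the hit shape (`"id"` and `"content"` keys), `enrich_results`, `fake_get_by_id`, and `FAKE_STORE` are all hypothetical names.

```python
import asyncio


async def enrich_results(results: list[dict], get_by_id) -> list[dict]:
    """Replace truncated content with full documents, where available."""
    # ID Extraction: pull the document IDs out of the vector search hits.
    doc_ids = [hit["id"] for hit in results]
    # Lookup: fetch the full documents concurrently.
    documents = await asyncio.gather(*(get_by_id(doc_id) for doc_id in doc_ids))
    # Content Enrichment: swap in full content; keep the truncated hit
    # unchanged when the lookup found nothing.
    enriched = []
    for hit, doc in zip(results, documents):
        if doc is not None:
            hit = {**hit, "content": doc["content"]}
        enriched.append(hit)
    return enriched


# In-memory stand-in for the Cosmos DB container, keyed by "@id".
FAKE_STORE = {
    "doc-12345": {"@id": "doc-12345", "content": "Full document content here..."},
}


async def fake_get_by_id(doc_id: str):
    return FAKE_STORE.get(doc_id)


hits = [
    {"id": "doc-12345", "content": "Full document..."},
    {"id": "doc-missing", "content": "Truncated only"},
]
enriched_hits = asyncio.run(enrich_results(hits, fake_get_by_id))
```

Building a new dict for each enriched hit leaves the original search results untouched, which keeps the truncated text available if a later step needs it.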
Features
- Azure AD managed identity authentication (no API keys)
- Async-compatible using thread executors
- Parameterized queries to prevent injection
- Configurable database, container, and partition key
- Seamless integration with NLWeb retrieval pipeline
- Compatible with NLWeb Protocol v0.5+
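Two of these features, thread-executor async support and parameterized queries, can be illustrated together. The sketch below is an assumption about how such a lookup might look, not the provider's source: the query text is a guess, and `FakeContainer` is a stand-in that mimics the `query_items(query=..., parameters=[...], enable_cross_partition_query=...)` call shape of the azure-cosmos Python SDK so the example runs without an Azure account.

```python
import asyncio


class FakeContainer:
    """Stand-in mimicking the shape of a blocking query_items call."""

    def __init__(self, docs):
        self._docs = docs

    def query_items(self, query, parameters, enable_cross_partition_query=False):
        # A real container evaluates the query server-side; this fake
        # only reads the bound parameter value.
        wanted = {p["name"]: p["value"] for p in parameters}["@doc_id"]
        return [d for d in self._docs if d["@id"] == wanted]


def lookup_sync(container, doc_id: str):
    """Blocking lookup; the ID travels as a bound parameter, never in the query text."""
    query = 'SELECT * FROM c WHERE c["@id"] = @doc_id'
    parameters = [{"name": "@doc_id", "value": doc_id}]
    items = list(container.query_items(
        query=query, parameters=parameters, enable_cross_partition_query=True
    ))
    return items[0] if items else None


async def lookup_async(container, doc_id: str):
    """Run the blocking lookup in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lookup_sync, container, doc_id)


container = FakeContainer([{"@id": "doc-12345", "content": "Full document content here..."}])
found = asyncio.run(lookup_async(container, "doc-12345"))
missing = asyncio.run(lookup_async(container, "doc' OR 1=1 --"))
```

Because the ID is passed as a parameter value, a hostile string such as the second lookup is compared as plain data and simply matches no document.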
Document Structure
Your Cosmos DB documents should have an @id field that matches the IDs returned by your vector database:
{
"@id": "doc-12345",
"content": "Full document content here...",
"metadata": {
"title": "Document Title",
"url": "https://example.com/page"
}
}
Configuration Options
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be "cosmos" |
| enabled | Yes | Set to true to enable enrichment |
| endpoint_env | Yes | Environment variable name for Cosmos endpoint |
| database_name | Yes | Cosmos DB database name |
| container_name | Yes | Cosmos DB container name |
| partition_key | Yes | Partition key path (e.g., /"@id") |
| import_path | Yes | nlweb_cosmos_object_db.cosmos_lookup |
| class_name | Yes | CosmosObjectLookup |
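The `import_path` and `class_name` pair suggests the provider class is resolved dynamically at startup. NLWeb's actual loader is not shown in this README; the following is only the usual Python pattern for such settings, with `load_provider_class` as a hypothetical name.

```python
import importlib


def load_provider_class(config: dict) -> type:
    """Import the module at import_path and return the attribute named class_name."""
    module = importlib.import_module(config["import_path"])
    return getattr(module, config["class_name"])


# Demonstrated with a standard-library class so the example runs anywhere;
# a real config would name nlweb_cosmos_object_db.cosmos_lookup instead.
demo_config = {"import_path": "collections", "class_name": "OrderedDict"}
provider_cls = load_provider_class(demo_config)
```

With this pattern, a typo in either field surfaces as an ImportError or AttributeError when the configuration is loaded.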
Creating Your Own Object Lookup Provider
Use this package as a template:
- Create package structure:

    nlweb-your-objectdb/
    ├── pyproject.toml
    ├── README.md
    └── nlweb_your_objectdb/
        ├── __init__.py
        └── your_lookup.py

- Implement ObjectLookupInterface:

    from nlweb_core.retriever import ObjectLookupInterface

    class YourLookup(ObjectLookupInterface):
        async def get_by_id(self, doc_id: str) -> dict:
            # Your implementation
            pass

- Declare dependencies in pyproject.toml:

    dependencies = [
        "nlweb-core>=0.5.5",
        "your-database-sdk>=1.0.0",
    ]

- Configure in NLWeb:

    object_storage:
      import_path: nlweb_your_objectdb.your_lookup
      class_name: YourLookup
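As a starting point for the implementation step, here is a toy provider backed by a dictionary. It is a sketch under assumptions: the base class is left out so the code runs without NLWeb installed, and the constructor argument and the choice to return None for unknown IDs are not taken from the NLWeb interface.

```python
import asyncio
from typing import Optional


class InMemoryLookup:
    """Toy provider exposing the same get_by_id coroutine as the template.

    In a real package this class would subclass
    nlweb_core.retriever.ObjectLookupInterface. The constructor
    signature here is an assumption.
    """

    def __init__(self, documents: list[dict]):
        # Index documents by "@id" once so each lookup is a dict access.
        self._by_id = {doc["@id"]: doc for doc in documents}

    async def get_by_id(self, doc_id: str) -> Optional[dict]:
        # Assumption: unknown IDs yield None rather than raising.
        return self._by_id.get(doc_id)


lookup = InMemoryLookup([{"@id": "doc-12345", "content": "Full document content here..."}])
doc = asyncio.run(lookup.get_by_id("doc-12345"))
absent = asyncio.run(lookup.get_by_id("unknown"))
```

A provider like this is also handy as a test double when exercising the retrieval pipeline without a live database.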
License
MIT License - Copyright (c) 2025 Microsoft Corporation