Snowflake Cortex Search vector database provider for NLWeb

NLWeb Snowflake Cortex Search Provider

Snowflake Cortex Search vector database provider for NLWeb, enabling hybrid search capabilities using Snowflake's Cortex Search Service.

Features

  • Cortex Search Integration: Native integration with Snowflake Cortex Search Service
  • REST API Based: Uses Snowflake's REST API for search operations
  • Hybrid Search: Combines vector similarity with keyword search
  • Site Filtering: Filter search results by site or URL
  • PAT Authentication: Secure authentication using Programmatic Access Tokens
  • Async Support: All client methods are coroutines (async/await), so searches can run concurrently without blocking the event loop

Installation

pip install nlweb-snowflake-vectordb

Configuration

Configure the Snowflake Cortex Search endpoint in your config.yaml:

retrieval_endpoints:
  snowflake_prod:
    db_type: snowflake_cortex_search
    api_endpoint: "https://your-account.snowflakecomputing.com"
    api_key: "${SNOWFLAKE_PAT}"
    index_name: "MY_DATABASE.MY_SCHEMA.MY_SEARCH_SERVICE"
    vector_dimensions: 1024

The index_name should be in the format: <database>.<schema>.<service>
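
As a rough illustration of why the three-part name matters, the sketch below splits index_name and builds the query URL used by Snowflake's Cortex Search REST API. The helper name cortex_query_url is hypothetical and not part of this package; the path layout follows Snowflake's documented REST endpoint, but how this provider builds its requests internally is an assumption.

```python
def cortex_query_url(api_endpoint: str, index_name: str) -> str:
    """Build a Cortex Search query URL from config values.

    Hypothetical helper for illustration only; not part of this package.
    Assumes plain (unquoted) identifiers that contain no dots.
    """
    parts = index_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"index_name must be <database>.<schema>.<service>, got {index_name!r}"
        )
    database, schema, service = parts
    base = api_endpoint.rstrip("/")
    return (
        f"{base}/api/v2/databases/{database}/schemas/{schema}"
        f"/cortex-search-services/{service}:query"
    )


print(
    cortex_query_url(
        "https://your-account.snowflakecomputing.com",
        "MY_DATABASE.MY_SCHEMA.MY_SEARCH_SERVICE",
    )
)
```

A name with fewer or more than three parts raises ValueError here, which surfaces a misconfigured index_name before any request is sent.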

Usage

Basic Search

import asyncio

from nlweb_snowflake_vectordb import SnowflakeCortexClient


async def main():
    # Initialize the client from the "snowflake_prod" entry in config.yaml
    client = SnowflakeCortexClient(endpoint_name="snowflake_prod")

    # Search for documents on a single site
    results = await client.search(
        query="machine learning models",
        site="docs.example.com",
        num_results=10,
    )

    # Each result is a (url, schema_json, name, site) tuple
    for url, schema_json, name, site in results:
        print(f"{name}: {url}")


asyncio.run(main())
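
Each result carries its metadata in schema_json. The sketch below shows one way to unpack it, assuming schema_json is a JSON string holding a schema.org-style object; that shape is an assumption, and describe_result is a hypothetical helper, not part of this package.

```python
import json


def describe_result(result):
    """Flatten one (url, schema_json, name, site) tuple into a dict.

    Hypothetical helper; assumes schema_json is a JSON-encoded object.
    """
    url, schema_json, name, site = result
    try:
        data = json.loads(schema_json)
    except (TypeError, ValueError):
        # Missing or malformed metadata: fall back to an empty object
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "url": url,
        "name": name,
        "site": site,
        "type": data.get("@type"),
        "description": data.get("description"),
    }


# Sample data for illustration only
sample = (
    "https://docs.example.com/ml-guide",
    '{"@type": "Article", "description": "An ML guide"}',
    "ML Guide",
    "docs.example.com",
)
print(describe_result(sample))
```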

Search by URL

# Find a specific document by URL
# (run inside an async function, reusing the client created above)
results = await client.search_by_url(
    url="https://docs.example.com/ml-guide",
    query="machine learning",
)

Get Available Sites

# Get the list of all indexed sites (run inside an async function)
sites = await client.get_sites()
print(f"Available sites: {sites}")
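
Because the client methods are coroutines, get_sites() and search() can be combined to query every indexed site concurrently. The following is a minimal sketch: search_all_sites is a hypothetical helper, and it relies only on the method names and keyword arguments shown in this README, so any object exposing them will work.

```python
import asyncio


async def search_all_sites(client, query, num_results=5):
    """Run the same query against every indexed site concurrently.

    Hypothetical helper; `client` is expected to expose the async
    get_sites() and search() methods shown in this README.
    """
    sites = await client.get_sites()
    # One search coroutine per site, awaited together
    per_site = await asyncio.gather(
        *(
            client.search(query=query, site=site, num_results=num_results)
            for site in sites
        )
    )
    # asyncio.gather preserves input order, so results line up with sites
    return dict(zip(sites, per_site))
```

Inside an async function, call it as: results_by_site = await search_all_sites(client, "machine learning models"). For a large number of sites, consider bounding concurrency (for example with asyncio.Semaphore) to avoid flooding the service.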

API Reference

SnowflakeCortexClient

Main client for Snowflake Cortex Search operations.

Methods

  • search(query, site, num_results, **kwargs): Async. Search for documents by query, restricted to a site; yields (url, schema_json, name, site) tuples
  • search_by_url(url, query, **kwargs): Async. Search for a specific document by URL
  • get_sites(**kwargs): Async. Get the list of unique site names
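
To make the mapping from search() arguments to a Cortex Search request concrete, here is a sketch of a request body in the shape Snowflake's REST API accepts (query, columns, filter, limit). The function build_search_payload is hypothetical, and the exact body this provider sends is an assumption. Filtering on site also assumes that column was declared as a filterable attribute when the service was created.

```python
def build_search_payload(query, site=None, num_results=10):
    """Build a Cortex Search query body.

    Hypothetical helper for illustration; not this package's internals.
    """
    payload = {
        "query": query,
        # Columns this provider expects the service to expose
        "columns": ["url", "site", "schema_json"],
        "limit": num_results,
    }
    if site:
        # Cortex Search filter syntax: exact match on one attribute column
        payload["filter"] = {"@eq": {"site": site}}
    return payload


print(build_search_payload("machine learning models", site="docs.example.com"))
```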

Snowflake Cortex Search Service

This provider requires a Snowflake Cortex Search Service with the following columns:

  • url: Document URL (TEXT)
  • site: Site name (TEXT)
  • schema_json: Schema metadata (TEXT/VARIANT)

The search service should be created with vector embeddings enabled.

Requirements

  • Python 3.10+
  • nlweb-core >= 0.5.5
  • httpx >= 0.28.1
  • Active Snowflake account with Cortex Search enabled
  • Valid Programmatic Access Token (PAT)

Note on Data Ingestion

Unlike other NLWeb vector database providers, this provider does not support programmatic document upload. Load your data into Snowflake tables with Snowflake's native data loading tools (COPY INTO, Snowpipe, etc.), then create the Cortex Search Service on top of those tables.

This package provides read-only access to existing Cortex Search Services.

License

MIT License - see LICENSE file for details.

