Semantic search API server using vector databases and ML embeddings

SemWare 🚀

A high-performance semantic search API server built with modern Python technologies. SemWare provides REST APIs for vector-based document storage, embedding generation, and similarity search using state-of-the-art machine learning models.

✨ Features

  • 🚄 High Performance: Built on FastAPI with automatic async/await support
  • 🧠 Smart Embeddings: Supports multiple embedding models (all-MiniLM-L6-v2, EmbeddingGemma-300M)
  • 🔍 Advanced Search: Similarity threshold and top-k search with sub-second response times
  • 🛡️ Secure: API key authentication with Bearer token support
  • 📊 Vector Storage: Powered by LanceDB for efficient vector operations
  • 🔧 Developer Friendly: Comprehensive OpenAPI docs, type hints, and test coverage
  • 📈 Scalable: Handles documents of any length with intelligent text batching
  • 🏗️ Production Ready: Comprehensive logging, error handling, and monitoring

🏛️ Architecture

SemWare follows a clean architecture pattern with separate layers:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI       │    │   Services      │    │   Storage       │
│   REST APIs     │───▶│   Business      │───▶│   LanceDB       │
│   (Routes)      │    │   Logic         │    │   Vector DB     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │   ML Models     │
                       │   Embeddings    │
                       │   (HuggingFace) │
                       └─────────────────┘

Core Components:

  • Table Management: Create custom schemas for different document types
  • Data Operations: CRUD operations with automatic embedding generation
  • Semantic Search: Vector similarity search with configurable parameters
  • Text Processing: Smart tokenization and batching for long documents

🚀 Quick Start

Installation

Using uv (Recommended):

git clone https://github.com/your-org/semware.git
cd semware
uv sync --native-tls

Using pip:

git clone https://github.com/your-org/semware.git
cd semware
pip install -e .

Configuration

Create a .env file:

# Required
API_KEY=your-super-secret-api-key-here

# Optional (with defaults)
DEBUG=false
DB_PATH=./data
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384
MAX_TOKENS_PER_BATCH=2000

Start the Server

Simple Command (Recommended):

# Start with default settings from .env
semware

# Start with custom options
semware --debug --port 8080
semware --workers 4 --host 127.0.0.1
semware --reload  # Development mode with auto-reload

Alternative Methods:

# Using uv directly
uv run --native-tls semware

# Using Python module
uv run --native-tls python -m semware.main

# Using uvicorn directly
uv run --native-tls uvicorn semware.main:app --host 0.0.0.0 --port 8000 --workers 4

The server will be available at http://localhost:8000. Interactive API documentation is served at /docs when debug mode is enabled.

CLI Options

The semware command supports these options:

semware --help               Show help message
semware --version            Show version
semware --debug              Enable debug mode & API docs
semware --reload             Development mode with auto-reload
semware --host 127.0.0.1     Bind to specific host
semware --port 8080          Use custom port
semware --workers 4          Number of worker processes
semware --log-level DEBUG    Set logging level

📚 API Reference

Authentication

All endpoints require authentication using one of:

  • Header: X-API-Key: your-api-key
  • Bearer Token: Authorization: Bearer your-api-key
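
A small client-side sketch of the two header styles listed above. The `auth_headers` helper is a hypothetical name introduced for illustration only.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Return the header for one of the two documented authentication styles."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    if use_bearer:
        # Authorization: Bearer your-api-key
        return {"Authorization": f"Bearer {api_key}"}
    # X-API-Key: your-api-key
    return {"X-API-Key": api_key}
```

Either dictionary can be merged into the headers of any HTTP client request.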

🗂️ Table Management

Create Table

Create a new table with custom schema.

POST /tables
Content-Type: application/json
X-API-Key: your-api-key

{
  "schema": {
    "name": "research_papers",
    "columns": {
      "id": "string",
      "title": "string", 
      "abstract": "string",
      "authors": "string",
      "year": "int",
      "doi": "string"
    },
    "id_column": "id",
    "embedding_column": "abstract"
  }
}

Response (201):

{
  "message": "Table 'research_papers' created successfully",
  "table_name": "research_papers"
}
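
The same request can be assembled from Python with only the standard library. This is a sketch: `build_create_table_request` is a hypothetical helper, the base URL assumes the default host and port from the Quick Start, and the check that both special columns exist in `columns` is a client-side sanity check rather than documented server behavior.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default from the Quick Start


def build_create_table_request(api_key, name, columns, id_column, embedding_column):
    """Build (but do not send) the POST /tables request shown above."""
    if id_column not in columns or embedding_column not in columns:
        raise ValueError("id_column and embedding_column must be defined in columns")
    body = {
        "schema": {
            "name": name,
            "columns": columns,
            "id_column": id_column,
            "embedding_column": embedding_column,
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/tables",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_create_table_request(
    "your-api-key",
    "research_papers",
    {"id": "string", "title": "string", "abstract": "string", "year": "int"},
    id_column="id",
    embedding_column="abstract",
)
# To send it against a running server:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, json.load(resp))
```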

List Tables

Get all available tables.

GET /tables
X-API-Key: your-api-key

Response (200):

{
  "tables": ["research_papers", "product_docs", "customer_support"],
  "count": 3
}

Get Table Info

Get detailed information about a specific table.

GET /tables/research_papers
X-API-Key: your-api-key

Response (200):

{
  "table_name": "research_papers",
  "schema": {
    "name": "research_papers",
    "columns": {
      "id": "string",
      "title": "string",
      "abstract": "string",
      "authors": "string", 
      "year": "int",
      "doi": "string"
    },
    "id_column": "id",
    "embedding_column": "abstract"
  },
  "record_count": 1547,
  "created_at": "2024-01-15T10:30:00Z"
}

Delete Table

Delete a table and all its data.

DELETE /tables/research_papers
X-API-Key: your-api-key

Response (200):

{
  "message": "Table 'research_papers' deleted successfully",
  "table_name": "research_papers"
}

📄 Data Operations

Insert/Update Documents

Insert new documents or update existing ones. Embeddings are generated automatically.

POST /tables/research_papers/data
Content-Type: application/json
X-API-Key: your-api-key

{
  "records": [
    {
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      }
    },
    {
      "data": {
        "id": "paper_002", 
        "title": "BERT: Pre-training of Deep Bidirectional Transformers",
        "abstract": "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      }
    }
  ]
}

Response (201):

{
  "message": "Successfully processed 2 records",
  "inserted_count": 2,
  "updated_count": 0,
  "processing_time_ms": 1247.3
}
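
The request body wraps each document in a "data" object inside a "records" list. A minimal sketch of building that envelope from plain dictionaries follows; `build_upsert_payload` is a hypothetical helper, and the missing-id and duplicate-id checks are client-side assumptions, not documented server rules.

```python
import json


def build_upsert_payload(documents, id_column="id"):
    """Wrap plain dicts in the {"records": [{"data": ...}]} envelope."""
    records = []
    seen_ids = set()
    for doc in documents:
        if id_column not in doc:
            raise ValueError(f"document is missing the id column {id_column!r}")
        if doc[id_column] in seen_ids:
            raise ValueError(f"duplicate id in batch: {doc[id_column]!r}")
        seen_ids.add(doc[id_column])
        records.append({"data": dict(doc)})  # copy so the caller's dict is untouched
    return {"records": records}


payload = build_upsert_payload([
    {"id": "paper_001", "title": "Attention Is All You Need", "year": 2017},
    {"id": "paper_002", "title": "BERT: Pre-training of Deep Bidirectional Transformers", "year": 2018},
])
body = json.dumps(payload)  # JSON string to POST to /tables/research_papers/data
```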

Get Document

Retrieve a specific document by ID.

GET /tables/research_papers/data/paper_001
X-API-Key: your-api-key

Response (200):

{
  "table_name": "research_papers",
  "record_id": "paper_001",
  "data": {
    "id": "paper_001",
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on complex recurrent...",
    "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
    "year": 2017,
    "doi": "10.48550/arXiv.1706.03762"
  }
}

Delete Document

Remove a document from the table.

DELETE /tables/research_papers/data/paper_001
X-API-Key: your-api-key

Response (200):

{
  "message": "Record 'paper_001' deleted successfully",
  "table_name": "research_papers",
  "deleted_id": "paper_001"
}

🔍 Search Operations

Similarity Search

Find all documents with similarity above a threshold.

POST /tables/research_papers/search/similarity
Content-Type: application/json
X-API-Key: your-api-key

{
  "query": "transformer neural network attention mechanism",
  "threshold": 0.7,
  "limit": 10
}

Response (200):

{
  "query": "transformer neural network attention mechanism",
  "results": [
    {
      "id": "paper_001",
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      },
      "similarity_score": 0.89
    },
    {
      "id": "paper_002",
      "data": {
        "id": "paper_002",
        "title": "BERT: Pre-training of Deep Bidirectional Transformers", 
        "abstract": "We introduce a new language representation model...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      },
      "similarity_score": 0.76
    }
  ],
  "total_results": 2,
  "search_time_ms": 23.4,
  "threshold": 0.7
}
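
To make the threshold semantics concrete, here is a pure-Python sketch of a threshold search over toy 2-dimensional vectors. It assumes the score is cosine similarity, which this README does not state explicitly, and the function names are illustrative rather than SemWare internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vec, docs, threshold, limit):
    """Return up to `limit` (doc_id, score) pairs with score >= threshold, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]


docs = {
    "paper_001": [1.0, 0.0],
    "paper_002": [0.8, 0.6],
    "paper_003": [0.0, 1.0],
}
hits = similarity_search([1.0, 0.0], docs, threshold=0.7, limit=10)
# hits lists paper_001 first, then paper_002; paper_003 falls below the threshold
```

The number of results depends on the data: a strict threshold can return nothing, while `limit` only caps the list.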

Top-K Search

Find the K most similar documents.

POST /tables/research_papers/search/top-k
Content-Type: application/json
X-API-Key: your-api-key

{
  "query": "natural language processing BERT",
  "k": 5
}

Response (200):

{
  "query": "natural language processing BERT",
  "results": [
    {
      "id": "paper_002",
      "data": {
        "id": "paper_002",
        "title": "BERT: Pre-training of Deep Bidirectional Transformers",
        "abstract": "We introduce a new language representation model...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee", 
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      },
      "similarity_score": 0.94
    },
    {
      "id": "paper_001", 
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      },
      "similarity_score": 0.81
    }
  ],
  "total_results": 2,
  "search_time_ms": 31.7,
  "k": 5
}
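
By contrast with the threshold search, a top-k search ranks every candidate and keeps the best k regardless of score. The sketch below assumes unit-length vectors, so the dot product can serve as the similarity score; `top_k` is an illustrative name, not SemWare's internal API.

```python
import heapq


def top_k(query_vec, docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = ((doc_id, dot(query_vec, vec)) for doc_id, vec in docs.items())
    # nlargest returns its results sorted from largest to smallest
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


docs = {
    "paper_001": [1.0, 0.0],
    "paper_002": [0.8, 0.6],
    "paper_003": [0.0, 1.0],
}
best_two = top_k([1.0, 0.0], docs, k=2)
```

When the collection holds fewer than k documents, the function simply returns all of them.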

❤️ Health Check

GET /health

Response (200):

{
  "status": "healthy",
  "app_name": "SemWare",
  "version": "0.1.0", 
  "timestamp": "2024-01-15T14:30:25.123456"
}

🧠 Embedding Process

SemWare turns each document into a single embedding vector in four steps:

1. Text Tokenization

  • Long texts are split into chunks that fit within the per-batch token limit
  • Uses tiktoken with cl100k_base encoding for token counting
  • Default batch size: 2000 tokens, configurable via MAX_TOKENS_PER_BATCH

2. Batch Processing

  • Each text chunk is processed through the embedding model
  • Supports multiple embedding models via Hugging Face transformers
  • Automatic GPU acceleration when available

3. Embedding Aggregation

  • Multiple batch embeddings are combined using average pooling
  • Preserves semantic meaning across the entire document
  • Results in one 384-dimensional vector per document (with all-MiniLM-L6-v2)

4. Normalization & Storage

  • Final embeddings are L2 normalized for consistent similarity scoring
  • Stored efficiently in LanceDB with optimized vector indexing
  • Enables sub-second search across millions of documents
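
The four steps can be sketched in pure Python as follows. This is an illustration under stated assumptions, not SemWare's implementation: whitespace splitting stands in for tiktoken's cl100k_base token counting, `toy_embed` is a made-up 2-dimensional embedder standing in for the real model, and all function names are hypothetical.

```python
import math


def split_into_batches(tokens, max_tokens=2000):
    """Step 1: split a token list into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def average_pool(vectors):
    """Step 3: element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def l2_normalize(vector):
    """Step 4: scale to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def embed_document(text, embed_chunk, max_tokens=2000):
    """Chunk the text, embed each chunk (step 2), average, then normalize."""
    tokens = text.split()  # stand-in tokenizer, NOT tiktoken
    batches = split_into_batches(tokens, max_tokens) or [[]]
    chunk_vectors = [embed_chunk(" ".join(batch)) for batch in batches]
    return l2_normalize(average_pool(chunk_vectors))


def toy_embed(chunk):
    """Hypothetical embedder for the demo: (word count, character count)."""
    return [float(len(chunk.split())), float(len(chunk))]


vector = embed_document("a bb ccc dddd", toy_embed, max_tokens=2)
```

Because the final vector has unit length, the dot product of two such vectors equals their cosine similarity, which keeps scores comparable across documents of different lengths.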

🛠️ Development

Running Tests

# Run all tests with coverage
uv run --native-tls pytest --cov=src --cov-report=html

# Run specific test file
uv run --native-tls pytest tests/test_api/test_search.py -v

# Run with debug output
uv run --native-tls pytest -s --log-cli-level=DEBUG

Code Quality

# Format code
uv run --native-tls ruff format src/ tests/

# Lint and fix issues
uv run --native-tls ruff check src/ tests/ --fix

# Type checking
uv run --native-tls mypy src/

API Documentation

Start the server with DEBUG=true in your .env and visit http://localhost:8000/docs.

📁 Project Structure

SemWare/
├── src/semware/
│   ├── api/                    # FastAPI route handlers
│   │   ├── __init__.py
│   │   ├── auth.py            # Authentication middleware
│   │   ├── data.py            # Data CRUD operations
│   │   ├── search.py          # Search endpoints
│   │   └── tables.py          # Table management
│   ├── models/                 # Pydantic data models
│   │   ├── __init__.py
│   │   ├── requests.py        # Request/response models
│   │   └── schemas.py         # Core data schemas
│   ├── services/              # Business logic services
│   │   ├── __init__.py
│   │   ├── embedding.py       # ML embedding generation
│   │   ├── search.py          # Search orchestration
│   │   └── vectordb.py        # Vector database operations
│   ├── utils/                 # Utility functions
│   │   ├── __init__.py
│   │   ├── logging.py         # Logging configuration
│   │   └── tokenizer.py       # Text tokenization
│   ├── config.py              # Configuration management
│   └── main.py                # FastAPI application factory
├── tests/                     # Comprehensive test suite
│   ├── conftest.py           # Test configuration & fixtures
│   ├── test_api/             # API endpoint tests
│   ├── test_services/        # Service layer tests
│   └── test_utils/           # Utility function tests
├── pyproject.toml            # Project configuration
├── .env.example             # Environment template
└── README.md               # This file

⚙️ Configuration Reference

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| API_KEY | Authentication key for all endpoints | - | ✅ |
| DEBUG | Enable debug mode and API docs | false | ❌ |
| DB_PATH | Database storage directory | ./data | ❌ |
| HOST | Server bind address | 0.0.0.0 | ❌ |
| PORT | Server port | 8000 | ❌ |
| LOG_LEVEL | Logging level (DEBUG/INFO/WARNING/ERROR) | INFO | ❌ |
| LOG_FILE | Log file path (optional) | - | ❌ |
| EMBEDDING_MODEL_NAME | Hugging Face model name | all-MiniLM-L6-v2 | ❌ |
| EMBEDDING_DIMENSION | Embedding vector dimensions | 384 | ❌ |
| MAX_TOKENS_PER_BATCH | Max tokens per embedding batch | 2000 | ❌ |
| WORKERS | Number of server workers | 1 | ❌ |

🚢 Deployment

Docker

FROM python:3.11-slim

WORKDIR /app
COPY . .

RUN pip install uv
RUN uv sync --native-tls

EXPOSE 8000
CMD ["uv", "run", "--native-tls", "uvicorn", "semware.main:app", "--host", "0.0.0.0", "--port", "8000"]

Production Considerations

  • Use multiple workers: --workers 4
  • Enable access logs: --access-log
  • Set up reverse proxy (nginx) for HTTPS termination
  • Configure log rotation and monitoring
  • Use a dedicated vector storage solution for large scale

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run the test suite: uv run --native-tls pytest
  5. Submit a pull request

📊 Performance

Benchmarks (on Apple M2 Pro, 16GB RAM):

  • Embedding Generation: ~200ms per batch (2000 tokens)
  • Document Insertion: ~500ms per document (including embedding)
  • Vector Search: <50ms for similarity search across 10K documents
  • Throughput: ~100 requests/second with 4 workers

🐛 Troubleshooting

Common Issues

Authentication Errors

# Ensure API key is set correctly
export API_KEY=your-secret-key
# Or check your .env file

Model Download Issues

# Clear Hugging Face cache
rm -rf ~/.cache/huggingface/
# Restart with debug logging
DEBUG=true uv run --native-tls python -m semware.main

Database Permissions

# Ensure write permissions to data directory
chmod 755 ./data

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • FastAPI for the excellent async web framework
  • LanceDB for high-performance vector storage
  • Hugging Face for the transformer models and ecosystem
  • Pydantic for robust data validation
  • The Python Community for the amazing open-source ecosystem

Built with ❤️ by the SemWare team

Report Bug • Discussions • Wiki
