Semantic search API server using vector databases and ML embeddings

SemWare 🚀

A high-performance semantic search API server built with modern Python technologies. SemWare provides REST APIs for vector-based document storage, embedding generation, and similarity search using state-of-the-art machine learning models.

✨ Features

🚄 High Performance: Built on FastAPI with automatic async/await support
🧠 Smart Embeddings: Supports multiple embedding models (all-MiniLM-L6-v2, EmbeddingGemma-300M)
🔍 Advanced Search: Similarity threshold and top-k search with sub-second response times
🛡️ Secure: API key authentication with Bearer token support
📊 Vector Storage: Powered by LanceDB for efficient vector operations
🔧 Developer Friendly: Comprehensive OpenAPI docs, type hints, and test coverage
📈 Scalable: Handles long documents by splitting text into token-bounded batches
🏗️ Production Ready: Comprehensive logging, error handling, and monitoring

🏛️ Architecture

SemWare follows a clean architecture pattern with separate layers:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI       │    │   Services      │    │   Storage       │
│   REST APIs     │───▶│   Business      │───▶│   LanceDB       │
│   (Routes)      │    │   Logic         │    │   Vector DB     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                       ┌─────────────────┐
                       │   ML Models     │
                       │   Embeddings    │
                       │   (HuggingFace) │
                       └─────────────────┘

Core Components:

Table Management: Create custom schemas for different document types
Data Operations: CRUD operations with automatic embedding generation
Semantic Search: Vector similarity search with configurable parameters
Text Processing: Smart tokenization and batching for long documents

🚀 Quick Start

Installation

Using uv (Recommended):

git clone https://github.com/your-org/semware.git
cd semware
uv sync --native-tls

Using pip:

git clone https://github.com/your-org/semware.git
cd semware
pip install -e .

Configuration

Create a .env file:

# Required
API_KEY=your-super-secret-api-key-here

# Optional (with defaults)
DEBUG=false
DB_PATH=./data
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384
MAX_TOKENS_PER_BATCH=2000
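
To make the defaults above concrete, here is a minimal sketch of how such variables could be read into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names for illustration and are not taken from SemWare's own config.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical holder for the variables listed in the .env example."""
    api_key: str
    debug: bool = False
    db_path: str = "./data"
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "INFO"
    embedding_model_name: str = "all-MiniLM-L6-v2"
    embedding_dimension: int = 384
    max_tokens_per_batch: int = 2000


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping; falls back to os.environ."""
    env = os.environ if env is None else env
    if not env.get("API_KEY"):
        # API_KEY is the only variable documented as required.
        raise ValueError("API_KEY is required")
    return Settings(
        api_key=env["API_KEY"],
        debug=env.get("DEBUG", "false").lower() == "true",
        db_path=env.get("DB_PATH", "./data"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
        embedding_dimension=int(env.get("EMBEDDING_DIMENSION", "384")),
        max_tokens_per_batch=int(env.get("MAX_TOKENS_PER_BATCH", "2000")),
    )
```

Calling `load_settings({"API_KEY": "k", "PORT": "8080"})` would yield port 8080 and keep every other default.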

Start the Server

Simple Command (Recommended):

# Start with default settings from .env
semware

# Start with custom options
semware --debug --port 8080
semware --workers 4 --host 127.0.0.1
semware --reload  # Development mode with auto-reload

Alternative Methods:

# Using uv directly
uv run --native-tls semware

# Using Python module
uv run --native-tls python -m semware.main

# Using uvicorn directly
uv run --native-tls uvicorn semware.main:app --host 0.0.0.0 --port 8000 --workers 4

The server will be available at http://localhost:8000. The interactive API documentation at /docs is served when debug mode is enabled (--debug or DEBUG=true).

CLI Options

The semware command supports these options:

semware --help               Show help message
semware --version            Show version
semware --debug              Enable debug mode and API docs
semware --reload             Development mode with auto-reload
semware --host 127.0.0.1     Bind to a specific host
semware --port 8080          Use a custom port
semware --workers 4          Number of worker processes
semware --log-level DEBUG    Set the logging level

📚 API Reference

Authentication

All endpoints require authentication using one of:

Header: X-API-Key: your-api-key
Bearer Token: Authorization: Bearer your-api-key
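
As an illustration of the two header styles, the following client-side sketch builds either header with the Python standard library. The helper names `auth_headers` and `build_request` are assumptions for this example, and the base URL is the default host and port from the Quick Start.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # default from the Quick Start section


def auth_headers(api_key: str, use_bearer: bool = False) -> dict:
    """Return the header for one of the two documented auth styles."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def build_request(path: str, api_key: str, use_bearer: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    return urllib.request.Request(BASE_URL + path, headers=auth_headers(api_key, use_bearer))
```

A prepared request can then be sent with `urllib.request.urlopen(build_request("/tables", "your-api-key"))` once the server is running.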

🗂️ Table Management

Create Table

Create a new table with custom schema.

POST /tables
Content-Type: application/json
X-API-Key: your-api-key

{
  "schema": {
    "name": "research_papers",
    "columns": {
      "id": "string",
      "title": "string", 
      "abstract": "string",
      "authors": "string",
      "year": "int",
      "doi": "string"
    },
    "id_column": "id",
    "embedding_column": "abstract"
  }
}

Response (201):

{
  "message": "Table 'research_papers' created successfully",
  "table_name": "research_papers"
}
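
The same request can be prepared from Python. This is a sketch using only the standard library; `create_table_request` is a hypothetical helper, and the check that both special columns exist in `columns` is a client-side sanity check assumed here, not documented server behavior.

```python
import json
import urllib.request


def create_table_request(base_url, api_key, name, columns, id_column, embedding_column):
    """Prepare a POST /tables request with the schema envelope shown above."""
    for special in (id_column, embedding_column):
        if special not in columns:
            # Assumed sanity check: both special columns should be declared.
            raise ValueError(f"column {special!r} is not declared in columns")
    payload = {
        "schema": {
            "name": name,
            "columns": columns,
            "id_column": id_column,
            "embedding_column": embedding_column,
        }
    }
    return urllib.request.Request(
        f"{base_url}/tables",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; a 201 response is expected on success, as documented above.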

List Tables

Get all available tables.

GET /tables
X-API-Key: your-api-key

Response (200):

{
  "tables": ["research_papers", "product_docs", "customer_support"],
  "count": 3
}

Get Table Info

Get detailed information about a specific table.

GET /tables/research_papers
X-API-Key: your-api-key

Response (200):

{
  "table_name": "research_papers",
  "schema": {
    "name": "research_papers",
    "columns": {
      "id": "string",
      "title": "string",
      "abstract": "string",
      "authors": "string", 
      "year": "int",
      "doi": "string"
    },
    "id_column": "id",
    "embedding_column": "abstract"
  },
  "record_count": 1547,
  "created_at": "2024-01-15T10:30:00Z"
}

Delete Table

Delete a table and all its data.

DELETE /tables/research_papers
X-API-Key: your-api-key

Response (200):

{
  "message": "Table 'research_papers' deleted successfully",
  "table_name": "research_papers"
}

📄 Data Operations

Insert/Update Documents

Insert new documents or update existing ones. Embeddings are generated automatically.

POST /tables/research_papers/data
Content-Type: application/json
X-API-Key: your-api-key

{
  "records": [
    {
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      }
    },
    {
      "data": {
        "id": "paper_002", 
        "title": "BERT: Pre-training of Deep Bidirectional Transformers",
        "abstract": "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      }
    }
  ]
}

Response (201):

{
  "message": "Successfully processed 2 records",
  "inserted_count": 2,
  "updated_count": 0,
  "processing_time_ms": 1247.3
}
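
Each record is wrapped in a "data" object inside a "records" list. A small sketch of building that envelope from plain dictionaries follows; `build_upsert_payload` is a hypothetical helper, and rejecting missing or duplicate IDs on the client is an assumption made for the example.

```python
def build_upsert_payload(documents, id_column="id"):
    """Wrap plain dicts in the {"records": [{"data": ...}]} envelope."""
    records = []
    seen_ids = set()
    for doc in documents:
        doc_id = doc.get(id_column)
        if doc_id is None:
            raise ValueError(f"document is missing the {id_column!r} field")
        if doc_id in seen_ids:
            # Assumed client-side guard; the server's handling of duplicates
            # within one request is not documented here.
            raise ValueError(f"duplicate id in batch: {doc_id!r}")
        seen_ids.add(doc_id)
        records.append({"data": doc})
    return {"records": records}
```

The result can be serialized with `json.dumps` and posted to /tables/{table_name}/data.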

Get Document

Retrieve a specific document by ID.

GET /tables/research_papers/data/paper_001
X-API-Key: your-api-key

Response (200):

{
  "table_name": "research_papers",
  "record_id": "paper_001",
  "data": {
    "id": "paper_001",
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on complex recurrent...",
    "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
    "year": 2017,
    "doi": "10.48550/arXiv.1706.03762"
  }
}

Delete Document

Remove a document from the table.

DELETE /tables/research_papers/data/paper_001
X-API-Key: your-api-key

Response (200):

{
  "message": "Record 'paper_001' deleted successfully",
  "table_name": "research_papers",
  "deleted_id": "paper_001"
}

🔍 Search Operations

Similarity Search

Find all documents with similarity above a threshold.

POST /tables/research_papers/search/similarity
Content-Type: application/json
X-API-Key: your-api-key

{
  "query": "transformer neural network attention mechanism",
  "threshold": 0.7,
  "limit": 10
}

Response (200):

{
  "query": "transformer neural network attention mechanism",
  "results": [
    {
      "id": "paper_001",
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      },
      "similarity_score": 0.89
    },
    {
      "id": "paper_002",
      "data": {
        "id": "paper_002",
        "title": "BERT: Pre-training of Deep Bidirectional Transformers", 
        "abstract": "We introduce a new language representation model...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      },
      "similarity_score": 0.76
    }
  ],
  "total_results": 2,
  "search_time_ms": 23.4,
  "threshold": 0.7
}
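
To show what a threshold search computes conceptually, here is a brute-force sketch over in-memory vectors. It is not SemWare's implementation (which delegates to LanceDB); the function names are hypothetical, and treating the threshold as inclusive is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vec, doc_vectors, threshold, limit):
    """Return up to `limit` (id, score) pairs with score >= threshold, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vectors.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```

Unlike top-k search, the number of results here depends on the data: a high threshold can return nothing at all.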

Top-K Search

Find the K most similar documents.

POST /tables/research_papers/search/top-k
Content-Type: application/json
X-API-Key: your-api-key

{
  "query": "natural language processing BERT",
  "k": 5
}

Response (200):

{
  "query": "natural language processing BERT",
  "results": [
    {
      "id": "paper_002",
      "data": {
        "id": "paper_002",
        "title": "BERT: Pre-training of Deep Bidirectional Transformers",
        "abstract": "We introduce a new language representation model...",
        "authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee", 
        "year": 2018,
        "doi": "10.48550/arXiv.1810.04805"
      },
      "similarity_score": 0.94
    },
    {
      "id": "paper_001", 
      "data": {
        "id": "paper_001",
        "title": "Attention Is All You Need",
        "abstract": "The dominant sequence transduction models...",
        "authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
        "year": 2017,
        "doi": "10.48550/arXiv.1706.03762"
      },
      "similarity_score": 0.81
    }
  ],
  "total_results": 2,
  "search_time_ms": 31.7,
  "k": 5
}
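
Top-k selection can be sketched with a heap over precomputed scores. This illustrates the ranking step only, under the assumption that similarity scores are already available; `top_k` is a hypothetical name, not a SemWare function.

```python
import heapq


def top_k(scores, k):
    """Return the k highest-scoring (id, score) pairs, best first."""
    if k <= 0:
        return []
    # nlargest returns fewer than k items when fewer are available.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

In this sketch, asking for k=5 over a collection of two documents simply returns both, ordered by score.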

❤️ Health Check

GET /health

Response (200):

{
  "status": "healthy",
  "app_name": "SemWare",
  "version": "0.1.0", 
  "timestamp": "2024-01-15T14:30:25.123456"
}

🧠 Embedding Process

SemWare turns each document into a single embedding vector in four steps:

1. Text Tokenization

Long texts are intelligently split into manageable chunks
Uses tiktoken with cl100k_base encoding for precise token counting
Default batch size: 2000 tokens with configurable limits
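
A minimal sketch of token-bounded batching follows. To stay dependency-free it splits on whitespace as a stand-in tokenizer; with tiktoken installed, `tiktoken.get_encoding("cl100k_base").encode(text)` would supply the token list instead. The function name is hypothetical.

```python
def split_into_batches(tokens, max_tokens=2000):
    """Split a token list into consecutive batches of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Whitespace splitting is only a placeholder for a real tokenizer.
example_tokens = ("word " * 4500).split()
example_batches = split_into_batches(example_tokens)
```

With 4500 tokens and the default limit, this produces batches of 2000, 2000, and 500 tokens.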

2. Batch Processing

Each text chunk is processed through the embedding model
Supports multiple embedding models via Hugging Face transformers
Automatic GPU acceleration when available

3. Embedding Aggregation

Multiple batch embeddings are combined using average pooling
Preserves semantic meaning across the entire document
Results in high-quality 384-dimensional vectors (MiniLM)
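
Average pooling can be sketched in a few lines of plain Python. This version assumes an unweighted mean across batches (the source does not say whether batches are weighted by length), and `average_pool` is a hypothetical name.

```python
def average_pool(batch_embeddings):
    """Element-wise mean of equal-length embedding vectors."""
    if not batch_embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(batch_embeddings[0])
    if any(len(vec) != dim for vec in batch_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(batch_embeddings)
    return [sum(vec[i] for vec in batch_embeddings) / count for i in range(dim)]
```

However many batches a document produces, the pooled vector keeps the model's dimension (384 for MiniLM).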

4. Normalization & Storage

Final embeddings are L2 normalized for consistent similarity scoring
Stored efficiently in LanceDB with optimized vector indexing
Enables sub-second search across millions of documents
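
The reason L2 normalization gives consistent scoring is that the dot product of two unit vectors equals their cosine similarity. A small sketch, with hypothetical helper names:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# After normalization, the dot product is the cosine similarity.
unit_a = l2_normalize([3.0, 4.0])
unit_b = l2_normalize([4.0, 3.0])
cosine = dot(unit_a, unit_b)
```

Here `cosine` is approximately 0.96, the same value the full cosine formula gives for the unnormalized vectors.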

🛠️ Development

Running Tests

# Run all tests with coverage
uv run --native-tls pytest --cov=src --cov-report=html

# Run specific test file
uv run --native-tls pytest tests/test_api/test_search.py -v

# Run with debug output
uv run --native-tls pytest -s --log-cli-level=DEBUG

Code Quality

# Format code
uv run --native-tls ruff format src/ tests/

# Lint and fix issues
uv run --native-tls ruff check src/ tests/ --fix

# Type checking
uv run --native-tls mypy src/

API Documentation

Start the server with DEBUG=true in your .env and visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
OpenAPI JSON: http://localhost:8000/openapi.json

📁 Project Structure

SemWare/
├── src/semware/
│   ├── api/                    # FastAPI route handlers
│   │   ├── __init__.py
│   │   ├── auth.py            # Authentication middleware
│   │   ├── data.py            # Data CRUD operations
│   │   ├── search.py          # Search endpoints  
│   │   └── tables.py          # Table management
│   ├── models/                 # Pydantic data models
│   │   ├── __init__.py
│   │   ├── requests.py        # Request/response models
│   │   └── schemas.py         # Core data schemas
│   ├── services/              # Business logic services
│   │   ├── __init__.py
│   │   ├── embedding.py       # ML embedding generation
│   │   ├── search.py          # Search orchestration
│   │   └── vectordb.py        # Vector database operations
│   ├── utils/                 # Utility functions
│   │   ├── __init__.py
│   │   ├── logging.py         # Logging configuration
│   │   └── tokenizer.py       # Text tokenization
│   ├── config.py              # Configuration management
│   └── main.py                # FastAPI application factory
├── tests/                     # Comprehensive test suite
│   ├── conftest.py           # Test configuration & fixtures
│   ├── test_api/             # API endpoint tests
│   ├── test_services/        # Service layer tests
│   └── test_utils/           # Utility function tests
├── pyproject.toml            # Project configuration
├── .env.example             # Environment template
└── README.md               # This file

⚙️ Configuration Reference

| Variable | Description | Default | Required |
|---|---|---|---|
| `API_KEY` | Authentication key for all endpoints | - | ✅ |
| `DEBUG` | Enable debug mode and API docs | `false` | ❌ |
| `DB_PATH` | Database storage directory | `./data` | ❌ |
| `HOST` | Server bind address | `0.0.0.0` | ❌ |
| `PORT` | Server port | `8000` | ❌ |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | `INFO` | ❌ |
| `LOG_FILE` | Log file path (optional) | - | ❌ |
| `EMBEDDING_MODEL_NAME` | Hugging Face model name | `all-MiniLM-L6-v2` | ❌ |
| `EMBEDDING_DIMENSION` | Embedding vector dimensions | `384` | ❌ |
| `MAX_TOKENS_PER_BATCH` | Max tokens per embedding batch | `2000` | ❌ |
| `WORKERS` | Number of server workers | `1` | ❌ |

🚢 Deployment

Docker

FROM python:3.11-slim

WORKDIR /app
COPY . .

RUN pip install uv
RUN uv sync --native-tls

EXPOSE 8000
CMD ["uv", "run", "--native-tls", "uvicorn", "semware.main:app", "--host", "0.0.0.0", "--port", "8000"]

Production Considerations

Use multiple workers: --workers 4
Enable access logs: --access-log
Set up reverse proxy (nginx) for HTTPS termination
Configure log rotation and monitoring
Use a dedicated vector storage solution for large scale

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes and add tests
Run the test suite: uv run --native-tls pytest
Submit a pull request

📊 Performance

Benchmarks (on Apple M2 Pro, 16GB RAM):

Embedding Generation: ~200ms per batch (2000 tokens)
Document Insertion: ~500ms per document (including embedding)
Vector Search: <50ms for similarity search across 10K documents
Throughput: ~100 requests/second with 4 workers

🐛 Troubleshooting

Common Issues

Authentication Errors

# Ensure API key is set correctly
export API_KEY=your-secret-key
# Or check your .env file

Model Download Issues

# Clear Hugging Face cache
rm -rf ~/.cache/huggingface/
# Restart with debug logging
DEBUG=true uv run --native-tls python -m semware.main

Database Permissions

# Ensure write permissions to data directory
chmod 755 ./data

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

FastAPI for the excellent async web framework
LanceDB for high-performance vector storage
Hugging Face for the transformer models and ecosystem
Pydantic for robust data validation
The Python Community for the amazing open-source ecosystem

Built with ❤️ by the SemWare team

Report Bug • Discussions • Wiki

Version 0.1.0, released Sep 10, 2025

