Semantic search API server using vector databases and ML embeddings
SemWare
A high-performance semantic search API server built with modern Python technologies. SemWare provides REST APIs for vector-based document storage, embedding generation, and similarity search using state-of-the-art machine learning models.
Features
- High Performance: Built on FastAPI with automatic async/await support
- Smart Embeddings: Supports multiple embedding models (all-MiniLM-L6-v2, EmbeddingGemma-300M)
- Advanced Search: Similarity threshold and top-k search with sub-second response times
- Secure: API key authentication with Bearer token support
- Vector Storage: Powered by LanceDB for efficient vector operations
- Developer Friendly: Comprehensive OpenAPI docs, type hints, and test coverage
- Scalable: Handles documents of any length with intelligent text batching
- Production Ready: Comprehensive logging, error handling, and monitoring
Architecture
SemWare follows a clean architecture pattern with separate layers:
+-----------------+    +-----------------+    +-----------------+
|    FastAPI      |    |    Services     |    |    Storage      |
|    REST APIs    |--->|    Business     |--->|    LanceDB      |
|    (Routes)     |    |    Logic        |    |    Vector DB    |
+-----------------+    +-----------------+    +-----------------+
                                |
                       +-----------------+
                       |    ML Models    |
                       |    Embeddings   |
                       |  (HuggingFace)  |
                       +-----------------+
Core Components:
- Table Management: Create custom schemas for different document types
- Data Operations: CRUD operations with automatic embedding generation
- Semantic Search: Vector similarity search with configurable parameters
- Text Processing: Smart tokenization and batching for long documents
Quick Start
Installation
Using uv (Recommended):
git clone https://github.com/your-org/semware.git
cd semware
uv sync --native-tls
Using pip:
git clone https://github.com/your-org/semware.git
cd semware
pip install -e .
Configuration
Create a .env file:
# Required
API_KEY=your-super-secret-api-key-here
# Optional (with defaults)
DEBUG=false
DB_PATH=./data
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384
MAX_TOKENS_PER_BATCH=2000
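As an illustration of how these variables and their defaults fit together, the following is a minimal sketch that reads them from a mapping such as `os.environ`. The `Settings` class and `load_settings` function are hypothetical names used only for this example; SemWare's own configuration module may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables listed above."""

    api_key: str
    debug: bool = False
    db_path: str = "./data"
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "INFO"
    embedding_model_name: str = "all-MiniLM-L6-v2"
    embedding_dimension: int = 384
    max_tokens_per_batch: int = 2000


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # API_KEY is the only variable without a default.
    if not env.get("API_KEY"):
        raise ValueError("API_KEY is required")
    return Settings(
        api_key=env["API_KEY"],
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        db_path=env.get("DB_PATH", "./data"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
        embedding_dimension=int(env.get("EMBEDDING_DIMENSION", "384")),
        max_tokens_per_batch=int(env.get("MAX_TOKENS_PER_BATCH", "2000")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.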
Start the Server
Simple Command (Recommended):
# Start with default settings from .env
semware
# Start with custom options
semware --debug --port 8080
semware --workers 4 --host 127.0.0.1
semware --reload # Development mode with auto-reload
Alternative Methods:
# Using uv directly
uv run --native-tls semware
# Using Python module
uv run --native-tls python -m semware.main
# Using uvicorn directly
uv run --native-tls uvicorn semware.main:app --host 0.0.0.0 --port 8000 --workers 4
The server will be available at http://localhost:8000. Interactive API documentation is served at /docs when debug mode is enabled (DEBUG=true or --debug).
CLI Options
The semware command supports these options:
semware --help Show help message
semware --version Show version
semware --debug Enable debug mode & API docs
semware --reload Development mode with auto-reload
semware --host 127.0.0.1 Bind to specific host
semware --port 8080 Use custom port
semware --workers 4 Number of worker processes
semware --log-level DEBUG Set logging level
API Reference
Authentication
All endpoints require authentication using one of:
- Header: X-API-Key: your-api-key
- Bearer Token: Authorization: Bearer your-api-key
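The two options carry the same key in different headers. A small client-side helper, sketched below, can build either form; `auth_headers` and its `scheme` argument are hypothetical names for this example and are not part of SemWare.

```python
def auth_headers(api_key: str, scheme: str = "x-api-key") -> dict[str, str]:
    """Return the HTTP header carrying the API key.

    scheme is "x-api-key" for the X-API-Key header or "bearer" for
    the Authorization: Bearer form.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown scheme: {scheme!r}")
```

The returned dict can be merged into the headers of any HTTP client request.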
Table Management
Create Table
Create a new table with custom schema.
POST /tables
Content-Type: application/json
X-API-Key: your-api-key
{
"schema": {
"name": "research_papers",
"columns": {
"id": "string",
"title": "string",
"abstract": "string",
"authors": "string",
"year": "int",
"doi": "string"
},
"id_column": "id",
"embedding_column": "abstract"
}
}
Response (201):
{
"message": "Table 'research_papers' created successfully",
"table_name": "research_papers"
}
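For reference, the same request can be assembled from Python with only the standard library. This is a sketch: `BASE_URL` assumes the default host and port from the Quick Start, `build_create_table_request` is a hypothetical helper, and the check that `id_column` and `embedding_column` are declared columns is a client-side sanity check assumed for this example, not documented server behavior.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default host/port


def build_create_table_request(name, columns, id_column, embedding_column, api_key):
    """Build (but do not send) a POST /tables request."""
    # Client-side sanity check: both special columns must be declared.
    for required in (id_column, embedding_column):
        if required not in columns:
            raise ValueError(f"{required!r} is not a declared column")
    body = {
        "schema": {
            "name": name,
            "columns": columns,
            "id_column": id_column,
            "embedding_column": embedding_column,
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/tables",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Against a running server, `urllib.request.urlopen(request)` would send it; building the request separately keeps the payload easy to inspect first.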
List Tables
Get all available tables.
GET /tables
X-API-Key: your-api-key
Response (200):
{
"tables": ["research_papers", "product_docs", "customer_support"],
"count": 3
}
Get Table Info
Get detailed information about a specific table.
GET /tables/research_papers
X-API-Key: your-api-key
Response (200):
{
"table_name": "research_papers",
"schema": {
"name": "research_papers",
"columns": {
"id": "string",
"title": "string",
"abstract": "string",
"authors": "string",
"year": "int",
"doi": "string"
},
"id_column": "id",
"embedding_column": "abstract"
},
"record_count": 1547,
"created_at": "2024-01-15T10:30:00Z"
}
Delete Table
Delete a table and all its data.
DELETE /tables/research_papers
X-API-Key: your-api-key
Response (200):
{
"message": "Table 'research_papers' deleted successfully",
"table_name": "research_papers"
}
Data Operations
Insert/Update Documents
Insert new documents or update existing ones. Embeddings are generated automatically.
POST /tables/research_papers/data
Content-Type: application/json
X-API-Key: your-api-key
{
"records": [
{
"data": {
"id": "paper_001",
"title": "Attention Is All You Need",
"abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
"authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
"year": 2017,
"doi": "10.48550/arXiv.1706.03762"
}
},
{
"data": {
"id": "paper_002",
"title": "BERT: Pre-training of Deep Bidirectional Transformers",
"abstract": "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations...",
"authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
"year": 2018,
"doi": "10.48550/arXiv.1810.04805"
}
}
]
}
Response (201):
{
"message": "Successfully processed 2 records",
"inserted_count": 2,
"updated_count": 0,
"processing_time_ms": 1247.3
}
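Each record in the request body wraps its fields in a `data` object. The sketch below turns a list of plain dicts into that shape; `build_upsert_payload` is a hypothetical helper, and the check for the ID column is an assumption made for this example.

```python
def build_upsert_payload(rows, id_column="id"):
    """Wrap plain row dicts in the {"records": [{"data": ...}]} shape."""
    records = []
    for index, row in enumerate(rows):
        # Every row needs a value for the table's ID column.
        if id_column not in row:
            raise ValueError(f"row {index} is missing the {id_column!r} column")
        records.append({"data": dict(row)})
    return {"records": records}
```

The result can be serialized with `json.dumps` and posted to `/tables/<table_name>/data`.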
Get Document
Retrieve a specific document by ID.
GET /tables/research_papers/data/paper_001
X-API-Key: your-api-key
Response (200):
{
"table_name": "research_papers",
"record_id": "paper_001",
"data": {
"id": "paper_001",
"title": "Attention Is All You Need",
"abstract": "The dominant sequence transduction models are based on complex recurrent...",
"authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
"year": 2017,
"doi": "10.48550/arXiv.1706.03762"
}
}
Delete Document
Remove a document from the table.
DELETE /tables/research_papers/data/paper_001
X-API-Key: your-api-key
Response (200):
{
"message": "Record 'paper_001' deleted successfully",
"table_name": "research_papers",
"deleted_id": "paper_001"
}
Search Operations
Similarity Search
Find all documents with similarity above a threshold.
POST /tables/research_papers/search/similarity
Content-Type: application/json
X-API-Key: your-api-key
{
"query": "transformer neural network attention mechanism",
"threshold": 0.7,
"limit": 10
}
Response (200):
{
"query": "transformer neural network attention mechanism",
"results": [
{
"id": "paper_001",
"data": {
"id": "paper_001",
"title": "Attention Is All You Need",
"abstract": "The dominant sequence transduction models...",
"authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
"year": 2017,
"doi": "10.48550/arXiv.1706.03762"
},
"similarity_score": 0.89
},
{
"id": "paper_002",
"data": {
"id": "paper_002",
"title": "BERT: Pre-training of Deep Bidirectional Transformers",
"abstract": "We introduce a new language representation model...",
"authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
"year": 2018,
"doi": "10.48550/arXiv.1810.04805"
},
"similarity_score": 0.76
}
],
"total_results": 2,
"search_time_ms": 23.4,
"threshold": 0.7
}
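To make the `threshold` and `limit` parameters concrete, here is a small in-memory sketch of threshold-based search. It assumes cosine similarity as the metric and treats the threshold as inclusive; both are assumptions for illustration, and the function names are hypothetical. The server itself delegates the search to LanceDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vec, docs, threshold, limit):
    """Return up to `limit` (doc_id, score) pairs with score >= threshold,
    best match first. `docs` maps doc_id -> embedding vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```

Unlike top-k search, the number of results here depends on the data: a high threshold may return nothing, and `limit` only caps the list.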
Top-K Search
Find the K most similar documents.
POST /tables/research_papers/search/top-k
Content-Type: application/json
X-API-Key: your-api-key
{
"query": "natural language processing BERT",
"k": 5
}
Response (200):
{
"query": "natural language processing BERT",
"results": [
{
"id": "paper_002",
"data": {
"id": "paper_002",
"title": "BERT: Pre-training of Deep Bidirectional Transformers",
"abstract": "We introduce a new language representation model...",
"authors": "Jacob Devlin, Ming-Wei Chang, Kenton Lee",
"year": 2018,
"doi": "10.48550/arXiv.1810.04805"
},
"similarity_score": 0.94
},
{
"id": "paper_001",
"data": {
"id": "paper_001",
"title": "Attention Is All You Need",
"abstract": "The dominant sequence transduction models...",
"authors": "Ashish Vaswani, Noam Shazeer, Niki Parmar",
"year": 2017,
"doi": "10.48550/arXiv.1706.03762"
},
"similarity_score": 0.81
}
],
"total_results": 2,
"search_time_ms": 31.7,
"k": 5
}
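Top-k selection can be sketched on precomputed scores with the standard library. `top_k` is a hypothetical helper for illustration, and `paper_003` in the usage note is a made-up ID; the server performs the equivalent ranking inside the vector database.

```python
import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    If fewer than k documents exist, all of them are returned.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

For example, `top_k({"paper_001": 0.81, "paper_002": 0.94, "paper_003": 0.40}, 2)` keeps `paper_002` and `paper_001`, in that order. No similarity threshold is applied, so weak matches can appear when the table holds few relevant documents.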
Health Check
GET /health
Response (200):
{
"status": "healthy",
"app_name": "SemWare",
"version": "0.1.0",
"timestamp": "2024-01-15T14:30:25.123456"
}
Embedding Process
SemWare uses advanced text processing for optimal semantic understanding:
1. Text Tokenization
- Long texts are intelligently split into manageable chunks
- Uses tiktoken with cl100k_base encoding for precise token counting
- Default batch size: 2000 tokens with configurable limits
2. Batch Processing
- Each text chunk is processed through the embedding model
- Supports multiple embedding models via Hugging Face transformers
- Automatic GPU acceleration when available
3. Embedding Aggregation
- Multiple batch embeddings are combined using average pooling
- Preserves semantic meaning across the entire document
- Results in high-quality 384-dimensional vectors (MiniLM)
4. Normalization & Storage
- Final embeddings are L2 normalized for consistent similarity scoring
- Stored efficiently in LanceDB with optimized vector indexing
- Enables sub-second search across millions of documents
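The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not SemWare's implementation: whitespace splitting stands in for tiktoken's cl100k_base tokenization, `embed_chunk` is a caller-supplied placeholder for the real embedding model, and all function names are hypothetical.

```python
import math


def chunk_tokens(tokens, max_tokens):
    """Step 1: split a token list into consecutive batches."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def average_pool(vectors):
    """Step 3: element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def l2_normalize(vector):
    """Step 4: scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def embed_document(text, embed_chunk, max_tokens=2000):
    """Chunk, embed each chunk, average, then normalize."""
    tokens = text.split()  # stand-in for tiktoken token counting
    if not tokens:
        raise ValueError("text is empty")
    chunks = chunk_tokens(tokens, max_tokens)
    # Step 2: one embedding per chunk, produced by the supplied model.
    vectors = [embed_chunk(" ".join(chunk)) for chunk in chunks]
    return l2_normalize(average_pool(vectors))
```

Because the final vector has unit length, the dot product of two document vectors equals their cosine similarity, which keeps scores comparable across documents of different lengths.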
Development
Running Tests
# Run all tests with coverage
uv run --native-tls pytest --cov=src --cov-report=html
# Run specific test file
uv run --native-tls pytest tests/test_api/test_search.py -v
# Run with debug output
uv run --native-tls pytest -s --log-cli-level=DEBUG
Code Quality
# Format code
uv run --native-tls ruff format src/ tests/
# Lint and fix issues
uv run --native-tls ruff check src/ tests/ --fix
# Type checking
uv run --native-tls mypy src/
API Documentation
Start the server with DEBUG=true in your .env and visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
Project Structure
SemWare/
├── src/semware/
│   ├── api/                 # FastAPI route handlers
│   │   ├── __init__.py
│   │   ├── auth.py          # Authentication middleware
│   │   ├── data.py          # Data CRUD operations
│   │   ├── search.py        # Search endpoints
│   │   └── tables.py        # Table management
│   ├── models/              # Pydantic data models
│   │   ├── __init__.py
│   │   ├── requests.py      # Request/response models
│   │   └── schemas.py       # Core data schemas
│   ├── services/            # Business logic services
│   │   ├── __init__.py
│   │   ├── embedding.py     # ML embedding generation
│   │   ├── search.py        # Search orchestration
│   │   └── vectordb.py      # Vector database operations
│   ├── utils/               # Utility functions
│   │   ├── __init__.py
│   │   ├── logging.py       # Logging configuration
│   │   └── tokenizer.py     # Text tokenization
│   ├── config.py            # Configuration management
│   └── main.py              # FastAPI application factory
├── tests/                   # Comprehensive test suite
│   ├── conftest.py          # Test configuration & fixtures
│   ├── test_api/            # API endpoint tests
│   ├── test_services/       # Service layer tests
│   └── test_utils/          # Utility function tests
├── pyproject.toml           # Project configuration
├── .env.example             # Environment template
└── README.md                # This file
Configuration Reference
| Variable | Description | Default | Required |
|---|---|---|---|
| API_KEY | Authentication key for all endpoints | - | Yes |
| DEBUG | Enable debug mode and API docs | false | No |
| DB_PATH | Database storage directory | ./data | No |
| HOST | Server bind address | 0.0.0.0 | No |
| PORT | Server port | 8000 | No |
| LOG_LEVEL | Logging level (DEBUG/INFO/WARNING/ERROR) | INFO | No |
| LOG_FILE | Log file path (optional) | - | No |
| EMBEDDING_MODEL_NAME | Hugging Face model name | all-MiniLM-L6-v2 | No |
| EMBEDDING_DIMENSION | Embedding vector dimensions | 384 | No |
| MAX_TOKENS_PER_BATCH | Max tokens per embedding batch | 2000 | No |
| WORKERS | Number of server workers | 1 | No |
Deployment
Docker
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install uv
RUN uv sync --native-tls
EXPOSE 8000
CMD ["uv", "run", "--native-tls", "uvicorn", "semware.main:app", "--host", "0.0.0.0", "--port", "8000"]
Production Considerations
- Use multiple workers: --workers 4
- Enable access logs: --access-log
- Set up reverse proxy (nginx) for HTTPS termination
- Configure log rotation and monitoring
- Use a dedicated vector storage solution for large scale
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run the test suite: uv run --native-tls pytest
- Submit a pull request
Performance
Benchmarks (on Apple M2 Pro, 16GB RAM):
- Embedding Generation: ~200ms per batch (2000 tokens)
- Document Insertion: ~500ms per document (including embedding)
- Vector Search: <50ms for similarity search across 10K documents
- Throughput: ~100 requests/second with 4 workers
Troubleshooting
Common Issues
Authentication Errors
# Ensure API key is set correctly
export API_KEY=your-secret-key
# Or check your .env file
Model Download Issues
# Clear Hugging Face cache
rm -rf ~/.cache/huggingface/
# Restart with debug logging
DEBUG=true uv run --native-tls python -m semware.main
Database Permissions
# Ensure write permissions to data directory
chmod 755 ./data
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- FastAPI for the excellent async web framework
- LanceDB for high-performance vector storage
- Hugging Face for the transformer models and ecosystem
- Pydantic for robust data validation
- The Python Community for the amazing open-source ecosystem
Built with ❤️ by the SemWare team
Report Bug • Discussions • Wiki