amsdal_ml plugin for AMSDAL Framework

AMSDAL ML

Machine learning plugin for the AMSDAL Framework, providing embeddings, vector search, semantic retrieval, and AI agents with support for OpenAI models.

Features

  • Vector Embeddings: Generate and store embeddings for any AMSDAL model with automatic chunking
  • Semantic Search: Query your data using natural language with tag-based filtering
  • AI Agents: Build Q&A systems with streaming support and citation tracking
  • Async-First: Optimized for high-performance async operations
  • MCP Integration: Expose and consume tools via Model Context Protocol (stdio/HTTP)
  • File Attachments: Process and embed documents with built-in loaders
  • Extensible: Abstract base classes for custom models, retrievers, and ingesters

Installation

pip install amsdal-ml

Requirements

  • Python 3.11 or higher
  • AMSDAL Framework 0.5.6+
  • OpenAI API key (for default implementations)

Quick Start

1. Configuration

Create a .env file in your project root:

OPENAI_API_KEY=sk-your-api-key-here
async_mode=true
ml_model_class=amsdal_ml.ml_models.openai_model.OpenAIModel
ml_retriever_class=amsdal_ml.ml_retrievers.openai_retriever.OpenAIRetriever
ml_ingesting_class=amsdal_ml.ml_ingesting.openai_ingesting.OpenAIIngesting

Create a config.yml for AMSDAL connections:

application_name: my-ml-app
async_mode: true
connections:
  - name: sqlite_state
    backend: sqlite-state-async
    credentials:
      - db_path: ./warehouse/state.sqlite3
      - check_same_thread: false
  - name: lock
    backend: amsdal_data.lock.implementations.thread_lock.ThreadLock
resources_config:
  repository:
    default: sqlite_state
  lock: lock

2. Generate Embeddings

from amsdal_ml.ml_ingesting.openai_ingesting import OpenAIIngesting

# MyModel is any AMSDAL model with a text field to embed
ingester = OpenAIIngesting(
    model=MyModel,
    embedding_field='embedding',
)

# Generate and persist embeddings for an instance
instance = MyModel(content='Your text here')
embeddings = await ingester.agenerate_embeddings(instance)
await ingester.asave(embeddings, instance)

3. Semantic Search

from amsdal_ml.ml_retrievers.openai_retriever import OpenAIRetriever

retriever = OpenAIRetriever()

# Search for relevant content
results = await retriever.asimilarity_search(
    query='What is machine learning?',
    k=5,
    include_tags=['documentation']
)

for chunk in results:
    print(f'{chunk.object_class}:{chunk.object_id} - {chunk.raw_text}')
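Conceptually, a similarity search like this ranks stored chunk vectors against the query embedding and keeps only chunks whose tags match. Below is a minimal, self-contained sketch of that ranking using plain cosine similarity over toy 2-D vectors; it is illustrative only (the real retriever works with 1536-dimensional OpenAI embeddings stored in the database, and its internals may differ):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, records, k=5, include_tags=None):
    """Rank stored chunks by similarity, keeping only those with a matching tag."""
    candidates = [
        r for r in records
        if not include_tags or set(include_tags) & set(r["tags"])
    ]
    candidates.sort(key=lambda r: cosine_similarity(query_vec, r["vector"]), reverse=True)
    return candidates[:k]

records = [
    {"raw_text": "ML intro", "tags": ["documentation"], "vector": [1.0, 0.0]},
    {"raw_text": "Release notes", "tags": ["changelog"], "vector": [0.9, 0.1]},
    {"raw_text": "Embeddings guide", "tags": ["documentation"], "vector": [0.7, 0.7]},
]
print([r["raw_text"] for r in top_k([1.0, 0.0], records, k=2, include_tags=["documentation"])])
# → ['ML intro', 'Embeddings guide']
```

Note how the tag filter excludes the changelog chunk even though its vector is closer to the query than the second documentation chunk.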

4. Build an AI Agent

from amsdal_ml.agents.default_qa_agent import DefaultQAAgent

agent = DefaultQAAgent()

# Ask questions
output = await agent.arun('Explain vector embeddings')
print(output.answer)
print(f'Used tools: {output.used_tools}')

# Stream responses
async for chunk in agent.astream('What is semantic search?'):
    print(chunk, end='', flush=True)

Architecture

Core Components

  • MLModel: Abstract interface for LLM inference (invoke, stream, with attachments)
  • MLIngesting: Generate text and embeddings from data objects with chunking
  • MLRetriever: Semantic similarity search with tag-based filtering
  • Agent: Q&A and task-oriented agents with streaming and citations
  • EmbeddingModel: Database model storing 1536-dimensional vectors linked to source objects
  • MCP Server/Client: Expose retrievers as tools or consume external MCP services
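The EmbeddingModel can be pictured as a record tying a vector to its source object. The following is a hypothetical sketch, not the actual schema: the field names simply mirror the chunk attributes used in the search example above (object_class, object_id, raw_text), and the real model is defined by amsdal_ml:

```python
from dataclasses import dataclass, field

# Illustrative stand-in for EmbeddingModel; the real schema is defined by amsdal_ml.
@dataclass
class EmbeddingRecord:
    object_class: str                     # source model class name
    object_id: str                        # source object identifier
    raw_text: str                         # chunk text that was embedded
    vector: list[float]                   # 1536-dimensional embedding
    tags: list[str] = field(default_factory=list)

record = EmbeddingRecord(
    object_class="MyModel",
    object_id="42",
    raw_text="Your text here",
    vector=[0.0] * 1536,
    tags=["documentation"],
)
print(len(record.vector))  # → 1536
```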

Configuration

All settings are managed via MLConfig in .env:

# Model Configuration
llm_model_name=gpt-4o
llm_temperature=0.0
embed_model_name=text-embedding-3-small

# Chunking Parameters
embed_max_depth=2
embed_max_chunks=10
embed_max_tokens_per_chunk=800

# Retrieval Settings
retriever_default_k=8
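The chunking parameters cap how a document is split before embedding. Here is a rough, illustrative sketch of a chunker honoring limits like embed_max_chunks and embed_max_tokens_per_chunk; whitespace-separated words stand in for real tokenization, and the plugin's actual chunker (which also honors embed_max_depth for nested objects) may work differently:

```python
def chunk_text(text: str, max_tokens_per_chunk: int = 800, max_chunks: int = 10) -> list[str]:
    """Split text into at most max_chunks pieces of at most
    max_tokens_per_chunk whitespace tokens each."""
    tokens = text.split()
    chunks: list[str] = []
    for start in range(0, len(tokens), max_tokens_per_chunk):
        if len(chunks) == max_chunks:
            break  # stop once the chunk budget is exhausted
        chunks.append(" ".join(tokens[start:start + max_tokens_per_chunk]))
    return chunks

doc = " ".join(f"word{i}" for i in range(25))
print(len(chunk_text(doc, max_tokens_per_chunk=10, max_chunks=2)))  # → 2
```

With 25 tokens, a 10-token budget, and a 2-chunk cap, the trailing 5 tokens are dropped; tuning these settings trades retrieval granularity against embedding cost.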

Development

Setup

# Install dependencies
pip install --upgrade uv hatch==1.14.2
hatch env create
hatch run sync

Testing

# Run all tests with coverage
hatch run cov

# Run specific tests
hatch run test tests/test_openai_model.py

# Run tests directly with verbose output
pytest tests/ -v

Code Quality

# Run all checks (style + typing)
hatch run all

# Format code
hatch run fmt

# Type checking
hatch run typing

AMSDAL CLI

# Generate a new model
amsdal generate model MyModel --format py

# Generate property
amsdal generate property --model MyModel embedding_field

# Generate transaction
amsdal generate transaction ProcessEmbeddings

# Generate hook
amsdal generate hook --model MyModel on_create

MCP Server

Run the retriever as an MCP server for integration with Claude Desktop or other MCP clients:

python -m amsdal_ml.mcp_server.server_retriever_stdio \
  --amsdal-config "$(echo '{"async_mode": true, ...}' | base64)"

The server exposes a search tool for semantic search in your knowledge base.
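The --amsdal-config flag expects a base64-encoded JSON config, as the shell command above shows. One way to produce that value in Python (the keys here are taken from the Quick Start config; adjust for your setup):

```python
import base64
import json

# Encode an AMSDAL config as base64 JSON for --amsdal-config
config = {
    "application_name": "my-ml-app",
    "async_mode": True,
}
encoded = base64.b64encode(json.dumps(config).encode()).decode()
print(encoded)
```

Decoding the result with base64 -d should round-trip back to the original JSON.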

License

See amsdal_ml/Third-Party Materials - AMSDAL Dependencies - License Notices.md for dependency licenses.
