
Shared utilities and base classes for Ashmatics Knowledge Base applications


Ashmatics Tools

Last updated: 2026-01-25

Version 0.7.1

v0.7.1 (2026-01-25)

  • Standardized MongoDB/CosmosDB environment variables with fallback chains
  • MONGO_URL is now canonical (with AZ_MONGO_CONNECTION_STRING, COSMOS_VECTOR_CONNECTION_STRING as fallbacks)
  • MONGO_DB is now canonical (with MONGO_DATABASE, COSMOS_VECTOR_DATABASE as fallbacks)
  • Updated ENV_VARIABLES.md documentation

A Python package providing shared utilities, base classes, and common functionality for Ashmatics Knowledge Base applications.

Overview

ashmatics-tools is a foundational library that centralizes reusable components across Ashmatics healthcare AI applications. It provides:

  • Data Import/Export Utilities: Excel data loading, GraphQL integration, batch processing with Hasura
  • Document Processors: Abstract base classes for MongoDB document processing
  • GraphQL Clients: Generic GraphQL query/mutation builders and client utilities
  • Schema Management: GraphQL schema introspection and analysis tools
  • Document Parsers: Advanced parsing for PDFs, DOCX, and other formats
  • Document Chunkers: Token-aware and semantic chunking strategies
  • Embedders: Generate embeddings using Azure OpenAI or OpenAI APIs
  • Vector Stores: Integration with CosmosDB, PostgreSQL, and Qdrant for vector search
  • Storage Backends: Cloud-agnostic storage abstraction for ADLS Gen2, MinIO, and AWS S3
  • LLM Clients: Unified interface for Azure OpenAI, OpenAI, HuggingFace, and custom providers
  • Ontology Services: Medical ontology management including SNOMED CT, RADLEX, LOINC, and custom Ashmatics ontologies
  • Term Services: Term resolution, hierarchical category management, and external ontology validation
  • External APIs: Clients for external data sources (FDA, Census, CMS) with retry, rate limiting, and pagination
  • MCP Servers: Model Context Protocol servers exposing APIs to LLMs with tool-based interfaces
  • Search/RAG: Retrieval-Augmented Generation strategies with streaming, context window management, and MCP tool definitions
  • Document Enrichers: Table classification, consolidation, and metrics extraction for parsed documents
  • Document Storage: Figure and table storage managers with content-addressed hashing and manifests

Installation

From Git Repository (Private)

# Using pip
pip install git+https://github.com/JFK-Ashmatics/ashmatics-tools.git

# Using uv
uv add git+https://github.com/JFK-Ashmatics/ashmatics-tools.git

# With optional dependencies
pip install "ashmatics-tools[mongodb,storage] @ git+https://github.com/JFK-Ashmatics/ashmatics-tools.git"

From Local Development

# Clone the repository
git clone https://github.com/JFK-Ashmatics/ashmatics-tools.git
cd ashmatics-tools

# Install in editable mode with dev dependencies
pip install -e ".[dev,mongodb,storage]"

Configuration

Environment Variables

ashmatics-tools requires various environment variables depending on which components you use. This library does not load .env files automatically; your application must handle environment variable loading.

See ENV_VARIABLES.md for:

  • Complete list of required environment variables by component
  • Example application setups (development with .env, production with Key Vault)
  • Environment-specific configurations

Quick example:

from dotenv import load_dotenv

# Load .env BEFORE importing ashmatics_tools
load_dotenv()

# Now use the library
from ashmatics_tools.embedders import create_embedder
embedder = create_embedder(provider="azure")

Usage

Knowledge Base Importer

from ashmatics_tools.utils.import_utils import KBImporter

# Initialize the importer
importer = KBImporter(
    graphql_endpoint="https://kb-api.ashmatics.com/v1/graphql",
    admin_secret="your-admin-secret",
    batch_size=100
)

# Load data from Excel
df = importer.load_excel_data("data.xlsx", sheet_name="Sheet1")

# Import to Knowledge Base via GraphQL
result = importer.import_to_kb(
    df=df,
    table_name="my_table",
    column_mapping={"excel_col": "db_col"}
)

Document Processor (MongoDB)

from ashmatics_tools.processors.base import DocumentProcessor
from pymongo import MongoClient

class MyDocumentProcessor(DocumentProcessor):
    def extract_metadata(self, document: dict) -> dict:
        return {"title": document.get("title"), "author": document.get("author")}

    def clean_text(self, text: str) -> str:
        return text.strip().lower()

    def get_identifier_key(self) -> str:
        return "document_id"

    def get_document_type(self) -> str:
        return "my_document_type"

    def process_document(self, file_path: str) -> dict:
        # Your document processing logic
        return {"document_id": "123", "content": "..."}

# Use the processor
client = MongoClient("mongodb://localhost:27017")
processor = MyDocumentProcessor(client, "my_database", "my_collection")
result = processor.upsert_document({"document_id": "123", "content": "..."})

Document Chunking

from ashmatics_tools.chunkers.factory import create_chunker

# Initialize chunker
chunker = create_chunker(strategy="docling")

# Chunk document
chunks = chunker.chunk_document(
    content="This is a sample document content.",
    title="Sample Document",
    source="document.pdf"
)

Embedding Generation

from ashmatics_tools.embedders.factory import create_embedder

# Initialize embedder
embedder = create_embedder(provider="azure")
embedder.initialize()

# Generate embeddings
embeddings = embedder.embed_chunks(["chunk1", "chunk2"])

Vector Store Integration

from ashmatics_tools.vector_stores.factory import create_vector_store

# Initialize vector store
vector_store = create_vector_store(provider="cosmosdb")

# Store embeddings
success, failed = vector_store.store_embeddings_batch(embeddings)

# Perform similarity search
results = vector_store.similarity_search(query_embedding, top_k=10)

Storage Backend Integration

from ashmatics_tools.storage import create_storage_client, StorageConfig, AuthType

# Initialize ADLS storage with DefaultAzureCredential (production)
config = StorageConfig(
    provider="adls",
    account_url="https://mystorageaccount.dfs.core.windows.net",
    container_name="my-container",
    auth_type=AuthType.DEFAULT_CREDENTIAL
)
storage = create_storage_client("adls", config)

# Or use connection string (development)
config = StorageConfig(
    provider="adls",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...",
    container_name="my-container",
    auth_type=AuthType.CONNECTION_STRING
)
storage = create_storage_client("adls", config)

# Initialize MinIO storage
config = StorageConfig(
    provider="minio",
    endpoint="minio.example.com:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    container_name="my-bucket",
    auth_type=AuthType.ACCESS_KEY,
    use_ssl=False
)
storage = create_storage_client("minio", config)

# Use async context manager
async with storage:
    # Write object
    await storage.write_object("path/file.txt", b"Hello, World!")

    # Read object
    content = await storage.read_object("path/file.txt")

    # List objects
    objects = await storage.list_objects(prefix="path/", pattern="*.txt")

    # Stream large files
    async for chunk in storage.read_object_stream("large-file.bin"):
        process(chunk)

    # Check if exists
    exists = await storage.exists("path/file.txt")

    # Get metadata
    metadata = await storage.get_metadata("path/file.txt")

    # Copy object
    await storage.copy_object("src/file.txt", "dest/file.txt")

    # Delete object
    await storage.delete_object("path/file.txt")

LLM Module Usage

The LLM module provides a unified interface for working with various language model providers. Supports Azure OpenAI, OpenAI, HuggingFace, and custom providers via plugin registry.

Key Features

  • Async-first API: All operations are async for high-performance pipelines
  • Unified interface: Same API across all providers
  • Cost tracking: Automatic token counting and cost estimation
  • Plugin registry: Extensible with custom providers
  • Optional dependencies: HuggingFace support via [huggingface] extra

Azure OpenAI (Primary)

import os

from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig

config = AzureOpenAIConfig(
    endpoint="https://my-resource.openai.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    deployment_name="gpt-4"
)

async with create_llm_client("azure_openai", config) as llm:
    response = await llm.complete(
        prompt="What is asthma?",
        temperature=0.7,
        max_tokens=500
    )
    print(response.text)
    print(f"Cost: ${response.tokens.estimated_cost:.4f}")
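The `estimated_cost` figure above comes from token counts multiplied by per-token pricing. A minimal sketch of that arithmetic follows; the rates are illustrative placeholders, not the library's actual pricing table:

```python
# Illustrative per-1K-token rates; consult current provider pricing for real values
RATES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost from token counts and a per-1K-token rate table."""
    rate = RATES[model]
    return (prompt_tokens / 1000) * rate["prompt"] + (completion_tokens / 1000) * rate["completion"]

cost = estimate_cost("gpt-4", prompt_tokens=1200, completion_tokens=500)
# 1.2 * 0.03 + 0.5 * 0.06 = 0.066
```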

Ontology and Term Services

The ontology module provides comprehensive medical ontology management including term resolution, hierarchical category management, and integration with external ontologies.

Key Features

  • Term Resolution: MongoDB-based term lookup and management
  • Category Management: Hierarchical category structures for document tagging
  • External Ontology Integration: BioPortal API for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
  • Custom Ontology: ASHMATICS domain-specific ontology for medical imaging AI concepts

Term Resolution

from ashmatics_tools.ontology import TermResolver
from pymongo import MongoClient

# Initialize term resolver
client = MongoClient("mongodb://localhost:27017")
term_resolver = TermResolver(mongodb_client=client)

# Resolve term
result = await term_resolver.resolve_term("breast cancer")
print(f"Resolved: {result.prefLabel} - {result.definition}")

Category Management

from ashmatics_tools.ontology import CategoryManager

# Initialize category manager
category_manager = CategoryManager(
    mongodb_database=client["ashmatics_kb"],
    term_resolver=term_resolver
)

# Create hierarchical category
category = await category_manager.create_category(
    name="Medical Imaging",
    parent_id=None,
    description="Top-level category for medical imaging"
)

# Add subcategory
subcategory = await category_manager.create_category(
    name="Breast Imaging",
    parent_id=category.id,
    description="Breast imaging techniques and AI models"
)

External Ontology Validation

from ashmatics_tools.ontology import BioPortalClient

# Initialize BioPortal client
bioportal = BioPortalClient(api_key="your-bioportal-api-key")

# Check term in external ontologies
exists, ontologies = await bioportal.check_term_in_ontology("Breast Cancer")
print(f"Term exists: {exists}")
print(f"Found in ontologies: {ontologies}")

Custom ASHMATICS Ontology

from ashmatics_tools.ontology import AshmaticsOntology

# Initialize custom ontology manager
ashmatics_ontology = AshmaticsOntology(mongodb_database=client["ashmatics_kb"])

# Create concept
concept = await ashmatics_ontology.create_concept(
    prefLabel="AI Breast Cancer Detector",
    definition="AI model for detecting breast cancer in medical images",
    synonyms=["Breast Cancer AI", "Mammography AI"]
)

# Add relationship
await ashmatics_ontology.add_relationship(
    source_id=concept.id,
    target_id=another_concept.id,
    relationship_type="related_to"
)

ASHCAI Clinical AI Governance Ontology

The ASHCAI (AshMatics Clinical AI Governance) ontology provides governance concepts for the CAI Framework, including policies, processes, controls, and regulatory crosswalks.

from ashmatics_tools.ontology import AshcaiOntology

# Initialize ASHCAI ontology manager
ashcai = AshcaiOntology(mongodb_database=client["ashmatics_kb"])

# Initialize collections and indexes
await ashcai.initialize_ontology()

# Create a policy domain with natural business ID
policy = await ashcai.create_policy_domain(
    domain_id="MMP-001",
    domain_code="MMP",
    label="Model Monitoring Policy",
    description="Policy governing AI model monitoring requirements",
    specifies=["MON"]  # Links to process domains
)

# Create a process domain
process = await ashcai.create_process_domain(
    domain_id="MON",
    label="Model Monitoring",
    primary_function="Continuous monitoring of AI model performance",
    integrates_with=["RM", "SA", "OVR"]
)

# Create base practice
practice = await ashcai.create_base_practice(
    practice_id="MON.BP01",
    label="Rollout and Change Management",
    process_domain="MON",
    sequence_order=1
)

# Create SOP template
sop = await ashcai.create_sop_template(
    sop_id="SOP-MON-01",
    label="Model Deployment SOP",
    purpose="Standard procedure for deploying AI models",
    base_practice="MON.BP01"
)

# Create work product template
wp = await ashcai.create_work_product_template(
    wp_id="WP-MON-02-Dashboard",
    label="Monitoring Dashboard",
    evidence_type="Dashboard",
    produced_by="SOP-MON-02",
    serves_as_evidence_for=["EXC-A7-04"]
)

# Create exemplar control
control = await ashcai.create_exemplar_control(
    control_id="EXC-A7-04",
    label="Performance Monitoring Control",
    iso_control="A.7.5",
    evidenced_by=["WP-MON-02-Dashboard"]
)

# Create regulatory requirement with crosswalk
requirement = await ashcai.create_regulatory_requirement(
    requirement_id="NIST-MAP-4.2",
    label="MAP 4.2",
    framework_id="NIST-AI-RMF",
    function="MAP",
    category="MAP-4",
    description="Internal risk controls for third-party AI resources",
    crosswalk={
        "addressedBy": ["TPP-001", "MMP-001"],
        "implementedThrough": ["EXC-A10-02", "EXC-A6-02"],
        "operationalizedIn": ["MON", "PV"],
        "evidencedBy": ["WP-MON-01", "WP-PV-03"]
    }
)

# Create relationships
await ashcai.link_policy_to_process("MMP-001", "MON")
await ashcai.link_process_to_practice("MON", "MON.BP01")
await ashcai.link_practice_to_sop("MON.BP01", "SOP-MON-01")
await ashcai.link_sop_to_workproduct("SOP-MON-01", "WP-MON-02-Dashboard")
await ashcai.link_workproduct_to_control("WP-MON-02-Dashboard", "EXC-A7-04")

# Traversal helpers
hierarchy = await ashcai.get_policy_hierarchy("MMP-001")
# Returns: policy, processes (with practices, SOPs, work products), controls

evidence_chain = await ashcai.get_evidence_chain("EXC-A7-04")
# Returns: control with all work products that evidence it

crosswalk = await ashcai.get_regulatory_crosswalk("NIST-MAP-4.2")
# Returns: requirement with all policies, controls, processes, evidence

# OWL/RDF export
uri = ashcai.generate_uri("MMP-001")
# Returns: http://asherinformatics.com/ontology/ashcai/MMP-001

Key Features:

  • Natural Business IDs: Human-readable identifiers (MMP-001, MON, SOP-MON-01) with regex validation
  • Type Discrimination: All documents include ontology: "ashcai" for filtering
  • 44 Relationship Types: Comprehensive relationships from TDD-001 specification
  • Traversal Helpers: Pre-built queries for policy hierarchies and regulatory crosswalks
  • OWL/RDF Export: Generate standard URIs from natural business IDs
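The regex validation of natural business IDs can be illustrated with patterns inferred from the examples in this README; the exact patterns in `ashcai_schemas.py` may differ:

```python
import re

# Illustrative patterns inferred from the ID examples above (MMP-001, MON, MON.BP01, SOP-MON-01)
ID_PATTERNS = {
    "policy_domain": re.compile(r"^[A-Z]{2,4}-\d{3}$"),     # e.g. MMP-001
    "process_domain": re.compile(r"^[A-Z]{2,4}$"),          # e.g. MON
    "base_practice": re.compile(r"^[A-Z]{2,4}\.BP\d{2}$"),  # e.g. MON.BP01
    "sop_template": re.compile(r"^SOP-[A-Z]{2,4}-\d{2}$"),  # e.g. SOP-MON-01
}

def validate_id(kind: str, value: str) -> bool:
    """True if the value matches the pattern registered for this entity kind."""
    return bool(ID_PATTERNS[kind].fullmatch(value))
```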

External API Integration

The external_apis module provides clients for accessing external data sources with built-in retry logic, rate limiting, and pagination.

OpenFDA API Client

from ashmatics_tools.external_apis import create_api_client, OpenFDAConfig, OpenFDAEndpoint

# Create client with API key (recommended)
config = OpenFDAConfig(api_key="your_api_key")
async with create_api_client("openfda", config) as client:
    # Search device adverse events
    async for event in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="device_name:pacemaker AND date_received:[20230101 TO 20231231]",
        limit=100,
        max_records=1000
    ):
        device = event.get("device", [{}])[0]
        print(f"Device: {device.get('device_name')}")
        print(f"Event Date: {event.get('date_received')}")

    # Search 510(k) clearances
    async for clearance in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_510K,
        query="product_code:OZP",
        limit=100
    ):
        print(f"K Number: {clearance.get('k_number')}")
        print(f"Applicant: {clearance.get('applicant')}")

    # Analytics - count by field
    counts = await client.count(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="date_received:[20230101 TO 20231231]",
        count_field="device.device_class.exact"
    )
    for item in counts:
        print(f"Class {item['term']}: {item['count']} events")

AccessGUDID API Client

from ashmatics_tools.external_apis import AccessGUDIDClient, AccessGUDIDConfig

# Create client (no API key required for basic operations)
config = AccessGUDIDConfig()
async with AccessGUDIDClient(config) as client:
    # Lookup device by Device Identifier (DI)
    device = await client.lookup_device(di="08717648200274")
    print(f"Brand: {device['gudid']['device']['brandName']}")
    print(f"Company: {device['gudid']['device']['companyName']}")

    # Parse a UDI string (GS1, HIBCC, or ICCBBA format)
    parsed = await client.parse_udi(
        udi="(01)00844588012919(17)141231(10)A213B1"
    )
    print(f"DI: {parsed['di']}")
    print(f"Issuing Agency: {parsed['issuingAgency']}")
    print(f"Expiration: {parsed['expirationDate']}")
    print(f"Lot Number: {parsed['lotNumber']}")

    # Get device version history
    history = await client.get_device_history(di="08717648200274")
    for version in history.get('deviceHistory', []):
        print(f"Version {version['publicVersionNumber']}: {version['publicVersionDate']}")

    # List implantable devices with date filtering
    async for device in client.list_implantable_devices(
        from_date="2024-01-01",
        max_records=100
    ):
        print(f"{device['brandName']} - {device['companyName']}")

# With UMLS API key for SNOMED lookups
config = AccessGUDIDConfig(umls_api_key="your_umls_key")
async with AccessGUDIDClient(config) as client:
    snomed = await client.get_device_snomed(di="08717648200274")
    for concept in snomed.get('concepts', []):
        print(f"{concept['snomedCTName']}: {concept['snomedIdentifier']}")

MCP Server Integration

The mcp_servers module provides Model Context Protocol servers that expose external APIs as tools for LLM consumption.

OpenFDA MCP Server

from ashmatics_tools.mcp_servers import create_mcp_server, OpenFDAMCPConfig
from ashmatics_tools.external_apis import OpenFDAConfig

# Create MCP server
config = OpenFDAMCPConfig(
    api_config=OpenFDAConfig(api_key="your_key")
)
server = create_mcp_server("openfda", config)

# Get available tools
tools = server.get_tools()
# Returns: search_devices, search_drugs, count_by_field

# Call a tool
result = await server.call_tool("search_devices", {
    "endpoint": "device_event",
    "query": "device_name:pacemaker",
    "limit": 10
})

print(f"Found {result['count']} results")
for item in result['results']:
    print(item)

AccessGUDID MCP Server

from ashmatics_tools.mcp_servers import create_mcp_server, AccessGUDIDMCPConfig
from ashmatics_tools.external_apis import AccessGUDIDConfig

# Create MCP server
config = AccessGUDIDMCPConfig(
    api_config=AccessGUDIDConfig()  # No API key required for basic operations
)
server = create_mcp_server("accessgudid", config)

# Get available tools
tools = server.get_tools()
# Returns: lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices

# Lookup a device by DI
result = await server.call_tool("lookup_device", {"di": "08717648200274"})
print(f"Device: {result['summary']['brandName']}")

# Parse a UDI string
result = await server.call_tool("parse_udi", {
    "udi": "(01)00844588012919(17)141231(10)A213B1"
})
print(f"Parsed DI: {result['parsed']['di']}")

# List implantable devices
result = await server.call_tool("list_implantable_devices", {
    "from_date": "2024-01-01",
    "max_records": 50
})
print(f"Found {result['count']} implantable devices")

Running MCP Servers via stdio

# Run OpenFDA MCP server (set FDA_API_KEY for higher rate limits)
export FDA_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.openfda

# Run AccessGUDID MCP server (set UMLS_API_KEY for SNOMED lookups)
export UMLS_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.accessgudid

For detailed usage examples, see FDA API Usage Guide.

Search/RAG Module

The search module provides RAG (Retrieval-Augmented Generation) strategies for building AI-powered search applications.

Key Features

  • RAG Strategies: SimpleRAG and MultiQueryRAG with streaming support
  • Context Window Management: Automatic fitting of sources to model context limits
  • MCP Tool Definitions: Generic tool schemas for agent integration
  • LLM Streaming: SSE and NDJSON streaming support across all LLM providers

Simple RAG Query

from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig
from ashmatics_tools.embedders import create_embedder
from ashmatics_tools.vector_stores import create_vector_store
from ashmatics_tools.search import create_search_strategy, RAGConfig

# Setup components
llm = create_llm_client("azure_openai", AzureOpenAIConfig(...))
embedder = create_embedder("azure")
vector_store = create_vector_store("cosmosdb", config)

# Create RAG strategy
rag = create_search_strategy(
    "simple_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=RAGConfig(top_k=10, temperature=0.7)
)

# Query with answer generation
async with llm:
    result = await rag.query("What are ISO 42001 requirements?")
    print(result.answer)
    print(f"Sources: {len(result.sources)}")
    print(f"Tokens: {result.metrics.total_tokens}")

Multi-Query RAG (Query Expansion)

from ashmatics_tools.search import create_search_strategy
from ashmatics_tools.search.strategies import MultiQueryConfig

# Multi-query expands to multiple query variants for better coverage
config = MultiQueryConfig(
    top_k=10,
    num_query_variants=3,  # Generate 3 query variants
    rrf_k=60,              # RRF ranking parameter
)

rag = create_search_strategy(
    "multi_query_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=config
)

async with llm:
    result = await rag.query("What is risk management in AI governance?")
    print(f"Expanded queries: {result.metadata.get('expanded_queries')}")
    print(result.answer)
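The `rrf_k` parameter above controls Reciprocal Rank Fusion, which merges the ranked lists returned by each query variant: a document scores the sum of 1/(k + rank) across lists. A minimal sketch of the standard formula (the library's internals may differ):

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge([["a", "b", "c"], ["b", "a", "d"]], k=60)
# "a" and "b" each appear at ranks 1 and 2, so they tie ahead of "c" and "d"
```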

Streaming RAG Responses

# Stream answer generation for real-time UI
async with llm:
    async for chunk in rag.stream_query("Explain AI governance controls"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
        if chunk.is_final:
            print(f"\n\nSources: {len(chunk.sources)}")

Context Window Management

from ashmatics_tools.llm import ContextWindowManager, ModelContextLimits

# Create manager for GPT-4 Turbo
manager = ContextWindowManager(
    model_limits=ModelContextLimits.GPT4_TURBO(),
    reserved_output=2000
)

# Fit sources into available context
fitted_sources = manager.fit_sources(
    sources=search_results,
    query="What are ISO 42001 requirements?",
    system_prompt=system_prompt
)
print(f"Fitted {len(fitted_sources)} of {len(search_results)} sources")

MCP Tool Definitions

from ashmatics_tools.search.mcp_tools import (
    get_tool_definitions,
    export_tools_yaml,
    RAG_SEARCH_TOOL,
)

# Get all tool definitions for MCP server registration
tools = get_tool_definitions()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Export as YAML for configuration
yaml_config = export_tools_yaml()

Document Enrichers

The enrichers module provides post-parsing content analysis for tables and extracted data.

Table Classification

from ashmatics_tools.enrichers import TableClassifier, TableCategory

# Initialize classifier
classifier = TableClassifier(provider="azure_openai")

# Classify tables from a parsed document
categories, tokens = await classifier.classify_tables(parsed_doc.tables)

for table, category in zip(parsed_doc.tables, categories):
    if category == TableCategory.PERFORMANCE_METRICS:
        # Extract metrics from performance tables
        pass
    elif category == TableCategory.COMPARISON:
        # Process comparison tables
        pass

Table Consolidation (Multi-page Tables)

from ashmatics_tools.enrichers import TableConsolidator

# Handle tables that span multiple PDF pages
consolidator = TableConsolidator(
    column_similarity_threshold=0.85,
    use_llm_validation=True
)

consolidated = await consolidator.consolidate_tables(
    parsed_doc.tables,
    parsed_doc.markdown
)

for table in consolidated:
    if table.merged_from:
        print(f"{table.table_id} merged from pages: {table.merged_from}")

Metrics Extraction

from ashmatics_tools.enrichers import MetricsExtractor, DomainKnowledgeProvider

# With optional domain knowledge injection
extractor = MetricsExtractor(domain_knowledge=my_provider)
result = await extractor.extract_from_tables(
    tables=performance_tables,
    document_text=section_text
)

for metric in result.performance_metrics:
    print(f"{metric.metric_name}: {metric.value} [{metric.ci_lower}, {metric.ci_upper}]")

Document Storage

Storage managers for document processing artifacts with manifest generation.

Figure Storage

from ashmatics_tools.document_storage import FigureStorageManager

# Filter small images (logos, icons) and save significant figures
manager = FigureStorageManager(min_size=200)
processed = manager.process_figures(parsed_doc.figures, parsed_doc.markdown)
saved = manager.save_figures(processed, output_dir / 'figures', doc_id)
# Creates figures_manifest.json with metadata

Table Storage

from ashmatics_tools.document_storage import TableStorageManager

# Save tables in dual format (Markdown + JSON)
manager = TableStorageManager()
stored = manager.save_tables(consolidated_tables, output_dir / 'tables', doc_id)
# Creates tables_manifest.json with metadata

Features

Modern HTTP Client (httpx)

All HTTP communication uses httpx, providing:

  • Native Type Annotations: Full type safety without separate stub packages
  • Async/Await Support: Ready for async operations in performance-critical applications
  • HTTP/2 Support: Modern protocol support for improved performance
  • Backward Compatible: Synchronous API broadly compatible with requests for easy adoption

Secure by Default

  • SSL Verification Enabled: All HTTP requests verify SSL certificates by default
  • Explicit Opt-Out: SSL verification can only be disabled by explicitly passing verify=False to methods
  • Security Warnings: Disabling SSL verification triggers warning logs

Flexible Configuration

  • Environment Variables: Supports .env files for configuration
  • Configurable Endpoints: All API endpoints configurable via environment or parameters
  • Batch Processing: Configurable batch sizes for large dataset operations
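The configurable batch processing can be pictured as a simple chunking helper; `batched` here is an illustrative name, not the library's API:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of up to batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# e.g. importing 250 rows with batch_size=100 yields batches of 100, 100, 50
sizes = [len(b) for b in batched(range(250), 100)]
```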

MongoDB Integration (Optional)

  • Optional Dependency: MongoDB support is optional via [mongodb] extras
  • Abstract Base Classes: Extensible DocumentProcessor for custom document types
  • Upsert Operations: Intelligent upsert with identifier-based conflict resolution
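The identifier-based upsert can be sketched as follows; the filter/update shapes are illustrative, not the exact documents DocumentProcessor builds:

```python
def build_upsert(document: dict, identifier_key: str) -> tuple[dict, dict]:
    """Build a pymongo-style (filter, update) pair keyed on the document's identifier."""
    if identifier_key not in document:
        raise KeyError(f"document missing identifier field {identifier_key!r}")
    filter_ = {identifier_key: document[identifier_key]}
    update = {"$set": document}
    return filter_, update

# With pymongo, this pair drives an idempotent upsert:
#   collection.update_one(filter_, update, upsert=True)
filter_, update = build_upsert({"document_id": "123", "content": "..."}, "document_id")
```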

Comprehensive Error Handling

  • Detailed Logging: Structured logging throughout all operations
  • Graceful Failures: Proper error handling with informative messages
  • Validation: Input validation and JSON compliance checking

Architecture

ashmatics-tools/
├── src/ashmatics_tools/
│   ├── __init__.py           # Public API exports
│   ├── chunkers/
│   │   ├── __init__.py
│   │   ├── azure_chunker.py
│   │   ├── base.py
│   │   ├── docling_chunker.py
│   │   └── simple_chunker.py
│   ├── document_storage/
│   │   ├── __init__.py
│   │   ├── figure_storage.py
│   │   └── table_storage.py
│   ├── embedders/
│   │   ├── __init__.py
│   │   ├── azure_embedder.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   └── openai_embedder.py
│   ├── embedding/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── mongodb_pipeline.py
│   │   └── specialized/
│   ├── enrichers/
│   │   ├── __init__.py
│   │   ├── table_classifier.py
│   │   ├── table_consolidator.py
│   │   ├── metrics_extractor.py
│   │   └── training_data_extractor.py
│   ├── graphql/
│   │   ├── __init__.py
│   │   └── client.py
│   ├── llm/
│   │   ├── __init__.py
│   │   ├── azure_openai.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── huggingface.py
│   │   ├── openai.py
│   │   └── plugin_registry.py
│   ├── external_apis/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── openfda/
│   │   │   ├── __init__.py
│   │   │   ├── client.py
│   │   │   └── config.py
│   │   └── accessgudid/
│   │       ├── __init__.py
│   │       ├── client.py
│   │       └── config.py
│   ├── mcp_servers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── openfda/
│   │   │   ├── __init__.py
│   │   │   ├── config.py
│   │   │   └── server.py
│   │   └── accessgudid/
│   │       ├── __init__.py
│   │       ├── config.py
│   │       └── server.py
│   ├── ontology/
│   │   ├── __init__.py
│   │   ├── categories/
│   │   ├── core/
│   │   ├── data/
│   │   └── terms/
│   ├── parsers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── docling_parser.py
│   │   ├── factory.py
│   │   ├── llama_parser.py
│   │   └── simple_parser.py
│   ├── processors/
│   │   ├── __init__.py
│   │   └── base.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── adls_store.py
│   │   └── minio_store.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── export_utils.py
│   │   ├── import_utils.py
│   │   └── schema_utils.py
│   └── vector_stores/
│       ├── __init__.py
│       ├── base.py
│       ├── cosmosdb_store.py
│       ├── factory.py
│       ├── pgvector_store.py
│       └── qdrant_store.py
├── tests/                   # Test suite
├── pyproject.toml          # Package configuration
└── README.md

Dependencies

Core Dependencies

  • pandas>=2.1.0 - Data manipulation and Excel file reading
  • openpyxl>=3.1.2 - Excel file format support
  • httpx>=0.27.0 - Modern HTTP client with native type annotations and async support
  • python-dotenv>=1.0.0 - Environment variable management
  • pyyaml>=6.0.0 - YAML parsing
  • numpy>=1.24.0 - Numerical operations

Optional Dependencies

  • pymongo>=4.0.0 - MongoDB integration (install with [mongodb] extra)

Development Dependencies

  • pytest>=7.4.0 - Testing framework
  • pytest-cov>=4.1.0 - Code coverage
  • ruff>=0.1.0 - Linting and formatting
  • mypy>=1.5.0 - Static type checking
  • pandas-stubs>=2.1.0 - Type stubs for pandas

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=ashmatics_tools --cov-report=html

# Run specific test file
pytest tests/test_import_utils.py

Code Quality

# Lint with ruff
ruff check src/

# Format with ruff
ruff format src/

# Type check with mypy
mypy src/

Environment Variables

# GraphQL/Hasura Configuration
HASURA_GRAPHQL_ENDPOINT=https://kb-api.ashmatics.com/v1/graphql
HASURA_ADMIN_SECRET=your-admin-secret-here

# MongoDB Configuration (optional)
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ashmatics_kb

Python Version Support

  • Minimum: Python 3.11
  • Tested: Python 3.11, 3.12
  • Recommended: Python 3.12+

License

MIT License - See LICENSE file for details

Contributing

This is a private package for Ashmatics internal use. For questions or issues, please contact the development team.

Version History

0.7.0 (2026-01-19) - ASHCAI Clinical AI Governance Ontology (ASHTOOLS-10)

  • ASHCAI Ontology Module: Complete Clinical AI Governance ontology for the CAI Framework
    • AshcaiOntology: Manager class with CRUD operations, relationship management, and traversal helpers
    • ashcai-ontology-v1.0.json: Ontology definition with 22 semantic types (T92xx range) and 44 relationship types
    • ashcai_schemas.py: Comprehensive Pydantic schemas with natural business ID validation
  • Entity Types: PolicyDomain, ProcessDomain, BasePractice, SOPTemplate, WorkProductTemplate, ExemplarControl, RegulatoryFramework, RegulatoryRequirement
  • Natural Business IDs: Human-readable identifiers with regex validation
    • PolicyDomain: MMP-001, TPP-001, AGP-001
    • ProcessDomain: MON, RM, SA, OVR, PV
    • BasePractice: MON.BP01, RM.BP03
    • SOPTemplate: SOP-MON-01, SOP-RM-02
    • WorkProductTemplate: WP-MON-02-Dashboard
    • ExemplarControl: EXC-A7-04, EXC-A10-02
    • RegulatoryRequirement: NIST-MAP-4.2, JC-RUAIH-3
  • Type Discrimination: All ASHCAI documents include ontology: "ashcai" field for filtering
  • 44 Relationship Types: From TDD-001 specification including specifies, containsBasePractice, realizedBy, produces, servesAsEvidenceFor, addressedBy, implementedThrough, operationalizedIn
  • Traversal Helpers: Pre-built queries for common governance patterns
    • get_policy_hierarchy(): Full implementation path from policy to work products
    • get_evidence_chain(): Evidence trail for controls
    • get_regulatory_crosswalk(): Map requirements to framework elements
    • find_control_implementations(): SOPs implementing a control
  • OWL/RDF Export: Generate standard URIs from natural business IDs (http://asherinformatics.com/ontology/ashcai/{id})
  • MongoDB Collections: Separate collections per entity type with indexes for efficient querying
  • Exports: from ashmatics_tools.ontology import AshcaiOntology
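
As a standalone illustration of natural business ID validation, the regex patterns below are inferred from the example IDs listed above; the actual patterns in ashcai_schemas.py may differ:

```python
import re

# Illustrative patterns inferred from the example IDs; not the package's
# actual regexes.
BUSINESS_ID_PATTERNS = {
    "PolicyDomain": re.compile(r"[A-Z]{3}-\d{3}"),        # MMP-001, TPP-001
    "ProcessDomain": re.compile(r"[A-Z]{2,4}"),           # MON, RM, OVR
    "BasePractice": re.compile(r"[A-Z]{2,4}\.BP\d{2}"),   # MON.BP01, RM.BP03
    "SOPTemplate": re.compile(r"SOP-[A-Z]{2,4}-\d{2}"),   # SOP-MON-01
    "ExemplarControl": re.compile(r"EXC-A\d+-\d{2}"),     # EXC-A7-04
}

def is_valid_business_id(entity_type: str, business_id: str) -> bool:
    """Check an ID against the pattern for its entity type (full match only)."""
    pattern = BUSINESS_ID_PATTERNS.get(entity_type)
    return bool(pattern and pattern.fullmatch(business_id))
```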

0.6.2 (2026-01-07) - Retry Utilities with Exponential Backoff

  • Retry Module: New ashmatics_tools.llm.retry module for robust LLM API call handling
    • RetryConfig: Configurable dataclass for retry behavior (max_attempts, initial_delay, max_delay, exponential_base, jitter, request_delay)
    • calculate_backoff_delay(): Pure function for exponential backoff with jitter calculation
    • call_with_backoff(): Async wrapper that handles LLMRateLimitError with configurable retries
    • call_with_backoff_and_fallback(): Convenience wrapper with fallback function support
  • Presets: RetryConfig.aggressive() for batch processing, RetryConfig.conservative() for interactive use
  • API Hint Support: Respects retry_after hints from rate limit responses (Azure OpenAI, etc.)
  • Thundering Herd Prevention: Random jitter (default 30%) prevents synchronized retry storms
  • Exports: All retry utilities exported from ashmatics_tools.llm:
    from ashmatics_tools.llm import RetryConfig, call_with_backoff
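
    As a standalone illustration of the backoff calculation described above (this sketch mirrors the documented behavior; the package's actual calculate_backoff_delay signature may differ):

```python
import random

def backoff_delay(
    attempt: int,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: float = 0.3,
) -> float:
    """Exponential backoff with random jitter (attempt counts from 0)."""
    delay = min(initial_delay * exponential_base ** attempt, max_delay)
    # +/- jitter fraction prevents synchronized retry storms
    return delay * (1.0 + random.uniform(-jitter, jitter))
```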
    

0.6.1 (2025-12-30) - Ollama SDK Integration & LLM Enhancements

  • Ollama Python SDK Integration: Complete rewrite of OllamaClient using official ollama Python SDK
    • Embeddings: generate_embedding(), generate_embeddings() with batch support and dimension control
    • Vision: complete_with_vision() for image understanding (llava, llama3.2-vision models)
    • Tool Calling: complete_with_tools() for function calling/agentic workflows
    • Model Management: list_models(), pull_model(), show_model(), delete_model(), copy_model(), list_running_models()
    • Keep-Alive Control: Memory management via keep_alive config (e.g., "5m", "1h", "-1")
    • Streaming: Native async generators via stream_complete_with_messages()
    • Health Check: check_health() for server status verification
    • Helper Classes: OllamaTool, OllamaToolParameter, OllamaToolCall for tool definitions
  • New Optional Dependency: [ollama] extra (pip install "ashmatics-tools[ollama]")
  • Azure OpenAI Improvements: Enhanced AzureOpenAIClient with simple completion support
  • Example Notebook: Added examples/rag_and_llm_demo.ipynb demonstrating RAG pipelines and LLM client usage
  • Integration Tests: Comprehensive Ollama test suite (tests/integration/test_ollama_integration.py)
    • 24 tests covering chat, streaming, embeddings, vision, tools, model management
    • Performance and concurrent request testing
    • Error handling validation

0.6.0 (2025-12-28) - AI Search & RAG Module (ASHTOOLS-5)

  • Search Module: Complete RAG (Retrieval-Augmented Generation) framework
    • SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
    • MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
    • RAGConfig, RAGResult, RAGMetrics, RAGStreamChunk dataclasses
    • Factory pattern: create_search_strategy("simple_rag", ...) with plugin registry
  • LLM Streaming: SSE and NDJSON streaming support for all LLM providers
    • StreamChunk dataclass for streaming responses
    • stream_complete() and stream_complete_with_messages() methods
    • Fallback to non-streaming for providers without native support
  • Context Window Management: Automatic context fitting for LLM requests
    • ContextWindowManager: Fit sources into available context with token estimation
    • ModelContextLimits: Presets for GPT-4, GPT-4o, Claude Sonnet/Opus/Haiku, Llama, Mistral
    • ModelFamily enum for tokenizer selection
  • MCP Tool Definitions: Generic tool schemas for agent integration
    • RAG_SEARCH_TOOL: RAG-enhanced search with answer generation
    • SEMANTIC_SEARCH_TOOL: Semantic similarity search without generation
    • MULTI_QUERY_SEARCH_TOOL: Multi-query RAG with query expansion
    • Export as JSON or YAML for MCP server registration
  • SearchResult ADR-045 Fields: Governance metadata for RAG sources
    • domain, control_refs, token_refs, document_type, source_uri
  • New Optional Dependencies: [search], [reranking], [rag] extras
  • 52 Tests: Comprehensive test coverage for all new functionality
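
The RRF ranking used by MultiQueryRAGStrategy can be sketched in a few lines; this is the standard reciprocal rank fusion formula, shown here as a self-contained illustration rather than the package's implementation:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists; a document's score is the sum of
    1/(k + rank) over all lists it appears in (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Documents that rank highly across several expanded queries dominate the fused list, which is why query expansion pairs well with RRF.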

0.5.3 (2025-12-25) - Modular Dependencies and Lazy Loading (ASHTOOLS-7)

  • Modular Optional Dependencies: Restructured pyproject.toml to minimize install size
    • Core install is now lightweight (~100MB) - NO torch/CUDA by default
    • Heavy dependencies (docling, transformers, qdrant) moved to optional extras
    • New extras: [api], [parsers], [chunkers], [docproc], [ml], [full]
    • [api] extra for API apps like Ashmatics-Knowledgebase (~150MB vs ~4GB)
  • Lazy Loading: Heavy modules loaded on-demand via __getattr__
    • DoclingParser, DoclingChunker only load torch when accessed
    • HuggingFace LLM providers registered lazily in factory
  • Graceful Import Handling: Clear error messages when optional deps missing
    • TYPE_CHECKING used for heavy type hints (docling_core)
    • Runtime checks with installation instructions
  • ADR Documentation: Added docs/ADRs/ADR-LazyLoadingBigDependencies-ASHTOOLS-7-2025-12-25.md
  • Container Size Reduction: Enables ~3GB+ savings for API-only applications

0.5.2 (2025-12-06) - llama.cpp Client Support

  • LlamaCppClient: Local LLM inference via llama.cpp server
    • Metal acceleration on M1/M2 Macs for fast local inference
    • OpenAI-compatible API (/v1/chat/completions endpoint)
    • VPS and on-premises deployment support
    • Server health and properties endpoints
    • llama.cpp-specific parameters (top_k, repeat_penalty, n_gpu_layers)
    • Embedding support: generate_embedding(), generate_embeddings(), get_embedding_dimension()
    • Always $0.00 cost (self-hosted)
  • LlamaCppConfig: Configuration dataclass with endpoint, model, timeout, SSL, context_size, n_gpu_layers
  • Factory Integration: create_llm_client("llamacpp", config) via plugin registry
  • Unified Interface: Same API as Azure OpenAI, OpenAI, Ollama, HuggingFace
  • Live Testing: Verified against local llama.cpp server (11/12 tests passing)
  • Note: Embeddings require llama-server to be started with --embeddings flag

0.5.1 (2025-12-03) - AccessGUDID API Integration

  • AccessGUDID API Client: NIH/FDA Global Unique Device Identification Database integration
    • Device lookup by DI, UDI, or record key
    • UDI parsing for GS1, HIBCC, ICCBBA formats
    • Device version history tracking
    • SNOMED CT code lookup (requires UMLS API key)
    • Implantable device listing with pagination
  • AccessGUDID MCP Server: 5 tools for LLM/agent access to GUDID data
  • Typed Models: Pydantic models for type-safe responses (GUDIDDevice, ParsedUDI, etc.)
  • Transform Functions: Raw API response to typed model conversion
  • Updated MCP Test Script: Now supports both OpenFDA and AccessGUDID servers

0.5.0 (2025-11-29) - Document Enrichers and Storage Managers

  • Enrichers Module: Post-parsing content analysis extracted from FDA 510(k) pipeline

    • TableClassifier: LLM-based table categorization (COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, etc.)
    • TableConsolidator: Multi-page table merge with first-row matching, column similarity, and continuation detection
    • MetricsExtractor: Performance metrics extraction with confidence intervals and sample sizes
    • TrainingDataExtractor: AI/ML training dataset characteristics extraction
    • DomainKnowledgeProvider: Abstract base for domain-specific context injection
  • Document Storage Module: Artifact storage managers for document processing outputs

    • FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage with manifests
    • TableStorageManager: Dual-format (Markdown + JSON) table storage with manifests
  • Reusable Across Pipelines: Supports FDA 510(k), research papers, clinical guidelines, pre-prints

  • Domain Extensibility: Domain-specific logic via provider injection without modifying base extractors

  • New Dependency: Added Pillow>=10.0.0 for figure processing

  • (NOTE: a mix-up with the release tags in the repo means version 0.4.0 was skipped; 0.5.0 is effectively 0.4.0. Like the 13th floor...)

0.3.1 (2025-11-22) - FDA API Integration and MCP Servers

  • External APIs Module: Extensible framework for external data source integration
  • OpenFDA Client: Complete US FDA Open Data Portal integration
    • Support for all major endpoints: device (510k, events, recalls), drugs (labels, adverse events), food
    • Automatic retry with exponential backoff for transient errors
    • Client-side rate limiting with token bucket algorithm (respects FDA API limits)
    • Automatic pagination for large result sets
    • Query syntax support: field search, date ranges, boolean operators, wildcards
    • Count/analytics queries for aggregated data
  • MCP Servers Module: Model Context Protocol servers for LLM integration
    • BaseMCPServer abstract base for creating MCP tool servers
    • OpenFDAMCPServer exposing FDA API as LLM tools (search_devices, search_drugs, count_by_field)
    • JSON Schema-based input validation
    • Response formatting and error handling for LLM consumption
  • Factory Pattern: Plugin registry for custom API providers and MCP servers
  • Async-First: All operations use async/await for high-performance pipelines
  • Comprehensive Documentation:
    • Updated CLAUDE.md with 380+ lines of usage examples
    • New FDA_API_USAGE_GUIDE.md with complete reference (650+ lines)
    • Query syntax guide, field references, practical examples
  • Testing: Full test coverage with respx-based mocking for httpx
  • Future-Ready: Architecture supports Census Bureau, CMS, and other data sources
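
The client-side rate limiting mentioned above uses a token bucket; here is a minimal self-contained sketch of the algorithm (the package's actual implementation may differ):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; a request
    proceeds only if a token is available."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Bursts drain the bucket quickly, then requests are throttled to the steady refill rate, which keeps sustained traffic under FDA API limits.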

0.3.0 (2025-11-21) - Ontology and Term Services Integration

  • Ontology Module: Complete medical ontology management system
  • Term Resolution: MongoDB-based term lookup and management with TermResolver
  • Category Management: Hierarchical category structures with CategoryManager
  • External Ontology Integration: BioPortal API client for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
  • Custom ASHMATICS Ontology: Domain-specific ontology manager for medical imaging AI concepts
  • Schema Definitions: Comprehensive Pydantic schemas for terms, categories, and ontology operations
  • Async API: Full async support for all ontology operations
  • Integration Ready: Seamless integration with existing document processing and vector search pipelines

0.2.0 (2025-11-12) - Major Module Migration

  • Complete Migration: Migrated ALL common modules from ashmatics-kb-tools
  • Parsers Module: SimpleParser, DoclingParser, LlamaParser with factory function
  • Chunkers Module: SimpleChunker, AzureChunker, DoclingChunker with factory function
  • Embedders Module: AzureEmbedder, OpenAIEmbedder with factory function
  • Embedding Pipelines: MongoDBEmbeddingPipeline + specialized pipelines for framework, use cases, and cards
  • Vector Stores: CosmosDB, PostgreSQL pgvector, Qdrant implementations with factory function
  • 52 Public Exports: Complete document processing, embedding, and vector search workflow
  • Code Quality: All files standardized with copyright headers, ruff-compliant, modernized type hints
  • Production Ready: 100% migration complete, all tests passing

0.1.0 (2025-01-12)

  • Initial release
  • Extracted from ashmatics-kb-tools repository
  • Core utilities: KBImporter, DataExporter, schema tools
  • Abstract DocumentProcessor base class
  • Generic GraphQL client utilities
  • Migrated to httpx: Modern HTTP client with native type annotations and async support
  • SSL verification enabled by default
  • Full mypy type safety (zero type errors)
  • Python 3.11+ support

Complete Document Processing Pipeline

The package now provides a complete end-to-end pipeline for document processing:

from ashmatics_tools import (
    create_parser,        # Parse documents (PDF, DOCX, etc.)
    create_chunker,       # Chunk into manageable pieces
    create_embedder,      # Generate embeddings
    create_vector_store   # Store and search vectors
)

# 1. Parse document
parser = create_parser("docling")
parsed_doc = await parser.parse_file("document.pdf")

# 2. Chunk document
chunker = create_chunker(strategy="docling")
chunks = await chunker.chunk_document(
    content=parsed_doc.markdown,
    title="Document Title",
    source="document.pdf"
)

# 3. Generate embeddings
embedder = create_embedder(provider="azure")
await embedder.initialize()
embedded_chunks = await embedder.embed_chunks(chunks)

# 4. Store in vector database
vector_store = create_vector_store(provider="cosmosdb")
success, failed = await vector_store.store_embeddings_batch(embedded_chunks)

# 5. Search
query_embedding = await embedder.generate_embedding("search query")
results = await vector_store.similarity_search(query_embedding, top_k=10)

Module Overview

Parsers (ashmatics_tools.parsers)

Document parsing with multiple backends:

  • SimpleParser: Basic fallback parser
  • DoclingParser: Advanced PDF parsing with tables/figures
  • LlamaParser: LlamaParse cloud service integration
  • Factory: create_parser(provider)

Chunkers (ashmatics_tools.chunkers)

Document chunking strategies:

  • SimpleChunker: Paragraph-based chunking
  • AzureChunker: Azure-compatible with tiktoken
  • DoclingChunker: Token-aware semantic chunking
  • Factory: create_chunker(strategy)

Embedders (ashmatics_tools.embedders)

Embedding generation:

  • AzureEmbedder: Azure OpenAI embeddings
  • OpenAIEmbedder: OpenAI embeddings
  • Factory: create_embedder(provider)

Embedding Pipelines (ashmatics_tools.embedding)

MongoDB-based embedding workflows:

  • MongoDBEmbeddingPipeline: Generic pipeline
  • Specialized Pipelines: Framework, use cases, cards

Vector Stores (ashmatics_tools.vector_stores)

Vector database integrations:

  • CosmosDBVectorStore: Azure CosmosDB with MongoDB vCore API
  • PgVectorStore: PostgreSQL with pgvector extension
  • QdrantVectorStore: Qdrant vector database
  • Factory: create_vector_store(provider)

Storage Backends (ashmatics_tools.storage)

Cloud-agnostic storage abstraction:

  • ADLSStorageClient: Azure Data Lake Storage Gen2 with dual auth (connection string or DefaultAzureCredential)
  • MinIOStorageClient: MinIO object storage (S3-compatible)
  • S3StorageClient: AWS S3 (reserved for future implementation)
  • Factory: create_storage_client(provider, config)
  • Features: Async API, buffered and streaming reads/writes, glob pattern matching, metadata operations

LLM Clients (ashmatics_tools.llm)

Unified interface for language model providers:

  • AzureOpenAIClient: Azure OpenAI Service
  • OpenAIClient: OpenAI direct API
  • HuggingFaceInferenceClient: HuggingFace Inference API (requires [huggingface] extra)
  • HuggingFaceLocalClient: Local HuggingFace models (requires [huggingface] extra)
  • AzureAIFoundryClient: Full Azure AI Foundry model catalog (requires [azure-ai] extra)
  • OllamaClient: Local/ACA/K8s Ollama inference with SDK (requires [ollama] extra) - embeddings, vision, tools, model management
  • Factory: create_llm_client(provider, config) with plugin registry
  • Features: Async-first API, unified completion interface, cost tracking, plugin registry, extensible via register_llm_provider()

Ontology Services (ashmatics_tools.ontology)

Medical ontology management and term services:

  • TermResolver: MongoDB-based term lookup and resolution
  • CategoryManager: Hierarchical category management for document tagging
  • BioPortalClient: External ontology validation via NCBO BioPortal API (SNOMED CT, RADLEX, LOINC, NCIT)
  • AshmaticsOntology: Custom ASHMATICS domain-specific ontology for medical imaging AI concepts
  • Features: Async API, comprehensive schema validation, integration with external ontologies

External APIs (ashmatics_tools.external_apis)

Clients for external data sources with robust error handling:

  • OpenFDAClient: US FDA Open Data Portal (open.fda.gov) integration
  • AccessGUDIDClient: NIH/FDA Global Unique Device Identification Database (accessgudid.nlm.nih.gov) integration
  • BaseAPIClient: Abstract base for creating custom API clients
  • OpenFDA Endpoints: Device 510(k), adverse events, recalls, drug labels, FAERS, enforcement actions
  • AccessGUDID Endpoints: Device lookup, UDI parsing, device history, SNOMED mappings, implantable device listings
  • Factory: create_api_client(provider, config) with plugin registry
  • Features: Async API, retry with exponential backoff, client-side rate limiting, automatic pagination
  • Query Syntax: Support for field search, date ranges, boolean operators, wildcards (OpenFDA)
  • Extensibility: Register custom providers via register_api_provider() for Census, CMS, etc.
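
Automatic pagination for skip/limit-style APIs can be sketched as an async generator; the fetch_page callable and stop condition here are illustrative assumptions, not the package's actual interface:

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable

async def paginate(
    fetch_page: Callable[[int, int], Awaitable[list]],
    page_size: int = 100,
) -> AsyncIterator:
    """Keep fetching until a short or empty page signals the end."""
    skip = 0
    while True:
        page = await fetch_page(skip, page_size)
        for item in page:
            yield item
        if len(page) < page_size:
            return
        skip += page_size
```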

MCP Servers (ashmatics_tools.mcp_servers)

Model Context Protocol servers for LLM integration:

  • OpenFDAMCPServer: Expose OpenFDA API as LLM tools (search_devices, search_drugs, count_by_field)
  • AccessGUDIDMCPServer: Expose AccessGUDID API as LLM tools (lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices)
  • BaseMCPServer: Abstract base for creating MCP tool servers
  • Factory: create_mcp_server(name, config) with plugin registry
  • Features: JSON Schema validation, response formatting, error handling, streaming support
  • Use Case: Thin adapter layer between LLMs and external data sources
  • Extensibility: Register custom servers via register_mcp_server()
  • Stdio Runner: Run servers via python -m ashmatics_tools.mcp_servers.{openfda,accessgudid}

Search/RAG (ashmatics_tools.search)

RAG (Retrieval-Augmented Generation) strategies for AI-powered search:

  • SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
  • MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
  • RAGConfig: Configuration for top_k, temperature, max_tokens, system_prompt
  • RAGResult: Answer with sources, metrics, and metadata
  • RAGStreamChunk: Streaming response chunks with partial sources
  • Factory: create_search_strategy(name, llm, vector_store, embedder, config) with plugin registry
  • MCP Tools: Generic tool definitions (rag_search, semantic_search, multi_query_search)
  • Context Management: ContextWindowManager for automatic source fitting
  • Model Presets: ModelContextLimits for GPT-4, Claude, Llama, Mistral
  • Features: Async-first API, streaming support, ADR-045 governance metadata
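
Fitting retrieved sources into a model's context window, as ContextWindowManager does, can be sketched with a greedy loop and a rough length heuristic. The len/4 estimate below is a common approximation, not the package's tokenizer-based estimate:

```python
def fit_sources(sources: list[str], max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Greedily include sources (highest-ranked first) until a rough
    token estimate exhausts the budget."""
    fitted: list[str] = []
    used = 0
    for text in sources:
        estimate = len(text) // chars_per_token + 1
        if used + estimate > max_tokens:
            break
        fitted.append(text)
        used += estimate
    return fitted
```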

Enrichers (ashmatics_tools.enrichers)

Post-parsing document enrichment for tables and extracted data:

  • TableClassifier: LLM-based table categorization by content type
  • TableConsolidator: Multi-page table merge with heuristics and LLM validation
  • MetricsExtractor: Performance metrics extraction with statistical context
  • TrainingDataExtractor: AI/ML training dataset characteristics extraction
  • DomainKnowledgeProvider: Abstract base for domain-specific context injection
  • Categories: COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, TECHNICAL_SPECS, DEMOGRAPHICS, etc.
  • Features: Handles PDF parser fragmentation, continuation markers, column similarity matching
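
The column similarity matching used for continuation detection can be illustrated with a Jaccard score over normalized headers; this is one plausible heuristic, and the package's actual logic may differ:

```python
def column_similarity(cols_a: list[str], cols_b: list[str]) -> float:
    """Jaccard similarity over normalized column headers, as a signal that
    a table fragment continues a previous page's table."""
    a = {c.strip().lower() for c in cols_a}
    b = {c.strip().lower() for c in cols_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```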

Document Storage (ashmatics_tools.document_storage)

Artifact storage managers for document processing outputs:

  • FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage
  • TableStorageManager: Dual-format (Markdown + JSON) table storage
  • ProcessedFigure: Dataclass for processed figures with metadata
  • StoredTable: Dataclass for stored tables with file paths
  • Features: Automatic manifest generation, content hashing, size filtering
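
Content-addressed storage derives an artifact's key from a hash of its bytes, so identical figures are stored once. The sharded path layout below is an assumption for illustration, not the package's actual scheme:

```python
import hashlib

def content_address(data: bytes, extension: str = "png", shard_len: int = 2) -> str:
    """Derive a storage key from the content's SHA-256 digest, sharded by
    its first hex characters to avoid huge flat directories."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:shard_len]}/{digest}.{extension}"
```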

Project details


Download files

Download the file for your platform.

Source Distribution

ashmatics_tools-0.7.2.tar.gz (323.4 kB)

Uploaded Source

Built Distribution

ashmatics_tools-0.7.2-py3-none-any.whl (326.8 kB)

Uploaded Python 3

File details

Details for the file ashmatics_tools-0.7.2.tar.gz.

File metadata

  • Download URL: ashmatics_tools-0.7.2.tar.gz
  • Size: 323.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ashmatics_tools-0.7.2.tar.gz
Algorithm Hash digest
SHA256 3b2ce5e38ffd77123c9e150ad2067223180ebc57eb985a8cab1aad62f1abf36d
MD5 0a8092ef2ebc73f3c4466418cd8173c7
BLAKE2b-256 6a1b241bee10551aa7a371dd836bc12dfae8d05cedf84001985b1184ca6061c8

Provenance

The following attestation bundles were made for ashmatics_tools-0.7.2.tar.gz:

Publisher: publish.yml on AshMatics/ashmatics-tools

File details

Details for the file ashmatics_tools-0.7.2-py3-none-any.whl.

File metadata

  • Download URL: ashmatics_tools-0.7.2-py3-none-any.whl
  • Size: 326.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ashmatics_tools-0.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4eab26c7f86d9a86673e12ec6410ebe1c40394e1b1eec67817dc06a9f52a5695
MD5 b33d09ec2301ee8c4128a4ca29dc9cc1
BLAKE2b-256 8eabc9292c7e48ae72bb474e52df16135e204b8dba256ccb99ae798c8df6666e

Provenance

The following attestation bundles were made for ashmatics_tools-0.7.2-py3-none-any.whl:

Publisher: publish.yml on AshMatics/ashmatics-tools
