
Shared utilities and base classes for Ashmatics Knowledge Base applications


Ashmatics Tools

Last updated: 2026-01-25

Version 0.7.1

v0.7.1 (2026-01-25)

  • Standardized MongoDB/CosmosDB environment variables with fallback chains
  • MONGO_URL is now canonical (with AZ_MONGO_CONNECTION_STRING, COSMOS_VECTOR_CONNECTION_STRING as fallbacks)
  • MONGO_DB is now canonical (with MONGO_DATABASE, COSMOS_VECTOR_DATABASE as fallbacks)
  • Updated ENV_VARIABLES.md documentation

A Python package providing shared utilities, base classes, and common functionality for Ashmatics Knowledge Base applications.

Overview

ashmatics-tools is a foundational library that centralizes reusable components across Ashmatics healthcare AI applications. It provides:

  • Data Import/Export Utilities: Excel data loading, GraphQL integration, batch processing with Hasura
  • Document Processors: Abstract base classes for MongoDB document processing
  • GraphQL Clients: Generic GraphQL query/mutation builders and client utilities
  • Schema Management: GraphQL schema introspection and analysis tools
  • Document Parsers: Advanced parsing for PDFs, DOCX, and other formats
  • Document Chunkers: Token-aware and semantic chunking strategies
  • Embedders: Generate embeddings using Azure OpenAI or OpenAI APIs
  • Vector Stores: Integration with CosmosDB, PostgreSQL, and Qdrant for vector search
  • Storage Backends: Cloud-agnostic storage abstraction for ADLS Gen2, MinIO, and AWS S3
  • LLM Clients: Unified interface for Azure OpenAI, OpenAI, HuggingFace, and custom providers
  • Ontology Services: Medical ontology management including SNOMED CT, RADLEX, LOINC, and custom Ashmatics ontologies
  • Term Services: Term resolution, hierarchical category management, and external ontology validation
  • External APIs: Clients for external data sources (FDA, Census, CMS) with retry, rate limiting, and pagination
  • MCP Servers: Model Context Protocol servers exposing APIs to LLMs with tool-based interfaces
  • Search/RAG: Retrieval-Augmented Generation strategies with streaming, context window management, and MCP tool definitions
  • Document Enrichers: Table classification, consolidation, and metrics extraction for parsed documents
  • Document Storage: Figure and table storage managers with content-addressed hashing and manifests

Installation

From Git Repository (Private)

# Using pip
pip install git+https://github.com/JFK-Ashmatics/ashmatics-tools.git

# Using uv
uv add git+https://github.com/JFK-Ashmatics/ashmatics-tools.git

# With optional dependencies
pip install "ashmatics-tools[mongodb,storage] @ git+https://github.com/JFK-Ashmatics/ashmatics-tools.git"

From Local Development

# Clone the repository
git clone https://github.com/JFK-Ashmatics/ashmatics-tools.git
cd ashmatics-tools

# Install in editable mode with dev dependencies
pip install -e ".[dev,mongodb,storage]"

Configuration

Environment Variables

ashmatics-tools requires various environment variables depending on which components you use. This library does not load .env files automatically; your application must handle environment variable loading.

See ENV_VARIABLES.md for:

  • Complete list of required environment variables by component
  • Example application setups (development with .env, production with Key Vault)
  • Environment-specific configurations

Quick example:

from dotenv import load_dotenv

# Load .env BEFORE importing ashmatics_tools
load_dotenv()

# Now use the library
from ashmatics_tools.embedders import create_embedder
embedder = create_embedder(provider="azure")

Usage

Knowledge Base Importer

from ashmatics_tools.utils.import_utils import KBImporter

# Initialize the importer
importer = KBImporter(
    graphql_endpoint="https://kb-api.ashmatics.com/v1/graphql",
    admin_secret="your-admin-secret",
    batch_size=100
)

# Load data from Excel
df = importer.load_excel_data("data.xlsx", sheet_name="Sheet1")

# Import to Knowledge Base via GraphQL
result = importer.import_to_kb(
    df=df,
    table_name="my_table",
    column_mapping={"excel_col": "db_col"}
)

Document Processor (MongoDB)

from ashmatics_tools.processors.base import DocumentProcessor
from pymongo import MongoClient

class MyDocumentProcessor(DocumentProcessor):
    def extract_metadata(self, document: dict) -> dict:
        return {"title": document.get("title"), "author": document.get("author")}

    def clean_text(self, text: str) -> str:
        return text.strip().lower()

    def get_identifier_key(self) -> str:
        return "document_id"

    def get_document_type(self) -> str:
        return "my_document_type"

    def process_document(self, file_path: str) -> dict:
        # Your document processing logic
        return {"document_id": "123", "content": "..."}

# Use the processor
client = MongoClient("mongodb://localhost:27017")
processor = MyDocumentProcessor(client, "my_database", "my_collection")
result = processor.upsert_document({"document_id": "123", "content": "..."})

Document Chunking

from ashmatics_tools.chunkers.factory import create_chunker

# Initialize chunker
chunker = create_chunker(strategy="docling")

# Chunk document
chunks = chunker.chunk_document(
    content="This is a sample document content.",
    title="Sample Document",
    source="document.pdf"
)

Embedding Generation

from ashmatics_tools.embedders.factory import create_embedder

# Initialize embedder
embedder = create_embedder(provider="azure")
embedder.initialize()

# Generate embeddings
embeddings = embedder.embed_chunks(["chunk1", "chunk2"])

Vector Store Integration

from ashmatics_tools.vector_stores.factory import create_vector_store

# Initialize vector store
vector_store = create_vector_store(provider="cosmosdb")

# Store embeddings
success, failed = vector_store.store_embeddings_batch(embeddings)

# Perform similarity search
results = vector_store.similarity_search(query_embedding, top_k=10)

Storage Backend Integration

from ashmatics_tools.storage import create_storage_client, StorageConfig, AuthType

# Initialize ADLS storage with DefaultAzureCredential (production)
config = StorageConfig(
    provider="adls",
    account_url="https://mystorageaccount.dfs.core.windows.net",
    container_name="my-container",
    auth_type=AuthType.DEFAULT_CREDENTIAL
)
storage = create_storage_client("adls", config)

# Or use connection string (development)
config = StorageConfig(
    provider="adls",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...",
    container_name="my-container",
    auth_type=AuthType.CONNECTION_STRING
)
storage = create_storage_client("adls", config)

# Initialize MinIO storage
config = StorageConfig(
    provider="minio",
    endpoint="minio.example.com:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    container_name="my-bucket",
    auth_type=AuthType.ACCESS_KEY,
    use_ssl=False
)
storage = create_storage_client("minio", config)

# Use async context manager
async with storage:
    # Write object
    await storage.write_object("path/file.txt", b"Hello, World!")

    # Read object
    content = await storage.read_object("path/file.txt")

    # List objects
    objects = await storage.list_objects(prefix="path/", pattern="*.txt")

    # Stream large files
    async for chunk in storage.read_object_stream("large-file.bin"):
        process(chunk)

    # Check if exists
    exists = await storage.exists("path/file.txt")

    # Get metadata
    metadata = await storage.get_metadata("path/file.txt")

    # Copy object
    await storage.copy_object("src/file.txt", "dest/file.txt")

    # Delete object
    await storage.delete_object("path/file.txt")

LLM Module Usage

The LLM module provides a unified interface for working with various language model providers. Supports Azure OpenAI, OpenAI, HuggingFace, and custom providers via plugin registry.

Key Features

  • Async-first API: All operations are async for high-performance pipelines
  • Unified interface: Same API across all providers
  • Cost tracking: Automatic token counting and cost estimation
  • Plugin registry: Extensible with custom providers
  • Optional dependencies: HuggingFace support via [huggingface] extra

Azure OpenAI (Primary)

import os

from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig

config = AzureOpenAIConfig(
    endpoint="https://my-resource.openai.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    deployment_name="gpt-4"
)

async with create_llm_client("azure_openai", config) as llm:
    response = await llm.complete(
        prompt="What is asthma?",
        temperature=0.7,
        max_tokens=500
    )
    print(response.text)
    print(f"Cost: ${response.tokens.estimated_cost:.4f}")
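The `estimated_cost` figure above comes from token counts multiplied by per-token pricing. A minimal sketch of that arithmetic follows; the rates are illustrative placeholders, not the library's actual pricing table:

```python
# Illustrative per-1K-token rates; consult current provider pricing for real values
RATES = {"gpt-4": {"prompt": 0.03, "completion": 0.06}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost from token counts and a per-1K-token rate table."""
    rate = RATES[model]
    return (prompt_tokens / 1000) * rate["prompt"] + (completion_tokens / 1000) * rate["completion"]

cost = estimate_cost("gpt-4", prompt_tokens=1200, completion_tokens=500)
# 1.2 * 0.03 + 0.5 * 0.06 = 0.066
```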

Ontology and Term Services

The ontology module provides comprehensive medical ontology management including term resolution, hierarchical category management, and integration with external ontologies.

Key Features

  • Term Resolution: MongoDB-based term lookup and management
  • Category Management: Hierarchical category structures for document tagging
  • External Ontology Integration: BioPortal API for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
  • Custom Ontology: ASHMATICS domain-specific ontology for medical imaging AI concepts

Term Resolution

from ashmatics_tools.ontology import TermResolver
from pymongo import MongoClient

# Initialize term resolver
client = MongoClient("mongodb://localhost:27017")
term_resolver = TermResolver(mongodb_client=client)

# Resolve term
result = await term_resolver.resolve_term("breast cancer")
print(f"Resolved: {result.prefLabel} - {result.definition}")

Category Management

from ashmatics_tools.ontology import CategoryManager

# Initialize category manager
category_manager = CategoryManager(
    mongodb_database=client["ashmatics_kb"],
    term_resolver=term_resolver
)

# Create hierarchical category
category = await category_manager.create_category(
    name="Medical Imaging",
    parent_id=None,
    description="Top-level category for medical imaging"
)

# Add subcategory
subcategory = await category_manager.create_category(
    name="Breast Imaging",
    parent_id=category.id,
    description="Breast imaging techniques and AI models"
)

External Ontology Validation

from ashmatics_tools.ontology import BioPortalClient

# Initialize BioPortal client
bioportal = BioPortalClient(api_key="your-bioportal-api-key")

# Check term in external ontologies
exists, ontologies = await bioportal.check_term_in_ontology("Breast Cancer")
print(f"Term exists: {exists}")
print(f"Found in ontologies: {ontologies}")

Custom ASHMATICS Ontology

from ashmatics_tools.ontology import AshmaticsOntology

# Initialize custom ontology manager
ashmatics_ontology = AshmaticsOntology(mongodb_database=client["ashmatics_kb"])

# Create concept
concept = await ashmatics_ontology.create_concept(
    prefLabel="AI Breast Cancer Detector",
    definition="AI model for detecting breast cancer in medical images",
    synonyms=["Breast Cancer AI", "Mammography AI"]
)

# Add relationship
await ashmatics_ontology.add_relationship(
    source_id=concept.id,
    target_id=another_concept.id,
    relationship_type="related_to"
)

ASHCAI Clinical AI Governance Ontology

The ASHCAI (AshMatics Clinical AI Governance) ontology provides governance concepts for the CAI Framework, including policies, processes, controls, and regulatory crosswalks.

from ashmatics_tools.ontology import AshcaiOntology

# Initialize ASHCAI ontology manager
ashcai = AshcaiOntology(mongodb_database=client["ashmatics_kb"])

# Initialize collections and indexes
await ashcai.initialize_ontology()

# Create a policy domain with natural business ID
policy = await ashcai.create_policy_domain(
    domain_id="MMP-001",
    domain_code="MMP",
    label="Model Monitoring Policy",
    description="Policy governing AI model monitoring requirements",
    specifies=["MON"]  # Links to process domains
)

# Create a process domain
process = await ashcai.create_process_domain(
    domain_id="MON",
    label="Model Monitoring",
    primary_function="Continuous monitoring of AI model performance",
    integrates_with=["RM", "SA", "OVR"]
)

# Create base practice
practice = await ashcai.create_base_practice(
    practice_id="MON.BP01",
    label="Rollout and Change Management",
    process_domain="MON",
    sequence_order=1
)

# Create SOP template
sop = await ashcai.create_sop_template(
    sop_id="SOP-MON-01",
    label="Model Deployment SOP",
    purpose="Standard procedure for deploying AI models",
    base_practice="MON.BP01"
)

# Create work product template
wp = await ashcai.create_work_product_template(
    wp_id="WP-MON-02-Dashboard",
    label="Monitoring Dashboard",
    evidence_type="Dashboard",
    produced_by="SOP-MON-02",
    serves_as_evidence_for=["EXC-A7-04"]
)

# Create exemplar control
control = await ashcai.create_exemplar_control(
    control_id="EXC-A7-04",
    label="Performance Monitoring Control",
    iso_control="A.7.5",
    evidenced_by=["WP-MON-02-Dashboard"]
)

# Create regulatory requirement with crosswalk
requirement = await ashcai.create_regulatory_requirement(
    requirement_id="NIST-MAP-4.2",
    label="MAP 4.2",
    framework_id="NIST-AI-RMF",
    function="MAP",
    category="MAP-4",
    description="Internal risk controls for third-party AI resources",
    crosswalk={
        "addressedBy": ["TPP-001", "MMP-001"],
        "implementedThrough": ["EXC-A10-02", "EXC-A6-02"],
        "operationalizedIn": ["MON", "PV"],
        "evidencedBy": ["WP-MON-01", "WP-PV-03"]
    }
)

# Create relationships
await ashcai.link_policy_to_process("MMP-001", "MON")
await ashcai.link_process_to_practice("MON", "MON.BP01")
await ashcai.link_practice_to_sop("MON.BP01", "SOP-MON-01")
await ashcai.link_sop_to_workproduct("SOP-MON-01", "WP-MON-02-Dashboard")
await ashcai.link_workproduct_to_control("WP-MON-02-Dashboard", "EXC-A7-04")

# Traversal helpers
hierarchy = await ashcai.get_policy_hierarchy("MMP-001")
# Returns: policy, processes (with practices, SOPs, work products), controls

evidence_chain = await ashcai.get_evidence_chain("EXC-A7-04")
# Returns: control with all work products that evidence it

crosswalk = await ashcai.get_regulatory_crosswalk("NIST-MAP-4.2")
# Returns: requirement with all policies, controls, processes, evidence

# OWL/RDF export
uri = ashcai.generate_uri("MMP-001")
# Returns: http://asherinformatics.com/ontology/ashcai/MMP-001

Key Features:

  • Natural Business IDs: Human-readable identifiers (MMP-001, MON, SOP-MON-01) with regex validation
  • Type Discrimination: All documents include ontology: "ashcai" for filtering
  • 44 Relationship Types: Comprehensive relationships from TDD-001 specification
  • Traversal Helpers: Pre-built queries for policy hierarchies and regulatory crosswalks
  • OWL/RDF Export: Generate standard URIs from natural business IDs
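The regex validation of natural business IDs can be illustrated with patterns inferred from the examples in this README; the exact patterns in `ashcai_schemas.py` may differ:

```python
import re

# Illustrative patterns inferred from the ID examples above (MMP-001, MON, MON.BP01, SOP-MON-01)
ID_PATTERNS = {
    "policy_domain": re.compile(r"^[A-Z]{2,4}-\d{3}$"),     # e.g. MMP-001
    "process_domain": re.compile(r"^[A-Z]{2,4}$"),          # e.g. MON
    "base_practice": re.compile(r"^[A-Z]{2,4}\.BP\d{2}$"),  # e.g. MON.BP01
    "sop_template": re.compile(r"^SOP-[A-Z]{2,4}-\d{2}$"),  # e.g. SOP-MON-01
}

def validate_id(kind: str, value: str) -> bool:
    """True if the value matches the pattern registered for this entity kind."""
    return bool(ID_PATTERNS[kind].fullmatch(value))
```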

External API Integration

The external_apis module provides clients for accessing external data sources with built-in retry logic, rate limiting, and pagination.

OpenFDA API Client

from ashmatics_tools.external_apis import create_api_client, OpenFDAConfig, OpenFDAEndpoint

# Create client with API key (recommended)
config = OpenFDAConfig(api_key="your_api_key")
async with create_api_client("openfda", config) as client:
    # Search device adverse events
    async for event in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="device_name:pacemaker AND date_received:[20230101 TO 20231231]",
        limit=100,
        max_records=1000
    ):
        device = event.get("device", [{}])[0]
        print(f"Device: {device.get('device_name')}")
        print(f"Event Date: {event.get('date_received')}")

    # Search 510(k) clearances
    async for clearance in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_510K,
        query="product_code:OZP",
        limit=100
    ):
        print(f"K Number: {clearance.get('k_number')}")
        print(f"Applicant: {clearance.get('applicant')}")

    # Analytics - count by field
    counts = await client.count(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="date_received:[20230101 TO 20231231]",
        count_field="device.device_class.exact"
    )
    for item in counts:
        print(f"Class {item['term']}: {item['count']} events")

AccessGUDID API Client

from ashmatics_tools.external_apis import AccessGUDIDClient, AccessGUDIDConfig

# Create client (no API key required for basic operations)
config = AccessGUDIDConfig()
async with AccessGUDIDClient(config) as client:
    # Lookup device by Device Identifier (DI)
    device = await client.lookup_device(di="08717648200274")
    print(f"Brand: {device['gudid']['device']['brandName']}")
    print(f"Company: {device['gudid']['device']['companyName']}")

    # Parse a UDI string (GS1, HIBCC, or ICCBBA format)
    parsed = await client.parse_udi(
        udi="(01)00844588012919(17)141231(10)A213B1"
    )
    print(f"DI: {parsed['di']}")
    print(f"Issuing Agency: {parsed['issuingAgency']}")
    print(f"Expiration: {parsed['expirationDate']}")
    print(f"Lot Number: {parsed['lotNumber']}")

    # Get device version history
    history = await client.get_device_history(di="08717648200274")
    for version in history.get('deviceHistory', []):
        print(f"Version {version['publicVersionNumber']}: {version['publicVersionDate']}")

    # List implantable devices with date filtering
    async for device in client.list_implantable_devices(
        from_date="2024-01-01",
        max_records=100
    ):
        print(f"{device['brandName']} - {device['companyName']}")

# With UMLS API key for SNOMED lookups
config = AccessGUDIDConfig(umls_api_key="your_umls_key")
async with AccessGUDIDClient(config) as client:
    snomed = await client.get_device_snomed(di="08717648200274")
    for concept in snomed.get('concepts', []):
        print(f"{concept['snomedCTName']}: {concept['snomedIdentifier']}")

MCP Server Integration

The mcp_servers module provides Model Context Protocol servers that expose external APIs as tools for LLM consumption.

OpenFDA MCP Server

from ashmatics_tools.mcp_servers import create_mcp_server, OpenFDAMCPConfig
from ashmatics_tools.external_apis import OpenFDAConfig

# Create MCP server
config = OpenFDAMCPConfig(
    api_config=OpenFDAConfig(api_key="your_key")
)
server = create_mcp_server("openfda", config)

# Get available tools
tools = server.get_tools()
# Returns: search_devices, search_drugs, count_by_field

# Call a tool
result = await server.call_tool("search_devices", {
    "endpoint": "device_event",
    "query": "device_name:pacemaker",
    "limit": 10
})

print(f"Found {result['count']} results")
for item in result['results']:
    print(item)

AccessGUDID MCP Server

from ashmatics_tools.mcp_servers import create_mcp_server, AccessGUDIDMCPConfig
from ashmatics_tools.external_apis import AccessGUDIDConfig

# Create MCP server
config = AccessGUDIDMCPConfig(
    api_config=AccessGUDIDConfig()  # No API key required for basic operations
)
server = create_mcp_server("accessgudid", config)

# Get available tools
tools = server.get_tools()
# Returns: lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices

# Lookup a device by DI
result = await server.call_tool("lookup_device", {"di": "08717648200274"})
print(f"Device: {result['summary']['brandName']}")

# Parse a UDI string
result = await server.call_tool("parse_udi", {
    "udi": "(01)00844588012919(17)141231(10)A213B1"
})
print(f"Parsed DI: {result['parsed']['di']}")

# List implantable devices
result = await server.call_tool("list_implantable_devices", {
    "from_date": "2024-01-01",
    "max_records": 50
})
print(f"Found {result['count']} implantable devices")

Running MCP Servers via stdio

# Run OpenFDA MCP server (set FDA_API_KEY for higher rate limits)
export FDA_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.openfda

# Run AccessGUDID MCP server (set UMLS_API_KEY for SNOMED lookups)
export UMLS_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.accessgudid

For detailed usage examples, see FDA API Usage Guide.

Search/RAG Module

The search module provides RAG (Retrieval-Augmented Generation) strategies for building AI-powered search applications.

Key Features

  • RAG Strategies: SimpleRAG and MultiQueryRAG with streaming support
  • Context Window Management: Automatic fitting of sources to model context limits
  • MCP Tool Definitions: Generic tool schemas for agent integration
  • LLM Streaming: SSE and NDJSON streaming support across all LLM providers

Simple RAG Query

from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig
from ashmatics_tools.embedders import create_embedder
from ashmatics_tools.vector_stores import create_vector_store
from ashmatics_tools.search import create_search_strategy, RAGConfig

# Setup components
llm = create_llm_client("azure_openai", AzureOpenAIConfig(...))
embedder = create_embedder("azure")
vector_store = create_vector_store("cosmosdb", config)

# Create RAG strategy
rag = create_search_strategy(
    "simple_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=RAGConfig(top_k=10, temperature=0.7)
)

# Query with answer generation
async with llm:
    result = await rag.query("What are ISO 42001 requirements?")
    print(result.answer)
    print(f"Sources: {len(result.sources)}")
    print(f"Tokens: {result.metrics.total_tokens}")

Multi-Query RAG (Query Expansion)

from ashmatics_tools.search import create_search_strategy
from ashmatics_tools.search.strategies import MultiQueryConfig

# Multi-query expands to multiple query variants for better coverage
config = MultiQueryConfig(
    top_k=10,
    num_query_variants=3,  # Generate 3 query variants
    rrf_k=60,              # RRF ranking parameter
)

rag = create_search_strategy(
    "multi_query_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=config
)

async with llm:
    result = await rag.query("What is risk management in AI governance?")
    print(f"Expanded queries: {result.metadata.get('expanded_queries')}")
    print(result.answer)
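The `rrf_k` parameter above controls Reciprocal Rank Fusion, which merges the ranked lists returned by each query variant: a document scores the sum of 1/(k + rank) across lists. A minimal sketch of the standard formula (the library's internals may differ):

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge([["a", "b", "c"], ["b", "a", "d"]], k=60)
# "a" and "b" each appear at ranks 1 and 2, so they tie ahead of "c" and "d"
```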

Streaming RAG Responses

# Stream answer generation for real-time UI
async with llm:
    async for chunk in rag.stream_query("Explain AI governance controls"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
        if chunk.is_final:
            print(f"\n\nSources: {len(chunk.sources)}")

Context Window Management

from ashmatics_tools.llm import ContextWindowManager, ModelContextLimits

# Create manager for GPT-4 Turbo
manager = ContextWindowManager(
    model_limits=ModelContextLimits.GPT4_TURBO(),
    reserved_output=2000
)

# Fit sources into available context
fitted_sources = manager.fit_sources(
    sources=search_results,
    query="What are ISO 42001 requirements?",
    system_prompt=system_prompt
)
print(f"Fitted {len(fitted_sources)} of {len(search_results)} sources")

MCP Tool Definitions

from ashmatics_tools.search.mcp_tools import (
    get_tool_definitions,
    export_tools_yaml,
    RAG_SEARCH_TOOL,
)

# Get all tool definitions for MCP server registration
tools = get_tool_definitions()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Export as YAML for configuration
yaml_config = export_tools_yaml()

Document Enrichers

The enrichers module provides post-parsing content analysis for tables and extracted data.

Table Classification

from ashmatics_tools.enrichers import TableClassifier, TableCategory

# Initialize classifier
classifier = TableClassifier(provider="azure_openai")

# Classify tables from a parsed document
categories, tokens = await classifier.classify_tables(parsed_doc.tables)

for table, category in zip(parsed_doc.tables, categories):
    if category == TableCategory.PERFORMANCE_METRICS:
        # Extract metrics from performance tables
        pass
    elif category == TableCategory.COMPARISON:
        # Process comparison tables
        pass

Table Consolidation (Multi-page Tables)

from ashmatics_tools.enrichers import TableConsolidator

# Handle tables that span multiple PDF pages
consolidator = TableConsolidator(
    column_similarity_threshold=0.85,
    use_llm_validation=True
)

consolidated = await consolidator.consolidate_tables(
    parsed_doc.tables,
    parsed_doc.markdown
)

for table in consolidated:
    if table.merged_from:
        print(f"{table.table_id} merged from pages: {table.merged_from}")

Metrics Extraction

from ashmatics_tools.enrichers import MetricsExtractor, DomainKnowledgeProvider

# With optional domain knowledge injection
extractor = MetricsExtractor(domain_knowledge=my_provider)
result = await extractor.extract_from_tables(
    tables=performance_tables,
    document_text=section_text
)

for metric in result.performance_metrics:
    print(f"{metric.metric_name}: {metric.value} [{metric.ci_lower}, {metric.ci_upper}]")

Document Storage

Storage managers for document processing artifacts with manifest generation.

Figure Storage

from ashmatics_tools.document_storage import FigureStorageManager

# Filter small images (logos, icons) and save significant figures
manager = FigureStorageManager(min_size=200)
processed = manager.process_figures(parsed_doc.figures, parsed_doc.markdown)
saved = manager.save_figures(processed, output_dir / 'figures', doc_id)
# Creates figures_manifest.json with metadata

Table Storage

from ashmatics_tools.document_storage import TableStorageManager

# Save tables in dual format (Markdown + JSON)
manager = TableStorageManager()
stored = manager.save_tables(consolidated_tables, output_dir / 'tables', doc_id)
# Creates tables_manifest.json with metadata

Features

Modern HTTP Client (httpx)

All HTTP communication uses httpx, providing:

  • Native Type Annotations: Full type safety without separate stub packages
  • Async/Await Support: Ready for async operations in performance-critical applications
  • HTTP/2 Support: Modern protocol support for improved performance
  • Backward Compatible: Synchronous API broadly compatible with requests for easy adoption

Secure by Default

  • SSL Verification Enabled: All HTTP requests verify SSL certificates by default
  • Explicit Opt-Out: SSL verification can only be disabled by explicitly passing verify=False to methods
  • Security Warnings: Disabling SSL verification triggers warning logs

Flexible Configuration

  • Environment Variables: Supports .env files for configuration
  • Configurable Endpoints: All API endpoints configurable via environment or parameters
  • Batch Processing: Configurable batch sizes for large dataset operations
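The configurable batch processing can be pictured as a simple chunking helper; `batched` here is an illustrative name, not the library's API:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of up to batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# e.g. importing 250 rows with batch_size=100 yields batches of 100, 100, 50
sizes = [len(b) for b in batched(range(250), 100)]
```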

MongoDB Integration (Optional)

  • Optional Dependency: MongoDB support is optional via [mongodb] extras
  • Abstract Base Classes: Extensible DocumentProcessor for custom document types
  • Upsert Operations: Intelligent upsert with identifier-based conflict resolution
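The identifier-based upsert can be sketched as follows; the filter/update shapes are illustrative, not the exact documents DocumentProcessor builds:

```python
def build_upsert(document: dict, identifier_key: str) -> tuple[dict, dict]:
    """Build a pymongo-style (filter, update) pair keyed on the document's identifier."""
    if identifier_key not in document:
        raise KeyError(f"document missing identifier field {identifier_key!r}")
    filter_ = {identifier_key: document[identifier_key]}
    update = {"$set": document}
    return filter_, update

# With pymongo, this pair drives an idempotent upsert:
#   collection.update_one(filter_, update, upsert=True)
filter_, update = build_upsert({"document_id": "123", "content": "..."}, "document_id")
```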

Comprehensive Error Handling

  • Detailed Logging: Structured logging throughout all operations
  • Graceful Failures: Proper error handling with informative messages
  • Validation: Input validation and JSON compliance checking

Architecture

ashmatics-tools/
├── src/ashmatics_tools/
│   ├── __init__.py           # Public API exports
│   ├── chunkers/
│   │   ├── __init__.py
│   │   ├── azure_chunker.py
│   │   ├── base.py
│   │   ├── docling_chunker.py
│   │   └── simple_chunker.py
│   ├── document_storage/
│   │   ├── __init__.py
│   │   ├── figure_storage.py
│   │   └── table_storage.py
│   ├── embedders/
│   │   ├── __init__.py
│   │   ├── azure_embedder.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   └── openai_embedder.py
│   ├── embedding/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── mongodb_pipeline.py
│   │   └── specialized/
│   ├── enrichers/
│   │   ├── __init__.py
│   │   ├── table_classifier.py
│   │   ├── table_consolidator.py
│   │   ├── metrics_extractor.py
│   │   └── training_data_extractor.py
│   ├── graphql/
│   │   ├── __init__.py
│   │   └── client.py
│   ├── llm/
│   │   ├── __init__.py
│   │   ├── azure_openai.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── huggingface.py
│   │   ├── openai.py
│   │   └── plugin_registry.py
│   ├── external_apis/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── openfda/
│   │   │   ├── __init__.py
│   │   │   ├── client.py
│   │   │   └── config.py
│   │   └── accessgudid/
│   │       ├── __init__.py
│   │       ├── client.py
│   │       └── config.py
│   ├── mcp_servers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── openfda/
│   │   │   ├── __init__.py
│   │   │   ├── config.py
│   │   │   └── server.py
│   │   └── accessgudid/
│   │       ├── __init__.py
│   │       ├── config.py
│   │       └── server.py
│   ├── ontology/
│   │   ├── __init__.py
│   │   ├── categories/
│   │   ├── core/
│   │   ├── data/
│   │   └── terms/
│   ├── parsers/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── docling_parser.py
│   │   ├── factory.py
│   │   ├── llama_parser.py
│   │   └── simple_parser.py
│   ├── processors/
│   │   ├── __init__.py
│   │   └── base.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── factory.py
│   │   ├── adls_store.py
│   │   └── minio_store.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── export_utils.py
│   │   ├── import_utils.py
│   │   └── schema_utils.py
│   └── vector_stores/
│       ├── __init__.py
│       ├── base.py
│       ├── cosmosdb_store.py
│       ├── factory.py
│       ├── pgvector_store.py
│       └── qdrant_store.py
├── tests/                   # Test suite
├── pyproject.toml          # Package configuration
└── README.md

Dependencies

Core Dependencies

  • pandas>=2.1.0 - Data manipulation and Excel file reading
  • openpyxl>=3.1.2 - Excel file format support
  • httpx>=0.27.0 - Modern HTTP client with native type annotations and async support
  • python-dotenv>=1.0.0 - Environment variable management
  • pyyaml>=6.0.0 - YAML parsing
  • numpy>=1.24.0 - Numerical operations

Optional Dependencies

  • pymongo>=4.0.0 - MongoDB integration (install with [mongodb] extra)

Development Dependencies

  • pytest>=7.4.0 - Testing framework
  • pytest-cov>=4.1.0 - Code coverage
  • ruff>=0.1.0 - Linting and formatting
  • mypy>=1.5.0 - Static type checking
  • pandas-stubs>=2.1.0 - Type stubs for pandas

Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=ashmatics_tools --cov-report=html

# Run specific test file
pytest tests/test_import_utils.py

Code Quality

# Lint with ruff
ruff check src/

# Format with ruff
ruff format src/

# Type check with mypy
mypy src/

Environment Variables

# GraphQL/Hasura Configuration
HASURA_GRAPHQL_ENDPOINT=https://kb-api.ashmatics.com/v1/graphql
HASURA_ADMIN_SECRET=your-admin-secret-here

# MongoDB Configuration (optional)
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ashmatics_kb

Python Version Support

  • Minimum: Python 3.11
  • Tested: Python 3.11, 3.12
  • Recommended: Python 3.12+

License

MIT License - See LICENSE file for details

Contributing

This is a private package for Ashmatics internal use. For questions or issues, please contact the development team.

Version History

0.7.0 (2026-01-19) - ASHCAI Clinical AI Governance Ontology (ASHTOOLS-10)

  • ASHCAI Ontology Module: Complete Clinical AI Governance ontology for the CAI Framework
    • AshcaiOntology: Manager class with CRUD operations, relationship management, and traversal helpers
    • ashcai-ontology-v1.0.json: Ontology definition with 22 semantic types (T92xx range) and 44 relationship types
    • ashcai_schemas.py: Comprehensive Pydantic schemas with natural business ID validation
  • Entity Types: PolicyDomain, ProcessDomain, BasePractice, SOPTemplate, WorkProductTemplate, ExemplarControl, RegulatoryFramework, RegulatoryRequirement
  • Natural Business IDs: Human-readable identifiers with regex validation
    • PolicyDomain: MMP-001, TPP-001, AGP-001
    • ProcessDomain: MON, RM, SA, OVR, PV
    • BasePractice: MON.BP01, RM.BP03
    • SOPTemplate: SOP-MON-01, SOP-RM-02
    • WorkProductTemplate: WP-MON-02-Dashboard
    • ExemplarControl: EXC-A7-04, EXC-A10-02
    • RegulatoryRequirement: NIST-MAP-4.2, JC-RUAIH-3
  • Type Discrimination: All ASHCAI documents include ontology: "ashcai" field for filtering
  • 44 Relationship Types: From TDD-001 specification including specifies, containsBasePractice, realizedBy, produces, servesAsEvidenceFor, addressedBy, implementedThrough, operationalizedIn
  • Traversal Helpers: Pre-built queries for common governance patterns
    • get_policy_hierarchy(): Full implementation path from policy to work products
    • get_evidence_chain(): Evidence trail for controls
    • get_regulatory_crosswalk(): Map requirements to framework elements
    • find_control_implementations(): SOPs implementing a control
  • OWL/RDF Export: Generate standard URIs from natural business IDs (http://asherinformatics.com/ontology/ashcai/{id})
  • MongoDB Collections: Separate collections per entity type with indexes for efficient querying
  • Exports: from ashmatics_tools.ontology import AshcaiOntology
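
As a standalone illustration of natural business ID validation, the regex patterns below are inferred from the example IDs listed above; the actual patterns in ashcai_schemas.py may differ:

```python
import re

# Illustrative patterns inferred from the example IDs; not the package's
# actual regexes.
BUSINESS_ID_PATTERNS = {
    "PolicyDomain": re.compile(r"[A-Z]{3}-\d{3}"),        # MMP-001, TPP-001
    "ProcessDomain": re.compile(r"[A-Z]{2,4}"),           # MON, RM, OVR
    "BasePractice": re.compile(r"[A-Z]{2,4}\.BP\d{2}"),   # MON.BP01, RM.BP03
    "SOPTemplate": re.compile(r"SOP-[A-Z]{2,4}-\d{2}"),   # SOP-MON-01
    "ExemplarControl": re.compile(r"EXC-A\d+-\d{2}"),     # EXC-A7-04
}

def is_valid_business_id(entity_type: str, business_id: str) -> bool:
    """Check an ID against the pattern for its entity type (full match only)."""
    pattern = BUSINESS_ID_PATTERNS.get(entity_type)
    return bool(pattern and pattern.fullmatch(business_id))
```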

0.6.2 (2026-01-07) - Retry Utilities with Exponential Backoff

  • Retry Module: New ashmatics_tools.llm.retry module for robust LLM API call handling
    • RetryConfig: Configurable dataclass for retry behavior (max_attempts, initial_delay, max_delay, exponential_base, jitter, request_delay)
    • calculate_backoff_delay(): Pure function for exponential backoff with jitter calculation
    • call_with_backoff(): Async wrapper that handles LLMRateLimitError with configurable retries
    • call_with_backoff_and_fallback(): Convenience wrapper with fallback function support
  • Presets: RetryConfig.aggressive() for batch processing, RetryConfig.conservative() for interactive use
  • API Hint Support: Respects retry_after hints from rate limit responses (Azure OpenAI, etc.)
  • Thundering Herd Prevention: Random jitter (default 30%) prevents synchronized retry storms
  • Exports: All retry utilities exported from ashmatics_tools.llm:
    from ashmatics_tools.llm import RetryConfig, call_with_backoff
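
    As a standalone illustration of the backoff calculation described above (this sketch mirrors the documented behavior; the package's actual calculate_backoff_delay signature may differ):

```python
import random

def backoff_delay(
    attempt: int,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: float = 0.3,
) -> float:
    """Exponential backoff with random jitter (attempt counts from 0)."""
    delay = min(initial_delay * exponential_base ** attempt, max_delay)
    # +/- jitter fraction prevents synchronized retry storms
    return delay * (1.0 + random.uniform(-jitter, jitter))
```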
    

0.6.1 (2025-12-30) - Ollama SDK Integration & LLM Enhancements

  • Ollama Python SDK Integration: Complete rewrite of OllamaClient using official ollama Python SDK
    • Embeddings: generate_embedding(), generate_embeddings() with batch support and dimension control
    • Vision: complete_with_vision() for image understanding (llava, llama3.2-vision models)
    • Tool Calling: complete_with_tools() for function calling/agentic workflows
    • Model Management: list_models(), pull_model(), show_model(), delete_model(), copy_model(), list_running_models()
    • Keep-Alive Control: Memory management via keep_alive config (e.g., "5m", "1h", "-1")
    • Streaming: Native async generators via stream_complete_with_messages()
    • Health Check: check_health() for server status verification
    • Helper Classes: OllamaTool, OllamaToolParameter, OllamaToolCall for tool definitions
  • New Optional Dependency: [ollama] extra (pip install "ashmatics-tools[ollama]")
  • Azure OpenAI Improvements: Enhanced AzureOpenAIClient with simple completion support
  • Example Notebook: Added examples/rag_and_llm_demo.ipynb demonstrating RAG pipelines and LLM client usage
  • Integration Tests: Comprehensive Ollama test suite (tests/integration/test_ollama_integration.py)
    • 24 tests covering chat, streaming, embeddings, vision, tools, model management
    • Performance and concurrent request testing
    • Error handling validation

0.6.0 (2025-12-28) - AI Search & RAG Module (ASHTOOLS-5)

  • Search Module: Complete RAG (Retrieval-Augmented Generation) framework
    • SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
    • MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
    • RAGConfig, RAGResult, RAGMetrics, RAGStreamChunk dataclasses
    • Factory pattern: create_search_strategy("simple_rag", ...) with plugin registry
  • LLM Streaming: SSE and NDJSON streaming support for all LLM providers
    • StreamChunk dataclass for streaming responses
    • stream_complete() and stream_complete_with_messages() methods
    • Fallback to non-streaming for providers without native support
  • Context Window Management: Automatic context fitting for LLM requests
    • ContextWindowManager: Fit sources into available context with token estimation
    • ModelContextLimits: Presets for GPT-4, GPT-4o, Claude Sonnet/Opus/Haiku, Llama, Mistral
    • ModelFamily enum for tokenizer selection
  • MCP Tool Definitions: Generic tool schemas for agent integration
    • RAG_SEARCH_TOOL: RAG-enhanced search with answer generation
    • SEMANTIC_SEARCH_TOOL: Semantic similarity search without generation
    • MULTI_QUERY_SEARCH_TOOL: Multi-query RAG with query expansion
    • Export as JSON or YAML for MCP server registration
  • SearchResult ADR-045 Fields: Governance metadata for RAG sources
    • domain, control_refs, token_refs, document_type, source_uri
  • New Optional Dependencies: [search], [reranking], [rag] extras
  • 52 Tests: Comprehensive test coverage for all new functionality
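
The RRF ranking used by MultiQueryRAGStrategy can be sketched in a few lines; this is the standard reciprocal rank fusion formula, shown here as a self-contained illustration rather than the package's implementation:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists; a document's score is the sum of
    1/(k + rank) over all lists it appears in (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Documents that rank highly across several expanded queries dominate the fused list, which is why query expansion pairs well with RRF.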

0.5.3 (2025-12-25) - Modular Dependencies and Lazy Loading (ASHTOOLS-7)

  • Modular Optional Dependencies: Restructured pyproject.toml to minimize install size
    • Core install is now lightweight (~100MB) - NO torch/CUDA by default
    • Heavy dependencies (docling, transformers, qdrant) moved to optional extras
    • New extras: [api], [parsers], [chunkers], [docproc], [ml], [full]
    • [api] extra for API apps like Ashmatics-Knowledgebase (~150MB vs ~4GB)
  • Lazy Loading: Heavy modules loaded on-demand via __getattr__
    • DoclingParser, DoclingChunker only load torch when accessed
    • HuggingFace LLM providers registered lazily in factory
  • Graceful Import Handling: Clear error messages when optional deps missing
    • TYPE_CHECKING used for heavy type hints (docling_core)
    • Runtime checks with installation instructions
  • ADR Documentation: Added docs/ADRs/ADR-LazyLoadingBigDependencies-ASHTOOLS-7-2025-12-25.md
  • Container Size Reduction: Enables ~3GB+ savings for API-only applications

0.5.2 (2025-12-06) - llama.cpp Client Support

  • LlamaCppClient: Local LLM inference via llama.cpp server
    • Metal acceleration on M1/M2 Macs for fast local inference
    • OpenAI-compatible API (/v1/chat/completions endpoint)
    • VPS and on-premises deployment support
    • Server health and properties endpoints
    • llama.cpp-specific parameters (top_k, repeat_penalty, n_gpu_layers)
    • Embedding support: generate_embedding(), generate_embeddings(), get_embedding_dimension()
    • Always $0.00 cost (self-hosted)
  • LlamaCppConfig: Configuration dataclass with endpoint, model, timeout, SSL, context_size, n_gpu_layers
  • Factory Integration: create_llm_client("llamacpp", config) via plugin registry
  • Unified Interface: Same API as Azure OpenAI, OpenAI, Ollama, HuggingFace
  • Live Testing: Verified against local llama.cpp server (11/12 tests passing)
  • Note: Embeddings require llama-server to be started with --embeddings flag

0.5.1 (2025-12-03) - AccessGUDID API Integration

  • AccessGUDID API Client: NIH/FDA Global Unique Device Identification Database integration
    • Device lookup by DI, UDI, or record key
    • UDI parsing for GS1, HIBCC, ICCBBA formats
    • Device version history tracking
    • SNOMED CT code lookup (requires UMLS API key)
    • Implantable device listing with pagination
  • AccessGUDID MCP Server: 5 tools for LLM/agent access to GUDID data
  • Typed Models: Pydantic models for type-safe responses (GUDIDDevice, ParsedUDI, etc.)
  • Transform Functions: Raw API response to typed model conversion
  • Updated MCP Test Script: Now supports both OpenFDA and AccessGUDID servers

0.5.0 (2025-11-29) - Document Enrichers and Storage Managers

  • Enrichers Module: Post-parsing content analysis extracted from FDA 510(k) pipeline

    • TableClassifier: LLM-based table categorization (COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, etc.)
    • TableConsolidator: Multi-page table merge with first-row matching, column similarity, and continuation detection
    • MetricsExtractor: Performance metrics extraction with confidence intervals and sample sizes
    • TrainingDataExtractor: AI/ML training dataset characteristics extraction
    • DomainKnowledgeProvider: Abstract base for domain-specific context injection
  • Document Storage Module: Artifact storage managers for document processing outputs

    • FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage with manifests
    • TableStorageManager: Dual-format (Markdown + JSON) table storage with manifests
  • Reusable Across Pipelines: Supports FDA 510(k), research papers, clinical guidelines, pre-prints

  • Domain Extensibility: Domain-specific logic via provider injection without modifying base extractors

  • New Dependency: Added Pillow>=10.0.0 for figure processing

  • (NOTE: a mix-up with the release tags in the repo means version 0.4.0 was skipped; 0.5.0 is effectively 0.4.0. Like the 13th floor...)

0.3.1 (2025-11-22) - FDA API Integration and MCP Servers

  • External APIs Module: Extensible framework for external data source integration
  • OpenFDA Client: Complete US FDA Open Data Portal integration
    • Support for all major endpoints: device (510k, events, recalls), drugs (labels, adverse events), food
    • Automatic retry with exponential backoff for transient errors
    • Client-side rate limiting with token bucket algorithm (respects FDA API limits)
    • Automatic pagination for large result sets
    • Query syntax support: field search, date ranges, boolean operators, wildcards
    • Count/analytics queries for aggregated data
  • MCP Servers Module: Model Context Protocol servers for LLM integration
    • BaseMCPServer abstract base for creating MCP tool servers
    • OpenFDAMCPServer exposing FDA API as LLM tools (search_devices, search_drugs, count_by_field)
    • JSON Schema-based input validation
    • Response formatting and error handling for LLM consumption
  • Factory Pattern: Plugin registry for custom API providers and MCP servers
  • Async-First: All operations use async/await for high-performance pipelines
  • Comprehensive Documentation:
    • Updated CLAUDE.md with 380+ lines of usage examples
    • New FDA_API_USAGE_GUIDE.md with complete reference (650+ lines)
    • Query syntax guide, field references, practical examples
  • Testing: Full test coverage with respx-based mocking for httpx
  • Future-Ready: Architecture supports Census Bureau, CMS, and other data sources
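
The client-side rate limiting mentioned above uses a token bucket; here is a minimal self-contained sketch of the algorithm (the package's actual implementation may differ):

```python
import time

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; a request
    proceeds only if a token is available."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Bursts drain the bucket quickly, then requests are throttled to the steady refill rate, which keeps sustained traffic under FDA API limits.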

0.3.0 (2025-11-21) - Ontology and Term Services Integration

  • Ontology Module: Complete medical ontology management system
  • Term Resolution: MongoDB-based term lookup and management with TermResolver
  • Category Management: Hierarchical category structures with CategoryManager
  • External Ontology Integration: BioPortal API client for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
  • Custom ASHMATICS Ontology: Domain-specific ontology manager for medical imaging AI concepts
  • Schema Definitions: Comprehensive Pydantic schemas for terms, categories, and ontology operations
  • Async API: Full async support for all ontology operations
  • Integration Ready: Seamless integration with existing document processing and vector search pipelines

0.2.0 (2025-11-12) - Major Module Migration

  • Complete Migration: Migrated ALL common modules from ashmatics-kb-tools
  • Parsers Module: SimpleParser, DoclingParser, LlamaParser with factory function
  • Chunkers Module: SimpleChunker, AzureChunker, DoclingChunker with factory function
  • Embedders Module: AzureEmbedder, OpenAIEmbedder with factory function
  • Embedding Pipelines: MongoDBEmbeddingPipeline + specialized pipelines for framework, use cases, and cards
  • Vector Stores: CosmosDB, PostgreSQL pgvector, Qdrant implementations with factory function
  • 52 Public Exports: Complete document processing, embedding, and vector search workflow
  • Code Quality: All files standardized with copyright headers, ruff-compliant, modernized type hints
  • Production Ready: 100% migration complete, all tests passing

0.1.0 (2025-01-12)

  • Initial release
  • Extracted from ashmatics-kb-tools repository
  • Core utilities: KBImporter, DataExporter, schema tools
  • Abstract DocumentProcessor base class
  • Generic GraphQL client utilities
  • Migrated to httpx: Modern HTTP client with native type annotations and async support
  • SSL verification enabled by default
  • Full mypy type safety (zero type errors)
  • Python 3.11+ support

Complete Document Processing Pipeline

The package now provides a complete end-to-end pipeline for document processing:

from ashmatics_tools import (
    create_parser,        # Parse documents (PDF, DOCX, etc.)
    create_chunker,       # Chunk into manageable pieces
    create_embedder,      # Generate embeddings
    create_vector_store   # Store and search vectors
)

# 1. Parse document
parser = create_parser("docling")
parsed_doc = await parser.parse_file("document.pdf")

# 2. Chunk document
chunker = create_chunker(strategy="docling")
chunks = await chunker.chunk_document(
    content=parsed_doc.markdown,
    title="Document Title",
    source="document.pdf"
)

# 3. Generate embeddings
embedder = create_embedder(provider="azure")
await embedder.initialize()
embedded_chunks = await embedder.embed_chunks(chunks)

# 4. Store in vector database
vector_store = create_vector_store(provider="cosmosdb")
success, failed = await vector_store.store_embeddings_batch(embedded_chunks)

# 5. Search
query_embedding = await embedder.generate_embedding("search query")
results = await vector_store.similarity_search(query_embedding, top_k=10)

Module Overview

Parsers (ashmatics_tools.parsers)

Document parsing with multiple backends:

  • SimpleParser: Basic fallback parser
  • DoclingParser: Advanced PDF parsing with tables/figures
  • LlamaParser: LlamaParse cloud service integration
  • Factory: create_parser(provider)

Chunkers (ashmatics_tools.chunkers)

Document chunking strategies:

  • SimpleChunker: Paragraph-based chunking
  • AzureChunker: Azure-compatible with tiktoken
  • DoclingChunker: Token-aware semantic chunking
  • Factory: create_chunker(strategy)

Embedders (ashmatics_tools.embedders)

Embedding generation:

  • AzureEmbedder: Azure OpenAI embeddings
  • OpenAIEmbedder: OpenAI embeddings
  • Factory: create_embedder(provider)

Embedding Pipelines (ashmatics_tools.embedding)

MongoDB-based embedding workflows:

  • MongoDBEmbeddingPipeline: Generic pipeline
  • Specialized Pipelines: Framework, use cases, cards

Vector Stores (ashmatics_tools.vector_stores)

Vector database integrations:

  • CosmosDBVectorStore: Azure CosmosDB with MongoDB vCore API
  • PgVectorStore: PostgreSQL with pgvector extension
  • QdrantVectorStore: Qdrant vector database
  • Factory: create_vector_store(provider)

Storage Backends (ashmatics_tools.storage)

Cloud-agnostic storage abstraction:

  • ADLSStorageClient: Azure Data Lake Storage Gen2 with dual auth (connection string or DefaultAzureCredential)
  • MinIOStorageClient: MinIO object storage (S3-compatible)
  • S3StorageClient: AWS S3 (reserved for future implementation)
  • Factory: create_storage_client(provider, config)
  • Features: Async API, buffered and streaming reads/writes, glob pattern matching, metadata operations

LLM Clients (ashmatics_tools.llm)

Unified interface for language model providers:

  • AzureOpenAIClient: Azure OpenAI Service
  • OpenAIClient: OpenAI direct API
  • HuggingFaceInferenceClient: HuggingFace Inference API (requires [huggingface] extra)
  • HuggingFaceLocalClient: Local HuggingFace models (requires [huggingface] extra)
  • AzureAIFoundryClient: Full Azure AI Foundry model catalog (requires [azure-ai] extra)
  • OllamaClient: Local/ACA/K8s Ollama inference with SDK (requires [ollama] extra) - embeddings, vision, tools, model management
  • Factory: create_llm_client(provider, config) with plugin registry
  • Features: Async-first API, unified completion interface, cost tracking, plugin registry, extensible via register_llm_provider()

Ontology Services (ashmatics_tools.ontology)

Medical ontology management and term services:

  • TermResolver: MongoDB-based term lookup and resolution
  • CategoryManager: Hierarchical category management for document tagging
  • BioPortalClient: External ontology validation via NCBO BioPortal API (SNOMED CT, RADLEX, LOINC, NCIT)
  • AshmaticsOntology: Custom ASHMATICS domain-specific ontology for medical imaging AI concepts
  • Features: Async API, comprehensive schema validation, integration with external ontologies

External APIs (ashmatics_tools.external_apis)

Clients for external data sources with robust error handling:

  • OpenFDAClient: US FDA Open Data Portal (open.fda.gov) integration
  • AccessGUDIDClient: NIH/FDA Global Unique Device Identification Database (accessgudid.nlm.nih.gov) integration
  • BaseAPIClient: Abstract base for creating custom API clients
  • OpenFDA Endpoints: Device 510(k), adverse events, recalls, drug labels, FAERS, enforcement actions
  • AccessGUDID Endpoints: Device lookup, UDI parsing, device history, SNOMED mappings, implantable device listings
  • Factory: create_api_client(provider, config) with plugin registry
  • Features: Async API, retry with exponential backoff, client-side rate limiting, automatic pagination
  • Query Syntax: Support for field search, date ranges, boolean operators, wildcards (OpenFDA)
  • Extensibility: Register custom providers via register_api_provider() for Census, CMS, etc.
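
Automatic pagination for skip/limit-style APIs can be sketched as an async generator; the fetch_page callable and stop condition here are illustrative assumptions, not the package's actual interface:

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable

async def paginate(
    fetch_page: Callable[[int, int], Awaitable[list]],
    page_size: int = 100,
) -> AsyncIterator:
    """Keep fetching until a short or empty page signals the end."""
    skip = 0
    while True:
        page = await fetch_page(skip, page_size)
        for item in page:
            yield item
        if len(page) < page_size:
            return
        skip += page_size
```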

MCP Servers (ashmatics_tools.mcp_servers)

Model Context Protocol servers for LLM integration:

  • OpenFDAMCPServer: Expose OpenFDA API as LLM tools (search_devices, search_drugs, count_by_field)
  • AccessGUDIDMCPServer: Expose AccessGUDID API as LLM tools (lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices)
  • BaseMCPServer: Abstract base for creating MCP tool servers
  • Factory: create_mcp_server(name, config) with plugin registry
  • Features: JSON Schema validation, response formatting, error handling, streaming support
  • Use Case: Thin adapter layer between LLMs and external data sources
  • Extensibility: Register custom servers via register_mcp_server()
  • Stdio Runner: Run servers via python -m ashmatics_tools.mcp_servers.{openfda,accessgudid}

Search/RAG (ashmatics_tools.search)

RAG (Retrieval-Augmented Generation) strategies for AI-powered search:

  • SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
  • MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
  • RAGConfig: Configuration for top_k, temperature, max_tokens, system_prompt
  • RAGResult: Answer with sources, metrics, and metadata
  • RAGStreamChunk: Streaming response chunks with partial sources
  • Factory: create_search_strategy(name, llm, vector_store, embedder, config) with plugin registry
  • MCP Tools: Generic tool definitions (rag_search, semantic_search, multi_query_search)
  • Context Management: ContextWindowManager for automatic source fitting
  • Model Presets: ModelContextLimits for GPT-4, Claude, Llama, Mistral
  • Features: Async-first API, streaming support, ADR-045 governance metadata
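
Fitting retrieved sources into a model's context window, as ContextWindowManager does, can be sketched with a greedy loop and a rough length heuristic. The len/4 estimate below is a common approximation, not the package's tokenizer-based estimate:

```python
def fit_sources(sources: list[str], max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Greedily include sources (highest-ranked first) until a rough
    token estimate exhausts the budget."""
    fitted: list[str] = []
    used = 0
    for text in sources:
        estimate = len(text) // chars_per_token + 1
        if used + estimate > max_tokens:
            break
        fitted.append(text)
        used += estimate
    return fitted
```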

Enrichers (ashmatics_tools.enrichers)

Post-parsing document enrichment for tables and extracted data:

  • TableClassifier: LLM-based table categorization by content type
  • TableConsolidator: Multi-page table merge with heuristics and LLM validation
  • MetricsExtractor: Performance metrics extraction with statistical context
  • TrainingDataExtractor: AI/ML training dataset characteristics extraction
  • DomainKnowledgeProvider: Abstract base for domain-specific context injection
  • Categories: COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, TECHNICAL_SPECS, DEMOGRAPHICS, etc.
  • Features: Handles PDF parser fragmentation, continuation markers, column similarity matching
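
The column similarity matching used for continuation detection can be illustrated with a Jaccard score over normalized headers; this is one plausible heuristic, and the package's actual logic may differ:

```python
def column_similarity(cols_a: list[str], cols_b: list[str]) -> float:
    """Jaccard similarity over normalized column headers, as a signal that
    a table fragment continues a previous page's table."""
    a = {c.strip().lower() for c in cols_a}
    b = {c.strip().lower() for c in cols_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```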

Document Storage (ashmatics_tools.document_storage)

Artifact storage managers for document processing outputs:

  • FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage
  • TableStorageManager: Dual-format (Markdown + JSON) table storage
  • ProcessedFigure: Dataclass for processed figures with metadata
  • StoredTable: Dataclass for stored tables with file paths
  • Features: Automatic manifest generation, content hashing, size filtering
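
Content-addressed storage derives an artifact's key from a hash of its bytes, so identical figures are stored once. The sharded path layout below is an assumption for illustration, not the package's actual scheme:

```python
import hashlib

def content_address(data: bytes, extension: str = "png", shard_len: int = 2) -> str:
    """Derive a storage key from the content's SHA-256 digest, sharded by
    its first hex characters to avoid huge flat directories."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:shard_len]}/{digest}.{extension}"
```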

Project details


Download files

Download the file for your platform.

Source Distribution

ashmatics_tools-0.7.2.tar.gz (323.4 kB)

Uploaded Source

Built Distribution

ashmatics_tools-0.7.2-py3-none-any.whl (326.8 kB)

Uploaded Python 3

File details

Details for the file ashmatics_tools-0.7.2.tar.gz.

File metadata

  • Download URL: ashmatics_tools-0.7.2.tar.gz
  • Size: 323.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ashmatics_tools-0.7.2.tar.gz
Algorithm Hash digest
SHA256 3b2ce5e38ffd77123c9e150ad2067223180ebc57eb985a8cab1aad62f1abf36d
MD5 0a8092ef2ebc73f3c4466418cd8173c7
BLAKE2b-256 6a1b241bee10551aa7a371dd836bc12dfae8d05cedf84001985b1184ca6061c8

Provenance

The following attestation bundles were made for ashmatics_tools-0.7.2.tar.gz:

Publisher: publish.yml on AshMatics/ashmatics-tools

File details

Details for the file ashmatics_tools-0.7.2-py3-none-any.whl.

File metadata

  • Download URL: ashmatics_tools-0.7.2-py3-none-any.whl
  • Size: 326.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ashmatics_tools-0.7.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4eab26c7f86d9a86673e12ec6410ebe1c40394e1b1eec67817dc06a9f52a5695
MD5 b33d09ec2301ee8c4128a4ca29dc9cc1
BLAKE2b-256 8eabc9292c7e48ae72bb474e52df16135e204b8dba256ccb99ae798c8df6666e

Provenance

The following attestation bundles were made for ashmatics_tools-0.7.2-py3-none-any.whl:

Publisher: publish.yml on AshMatics/ashmatics-tools
