Ashmatics Tools
Shared utilities and base classes for Ashmatics Knowledge Base applications
Version 0.7.1
Last updated: 2026-01-25
v0.7.1 (2026-01-25)
- Standardized MongoDB/CosmosDB environment variables with fallback chains
  - MONGO_URL is now canonical (with AZ_MONGO_CONNECTION_STRING, COSMOS_VECTOR_CONNECTION_STRING as fallbacks)
  - MONGO_DB is now canonical (with MONGO_DATABASE, COSMOS_VECTOR_DATABASE as fallbacks; resolution order sketched below)
- Updated ENV_VARIABLES.md documentation
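For reference, the fallback order above can be sketched in plain Python (an illustration of the precedence only, not the library's internal helper):

import os

# Resolve MongoDB settings using the documented precedence
mongo_url = (
    os.getenv("MONGO_URL")
    or os.getenv("AZ_MONGO_CONNECTION_STRING")
    or os.getenv("COSMOS_VECTOR_CONNECTION_STRING")
)
mongo_db = (
    os.getenv("MONGO_DB")
    or os.getenv("MONGO_DATABASE")
    or os.getenv("COSMOS_VECTOR_DATABASE")
)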
A Python package providing shared utilities, base classes, and common functionality for Ashmatics Knowledge Base applications.
Overview
ashmatics-tools is a foundational library that centralizes reusable components across Ashmatics healthcare AI applications. It provides:
- Data Import/Export Utilities: Excel data loading, GraphQL integration, batch processing with Hasura
- Document Processors: Abstract base classes for MongoDB document processing
- GraphQL Clients: Generic GraphQL query/mutation builders and client utilities
- Schema Management: GraphQL schema introspection and analysis tools
- Document Parsers: Advanced parsing for PDFs, DOCX, and other formats
- Document Chunkers: Token-aware and semantic chunking strategies
- Embedders: Generate embeddings using Azure OpenAI or OpenAI APIs
- Vector Stores: Integration with CosmosDB, PostgreSQL, and Qdrant for vector search
- Storage Backends: Cloud-agnostic storage abstraction for ADLS Gen2, MinIO, and AWS S3
- LLM Clients: Unified interface for Azure OpenAI, OpenAI, HuggingFace, and custom providers
- Ontology Services: Medical ontology management including SNOMED CT, RADLEX, LOINC, and custom Ashmatics ontologies
- Term Services: Term resolution, hierarchical category management, and external ontology validation
- External APIs: Clients for external data sources (FDA, Census, CMS) with retry, rate limiting, and pagination
- MCP Servers: Model Context Protocol servers exposing APIs to LLMs with tool-based interfaces
- Search/RAG: Retrieval-Augmented Generation strategies with streaming, context window management, and MCP tool definitions
- Document Enrichers: Table classification, consolidation, and metrics extraction for parsed documents
- Document Storage: Figure and table storage managers with content-addressed hashing and manifests
Installation
From Git Repository (Private)
# Using pip
pip install git+https://github.com/JFK-Ashmatics/ashmatics-tools.git
# Using uv
uv add git+https://github.com/JFK-Ashmatics/ashmatics-tools.git
# With optional dependencies
pip install "ashmatics-tools[mongodb,storage] @ git+https://github.com/JFK-Ashmatics/ashmatics-tools.git"
From Local Development
# Clone the repository
git clone https://github.com/JFK-Ashmatics/ashmatics-tools.git
cd ashmatics-tools
# Install in editable mode with dev dependencies
pip install -e ".[dev,mongodb,storage]"
Configuration
Environment Variables
ashmatics-tools requires various environment variables depending on which components you use. This library does not load .env files automatically - your application must handle environment variable loading.
See ENV_VARIABLES.md for:
- Complete list of required environment variables by component
- Example application setups (development with .env, production with Key Vault)
- Environment-specific configurations
Quick example:
from dotenv import load_dotenv
# Load .env BEFORE importing ashmatics_tools
load_dotenv()
# Now use the library
from ashmatics_tools.embedders import create_embedder
embedder = create_embedder(provider="azure")
Usage
Knowledge Base Importer
from ashmatics_tools.utils.import_utils import KBImporter
# Initialize the importer
importer = KBImporter(
    graphql_endpoint="https://kb-api.ashmatics.com/v1/graphql",
    admin_secret="your-admin-secret",
    batch_size=100
)
# Load data from Excel
df = importer.load_excel_data("data.xlsx", sheet_name="Sheet1")
# Import to Knowledge Base via GraphQL
result = importer.import_to_kb(
    df=df,
    table_name="my_table",
    column_mapping={"excel_col": "db_col"}
)
Document Processor (MongoDB)
from ashmatics_tools.processors.base import DocumentProcessor
from pymongo import MongoClient
class MyDocumentProcessor(DocumentProcessor):
    def extract_metadata(self, document: dict) -> dict:
        return {"title": document.get("title"), "author": document.get("author")}

    def clean_text(self, text: str) -> str:
        return text.strip().lower()

    def get_identifier_key(self) -> str:
        return "document_id"

    def get_document_type(self) -> str:
        return "my_document_type"

    def process_document(self, file_path: str) -> dict:
        # Your document processing logic
        return {"document_id": "123", "content": "..."}
# Use the processor
client = MongoClient("mongodb://localhost:27017")
processor = MyDocumentProcessor(client, "my_database", "my_collection")
result = processor.upsert_document({"document_id": "123", "content": "..."})
Document Chunking
from ashmatics_tools.chunkers.factory import create_chunker
# Initialize chunker
chunker = create_chunker(strategy="docling")
# Chunk document
chunks = chunker.chunk_document(
    content="This is a sample document content.",
    title="Sample Document",
    source="document.pdf"
)
Embedding Generation
from ashmatics_tools.embedders.factory import create_embedder
# Initialize embedder
embedder = create_embedder(provider="azure")
embedder.initialize()
# Generate embeddings
embeddings = embedder.embed_chunks(["chunk1", "chunk2"])
Vector Store Integration
from ashmatics_tools.vector_stores.factory import create_vector_store
# Initialize vector store
vector_store = create_vector_store(provider="cosmosdb")
# Store embeddings
success, failed = vector_store.store_embeddings_batch(embeddings)
# Perform similarity search
results = vector_store.similarity_search(query_embedding, top_k=10)
Storage Backend Integration
from ashmatics_tools.storage import create_storage_client, StorageConfig, AuthType
# Initialize ADLS storage with DefaultAzureCredential (production)
config = StorageConfig(
    provider="adls",
    account_url="https://mystorageaccount.dfs.core.windows.net",
    container_name="my-container",
    auth_type=AuthType.DEFAULT_CREDENTIAL
)
storage = create_storage_client("adls", config)
# Or use connection string (development)
config = StorageConfig(
    provider="adls",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...",
    container_name="my-container",
    auth_type=AuthType.CONNECTION_STRING
)
storage = create_storage_client("adls", config)
# Initialize MinIO storage
config = StorageConfig(
    provider="minio",
    endpoint="minio.example.com:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    container_name="my-bucket",
    auth_type=AuthType.ACCESS_KEY,
    use_ssl=False
)
storage = create_storage_client("minio", config)
# Use async context manager
async with storage:
    # Write object
    await storage.write_object("path/file.txt", b"Hello, World!")
    # Read object
    content = await storage.read_object("path/file.txt")
    # List objects
    objects = await storage.list_objects(prefix="path/", pattern="*.txt")
    # Stream large files
    async for chunk in storage.read_object_stream("large-file.bin"):
        process(chunk)
    # Check if exists
    exists = await storage.exists("path/file.txt")
    # Get metadata
    metadata = await storage.get_metadata("path/file.txt")
    # Copy object
    await storage.copy_object("src/file.txt", "dest/file.txt")
    # Delete object
    await storage.delete_object("path/file.txt")
LLM Module Usage
The LLM module provides a unified interface for working with various language model providers. Supports Azure OpenAI, OpenAI, HuggingFace, and custom providers via plugin registry.
Key Features
- Async-first API: All operations are async for high-performance pipelines
- Unified interface: Same API across all providers
- Cost tracking: Automatic token counting and cost estimation
- Plugin registry: Extensible with custom providers
- Optional dependencies: HuggingFace support via the [huggingface] extra
Azure OpenAI (Primary)
import os

from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig

config = AzureOpenAIConfig(
    endpoint="https://my-resource.openai.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    deployment_name="gpt-4"
)
async with create_llm_client("azure_openai", config) as llm:
    response = await llm.complete(
        prompt="What is asthma?",
        temperature=0.7,
        max_tokens=500
    )
    print(response.text)
    print(f"Cost: ${response.tokens.estimated_cost:.4f}")
Ontology and Term Services
The ontology module provides comprehensive medical ontology management including term resolution, hierarchical category management, and integration with external ontologies.
Key Features
- Term Resolution: MongoDB-based term lookup and management
- Category Management: Hierarchical category structures for document tagging
- External Ontology Integration: BioPortal API for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
- Custom Ontology: ASHMATICS domain-specific ontology for medical imaging AI concepts
Term Resolution
from ashmatics_tools.ontology import TermResolver
from pymongo import MongoClient
# Initialize term resolver
client = MongoClient("mongodb://localhost:27017")
term_resolver = TermResolver(mongodb_client=client)
# Resolve term
result = await term_resolver.resolve_term("breast cancer")
print(f"Resolved: {result.prefLabel} - {result.definition}")
Category Management
from ashmatics_tools.ontology import CategoryManager
# Initialize category manager
category_manager = CategoryManager(
    mongodb_database=client["ashmatics_kb"],
    term_resolver=term_resolver
)
# Create hierarchical category
category = await category_manager.create_category(
    name="Medical Imaging",
    parent_id=None,
    description="Top-level category for medical imaging"
)
# Add subcategory
subcategory = await category_manager.create_category(
    name="Breast Imaging",
    parent_id=category.id,
    description="Breast imaging techniques and AI models"
)
External Ontology Validation
from ashmatics_tools.ontology import BioPortalClient
# Initialize BioPortal client
bioportal = BioPortalClient(api_key="your-bioportal-api-key")
# Check term in external ontologies
exists, ontologies = await bioportal.check_term_in_ontology("Breast Cancer")
print(f"Term exists: {exists}")
print(f"Found in ontologies: {ontologies}")
Custom ASHMATICS Ontology
from ashmatics_tools.ontology import AshmaticsOntology
# Initialize custom ontology manager
ashmatics_ontology = AshmaticsOntology(mongodb_database=client["ashmatics_kb"])
# Create concept
concept = await ashmatics_ontology.create_concept(
    prefLabel="AI Breast Cancer Detector",
    definition="AI model for detecting breast cancer in medical images",
    synonyms=["Breast Cancer AI", "Mammography AI"]
)
# Add relationship
await ashmatics_ontology.add_relationship(
    source_id=concept.id,
    target_id=another_concept.id,
    relationship_type="related_to"
)
ASHCAI Clinical AI Governance Ontology
The ASHCAI (AshMatics Clinical AI Governance) ontology provides governance concepts for the CAI Framework, including policies, processes, controls, and regulatory crosswalks.
from ashmatics_tools.ontology import AshcaiOntology
# Initialize ASHCAI ontology manager
ashcai = AshcaiOntology(mongodb_database=client["ashmatics_kb"])
# Initialize collections and indexes
await ashcai.initialize_ontology()
# Create a policy domain with natural business ID
policy = await ashcai.create_policy_domain(
    domain_id="MMP-001",
    domain_code="MMP",
    label="Model Monitoring Policy",
    description="Policy governing AI model monitoring requirements",
    specifies=["MON"]  # Links to process domains
)
# Create a process domain
process = await ashcai.create_process_domain(
    domain_id="MON",
    label="Model Monitoring",
    primary_function="Continuous monitoring of AI model performance",
    integrates_with=["RM", "SA", "OVR"]
)
# Create base practice
practice = await ashcai.create_base_practice(
    practice_id="MON.BP01",
    label="Rollout and Change Management",
    process_domain="MON",
    sequence_order=1
)
# Create SOP template
sop = await ashcai.create_sop_template(
    sop_id="SOP-MON-01",
    label="Model Deployment SOP",
    purpose="Standard procedure for deploying AI models",
    base_practice="MON.BP01"
)
# Create work product template
wp = await ashcai.create_work_product_template(
    wp_id="WP-MON-02-Dashboard",
    label="Monitoring Dashboard",
    evidence_type="Dashboard",
    produced_by="SOP-MON-02",
    serves_as_evidence_for=["EXC-A7-04"]
)
# Create exemplar control
control = await ashcai.create_exemplar_control(
    control_id="EXC-A7-04",
    label="Performance Monitoring Control",
    iso_control="A.7.5",
    evidenced_by=["WP-MON-02-Dashboard"]
)
# Create regulatory requirement with crosswalk
requirement = await ashcai.create_regulatory_requirement(
    requirement_id="NIST-MAP-4.2",
    label="MAP 4.2",
    framework_id="NIST-AI-RMF",
    function="MAP",
    category="MAP-4",
    description="Internal risk controls for third-party AI resources",
    crosswalk={
        "addressedBy": ["TPP-001", "MMP-001"],
        "implementedThrough": ["EXC-A10-02", "EXC-A6-02"],
        "operationalizedIn": ["MON", "PV"],
        "evidencedBy": ["WP-MON-01", "WP-PV-03"]
    }
)
# Create relationships
await ashcai.link_policy_to_process("MMP-001", "MON")
await ashcai.link_process_to_practice("MON", "MON.BP01")
await ashcai.link_practice_to_sop("MON.BP01", "SOP-MON-01")
await ashcai.link_sop_to_workproduct("SOP-MON-01", "WP-MON-02-Dashboard")
await ashcai.link_workproduct_to_control("WP-MON-02-Dashboard", "EXC-A7-04")
# Traversal helpers
hierarchy = await ashcai.get_policy_hierarchy("MMP-001")
# Returns: policy, processes (with practices, SOPs, work products), controls
evidence_chain = await ashcai.get_evidence_chain("EXC-A7-04")
# Returns: control with all work products that evidence it
crosswalk = await ashcai.get_regulatory_crosswalk("NIST-MAP-4.2")
# Returns: requirement with all policies, controls, processes, evidence
# OWL/RDF export
uri = ashcai.generate_uri("MMP-001")
# Returns: http://asherinformatics.com/ontology/ashcai/MMP-001
Key Features:
- Natural Business IDs: Human-readable identifiers (MMP-001, MON, SOP-MON-01) with regex validation
- Type Discrimination: All documents include ontology: "ashcai" for filtering (see the query sketch after this list)
- 44 Relationship Types: Comprehensive relationships from TDD-001 specification
- Traversal Helpers: Pre-built queries for policy hierarchies and regulatory crosswalks
- OWL/RDF Export: Generate standard URIs from natural business IDs
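Because every ASHCAI document carries the ontology: "ashcai" discriminator, filtering is a plain MongoDB query. A minimal sketch (the policy_domains collection name is illustrative; the library manages its own per-entity-type collections):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["ashmatics_kb"]

# Fetch only ASHCAI documents from a per-entity-type collection
for doc in db["policy_domains"].find({"ontology": "ashcai"}):
    print(doc["domain_id"], doc["label"])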
External API Integration
The external_apis module provides clients for accessing external data sources with built-in retry logic, rate limiting, and pagination.
OpenFDA API Client
from ashmatics_tools.external_apis import create_api_client, OpenFDAConfig, OpenFDAEndpoint
# Create client with API key (recommended)
config = OpenFDAConfig(api_key="your_api_key")
async with create_api_client("openfda", config) as client:
    # Search device adverse events
    async for event in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="device_name:pacemaker AND date_received:[20230101 TO 20231231]",
        limit=100,
        max_records=1000
    ):
        device = event.get("device", [{}])[0]
        print(f"Device: {device.get('device_name')}")
        print(f"Event Date: {event.get('date_received')}")
    # Search 510(k) clearances
    async for clearance in client.search(
        endpoint=OpenFDAEndpoint.DEVICE_510K,
        query="product_code:OZP",
        limit=100
    ):
        print(f"K Number: {clearance.get('k_number')}")
        print(f"Applicant: {clearance.get('applicant')}")
    # Analytics - count by field
    counts = await client.count(
        endpoint=OpenFDAEndpoint.DEVICE_EVENT,
        query="date_received:[20230101 TO 20231231]",
        count_field="device.device_class.exact"
    )
    for item in counts:
        print(f"Class {item['term']}: {item['count']} events")
AccessGUDID API Client
from ashmatics_tools.external_apis import AccessGUDIDClient, AccessGUDIDConfig
# Create client (no API key required for basic operations)
config = AccessGUDIDConfig()
async with AccessGUDIDClient(config) as client:
    # Lookup device by Device Identifier (DI)
    device = await client.lookup_device(di="08717648200274")
    print(f"Brand: {device['gudid']['device']['brandName']}")
    print(f"Company: {device['gudid']['device']['companyName']}")
    # Parse a UDI string (GS1, HIBCC, or ICCBBA format)
    parsed = await client.parse_udi(
        udi="(01)00844588012919(17)141231(10)A213B1"
    )
    print(f"DI: {parsed['di']}")
    print(f"Issuing Agency: {parsed['issuingAgency']}")
    print(f"Expiration: {parsed['expirationDate']}")
    print(f"Lot Number: {parsed['lotNumber']}")
    # Get device version history
    history = await client.get_device_history(di="08717648200274")
    for version in history.get('deviceHistory', []):
        print(f"Version {version['publicVersionNumber']}: {version['publicVersionDate']}")
    # List implantable devices with date filtering
    async for device in client.list_implantable_devices(
        from_date="2024-01-01",
        max_records=100
    ):
        print(f"{device['brandName']} - {device['companyName']}")
# With UMLS API key for SNOMED lookups
config = AccessGUDIDConfig(umls_api_key="your_umls_key")
async with AccessGUDIDClient(config) as client:
    snomed = await client.get_device_snomed(di="08717648200274")
    for concept in snomed.get('concepts', []):
        print(f"{concept['snomedCTName']}: {concept['snomedIdentifier']}")
MCP Server Integration
The mcp_servers module provides Model Context Protocol servers that expose external APIs as tools for LLM consumption.
OpenFDA MCP Server
from ashmatics_tools.mcp_servers import create_mcp_server, OpenFDAMCPConfig
from ashmatics_tools.external_apis import OpenFDAConfig
# Create MCP server
config = OpenFDAMCPConfig(
    api_config=OpenFDAConfig(api_key="your_key")
)
server = create_mcp_server("openfda", config)
# Get available tools
tools = server.get_tools()
# Returns: search_devices, search_drugs, count_by_field
# Call a tool
result = await server.call_tool("search_devices", {
    "endpoint": "device_event",
    "query": "device_name:pacemaker",
    "limit": 10
})
print(f"Found {result['count']} results")
for item in result['results']:
    print(item)
AccessGUDID MCP Server
from ashmatics_tools.mcp_servers import create_mcp_server, AccessGUDIDMCPConfig
from ashmatics_tools.external_apis import AccessGUDIDConfig
# Create MCP server
config = AccessGUDIDMCPConfig(
    api_config=AccessGUDIDConfig()  # No API key required for basic operations
)
server = create_mcp_server("accessgudid", config)
# Get available tools
tools = server.get_tools()
# Returns: lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices
# Lookup a device by DI
result = await server.call_tool("lookup_device", {"di": "08717648200274"})
print(f"Device: {result['summary']['brandName']}")
# Parse a UDI string
result = await server.call_tool("parse_udi", {
    "udi": "(01)00844588012919(17)141231(10)A213B1"
})
print(f"Parsed DI: {result['parsed']['di']}")
# List implantable devices
result = await server.call_tool("list_implantable_devices", {
    "from_date": "2024-01-01",
    "max_records": 50
})
print(f"Found {result['count']} implantable devices")
Running MCP Servers via stdio
# Run OpenFDA MCP server (set FDA_API_KEY for higher rate limits)
export FDA_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.openfda
# Run AccessGUDID MCP server (set UMLS_API_KEY for SNOMED lookups)
export UMLS_API_KEY=your_key
python -m ashmatics_tools.mcp_servers.accessgudid
For detailed usage examples, see FDA API Usage Guide.
Search/RAG Module
The search module provides RAG (Retrieval-Augmented Generation) strategies for building AI-powered search applications.
Key Features
- RAG Strategies: SimpleRAG and MultiQueryRAG with streaming support
- Context Window Management: Automatic fitting of sources to model context limits
- MCP Tool Definitions: Generic tool schemas for agent integration
- LLM Streaming: SSE and NDJSON streaming support across all LLM providers
Simple RAG Query
from ashmatics_tools.llm import create_llm_client, AzureOpenAIConfig
from ashmatics_tools.embedders import create_embedder
from ashmatics_tools.vector_stores import create_vector_store
from ashmatics_tools.search import create_search_strategy, RAGConfig
# Setup components
llm = create_llm_client("azure_openai", AzureOpenAIConfig(...))
embedder = create_embedder("azure")
vector_store = create_vector_store("cosmosdb", config)
# Create RAG strategy
rag = create_search_strategy(
    "simple_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=RAGConfig(top_k=10, temperature=0.7)
)
# Query with answer generation
async with llm:
    result = await rag.query("What are ISO 42001 requirements?")
    print(result.answer)
    print(f"Sources: {len(result.sources)}")
    print(f"Tokens: {result.metrics.total_tokens}")
Multi-Query RAG (Query Expansion)
from ashmatics_tools.search import create_search_strategy
from ashmatics_tools.search.strategies import MultiQueryConfig
# Multi-query expands to multiple query variants for better coverage
config = MultiQueryConfig(
    top_k=10,
    num_query_variants=3,  # Generate 3 query variants
    rrf_k=60,  # RRF ranking parameter
)
rag = create_search_strategy(
    "multi_query_rag",
    llm=llm,
    vector_store=vector_store,
    embedder=embedder,
    config=config
)
async with llm:
    result = await rag.query("What is risk management in AI governance?")
    print(f"Expanded queries: {result.metadata.get('expanded_queries')}")
    print(result.answer)
Streaming RAG Responses
# Stream answer generation for real-time UI
async with llm:
    async for chunk in rag.stream_query("Explain AI governance controls"):
        if chunk.text:
            print(chunk.text, end="", flush=True)
        if chunk.is_final:
            print(f"\n\nSources: {len(chunk.sources)}")
Context Window Management
from ashmatics_tools.llm import ContextWindowManager, ModelContextLimits
# Create manager for GPT-4 Turbo
manager = ContextWindowManager(
    model_limits=ModelContextLimits.GPT4_TURBO(),
    reserved_output=2000
)
# Fit sources into available context
fitted_sources = manager.fit_sources(
    sources=search_results,
    query="What are ISO 42001 requirements?",
    system_prompt=system_prompt
)
print(f"Fitted {len(fitted_sources)} of {len(search_results)} sources")
MCP Tool Definitions
from ashmatics_tools.search.mcp_tools import (
    get_tool_definitions,
    export_tools_yaml,
    RAG_SEARCH_TOOL,
)
# Get all tool definitions for MCP server registration
tools = get_tool_definitions()
for tool in tools:
    print(f"{tool.name}: {tool.description}")
# Export as YAML for configuration
yaml_config = export_tools_yaml()
Document Enrichers
The enrichers module provides post-parsing content analysis for tables and extracted data.
Table Classification
from ashmatics_tools.enrichers import TableClassifier, TableCategory
# Initialize classifier
classifier = TableClassifier(provider="azure_openai")
# Classify tables from a parsed document
categories, tokens = await classifier.classify_tables(parsed_doc.tables)
for table, category in zip(parsed_doc.tables, categories):
    if category == TableCategory.PERFORMANCE_METRICS:
        # Extract metrics from performance tables
        pass
    elif category == TableCategory.COMPARISON:
        # Process comparison tables
        pass
Table Consolidation (Multi-page Tables)
from ashmatics_tools.enrichers import TableConsolidator
# Handle tables that span multiple PDF pages
consolidator = TableConsolidator(
    column_similarity_threshold=0.85,
    use_llm_validation=True
)
consolidated = await consolidator.consolidate_tables(
    parsed_doc.tables,
    parsed_doc.markdown
)
for table in consolidated:
    if table.merged_from:
        print(f"{table.table_id} merged from pages: {table.merged_from}")
Metrics Extraction
from ashmatics_tools.enrichers import MetricsExtractor, DomainKnowledgeProvider
# With optional domain knowledge injection
extractor = MetricsExtractor(domain_knowledge=my_provider)
result = await extractor.extract_from_tables(
    tables=performance_tables,
    document_text=section_text
)
for metric in result.performance_metrics:
    print(f"{metric.metric_name}: {metric.value} [{metric.ci_lower}, {metric.ci_upper}]")
Document Storage
Storage managers for document processing artifacts with manifest generation.
Figure Storage
from ashmatics_tools.document_storage import FigureStorageManager
# Filter small images (logos, icons) and save significant figures
manager = FigureStorageManager(min_size=200)
processed = manager.process_figures(parsed_doc.figures, parsed_doc.markdown)
saved = manager.save_figures(processed, output_dir / 'figures', doc_id)
# Creates figures_manifest.json with metadata
Table Storage
from ashmatics_tools.document_storage import TableStorageManager
# Save tables in dual format (Markdown + JSON)
manager = TableStorageManager()
stored = manager.save_tables(consolidated_tables, output_dir / 'tables', doc_id)
# Creates tables_manifest.json with metadata
Features
Modern HTTP Client (httpx)
All HTTP communication uses httpx, providing:
- Native Type Annotations: Full type safety without separate stub packages
- Async/Await Support: Ready for async operations in performance-critical applications
- HTTP/2 Support: Modern protocol support for improved performance
- Backward Compatible: Synchronous API identical to requests for easy adoption
Secure by Default
- SSL Verification Enabled: All HTTP requests verify SSL certificates by default
- Explicit Opt-Out: SSL verification can only be disabled by explicitly passing verify=False to methods (see the httpx reference sketch after this list)
- Security Warnings: Disabling SSL verification triggers warning logs
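For reference, this mirrors the default httpx behavior the library builds on (shown here with httpx directly, not an ashmatics-tools API):

import httpx

# SSL verification is on by default
response = httpx.get("https://api.example.com/health")

# Explicit opt-out, e.g. for a self-signed certificate in local development;
# the library logs a security warning whenever verification is disabled
response = httpx.get("https://localhost:8443/health", verify=False)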
Flexible Configuration
- Environment Variables: Supports .env files for configuration
- Configurable Endpoints: All API endpoints configurable via environment or parameters
- Batch Processing: Configurable batch sizes for large dataset operations
MongoDB Integration (Optional)
- Optional Dependency: MongoDB support is optional via the [mongodb] extra
- Abstract Base Classes: Extensible DocumentProcessor for custom document types
- Upsert Operations: Intelligent upsert with identifier-based conflict resolution
Comprehensive Error Handling
- Detailed Logging: Structured logging throughout all operations
- Graceful Failures: Proper error handling with informative messages
- Validation: Input validation and JSON compliance checking
Architecture
ashmatics-tools/
├── src/ashmatics_tools/
│ ├── __init__.py # Public API exports
│ ├── chunkers/
│ │ ├── __init__.py
│ │ ├── azure_chunker.py
│ │ ├── base.py
│ │ ├── docling_chunker.py
│ │ └── simple_chunker.py
│ ├── document_storage/
│ │ ├── __init__.py
│ │ ├── figure_storage.py
│ │ └── table_storage.py
│ ├── embedders/
│ │ ├── __init__.py
│ │ ├── azure_embedder.py
│ │ ├── base.py
│ │ ├── factory.py
│ │ └── openai_embedder.py
│ ├── embedding/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── mongodb_pipeline.py
│ │ └── specialized/
│ ├── enrichers/
│ │ ├── __init__.py
│ │ ├── table_classifier.py
│ │ ├── table_consolidator.py
│ │ ├── metrics_extractor.py
│ │ └── training_data_extractor.py
│ ├── graphql/
│ │ ├── __init__.py
│ │ └── client.py
│ ├── llm/
│ │ ├── __init__.py
│ │ ├── azure_openai.py
│ │ ├── base.py
│ │ ├── factory.py
│ │ ├── huggingface.py
│ │ ├── openai.py
│ │ └── plugin_registry.py
│ ├── external_apis/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── factory.py
│ │ ├── openfda/
│ │ │ ├── __init__.py
│ │ │ ├── client.py
│ │ │ └── config.py
│ │ └── accessgudid/
│ │ ├── __init__.py
│ │ ├── client.py
│ │ └── config.py
│ ├── mcp_servers/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── factory.py
│ │ ├── openfda/
│ │ │ ├── __init__.py
│ │ │ ├── config.py
│ │ │ └── server.py
│ │ └── accessgudid/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ └── server.py
│ ├── ontology/
│ │ ├── __init__.py
│ │ ├── categories/
│ │ ├── core/
│ │ ├── data/
│ │ └── terms/
│ ├── parsers/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── docling_parser.py
│ │ ├── factory.py
│ │ ├── llama_parser.py
│ │ └── simple_parser.py
│ ├── processors/
│ │ ├── __init__.py
│ │ └── base.py
│ ├── storage/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── factory.py
│ │ ├── adls_store.py
│ │ └── minio_store.py
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── export_utils.py
│ │ ├── import_utils.py
│ │ └── schema_utils.py
│ └── vector_stores/
│ ├── __init__.py
│ ├── base.py
│ ├── cosmosdb_store.py
│ ├── factory.py
│ ├── pgvector_store.py
│ └── qdrant_store.py
├── tests/ # Test suite
├── pyproject.toml # Package configuration
└── README.md
Dependencies
Core Dependencies
- pandas>=2.1.0 - Data manipulation and Excel file reading
- openpyxl>=3.1.2 - Excel file format support
- httpx>=0.27.0 - Modern HTTP client with native type annotations and async support
- python-dotenv>=1.0.0 - Environment variable management
- pyyaml>=6.0.0 - YAML parsing
- numpy>=1.24.0 - Numerical operations
Optional Dependencies
- pymongo>=4.0.0 - MongoDB integration (install with the [mongodb] extra)
Development Dependencies
- pytest>=7.4.0 - Testing framework
- pytest-cov>=4.1.0 - Code coverage
- ruff>=0.1.0 - Linting and formatting
- mypy>=1.5.0 - Static type checking
- pandas-stubs>=2.1.0 - Type stubs for pandas
Development
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ashmatics_tools --cov-report=html
# Run specific test file
pytest tests/test_import_utils.py
Code Quality
# Lint with ruff
ruff check src/
# Format with ruff
ruff format src/
# Type check with mypy
mypy src/
Environment Variables
# GraphQL/Hasura Configuration
HASURA_GRAPHQL_ENDPOINT=https://kb-api.ashmatics.com/v1/graphql
HASURA_ADMIN_SECRET=your-admin-secret-here
# MongoDB Configuration (optional)
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE=ashmatics_kb
Python Version Support
- Minimum: Python 3.11
- Tested: Python 3.11, 3.12
- Recommended: Python 3.12+
License
MIT License - See LICENSE file for details
Contributing
This is a private package for Ashmatics internal use. For questions or issues, please contact the development team.
Version History
0.7.0 (2026-01-19) - ASHCAI Clinical AI Governance Ontology (ASHTOOLS-10)
- ASHCAI Ontology Module: Complete Clinical AI Governance ontology for the CAI Framework
  - AshcaiOntology: Manager class with CRUD operations, relationship management, and traversal helpers
  - ashcai-ontology-v1.0.json: Ontology definition with 22 semantic types (T92xx range) and 44 relationship types
  - ashcai_schemas.py: Comprehensive Pydantic schemas with natural business ID validation
- Entity Types: PolicyDomain, ProcessDomain, BasePractice, SOPTemplate, WorkProductTemplate, ExemplarControl, RegulatoryFramework, RegulatoryRequirement
- Natural Business IDs: Human-readable identifiers with regex validation
  - PolicyDomain: MMP-001, TPP-001, AGP-001
  - ProcessDomain: MON, RM, SA, OVR, PV
  - BasePractice: MON.BP01, RM.BP03
  - SOPTemplate: SOP-MON-01, SOP-RM-02
  - WorkProductTemplate: WP-MON-02-Dashboard
  - ExemplarControl: EXC-A7-04, EXC-A10-02
  - RegulatoryRequirement: NIST-MAP-4.2, JC-RUAIH-3
- Type Discrimination: All ASHCAI documents include an ontology: "ashcai" field for filtering
- 44 Relationship Types: From TDD-001 specification including specifies, containsBasePractice, realizedBy, produces, servesAsEvidenceFor, addressedBy, implementedThrough, operationalizedIn
- Traversal Helpers: Pre-built queries for common governance patterns
  - get_policy_hierarchy(): Full implementation path from policy to work products
  - get_evidence_chain(): Evidence trail for controls
  - get_regulatory_crosswalk(): Map requirements to framework elements
  - find_control_implementations(): SOPs implementing a control
- OWL/RDF Export: Generate standard URIs from natural business IDs (http://asherinformatics.com/ontology/ashcai/{id})
- MongoDB Collections: Separate collections per entity type with indexes for efficient querying
- Exports: from ashmatics_tools.ontology import AshcaiOntology
0.6.2 (2026-01-07) - Retry Utilities with Exponential Backoff
- Retry Module: New ashmatics_tools.llm.retry module for robust LLM API call handling
  - RetryConfig: Configurable dataclass for retry behavior (max_attempts, initial_delay, max_delay, exponential_base, jitter, request_delay)
  - calculate_backoff_delay(): Pure function for exponential backoff with jitter calculation
  - call_with_backoff(): Async wrapper that handles LLMRateLimitError with configurable retries
  - call_with_backoff_and_fallback(): Convenience wrapper with fallback function support
- Presets: RetryConfig.aggressive() for batch processing, RetryConfig.conservative() for interactive use
- API Hint Support: Respects retry_after hints from rate limit responses (Azure OpenAI, etc.)
- Thundering Herd Prevention: Random jitter (default 30%) prevents synchronized retry storms
- Exports: All retry utilities exported from ashmatics_tools.llm: from ashmatics_tools.llm import RetryConfig, call_with_backoff
0.6.1 (2025-12-30) - Ollama SDK Integration & LLM Enhancements
- Ollama Python SDK Integration: Complete rewrite of OllamaClient using the official ollama Python SDK
  - Embeddings: generate_embedding(), generate_embeddings() with batch support and dimension control
  - Vision: complete_with_vision() for image understanding (llava, llama3.2-vision models)
  - Tool Calling: complete_with_tools() for function calling/agentic workflows
  - Model Management: list_models(), pull_model(), show_model(), delete_model(), copy_model(), list_running_models()
  - Keep-Alive Control: Memory management via keep_alive config (e.g., "5m", "1h", "-1")
  - Streaming: Native async generators via stream_complete_with_messages()
  - Health Check: check_health() for server status verification
  - Helper Classes: OllamaTool, OllamaToolParameter, OllamaToolCall for tool definitions
- New Optional Dependency: [ollama] extra (pip install "ashmatics-tools[ollama]")
- Azure OpenAI Improvements: Enhanced AzureOpenAIClient with simple completion support
- Example Notebook: Added examples/rag_and_llm_demo.ipynb demonstrating RAG pipelines and LLM client usage
- Integration Tests: Comprehensive Ollama test suite (tests/integration/test_ollama_integration.py)
  - 24 tests covering chat, streaming, embeddings, vision, tools, model management
  - Performance and concurrent request testing
  - Error handling validation
0.6.0 (2025-12-28) - AI Search & RAG Module (ASHTOOLS-5)
- Search Module: Complete RAG (Retrieval-Augmented Generation) framework
  - SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
  - MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
  - RAGConfig, RAGResult, RAGMetrics, RAGStreamChunk dataclasses
  - Factory pattern: create_search_strategy("simple_rag", ...) with plugin registry
- LLM Streaming: SSE and NDJSON streaming support for all LLM providers
  - StreamChunk dataclass for streaming responses
  - stream_complete() and stream_complete_with_messages() methods
  - Fallback to non-streaming for providers without native support
- Context Window Management: Automatic context fitting for LLM requests
  - ContextWindowManager: Fit sources into available context with token estimation
  - ModelContextLimits: Presets for GPT-4, GPT-4o, Claude Sonnet/Opus/Haiku, Llama, Mistral
  - ModelFamily enum for tokenizer selection
- MCP Tool Definitions: Generic tool schemas for agent integration
  - RAG_SEARCH_TOOL: RAG-enhanced search with answer generation
  - SEMANTIC_SEARCH_TOOL: Semantic similarity search without generation
  - MULTI_QUERY_SEARCH_TOOL: Multi-query RAG with query expansion
  - Export as JSON or YAML for MCP server registration
- SearchResult ADR-045 Fields: Governance metadata for RAG sources (domain, control_refs, token_refs, document_type, source_uri)
- New Optional Dependencies: [search], [reranking], [rag] extras
- 52 Tests: Comprehensive test coverage for all new functionality
0.5.3 (2025-12-25) - Modular Dependencies and Lazy Loading (ASHTOOLS-7)
- Modular Optional Dependencies: Restructured pyproject.toml to minimize install size
  - Core install is now lightweight (~100MB) - NO torch/CUDA by default
  - Heavy dependencies (docling, transformers, qdrant) moved to optional extras
  - New extras: [api], [parsers], [chunkers], [docproc], [ml], [full]
  - [api] extra for API apps like Ashmatics-Knowledgebase (~150MB vs ~4GB)
- Lazy Loading: Heavy modules loaded on-demand via __getattr__
  - DoclingParser, DoclingChunker only load torch when accessed
  - HuggingFace LLM providers registered lazily in factory
- Graceful Import Handling: Clear error messages when optional deps missing
  - TYPE_CHECKING used for heavy type hints (docling_core)
  - Runtime checks with installation instructions
- ADR Documentation: Added docs/ADRs/ADR-LazyLoadingBigDependencies-ASHTOOLS-7-2025-12-25.md
- Container Size Reduction: Enables ~3GB+ savings for API-only applications
0.5.2 (2025-12-06) - llama.cpp Client Support
- LlamaCppClient: Local LLM inference via llama.cpp server
- Metal acceleration on M1/M2 Macs for fast local inference
- OpenAI-compatible API (/v1/chat/completions endpoint)
- VPS and on-premises deployment support
- Server health and properties endpoints
- llama.cpp-specific parameters (top_k, repeat_penalty, n_gpu_layers)
- Embedding support: generate_embedding(), generate_embeddings(), get_embedding_dimension()
- Always $0.00 cost (self-hosted)
- LlamaCppConfig: Configuration dataclass with endpoint, model, timeout, SSL, context_size, n_gpu_layers
- Factory Integration: create_llm_client("llamacpp", config) via plugin registry
- Unified Interface: Same API as Azure OpenAI, OpenAI, Ollama, HuggingFace
- Live Testing: Verified against local llama.cpp server (11/12 tests passing)
- Note: Embeddings require llama-server to be started with the --embeddings flag
0.5.1 (2025-12-03) - AccessGUDID API Integration
- AccessGUDID API Client: NIH/FDA Global Unique Device Identification Database integration
- Device lookup by DI, UDI, or record key
- UDI parsing for GS1, HIBCC, ICCBBA formats
- Device version history tracking
- SNOMED CT code lookup (requires UMLS API key)
- Implantable device listing with pagination
- AccessGUDID MCP Server: 5 tools for LLM/agent access to GUDID data
- Typed Models: Pydantic models for type-safe responses (GUDIDDevice, ParsedUDI, etc.)
- Transform Functions: Raw API response to typed model conversion
- Updated MCP Test Script: Now supports both OpenFDA and AccessGUDID servers
0.5.0 (2025-11-29) - Document Enrichers and Storage Managers
-
Enrichers Module: Post-parsing content analysis extracted from FDA 510(k) pipeline
  - TableClassifier: LLM-based table categorization (COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, etc.)
  - TableConsolidator: Multi-page table merge with first-row matching, column similarity, and continuation detection
  - MetricsExtractor: Performance metrics extraction with confidence intervals and sample sizes
  - TrainingDataExtractor: AI/ML training dataset characteristics extraction
  - DomainKnowledgeProvider: Abstract base for domain-specific context injection
-
Document Storage Module: Artifact storage managers for document processing outputs
  - FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage with manifests
  - TableStorageManager: Dual-format (Markdown + JSON) table storage with manifests
-
Reusable Across Pipelines: Supports FDA 510(k), research papers, clinical guidelines, pre-prints
-
Domain Extensibility: Domain-specific logic via provider injection without modifying base extractors
-
New Dependency: Added
Pillow>=10.0.0 for figure processing
(NOTE - we had a screw up in the release and tags in the repo, so skipping version 0.4.0, even though 0.5.0 is really 0.4.0. Like the 13th floor...)
0.3.1 (2025-11-22) - FDA API Integration and MCP Servers
- External APIs Module: Extensible framework for external data source integration
- OpenFDA Client: Complete US FDA Open Data Portal integration
- Support for all major endpoints: device (510k, events, recalls), drugs (labels, adverse events), food
- Automatic retry with exponential backoff for transient errors
- Client-side rate limiting with token bucket algorithm (respects FDA API limits)
- Automatic pagination for large result sets
- Query syntax support: field search, date ranges, boolean operators, wildcards
- Count/analytics queries for aggregated data
- MCP Servers Module: Model Context Protocol servers for LLM integration
- BaseMCPServer abstract base for creating MCP tool servers
- OpenFDAMCPServer exposing FDA API as LLM tools (search_devices, search_drugs, count_by_field)
- JSON Schema-based input validation
- Response formatting and error handling for LLM consumption
- Factory Pattern: Plugin registry for custom API providers and MCP servers
- Async-First: All operations use async/await for high-performance pipelines
- Comprehensive Documentation:
- Updated CLAUDE.md with 380+ lines of usage examples
- New FDA_API_USAGE_GUIDE.md with complete reference (650+ lines)
- Query syntax guide, field references, practical examples
- Testing: Full test coverage with respx-based mocking for httpx
- Future-Ready: Architecture supports Census Bureau, CMS, and other data sources
0.3.0 (2025-11-21) - Ontology and Term Services Integration
- Ontology Module: Complete medical ontology management system
- Term Resolution: MongoDB-based term lookup and management with TermResolver
- Category Management: Hierarchical category structures with CategoryManager
- External Ontology Integration: BioPortal API client for validating terms against SNOMED CT, RADLEX, LOINC, NCIT
- Custom ASHMATICS Ontology: Domain-specific ontology manager for medical imaging AI concepts
- Schema Definitions: Comprehensive Pydantic schemas for terms, categories, and ontology operations
- Async API: Full async support for all ontology operations
- Integration Ready: Seamless integration with existing document processing and vector search pipelines
0.2.0 (2025-11-12) - Major Module Migration
- Complete Migration: Migrated ALL common modules from ashmatics-kb-tools
- Parsers Module: SimpleParser, DoclingParser, LlamaParser with factory function
- Chunkers Module: SimpleChunker, AzureChunker, DoclingChunker with factory function
- Embedders Module: AzureEmbedder, OpenAIEmbedder with factory function
- Embedding Pipelines: MongoDBEmbeddingPipeline + specialized pipelines for framework, use cases, and cards
- Vector Stores: CosmosDB, PostgreSQL pgvector, Qdrant implementations with factory function
- 52 Public Exports: Complete document processing, embedding, and vector search workflow
- Code Quality: All files standardized with copyright headers, ruff-compliant, modernized type hints
- Production Ready: 100% migration complete, all tests passing
0.1.0 (2025-01-12)
- Initial release
- Extracted from ashmatics-kb-tools repository
- Core utilities: KBImporter, DataExporter, schema tools
- Abstract DocumentProcessor base class
- Generic GraphQL client utilities
- Migrated to httpx: Modern HTTP client with native type annotations and async support
- SSL verification enabled by default
- Full mypy type safety (zero type errors)
- Python 3.11+ support
Complete Document Processing Pipeline
The package now provides a complete end-to-end pipeline for document processing:
from ashmatics_tools import (
    create_parser,        # Parse documents (PDF, DOCX, etc.)
    create_chunker,       # Chunk into manageable pieces
    create_embedder,      # Generate embeddings
    create_vector_store   # Store and search vectors
)
# 1. Parse document
parser = create_parser("docling")
parsed_doc = await parser.parse_file("document.pdf")
# 2. Chunk document
chunker = create_chunker(strategy="docling")
chunks = await chunker.chunk_document(
    content=parsed_doc.markdown,
    title="Document Title",
    source="document.pdf"
)
# 3. Generate embeddings
embedder = create_embedder(provider="azure")
await embedder.initialize()
embedded_chunks = await embedder.embed_chunks(chunks)
# 4. Store in vector database
vector_store = create_vector_store(provider="cosmosdb")
success, failed = await vector_store.store_embeddings_batch(embedded_chunks)
# 5. Search
query_embedding = await embedder.generate_embedding("search query")
results = await vector_store.similarity_search(query_embedding, top_k=10)
Module Overview
Parsers (ashmatics_tools.parsers)
Document parsing with multiple backends:
- SimpleParser: Basic fallback parser
- DoclingParser: Advanced PDF parsing with tables/figures
- LlamaParser: LlamaParse cloud service integration
- Factory: create_parser(provider)
Chunkers (ashmatics_tools.chunkers)
Document chunking strategies:
- SimpleChunker: Paragraph-based chunking
- AzureChunker: Azure-compatible with tiktoken
- DoclingChunker: Token-aware semantic chunking
- Factory: create_chunker(strategy)
Embedders (ashmatics_tools.embedders)
Embedding generation:
- AzureEmbedder: Azure OpenAI embeddings
- OpenAIEmbedder: OpenAI embeddings
- Factory: create_embedder(provider)
Embedding Pipelines (ashmatics_tools.embedding)
MongoDB-based embedding workflows:
- MongoDBEmbeddingPipeline: Generic pipeline
- Specialized Pipelines: Framework, use cases, cards
Vector Stores (ashmatics_tools.vector_stores)
Vector database integrations:
- CosmosDBVectorStore: Azure CosmosDB with MongoDB vCore API
- PgVectorStore: PostgreSQL with pgvector extension
- QdrantVectorStore: Qdrant vector database
- Factory: create_vector_store(provider)
Storage Backends (ashmatics_tools.storage)
Cloud-agnostic storage abstraction:
- ADLSStorageClient: Azure Data Lake Storage Gen2 with dual auth (connection string or DefaultAzureCredential)
- MinIOStorageClient: MinIO object storage (S3-compatible)
- S3StorageClient: AWS S3 (reserved for future implementation)
- Factory: create_storage_client(provider, config)
- Features: Async API, buffered and streaming reads/writes, glob pattern matching, metadata operations
LLM Clients (ashmatics_tools.llm)
Unified interface for language model providers:
- AzureOpenAIClient: Azure OpenAI Service
- OpenAIClient: OpenAI direct API
- HuggingFaceInferenceClient: HuggingFace Inference API (requires the [huggingface] extra)
- HuggingFaceLocalClient: Local HuggingFace models (requires the [huggingface] extra)
- AzureAIFoundryClient: Full Azure AI Foundry model catalog (requires the [azure-ai] extra)
- OllamaClient: Local/ACA/K8s Ollama inference with SDK (requires the [ollama] extra) - embeddings, vision, tools, model management
- Factory: create_llm_client(provider, config) with plugin registry
- Features: Async-first API, unified completion interface, cost tracking, plugin registry, extensible via register_llm_provider() (see the sketch below)
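A hypothetical sketch of the plugin registry (the register_llm_provider() signature and the client interface shown here are assumptions; only the extension point itself is documented above):

from ashmatics_tools.llm import create_llm_client, register_llm_provider

class MyGatewayClient:
    # Illustrative custom client; a real one would implement the library's LLM client interface
    def __init__(self, config):
        self.config = config

    async def complete(self, prompt: str, **kwargs):
        ...  # call an internal LLM gateway here

# Register under a provider name, then create it through the usual factory
register_llm_provider("my_gateway", MyGatewayClient)
llm = create_llm_client("my_gateway", config={"endpoint": "https://llm.internal.example.com"})  # config shape is illustrative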
Ontology Services (ashmatics_tools.ontology)
Medical ontology management and term services:
- TermResolver: MongoDB-based term lookup and resolution
- CategoryManager: Hierarchical category management for document tagging
- BioPortalClient: External ontology validation via NCBO BioPortal API (SNOMED CT, RADLEX, LOINC, NCIT)
- AshmaticsOntology: Custom ASHMATICS domain-specific ontology for medical imaging AI concepts
- Features: Async API, comprehensive schema validation, integration with external ontologies
External APIs (ashmatics_tools.external_apis)
Clients for external data sources with robust error handling:
- OpenFDAClient: US FDA Open Data Portal (open.fda.gov) integration
- AccessGUDIDClient: NIH/FDA Global Unique Device Identification Database (accessgudid.nlm.nih.gov) integration
- BaseAPIClient: Abstract base for creating custom API clients
- OpenFDA Endpoints: Device 510(k), adverse events, recalls, drug labels, FAERS, enforcement actions
- AccessGUDID Endpoints: Device lookup, UDI parsing, device history, SNOMED mappings, implantable device listings
- Factory: create_api_client(provider, config) with plugin registry
- Features: Async API, retry with exponential backoff, client-side rate limiting, automatic pagination
- Query Syntax: Support for field search, date ranges, boolean operators, wildcards (OpenFDA)
- Extensibility: Register custom providers via register_api_provider() for Census, CMS, etc.
MCP Servers (ashmatics_tools.mcp_servers)
Model Context Protocol servers for LLM integration:
- OpenFDAMCPServer: Expose OpenFDA API as LLM tools (search_devices, search_drugs, count_by_field)
- AccessGUDIDMCPServer: Expose AccessGUDID API as LLM tools (lookup_device, parse_udi, get_device_history, get_device_snomed, list_implantable_devices)
- BaseMCPServer: Abstract base for creating MCP tool servers
- Factory: create_mcp_server(name, config) with plugin registry
- Features: JSON Schema validation, response formatting, error handling, streaming support
- Use Case: Thin adapter layer between LLMs and external data sources
- Extensibility: Register custom servers via register_mcp_server()
- Stdio Runner: Run servers via python -m ashmatics_tools.mcp_servers.{openfda,accessgudid}
Search/RAG (ashmatics_tools.search)
RAG (Retrieval-Augmented Generation) strategies for AI-powered search:
- SimpleRAGStrategy: Basic RAG flow with embed → retrieve → generate
- MultiQueryRAGStrategy: Query expansion with parallel retrieval and RRF ranking
- RAGConfig: Configuration for top_k, temperature, max_tokens, system_prompt
- RAGResult: Answer with sources, metrics, and metadata
- RAGStreamChunk: Streaming response chunks with partial sources
- Factory: create_search_strategy(name, llm, vector_store, embedder, config) with plugin registry
- MCP Tools: Generic tool definitions (rag_search, semantic_search, multi_query_search)
- Context Management: ContextWindowManager for automatic source fitting
- Model Presets: ModelContextLimits for GPT-4, Claude, Llama, Mistral
- Features: Async-first API, streaming support, ADR-045 governance metadata
Enrichers (ashmatics_tools.enrichers)
Post-parsing document enrichment for tables and extracted data:
- TableClassifier: LLM-based table categorization by content type
- TableConsolidator: Multi-page table merge with heuristics and LLM validation
- MetricsExtractor: Performance metrics extraction with statistical context
- TrainingDataExtractor: AI/ML training dataset characteristics extraction
- DomainKnowledgeProvider: Abstract base for domain-specific context injection
- Categories: COMPARISON, PERFORMANCE_METRICS, STUDY_DESIGN, TECHNICAL_SPECS, DEMOGRAPHICS, etc.
- Features: Handles PDF parser fragmentation, continuation markers, column similarity matching
Document Storage (ashmatics_tools.document_storage)
Artifact storage managers for document processing outputs:
- FigureStorageManager: Figure filtering, PNG conversion, content-addressed storage
- TableStorageManager: Dual-format (Markdown + JSON) table storage
- ProcessedFigure: Dataclass for processed figures with metadata
- StoredTable: Dataclass for stored tables with file paths
- Features: Automatic manifest generation, content hashing, size filtering