DDN Metadata Bootstrap
AI-powered metadata enhancement for Hasura DDN (Data Delivery Network) schema files. Automatically generate descriptions and detect sophisticated relationships in your YAML/HML schema definitions using advanced AI and intelligent pattern recognition.
Features
- AI-Powered Descriptions: Generate natural language descriptions for schema elements using Anthropic's Claude
- Advanced Relationship Detection:
  - Foreign key relationships with confidence scoring
  - Shared business key many-to-many relationships
  - Bidirectional relationship generation
  - camelCase/snake_case field name support
  - Safe incremental enhancement (preserves existing relationships)
- Domain Analysis: Intelligent analysis of business domains and terminology
- Batch Processing: Process entire directories of schema files efficiently
- DDN Optimized: Specifically designed for Hasura DDN schema structures
- Configurable: Extensive configuration options via environment variables or CLI
- Model-Aware: Only processes ObjectTypes with associated Models for production-ready relationships
Installation
From PyPI (Recommended)
pip install ddn-metadata-bootstrap
From Source
git clone https://github.com/hasura/ddn-metadata-bootstrap.git
cd ddn-metadata-bootstrap
pip install -e .
Quick Start
1. Set up your environment
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export METADATA_BOOTSTRAP_INPUT_DIR="./input"
export METADATA_BOOTSTRAP_OUTPUT_DIR="./output"
2. Run the tool
# Process entire directory
ddn-metadata-bootstrap
# Or with CLI arguments
ddn-metadata-bootstrap --input-dir ./schema --output-dir ./enhanced --api-key YOUR_KEY
3. Or use as a Python package
from ddn_metadata_bootstrap import MetadataBootstrapper
bootstrapper = MetadataBootstrapper(
    api_key="your-anthropic-api-key",
    use_case="E-commerce platform"
)
# Process directory
bootstrapper.process_directory("./input", "./output")
# Get statistics
stats = bootstrapper.get_statistics()
print(f"Generated {stats['relationships_generated']} relationships")
print(f"FK relationships: {stats['fk_relationships']}")
print(f"Shared field relationships: {stats['shared_field_relationships']}")
Examples
Schema Description Enhancement
Input HML File
kind: ObjectType
version: v1
definition:
  name: User
  fields:
    - name: id
      type: ID!
    - name: email
      type: String!
    - name: created_at
      type: String
Enhanced Output
kind: ObjectType
version: v1
definition:
  name: User
  description: |
    Represents a user account in the system with authentication
    and profile information.
  fields:
    - name: id
      type: ID!
      description: Unique identifier for the user account.
    - name: email
      type: String!
      description: User's email address for authentication and communication.
    - name: created_at
      type: String
      description: Timestamp when the user account was created.
Relationship Detection Examples
Foreign Key Detection (camelCase Support)
# Input Schema
kind: ObjectType
definition:
  name: UserProfile
  fields:
    - name: userId # camelCase field
      type: String
    - name: companyId # camelCase field
      type: String
# Generated Relationships
---
kind: Relationship
definition:
  name: user # Forward relationship
  sourceType: UserProfile
  target:
    model:
      name: User
      relationshipType: Object
  mapping:
    - source:
        fieldPath: [fieldName: userId] # Original camelCase preserved
      target:
        modelField: [fieldName: id]
---
kind: Relationship
definition:
  name: userProfilesByUser # Reverse relationship
  sourceType: User
  target:
    model:
      name: UserProfile
      relationshipType: Array
  mapping:
    - source:
        fieldPath: [fieldName: id]
      target:
        modelField: [fieldName: userId] # Original camelCase preserved
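The forward and reverse definitions above differ only in which side is the source and in the relationship type. As a rough illustration, the pair could be assembled as plain Python dictionaries that mirror the YAML layout. The helper names `build_relationship` and `build_bidirectional` are hypothetical and are not part of the package's documented API.

```python
def build_relationship(name, source_type, target_model, relationship_type,
                       source_field, target_field):
    """Return a dict shaped like the Relationship YAML shown above."""
    return {
        "kind": "Relationship",
        "definition": {
            "name": name,
            "sourceType": source_type,
            "target": {
                "model": {
                    "name": target_model,
                    "relationshipType": relationship_type,
                }
            },
            "mapping": [
                {
                    "source": {"fieldPath": [{"fieldName": source_field}]},
                    "target": {"modelField": [{"fieldName": target_field}]},
                }
            ],
        },
    }


def build_bidirectional(child_type, child_field, parent_type, parent_field,
                        forward_name, reverse_name):
    """Build the many-to-one (Object) and one-to-many (Array) pair."""
    forward = build_relationship(
        forward_name, child_type, parent_type, "Object", child_field, parent_field
    )
    reverse = build_relationship(
        reverse_name, parent_type, child_type, "Array", parent_field, child_field
    )
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = build_bidirectional(
        "UserProfile", "userId", "User", "id", "user", "userProfilesByUser"
    )
    print(fwd["definition"]["name"], "->", rev["definition"]["name"])
```

Field names are passed through untouched, which matches the documented behavior of preserving the original camelCase in the output.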
Shared Business Key Detection (Many-to-Many)
# Input: Multiple entities with shared business fields
# Entity A
kind: ObjectType
definition:
  name: Application
  fields:
    - name: category # Shared business key
      type: String
    - name: version # Shared business key
      type: String
# Entity B
kind: ObjectType
definition:
  name: PolicyCompliance
  fields:
    - name: category # Same business key
      type: String
# Generated Many-to-Many Relationship
---
kind: Relationship
definition:
  name: policyCompliancesByCategory
  sourceType: Application
  target:
    model:
      name: PolicyCompliance
      relationshipType: Array # Many-to-many via shared key
  mapping:
    - source:
        fieldPath: [fieldName: category]
      target:
        modelField: [fieldName: category]
What It Does
1. AI-Powered Description Generation
- Analyzes schema element names and types for context
- Generates human-readable descriptions using Anthropic's Claude
- Respects character limits and DDN style guidelines
- Supports field-level and entity-level descriptions
- Understands business domain terminology
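The character limits mentioned above can be pictured as a post-processing step applied to whatever text the model returns. The following is only a sketch under that assumption: `clamp_description` is a hypothetical helper, and the default of 120 simply mirrors the example value of METADATA_BOOTSTRAP_FIELD_DESC_MAX_LENGTH shown in the configuration section.

```python
def clamp_description(text: str, max_length: int = 120) -> str:
    """Trim a description to max_length characters, cutting at a word boundary."""
    # Collapse newlines and repeated spaces so the length check is meaningful.
    text = " ".join(text.split())
    if len(text) <= max_length:
        return text
    # Reserve three characters for the trailing ellipsis.
    cut = text[: max_length - 3]
    # Back up to the last complete word if there is one.
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:.") + "..."


if __name__ == "__main__":
    long_text = "Timestamp when the user account was created. " * 5
    print(clamp_description(long_text, max_length=60))
```

Cutting at a word boundary keeps truncated descriptions readable; a real implementation might instead ask the model to regenerate a shorter description.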
2. Advanced Relationship Detection
Foreign Key Detection
- Pattern Recognition: Detects FK patterns like user_id, userId, company_id, departmentName
- camelCase Support: Handles userId → user_id conversion for analysis while preserving original field names (automatic)
- Confidence Scoring: Uses minimum confidence thresholds to prevent spurious relationships
- Bidirectional Generation: Creates both forward (many-to-one) and reverse (one-to-many) relationships
- Cross-Subgraph Intelligence: Smart entity matching across subgraph boundaries
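To make the camelCase handling concrete, here is a minimal sketch of the idea: normalize the field name to snake_case for analysis only, then look for an entity whose normalized name matches the part before `_id`. The functions `to_snake_case` and `guess_fk_target` are illustrative names, and restricting the match to the `<entity>_id` pattern is a simplifying assumption; the tool's actual templates are configurable and may differ.

```python
import re
from typing import Iterable, Optional


def to_snake_case(name: str) -> str:
    """Convert camelCase or PascalCase to snake_case; snake_case is unchanged."""
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def guess_fk_target(field_name: str, entity_names: Iterable[str]) -> Optional[str]:
    """Return the entity a field probably references, or None.

    Assumption: only fields that normalize to `<entity>_id` are considered.
    """
    snake = to_snake_case(field_name)
    if not snake.endswith("_id"):
        return None
    candidate = snake[: -len("_id")]
    # Index entities by normalized name but return the original spelling.
    by_snake = {to_snake_case(entity): entity for entity in entity_names}
    return by_snake.get(candidate)


if __name__ == "__main__":
    entities = ["User", "Company"]
    for field in ["userId", "company_id", "lastUsedFileName"]:
        print(field, "->", guess_fk_target(field, entities))
```

The original field name is never rewritten; the normalized form exists only inside the lookup.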
Shared Business Key Detection
- Business Logic Focus: Identifies meaningful shared fields like category, version, customer_id, project_code
- Many-to-Many Relationships: Creates bidirectional many-to-many relationships via business keys
- Generic Field Filtering: Excludes meaningless generic fields (id, name, status) to focus on business relationships
- Mixed Naming Support: Handles departmentName ↔ department_name field matching
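The shared-key idea can be sketched as a pairwise comparison of normalized field names, skipping anything on a generic-field list. This is an illustration under assumptions, not the package's implementation: `shared_business_keys` is a hypothetical function, and `GENERIC_FIELDS` is a shortened version of the METADATA_BOOTSTRAP_GENERIC_FIELDS example from the configuration section.

```python
import re
from itertools import combinations

# Assumed subset of the generic fields listed in the configuration example.
GENERIC_FIELDS = {"id", "key", "name", "status", "type", "created", "updated"}


def normalize(name):
    """Normalize camelCase or snake_case to snake_case for comparison only."""
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def shared_business_keys(entities):
    """Return (entity_a, field_a, entity_b, field_b) for each shared non-generic field.

    `entities` maps an entity name to its list of field names.
    """
    pairs = []
    for (name_a, fields_a), (name_b, fields_b) in combinations(sorted(entities.items()), 2):
        index_b = {normalize(f): f for f in fields_b}
        for field_a in fields_a:
            key = normalize(field_a)
            if key in GENERIC_FIELDS:
                continue  # generic fields carry no business meaning
            if key in index_b:
                # Keep each side's original spelling for the generated mapping.
                pairs.append((name_a, field_a, name_b, index_b[key]))
    return pairs


if __name__ == "__main__":
    schema = {
        "Application": ["id", "category", "version"],
        "PolicyCompliance": ["id", "category"],
    }
    print(shared_business_keys(schema))
```

Each returned tuple would yield two Array relationships, one in each direction.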
Quality & Precision
- Model Association Requirement: Only processes ObjectTypes that have associated Models (production-ready constraint)
- Confidence Thresholds: Rejects weak matches (e.g., lastUsedFileName → spurious entity matches)
- Relationship Deduplication: Detects and avoids creating duplicate relationships with same field mappings (prevents equivalent relationships with different names)
- Existing Relationship Protection: Never overwrites existing relationship definitions in your schema files
- Automatic Field Preservation: Always maintains exact original field names in generated YAML (not configurable)
3. Domain Analysis
- Extracts business terminology from schema structure
- Identifies domain-specific patterns and relationships
- Provides contextual AI prompts based on detected domains
- Supports configurable domain-specific relationship hints
4. Schema Enhancement
- Preserves original schema structure and formatting
- Adds descriptions without breaking DDN functionality
- Generates proper DDN relationship definitions without overwriting existing ones
- Maintains YAML formatting, comments, and field order
- Handles complex nested structures and cross-references
- Smart deduplication: Won't create redundant relationships even if they have different names but same field mappings
Configuration
Environment Variables
All configuration can be done via environment variables with the METADATA_BOOTSTRAP_ prefix:
# Required
ANTHROPIC_API_KEY=your_api_key_here
# Input/Output (choose one mode)
METADATA_BOOTSTRAP_INPUT_DIR=./input
METADATA_BOOTSTRAP_OUTPUT_DIR=./output
# OR single file mode
METADATA_BOOTSTRAP_INPUT_FILE=./schema.hml
METADATA_BOOTSTRAP_OUTPUT_FILE=./enhanced.hml
# AI Configuration
METADATA_BOOTSTRAP_USE_CASE="E-commerce platform"
METADATA_BOOTSTRAP_MODEL=claude-3-haiku-20240307
METADATA_BOOTSTRAP_FIELD_DESC_MAX_LENGTH=120
METADATA_BOOTSTRAP_KIND_DESC_MAX_LENGTH=250
# Relationship Detection
METADATA_BOOTSTRAP_GENERIC_FIELDS="_id,_key,id,key,name,status,type,created,updated"
METADATA_BOOTSTRAP_FK_TEMPLATES="{pt}_{gi}|{gi},{fs}_{pt}_{gi}|{gi}"
METADATA_BOOTSTRAP_DOMAIN_IDENTIFIERS="user,customer,order,product,company"
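For readers wiring these variables into their own scripts, the sketch below shows one way to read them into a typed object. It is not the package's loader: `BootstrapConfig` and `load_config` are hypothetical names, the numeric fallbacks reuse the example values above rather than confirmed defaults, and treating the two input/output modes as mutually exclusive is an assumption based on the "choose one mode" comment.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

PREFIX = "METADATA_BOOTSTRAP_"


@dataclass
class BootstrapConfig:
    api_key: str
    input_dir: Optional[str] = None
    output_dir: Optional[str] = None
    input_file: Optional[str] = None
    output_file: Optional[str] = None
    use_case: Optional[str] = None
    field_desc_max_length: int = 120   # example value, assumed as fallback
    kind_desc_max_length: int = 250    # example value, assumed as fallback
    generic_fields: list = field(default_factory=list)


def load_config(env: Mapping[str, str] = os.environ) -> BootstrapConfig:
    """Build a config object from environment-style key/value pairs."""
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise ValueError("ANTHROPIC_API_KEY is required")

    def get(name: str) -> Optional[str]:
        return env.get(PREFIX + name)

    dir_mode = bool(get("INPUT_DIR") and get("OUTPUT_DIR"))
    file_mode = bool(get("INPUT_FILE") and get("OUTPUT_FILE"))
    if dir_mode == file_mode:
        raise ValueError("Set exactly one of directory mode or single-file mode")

    raw_generic = get("GENERIC_FIELDS") or ""
    return BootstrapConfig(
        api_key=api_key,
        input_dir=get("INPUT_DIR"),
        output_dir=get("OUTPUT_DIR"),
        input_file=get("INPUT_FILE"),
        output_file=get("OUTPUT_FILE"),
        use_case=get("USE_CASE"),
        field_desc_max_length=int(get("FIELD_DESC_MAX_LENGTH") or 120),
        kind_desc_max_length=int(get("KIND_DESC_MAX_LENGTH") or 250),
        generic_fields=[f.strip() for f in raw_generic.split(",") if f.strip()],
    )


if __name__ == "__main__":
    example = {
        "ANTHROPIC_API_KEY": "your_api_key_here",
        "METADATA_BOOTSTRAP_INPUT_DIR": "./input",
        "METADATA_BOOTSTRAP_OUTPUT_DIR": "./output",
        "METADATA_BOOTSTRAP_GENERIC_FIELDS": "id,key,name",
    }
    print(load_config(example))
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.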
CLI Arguments
ddn-metadata-bootstrap --help
Options:
--input-dir PATH Input directory containing HML files
--output-dir PATH Output directory for enhanced files
--input-file PATH Single input HML file
--output-file PATH Single output HML file
--api-key TEXT Anthropic API key
--use-case TEXT Business domain description
--model TEXT AI model to use
--field-max-length INTEGER Max characters for field descriptions
--kind-max-length INTEGER Max characters for kind descriptions
--verbose Enable verbose logging
--dry-run Validate configuration without processing
--stats Show relationship detection statistics
Relationship Detection Deep Dive
Smart Deduplication Example
# Existing relationship in schema
kind: Relationship
definition:
  name: userAccount # Custom name
  sourceType: UserProfile
  mapping:
    - source:
        fieldPath: [fieldName: userId]
      target:
        modelField: [fieldName: id]
# Tool detects same mapping and WON'T create:
# - Another relationship with same UserProfile.userId -> User.id mapping
# - Even if it would be named differently (like "user")
# Result: No duplicate/redundant relationships generated
Supported Field Naming Conventions
| Input Field | Analysis | FK Detection | Shared Field | Output Preserved |
|---|---|---|---|---|
| userId | user_id | Yes: user entity | Yes: matches user_id | userId |
| user_id | user_id | Yes: user entity | Yes: matches userId | user_id |
| companyId | company_id | Yes: company entity | Yes: business key | companyId |
| departmentName | department_name | No: not FK pattern | Yes: shared field | departmentName |
| lastUsedFileName | last_used_file_name | No: low confidence | No: business data | lastUsedFileName |
Relationship Types Generated
- Forward FK Relationships (Many-to-One)
  - UserProfile.user → User (Object)
  - Based on userId/user_id fields
- Reverse FK Relationships (One-to-Many)
  - User.userProfilesByUser → UserProfile[] (Array)
  - Contextual naming with field reference
- Shared Field Relationships (Many-to-Many)
  - Application.policiesByCategory → Policy[] (Array)
  - Policy.applicationsByCategory → Application[] (Array)
  - Based on shared business keys
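The names in this list follow a visible pattern: the forward name is the FK field without its id suffix, and reverse or shared names combine the pluralized target type with the field they are reached through. The sketch below reproduces that pattern for the examples shown; the function names and the very small pluralization rule are assumptions, and the tool's real naming logic may handle more cases.

```python
import re


def lower_first(name):
    return name[:1].lower() + name[1:]


def upper_first(name):
    return name[:1].upper() + name[1:]


def pluralize(word):
    """Naive English plural, enough for the examples in this README."""
    if word.endswith("y") and word[-2:-1] not in "aeiou":
        return word[:-1] + "ies"
    if word.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"


def forward_name(fk_field):
    """userId or user_id -> user."""
    base = re.sub(r"(_id|Id)$", "", fk_field)
    return lower_first(base or fk_field)


def reverse_name(target_type, via):
    """Target type plus the field or relationship it is reached by.

    reverse_name("UserProfile", "user") -> userProfilesByUser
    """
    return lower_first(pluralize(target_type)) + "By" + upper_first(via)


if __name__ == "__main__":
    print(forward_name("userId"))
    print(reverse_name("UserProfile", forward_name("userId")))
    print(reverse_name("Policy", "category"))
```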
Quality Filters
- Confidence Scoring: FK matches must score ≥ 50 to prevent spurious relationships
- Model Association: Only ObjectTypes with Models participate in relationships
- Generic Field Exclusion: Shared field detection ignores id, name, key, status, etc.
- Business Logic Validation: Prevents meaningless connections like audit fields → entities
- Smart Deduplication: Analyzes existing relationship mappings to avoid creating functionally equivalent relationships (even with different names)
- Existing Relationship Protection: Scans for and preserves existing relationship definitions; never overwrites them
- Automatic Field Preservation: Original field names (userId, department_name) are always preserved in output; this behavior is built in and not configurable
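The README does not say how the confidence score is computed, so the following is purely a stand-in to show how a threshold of 50 filters candidates. It uses the string similarity ratio from Python's `difflib` in place of the tool's actual scoring; `fk_confidence`, `accept`, and `MIN_CONFIDENCE` are hypothetical names.

```python
import re
from difflib import SequenceMatcher

MIN_CONFIDENCE = 50  # threshold quoted in the list above


def to_snake_case(name):
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def fk_confidence(field_name, entity_name):
    """Score 0-100 for how well a field name matches an entity name.

    Assumption: plain string similarity stands in for the real scoring.
    """
    field = to_snake_case(field_name)
    if field.endswith("_id"):
        field = field[: -len("_id")]
    entity = to_snake_case(entity_name)
    return round(100 * SequenceMatcher(None, field, entity).ratio())


def accept(field_name, entity_name):
    """Apply the minimum-confidence filter."""
    return fk_confidence(field_name, entity_name) >= MIN_CONFIDENCE


if __name__ == "__main__":
    for field, entity in [("userId", "User"), ("lastUsedFileName", "File")]:
        print(field, entity, fk_confidence(field, entity), accept(field, entity))
```

With this stand-in, an exact match such as userId against User clears the threshold, while lastUsedFileName against File falls below it because only a small part of the field name overlaps with the entity name.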
Architecture
The tool is built with a modular architecture:
- ai/: AI integration and description generation
- schema/: Schema analysis and metadata collection
- relationships/: Advanced relationship detection and generation
  - detector.py: FK and shared field pattern detection
  - generator.py: YAML relationship definition creation
  - mapper.py: Relationship orchestration and context
- processors/: File and directory processing
- utils/: Text processing, YAML handling, path utilities
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=ddn_metadata_bootstrap
# Test relationship detection specifically
pytest tests/test_relationships.py -v
# Type checking
mypy ddn_metadata_bootstrap/
# Code formatting
black ddn_metadata_bootstrap/
Statistics & Reporting
The tool provides detailed statistics on processing:
stats = bootstrapper.get_statistics()
print(f"Entities processed: {stats['entities_processed']}")
print(f"Entities with Models: {stats['entities_with_models']}")
print(f"Total relationships: {stats['relationships_generated']}")
print(f"FK relationships: {stats['fk_relationships']}")
print(f"Shared field relationships: {stats['shared_field_relationships']}")
print(f"Cross-subgraph relationships: {stats['cross_subgraph_relationships']}")
print(f"Descriptions generated: {stats['descriptions_generated']}")
Future Enhancements
The current implementation focuses on metadata/schema analysis for fast, secure relationship detection. Several exciting enhancements could build on this foundation:
Data Analysis Validation
# Validate detected relationships against actual data
ddn-metadata-bootstrap --validate-with-data --db-connection postgresql://...
- Referential Integrity Checking: Verify that user_id values actually exist in users.id
- Statistical Confidence: "94% of order.customer_id values found in customers.id"
- Orphaned Relationship Detection: Find foreign key fields that don't reference anything
- Convention-Independent Discovery: Detect relationships regardless of naming patterns
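Since this validation is described as a possible future feature, nothing below exists in the tool today. The sketch only illustrates the arithmetic behind a statement like "94% of order.customer_id values found in customers.id", using in-memory lists in place of a database connection; `match_rate` and `orphans` are hypothetical names.

```python
def match_rate(child_values, parent_keys):
    """Fraction of non-null child values that exist among the parent keys."""
    parents = set(parent_keys)
    present = [value for value in child_values if value is not None]
    if not present:
        return 0.0
    return sum(value in parents for value in present) / len(present)


def orphans(child_values, parent_keys):
    """Distinct child values with no matching parent key."""
    parents = set(parent_keys)
    return sorted({v for v in child_values if v is not None and v not in parents})


if __name__ == "__main__":
    order_customer_ids = [1, 2, 2, 3, 99, None]
    customer_ids = [1, 2, 3]
    rate = match_rate(order_customer_ids, customer_ids)
    print(f"{rate:.0%} of order.customer_id values found in customers.id")
    print("orphaned values:", orphans(order_customer_ids, customer_ids))
```

Nulls are excluded from the denominator here on the assumption that a missing foreign key is not a broken reference.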
Advanced Pattern Recognition
- Machine Learning Field Classification: Train models to recognize relationship patterns beyond naming conventions
- Semantic Analysis: Use NLP to understand field meanings (customer_ref → clients.id)
- Cross-Database Pattern Learning: Learn from relationship patterns across multiple schemas
- Domain-Specific Templates: Industry-specific relationship detection (e-commerce, healthcare, finance)
Enhanced AI Integration
- Relationship Validation: Ask AI to validate if detected relationships make business sense
- Missing Relationship Suggestions: AI-powered analysis of potential missing connections
- Relationship Documentation: Auto-generate business logic explanations for relationships
- Schema Quality Scoring: Overall relationship completeness and quality metrics
Why Metadata-First Design Enables These
The current schema analysis foundation provides:
- Fast Detection: Quick feedback for iterative development
- Security: No production data access required for core functionality
- Schema Validation: Works on schemas before data exists
- DDN Integration: Native support for Hasura DDN patterns
Data analysis features would be additive enhancements that complement rather than replace the metadata approach, providing validation and discovery capabilities for mature schemas with production data.
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Add tests for new relationship detection patterns
- Update configuration documentation for new options
- Follow the existing code style and architecture
- Include examples in docstrings for complex functions
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation
- Bug Reports
- Discussions
- Relationship Detection Issues
Version History
See CHANGELOG.md for version history and breaking changes.
Acknowledgments
- Built for Hasura DDN
- Powered by Anthropic Claude
- Inspired by the GraphQL and OpenAPI communities
- Relationship detection algorithms inspired by database schema analysis tools
Made with ❤️ by the Hasura team