AI-powered metadata enhancement for Hasura DDN schema files

DDN Metadata Bootstrap

AI-powered metadata enhancement for Hasura DDN (Data Delivery Network) schema files. The tool generates descriptions and detects foreign-key and shared-business-key relationships in your YAML/HML schema definitions, using Anthropic's Claude for descriptions and name-based pattern matching for relationships.

🚀 Features

🤖 AI-Powered Descriptions: Generate natural language descriptions for schema elements using Anthropic's Claude
🔗 Advanced Relationship Detection:
- Foreign key relationships with confidence scoring
- Shared business key many-to-many relationships
- Bidirectional relationship generation
- camelCase/snake_case field name support
- Safe incremental enhancement (preserves existing relationships)
📊 Domain Analysis: Intelligent analysis of business domains and terminology
⚡ Batch Processing: Process entire directories of schema files efficiently
🎯 DDN Optimized: Specifically designed for Hasura DDN schema structures
🔧 Configurable: Extensive configuration options via environment variables or CLI
🏗️ Model-Aware: Only processes ObjectTypes with associated Models for production-ready relationships

📦 Installation

From PyPI (Recommended)

pip install ddn-metadata-bootstrap

From Source

git clone https://github.com/hasura/ddn-metadata-bootstrap.git
cd ddn-metadata-bootstrap
pip install -e .

🏃 Quick Start

1. Set up your environment

export ANTHROPIC_API_KEY="your-anthropic-api-key"
export METADATA_BOOTSTRAP_INPUT_DIR="./input"
export METADATA_BOOTSTRAP_OUTPUT_DIR="./output"

2. Run the tool

# Process entire directory
ddn-metadata-bootstrap

# Or with CLI arguments
ddn-metadata-bootstrap --input-dir ./schema --output-dir ./enhanced --api-key YOUR_KEY

3. Or use as a Python package

from ddn_metadata_bootstrap import MetadataBootstrapper

bootstrapper = MetadataBootstrapper(
    api_key="your-anthropic-api-key",
    use_case="E-commerce platform"
)

# Process directory
bootstrapper.process_directory("./input", "./output")

# Get statistics
stats = bootstrapper.get_statistics()
print(f"Generated {stats['relationships_generated']} relationships")
print(f"FK relationships: {stats['fk_relationships']}")
print(f"Shared field relationships: {stats['shared_field_relationships']}")

📝 Examples

Schema Description Enhancement

Input HML File

kind: ObjectType
version: v1
definition:
  name: User
  fields:
    - name: id
      type: ID!
    - name: email
      type: String!
    - name: created_at
      type: String

Enhanced Output

kind: ObjectType
version: v1
definition:
  name: User
  description: |
    Represents a user account in the system with authentication
    and profile information.
  fields:
    - name: id
      type: ID!
      description: Unique identifier for the user account.
    - name: email
      type: String!
      description: User's email address for authentication and communication.
    - name: created_at
      type: String
      description: Timestamp when the user account was created.

Relationship Detection Examples

Foreign Key Detection (camelCase Support)

# Input Schema
kind: ObjectType
definition:
  name: UserProfile
  fields:
    - name: userId        # camelCase field
      type: String
    - name: companyId     # camelCase field  
      type: String

# Generated Relationships
---
kind: Relationship
definition:
  name: user             # Forward relationship
  sourceType: UserProfile
  target:
    model:
      name: User
      relationshipType: Object
  mapping:
  - source:
      fieldPath: [fieldName: userId]    # Original camelCase preserved
    target:
      modelField: [fieldName: id]
---
kind: Relationship  
definition:
  name: userProfilesByUser    # Reverse relationship
  sourceType: User
  target:
    model:
      name: UserProfile
      relationshipType: Array
  mapping:
  - source:
      fieldPath: [fieldName: id]
    target:
      modelField: [fieldName: userId]  # Original camelCase preserved

Shared Business Key Detection (Many-to-Many)

# Input: Multiple entities with shared business fields
# Entity A
kind: ObjectType
definition:
  name: Application
  fields:
    - name: category      # Shared business key
      type: String
    - name: version       # Shared business key
      type: String

---
# Entity B
kind: ObjectType
definition:
  name: PolicyCompliance
  fields:
    - name: category      # Same business key
      type: String

# Generated Many-to-Many Relationship
---
kind: Relationship
definition:
  name: policyCompliancesByCategory
  sourceType: Application
  target:
    model:
      name: PolicyCompliance
      relationshipType: Array      # Many-to-many via shared key
  mapping:
  - source:
      fieldPath: [fieldName: category]
    target:
      modelField: [fieldName: category]

🔄 What It Does

1. AI-Powered Description Generation

Analyzes schema element names and types for context
Generates human-readable descriptions using Anthropic's Claude
Respects character limits and DDN style guidelines
Supports field-level and entity-level descriptions
Understands business domain terminology
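
The character-limit behavior can be pictured with a small helper. This is a minimal sketch, not the tool's actual code: the function name `clamp_description` and the word-boundary truncation strategy are assumptions made for illustration, and it assumes `max_length` is greater than 3.

```python
def clamp_description(text: str, max_length: int) -> str:
    """Collapse whitespace and trim to max_length, cutting at a word boundary.

    Hypothetical helper: the real tool may enforce limits differently
    (for example, by asking the model to regenerate a shorter description).
    """
    text = " ".join(text.split())  # normalize newlines and repeated spaces
    if len(text) <= max_length:
        return text
    # Reserve three characters for the ellipsis, then back up to a full word.
    cut = text[: max_length - 3].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:.") + "..."


print(clamp_description("Timestamp when the user account was created in the system", 30))
```

The limits themselves would come from settings such as METADATA_BOOTSTRAP_FIELD_DESC_MAX_LENGTH and METADATA_BOOTSTRAP_KIND_DESC_MAX_LENGTH.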

2. Advanced Relationship Detection

Foreign Key Detection

Pattern Recognition: Detects FK patterns like user_id, userId, company_id, companyId
camelCase Support: Handles userId → user_id conversion for analysis while preserving original field names (automatic)
Confidence Scoring: Uses minimum confidence thresholds to prevent spurious relationships
Bidirectional Generation: Creates both forward (many-to-one) and reverse (one-to-many) relationships
Cross-Subgraph Intelligence: Smart entity matching across subgraph boundaries
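
The normalization and pattern-matching steps above can be sketched as follows. This is an illustrative approximation under stated assumptions: the function names `to_snake_case` and `fk_candidate` are hypothetical, only the `<entity>_id` pattern is handled, and the tool's confidence scoring is replaced by a plain yes/no match.

```python
import re
from typing import Optional, Set


def to_snake_case(name: str) -> str:
    """userId -> user_id; names already in snake_case pass through unchanged."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def fk_candidate(field_name: str, entity_names: Set[str]) -> Optional[str]:
    """Return the referenced entity if the field looks like '<entity>_id'.

    Hypothetical simplification: the original field name is never modified,
    only a normalized copy is analyzed.
    """
    snake = to_snake_case(field_name)
    if not snake.endswith("_id"):
        return None
    prefix = snake[: -len("_id")]
    return prefix if prefix in entity_names else None


entities = {"user", "company"}
for field in ["userId", "company_id", "departmentName", "lastUsedFileName"]:
    print(field, "->", fk_candidate(field, entities))
```

Because the analysis runs on a normalized copy, the generated YAML can keep `userId` or `user_id` exactly as written in the source schema.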

Shared Business Key Detection

Business Logic Focus: Identifies meaningful shared fields like category, version, customer_id, project_code
Many-to-Many Relationships: Creates bidirectional many-to-many relationships via business keys
Generic Field Filtering: Excludes meaningless generic fields (id, name, status) to focus on business relationships
Mixed Naming Support: Handles departmentName ↔ department_name field matching
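
A rough sketch of shared-key matching, assuming entities are given as a plain mapping of entity name to field names. The function `shared_key_pairs` and the `GENERIC_FIELDS` set are hypothetical stand-ins; the real exclusion list is configurable through METADATA_BOOTSTRAP_GENERIC_FIELDS.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Illustrative subset of generic names to ignore (assumption, not the tool's defaults).
GENERIC_FIELDS = {"id", "key", "name", "status", "type", "created", "updated"}


def normalize(name: str) -> str:
    """departmentName and department_name both become department_name."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def shared_key_pairs(entities: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (entity_a, field_a, entity_b, field_b) for non-generic shared fields."""
    owners_by_key = defaultdict(list)
    for entity, fields in entities.items():
        for field in fields:
            key = normalize(field)
            if key not in GENERIC_FIELDS:
                owners_by_key[key].append((entity, field))
    pairs = []
    for owners in owners_by_key.values():
        for (ent_a, fld_a), (ent_b, fld_b) in combinations(owners, 2):
            if ent_a != ent_b:
                pairs.append((ent_a, fld_a, ent_b, fld_b))
    return pairs


schema = {
    "Application": ["category", "version", "name"],
    "PolicyCompliance": ["category", "name"],
    "Team": ["departmentName"],
    "Employee": ["department_name"],
}
for pair in shared_key_pairs(schema):
    print(pair)
```

Each pair would then yield two Array relationships, one in each direction, with the original field spellings kept on both sides.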

Quality & Precision

Model Association Requirement: Only processes ObjectTypes that have associated Models (production-ready constraint)
Confidence Thresholds: Rejects weak matches (e.g., lastUsedFileName scores too low to be linked to any entity)
Relationship Deduplication: Detects and avoids creating duplicate relationships with same field mappings (prevents equivalent relationships with different names)
Existing Relationship Protection: Never overwrites existing relationship definitions in your schema files
Automatic Field Preservation: Always maintains exact original field names in generated YAML (not configurable)
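
The Model association requirement can be illustrated with already-parsed HML documents. This is a sketch under assumptions: documents are plain dictionaries, a Model is taken to reference its ObjectType through `definition.objectType`, and the function name `object_types_with_models` is hypothetical.

```python
from typing import Any, Dict, List


def object_types_with_models(documents: List[Dict[str, Any]]) -> List[str]:
    """Return names of ObjectTypes that at least one Model is built on."""
    modeled = {
        doc["definition"]["objectType"]
        for doc in documents
        if doc.get("kind") == "Model"
    }
    return [
        doc["definition"]["name"]
        for doc in documents
        if doc.get("kind") == "ObjectType" and doc["definition"]["name"] in modeled
    ]


docs = [
    {"kind": "ObjectType", "definition": {"name": "User", "fields": []}},
    {"kind": "ObjectType", "definition": {"name": "AuditEntry", "fields": []}},
    {"kind": "Model", "definition": {"name": "Users", "objectType": "User"}},
]
print(object_types_with_models(docs))
```

In this example `AuditEntry` has no Model, so it would not take part in relationship generation.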

3. Domain Analysis

Extracts business terminology from schema structure
Identifies domain-specific patterns and relationships
Provides contextual AI prompts based on detected domains
Supports configurable domain-specific relationship hints

4. Schema Enhancement

Preserves original schema structure and formatting
Adds descriptions without breaking DDN functionality
Generates proper DDN relationship definitions without overwriting existing ones
Maintains YAML formatting, comments, and field order
Handles complex nested structures and cross-references
Smart deduplication: Won't create redundant relationships even if they have different names but same field mappings

⚙️ Configuration

Environment Variables

Configuration can be set through environment variables. Tool-specific settings use the METADATA_BOOTSTRAP_ prefix; the Anthropic API key is read from the unprefixed ANTHROPIC_API_KEY:

# Required
ANTHROPIC_API_KEY=your_api_key_here

# Input/Output (choose one mode)
METADATA_BOOTSTRAP_INPUT_DIR=./input
METADATA_BOOTSTRAP_OUTPUT_DIR=./output

# OR single file mode
METADATA_BOOTSTRAP_INPUT_FILE=./schema.hml  
METADATA_BOOTSTRAP_OUTPUT_FILE=./enhanced.hml

# AI Configuration
METADATA_BOOTSTRAP_USE_CASE="E-commerce platform"
METADATA_BOOTSTRAP_MODEL=claude-3-haiku-20240307
METADATA_BOOTSTRAP_FIELD_DESC_MAX_LENGTH=120
METADATA_BOOTSTRAP_KIND_DESC_MAX_LENGTH=250

# Relationship Detection
METADATA_BOOTSTRAP_GENERIC_FIELDS="_id,_key,id,key,name,status,type,created,updated"
METADATA_BOOTSTRAP_FK_TEMPLATES="{pt}_{gi}|{gi},{fs}_{pt}_{gi}|{gi}"
METADATA_BOOTSTRAP_DOMAIN_IDENTIFIERS="user,customer,order,product,company"
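
For readers wiring these variables into their own scripts, the following sketch shows one way to read them. It is not the package's configuration loader: the `Settings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the examples above rather than the tool's real defaults.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

PREFIX = "METADATA_BOOTSTRAP_"


@dataclass
class Settings:
    api_key: str
    input_dir: str
    output_dir: str
    model: str
    field_desc_max_length: int
    generic_fields: List[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (or any mapping, for testing)."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    raw_generic = get("GENERIC_FIELDS", "")
    return Settings(
        api_key=env["ANTHROPIC_API_KEY"],  # unprefixed; raises KeyError if missing
        input_dir=get("INPUT_DIR", "./input"),  # fallback values are assumptions
        output_dir=get("OUTPUT_DIR", "./output"),
        model=get("MODEL", "claude-3-haiku-20240307"),
        field_desc_max_length=int(get("FIELD_DESC_MAX_LENGTH", "120")),
        generic_fields=[f.strip() for f in raw_generic.split(",") if f.strip()],
    )


settings = load_settings({
    "ANTHROPIC_API_KEY": "your_api_key_here",
    "METADATA_BOOTSTRAP_GENERIC_FIELDS": "id,key,name,status",
})
print(settings.model, settings.generic_fields)
```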

CLI Arguments

ddn-metadata-bootstrap --help

Options:
  --input-dir PATH              Input directory containing HML files
  --output-dir PATH             Output directory for enhanced files
  --input-file PATH             Single input HML file
  --output-file PATH            Single output HML file
  --api-key TEXT                Anthropic API key
  --use-case TEXT               Business domain description
  --model TEXT                  AI model to use
  --field-max-length INTEGER    Max characters for field descriptions
  --kind-max-length INTEGER     Max characters for kind descriptions
  --verbose                     Enable verbose logging
  --dry-run                     Validate configuration without processing
  --stats                       Show relationship detection statistics

🔧 Relationship Detection Deep Dive

Smart Deduplication Example

# Existing relationship in schema
kind: Relationship
definition:
  name: userAccount        # Custom name
  sourceType: UserProfile
  mapping:
  - source:
      fieldPath: [fieldName: userId]
    target:
      modelField: [fieldName: id]

# Tool detects same mapping and WON'T create:
# - Another relationship with same UserProfile.userId -> User.id mapping
# - Even if it would be named differently (like "user")
# Result: No duplicate/redundant relationships generated
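
The deduplication rule above compares field mappings rather than names. A minimal sketch, assuming relationships are reduced to simple dictionaries with `source_type`, `target`, and `mapping` keys (a hypothetical shape chosen for brevity, not the tool's internal model):

```python
from typing import Any, Dict, List, Tuple


def signature(rel: Dict[str, Any]) -> Tuple:
    """Identity of a relationship: endpoints plus field pairs, ignoring its name."""
    return (rel["source_type"], rel["target"], tuple(sorted(rel["mapping"])))


def drop_duplicates(existing: List[Dict[str, Any]],
                    candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only candidates whose mapping is not already present."""
    seen = {signature(rel) for rel in existing}
    kept = []
    for cand in candidates:
        sig = signature(cand)
        if sig not in seen:
            seen.add(sig)
            kept.append(cand)
    return kept


existing = [{"name": "userAccount", "source_type": "UserProfile",
             "target": "User", "mapping": [("userId", "id")]}]
candidates = [
    {"name": "user", "source_type": "UserProfile",
     "target": "User", "mapping": [("userId", "id")]},      # same mapping, new name
    {"name": "company", "source_type": "UserProfile",
     "target": "Company", "mapping": [("companyId", "id")]},
]
print([rel["name"] for rel in drop_duplicates(existing, candidates)])
```

Here the candidate named `user` is skipped because `userAccount` already covers UserProfile.userId -> User.id.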

Supported Field Naming Conventions

Input Field	Normalized for Analysis	FK Detection	Shared Field Detection	Field Name in Output
`userId`	`user_id`	✅ `user` entity	✅ Matches `user_id`	`userId`
`user_id`	`user_id`	✅ `user` entity	✅ Matches `userId`	`user_id`
`companyId`	`company_id`	✅ `company` entity	✅ Business key	`companyId`
`departmentName`	`department_name`	❌ Not FK pattern	✅ Shared field	`departmentName`
`lastUsedFileName`	`last_used_file_name`	❌ Low confidence	❌ Business data	`lastUsedFileName`

Relationship Types Generated

Forward FK Relationships (Many-to-One)
- UserProfile.user → User (Object)
- Based on userId/user_id fields
Reverse FK Relationships (One-to-Many)
- User.userProfilesByUser → UserProfile[] (Array)
- Contextual naming with field reference
Shared Field Relationships (Many-to-Many)
- Application.policiesByCategory → Policy[] (Array)
- Policy.applicationsByCategory → Application[] (Array)
- Based on shared business keys
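
To make the forward and reverse FK shapes concrete, here is a sketch that builds both definitions as Python dictionaries mirroring the YAML in the examples section. The function names are hypothetical, the target Model is assumed to share the target type's name, and pluralization is a naive trailing "s".

```python
from typing import Any, Dict, Tuple


def _lower_first(name: str) -> str:
    return name[:1].lower() + name[1:]


def _relationship(name: str, source_type: str, target_model: str,
                  rel_type: str, source_field: str, target_field: str) -> Dict[str, Any]:
    return {
        "kind": "Relationship",
        "definition": {
            "name": name,
            "sourceType": source_type,
            "target": {"model": {"name": target_model, "relationshipType": rel_type}},
            "mapping": [{
                "source": {"fieldPath": [{"fieldName": source_field}]},
                "target": {"modelField": [{"fieldName": target_field}]},
            }],
        },
    }


def fk_relationship_pair(source_type: str, fk_field: str, target_type: str,
                         target_field: str = "id") -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (forward many-to-one, reverse one-to-many) relationship definitions."""
    forward_name = _lower_first(target_type)
    reverse_name = f"{_lower_first(source_type)}sBy{target_type}"  # naive plural
    forward = _relationship(forward_name, source_type, target_type,
                            "Object", fk_field, target_field)
    reverse = _relationship(reverse_name, target_type, source_type,
                            "Array", target_field, fk_field)
    return forward, reverse


forward, reverse = fk_relationship_pair("UserProfile", "userId", "User")
print(forward["definition"]["name"], reverse["definition"]["name"])
```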

Quality Filters

Confidence Scoring: FK matches must score ≥50 to prevent spurious relationships
Model Association: Only ObjectTypes with Models participate in relationships
Generic Field Exclusion: Shared field detection ignores id, name, key, status, etc.
Business Logic Validation: Prevents meaningless connections, such as linking audit fields to entities
Smart Deduplication: Analyzes existing relationship mappings to avoid creating functionally equivalent relationships (even with different names)
Existing Relationship Protection: Scans for and preserves existing relationship definitions - never overwrites
Automatic Field Preservation: Original field names (userId, department_name) are always preserved in output - this behavior is built-in and not configurable

🏗️ Architecture

The tool is built with a modular architecture:

ai/ - AI integration and description generation
schema/ - Schema analysis and metadata collection
relationships/ - Advanced relationship detection and generation
- detector.py - FK and shared field pattern detection
- generator.py - YAML relationship definition creation
- mapper.py - Relationship orchestration and context
processors/ - File and directory processing
utils/ - Text processing, YAML handling, path utilities

🧪 Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=ddn_metadata_bootstrap

# Test relationship detection specifically
pytest tests/test_relationships.py -v

# Type checking
mypy ddn_metadata_bootstrap/

# Code formatting
black ddn_metadata_bootstrap/

📊 Statistics & Reporting

The tool provides detailed statistics on processing:

stats = bootstrapper.get_statistics()
print(f"Entities processed: {stats['entities_processed']}")
print(f"Entities with Models: {stats['entities_with_models']}")
print(f"Total relationships: {stats['relationships_generated']}")
print(f"FK relationships: {stats['fk_relationships']}")
print(f"Shared field relationships: {stats['shared_field_relationships']}")
print(f"Cross-subgraph relationships: {stats['cross_subgraph_relationships']}")
print(f"Descriptions generated: {stats['descriptions_generated']}")

🚀 Future Enhancements

The current implementation focuses on metadata/schema analysis for fast, secure relationship detection. Several exciting enhancements could build on this foundation:

Data Analysis Validation

# Proposed, not yet implemented: validate detected relationships against actual data
ddn-metadata-bootstrap --validate-with-data --db-connection postgresql://...

Referential Integrity Checking: Verify that user_id values actually exist in users.id
Statistical Confidence: "94% of order.customer_id values found in customers.id"
Orphaned Relationship Detection: Find foreign key fields that don't reference anything
Convention-Independent Discovery: Detect relationships regardless of naming patterns
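
The statistical-confidence idea reduces to a simple ratio. The sketch below is not part of the tool and works on in-memory value lists instead of a database connection; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def reference_match_rate(fk_values: Iterable, pk_values: Iterable) -> float:
    """Fraction of non-null foreign key values that exist among the primary keys."""
    pk_set = set(pk_values)
    present = [value for value in fk_values if value is not None]
    if not present:
        return 0.0
    return sum(value in pk_set for value in present) / len(present)


def orphaned_values(fk_values: Iterable, pk_values: Iterable) -> Set:
    """Foreign key values that reference nothing."""
    pk_set = set(pk_values)
    return {value for value in fk_values if value is not None and value not in pk_set}


order_customer_ids: List = [1, 2, 2, 9, None]
customer_ids = [1, 2, 3]
rate = reference_match_rate(order_customer_ids, customer_ids)
print(f"{rate:.0%} of order.customer_id values found in customers.id")
print("Orphaned:", orphaned_values(order_customer_ids, customer_ids))
```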

Advanced Pattern Recognition

Machine Learning Field Classification: Train models to recognize relationship patterns beyond naming conventions
Semantic Analysis: Use NLP to understand field meanings (customer_ref → clients.id)
Cross-Database Pattern Learning: Learn from relationship patterns across multiple schemas
Domain-Specific Templates: Industry-specific relationship detection (e-commerce, healthcare, finance)

Enhanced AI Integration

Relationship Validation: Ask AI to validate if detected relationships make business sense
Missing Relationship Suggestions: AI-powered analysis of potential missing connections
Relationship Documentation: Auto-generate business logic explanations for relationships
Schema Quality Scoring: Overall relationship completeness and quality metrics

Why Metadata-First Design Enables These

The current schema analysis foundation provides:

🚀 Fast Detection: Quick feedback for iterative development
🔒 Security: No production data access required for core functionality
📋 Schema Validation: Works on schemas before data exists
🎯 DDN Integration: Native support for Hasura DDN patterns

Data analysis features would be additive enhancements that complement rather than replace the metadata approach, providing validation and discovery capabilities for mature schemas with production data.

🤝 Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

Development Guidelines

Add tests for new relationship detection patterns
Update configuration documentation for new options
Follow the existing code style and architecture
Include examples in docstrings for complex functions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🏷️ Version History

See CHANGELOG.md for version history and breaking changes.

⭐ Acknowledgments

Built for Hasura DDN
Powered by Anthropic Claude
Inspired by the GraphQL and OpenAPI communities
Relationship detection algorithms inspired by database schema analysis tools

Made with ❤️ by the Hasura team

Release history

Version	Release date
1.0.16	Jul 20, 2025
1.0.15	Jul 20, 2025
1.0.14	Jul 19, 2025
1.0.13	Jul 17, 2025
1.0.12	Jul 16, 2025
1.0.11	Jul 12, 2025
1.0.9	Jul 1, 2025
1.0.8 (this version)	Jun 21, 2025
1.0.6	May 23, 2025
1.0.5	May 23, 2025
1.0.4	May 23, 2025
1.0.3	May 23, 2025
1.0.2	May 23, 2025
1.0.1	May 23, 2025
1.0.0	May 23, 2025

