DDN Metadata Bootstrap

AI-powered metadata enhancement for Hasura DDN (Data Delivery Network) schema files. The tool generates descriptions and detects relationships in your YAML/HML schema definitions using Anthropic's Claude models.

🚀 Features

  • 🤖 AI-Powered Descriptions: Generate natural language descriptions for schema elements using Anthropic's Claude
  • 🔗 Relationship Detection: Automatically detect and generate foreign key relationships
  • 📊 Domain Analysis: Intelligent analysis of business domains and terminology
  • ⚡ Batch Processing: Process entire directories of schema files efficiently
  • 🎯 DDN Optimized: Specifically designed for Hasura DDN schema structures
  • 🔧 Configurable: Extensive configuration options via environment variables or CLI

📦 Installation

From PyPI (Recommended)

pip install ddn-metadata-bootstrap

From Source

git clone https://github.com/hasura/ddn-metadata-bootstrap.git
cd ddn-metadata-bootstrap
pip install -e .

🏃 Quick Start

1. Set up your environment

export ANTHROPIC_API_KEY="your-anthropic-api-key"
export METADATA_BOOTSTRAP_INPUT_DIR="./input"
export METADATA_BOOTSTRAP_OUTPUT_DIR="./output"

2. Run the tool

# Process entire directory
ddn-metadata-bootstrap

# Or with CLI arguments
ddn-metadata-bootstrap --input-dir ./schema --output-dir ./enhanced --api-key YOUR_KEY

3. Or use as a Python package

from ddn_metadata_bootstrap import MetadataBootstrapper

bootstrapper = MetadataBootstrapper(
    api_key="your-anthropic-api-key",
    use_case="E-commerce platform"
)

# Process directory
bootstrapper.process_directory("./input", "./output")

# Get statistics
stats = bootstrapper.get_statistics()
print(f"Generated {stats['relationships_generated']} relationships")

📝 Example

Input HML File

kind: ObjectType
version: v1
definition:
  name: User
  fields:
    - name: id
      type: ID!
    - name: email
      type: String!
    - name: created_at
      type: String

Enhanced Output

kind: ObjectType
version: v1
definition:
  name: User
  description: |
    Represents a user account in the system with authentication
    and profile information.
  fields:
    - name: id
      type: ID!
      description: Unique identifier for the user account.
    - name: email
      type: String!
      description: User's email address for authentication and communication.
    - name: created_at
      type: String
      description: Timestamp when the user account was created.
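
The change from input to output can be pictured as a merge of generated text into the parsed document. The following is a minimal sketch under stated assumptions, not the tool's implementation: `add_descriptions` and its parameters are hypothetical names, the document is represented as an already-parsed dict, and the description strings are supplied by hand rather than by a model.

```python
from copy import deepcopy


def add_descriptions(doc, kind_description, field_descriptions):
    """Return a copy of an ObjectType document with descriptions attached.

    Hypothetical helper: existing descriptions are kept, only missing
    ones are filled in, and the input document is left untouched.
    """
    enhanced = deepcopy(doc)
    definition = enhanced["definition"]
    definition.setdefault("description", kind_description)
    for field in definition.get("fields", []):
        text = field_descriptions.get(field["name"])
        if text and "description" not in field:
            field["description"] = text
    return enhanced


user = {
    "kind": "ObjectType",
    "version": "v1",
    "definition": {
        "name": "User",
        "fields": [
            {"name": "id", "type": "ID!"},
            {"name": "email", "type": "String!"},
            {"name": "created_at", "type": "String"},
        ],
    },
}

enhanced = add_descriptions(
    user,
    "Represents a user account in the system.",
    {"id": "Unique identifier for the user account."},
)
print(enhanced["definition"]["fields"][0]["description"])
```

Working on a copy keeps the original structure available for comparison, which matches the goal of adding descriptions without altering anything else in the schema.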

⚙️ Configuration

Environment Variables

All configuration can be done via environment variables. Most settings use the METADATA_BOOTSTRAP_ prefix; the API key is read from ANTHROPIC_API_KEY, which has no prefix:

# Required
ANTHROPIC_API_KEY=your_api_key_here

# Input/Output (choose one mode)
METADATA_BOOTSTRAP_INPUT_DIR=./input
METADATA_BOOTSTRAP_OUTPUT_DIR=./output

# OR single file mode
METADATA_BOOTSTRAP_INPUT_FILE=./schema.hml
METADATA_BOOTSTRAP_OUTPUT_FILE=./enhanced.hml

# Optional
METADATA_BOOTSTRAP_USE_CASE="E-commerce platform"
METADATA_BOOTSTRAP_MODEL=claude-3-haiku-20240307
METADATA_BOOTSTRAP_FIELD_DESC_MAX_LENGTH=120
METADATA_BOOTSTRAP_KIND_DESC_MAX_LENGTH=250
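
To make the variable layout concrete, here is a rough sketch of how these settings could be read and validated in Python. It is an illustration only: `Settings` and `load_settings` are hypothetical names, the fallback lengths of 120 and 250 are simply the example values shown above (the tool's real defaults may differ), and the rule that exactly one of the two modes must be set is an assumption drawn from the "choose one mode" comment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "METADATA_BOOTSTRAP_"


@dataclass
class Settings:
    api_key: str
    input_dir: Optional[str]
    output_dir: Optional[str]
    input_file: Optional[str]
    output_file: Optional[str]
    use_case: Optional[str]
    model: Optional[str]
    field_desc_max_length: int
    kind_desc_max_length: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables (hypothetical)."""
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise ValueError("ANTHROPIC_API_KEY is required")

    def opt(name: str) -> Optional[str]:
        # Treat unset and empty variables the same way.
        return env.get(PREFIX + name) or None

    settings = Settings(
        api_key=api_key,
        input_dir=opt("INPUT_DIR"),
        output_dir=opt("OUTPUT_DIR"),
        input_file=opt("INPUT_FILE"),
        output_file=opt("OUTPUT_FILE"),
        use_case=opt("USE_CASE"),
        model=opt("MODEL"),
        # Assumed fallbacks, taken from the example values above.
        field_desc_max_length=int(opt("FIELD_DESC_MAX_LENGTH") or 120),
        kind_desc_max_length=int(opt("KIND_DESC_MAX_LENGTH") or 250),
    )

    # Assumption: exactly one of directory mode or single-file mode is set.
    dir_mode = bool(settings.input_dir and settings.output_dir)
    file_mode = bool(settings.input_file and settings.output_file)
    if dir_mode == file_mode:
        raise ValueError("set either the directory pair or the file pair")
    return settings


example = load_settings({
    "ANTHROPIC_API_KEY": "your_api_key_here",
    "METADATA_BOOTSTRAP_INPUT_DIR": "./input",
    "METADATA_BOOTSTRAP_OUTPUT_DIR": "./output",
})
print(example.input_dir, example.field_desc_max_length)
```

Passing the mapping explicitly, instead of always reading `os.environ`, keeps the loader easy to exercise in tests.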

CLI Arguments

ddn-metadata-bootstrap --help

Options:
  --input-dir PATH              Input directory containing HML files
  --output-dir PATH             Output directory for enhanced files
  --input-file PATH             Single input HML file
  --output-file PATH            Single output HML file
  --api-key TEXT                Anthropic API key
  --use-case TEXT               Business domain description
  --model TEXT                  AI model to use
  --field-max-length INTEGER    Max characters for field descriptions
  --kind-max-length INTEGER     Max characters for kind descriptions
  --verbose                     Enable verbose logging
  --dry-run                     Validate configuration without processing

🔄 What It Does

1. Description Generation

  • Analyzes schema element names and types
  • Generates contextual descriptions using AI
  • Respects character limits and style guidelines
  • Supports field-level and entity-level descriptions
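
As an illustration of the character-limit step, the sketch below trims a generated description to a maximum length without cutting a word in half. This is one possible approach, not necessarily what the tool does; `fit_description` is a hypothetical helper, and the limit corresponds to values such as the field and kind maximum lengths from the configuration.

```python
def fit_description(text: str, max_length: int) -> str:
    """Collapse whitespace and trim text to at most max_length characters.

    Hypothetical helper: when trimming is needed, the cut falls on a word
    boundary and the result ends with a period.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    cleaned = " ".join(text.split())
    if len(cleaned) <= max_length:
        return cleaned
    # Reserve one character for the trailing period.
    cut = cleaned[: max_length - 1]
    # If the cut landed inside a word, drop the partial word.
    if cleaned[max_length - 1] != " " and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:.") + "."


long_text = "User's email address for authentication and communication."
print(fit_description(long_text, 30))
print(fit_description("  Short   text. ", 120))
```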

2. Relationship Detection

  • Detects foreign key patterns (e.g., user_id, customer_id)
  • Identifies shared fields between entities
  • Generates bidirectional relationship definitions
  • Supports cross-subgraph relationships
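
The foreign key pattern mentioned above can be sketched as a simple name match: a field ending in `_id` is treated as a reference when its prefix matches another entity's name. This is a simplified illustration with hypothetical function names (`snake_case`, `detect_foreign_keys`); the tool's actual detection logic is not shown in this document and is likely more involved.

```python
import re


def snake_case(name):
    """Convert an entity name such as OrderItem to order_item."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def detect_foreign_keys(entities):
    """Return (source_entity, field, target_entity) candidates.

    entities maps an entity name to its list of field names. A field is a
    candidate when it ends in "_id" and its prefix names another entity.
    """
    by_prefix = {snake_case(name): name for name in entities}
    matches = []
    for source, fields in entities.items():
        for field in fields:
            if not field.endswith("_id"):
                continue
            target = by_prefix.get(field[: -len("_id")])
            if target and target != source:
                matches.append((source, field, target))
    return matches


schema = {
    "User": ["id", "email"],
    "Customer": ["id", "name"],
    "Order": ["id", "user_id", "customer_id", "external_id"],
}
for source, field, target in detect_foreign_keys(schema):
    print(f"{source}.{field} -> {target}.id")
```

In this example `external_id` is ignored because no entity named `External` exists, which shows why a suffix check alone is not enough to call something a relationship.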

3. Domain Analysis

  • Extracts business terminology from schema
  • Identifies domain-specific patterns
  • Provides contextual AI prompts
  • Supports domain-specific relationship hints
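
One way to picture terminology extraction is as word counting over entity and field names. The sketch below is an assumption-laden illustration, not the tool's analysis: `extract_terms` and the `GENERIC` stop list are hypothetical, and all-uppercase acronyms such as `ID` are skipped by the simple pattern used here.

```python
import re
from collections import Counter

# Hypothetical stop list of words that carry little domain meaning.
GENERIC = {"id", "at", "by", "is", "of", "created", "updated"}


def extract_terms(names, top=3):
    """Return the most frequent domain words found in schema names.

    Names are split on snake_case and CamelCase boundaries. Ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for name in names:
        for word in re.findall(r"[A-Z]?[a-z]+", name):
            word = word.lower()
            if word not in GENERIC:
                counts[word] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


names = [
    "Order", "order_id", "order_total",
    "Customer", "customer_id", "shipping_address",
]
terms = extract_terms(names)
print(terms)
```

Terms ranked this way could then be folded into a use-case hint for the description prompts, for example by listing the top words alongside the `--use-case` text.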

4. Schema Enhancement

  • Preserves original schema structure
  • Adds descriptions without breaking functionality
  • Generates proper DDN relationship definitions
  • Maintains YAML formatting and comments

🏗️ Architecture

The tool is built with a modular architecture:

  • ai/ - AI integration and description generation
  • schema/ - Schema analysis and metadata collection
  • relationships/ - Relationship detection and generation
  • processors/ - File and directory processing
  • utils/ - Text processing, YAML handling, path utilities

🧪 Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=ddn_metadata_bootstrap

# Type checking
mypy ddn_metadata_bootstrap/

# Code formatting
black ddn_metadata_bootstrap/

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🏷️ Version History

See CHANGELOG.md for version history and breaking changes.

Made with ❤️ by the Hasura team
