Hera2 Python SDK for metadata ingestion and management

Hera2 Python SDK

A modern, fluent Python SDK for OpenMetadata that provides an intuitive API for all operations. Authentication is routed through the Heimdall authorization service for DataOS integration.

Replaces openmetadata-ingestion for SDK use: installing hera2-sdk pulls in openmetadata-ingestion as a dependency and shares the metadata namespace. A single venv with hera2-sdk therefore exposes the same top-level metadata/ surface as openmetadata-ingestion alone (metadata.ingestion, metadata.generated, metadata.clients, metadata.profiler, metadata.utils, metadata.workflow, etc.), plus metadata.sdk from this package.

Installation

Python 3.10+ required. The dependency openmetadata-ingestion 1.11.8.x requires collate-data-diff>=0.11.9, which only supports Python 3.10+. Use a venv with Python 3.10 or 3.11:

python3.10 -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install hera2-sdk

If you see an error like No matching distribution found for openmetadata-ingestion==1.11.8.x (that exact patch version is not on PyPI), install from the repo:

pip install /path/to/hera2/hera2-sdk

Or from the project root: pip install -e hera2-sdk for an editable install.
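
Because the Python 3.10+ requirement comes from a transitive dependency, the resulting pip error can be hard to read. The following is a minimal sketch of a pre-install check that uses only the standard library. The helper name python_is_supported is hypothetical and is not part of hera2-sdk.

```python
import sys

MINIMUM_PYTHON = (3, 10)  # minimum version stated in the Installation section


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.10+ requirement")
    else:
        print(f"Python {current} is too old; create a venv with Python 3.10 or 3.11")
```

Run it with the interpreter you intend to use for the venv, before running pip install.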

Data Quality SDK Installation

For running data quality tests, additional dependencies may be required:

DataFrame Validation:

pip install 'hera2-sdk[pandas]'

Table-Based Testing:

pip install 'hera2-sdk[mysql]'        # For MySQL
pip install 'hera2-sdk[postgres]'     # For PostgreSQL
pip install 'hera2-sdk[snowflake]'    # For Snowflake
pip install 'hera2-sdk[clickhouse]'   # For ClickHouse

Troubleshooting

  • "Defaulting to user installation because normal site-packages is not writeable" — Pip is installing to your user directory instead of the active venv.

    1. Use the venv’s Python explicitly: python -m pip install hera2-sdk.
    2. If it still happens, make the venv writable and reinstall: chmod -R u+w .venv (or your venv directory), then python -m pip install --force-reinstall hera2-sdk.
    3. Confirm the venv is the one being used: python -c "import sys; print(sys.prefix)" should print a path inside your venv (e.g. .../test_hera2_sdk/.venv). If it prints a system path, the venv was not activated or the python in your shell is not the venv's.
  • TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' when importing metadata.ingestion — The interpreter is loading the local metadata package from the hera2 ingestion source (.../hera2/ingestion/src/metadata/) instead of the installed openmetadata-ingestion. Something is adding that path to sys.path.

    1. Unset PYTHONPATH before running Python: unset PYTHONPATH, then run your script again.
    2. See what adds the repo to the path: python -c "import sys; print([p for p in sys.path if 'hera2' in p or 'ingestion' in p])". If you see a path like .../hera2/ingestion/src, it was added by PYTHONPATH or a .pth file in site-packages.
    3. If you use a user site-packages (e.g. ~/Library/Python/3.9/lib/python/site-packages), look for .pth files there that mention hera2 or ingestion and remove or rename them so the installed openmetadata-ingestion is used.
    4. Run Python from a directory outside the hera2 repo. A folder such as ~/test_hera2_sdk works only if it is not itself inside the hera2 tree.
  • ERROR: Cannot install hera2-sdk because these package versions have conflicting dependencies / collate-data-diff has no matching distributions — You are on Python 3.9. The dependency collate-data-diff>=0.11.9 (required by openmetadata-ingestion 1.11.8.x) only supports Python 3.10+. Create a venv with Python 3.10 or 3.11 and install again: python3.10 -m venv .venv && source .venv/bin/activate && pip install hera2-sdk.

  • ModuleNotFoundError: No module named 'cachetools' when importing metadata.ingestion — The upstream openmetadata-ingestion uses cachetools but does not always declare it. Install it: pip install cachetools. Newer hera2-sdk releases include cachetools as a dependency so this is not needed after upgrading.
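
The checks in the list above can be gathered into one script. This is a sketch using only the standard library; the function name diagnose_environment and the keys of its report are hypothetical, chosen for illustration. It reports whether a venv is active, whether the interpreter is 3.10 or newer, what PYTHONPATH contains, which sys.path entries mention hera2 or ingestion, and whether cachetools can be found.

```python
import importlib.util
import os
import sys


def diagnose_environment(path_entries=None, environ=None):
    """Collect the facts the troubleshooting steps ask you to check by hand."""
    if path_entries is None:
        path_entries = sys.path
    if environ is None:
        environ = os.environ
    return {
        # True when the interpreter runs inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        # The SDK's dependencies need Python 3.10 or newer
        "python_ok": sys.version_info[:2] >= (3, 10),
        "pythonpath": environ.get("PYTHONPATH", ""),
        # Entries that could shadow the installed metadata package
        "suspicious_paths": [
            p for p in path_entries if "hera2" in p or "ingestion" in p
        ],
        "cachetools_installed": importlib.util.find_spec("cachetools") is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose_environment().items():
        print(f"{key}: {value}")
```

A False value for python_ok is also worth checking when you see the TypeError about the | operator, since Python versions older than 3.10 raise it for unions written as X | None. The substring match on paths is deliberately loose, so a venv whose own directory name contains hera2 will also be listed.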

Quick Start

Configure the SDK (Heimdall Auth — Recommended)

Pass heimdall_configuration, a dict with the same structure as authenticationConfiguration.heimdallConfiguration in hera/config/config.yaml:

from metadata.sdk import configure

configure(
    host="http://localhost:8585/api",
    api_key="your-dataos-api-key",
    heimdall_configuration={
        "enabled": True,
        "baseUrl": "https://your-instance.dataos.cloud/heimdall",
        "timeout": 10,
        "fallbackOnBasic": True,
    },
)

Or use the legacy heimdall_url:

configure(
    host="http://localhost:8585/api",
    api_key="your-dataos-api-key",
    heimdall_url="https://your-instance.dataos.cloud/heimdall",
)
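
When migrating from the legacy heimdall_url argument, it can help to see how the two forms line up. The sketch below builds the dict form from a URL. The helper name is hypothetical, the default timeout of 10 is taken from the documented HEIMDALL_TIMEOUT default, and fallbackOnBasic=True mirrors the example above. Whether the SDK performs an equivalent translation internally is not stated here, so treat this as an illustration only.

```python
def heimdall_configuration_from_url(heimdall_url, timeout=10, fallback_on_basic=True):
    """Build a heimdall_configuration dict from a legacy heimdall_url value.

    Hypothetical helper: the key names follow the examples in this README,
    and the default values are assumptions rather than confirmed SDK defaults.
    """
    if not heimdall_url:
        raise ValueError("heimdall_url must be a non-empty string")
    return {
        "enabled": True,
        "baseUrl": heimdall_url,
        "timeout": timeout,
        "fallbackOnBasic": fallback_on_basic,
    }


if __name__ == "__main__":
    print(heimdall_configuration_from_url("https://your-instance.dataos.cloud/heimdall"))
```

The resulting dict can be passed as heimdall_configuration=... to configure().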

Or set environment variables in your shell:

export OPENMETADATA_HOST="http://localhost:8585/api"
export OPENMETADATA_API_KEY="your-dataos-api-key"
export HEIMDALL_BASE_URL="https://your-instance.dataos.cloud/heimdall"

Then call configure() with no arguments:

from metadata.sdk import configure
configure()

Configure Parameters

The configure() function supports:

  • host or server_url: OpenMetadata server URL
  • api_key or jwt_token: DataOS API key or JWT token
  • heimdall_configuration: Dict matching hera/config/config.yaml heimdallConfiguration (enabled, baseUrl, timeout, fallbackOnBasic, trustAll)
  • heimdall_url: Heimdall base URL (legacy; use heimdall_configuration when possible)
  • Falls back to environment variables:
    • OPENMETADATA_HOST or OPENMETADATA_SERVER_URL for the server URL
    • OPENMETADATA_API_KEY or OPENMETADATA_JWT_TOKEN for authentication
    • HEIMDALL_BASE_URL: Heimdall service URL (enables Heimdall auth)
    • HEIMDALL_TIMEOUT: Heimdall request timeout in seconds (default: 10)
    • HEIMDALL_TRUST_ALL: Trust all SSL certs for Heimdall (default: true)
    • OPENMETADATA_VERIFY_SSL: Enable SSL verification (default: false)
    • OPENMETADATA_CA_BUNDLE: Path to CA bundle
    • OPENMETADATA_CLIENT_TIMEOUT: Client timeout in seconds (default: 30)
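
To make the fallback rules above concrete, here is a sketch of how the listed variables and defaults could be resolved. It is not the SDK's implementation. The function settings_from_env and its helpers are hypothetical, the order of precedence between alternative names (for example OPENMETADATA_HOST before OPENMETADATA_SERVER_URL) is assumed from the order in the list, and the set of strings treated as true is also an assumption.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed spellings of "true"


def _first_set(environ, *names):
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = environ.get(name)
        if value:
            return value
    return None


def _as_bool(value, default):
    """Interpret an environment string as a boolean, or use the default."""
    if value is None:
        return default
    return value.strip().lower() in TRUE_VALUES


def settings_from_env(environ=None):
    """Resolve the documented variables into a plain dict of settings."""
    if environ is None:
        environ = os.environ
    heimdall_url = environ.get("HEIMDALL_BASE_URL")
    return {
        "server_url": _first_set(environ, "OPENMETADATA_HOST", "OPENMETADATA_SERVER_URL"),
        "api_key": _first_set(environ, "OPENMETADATA_API_KEY", "OPENMETADATA_JWT_TOKEN"),
        # Setting HEIMDALL_BASE_URL is what enables Heimdall auth
        "heimdall_enabled": bool(heimdall_url),
        "heimdall_url": heimdall_url,
        "heimdall_timeout": int(environ.get("HEIMDALL_TIMEOUT") or 10),
        "heimdall_trust_all": _as_bool(environ.get("HEIMDALL_TRUST_ALL"), True),
        "verify_ssl": _as_bool(environ.get("OPENMETADATA_VERIFY_SSL"), False),
        "ca_bundle": environ.get("OPENMETADATA_CA_BUNDLE"),
        "client_timeout": int(environ.get("OPENMETADATA_CLIENT_TIMEOUT") or 30),
    }


if __name__ == "__main__":
    for key, value in settings_from_env().items():
        print(f"{key}: {value}")
```

Note the documented defaults: SSL verification is off and Heimdall trusts all certificates unless you set OPENMETADATA_VERIFY_SSL and HEIMDALL_TRUST_ALL explicitly, which is worth reviewing before a production deployment.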

Alternative: Builder Pattern

from metadata.sdk.config import OpenMetadataConfig

config = (
    OpenMetadataConfig.builder()
    .server_url("http://localhost:8585/api")
    .api_key("your-dataos-api-key")
    .heimdall_configuration({
        "enabled": True,
        "baseUrl": "https://your-instance.dataos.cloud/heimdall",
        "timeout": 15,
        "fallbackOnBasic": True,
    })
    .build()
)

Or with flat params: .heimdall_url("...").heimdall_timeout(15).

Alternative: Direct JWT (Legacy)

If Heimdall is not available, the SDK falls back to direct JWT authentication:

from metadata.sdk import configure
configure(host="http://localhost:8585/api", jwt_token="your-om-jwt-token")

Using It Like the OpenMetadata Python SDK

hera2-sdk depends on openmetadata-ingestion, so you can use the same low-level API as in the official docs: OpenMetadataConnection + OpenMetadata(server_config), then metadata.create_or_update(), metadata.get_by_name(), metadata.delete(), etc.

Option 1 — Standard OpenMetadata style (same as the docs)

from metadata.ingestion.ometa.ometa_api import OpenMetadata
from metadata.generated.schema.entity.services.connections.metadata.openMetadataConnection import (
    OpenMetadataConnection,
    AuthProvider,
)
from metadata.generated.schema.security.client.openMetadataJWTClientConfig import (
    OpenMetadataJWTClientConfig,
)
from metadata.generated.schema.entity.data.table import Table

server_config = OpenMetadataConnection(
    hostPort="http://localhost:8585/api",
    authProvider=AuthProvider.openmetadata,
    securityConfig=OpenMetadataJWTClientConfig(
        jwtToken="<YOUR-INGESTION-BOT-JWT-TOKEN>",
    ),
)
metadata = OpenMetadata(server_config)

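# Same API as in the docs.
# create_service is a create request you have built beforehand
# (for example a CreateDatabaseServiceRequest); it is not defined in this snippet.
metadata.health_check()
service_entity = metadata.create_or_update(data=create_service)
my_table = metadata.get_by_name(entity=Table, fqn="test-service-table.test-db.test-schema.test")
metadata.delete(entity=Table, entity_id=my_table.id)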

Option 2 — hera2-sdk wrapper (same API + optional Heimdall)

Use configure() or OpenMetadataConfig, then get the underlying client via .ometa and call the same methods:

from metadata.sdk import configure, client
from metadata.generated.schema.entity.data.table import Table

configure(
    host="http://localhost:8585/api",
    api_key="your-dataos-api-key",
    heimdall_configuration={
        "enabled": True,
        "baseUrl": "https://your-instance.dataos.cloud/heimdall",
        "timeout": 10,
        "fallbackOnBasic": True,
    },
)

metadata = client().ometa   # same interface as OpenMetadata(server_config)

# create_service is a create request you have built beforehand; it is not defined in this snippet.
metadata.health_check()
service_entity = metadata.create_or_update(data=create_service)
my_table = metadata.get_by_name(entity=Table, fqn="test-service-table.test-db.test-schema.test")
metadata.delete(entity=Table, entity_id=my_table.id)

So you can follow the OpenMetadata SDK walkthrough (create DatabaseService, Database, Schema, Table, etc.) with either the raw OpenMetadata from metadata.ingestion.ometa.ometa_api or with client().ometa after configuring hera2-sdk.
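
Since both options expose the same methods, helper code can be written once against that interface and handed either client. The sketch below shows this with a hypothetical helper, delete_by_name, which uses only the get_by_name and delete calls shown above. It assumes get_by_name returns None when nothing matches. InMemoryClient is a stand-in so the example runs without a server; it is not part of either SDK.

```python
from types import SimpleNamespace


def delete_by_name(metadata, entity, fqn):
    """Look up an entity by fully qualified name and delete it if found.

    Returns True if an entity was deleted, False if the lookup found nothing.
    """
    found = metadata.get_by_name(entity=entity, fqn=fqn)
    if found is None:
        return False
    metadata.delete(entity=entity, entity_id=found.id)
    return True


class InMemoryClient:
    """Demonstration stand-in exposing get_by_name and delete."""

    def __init__(self, entities):
        self._entities = dict(entities)  # maps fqn -> entity object

    def get_by_name(self, entity, fqn):
        return self._entities.get(fqn)

    def delete(self, entity, entity_id):
        self._entities = {
            fqn: obj for fqn, obj in self._entities.items() if obj.id != entity_id
        }


if __name__ == "__main__":
    fake = InMemoryClient({"svc.db.schema.orders": SimpleNamespace(id="1")})
    print(delete_by_name(fake, object, "svc.db.schema.orders"))
    print(delete_by_name(fake, object, "svc.db.schema.orders"))
```

With a real server you would pass OpenMetadata(server_config) or client().ometa as the first argument and Table as the entity.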

Manual Initialization

For more control, you can manually initialize the SDK:

from metadata.sdk import OpenMetadata, OpenMetadataConfig
from metadata.sdk.entities import Table, User
from metadata.sdk.api import Search, Lineage, Bulk

config = OpenMetadataConfig(
    server_url="http://localhost:8585/api",
    api_key="your-dataos-api-key",
    heimdall_configuration={
        "enabled": True,
        "baseUrl": "https://your-instance.dataos.cloud/heimdall",
        "timeout": 10,
        "fallbackOnBasic": True,
    },
)

client = OpenMetadata.initialize(config)

Table.set_default_client(client)
User.set_default_client(client)
Search.set_default_client(client)
Lineage.set_default_client(client)
Bulk.set_default_client(client)
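
If you register the client on many classes, the repeated calls can be folded into a loop. The following sketch assumes only what the snippet above shows, namely that each class exposes set_default_client(client). The helper register_default_client and the DemoFacade stand-in are hypothetical and exist for illustration.

```python
def register_default_client(client, facades):
    """Call set_default_client(client) on each class and return their names."""
    registered = []
    for facade in facades:
        facade.set_default_client(client)
        registered.append(facade.__name__)
    return registered


class DemoFacade:
    """Stand-in with the same classmethod shape as the SDK classes."""

    default_client = None

    @classmethod
    def set_default_client(cls, client):
        cls.default_client = client


if __name__ == "__main__":
    print(register_default_client("demo-client", [DemoFacade]))
```

With the SDK installed, the call would look like register_default_client(client, [Table, User, Search, Lineage, Bulk]).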

Configuration from Environment Variables Only

from metadata.sdk.config import OpenMetadataConfig

# Reads from OPENMETADATA_HOST, OPENMETADATA_API_KEY, HEIMDALL_BASE_URL, etc.
config = OpenMetadataConfig.from_env()

Entity Operations

Tables

from metadata.generated.schema.api.data.createTable import CreateTableRequest
from metadata.sdk.entities import Table
from metadata.sdk.entities.table import TableListParams

# Create a table
request = CreateTableRequest(
    name="my_table",
    databaseSchema="my_schema",
    columns=[...]
)
table = Table.create(request)

# Retrieve a table by ID
table = Table.retrieve("table-id")

# Retrieve by fully qualified name with specific fields
table = Table.retrieve_by_name(
    "service.database.schema.table",
    fields=["owners", "tags", "columns"]
)

# List tables with pagination
for table in Table.list().auto_paging_iterable():
    print(table.name)

# List with filters
params = (
    TableListParams.builder()
    .limit(50)
    .database("my_database")
    .fields(["owners", "tags"])
    .build()
)

tables = Table.list(params)

# Update a table
table.description = "Updated description"
updated = Table.update(table.id, table)

# Delete a table
Table.delete("table-id")

# Delete with options
Table.delete("table-id", recursive=True, hard_delete=True)

# Export/Import CSV
csv_data = Table.export_csv("table-name")
Table.import_csv(csv_data, dry_run=False)
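
The auto_paging_iterable() call above hides the page-by-page requests. As a conceptual sketch only (this is not the SDK's code), cursor-based paging can be written as a generator that keeps requesting pages until no cursor comes back. The names iterate_all and fetch_demo_page, and the in-memory pages, are hypothetical.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, cursor for the next page or None when there are no more)
Page = Tuple[List[dict], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item across pages, following the cursor until it is None."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# In-memory data standing in for a paginated list endpoint.
_PAGES = {
    None: ([{"name": "orders"}, {"name": "customers"}], "page-2"),
    "page-2": ([{"name": "payments"}], None),
}


def fetch_demo_page(after: Optional[str]) -> Page:
    """Return the demo page that follows the given cursor."""
    return _PAGES[after]


if __name__ == "__main__":
    for table in iterate_all(fetch_demo_page):
        print(table["name"])
```

Because the generator is lazy, a caller that stops early never triggers the remaining page requests.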

Supported Entity Types

The SDK provides the same fluent API for all OpenMetadata entity types:

  • Data Assets: Table, Database, DatabaseSchema, Dashboard, Pipeline, Topic, Container, Query, StoredProcedure, DashboardDataModel, SearchIndex, MlModel, Report
  • Services: DatabaseService, MessagingService, DashboardService, PipelineService, MlModelService, StorageService, SearchService, MetadataService, ApiService
  • Teams & Users: User, Team, Role, Policy
  • Governance: Glossary, GlossaryTerm, Classification, Tag, DataProduct, Domain
  • Quality: TestCase, TestSuite, TestDefinition, DataQualityDashboard
  • Ingestion: Ingestion, Workflow, Connection
  • Other: Type, Webhook, Kpi, Application, Persona, DocStore, Page, SearchQuery

Testing

Run the SDK tests:

# Run all SDK tests
pytest tests/unit/sdk/

# Run specific test
pytest tests/unit/sdk/test_sdk_entities.py

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
