datus-hive

Hive database adapter for Datus.

Installation

pip install datus-hive

This also installs the required dependencies:

  • datus-agent
  • datus-sqlalchemy
  • pyhive
  • thrift
  • thrift-sasl
  • pure-sasl

Usage

The adapter is automatically registered with Datus when installed. Configure your Hive connection in your Datus configuration:

namespace:
  hive:
    type: hive
    host: 127.0.0.1
    port: 10000
    username: hive
    database: default

With authentication and session configuration:

namespace:
  hive_production:
    type: hive
    host: 127.0.0.1
    port: 10000
    database: mydb
    username: hive_user
    password: your_password
    auth: CUSTOM
    configuration:
      hive.execution.engine: spark
      spark.app.name: my_app
      spark.executor.memory: 1G
      spark.executor.instances: 2
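A YAML parser loads `spark.executor.instances: 2` as an integer, while Hive session settings are ultimately exchanged as string key-value pairs. Whether datus-hive converts such values itself is not stated here, so the following is only a minimal sketch of normalizing a `configuration` mapping before use. The helper name `stringify_session_config` is hypothetical and not part of datus-hive.

```python
from typing import Any, Dict


def stringify_session_config(configuration: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of the session settings with every key and value as a string.

    Hypothetical helper for illustration; not part of datus-hive.
    """
    result: Dict[str, str] = {}
    for key, value in configuration.items():
        if isinstance(value, bool):
            # Assumption: booleans are written in lowercase ("true"/"false"),
            # the form conventionally used in Hive settings.
            result[str(key)] = "true" if value else "false"
        else:
            result[str(key)] = str(value)
    return result


# The session settings from the example above, as a YAML parser would load them.
session = {
    "hive.execution.engine": "spark",
    "spark.app.name": "my_app",
    "spark.executor.memory": "1G",
    "spark.executor.instances": 2,  # loaded as an int, not a string
}
normalized = stringify_session_config(session)
```

Quoting the value in YAML (`spark.executor.instances: "2"`) is an alternative that avoids the conversion step altogether.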

Or use the connector programmatically:

from datus_hive import HiveConnector, HiveConfig

# Create connector
config = HiveConfig(
    host="127.0.0.1",
    port=10000,
    database="default",
    username="hive",
)

connector = HiveConnector(config)

# Test connection
connector.test_connection()

# Execute query
result = connector.execute(
    {"sql_query": "SELECT * FROM my_table LIMIT 10"},
    result_format="list",
)
print(result.sql_return)

# Get table list
tables = connector.get_tables()
print(f"Tables: {tables}")

# Get table schema
schema = connector.get_schema(table_name="my_table")
for column in schema:
    print(f"{column['name']}: {column['type']}")

Configuration Parameters

Parameter        Type  Default    Description
host             str   127.0.0.1  HiveServer2 host
port             int   10000      HiveServer2 Thrift port
database         str   None       Default database (falls back to default)
username         str   required   Hive username
password         str   ""         Password (for LDAP/CUSTOM auth)
auth             str   None       Auth mechanism: NONE, LDAP, CUSTOM, KERBEROS
configuration    dict  {}         Hive session configuration key-value pairs
timeout_seconds  int   30         Connection timeout in seconds
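To make the defaults and constraints in the table concrete, here is a minimal sketch of a settings object that mirrors them. `HiveSettings` is a hypothetical stand-in written for illustration; it is not the real `HiveConfig`, whose validation behaviour is not described here. The port range and positive-timeout checks are assumptions added for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Auth mechanisms listed in the parameter table.
ALLOWED_AUTH = {"NONE", "LDAP", "CUSTOM", "KERBEROS"}


@dataclass
class HiveSettings:
    """Hypothetical mirror of the documented parameters; not the real HiveConfig."""

    username: str  # the only parameter without a default
    host: str = "127.0.0.1"
    port: int = 10000
    database: Optional[str] = None
    password: str = ""
    auth: Optional[str] = None
    configuration: Dict[str, str] = field(default_factory=dict)
    timeout_seconds: int = 30

    def __post_init__(self) -> None:
        if not self.username:
            raise ValueError("username is required")
        # Assumed sanity checks, not taken from the adapter itself.
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        if self.auth is not None and self.auth not in ALLOWED_AUTH:
            raise ValueError(f"unsupported auth mechanism: {self.auth}")

    @property
    def effective_database(self) -> str:
        # The table says an unset database falls back to "default".
        return self.database or "default"


settings = HiveSettings(username="hive")
```

Checking values like these before opening a connection turns a misspelled `auth` value into an immediate, readable error rather than a failure during the Thrift handshake.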

Features

  • Query execution with multiple result formats (list, csv, pandas, arrow)
  • DDL execution (CREATE, ALTER, DROP)
  • Metadata retrieval (databases, tables, views, schemas)
  • DDL retrieval (SHOW CREATE TABLE)
  • Sample data extraction
  • Database context switching (USE statement)
  • Connection pooling and management
  • Hive session configuration support

Testing

Unit Tests

uv run pytest datus-hive/tests/unit -v

Integration Tests

Start Hive using Docker:

cd datus-hive
docker compose up -d

# Wait for Hive to be healthy (about 1-2 minutes)
docker inspect --format='{{.State.Health.Status}}' datus-hive-server
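Rather than re-running the `docker inspect` command by hand, the wait can be scripted. The sketch below polls the same command until the container reports `healthy`. The function names and the 180-second default timeout are assumptions chosen to leave some margin over the 1-2 minutes mentioned above; the status source is injectable so the loop can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable


def container_health(name: str = "datus-hive-server") -> str:
    """Return the Docker health status of a container, for example 'healthy'.

    Raises subprocess.CalledProcessError if `docker inspect` fails,
    for instance when the container does not exist yet.
    """
    completed = subprocess.run(
        ["docker", "inspect", "--format={{.State.Health.Status}}", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str] = container_health,
    timeout: float = 180.0,   # assumed margin over the documented 1-2 minutes
    interval: float = 5.0,
) -> bool:
    """Poll get_status until it returns 'healthy' or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_healthy()` with no arguments checks the `datus-hive-server` container and returns `False` if it is still not healthy after the timeout, which a test script can turn into a clear failure message.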

Run integration tests:

uv run pytest datus-hive/tests/integration -v

Stop Hive:

cd datus-hive
docker compose down

TPC-H Test Data

Initialize TPC-H sample data for manual testing:

uv run python datus-hive/scripts/init_tpch_data.py

# With custom connection:
uv run python datus-hive/scripts/init_tpch_data.py --host localhost --port 10000 --username hive

# Clean re-init (drop existing tables first):
uv run python datus-hive/scripts/init_tpch_data.py --drop

This creates 5 TPC-H tables with sample data:

Table          Rows
tpch_region    5
tpch_nation    25
tpch_customer  10
tpch_orders    15
tpch_supplier  5
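After initialization, the row counts above can be checked automatically. The following is a sketch only: `count_rows` is a hypothetical callable you supply (for example, one that runs `SELECT COUNT(*) FROM <table>` through the connector and extracts the number). The shape of the connector's query result is not documented here, so that parsing is deliberately left to the caller.

```python
from typing import Callable, Dict, Tuple

# Expected row counts, copied from the table above.
EXPECTED_ROWS: Dict[str, int] = {
    "tpch_region": 5,
    "tpch_nation": 25,
    "tpch_customer": 10,
    "tpch_orders": 15,
    "tpch_supplier": 5,
}


def find_row_count_mismatches(
    count_rows: Callable[[str], int],
    expected: Dict[str, int] = EXPECTED_ROWS,
) -> Dict[str, Tuple[int, int]]:
    """Return {table: (expected, actual)} for every table whose count differs.

    count_rows is a caller-supplied (hypothetical) function that returns
    the current number of rows in the named table.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for table, want in expected.items():
        got = count_rows(table)
        if got != want:
            mismatches[table] = (want, got)
    return mismatches
```

An empty result means all five tables match; any entry shows the expected and actual counts side by side, which is usually enough to tell a partial load from a table that was never created.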

Requirements

  • Python >= 3.10
  • Apache Hive 2.x or later (tested with 4.0.1)
  • datus-agent >= 0.3.0
  • datus-sqlalchemy >= 0.1.0
  • pyhive >= 0.7.0

License

Apache License 2.0
