Hive database adapter for Datus

datus-hive

Hive database adapter for Datus.

Installation

pip install datus-hive

This will automatically install the required dependencies:

  • datus-agent
  • datus-sqlalchemy
  • pyhive
  • thrift
  • thrift-sasl
  • pure-sasl

Usage

The adapter is automatically registered with Datus when installed. Configure your Hive connection in your Datus configuration:

namespace:
  hive:
    type: hive
    host: 127.0.0.1
    port: 10000
    username: hive
    database: default

With authentication and session configuration:

namespace:
  hive_production:
    type: hive
    host: 127.0.0.1
    port: 10000
    database: mydb
    username: hive_user
    password: your_password
    auth: CUSTOM
    configuration:
      hive.execution.engine: spark
      spark.app.name: my_app
      spark.executor.memory: 1G
      spark.executor.instances: 2
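
YAML loaders usually parse a value such as `spark.executor.instances: 2` as an integer, whereas the HiveServer2 Thrift interface defines session configuration as a map of strings. The sketch below shows one way to normalize such a mapping before passing it on. `normalize_session_config` is a hypothetical helper, not part of datus-hive, and whether the adapter already performs this conversion internally is not stated here.

```python
from typing import Any, Mapping


def normalize_session_config(raw: Mapping[str, Any]) -> dict[str, str]:
    """Return a copy of a session configuration with every key and value as str."""
    normalized: dict[str, str] = {}
    for key, value in raw.items():
        if value is None:
            # Skip unset entries instead of sending the literal string "None".
            continue
        if isinstance(value, bool):
            # Hive settings are conventionally written as lowercase true/false.
            normalized[str(key)] = "true" if value else "false"
        else:
            normalized[str(key)] = str(value)
    return normalized


session_config = normalize_session_config(
    {
        "hive.execution.engine": "spark",
        "spark.app.name": "my_app",
        "spark.executor.memory": "1G",
        "spark.executor.instances": 2,  # parsed from YAML as an int
    }
)
print(session_config["spark.executor.instances"])
```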

Or use the connector programmatically:

from datus_hive import HiveConnector, HiveConfig

# Create connector
config = HiveConfig(
    host="127.0.0.1",
    port=10000,
    database="default",
    username="hive",
)

connector = HiveConnector(config)

# Test connection
connector.test_connection()

# Execute query
result = connector.execute(
    {"sql_query": "SELECT * FROM my_table LIMIT 10"},
    result_format="list",
)
print(result.sql_return)

# Get table list
tables = connector.get_tables()
print(f"Tables: {tables}")

# Get table schema
schema = connector.get_schema(table_name="my_table")
for column in schema:
    print(f"{column['name']}: {column['type']}")
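
For wider tables it can help to align the column names before printing. The following is a small sketch that assumes, as the loop above suggests, that the schema is a list of dicts with `name` and `type` keys. `format_schema` is a hypothetical helper and the sample columns are invented for illustration.

```python
def format_schema(columns: list[dict]) -> str:
    """Render column dicts (with 'name' and 'type' keys) as aligned text lines."""
    if not columns:
        return ""
    # Pad every name to the width of the longest one.
    width = max(len(col["name"]) for col in columns)
    return "\n".join(f"{col['name']:<{width}}  {col['type']}" for col in columns)


# Stand-in for the result of connector.get_schema(table_name="my_table");
# these column names and types are made up.
sample_schema = [
    {"name": "id", "type": "int"},
    {"name": "customer_name", "type": "string"},
]
print(format_schema(sample_schema))
```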

Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| host | str | 127.0.0.1 | HiveServer2 host |
| port | int | 10000 | HiveServer2 Thrift port |
| database | str | None | Default database (falls back to default) |
| username | str | required | Hive username |
| password | str | "" | Password (for LDAP/CUSTOM auth) |
| auth | str | None | Auth mechanism: NONE, LDAP, CUSTOM, KERBEROS |
| configuration | dict | {} | Hive session configuration key-value pairs |
| timeout_seconds | int | 30 | Connection timeout in seconds |
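
To make the defaults and constraints in this table concrete, here is a minimal sketch of a settings object that mirrors it. `HiveSettings` is a hypothetical stand-in written for illustration; it is not `datus_hive.HiveConfig`, and the validation rules (auth whitelist, port range, positive timeout) are assumptions rather than documented adapter behavior.

```python
from dataclasses import dataclass, field
from typing import Optional

# Auth mechanisms listed in the parameter table.
ALLOWED_AUTH = {"NONE", "LDAP", "CUSTOM", "KERBEROS"}


@dataclass
class HiveSettings:
    """Hypothetical mirror of the parameter table; not datus_hive.HiveConfig."""

    username: str  # the only required parameter
    host: str = "127.0.0.1"
    port: int = 10000
    database: Optional[str] = None
    password: str = ""
    auth: Optional[str] = None
    configuration: dict = field(default_factory=dict)
    timeout_seconds: int = 30

    def __post_init__(self) -> None:
        # Assumed checks: reject values the table does not allow.
        if self.auth is not None and self.auth not in ALLOWED_AUTH:
            raise ValueError(f"unsupported auth mechanism: {self.auth!r}")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")

    @property
    def effective_database(self) -> str:
        # The table says an unset database falls back to "default".
        return self.database or "default"


settings = HiveSettings(username="hive")
print(settings.host, settings.port, settings.effective_database)
```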

Features

  • Query execution with multiple result formats (list, csv, pandas, arrow)
  • DDL execution (CREATE, ALTER, DROP)
  • Metadata retrieval (databases, tables, views, schemas)
  • DDL retrieval (SHOW CREATE TABLE)
  • Sample data extraction
  • Database context switching (USE statement)
  • Connection pooling and management
  • Hive session configuration support

Testing

Unit Tests

uv run pytest datus-hive/tests/unit -v

Integration Tests

Start Hive using Docker:

cd datus-hive
docker compose up -d

# Wait for Hive to be healthy (about 1-2 minutes)
docker inspect --format='{{.State.Health.Status}}' datus-hive-server
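
Rather than re-running the inspect command by hand, the wait can be scripted. The sketch below polls a status function until it reports `healthy` or a timeout elapses. The helper names and the 180-second default are assumptions (the text above only says startup takes about 1-2 minutes), and `docker_health` requires the `docker` CLI on the PATH.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str = "datus-hive-server") -> str:
    """Return the container's health status as reported by `docker inspect`."""
    result = subprocess.run(
        ["docker", "inspect", "--format={{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str] = docker_health,
    timeout: float = 180.0,
    interval: float = 5.0,
) -> bool:
    """Poll get_status until it returns "healthy" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demonstration with a fake status source, so no Docker daemon is needed.
statuses = iter(["starting", "starting", "healthy"])
print(wait_until_healthy(lambda: next(statuses), timeout=5.0, interval=0.01))
```

Against the real container, calling `wait_until_healthy()` with no arguments uses `docker_health` and the defaults.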

Run integration tests:

uv run pytest datus-hive/tests/integration -v

Stop Hive:

cd datus-hive
docker compose down

TPC-H Test Data

Initialize TPC-H sample data for manual testing:

uv run python datus-hive/scripts/init_tpch_data.py

# With custom connection:
uv run python datus-hive/scripts/init_tpch_data.py --host localhost --port 10000 --username hive

# Clean re-init (drop existing tables first):
uv run python datus-hive/scripts/init_tpch_data.py --drop

This creates 5 TPC-H tables with sample data:

| Table | Rows |
|---|---|
| tpch_region | 5 |
| tpch_nation | 25 |
| tpch_customer | 10 |
| tpch_orders | 15 |
| tpch_supplier | 5 |
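
After initialization, the row counts above can serve as a quick sanity check. The sketch below compares expected and actual counts through a caller-supplied `count_rows` function. That function is hypothetical: with the adapter it might wrap a `SELECT COUNT(*)` query, but the exact shape of the query result is not documented here, so the example uses a fake counter instead.

```python
from typing import Callable

# Expected row counts, copied from the table above.
EXPECTED_ROWS = {
    "tpch_region": 5,
    "tpch_nation": 25,
    "tpch_customer": 10,
    "tpch_orders": 15,
    "tpch_supplier": 5,
}


def find_count_mismatches(
    count_rows: Callable[[str], int],
    expected: dict[str, int] = EXPECTED_ROWS,
) -> dict[str, tuple[int, int]]:
    """Return {table: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    for table, want in expected.items():
        got = count_rows(table)
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


# Fake counter standing in for a real query; tpch_orders is deliberately off by one.
fake_counts = dict(EXPECTED_ROWS, tpch_orders=14)
print(find_count_mismatches(fake_counts.__getitem__))
```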

Requirements

  • Python >= 3.10
  • Apache Hive >= 2.x (tested with 4.0.1)
  • datus-agent >= 0.3.0
  • datus-sqlalchemy >= 0.1.0
  • pyhive >= 0.7.0

License

Apache License 2.0
