datus-hive

Hive database adapter for Datus.

Installation

pip install datus-hive

This will automatically install the required dependencies:

  • datus-agent
  • datus-sqlalchemy
  • pyhive
  • thrift
  • thrift-sasl
  • pure-sasl

Usage

The adapter is automatically registered with Datus when installed. Configure your Hive connection in your Datus configuration:

namespace:
  hive:
    type: hive
    host: 127.0.0.1
    port: 10000
    username: hive
    database: default

With authentication and session configuration:

namespace:
  hive_production:
    type: hive
    host: 127.0.0.1
    port: 10000
    database: mydb
    username: hive_user
    password: your_password
    auth: CUSTOM
    configuration:
      hive.execution.engine: spark
      spark.app.name: my_app
      spark.executor.memory: 1G
      spark.executor.instances: 2
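
YAML parses unquoted values such as `2` as integers, while HiveServer2 receives session configuration over Thrift as string key-value pairs. Whether the adapter converts these values for you is not stated here, so the following is only a sketch of how a `configuration` mapping could be normalized before it is passed on. `normalize_session_conf` is a hypothetical helper, not part of datus-hive.

```python
from __future__ import annotations

from typing import Any, Mapping


def normalize_session_conf(conf: Mapping[str, Any] | None) -> dict[str, str]:
    """Return a copy of conf with every key and value rendered as a string."""
    normalized: dict[str, str] = {}
    for key, value in (conf or {}).items():
        if isinstance(value, bool):
            # Render booleans in lowercase rather than Python's "True"/"False".
            normalized[str(key)] = "true" if value else "false"
        else:
            normalized[str(key)] = str(value)
    return normalized


session_conf = normalize_session_conf(
    {
        "hive.execution.engine": "spark",
        "spark.executor.memory": "1G",
        "spark.executor.instances": 2,  # parsed from YAML as an int
    }
)
print(session_conf)
```

Quoting the value in YAML (`spark.executor.instances: "2"`) avoids the question entirely and may be the simpler choice.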

Or use the connector programmatically:

from datus_hive import HiveConnector, HiveConfig

# Create connector
config = HiveConfig(
    host="127.0.0.1",
    port=10000,
    database="default",
    username="hive",
)

connector = HiveConnector(config)

# Test connection
connector.test_connection()

# Execute query
result = connector.execute(
    {"sql_query": "SELECT * FROM my_table LIMIT 10"},
    result_format="list",
)
print(result.sql_return)

# Get table list
tables = connector.get_tables()
print(f"Tables: {tables}")

# Get table schema
schema = connector.get_schema(table_name="my_table")
for column in schema:
    print(f"{column['name']}: {column['type']}")
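
For wider tables, the schema loop above can be easier to read with aligned columns. The sketch below assumes, as the example does, that each schema entry is a mapping with `name` and `type` keys. `format_schema` is a hypothetical helper, and the sample columns are made up for illustration.

```python
from typing import Iterable, Mapping


def format_schema(columns: Iterable[Mapping[str, str]]) -> str:
    """Render schema entries as 'name  type' lines with aligned names."""
    rows = [(col["name"], col["type"]) for col in columns]
    if not rows:
        return ""
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name.ljust(width)}  {col_type}" for name, col_type in rows)


# Illustrative data shaped like the entries printed in the loop above.
sample_schema = [
    {"name": "r_regionkey", "type": "int"},
    {"name": "r_name", "type": "string"},
]
print(format_schema(sample_schema))
```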

Configuration Parameters

Parameter        Type  Default    Description
host             str   127.0.0.1  HiveServer2 host
port             int   10000      HiveServer2 Thrift port
database         str   None       Default database (falls back to default)
username         str   required   Hive username
password         str   ""         Password (for LDAP/CUSTOM auth)
auth             str   None       Auth mechanism: NONE, LDAP, CUSTOM, KERBEROS
configuration    dict  {}         Hive session configuration key-value pairs
timeout_seconds  int   30         Connection timeout in seconds
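
The defaults in the table can be summarized as a small dataclass. This is an illustrative sketch, not the adapter's actual `HiveConfig`: the class name `HiveParams` and the validation logic are assumptions, and only the field names, types, and defaults come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional

# Auth mechanisms listed in the parameter table.
ALLOWED_AUTH = {"NONE", "LDAP", "CUSTOM", "KERBEROS"}


@dataclass
class HiveParams:
    """Hypothetical mirror of the documented connection parameters."""

    username: str  # required, no default
    host: str = "127.0.0.1"
    port: int = 10000
    database: Optional[str] = None  # falls back to the "default" database
    password: str = ""
    auth: Optional[str] = None
    configuration: dict = field(default_factory=dict)
    timeout_seconds: int = 30

    def __post_init__(self) -> None:
        if not self.username:
            raise ValueError("username is required")
        if self.auth is not None and self.auth not in ALLOWED_AUTH:
            raise ValueError(f"unsupported auth mechanism: {self.auth!r}")


params = HiveParams(username="hive")
print(params.host, params.port, params.timeout_seconds)
```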

Features

  • Query execution with multiple result formats (list, csv, pandas, arrow)
  • DDL execution (CREATE, ALTER, DROP)
  • Metadata retrieval (databases, tables, views, schemas)
  • DDL retrieval (SHOW CREATE TABLE)
  • Sample data extraction
  • Database context switching (USE statement)
  • Connection pooling and management
  • Hive session configuration support

Testing

Unit Tests

uv run pytest datus-hive/tests/unit -v

Integration Tests

Start Hive using Docker:

cd datus-hive
docker compose up -d

# Wait for Hive to be healthy (about 1-2 minutes)
docker inspect --format='{{.State.Health.Status}}' datus-hive-server
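
Instead of re-running the health check by hand, the status can be polled from a script. The sketch below shells out to the same `docker inspect` command. The function names, the 180-second timeout, and the 5-second interval are assumptions chosen for illustration.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Return the health status reported by `docker inspect` for a container."""
    result = subprocess.run(
        ["docker", "inspect", "--format={{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def wait_until_healthy(
    container: str,
    timeout: float = 180.0,
    interval: float = 5.0,
    probe: Callable[[str], str] = docker_health,
) -> bool:
    """Poll the probe until it reports 'healthy' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(container) == "healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_healthy("datus-hive-server")` returns `True` once the container reports `healthy` and `False` if the timeout passes first. The `probe` argument exists so the loop can be exercised without Docker.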

Run integration tests:

uv run pytest datus-hive/tests/integration -v

Stop Hive:

cd datus-hive
docker compose down

TPC-H Test Data

Initialize TPC-H sample data for manual testing:

uv run python datus-hive/scripts/init_tpch_data.py

# With custom connection:
uv run python datus-hive/scripts/init_tpch_data.py --host localhost --port 10000 --username hive

# Clean re-init (drop existing tables first):
uv run python datus-hive/scripts/init_tpch_data.py --drop

This creates 5 TPC-H tables with sample data:

Table          Rows
tpch_region    5
tpch_nation    25
tpch_customer  10
tpch_orders    15
tpch_supplier  5
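
After initialization, the row counts above can serve as a quick sanity check. The sketch below only builds the `COUNT(*)` queries and compares counts you have collected. Running the queries is left out because the exact shape of the connector's result object is not documented here. `count_queries` and `find_mismatches` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple

# Expected row counts, taken from the table above.
EXPECTED_ROWS: Dict[str, int] = {
    "tpch_region": 5,
    "tpch_nation": 25,
    "tpch_customer": 10,
    "tpch_orders": 15,
    "tpch_supplier": 5,
}


def count_queries(expected: Dict[str, int]) -> Dict[str, str]:
    """Build one COUNT(*) query per expected table."""
    return {table: f"SELECT COUNT(*) FROM {table}" for table in expected}


def find_mismatches(
    expected: Dict[str, int], actual: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for tables whose counts differ."""
    return {
        table: (rows, actual.get(table))
        for table, rows in expected.items()
        if actual.get(table) != rows
    }


for table, sql in count_queries(EXPECTED_ROWS).items():
    print(f"{table}: {sql}")
```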

Requirements

  • Python >= 3.10
  • Apache Hive 2.x or later (tested with 4.0.1)
  • datus-agent >= 0.3.0
  • datus-sqlalchemy >= 0.1.0
  • pyhive >= 0.7.0

License

Apache License 2.0
