hbase-driver

Pure-Python native HBase client (no Thrift). This project implements core HBase regionserver and master RPCs so Python programs can perform table and metadata operations against an HBase cluster.

Status

  • Integration test status (local): 77 / 77 tests passing (2026-02-21) using the custom 3-node Docker cluster.

Quick Start

Get started with hbase-driver in just a few lines:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.scan import Scan

# Initialize client
config = {HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}
client = Client(config)

# Put and Get data
with client.get_table("default", "mytable") as table:
    table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    row = table.get(Get(b"row1"))
    print(row.get(b"cf", b"col"))  # b"value"

# Scan data
with client.get_table("default", "mytable") as table:
    with table.scan(Scan()) as scanner:
        for row in scanner:
            print(row.rowkey)

Complete Example

For a comprehensive example covering all major features, see complete_example.py, which demonstrates:

  • Basic CRUD operations (Put, Get, Delete, Scan)
  • Advanced features (Batch, CheckAndPut, Increment)
  • Filter usage for server-side filtering
  • DDL operations (Create, Disable, Enable, Delete, Truncate)
  • Connection management and resource cleanup
  • Cache invalidation after table modifications

Run the example:

python3 complete_example.py

Installation

pip install hbase-driver

Or for development:

git clone https://github.com/innovationb1ue/hbase-driver.git
cd hbase-driver
pip install -e .

Connection Management

hbase-driver provides context managers for automatic resource cleanup, similar to Java's try-with-resources:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put

# Using context manager - automatic cleanup
with Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}) as client:
    with client.get_table("default", "mytable") as table:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    # Table automatically closed here
# Client automatically closed here

# Manual cleanup
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
try:
    table = client.get_table("default", "mytable")
    try:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    finally:
        table.close()
finally:
    client.close()

Resource Classes Supporting Context Managers

  • Client - Main client connection
  • Table - Table operations
  • Admin - DDL operations
  • ResultScanner - Scan results iteration

Configuration

Configuration Options

Key                           | Default    | Description
hbase.zookeeper.quorum        | (required) | ZooKeeper quorum addresses (comma-separated)
zookeeper.znode.parent        | /hbase     | ZooKeeper parent znode
hbase.connection.pool.size    | 10         | Maximum connections per pool
hbase.connection.idle.timeout | 300        | Idle timeout in seconds
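
As a rough illustration of how these keys fit together, the sketch below merges user settings over the defaults listed in the table and checks the one required key. `build_config` and `DEFAULTS` are hypothetical names introduced here, not part of hbase-driver, and the driver presumably applies the same defaults on its own; the sketch only makes them explicit.

```python
# Hypothetical helper, not part of hbase-driver: make the documented
# defaults explicit and validate the required ZooKeeper quorum key.
DEFAULTS = {
    "zookeeper.znode.parent": "/hbase",
    "hbase.connection.pool.size": 10,
    "hbase.connection.idle.timeout": 300,  # seconds
}


def build_config(quorum_hosts, overrides=None):
    """Return a config dict with a comma-separated ZooKeeper quorum."""
    if not quorum_hosts:
        raise ValueError("hbase.zookeeper.quorum is required")
    config = dict(DEFAULTS)
    config["hbase.zookeeper.quorum"] = ",".join(quorum_hosts)
    config.update(overrides or {})
    return config


config = build_config(
    ["zk1:2181", "zk2:2181", "zk3:2181"],
    overrides={"hbase.connection.pool.size": 20},
)
print(config)
```

The resulting dict uses the same string keys as the table, so it could be passed to `Client(config)` in the same way as the string-literal configuration.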

Using Configuration Constants

The driver provides named constants for configuration keys that mirror HConstants in the HBase Java client:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

config = {
    HConstants.ZOOKEEPER_QUORUM: "localhost:2181",
    HConstants.CONNECTION_POOL_SIZE: 20,
    HConstants.CONNECTION_IDLE_TIMEOUT: 600
}
client = Client(config)

String Literals (Backward Compatible)

You can also use string literals directly:

config = {
    "hbase.zookeeper.quorum": "localhost:2181",
    "hbase.connection.pool.size": 20
}
client = Client(config)

API Reference

Client

Main entry point for HBase operations.

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})

# Get admin for DDL operations
admin = client.get_admin()

# Get table for data operations
table = client.get_table("default", "mytable")

# Close when done
client.close()

Table

Perform data operations on a specific table.

from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.delete import Delete
from hbasedriver.operations.scan import Scan

# Put data
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))

# Get data
row = table.get(Get(b"row1"))
if row:
    print(row.get(b"cf", b"col"))

# Delete data
table.delete(Delete(b"row1"))

# Scan data
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

Admin

Perform DDL operations.

from hbasedriver.model import ColumnFamilyDescriptor
from hbasedriver.table_name import TableName

# Create table
cfd = ColumnFamilyDescriptor(b"cf")
admin.create_table(TableName.value_of(b"mytable"), [cfd])

# Disable table
admin.disable_table(TableName.value_of(b"mytable"))

# Enable table
admin.enable_table(TableName.value_of(b"mytable"))

# Delete table (must be disabled first)
admin.disable_table(TableName.value_of(b"mytable"))
admin.delete_table(TableName.value_of(b"mytable"))

# List tables
tables = admin.list_tables()

Operations

Put

from hbasedriver.operations.put import Put

# Simple put
put = Put(b"row1").add_column(b"cf", b"col", b"value")
table.put(put)

# Multiple columns
put = Put(b"row2") \
    .add_column(b"cf", b"name", b"Alice") \
    .add_column(b"cf", b"age", b"30") \
    .add_column(b"info", b"email", b"alice@example.com")
table.put(put)

# With timestamp
import time
ts = int(time.time() * 1000)
put = Put(b"row3").add_column(b"cf", b"col", b"value", ts=ts)
table.put(put)

Get

import time

from hbasedriver.operations.get import Get

# Get entire row
row = table.get(Get(b"row1"))

# Get specific column
row = table.get(Get(b"row1").add_column(b"cf", b"col"))

# Get multiple columns
row = table.get(Get(b"row1")
    .add_column(b"cf", b"name")
    .add_column(b"cf", b"age"))

# With time range
start_ts = int((time.time() - 86400) * 1000)  # 24 hours ago
end_ts = int(time.time() * 1000)
row = table.get(Get(b"row1").set_time_range(start_ts, end_ts))

# Check existence only
exists = table.get(Get(b"row1").set_check_existence_only(True)) is not None
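
The timestamp and time-range examples pass milliseconds since the Unix epoch. When the bounds start out as `datetime` objects, a small conversion helper avoids unit mistakes. The sketch below uses only the standard library; `to_epoch_millis` and `last_hours_range` are hypothetical helper names, not driver APIs.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def last_hours_range(hours, now=None):
    """Return (start_ms, end_ms) covering the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    return to_epoch_millis(now - timedelta(hours=hours)), to_epoch_millis(now)


start_ts, end_ts = last_hours_range(24)
print(start_ts, end_ts)
```

The two values can then be used in place of `start_ts` and `end_ts` in the `set_time_range` call shown above.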

Scan

from hbasedriver.operations.scan import Scan

# Scan entire table
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with row key range
scan = Scan(start_row=b"row1", end_row=b"row9")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with specific columns
scan = Scan().add_column(b"cf", b"col")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.get(b"cf", b"col"))

# Scan with limit
scan = Scan().set_limit(100)
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Pagination: fetch one page at a time and resume after the last row seen
scan = Scan()
while True:
    rows, resume_key = table.scan_page(scan, page_size=100)
    for row in rows:
        print(row.rowkey)
    if not resume_key:
        break
    scan = Scan(start_row=resume_key, include_start_row=False)
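
The resume-key pattern can be tried without a cluster. The sketch below is an in-memory stand-in, not the driver's implementation: `ROWS` and this local `scan_page` are hypothetical, and it assumes that the resume key is the last row of the page and is falsy once no rows remain. It shows why the follow-up scan excludes the start row: including it would return the boundary row twice.

```python
from bisect import bisect_left, bisect_right

# In-memory stand-in for a table: row keys kept in lexicographic byte
# order, which is how HBase orders rows. Illustrative only.
ROWS = sorted(b"row%02d" % i for i in range(7))


def scan_page(start_row=b"", include_start_row=True, page_size=3):
    """Return (rows, resume_key); resume_key is None when no rows remain."""
    # bisect_right skips past start_row itself, giving an exclusive start.
    find = bisect_left if include_start_row else bisect_right
    begin = find(ROWS, start_row)
    page = ROWS[begin:begin + page_size]
    more = begin + page_size < len(ROWS)
    return page, (page[-1] if page and more else None)


seen = []
rows, resume_key = scan_page()
seen.extend(rows)
while resume_key:
    rows, resume_key = scan_page(resume_key, include_start_row=False)
    seen.extend(rows)

print(seen == ROWS)
```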

Batch Operations

from hbasedriver.operations.batch import BatchGet, BatchPut
from hbasedriver.operations.put import Put

# Batch get
bg = BatchGet([b"row1", b"row2", b"row3"])
bg.add_column(b"cf", b"col1")
results = table.batch_get(bg)

# Batch put
bp = BatchPut()
bp.add_put(Put(b"row1").add_column(b"cf", b"col1", b"value1"))
bp.add_put(Put(b"row2").add_column(b"cf", b"col1", b"value2"))
results = table.batch_put(bp)

# Context manager for batch
with table.batch(batch_size=1000) as batch:
    batch.put(b"row1", {b"cf:col1": b"value1"})
    batch.put(b"row2", {b"cf:col1": b"value2"})
    batch.delete(b"row3")
# All operations executed when exiting context
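
To make the "executed when exiting context" behaviour concrete, here is a minimal sketch of a buffering context manager. `BufferedBatch` is a hypothetical class written for illustration; whether the driver also flushes each time `batch_size` is reached, and what it does when the block raises, are assumptions here rather than documented behaviour.

```python
class BufferedBatch:
    """Collects mutations and hands them to `flush_fn` in chunks (illustrative)."""

    def __init__(self, flush_fn, batch_size=1000):
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._pending = []

    def put(self, row, columns):
        self._add(("put", row, columns))

    def delete(self, row):
        self._add(("delete", row, None))

    def _add(self, op):
        self._pending.append(op)
        # Assumption: a full buffer is sent immediately.
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._flush_fn(self._pending)
            self._pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Send the remainder only when the block finished without an error.
        if exc_type is None:
            self.flush()
        return False  # never swallow exceptions


sent = []  # each element is one chunk that would go to the server
with BufferedBatch(sent.append, batch_size=2) as batch:
    batch.put(b"row1", {b"cf:col1": b"value1"})
    batch.put(b"row2", {b"cf:col1": b"value2"})
    batch.delete(b"row3")

print(len(sent))
```

With `batch_size=2`, the two puts go out as one chunk as soon as the buffer fills, and the delete is sent when the `with` block exits.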

Check and Put

from hbasedriver.operations.increment import CheckAndPut
from hbasedriver.operations.put import Put

cap = CheckAndPut(b"row1")
cap.set_check(b"cf", b"lock", b"")  # Check if lock is empty
cap.set_put(Put(b"row1").add_column(b"cf", b"data", b"value"))
success = table.check_and_put(cap)
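
Check-and-put is a compare-and-set: the write happens only if the checked cell holds the expected value, and the comparison and write are applied as one atomic step. The sketch below models that on an in-memory row. `TinyRow` is a hypothetical stand-in, and treating a missing cell as equal to `b""` is an assumption made for this illustration, not a statement about the driver.

```python
import threading


class TinyRow:
    """In-memory stand-in for a single row; illustrative only."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def check_and_put(self, check_col, expected, put_col, value):
        """Write put_col only if check_col currently equals expected."""
        with self._lock:  # compare and write happen as one atomic step
            # Assumption: a missing cell compares equal to b"".
            if self._cells.get(check_col, b"") != expected:
                return False
            self._cells[put_col] = value
            return True

    def get(self, col):
        return self._cells.get(col)


row = TinyRow()
# The first worker sees an empty lock and claims it.
first = row.check_and_put(b"cf:lock", b"", b"cf:lock", b"worker-1")
# The second worker finds the lock taken, so its write is rejected.
second = row.check_and_put(b"cf:lock", b"", b"cf:lock", b"worker-2")
print(first, second, row.get(b"cf:lock"))
```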

Increment

from hbasedriver.operations.increment import Increment

inc = Increment(b"row1")
inc.add_column(b"cf", b"counter", 1)
new_value = table.increment(inc)
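
HBase stores counter cells as 8-byte big-endian signed integers, so a counter read back through a plain Get arrives as raw bytes rather than digits. The sketch below shows the conversion with the standard library; `encode_counter` and `decode_counter` are hypothetical helper names, and whether `table.increment` itself returns an `int` or raw bytes is not stated in this document.

```python
import struct


def encode_counter(value):
    """Encode an int the way HBase stores counters: 8 bytes, big-endian, signed."""
    return struct.pack(">q", value)


def decode_counter(raw):
    """Decode an 8-byte HBase counter cell into a Python int."""
    if len(raw) != 8:
        raise ValueError("counter cells are exactly 8 bytes, got %d" % len(raw))
    return struct.unpack(">q", raw)[0]


raw = encode_counter(42)
print(raw, decode_counter(raw))
```

One consequence is that a value written as text, such as `b"30"`, is not a valid counter cell; a counter column should be seeded with the 8-byte encoding or created by the first increment.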

Filters

The driver supports server-side filters:

from hbasedriver.filter import PrefixFilter, RowFilter
from hbasedriver.filter.compare_filter import CompareOperator
from hbasedriver.filter.binary_comparator import BinaryComparator
from hbasedriver.operations.scan import Scan

# Prefix filter
scan = Scan().set_filter(PrefixFilter(b"abc"))

# Row filter with comparison
scan = Scan().set_filter(
    RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1"))
)
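
A prefix match can also be expressed as a row-key range, which lets the scan start and stop at the matching rows. The stop row is the prefix with its last byte incremented. The sketch below computes it; `prefix_stop_row` is a hypothetical helper, and it assumes that an empty stop row means "scan to the end of the table", as it does in the Java client.

```python
def prefix_stop_row(prefix):
    """Return the smallest row key greater than every key starting with `prefix`."""
    # Trailing 0xff bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""  # assumption: empty stop row means "to the end of the table"
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


keys = [b"abb", b"abc", b"abc1", b"abc\xff", b"abd", b"b"]
stop = prefix_stop_row(b"abc")
in_range = [k for k in keys if b"abc" <= k < stop]
print(stop, in_range)
```

The result could be combined with the range form shown earlier, for example `Scan(start_row=b"abc", end_row=prefix_stop_row(b"abc"))`, either on its own or together with `PrefixFilter`.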

Java Compatibility

This driver is designed to be familiar to HBase Java developers. Here's a quick comparison:

Java Driver                                                        | Python Driver
Connection connection = ConnectionFactory.createConnection(config) | client = Client(config)
Table table = connection.getTable(TableName.valueOf("mytable"))    | table = client.get_table("default", "mytable")
table.put(new Put(Bytes.toBytes("row1"))...                        | table.put(Put(b"row1")...
Result result = table.get(new Get(Bytes.toBytes("row1"))...        | row = table.get(Get(b"row1")...
try (ResultScanner scanner = table.scan(...))                      | with table.scan(...) as scanner:
try (Connection conn = ...)                                        | with Client(config) as client:

Configuration Constants

Java's HConstants are available as hbase_constants.HConstants:

Java                              | Python
HConstants.ZOOKEEPER_QUORUM       | HConstants.ZOOKEEPER_QUORUM
HConstants.ZOOKEEPER_ZNODE_PARENT | HConstants.ZOOKEEPER_ZNODE_PARENT
(Custom connection pool size)     | HConstants.CONNECTION_POOL_SIZE
(Custom idle timeout)             | HConstants.CONNECTION_IDLE_TIMEOUT

Development

Quickstart (3-node Docker dev environment)

Prerequisites: Docker and docker-compose installed.

  1. Build and start the custom 3-node cluster, then run the full test suite:
./scripts/run_tests_3node.sh
  2. Run tests against an already-running cluster (fast):
./scripts/run_tests_3node.sh --no-start
  3. Run a single test file or case:
./scripts/run_tests_3node.sh test/test_scan.py
./scripts/run_tests_3node.sh test/test_scan.py::test_scan

Legacy single-node dev workflow (still available):

./scripts/run_tests_docker.sh

See TEST_GUIDE.md and DEV_ENV.md for full documentation and troubleshooting steps.

License

Apache License 2.0
