
hbase-driver


Pure-Python native HBase client (no Thrift). This project implements the core HBase RegionServer and Master RPCs so Python programs can perform table and metadata operations against an HBase cluster.

Status

  • Integration tests (local): 238 tests passing, covering all major features (2026-03-15).

Quick Start

Get started with hbase-driver in just a few lines:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.scan import Scan

# Initialize client
config = {HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}
client = Client(config)

# Put and Get data
with client.get_table("default", "mytable") as table:
    table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    row = table.get(Get(b"row1"))
    print(row.get(b"cf", b"col"))  # b"value"

# Scan data
with client.get_table("default", "mytable") as table:
    with table.scan(Scan()) as scanner:
        for row in scanner:
            print(row.rowkey)

Complete Example

For a comprehensive example covering all major features, see complete_example.py, which demonstrates:

  • Basic CRUD operations (Put, Get, Delete, Scan)
  • Advanced features (Batch, CheckAndPut, Increment)
  • Filter usage for server-side filtering
  • DDL operations (Create, Disable, Enable, Delete, Truncate)
  • Connection management and resource cleanup
  • Cache invalidation after table modifications

Run the example:

python3 complete_example.py

Installation

pip install hbase-driver

Or for development:

git clone https://github.com/innovationb1ue/hbase-driver.git
cd hbase-driver
pip install -e .

Connection Management

The hbase-driver provides context managers for automatic resource cleanup, similar to Java's try-with-resources:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

# Using context manager - automatic cleanup
with Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}) as client:
    with client.get_table("default", "mytable") as table:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    # Table automatically closed here
# Client automatically closed here

# Manual cleanup
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
try:
    table = client.get_table("default", "mytable")
    try:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    finally:
        table.close()
finally:
    client.close()

Resource Classes Supporting Context Managers

  • Client - Main client connection
  • Table - Table operations
  • Admin - DDL operations
  • ResultScanner - Scan results iteration

Configuration

Configuration Options

  • hbase.zookeeper.quorum (required): ZooKeeper quorum addresses (comma-separated)
  • zookeeper.znode.parent (default: /hbase): ZooKeeper parent znode
  • hbase.connection.pool.size (default: 10): Maximum connections per pool
  • hbase.connection.idle.timeout (default: 300): Idle timeout in seconds
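As a quick illustration, the options above go into a plain dict. The quorum hosts below are placeholders, not defaults; the other values are the documented defaults:

```python
# Illustrative configuration using the string keys from the table above.
# Replace the quorum hosts with your own cluster's addresses.
config = {
    "hbase.zookeeper.quorum": "zk1:2181,zk2:2181,zk3:2181",  # required, comma-separated
    "zookeeper.znode.parent": "/hbase",                      # default shown
    "hbase.connection.pool.size": 10,                        # default shown
    "hbase.connection.idle.timeout": 300,                    # seconds, default shown
}
```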

Using Configuration Constants

The driver provides named constants for configuration keys, compatible with HBase Java driver's HConstants:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

config = {
    HConstants.ZOOKEEPER_QUORUM: "localhost:2181",
    HConstants.CONNECTION_POOL_SIZE: 20,
    HConstants.CONNECTION_IDLE_TIMEOUT: 600
}
client = Client(config)

String Literals (Backward Compatible)

You can also use string literals directly:

config = {
    "hbase.zookeeper.quorum": "localhost:2181",
    "hbase.connection.pool.size": 20
}
client = Client(config)

API Reference

Client

Main entry point for HBase operations.

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})

# Get admin for DDL operations
admin = client.get_admin()

# Get table for data operations
table = client.get_table("default", "mytable")

# Close when done
client.close()

Table

Perform data operations on a specific table.

from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.delete import Delete
from hbasedriver.operations.scan import Scan

# Put data
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))

# Get data
row = table.get(Get(b"row1"))
if row:
    print(row.get(b"cf", b"col"))

# Delete data
table.delete(Delete(b"row1"))

# Scan data
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

Admin

Perform DDL operations.

from hbasedriver.model import ColumnFamilyDescriptor
from hbasedriver.table_name import TableName

# Create table
cfd = ColumnFamilyDescriptor(b"cf")
admin.create_table(TableName.value_of(b"mytable"), [cfd])

# Disable table
admin.disable_table(TableName.value_of(b"mytable"))

# Enable table
admin.enable_table(TableName.value_of(b"mytable"))

# Delete table (must be disabled first)
admin.disable_table(TableName.value_of(b"mytable"))
admin.delete_table(TableName.value_of(b"mytable"))

# List tables
tables = admin.list_tables()

Operations

Put

from hbasedriver.operations.put import Put

# Simple put
put = Put(b"row1").add_column(b"cf", b"col", b"value")
table.put(put)

# Multiple columns
put = Put(b"row2") \
    .add_column(b"cf", b"name", b"Alice") \
    .add_column(b"cf", b"age", b"30") \
    .add_column(b"info", b"email", b"alice@example.com")
table.put(put)

# With timestamp
import time
ts = int(time.time() * 1000)
put = Put(b"row3").add_column(b"cf", b"col", b"value", ts=ts)
table.put(put)

Get

from hbasedriver.operations.get import Get

# Get entire row
row = table.get(Get(b"row1"))

# Get specific column
row = table.get(Get(b"row1").add_column(b"cf", b"col"))

# Get multiple columns
row = table.get(Get(b"row1")
    .add_column(b"cf", b"name")
    .add_column(b"cf", b"age"))

# With time range (timestamps in milliseconds since epoch)
import time
start_ts = int((time.time() - 86400) * 1000)  # 24 hours ago
end_ts = int(time.time() * 1000)
row = table.get(Get(b"row1").set_time_range(start_ts, end_ts))

# Check existence only
exists = table.get(Get(b"row1").set_check_existence_only(True)) is not None

Scan

from hbasedriver.operations.scan import Scan

# Scan entire table
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with row key range
scan = Scan(start_row=b"row1", end_row=b"row9")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)
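Row key ranges are compared lexicographically on raw bytes, with start_row inclusive and end_row exclusive (the usual HBase convention, assumed here). To scan every row sharing a prefix without a filter, you can compute the smallest key sorting after all keys with that prefix; prefix_stop_key below is an illustrative helper, not part of the driver:

```python
def prefix_stop_key(prefix: bytes) -> bytes:
    # Increment the last byte that is not 0xFF, so the result is the first
    # key lexicographically after every key starting with `prefix`.
    b = bytearray(prefix)
    while b and b[-1] == 0xFF:
        b.pop()
    if not b:
        return b""  # prefix was all 0xFF bytes: scan to end of table
    b[-1] += 1
    return bytes(b)

# e.g. prefix_stop_key(b"row") == b"rox", so
# Scan(start_row=b"row", end_row=b"rox") covers every b"row..." key
```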

# Scan with specific columns
scan = Scan().add_column(b"cf", b"col")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.get(b"cf", b"col"))

# Scan with limit
scan = Scan().set_limit(100)
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Pagination: process each page, then resume from the returned key
rows, resume_key = table.scan_page(Scan(), page_size=100)
while True:
    for row in rows:
        print(row.rowkey)
    if not resume_key:
        break
    scan = Scan(start_row=resume_key, include_start_row=False)
    rows, resume_key = table.scan_page(scan, page_size=100)

Batch Operations

from hbasedriver.operations.batch import BatchGet, BatchPut

# Batch get
bg = BatchGet([b"row1", b"row2", b"row3"])
bg.add_column(b"cf", b"col1")
results = table.batch_get(bg)

# Batch put
bp = BatchPut()
bp.add_put(Put(b"row1").add_column(b"cf", b"col1", b"value1"))
bp.add_put(Put(b"row2").add_column(b"cf", b"col1", b"value2"))
results = table.batch_put(bp)

# Context manager for batch
with table.batch(batch_size=1000) as batch:
    batch.put(b"row1", {b"cf:col1": b"value1"})
    batch.put(b"row2", {b"cf:col1": b"value2"})
    batch.delete(b"row3")
# All operations executed when exiting context

Check and Put

from hbasedriver.operations.increment import CheckAndPut

cap = CheckAndPut(b"row1")
cap.set_check(b"cf", b"lock", b"")  # Check if lock is empty
cap.set_put(Put(b"row1").add_column(b"cf", b"data", b"value"))
success = table.check_and_put(cap)

Increment

from hbasedriver.operations.increment import Increment

inc = Increment(b"row1")
inc.add_column(b"cf", b"counter", 1)
new_value = table.increment(inc)
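HBase stores counter cells as 8-byte big-endian signed longs, so a counter read back through a plain Get arrives as raw bytes. A minimal decoding sketch (decode_counter and encode_counter are illustrative helpers, not part of the driver):

```python
import struct

def decode_counter(raw: bytes) -> int:
    # HBase counters are 8-byte big-endian signed longs
    return struct.unpack(">q", raw)[0]

def encode_counter(value: int) -> bytes:
    return struct.pack(">q", value)

# round-trip: decode_counter(encode_counter(42)) == 42
```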

Append

from hbasedriver.operations.append import Append

append = Append(b"row1")
append.add_column(b"cf", b"tags", b",new_tag")
result = table.append(append)
new_value = result.get(b"cf", b"tags")

Check and Delete

from hbasedriver.operations.delete import Delete

delete = Delete(b"row1").add_column(b"cf", b"col")
success = table.check_and_delete(
    b"row1", b"cf", b"lock", b"", delete  # Delete if lock is empty
)

RowMutations (Atomic Multi-Mutation)

from hbasedriver.operations import RowMutations, Put, Delete

rm = RowMutations(b"row1")
rm.add(Put(b"row1").add_column(b"cf", b"status", b"active"))
rm.add(Delete(b"row1").add_column(b"cf", b"old_field"))
success = table.mutate_row(rm)  # All mutations applied atomically

Exists

# Check if row exists
exists = table.exists(Get(b"row1"))

# Check if specific column exists
exists = table.exists(Get(b"row1").add_column(b"cf", b"col"))

# Check multiple rows at once
results = table.exists_all([Get(b"row1"), Get(b"row2"), Get(b"row3")])

BufferedMutator (Efficient Bulk Writes)

from hbasedriver.client.buffered_mutator import BufferedMutatorParams

# Using context manager - auto-flush on close
params = BufferedMutatorParams(write_buffer_size=2*1024*1024)  # 2MB buffer
with client.get_buffered_mutator(b"default", b"mytable", params) as mutator:
    for i in range(10000):
        mutator.mutate(Put(f"row{i}".encode()).add_column(b"cf", b"data", f"value{i}".encode()))
    # Auto-flushes when exiting context

# Manual flush control
mutator = client.get_buffered_mutator(b"default", b"mytable")
mutator.mutate(Put(b"row1").add_column(b"cf", b"col", b"value"))
mutator.flush()  # Explicit flush
mutator.close()

Filters

The driver supports server-side filters:

from hbasedriver.filter import PrefixFilter, RowFilter
from hbasedriver.filter.compare_filter import CompareOperator
from hbasedriver.filter.binary_comparator import BinaryComparator

# Prefix filter
scan = Scan().set_filter(PrefixFilter(b"abc"))

# Row filter with comparison
scan = Scan().set_filter(
    RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1"))
)

Java Compatibility

This driver is designed to be familiar to HBase Java developers. Here's a quick comparison:

Java Driver → Python Driver

  • Connection connection = ConnectionFactory.createConnection(config) → client = Client(config)
  • Table table = connection.getTable(TableName.valueOf("mytable")) → table = client.get_table("default", "mytable")
  • table.put(new Put(Bytes.toBytes("row1"))... → table.put(Put(b"row1")...
  • Result result = table.get(new Get(Bytes.toBytes("row1"))... → row = table.get(Get(b"row1")...
  • try (ResultScanner scanner = table.scan(...)) → with table.scan(...) as scanner:
  • try (Connection conn = ...) → with Client(config) as client:

Configuration Constants

Java's HConstants are available as hbase_constants.HConstants:

  • HConstants.ZOOKEEPER_QUORUM (same name as the Java constant)
  • HConstants.ZOOKEEPER_ZNODE_PARENT (same name as the Java constant)
  • HConstants.CONNECTION_POOL_SIZE (driver-specific; custom connection pool size)
  • HConstants.CONNECTION_IDLE_TIMEOUT (driver-specific; custom idle timeout)

Development

Quickstart (3-node Docker dev environment)

Prerequisites: Docker and docker-compose installed.

  1. Build and start the custom 3-node cluster, then run the full test suite:
./scripts/run_tests_3node.sh
  2. Run tests against an already-running cluster (fast):
./scripts/run_tests_3node.sh --no-start
  3. Run a single test file or case:
./scripts/run_tests_3node.sh test/test_scan.py
./scripts/run_tests_3node.sh test/test_scan.py::test_scan

Legacy single-node dev workflow (still available):

./scripts/run_tests_docker.sh

See TEST_GUIDE.md and DEV_ENV.md for full documentation and troubleshooting steps.

License

Apache License 2.0
