
hbase-driver


Pure-Python native HBase client (no Thrift). This project implements the core HBase RegionServer and Master RPCs so Python programs can perform table and metadata operations against an HBase cluster.

Status

  • Integration tests (local): 238 tests passing, covering all major features (2026-03-15).

Quick Start

Get started with hbase-driver in just a few lines:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.scan import Scan

# Initialize client
config = {HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}
client = Client(config)

# Put and Get data
with client.get_table("default", "mytable") as table:
    table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    row = table.get(Get(b"row1"))
    print(row.get(b"cf", b"col"))  # b"value"

# Scan data
with client.get_table("default", "mytable") as table:
    with table.scan(Scan()) as scanner:
        for row in scanner:
            print(row.rowkey)

Complete Example

For a comprehensive example covering all major features, see complete_example.py, which demonstrates:

  • Basic CRUD operations (Put, Get, Delete, Scan)
  • Advanced features (Batch, CheckAndPut, Increment)
  • Filter usage for server-side filtering
  • DDL operations (Create, Disable, Enable, Delete, Truncate)
  • Connection management and resource cleanup
  • Cache invalidation after table modifications

Run the example:

python3 complete_example.py

Installation

pip install hbase-driver

Or for development:

git clone https://github.com/innovationb1ue/hbase-driver.git
cd hbase-driver
pip install -e .

Connection Management

The hbase-driver provides context managers for automatic resource cleanup, similar to Java's try-with-resources:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

# Using context manager - automatic cleanup
with Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}) as client:
    with client.get_table("default", "mytable") as table:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    # Table automatically closed here
# Client automatically closed here

# Manual cleanup
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
try:
    table = client.get_table("default", "mytable")
    try:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    finally:
        table.close()
finally:
    client.close()

Resource Classes Supporting Context Managers

  • Client - Main client connection
  • Table - Table operations
  • Admin - DDL operations
  • ResultScanner - Scan results iteration

Configuration

Configuration Options

  • hbase.zookeeper.quorum (required): ZooKeeper quorum addresses (comma-separated)
  • zookeeper.znode.parent (default: /hbase): ZooKeeper parent znode
  • hbase.connection.pool.size (default: 10): Maximum connections per pool
  • hbase.connection.idle.timeout (default: 300): Idle timeout in seconds
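As a quick illustration, the options above go into a plain dict. The quorum hosts below are placeholders, not defaults; the other values are the documented defaults:

```python
# Illustrative configuration using the string keys from the table above.
# Replace the quorum hosts with your own cluster's addresses.
config = {
    "hbase.zookeeper.quorum": "zk1:2181,zk2:2181,zk3:2181",  # required, comma-separated
    "zookeeper.znode.parent": "/hbase",                      # default shown
    "hbase.connection.pool.size": 10,                        # default shown
    "hbase.connection.idle.timeout": 300,                    # seconds, default shown
}
```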

Using Configuration Constants

The driver provides named constants for configuration keys, compatible with HBase Java driver's HConstants:

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

config = {
    HConstants.ZOOKEEPER_QUORUM: "localhost:2181",
    HConstants.CONNECTION_POOL_SIZE: 20,
    HConstants.CONNECTION_IDLE_TIMEOUT: 600
}
client = Client(config)

String Literals (Backward Compatible)

You can also use string literals directly:

config = {
    "hbase.zookeeper.quorum": "localhost:2181",
    "hbase.connection.pool.size": 20
}
client = Client(config)

API Reference

Client

Main entry point for HBase operations.

from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})

# Get admin for DDL operations
admin = client.get_admin()

# Get table for data operations
table = client.get_table("default", "mytable")

# Close when done
client.close()

Table

Perform data operations on a specific table.

from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.delete import Delete
from hbasedriver.operations.scan import Scan

# Put data
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))

# Get data
row = table.get(Get(b"row1"))
if row:
    print(row.get(b"cf", b"col"))

# Delete data
table.delete(Delete(b"row1"))

# Scan data
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

Admin

Perform DDL operations.

from hbasedriver.model import ColumnFamilyDescriptor
from hbasedriver.table_name import TableName

# Create table
cfd = ColumnFamilyDescriptor(b"cf")
admin.create_table(TableName.value_of(b"mytable"), [cfd])

# Disable table
admin.disable_table(TableName.value_of(b"mytable"))

# Enable table
admin.enable_table(TableName.value_of(b"mytable"))

# Delete table (must be disabled first)
admin.disable_table(TableName.value_of(b"mytable"))
admin.delete_table(TableName.value_of(b"mytable"))

# List tables
tables = admin.list_tables()

Operations

Put

from hbasedriver.operations.put import Put

# Simple put
put = Put(b"row1").add_column(b"cf", b"col", b"value")
table.put(put)

# Multiple columns
put = Put(b"row2") \
    .add_column(b"cf", b"name", b"Alice") \
    .add_column(b"cf", b"age", b"30") \
    .add_column(b"info", b"email", b"alice@example.com")
table.put(put)

# With timestamp
import time
ts = int(time.time() * 1000)
put = Put(b"row3").add_column(b"cf", b"col", b"value", ts=ts)
table.put(put)

Get

from hbasedriver.operations.get import Get

# Get entire row
row = table.get(Get(b"row1"))

# Get specific column
row = table.get(Get(b"row1").add_column(b"cf", b"col"))

# Get multiple columns
row = table.get(Get(b"row1")
    .add_column(b"cf", b"name")
    .add_column(b"cf", b"age"))

# With time range (timestamps in milliseconds since epoch)
import time
start_ts = int((time.time() - 86400) * 1000)  # 24 hours ago
end_ts = int(time.time() * 1000)
row = table.get(Get(b"row1").set_time_range(start_ts, end_ts))

# Check existence only
exists = table.get(Get(b"row1").set_check_existence_only(True)) is not None

Scan

from hbasedriver.operations.scan import Scan

# Scan entire table
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with row key range
scan = Scan(start_row=b"row1", end_row=b"row9")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)
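Row key ranges are compared lexicographically on raw bytes, with start_row inclusive and end_row exclusive (the usual HBase convention, assumed here). To scan every row sharing a prefix without a filter, you can compute the smallest key sorting after all keys with that prefix; prefix_stop_key below is an illustrative helper, not part of the driver:

```python
def prefix_stop_key(prefix: bytes) -> bytes:
    # Increment the last byte that is not 0xFF, so the result is the first
    # key lexicographically after every key starting with `prefix`.
    b = bytearray(prefix)
    while b and b[-1] == 0xFF:
        b.pop()
    if not b:
        return b""  # prefix was all 0xFF bytes: scan to end of table
    b[-1] += 1
    return bytes(b)

# e.g. prefix_stop_key(b"row") == b"rox", so
# Scan(start_row=b"row", end_row=b"rox") covers every b"row..." key
```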

# Scan with specific columns
scan = Scan().add_column(b"cf", b"col")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.get(b"cf", b"col"))

# Scan with limit
scan = Scan().set_limit(100)
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Pagination: process each page, then resume from the returned key
rows, resume_key = table.scan_page(Scan(), page_size=100)
while True:
    for row in rows:
        print(row.rowkey)
    if not resume_key:
        break
    scan = Scan(start_row=resume_key, include_start_row=False)
    rows, resume_key = table.scan_page(scan, page_size=100)

Batch Operations

from hbasedriver.operations.batch import BatchGet, BatchPut

# Batch get
bg = BatchGet([b"row1", b"row2", b"row3"])
bg.add_column(b"cf", b"col1")
results = table.batch_get(bg)

# Batch put
bp = BatchPut()
bp.add_put(Put(b"row1").add_column(b"cf", b"col1", b"value1"))
bp.add_put(Put(b"row2").add_column(b"cf", b"col1", b"value2"))
results = table.batch_put(bp)

# Context manager for batch
with table.batch(batch_size=1000) as batch:
    batch.put(b"row1", {b"cf:col1": b"value1"})
    batch.put(b"row2", {b"cf:col1": b"value2"})
    batch.delete(b"row3")
# All operations executed when exiting context

Check and Put

from hbasedriver.operations.increment import CheckAndPut

cap = CheckAndPut(b"row1")
cap.set_check(b"cf", b"lock", b"")  # Check if lock is empty
cap.set_put(Put(b"row1").add_column(b"cf", b"data", b"value"))
success = table.check_and_put(cap)

Increment

from hbasedriver.operations.increment import Increment

inc = Increment(b"row1")
inc.add_column(b"cf", b"counter", 1)
new_value = table.increment(inc)
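HBase stores counter cells as 8-byte big-endian signed longs, so a counter read back through a plain Get arrives as raw bytes. A minimal decoding sketch (decode_counter and encode_counter are illustrative helpers, not part of the driver):

```python
import struct

def decode_counter(raw: bytes) -> int:
    # HBase counters are 8-byte big-endian signed longs
    return struct.unpack(">q", raw)[0]

def encode_counter(value: int) -> bytes:
    return struct.pack(">q", value)

# round-trip: decode_counter(encode_counter(42)) == 42
```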

Append

from hbasedriver.operations.append import Append

append = Append(b"row1")
append.add_column(b"cf", b"tags", b",new_tag")
result = table.append(append)
new_value = result.get(b"cf", b"tags")

Check and Delete

from hbasedriver.operations.delete import Delete

delete = Delete(b"row1").add_column(b"cf", b"col")
success = table.check_and_delete(
    b"row1", b"cf", b"lock", b"", delete  # Delete if lock is empty
)

RowMutations (Atomic Multi-Mutation)

from hbasedriver.operations import RowMutations, Put, Delete

rm = RowMutations(b"row1")
rm.add(Put(b"row1").add_column(b"cf", b"status", b"active"))
rm.add(Delete(b"row1").add_column(b"cf", b"old_field"))
success = table.mutate_row(rm)  # All mutations applied atomically

Exists

# Check if row exists
exists = table.exists(Get(b"row1"))

# Check if specific column exists
exists = table.exists(Get(b"row1").add_column(b"cf", b"col"))

# Check multiple rows at once
results = table.exists_all([Get(b"row1"), Get(b"row2"), Get(b"row3")])

BufferedMutator (Efficient Bulk Writes)

from hbasedriver.client.buffered_mutator import BufferedMutatorParams

# Using context manager - auto-flush on close
params = BufferedMutatorParams(write_buffer_size=2*1024*1024)  # 2MB buffer
with client.get_buffered_mutator(b"default", b"mytable", params) as mutator:
    for i in range(10000):
        mutator.mutate(Put(f"row{i}".encode()).add_column(b"cf", b"data", f"value{i}".encode()))
    # Auto-flushes when exiting context

# Manual flush control
mutator = client.get_buffered_mutator(b"default", b"mytable")
mutator.mutate(Put(b"row1").add_column(b"cf", b"col", b"value"))
mutator.flush()  # Explicit flush
mutator.close()

Filters

The driver supports server-side filters:

from hbasedriver.filter import PrefixFilter, RowFilter
from hbasedriver.filter.compare_filter import CompareOperator
from hbasedriver.filter.binary_comparator import BinaryComparator

# Prefix filter
scan = Scan().set_filter(PrefixFilter(b"abc"))

# Row filter with comparison
scan = Scan().set_filter(
    RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1"))
)

Java Compatibility

This driver is designed to be familiar to HBase Java developers. Here's a quick comparison:

Java Driver → Python Driver

  • Connection connection = ConnectionFactory.createConnection(config) → client = Client(config)
  • Table table = connection.getTable(TableName.valueOf("mytable")) → table = client.get_table("default", "mytable")
  • table.put(new Put(Bytes.toBytes("row1"))... → table.put(Put(b"row1")...
  • Result result = table.get(new Get(Bytes.toBytes("row1"))... → row = table.get(Get(b"row1")...
  • try (ResultScanner scanner = table.scan(...)) → with table.scan(...) as scanner:
  • try (Connection conn = ...) → with Client(config) as client:

Configuration Constants

Java's HConstants are available as hbase_constants.HConstants:

  • HConstants.ZOOKEEPER_QUORUM (same name as the Java constant)
  • HConstants.ZOOKEEPER_ZNODE_PARENT (same name as the Java constant)
  • HConstants.CONNECTION_POOL_SIZE (driver-specific; custom connection pool size)
  • HConstants.CONNECTION_IDLE_TIMEOUT (driver-specific; custom idle timeout)

Development

Quickstart (3-node Docker dev environment)

Prerequisites: Docker and docker-compose installed.

  1. Build and start the custom 3-node cluster, then run the full test suite:
./scripts/run_tests_3node.sh
  2. Run tests against an already-running cluster (fast):
./scripts/run_tests_3node.sh --no-start
  3. Run a single test file or case:
./scripts/run_tests_3node.sh test/test_scan.py
./scripts/run_tests_3node.sh test/test_scan.py::test_scan

Legacy single-node dev workflow (still available):

./scripts/run_tests_docker.sh

See TEST_GUIDE.md and DEV_ENV.md for full documentation and troubleshooting steps.

License

Apache License 2.0
