hbase-driver
Pure-Python native HBase client (no Thrift). This project implements core HBase regionserver and master RPCs so Python programs can perform table and metadata operations against an HBase cluster.
Status
- Integration test status (local): 238 tests with comprehensive feature coverage (as of 2026-03-15).
Quick Start
Get started with hbase-driver in just a few lines:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.scan import Scan

# Initialize client
config = {HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}
client = Client(config)

# Put and Get data
with client.get_table("default", "mytable") as table:
    table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    row = table.get(Get(b"row1"))
    print(row.get(b"cf", b"col"))  # b"value"

# Scan data
with client.get_table("default", "mytable") as table:
    with table.scan(Scan()) as scanner:
        for row in scanner:
            print(row.rowkey)
```
Complete Example
For a comprehensive example covering all major features, see complete_example.py, which demonstrates:
- Basic CRUD operations (Put, Get, Delete, Scan)
- Advanced features (Batch, CheckAndPut, Increment)
- Filter usage for server-side filtering
- DDL operations (Create, Disable, Enable, Delete, Truncate)
- Connection management and resource cleanup
- Cache invalidation after table modifications
Run the example:
```bash
python3 complete_example.py
```
Installation
```bash
pip install hbase-driver
```
Or for development:
```bash
git clone https://github.com/innovationb1ue/hbase-driver.git
cd hbase-driver
pip install -e .
```
Connection Management
The hbase-driver provides context managers for automatic resource cleanup, similar to Java's try-with-resources:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants
from hbasedriver.operations.put import Put

# Using context manager - automatic cleanup
with Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"}) as client:
    with client.get_table("default", "mytable") as table:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    # Table automatically closed here
# Client automatically closed here

# Manual cleanup
client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})
try:
    table = client.get_table("default", "mytable")
    try:
        # Do operations
        table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))
    finally:
        table.close()
finally:
    client.close()
```
Resource Classes Supporting Context Managers
- `Client` - Main client connection
- `Table` - Table operations
- `Admin` - DDL operations
- `ResultScanner` - Scan results iteration
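The automatic-cleanup behavior these classes share is Python's standard context-manager protocol. The sketch below is purely illustrative (a generic resource class, not the driver's actual implementation): any object exposing `__enter__`/`__exit__` gets the same guaranteed `close()` on exit, even when the body raises.

```python
class ManagedResource:
    """Generic sketch of the context-manager protocol that
    Client, Table, Admin, and ResultScanner follow."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The object returned here is bound by "with ... as name"
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit AND when the body raised an exception
        self.close()
        return False  # do not swallow exceptions

with ManagedResource() as res:
    pass
# res.closed is now True

try:
    with ManagedResource() as res2:
        raise ValueError("boom")
except ValueError:
    pass
# res2.closed is True as well: cleanup ran despite the exception
```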
Configuration
Configuration Options
| Key | Default | Description |
|---|---|---|
| hbase.zookeeper.quorum | Required | ZooKeeper quorum addresses (comma-separated) |
| zookeeper.znode.parent | /hbase | ZooKeeper parent znode |
| hbase.connection.pool.size | 10 | Maximum connections per pool |
| hbase.connection.idle.timeout | 300 | Idle timeout in seconds |
Using Configuration Constants
The driver provides named constants for configuration keys, compatible with HBase Java driver's HConstants:
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

config = {
    HConstants.ZOOKEEPER_QUORUM: "localhost:2181",
    HConstants.CONNECTION_POOL_SIZE: 20,
    HConstants.CONNECTION_IDLE_TIMEOUT: 600,
}
client = Client(config)
```
String Literals (Backward Compatible)
You can also use string literals directly:
```python
config = {
    "hbase.zookeeper.quorum": "localhost:2181",
    "hbase.connection.pool.size": 20,
}
client = Client(config)
```
API Reference
Client
Main entry point for HBase operations.
```python
from hbasedriver.client.client import Client
from hbasedriver.hbase_constants import HConstants

client = Client({HConstants.ZOOKEEPER_QUORUM: "localhost:2181"})

# Get admin for DDL operations
admin = client.get_admin()

# Get table for data operations
table = client.get_table("default", "mytable")

# Close when done
client.close()
```
Table
Perform data operations on a specific table.
```python
from hbasedriver.operations.put import Put
from hbasedriver.operations.get import Get
from hbasedriver.operations.delete import Delete
from hbasedriver.operations.scan import Scan

# Put data
table.put(Put(b"row1").add_column(b"cf", b"col", b"value"))

# Get data
row = table.get(Get(b"row1"))
if row:
    print(row.get(b"cf", b"col"))

# Delete data
table.delete(Delete(b"row1"))

# Scan data
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)
```
Admin
Perform DDL operations.
```python
from hbasedriver.model import ColumnFamilyDescriptor
from hbasedriver.table_name import TableName

# Create table
cfd = ColumnFamilyDescriptor(b"cf")
admin.create_table(TableName.value_of(b"mytable"), [cfd])

# Disable table
admin.disable_table(TableName.value_of(b"mytable"))

# Enable table
admin.enable_table(TableName.value_of(b"mytable"))

# Delete table (must be disabled first)
admin.disable_table(TableName.value_of(b"mytable"))
admin.delete_table(TableName.value_of(b"mytable"))

# List tables
tables = admin.list_tables()
```
Operations
Put
```python
from hbasedriver.operations.put import Put

# Simple put
put = Put(b"row1").add_column(b"cf", b"col", b"value")
table.put(put)

# Multiple columns
put = Put(b"row2") \
    .add_column(b"cf", b"name", b"Alice") \
    .add_column(b"cf", b"age", b"30") \
    .add_column(b"info", b"email", b"alice@example.com")
table.put(put)

# With timestamp
import time
ts = int(time.time() * 1000)
put = Put(b"row3").add_column(b"cf", b"col", b"value", ts=ts)
table.put(put)
```
Get
```python
import time

from hbasedriver.operations.get import Get

# Get entire row
row = table.get(Get(b"row1"))

# Get specific column
row = table.get(Get(b"row1").add_column(b"cf", b"col"))

# Get multiple columns
row = table.get(Get(b"row1")
                .add_column(b"cf", b"name")
                .add_column(b"cf", b"age"))

# With time range
start_ts = int((time.time() - 86400) * 1000)  # 24 hours ago
end_ts = int(time.time() * 1000)
row = table.get(Get(b"row1").set_time_range(start_ts, end_ts))

# Check existence only
exists = table.get(Get(b"row1").set_check_existence_only(True)) is not None
```
Scan
```python
from hbasedriver.operations.scan import Scan

# Scan entire table
with table.scan(Scan()) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with row key range
scan = Scan(start_row=b"row1", end_row=b"row9")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Scan with specific columns
scan = Scan().add_column(b"cf", b"col")
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.get(b"cf", b"col"))

# Scan with limit
scan = Scan().set_limit(100)
with table.scan(scan) as scanner:
    for row in scanner:
        print(row.rowkey)

# Pagination
scan = Scan()
rows, resume_key = table.scan_page(scan, page_size=100)
while True:
    # ... process this page of rows ...
    if not resume_key:
        break
    scan = Scan(start_row=resume_key, include_start_row=False)
    rows, resume_key = table.scan_page(scan, page_size=100)
```
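The resume-key pattern works the same way against any sorted, paged data source. Below is a self-contained sketch with a hypothetical in-memory stand-in for `scan_page` (the real method talks to the regionservers); it shows why excluding the start row on resumed pages avoids re-reading the last key of the previous page.

```python
# Hypothetical in-memory stand-in: 250 sorted row keys.
data = sorted(f"row{i:03d}".encode() for i in range(250))

def scan_page(start_row=None, include_start_row=True, page_size=100):
    """Return (rows, resume_key); resume_key is None once exhausted."""
    remaining = [k for k in data
                 if start_row is None
                 or k > start_row
                 or (include_start_row and k == start_row)]
    page = remaining[:page_size]
    # Resume from the last key of this page if more rows remain.
    resume = page[-1] if len(remaining) > page_size else None
    return page, resume

all_rows = []
rows, resume_key = scan_page(page_size=100)
all_rows.extend(rows)
while resume_key:
    # include_start_row=False: the resume key itself was already read.
    rows, resume_key = scan_page(start_row=resume_key,
                                 include_start_row=False, page_size=100)
    all_rows.extend(rows)
# all_rows now holds all 250 keys, each exactly once, in order
```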
Batch Operations
```python
from hbasedriver.operations.batch import BatchGet, BatchPut

# Batch get
bg = BatchGet([b"row1", b"row2", b"row3"])
bg.add_column(b"cf", b"col1")
results = table.batch_get(bg)

# Batch put
bp = BatchPut()
bp.add_put(Put(b"row1").add_column(b"cf", b"col1", b"value1"))
bp.add_put(Put(b"row2").add_column(b"cf", b"col1", b"value2"))
results = table.batch_put(bp)

# Context manager for batch
with table.batch(batch_size=1000) as batch:
    batch.put(b"row1", {b"cf:col1": b"value1"})
    batch.put(b"row2", {b"cf:col1": b"value2"})
    batch.delete(b"row3")
# All operations executed when exiting context
```
Check and Put
```python
from hbasedriver.operations.increment import CheckAndPut

cap = CheckAndPut(b"row1")
cap.set_check(b"cf", b"lock", b"")  # Check if lock is empty
cap.set_put(Put(b"row1").add_column(b"cf", b"data", b"value"))
success = table.check_and_put(cap)
```
Increment
```python
from hbasedriver.operations.increment import Increment

inc = Increment(b"row1")
inc.add_column(b"cf", b"counter", 1)
new_value = table.increment(inc)
```
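HBase stores counter cells as 8-byte big-endian signed longs, and an increment atomically reads, adds, and writes that value on the regionserver. A sketch of the encoding (the helper names here are illustrative, not part of this driver's API):

```python
import struct

def encode_counter(n):
    """HBase counter cells are 8-byte big-endian signed longs."""
    return struct.pack(">q", n)

def decode_counter(raw):
    return struct.unpack(">q", raw)[0]

# An increment of +1 on a cell currently holding 41:
cell = encode_counter(41)
cell = encode_counter(decode_counter(cell) + 1)
# decode_counter(cell) == 42, and the cell is always exactly 8 bytes
```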
Append
```python
from hbasedriver.operations.append import Append

append = Append(b"row1")
append.add_column(b"cf", b"tags", b",new_tag")
result = table.append(append)
new_value = result.get(b"cf", b"tags")
```
Check and Delete
```python
from hbasedriver.operations.delete import Delete

delete = Delete(b"row1").add_column(b"cf", b"col")
success = table.check_and_delete(
    b"row1", b"cf", b"lock", b"", delete  # Delete if lock is empty
)
```
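Check-and-delete (like CheckAndPut above) is a compare-and-mutate primitive: the mutation applies only if the checked cell currently holds the expected value, and HBase evaluates both steps atomically on the regionserver. A sketch of the semantics only, using a hypothetical in-memory cell store rather than the driver's implementation:

```python
# Hypothetical cell store: (row, family, qualifier) -> value
store = {
    (b"row1", b"cf", b"lock"): b"",   # empty "lock" cell
    (b"row1", b"cf", b"col"): b"v",
}

def check_and_delete(row, family, qualifier, expected, delete_key):
    # Compare step: proceed only if the cell holds the expected value
    # (an empty expected value matches an empty cell).
    if store.get((row, family, qualifier)) != expected:
        return False
    # Mutate step: reached only when the check passed.
    store.pop(delete_key, None)
    return True

# Lock is empty, so the delete of cf:col goes through.
ok = check_and_delete(b"row1", b"cf", b"lock", b"",
                      (b"row1", b"cf", b"col"))
# A second attempt expecting a non-empty lock fails the check.
denied = check_and_delete(b"row1", b"cf", b"lock", b"held",
                          (b"row1", b"cf", b"lock"))
```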
RowMutations (Atomic Multi-Mutation)
```python
from hbasedriver.operations import RowMutations, Put, Delete

rm = RowMutations(b"row1")
rm.add(Put(b"row1").add_column(b"cf", b"status", b"active"))
rm.add(Delete(b"row1").add_column(b"cf", b"old_field"))
success = table.mutate_row(rm)  # All mutations applied atomically
```
Exists
```python
# Check if row exists
exists = table.exists(Get(b"row1"))

# Check if specific column exists
exists = table.exists(Get(b"row1").add_column(b"cf", b"col"))

# Check multiple rows at once
results = table.exists_all([Get(b"row1"), Get(b"row2"), Get(b"row3")])
```
BufferedMutator (Efficient Bulk Writes)
```python
from hbasedriver.client.buffered_mutator import BufferedMutatorParams

# Using context manager - auto-flush on close
params = BufferedMutatorParams(write_buffer_size=2 * 1024 * 1024)  # 2MB buffer
with client.get_buffered_mutator(b"default", b"mytable", params) as mutator:
    for i in range(10000):
        mutator.mutate(
            Put(f"row{i}".encode()).add_column(b"cf", b"data", f"value{i}".encode())
        )
# Auto-flushes when exiting context

# Manual flush control
mutator = client.get_buffered_mutator(b"default", b"mytable")
mutator.mutate(Put(b"row1").add_column(b"cf", b"col", b"value"))
mutator.flush()  # Explicit flush
mutator.close()
```
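The write-buffer behavior can be modeled with a short self-contained sketch (a simplified stand-in, not the driver's implementation): mutations accumulate until their total byte size reaches the buffer limit, flush as one batch, and a final flush on close guarantees nothing buffered is lost.

```python
class TinyBufferedWriter:
    """Simplified model of a size-triggered write buffer."""

    def __init__(self, sink, write_buffer_size):
        self.sink = sink                    # flushed batches land here
        self.write_buffer_size = write_buffer_size
        self.buffer = []
        self.buffered_bytes = 0

    def mutate(self, key, value):
        self.buffer.append((key, value))
        self.buffered_bytes += len(key) + len(value)
        # Flush automatically once the size threshold is reached
        if self.buffered_bytes >= self.write_buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.append(list(self.buffer))
            self.buffer.clear()
            self.buffered_bytes = 0

    def close(self):
        self.flush()                        # final flush on close

batches = []
w = TinyBufferedWriter(batches, write_buffer_size=32)
for i in range(10):
    w.mutate(f"row{i}".encode(), b"value")  # 9 bytes per mutation
w.close()
# Every mutation reaches the sink in some batch; none are dropped.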
Filters
The driver supports server-side filters:
```python
from hbasedriver.filter import PrefixFilter, RowFilter
from hbasedriver.filter.compare_filter import CompareOperator
from hbasedriver.filter.binary_comparator import BinaryComparator

# Prefix filter
scan = Scan().set_filter(PrefixFilter(b"abc"))

# Row filter with comparison
scan = Scan().set_filter(
    RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1"))
)
```
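Since these filters run server-side, only matching rows cross the network. Their selection semantics are easy to model client-side; the sketch below shows the semantics only (the real evaluation happens in the regionserver, over bytes-compared row keys):

```python
# Row keys a scan might visit, in HBase's byte-sorted order.
row_keys = [b"abc1", b"abc2", b"abd1", b"row1", b"row2"]

# PrefixFilter(b"abc"): keep rows whose key starts with the prefix.
prefix_matches = [k for k in row_keys if k.startswith(b"abc")]
# -> [b"abc1", b"abc2"]

# RowFilter(CompareOperator.EQUAL, BinaryComparator(b"row1")):
# keep rows whose key compares byte-equal to the comparator value.
equal_matches = [k for k in row_keys if k == b"row1"]
# -> [b"row1"]
```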
Java Compatibility
This driver is designed to be familiar to HBase Java developers. Here's a quick comparison:
| Java Driver | Python Driver |
|---|---|
| `Connection connection = ConnectionFactory.createConnection(config)` | `client = Client(config)` |
| `Table table = connection.getTable(TableName.valueOf("mytable"))` | `table = client.get_table("default", "mytable")` |
| `table.put(new Put(Bytes.toBytes("row1"))...` | `table.put(Put(b"row1")...` |
| `Result result = table.get(new Get(Bytes.toBytes("row1"))...` | `row = table.get(Get(b"row1")...` |
| `try (ResultScanner scanner = table.scan(...))` | `with table.scan(...) as scanner:` |
| `try (Connection conn = ...)` | `with Client(config) as client:` |
Configuration Constants
Java's HConstants are available as hbase_constants.HConstants:
| Java | Python |
|---|---|
| `HConstants.ZOOKEEPER_QUORUM` | `HConstants.ZOOKEEPER_QUORUM` |
| `HConstants.ZOOKEEPER_ZNODE_PARENT` | `HConstants.ZOOKEEPER_ZNODE_PARENT` |
| (Custom connection pool size) | `HConstants.CONNECTION_POOL_SIZE` |
| (Custom idle timeout) | `HConstants.CONNECTION_IDLE_TIMEOUT` |
Development
Quickstart (3-node Docker dev environment)
Prerequisites: Docker and docker-compose installed.
- Build, start the custom 3-node cluster, and run the full test suite:

```bash
./scripts/run_tests_3node.sh
```

- Run tests against an already-running cluster (fast):

```bash
./scripts/run_tests_3node.sh --no-start
```

- Run a single test file or case:

```bash
./scripts/run_tests_3node.sh test/test_scan.py
./scripts/run_tests_3node.sh test/test_scan.py::test_scan
```

Legacy single-node dev workflow (still available):

```bash
./scripts/run_tests_docker.sh
```
See TEST_GUIDE.md and DEV_ENV.md for full documentation and troubleshooting steps.
Documentation
- API Reference - Comprehensive API documentation
- Advanced Usage - Advanced features and patterns
- Performance Guide - Performance tuning and optimization
- Troubleshooting - Common issues and solutions
- Migration Guide - Migration from Java HBase, Happybase, and Thrift clients
- Migration from happybase - Guide for migrating from happybase/happybase-thrift
- 中文介绍 - Chinese Introduction (documentation overview in Chinese)
License
Apache License 2.0