A Python client for connecting to GooseFS Table Master via gRPC
GooseFS Metastore Client
A Python client library for connecting to GooseFS Table Master service via gRPC protocol.
Overview
GooseFS Metastore Client provides a Python interface to interact with GooseFS's Table Master service, enabling you to manage databases and tables in the GooseFS catalog. The client supports both traditional GooseFS operations and Lance-compatible catalog operations.
Architecture
The client follows this call flow:
Python Client (GoosefsMetastoreClient)
↓
gRPC Client (TableMasterClientServiceStub)
↓
[gRPC Network Call]
↓
GooseFS Master (TableMasterClientServiceHandler)
↓
DefaultTableMaster
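The layering above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: FakeStub and SketchClient are hypothetical names, and the real client talks to a gRPC channel and the generated TableMasterClientServiceStub. It only shows how a context-managed wrapper can connect, delegate a call to a stub, and close on exit.

```python
class FakeStub:
    """Hypothetical stand-in for the generated gRPC stub."""

    def GetAllDatabases(self, request):
        # A real stub would send the request over the network.
        return ["default", "sales"]


class SketchClient:
    """Hypothetical wrapper that mirrors the call flow, not the real client."""

    def __init__(self, host, port):
        self.address = f"{host}:{port}"
        self._stub = None

    def connect(self):
        # The real client would open a gRPC channel to self.address here.
        self._stub = FakeStub()
        return self

    def close(self):
        self._stub = None

    def __enter__(self):
        return self.connect()

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

    def get_all_databases(self):
        if self._stub is None:
            raise RuntimeError("client is not connected")
        return self._stub.GetAllDatabases(request=None)


with SketchClient("localhost", 9220) as client:
    names = client.get_all_databases()
print(names)
```

Calling a method after the with block has exited raises the "not connected" error in this sketch, because close() drops the stub.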
Features
Core Features
- List and retrieve databases and tables from GooseFS catalog
- Attach external databases (e.g., from Hive) to GooseFS
- Mount/unmount tables for caching in GooseFS
- Synchronize databases with underlying metadata stores
- Retrieve table access statistics
- Get table and partition column statistics
- Transform tables with custom definitions
Lance-Compatible Catalog Features (New)
- Namespace Operations: Create, describe, list, and drop namespaces
- Table CRUD Operations: Create, read, update, and delete tables with Lance-style predicates
- Table Index Operations: Create, list, and manage table indices
- Table Schema Operations: Update schema metadata, add/alter/drop columns
- Table Version Operations: List versions, create/describe/batch-delete versions, restore tables, rename tables
- Table Tag Operations: Create, update, delete, and list table tags
- Table Statistics: Get table stats, explain and analyze query plans
- Transaction Operations: Describe and alter transactions
Installation
pip install -e .
Requirements
- Python >= 3.7
- grpcio >= 1.50.0
- protobuf >= 3.20.0
- A running GooseFS cluster with Table Master service enabled
Quick Start
Basic Usage
from goosefs_metastore_client import GoosefsMetastoreClient
GOOSEFS_HOST = "localhost"
GOOSEFS_PORT = 9220
with GoosefsMetastoreClient(GOOSEFS_HOST, GOOSEFS_PORT) as client:
    databases = client.get_all_databases()
    for db_info in databases:
        print(f"Database: {db_info.name}")
List Databases
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    databases = client.get_all_databases()
    for db in databases:
        print(f"{db.name} - Type: {db.type}")
Get Database Details
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    database = client.get_database("my_database")
    print(f"Location: {database.location}")
    print(f"Owner: {database.owner_name}")
Attach External Database
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    sync_status = client.attach_database(
        udb_type="hive",
        udb_db_name="hive_db",
        db_name="goosefs_db",
        configuration={
            "hive.metastore.uris": "thrift://hive-metastore:9083",
        },
        auto_mount=True,
    )
    print(f"Tables synced: {len(sync_status.tables_updated)}")
List and Get Tables
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    tables = client.get_all_tables("my_database")
    for table in tables:
        print(f"Table: {table.name}, Mounted: {table.is_mount}")
    table_info = client.get_table("my_database", "my_table")
    print(f"Owner: {table_info.owner}")
    for col in table_info.schema.cols:
        print(f"Column: {col.name} ({col.type})")
Mount/Unmount Tables
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    client.mount_table("my_database", "my_table")
    print("Table mounted for caching")
    client.unmount_table("my_database", "my_table")
    print("Table unmounted")
Access Statistics
from goosefs_metastore_client import GoosefsMetastoreClient
with GoosefsMetastoreClient("localhost", 9220) as client:
    stats = client.access_stat(days=7, top_nums=10)
    for stat in stats:
        print(f"{stat.db_name}.{stat.tb_name}: {stat.hots} accesses")
Lance-Compatible Operations
Namespace Operations
from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    ListNamespacesPRequest,
    CreateNamespacePRequest,
    DescribeNamespacePRequest,
)
with GoosefsMetastoreClient("localhost", 9220) as client:
    # List namespaces
    list_request = ListNamespacesPRequest()
    list_request.id.extend(["my_catalog"])
    list_request.limit = 10
    result = client.list_namespaces(list_request)
    print(f"Namespaces: {result['namespaces']}")
    # Create namespace
    create_request = CreateNamespacePRequest()
    create_request.id.extend(["my_catalog", "new_namespace"])
    create_request.properties["owner"] = "admin"
    client.create_namespace(create_request)
    # Describe namespace
    desc_request = DescribeNamespacePRequest()
    desc_request.id.extend(["my_catalog", "new_namespace"])
    props = client.describe_namespace(desc_request)
    print(f"Namespace properties: {props}")
Table CRUD Operations
from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    CreateTablePRequest,
    CountTableRowsPRequest,
    UpdateTablePRequest,
    DeleteFromTablePRequest,
    QueryTablePRequest,
)
with GoosefsMetastoreClient("localhost", 9220) as client:
    # Create table
    create_request = CreateTablePRequest()
    create_request.id.extend(["catalog", "namespace", "my_table"])
    create_request.schema = "id INT, name STRING, created_at TIMESTAMP"
    result = client.create_table(create_request)
    print(f"Table created at: {result['location']}")
    # Count rows with predicate (Lance-style)
    count_request = CountTableRowsPRequest()
    count_request.id.extend(["catalog", "namespace", "my_table"])
    count_request.predicate = "status = 'active'"  # Lance predicate field
    count_request.version = 0  # Latest version
    count = client.count_table_rows(count_request)
    print(f"Active rows: {count}")
    # Update table (supports both Lance predicate and GooseFS where_clause)
    update_request = UpdateTablePRequest()
    update_request.id.extend(["catalog", "namespace", "my_table"])
    update_request.updates = "status = 'inactive'"
    update_request.predicate = "last_login < '2024-01-01'"  # Lance field
    # Or use: update_request.where_clause = "..."  # GooseFS field
    rows_updated = client.update_table(update_request)
    print(f"Updated {rows_updated} rows")
    # Delete from table
    delete_request = DeleteFromTablePRequest()
    delete_request.id.extend(["catalog", "namespace", "my_table"])
    delete_request.predicate = "expired = true"
    rows_deleted = client.delete_from_table(delete_request)
    print(f"Deleted {rows_deleted} rows")
    # Query table
    query_request = QueryTablePRequest()
    query_request.id.extend(["catalog", "namespace", "my_table"])
    query_request.query = "SELECT * FROM my_table WHERE id > 100"
    result = client.query_table(query_request)
    print(f"Query result: {result}")
Table Index Operations
from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    CreateTableIndexPRequest,
    ListTableIndicesPRequest,
    DescribeTableIndexStatsPRequest,
)
with GoosefsMetastoreClient("localhost", 9220) as client:
    # Create index
    create_idx_request = CreateTableIndexPRequest()
    create_idx_request.id.extend(["catalog", "namespace", "my_table"])
    create_idx_request.columns.extend(["embedding"])
    create_idx_request.index_type = "IVF_PQ"
    create_idx_request.index_name = "idx_embedding"
    client.create_table_index(create_idx_request)
    # List indices (with Lance pagination)
    list_idx_request = ListTableIndicesPRequest()
    list_idx_request.id.extend(["catalog", "namespace", "my_table"])
    list_idx_request.page_token = ""  # Lance field
    list_idx_request.limit = 10  # Lance field
    indices = client.list_table_indices(list_idx_request)
    print(f"Indices: {indices}")
    # Get index stats
    stats_request = DescribeTableIndexStatsPRequest()
    stats_request.id.extend(["catalog", "namespace", "my_table"])
    stats_request.index_name = "idx_embedding"
    stats = client.describe_table_index_stats(stats_request)
    print(f"Index stats: {stats}")
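The page_token and limit fields suggest token-based pagination. The sketch below shows the general loop with an in-memory stand-in and does not call the client. How list_table_indices returns the next page token is not shown in this README, so the (items, next_token) return shape, iter_pages, and fake_fetch are all assumptions made for illustration.

```python
def iter_pages(fetch_page, limit=10):
    """Yield items from a paginated listing until no next token is returned."""
    token = ""
    while True:
        items, token = fetch_page(token, limit)
        yield from items
        if not token:
            break


# Hypothetical in-memory stand-in for a paginated server call.
ALL_INDICES = [f"idx_{i}" for i in range(25)]


def fake_fetch(page_token, limit):
    # The token here is just the next start offset encoded as a string.
    start = int(page_token) if page_token else 0
    end = start + limit
    next_token = str(end) if end < len(ALL_INDICES) else ""
    return ALL_INDICES[start:end], next_token


collected = list(iter_pages(fake_fetch, limit=10))
print(len(collected))
```

With a real client, fetch_page would build a ListTableIndicesPRequest, set page_token and limit, and extract the items and next token from whatever the response actually contains.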
Table Version and Tag Operations
from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    ListTableVersionsPRequest,
    CreateTableVersionPRequest,
    DescribeTableVersionPRequest,
    BatchDeleteTableVersionsPRequest,
    VersionRange,
    RestoreTablePRequest,
    CreateTableTagPRequest,
    ListTableTagsPRequest,
)
with GoosefsMetastoreClient("localhost", 9220) as client:
    # List versions
    versions_request = ListTableVersionsPRequest()
    versions_request.id.extend(["catalog", "namespace", "my_table"])
    versions = client.list_table_versions(versions_request)
    print(f"Available versions: {versions}")
    # Create a table version
    create_ver_request = CreateTableVersionPRequest()
    create_ver_request.id.extend(["catalog", "namespace", "my_table"])
    create_ver_request.version = 10
    create_ver_request.manifest_path = "/path/to/manifest"
    create_ver_request.manifest_size = 1024
    create_ver_request.naming_scheme = "V2"
    result = client.create_table_version(create_ver_request)
    print(f"Created version: {result}")
    # Describe a table version
    desc_ver_request = DescribeTableVersionPRequest()
    desc_ver_request.id.extend(["catalog", "namespace", "my_table"])
    desc_ver_request.version = 10
    ver_info = client.describe_table_version(desc_ver_request)
    print(f"Version info: {ver_info}")
    # Batch delete table versions
    batch_del_request = BatchDeleteTableVersionsPRequest()
    batch_del_request.id.extend(["catalog", "namespace", "my_table"])
    version_range = VersionRange()
    version_range.start_version = 1
    version_range.end_version = 5
    batch_del_request.ranges.append(version_range)
    del_result = client.batch_delete_table_versions(batch_del_request)
    print(f"Deleted {del_result.get('deleted_count', 0)} versions")
    # Restore to specific version
    restore_request = RestoreTablePRequest()
    restore_request.id.extend(["catalog", "namespace", "my_table"])
    restore_request.version = 5
    client.restore_table(restore_request)
    # Create tag
    tag_request = CreateTableTagPRequest()
    tag_request.id.extend(["catalog", "namespace", "my_table"])
    tag_request.tag = "v1.0-release"
    tag_request.version = 10
    client.create_table_tag(tag_request)
    # List tags
    list_tags_request = ListTableTagsPRequest()
    list_tags_request.id.extend(["catalog", "namespace", "my_table"])
    tags = client.list_table_tags(list_tags_request)
    print(f"Tags: {tags}")
Transaction Operations
from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    DescribeTransactionPRequest,
    AlterTransactionPRequest,
)
with GoosefsMetastoreClient("localhost", 9220) as client:
    # Describe transaction (supports both Lance id and GooseFS transaction_id)
    desc_txn_request = DescribeTransactionPRequest()
    desc_txn_request.transaction_id = "txn_12345"  # GooseFS field
    # Or use: desc_txn_request.id.extend(["txn_12345"])  # Lance field
    txn_info = client.describe_transaction(desc_txn_request)
    print(f"Transaction status: {txn_info}")
    # Alter transaction
    alter_txn_request = AlterTransactionPRequest()
    alter_txn_request.transaction_id = "txn_12345"
    alter_txn_request.action = "commit"
    client.alter_transaction(alter_txn_request)
Using Builders
The client provides builder classes to construct database and table objects:
from goosefs_metastore_client.builders import DatabaseBuilder, TableBuilder, FieldSchemaBuilder
database = DatabaseBuilder(
    db_name="my_database",
    description="My test database",
    location="/user/hive/warehouse/my_database.db",
    owner_name="admin",
    parameters={"key": "value"},
).build()
API Reference
GoosefsMetastoreClient
Main client class for interacting with GooseFS Table Master.
Database Operations
- get_all_databases() - Get all databases in the catalog
- get_database(db_name) - Get a specific database by name
- attach_database(udb_type, udb_db_name, db_name, ...) - Attach an external database
- detach_database(db_name) - Detach a database from the catalog
- sync_database(db_name) - Sync a database with its underlying store
Basic Table Operations
- get_all_tables(database) - Get all tables in a database
- get_table(db_name, table_name) - Get a specific table
- mount_table(db_name, tb_name) - Mount a table to GooseFS
- unmount_table(db_name, tb_name) - Unmount a table from GooseFS
Statistics and Analytics
- access_stat(days, top_nums) - Get table access statistics
- get_table_column_statistics(db_name, table_name, col_names) - Get column statistics
- get_partition_column_statistics(db_name, table_name, col_names, part_names) - Get partition statistics
Transform Operations
- read_table(db_name, table_name, constraint) - Read table partitions with constraints
- transform_table(db_name, table_name, definition) - Transform a table
- get_transform_job_info(job_id) - Get transformation job information
Namespace Operations (Lance-Compatible)
- list_namespaces(request) - List namespaces with pagination
- create_namespace(request) - Create a new namespace
- describe_namespace(request) - Get namespace properties
- namespace_exists(request) - Check if namespace exists
- drop_namespace(request) - Drop a namespace
Table CRUD Operations (Lance-Compatible)
- list_tables(request) - List tables in a namespace
- table_exists(request) - Check if table exists
- describe_table(request) - Get table description and storage options
- create_table(request) - Create a new table
- create_empty_table(request) - Create an empty table
- insert_into_table(request) - Insert data into table
- merge_insert_into_table(request) - Merge insert data
- update_table(request) - Update records (supports Lance predicate and GooseFS where_clause)
- delete_from_table(request) - Delete records (supports Lance predicate and GooseFS where_clause)
- query_table(request) - Query table data
- count_table_rows(request) - Count rows (supports Lance predicate and version)
- drop_table(request) - Drop a table
Table Index Operations (Lance-Compatible)
- create_table_index(request) - Create a table index
- create_table_scalar_index(request) - Create a scalar index
- list_table_indices(request) - List indices (supports Lance pagination: page_token, limit, version)
- describe_table_index_stats(request) - Get index statistics
- drop_table_index(request) - Drop an index
Table Schema Operations (Lance-Compatible)
- update_table_schema_metadata(request) - Update schema (supports Lance metadata and GooseFS schema)
- alter_table_add_columns(request) - Add columns to table
- alter_table_alter_columns(request) - Alter existing columns
- alter_table_drop_columns(request) - Drop columns from table
Table Version Operations (Lance-Compatible)
- list_all_tables(request) - List all tables with pagination
- list_table_versions(request) - List table versions
- create_table_version(request) - Create a table version with manifest info
- describe_table_version(request) - Describe a specific table version
- batch_delete_table_versions(request) - Batch delete table versions by ranges
- restore_table(request) - Restore table to specific version
- rename_table(request) - Rename a table
Table Tag Operations (Lance-Compatible)
- list_table_tags(request) - List table tags
- get_table_tag_version(request) - Get version for a tag
- create_table_tag(request) - Create a table tag
- update_table_tag(request) - Update a table tag
- delete_table_tag(request) - Delete a table tag
Table Statistics Operations (Lance-Compatible)
- get_table_stats(request) - Get table statistics
- explain_table_query_plan(request) - Explain query plan
- analyze_table_query_plan(request) - Analyze query plan
Registration Operations (Lance-Compatible)
- declare_table(request) - Declare a table
- deregister_table(request) - Deregister a table
- register_table(request) - Register a table
- register_namespace_impl(name, class_name) - Register namespace implementation
- unregister_namespace_impl(name) - Unregister namespace implementation
- is_registered(name) - Check if namespace is registered
Transaction Operations (Lance-Compatible)
- describe_transaction(request) - Describe transaction (supports Lance id and GooseFS transaction_id)
- alter_transaction(request) - Alter transaction state
Lance vs GooseFS Field Compatibility
The client supports dual-style API fields for compatibility with both Lance and GooseFS systems:
| Operation | Lance Field | GooseFS Field |
|---|---|---|
| DeleteFromTable | predicate | where_clause |
| UpdateTable | predicate | where_clause |
| CountTableRows | predicate, version | - |
| ListTableIndices | page_token, limit, version | - |
| UpdateTableSchemaMetadata | metadata | schema |
| DescribeTransaction | id (repeated) | transaction_id |
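One way to keep the two naming styles from spreading through calling code is a small helper that writes the filter into the field for the chosen style. The sketch below is illustrative only: FakeDeleteRequest is a hypothetical dataclass standing in for DeleteFromTablePRequest, and set_filter is not part of the client.

```python
from dataclasses import dataclass


@dataclass
class FakeDeleteRequest:
    """Hypothetical stand-in for DeleteFromTablePRequest."""

    predicate: str = ""
    where_clause: str = ""


def set_filter(request, expression, style="lance"):
    """Store the filter expression in the field that matches the style."""
    if style == "lance":
        request.predicate = expression
    elif style == "goosefs":
        request.where_clause = expression
    else:
        raise ValueError(f"unknown style: {style!r}")
    return request


lance_req = set_filter(FakeDeleteRequest(), "expired = true")
goosefs_req = set_filter(FakeDeleteRequest(), "expired = true", style="goosefs")
print(lance_req)
print(goosefs_req)
```

Only one of the two fields is populated per request in this sketch. How the server resolves a request that sets both is not stated here.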
Configuration
Connection Parameters
- host: GooseFS master hostname
- port: Table Master client service port (default: 9220)
- max_retries: Maximum retry attempts for failed requests (default: 3)
- timeout: Timeout in seconds for gRPC calls (default: 30)
- credentials: Optional gRPC credentials for secure connections
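To make the max_retries parameter concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not the client's implementation: call_with_retries, base_delay, and the choice of ConnectionError are assumptions, and this README does not say whether the real client counts the first attempt toward max_retries or which errors it retries.

```python
import time


def call_with_retries(func, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call func(); on ConnectionError retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))  # back off: 0.1s, 0.2s, 0.4s, ...
            attempt += 1


calls = {"count": 0}


def flaky():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky, max_retries=5, sleep=lambda seconds: None)
print(result, calls["count"])
```

The sleep argument is injectable so the loop can be exercised in tests without waiting.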
Example with Custom Configuration
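from goosefs_metastore_client import GoosefsMetastoreClient
client = GoosefsMetastoreClient(
    host="goosefs-master.example.com",
    port=9220,
    max_retries=5,
    timeout=60,
)
client.connect()
try:
    databases = client.get_all_databases()
finally:
    # Always release the connection, even if the call above fails
    client.close()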
Development
Setup Development Environment
pip install -r requirements.dev.txt
Running Tests
# Run all tests
pytest tests/
# Run with verbose output
pytest tests/ -v
# Run specific test file
pytest tests/unit/goosefs_metastore_client/test_goosefs_metastore_client.py
# Run tests with coverage
pytest tests/ --cov=goosefs_metastore_client --cov-report=html
Test Coverage
The test suite covers:
- Client initialization and connection
- All 52+ API methods
- Lance-compatible operations (namespace, table CRUD, index, schema, version, tag, stats, transaction)
- Error handling (not connected errors)
- Retry logic
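Code that depends on the client can be unit tested without a running cluster by substituting a mock. The sketch below uses unittest.mock from the standard library. database_names is a hypothetical helper, and the mock only mimics get_all_databases() and the name attribute shown in the examples above.

```python
from unittest.mock import MagicMock


def database_names(client):
    """Return the sorted names of all databases the client reports."""
    return sorted(db.name for db in client.get_all_databases())


# MagicMock stands in for a connected GoosefsMetastoreClient.
fake_client = MagicMock()
db_a, db_b = MagicMock(), MagicMock()
db_a.name, db_b.name = "sales", "default"
fake_client.get_all_databases.return_value = [db_a, db_b]

names = database_names(fake_client)
fake_client.get_all_databases.assert_called_once_with()
print(names)
```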
Code Formatting
black goosefs_metastore_client/
Linting
flake8 goosefs_metastore_client/
mypy goosefs_metastore_client/
Project Structure
goosefs-metastore-client/
├── goosefs_metastore_client/
│   ├── __init__.py
│   ├── goosefs_metastore_client.py  # Main client with 52+ methods
│   └── builders/
│       ├── __init__.py
│       ├── abstract_builder.py
│       ├── database_builder.py
│       ├── field_schema_builder.py
│       └── table_builder.py
├── grpc_files/
│   ├── __init__.py
│   ├── common_pb2.py
│   ├── job_master_pb2.py
│   ├── table_master_pb2.py
│   ├── table_master_pb2_grpc.py
│   └── proto/
│       ├── common.proto
│       ├── job_master.proto
│       └── table_master.proto
├── examples/
│   ├── __init__.py
│   ├── # Basic Examples
│   ├── list_databases.py
│   ├── get_database.py
│   ├── attach_database.py
│   ├── list_tables.py
│   ├── get_table.py
│   ├── mount_table.py
│   ├── access_statistics.py
│   ├── complete_example.py
│   ├── detailed_flow_example.py
│   ├── # Lance-Compatible Examples
│   ├── namespace_operations.py  # Namespace CRUD
│   ├── table_crud_operations.py  # Table create/read/update/delete
│   ├── table_index_operations.py  # Index management
│   ├── table_schema_operations.py  # Schema operations
│   ├── table_version_operations.py  # Version and restore
│   ├── table_tag_operations.py  # Tag management
│   ├── table_stats_operations.py  # Statistics and query plans
│   ├── table_registration_operations.py  # Registration operations
│   ├── table_transform_operations.py  # Transform operations
│   └── transaction_operations.py  # Transaction management
├── tests/
│   └── unit/
│       └── goosefs_metastore_client/
│           ├── conftest.py
│           ├── test_goosefs_metastore_client.py  # 60+ test cases
│           └── builders/
│               ├── test_database_builder.py
│               ├── test_field_schema_builder.py
│               └── test_table_builder.py
├── setup.py
├── requirements.txt
└── README.md
Comparison with Hive Metastore Client
| Feature | Hive Metastore Client | GooseFS Metastore Client |
|---|---|---|
| Protocol | Thrift | gRPC |
| Server | Hive Metastore | GooseFS Table Master |
| Connection | TSocket + TBinaryProtocol | gRPC Channel + Stub |
| Retry Logic | Manual | Built-in with configurable retries |
| Database Operations | Thrift API | gRPC API |
| Table Operations | Thrift API | gRPC API |
| Lance Catalog Support | No | Yes (52+ methods) |
| Table Versioning | No | Yes |
| Table Tagging | No | Yes |
| Vector Index Support | No | Yes |
| Transaction Support | No | Yes |
License
Apache License 2.0
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.