A Python client for connecting to GooseFS Table Master via gRPC
GooseFS Metastore Client
A Python client library for connecting to GooseFS Table Master service via gRPC protocol.
Overview
GooseFS Metastore Client provides a Python interface to interact with GooseFS's Table Master service, enabling you to manage databases and tables in the GooseFS catalog.
Architecture
The client follows this call flow:
```text
Python Client (GoosefsMetastoreClient)
        ↓
gRPC Client (TableMasterClientServiceStub)
        ↓
[gRPC Network Call]
        ↓
GooseFS Master (TableMasterClientServiceHandler)
        ↓
DefaultTableMaster
```
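Here is a minimal sketch of the top two layers of that flow. A Python-facing wrapper turns a method call into a request, invokes a stub with a per-call `timeout`, and unpacks the response. Generated gRPC stub methods accept a `timeout` keyword. The stub below is an in-memory fake. The `GetAllDatabases` method name and the dict-shaped request and response are assumptions; they stand in for the real protobuf messages in `table_master_pb2`.

```python
from typing import Any, Dict, List, Optional


class FakeTableMasterStub:
    """Hypothetical stand-in for the generated TableMasterClientServiceStub.

    A real stub serializes each request and sends it over a gRPC channel to the
    GooseFS master; this fake answers from an in-memory catalog instead.
    """

    def __init__(self, catalog: Dict[str, List[str]]) -> None:
        self._catalog = catalog
        self.last_timeout: Optional[float] = None

    def GetAllDatabases(
        self, request: Dict[str, Any], timeout: Optional[float] = None
    ) -> Dict[str, Any]:
        # Remember the deadline so callers can see it was passed through.
        self.last_timeout = timeout
        return {"database": sorted(self._catalog)}


class WrapperClient:
    """Hypothetical Python-facing layer in the spirit of GoosefsMetastoreClient.

    It builds a request, calls the stub with a per-call timeout, and unpacks
    the response into plain Python values.
    """

    def __init__(self, stub: FakeTableMasterStub, timeout: float = 30.0) -> None:
        self._stub = stub
        self._timeout = timeout

    def get_all_databases(self) -> List[str]:
        response = self._stub.GetAllDatabases({}, timeout=self._timeout)
        return list(response["database"])


client = WrapperClient(FakeTableMasterStub({"sales": ["orders"], "logs": ["events"]}))
print(client.get_all_databases())  # ['logs', 'sales']
```

Keeping the stub behind a thin wrapper means the rest of an application never touches protobuf types. It also lets tests swap in a fake like this one.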
Features
- List and retrieve databases and tables from GooseFS catalog
- Attach external databases (e.g., from Hive) to GooseFS
- Mount/unmount tables for caching in GooseFS
- Synchronize databases with underlying metadata stores
- Retrieve table access statistics
- Get table and partition column statistics
- Transform tables with custom definitions
Installation
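Install from a source checkout in editable mode:

```shell
pip install -e .
```

Released distributions of `goosefs_metastore_client` are also published on PyPI.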
Requirements
- Python >= 3.7
- grpcio >= 1.50.0
- protobuf >= 3.20.0
- A running GooseFS cluster with Table Master service enabled
Quick Start
Basic Usage
```python
from goosefs_metastore_client import GoosefsMetastoreClient

GOOSEFS_HOST = "localhost"
GOOSEFS_PORT = 9200

with GoosefsMetastoreClient(GOOSEFS_HOST, GOOSEFS_PORT) as client:
    databases = client.get_all_databases()
    for db_info in databases:
        print(f"Database: {db_info.name}")
```
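The `with` form presumably pairs connection setup and teardown, matching the explicit `connect()`/`close()` calls shown under Configuration. Below is a minimal sketch of that context-manager protocol. `ManagedClient` is hypothetical and only tracks a flag instead of opening a gRPC channel.

```python
from types import TracebackType
from typing import Optional, Type


class ManagedClient:
    """Hypothetical client illustrating the context-manager protocol."""

    def __init__(self, host: str, port: int = 9200) -> None:
        self.host = host
        self.port = port
        self.connected = False

    def connect(self) -> None:
        # A real client would open a gRPC channel and create a stub here.
        self.connected = True

    def close(self) -> None:
        # A real client would close the gRPC channel here.
        self.connected = False

    def __enter__(self) -> "ManagedClient":
        self.connect()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Always release the connection; returning None lets exceptions propagate.
        self.close()


with ManagedClient("localhost") as managed:
    print(managed.connected)  # True
print(managed.connected)  # False
```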
List Databases
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    databases = client.get_all_databases()
    for db in databases:
        print(f"{db.name} - Type: {db.type}")
```
Get Database Details
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    database = client.get_database("my_database")
    print(f"Location: {database.location}")
    print(f"Owner: {database.owner_name}")
```
Attach External Database
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    sync_status = client.attach_database(
        udb_type="hive",
        udb_db_name="hive_db",
        db_name="goosefs_db",
        configuration={
            "hive.metastore.uris": "thrift://hive-metastore:9083",
        },
        auto_mount=True,
    )
    print(f"Tables synced: {len(sync_status.tables_updated)}")
```
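`attach_database` passes `hive.metastore.uris` through to GooseFS, so a malformed URI may only surface later as a server-side error. A small client-side check can catch typos before the call. `check_metastore_uris` is a hypothetical helper, not part of the library. It assumes the comma-separated `thrift://host:port` form that Hive uses for this property.

```python
from typing import List
from urllib.parse import urlparse


def check_metastore_uris(value: str) -> List[str]:
    """Hypothetical helper: split a hive.metastore.uris value and validate each entry.

    Returns the list of URIs, or raises ValueError on the first malformed one.
    """
    uris = [part.strip() for part in value.split(",") if part.strip()]
    if not uris:
        raise ValueError("hive.metastore.uris is empty")
    for uri in uris:
        parsed = urlparse(uri)
        if parsed.scheme != "thrift":
            raise ValueError(f"expected a thrift:// URI, got {uri!r}")
        # parsed.port itself raises ValueError for a non-numeric port.
        if not parsed.hostname or parsed.port is None:
            raise ValueError(f"missing host or port in {uri!r}")
    return uris


print(check_metastore_uris("thrift://hive-metastore:9083"))  # ['thrift://hive-metastore:9083']
```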
List and Get Tables
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    tables = client.get_all_tables("my_database")
    for table in tables:
        print(f"Table: {table.name}, Mounted: {table.is_mount}")

    table_info = client.get_table("my_database", "my_table")
    print(f"Owner: {table_info.owner}")
    for col in table_info.schema.cols:
        print(f"Column: {col.name} ({col.type})")
```
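The `is_mount` flag returned by `get_all_tables` shows which tables are already cached in GooseFS. Here is a sketch that splits a table listing by that flag. `TableSummary` is a hypothetical dataclass standing in for the client's table objects. Only the `name` and `is_mount` attributes used above are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class TableSummary:
    """Hypothetical stand-in exposing the two attributes used in the example above."""

    name: str
    is_mount: bool


def split_by_mount(tables: Iterable[TableSummary]) -> Tuple[List[str], List[str]]:
    """Return (mounted, unmounted) table names, each sorted alphabetically."""
    mounted: List[str] = []
    unmounted: List[str] = []
    for table in tables:
        (mounted if table.is_mount else unmounted).append(table.name)
    return sorted(mounted), sorted(unmounted)


listing = [TableSummary("orders", True), TableSummary("events", False), TableSummary("users", True)]
print(split_by_mount(listing))  # (['orders', 'users'], ['events'])
```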
Mount/Unmount Tables
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    client.mount_table("my_database", "my_table")
    print("Table mounted for caching")

    client.unmount_table("my_database", "my_table")
    print("Table unmounted")
```
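Sometimes a table only needs to be cached for the duration of one job. Pairing `mount_table` with `unmount_table` in a context manager guarantees the unmount runs even if the job fails. `temporarily_mounted` is a hypothetical helper that relies only on the two method names shown above. It is exercised here against a recording fake rather than a live cluster.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List, Tuple


@contextmanager
def temporarily_mounted(client: Any, db_name: str, tb_name: str) -> Iterator[None]:
    """Hypothetical helper: mount a table on entry and unmount it on exit."""
    client.mount_table(db_name, tb_name)
    try:
        yield
    finally:
        # Runs both on normal exit and when the body raises.
        client.unmount_table(db_name, tb_name)


class RecordingClient:
    """Fake client that records calls instead of contacting GooseFS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def mount_table(self, db_name: str, tb_name: str) -> None:
        self.calls.append(("mount", db_name, tb_name))

    def unmount_table(self, db_name: str, tb_name: str) -> None:
        self.calls.append(("unmount", db_name, tb_name))


fake = RecordingClient()
with temporarily_mounted(fake, "my_database", "my_table"):
    pass  # run queries against the cached table here
print(fake.calls)
```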
Access Statistics
```python
from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    stats = client.access_stat(days=7, top_nums=10)
    for stat in stats:
        print(f"{stat.db_name}.{stat.tb_name}: {stat.hots} accesses")
```
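The records returned by `access_stat` can be rolled up further on the client side, for example to total the accesses per database. This sketch uses a hypothetical `AccessRecord` named tuple with the `db_name`, `tb_name` and `hots` fields printed above. Treating `hots` as an integer access count over the `days` window is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class AccessRecord(NamedTuple):
    """Hypothetical stand-in for one entry returned by access_stat()."""

    db_name: str
    tb_name: str
    hots: int


def hots_per_database(records: Iterable[AccessRecord]) -> Dict[str, int]:
    """Sum the access counts of all tables in each database, busiest first."""
    totals: Counter = Counter()
    for record in records:
        totals[record.db_name] += record.hots
    return dict(totals.most_common())


sample = [
    AccessRecord("sales", "orders", 40),
    AccessRecord("logs", "events", 25),
    AccessRecord("sales", "users", 10),
]
print(hots_per_database(sample))  # {'sales': 50, 'logs': 25}
```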
Using Builders
The client provides builder classes to construct database and table objects:
```python
from goosefs_metastore_client.builders import DatabaseBuilder, TableBuilder, FieldSchemaBuilder

database = DatabaseBuilder(
    db_name="my_database",
    description="My test database",
    location="/user/hive/warehouse/my_database.db",
    owner_name="admin",
    parameters={"key": "value"},
).build()
```
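For readers unfamiliar with the pattern, the sketch below shows the general shape such a builder takes. An abstract base declares `build()`, and a concrete builder collects arguments and validates them before producing an object. `AbstractBuilder`, `SimpleDatabaseBuilder` and the `Database` dataclass are hypothetical, not the classes in `goosefs_metastore_client.builders`. The real `DatabaseBuilder` presumably returns a protobuf message rather than a dataclass.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class AbstractBuilder(ABC, Generic[T]):
    """Hypothetical base class: every builder produces one object via build()."""

    @abstractmethod
    def build(self) -> T:
        ...


@dataclass
class Database:
    """Hypothetical plain-Python database record."""

    db_name: str
    description: Optional[str] = None
    location: Optional[str] = None
    owner_name: Optional[str] = None
    parameters: Dict[str, str] = field(default_factory=dict)


class SimpleDatabaseBuilder(AbstractBuilder[Database]):
    """Collects keyword arguments and validates them when build() is called."""

    def __init__(
        self,
        db_name: str,
        description: Optional[str] = None,
        location: Optional[str] = None,
        owner_name: Optional[str] = None,
        parameters: Optional[Dict[str, str]] = None,
    ) -> None:
        self.db_name = db_name
        self.description = description
        self.location = location
        self.owner_name = owner_name
        # Copy so later changes to the caller's dict do not leak into the builder.
        self.parameters = dict(parameters or {})

    def build(self) -> Database:
        if not self.db_name:
            raise ValueError("db_name must not be empty")
        return Database(
            self.db_name, self.description, self.location, self.owner_name, dict(self.parameters)
        )


db = SimpleDatabaseBuilder(db_name="my_database", owner_name="admin", parameters={"key": "value"}).build()
print(db.db_name, db.owner_name)  # my_database admin
```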
API Reference
GoosefsMetastoreClient
Main client class for interacting with GooseFS Table Master.
Database Operations
- `get_all_databases()`: Get all databases in the catalog
- `get_database(db_name)`: Get a specific database by name
- `attach_database(udb_type, udb_db_name, db_name, ...)`: Attach an external database
- `detach_database(db_name)`: Detach a database from the catalog
- `sync_database(db_name)`: Sync a database with its underlying store
Table Operations
- `get_all_tables(database)`: Get all tables in a database
- `get_table(db_name, table_name)`: Get a specific table
- `mount_table(db_name, tb_name)`: Mount a table to GooseFS
- `unmount_table(db_name, tb_name)`: Unmount a table from GooseFS
Statistics and Analytics
- `access_stat(days, top_nums)`: Get table access statistics
- `get_table_column_statistics(db_name, table_name, col_names)`: Get column statistics
- `get_partition_column_statistics(db_name, table_name, col_names, part_names)`: Get partition statistics
Advanced Operations
- `read_table(db_name, table_name, constraint)`: Read table partitions that match a constraint
- `transform_table(db_name, table_name, definition)`: Transform a table
- `get_transform_job_info(job_id)`: Get transformation job information
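Because `get_transform_job_info` takes a job ID, a transformation appears to run as a background job that a caller polls until it finishes. Here is a generic polling sketch. `wait_for_job` is hypothetical, and the shape of the job-info object is not documented here, so the caller supplies both the fetch function and the "finished" test. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_job(
    fetch: Callable[[], T],
    is_finished: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Hypothetical helper: call fetch() until is_finished(result) or the timeout expires."""
    deadline = clock() + timeout
    while True:
        info = fetch()
        if is_finished(info):
            return info
        if clock() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(interval)


# Simulated job that reports two RUNNING states before completing (status strings are made up).
states = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_job(lambda: next(states), lambda s: s in ("COMPLETED", "FAILED"), sleep=lambda s: None))
```

With the real client, `fetch` might be `lambda: client.get_transform_job_info(job_id)`. How `job_id` is obtained and what "finished" means depend on the actual return types.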
Configuration
Connection Parameters
- `host`: GooseFS master hostname
- `port`: Table Master client service port (default: 9200)
- `max_retries`: Maximum retry attempts for failed requests (default: 3)
- `timeout`: Timeout in seconds for gRPC calls (default: 30)
- `credentials`: Optional gRPC credentials for secure connections
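The `max_retries` and `timeout` parameters suggest the usual pattern: retry a failed call a bounded number of times, each attempt with its own deadline. The sketch below shows one common way to do that, exponential backoff. It is not the library's actual retry implementation. `call_with_retries` and the choice of which exceptions count as retryable are assumptions. With grpcio, a real client would typically retry only on status codes such as `grpc.StatusCode.UNAVAILABLE`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[float], T],
    max_retries: int = 3,
    timeout: float = 30.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError,),
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call func(timeout), retrying on retryable errors.

    In this sketch max_retries counts retries after the first attempt, so func
    runs at most max_retries + 1 times. The delay doubles after each failure.
    """
    attempt = 0
    while True:
        try:
            return func(timeout)
        except retryable:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

The timeout is passed into each call rather than wrapped around the whole loop, so it bounds each attempt. The worst-case wall time is therefore roughly `(max_retries + 1) * timeout` plus the backoff delays.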
Example with Custom Configuration
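```python
from goosefs_metastore_client import GoosefsMetastoreClient

client = GoosefsMetastoreClient(
    host="goosefs-master.example.com",
    port=9200,
    max_retries=5,
    timeout=60,
)
client.connect()
try:
    databases = client.get_all_databases()
finally:
    # Release the connection even if the call fails.
    client.close()
```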
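In deployments it is often convenient to keep the master address out of the code. This sketch resolves the connection parameters from environment variables before the client is constructed. The names `GOOSEFS_HOST`, `GOOSEFS_PORT` and `GOOSEFS_TIMEOUT` are hypothetical; the library is not documented as reading any environment variables. The fallbacks match the documented defaults (port 9200, 30-second timeout) and the `localhost` host used in the examples.

```python
import os
from typing import Dict, Mapping, Optional, Union


def connection_params(env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int, float]]:
    """Hypothetical helper: build client keyword arguments from environment variables."""
    source = os.environ if env is None else env
    return {
        "host": source.get("GOOSEFS_HOST", "localhost"),
        "port": int(source.get("GOOSEFS_PORT", "9200")),
        "timeout": float(source.get("GOOSEFS_TIMEOUT", "30")),
    }


# Usage (requires the library and a reachable cluster):
# client = GoosefsMetastoreClient(**connection_params())
print(connection_params({"GOOSEFS_HOST": "goosefs-master.example.com"}))
```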
Development
Setup Development Environment
```shell
pip install -r requirements.dev.txt
```
Running Tests
```shell
pytest tests/
```
Code Formatting
```shell
black goosefs_metastore_client/
```
Linting
```shell
flake8 goosefs_metastore_client/
mypy goosefs_metastore_client/
```
Project Structure
```text
goosefs-metastore-client/
├── goosefs_metastore_client/
│   ├── __init__.py
│   ├── goosefs_metastore_client.py
│   └── builders/
│       ├── __init__.py
│       ├── abstract_builder.py
│       ├── database_builder.py
│       ├── field_schema_builder.py
│       └── table_builder.py
├── grpc_files/
│   ├── __init__.py
│   ├── common_pb2.py
│   ├── job_master_pb2.py
│   ├── table_master_pb2.py
│   ├── table_master_pb2_grpc.py
│   └── proto/
│       ├── common.proto
│       ├── job_master.proto
│       └── table_master.proto
├── examples/
│   ├── list_databases.py
│   ├── get_database.py
│   ├── attach_database.py
│   ├── list_tables.py
│   ├── get_table.py
│   ├── mount_table.py
│   └── access_statistics.py
├── tests/
│   └── unit/
│       └── goosefs_metastore_client/
│           └── builders/
├── setup.py
├── requirements.txt
└── README.md
```
Comparison with Hive Metastore Client
| Feature | Hive Metastore Client | GooseFS Metastore Client |
|---|---|---|
| Protocol | Thrift | gRPC |
| Server | Hive Metastore | GooseFS Table Master |
| Connection | TSocket + TBinaryProtocol | gRPC Channel + Stub |
| Retry Logic | Manual | Built-in with configurable retries |
| Database Operations | Thrift API | gRPC API |
| Table Operations | Thrift API | gRPC API |
License
Apache License 2.0
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file goosefs_metastore_client-0.1.1.tar.gz.
File metadata
- Download URL: goosefs_metastore_client-0.1.1.tar.gz
- Upload date:
- Size: 59.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cae60ac756beca422c91cad31f718bd24fb46f3b63f98f416bab2eb70766daf6 |
| MD5 | d1e37e12b8951e5dc2588f39248b9d3a |
| BLAKE2b-256 | 48435a693f7b397ff5084e768af772fc61590b110692c51f9864b172c838cefe |
File details
Details for the file goosefs_metastore_client-0.1.1-py3-none-any.whl.
File metadata
- Download URL: goosefs_metastore_client-0.1.1-py3-none-any.whl
- Upload date:
- Size: 43.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d32eb37a3182f9a98f561d99eca9b634ad15349dbb88add19ea228b56031aba4 |
| MD5 | 8f9f0e1a5d0e010bbe891931974f13fe |
| BLAKE2b-256 | 7c59610c757d7453ab33b9a3e11a35e53f329979f868fff00e52ad978f06692c |
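A downloaded distribution can be checked against the SHA256 digests listed above before it is installed. Here is a minimal sketch using only `hashlib`. The helper names `sha256_of` and `verify` are illustrative. The file is read in chunks so large archives need not fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example for the wheel listed above:
# verify("goosefs_metastore_client-0.1.1-py3-none-any.whl",
#        "d32eb37a3182f9a98f561d99eca9b634ad15349dbb88add19ea228b56031aba4")
```

pip can enforce the same check itself when a requirements file pins `--hash=sha256:...` entries (hash-checking mode).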