Skip to main content

A Python client for connecting to GooseFS Table Master via gRPC

Project description

GooseFS Metastore Client

A Python client library for connecting to GooseFS Table Master service via gRPC protocol.

Overview

GooseFS Metastore Client provides a Python interface to interact with GooseFS's Table Master service, enabling you to manage databases and tables in the GooseFS catalog.

Architecture

The client follows this call flow:

Python Client (GoosefsMetastoreClient)
    ↓
gRPC Client (TableMasterClientServiceStub)
    ↓
[gRPC Network Call]
    ↓
GooseFS Master (TableMasterClientServiceHandler)
    ↓
DefaultTableMaster

Features

  • List and retrieve databases and tables from GooseFS catalog
  • Attach external databases (e.g., from Hive) to GooseFS
  • Mount/unmount tables for caching in GooseFS
  • Synchronize databases with underlying metadata stores
  • Retrieve table access statistics
  • Get table and partition column statistics
  • Transform tables with custom definitions

Installation

pip install -e .

Requirements

  • Python >= 3.7
  • grpcio >= 1.50.0
  • protobuf >= 3.20.0
  • A running GooseFS cluster with Table Master service enabled

Quick Start

Basic Usage

from goosefs_metastore_client import GoosefsMetastoreClient

GOOSEFS_HOST = "localhost"
GOOSEFS_PORT = 9200

with GoosefsMetastoreClient(GOOSEFS_HOST, GOOSEFS_PORT) as client:
    databases = client.get_all_databases()
    for db_info in databases:
        print(f"Database: {db_info.name}")

List Databases

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    databases = client.get_all_databases()
    for db in databases:
        print(f"{db.name} - Type: {db.type}")

Get Database Details

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    database = client.get_database("my_database")
    print(f"Location: {database.location}")
    print(f"Owner: {database.owner_name}")

Attach External Database

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    sync_status = client.attach_database(
        udb_type="hive",
        udb_db_name="hive_db",
        db_name="goosefs_db",
        configuration={
            "hive.metastore.uris": "thrift://hive-metastore:9083",
        },
        auto_mount=True,
    )
    print(f"Tables synced: {len(sync_status.tables_updated)}")

List and Get Tables

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    tables = client.get_all_tables("my_database")
    for table in tables:
        print(f"Table: {table.name}, Mounted: {table.is_mount}")
    
    table_info = client.get_table("my_database", "my_table")
    print(f"Owner: {table_info.owner}")
    for col in table_info.schema.cols:
        print(f"Column: {col.name} ({col.type})")

Mount/Unmount Tables

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    client.mount_table("my_database", "my_table")
    print("Table mounted for caching")
    
    client.unmount_table("my_database", "my_table")
    print("Table unmounted")

Access Statistics

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9200) as client:
    stats = client.access_stat(days=7, top_nums=10)
    for stat in stats:
        print(f"{stat.db_name}.{stat.tb_name}: {stat.hots} accesses")

Using Builders

The client provides builder classes to construct database and table objects:

from goosefs_metastore_client.builders import DatabaseBuilder, TableBuilder, FieldSchemaBuilder

database = DatabaseBuilder(
    db_name="my_database",
    description="My test database",
    location="/user/hive/warehouse/my_database.db",
    owner_name="admin",
    parameters={"key": "value"},
).build()

API Reference

GoosefsMetastoreClient

Main client class for interacting with GooseFS Table Master.

Database Operations

  • get_all_databases() - Get all databases in the catalog
  • get_database(db_name) - Get a specific database by name
  • attach_database(udb_type, udb_db_name, db_name, ...) - Attach an external database
  • detach_database(db_name) - Detach a database from the catalog
  • sync_database(db_name) - Sync a database with its underlying store

Table Operations

  • get_all_tables(database) - Get all tables in a database
  • get_table(db_name, table_name) - Get a specific table
  • mount_table(db_name, tb_name) - Mount a table to GooseFS
  • unmount_table(db_name, tb_name) - Unmount a table from GooseFS

Statistics and Analytics

  • access_stat(days, top_nums) - Get table access statistics
  • get_table_column_statistics(db_name, table_name, col_names) - Get column statistics
  • get_partition_column_statistics(db_name, table_name, col_names, part_names) - Get partition statistics

Advanced Operations

  • read_table(db_name, table_name, constraint) - Read table partitions with constraints
  • transform_table(db_name, table_name, definition) - Transform a table
  • get_transform_job_info(job_id) - Get transformation job information

Configuration

Connection Parameters

  • host: GooseFS master hostname
  • port: Table Master client service port (default: 9200)
  • max_retries: Maximum retry attempts for failed requests (default: 3)
  • timeout: Timeout in seconds for gRPC calls (default: 30)
  • credentials: Optional gRPC credentials for secure connections

Example with Custom Configuration

client = GoosefsMetastoreClient(
    host="goosefs-master.example.com",
    port=9200,
    max_retries=5,
    timeout=60,
)
client.connect()
try:
    databases = client.get_all_databases()
finally:
    client.close()

Development

Setup Development Environment

pip install -r requirements.dev.txt

Running Tests

pytest tests/

Code Formatting

black goosefs_metastore_client/

Linting

flake8 goosefs_metastore_client/
mypy goosefs_metastore_client/

Project Structure

goosefs-metastore-client/
├── goosefs_metastore_client/
│   ├── __init__.py
│   ├── goosefs_metastore_client.py
│   └── builders/
│       ├── __init__.py
│       ├── abstract_builder.py
│       ├── database_builder.py
│       ├── field_schema_builder.py
│       └── table_builder.py
├── grpc_files/
│   ├── __init__.py
│   ├── common_pb2.py
│   ├── job_master_pb2.py
│   ├── table_master_pb2.py
│   ├── table_master_pb2_grpc.py
│   └── proto/
│       ├── common.proto
│       ├── job_master.proto
│       └── table_master.proto
├── examples/
│   ├── list_databases.py
│   ├── get_database.py
│   ├── attach_database.py
│   ├── list_tables.py
│   ├── get_table.py
│   ├── mount_table.py
│   └── access_statistics.py
├── tests/
│   └── unit/
│       └── goosefs_metastore_client/
│           └── builders/
├── setup.py
├── requirements.txt
└── README.md

Comparison with Hive Metastore Client

Feature Hive Metastore Client GooseFS Metastore Client
Protocol Thrift gRPC
Server Hive Metastore GooseFS Table Master
Connection TSocket + TBinaryProtocol gRPC Channel + Stub
Retry Logic Manual Built-in with configurable retries
Database Operations Thrift API gRPC API
Table Operations Thrift API gRPC API

License

Apache License 2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

goosefs_metastore_client-0.1.1.tar.gz (59.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

goosefs_metastore_client-0.1.1-py3-none-any.whl (43.1 kB view details)

Uploaded Python 3

File details

Details for the file goosefs_metastore_client-0.1.1.tar.gz.

File metadata

  • Download URL: goosefs_metastore_client-0.1.1.tar.gz
  • Upload date:
  • Size: 59.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for goosefs_metastore_client-0.1.1.tar.gz
Algorithm Hash digest
SHA256 cae60ac756beca422c91cad31f718bd24fb46f3b63f98f416bab2eb70766daf6
MD5 d1e37e12b8951e5dc2588f39248b9d3a
BLAKE2b-256 48435a693f7b397ff5084e768af772fc61590b110692c51f9864b172c838cefe

See more details on using hashes here.

File details

Details for the file goosefs_metastore_client-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for goosefs_metastore_client-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d32eb37a3182f9a98f561d99eca9b634ad15349dbb88add19ea228b56031aba4
MD5 8f9f0e1a5d0e010bbe891931974f13fe
BLAKE2b-256 7c59610c757d7453ab33b9a3e11a35e53f329979f868fff00e52ad978f06692c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page