A Python client for connecting to GooseFS Table Master via gRPC

GooseFS Metastore Client

A Python client library for connecting to the GooseFS Table Master service over gRPC.

Overview

GooseFS Metastore Client provides a Python interface to interact with GooseFS's Table Master service, enabling you to manage databases and tables in the GooseFS catalog. The client supports both traditional GooseFS operations and Lance-compatible catalog operations.

Architecture

The client follows this call flow:

Python Client (GoosefsMetastoreClient)
    ↓
gRPC Client (TableMasterClientServiceStub)
    ↓
[gRPC Network Call]
    ↓
GooseFS Master (TableMasterClientServiceHandler)
    ↓
DefaultTableMaster

Features

Core Features

  • List and retrieve databases and tables from GooseFS catalog
  • Attach external databases (e.g., from Hive) to GooseFS
  • Mount/unmount tables for caching in GooseFS
  • Synchronize databases with underlying metadata stores
  • Retrieve table access statistics
  • Get table and partition column statistics
  • Transform tables with custom definitions

Lance-Compatible Catalog Features (New)

  • Namespace Operations: Create, describe, list, and drop namespaces
  • Table CRUD Operations: Create, read, update, and delete tables with Lance-style predicates
  • Table Index Operations: Create, list, and manage table indices
  • Table Schema Operations: Update schema metadata, add/alter/drop columns
  • Table Version Operations: List versions, create/describe/batch-delete versions, restore tables, rename tables
  • Table Tag Operations: Create, update, delete, and list table tags
  • Table Statistics: Get table stats, explain and analyze query plans
  • Transaction Operations: Describe and alter transactions

Installation

# From PyPI
pip install goosefs-metastore-client

# From a source checkout (editable install)
pip install -e .

Requirements

  • Python >= 3.7
  • grpcio >= 1.50.0
  • protobuf >= 3.20.0
  • A running GooseFS cluster with Table Master service enabled

Quick Start

Basic Usage

from goosefs_metastore_client import GoosefsMetastoreClient

GOOSEFS_HOST = "localhost"
GOOSEFS_PORT = 9220

with GoosefsMetastoreClient(GOOSEFS_HOST, GOOSEFS_PORT) as client:
    databases = client.get_all_databases()
    for db_info in databases:
        print(f"Database: {db_info.name}")

List Databases

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    databases = client.get_all_databases()
    for db in databases:
        print(f"{db.name} - Type: {db.type}")

Get Database Details

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    database = client.get_database("my_database")
    print(f"Location: {database.location}")
    print(f"Owner: {database.owner_name}")

Attach External Database

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    sync_status = client.attach_database(
        udb_type="hive",
        udb_db_name="hive_db",
        db_name="goosefs_db",
        configuration={
            "hive.metastore.uris": "thrift://hive-metastore:9083",
        },
        auto_mount=True,
    )
    print(f"Tables synced: {len(sync_status.tables_updated)}")

List and Get Tables

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    tables = client.get_all_tables("my_database")
    for table in tables:
        print(f"Table: {table.name}, Mounted: {table.is_mount}")
    
    table_info = client.get_table("my_database", "my_table")
    print(f"Owner: {table_info.owner}")
    for col in table_info.schema.cols:
        print(f"Column: {col.name} ({col.type})")

Mount/Unmount Tables

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    client.mount_table("my_database", "my_table")
    print("Table mounted for caching")
    
    client.unmount_table("my_database", "my_table")
    print("Table unmounted")

Access Statistics

from goosefs_metastore_client import GoosefsMetastoreClient

with GoosefsMetastoreClient("localhost", 9220) as client:
    stats = client.access_stat(days=7, top_nums=10)
    for stat in stats:
        print(f"{stat.db_name}.{stat.tb_name}: {stat.hots} accesses")
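
The records returned by access_stat can be aggregated further on the client side. The following is a minimal sketch that sums access counts per database. It assumes each record exposes db_name, tb_name, and a numeric hots field, as the example above suggests; AccessStat here is a hypothetical stand-in, not the library's type.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AccessStat(NamedTuple):
    """Hypothetical stand-in for one record returned by access_stat()."""
    db_name: str
    tb_name: str
    hots: int


def hots_per_database(stats: Iterable[AccessStat]) -> Counter:
    """Sum the access counts of all tables belonging to each database."""
    totals: Counter = Counter()
    for stat in stats:
        totals[stat.db_name] += stat.hots
    return totals


# Sample data standing in for a real access_stat() response.
sample = [
    AccessStat("sales", "orders", 120),
    AccessStat("sales", "customers", 30),
    AccessStat("logs", "events", 75),
]
totals = hots_per_database(sample)
for db_name, hots in totals.most_common():
    print(f"{db_name}: {hots} accesses")
```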

Lance-Compatible Operations

Namespace Operations

from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    ListNamespacesPRequest,
    CreateNamespacePRequest,
    DescribeNamespacePRequest,
)

with GoosefsMetastoreClient("localhost", 9220) as client:
    # List namespaces
    list_request = ListNamespacesPRequest()
    list_request.id.extend(["my_catalog"])
    list_request.limit = 10
    result = client.list_namespaces(list_request)
    print(f"Namespaces: {result['namespaces']}")
    
    # Create namespace
    create_request = CreateNamespacePRequest()
    create_request.id.extend(["my_catalog", "new_namespace"])
    create_request.properties["owner"] = "admin"
    client.create_namespace(create_request)
    
    # Describe namespace
    desc_request = DescribeNamespacePRequest()
    desc_request.id.extend(["my_catalog", "new_namespace"])
    props = client.describe_namespace(desc_request)
    print(f"Namespace properties: {props}")

Table CRUD Operations

from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    CreateTablePRequest,
    CountTableRowsPRequest,
    UpdateTablePRequest,
    DeleteFromTablePRequest,
    QueryTablePRequest,
)

with GoosefsMetastoreClient("localhost", 9220) as client:
    # Create table
    create_request = CreateTablePRequest()
    create_request.id.extend(["catalog", "namespace", "my_table"])
    create_request.schema = "id INT, name STRING, created_at TIMESTAMP"
    result = client.create_table(create_request)
    print(f"Table created at: {result['location']}")
    
    # Count rows with predicate (Lance-style)
    count_request = CountTableRowsPRequest()
    count_request.id.extend(["catalog", "namespace", "my_table"])
    count_request.predicate = "status = 'active'"  # Lance predicate field
    count_request.version = 0  # Latest version
    count = client.count_table_rows(count_request)
    print(f"Active rows: {count}")
    
    # Update table (supports both Lance predicate and GooseFS where_clause)
    update_request = UpdateTablePRequest()
    update_request.id.extend(["catalog", "namespace", "my_table"])
    update_request.updates = "status = 'inactive'"
    update_request.predicate = "last_login < '2024-01-01'"  # Lance field
    # Or use: update_request.where_clause = "..."  # GooseFS field
    rows_updated = client.update_table(update_request)
    print(f"Updated {rows_updated} rows")
    
    # Delete from table
    delete_request = DeleteFromTablePRequest()
    delete_request.id.extend(["catalog", "namespace", "my_table"])
    delete_request.predicate = "expired = true"
    rows_deleted = client.delete_from_table(delete_request)
    print(f"Deleted {rows_deleted} rows")
    
    # Query table
    query_request = QueryTablePRequest()
    query_request.id.extend(["catalog", "namespace", "my_table"])
    query_request.query = "SELECT * FROM my_table WHERE id > 100"
    result = client.query_table(query_request)
    print(f"Query result: {result}")

Table Index Operations

from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    CreateTableIndexPRequest,
    ListTableIndicesPRequest,
    DescribeTableIndexStatsPRequest,
)

with GoosefsMetastoreClient("localhost", 9220) as client:
    # Create index
    create_idx_request = CreateTableIndexPRequest()
    create_idx_request.id.extend(["catalog", "namespace", "my_table"])
    create_idx_request.columns.extend(["embedding"])
    create_idx_request.index_type = "IVF_PQ"
    create_idx_request.index_name = "idx_embedding"
    client.create_table_index(create_idx_request)
    
    # List indices (with Lance pagination)
    list_idx_request = ListTableIndicesPRequest()
    list_idx_request.id.extend(["catalog", "namespace", "my_table"])
    list_idx_request.page_token = ""  # Lance field
    list_idx_request.limit = 10       # Lance field
    indices = client.list_table_indices(list_idx_request)
    print(f"Indices: {indices}")
    
    # Get index stats
    stats_request = DescribeTableIndexStatsPRequest()
    stats_request.id.extend(["catalog", "namespace", "my_table"])
    stats_request.index_name = "idx_embedding"
    stats = client.describe_table_index_stats(stats_request)
    print(f"Index stats: {stats}")
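
When a table has more indices than one page holds, the page_token and limit fields can drive a loop. The sketch below shows the general token-based pattern with a stand-in fetch function. It assumes each response carries a token for the next page and that an empty token means the listing is complete; the README shows only the request fields, so check the actual response shape before relying on this.

```python
from typing import Callable, Iterator, List, Tuple

# (items, next_page_token); an empty token is assumed to mean "no more pages".
Page = Tuple[List[str], str]


def iter_all(fetch_page: Callable[[str, int], Page], limit: int = 10) -> Iterator[str]:
    """Yield every item by following page tokens until none is returned."""
    token = ""
    while True:
        items, token = fetch_page(token, limit)
        yield from items
        if not token:
            break


# Stand-in for a paginated RPC such as list_table_indices (hypothetical data).
_INDICES = [f"idx_{i}" for i in range(25)]


def fake_fetch(page_token: str, limit: int) -> Page:
    """Return one slice of _INDICES and the token for the following slice."""
    start = int(page_token) if page_token else 0
    end = start + limit
    next_token = str(end) if end < len(_INDICES) else ""
    return _INDICES[start:end], next_token


all_indices = list(iter_all(fake_fetch, limit=10))
print(f"Fetched {len(all_indices)} indices")
```

With the real client, fetch_page would wrap a call that builds a ListTableIndicesPRequest, sets page_token and limit, and extracts the items and next token from the result.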

Table Version and Tag Operations

from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    ListTableVersionsPRequest,
    CreateTableVersionPRequest,
    DescribeTableVersionPRequest,
    BatchDeleteTableVersionsPRequest,
    VersionRange,
    RestoreTablePRequest,
    CreateTableTagPRequest,
    ListTableTagsPRequest,
)

with GoosefsMetastoreClient("localhost", 9220) as client:
    # List versions
    versions_request = ListTableVersionsPRequest()
    versions_request.id.extend(["catalog", "namespace", "my_table"])
    versions = client.list_table_versions(versions_request)
    print(f"Available versions: {versions}")
    
    # Create a table version
    create_ver_request = CreateTableVersionPRequest()
    create_ver_request.id.extend(["catalog", "namespace", "my_table"])
    create_ver_request.version = 10
    create_ver_request.manifest_path = "/path/to/manifest"
    create_ver_request.manifest_size = 1024
    create_ver_request.naming_scheme = "V2"
    result = client.create_table_version(create_ver_request)
    print(f"Created version: {result}")
    
    # Describe a table version
    desc_ver_request = DescribeTableVersionPRequest()
    desc_ver_request.id.extend(["catalog", "namespace", "my_table"])
    desc_ver_request.version = 10
    ver_info = client.describe_table_version(desc_ver_request)
    print(f"Version info: {ver_info}")
    
    # Batch delete table versions
    batch_del_request = BatchDeleteTableVersionsPRequest()
    batch_del_request.id.extend(["catalog", "namespace", "my_table"])
    version_range = VersionRange()
    version_range.start_version = 1
    version_range.end_version = 5
    batch_del_request.ranges.append(version_range)
    del_result = client.batch_delete_table_versions(batch_del_request)
    print(f"Deleted {del_result.get('deleted_count', 0)} versions")
    
    # Restore to specific version
    restore_request = RestoreTablePRequest()
    restore_request.id.extend(["catalog", "namespace", "my_table"])
    restore_request.version = 5
    client.restore_table(restore_request)
    
    # Create tag
    tag_request = CreateTableTagPRequest()
    tag_request.id.extend(["catalog", "namespace", "my_table"])
    tag_request.tag = "v1.0-release"
    tag_request.version = 10
    client.create_table_tag(tag_request)
    
    # List tags
    list_tags_request = ListTableTagsPRequest()
    list_tags_request.id.extend(["catalog", "namespace", "my_table"])
    tags = client.list_table_tags(list_tags_request)
    print(f"Tags: {tags}")
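
Batch deletion takes version ranges rather than individual version numbers. The following sketch collapses an arbitrary list of version numbers into contiguous (start, end) pairs that could then be copied into VersionRange messages. It treats both ends as inclusive, which is an assumption; the README does not state whether end_version is inclusive.

```python
from typing import Iterable, List, Tuple


def to_ranges(versions: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse version numbers into sorted, contiguous, inclusive ranges."""
    ranges: List[Tuple[int, int]] = []
    for version in sorted(set(versions)):
        if ranges and version == ranges[-1][1] + 1:
            # Extends the current run: move its end forward.
            ranges[-1] = (ranges[-1][0], version)
        else:
            # Gap found (or first element): start a new run.
            ranges.append((version, version))
    return ranges


ranges = to_ranges([1, 2, 3, 5, 8, 9])
for start, end in ranges:
    print(f"delete versions {start}..{end}")
```

Each pair maps onto the start_version and end_version fields used in the batch delete example above.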

Transaction Operations

from goosefs_metastore_client import GoosefsMetastoreClient
from grpc_files.table_master_pb2 import (
    DescribeTransactionPRequest,
    AlterTransactionPRequest,
)

with GoosefsMetastoreClient("localhost", 9220) as client:
    # Describe transaction (supports both Lance id and GooseFS transaction_id)
    desc_txn_request = DescribeTransactionPRequest()
    desc_txn_request.transaction_id = "txn_12345"  # GooseFS field
    # Or use: desc_txn_request.id.extend(["txn_12345"])  # Lance field
    txn_info = client.describe_transaction(desc_txn_request)
    print(f"Transaction status: {txn_info}")
    
    # Alter transaction
    alter_txn_request = AlterTransactionPRequest()
    alter_txn_request.transaction_id = "txn_12345"
    alter_txn_request.action = "commit"
    client.alter_transaction(alter_txn_request)

Using Builders

The client provides builder classes for constructing database, table, and field schema objects. For example, to build a database:

from goosefs_metastore_client.builders import DatabaseBuilder, TableBuilder, FieldSchemaBuilder

database = DatabaseBuilder(
    db_name="my_database",
    description="My test database",
    location="/user/hive/warehouse/my_database.db",
    owner_name="admin",
    parameters={"key": "value"},
).build()

API Reference

GoosefsMetastoreClient

Main client class for interacting with GooseFS Table Master.

Database Operations

  • get_all_databases() - Get all databases in the catalog
  • get_database(db_name) - Get a specific database by name
  • attach_database(udb_type, udb_db_name, db_name, ...) - Attach an external database
  • detach_database(db_name) - Detach a database from the catalog
  • sync_database(db_name) - Sync a database with its underlying store

Basic Table Operations

  • get_all_tables(database) - Get all tables in a database
  • get_table(db_name, table_name) - Get a specific table
  • mount_table(db_name, tb_name) - Mount a table to GooseFS
  • unmount_table(db_name, tb_name) - Unmount a table from GooseFS

Statistics and Analytics

  • access_stat(days, top_nums) - Get table access statistics
  • get_table_column_statistics(db_name, table_name, col_names) - Get column statistics
  • get_partition_column_statistics(db_name, table_name, col_names, part_names) - Get partition statistics

Transform Operations

  • read_table(db_name, table_name, constraint) - Read table partitions with constraints
  • transform_table(db_name, table_name, definition) - Transform a table
  • get_transform_job_info(job_id) - Get transformation job information

Namespace Operations (Lance-Compatible)

  • list_namespaces(request) - List namespaces with pagination
  • create_namespace(request) - Create a new namespace
  • describe_namespace(request) - Get namespace properties
  • namespace_exists(request) - Check if namespace exists
  • drop_namespace(request) - Drop a namespace

Table CRUD Operations (Lance-Compatible)

  • list_tables(request) - List tables in a namespace
  • table_exists(request) - Check if table exists
  • describe_table(request) - Get table description and storage options
  • create_table(request) - Create a new table
  • create_empty_table(request) - Create an empty table
  • insert_into_table(request) - Insert data into table
  • merge_insert_into_table(request) - Merge insert data
  • update_table(request) - Update records (supports Lance predicate and GooseFS where_clause)
  • delete_from_table(request) - Delete records (supports Lance predicate and GooseFS where_clause)
  • query_table(request) - Query table data
  • count_table_rows(request) - Count rows (supports Lance predicate and version)
  • drop_table(request) - Drop a table

Table Index Operations (Lance-Compatible)

  • create_table_index(request) - Create a table index
  • create_table_scalar_index(request) - Create a scalar index
  • list_table_indices(request) - List indices (supports Lance pagination: page_token, limit, version)
  • describe_table_index_stats(request) - Get index statistics
  • drop_table_index(request) - Drop an index

Table Schema Operations (Lance-Compatible)

  • update_table_schema_metadata(request) - Update schema (supports Lance metadata and GooseFS schema)
  • alter_table_add_columns(request) - Add columns to table
  • alter_table_alter_columns(request) - Alter existing columns
  • alter_table_drop_columns(request) - Drop columns from table

Table Version Operations (Lance-Compatible)

  • list_all_tables(request) - List all tables with pagination
  • list_table_versions(request) - List table versions
  • create_table_version(request) - Create a table version with manifest info
  • describe_table_version(request) - Describe a specific table version
  • batch_delete_table_versions(request) - Batch delete table versions by ranges
  • restore_table(request) - Restore table to specific version
  • rename_table(request) - Rename a table

Table Tag Operations (Lance-Compatible)

  • list_table_tags(request) - List table tags
  • get_table_tag_version(request) - Get version for a tag
  • create_table_tag(request) - Create a table tag
  • update_table_tag(request) - Update a table tag
  • delete_table_tag(request) - Delete a table tag

Table Statistics Operations (Lance-Compatible)

  • get_table_stats(request) - Get table statistics
  • explain_table_query_plan(request) - Explain query plan
  • analyze_table_query_plan(request) - Analyze query plan

Registration Operations (Lance-Compatible)

  • declare_table(request) - Declare a table
  • deregister_table(request) - Deregister a table
  • register_table(request) - Register a table
  • register_namespace_impl(name, class_name) - Register namespace implementation
  • unregister_namespace_impl(name) - Unregister namespace implementation
  • is_registered(name) - Check if namespace is registered

Transaction Operations (Lance-Compatible)

  • describe_transaction(request) - Describe transaction (supports Lance id and GooseFS transaction_id)
  • alter_transaction(request) - Alter transaction state

Lance vs GooseFS Field Compatibility

The client supports dual-style API fields for compatibility with both Lance and GooseFS systems:

| Operation                 | Lance Field                | GooseFS Field  |
|---------------------------|----------------------------|----------------|
| DeleteFromTable           | predicate                  | where_clause   |
| UpdateTable               | predicate                  | where_clause   |
| CountTableRows            | predicate, version         | -              |
| ListTableIndices          | page_token, limit, version | -              |
| UpdateTableSchemaMetadata | metadata                   | schema         |
| DescribeTransaction       | id (repeated)              | transaction_id |
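
Because some requests accept either a Lance field or a GooseFS field, callers may want a guard against setting both to different values. The sketch below is one hypothetical client-side check for the predicate / where_clause pair. FilterFields is a stand-in for the two string fields, not a generated message class, and the README does not say how the server resolves a conflict, so rejecting it up front is only one possible policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterFields:
    """Hypothetical stand-in for the filter fields of an update/delete request."""
    predicate: str = ""      # Lance-style field
    where_clause: str = ""   # GooseFS-style field


def effective_filter(req: FilterFields) -> Optional[str]:
    """Return the single filter in effect, or None when neither field is set."""
    if req.predicate and req.where_clause and req.predicate != req.where_clause:
        raise ValueError("predicate and where_clause disagree; set only one")
    return req.predicate or req.where_clause or None


print(effective_filter(FilterFields(predicate="expired = true")))
print(effective_filter(FilterFields(where_clause="expired = true")))
print(effective_filter(FilterFields()))
```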

Configuration

Connection Parameters

  • host: GooseFS master hostname
  • port: Table Master client service port (default: 9220)
  • max_retries: Maximum retry attempts for failed requests (default: 3)
  • timeout: Timeout in seconds for gRPC calls (default: 30)
  • credentials: Optional gRPC credentials for secure connections
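
To illustrate what a max_retries setting typically means, here is a generic retry loop with exponential backoff. This is a sketch of the common pattern, not the client's actual implementation: the function name, the backoff schedule, the set of retried exceptions, and the reading of max_retries as "retries after the first attempt" are all assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying up to max_retries times after the first failure."""
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
            attempt += 1


# Stand-in for a flaky RPC: fails twice, then succeeds.
calls = {"count": 0}


def flaky_rpc() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky_rpc, max_retries=3, base_delay=0.0)
print(f"result={result} after {calls['count']} calls")
```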

Example with Custom Configuration

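from goosefs_metastore_client import GoosefsMetastoreClient

# Override the default retry count and per-call timeout
client = GoosefsMetastoreClient(
    host="goosefs-master.example.com",
    port=9220,
    max_retries=5,
    timeout=60,
)
client.connect()
try:
    databases = client.get_all_databases()
finally:
    # Always close the connection, even if a call fails
    client.close()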
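
The explicit connect/try/finally form is the manual counterpart of the with statement used in Quick Start. The sketch below shows how a context manager usually wires the two together, using a stand-in class. StubClient is hypothetical, and it is an assumption that the real client connects on entry and closes on exit in exactly this way.

```python
from typing import List


class StubClient:
    """Hypothetical stand-in; not the real GoosefsMetastoreClient."""

    def __init__(self) -> None:
        self.connected = False
        self.events: List[str] = []

    def connect(self) -> "StubClient":
        self.connected = True
        self.events.append("connect")
        return self

    def close(self) -> None:
        self.connected = False
        self.events.append("close")

    def __enter__(self) -> "StubClient":
        # Entering the with block opens the connection.
        return self.connect()

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Leaving the block closes it, whether or not an error occurred.
        self.close()
        return False  # do not swallow exceptions


client = StubClient()
try:
    with client:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(client.events, client.connected)
```

The with form is generally preferable because the cleanup cannot be forgotten; the explicit form is useful when the client's lifetime spans several functions.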

Development

Setup Development Environment

pip install -r requirements.dev.txt

Running Tests

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run specific test file
pytest tests/unit/goosefs_metastore_client/test_goosefs_metastore_client.py

# Run tests with coverage
pytest tests/ --cov=goosefs_metastore_client --cov-report=html

Test Coverage

The test suite covers:

  • Client initialization and connection
  • All 52+ API methods
  • Lance-compatible operations (namespace, table CRUD, index, schema, version, tag, stats, transaction)
  • Error handling (not connected errors)
  • Retry logic

Code Formatting

black goosefs_metastore_client/

Linting

flake8 goosefs_metastore_client/
mypy goosefs_metastore_client/

Project Structure

goosefs-metastore-client/
├── goosefs_metastore_client/
│   ├── __init__.py
│   ├── goosefs_metastore_client.py    # Main client with 52+ methods
│   └── builders/
│       ├── __init__.py
│       ├── abstract_builder.py
│       ├── database_builder.py
│       ├── field_schema_builder.py
│       └── table_builder.py
├── grpc_files/
│   ├── __init__.py
│   ├── common_pb2.py
│   ├── job_master_pb2.py
│   ├── table_master_pb2.py
│   ├── table_master_pb2_grpc.py
│   └── proto/
│       ├── common.proto
│       ├── job_master.proto
│       └── table_master.proto
├── examples/
│   ├── __init__.py
│   ├── # Basic Examples
│   ├── list_databases.py
│   ├── get_database.py
│   ├── attach_database.py
│   ├── list_tables.py
│   ├── get_table.py
│   ├── mount_table.py
│   ├── access_statistics.py
│   ├── complete_example.py
│   ├── detailed_flow_example.py
│   ├── # Lance-Compatible Examples
│   ├── namespace_operations.py        # Namespace CRUD
│   ├── table_crud_operations.py       # Table create/read/update/delete
│   ├── table_index_operations.py      # Index management
│   ├── table_schema_operations.py     # Schema operations
│   ├── table_version_operations.py    # Version and restore
│   ├── table_tag_operations.py        # Tag management
│   ├── table_stats_operations.py      # Statistics and query plans
│   ├── table_registration_operations.py  # Registration operations
│   ├── table_transform_operations.py  # Transform operations
│   └── transaction_operations.py      # Transaction management
├── tests/
│   └── unit/
│       └── goosefs_metastore_client/
│           ├── conftest.py
│           ├── test_goosefs_metastore_client.py  # 60+ test cases
│           └── builders/
│               ├── test_database_builder.py
│               ├── test_field_schema_builder.py
│               └── test_table_builder.py
├── setup.py
├── requirements.txt
└── README.md

Comparison with Hive Metastore Client

| Feature             | Hive Metastore Client     | GooseFS Metastore Client           |
|---------------------|---------------------------|------------------------------------|
| Protocol            | Thrift                    | gRPC                               |
| Server              | Hive Metastore            | GooseFS Table Master               |
| Connection          | TSocket + TBinaryProtocol | gRPC Channel + Stub                |
| Retry Logic         | Manual                    | Built-in with configurable retries |
| Database Operations | Thrift API                | gRPC API                           |
| Table Operations    | Thrift API                | gRPC API                           |
| Lance Catalog Support | No                      | Yes (52+ methods)                  |
| Table Versioning    | No                        | Yes                                |
| Table Tagging       | No                        | Yes                                |
| Vector Index Support | No                       | Yes                                |
| Transaction Support | No                        | Yes                                |

License

Apache License 2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
