
Python lib/client for Gravitino


Introduction

Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly across different sources, types, and regions, and provides users with unified metadata access for data and AI assets.

The Gravitino Python client helps data scientists easily manage metadata using Python.


Usage Guide

You can use the Gravitino Python client library with Spark, PyTorch, TensorFlow, Ray, and plain Python environments.

First, you must have a Gravitino server set up and running. Refer to the How to install Gravitino document to build the Gravitino server from source code and install it locally.

Gravitino Python client API

Install the client from PyPI:

    pip install gravitino

Then refer to the following documents:
  1. Manage metalake using Gravitino Python API
  2. Manage fileset metadata using Gravitino Python API
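
After running pip install gravitino, you can confirm from Python which client version is installed. This minimal sketch uses only the standard library's importlib.metadata (Python 3.8+) and no Gravitino API; the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "gravitino") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        # Reads the package metadata that pip wrote at install time.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

For example, installed_version() returns the version string of the gravitino distribution, or None if the package has not been installed in the current environment.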

Gravitino Fileset Example

We offer a playground environment to help you quickly understand how to use the Gravitino Python client to manage non-tabular data on HDFS via filesets in Gravitino. Follow the "Launch AI components of playground" section of the How to use the playground document to launch a Gravitino server, HDFS, and a Jupyter notebook environment in your local Docker environment.

Once the playground Docker environment has started, open http://localhost:8888/lab/tree/gravitino-fileset-example.ipynb directly in your browser and run the example.
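
Because the containers can take a while to come up, a small polling helper can wait until the notebook URL answers before you open it. This is a minimal standard-library sketch; wait_until_reachable and its default timeouts are hypothetical choices, and any HTTP response, including an error status, is treated as "the server is up".

```python
import time
import urllib.error
import urllib.request


def wait_until_reachable(
    url: str,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    request_timeout_s: float = 5.0,
) -> bool:
    """Poll `url` until the server sends any HTTP response, or give up after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout_s):
                return True  # 2xx response (redirects are followed automatically)
        except urllib.error.HTTPError:
            return True  # the server answered with an error status, so it is up
        except (urllib.error.URLError, OSError):
            pass  # nothing listening yet, or the connection timed out
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_until_reachable("http://localhost:8888/lab/tree/gravitino-fileset-example.ipynb") blocks until Jupyter responds or five minutes pass.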

The gravitino-fileset-example contains the following code snippets:

  1. Install an HDFS Python client.
  2. Create an HDFS client to connect to HDFS and perform some test operations.
  3. Install the Gravitino Python client.
  4. Initialize the Gravitino admin client and create a Gravitino metalake.
  5. Initialize the Gravitino client and list metalakes.
  6. Create a Gravitino catalog of type Catalog.Type.FILESET with provider hadoop.
  7. Create a Gravitino schema whose location points to an HDFS path, and use the HDFS client to check that the schema location was created in HDFS.
  8. Create a fileset of type Fileset.Type.MANAGED, and use the HDFS client to check that the fileset location was created in HDFS.
  9. Drop this Fileset.Type.MANAGED fileset and check that its location was deleted from HDFS.
  10. Create a fileset of type Fileset.Type.EXTERNAL whose location points to an existing HDFS path.
  11. Drop this Fileset.Type.EXTERNAL fileset and check that its location was not deleted from HDFS.
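
Steps 8 to 11 hinge on one rule: dropping a MANAGED fileset deletes its storage location, while dropping an EXTERNAL fileset leaves the data in place. The in-memory model below illustrates that rule without a Gravitino server or HDFS. Every name in it (FilesetType, FakeHdfs, FakeFilesetCatalog) is hypothetical and is not the gravitino package API. As a simplification, it also assumes that a managed fileset created without an explicit location is placed under the schema location.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class FilesetType(Enum):
    """Hypothetical mirror of the two fileset kinds used in the notebook."""
    MANAGED = "managed"
    EXTERNAL = "external"


@dataclass
class FakeHdfs:
    """Hypothetical stand-in for HDFS that only tracks which directories exist."""
    dirs: Set[str] = field(default_factory=set)

    def mkdirs(self, path: str) -> None:
        self.dirs.add(path)

    def delete(self, path: str) -> None:
        # Recursive delete: drop the path and everything beneath it.
        self.dirs = {d for d in self.dirs if d != path and not d.startswith(path + "/")}

    def exists(self, path: str) -> bool:
        return path in self.dirs


@dataclass
class FakeFilesetCatalog:
    """Hypothetical model of the create/drop rules checked in steps 8 to 11."""
    fs: FakeHdfs
    schema_location: str
    filesets: Dict[str, Tuple[FilesetType, str]] = field(default_factory=dict)

    def create_fileset(
        self, name: str, kind: FilesetType, storage_location: Optional[str] = None
    ) -> str:
        if kind is FilesetType.MANAGED:
            # Assumption: without an explicit location, place it under the schema location.
            location = storage_location or f"{self.schema_location}/{name}"
            self.fs.mkdirs(location)  # a managed fileset owns its directory
        else:
            if storage_location is None:
                raise ValueError("an external fileset must point at an existing path")
            location = storage_location  # external data is referenced, not created
        self.filesets[name] = (kind, location)
        return location

    def drop_fileset(self, name: str) -> None:
        kind, location = self.filesets.pop(name)
        if kind is FilesetType.MANAGED:
            self.fs.delete(location)  # metadata and data are removed together
        # EXTERNAL: only the metadata entry is removed; the data stays in place.
```

The model keeps only what the notebook checks with the HDFS client: whether a directory exists before and after each drop.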

How to develop the Gravitino Python client

You can use any IDE to develop the Gravitino Python client. Open the client-python module directly as a project in the IDE.

Prerequisites

Build and test

  1. Clone the Gravitino project.

    git clone git@github.com:datastrato/gravitino.git
    
  2. Build the Gravitino Python client module

    ./gradlew :clients:client-python:build
    
  3. Run unit tests

    ./gradlew :clients:client-python:test -PskipITs
    
  4. Run integration tests

    Because the Python client connects to a Gravitino server to run integration tests, the build automatically runs the ./gradlew compileDistribution -x test command to compile the Gravitino project into the distribution directory. When you run integration tests via the Gradle command or an IDE, the Gravitino integration test framework (integration_test_env.py) starts and stops the Gravitino server automatically.

    ./gradlew :clients:client-python:test
    
  5. Distribute the Gravitino Python client module

    ./gradlew :clients:client-python:distribution
    
  6. Deploy the Gravitino Python client to https://pypi.org/project/gravitino/

    ./gradlew :clients:client-python:deploy
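
The internals of integration_test_env.py are not reproduced here. The sketch below only illustrates the general start-and-stop pattern such a framework follows, written as a context manager around a server subprocess; running_server is a hypothetical helper, and the command passed in stands in for whatever actually launches the Gravitino server.

```python
import contextlib
import subprocess
from typing import Iterator, List


@contextlib.contextmanager
def running_server(cmd: List[str], stop_timeout_s: float = 10.0) -> Iterator[subprocess.Popen]:
    """Start `cmd` as a subprocess for the duration of a `with` block (hypothetical helper)."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc  # the tests run here while the server process is alive
    finally:
        # Always stop the server, even if a test raised an exception.
        proc.terminate()
        try:
            proc.wait(timeout=stop_timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if it ignores the polite shutdown request
            proc.wait()
```

For example, with running_server([sys.executable, "-m", "http.server", "8000"]): keeps a throwaway HTTP server alive only for the duration of the with block, and stops it even when the block raises.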
    



Download files


Source Distribution

gravitino-0.5.0.dev24.tar.gz (38.6 kB)


File details

Details for the file gravitino-0.5.0.dev24.tar.gz.

File metadata

  • Download URL: gravitino-0.5.0.dev24.tar.gz
  • Size: 38.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.19

File hashes

Hashes for gravitino-0.5.0.dev24.tar.gz

  Algorithm     Hash digest
  SHA256        f4343a5fac86a7c7b51971fba05acaed56275ba2401c68a605a26a264d4620e1
  MD5           e5f8fdcd25852ff978e581ef6f9fc295
  BLAKE2b-256   a586a491dd4a50be4252214029005b7c4d11b16c3d8a6274c7bc8533c8a47c49
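
To check a downloaded copy of the source distribution against these digests, Python's hashlib is enough. This is a minimal sketch; file_digests and verify are hypothetical helper names, and BLAKE2b-256 is computed as hashlib.blake2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Digests published for gravitino-0.5.0.dev24.tar.gz.
EXPECTED: Dict[str, str] = {
    "sha256": "f4343a5fac86a7c7b51971fba05acaed56275ba2401c68a605a26a264d4620e1",
    "md5": "e5f8fdcd25852ff978e581ef6f9fc295",
    "blake2b_256": "a586a491dd4a50be4252214029005b7c4d11b16c3d8a6274c7bc8533c8a47c49",
}


def file_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 of a file, reading it in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: Path, expected: Dict[str, str] = EXPECTED) -> bool:
    """Return True only if every expected digest matches the local file."""
    actual = file_digests(path)
    return all(actual[name] == digest for name, digest in expected.items())
```

For example, verify(Path("gravitino-0.5.0.dev24.tar.gz")) returns True only when all three digests of the downloaded archive match the published values.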

