A Python API wrapping services of the Superb Data Kraken (SDK)

superb-data-klient

superb-data-klient offers a streamlined interface to various services of the Superb Data Kraken platform (SDK). With the library, you can fetch and index data and manage indices, spaces, and organizations on the SDK.

It is designed primarily for the JupyterHub environment within the platform, but it can also be set up in other environments.

Installation and Supported Versions

$ python -m pip install superb-data-klient

Usage

Authentication

To begin, authenticate against the SDK's OIDC provider. Authentication takes place when the client object is instantiated, using one of the following methods:

  1. System Environment Variables (recommended for Jupyter environments):

    import superbdataklient as sdk
    client = sdk.SDKClient()
    

    This approach uses the environment variables SDK_ACCESS_TOKEN and SDK_REFRESH_TOKEN.

  2. Login Credentials:

    import superbdataklient as sdk
    client = sdk.SDKClient(username='hasslethehoff', password='lookingforfreedom')
    
  3. Authorization Code Flow:

    If neither of the methods above is suitable, authentication falls back to the authorization code flow.

    CAUTION: This method only works in a browser environment.
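Before relying on the first option outside of Jupyter, it can help to confirm that both token variables are actually present. The following is a minimal sketch that uses only the standard library; the helper name missing_token_vars is hypothetical and not part of superb-data-klient.

```python
import os

# Variable names taken from the description of option 1.
TOKEN_VARS = ("SDK_ACCESS_TOKEN", "SDK_REFRESH_TOKEN")


def missing_token_vars(environ=os.environ):
    """Return the names of token variables that are unset or empty."""
    return [name for name in TOKEN_VARS if not environ.get(name)]


missing = missing_token_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Both token variables are set.")
```

If the list is empty, sdk.SDKClient() without arguments should be able to use the first option; otherwise one of the other methods applies.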

NOTE: If your user account was linked from an external identity provider, your account in the SDK identity provider (Keycloak) does not have a password by default. To enable login via basic authentication, you need to set a password through self-service first.

Follow these steps to set your password:

  1. Go to the self-service portal for your environment:
  2. Set a password for your account.
  3. Once the password is set, you can log in using basic authentication (option 2).

Configuration

While the default settings cater to the standard SDK instance, configurations for various other instances are also available.

Setting Environment

import superbdataklient as sdk
client = sdk.SDKClient(env='sdk-dev')
client = sdk.SDKClient(env='sdk')

Overwriting Settings

client = sdk.SDKClient(domain='mydomain.ai', realm='my-realm', client_id='my-client-id', api_version='v13.37')

Proxy

To use the SDK client behind a company proxy, add the following configuration parameters to the constructor.
NOTE: The environment variables "http_proxy" and "https_proxy" override the proxy settings passed to the SDKClient, so unset them before configuring the SDKClient.

from superbdataklient import SDKClient

client = SDKClient(username='hasslethehoff',
                   password='lookingforfreedom',
                   proxy_http="http://proxy.example.com:8080",
                   proxy_https="https://proxy.example.com:8080",
                   proxy_user="proxyusername",
                   proxy_pass="proxyuserpassword")
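The note about the proxy environment variables can be handled from within Python before the client is created. This is a small sketch, assuming that removing the variables from the current process is acceptable; clear_proxy_vars is a hypothetical helper, and only the two variable names mentioned in the note are handled.

```python
import os

# Names taken from the note on proxy environment variables.
PROXY_VARS = ("http_proxy", "https_proxy")


def clear_proxy_vars(environ=os.environ):
    """Remove the proxy variables from the mapping and return what was removed."""
    removed = {}
    for name in PROXY_VARS:
        if name in environ:
            removed[name] = environ.pop(name)
    return removed


removed = clear_proxy_vars()
# Print only the names, since proxy URLs may contain credentials.
print("Removed:", sorted(removed))
```

The change affects only the running Python process; the surrounding shell keeps its settings.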

Logging

The SDKClient accepts a user-defined logger, which makes it easier to integrate its log output into an existing logging framework. Pass the logger as an argument when initializing the SDKClient instance. Log messages from the client's methods are then forwarded to this logger; if no logger is passed, log output is printed to stdout / stderr.

import logging
from superbdataklient import SDKClient

# Configure the logger
my_logger = logging.getLogger('sdk_logger')
my_logger.setLevel(logging.DEBUG)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
console_handler.setFormatter(formatter)
my_logger.addHandler(console_handler)

# Pass the logger to SDKClient
client = SDKClient(logger=my_logger)
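When an application already has a configured logger, a child logger can be handed to the client instead of a separately configured one. The sketch below relies only on the standard logging module; the logger names "myapp" and "myapp.sdk" and the ListHandler class are illustrative assumptions, with ListHandler standing in for whatever handler the application really uses.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in memory (stand-in for a real handler)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


# Existing application logger with its own level and handler.
app_logger = logging.getLogger("myapp")
app_logger.setLevel(logging.INFO)
app_handler = ListHandler()
app_handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
app_logger.addHandler(app_handler)

# Child logger: inherits the level of "myapp" and propagates to its handlers.
sdk_logger = logging.getLogger("myapp.sdk")
sdk_logger.info("example message")

print(app_handler.messages)
```

Passing sdk_logger via SDKClient(logger=sdk_logger) would then route the client's messages through the application's handlers and formatting.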

Examples

Organizations

Get details of all organizations, or retrieve by ID or name:

client.organization_get_all()
client.organization_get_by_id(1337)
client.organization_get_by_name('my-organization')

Spaces

To retrieve spaces related to an organization:

organization_id = 1234
space_id = 5678          # placeholder space ID
space_name = 'my-space'  # placeholder space name
client.space_get_all(organization_id)
client.space_get_by_id(organization_id, space_id)
client.space_get_by_name(organization_id, space_name)

Index

Retrieve a specific document:

document = client.index_get_document(index_name, doc_id)

Fetch all documents within an index:

documents = client.index_get_all_documents("index-name")

Iterate through documents using a generator:

documents = client.index_get_documents("index-name")
for document in documents:
    print(document)
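Because the documents arrive one at a time, a large index can be processed in fixed-size batches without loading everything into memory. The following sketch shows one way to do this with itertools.islice; in_batches is a hypothetical helper, and fake_documents is a stand-in generator used here so the example runs without an SDK connection.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield lists of up to `size` items from any iterable, including generators."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_documents(count):
    """Stand-in for the generator returned by client.index_get_documents."""
    for i in range(count):
        yield {"_id": i, "name": f"document{i:02d}"}


for batch in in_batches(fake_documents(5), 2):
    print(len(batch), "documents")
```

With a real client, fake_documents(5) would be replaced by client.index_get_documents("index-name").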

Index multiple documents:

documents = [
    {"_id": 123, "name": "document01", "value": "value"},
    {"_id": 1337, "name": "document02", "value": "value"}
]
index_name = "index"
client.index_documents(documents, index_name)

Note: The optional _id field is used as the document ID for indexing in OpenSearch.
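If the records already carry a natural key, that key can be copied into _id before indexing. This is a minimal sketch under that assumption; with_ids and the field name "serial" are hypothetical. The duplicate check is included because OpenSearch identifies documents by ID, so two records with the same ID would not end up as two separate documents.

```python
def with_ids(records, id_field):
    """Return copies of the records with `_id` taken from `id_field`."""
    documents = []
    seen = set()
    for record in records:
        if id_field not in record:
            raise KeyError(f"record has no field {id_field!r}: {record}")
        doc_id = record[id_field]
        if doc_id in seen:
            raise ValueError(f"duplicate document ID: {doc_id!r}")
        seen.add(doc_id)
        # Copy the record so the caller's data is left unchanged.
        documents.append({**record, "_id": doc_id})
    return documents


records = [
    {"serial": "a1", "value": 1},
    {"serial": "b2", "value": 2},
]
documents = with_ids(records, "serial")
print(documents)
```

The resulting list has the same shape as the documents passed to client.index_documents in the example above.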

Filter indices by organization, space, and type:

client.index_filter_by_space("my-organization", "my-space", "index-type")

To match all spaces in an organization, pass * instead of a space name. The available index_type values are ANALYSIS and MEASUREMENTS.

Create an application index:

mapping = {
   ...
}
client.application_index_create("my-application-index", "my-organization", "my-space", mapping)

Remove an application index by its name:

client.application_index_delete("my-organization_my-space_analysis_my-application-index")
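The name passed to the delete call looks like the organization, space, index type, and index name joined with underscores. The sketch below reproduces that pattern; it is inferred from this single example only, so the pattern, the helper name application_index_full_name, and the default type "analysis" are all assumptions rather than documented behaviour.

```python
def application_index_full_name(organization, space, index_name, index_type="analysis"):
    """Join the parts with underscores, mirroring the delete example (assumed pattern)."""
    return "_".join([organization, space, index_type, index_name])


full_name = application_index_full_name(
    "my-organization", "my-space", "my-application-index"
)
print(full_name)
```

When in doubt, the actual index name should be looked up, for example with client.index_filter_by_space, before deleting anything.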

Storage

List files in Storage:

files = client.storage_list_blobs("my-organization", "my-space")

Download specific files from Storage:

files = ['file01.txt', 'directory/file02.json']
client.storage_download_files(organization='my-organization', space='my-space', files=files, local_dir='tmp')

Use regex patterns for file downloads:

files = ['file01.txt', 'directory/file02.json']
client.storage_download_files_with_regex(organization='my-organization', space='my-space', files=files, local_dir='tmp', regex=r'.*json$')
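To preview which file names a pattern would select, the pattern can be tried locally with Python's re module. This is only an illustration: how the library applies the regex internally (search, match, or full match) is not stated here, so the use of re.search is an assumption. For the pattern shown above, all three variants select the same names.

```python
import re


def filter_by_regex(paths, pattern):
    """Return the paths for which the pattern is found."""
    compiled = re.compile(pattern)
    return [path for path in paths if compiled.search(path)]


files = ['file01.txt', 'directory/file02.json']
selected = filter_by_regex(files, r'.*json$')
print(selected)
```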

Upload files from a local directory. Unless the metadataGenerate property on the space is set to true, the upload must include a valid meta.json:

files = ['meta.json', 'file01.txt', 'file02.txt']
client.storage_upload_files(organization='my-organization', space='my-space', files=files, local_dir='tmp')
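A quick local check can catch a missing or malformed meta.json before the upload starts. The sketch below verifies only that the file exists and parses as JSON; it does not check the fields the SDK expects, and check_meta_json as well as the sample content are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path


def check_meta_json(local_dir):
    """Return the parsed meta.json from local_dir, or raise if it is missing or invalid."""
    meta_path = Path(local_dir) / "meta.json"
    if not meta_path.is_file():
        raise FileNotFoundError(f"{meta_path} not found")
    with meta_path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON.
        return json.load(handle)


# Demonstration with a temporary directory and arbitrary placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "meta.json").write_text('{"name": "example"}', encoding="utf-8")
    meta = check_meta_json(tmp)
    print(meta)
```

In practice the function would be called with the same local_dir that is passed to client.storage_upload_files.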

To monitor the status of an upload, pass a progress_callback function with the following signature:

def progress_callback(uploaded: int, total: int) -> None:

where:

  • uploaded: The number of bytes that have been uploaded so far.
  • total: The total size of the file in bytes.

def progress_callback(uploaded, total):
    # Update a progress bar here; this placeholder prints the raw counts.
    print(f"{uploaded}/{total} bytes uploaded")

files = ['meta.json', 'file01.txt', 'file02.txt']
client.storage_upload_files(organization='my-organization', space='my-space', files=files, local_dir='tmp', progress_callback=progress_callback)
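A callback with this signature can turn the two byte counts into a percentage. The following is a minimal sketch; percent_done is a hypothetical helper, and treating an empty file as complete is a design choice made here to avoid dividing by zero.

```python
def percent_done(uploaded: int, total: int) -> float:
    """Return upload progress as a percentage; an empty file counts as complete."""
    if total <= 0:
        return 100.0
    return min(100.0, 100.0 * uploaded / total)


def progress_callback(uploaded: int, total: int) -> None:
    """Print a one-line progress report for the current file."""
    print(f"{uploaded}/{total} bytes ({percent_done(uploaded, total):.1f}%)")


# Simulated call, as the upload routine might issue it.
progress_callback(512, 2048)
```

The function can be passed unchanged as progress_callback=progress_callback to client.storage_upload_files.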
