API client for Datarefiner

DataRefiner Client Library is a Python API toolkit that connects your Python code to the DataRefiner platform, so you can upload data, train projects, and export results without leaving Python.

Website: https://datarefiner.com

What functions does this library support?

  • Login using API key
  • Upload dataset to the platform
  • Configure project settings before training
  • Start training (you can track rendering progress)
  • Embed the TDA map from DataRefiner right in your Jupyter Notebook for analysis
  • Export result data from the TDA: cluster labels for source data, parameter scores for segmentation, a list of the most important features for all clusters, and TDA coordinates
  • Perform supervised labelling (predict cluster labels and groups from a trained topological project)
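
The API key used for the login step does not have to be hard-coded in a notebook. Below is a minimal sketch that reads it from the environment instead. The variable name DATAREFINER_API_TOKEN and the load_api_token helper are assumptions made for illustration; neither is part of the library.

```python
import os


def load_api_token(env_var: str = "DATAREFINER_API_TOKEN") -> str:
    """Read the API token from an environment variable.

    The variable name is an assumption made for this sketch; the
    client only needs the resulting token string.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your API token"
        )
    return token
```

The returned string can be passed as the token argument when constructing DataRefinerClient.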

Usage example:

import pandas as pd

from datarefiner_client import DataRefinerClient
from datarefiner_client.services.project_settings import ProjectSettingsFactory, ProjectType
from datarefiner_client.exceptions import DatarefinerExploreDownloadsError

from dataclasses import asdict
from pprint import pprint as pp

API_TOKEN = "<api_token>" # get API token from your user profile page
API_BASE_URL = "https://app.datarefiner.com"

# Login using API key
datarefiner_api = DataRefinerClient(
    token=API_TOKEN,
    base_url=API_BASE_URL,
)
datarefiner_api.me()

# Loading new data from CSV file
df = pd.read_csv("./data.csv")

# Upload dataset to the platform
upload, project_settings = datarefiner_api.upload(df=df, title="Data", load_filedetails=True)

# Check the project settings generated automatically
pp(asdict(project_settings))

# Change the field mapping settings: overlay/learn/disabled.
project_settings.fields_config['1'].config = "overlay"
project_settings.fields_config['2'].config = "learn"
project_settings.fields_config['3'].config = "disabled"

# You can change the rest of the project settings; here are some examples:
project_settings.json_params.allow_noise_points = False
project_settings.json_params.beta = [45, 100, 200]
project_settings.json_params.clusterisation_type = 'kMeans'
project_settings.json_params.metric = ['euclidean', 'cosine']

# Perform rendering of the project
project_settings.name = "Create test project from API client"
project = datarefiner_api.create_project(project_settings=project_settings)

# Embed TDA map right in your Jupyter notebook
datarefiner_api.explore(project_id=project.id)

# Get assigned clusters for your source data
cluster_labels_df = datarefiner_api.get_cluster_labels(project_id=project.id)

# Get user-defined labels for your source data (and catch the exception if there are no groups defined for the project)
try:
    group_labels_df = datarefiner_api.get_group_labels(project_id=project.id)
    print(group_labels_df.groupby('GroupID').count())
except DatarefinerExploreDownloadsError as e:
    print(e)

# Get top parameters impacting the segmentation
parameter_scores_df = datarefiner_api.get_parameter_scores_for_segmentation(project_id=project.id)

# Get the list of the most important features for all clusters in one request
most_important_features_df = datarefiner_api.get_most_important_features_for_all_clusters(project_id=project.id)

# Get 2D and 3D TDA coordinates for your source data points (can be used in downstream tasks) 
tda_coordinates_df = datarefiner_api.get_tda_coordinates(project_id=project.id)

# Perform prediction for new data (we use the same data as for training, but in reality you'll use new data in the same format)
clusters_df, groups_df = datarefiner_api.supervised_labeling(project_id=project.id, df=df)
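
The field mapping step above sets one field at a time. When many fields need changing, a small helper can validate the modes before touching anything. This is a sketch, not part of the library: apply_field_modes and ALLOWED_MODES are hypothetical names, and it assumes fields_config supports dict-style membership tests in addition to the item access shown in the usage example. A stand-in object replaces the real project settings so the snippet runs offline.

```python
from types import SimpleNamespace

# The three modes named in the usage example: overlay/learn/disabled.
ALLOWED_MODES = {"overlay", "learn", "disabled"}


def apply_field_modes(fields_config, modes):
    """Set `.config` on several fields at once, validating first.

    Nothing is modified unless every key and every mode is valid.
    """
    for key, mode in modes.items():
        if mode not in ALLOWED_MODES:
            raise ValueError(f"Unknown mode {mode!r} for field {key!r}")
        if key not in fields_config:
            raise KeyError(f"No field with key {key!r}")
    for key, mode in modes.items():
        fields_config[key].config = mode


# Stand-in for project_settings.fields_config (an assumption for this sketch).
fields = {k: SimpleNamespace(config="learn") for k in ("1", "2", "3")}
apply_field_modes(fields, {"1": "overlay", "3": "disabled"})
print({k: v.config for k, v in fields.items()})
```

Validating before assigning means a typo in one mode does not leave the settings half-updated.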

Download files

Source Distribution: datarefiner_client-0.0.4.tar.gz (12.6 kB)

Built Distribution: datarefiner_client-0.0.4-py3-none-any.whl (17.8 kB, Python 3)
