Learning Orchestra client for Python
learningOrchestra client package
Installation
Ensure that you have Python 3 installed on your machine, then run:
pip install learning_orchestra_client
Documentation
After installing the package, import all classes:
from learning_orchestra_client import *
Create a Context object, passing your cluster's IP address as the constructor parameter:
cluster_ip = "34.95.222.197"
Context(cluster_ip)
After creating a Context object you can use learningOrchestra. Each learningOrchestra functionality is contained in its own class, so once the Context class is instantiated and configured, you instantiate the class of interest and call its methods. All classes and their methods are described below, followed by an example workflow using this package in a Python script.
DatabaseApi
read_resume_files
read_resume_files(pretty_response=True)
Read all metadata files in learningOrchestra
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
read_file
read_file(filename_key, skip=0, limit=10, query={}, pretty_response=True)
- filename_key: filename of file
- skip: number of rows to skip in pagination (default 0)
- limit: number of rows to return in pagination (default 10, maximum of 20 rows per request)
- query: MongoDB query (default empty query)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
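A minimal usage sketch (the filename and the MongoDB-style query below are illustrative, assuming a stored dataset with a numeric Age field):

database_api = DatabaseApi()
# read the second page of 10 rows, keeping only tuples with Age greater than 30
print(database_api.read_file(
    "titanic_training", skip=10, limit=10,
    query={"Age": {"$gt": 30}}, pretty_response=False))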
create_file
create_file(filename, url, pretty_response=True)
- filename: filename of the file to be created
- url: URL of a CSV file
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
delete_file
delete_file(filename, pretty_response=True)
- filename: filename of the file to be deleted
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
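A short sketch of the full DatabaseApi life cycle (the filename and CSV URL are placeholders):

database_api = DatabaseApi()
# register a CSV file in the cluster, list all stored metadata, then remove the file
print(database_api.create_file("my_dataset", "https://example.com/my_dataset.csv"))
print(database_api.read_resume_files())
print(database_api.delete_file("my_dataset"))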
Projection
create_projection
create_projection(filename, projection_filename, fields, pretty_response=True)
- filename: filename of the file to make the projection from
- projection_filename: filename used to create the projection
- fields: list of fields used to make the projection
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
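A minimal sketch, assuming "titanic_training" was already created with create_file (the fields list mirrors the workflow example at the end of this page):

projection = Projection()
print(projection.create_projection(
    "titanic_training",
    "titanic_training_projection",
    ["Name", "Ticket", "Cabin"]))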
DataTypeHandler
change_file_type
change_file_type(filename, fields_dict, pretty_response=True)
- filename: filename of file
- fields_dict: dictionary with "field": "number" or "field": "string" pairs
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
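A minimal sketch, assuming "titanic_training_projection" already exists and has Age and Fare columns:

data_type_handler = DataTypeHandler()
# convert the two listed fields to numeric types, leaving the rest untouched
print(data_type_handler.change_file_type(
    "titanic_training_projection",
    {"Age": "number", "Fare": "number"}))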
Histogram
create_histogram
create_histogram(filename, histogram_filename, fields, pretty_response=True)
- filename: filename of the file to make the histogram from
- histogram_filename: filename used to create the histogram
- fields: list of fields used to make the histogram
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
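A minimal sketch, assuming "titanic_training_projection" already exists (the histogram filename is a placeholder):

histogram = Histogram()
print(histogram.create_histogram(
    "titanic_training_projection",
    "titanic_training_histogram",
    ["Pclass", "Survived"]))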
Tsne
create_image_plot
create_image_plot(tsne_filename, parent_filename, label_name=None, pretty_response=True)
- tsne_filename: filename used to create the image plot
- parent_filename: filename of the file to make the image plot from
- label_name: label name for datasets with labeled tuples (default None, for datasets without labeled tuples)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
read_image_plot_filenames
read_image_plot_filenames(pretty_response=True)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
read_image_plot
read_image_plot(tsne_filename, pretty_response=True)
- tsne_filename: filename of a created image plot
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
delete_image_plot
delete_image_plot(tsne_filename, pretty_response=True)
- tsne_filename: filename of a created image plot
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
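A minimal sketch of the image-plot life cycle (the plot filename is a placeholder; "titanic_training_projection" is assumed to exist and to carry a Survived label):

tsne = Tsne()
# create a t-SNE image plot from an existing file, list plots, read it, then delete it
print(tsne.create_image_plot("titanic_tsne_plot", "titanic_training_projection",
                             label_name="Survived"))
print(tsne.read_image_plot_filenames())
print(tsne.read_image_plot("titanic_tsne_plot"))
print(tsne.delete_image_plot("titanic_tsne_plot"))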
Pca
create_image_plot
create_image_plot(pca_filename, parent_filename, label_name=None, pretty_response=True)
- pca_filename: filename used to create the image plot
- parent_filename: filename of the file to make the image plot from
- label_name: label name for datasets with labeled tuples (default None, for datasets without labeled tuples)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
read_image_plot_filenames
read_image_plot_filenames(pretty_response=True)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
read_image_plot
read_image_plot(pca_filename, pretty_response=True)
- pca_filename: filename of a created image plot
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
delete_image_plot
delete_image_plot(pca_filename, pretty_response=True)
- pca_filename: filename of a created image plot
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
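The Pca life cycle mirrors the Tsne one; a minimal sketch with placeholder filenames:

pca = Pca()
print(pca.create_image_plot("titanic_pca_plot", "titanic_training_projection",
                            label_name="Survived"))
print(pca.read_image_plot_filenames())
print(pca.read_image_plot("titanic_pca_plot"))
print(pca.delete_image_plot("titanic_pca_plot"))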
ModelBuilder
create_model
create_model(training_filename, test_filename, preprocessor_code, model_classificator, pretty_response=True)
- training_filename: filename to be used in training
- test_filename: filename to be used in testing
- preprocessor_code: Python 3 code for the PySpark preprocessing model
- model_classificator: list of classifier initials to be used in the model (see below)
- pretty_response: if True (default), return an indented string for visualization; if False, return a dict
model_classificator
- "lr": LogisticRegression
- "dt": DecisionTreeClassifier
- "rf": RandomForestClassifier
- "gb": Gradient-boosted tree classifier
- "nb": NaiveBayes
To send a request with the LogisticRegression and NaiveBayes classifiers:
create_model(training_filename, test_filename, preprocessor_code, ["lr", "nb"])
preprocessor_code environment
The Python 3 preprocessing code must use the environment instances below:
- training_df (instantiated): Spark Dataframe instance for the training filename
- testing_df (instantiated): Spark Dataframe instance for the testing filename
The preprocessing code must instantiate the variables below; all instances must be transformed by the PySpark VectorAssembler:
- features_training (not instantiated): Spark Dataframe instance used to train the model
- features_evaluation (not instantiated): Spark Dataframe instance used to evaluate the trained model's accuracy
- features_testing (not instantiated): Spark Dataframe instance used to test the model
If you do not want to evaluate the model predictions, set features_evaluation to None, as in the sketch below.
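A minimal preprocessor_code sketch under these rules, assuming both dataframes already contain only numeric fields and that the first training column is the label (the Titanic example below shows a realistic pipeline):

preprocessor_code = '''
from pyspark.ml.feature import VectorAssembler

# assemble every column except the first (the label) into a features vector
assembler = VectorAssembler(
    inputCols=training_df.columns[1:],
    outputCol="features")

features_training = assembler.transform(training_df)
features_testing = assembler.transform(testing_df)
features_evaluation = None  # skip the accuracy evaluation step
'''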
Handy methods
self.fields_from_dataframe(dataframe, is_string)
- dataframe: dataframe instance
- is_string: boolean parameter; if True, the method returns the dataframe's string fields, otherwise it returns the dataframe's number fields
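A hedged sketch of how this helper could be used inside preprocessor_code (self is provided by the preprocessing environment, per the signature above):

preprocessor_code = '''
# split the training columns into string fields and number fields
string_fields = self.fields_from_dataframe(training_df, is_string=True)
number_fields = self.fields_from_dataframe(training_df, is_string=False)
'''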
learning_orchestra_client usage example
Below is a Python script using the package with the Titanic challenge datasets:
from learning_orchestra_client import *
cluster_ip = "34.95.187.26"
Context(cluster_ip)
database_api = DatabaseApi()
print(database_api.create_file(
    "titanic_training",
    "https://filebin.net/rpfdy8clm5984a4c/titanic_training.csv?t=gcnjz1yo"))
print(database_api.create_file(
    "titanic_testing",
    "https://filebin.net/mguee52ke97k0x9h/titanic_testing.csv?t=ub4nc1rc"))
print(database_api.read_resume_files())
projection = Projection()
non_required_columns = ["Name", "Ticket", "Cabin",
                        "Embarked", "Sex", "Initial"]
print(projection.create_projection("titanic_training",
                                   "titanic_training_projection",
                                   non_required_columns))
print(projection.create_projection("titanic_testing",
                                   "titanic_testing_projection",
                                   non_required_columns))
data_type_handler = DataTypeHandler()
type_fields = {
    "Age": "number",
    "Fare": "number",
    "Parch": "number",
    "PassengerId": "number",
    "Pclass": "number",
    "SibSp": "number"
}
print(data_type_handler.change_file_type(
    "titanic_testing_projection",
    type_fields))
type_fields["Survived"] = "number"
print(data_type_handler.change_file_type(
    "titanic_training_projection",
    type_fields))
preprocessing_code = '''
from pyspark.ml import Pipeline
from pyspark.sql.functions import (
    mean, col, split,
    regexp_extract, when, lit)
from pyspark.ml.feature import (
    VectorAssembler,
    StringIndexer
)

TRAINING_DF_INDEX = 0
TESTING_DF_INDEX = 1

training_df = training_df.withColumnRenamed('Survived', 'label')
testing_df = testing_df.withColumn('label', lit(0))
datasets_list = [training_df, testing_df]

# extract each passenger's initial (Mr, Miss, ...) from the Name field
for index, dataset in enumerate(datasets_list):
    dataset = dataset.withColumn(
        "Initial",
        regexp_extract(col("Name"), "([A-Za-z]+)\.", 1))
    datasets_list[index] = dataset

misspelled_initials = ['Mlle', 'Mme', 'Ms', 'Dr', 'Major', 'Lady', 'Countess',
                       'Jonkheer', 'Col', 'Rev', 'Capt', 'Sir', 'Don']
correct_initials = ['Miss', 'Miss', 'Miss', 'Mr', 'Mr', 'Mrs', 'Mrs',
                    'Other', 'Other', 'Other', 'Mr', 'Mr', 'Mr']
for index, dataset in enumerate(datasets_list):
    dataset = dataset.replace(misspelled_initials, correct_initials)
    datasets_list[index] = dataset

# fill missing ages with a representative age for each initial group
initials_age = {"Miss": 22,
                "Other": 46,
                "Master": 5,
                "Mr": 33,
                "Mrs": 36}
for index, dataset in enumerate(datasets_list):
    for initial, initial_age in initials_age.items():
        dataset = dataset.withColumn(
            "Age",
            when((dataset["Initial"] == initial) &
                 (dataset["Age"].isNull()), initial_age).otherwise(
                dataset["Age"]))
    datasets_list[index] = dataset

for index, dataset in enumerate(datasets_list):
    dataset = dataset.na.fill({"Embarked": 'S'})
    datasets_list[index] = dataset

# derive the Family_Size and Alone features
for index, dataset in enumerate(datasets_list):
    dataset = dataset.withColumn("Family_Size", col('SibSp') + col('Parch'))
    dataset = dataset.withColumn('Alone', lit(0))
    dataset = dataset.withColumn(
        "Alone",
        when(dataset["Family_Size"] == 0, 1).otherwise(dataset["Alone"]))
    datasets_list[index] = dataset

# index the categorical text fields as numbers
text_fields = ["Sex", "Embarked", "Initial"]
for column in text_fields:
    for index, dataset in enumerate(datasets_list):
        dataset = StringIndexer(
            inputCol=column, outputCol=column + "_index").\
            fit(dataset).\
            transform(dataset)
        datasets_list[index] = dataset

training_df = datasets_list[TRAINING_DF_INDEX]
testing_df = datasets_list[TESTING_DF_INDEX]

assembler = VectorAssembler(
    inputCols=training_df.columns[1:],
    outputCol="features")
assembler.setHandleInvalid('skip')

features_training = assembler.transform(training_df)
(features_training, features_evaluation) = \
    features_training.randomSplit([0.1, 0.9], seed=11)
features_testing = assembler.transform(testing_df)
'''
model_builder = ModelBuilder()
print(model_builder.create_model(
    "titanic_training_projection",
    "titanic_testing_projection",
    preprocessing_code,
    ["lr", "dt", "gb", "rf", "nb"]))