A series of Widgets for Orange3 to work with Spark ML

Project description

A set of widgets for the Orange data mining suite to work with the Apache Spark ML API.

Requirements

  • Python >= 3.4
  • Pandas
  • Orange 3

Please follow the instructions to install Orange 3 first.

The main Orange project is hosted at https://github.com/biolab/orange3 and can be downloaded from http://orange.biolab.si

Features

  • A Spark Context.
  • A Hive Table.
  • A Dataframe from an SQL Query.
  • A Dataset Builder, which is basically a call to VectorAssembler; this is useful before sending data to Estimators.
  • Transformers from the feature module.
  • Estimators from the classification module.
  • Estimators from the regression module.
  • Estimators from the clustering module.
  • Evaluation from the evaluator module.
  • A PySpark script executor + PySpark console.
  • DataFrame transformers for Pandas and Orange Tables.

… more coming soon!
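
To illustrate what the Dataset Builder's call to VectorAssembler does, here is a minimal plain-Python sketch. It is not the widget's code and does not use Spark: the function name assemble_features and the dict-based rows are hypothetical stand-ins, chosen so the idea can be run without a Spark installation. The equivalent PySpark call is shown in a comment.

```python
# Plain-Python stand-in for the effect of pyspark.ml.feature.VectorAssembler.
# In PySpark the equivalent step would be:
#   VectorAssembler(inputCols=["age", "income"], outputCol="features").transform(df)
# The helper below is hypothetical and only mimics that behaviour on dicts.


def assemble_features(rows, input_cols, output_col="features"):
    """Return copies of rows with input_cols merged into one numeric vector."""
    assembled = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's rows stay untouched
        new_row[output_col] = [float(row[col]) for col in input_cols]
        assembled.append(new_row)
    return assembled


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    data = [
        {"age": 31, "income": 52000, "label": 1},
        {"age": 45, "income": 61000, "label": 0},
    ]
    for row in assemble_features(data, ["age", "income"]):
        print(row["features"], row["label"])
```

Estimators in Spark ML expect a single vector column of features rather than separate numeric columns, which is why this step comes before sending data to them.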

Installing

First, you need to have Apache Spark installed. Follow the instructions here: http://spark.apache.org/docs/latest/

Then you can do:

pip install Orange3-spark

or install the add-on from Orange’s Options | Add-ons menu. Note that installing from the Add-ons menu may fail if some requirements cannot be satisfied.

If you require ODBC connectivity, you also need to install pyodbc. Building it with pip requires the sql.h header, which the unixodbc-dev package provides on Linux.

If the installation succeeds, you should see a new section in Orange containing a series of widgets from the Spark ML API.
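
Before opening Orange, the installation can also be checked from Python. The following is a minimal sketch using only the standard library (importlib.metadata needs Python 3.8 or newer). The helper names are hypothetical, and the import names "Orange", "pandas" and "pyspark" are assumptions about how the requirements are exposed in your environment.

```python
import importlib.util
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_modules(module_names):
    """Return the top-level modules from module_names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "Orange3-spark" is the distribution name used in the pip command.
    print("Orange3-spark version:", installed_version("Orange3-spark"))
    # Assumed import names for Orange 3, Pandas and PySpark.
    print("Missing modules:", missing_modules(["Orange", "pandas", "pyspark"]))
```

A version of None or a non-empty list of missing modules suggests that the add-on or one of its requirements was not installed into the Python environment that Orange uses.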

Download files

Files for Orange3-spark, version 0.2.7

  • Orange3_spark-0.2.7-py3-none-any.whl (195.3 kB), file type: Wheel, Python version: 3.5
  • Orange3-spark-0.2.7.tar.gz (88.5 kB), file type: Source