High-Dimensional LASSO_spark

Hi-LASSO_spark

Hi-LASSO_Spark (High-Dimensional LASSO Spark) improves LASSO solutions for extremely high-dimensional data using PySpark. PySpark is the Python API for Apache Spark, a distributed framework for big data analysis. Spark is essentially a computational engine that handles very large data sets by processing them in parallel and in batches.

Installation

Hi-LASSO_Spark supports Python 3.6+. Additionally, you will need numpy, scipy, and glmnet.

Hi-LASSO_Spark is available through PyPI and can be installed with pip:

pip install hi_lasso_spark
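After installing, it can help to confirm that the dependencies are importable in the current environment. The following is a minimal sketch using only the standard library. The import names are assumed to match the package names listed above, and `pyspark` is included as an assumption because the package is described as using PySpark; `missing_packages` is a hypothetical helper, not part of Hi-LASSO_Spark.

```python
import importlib.util

# Assumed import names; adjust if your environment uses different ones.
REQUIRED = ["numpy", "scipy", "glmnet", "pyspark", "hi_lasso_spark"]


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for unusual module states; treat as missing.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that the modules can be located; it does not verify versions or that a Spark runtime is configured.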

Documentation

The documentation is available on Read the Docs.

Quick Start
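The example in this section reads `simulation_data_x.csv` and `simulation_data_y.csv`. If those files are not at hand, placeholder data can be generated first. The sketch below is one way to do that with the standard library only; the sample counts, the coefficient value of 2.0, the noise level, and the column names are all illustrative assumptions, not the data used by the authors.

```python
import csv
import random


def simulate(n_samples=100, n_features=500, n_informative=5, seed=0):
    """Return (X, y) as lists; only the first n_informative columns affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    # Hypothetical true model: coefficient 2.0 on informative columns,
    # zero elsewhere, plus Gaussian noise.
    y = [sum(2.0 * row[j] for j in range(n_informative)) + rng.gauss(0.0, 0.5)
         for row in X]
    return X, y


def write_csv(path, header, rows):
    """Write a header line followed by the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    X, y = simulate()
    write_csv("simulation_data_x.csv", [f"x{j}" for j in range(len(X[0]))], X)
    write_csv("simulation_data_y.csv", ["y"], [[v] for v in y])
```

Both files are written with a header row, which is what `pd.read_csv` expects by default.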

# Data load
import pandas as pd
X = pd.read_csv('simulation_data_x.csv')
y = pd.read_csv('simulation_data_y.csv')

# General Usage
from hi_lasso_spark.Hi_LASSO_spark import HiLASSO_Spark

# Create a HiLasso model
model = HiLASSO_Spark(X, y, alpha=0.05, q1='auto', q2='auto', L=30, cv=5, node='auto', logistic=False)

# Fit the model
model.fit()

# Show the coefficients
model.coef_

# Show the p-values
model.p_values_

# Show the selected variables
model.selected_var_
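To read the results, it is often convenient to pair each coefficient with its p-value and list the variables that fall below the significance level. The sketch below assumes that the coefficients and p-values are one-dimensional sequences aligned with the columns of `X`; that layout is an assumption, not something stated here, and `significant_variables` is a hypothetical helper. The toy values stand in for a fitted model.

```python
def significant_variables(names, coefs, p_values, alpha=0.05):
    """Return (name, coef, p) tuples with p < alpha, sorted by p-value."""
    rows = [
        (name, float(c), float(p))
        for name, c, p in zip(names, coefs, p_values)
        if float(p) < alpha
    ]
    return sorted(rows, key=lambda r: r[2])


# Toy values for illustration only (not output from Hi-LASSO_Spark).
names = ["x0", "x1", "x2", "x3"]
coefs = [1.8, 0.0, -0.7, 0.1]
p_values = [0.001, 0.90, 0.03, 0.20]

for name, coef, p in significant_variables(names, coefs, p_values):
    print(f"{name}: coef={coef:.3f}, p={p:.3g}")
```

With a fitted model, the call would presumably look like `significant_variables(X.columns, model.coef_, model.p_values_, alpha=0.05)`, provided the shapes line up as assumed.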

