High-Dimensional LASSO_spark
Project description
Hi-LASSO_spark
Hi-LASSO_Spark (High-Dimensional LASSO Spark) improves LASSO solutions for extremely high-dimensional data using PySpark. PySpark is the Python API for Apache Spark, a distributed framework for Big Data analysis. Spark is a computational engine that handles very large data sets by processing them in parallel and in batches.
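For readers new to the underlying method, the following is a minimal sketch of plain LASSO fitted by cyclic coordinate descent in pure Python. It is not the Hi-LASSO procedure and not this package's implementation; the function names `soft_threshold` and `lasso_coordinate_descent` and the toy data are assumptions made for illustration only.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values in [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows (each a list of p floats), y is a list of n floats.
    Hypothetical helper for illustration; not part of Hi-LASSO_Spark.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue  # a constant-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy data: three mutually orthogonal features, only the first drives y.
X_toy = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y_toy = [3.0, -3.0, 3.0, -3.0]
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
```

On this toy input the penalty shrinks the first coefficient below its true value of 3 and leaves the two irrelevant coefficients at exactly zero, which is the sparsity property that makes LASSO attractive when there are far more variables than samples.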
Installation
Hi-LASSO_Spark supports Python 3.6+. Additionally, you will need numpy, scipy, and glmnet.
Hi-LASSO_spark is available through PyPI and can easily be installed with pip:
pip install hi_lasso_pyspark
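Before installing, it can help to confirm that the interpreter and the listed dependencies are in place. The sketch below uses only the standard library; the helper name `check_environment` is hypothetical, and the import names `numpy`, `scipy`, and `glmnet` are assumed to match the package names listed above.

```python
import importlib.util
import sys


def check_environment(modules=("numpy", "scipy", "glmnet"), min_python=(3, 6)):
    """Report whether the Python version and each module look usable.

    Hypothetical helper; the module import names are assumptions.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Any entry reported as missing would need to be installed before the Quick Start can run.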
Documentation
Read the documentation on readthedocs
Quick Start
# Data load
import pandas as pd
X = pd.read_csv('simulation_data_x.csv')
y = pd.read_csv('simulation_data_y.csv')
# General Usage
from hi_lasso_pyspark.Hi_LASSO_spark import HiLASSO_Spark
# Create a HiLasso model
model = HiLASSO_Spark(X, y, q1='auto', q2='auto', B='auto', d=0.05, alpha=0.95)
# Fit the model
model.fit(significance_level = 0.05)
# Show the coefficients
model.coef_
# Show the p-values
model.p_values_
# Show the selected variables
model.selected_var_
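The Quick Start reads its inputs from two CSV files whose contents are not described here. To try the workflow without them, the following sketch generates a small synthetic high-dimensional data set (more variables than samples, with only a few non-zero true coefficients) and serialises it as CSV with a header row, which is what `pd.read_csv` expects by default. The function names, column names, and all parameter values are assumptions for illustration; this is not the authors' simulation design.

```python
import csv
import io
import random


def simulate_sparse_regression(n=50, p=200, n_active=5, noise_sd=0.1, seed=0):
    """Draw X ~ N(0, 1) and y = X @ beta + noise with a sparse beta.

    Hypothetical generator; sizes and distributions are illustrative only.
    """
    rng = random.Random(seed)
    beta = [0.0] * p
    for j in rng.sample(range(p), n_active):
        # Non-zero effects with magnitude in [1, 3] and a random sign
        beta[j] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 3.0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [
        sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) + rng.gauss(0.0, noise_sd)
        for row in X
    ]
    return X, y, beta


def write_csv(rows, header, handle):
    """Write a header line followed by the data rows to a file-like object."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)


# Demonstration on in-memory buffers; pass real file handles to save to disk.
X_sim, y_sim, beta_sim = simulate_sparse_regression(n=10, p=30, n_active=3)
x_buffer, y_buffer = io.StringIO(), io.StringIO()
write_csv(X_sim, [f"x{j}" for j in range(30)], x_buffer)
write_csv([[value] for value in y_sim], ["y"], y_buffer)
```

To produce files with the names used in the Quick Start, open `simulation_data_x.csv` and `simulation_data_y.csv` with `open(path, "w", newline="")` and pass those handles to `write_csv` instead of the buffers.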
Hashes for Hi_LASSO_spark-0.1.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 84e4bb1a39775282b3675965ef9424fdd3596d5c5242e7d47cbdc6fcb525864e |
MD5 | a403950babff24cc190298b70088b207 |
BLAKE2b-256 | fe3505c618d0f0a388a234341800de20b7176c64ab2c876bc73c474e3abbb7c0 |