High-Dimensional LASSO_spark
Project description
Hi-LASSO_spark
Hi-LASSO_Spark (High-Dimensional LASSO Spark) improves LASSO solutions for extremely high-dimensional data using PySpark. PySpark is the Python API for Apache Spark, a distributed framework for Big Data analysis. Spark is a computational engine that works with huge data sets by processing them in parallel and in batches.
Installation
Hi-LASSO_Spark supports Python 3.6+. Additionally, you will need numpy, scipy, and glmnet.
Hi-LASSO_Spark is available through PyPI and can easily be installed with pip:
pip install hi_lasso_spark
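After installing, it can help to confirm that the packages named above are importable from the current environment. The following is a minimal sketch, not part of Hi-LASSO_Spark itself: the helper name `check_dependencies` is hypothetical, and the import names are assumed to match the package names listed above (`hi_lasso_spark` is taken from the Quick Start import).

```python
from importlib.util import find_spec

# Top-level module names assumed from the requirements listed above.
REQUIRED_MODULES = ["numpy", "scipy", "glmnet", "hi_lasso_spark"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any module reported as missing can then be installed with pip before running the Quick Start.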
Documentation
Read the documentation on readthedocs
Quick Start
# Data load
import pandas as pd
X = pd.read_csv('simulation_data_x.csv')
y = pd.read_csv('simulation_data_y.csv')
# General Usage
from hi_lasso_spark.Hi_LASSO_spark import HiLASSO_Spark
# Create a HiLasso model
model = HiLASSO_Spark(X, y, alpha=0.05, q1='auto', q2='auto', L=30, cv=5, node='auto', logistic=False)
# Fit the model
model.fit()
# Show the coefficients
model.coef_
# Show the p-values
model.p_values_
# Show the selected variables
model.selected_var_
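The Quick Start reads `simulation_data_x.csv` and `simulation_data_y.csv`, but this page does not describe their layout. For trying the workflow without those files, here is a rough sketch that writes synthetic stand-ins using only the standard library. The layout (a header row, one column per feature, a single-column `y`), the sizes, and the helper names `make_simulation_data` and `write_csv` are all assumptions made for illustration, not the project's actual simulation data.

```python
import csv
import random


def make_simulation_data(n_samples=50, n_features=200, n_informative=5,
                         noise=0.1, seed=0):
    """Generate a sparse linear problem with more features than samples."""
    rng = random.Random(seed)
    # Only the first n_informative coefficients are non-zero (assumed setup).
    coef = [0.0] * n_features
    for j in range(n_informative):
        coef[j] = rng.uniform(1.0, 3.0)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    # y is a linear combination of the features plus Gaussian noise.
    y = [sum(c * x for c, x in zip(coef, row)) + rng.gauss(0.0, noise)
         for row in X]
    return X, y, coef


def write_csv(path, header, rows):
    """Write a header row followed by the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


X, y, true_coef = make_simulation_data()
write_csv("simulation_data_x.csv", [f"x{j}" for j in range(len(X[0]))], X)
write_csv("simulation_data_y.csv", ["y"], [[value] for value in y])
```

Because the non-zero coefficients are known in `true_coef`, the variables a fitted model selects can be compared against the first five features as a quick sanity check.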