PyXGBoost, GitHub: https://github.com/303844828/PyXGBoost.git
Project description
Introduction
XGBoost for PySpark.
Requirements: Python 3.5, Spark 2.4.x, XGBoost 0.9
Installation
pip install PyXGBoost
Code example:
from pyspark.sql import SparkSession
from PyXGBoost import PyXGBoostClassifier, PyXGBoostClassificationModel

spark = SparkSession \
    .builder \
    .appName("pyspark xgboost") \
    .getOrCreate()

# Load the iris data with an explicit schema.
df = spark.read.csv(
    "src/main/resources/iris.csv",
    schema="sepal_length double, sepal_width double, petal_length double, petal_width double, label int")
# Replace missing values with 0.
df = df.fillna(0)

# Same keys as the XGBoost parameter map.
params0 = {
    "objective": "binary:logistic",
    "eta": 0.01,
    "max_depth": 6,
    "min_child_weight": 50,
    "colsample_bytree": 0.5,
    "silent": 0,
    "seed": 12345,
}

xgb = PyXGBoostClassifier(params0)
xgb.set_num_round(11) \
    .set_num_workers(11)

feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
xgbModel = xgb.train(df, feature_names, "label")

# Save the model, overwriting anything already stored at this path.
xgbModel.saveOverwrite("hdfs://xxxx")
# xgbModel.write().overwrite().save("hdfs://xxxx")

# Load the saved model and score the DataFrame.
xgbModel = PyXGBoostClassificationModel.load("hdfs://xxxx")
result_df = xgbModel.transform(df, feature_names)
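Because params0 uses the same keys as the XGBoost parameter map, a quick range check before constructing PyXGBoostClassifier can catch a mistyped value early. The sketch below is not part of PyXGBoost: check_params and PARAM_RANGES are hypothetical names, and the bounds are the ranges XGBoost documents for eta, max_depth, min_child_weight, and colsample_bytree. It runs without Spark.

```python
import math

# Documented XGBoost ranges for the numeric keys used in params0.
# Each entry: (lower bound, is lower bound inclusive, upper bound).
PARAM_RANGES = {
    "eta": (0.0, True, 1.0),
    "max_depth": (0, True, math.inf),
    "min_child_weight": (0.0, True, math.inf),
    "colsample_bytree": (0.0, False, 1.0),
}


def check_params(params):
    """Return one message per value that falls outside its documented range."""
    problems = []
    for key, (low, low_inclusive, high) in PARAM_RANGES.items():
        if key not in params:
            continue  # keys that are not set are left to XGBoost's defaults
        value = params[key]
        above_low = value >= low if low_inclusive else value > low
        if not (above_low and value <= high):
            problems.append("%s=%r is out of range" % (key, value))
    return problems


params0 = {
    "objective": "binary:logistic",
    "eta": 0.01,
    "max_depth": 6,
    "min_child_weight": 50,
    "colsample_bytree": 0.5,
    "silent": 0,
    "seed": 12345,
}

print(check_params(params0))       # → []
print(check_params({"eta": 1.5}))  # → ['eta=1.5 is out of range']
```

Keys the table does not know about (objective, silent, seed) are passed over untouched, so the check only flags values it can actually judge.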
Submit
The jar must be passed in two places, --jars and --py-files. Example:
spark-submit --master yarn-cluster --num-executors 100 \
    --jars pyspark-xgboost-1.0-SNAPSHOT.jar \
    --py-files pyspark-xgboost-1.0-SNAPSHOT.jar \
    test.py
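When the submit command is generated from a script, it is easy to update the jar name in one flag and forget the other. The sketch below builds the argument list from a single jar variable so that --jars and --py-files always agree. build_submit_command is a hypothetical helper, not part of PyXGBoost, and it assumes test.py is the application script, passed as the final positional argument as spark-submit expects for the main Python file.

```python
import shlex


def build_submit_command(jar, script, num_executors=100, master="yarn-cluster"):
    """Build a spark-submit argument list that passes the same jar to both flags."""
    return [
        "spark-submit",
        "--master", master,
        "--num-executors", str(num_executors),
        "--jars", jar,      # ship the jar to the JVM side
        "--py-files", jar,  # ship the same jar to the Python side
        script,             # application script, last positional argument
    ]


cmd = build_submit_command("pyspark-xgboost-1.0-SNAPSHOT.jar", "test.py")
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where spark-submit is on the PATH.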
Download files
Source Distribution
PyXGBoost-1.0.6.tar.gz (2.8 kB)
File details
Details for the file PyXGBoost-1.0.6.tar.gz.
File metadata
- Download URL: PyXGBoost-1.0.6.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.4.2 requests/2.22.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.6.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c22d5aee4112ea9d2dffd76058284baae7d1c3c33d95142361062d005fa5a892 |
| MD5 | a7593c699bdfa22cbd688baf0774072c |
| BLAKE2b-256 | 66645687e98766d3159f35cd522bfdd3cfbc87e8a8075d6ce6230062886891fb |