Model training toolsets
Project description
spinesUtils -- A Machine-Learning Toolset
A toolset for model training.
Install with pip:
pip install spinesUtils
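The snippets below use your_df, x_cols and y_col as placeholders for your own data: a pandas DataFrame, a list of feature column names, and a target column name. A minimal toy stand-in (purely illustrative, not part of spinesUtils) so the examples can be run end-to-end:
import numpy as np
import pandas as pd

# toy data standing in for your real dataset
rng = np.random.default_rng(0)
your_df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=['f1', 'f2', 'f3'])
your_df['label'] = (your_df['f1'] > 0).astype(int)
x_cols = ['f1', 'f2', 'f3']  # feature column names
y_col = 'label'              # target column name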
better csv dataloader
from spinesUtils import dataloader
your_df = dataloader(
    fp='/path/to/your/file.csv',
    sep=',',                  # same as the pandas read_csv sep parameter
    turbo_method='pyarrow',   # backend used to speed up loading
    chunk_size=None,          # set an integer to load in chunks with the pandas backend
    save_as_pkl=False,        # save a pickle copy to speed up the next load
    transform2low_mem=True,   # downcast dtypes to reduce memory usage
    verbose=False
)
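After loading, the result is a regular pandas DataFrame, so you can verify the downcasted dtypes and memory footprint with standard pandas calls (nothing spinesUtils-specific):
print(your_df.shape)
print(your_df.dtypes)                          # downcasted dtypes when transform2low_mem=True
print(your_df.memory_usage(deep=True).sum())   # total memory footprint in bytes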
better pandas dataframe insight tools
from spinesUtils import df_preview, classify_samples_dist
df_insight = df_preview(your_df)
df_target_distribution = classify_samples_dist(your_df, target_col=your_df[y_col])
print(df_insight)
print(df_target_distribution)
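If you only need raw class counts, the same distribution can be cross-checked with plain pandas (the exact layout of classify_samples_dist's output may differ):
# class counts and proportions, for comparison with classify_samples_dist
print(your_df[y_col].value_counts())
print(your_df[y_col].value_counts(normalize=True))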
better dataframe compress/uncompress tools
# single dataframe
from spinesUtils import transform_dtypes_low_mem, inverse_transform_dtypes
# downcast dtypes in place to save memory
transform_dtypes_low_mem(your_df, verbose=True)
# restore native Python int/float dtypes
inverse_transform_dtypes(your_df, verbose=True, int_dtypes=int, float_dtypes=float)
# dataframes
import numpy as np
from spinesUtils import transform_batch_dtypes_low_mem, inverse_transform_batch_dtypes
your_dfs = [your_df1, your_df2, your_df3]  # any number of dataframes
# downcast dtypes of every dataframe in place to save memory
transform_batch_dtypes_low_mem(your_dfs, verbose=True)
# restore the specified numpy dtypes
inverse_transform_batch_dtypes(your_dfs, verbose=True, int_dtypes=np.int32, float_dtypes=np.float32)
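To quantify the savings, you can measure memory usage before and after the in-place conversion with pandas memory_usage; a minimal sketch for a single DataFrame:
from spinesUtils import transform_dtypes_low_mem

before = your_df.memory_usage(deep=True).sum()
transform_dtypes_low_mem(your_df, verbose=False)  # downcasts dtypes in place
after = your_df.memory_usage(deep=True).sum()
print(f"memory: {before / 1024:.1f} KiB -> {after / 1024:.1f} KiB")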
better feature selector
from spinesUtils import TreeSequentialFeatureSelector
from lightgbm import LGBMClassifier
estimator = LGBMClassifier(random_state=0)
fe = TreeSequentialFeatureSelector(estimator, metrics_name='f1',
                                   forward=True,
                                   floating=True,
                                   log_file_path='feature_selection.log',
                                   best_features_save_path='best_feature.txt',
                                   verbose=True)
fe.fit(your_df[x_cols], your_df[y_col])
print(fe.best_cols_, fe.best_score_)
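The selected columns can then be used to retrain the estimator on the reduced feature set; a minimal follow-up sketch, assuming fe has been fitted as above:
# refit the base estimator on the selected features only
estimator.fit(your_df[fe.best_cols_], your_df[y_col])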
better train_test_split function
# returns numpy.ndarray
from spinesUtils import train_test_split_bigdata
X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(
    df=your_df,
    x_cols=x_cols,
    y_col=y_col,
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)
# returns pandas.DataFrame
from spinesUtils import train_test_split_bigdata_df
train, valid, test = train_test_split_bigdata_df(
    df=your_df,
    x_cols=x_cols,
    y_col=y_col,
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5,
    reset_index=True
)
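A quick sanity check on the resulting split sizes (plain Python, using the outputs from above):
# with train_size=0.8 and valid_size=0.5, roughly 80% / 10% / 10% of the rows
print(X_train.shape, X_valid.shape, X_test.shape)
print(len(train), len(valid), len(test))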
better imbalanced-data model
from spinesUtils import BinaryBalanceClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score, recall_score, precision_score
classifier = BinaryBalanceClassifier(meta_estimators=[LGBMClassifier(), LGBMClassifier()])
classifier.fit(your_df[x_cols], your_df[y_col], threshold_search_set=(your_df[x_cols], your_df[y_col]))
print('threshold: ', classifier.auto_threshold)
print(
    'f1:', f1_score(your_df[y_col], classifier.predict(your_df[x_cols])),
    'recall:', recall_score(your_df[y_col], classifier.predict(your_df[x_cols])),
    'precision:', precision_score(your_df[y_col], classifier.predict(your_df[x_cols]))
)
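For experimentation, an imbalanced toy dataset can stand in for your_df; a minimal sketch using scikit-learn's make_classification (illustrative only, not part of spinesUtils):
import pandas as pd
from sklearn.datasets import make_classification

# roughly 95% negatives / 5% positives, mimicking an imbalanced binary problem
X, y = make_classification(n_samples=10_000, n_features=10, weights=[0.95, 0.05], random_state=42)
your_df = pd.DataFrame(X, columns=[f'f{i}' for i in range(10)])
your_df['label'] = y
x_cols, y_col = list(your_df.columns[:-1]), 'label'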
log for humans
from spinesUtils import Printer
your_logger = Printer(name='your_logger', verbose=True,
                      truncate_file=True, with_time=True)
your_logger.insert2file("test")  # write to the log file
your_logger.print('test')        # print to the console
# or do both at once
your_logger.insert_and_throwout('test')
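A small sketch of how the logger might sit inside a training loop, using only the methods shown above:
# log each epoch to both the console and the log file
for epoch in range(3):
    your_logger.insert_and_throwout(f"epoch {epoch} finished")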