

spinesUtils -- A Machine-Learning Toolset

A toolset for model training.

Install with pip:

pip install spinesUtils

better CSV dataloader

from spinesUtils import dataloader

your_df = dataloader(
    fp='/path/to/your/file.csv',
    sep=',',  # same as the sep parameter of pandas.read_csv
    turbo_method='pyarrow',  # backend used to speed up loading
    chunk_size=None,  # set an integer to load in chunks with the pandas backend
    save_as_pkl=False,  # save a pickle copy to speed up the next load
    transform2low_mem=True,  # downcast dtypes to reduce memory usage
    verbose=False
)
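
For a quick end-to-end check, here is a minimal sketch that writes a small CSV and loads it back. The file name and column names are made up for illustration, and the keyword arguments are assumed to match the signature above:

import pandas as pd
from spinesUtils import dataloader

# Write a tiny CSV so the example is self-contained (illustrative data only).
pd.DataFrame({'a': range(1000), 'b': ['x'] * 1000}).to_csv('demo.csv', index=False)

demo_df = dataloader(
    fp='demo.csv',
    sep=',',
    turbo_method='pyarrow',
    transform2low_mem=True,  # downcast dtypes on load
    verbose=False
)
print(demo_df.dtypes)  # dtypes should be downcast to low-memory equivalents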

better pandas DataFrame insight tools

from spinesUtils import df_preview, classify_samples_dist

df_insight = df_preview(your_df)

df_target_distribution = classify_samples_dist(your_df, target_col=your_df[y_col])

print(df_insight)
print(df_target_distribution)
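
A runnable sketch with a toy DataFrame (the column names here are illustrative):

import pandas as pd
from spinesUtils import df_preview, classify_samples_dist

toy_df = pd.DataFrame({
    'feature': [1.0, 2.0, 3.0, 4.0],
    'label': [0, 1, 0, 1],
})

print(df_preview(toy_df))  # per-column summary of the DataFrame
print(classify_samples_dist(toy_df, target_col=toy_df['label']))  # class distribution of the target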

better DataFrame compress/decompress tools

# single DataFrame
from spinesUtils import transform_dtypes_low_mem, inverse_transform_dtypes

# downcast dtypes to reduce memory usage
transform_dtypes_low_mem(your_df, verbose=True)

# restore the dtypes as native Python types
inverse_transform_dtypes(your_df, verbose=True, int_dtypes=int, float_dtypes=float)

# multiple DataFrames
import numpy as np
from spinesUtils import transform_batch_dtypes_low_mem, inverse_transform_batch_dtypes

your_dfs = [your_df1, your_df2, your_df3]  # any number of DataFrames

# downcast the dtypes of every DataFrame to reduce memory usage
transform_batch_dtypes_low_mem(your_dfs, verbose=True)

# restore the dtypes as the given NumPy types
inverse_transform_batch_dtypes(your_dfs, verbose=True, int_dtypes=np.int32, float_dtypes=np.float32)
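
To see the effect, compare memory usage before and after compression. A sketch, assuming transform_dtypes_low_mem downcasts the DataFrame in place as the examples above suggest:

import numpy as np
import pandas as pd
from spinesUtils import transform_dtypes_low_mem

df = pd.DataFrame({'x': np.arange(1_000_000, dtype=np.int64)})
before = int(df.memory_usage(deep=True).sum())
transform_dtypes_low_mem(df, verbose=False)  # downcast in place
after = int(df.memory_usage(deep=True).sum())
print(f'{before:,} bytes -> {after:,} bytes')  # exact savings depend on the data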

better feature selector

from spinesUtils import TreeSequentialFeatureSelector
from lightgbm import LGBMClassifier

estimator = LGBMClassifier(random_state=0)
fe = TreeSequentialFeatureSelector(
    estimator,
    metrics_name='f1',
    forward=True,
    floating=True,
    log_file_path='feature_selection.log',
    best_features_save_path='best_feature.txt',
    verbose=True
)

fe.fit(your_df[x_cols], your_df[y_col])
print(fe.best_cols_, fe.best_score_)
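
A self-contained toy run on synthetic data. This is a sketch: the dataset is purely illustrative, and it assumes the remaining constructor arguments shown above have defaults.

import pandas as pd
from sklearn.datasets import make_classification
from lightgbm import LGBMClassifier
from spinesUtils import TreeSequentialFeatureSelector

# Build a small synthetic classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = pd.DataFrame(X, columns=[f'f{i}' for i in range(10)])

fe = TreeSequentialFeatureSelector(LGBMClassifier(random_state=0), metrics_name='f1')
fe.fit(X, pd.Series(y))
print(fe.best_cols_, fe.best_score_)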

better train_test_split function

# returns numpy.ndarray objects
from spinesUtils import train_test_split_bigdata

X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(
    df=your_df, 
    x_cols=x_cols,
    y_col=y_col, 
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)

# returns pandas.DataFrame objects
from spinesUtils import train_test_split_bigdata_df

train, valid, test = train_test_split_bigdata_df(
    df=your_df, 
    x_cols=x_cols,
    y_col=y_col, 
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5,
    reset_index=True
)
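
With train_size=0.8 and valid_size=0.5, the held-out 20% is split evenly, giving roughly an 80/10/10 train/valid/test split (assuming valid_size is the fraction of the remainder, as the two examples above suggest). A sketch that checks the proportions on a toy frame:

import numpy as np
import pandas as pd
from spinesUtils import train_test_split_bigdata_df

# Toy frame with one feature and a binary label (illustrative only).
toy_df = pd.DataFrame({'f0': np.random.rand(1000), 'label': np.random.randint(0, 2, 1000)})

train, valid, test = train_test_split_bigdata_df(
    df=toy_df, x_cols=['f0'], y_col='label',
    shuffle=True, return_valid=True,
    train_size=0.8, valid_size=0.5, reset_index=True
)
print(len(train), len(valid), len(test))  # expect roughly 800 / 100 / 100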

better imbalanced-data model

from spinesUtils import BinaryBalanceClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score, recall_score, precision_score

classifier = BinaryBalanceClassifier(meta_estimators=[LGBMClassifier(), LGBMClassifier()])

classifier.fit(your_df[x_cols], your_df[y_col], threshold_search_set=(your_df[x_cols], your_df[y_col]))

print('threshold: ', classifier.auto_threshold)

print(
    'f1:', f1_score(your_df[y_col], classifier.predict(your_df[x_cols])), 
    'recall:', recall_score(your_df[y_col], classifier.predict(your_df[x_cols])), 
    'precision:', precision_score(your_df[y_col], classifier.predict(your_df[x_cols]))
)
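
Note that the example above reuses the training data as threshold_search_set, which tends to yield an overly optimistic threshold; in practice you would pass held-out data. A sketch, assuming the fit signature shown above:

from sklearn.model_selection import train_test_split

# Tune the decision threshold on a validation split rather than the training data.
X_train, X_valid, y_train, y_valid = train_test_split(
    your_df[x_cols], your_df[y_col], test_size=0.2, random_state=0
)
classifier.fit(X_train, y_train, threshold_search_set=(X_valid, y_valid))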

logging for humans

from spinesUtils import Printer

your_logger = Printer(name='your_logger', verbose=True,
                      truncate_file=True, with_time=True)

your_logger.insert2file("test")  # write to the log file
your_logger.print('test')  # print to stdout

# or do both at once
your_logger.insert_and_throwout('test')
