spinesUtils -- A Machine-Learning Toolset

Lets you get more done in less time.


This is where all stories begin:

pip install spinesUtils

better CSV dataloader

from spinesUtils import dataloader

your_df = dataloader(
    fp='/path/to/your/file.csv',
    sep=',',  # same as the sep argument of pandas.read_csv
    turbo_method='pyarrow',  # backend used to speed up loading
    chunk_size=None,  # may be an integer when using the pandas backend
    save_as_pkl=False,  # saving the data as a pickle can speed up the next load
    transform2low_mem=True,  # compress dtypes so the dataframe uses less memory
    verbose=False
)
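
The save_as_pkl option points at a common caching pattern: parse the CSV once, then reload from a pickle on later runs. Below is a minimal sketch of that pattern in plain pandas. It makes no claim about how spinesUtils implements it; the helper name load_csv_cached and the ".pkl next to the CSV" naming are assumptions made for illustration.

```python
import os

import pandas as pd


def load_csv_cached(fp, sep=",", save_as_pkl=True):
    """Hypothetical helper: read a CSV, reusing a pickle cache when one is fresh."""
    pkl_path = os.path.splitext(fp)[0] + ".pkl"  # assumed cache location
    cache_is_fresh = (
        os.path.exists(pkl_path)
        and os.path.getmtime(pkl_path) >= os.path.getmtime(fp)
    )
    if cache_is_fresh:
        # Reading a pickle skips CSV parsing and dtype inference.
        return pd.read_pickle(pkl_path)
    df = pd.read_csv(fp, sep=sep)
    if save_as_pkl:
        df.to_pickle(pkl_path)
    return df
```

Only load pickles you created yourself, since unpickling untrusted files can execute arbitrary code.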

better pandas DataFrame insight tools

from spinesUtils import df_preview, classify_samples_dist

df_insight = df_preview(your_df)

df_target_distribution = classify_samples_dist(your_df, target_col=your_df[y_col])

print(df_insight)
print(df_target_distribution)
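
For readers who want a rough idea of what such insight tables contain, here is a sketch in plain pandas. The exact columns returned by df_preview and classify_samples_dist are not documented here, so the helpers preview and class_distribution below are hypothetical stand-ins, not the library's output format.

```python
import pandas as pd


def preview(df):
    """Hypothetical helper: one summary row per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "missing_rate": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })


def class_distribution(df, target_col):
    """Hypothetical helper: share of rows per target class."""
    return df[target_col].value_counts(normalize=True)


demo = pd.DataFrame({"a": [1.0, 2.0, None], "y": [0, 1, 1]})
print(preview(demo))
print(class_distribution(demo, "y"))
```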

better dataframe compress/uncompress tools

# single dataframe
from spinesUtils import transform_dtypes_low_mem, inverse_transform_dtypes

# compress the dataframe's dtypes to save memory
transform_dtypes_low_mem(your_df, verbose=True)

# convert the dtypes back to Python types
inverse_transform_dtypes(your_df, verbose=True, int_dtype=int, float_dtype=float)

# multiple dataframes
import numpy as np
from spinesUtils import transform_batch_dtypes_low_mem, inverse_transform_batch_dtypes

your_dfs = [your_df1, your_df2, your_df3]  # any number of dataframes

# compress the dtypes of every dataframe to save memory
transform_batch_dtypes_low_mem(your_dfs, verbose=True)

# convert the dtypes back to numpy types
inverse_transform_batch_dtypes(your_dfs, verbose=True, int_dtype=np.int32, float_dtype=np.float32)
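
Dtype compression of this kind usually means storing each numeric column in the smallest dtype that still holds its values. The sketch below shows the idea with pandas.to_numeric and its downcast argument. It is an illustration under that assumption, not the spinesUtils implementation, and downcast_numeric is a hypothetical name.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Hypothetical helper: return a copy with numeric columns downcast."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Picks the smallest signed integer dtype that fits the values.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Goes down to float32 at most, which can lose precision.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame({
    "count": np.arange(1000, dtype=np.int64),
    "price": np.linspace(0.0, 1.0, 1000),
})
small = downcast_numeric(df)
print(df.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
```

Float downcasting trades precision for memory, so it is worth checking that downstream metrics do not depend on the lost digits.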

better feature selector

from spinesUtils import TreeSequentialFeatureSelector
from lightgbm import LGBMClassifier

estimator = LGBMClassifier(random_state=0)
fe = TreeSequentialFeatureSelector(estimator, metrics_name='f1',
    forward=True,
    floating=True,
    log_file_path='feature_selection.log',
    verbose=True)

fe.fit(your_df[x_cols], your_df[y_col])
print(fe.best_cols_, fe.best_score_)

better train_test_split function

# returns numpy.ndarray objects
from spinesUtils import train_test_split_bigdata

X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(
    df=your_df, 
    x_cols=x_cols,
    y_col=y_col, 
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)

# returns pandas.DataFrame objects
from spinesUtils import train_test_split_bigdata_df

train, valid, test = train_test_split_bigdata_df(
    df=your_df, 
    x_cols=x_cols,
    y_col=y_col, 
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5,
    reset_index=True
)
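
How train_size and valid_size combine is not spelled out above. One plausible reading, and it is only an assumption, is that train_size is the share of all rows used for training and valid_size is the share of the remaining rows used for validation, so 0.8 and 0.5 would give an 80/10/10 split. The sketch below implements that reading with numpy; three_way_split_indices is a hypothetical helper, not part of spinesUtils.

```python
import numpy as np


def three_way_split_indices(n_rows, train_size=0.8, valid_size=0.5,
                            shuffle=True, seed=0):
    """Hypothetical helper: return (train, valid, test) index arrays."""
    idx = np.arange(n_rows)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    n_train = int(n_rows * train_size)
    # Assumption: valid_size is a fraction of the rows left after training.
    n_valid = int((n_rows - n_train) * valid_size)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test


train_idx, valid_idx, test_idx = three_way_split_indices(1000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 800 100 100
```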

better imbalanced-data model

from spinesUtils import BinaryBalanceClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score, recall_score, precision_score

classifier = BinaryBalanceClassifier(meta_estimators=[LGBMClassifier(), LGBMClassifier()])

classifier.fit(your_df[x_cols], your_df[y_col], threshold_search_set=(your_df[x_cols], your_df[y_col]))

print('threshold: ', classifier.auto_threshold)

y_true = your_df[y_col]
y_pred = classifier.predict(your_df[x_cols])  # predict once and reuse the result

print(
    'f1:', f1_score(y_true, y_pred),
    'recall:', recall_score(y_true, y_pred),
    'precision:', precision_score(y_true, y_pred)
)
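
The threshold_search_set argument and the auto_threshold attribute suggest that the decision threshold is tuned on held-out data instead of being fixed at 0.5. Here is a minimal numpy sketch of one common way to do that: a grid search that maximizes F1. Whether BinaryBalanceClassifier uses F1 or a grid is an assumption, and both helper names are hypothetical.

```python
import numpy as np


def f1_at_threshold(y_true, proba, threshold):
    """F1 score when predicting positive for proba >= threshold."""
    pred = proba >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def search_threshold(y_true, proba, candidates=None):
    """Hypothetical helper: return the (threshold, f1) pair with the best F1."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # assumed search grid
    scores = [f1_at_threshold(y_true, proba, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


# Imbalanced toy data: positives get low but relatively higher scores.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
proba = np.array([0.03, 0.08, 0.12, 0.18, 0.22, 0.33, 0.27, 0.41])
threshold, score = search_threshold(y_true, proba)
print(threshold, score)
```

On this toy data a fixed 0.5 threshold predicts no positives at all, while the searched threshold lands near 0.25. Tuning on the same rows used for training, as in the usage example above, tends to give optimistic scores, so a separate validation set is the safer choice.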

logging for humans

from spinesUtils import Printer

your_logger = Printer(name='your_logger', 
                      fp='/path/to/your.log', 
                      verbose=True, 
                      truncate_file=True, 
                      with_time=True)

your_logger.insert2file("test")  # write to the log file only
your_logger.print('test')  # print to the console only

# Or do both at once
your_logger.insert_and_throwout('test')
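
For comparison, the "write to file and console at once" behaviour can be approximated with the standard library's logging module. This is a sketch of an equivalent setup, not how Printer is implemented; build_logger is a hypothetical helper whose parameters mirror the Printer arguments shown above.

```python
import logging
import sys


def build_logger(name, fp, truncate_file=True, with_time=True):
    """Hypothetical helper: logger that writes to both a file and stdout."""
    fmt = "%(asctime)s %(message)s" if with_time else "%(message)s"
    formatter = logging.Formatter(fmt)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Drop handlers left over from an earlier call with the same name.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    file_handler = logging.FileHandler(fp, mode="w" if truncate_file else "a")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
    return logger
```

Calling build_logger('your_logger', '/path/to/your.log').info('test') then sends the message to both destinations.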
