Augment pandas DataFrame with methods for machine learning

Pandas ML Utils

Pandas Machine Learning Utilities is part of a larger set of libraries for a convenient experience. Exploring statistical models usually starts with a pandas DataFrame.

But soon enough you will find yourself converting your data frames to numpy, splitting arrays, applying min-max scalers, lagging and concatenating columns, etc. As a result, your notebook looks messy and becomes an unreadable beast. The mess only gets worse once you start to deploy your research into a production application: the untested, hard-coded data pipelines need to be maintained in two places.
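To make the problem concrete, here is a minimal sketch of the kind of hand-written pipeline described above, using only plain pandas. It does not use pandas_ml_utils at all; the column name `close`, the toy values, the two lags and the 75/25 split are assumptions chosen purely for illustration.

```python
import pandas as pd

# Toy series standing in for "your raw data" (values are made up).
df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.5, 13.0, 14.0, 13.5, 15.0]})

# 1. Lag the column and concatenate the lags as new feature columns.
lags = pd.concat(
    {f"close_lag_{i}": df["close"].shift(i) for i in (1, 2)}, axis=1
)
frame = pd.concat([df, lags], axis=1).dropna()  # first two rows lack history

# 2. Min-max scale every column to the range [0, 1].
scaled = (frame - frame.min()) / (frame.max() - frame.min())

# 3. Convert to numpy arrays and separate features from the label.
features = scaled[["close_lag_1", "close_lag_2"]].to_numpy()
label = scaled["close"].to_numpy()

# 4. Chronological train/test split (no shuffling for ordered data).
split = int(len(features) * 0.75)
x_train, x_test = features[:split], features[split:]
y_train, y_test = label[:split], label[split:]
```

Every one of these steps has to be repeated, with identical parameters, wherever the model is later used for prediction, which is exactly where such notebooks tend to drift out of sync.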

The aim of this library is to let you operate conveniently on data frames and to abstract away the ugly, unreproducible data pipelines. The only thing you need is the original unprocessed data frame where you started. The data pipeline becomes a part of your model and gets saved that way. Going into production is as easy as this:

import pandas as pd
import pandas_ml_utils  # monkey patch the `DataFrame`
from pandas_ml_utils import Model
# alternatively as a one liner `from pandas_ml_utils import pd, Model` 

model = Model.load('your_saved.model')
df = pd.read_csv('your_raw_data.csv')
df_prediction = df.model.predict(model)

# do something with your prediction
df_prediction.plot()

Pandas ML Utils is intended to help you through your journey of statistical or machine learning models, while you never need to leave the world of pandas.
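The idea that the data pipeline is saved together with the model can be illustrated without the library. The sketch below is not how pandas_ml_utils is implemented; `make_features`, `predict` and the `bundle` dictionary are hypothetical names, and the fixed weights stand in for a trained model. It only shows why storing the pipeline parameters next to the model parameters lets production code start from the raw data frame.

```python
import pickle

import numpy as np
import pandas as pd


def make_features(df, column, lags):
    """Build lagged copies of one column; rows without full history are dropped."""
    lagged = {f"{column}_lag_{lag}": df[column].shift(lag) for lag in lags}
    return pd.DataFrame(lagged, index=df.index).dropna()


def predict(bundle, df):
    """Re-run the stored pipeline on a raw frame, then apply the stored weights."""
    features = make_features(df, bundle["column"], bundle["lags"])
    prediction = features.to_numpy() @ np.asarray(bundle["weights"])
    return pd.DataFrame({"prediction": prediction}, index=features.index)


# Research side: pipeline parameters and model parameters live in one object.
# The weights are fixed here; in practice they would come from training.
bundle = {"column": "close", "lags": [1, 2], "weights": [0.5, 0.5]}
saved = pickle.dumps(bundle)

# Production side: only the saved bundle and the raw data are needed.
raw = pd.DataFrame({"close": [10.0, 12.0, 14.0, 16.0]})
restored = pickle.loads(saved)
df_prediction = predict(restored, raw)
```

Because the lag configuration travels with the weights, there is no second copy of the preprocessing code to keep in sync.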

Installation

The basic implementation supports scikit learn classifiers and regressors.

pip install pandas-ml-utils

Additional machine learning libraries are available as an add on:

pip install pandas-ml-utils-torch  # pytorch implementation
pip install pandas-ml-utils-keras  # keras + tensorflow 1.x implementation

Note that the keras/tensorflow version is currently stalled, as I have been focusing on pytorch recently. This might change with PyMC4 and TensorFlow Probability.

Example

You will find some demo projects in the examples directory. It might also be worth checking the unit tests and the integration tests. Here is what a classification challenge might look like:

Classification Example
