
Framework for machine and deep learning, with regression, classification and time series analysis


Welcome to LeCrapaud

An all-in-one machine learning framework


🚀 Introduction

LeCrapaud is a high-level Python library for end-to-end machine learning workflows on tabular data, with a focus on financial and stock datasets. It provides a simple API to handle feature engineering, model selection, training, and prediction, all in a reproducible and modular way.

✨ Key Features

  • 🧩 Modular pipeline: Feature engineering, preprocessing, selection, and modeling as independent steps
  • 🤖 Automated model selection and hyperparameter optimization
  • 📊 Easy integration with pandas DataFrames
  • 🔬 Supports both regression and classification tasks
  • 🛠️ Simple API for both full pipeline and step-by-step usage
  • 📦 Ready for production and research workflows

⚡ Quick Start

Install the package

pip install lecrapaud

How it works

This package provides a high-level API to manage experiments for feature engineering, model selection, and prediction on tabular data (e.g. stock data).

Typical workflow

from lecrapaud import LeCrapaud

# 1. Create the main app (uri is your MySQL connection string, see below)
app = LeCrapaud(uri=uri)

# 2. Define your experiment context (see the example notebook or api.py for all options)
context = {
    "data": your_dataframe,
    "columns_drop": [...],
    "columns_date": [...],
    # ... other config options
}

# 3. Create an experiment
experiment = app.create_experiment(**context)

# 4. Run the full training pipeline
experiment.train(your_dataframe)

# 5. Make predictions on new data
predictions = experiment.predict(new_data)

Database Configuration (Required)

LeCrapaud requires access to a MySQL database to store experiments and results. You must either:

  • Pass a valid MySQL URI to the LeCrapaud constructor:
    app = LeCrapaud(uri="mysql+pymysql://user:password@host:port/dbname")
    
  • OR set the following environment variables before using the package:
    • DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME
    • Or set DB_URI directly with your full connection string.

If neither is provided, database operations will not work.
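As a sketch, the environment-variable route can also be wired up from Python before the first LeCrapaud call. The variable names are the ones listed above; the exact way LeCrapaud assembles them into a URI is an assumption, shown here in the documented mysql+pymysql format:

```python
import os

# Option 1: individual connection variables (documented names)
os.environ["DB_USER"] = "lecrapaud"
os.environ["DB_PASSWORD"] = "secret"
os.environ["DB_HOST"] = "localhost"
os.environ["DB_PORT"] = "3306"
os.environ["DB_NAME"] = "experiments"

# Option 2: a single full connection string in DB_URI
db_uri = (
    f"mysql+pymysql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
)
os.environ["DB_URI"] = db_uri
```

Either form must be in place before `LeCrapaud()` is instantiated without an explicit uri argument.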

Using OpenAI Embeddings (Optional)

If you want to use the columns_pca embedding feature (for advanced feature engineering), you must set the OPENAI_API_KEY environment variable with your OpenAI API key:

export OPENAI_API_KEY=sk-...

If this variable is not set, features relying on OpenAI embeddings will not be available.
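Code that builds an experiment context can guard the columns_pca option on the key being present; a minimal sketch (the helper function is illustrative, not part of the LeCrapaud API):

```python
import os

def openai_embeddings_available() -> bool:
    """Return True when OPENAI_API_KEY is set and non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY"))

# Only request embedding-based features when the key is configured
extra_options = {"columns_pca": ["text_col"]} if openai_embeddings_available() else {}
```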

Experiment Context Arguments

Below are the main arguments you can pass to create_experiment (or the Experiment class):

| Argument | Type | Description | Example/Default |
| --- | --- | --- | --- |
| columns_binary | list | Columns to treat as binary | ['flag'] |
| columns_boolean | list | Columns to treat as boolean | ['is_active'] |
| columns_date | list | Columns to treat as dates | ['date'] |
| columns_drop | list | Columns to drop during feature engineering | ['col1', 'col2'] |
| columns_frequency | list | Columns to frequency encode | ['category'] |
| columns_onehot | list | Columns to one-hot encode | ['sector'] |
| columns_ordinal | list | Columns to ordinal encode | ['grade'] |
| columns_pca | list | Columns for PCA/embeddings (requires OPENAI_API_KEY for OpenAI embeddings) | ['text_col'] |
| columns_te_groupby | list | Columns to group by for target encoding | ['sector'] |
| columns_te_target | list | Target columns for target encoding | ['target'] |
| data | DataFrame | Your main dataset (required for a new experiment) | your_dataframe |
| date_column | str | Name of the date column | 'date' |
| experiment_name | str | Name for the training session | 'my_session' |
| group_column | str | Name of the group column | 'stock_id' |
| max_timesteps | int | Max timesteps for time series models | 30 |
| models_idx | list | Indices of models to use for model selection | [0, 1, 2] |
| number_of_trials | int | Number of trials for hyperparameter optimization | 20 |
| perform_crossval | bool | Whether to perform cross-validation | True/False |
| perform_hyperopt | bool | Whether to perform hyperparameter optimization | True/False |
| plot | bool | Whether to plot results | True/False |
| preserve_model | bool | Whether to preserve the best model | True/False |
| target_clf | list | Classification target column indices/names | [1, 2, 3] |
| target_mclf | list | Multi-class classification targets (not yet implemented) | [11] |
| target_numbers | list | Regression target column indices/names | [1, 2, 3] |
| test_size | int/float | Test set size (count or fraction) | 0.2 |
| time_series | bool | Whether the data is time series | True/False |
| val_size | int/float | Validation set size (count or fraction) | 0.2 |

Note:

  • Not all arguments are required; defaults may exist for some.
  • For columns_pca with OpenAI embeddings, you must set the OPENAI_API_KEY environment variable.
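Putting the table into practice, a context for a small daily stock dataset might look like the following. The column names, values, and the toy DataFrame are illustrative choices, not LeCrapaud defaults:

```python
import pandas as pd

# Toy daily stock dataset (illustrative column names)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "stock_id": ["AAPL"] * 10,
    "sector": ["tech"] * 10,
    "close": range(10),
    "target": range(10),
})

# Experiment context built from the documented arguments
context = {
    "data": df,
    "experiment_name": "demo_session",
    "date_column": "date",
    "group_column": "stock_id",
    "columns_date": ["date"],
    "columns_onehot": ["sector"],
    "time_series": True,
    "test_size": 0.2,
    "val_size": 0.2,
    "perform_hyperopt": True,
    "number_of_trials": 20,
}
```

This dict is what you would unpack into app.create_experiment(**context).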

Modular usage

You can also use each step independently:

data_eng = experiment.feature_engineering(data)
train, val, test = experiment.preprocess_feature(data_eng)
features = experiment.feature_selection(train)
std_data, reshaped_data = experiment.preprocess_model(train, val, test)
experiment.model_selection(std_data, reshaped_data)

⚠️ Using Alembic in Your Project (Important for Integrators)

If you use Alembic for migrations in your own project and you share the same database with LeCrapaud, you must ensure that Alembic does not attempt to drop or modify LeCrapaud tables (those prefixed with {LECRAPAUD_TABLE_PREFIX}_).

By default, Alembic's autogenerate feature will propose to drop any table that exists in the database but is not present in your project's models. To prevent this, add the following filter to your env.py:

# LECRAPAUD_TABLE_PREFIX must hold the same prefix LeCrapaud uses
# for its tables (substitute your configured value)
def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table" and name.startswith(f"{LECRAPAUD_TABLE_PREFIX}_"):
        return False  # Ignore LeCrapaud tables
    return True

context.configure(
    # ... other options ...
    include_object=include_object,
)

This will ensure that Alembic ignores all tables created by LeCrapaud when generating migrations for your own project.


🤝 Contributing

Reminders for GitHub usage

  1. Create the GitHub repository
$ brew install gh
$ gh auth login
$ gh repo create
  2. Initialize git and push a first commit to the remote repository
$ git init
$ git add .
$ git commit -m 'first commit'
$ git remote add origin <YOUR_REPO_URL>
$ git push -u origin master
  3. Use conventional commits
    https://www.conventionalcommits.org/en/v1.0.0/#summary
  4. Create a virtual environment
$ python -m venv .venv
$ source .venv/bin/activate
  5. Install dependencies
$ make install
  6. Deactivate the virtualenv (if needed)
$ deactivate

Pierre Gallet © 2025
