
Framework for machine and deep learning, with regression, classification and time series analysis


Welcome to LeCrapaud

An all-in-one machine learning framework


🚀 Introduction

LeCrapaud is a high-level Python library for end-to-end machine learning workflows on tabular data, with a focus on financial and stock datasets. It provides a simple API to handle feature engineering, model selection, training, and prediction, all in a reproducible and modular way.

✨ Key Features

  • 🧩 Modular pipeline: Feature engineering, preprocessing, selection, and modeling as independent steps
  • 🤖 Automated model selection and hyperparameter optimization
  • 📊 Easy integration with pandas DataFrames
  • 🔬 Supports both regression and classification tasks
  • 🛠️ Simple API for both full pipeline and step-by-step usage
  • 📦 Ready for production and research workflows

⚡ Quick Start

Install the package

pip install lecrapaud

How it works

This package provides a high-level API to manage experiments for feature engineering, model selection, and prediction on tabular data (e.g. stock data).

Typical workflow

from lecrapaud import LeCrapaud

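# 1. Create the main app (a MySQL URI is required, see Database Configuration)
uri = "mysql+pymysql://user:password@host:port/dbname"
app = LeCrapaud(uri=uri)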

# 2. Define your experiment context (see your notebook or api.py for all options)
context = {
    "data": your_dataframe,
    "columns_drop": [...],
    "columns_date": [...],
    # ... other config options
}

# 3. Create an experiment
experiment = app.create_experiment(**context)

# 4. Run the full training pipeline
experiment.train(your_dataframe)

# 5. Make predictions on new data
predictions = experiment.predict(new_data)

Database Configuration (Required)

LeCrapaud requires access to a MySQL database to store experiments and results. You must either:

  • Pass a valid MySQL URI to the LeCrapaud constructor:
    app = LeCrapaud(uri="mysql+pymysql://user:password@host:port/dbname")
    
  • Or set environment variables before using the package, in one of two ways:
    • DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, and DB_NAME
    • DB_URI, holding your full connection string

If neither is provided, database operations will not work.
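
The two configuration options above can be resolved in your own code before creating the app. The following is a minimal sketch, not LeCrapaud's internal logic: the helper name `resolve_db_uri` is hypothetical, and giving DB_URI precedence over the individual DB_* variables is an assumption.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def resolve_db_uri(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a MySQL URI from DB_URI or the DB_* parts, or None if incomplete.

    Hypothetical helper: the precedence order is an assumption, not
    documented LeCrapaud behavior.
    """
    # A full connection string wins in this sketch.
    uri = env.get("DB_URI")
    if uri:
        return uri

    keys = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME")
    parts = [env.get(key) for key in keys]
    if not all(parts):
        return None  # nothing usable: database operations would not work

    user, password, host, port, name = parts
    # Escape characters such as "@" or ":" that would break the URI.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )
```

The result could then be passed as `LeCrapaud(uri=resolve_db_uri())`, with a `None` result treated as a configuration error in your own code.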

Using OpenAI Embeddings (Optional)

If you want to use the columns_pca embedding feature (for advanced feature engineering), you must set the OPENAI_API_KEY environment variable with your OpenAI API key:

export OPENAI_API_KEY=sk-...

If this variable is not set, features relying on OpenAI embeddings will not be available.
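
To surface a missing key before a long run starts, you could check for it up front. This is a minimal sketch: `check_openai_key` is a hypothetical helper, and it assumes that any non-empty columns_pca means OpenAI embeddings are wanted, which may not hold for every configuration.

```python
import os
from typing import Any, Mapping


def check_openai_key(
    context: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> None:
    """Raise early if columns_pca is requested but OPENAI_API_KEY is missing.

    Hypothetical helper; assumes columns_pca implies OpenAI embeddings.
    """
    wants_embeddings = bool(context.get("columns_pca"))
    if wants_embeddings and not env.get("OPENAI_API_KEY"):
        raise RuntimeError(
            "columns_pca is set but OPENAI_API_KEY is not; "
            "export the key or remove columns_pca from the context."
        )
```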

Experiment Context Arguments

Below are the main arguments you can pass to create_experiment (or the Experiment class):

| Argument | Type | Description | Example/Default |
|---|---|---|---|
| columns_binary | list | Columns to treat as binary | ['flag'] |
| columns_boolean | list | Columns to treat as boolean | ['is_active'] |
| columns_date | list | Columns to treat as dates | ['date'] |
| columns_drop | list | Columns to drop during feature engineering | ['col1', 'col2'] |
| columns_frequency | list | Columns to frequency encode | ['category'] |
| columns_onehot | list | Columns to one-hot encode | ['sector'] |
| columns_ordinal | list | Columns to ordinal encode | ['grade'] |
| columns_pca | list | Columns to use for PCA/embeddings (requires OPENAI_API_KEY if using OpenAI embeddings) | ['text_col'] |
| columns_te_groupby | list | Columns for target encoding groupby | ['sector'] |
| columns_te_target | list | Columns for target encoding target | ['target'] |
| data | DataFrame | Your main dataset (required for new experiment) | your_dataframe |
| date_column | str | Name of the date column | 'date' |
| experiment_name | str | Name for the training session | 'my_session' |
| group_column | str | Name of the group column | 'stock_id' |
| max_timesteps | int | Max timesteps for time series models | 30 |
| models_idx | list | Indices of models to use for model selection | [0, 1, 2] |
| number_of_trials | int | Number of trials for hyperparameter optimization | 20 |
| perform_crossval | bool | Whether to perform cross-validation | True/False |
| perform_hyperopt | bool | Whether to perform hyperparameter optimization | True/False |
| plot | bool | Whether to plot results | True/False |
| preserve_model | bool | Whether to preserve the best model | True/False |
| target_clf | list | List of classification target column indices/names | [1, 2, 3] |
| target_mclf | list | Multi-class classification targets (not yet implemented) | [11] |
| target_numbers | list | List of regression target column indices/names | [1, 2, 3] |
| test_size | int/float | Test set size (count or fraction) | 0.2 |
| time_series | bool | Whether the data is time series | True/False |
| val_size | int/float | Validation set size (count or fraction) | 0.2 |

Note:

  • Not all arguments are required; defaults may exist for some.
  • For columns_pca with OpenAI embeddings, you must set the OPENAI_API_KEY environment variable.
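
As an illustration, a context built from the documented arguments might look like the sketch below. The argument names come from the list above; the values are examples only, and `unknown_arguments` is a hypothetical helper for catching misspelled keys before they reach create_experiment. The data argument is left out here and would be added with your own DataFrame.

```python
# Names taken from the documented argument list.
DOCUMENTED_ARGS = {
    "columns_binary", "columns_boolean", "columns_date", "columns_drop",
    "columns_frequency", "columns_onehot", "columns_ordinal", "columns_pca",
    "columns_te_groupby", "columns_te_target", "data", "date_column",
    "experiment_name", "group_column", "max_timesteps", "models_idx",
    "number_of_trials", "perform_crossval", "perform_hyperopt", "plot",
    "preserve_model", "target_clf", "target_mclf", "target_numbers",
    "test_size", "time_series", "val_size",
}

# Illustrative values only; adapt them to your dataset.
context = {
    "experiment_name": "my_session",
    "date_column": "date",
    "group_column": "stock_id",
    "columns_drop": ["col1", "col2"],
    "columns_date": ["date"],
    "columns_onehot": ["sector"],
    "target_numbers": [1, 2, 3],
    "target_clf": [1, 2, 3],
    "time_series": True,
    "max_timesteps": 30,
    "test_size": 0.2,
    "val_size": 0.2,
    "perform_hyperopt": True,
    "number_of_trials": 20,
    "perform_crossval": False,
}


def unknown_arguments(ctx: dict) -> list:
    """Return the keys of ctx that are not in the documented argument list."""
    return sorted(set(ctx) - DOCUMENTED_ARGS)


# With the data added, the workflow shown earlier would call:
# experiment = app.create_experiment(data=your_dataframe, **context)
```

Checking keys this way is a design choice for catching typos early; whether create_experiment itself rejects unknown keys is not stated here.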

Modular usage

You can also use each step independently:

data_eng = experiment.feature_engineering(data)
train, val, test = experiment.preprocess_feature(data_eng)
features = experiment.feature_selection(train)
std_data, reshaped_data = experiment.preprocess_model(train, val, test)
experiment.model_selection(std_data, reshaped_data)

⚠️ Using Alembic in Your Project (Important for Integrators)

If you use Alembic for migrations in your own project and you share the same database with LeCrapaud, you must ensure that Alembic does not attempt to drop or modify LeCrapaud tables (those prefixed with lecrapaud_).

By default, Alembic's autogenerate feature proposes dropping any table that exists in the database but is not defined in your project's models. To prevent this, add the following filter to your env.py:

def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table" and name.startswith("lecrapaud_"):
        return False  # Ignore LeCrapaud tables
    return True

context.configure(
    # ... other options ...
    include_object=include_object,
)

This will ensure that Alembic ignores all tables created by LeCrapaud when generating migrations for your own project.
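
To see what the filter does without running a migration, you can call it directly on a few names. In this sketch the function is repeated so the snippet runs on its own, and the table and index names are hypothetical, not actual LeCrapaud schema objects.

```python
def include_object(object, name, type_, reflected, compare_to):
    # Same filter as above: skip tables whose name starts with "lecrapaud_".
    if type_ == "table" and name.startswith("lecrapaud_"):
        return False
    return True


# Hypothetical (name, type) pairs standing in for reflected database objects.
candidates = [
    ("lecrapaud_experiments", "table"),
    ("users", "table"),
    ("ix_users_email", "index"),
]

kept = [
    name
    for name, type_ in candidates
    if include_object(None, name, type_, True, None)
]
print(kept)
```

Only objects of type "table" are matched by the prefix check, so your own tables and other object types pass through unchanged.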


🤝 Contributing

Reminders for GitHub usage

  1. Create the GitHub repository
$ brew install gh
$ gh auth login
$ gh repo create
  2. Initialize git and push the first commit to the remote repository
$ git init
$ git add .
$ git commit -m 'first commit'
$ git remote add origin <YOUR_REPO_URL>
$ git push -u origin master
  3. Use conventional commits
    https://www.conventionalcommits.org/en/v1.0.0/#summary
  4. Create a virtual environment
$ pip install virtualenv
$ python -m venv .venv
$ source .venv/bin/activate
  5. Install dependencies
$ make install
  6. Deactivate the virtual environment (if needed)
$ deactivate
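
For the conventional commits reminder, a commit header can be checked with a small script. This is a simplified sketch rather than a full implementation of the specification: it validates only the first line, and it assumes lowercase types, which is a common convention but stricter than the specification requires.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of message looks like a conventional commit."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return HEADER.match(first_line) is not None


print(is_conventional("feat(api): add predict endpoint"))
print(is_conventional("first commit"))
```

A check like this could run in a commit-msg hook, although dedicated tools exist for that purpose.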

Pierre Gallet © 2025


