Framework for machine and deep learning, with regression, classification and time series analysis

Welcome to LeCrapaud

An all-in-one machine learning framework

🚀 Introduction

LeCrapaud is a high-level Python library for end-to-end machine learning workflows on tabular data, with a focus on financial and stock datasets. It provides a simple API to handle feature engineering, model selection, training, and prediction, all in a reproducible and modular way.

✨ Key Features

  • 🧩 Modular pipeline: Feature engineering, preprocessing, selection, and modeling as independent steps
  • 🤖 Automated model selection and hyperparameter optimization
  • 📊 Easy integration with pandas DataFrames
  • 🔬 Supports both regression and classification tasks
  • 🛠️ Simple API for both full pipeline and step-by-step usage
  • 📦 Ready for production and research workflows

⚡ Quick Start

Install the package

pip install lecrapaud

How it works

This package provides a high-level API to manage experiments for feature engineering, model selection, and prediction on tabular data (e.g. stock data).

Typical workflow

from lecrapaud import LeCrapaud

# 1. Create the main app (uri is your MySQL connection string, see "Database Configuration")
app = LeCrapaud(uri=uri)

# 2. Define your experiment context (see your notebook or api.py for all options)
context = {
    "data": your_dataframe,
    "columns_drop": [...],
    "columns_date": [...],
    # ... other config options
}

# 3. Create an experiment
experiment = app.create_experiment(**context)

# 4. Run the full training pipeline
experiment.train(your_dataframe)

# 5. Make predictions on new data
predictions = experiment.predict(new_data)

Database Configuration (Required)

LeCrapaud requires access to a MySQL database to store experiments and results. You must either:

  • Pass a valid MySQL URI to the LeCrapaud constructor:
    app = LeCrapaud(uri="mysql+pymysql://user:password@host:port/dbname")
    
  • OR set the following environment variables before using the package:
    • DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME
    • Or set DB_URI directly with your full connection string.

If neither is provided, database operations will not work.
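
The environment-variable option above can be sketched as follows. This is a minimal illustration, not LeCrapaud's internal code: the helper name `build_db_uri` is hypothetical, and it assumes that the five `DB_*` variables map onto the `mysql+pymysql://user:password@host:port/dbname` format shown above and that `DB_URI` takes precedence when it is set.

```python
import os
from urllib.parse import quote_plus


def build_db_uri(env=None):
    """Return a MySQL URI from DB_URI or the DB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # A full connection string wins if present (assumed precedence).
    if env.get("DB_URI"):
        return env["DB_URI"]
    required = ["DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing database settings: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' or '/' stay valid in a URI.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"mysql+pymysql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )
```

The result can then be passed explicitly, for example `app = LeCrapaud(uri=build_db_uri())`, which makes a missing setting fail at startup rather than at the first database operation.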

Using OpenAI Embeddings (Optional)

If you want to use the columns_pca embedding feature (for advanced feature engineering), you must set the OPENAI_API_KEY environment variable with your OpenAI API key:

export OPENAI_API_KEY=sk-...

If this variable is not set, features relying on OpenAI embeddings will not be available.
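
A small guard can catch a missing key before an experiment is created. The sketch below is an assumption-laden illustration: `openai_embeddings_enabled` and `check_context` are hypothetical helpers, not part of LeCrapaud, and the guard assumes that any non-empty `columns_pca` entry is meant to use OpenAI embeddings.

```python
import os


def openai_embeddings_enabled(env=None):
    """True when OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def check_context(context, env=None):
    """Raise early if columns_pca is requested without an API key (hypothetical check)."""
    if context.get("columns_pca") and not openai_embeddings_enabled(env):
        raise RuntimeError(
            "columns_pca is set but OPENAI_API_KEY is missing; "
            "export the key or remove columns_pca from the context."
        )
    return context
```

Calling `check_context(context)` just before `app.create_experiment(**context)` keeps the failure close to the configuration that caused it.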

Experiment Context Arguments

Below are the main arguments you can pass to create_experiment (or the Experiment class):

| Argument | Type | Description | Example/Default |
|---|---|---|---|
| columns_binary | list | Columns to treat as binary | ['flag'] |
| columns_boolean | list | Columns to treat as boolean | ['is_active'] |
| columns_date | list | Columns to treat as dates | ['date'] |
| columns_drop | list | Columns to drop during feature engineering | ['col1', 'col2'] |
| columns_frequency | list | Columns to frequency encode | ['category'] |
| columns_onehot | list | Columns to one-hot encode | ['sector'] |
| columns_ordinal | list | Columns to ordinal encode | ['grade'] |
| columns_pca | list | Columns to use for PCA/embeddings (requires OPENAI_API_KEY if using OpenAI embeddings) | ['text_col'] |
| columns_te_groupby | list | Columns for target encoding groupby | ['sector'] |
| columns_te_target | list | Columns for target encoding target | ['target'] |
| data | DataFrame | Your main dataset (required for new experiment) | your_dataframe |
| date_column | str | Name of the date column | 'date' |
| experiment_name | str | Name for the training session | 'my_session' |
| group_column | str | Name of the group column | 'stock_id' |
| max_timesteps | int | Max timesteps for time series models | 30 |
| models_idx | list | Indices of models to use for model selection | [0, 1, 2] |
| number_of_trials | int | Number of trials for hyperparameter optimization | 20 |
| perform_crossval | bool | Whether to perform cross-validation | True/False |
| perform_hyperopt | bool | Whether to perform hyperparameter optimization | True/False |
| plot | bool | Whether to plot results | True/False |
| preserve_model | bool | Whether to preserve the best model | True/False |
| target_clf | list | List of classification target column indices/names | [1, 2, 3] |
| target_mclf | list | Multi-class classification targets (not yet implemented) | [11] |
| target_numbers | list | List of regression target column indices/names | [1, 2, 3] |
| test_size | int/float | Test set size (count or fraction) | 0.2 |
| time_series | bool | Whether the data is time series | True/False |
| val_size | int/float | Validation set size (count or fraction) | 0.2 |

Note:

  • Not all arguments are required; defaults may exist for some.
  • For columns_pca with OpenAI embeddings, you must set the OPENAI_API_KEY environment variable.
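
Because the context is a plain dictionary, a misspelled key is easy to miss. The sketch below checks a context against the argument names listed in the table above. The names are copied from that table; the helper `unknown_context_keys` is hypothetical, and how LeCrapaud itself treats unexpected keys is not specified here.

```python
import difflib

# Argument names as documented in the table above.
DOCUMENTED_ARGUMENTS = sorted([
    "columns_binary", "columns_boolean", "columns_date", "columns_drop",
    "columns_frequency", "columns_onehot", "columns_ordinal", "columns_pca",
    "columns_te_groupby", "columns_te_target", "data", "date_column",
    "experiment_name", "group_column", "max_timesteps", "models_idx",
    "number_of_trials", "perform_crossval", "perform_hyperopt", "plot",
    "preserve_model", "target_clf", "target_mclf", "target_numbers",
    "test_size", "time_series", "val_size",
])


def unknown_context_keys(context):
    """Map each undocumented key to its closest documented name, if any."""
    return {
        key: difflib.get_close_matches(key, DOCUMENTED_ARGUMENTS, n=1)
        for key in context
        if key not in DOCUMENTED_ARGUMENTS
    }
```

For example, `unknown_context_keys({"data": None, "column_drop": []})` flags `column_drop` and suggests `columns_drop`, while a context that uses only documented names yields an empty dictionary.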

Modular usage

You can also use each step independently:

# Assumes `experiment` was created with app.create_experiment(...) and `data` is your DataFrame.
data_eng = experiment.feature_engineering(data)
train, val, test = experiment.preprocess_feature(data_eng)
features = experiment.feature_selection(train)
std_data, reshaped_data = experiment.preprocess_model(train, val, test)
experiment.model_selection(std_data, reshaped_data)

⚠️ Using Alembic in Your Project (Important for Integrators)

If you use Alembic for migrations in your own project and you share the same database with LeCrapaud, you must ensure that Alembic does not attempt to drop or modify LeCrapaud tables (those prefixed with lecrapaud_).

By default, Alembic's autogenerate feature will propose to drop any table that exists in the database but is not present in your project's models. To prevent this, add the following filter to your env.py:

def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table" and name.startswith("lecrapaud_"):
        return False  # Ignore LeCrapaud tables
    return True

context.configure(
    # ... other options ...
    include_object=include_object,
)

This will ensure that Alembic ignores all tables created by LeCrapaud when generating migrations for your own project.
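
The filter's behavior can be checked quickly outside Alembic. In the sketch below the filter is the same as above, while the object names (`lecrapaud_experiments`, `users`, `lecrapaud_id`) are hypothetical examples rather than actual LeCrapaud table names, and `None` stands in for the schema objects Alembic would normally pass.

```python
def include_object(object, name, type_, reflected, compare_to):
    if type_ == "table" and name.startswith("lecrapaud_"):
        return False  # Ignore LeCrapaud tables
    return True


# (name, type_) pairs as Alembic would report them; the names are made up.
candidates = [
    ("lecrapaud_experiments", "table"),
    ("users", "table"),
    ("lecrapaud_id", "column"),
]

# Keep only the objects the filter lets through to autogenerate.
kept = [
    name
    for name, type_ in candidates
    if include_object(None, name, type_, True, None)
]
```

Here `kept` contains `users` and `lecrapaud_id`: the filter as written only excludes objects whose type is "table", so a column that merely shares the prefix is still compared.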


🤝 Contributing

Reminders for GitHub usage

  1. Create the GitHub repository
$ brew install gh
$ gh auth login
$ gh repo create
  2. Initialize git and push the first commit to the remote repository
$ git init
$ git add .
$ git commit -m 'first commit'
$ git remote add origin <YOUR_REPO_URL>
$ git push -u origin master
  3. Use conventional commits
    https://www.conventionalcommits.org/en/v1.0.0/#summary
  4. Create the environment
$ pip install virtualenv
$ python -m venv .venv
$ source .venv/bin/activate
  5. Install dependencies
$ make install
  6. Deactivate the virtualenv (if needed)
$ deactivate

Pierre Gallet © 2025
