
SmartEco ecosystem: CPU-first machine learning benchmarking and tooling


SmartEco Website

SmartML

SmartML is a CPU-first machine learning benchmarking library focused on fair, leakage-free, and reproducible evaluation of tabular machine learning models.

SmartML is part of the SmartEco ecosystem.


Core Principles

  • CPU-only by default
  • Deterministic and reproducible benchmarks
  • Zero data leakage
  • Minimal but safe preprocessing
  • Honest model availability detection
  • Fair comparison across models

SmartML only exposes models that actually run on the current system.

No fake availability. No “works on my GPU” nonsense.


Intended Use & Scope

SmartML is not a commercial AutoML system.

It is an internal benchmarking and evaluation tool, designed to:

  • Compare models fairly under identical conditions
  • Measure real-world CPU performance
  • Ensure leakage-free and reproducible results

All models are evaluated using:

  • Fixed default hyperparameters
  • Identical preprocessing
  • Identical train/test splits

No model is tuned, favored, or given any special advantage.

SmartML prioritizes fairness, transparency, and repeatability over leaderboard-style optimization.

For full details on the following topics, refer to the official SmartEco documentation and website:

  • Default hyperparameters
  • Preprocessing rules
  • Benchmark methodology


Installation

SmartML is used as part of the SmartEco package.

Install SmartEco in editable mode from the directory that contains the SmartEco folder.

Required Dependencies

  • numpy
  • pandas
  • scikit-learn

Optional Dependencies

Installing these unlocks additional models:

  • lightgbm
  • xgboost
  • catboost
  • interpret (for NAM / Explainable Boosting)
  • pytorch-tabnet
  • torch (CPU)
  • smart-knn (SmartEco-native)

Some research libraries are platform-dependent and may not be available on CPU-only Windows systems.
SmartML automatically hides models whose dependencies are unavailable.


Data Encoding & Preprocessing

SmartML uses minimal, transparent, and safe preprocessing.

Feature Encoding

  • Numerical features
    Passed directly without modification.

  • Categorical features

    • Low-cardinality → One-Hot Encoding (OHE)
    • High-cardinality → Target Encoding

Target Encoding Safety

  • Computed only on the training split
  • Validation and test data never influence encoding
  • Guarantees zero target leakage
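
A dependency-free sketch of the safety rules above: the category-to-mean mapping is learned from the training split only and then applied unchanged to held-out rows. The function names and the global-mean fallback for unseen categories are assumptions for illustration, not SmartML's confirmed behavior.

```python
from collections import defaultdict

def fit_target_encoder(categories, targets):
    """Learn per-category target means from the training split only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    mapping = {c: sums[c] / counts[c] for c in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean

def apply_target_encoder(categories, mapping, global_mean):
    """Encode any split with the training mapping; no targets are needed here."""
    # Assumption: unseen categories fall back to the training global mean.
    return [mapping.get(c, global_mean) for c in categories]

train_city = ["a", "a", "b", "b", "b"]
train_y = [1.0, 0.0, 1.0, 1.0, 1.0]
test_city = ["a", "b", "c"]  # "c" never appears in training

mapping, global_mean = fit_target_encoder(train_city, train_y)
test_encoded = apply_target_encoder(test_city, mapping, global_mean)
print(test_encoded)  # [0.5, 1.0, 0.8]
```

Because `apply_target_encoder` never receives test targets, those targets cannot influence the encoded features.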

Additional guarantees:

  • Classification targets → label-encoded
  • Regression targets → remain continuous
  • Encoding logic is task-aware
  • Test targets are never used during preprocessing
  • Linear models are feature-scaled to ensure fair and stable benchmarking
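
For the feature-scaling point, a minimal standardization sketch is shown here. It assumes the scaling statistics are computed on the training split and reused for the test split, which is consistent with the leakage rules above but is not spelled out in the source. All names are hypothetical.

```python
import math

def fit_standardizer(train_rows):
    """Compute per-feature mean and standard deviation from training rows."""
    n = len(train_rows)
    n_features = len(train_rows[0])
    means = [sum(r[j] for r in train_rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in train_rows) / n
        # Constant features get a divisor of 1.0 to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def transform(rows, means, stds):
    """Scale any split with statistics learned from the training split."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

train_rows = [[1.0, 10.0], [3.0, 10.0]]
test_rows = [[5.0, 12.0]]

means, stds = fit_standardizer(train_rows)
print(transform(test_rows, means, stds))  # [[3.0, 2.0]]
```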

Train / Test Split

  • A fixed random seed is always used
  • The default split is deterministic
  • Stratification is applied automatically for classification
  • Regression splits are random but reproducible

This ensures:

  • Identical splits across runs
  • Fair comparison between models
  • Benchmark repeatability
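
The split properties above can be illustrated with a seeded, stratified split written against the standard library. The seed value, the test fraction, and the function name are assumptions. The source states only that the seed is fixed and that classification splits are stratified.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), preserving class proportions.

    `seed` and `test_fraction` are hypothetical defaults for this sketch.
    """
    rng = random.Random(seed)  # local generator: no global state is touched
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):  # fixed class order keeps runs identical
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 80 + [1] * 20
train_a, test_a = stratified_split(labels)
train_b, test_b = stratified_split(labels)

print(train_a == train_b and test_a == test_b)  # True
print(sum(labels[i] for i in test_a))           # 4
```

The test split holds 16 rows of class 0 and 4 rows of class 1, matching the 80/20 class ratio of the full data.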

Available Models

SmartML dynamically detects and exposes models that are usable on the current machine.

Classification Models

Depending on installed dependencies:

  • Logistic Regression
  • Support Vector Classifier (SVC)
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
  • Random Forest
  • Extra Trees
  • LightGBM
  • XGBoost
  • CatBoost
  • NAM (Explainable Boosting)
  • TabNet
  • SmartKNN
  • Optional research models (platform-dependent)

Regression Models

Depending on installed dependencies:

  • Linear Regression
  • Ridge Regression
  • Lasso
  • ElasticNet
  • Support Vector Regressor (SVR)
  • KNN Regressor
  • Random Forest
  • Extra Trees
  • LightGBM
  • XGBoost
  • CatBoost
  • NAM (Explainable Boosting)
  • TabNet
  • SmartKNN
  • Optional research models (platform-dependent)

Model Availability Policy

If a model cannot run on the current system, it does not appear.

SmartML:

  • Does not fake availability
  • Does not crash on missing dependencies
  • Does not assume Linux or GPU environments

Model availability is determined at runtime.
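
One way such runtime detection could work is sketched below, using `importlib.util.find_spec` to probe each optional dependency. The registry dictionary and function names are hypothetical and do not describe SmartML's actual internals. Note that `find_spec` only confirms a package can be located, so a stricter check would also attempt the import itself.

```python
import importlib.util

# Hypothetical registry: display name -> import name of the backing package.
OPTIONAL_MODELS = {
    "LightGBM": "lightgbm",
    "XGBoost": "xgboost",
    "CatBoost": "catboost",
    "TabNet": "pytorch_tabnet",
}

def is_importable(module_name):
    """Return True if the module can be located, without raising on failure."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False

def available_models(registry=OPTIONAL_MODELS, probe=is_importable):
    """List only the models whose dependency passes the probe."""
    return sorted(name for name, module in registry.items() if probe(module))

# The result depends on what is installed on the current machine.
print(available_models())
```

A missing package simply yields a shorter list, so nothing crashes and nothing unavailable is advertised.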


Inspection Utility

SmartML provides a runtime inspection utility called SmartML_Inspect.

It generates a JSON file reporting:

  • Available classification models
  • Available regression models
  • Metrics used by SmartML

The output reflects actual runtime capability, not theoretical support.
No terminal output is produced.
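
The shape of such a report might look like the sketch below. The key names, the output path, and the `write_inspection_report` function are assumptions for illustration. The real SmartML_Inspect interface and file layout are not documented in this description.

```python
import json
import os
import tempfile

def write_inspection_report(path, classification, regression, metrics):
    """Write a JSON report silently; nothing is printed to the terminal."""
    report = {
        "classification_models": sorted(classification),
        "regression_models": sorted(regression),
        "metrics": metrics,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2)
    return report

# Hypothetical output location and model lists for the example.
report_path = os.path.join(tempfile.gettempdir(), "smartml_inspect_example.json")
write_inspection_report(
    report_path,
    classification=["Random Forest", "Logistic Regression"],
    regression=["Ridge Regression", "Linear Regression"],
    metrics={"classification": ["accuracy", "macro_f1"],
             "regression": ["r2", "mse"]},
)
```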


Evaluation Metrics

Classification Metrics

  • Accuracy
  • Macro F1 Score
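
For reference, the two classification metrics can be computed as in this standard-library sketch. It follows the usual definitions (macro F1 is the unweighted mean of per-class F1) and is not taken from SmartML's source.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, so a model that ignores a rare class is penalized even when its accuracy looks high.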

Regression Metrics

  • R² Score
  • Mean Squared Error (MSE)
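
The regression metrics follow the standard definitions, sketched here without external dependencies. The function names are illustrative and not SmartML's API.

```python
def mse(y_true, y_pred):
    """Mean of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mse(y_true, y_pred))       # 0.375
print(r2_score(y_true, y_pred))
```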

Inference & Performance Metrics

  • Training time
  • Batch inference time
  • Batch throughput
  • Single-sample mean latency
  • Single-sample P95 latency

These metrics evaluate both model quality and real-world performance.
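
A rough sketch of how the inference metrics listed above could be collected with `time.perf_counter`. The function names, the result keys, the number of single-sample calls, and the nearest-rank definition of P95 are all assumptions. SmartML's exact measurement procedure is not described here.

```python
import time

def p95(samples):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def measure_inference(predict, batch, n_single=200):
    """Time one batch call, then time individual single-row calls."""
    start = time.perf_counter()
    predict(batch)
    batch_seconds = time.perf_counter() - start

    latencies = []
    for row in batch[:n_single]:
        start = time.perf_counter()
        predict([row])
        latencies.append(time.perf_counter() - start)

    throughput = len(batch) / batch_seconds if batch_seconds > 0 else float("inf")
    return {
        "batch_inference_s": batch_seconds,
        "batch_throughput_rows_per_s": throughput,
        "single_mean_latency_s": sum(latencies) / len(latencies),
        "single_p95_latency_s": p95(latencies),
    }

def toy_predict(rows):
    """Stand-in for a trained model's predict method."""
    return [sum(r) > 1.0 for r in rows]

batch = [[i * 0.01, 0.5] for i in range(1000)]
stats = measure_inference(toy_predict, batch)
print(sorted(stats))
```

Reporting P95 alongside the mean exposes tail latency that an average alone would hide.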


Benchmarking Behavior

For each model, SmartML:

  • Uses the same train/test split
  • Applies identical preprocessing
  • Trains the model
  • Measures training time
  • Measures batch inference time
  • Measures single-sample latency distribution
  • Records results in a unified format

Result: fair, comparable, honest benchmarks.
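
The loop described above can be sketched with two toy regressors standing in for real models. Every model sees the same split and its results land in the same record shape. The class names, `run_benchmark`, and the record keys are hypothetical, not SmartML's actual engine.

```python
import time

class MeanRegressor:
    """Toy model: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_ for _ in X]

class LastValueRegressor:
    """Toy model: always predicts the last training target."""
    def fit(self, X, y):
        self.last_ = y[-1]
        return self
    def predict(self, X):
        return [self.last_ for _ in X]

def run_benchmark(models, X_train, y_train, X_test, y_test):
    """Train and score each model on the identical split."""
    results = []
    for name, factory in models.items():
        model = factory()  # fresh instance with default settings
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_seconds = time.perf_counter() - start
        preds = model.predict(X_test)
        mse = sum((t - p) ** 2 for t, p in zip(y_test, preds)) / len(y_test)
        results.append({"model": name, "train_time_s": train_seconds, "mse": mse})
    return results

X_train, y_train = [[0], [1], [2], [3]], [0.0, 1.0, 2.0, 3.0]
X_test, y_test = [[4], [5]], [4.0, 5.0]

results = run_benchmark(
    {"mean": MeanRegressor, "last": LastValueRegressor},
    X_train, y_train, X_test, y_test,
)
for row in results:
    print(row["model"], row["mse"])
```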


Platform Notes

  • Windows + CPU → fully supported
  • Linux / WSL → additional research models may become available
  • GPU → not required, not assumed, not enforced

SmartML remains CPU-safe by default.


Experimental & Research Models

Some models exist in the codebase but may be hidden at runtime due to missing dependencies:

  • Torch-Tabular models (MLP, FTTransformer, SAINT, TabTransformer)
  • DeepGBM
  • GrowNet
  • ModernNCA

These models are:

  • Guarded by optional imports
  • Exposed only when their dependencies are installed and importable
  • Never removed from source code

Architecture Overview

SmartML is organized into modular components:

  • Dataset loading and encoding
  • Model registry and availability detection
  • Training and benchmarking engine
  • Evaluation and inference measurement
  • Inspection and reporting

Each component is deterministic and reproducible.


Part of SmartEco

SmartML is one component of the SmartEco ecosystem.
