
SmartEco ecosystem: CPU-first machine learning benchmarking and tooling

SmartML

SmartML is a CPU-first machine learning benchmarking library focused on fair, leakage-free, and reproducible evaluation of tabular machine learning models.

SmartML is part of the SmartEco ecosystem.


Core Principles

  • CPU-only by default
  • Deterministic and reproducible benchmarks
  • Zero data leakage
  • Minimal but safe preprocessing
  • Honest model availability detection
  • Fair comparison across models

SmartML only exposes models that actually run on the current system.

No fake availability. No “works on my GPU” nonsense.


Intended Use & Scope

SmartML is not a commercial AutoML system.

It is an internal benchmarking and evaluation tool, designed to:

  • Compare models fairly under identical conditions
  • Measure real-world CPU performance
  • Ensure leakage-free and reproducible results

All models are evaluated using:

  • Fixed default hyperparameters
  • Identical preprocessing
  • Identical train/test splits

No model is tuned, favored, or given any special advantage.

SmartML prioritizes fairness, transparency, and repeatability over leaderboard-style optimization.

For full details on default hyperparameters, preprocessing rules, and benchmark methodology, please refer to the official SmartEco documentation and website.


Installation

SmartML is used as part of the SmartEco package.

Install SmartEco in editable mode (`pip install -e .`) from the directory that contains the SmartEco folder.

Required Dependencies

  • numpy
  • pandas
  • scikit-learn

Optional Dependencies

Installing these unlocks additional models:

  • lightgbm
  • xgboost
  • catboost
  • interpret (for NAM / Explainable Boosting)
  • pytorch-tabnet
  • torch (CPU)
  • smart-knn (SmartEco-native)

Some research libraries are platform-dependent and may not be available on Windows CPU.
SmartML automatically hides unavailable models.


Data Encoding & Preprocessing

SmartML uses minimal, transparent, and safe preprocessing.

Feature Encoding

  • Numerical features
    Passed directly without modification.

  • Categorical features

    • Low-cardinality → One-Hot Encoding (OHE)
    • High-cardinality → Target Encoding
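The cardinality-based routing above can be sketched in a few lines. The threshold below is hypothetical; SmartML's actual cutoff is not documented here.

```python
# Hypothetical cutoff -- SmartML's real threshold may differ.
LOW_CARDINALITY_MAX = 10

def choose_encoding(values, low_card_max=LOW_CARDINALITY_MAX):
    """Route a categorical column: few distinct values -> OHE, many -> target encoding."""
    return "one-hot" if len(set(values)) <= low_card_max else "target"
```

A column of three colors would route to one-hot encoding, while a column of thousands of user IDs would route to target encoding.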

Target Encoding Safety

  • Computed only on the training split
  • Validation and test data never influence encoding
  • Guarantees zero target leakage

Additional guarantees:

  • Classification targets → label-encoded
  • Regression targets → remain continuous
  • Encoding logic is task-aware
  • Test targets are never used during preprocessing
  • Linear models are feature-scaled to ensure fair and stable benchmarking
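The leakage-free guarantee above comes down to one rule: the encoding statistics are fitted on training rows only, and every other split is merely mapped through them. A minimal stdlib sketch of that fit/apply separation (the function names are illustrative, not SmartML's API):

```python
def fit_target_encoding(categories, targets):
    """Fit category -> mean(target) using TRAINING rows only."""
    sums, counts = {}, {}
    for cat, y in zip(categories, targets):
        sums[cat] = sums.get(cat, 0.0) + y
        counts[cat] = counts.get(cat, 0) + 1
    global_mean = sum(targets) / len(targets)
    mapping = {cat: sums[cat] / counts[cat] for cat in sums}
    return mapping, global_mean

def apply_target_encoding(categories, mapping, global_mean):
    """Encode any split; unseen categories fall back to the global mean."""
    return [mapping.get(cat, global_mean) for cat in categories]

# Statistics come from the train split; the test split never influences them.
mapping, gm = fit_target_encoding(["a", "a", "b"], [1.0, 0.0, 1.0])
encoded_test = apply_target_encoding(["b", "c"], mapping, gm)
```

Because the test split only reads the fitted mapping, no information can flow from test targets back into the features.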

Train / Test Split

  • Fixed random seed is always used
  • Default split is deterministic
  • Stratification is applied automatically for classification
  • Regression splits are random but reproducible

This ensures:

  • Identical splits across runs
  • Fair comparison between models
  • Benchmark repeatability
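In practice this typically means seeding the split (scikit-learn's `train_test_split` with a fixed `random_state` and `stratify=y` gives the same guarantees). A stdlib sketch of a deterministic stratified split, to make the repeatability property concrete:

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Deterministic stratified index split: same seed -> identical result."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                              # seeded, so reproducible
        cut = max(1, round(len(idx) * test_fraction))  # per-class test share
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)
```

Calling the function twice with the same labels and seed returns byte-identical splits, which is what makes cross-model comparisons fair.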

Available Models

SmartML dynamically detects and exposes models that are usable on the current machine.

Classification Models

Depending on installed dependencies:

  • Logistic Regression
  • Support Vector Classifier (SVC)
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
  • Random Forest
  • Extra Trees
  • LightGBM
  • XGBoost
  • CatBoost
  • NAM (Explainable Boosting)
  • TabNet
  • SmartKNN
  • Optional research models (platform-dependent)

Regression Models

Depending on installed dependencies:

  • Linear Regression
  • Ridge Regression
  • Lasso
  • ElasticNet
  • Support Vector Regressor (SVR)
  • KNN Regressor
  • Random Forest
  • Extra Trees
  • LightGBM
  • XGBoost
  • CatBoost
  • NAM (Explainable Boosting)
  • TabNet
  • SmartKNN
  • Optional research models (platform-dependent)

Model Availability Policy

If a model cannot run on the current system, it does not appear.

SmartML:

  • Does not fake availability
  • Does not crash on missing dependencies
  • Does not assume Linux or GPU environments

Model availability is determined at runtime.
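Runtime detection like this is usually implemented with guarded imports: a model is registered only if its backing library can actually be imported. A minimal sketch (the registry mapping is hypothetical, not SmartML's internal one):

```python
import importlib.util

def dependency_available(module_name):
    """True iff the backing library can actually be imported on this machine."""
    return importlib.util.find_spec(module_name) is not None

# Hypothetical registry: model name -> required import.
OPTIONAL_BACKENDS = {
    "LightGBM": "lightgbm",
    "XGBoost": "xgboost",
    "CatBoost": "catboost",
}

def available_optional_models():
    """Expose only models whose dependency resolves on the current system."""
    return [name for name, mod in OPTIONAL_BACKENDS.items()
            if dependency_available(mod)]
```

Using `find_spec` instead of a bare `import` keeps detection cheap and avoids crashing on missing dependencies.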


Inspection Utility

SmartML provides a runtime inspection utility called SmartML_Inspect.

It generates a JSON file reporting:

  • Available classification models
  • Available regression models
  • Metrics used by SmartML

The output reflects actual runtime capability, not theoretical support.
No terminal output is produced.
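A report of this shape could be produced with nothing more than the stdlib `json` module. The layout below is a hypothetical illustration; SmartML_Inspect's real schema may differ.

```python
import json

# Hypothetical report layout -- SmartML_Inspect's actual schema may differ.
report = {
    "classification_models": ["LogisticRegression", "RandomForest"],
    "regression_models": ["LinearRegression", "Ridge"],
    "metrics": {
        "classification": ["accuracy", "macro_f1"],
        "regression": ["r2", "mse"],
    },
}

# Written to disk only -- nothing is printed to the terminal.
with open("smartml_inspect.json", "w") as f:
    json.dump(report, f, indent=2)
```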


Evaluation Metrics

Classification Metrics

  • Accuracy
  • Macro F1 Score

Regression Metrics

  • R² Score
  • Mean Squared Error (MSE)
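For reference, the two regression metrics reduce to the following definitions (SmartML presumably delegates to scikit-learn's implementations; these stdlib versions just show what is computed):

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; 1.0 is perfect, 0.0 matches a mean predictor."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```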

Inference & Performance Metrics

  • Training time
  • Batch inference time
  • Batch throughput
  • Single-sample mean latency
  • Single-sample P95 latency

These metrics evaluate both model quality and real-world performance.
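Single-sample latency statistics of this kind are typically gathered with a monotonic timer over repeated one-row predictions. A hedged sketch (the helper name and warmup count are illustrative, not SmartML's API):

```python
import time
import statistics

def single_sample_latency(predict_fn, samples, warmup=5):
    """Measure per-sample latency; return (mean, p95) in seconds."""
    for s in samples[:warmup]:      # warm caches before timing
        predict_fn(s)
    timings = []
    for s in samples:
        start = time.perf_counter()  # monotonic, high-resolution clock
        predict_fn(s)
        timings.append(time.perf_counter() - start)
    timings.sort()
    p95 = timings[int(0.95 * (len(timings) - 1))]
    return statistics.mean(timings), p95
```

Reporting P95 alongside the mean captures tail latency, which batch throughput alone hides.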


Benchmarking Behavior

For each model, SmartML:

  • Uses the same train/test split
  • Applies identical preprocessing
  • Trains the model
  • Measures training time
  • Measures batch inference time
  • Measures single-sample latency distribution
  • Records results in a unified format

Result: fair, comparable, honest benchmarks.
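The loop above can be sketched as follows, assuming each model exposes scikit-learn-style `fit`/`predict`; the result-dict keys are illustrative, not SmartML's exact output format.

```python
import time

def benchmark(models, X_train, y_train, X_test):
    """Run each model on the identical split and record timings uniformly."""
    results = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_time = time.perf_counter() - start

        start = time.perf_counter()
        model.predict(X_test)
        infer_time = time.perf_counter() - start

        results.append({
            "model": name,
            "train_time_s": train_time,
            "batch_inference_s": infer_time,
            "throughput_rows_per_s": len(X_test) / max(infer_time, 1e-12),
        })
    return results
```

Because every model passes through the same loop, split, and preprocessing, the recorded numbers are directly comparable.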


Platform Notes

  • Windows + CPU → fully supported
  • Linux / WSL → additional research models may become available
  • GPU → not required, not assumed, not enforced

SmartML remains CPU-safe by default.


Experimental & Research Models

Some models exist in the codebase but may be hidden at runtime due to missing dependencies:

  • Torch-Tabular models (MLP, FTTransformer, SAINT, TabTransformer)
  • DeepGBM
  • GrowNet
  • ModernNCA

These models are:

  • Guarded by optional imports
  • Exposed only when installable
  • Never removed from source code

Architecture Overview

SmartML is organized into modular components:

  • Dataset loading and encoding
  • Model registry and availability detection
  • Training and benchmarking engine
  • Evaluation and inference measurement
  • Inspection and reporting

Each component is deterministic and reproducible.


Part of SmartEco

SmartML is one component of the SmartEco ecosystem.
