
mrpravin

One-line Data Analyst + AutoML for tabular datasets


Built by Pravin MR · Chennai, India
mrpravin000@gmail.com · LinkedIn · GitHub


What is mrpravin?

mrpravin is a Python library that automates the entire ML pipeline for tabular data — from raw CSV to trained, production-ready model — in as few as 3 lines of code.

import mrpravin as mr

df    = mr.pravinDA("data.csv")                  # clean, encode, ready
model = mr.pravinDS(df, target="loan_status")    # AutoML → best model
model.summary()                                  # full model card

No manual preprocessing. No encoder fitting. No scaler setup. No model selection loop.


Install

# Core (pandas, numpy, scikit-learn)
pip install mrpravin

# Full — adds XGBoost, LightGBM, encoding detection
pip install "mrpravin[full]"

The 3 Functions

Function                 What it does
mr.pravinDA(source)      Loads, cleans, encodes, and returns a ready DataFrame
mr.pravinDS(df, target)  Full AutoML — selects and tunes the best model
mr.pravinML              Production inference layer — predict, validate, explain, benchmark

Quick Start

Data Analyst Mode

import mrpravin as mr

# Works with CSV, Excel, JSON, or a DataFrame
df = mr.pravinDA("data.csv")

print(df.head())   # fully cleaned, encoded, human-readable
print(df.shape)

What happens automatically:

  • Duplicate rows removed
  • Missing values filled (median for numeric, mode for categorical)
  • Outliers winsorized
  • Boolean columns → 0 / 1
  • Categorical columns → one-hot encoded
  • High-cardinality columns → frequency encoded
  • Datetime columns → year / month / day / dayofweek features
  • ID columns → dropped
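
These are standard pandas transformations; a minimal sketch of the same ideas on a toy frame (illustrative only, not mrpravin's internals) looks like:

```python
import pandas as pd

raw = pd.DataFrame({
    "age":    [25, None, 31, 25],
    "city":   ["Chennai", "Delhi", None, "Chennai"],
    "active": [True, False, True, True],
})

df = raw.drop_duplicates().copy()                     # duplicate rows removed
df["age"] = df["age"].fillna(df["age"].median())      # numeric → median
df["city"] = df["city"].fillna(df["city"].mode()[0])  # categorical → mode
df["active"] = df["active"].astype(int)               # boolean → 0 / 1
df = pd.get_dummies(df, columns=["city"])             # one-hot encoding
```

pravinDA layers column-type detection on top of this so you never pick the strategy per column by hand.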

Data Scientist Mode — AutoML

import mrpravin as mr

df    = mr.pravinDA("data.csv")
model = mr.pravinDS(df, target="price")

model.summary()

What happens automatically:

  • Train / test split with zero data leakage
  • Runs: LinearRegression / LogisticRegression, RandomForest, GradientBoosting (+ XGBoost, LightGBM if installed)
  • Cross-validated hyperparameter tuning
  • Picks the best model
  • Evaluates on held-out test set
  • Returns a pravinML object ready for production
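
The selection loop follows a common scikit-learn pattern; a simplified sketch (hyperparameter tuning omitted, so this is not the library's actual tuner) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest":       RandomForestClassifier(random_state=42),
    "GradientBoosting":   GradientBoostingClassifier(random_state=42),
}

# Cross-validate every candidate on the training split only
cv_scores = {name: cross_val_score(est, X_train, y_train, cv=5).mean()
             for name, est in candidates.items()}

best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_acc = best_model.score(X_test, y_test)   # held-out evaluation
```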

Production Inference — pravinML

import mrpravin as mr
import pandas as pd

# Predict on new raw data
new_data = pd.DataFrame({
    "feature_1": [8, 3, 6],
    "feature_2": [95, 55, 75],
    "category":  ["Yes", "No", "Yes"],
})

predictions = model.predict(new_data)       # auto-cleans internally
probabilities = model.predict_proba(new_data)  # classification only

# Validate schema before predicting
report = model.validate(new_data)
print(report.summary())

# Feature importance
for feature, pct in model.explain().items():
    print(f"  {feature}: {pct:.1f}%")

# Benchmark inference speed
bench = model.benchmark(new_data, n_runs=100)
print(f"p50: {bench['p50_ms']} ms | throughput: {bench['throughput_rows_per_sec']} rows/s")

# Save and load
model.save("model.pkl")
model = mr.pravinML.load("model.pkl")

Real Results

Dataset                  Rows    Problem         Best Model        Score
Student Performance      10,000  Regression      LinearRegression  R² = 0.988
Loan Default Prediction  45,000  Classification  GradientBoosting  Accuracy = 93.4%, ROC-AUC = 97.8%

Both results were achieved with the same 3-line workflow.


Configuration

from mrpravin import MrPravinConfig

cfg = MrPravinConfig(
    random_seed=42,
    cv_folds=5,
    n_iter_search=20,        # hyperparameter search iterations
    use_xgboost=True,
    use_lightgbm=True,
    outlier_method="iqr",    # "iqr" | "zscore" | "mad"
    verbose=True,
)

model = mr.pravinDS("data.csv", target="label", cfg=cfg)
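
The "iqr" option corresponds to the classic 1.5×IQR rule; a hedged sketch of IQR-based winsorizing (illustrative, not mrpravin's cleaner) is:

```python
import numpy as np

def winsorize_iqr(x, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)

vals = np.array([1, 2, 3, 4, 100])
clipped = winsorize_iqr(vals)   # the outlier 100 is pulled in to 7.0
```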

Save and reuse config:

cfg.to_json("my_config.json")
cfg = MrPravinConfig.from_json("my_config.json")
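
The round-trip behaves like plain dataclass JSON serialization; a hypothetical stand-in (field names borrowed from the example above, not the real class) would be:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Config:                    # hypothetical stand-in for MrPravinConfig
    random_seed: int = 42
    cv_folds: int = 5
    outlier_method: str = "iqr"

    def to_json(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def from_json(cls, path):
        with open(path) as f:
            return cls(**json.load(f))

cfg = Config(cv_folds=10)
cfg.to_json("my_config.json")
restored = Config.from_json("my_config.json")
assert restored == cfg           # dataclass equality is field-by-field
```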

Architecture

mrpravin/
├── mrpravin/
│   ├── __init__.py          ← public API
│   ├── config.py            ← MrPravinConfig
│   ├── pipeline.py          ← pravinDA() and pravinDS()
│   ├── ml.py                ← pravinML (inference layer)
│   ├── core/
│   │   ├── loader.py        ← CSV / Excel / JSON loading
│   │   ├── profiler.py      ← column type detection
│   │   ├── cleaner.py       ← dedup, imputation, outliers
│   │   ├── encoder.py       ← OHE, frequency, boolean encoding
│   │   ├── scaler.py        ← StandardScaler / RobustScaler
│   │   └── report.py        ← report builder + JSON/HTML export
│   └── automl/
│       ├── model_selector.py ← problem detection + candidates
│       ├── tuner.py          ← RandomizedSearchCV
│       └── evaluator.py      ← metrics + feature importance
└── tests/
    └── test_mrpravin.py     ← 37 tests

Full pipeline flow

CSV / Excel / JSON / DataFrame
        ↓
   pravinDA()
        ├── load
        ├── detect column types  (7 types)
        ├── clean                (dedup, impute, outliers, text)
        ├── encode               (OHE / frequency / boolean / datetime)
        └── returns DataFrame    ← human-readable, ML-ready
        ↓
   pravinDS()
        ├── dedup full dataset before split  (prevents X/y misalignment)
        ├── train / test split               (no leakage)
        ├── clean + encode + scale           (fit on train only)
        ├── model selection + CV tuning
        ├── evaluate on test set
        └── returns pravinML object
        ↓
   pravinML.predict()
        ├── validate schema
        ├── clean + encode + scale           (transform only, no re-fit)
        └── predict

Zero data leakage by design. Encoders and scalers are always fit on training data only, then applied via .transform() at inference time.
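
That fit-on-train, transform-at-inference contract is the standard scikit-learn pattern, sketched here with a scaler:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(100, 3))
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler().fit(X_train)   # statistics come from train only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # transform only, never re-fit
```

Fitting the scaler on the full dataset would leak test-set statistics into training; fitting on the train split keeps the held-out evaluation honest.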


pravinML — Full API Reference

model.predict(X)              # predict labels / values
model.predict_proba(X)        # predict class probabilities
model.validate(X)             # schema + drift check → ValidationReport
model.evaluate(X, y)          # metrics on any labelled dataset
model.explain(top_n=20)       # feature importance as % contribution
model.summary()               # full model card printout
model.benchmark(X, n_runs=100) # inference latency p50/p95/p99
model.save("model.pkl")       # persist with integrity checksum
mr.pravinML.load("model.pkl") # load with checksum verification
model.metrics                 # raw metrics dict
model.feature_names           # training feature list
model.problem_type            # 'regression' | 'binary_classification' | ...
model.schema                  # InputSchema — raw feature ranges
model.model_name              # winning algorithm name

Supported Formats

Format     Extensions
CSV        .csv, .tsv, .txt
Excel      .xlsx, .xls, .xlsm
JSON       .json (records or lines)
DataFrame  pass directly
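
Extension-based dispatch like this can be sketched with plain pandas; `load_table` and `READERS` are illustrative names here, not mrpravin's API:

```python
from pathlib import Path

import pandas as pd

# Hypothetical dispatch table mapping file extensions to pandas readers
READERS = {
    ".csv":  pd.read_csv,
    ".tsv":  lambda p: pd.read_csv(p, sep="\t"),
    ".xlsx": pd.read_excel,
    ".json": pd.read_json,
}

def load_table(source):
    """Return a DataFrame from a path, or pass a DataFrame through unchanged."""
    if isinstance(source, pd.DataFrame):
        return source
    ext = Path(source).suffix.lower()
    return READERS[ext](source)
```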

Requirements

  • Python ≥ 3.9
  • pandas ≥ 1.5
  • numpy ≥ 1.23
  • scikit-learn ≥ 1.3
  • scipy ≥ 1.10
  • openpyxl ≥ 3.1 (Excel support)

Optional:

  • xgboost ≥ 1.7
  • lightgbm ≥ 4.0
  • chardet ≥ 5.0 (non-UTF-8 CSV encoding detection)

Run Tests

cd mrpravin
pip install -e ".[dev]"
pytest tests/ -v

37 tests covering profiler, cleaner, encoder, scaler, pravinDA, pravinDS, pravinML (predict, validate, evaluate, explain, benchmark, save/load).


Roadmap

  • Phase 1 — pravinDA · pravinDS · pravinML
  • Phase 2 — pravinAI — static pipeline compiler and anti-pattern detector
  • Phase 3 — pravinDL · pravinNLP — deep learning and NLP extensions

Author

Pravin MR — Data Engineer & ML Systems Builder, Chennai, India


License

MIT © 2026 Pravin MR — see LICENSE for full text.

