One-line Data Analyst + AutoML library for tabular datasets — clean, train, and predict with minimal code

mrpravin

One-line Data Analyst + AutoML for tabular datasets

Built by Pravin MR · Chennai, India
mrpravin000@gmail.com · LinkedIn · GitHub

What is mrpravin?

mrpravin is a Python library that automates the ML pipeline for tabular data, from a raw CSV file to a trained model ready for inference, in as few as three lines of code.

import mrpravin as mr

df    = mr.pravinDA("data.csv")                  # clean, encode, ready
model = mr.pravinDS(df, target="loan_status")    # AutoML → best model
model.summary()                                  # full model card

No manual preprocessing. No encoder fitting. No scaler setup. No model selection loop.

Install

# Core (pandas, numpy, scikit-learn)
pip install mrpravin

# Full — adds XGBoost, LightGBM, encoding detection
pip install "mrpravin[full]"

The 3 Functions

| Function | What it does |
| --- | --- |
| `mr.pravinDA(source)` | Loads, cleans, encodes, and returns a ready DataFrame |
| `mr.pravinDS(df, target)` | Full AutoML: selects and tunes the best model |
| `mr.pravinML` | Production inference layer: predict, validate, explain, benchmark |

Quick Start

Data Analyst Mode

import mrpravin as mr

# Works with CSV, Excel, JSON, or a DataFrame
df = mr.pravinDA("data.csv")

print(df.head())   # fully cleaned, encoded, human-readable
print(df.shape)

What happens automatically:

Duplicate rows removed
Missing values filled (median for numeric, mode for categorical)
Outliers winsorized
Boolean columns → 0 / 1
Categorical columns → one-hot encoded
High cardinality columns → frequency encoded
Datetime columns → year / month / day / dayofweek features
ID columns → dropped
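
Several of the steps above can be approximated with plain pandas. The following is a minimal sketch, not mrpravin's actual implementation: the helper name `basic_clean` and the rule used to spot ID columns (name ends in `id` and every value is unique) are assumptions made for illustration, and outlier handling and encoding are deliberately left out.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dedup, drop ID-like columns, impute, map booleans."""
    out = df.drop_duplicates().reset_index(drop=True)

    # Assumed ID rule: the name ends in "id" and every value is unique.
    id_cols = [
        c for c in out.columns
        if c.lower().endswith("id") and out[c].is_unique
    ]
    out = out.drop(columns=id_cols)

    for col in out.columns:
        if out[col].dtype == bool:
            out[col] = out[col].astype(int)              # True/False -> 1/0
        elif pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 3, 4],
    "income": [40.0, None, 55.0, 55.0, 70.0],
    "city": ["Chennai", "Pune", None, None, "Chennai"],
    "active": [True, False, True, True, False],
})
clean = basic_clean(raw)
print(clean)
```

Deduplication runs first so that repeated rows do not skew the median and mode used for imputation.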

Data Scientist Mode — AutoML

import mrpravin as mr

df    = mr.pravinDA("data.csv")
model = mr.pravinDS(df, target="price")

model.summary()

What happens automatically:

Train / test split with zero data leakage
Runs: LinearRegression / LogisticRegression, RandomForest, GradientBoosting (+ XGBoost, LightGBM if installed)
Cross-validated hyperparameter tuning
Picks the best model
Evaluates on held-out test set
Returns a pravinML object ready for production
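
The loop described above (split, tune each candidate with cross-validation, keep the winner, score it once on held-out data) can be sketched with scikit-learn alone. This is an illustration under stated assumptions rather than the library's code: the synthetic dataset, the two candidates, and their search spaces are all chosen here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data stands in for a cleaned DataFrame.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Candidate models with small, illustrative search spaces.
candidates = {
    "LogisticRegression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "RandomForest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [25, 50], "max_depth": [3, 5, None]},
    ),
}

best_name, best_search = None, None
for name, (estimator, grid) in candidates.items():
    search = RandomizedSearchCV(estimator, grid, n_iter=4, cv=3, random_state=42)
    search.fit(X_train, y_train)          # tuning sees training rows only
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

# The held-out test set is used once, after the winner is chosen.
test_accuracy = accuracy_score(y_test, best_search.predict(X_test))
print(best_name, round(test_accuracy, 3))
```

Comparing candidates on their cross-validated score, not on the test set, keeps the final test metric an honest estimate.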

Production Inference — pravinML

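import pandas as pd
import mrpravin as mr

# `model` is the pravinML object returned by mr.pravinDS() in the previous section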

# Predict on new raw data
new_data = pd.DataFrame({
    "feature_1": [8, 3, 6],
    "feature_2": [95, 55, 75],
    "category":  ["Yes", "No", "Yes"],
})

predictions = model.predict(new_data)       # auto-cleans internally
probabilities = model.predict_proba(new_data)  # classification only

# Validate schema before predicting
report = model.validate(new_data)
print(report.summary())

# Feature importance
for feature, pct in model.explain().items():
    print(f"  {feature}: {pct:.1f}%")

# Benchmark inference speed
bench = model.benchmark(new_data, n_runs=100)
print(f"p50: {bench['p50_ms']} ms | throughput: {bench['throughput_rows_per_sec']} rows/s")

# Save and load
model.save("model.pkl")
model = mr.pravinML.load("model.pkl")
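
For readers curious about what a latency benchmark like the one above measures, here is a minimal sketch using `time.perf_counter` and NumPy percentiles. The helper is hypothetical; only the `p50_ms` and `throughput_rows_per_sec` keys come from the example above, while `p95_ms` and `p99_ms` are assumed names.

```python
import time

import numpy as np


def benchmark(predict_fn, X, n_runs=100):
    """Hypothetical helper: time repeated calls and summarise latency."""
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn(X)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    p50, p95, p99 = np.percentile(timings_ms, [50, 95, 99])
    mean_seconds = float(np.mean(timings_ms)) / 1000.0
    throughput = len(X) / mean_seconds if mean_seconds > 0 else float("inf")
    return {
        "p50_ms": float(p50),
        "p95_ms": float(p95),   # assumed key name
        "p99_ms": float(p99),   # assumed key name
        "throughput_rows_per_sec": throughput,
    }


# A trivial stand-in for model.predict: sum each row.
rows = np.random.default_rng(0).normal(size=(1000, 5))
stats = benchmark(lambda batch: batch.sum(axis=1), rows, n_runs=50)
print(stats)
```

Reporting percentiles instead of a single mean exposes tail latency, which is usually what matters for a serving budget.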

Real Results

| Dataset | Rows | Problem | Best Model | Score |
| --- | --- | --- | --- | --- |
| Student Performance | 10,000 | Regression | LinearRegression | R² = 0.988 |
| Loan Default Prediction | 45,000 | Classification | GradientBoosting | Accuracy = 93.4%, ROC-AUC = 97.8% |

Both results were obtained with the same three lines of code.

Configuration

import mrpravin as mr
from mrpravin import MrPravinConfig

cfg = MrPravinConfig(
    random_seed=42,
    cv_folds=5,
    n_iter_search=20,        # hyperparameter search iterations
    use_xgboost=True,
    use_lightgbm=True,
    outlier_method="iqr",    # "iqr" | "zscore" | "mad"
    verbose=True,
)

model = mr.pravinDS("data.csv", target="label", cfg=cfg)
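
The three `outlier_method` values name standard ways of choosing winsorizing limits. The sketch below shows one textbook formulation of each; the cutoffs (1.5 × IQR, 3 standard deviations, 3 scaled MADs) and the function name `clip_bounds` are assumptions, and the library may use different thresholds.

```python
import numpy as np


def clip_bounds(values, method="iqr"):
    """Return (lower, upper) limits for winsorizing a 1-D numeric array."""
    x = np.asarray(values, dtype=float)
    if method == "iqr":
        q1, q3 = np.percentile(x, [25, 75])
        spread = 1.5 * (q3 - q1)                  # assumed Tukey fence factor
        return q1 - spread, q3 + spread
    if method == "zscore":
        mean, std = x.mean(), x.std()
        return mean - 3.0 * std, mean + 3.0 * std  # assumed 3-sigma cutoff
    if method == "mad":
        median = np.median(x)
        mad = np.median(np.abs(x - median))
        scaled = 1.4826 * mad                     # makes MAD comparable to a std
        return median - 3.0 * scaled, median + 3.0 * scaled
    raise ValueError(f"unknown outlier method: {method!r}")


data = np.array([10, 12, 11, 13, 12, 11, 10, 95], dtype=float)
low, high = clip_bounds(data, "iqr")
winsorized = np.clip(data, low, high)
print(low, high, winsorized.max())
```

On this small sample the z-score limits do not clip the value 95, because the outlier itself inflates the standard deviation; the IQR and MAD limits, which rest on quantiles, do clip it.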

Save and reuse config:

cfg.to_json("my_config.json")
cfg = MrPravinConfig.from_json("my_config.json")
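
A JSON round trip like this is straightforward to build on a dataclass. The sketch below uses a hypothetical `SketchConfig` that mirrors a few of the fields shown earlier; it illustrates the pattern and is not the `MrPravinConfig` source.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MrPravinConfig with a few of its fields."""
    random_seed: int = 42
    cv_folds: int = 5
    n_iter_search: int = 20
    outlier_method: str = "iqr"

    def to_json(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2)

    @classmethod
    def from_json(cls, path: str) -> "SketchConfig":
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))


cfg = SketchConfig(cv_folds=10, outlier_method="mad")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_config.json")
    cfg.to_json(path)
    restored = SketchConfig.from_json(path)

print(restored == cfg)
```

Storing the configuration next to a saved model makes a training run reproducible from the same seed and search settings.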

Architecture

mrpravin/
├── mrpravin/
│   ├── __init__.py          ← public API
│   ├── config.py            ← MrPravinConfig
│   ├── pipeline.py          ← pravinDA() and pravinDS()
│   ├── ml.py                ← pravinML (inference layer)
│   ├── core/
│   │   ├── loader.py        ← CSV / Excel / JSON loading
│   │   ├── profiler.py      ← column type detection
│   │   ├── cleaner.py       ← dedup, imputation, outliers
│   │   ├── encoder.py       ← OHE, frequency, boolean encoding
│   │   ├── scaler.py        ← StandardScaler / RobustScaler
│   │   └── report.py        ← report builder + JSON/HTML export
│   └── automl/
│       ├── model_selector.py ← problem detection + candidates
│       ├── tuner.py          ← RandomizedSearchCV
│       └── evaluator.py      ← metrics + feature importance
└── tests/
    └── test_mrpravin.py     ← 37 tests
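
To make the role of `encoder.py` in the tree above more concrete, here is a rough pandas sketch of one-hot, frequency, and datetime encoding. The function name `encode` and the cardinality cutoff `MAX_OHE_LEVELS` are assumptions; the library's real thresholds and column naming are not documented here.

```python
import pandas as pd

MAX_OHE_LEVELS = 2  # assumed cutoff between one-hot and frequency encoding


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical encoder: datetime parts, one-hot, or frequency encoding."""
    out = df.copy()
    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            out[f"{col}_year"] = df[col].dt.year
            out[f"{col}_month"] = df[col].dt.month
            out[f"{col}_day"] = df[col].dt.day
            out[f"{col}_dayofweek"] = df[col].dt.dayofweek
            out = out.drop(columns=col)
        elif not pd.api.types.is_numeric_dtype(df[col]):
            if df[col].nunique() <= MAX_OHE_LEVELS:
                dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
                out = pd.concat([out.drop(columns=col), dummies], axis=1)
            else:
                # Replace each category with its share of the rows.
                freq = df[col].value_counts(normalize=True)
                out[col] = df[col].map(freq)
    return out


frame = pd.DataFrame({
    "signup": pd.to_datetime(
        ["2024-01-15", "2024-02-20", "2024-02-21", "2025-03-01"]
    ),
    "plan": ["basic", "pro", "basic", "basic"],
    "city": ["Chennai", "Pune", "Delhi", "Chennai"],
})
encoded = encode(frame)
print(encoded)
```

In a leakage-free pipeline the frequency table would be computed on training rows and reused at inference time; this sketch computes it on whatever frame it is given.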

Full pipeline flow

CSV / Excel / JSON / DataFrame
        ↓
   pravinDA()
        ├── load
        ├── detect column types  (7 types)
        ├── clean                (dedup, impute, outliers, text)
        ├── encode               (OHE / frequency / boolean / datetime)
        └── returns DataFrame    ← human-readable, ML-ready
        ↓
   pravinDS()
        ├── dedup full dataset before split  (prevents X/y misalignment)
        ├── train / test split               (no leakage)
        ├── clean + encode + scale           (fit on train only)
        ├── model selection + CV tuning
        ├── evaluate on test set
        └── returns pravinML object
        ↓
   pravinML.predict()
        ├── validate schema
        ├── clean + encode + scale           (transform only, no re-fit)
        └── predict
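
The "validate schema" step at the end of the flow can be pictured as a comparison between incoming columns and what was seen during training. The sketch below is a hypothetical illustration: `SketchReport`, `learn_schema`, and `validate` are invented names, and the real `ValidationReport` and `InputSchema` may track different information.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class SketchReport:
    """Hypothetical stand-in for ValidationReport."""
    missing: list = field(default_factory=list)
    unexpected: list = field(default_factory=list)
    out_of_range: dict = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        return not (self.missing or self.out_of_range)


def learn_schema(train: pd.DataFrame) -> dict:
    """Record (min, max) for numeric training columns, None for the rest."""
    return {
        col: (train[col].min(), train[col].max())
        if pd.api.types.is_numeric_dtype(train[col]) else None
        for col in train.columns
    }


def validate(schema: dict, new: pd.DataFrame) -> SketchReport:
    report = SketchReport(
        missing=[c for c in schema if c not in new.columns],
        unexpected=[c for c in new.columns if c not in schema],
    )
    for col, bounds in schema.items():
        if bounds is None or col not in new.columns:
            continue
        low, high = bounds
        n_outside = int(((new[col] < low) | (new[col] > high)).sum())
        if n_outside:
            report.out_of_range[col] = n_outside   # rows outside training range
    return report


train = pd.DataFrame({"age": [21, 35, 58], "plan": ["a", "b", "a"]})
incoming = pd.DataFrame({"age": [30, 90], "region": ["south", "north"]})
report = validate(learn_schema(train), incoming)
print(report)
```

Counting values outside the training range is only a crude drift signal, but it catches unit mistakes and missing columns before they reach the model.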

Zero data leakage by design. Encoders and scalers are always fit on training data only, then applied via .transform() at inference time.
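
The fit-on-train, transform-at-inference rule can be demonstrated with scikit-learn's `StandardScaler`. This is a generic illustration of the principle on synthetic data, not an excerpt from the library.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # statistics come from train only
X_test_scaled = scaler.transform(X_test)         # reuse them; no re-fit

# The stored means match the training rows, not the full dataset.
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))
print(np.allclose(X_train_scaled.mean(axis=0), 0.0))
```

Calling `fit_transform` on the full dataset before splitting would let test-set statistics influence the training features, which is the leakage this design avoids.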

pravinML — Full API Reference

model.predict(X)              # predict labels / values
model.predict_proba(X)        # predict class probabilities
model.validate(X)             # schema + drift check → ValidationReport
model.evaluate(X, y)          # metrics on any labelled dataset
model.explain(top_n=20)       # feature importance as % contribution
model.summary()               # full model card printout
model.benchmark(X, n_runs=100)  # inference latency p50/p95/p99
model.save("model.pkl")       # persist with integrity checksum
mr.pravinML.load("model.pkl") # load with checksum verification
model.metrics                 # raw metrics dict
model.feature_names           # training feature list
model.problem_type            # 'regression' | 'binary_classification' | ...
model.schema                  # InputSchema — raw feature ranges
model.model_name              # winning algorithm name
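
The reference above mentions saving with an integrity checksum. One common way to do that, shown here as a sketch under assumptions, is to store a SHA-256 digest alongside the pickled payload and compare it on load. The file layout (digest, newline, payload) and both helper names are invented for this example.

```python
import hashlib
import os
import pickle
import tempfile


def save_with_checksum(obj, path):
    """Pickle obj and prepend a SHA-256 hex digest of the payload."""
    payload = pickle.dumps(obj)
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    with open(path, "wb") as fh:
        fh.write(digest + b"\n" + payload)


def load_with_checksum(path):
    """Verify the digest before unpickling; raise if the file was altered."""
    with open(path, "rb") as fh:
        digest, payload = fh.read().split(b"\n", 1)
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("checksum mismatch: file is corrupted or modified")
    return pickle.loads(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_with_checksum({"weights": [0.1, 0.2]}, path)
    restored = load_with_checksum(path)

    # Flip the last byte to simulate corruption.
    with open(path, "rb") as fh:
        data = bytearray(fh.read())
    data[-1] ^= 0xFF
    with open(path, "wb") as fh:
        fh.write(bytes(data))
    try:
        load_with_checksum(path)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(restored, tamper_detected)
```

A checksum stored in the same file detects accidental corruption, but it does not make it safe to unpickle files from untrusted sources.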

Supported Formats

| Format | Extension |
| --- | --- |
| CSV | `.csv`, `.tsv`, `.txt` |
| Excel | `.xlsx`, `.xls`, `.xlsm` |
| JSON | `.json` (records or lines) |
| DataFrame | Pass directly |
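
Dispatching on the file extension is one simple way to support the formats listed above. The following sketch uses only documented pandas readers; the `load_table` name, the delimiter choice per extension, and the first-character test for JSON Lines are assumptions rather than the behaviour of `loader.py`.

```python
import os
import tempfile

import pandas as pd

CSV_EXT = {".csv", ".tsv", ".txt"}
EXCEL_EXT = {".xlsx", ".xls", ".xlsm"}


def load_table(source):
    """Hypothetical loader: dispatch on type and file extension."""
    if isinstance(source, pd.DataFrame):
        return source.copy()
    ext = os.path.splitext(str(source))[1].lower()
    if ext in CSV_EXT:
        # Assumed delimiters; a real loader would likely sniff them.
        return pd.read_csv(source, sep="\t" if ext == ".tsv" else ",")
    if ext in EXCEL_EXT:
        return pd.read_excel(source)      # needs openpyxl for .xlsx / .xlsm
    if ext == ".json":
        with open(source, encoding="utf-8") as fh:
            first_char = fh.read(64).lstrip()[:1]
        # A leading "[" suggests a records array; otherwise assume JSON Lines.
        return pd.read_json(source, lines=(first_char != "["))
    raise ValueError(f"unsupported file type: {ext!r}")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w", encoding="utf-8") as fh:
        fh.write("a,b\n1,2\n3,4\n")
    jsonl_path = os.path.join(tmp, "data.json")
    with open(jsonl_path, "w", encoding="utf-8") as fh:
        fh.write('{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n')
    from_csv = load_table(csv_path)
    from_jsonl = load_table(jsonl_path)

print(from_csv)
print(from_jsonl)
```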

Requirements

Python ≥ 3.9
pandas ≥ 1.5
numpy ≥ 1.23
scikit-learn ≥ 1.3
scipy ≥ 1.10
openpyxl ≥ 3.1 (Excel support)

Optional:

xgboost ≥ 1.7
lightgbm ≥ 4.0
chardet ≥ 5.0 (non-UTF-8 CSV encoding detection)

Run Tests

cd mrpravin
pip install -e ".[dev]"
pytest tests/ -v

The suite's 37 tests cover the profiler, cleaner, encoder, scaler, pravinDA, pravinDS, and pravinML (predict, validate, evaluate, explain, benchmark, save/load).

Roadmap

Phase 1 — pravinDA · pravinDS · pravinML
Phase 2 — pravinAI — static pipeline compiler and anti-pattern detector
Phase 3 — pravinDL · pravinNLP — deep learning and NLP extensions

Author

Pravin MR — Data Engineer & ML Systems Builder, Chennai, India

License

Release history

This version

0.1.0

Feb 22, 2026

Download files

Source Distribution

mrpravin-0.1.0.tar.gz (38.4 kB)

Uploaded Feb 22, 2026 Source

Built Distribution

mrpravin-0.1.0-py3-none-any.whl (34.5 kB)

Uploaded Feb 22, 2026 Python 3

File details

Details for the file mrpravin-0.1.0.tar.gz.

File metadata

Download URL: mrpravin-0.1.0.tar.gz
Upload date: Feb 22, 2026
Size: 38.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for mrpravin-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `97c4db5fb0c73156cf9577bc9875bc8586de7b2f3a4bd0490bcf3595bd3b6ea3` |
| MD5 | `4340fe9ca7cd5261f964492aa2828bb0` |
| BLAKE2b-256 | `6109d2f3af29757cd9cd27a43b4deb4361cc417808713f38c199317b7c8f28e4` |

File details

Details for the file mrpravin-0.1.0-py3-none-any.whl.

File metadata

Download URL: mrpravin-0.1.0-py3-none-any.whl
Upload date: Feb 22, 2026
Size: 34.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for mrpravin-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `bc3c32011f002ad534758f5d74dcfa9b51649949330e3bf032bae3528cfb82da` |
| MD5 | `b5431c94ddbb419c706ae5918ce06434` |
| BLAKE2b-256 | `48be1b95f74be629ce7d5bea4f832b645ac6335d7acb7fe6097c8b3ee2997abf` |
