Educational descriptive analytics + statistical inference + process mining. Rich tables, diagnostic checks, hypothesis testing, and business process analysis for teaching and analysis.
BizLens v2.2.11 📊
Educational Analytics: Descriptive + Statistical Inference + Process Mining
BizLens is a Python library for business analysts, data scientists, and educators. It combines three powerful analytics domains:
- Descriptive Analytics: Rich statistical tables, summaries, and data exploration
- Statistical Inference: Hypothesis testing, confidence intervals, effect sizes
- Process Mining: Event log analysis, case metrics, bottleneck detection
Quick Start
Installation
pip install bizlens==2.2.11
Quick Example: Descriptive Analytics
import bizlens as bl
import pandas as pd
data = pd.DataFrame({'revenue': [100, 150, 200, 250], 'region': ['A', 'B', 'A', 'B']})
bl.describe(data) # Auto-detects data type, creates summary tables
Process Mining (Auto-Detection)
event_log = bl.generate_hr_onboarding_event_log(num_cases=50)
bl.describe(event_log) # Auto-detects as event log, shows case metrics & variants
Hypothesis Testing
import numpy as np
sample = np.random.normal(100, 15, size=50)
bl.inference.confidence_interval(sample) # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100) # Test vs population
Core Modules
1. describe() — Smart Descriptive Analytics
- Analyzes DataFrames with automatic statistics
- Auto-detects event logs: When columns include case_id/activity/timestamp
- Sample vs Population: Educational comparison (ddof=1 vs ddof=0)
- Rich output: Professional console tables with rich formatting
bl.describe(dataframe)
bl.describe(event_log) # Auto-detects and shows process mining metrics
2. tables — Statistical Tables
Professional tables for data summarization:
bl.tables.frequency_table(series) # Value counts with %
bl.tables.percentile_table(series) # Quartile breakdown (0,25,50,75,100)
bl.tables.contingency_table(df, 'row', 'col') # Crosstab with chi-square
bl.tables.summary_statistics(df) # Count, mean, std, min, quartiles, max
bl.tables.group_comparison(df, 'group') # ANOVA across groups
bl.tables.distribution_fit(series) # Fit to distributions (normal, exponential, etc)
bl.tables.descriptive_comparison(df1, df2) # Sample vs population tables
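As a rough illustration of what a frequency table with percentages contains, the standard-library sketch below counts values and reports each share. It is not BizLens's implementation; the helper name `frequency_rows` and the sample data are hypothetical.

```python
from collections import Counter

def frequency_rows(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1))
            for value, n in counts.most_common()]

# Hypothetical categorical column
regions = ["A", "B", "A", "B", "A", "C", "A", "B"]
rows = frequency_rows(regions)
for value, n, pct in rows:
    print(f"{value:<6}{n:>5}{pct:>8.1f}%")
```

The percent column is what the "Value counts with %" comment refers to: each count divided by the total number of observations.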
3. diagnostic — Data Quality & Statistical Diagnostics
Check data quality and identify anomalies:
bl.diagnostic.detect_outliers(series, method='iqr') # IQR, Z-score, or Isolation Forest
bl.diagnostic.normality_test(series) # Shapiro-Wilk, Anderson-Darling, KS
bl.diagnostic.correlation_analysis(df) # Pearson & Spearman with heatmap
bl.diagnostic.missing_value_analysis(df) # Missing data patterns
bl.diagnostic.duplicate_analysis(df) # Find exact duplicates
bl.diagnostic.sample_vs_population(sample, pop_mean, pop_std) # Educational t-test
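To make the IQR method concrete, here is a minimal standard-library sketch of the usual 1.5 × IQR fence rule. The function name `iqr_outliers`, the multiplier `k`, and the quartile interpolation method are assumptions for illustration; BizLens may compute quartiles differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" uses linear interpolation between the sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)

# Hypothetical revenue figures with one extreme value
revenue = [100, 150, 200, 250, 180, 220, 2000]
outliers, fences = iqr_outliers(revenue)
print(outliers, fences)
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value barely moves them, which is why the IQR rule is often preferred over Z-scores for skewed data.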
4. inference — Hypothesis Testing & Statistical Inference
Test hypotheses and estimate population parameters:
bl.inference.confidence_interval(sample, confidence=0.95) # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100) # Test sample vs population
bl.inference.two_sample_ttest(group1, group2) # Compare two groups (t-test + Mann-Whitney U)
bl.inference.paired_ttest(before, after) # Before/after testing
bl.inference.anova_test({'A': group_a, 'B': group_b, 'C': group_c}) # Multi-group comparison
bl.inference.correlation_test(x, y) # Correlation significance
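The idea behind a confidence interval for the mean can be sketched with the standard library alone. Note the substitution: this version uses a normal (z) critical value rather than the t distribution, so for small samples it is slightly narrower than a t-based interval. The function name `mean_ci_normal` is hypothetical and this is not BizLens's implementation.

```python
import math
import statistics

def mean_ci_normal(values, confidence=0.95):
    """Large-sample CI for the mean using a z critical value (not t)."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error, ddof=1
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical sample
sample = [98, 102, 95, 110, 105, 99, 101, 97, 104, 100]
low, high = mean_ci_normal(sample)
low99, high99 = mean_ci_normal(sample, confidence=0.99)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

Raising the confidence level widens the interval: more certainty about covering the true mean costs precision.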
5. process_mining — Event Log Analysis
Analyze business processes from event logs:
bl.process_mining.case_metrics(event_log) # Duration, cost, activity count per case
bl.process_mining.activity_metrics(event_log) # Frequency, duration by activity
bl.process_mining.resource_analysis(event_log) # Workload distribution
bl.process_mining.variant_discovery(event_log) # Top activity sequences (paths)
bl.process_mining.bottleneck_analysis(event_log) # Waiting time identification
bl.process_mining.rework_detection(event_log) # Repeated activities
bl.process_mining.timeline_visualization(event_log) # Interactive Gantt chart (plotly)
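Two of these metrics, case duration and variant discovery, reduce to grouping events by case and ordering them by time. The sketch below shows that idea with a small hypothetical event log and the standard library only; it does not reproduce BizLens's output format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical mini event log: (case_id, activity, timestamp)
events = [
    ("c1", "Offer Accepted", "2024-01-01 09:00"),
    ("c1", "IT Setup",       "2024-01-02 10:00"),
    ("c1", "Orientation",    "2024-01-03 09:00"),
    ("c2", "Offer Accepted", "2024-01-01 11:00"),
    ("c2", "Orientation",    "2024-01-04 11:00"),
    ("c3", "Offer Accepted", "2024-01-02 08:00"),
    ("c3", "IT Setup",       "2024-01-02 12:00"),
    ("c3", "Orientation",    "2024-01-05 08:00"),
]

# Group events by case
cases = defaultdict(list)
for case_id, activity, ts in events:
    cases[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))

durations = {}
variants = Counter()
for case_id, steps in cases.items():
    steps.sort()  # order each case's events by timestamp
    durations[case_id] = steps[-1][0] - steps[0][0]  # last event minus first
    variants[tuple(activity for _, activity in steps)] += 1

for path, count in variants.most_common():
    print(count, " -> ".join(path))
```

A variant is simply the ordered sequence of activities within one case, so counting identical sequences gives the most common paths through the process.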
Auto-Detection Requirements:
- case_id column: Uniquely identifies a case/instance
- activity column: Names the activity/step
- timestamp column: When the activity occurred
- Optional: resource, cost, or other numeric columns
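The requirements above amount to a column-name check. A minimal sketch of such a check follows; the function name `looks_like_event_log` is hypothetical, and BizLens's actual detection rule may differ (for example, it might accept aliases or ignore case).

```python
REQUIRED_COLUMNS = {"case_id", "activity", "timestamp"}

def looks_like_event_log(columns):
    """True when every required event-log column name is present."""
    return REQUIRED_COLUMNS.issubset(columns)

print(looks_like_event_log(["case_id", "activity", "timestamp", "cost"]))
print(looks_like_event_log(["revenue", "region"]))
```

With a pandas DataFrame, `df.columns` could be passed directly as the `columns` argument.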
6. quality — Data Quality Assessment
Evaluate overall data quality:
bl.quality.completeness_report(df) # Missing data per column
bl.quality.consistency_check(df) # Type mixing and format violations
bl.quality.uniqueness_analysis(df) # Cardinality and duplicates
bl.quality.data_profile(df) # Overall quality score (0-100)
bl.quality.outlier_summary(df) # Quick outlier identification
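A completeness report boils down to the share of non-missing values per column. The sketch below works on plain dictionaries so it runs without any dependencies; the helper name `completeness` and the sample records are hypothetical, and the real report may include more detail.

```python
import math

def completeness(records, columns):
    """Percent of non-missing values per column (None and NaN count as missing)."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    n = len(records)
    return {c: round(100 * sum(not missing(r.get(c)) for r in records) / n, 1)
            for c in columns}

# Hypothetical records with one missing value in each column
rows = [
    {"region": "A", "revenue": 100.0},
    {"region": None, "revenue": 150.0},
    {"region": "B", "revenue": float("nan")},
    {"region": "B", "revenue": 250.0},
]
report = completeness(rows, ["region", "revenue"])
print(report)
```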
Built-in Datasets & Generators
# Load classic datasets
bl.load_dataset('iris') # Fisher's iris dataset
bl.load_dataset('tips') # Restaurant tips
bl.load_dataset('diamonds') # Diamond prices
# Generate synthetic business data
bl.generate_sample_data(n_rows=1000)
# Generate event logs for process mining
bl.generate_hr_onboarding_event_log(num_cases=300)
bl.generate_healthcare_event_log(num_cases=250)
bl.generate_manufacturing_event_log(num_cases=200)
bl.generate_tech_support_event_log(num_cases=400)
Examples
Example 1: Descriptive Analytics with Tables
import bizlens as bl
import pandas as pd
import numpy as np
# Generate data
data = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 500),
    'revenue': np.random.gamma(2, 5000, 500),
    'satisfaction': np.random.normal(7.5, 1.5, 500).clip(1, 10),
})
# Analyze
bl.describe(data)
bl.tables.frequency_table(data['region'])
bl.tables.percentile_table(data['revenue'])
bl.tables.summary_statistics(data)
bl.diagnostic.detect_outliers(data['revenue'])
bl.quality.data_profile(data)
Run this example:
python examples/01_descriptive_analytics.py
Example 2: Process Mining Event Logs
import bizlens as bl
# Generate HR onboarding event log
event_log = bl.generate_hr_onboarding_event_log(num_cases=100)
# Analyze with auto-detection
bl.describe(event_log) # Auto-detects event log
# Process mining metrics
bl.process_mining.case_metrics(event_log)
bl.process_mining.variant_discovery(event_log, top_n=5)
bl.process_mining.bottleneck_analysis(event_log)
bl.process_mining.rework_detection(event_log)
Run this example:
python examples/02_process_mining_basics.py
Example 3: Hypothesis Testing & Inference
import bizlens as bl
import numpy as np
import pandas as pd
# Generate test data
sample = pd.Series(np.random.normal(100, 15, size=50))
control = pd.Series(np.random.normal(100, 15, size=40))
treatment = pd.Series(np.random.normal(110, 18, size=40))
# Confidence intervals
bl.inference.confidence_interval(sample, confidence=0.95)
# t-tests
bl.inference.one_sample_ttest(sample, pop_mean=100)
bl.inference.two_sample_ttest(control, treatment)
# ANOVA
bl.inference.anova_test({
    'Control': control,
    'Treatment': treatment,
})
Run this example:
python examples/03_inference_hypothesis_testing.py
Educational Features
Sample vs Population
Learn the difference between sample and population statistics:
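# Sample statistics use Bessel's correction (ddof=1); population statistics use ddof=0
# my_sample is your own numeric data (for example, a pandas Series)
bl.diagnostic.sample_vs_population(
    sample_data=my_sample,
    pop_mean=100,
    pop_std=15,
    column_name='Revenue'
)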
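The ddof distinction is easy to see on a tiny dataset. This standard-library sketch computes both versions of the standard deviation; the data values are hypothetical.

```python
import statistics

data = [100, 150, 200, 250]

sample_std = statistics.stdev(data)       # divides by n - 1 (ddof=1)
population_std = statistics.pstdev(data)  # divides by n (ddof=0)

print(f"sample std (ddof=1):     {sample_std:.4f}")
print(f"population std (ddof=0): {population_std:.4f}")
```

The sample version is always the larger of the two because dividing by n - 1 compensates for estimating the mean from the same data. The gap shrinks as n grows.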
Hypothesis Testing with Effect Sizes
Understand both statistical AND practical significance:
results = bl.inference.one_sample_ttest(sample, pop_mean=100)
print(f"p-value: {results['p_value']}") # Statistical significance
print(f"Cohen's d: {results['cohens_d']}") # Effect size (practical significance)
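The relationship between the test statistic and the effect size can be worked out by hand. The sketch below computes a one-sample t statistic and Cohen's d from their textbook definitions; the function name `one_sample_effect` is hypothetical and the p-value is omitted because the standard library has no t distribution.

```python
import math
import statistics

def one_sample_effect(values, pop_mean):
    """Return (t statistic, Cohen's d) for a one-sample comparison."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                     # sample std, ddof=1
    t_stat = (mean - pop_mean) / (sd / math.sqrt(n))  # grows with sample size
    cohens_d = (mean - pop_mean) / sd                 # independent of sample size
    return t_stat, cohens_d

# Hypothetical sample
sample = [104, 98, 110, 107, 101, 95, 112, 105]
t_stat, d = one_sample_effect(sample, pop_mean=100)
print(f"t = {t_stat:.3f}, Cohen's d = {d:.3f}")
```

Since t equals d times the square root of n, a large enough sample makes even a tiny effect statistically significant. That is why the effect size is reported alongside the p-value.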
Visualization of Distributions
table, dist_info = bl.tables.distribution_fit(data['column'])
# Returns: best-fit distribution, parameters, AIC scores
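For readers unfamiliar with AIC, the sketch below compares a normal and an exponential fit using closed-form maximum-likelihood estimates and AIC = 2k - 2 ln L. It is a simplified illustration with hypothetical data and function names; which distributions and fitting routines BizLens uses internally is not shown here.

```python
import math
import statistics

def aic_normal(x):
    """AIC of a normal fit; MLE uses the mean and the ddof=0 std."""
    n = len(x)
    sigma = statistics.pstdev(x)
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma**2) + 1)
    return 2 * 2 - 2 * log_lik  # k = 2 parameters (mu, sigma)

def aic_exponential(x):
    """AIC of an exponential fit; MLE of the rate is 1 / mean."""
    n = len(x)
    rate = 1 / statistics.fmean(x)
    log_lik = n * math.log(rate) - n  # rate * sum(x) equals n at the MLE
    return 2 * 1 - 2 * log_lik  # k = 1 parameter (rate)

symmetric = [98, 101, 99, 102, 100, 97, 103, 100]
skewed = [0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 2.0, 4.9]

for name, x in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, f"normal={aic_normal(x):.2f}", f"exponential={aic_exponential(x):.2f}")
```

The lower AIC indicates the better trade-off between fit and number of parameters, so the symmetric data favors the normal fit and the right-skewed data favors the exponential one.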
Installation & Dependencies
Minimal Install
pip install bizlens==2.2.11
Core dependencies:
- pandas ≥1.5.0 — Data manipulation
- numpy ≥1.21.0 — Numerical operations
- scipy ≥1.9.0 — Statistical tests
- statsmodels ≥0.13.0 — Advanced statistics
- scikit-learn ≥1.0.0 — Outlier detection
- matplotlib ≥3.6.0 — Static plots
- seaborn ≥0.12.0 — Statistical plots
- rich ≥13.0.0 — Beautiful console output
- plotly ≥5.0.0 — Interactive visualizations (optional)
Optional Extras
# For Jupyter/Colab
pip install bizlens[jupyter]
# For advanced correlations
pip install bizlens[stats-advanced]
# Everything
pip install bizlens[full]
Running in Different Environments
Google Colab
- Paste example code into a cell
- Run! (Auto-install handles dependencies)
Jupyter Notebook
%run examples/01_descriptive_analytics.py
VSCode / Terminal
python examples/01_descriptive_analytics.py
Anaconda
conda install -c conda-forge bizlens==2.2.11
Version History
v2.2.11 (Current)
- ✅ NEW: Statistical tables module (frequency, percentile, contingency, summary)
- ✅ NEW: Diagnostic module (outliers, normality, correlations, quality checks)
- ✅ NEW: Inference module (hypothesis testing, confidence intervals, effect sizes)
- ✅ ENHANCED: Process mining with Gantt charts, bottleneck detection, rework analysis
- ✅ NEW: Quality module (data profiling, completeness, consistency assessment)
- ✅ Fixed: narwhals.selectors fallback for compatibility
- ✅ Enhanced: Rich educational docstrings with learning notes
v2.2.10
- Initial v2.2 release with descriptive analytics and process mining
v2.2.1
- Early descriptive analytics foundation
Future Roadmap
| Version | Features | Status |
|---|---|---|
| v2.2.11 | Tables, Diagnostic, Inference, Process Mining, Quality | ✅ Current |
| v2.3.0 | Predictive Analytics (regression, classification, decision trees) | 🔄 Planned |
| v2.4.0 | Time-Series Forecasting (ARIMA, exponential smoothing) | 🔄 Planned |
| v2.5.0 | Quality & Six Sigma (Cpk, control charts, hypothesis testing) | 🔄 Planned |
| v3.0.0 | Advanced: Deep Learning pipelines, automated ML | 🔄 Future |
Performance Notes
- Handles: DataFrames up to 1M rows (pandas) or larger (polars)
- Auto-detects: Event logs automatically (looks for case_id + activity + timestamp)
- Educational: Rich output with learning notes (not optimized for speed)
- Polars Support: Via narwhals abstraction layer for fast operations
FAQ
Q: Can I use BizLens with Polars DataFrames?
A: Yes, via the narwhals compatibility layer. The same API works with both pandas and polars.
Q: Does BizLens replace scipy/statsmodels?
A: No. It is a wrapper that provides educational, easy-to-use interfaces. For advanced use, call scipy/statsmodels directly.
Q: Is it for teaching or production?
A: Both. The clean API and rich output suit education, and the library is lightweight enough for analysis pipelines, though the rich output is not optimized for speed (see Performance Notes).
Q: Can I use the event log generators for my data?
A: Yes. The templates include realistic HR, healthcare, manufacturing, and support data. Modify them for your use case.
Q: Does it support missing data?
A: Yes. Functions drop NaN values automatically. To impute instead, apply pandas .fillna() or .interpolate() before passing the data to BizLens.
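The choice between dropping and imputing matters because it changes the statistics. This standard-library sketch mirrors the two options on a plain list (with pandas, `Series.dropna()` and `Series.fillna(0)` would play the same roles); the values are hypothetical.

```python
import math
import statistics

values = [100.0, float("nan"), 200.0, 250.0, float("nan")]

# Option 1: drop missing values, then summarize
observed = [v for v in values if not math.isnan(v)]
mean_after_drop = statistics.fmean(observed)

# Option 2: impute (here with 0) before summarizing
imputed = [0.0 if math.isnan(v) else v for v in values]
mean_after_fill = statistics.fmean(imputed)

print(f"mean after dropping NaN: {mean_after_drop:.2f}")
print(f"mean after filling with 0: {mean_after_fill:.2f}")
```

Filling with a constant pulls the mean toward that constant, so decide on an imputation strategy deliberately rather than relying on the default drop behavior.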
Contributing
Issues, feature requests, and PRs welcome at: https://github.com/solutiongate-learn/bizlens
License
MIT License — Free for personal, educational, and commercial use.
Citation
If you use BizLens in research or publications:
@software{bizlens2024,
title = {BizLens: Educational Analytics for Python},
author = {Singh, Sudhanshu},
year = {2024},
url = {https://github.com/solutiongate-learn/bizlens}
}
Made with ❤️ for data analysts, teachers, and students
File details
Details for the file bizlens-2.2.11.tar.gz.
File metadata
- Download URL: bizlens-2.2.11.tar.gz
- Upload date:
- Size: 64.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da63d2d88c16a888d9373aaafd289f16d0b0ebb1b9fe2721c1d93e7c20a7d0ee |
| MD5 | cc4688b2eb4f373ef3cba377570b0969 |
| BLAKE2b-256 | bd118d069b8cd03416f760e8114fd844cb9607ea7633e7ec157533939d1c52ef |
File details
Details for the file bizlens-2.2.11-py3-none-any.whl.
File metadata
- Download URL: bizlens-2.2.11-py3-none-any.whl
- Upload date:
- Size: 61.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 897fdd52ff961eeea02e80086eaa90b1e79c69cccb7abea62c3e55bfdb5d727b |
| MD5 | c7336c69ad9222fc6ded73cf7cc36cad |
| BLAKE2b-256 | 7a009c2b33ff43c7169c3592d246a0407f3b6d0966e7cc10ef4277d784e7e4f3 |