Educational descriptive analytics + statistical inference + process mining. Rich tables, diagnostic checks, hypothesis testing, and business process analysis for teaching and analysis.

BizLens v2.2.11 📊

Educational Analytics: Descriptive + Statistical Inference + Process Mining

BizLens is a Python library for business analysts, data scientists, and educators. It combines three powerful analytics domains:

  • Descriptive Analytics: Rich statistical tables, summaries, and data exploration
  • Statistical Inference: Hypothesis testing, confidence intervals, effect sizes
  • Process Mining: Event log analysis, case metrics, bottleneck detection

Quick Start

Installation

pip install bizlens==2.2.11

3-Line Example: Descriptive Analytics

import bizlens as bl
import pandas as pd

data = pd.DataFrame({'revenue': [100, 150, 200, 250], 'region': ['A', 'B', 'A', 'B']})
bl.describe(data)  # Auto-detects data type, creates summary tables

Process Mining (Auto-Detection)

event_log = bl.generate_hr_onboarding_event_log(num_cases=50)
bl.describe(event_log)  # Auto-detects as event log, shows case metrics & variants

Hypothesis Testing

import numpy as np

sample = np.random.normal(100, 15, size=50)
bl.inference.confidence_interval(sample)  # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100)  # Test vs population

Core Modules

1. describe() — Smart Descriptive Analytics

  • Analyzes DataFrames with automatic statistics
  • Auto-detects event logs: When columns include case_id/activity/timestamp
  • Sample vs Population: Educational comparison (ddof=1 vs ddof=0)
  • Rich output: Professional console tables with rich formatting

bl.describe(dataframe)
bl.describe(event_log)  # Auto-detects and shows process mining metrics
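
The auto-detection rule described above (columns include case_id, activity, and timestamp) can be pictured with a small column check. This is only a sketch of the idea in plain pandas: the helper name `looks_like_event_log` and the case-insensitive matching are assumptions for illustration, not BizLens' actual implementation.

```python
import pandas as pd

# Hypothetical helper for illustration; not part of the BizLens API.
REQUIRED_EVENT_LOG_COLUMNS = {"case_id", "activity", "timestamp"}


def looks_like_event_log(df: pd.DataFrame) -> bool:
    """Return True when all three event-log columns are present."""
    # Assumption: column names are compared case-insensitively.
    columns = {str(col).strip().lower() for col in df.columns}
    return REQUIRED_EVENT_LOG_COLUMNS.issubset(columns)


sales = pd.DataFrame({"revenue": [100, 150], "region": ["A", "B"]})
log = pd.DataFrame({
    "case_id": [1, 1],
    "activity": ["Offer Sent", "Offer Accepted"],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-03"]),
})

print(looks_like_event_log(sales))  # False
print(looks_like_event_log(log))    # True
```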

2. tables — Statistical Tables

Professional tables for data summarization:

bl.tables.frequency_table(series)              # Value counts with %
bl.tables.percentile_table(series)             # Quartile breakdown (0,25,50,75,100)
bl.tables.contingency_table(df, 'row', 'col') # Crosstab with chi-square
bl.tables.summary_statistics(df)               # Count, mean, std, min, quartiles, max
bl.tables.group_comparison(df, 'group')       # ANOVA across groups
bl.tables.distribution_fit(series)             # Fit to distributions (normal, exponential, etc)
bl.tables.descriptive_comparison(df1, df2)    # Sample vs population tables
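
For readers who want to see what a frequency table and a contingency table with a chi-square test contain, the sketch below builds both with pandas and scipy directly. The data and column names are made up, and the output layout is not claimed to match what the BizLens functions print.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up data for illustration.
df = pd.DataFrame({
    "region": ["A", "B", "A", "B", "A", "A", "B", "B"],
    "churned": ["yes", "no", "no", "no", "yes", "no", "yes", "no"],
})

# Frequency table: counts plus percentage of the total.
counts = df["region"].value_counts()
frequency = pd.DataFrame({
    "count": counts,
    "percent": (counts / counts.sum() * 100).round(1),
})

# Contingency table and chi-square test of independence.
crosstab = pd.crosstab(df["region"], df["churned"])
chi2, p_value, dof, expected = chi2_contingency(crosstab)

print(frequency)
print(crosstab)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With counts this small the chi-square approximation is unreliable; the example only shows the mechanics.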

3. diagnostic — Data Quality & Statistical Diagnostics

Check data quality and identify anomalies:

bl.diagnostic.detect_outliers(series, method='iqr')      # IQR, Z-score, or Isolation Forest
bl.diagnostic.normality_test(series)                      # Shapiro-Wilk, Anderson-Darling, KS
bl.diagnostic.correlation_analysis(df)                    # Pearson & Spearman with heatmap
bl.diagnostic.missing_value_analysis(df)                  # Missing data patterns
bl.diagnostic.duplicate_analysis(df)                      # Find exact duplicates
bl.diagnostic.sample_vs_population(sample, pop_mean, pop_std)  # Educational t-test
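
The IQR method named above is the usual Tukey fence rule: a value is flagged when it lies more than 1.5 × IQR below the first quartile or above the third. A minimal sketch in plain pandas follows; the function name `iqr_outliers` and the multiplier parameter `k` are illustrative, and BizLens may compute or report this differently.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = series.dropna()
    q1 = clean.quantile(0.25)
    q3 = clean.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return clean[(clean < lower) | (clean > upper)]


revenue = pd.Series([100, 110, 105, 98, 102, 500])
print(iqr_outliers(revenue))  # flags only the value 500
```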

4. inference — Hypothesis Testing & Statistical Inference

Test hypotheses and estimate population parameters:

bl.inference.confidence_interval(sample, confidence=0.95)            # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100)                 # Test sample vs population
bl.inference.two_sample_ttest(group1, group2)                       # Compare two groups (t-test + Mann-Whitney U)
bl.inference.paired_ttest(before, after)                            # Before/after testing
bl.inference.anova_test({'A': group_a, 'B': group_b, 'C': group_c}) # Multi-group comparison
bl.inference.correlation_test(x, y)                                  # Correlation significance
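
As background for the confidence interval call, a 95% interval for a mean is conventionally built from the sample mean, the standard error, and a critical value of the t distribution with n - 1 degrees of freedom. The sketch below does this with numpy and scipy; `mean_confidence_interval` is a hypothetical name, and whether BizLens uses exactly this t-based formula is an assumption.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(sample, confidence=0.95):
    """t-based confidence interval for the mean, ignoring NaN values."""
    values = np.asarray(sample, dtype=float)
    values = values[~np.isnan(values)]
    n = values.size
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


rng = np.random.default_rng(42)
sample = rng.normal(100, 15, size=50)
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```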

5. process_mining — Event Log Analysis

Analyze business processes from event logs:

bl.process_mining.case_metrics(event_log)           # Duration, cost, activity count per case
bl.process_mining.activity_metrics(event_log)       # Frequency, duration by activity
bl.process_mining.resource_analysis(event_log)      # Workload distribution
bl.process_mining.variant_discovery(event_log)      # Top activity sequences (paths)
bl.process_mining.bottleneck_analysis(event_log)    # Waiting time identification
bl.process_mining.rework_detection(event_log)       # Repeated activities
bl.process_mining.timeline_visualization(event_log) # Interactive Gantt chart (plotly)

Auto-Detection Requirements:

  • case_id column: Uniquely identifies a case/instance
  • activity column: Names the activity/step
  • timestamp column: When the activity occurred
  • Optional: resource, cost, or other numeric columns
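
To make the three required columns concrete, the sketch below derives per-case metrics, variants, and repeated activities from a tiny hand-written event log using only pandas. It illustrates the concepts behind case metrics, variant discovery, and rework detection; it is not the BizLens implementation, and the sample data is invented.

```python
import pandas as pd

# Tiny invented event log with the three required columns.
event_log = pd.DataFrame({
    "case_id": ["C1", "C1", "C1", "C2", "C2", "C2", "C2"],
    "activity": ["Offer", "Paperwork", "Orientation",
                 "Offer", "Paperwork", "Paperwork", "Orientation"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-02 09:00", "2024-01-05 09:00",
        "2024-01-01 10:00", "2024-01-03 10:00", "2024-01-04 10:00",
        "2024-01-08 10:00",
    ]),
}).sort_values(["case_id", "timestamp"])

grouped = event_log.groupby("case_id")

# Case metrics: number of events and elapsed time per case.
case_metrics = pd.DataFrame({
    "n_events": grouped["activity"].count(),
    "duration": grouped["timestamp"].max() - grouped["timestamp"].min(),
})

# Variants: the ordered activity sequence of each case, counted.
variants = grouped["activity"].agg(" -> ".join).value_counts()

# Rework: activities that occur more than once within a case.
repeats = event_log.groupby(["case_id", "activity"]).size()
rework = repeats[repeats > 1]

print(case_metrics)
print(variants)
print(rework)
```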

6. quality — Data Quality Assessment

Evaluate overall data quality:

bl.quality.completeness_report(df)      # Missing data per column
bl.quality.consistency_check(df)        # Type mixing and format violations
bl.quality.uniqueness_analysis(df)      # Cardinality and duplicates
bl.quality.data_profile(df)             # Overall quality score (0-100)
bl.quality.outlier_summary(df)          # Quick outlier identification
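
A completeness report boils down to counting missing values per column. The following sketch shows that calculation, plus an exact-duplicate count, in plain pandas; the column names and the output shape are assumptions for illustration rather than the format BizLens returns.

```python
import numpy as np
import pandas as pd

# Invented data with a few gaps.
df = pd.DataFrame({
    "revenue": [100.0, np.nan, 200.0, 250.0],
    "region": ["A", "B", None, "B"],
    "units": [1, 2, 3, 4],
})

# Missing count and percentage of non-missing values per column.
completeness = pd.DataFrame({
    "missing": df.isna().sum(),
    "percent_complete": (df.notna().mean() * 100).round(1),
})

# Number of rows that exactly duplicate an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(completeness)
print(f"duplicate rows: {duplicate_rows}")
```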

Built-in Datasets & Generators

# Load classic datasets
bl.load_dataset('iris')         # Fisher's iris dataset
bl.load_dataset('tips')         # Restaurant tips
bl.load_dataset('diamonds')     # Diamond prices

# Generate synthetic business data
bl.generate_sample_data(n_rows=1000)

# Generate event logs for process mining
bl.generate_hr_onboarding_event_log(num_cases=300)
bl.generate_healthcare_event_log(num_cases=250)
bl.generate_manufacturing_event_log(num_cases=200)
bl.generate_tech_support_event_log(num_cases=400)

Examples

Example 1: Descriptive Analytics with Tables

import bizlens as bl
import pandas as pd
import numpy as np

# Generate data
data = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 500),
    'revenue': np.random.gamma(2, 5000, 500),
    'satisfaction': np.random.normal(7.5, 1.5, 500).clip(1, 10),
})

# Analyze
bl.describe(data)
bl.tables.frequency_table(data['region'])
bl.tables.percentile_table(data['revenue'])
bl.tables.summary_statistics(data)
bl.diagnostic.detect_outliers(data['revenue'])
bl.quality.data_profile(data)

Run this example:

python examples/01_descriptive_analytics.py

Example 2: Process Mining Event Logs

import bizlens as bl

# Generate HR onboarding event log
event_log = bl.generate_hr_onboarding_event_log(num_cases=100)

# Analyze with auto-detection
bl.describe(event_log)  # Auto-detects event log

# Process mining metrics
bl.process_mining.case_metrics(event_log)
bl.process_mining.variant_discovery(event_log, top_n=5)
bl.process_mining.bottleneck_analysis(event_log)
bl.process_mining.rework_detection(event_log)

Run this example:

python examples/02_process_mining_basics.py

Example 3: Hypothesis Testing & Inference

import bizlens as bl
import numpy as np
import pandas as pd

# Generate test data
sample = pd.Series(np.random.normal(100, 15, size=50))
control = pd.Series(np.random.normal(100, 15, size=40))
treatment = pd.Series(np.random.normal(110, 18, size=40))

# Confidence intervals
bl.inference.confidence_interval(sample, confidence=0.95)

# t-tests
bl.inference.one_sample_ttest(sample, pop_mean=100)
bl.inference.two_sample_ttest(control, treatment)

# ANOVA (with only two groups, one-way ANOVA is equivalent to the equal-variance t-test)
bl.inference.anova_test({
    'Control': control,
    'Treatment': treatment,
})

Run this example:

python examples/03_inference_hypothesis_testing.py

Educational Features

Sample vs Population

Learn the difference between sample and population statistics:

# my_sample holds your observed values; sample statistics use Bessel's correction (ddof=1)
bl.diagnostic.sample_vs_population(
    sample_data=my_sample,
    pop_mean=100,
    pop_std=15,
    column_name='Revenue'
)
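
The ddof=1 versus ddof=0 distinction can be checked by hand with numpy. In the sketch below the eight values have mean 5 and squared deviations summing to 32, so the population standard deviation divides by n = 8 and the sample standard deviation divides by n - 1 = 7. The numbers are invented for the example.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population formula: divide the sum of squared deviations by n.
population_std = values.std(ddof=0)

# Sample formula (Bessel's correction): divide by n - 1.
sample_std = values.std(ddof=1)

print(population_std)        # 2.0
print(round(sample_std, 3))  # 2.138
```

The sample estimate is always the larger of the two, and the gap shrinks as n grows.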

Hypothesis Testing with Effect Sizes

Understand both statistical AND practical significance:

results = bl.inference.one_sample_ttest(sample, pop_mean=100)
print(f"p-value: {results['p_value']}")      # Statistical significance
print(f"Cohen's d: {results['cohens_d']}")   # Effect size (practical significance)
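
For reference, the one-sample Cohen's d is usually defined as the difference between the sample mean and the hypothesized mean, divided by the sample standard deviation. The sketch below computes it alongside scipy's one-sample t-test; it assumes this standard definition and does not claim to reproduce how BizLens fills the `cohens_d` field.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(103, 15, size=50)
pop_mean = 100

# Statistical significance: one-sample t-test against pop_mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=pop_mean)

# Practical significance: mean difference in standard-deviation units.
cohens_d = (sample.mean() - pop_mean) / sample.std(ddof=1)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}, d = {cohens_d:.3f}")
```

The two quantities are linked by t = d × √n, which is why a tiny effect can still be statistically significant in a large sample.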

Distribution Fitting

table, dist_info = bl.tables.distribution_fit(data['column'])
# Returns: best-fit distribution, parameters, AIC scores
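
Choosing a best-fit distribution by AIC can be sketched with scipy alone: fit each candidate by maximum likelihood, compute AIC = 2k - 2·ln L, and keep the smallest. The candidate list and variable names below are assumptions for illustration; the distributions BizLens actually tries are not specified here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=5000.0, size=500)

# Assumed candidate set for the example.
candidates = {
    "normal": stats.norm,
    "exponential": stats.expon,
    "gamma": stats.gamma,
}

aic = {}
for name, dist in candidates.items():
    params = dist.fit(data)  # maximum-likelihood parameter estimates
    log_likelihood = np.sum(dist.logpdf(data, *params))
    aic[name] = 2 * len(params) - 2 * log_likelihood

best = min(aic, key=aic.get)  # lower AIC means a better trade-off
print(best, {name: round(score, 1) for name, score in aic.items()})
```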

Installation & Dependencies

Minimal Install

pip install bizlens==2.2.11

Core dependencies:

  • pandas ≥1.5.0 — Data manipulation
  • numpy ≥1.21.0 — Numerical operations
  • scipy ≥1.9.0 — Statistical tests
  • statsmodels ≥0.13.0 — Advanced statistics
  • scikit-learn ≥1.0.0 — Outlier detection
  • matplotlib ≥3.6.0 — Static plots
  • seaborn ≥0.12.0 — Statistical plots
  • rich ≥13.0.0 — Beautiful console output
  • plotly ≥5.0.0 — Interactive visualizations (optional)

Optional Extras

# For Jupyter/Colab
pip install bizlens[jupyter]

# For advanced correlations
pip install bizlens[stats-advanced]

# Everything
pip install bizlens[full]

Running in Different Environments

Google Colab

  1. Paste example code into a cell
  2. Run! (Auto-install handles dependencies)

Jupyter Notebook

%run examples/01_descriptive_analytics.py

VSCode / Terminal

python examples/01_descriptive_analytics.py

Anaconda

conda install -c conda-forge bizlens==2.2.11

Version History

v2.2.11 (Current)

  • ✅ NEW: Statistical tables module (frequency, percentile, contingency, summary)
  • ✅ NEW: Diagnostic module (outliers, normality, correlations, quality checks)
  • ✅ NEW: Inference module (hypothesis testing, confidence intervals, effect sizes)
  • ✅ ENHANCED: Process mining with Gantt charts, bottleneck detection, rework analysis
  • ✅ NEW: Quality module (data profiling, completeness, consistency assessment)
  • ✅ Fixed: narwhals.selectors fallback for compatibility
  • ✅ Enhanced: Rich educational docstrings with learning notes

v2.2.10

  • Initial v2.2 release with descriptive analytics and process mining

v2.2.1

  • Early descriptive analytics foundation

Future Roadmap

Version   Features                                                             Status
v2.2.11   Tables, Diagnostic, Inference, Process Mining, Quality               ✅ Current
v2.3.0    Predictive Analytics (regression, classification, decision trees)    🔄 Planned
v2.4.0    Time-Series Forecasting (ARIMA, exponential smoothing)               🔄 Planned
v2.5.0    Quality & Six Sigma (Cpk, control charts, hypothesis testing)        🔄 Planned
v3.0.0    Advanced: Deep Learning pipelines, automated ML                      🔄 Future

Performance Notes

  • Handles: DataFrames up to 1M rows (pandas) or larger (polars)
  • Auto-detects: Event logs automatically (looks for case_id + activity + timestamp)
  • Educational: Rich output with learning notes (not optimized for speed)
  • Polars Support: Via narwhals abstraction layer for fast operations

FAQ

Q: Can I use BizLens with Polars DataFrames?
A: Yes! The narwhals compatibility layer lets the same API work with both pandas and polars.

Q: Does BizLens replace scipy/statsmodels?
A: No! It is a wrapper that provides educational, easy-to-use interfaces. For advanced use, call scipy/statsmodels directly.

Q: Is it for teaching or production?
A: Both! The clean API and rich output suit teaching, and it is lightweight enough for analysis pipelines, although the rich output is not optimized for speed.

Q: Can I use the event log generators for my data?
A: Yes! The templates generate realistic HR, healthcare, manufacturing, and tech support event logs. Modify them for your use case.

Q: Does it support missing data?
A: Yes! Functions automatically drop NaN values. To impute instead of dropping, call pandas .fillna() or .interpolate() before passing the data in.


Contributing

Issues, feature requests, and PRs welcome at: https://github.com/solutiongate-learn/bizlens


License

MIT License — Free for personal, educational, and commercial use.


Citation

If you use BizLens in research or publications:

@software{bizlens2024,
  title = {BizLens: Educational Analytics for Python},
  author = {Singh, Sudhanshu},
  year = {2024},
  url = {https://github.com/solutiongate-learn/bizlens}
}

Made with ❤️ for data analysts, teachers, and students
