
Lightweight Python statistics library for descriptive stats and IQR-based outlier detection.

📊 StatTools

A lightweight, zero-dependency Python statistics library for learning and analysis

📢 A Note from the Author

Hi! I'm a student, and this is my first Python package. 🎓

I didn't create StatTools to compete with established libraries or to impress anyone—I built it as a learning exercise to understand how Python packages work, how to structure code properly, and how to publish to PyPI. This project helped me practice fundamental concepts like package structure, testing, documentation, and distribution.

StatTools is not a production-ready, feature-complete statistics library. It's a student project that implements basic statistical functions as a learning journey. I'm sharing it openly because someone else learning Python might find it useful, or at least see how a beginner approaches building their first package.

I plan to improve and expand it over time as I learn more. If you're a student like me, feel free to explore the code, suggest improvements, or even fork it for your own learning!

— Anannya Vyas

Acknowledgments

Special thanks to my teacher Lovnish Verma for inspiring me to take on this project. Their own package, snapmyenv, served as motivation and a reference for how to structure and publish a Python library. This wouldn't exist without their guidance and encouragement!


StatTools is a lightweight, zero-dependency statistics library designed to solve the "I need quick stats without NumPy" problem for students, educators, and developers. It provides essential descriptive statistics and outlier detection using only Python's standard library—making it perfect for learning environments, academic projects, and situations where you need reliable statistical analysis without heavy frameworks.

Because StatTools is pure Python, it runs wherever Python runs: there is no compilation step and there are no third-party dependencies to conflict with the rest of your environment.

🚀 Key Features

  • 📈 Descriptive Statistics: Calculate mean, median, and percentiles with straightforward, textbook-accurate implementations
  • 📊 Dispersion Measures: Compute Interquartile Range (IQR) for understanding data spread
  • 🔍 Outlier Detection: Identify anomalies using the widely used IQR (Tukey fence) method
  • 🛡️ Zero Dependencies: Built using only Python's standard library—install it anywhere without conflicts
  • ✅ Fully Tested: Comprehensive pytest coverage ensures reliability
  • 🪶 Lightweight: Minimal footprint, maximum clarity
  • 📚 Educational: Clean, readable code that mirrors statistical textbook definitions

📦 Installation

pip install stattools-anannya==0.1.6

⚡ Quick Start

The "Instant Analysis" Workflow

Step 1: Import and Analyze

import stattools

# Your dataset
grades = [78, 82, 85, 88, 90, 92, 95, 45, 98, 100]

# Get insights instantly
print(f"Class Average: {stattools.mean(grades):.1f}")
print(f"Median Score: {stattools.median(grades):.1f}")
print(f"Top 25% Threshold: {stattools.percentile(grades, 75):.1f}")
print(f"Score Spread (IQR): {stattools.iqr(grades):.1f}")
print(f"Outliers: {stattools.detect_outliers_iqr(grades)}")

Output:

Class Average: 85.3
Median Score: 89.0
Top 25% Threshold: 94.2
Score Spread (IQR): 11.5
Outliers: [45]

Common Use Cases

Quality Control:

from stattools import mean, iqr, detect_outliers_iqr

# Product weights in grams
weights = [500, 502, 498, 501, 503, 499, 520, 497, 500, 502]

avg_weight = mean(weights)
variability = iqr(weights)
defects = detect_outliers_iqr(weights)

print(f"Average: {avg_weight:.2f}g (IQR: {variability:.2f}g)")
print(f"Defective items: {defects}")

Financial Screening:

from stattools import percentile, detect_outliers_iqr

# Daily returns (%)
returns = [0.5, -0.3, 0.8, -0.2, 0.4, 12.5, -0.1, 0.6]

normal_range = percentile(returns, 95)
anomalies = detect_outliers_iqr(returns)

print(f"95% of returns below: {normal_range:.2f}%")
print(f"Abnormal trading days: {anomalies}")

📖 API Reference

mean(data) → float

Calculates the arithmetic mean (average) of a dataset.

Parameters:

  • data (list/tuple): Numeric values

Returns: Float representing the mean

Example:

stattools.mean([10, 20, 30, 40, 50])  # Returns: 30.0

median(data) → float

Finds the middle value in a sorted dataset. For even-length datasets, returns the average of the two middle values.

Parameters:

  • data (list/tuple): Numeric values

Returns: Float representing the median

Example:

stattools.median([1, 2, 3, 4, 5])  # Returns: 3.0
stattools.median([1, 2, 3, 4])     # Returns: 2.5
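
The rule described above can be sketched in a few lines of plain Python. This is an illustration of the textbook definition, not StatTools' actual source: the name `median_sketch` and the `ValueError` on empty input are assumptions made for this example.

```python
def median_sketch(data):
    """Return the median of a non-empty sequence of numbers."""
    if not data:
        # Assumed behaviour for this sketch; the library may differ.
        raise ValueError("median_sketch() requires at least one value")
    ordered = sorted(data)           # sorted() leaves the caller's data untouched
    mid = len(ordered) // 2
    if len(ordered) % 2:             # odd length: one middle value
        return float(ordered[mid])
    # even length: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sketch([1, 2, 3, 4, 5]))  # 3.0
print(median_sketch([1, 2, 3, 4]))     # 2.5
```

Sorting a copy matters here: the median of unsorted input such as `[5, 1, 3]` is 3, not the value that happens to sit in the middle position.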

percentile(data, p) → float

Calculates the p-th percentile using linear interpolation between closest ranks.

Parameters:

  • data (list/tuple): Numeric values
  • p (int/float): Percentile to calculate (0-100)

Returns: Float representing the percentile value

Example:

stattools.percentile([10, 20, 30, 40, 50], 75)  # Returns: 40.0
stattools.percentile([1, 2, 3, 4, 5], 50)       # Returns: 3.0 (same as median)
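
To make "linear interpolation between closest ranks" concrete, here is a minimal sketch. It assumes the fractional rank is computed as `(n - 1) * p / 100` on the sorted data, which reproduces the two examples above; whether StatTools uses exactly this convention is an assumption, and `percentile_sketch` is a hypothetical name.

```python
import math


def percentile_sketch(data, p):
    """p-th percentile using linear interpolation between closest ranks."""
    if not data:
        raise ValueError("percentile_sketch() requires at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    rank = (len(ordered) - 1) * p / 100   # fractional index (assumed convention)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:                    # rank lands exactly on a data point
        return float(ordered[lower])
    fraction = rank - lower               # how far between the two neighbours
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(percentile_sketch([10, 20, 30, 40, 50], 75))  # 40.0
print(percentile_sketch([10, 20, 30, 40], 25))      # 17.5
```

In the second call the rank is 0.75, so the result lies three quarters of the way from 10 to 20.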

iqr(data) → float

Computes the Interquartile Range (Q3 - Q1), a measure of statistical dispersion.

Parameters:

  • data (list/tuple): Numeric values

Returns: Float representing the IQR

Example:

stattools.iqr([1, 2, 3, 4, 5, 6, 7, 8, 9])  # Returns: 4.0
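
Quartile conventions differ between tools, so the IQR of the same data can differ too. The sketch below uses the standard library's `statistics.quantiles` (Python 3.8+) as an independent cross-check. Its `"inclusive"` method happens to give 4.0 for the example above; that StatTools follows the same convention is an assumption, not something this README states.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# "inclusive" treats the data as the whole population (min = 0th, max = 100th percentile)
q1_inc, _, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
iqr_inclusive = q3_inc - q1_inc

# "exclusive" (the statistics module's default) assumes the sample may not
# contain the population's extremes
q1_exc, _, q3_exc = statistics.quantiles(data, n=4, method="exclusive")
iqr_exclusive = q3_exc - q1_exc

print(iqr_inclusive)  # 4.0
print(iqr_exclusive)  # 5.0
```

When comparing IQR values across libraries, check which quartile method each one uses before treating a mismatch as a bug.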

detect_outliers_iqr(data, multiplier=1.5) → list

Identifies outliers using the IQR method. Values are considered outliers if they fall outside:

  • Lower bound: Q1 - (multiplier × IQR)
  • Upper bound: Q3 + (multiplier × IQR)

Parameters:

  • data (list/tuple): Numeric values
  • multiplier (float): Sensitivity factor (default: 1.5, standard statistical practice)

Returns: List of outlier values

Example:

data = [5, 7, 8, 10, 12, 100]
stattools.detect_outliers_iqr(data)                  # Returns: [100]
stattools.detect_outliers_iqr(data, multiplier=3.0)  # Less sensitive; still returns [100]

Interpretation:

  • multiplier=1.5 (default): Standard outlier detection
  • multiplier=3.0: Extreme outliers only
  • Lower multipliers → more sensitive (flags more values)
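
The bounds above can be worked through by hand. The sketch below is a stand-alone illustration rather than the library's code: `outliers_sketch` and `_percentile` are hypothetical names, the quartiles assume linear interpolation at rank `(n - 1) * p / 100`, and values exactly on a bound are assumed not to count as outliers.

```python
def _percentile(ordered, p):
    """Linear-interpolation percentile on already-sorted data (assumed convention)."""
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def outliers_sketch(data, multiplier=1.5):
    """Return values outside [Q1 - m*IQR, Q3 + m*IQR], in their original order."""
    ordered = sorted(data)
    q1 = _percentile(ordered, 25)
    q3 = _percentile(ordered, 75)
    spread = q3 - q1
    low = q1 - multiplier * spread
    high = q3 + multiplier * spread
    return [x for x in data if x < low or x > high]


readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 40]
# Q1 = 3.5, Q3 = 8.5, IQR = 5.0
print(outliers_sketch(readings))                  # bounds -4.0 .. 16.0 -> [17, 40]
print(outliers_sketch(readings, multiplier=3.0))  # bounds -11.5 .. 23.5 -> [40]
```

Here the multiplier changes the result: 17 is flagged at 1.5 but falls inside the wider bounds at 3.0, while 40 is flagged either way.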

🔍 What Makes StatTools Different?

Unlike heavyweight scientific computing libraries, StatTools focuses on:

Feature        | StatTools                         | NumPy/SciPy/Pandas
---------------|-----------------------------------|----------------------------
Dependencies   | None (pure Python)                | Compiled C/Fortran binaries
Install Size   | ~10 KB                            | 50-100+ MB
Learning Curve | Minimal                           | Steep
Platform Issues| None                              | Common on ARM/M1/Windows
Code Clarity   | Readable textbook implementations | Optimized C wrappers
Best For       | Learning, teaching, simple scripts| Production data science

💡 Real-World Examples

Example 1: Grade Analysis System

from stattools import mean, median, percentile, detect_outliers_iqr

class GradeAnalyzer:
    def __init__(self, scores):
        self.scores = scores
    
    def summary(self):
        # Compute the 25th percentile once rather than once per score
        q1 = percentile(self.scores, 25)
        return {
            'average': mean(self.scores),
            'median': median(self.scores),
            'top_10_percent': percentile(self.scores, 90),
            'struggling_students': [s for s in self.scores if s < q1],
            'anomalies': detect_outliers_iqr(self.scores)
        }

# Usage
analyzer = GradeAnalyzer([78, 82, 85, 88, 90, 92, 95, 45, 98, 100])
report = analyzer.summary()
print(report)

Example 2: Manufacturing Quality Dashboard

from stattools import mean, iqr, detect_outliers_iqr

def quality_check(measurements, tolerance_iqr=5.0):
    """
    Check if manufacturing process is within acceptable variability.
    """
    avg = mean(measurements)
    spread = iqr(measurements)
    defects = detect_outliers_iqr(measurements)
    
    status = "PASS" if spread <= tolerance_iqr and len(defects) == 0 else "FAIL"
    
    return {
        'status': status,
        'average': avg,
        'variability': spread,
        'defect_count': len(defects),
        'defective_items': defects
    }

# Daily production run
batch = [500.1, 499.8, 500.3, 500.0, 499.9, 500.2, 515.0]
print(quality_check(batch))
# Output (floats rounded here for readability):
# {'status': 'FAIL', 'average': 502.19, 'variability': 0.3,
#  'defect_count': 1, 'defective_items': [515.0]}

Example 3: Sports Performance Tracking

from stattools import median, percentile

# Player sprint times (seconds)
sprint_times = [10.2, 10.5, 10.3, 10.4, 10.6, 10.1, 10.5, 10.3]

typical_time = median(sprint_times)
personal_best = min(sprint_times)
consistency_target = percentile(sprint_times, 25)  # Lower is faster: the quickest 25% of runs are at or below this time

print(f"Typical Performance: {typical_time}s")
print(f"Personal Best: {personal_best}s")
print(f"Consistency Target (25th percentile): {consistency_target}s")

🧪 Running Tests

StatTools uses pytest for comprehensive testing.

Install pytest:

pip install pytest

Run all tests:

python -m pytest

Run with verbose output:

python -m pytest -v

Generate coverage report:

pip install pytest-cov
python -m pytest --cov=stattools --cov-report=html

All tests should pass ✅

📁 Project Structure

stattools/
├── stattools/
│   ├── __init__.py          # Package initialization & public API
│   ├── descriptive.py       # Mean, median, percentile functions
│   └── outliers.py          # IQR calculation & outlier detection
├── tests/
│   └── test_stattools.py    # Comprehensive test suite
├── README.md                # This documentation
├── LICENSE                  # MIT License
├── setup.py                 # Package configuration
├── .gitignore               # Git exclusions
└── requirements-dev.txt     # Development dependencies

⚠️ Limitations

  • Performance: Optimized for clarity over speed. For datasets with millions of rows, consider NumPy/Pandas.
  • Scope: Focuses on descriptive statistics. Does not include inferential statistics (t-tests, ANOVA, regression, etc.).
  • Data Types: Expects numeric data (int/float). Does not handle categorical data or timestamps.
  • Missing Data: Does not have built-in handling for NaN/None values. Clean your data first.
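
Since missing values are not handled for you, one way to clean a dataset before passing it to StatTools is sketched below. `drop_missing` is a hypothetical helper written for this example and is not part of the library.

```python
import math


def drop_missing(values):
    """Return a new list without None or NaN entries."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        # NaN only exists as a float; ints never need this check
        if isinstance(value, float) and math.isnan(value):
            continue
        cleaned.append(value)
    return cleaned


raw = [78, None, 85.5, float("nan"), 92]
clean = drop_missing(raw)
print(clean)  # [78, 85.5, 92]
```

The cleaned list can then be passed to any of the functions in the API Reference. Dropping values changes the sample size, so note how many entries were removed when reporting results.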

🗺️ Roadmap

Future enhancements under consideration:

  • Standard deviation and variance
  • Mode calculation (handling multimodal distributions)
  • Z-score outlier detection
  • Covariance and correlation
  • Summary statistics report generator
  • Support for weighted statistics
  • Basic data validation utilities

Want to see a feature? Open an issue or submit a PR!

💻 Development

Setup Development Environment

# Clone the repository
git clone https://github.com/Anannya-Vyas/my-python-library.git
cd my-python-library

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

Running Checks

# Run tests
pytest

# Check code formatting (if using Black)
black --check stattools/

# Type checking (if using mypy)
mypy stattools/

🤝 Contributing

Contributions are welcome! Whether it's bug fixes, new features, documentation improvements, or examples—your help makes StatTools better for everyone.

How to contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (pytest)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to your fork (git push origin feature/amazing-feature)
  7. Open a Pull Request

Contribution Guidelines:

  • All new functions must include docstrings and examples
  • Maintain zero-dependency philosophy (standard library only)
  • Add tests for all new functionality
  • Keep code readable and educational
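
As a rough picture of what a contribution following these guidelines might look like, here is a sketch of a new function with a docstring, an example, and pytest-style tests. `variance` is taken from the Roadmap and does not exist in StatTools today. Its name, the choice of population variance, and the `ValueError` on empty input are all assumptions made for illustration.

```python
def variance(data):
    """Return the population variance of a non-empty sequence of numbers.

    Example:
        >>> variance([2, 4, 4, 4, 5, 5, 7, 9])
        4.0
    """
    if not data:
        raise ValueError("variance() requires at least one value")
    avg = sum(data) / len(data)
    # Mean of squared deviations, using only the standard library
    return sum((x - avg) ** 2 for x in data) / len(data)


# pytest collects functions whose names start with "test_"
def test_variance_matches_textbook_example():
    assert variance([2, 4, 4, 4, 5, 5, 7, 9]) == 4.0


def test_variance_rejects_empty_input():
    try:
        variance([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

In the real repository the function and its tests would live in separate files (for example under `stattools/` and `tests/`); they are shown together here only to keep the sketch self-contained.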

🐛 Found a Bug?

Open an issue on GitHub Issues with:

  • Clear description of the problem
  • Steps to reproduce the issue
  • Expected behavior vs. actual behavior
  • Python version and operating system
  • Sample data (if applicable)

📄 Changelog

v1.0.0

  • Initial release
  • Core descriptive statistics (mean, median, percentile)
  • IQR calculation
  • IQR-based outlier detection
  • Comprehensive test coverage
  • Published on PyPI

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

You are free to use, modify, and distribute this software with proper attribution.

👩‍💻 Author

Anannya Vyas

⭐ Show Your Support

If StatTools helped you with your project, consider:

  • ⭐ Starring the repository on GitHub
  • 📢 Sharing it with classmates, colleagues, and on social media
  • 🐛 Reporting bugs to help improve the library
  • 💡 Contributing new features or documentation improvements

Made by a student learning Python package development
