Lightweight Python statistics library for descriptive stats and IQR-based outlier detection.

📊 StatTools

A lightweight, zero-dependency Python statistics library for learning and analysis

📢 A Note from the Author

Hi! I'm a student, and this is my first Python package. 🎓

I didn't create StatTools to compete with established libraries or to impress anyone—I built it as a learning exercise to understand how Python packages work, how to structure code properly, and how to publish to PyPI. This project helped me practice fundamental concepts like package structure, testing, documentation, and distribution.

StatTools is not a production-ready, feature-complete statistics library. It's a student project that implements basic statistical functions as a learning journey. I'm sharing it openly because someone else learning Python might find it useful, or at least see how a beginner approaches building their first package.

I plan to improve and expand it over time as I learn more. If you're a student like me, feel free to explore the code, suggest improvements, or even fork it for your own learning!

— Anannya Vyas

Acknowledgments

Special thanks to my teacher Lovnish Verma for inspiring me to take on this project. Their own package, snapmyenv, served as motivation and a reference for how to structure and publish a Python library. This wouldn't exist without their guidance and encouragement!

StatTools is a lightweight, zero-dependency statistics library aimed at the "I need quick stats without NumPy" problem for students, educators, and developers. It provides basic descriptive statistics and IQR-based outlier detection using only Python's standard library, which makes it a good fit for learning environments, academic projects, and small scripts where pulling in a heavy framework is not worth it.

Because StatTools is pure Python, it should run wherever a supported Python 3 interpreter runs: no compilation step and no third-party dependencies to conflict with.

🚀 Key Features

📈 Descriptive Statistics: Calculate mean, median, and percentiles with straightforward, textbook-accurate implementations
📊 Dispersion Measures: Compute Interquartile Range (IQR) for understanding data spread
🔍 Outlier Detection: Identify anomalies using the widely used IQR method (Tukey's fences)
🛡️ Zero Dependencies: Built using only Python's standard library—install it anywhere without conflicts
✅ Tested: Ships with a pytest test suite
🪶 Lightweight: Minimal footprint, maximum clarity
📚 Educational: Clean, readable code that mirrors statistical textbook definitions

📦 Installation

pip install stattools-anannya==0.1.6

⚡ Quick Start

The "Instant Analysis" Workflow

Step 1: Import and Analyze

import stattools

# Your dataset
grades = [78, 82, 85, 88, 90, 92, 95, 45, 98, 100]

# Get insights instantly
print(f"Class Average: {stattools.mean(grades):.1f}")
print(f"Median Score: {stattools.median(grades):.1f}")
print(f"Top 25% Threshold: {stattools.percentile(grades, 75):.1f}")
print(f"Score Spread (IQR): {stattools.iqr(grades):.1f}")
print(f"Outliers: {stattools.detect_outliers_iqr(grades)}")

Output:

Class Average: 85.3
Median Score: 89.0
Top 25% Threshold: 94.2
Score Spread (IQR): 11.5
Outliers: [45]

Common Use Cases

Quality Control:

from stattools import mean, iqr, detect_outliers_iqr

# Product weights in grams
weights = [500, 502, 498, 501, 503, 499, 520, 497, 500, 502]

avg_weight = mean(weights)
variability = iqr(weights)
defects = detect_outliers_iqr(weights)

print(f"Average: {avg_weight:.2f}g (IQR: {variability:.2f}g)")
print(f"Defective items: {defects}")

Financial Screening:

from stattools import percentile, detect_outliers_iqr

# Daily returns (%)
returns = [0.5, -0.3, 0.8, -0.2, 0.4, 12.5, -0.1, 0.6]

normal_range = percentile(returns, 95)
anomalies = detect_outliers_iqr(returns)

print(f"95% of returns below: {normal_range:.2f}%")
print(f"Abnormal trading days: {anomalies}")

📖 API Reference

`mean(data)` → float

Calculates the arithmetic mean (average) of a dataset.

Parameters:

data (list/tuple): Numeric values

Returns: Float representing the mean

Example:

stattools.mean([10, 20, 30, 40, 50])  # Returns: 30.0

`median(data)` → float

Finds the middle value in a sorted dataset. For even-length datasets, returns the average of the two middle values.

Parameters:

data (list/tuple): Numeric values

Returns: Float representing the median

Example:

stattools.median([1, 2, 3, 4, 5])  # Returns: 3.0
stattools.median([1, 2, 3, 4])     # Returns: 2.5
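
The definition above can be sketched in a few lines of pure Python. This is an illustration of the technique only, not the library's actual source; `median_sketch` is a hypothetical name, and raising `ValueError` on empty input is an assumption.

```python
def median_sketch(data):
    """Illustrative median: sort a copy, then take the middle value(s)."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)          # do not rely on the caller's ordering
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])  # odd length: the single middle value
    # even length: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sketch([1, 2, 3, 4, 5]))  # 3.0
print(median_sketch([4, 1, 3, 2]))     # 2.5 (input order does not matter)
```

Sorting first is the step that is easiest to forget: taking the middle elements of unsorted input gives a different, incorrect answer.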

`percentile(data, p)` → float

Calculates the p-th percentile using linear interpolation between closest ranks.

Parameters:

data (list/tuple): Numeric values
p (int/float): Percentile to calculate (0-100)

Returns: Float representing the percentile value

Example:

stattools.percentile([10, 20, 30, 40, 50], 75)  # Returns: 40.0
stattools.percentile([1, 2, 3, 4, 5], 50)       # Returns: 3.0 (same as median)
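
One common way to do linear interpolation between closest ranks is to place the p-th percentile at position `p / 100 * (n - 1)` in the sorted data. The sketch below assumes that rule, which is consistent with the two examples above but is not confirmed to be the library's exact implementation; `percentile_sketch` is a hypothetical name.

```python
def percentile_sketch(data, p):
    """Illustrative percentile using linear interpolation between ranks."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    rank = (p / 100) * (len(ordered) - 1)     # fractional index into ordered
    lower = int(rank)                         # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped at the end
    fraction = rank - lower                   # how far between the neighbours
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(percentile_sketch([10, 20, 30, 40, 50], 75))  # 40.0
print(percentile_sketch([1, 2, 3, 4, 5], 50))       # 3.0
```

Under this rule the 0th percentile is the minimum and the 100th percentile is the maximum. Other percentile conventions exist and give slightly different values on small datasets.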

`iqr(data)` → float

Computes the Interquartile Range (Q3 - Q1), a measure of statistical dispersion.

Parameters:

data (list/tuple): Numeric values

Returns: Float representing the IQR

Example:

stattools.iqr([1, 2, 3, 4, 5, 6, 7, 8, 9])  # Returns: 4.0
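
To cross-check an IQR by hand, the standard library's `statistics.quantiles` (Python 3.8+) can be used with `method="inclusive"`, which interpolates linearly between the ranks of the sorted data. Whether StatTools uses exactly this rule is an assumption; it does reproduce the example above.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three quartile cut points (Q1, Q2, Q3).
# method="inclusive" treats the minimum as the 0th percentile
# and the maximum as the 100th percentile.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q3)   # 3.0 7.0
print(q3 - q1)  # 4.0
```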

`detect_outliers_iqr(data, multiplier=1.5)` → list

Identifies outliers using the IQR method. Values are considered outliers if they fall outside:

Lower bound: Q1 - (multiplier × IQR)
Upper bound: Q3 + (multiplier × IQR)

Parameters:

data (list/tuple): Numeric values
multiplier (float): Sensitivity factor (default: 1.5, standard statistical practice)

Returns: List of outlier values

Example:

data = [5, 7, 8, 10, 12, 100]
stattools.detect_outliers_iqr(data)                  # Returns: [100]
stattools.detect_outliers_iqr(data, multiplier=3.0)  # Less sensitive; still returns [100]

Interpretation:

multiplier=1.5 (default): Standard outlier detection
multiplier=3.0: Extreme outliers only
Lower multipliers → more sensitive (flags more values)
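
The bounds above can be worked through with the standard library alone. This is a minimal sketch: `iqr_fences` and `outliers_sketch` are hypothetical helpers, not StatTools functions. The quartile rule (`method="inclusive"`) and the strict comparisons at the bounds are both assumptions.

```python
import statistics


def iqr_fences(data, multiplier=1.5):
    """Return the (lower, upper) bounds used by the IQR method."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - multiplier * spread, q3 + multiplier * spread


def outliers_sketch(data, multiplier=1.5):
    """Return the values that fall strictly outside the fences."""
    lower, upper = iqr_fences(data, multiplier)
    return [x for x in data if x < lower or x > upper]


data = [5, 7, 8, 10, 12, 100]
print(iqr_fences(data))            # (0.875, 17.875)
print(outliers_sketch(data))       # [100]
print(outliers_sketch(data, 0.5))  # [5, 100]
```

With `multiplier=0.5` the fences tighten to 5.125 and 13.625, so 5 is flagged as well. That is the "lower multiplier, more sensitive" behaviour described above.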

🔍 What Makes StatTools Different?

Unlike heavyweight scientific computing libraries, StatTools focuses on:

Feature	StatTools	NumPy/SciPy/Pandas
Dependencies	None (pure Python)	Compiled C/Fortran binaries
Install Size	~10 KB	50-100+ MB
Learning Curve	Minimal	Steep
Platform Issues	None expected (pure Python)	Possible where prebuilt wheels are unavailable
Code Clarity	Readable textbook implementations	Optimized C wrappers
Best For	Learning, teaching, simple scripts	Production data science

💡 Real-World Examples

Example 1: Grade Analysis System

from stattools import mean, median, percentile, detect_outliers_iqr

class GradeAnalyzer:
    def __init__(self, scores):
        self.scores = scores
    
    def summary(self):
        # Compute the 25th percentile once instead of once per score
        q1 = percentile(self.scores, 25)
        return {
            'average': mean(self.scores),
            'median': median(self.scores),
            'top_10_percent': percentile(self.scores, 90),
            'struggling_students': [s for s in self.scores if s < q1],
            'anomalies': detect_outliers_iqr(self.scores)
        }

# Usage
analyzer = GradeAnalyzer([78, 82, 85, 88, 90, 92, 95, 45, 98, 100])
report = analyzer.summary()
print(report)

Example 2: Manufacturing Quality Dashboard

from stattools import mean, iqr, detect_outliers_iqr

def quality_check(measurements, tolerance_iqr=5.0):
    """
    Check if manufacturing process is within acceptable variability.
    """
    avg = mean(measurements)
    spread = iqr(measurements)
    defects = detect_outliers_iqr(measurements)
    
    status = "PASS" if spread <= tolerance_iqr and len(defects) == 0 else "FAIL"
    
    return {
        'status': status,
        'average': avg,
        'variability': spread,
        'defect_count': len(defects),
        'defective_items': defects
    }

# Daily production run
batch = [500.1, 499.8, 500.3, 500.0, 499.9, 500.2, 515.0]
print(quality_check(batch))
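# Approximate output (values rounded for readability; the IQR assumes the
# linear-interpolation percentiles described in the API Reference):
# {'status': 'FAIL', 'average': 502.19, 'variability': 0.3,
#  'defect_count': 1, 'defective_items': [515.0]}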

Example 3: Sports Performance Tracking

from stattools import median, percentile

# Player sprint times (seconds)
sprint_times = [10.2, 10.5, 10.3, 10.4, 10.6, 10.1, 10.5, 10.3]

typical_time = median(sprint_times)
personal_best = min(sprint_times)
consistency_target = percentile(sprint_times, 25)  # Fastest 25% of runs (lower times are better)

print(f"Typical Performance: {typical_time:.2f}s")
print(f"Personal Best: {personal_best}s")
print(f"Consistency Target (25th percentile): {consistency_target:.2f}s")

🧪 Running Tests

StatTools uses pytest for comprehensive testing.

Install pytest:

pip install pytest

Run all tests:

python -m pytest

Run with verbose output:

python -m pytest -v

Generate coverage report:

pip install pytest-cov
python -m pytest --cov=stattools --cov-report=html

All tests should pass ✅

📁 Project Structure

stattools/
├── stattools/
│   ├── __init__.py          # Package initialization & public API
│   ├── descriptive.py       # Mean, median, percentile functions
│   └── outliers.py          # IQR calculation & outlier detection
├── tests/
│   └── test_stattools.py    # Comprehensive test suite
├── README.md                # This documentation
├── LICENSE                  # MIT License
├── setup.py                 # Package configuration
├── .gitignore               # Git exclusions
└── requirements-dev.txt     # Development dependencies

⚠️ Limitations

Performance: Optimized for clarity over speed. For datasets with millions of rows, consider NumPy/Pandas.
Scope: Focuses on descriptive statistics. Does not include inferential statistics (t-tests, ANOVA, regression, etc.).
Data Types: Expects numeric data (int/float). Does not handle categorical data or timestamps.
Missing Data: Does not have built-in handling for NaN/None values. Clean your data first.
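
Since missing values are not handled for you, one option is to filter them out before calling any StatTools function. The helper below is a hypothetical sketch and not part of the library; it assumes missing entries show up as `None` or as float NaN.

```python
import math


def drop_missing(values):
    """Return a new list without None or NaN entries (hypothetical helper)."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


raw = [500.1, None, 499.8, float("nan"), 500.3]
clean = drop_missing(raw)
print(clean)  # [500.1, 499.8, 500.3]
```

The cleaned list can then be passed to `mean`, `median`, `iqr`, and the other functions as usual.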

🗺️ Roadmap

Future enhancements under consideration:

Standard deviation and variance
Mode calculation (handling multimodal distributions)
Z-score outlier detection
Covariance and correlation
Summary statistics report generator
Support for weighted statistics
Basic data validation utilities

Want to see a feature? Open an issue or submit a PR!

💻 Development

Setup Development Environment

# Clone the repository
git clone https://github.com/Anannya-Vyas/my-python-library.git
cd my-python-library

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

Running Checks

# Run tests
pytest

# Check code formatting (if using Black)
black --check stattools/

# Type checking (if using mypy)
mypy stattools/

🤝 Contributing

Contributions are welcome! Whether it's bug fixes, new features, documentation improvements, or examples—your help makes StatTools better for everyone.

How to contribute:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Write tests for your changes
Ensure all tests pass (pytest)
Commit your changes (git commit -m 'Add amazing feature')
Push to your fork (git push origin feature/amazing-feature)
Open a Pull Request

Contribution Guidelines:

All new functions must include docstrings and examples
Maintain zero-dependency philosophy (standard library only)
Add tests for all new functionality
Keep code readable and educational
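
As a rough illustration of these guidelines, a contributed function might look like the sketch below. `value_range` is a hypothetical example, not an existing or planned StatTools function. It uses only the standard library, and its docstring examples double as doctests.

```python
import doctest


def value_range(data):
    """Return the difference between the largest and smallest value.

    Hypothetical example function, not part of StatTools.

    >>> value_range([3, 9, 4])
    6
    >>> value_range([])
    Traceback (most recent call last):
        ...
    ValueError: data must not be empty
    """
    if not data:
        raise ValueError("data must not be empty")
    return max(data) - min(data)


if __name__ == "__main__":
    # Runs the examples embedded in the docstring above
    doctest.testmod()
```

A matching pytest test in the tests/ directory would still be expected alongside the docstring examples.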

🐛 Found a Bug?

Open an issue on GitHub Issues with:

Clear description of the problem
Steps to reproduce the issue
Expected behavior vs. actual behavior
Python version and operating system
Sample data (if applicable)

📄 Changelog

v1.0.0

Initial release
Core descriptive statistics (mean, median, percentile)
IQR calculation
IQR-based outlier detection
Comprehensive test coverage
Published on PyPI

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

You are free to use, modify, and distribute this software with proper attribution.

👩‍💻 Author

Anannya Vyas

⭐ Show Your Support

If StatTools helped you with your project, consider:

⭐ Starring the repository on GitHub
📢 Sharing it with classmates, colleagues, and on social media
🐛 Reporting bugs to help improve the library
💡 Contributing new features or documentation improvements

Made by a student learning Python package development

Version 0.1.6, released Feb 11, 2026
