Lightweight Python statistics library for descriptive stats and IQR-based outlier detection.
📊 StatTools
A lightweight, zero-dependency Python statistics library for learning and analysis
📢 A Note from the Author
Hi! I'm a student, and this is my first Python package. 🎓
I didn't create StatTools to compete with established libraries or to impress anyone—I built it as a learning exercise to understand how Python packages work, how to structure code properly, and how to publish to PyPI. This project helped me practice fundamental concepts like package structure, testing, documentation, and distribution.
StatTools is not a production-ready, feature-complete statistics library. It's a student project that implements basic statistical functions as a learning journey. I'm sharing it openly because someone else learning Python might find it useful, or at least see how a beginner approaches building their first package.
I plan to improve and expand it over time as I learn more. If you're a student like me, feel free to explore the code, suggest improvements, or even fork it for your own learning!
— Anannya Vyas
Acknowledgments
Special thanks to my teacher Lovnish Verma for inspiring me to take on this project. Their own package, snapmyenv, served as motivation and a reference for how to structure and publish a Python library. This wouldn't exist without their guidance and encouragement!
StatTools is a lightweight, zero-dependency statistics library designed to solve the "I need quick stats without NumPy" problem for students, educators, and developers. It provides essential descriptive statistics and outlier detection using only Python's standard library—making it perfect for learning environments, academic projects, and situations where you need reliable statistical analysis without heavy frameworks.
Share your code with confidence, knowing StatTools works everywhere Python runs—no compilation, no platform conflicts, no dependency hell.
🚀 Key Features
- 📈 Descriptive Statistics: Calculate mean, median, and percentiles with straightforward, textbook-accurate implementations
- 📊 Dispersion Measures: Compute Interquartile Range (IQR) for understanding data spread
- 🔍 Outlier Detection: Identify anomalies using the widely used 1.5 × IQR rule (Tukey's fences)
- 🛡️ Zero Dependencies: Built using only Python's standard library—install it anywhere without conflicts
- ✅ Fully Tested: Comprehensive pytest coverage ensures reliability
- 🪶 Lightweight: Minimal footprint, maximum clarity
- 📚 Educational: Clean, readable code that mirrors statistical textbook definitions
📦 Installation
pip install stattools-anannya==0.1.6
⚡ Quick Start
The "Instant Analysis" Workflow
Step 1: Import and Analyze
import stattools
# Your dataset
grades = [78, 82, 85, 88, 90, 92, 95, 45, 98, 100]
# Get insights instantly
print(f"Class Average: {stattools.mean(grades):.1f}")
print(f"Median Score: {stattools.median(grades):.1f}")
print(f"Top 25% Threshold: {stattools.percentile(grades, 75):.1f}")
print(f"Score Spread (IQR): {stattools.iqr(grades):.1f}")
print(f"Outliers: {stattools.detect_outliers_iqr(grades)}")
Output:
Class Average: 85.3
Median Score: 89.0
Top 25% Threshold: 94.2
Score Spread (IQR): 11.5
Outliers: [45]
Common Use Cases
Quality Control:
from stattools import mean, iqr, detect_outliers_iqr
# Product weights in grams
weights = [500, 502, 498, 501, 503, 499, 520, 497, 500, 502]
avg_weight = mean(weights)
variability = iqr(weights)
defects = detect_outliers_iqr(weights)
print(f"Average: {avg_weight:.2f}g (IQR: {variability:.2f}g)")
print(f"Defective items: {defects}")
Financial Screening:
from stattools import percentile, detect_outliers_iqr
# Daily returns (%)
returns = [0.5, -0.3, 0.8, -0.2, 0.4, 12.5, -0.1, 0.6]
normal_range = percentile(returns, 95)
anomalies = detect_outliers_iqr(returns)
print(f"95% of returns below: {normal_range:.2f}%")
print(f"Abnormal trading days: {anomalies}")
📖 API Reference
mean(data) → float
Calculates the arithmetic mean (average) of a dataset.
Parameters:
- data (list/tuple): Numeric values
Returns: Float representing the mean
Example:
stattools.mean([10, 20, 30, 40, 50]) # Returns: 30.0
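For readers following along with the code, here is a minimal sketch of what a textbook mean looks like in pure Python. `mean_sketch` is an illustrative name, and raising `ValueError` on empty input is an assumption; the library's own implementation and error handling may differ. The standard library's `statistics.fmean` is used only as a cross-check.

```python
from statistics import fmean


def mean_sketch(data):
    """Arithmetic mean: sum of the values divided by their count."""
    if not data:
        # Assumption: empty input is treated as an error
        raise ValueError("data must not be empty")
    return sum(data) / len(data)


values = [10, 20, 30, 40, 50]
print(mean_sketch(values))  # 30.0
print(fmean(values))        # 30.0 (standard library cross-check)
```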
median(data) → float
Finds the middle value in a sorted dataset. For even-length datasets, returns the average of the two middle values.
Parameters:
- data (list/tuple): Numeric values
Returns: Float representing the median
Example:
stattools.median([1, 2, 3, 4, 5]) # Returns: 3.0
stattools.median([1, 2, 3, 4]) # Returns: 2.5
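The odd and even cases described above can be sketched as follows. `median_sketch` is a hypothetical stand-in written for illustration, not the library's source, and the empty-input check is an assumption.

```python
def median_sketch(data):
    """Middle value of the sorted data; mean of the two middle values if even."""
    if not data:
        # Assumption: empty input is treated as an error
        raise ValueError("data must not be empty")
    ordered = sorted(data)  # the input does not need to be pre-sorted
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sketch([1, 2, 3, 4, 5]))  # 3.0
print(median_sketch([4, 1, 3, 2]))     # 2.5
```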
percentile(data, p) → float
Calculates the p-th percentile using linear interpolation between closest ranks.
Parameters:
- data (list/tuple): Numeric values
- p (int/float): Percentile to calculate (0-100)
Returns: Float representing the percentile value
Example:
stattools.percentile([10, 20, 30, 40, 50], 75) # Returns: 40.0
stattools.percentile([1, 2, 3, 4, 5], 50) # Returns: 3.0 (same as median)
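To make "linear interpolation between closest ranks" concrete, here is a rough sketch. It assumes the fractional rank is computed as `(n - 1) * p / 100`, which is consistent with both examples above, but the library's actual code may use a different convention. `percentile_sketch` is an illustrative name.

```python
def percentile_sketch(data, p):
    """p-th percentile by linear interpolation between the two closest ranks."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    # Assumption: fractional rank on a 0-based index scale
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)                         # index at or just below the rank
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped to the end
    fraction = rank - lower                   # how far between the two values
    return float(ordered[lower] + (ordered[upper] - ordered[lower]) * fraction)


print(percentile_sketch([10, 20, 30, 40, 50], 75))  # 40.0
print(percentile_sketch([1, 2, 3, 4, 5], 50))       # 3.0
print(percentile_sketch([1, 2, 3, 4], 50))          # 2.5
```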
iqr(data) → float
Computes the Interquartile Range (Q3 - Q1), a measure of statistical dispersion.
Parameters:
- data (list/tuple): Numeric values
Returns: Float representing the IQR
Example:
stattools.iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]) # Returns: 4.0
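As a quick cross-check of the Q3 - Q1 definition, the standard library's `statistics.quantiles` can produce the quartiles. Using `method="inclusive"` is an assumption here: it reproduces the example above, but StatTools may compute its quartiles differently. `iqr_sketch` is an illustrative name.

```python
from statistics import quantiles


def iqr_sketch(data):
    """Interquartile range: third quartile minus first quartile."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return float(q3 - q1)


print(iqr_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 4.0
```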
detect_outliers_iqr(data, multiplier=1.5) → list
Identifies outliers using the IQR method. Values are considered outliers if they fall outside:
- Lower bound: Q1 - (multiplier × IQR)
- Upper bound: Q3 + (multiplier × IQR)
Parameters:
- data (list/tuple): Numeric values
- multiplier (float): Sensitivity factor (default: 1.5, standard statistical practice)
Returns: List of outlier values
Example:
data = [5, 7, 8, 10, 12, 100]
stattools.detect_outliers_iqr(data) # Returns: [100]
stattools.detect_outliers_iqr(data, multiplier=3.0) # Less sensitive, Returns: [100]
Interpretation:
- multiplier=1.5 (default): Standard outlier detection
- multiplier=3.0: Extreme outliers only
- Lower multipliers → more sensitive (flags more values)
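To see why both multipliers flag 100 in the example above, it helps to print the bounds themselves. The sketch below is a stand-in built on `statistics.quantiles` with `method="inclusive"`, which is an assumption about how the quartiles are computed. `iqr_fences` and `outliers_sketch` are hypothetical names, and values exactly on a bound are treated as normal.

```python
from statistics import quantiles


def iqr_fences(data, multiplier=1.5):
    """Return (lower bound, upper bound) for the IQR outlier rule."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - multiplier * spread, q3 + multiplier * spread


def outliers_sketch(data, multiplier=1.5):
    """Values strictly outside the bounds, in their original order."""
    lower, upper = iqr_fences(data, multiplier)
    return [x for x in data if x < lower or x > upper]


data = [5, 7, 8, 10, 12, 100]
print(iqr_fences(data))                       # (0.875, 17.875)
print(iqr_fences(data, multiplier=3.0))       # (-5.5, 24.25)
print(outliers_sketch(data))                  # [100]
print(outliers_sketch(data, multiplier=3.0))  # [100]
```

Widening the multiplier from 1.5 to 3.0 moves the upper bound from 17.875 to 24.25, which is still far below 100, so the same value is flagged either way.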
🔍 What Makes StatTools Different?
Unlike heavyweight scientific computing libraries, StatTools focuses on:
| Feature | StatTools | NumPy/SciPy/Pandas |
|---|---|---|
| Dependencies | None (pure Python) | Compiled C/Fortran binaries |
| Install Size | ~10 KB | 50-100+ MB |
| Learning Curve | Minimal | Steep |
| Platform Issues | None | Common on ARM/M1/Windows |
| Code Clarity | Readable textbook implementations | Optimized C wrappers |
| Best For | Learning, teaching, simple scripts | Production data science |
💡 Real-World Examples
Example 1: Grade Analysis System
from stattools import mean, median, percentile, detect_outliers_iqr
class GradeAnalyzer:
    def __init__(self, scores):
        self.scores = scores

    def summary(self):
        # Compute the 25th percentile once instead of once per score
        q1 = percentile(self.scores, 25)
        return {
            'average': mean(self.scores),
            'median': median(self.scores),
            'top_10_percent': percentile(self.scores, 90),
            'struggling_students': [s for s in self.scores if s < q1],
            'anomalies': detect_outliers_iqr(self.scores)
        }
# Usage
analyzer = GradeAnalyzer([78, 82, 85, 88, 90, 92, 95, 45, 98, 100])
report = analyzer.summary()
print(report)
Example 2: Manufacturing Quality Dashboard
from stattools import mean, iqr, detect_outliers_iqr
def quality_check(measurements, tolerance_iqr=5.0):
    """
    Check if manufacturing process is within acceptable variability.
    """
    avg = mean(measurements)
    spread = iqr(measurements)
    defects = detect_outliers_iqr(measurements)
    status = "PASS" if spread <= tolerance_iqr and len(defects) == 0 else "FAIL"
    return {
        'status': status,
        'average': avg,
        'variability': spread,
        'defect_count': len(defects),
        'defective_items': defects
    }
# Daily production run
batch = [500.1, 499.8, 500.3, 500.0, 499.9, 500.2, 515.0]
print(quality_check(batch))
# Approximate output (floats rounded for display):
# {'status': 'FAIL', 'average': 502.19, 'variability': 0.3,
# 'defect_count': 1, 'defective_items': [515.0]}
Example 3: Sports Performance Tracking
from stattools import median, percentile
# Player sprint times (seconds)
sprint_times = [10.2, 10.5, 10.3, 10.4, 10.6, 10.1, 10.5, 10.3]
typical_time = median(sprint_times)
personal_best = min(sprint_times)
consistency_target = percentile(sprint_times, 25) # Lower times are better, so the 25th percentile marks the fastest quarter
print(f"Typical Performance: {typical_time}s")
print(f"Personal Best: {personal_best}s")
print(f"Consistency Target (25th percentile): {consistency_target}s")
🧪 Running Tests
StatTools uses pytest for comprehensive testing.
Install pytest:
pip install pytest
Run all tests:
python -m pytest
Run with verbose output:
python -m pytest -v
Generate coverage report:
pip install pytest-cov
python -m pytest --cov=stattools --cov-report=html
All tests should pass ✅
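If you are new to pytest, a test file might look roughly like the sketch below. To keep the snippet runnable on its own, it imports `median` from the standard library's `statistics` module as a stand-in; in the project's suite the import would presumably come from `stattools` instead. The test names are illustrative and are not taken from tests/test_stattools.py.

```python
# Stand-in import so the sketch runs without StatTools installed.
# In the real test suite this would presumably be: from stattools import median
from statistics import median


def test_median_odd_length():
    # Odd number of values: the single middle value
    assert median([1, 2, 3, 4, 5]) == 3


def test_median_even_length():
    # Even number of values: average of the two middle values
    assert median([1, 2, 3, 4]) == 2.5


def test_median_does_not_require_sorted_input():
    assert median([5, 1, 4, 2, 3]) == 3
```

pytest collects any function whose name starts with `test_` and reports a failure when a plain `assert` does not hold.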
📁 Project Structure
stattools/
├── stattools/
│ ├── __init__.py # Package initialization & public API
│ ├── descriptive.py # Mean, median, percentile functions
│ └── outliers.py # IQR calculation & outlier detection
├── tests/
│ └── test_stattools.py # Comprehensive test suite
├── README.md # This documentation
├── LICENSE # MIT License
├── setup.py # Package configuration
├── .gitignore # Git exclusions
└── requirements-dev.txt # Development dependencies
⚠️ Limitations
- Performance: Optimized for clarity over speed. For datasets with millions of rows, consider NumPy/Pandas.
- Scope: Focuses on descriptive statistics. Does not include inferential statistics (t-tests, ANOVA, regression, etc.).
- Data Types: Expects numeric data (int/float). Does not handle categorical data or timestamps.
- Missing Data: Does not have built-in handling for NaN/None values. Clean your data first.
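Since missing values are not handled for you, one way to clean your data first is a small filter applied before calling any StatTools function. `drop_missing` below is a hypothetical helper written for this README, not part of the library; it only removes `None` and float NaN entries.

```python
import math


def drop_missing(values):
    """Return a new list without None entries and without float NaN entries."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        cleaned.append(value)
    return cleaned


raw = [78, None, 85.5, float("nan"), 92]
print(drop_missing(raw))  # [78, 85.5, 92]
```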
🗺️ Roadmap
Future enhancements under consideration:
- Standard deviation and variance
- Mode calculation (handling multimodal distributions)
- Z-score outlier detection
- Covariance and correlation
- Summary statistics report generator
- Support for weighted statistics
- Basic data validation utilities
Want to see a feature? Open an issue or submit a PR!
💻 Development
Setup Development Environment
# Clone the repository
git clone https://github.com/Anannya-Vyas/my-python-library.git
cd my-python-library
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
Running Checks
# Run tests
pytest
# Check code formatting (if using Black)
black --check stattools/
# Type checking (if using mypy)
mypy stattools/
🤝 Contributing
Contributions are welcome! Whether it's bug fixes, new features, documentation improvements, or examples—your help makes StatTools better for everyone.
How to contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Write tests for your changes
- Ensure all tests pass (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to your fork (git push origin feature/amazing-feature)
- Open a Pull Request
Contribution Guidelines:
- All new functions must include docstrings and examples
- Maintain zero-dependency philosophy (standard library only)
- Add tests for all new functionality
- Keep code readable and educational
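As a rough illustration of these guidelines, a contributed function could look something like the sketch below. `variance` is a hypothetical example based on the roadmap; it is not part of StatTools today, and the choice of the sample (n - 1) denominator is an assumption.

```python
def variance(data):
    """Return the sample variance of data (n - 1 denominator).

    Hypothetical example of a documented function; not part of StatTools.

    >>> variance([1, 2, 3, 4, 5])
    2.5
    """
    n = len(data)
    if n < 2:
        raise ValueError("variance needs at least two values")
    centre = sum(data) / n
    # Sum of squared deviations from the mean, standard library only
    return sum((x - centre) ** 2 for x in data) / (n - 1)


print(variance([1, 2, 3, 4, 5]))  # 2.5
```

The `>>>` lines in the docstring double as a small test: saved to a file, they can be checked with Python's built-in doctest module.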
🐛 Found a Bug?
Open an issue on GitHub Issues with:
- Clear description of the problem
- Steps to reproduce the issue
- Expected behavior vs. actual behavior
- Python version and operating system
- Sample data (if applicable)
📄 Changelog
v1.0.0
- Initial release
- Core descriptive statistics (mean, median, percentile)
- IQR calculation
- IQR-based outlier detection
- Comprehensive test coverage
- Published on PyPI
📄 License
This project is licensed under the MIT License — see the LICENSE file for details.
You are free to use, modify, and distribute this software with proper attribution.
👩‍💻 Author
Anannya Vyas
- GitHub: @Anannya-Vyas
- PyPI: stattools-anannya
- Project: my-python-library
⭐ Show Your Support
If StatTools helped you with your project, consider:
- ⭐ Starring the repository on GitHub
- 📢 Sharing it with classmates, colleagues, and on social media
- 🐛 Reporting bugs to help improve the library
- 💡 Contributing new features or documentation improvements
Made by a student learning Python package development
File details
Details for the file stattools_anannya-0.1.6.tar.gz.
File metadata
- Download URL: stattools_anannya-0.1.6.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f892506f2f94b88d3015c61463b646ca52c01d34fa67f8f0057939d45f413d5 |
| MD5 | 038a1e45ec0dac766c7d5b060481a20a |
| BLAKE2b-256 | 65e4f00f027ec96dfa08db454da24f2510ae1147b1dccf8b127d26fb9e8900ab |
File details
Details for the file stattools_anannya-0.1.6-py3-none-any.whl.
File metadata
- Download URL: stattools_anannya-0.1.6-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af0aeca6ad1397f0b01b5e8812e90ae53b79e48d772342d79e7bb372af5e3164 |
| MD5 | 2987f915205c2ec627f08e230f6e921b |
| BLAKE2b-256 | 70c44b1eecfead782fdd3038ace4934df16b66c6cd3261b8ba94e8dc69829823 |