A pure-Python module providing comprehensive statistics functions
py-stats
A pure-Python module providing comprehensive statistics functions similar to those found on scientific calculators. This package offers over 60 statistics functions for both univariate and multivariate analysis.
Educational Focus: Perfect for learning statistics, programming, and data science with clear mathematical implementations and comprehensive examples.
Project Status
Complete and Ready for Use
- Version: 2.0.0
- Functions: 60+ statistical functions implemented
- Testing: 100% test coverage with comprehensive unit tests
- Documentation: Complete with examples and mathematical formulas
- Educational Value: High-quality learning resource
- License: MIT License (permissive and open)
- Repository: GitHub
Quick Links
- PyPI Package: python-stats on PyPI
- GitHub Repository: RanaEhtashamAli/py-stats
- Documentation: README
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Table of Contents
- Project Status
- Quick Links
- Features
- Installation
- Quick Start
- Documentation
- Package Structure
- Requirements
- Testing
- Performance Notes
- Educational Focus
- License
- Contributing
- Acknowledgments
- Support
Features
Univariate Statistics
- Means: arithmetic, harmonic, geometric, and quadratic means
- Central Tendency: median, mode, midrange, trimean
- Angular Statistics: mean of angular quantities
- Averages: running and weighted averages
- Quantiles: quartiles, hinges, and quantiles
- Dispersion: variance and standard deviation (sample and population)
- Deviation Measures: average deviation and median absolute deviation (MAD)
- Shape: skewness and kurtosis
- Error: standard error of the mean
- Robust Statistics: winsorized mean, trimmed mean, interquartile range, range, coefficient of variation
- Order Statistics: percentile rank, deciles, percentiles
- Shape and Distribution: coefficient of skewness, coefficient of kurtosis, normality test
- Central Tendency Alternatives: winsorized median, midhinge
- Probability and Distribution: z-score, t-score, percentile from z-score, confidence intervals
- Time Series: moving average, exponential smoothing, seasonal decomposition
Multivariate Statistics
- Correlation: Pearson's, Spearman's, Kendall's tau, Q-correlation, point-biserial correlation
- Covariance: sample and population covariance
- Regression: simple linear, multiple linear, polynomial regression, residual analysis
- Sums: Sxx, Syy, and Sxy calculations
- Association Measures: chi-square test, Cramer's V, contingency coefficient
Installation
From PyPI (Recommended)
pip install python-stats
From GitHub
pip install git+https://github.com/RanaEhtashamAli/py-stats.git
Development Installation
git clone https://github.com/RanaEhtashamAli/py-stats.git
cd py-stats
pip install -e .
Quick Start
import py_stats as ps
# Basic statistics
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(f"Mean: {ps.arithmetic_mean(data)}")
print(f"Median: {ps.median(data)}")
print(f"Standard Deviation: {ps.standard_deviation(data)}")
# Robust statistics
print(f"IQR: {ps.interquartile_range(data)}")
print(f"Coefficient of Variation: {ps.coefficient_of_variation(data)}")
# Multivariate analysis
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"Pearson Correlation: {ps.pearson_correlation(x, y)}")
print(f"Spearman Correlation: {ps.spearman_correlation(x, y)}")
# Regression
slope, intercept, r_squared = ps.linear_regression(x, y)
print(f"Regression: y = {slope:.2f}x + {intercept:.2f}, R² = {r_squared:.3f}")
Documentation
Univariate Functions
Means
- arithmetic_mean(data): Arithmetic mean
- harmonic_mean(data): Harmonic mean
- geometric_mean(data): Geometric mean
- quadratic_mean(data): Quadratic mean (RMS)
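As a rough illustration of what these four means compute, here is a minimal standard-library sketch. The `*_sketch` names are hypothetical and this is not the package's own implementation. The harmonic and geometric versions assume strictly positive data.

```python
import math

def arithmetic_mean_sketch(data):
    # Sum of the values divided by their count.
    return sum(data) / len(data)

def harmonic_mean_sketch(data):
    # n / sum(1/x); assumes every value is strictly positive.
    return len(data) / sum(1 / x for x in data)

def geometric_mean_sketch(data):
    # exp(mean(log x)); assumes every value is strictly positive.
    return math.exp(sum(math.log(x) for x in data) / len(data))

def quadratic_mean_sketch(data):
    # Root mean square: sqrt(mean(x^2)).
    return math.sqrt(sum(x * x for x in data) / len(data))

values = [1, 2, 4]
for fn in (harmonic_mean_sketch, geometric_mean_sketch,
           arithmetic_mean_sketch, quadratic_mean_sketch):
    print(fn.__name__, fn(values))
```

For positive data the results are ordered harmonic <= geometric <= arithmetic <= quadratic, which makes a handy sanity check.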
Central Tendency
- median(data): Median
- mode(data): Mode
- midrange(data): Midrange
- trimean(data): Trimean
Quantiles
- quartiles(data): First, second, and third quartiles
- hinges(data): Lower and upper hinges
- quantile(data, q): Quantile at specified probability
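Several quantile conventions exist, and this README does not say which one the package uses. The sketch below shows one common choice, linear interpolation between the two nearest order statistics. The name `quantile_sketch` is hypothetical, and the result may differ from what `quantile(data, q)` returns.

```python
def quantile_sketch(data, q):
    # Linear interpolation between order statistics (one convention of several).
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(data)
    position = q * (len(ordered) - 1)          # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)   # clamp at the last element
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(quantile_sketch(data, 0.25))  # 3.25
print(quantile_sketch(data, 0.5))   # 5.5
```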
Dispersion
- variance(data, population=False): Variance (sample or population)
- standard_deviation(data, population=False): Standard deviation
- average_deviation(data): Average absolute deviation
- median_absolute_deviation(data): Median absolute deviation (MAD)
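The `population` flag conventionally selects the divisor: n for a population, n - 1 for a sample (Bessel's correction). The sketch below illustrates that convention with hypothetical `*_sketch` helpers. It assumes the package follows the same convention, which the README implies but does not spell out.

```python
import math

def variance_sketch(data, population=False):
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    # Population variance divides by n, sample variance by n - 1.
    return squared_deviations / (n if population else n - 1)

def standard_deviation_sketch(data, population=False):
    return math.sqrt(variance_sketch(data, population))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_sketch(data, population=True))            # 4.0
print(standard_deviation_sketch(data, population=True))  # 2.0
print(variance_sketch(data))                             # 32 / 7, about 4.571
```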
Shape
- skewness(data): Skewness coefficient
- kurtosis(data): Kurtosis coefficient
Robust Statistics
- winsorized_mean(data, percent=10.0): Winsorized mean
- trimmed_mean(data, percent=10.0): Trimmed mean
- interquartile_range(data): Interquartile range (IQR)
- range_value(data): Range (max - min)
- coefficient_of_variation(data): Coefficient of variation
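Trimming discards extreme values, while winsorizing replaces them with the nearest retained value. The sketch below assumes `percent` is removed from each tail. That is an assumption; the package may interpret the parameter differently. The helper names are hypothetical.

```python
def trimmed_mean_sketch(data, percent=10.0):
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * percent / 100)   # values cut from EACH tail (assumed meaning)
    if 2 * k >= n:
        raise ValueError("percent removes every value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

def winsorized_mean_sketch(data, percent=10.0):
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * percent / 100)
    if 2 * k >= n:
        raise ValueError("percent replaces every value")
    if k:
        # Clamp each tail to the nearest value that survives.
        ordered[:k] = [ordered[k]] * k
        ordered[n - k:] = [ordered[n - k - 1]] * k
    return sum(ordered) / n

data = [0, 2, 3, 4, 5, 6, 7, 8, 10, 100]
print(sum(data) / len(data))          # 14.5, pulled up by the outlier
print(trimmed_mean_sketch(data))      # 5.625
print(winsorized_mean_sketch(data))   # 5.7
```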
Order Statistics
- percentile_rank(data, value): Percentile rank of a value
- deciles(data): All deciles (10th, 20th, ..., 90th percentiles)
- percentile(data, p): pth percentile (0-100)
Shape and Distribution
- coefficient_of_skewness(data): Standardized skewness
- coefficient_of_kurtosis(data): Standardized kurtosis
- simple_normality_test(data): Basic normality test
Central Tendency Alternatives
- winsorized_median(data, percent=10.0): Winsorized median
- midhinge(data): Midhinge
Probability and Distribution
- z_score(data, value): Z-score of a value
- t_score(data, value): T-score of a value
- percentile_from_z_score(z): Percentile from z-score
- confidence_interval_mean(data, confidence=0.95): Confidence interval for mean
- confidence_interval_proportion(successes, total, confidence=0.95): Confidence interval for proportion
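A z-score measures how many standard deviations a value lies from the mean, and the standard normal CDF converts it to a percentile. The sketch below uses `math.erf` for the CDF. It assumes the sample standard deviation and a 0-100 percentile scale, neither of which this README confirms for the package. Both function names are hypothetical.

```python
import math

def z_score_sketch(data, value):
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (n - 1 divisor) is an assumption here.
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (value - mean) / sd

def percentile_from_z_sketch(z):
    # Standard normal CDF via the error function, scaled to 0-100.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score_sketch(data, 9)
print(z, percentile_from_z_sketch(z))
print(percentile_from_z_sketch(0.0))  # 50.0
```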
Time Series
- moving_average(data, window=3): Simple moving average
- exponential_smoothing(data, alpha=0.3): Exponential smoothing
- seasonal_decomposition_simple(data, period=4): Simple seasonal decomposition
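The two smoothing techniques can be sketched in a few lines. This is an illustration with hypothetical names, assuming the moving average returns only full windows and that smoothing is seeded with the first observation. The package may handle edges and seeding differently.

```python
def moving_average_sketch(data, window=3):
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    # One average per full window, so the result has len(data) - window + 1 items.
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]

def exponential_smoothing_sketch(data, alpha=0.3):
    # s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    smoothed = [data[0]]
    for x in data[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(moving_average_sketch([1, 2, 3, 4, 5], window=3))   # [2.0, 3.0, 4.0]
print(exponential_smoothing_sketch([10, 20], alpha=0.5))  # [10, 15.0]
```

A larger `alpha` tracks recent values more closely; a smaller one smooths more heavily.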
Specialized
- angular_mean(data, degrees=True): Mean of angular quantities
- running_average(data, window=3): Running average
- weighted_average(data, weights): Weighted average
- standard_error_mean(data): Standard error of the mean
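Angles wrap around, so an arithmetic mean can mislead: 355, 5, and 15 degrees average to 125 arithmetically, yet all three point close to 5 degrees. A common approach averages the unit vectors instead. The sketch below shows that approach under the hypothetical name `angular_mean_sketch`. Whether the package normalizes its result to [0, 360) is an assumption.

```python
import math

def angular_mean_sketch(angles, degrees=True):
    radians = [math.radians(a) for a in angles] if degrees else list(angles)
    # Sum the unit vectors, then take the direction of the resultant.
    sin_sum = sum(math.sin(a) for a in radians)
    cos_sum = sum(math.cos(a) for a in radians)
    mean = math.atan2(sin_sum, cos_sum)  # undefined in spirit if both sums are ~0
    if degrees:
        return math.degrees(mean) % 360.0
    return mean % (2.0 * math.pi)

angles = [355, 5, 15]
print(sum(angles) / len(angles))    # arithmetic mean, 125.0
print(angular_mean_sketch(angles))  # approximately 5.0
```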
Multivariate Functions
Correlation
- pearson_correlation(x, y): Pearson's correlation coefficient
- spearman_correlation(x, y): Spearman's rank correlation
- kendall_tau(x, y): Kendall's tau correlation
- q_correlation(x, y): Q-correlation coefficient
- point_biserial_correlation(x, y): Point-biserial correlation
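Pearson's coefficient measures linear association, and Spearman's is Pearson's coefficient applied to ranks. The sketch below illustrates both, using average ranks for ties. The helper names are hypothetical and the package's tie handling is not documented here, so its results may differ when ties are present.

```python
import math

def pearson_sketch(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks_sketch(values):
    # 1-based ranks; tied values share the mean of the ranks they span.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_sketch(x, y):
    return pearson_sketch(average_ranks_sketch(x), average_ranks_sketch(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_sketch(x, y))   # 6 / sqrt(60), about 0.775
print(spearman_sketch(x, y))  # 7 / sqrt(90), about 0.738
```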
Covariance
- covariance(x, y, population=False): Covariance (sample or population)
Regression
- linear_regression(x, y): Simple linear regression
- multiple_linear_regression(x_vars, y): Multiple linear regression
- polynomial_regression(x, y, degree=2): Polynomial regression
- residual_analysis(x, y): Residual analysis
Sums
- sum_xx(x): Sum of squared deviations (Sxx)
- sum_yy(y): Sum of squared deviations (Syy)
- sum_xy(x, y): Sum of cross-products (Sxy)
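These three sums are the building blocks of simple linear regression: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x), and R² = Sxy² / (Sxx * Syy). The sketch below works through that arithmetic on the Quick Start data with hypothetical helper names. It illustrates the textbook formulas, not necessarily how `linear_regression` is implemented.

```python
def deviation_sums_sketch(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy, mean_x, mean_y

def least_squares_sketch(x, y):
    sxx, syy, sxy, mean_x, mean_y = deviation_sums_sketch(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# Here Sxx = 10, Syy = 6, Sxy = 6.
slope, intercept, r_squared = least_squares_sketch(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```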
Association Measures
- chi_square_test(observed): Chi-square test of independence
- cramers_v(observed): Cramer's V association measure
- contingency_coefficient(observed): Contingency coefficient
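For a contingency table given as a list of rows, the chi-square statistic compares each observed count with the count expected under independence, and Cramer's V rescales the statistic to the range 0 to 1. The sketch below assumes that input shape and applies no continuity correction. The function names are hypothetical, and the package's return values (for example, whether it also reports a p-value) are not specified in this README.

```python
import math

def chi_square_statistic_sketch(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    return chi2

def cramers_v_sketch(observed):
    chi2 = chi_square_statistic_sketch(observed)
    total = sum(sum(row) for row in observed)
    smaller_dim = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi2 / (total * smaller_dim))

table = [[10, 20],
         [20, 10]]
print(chi_square_statistic_sketch(table))  # 20 / 3, about 6.667
print(cramers_v_sketch(table))             # about 0.333
```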
Package Structure
py-stats/
├── py_stats/
│   ├── __init__.py            # Package initialization (v2.0.0)
│   ├── univariate.py          # 40+ univariate functions
│   └── multivariate.py        # 20+ multivariate functions
├── tests/
│   ├── test_univariate.py     # Comprehensive univariate tests
│   └── test_multivariate.py   # Comprehensive multivariate tests
├── examples/
│   ├── basic_usage.py         # Basic usage examples
│   └── advanced_usage.py      # Advanced usage examples
├── setup.py                   # Package configuration
├── pyproject.toml             # Modern packaging config
├── README.md                  # This documentation
├── LICENSE                    # MIT License
└── .gitignore                 # Git ignore patterns
Requirements
- Python 3.7 or higher
- NumPy 1.19.0 or higher
License
This project is licensed under the MIT License - see the LICENSE file for details.
License Summary
The MIT License is a permissive license that allows you to:
- Use the software for any purpose
- Modify the software
- Distribute the software
- Distribute modified versions
- Use it commercially
The only requirement is that the original license and copyright notice be included in all copies or substantial portions of the software.
Contributing
We welcome contributions to make python-stats even better! Here's how you can help:
How to Contribute
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests for new functionality
- Run the tests: python -m pytest tests/ -v
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
Contribution Guidelines
- Code Style: Follow PEP 8 and use Black for formatting
- Documentation: Add docstrings for new functions
- Tests: Include tests for new functionality
- Educational Focus: Keep the educational value in mind
- Mathematical Accuracy: Ensure statistical formulas are correct
Reporting Issues
If you find a bug or have a feature request, please open an issue on GitHub with:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Python version and environment details
Testing
Run the test suite:
python -m pytest tests/ -v
Performance Notes
This package is designed for educational purposes and small to medium-sized datasets. For large-scale data analysis, consider using NumPy, SciPy, or Pandas for better performance.
Educational Focus
This package is specifically designed for educational purposes and serves as an excellent resource for:
Learning Applications
- Statistics Courses: Covers undergraduate statistics curriculum
- Programming Education: Demonstrates Python package development
- Research Methods: Practical statistical analysis tools
- Data Science: Foundation for more advanced analysis
Key Educational Features
- Mathematical Transparency: Clear implementation of statistical formulas
- Comprehensive Examples: Step-by-step usage demonstrations
- Pure Python: Easy to understand and modify
- Well-Documented: Detailed docstrings with mathematical explanations
- Test-Driven: All functions thoroughly tested for accuracy
Use Cases
- Classroom Teaching: Interactive statistics demonstrations
- Self-Learning: Understanding statistical concepts through code
- Research Projects: Small-scale statistical analysis
- Code Review: Learning Python best practices
- Portfolio Projects: Showcasing statistical programming skills
The code is well-documented with clear mathematical formulas and examples, making it ideal for educational use.
Acknowledgments
- NumPy: For efficient numerical computations
- Scientific Community: For statistical formulas and methodologies
- Open Source Community: For inspiration and best practices
- Educational Institutions: For feedback and testing
Support
If you have questions, suggestions, or need help:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: This README and function docstrings
- Examples: Check the examples/ directory for usage demonstrations
Made with ❤️ for the educational community
File details
Details for the file python_stats-1.0.1.tar.gz.
File metadata
- Download URL: python_stats-1.0.1.tar.gz
- Upload date:
- Size: 29.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d2d6d7296fab7ba5610ed7a105836161ca765ddaee3d69bc04974e9b6408f2e |
| MD5 | ddcf89d398335f130e805ea50d2562f5 |
| BLAKE2b-256 | 6aed88853019c1d09f988ea2455c82f577adb3e00ab38d5f9282d491c2dbc691 |
File details
Details for the file python_stats-1.0.1-py3-none-any.whl.
File metadata
- Download URL: python_stats-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5bc4d2c34163b25c1252a60f9069f4b072862b5bde53713fcceacc97b58da157 |
| MD5 | a1351ed0e1cc1d6e7f80683a13b05d5b |
| BLAKE2b-256 | 57199cb6cd568e54d9ab383752259755379f12bc30921dbcdcae002c44256833 |