
Statistical computations and models for use with SciPy


What it is
==========

Statsmodels is a Python package that complements SciPy for statistical computations, including descriptive statistics and estimation and inference for statistical models.

Main Features
=============

* linear regression models: Generalized least squares (including weighted least
  squares and least squares with autoregressive errors), ordinary least squares.
* glm: Generalized linear models with support for all of the one-parameter
  exponential family distributions.
* discrete: Regression with discrete dependent variables, including Logit,
  Probit, MNLogit, and Poisson, based on maximum likelihood estimators.
* rlm: Robust linear models with support for several M-estimators.
* tsa: Models for time series analysis

  - univariate time series analysis: AR, ARIMA
  - vector autoregressive models, VAR and structural VAR
  - descriptive statistics and process models for time series analysis

* nonparametric: (Univariate) kernel density estimators
* datasets: Datasets to be distributed and used for examples and in testing.
* stats: A wide range of statistical tests

  - diagnostics and specification tests
  - goodness-of-fit and normality tests
  - functions for multiple testing
  - various additional statistical tests

* iolib

  - tools for reading Stata .dta files into numpy arrays
  - printing table output to ASCII, LaTeX, and HTML

* miscellaneous models
* sandbox: statsmodels contains a sandbox folder with code in various stages of
  development and testing, which is not considered "production ready".
  This covers, among others, mixed (repeated measures) models, GARCH models,
  generalized method of moments (GMM) estimators, kernel regression, various
  extensions to scipy.stats.distributions, panel data models, generalized
  additive models, and information theoretic measures.
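To make the regression entry above concrete, here is a minimal, standard-library-only sketch of what ordinary least squares estimates for one regressor plus an intercept: the coefficients, their standard errors, and the t-values (statsmodels regression results expose these quantities as params, bse and tvalues). This is an illustration under simplifying assumptions, not the statsmodels implementation, and the function name simple_ols is hypothetical.

```python
import math


def simple_ols(x, y):
    """Hypothetical helper: OLS fit of y = b0 + b1 * x for a single regressor.

    Returns (params, bse, tvalues), each an (intercept, slope) tuple.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # residual variance with n - 2 degrees of freedom (two estimated parameters)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    if sigma2 == 0:
        raise ValueError("perfect fit: standard errors are zero")
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    params = (intercept, slope)
    bse = (se_intercept, se_slope)
    # a t-value is the estimate divided by its standard error
    tvalues = tuple(p / s for p, s in zip(params, bse))
    return params, bse, tvalues
```

For example, simple_ols([0, 1, 2, 3, 4], [1, 3, 2, 5, 4]) gives an intercept of about 1.4 and a slope of 0.8.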


Where to get it
===============

The master branch on GitHub is the most up-to-date code:

https://www.github.com/statsmodels/statsmodels

Source downloads of release tags are available on GitHub:

https://github.com/statsmodels/statsmodels/tags

Binaries and source distributions are available from PyPI:

http://pypi.python.org/pypi/statsmodels/


Installation from sources
=========================

See INSTALL.txt for requirements, or see the documentation:

http://statsmodels.sf.net/devel/install.html


License
=======

Modified BSD (3-clause)


Documentation
=============

The official documentation is hosted on SourceForge

http://statsmodels.sf.net/


Windows Help
============
The source distribution for Windows includes an HTML Help file (statsmodels.chm).
This can be opened from the Python interpreter::

    >>> import statsmodels.api as sm
    >>> sm.open_help()


Discussion and Development
==========================

Discussions take place on our mailing list.

http://groups.google.com/group/pystatsmodels

We are very interested in feedback about usability and suggestions for improvements.


Bug Reports
===========

Bug reports can be submitted to the issue tracker at

https://github.com/statsmodels/statsmodels/issues


Release History
===============

0.4.1
-----

This is a backwards compatible (according to our test suite) release with
bug fixes and code cleanup.

*Bug Fixes*

* build and distribution fixes
* lowess: correct distance calculation
* genmod: correct the CDFLink derivative
* adfuller: _autolag now calculates the optimal lag correctly
* het_arch, het_lm: fix autolag and store options
* GLSAR: fix incorrect whitening for lag > 1

*Other Changes*

* add lowess and other functions to the api and documentation
* rename the lowess module (the old import path will be removed in the next release)
* new robust sandwich covariance estimators, moved out of the sandbox
* compatibility with pandas 0.8
* new plots in statsmodels.graphics

  - ABLine plot
  - interaction plot
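The GLSAR fix above concerns whitening with autoregressive errors of order greater than one. As a rough, standard-library-only sketch (not the statsmodels code), whitening a series with AR(p) coefficients rho_1, ..., rho_p replaces each value e_t by e_t - rho_1 e_{t-1} - ... - rho_p e_{t-p}. This sketch assumes the first p observations, which lack a full set of lags, are simply dropped; the name ar_whiten is hypothetical.

```python
def ar_whiten(series, rho):
    """Hypothetical helper: apply the AR(p) filter e_t - sum_k rho_k * e_{t-k}.

    The first p = len(rho) observations are dropped because they lack a
    complete set of lagged values.
    """
    p = len(rho)
    if p == 0:
        return list(series)
    whitened = []
    for t in range(p, len(series)):
        value = series[t]
        # subtract every lag up to order p, not only the first one
        for k, coef in enumerate(rho, start=1):
            value -= coef * series[t - k]
        whitened.append(value)
    return whitened
```

With rho = [0.5, 0.25], ar_whiten([1, 2, 3, 4, 5], rho) returns [1.75, 2.0, 2.25]; a filter that only applied the first lag would return different values, which is the kind of error a lag > 1 bug produces.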


0.4.0
-----

*Main Changes and Additions*

* Added pandas dependency.
* Cython source is built automatically if Cython and a C compiler are present.
* Support use of dates in time series models.
* Improved plots

  - violin plots
  - bean plots
  - QQ plots

* Added lowess function.
* Support for pandas Series and DataFrame objects. Results instances return
  pandas objects if the models are fit using pandas objects.
* Full Python 3 compatibility.
* Fix bugs in genfromdta. Stata .dta files are converted to structured arrays
  preserving all types, and conversion is much faster now.
* Models and results are pickleable via save/load, optionally saving the model
  data.
* Kernel density estimation now uses Cython and is considerably faster.
* Diagnostics for outlier and influence statistics in OLS.
* Added El Nino Sea Surface Temperatures dataset.
* Numerous bug fixes.
* Internal code refactoring.
* Improved documentation, including examples as part of the HTML docs.
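For readers new to kernel density estimation, here is a minimal standard-library sketch of a univariate Gaussian KDE evaluated at a single point: the average of Gaussian kernels centred on each observation, scaled by a bandwidth h. The fixed, user-supplied bandwidth is an assumption of this sketch, it is not the statsmodels implementation, and the name gaussian_kde_at is hypothetical.

```python
import math


def gaussian_kde_at(point, data, bandwidth):
    """Hypothetical helper: Gaussian kernel density estimate at one point.

    f_hat(point) = 1 / (n * h) * sum_i phi((point - x_i) / h),
    where phi is the standard normal density and h is the bandwidth.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    norm_const = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for xi in data:
        u = (point - xi) / bandwidth
        total += norm_const * math.exp(-0.5 * u * u)
    return total / (n * bandwidth)
```

Evaluating on a grid of points simply repeats this sum for each grid value, which is why faster compiled code matters for larger samples.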

*Changes that break backwards compatibility*

* Deprecated the scikits namespace. The recommended import is now::

    import statsmodels.api as sm

* The signature of model.predict methods is now (params, exog, ...); previously
  they assumed that the model had been fit and omitted the params argument.
* For consistency with other multi-equation models, the parameters of MNLogit
  are now transposed.
* tools.tools.ECDF -> distributions.ECDF
* tools.tools.monotone_fn_inverter -> distributions.monotone_fn_inverter
* tools.tools.StepFunction -> distributions.StepFunction
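The ECDF class listed above moved to distributions. As a plain-Python illustration of what an empirical CDF computes (not the statsmodels class), F_n(x) is the fraction of observations less than or equal to x. The function ecdf below is hypothetical and uses a sorted copy of the data plus binary search.

```python
from bisect import bisect_right


def ecdf(data):
    """Hypothetical helper: return F_n, the empirical CDF of `data`.

    F_n(x) = (number of observations <= x) / n
    """
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("data must not be empty")

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F
```

For example, ecdf([3, 1, 2, 2])(2) returns 0.75, since three of the four observations are at most 2. The result is a step function that jumps at each distinct observation.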


0.3.1
-----

* Removed academic-only WFS dataset.
* Fix easy_install issue on Windows.

0.3.0
-----

*Changes that break backwards compatibility*

Added api.py for importing. The new convention for importing is::

    import statsmodels.api as sm

Importing from modules directly now avoids unnecessary imports and increases
the import speed if a library or user only needs specific functions.

* sandbox/output.py -> iolib/table.py
* lib/io.py -> iolib/foreign.py (Now contains Stata .dta format reader)
* family -> families
* families.links.inverse -> families.links.inverse_power
* Datasets' Load class is now load function.
* regression.py -> regression/linear_model.py
* discretemod.py -> discrete/discrete_model.py
* rlm.py -> robust/robust_linear_model.py
* glm.py -> genmod/generalized_linear_model.py
* model.py -> base/model.py
* t() method -> tvalues attribute (t() still exists but raises a warning)

*Main changes and additions*

* Numerous bugfixes.
* Time Series Analysis models (tsa)

  - Vector Autoregression Models VAR (tsa.VAR)
  - Autoregressive Models AR (tsa.AR)
  - Autoregressive Moving Average Models ARMA (tsa.ARMA), which optionally use
    Cython for Kalman filtering (use setup.py install with the option
    --with-cython)
  - Baxter-King band-pass filter (tsa.filters.bkfilter)
  - Hodrick-Prescott filter (tsa.filters.hpfilter)
  - Christiano-Fitzgerald filter (tsa.filters.cffilter)

* Improved maximum likelihood framework that uses all available scipy.optimize solvers
* Refactor of the datasets sub-package.
* Added more datasets for examples.
* Removed RPy dependency for running the test suite.
* Refactored the test suite.
* Refactored codebase/directory structure.
* Support for offset and exposure in GLM.
* Removed data_weights argument to GLM.fit for Binomial models.
* New statistical tests, especially diagnostic and specification tests
* Multiple test correction
* Generalized Method of Moments (GMM) framework in sandbox
* Improved documentation
* and other additions
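The multiple test correction entry refers to adjusting p-values when many hypotheses are tested at once. Below is a standard-library sketch of one such procedure, the Holm step-down method. Current statsmodels releases provide several methods through statsmodels.stats.multitest.multipletests, but this code is an independent illustration and the names holm_adjust and holm_reject are hypothetical.

```python
def holm_adjust(pvalues):
    """Hypothetical helper: Holm step-down adjusted p-values.

    With m tests sorted so that p_(1) <= ... <= p_(m), the adjusted value for
    rank i is max_{j <= i} min(1, (m - j + 1) * p_(j)); results are returned
    in the original order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank is 0-based, so the factor is m - rank
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # enforce monotonicity: an adjusted p-value never drops below earlier ones
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def holm_reject(pvalues, alpha=0.05):
    """Reject hypothesis i when its Holm-adjusted p-value is <= alpha."""
    return [p_adj <= alpha for p_adj in holm_adjust(pvalues)]
```

For p-values [0.01, 0.04, 0.03], the adjusted values are about [0.03, 0.06, 0.06], so at alpha = 0.05 only the first hypothesis is rejected, even though all three raw p-values are below 0.05.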


0.2.0
-----

*Main changes*

* renames for more consistency:

  - RLM.fitted_values -> RLM.fittedvalues
  - GLMResults.resid_dev -> GLMResults.resid_deviance

* GLMResults, RegressionResults: lazy calculations, converting attributes to
  properties with _cache
* fix tests to run without rpy
* expanded examples in examples directory
* add PyDTA to lib.io: functions for reading Stata .dta binary files and
  converting them to numpy arrays
* made tools.categorical much more robust
* add_constant now takes a prepend argument
* fix GLS to work with only a one column design
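The lazy calculations mentioned for GLMResults and RegressionResults follow a general pattern: a result is computed the first time it is accessed and then cached. The sketch below shows that pattern with the standard library's functools.cached_property (Python 3.8+) on a hypothetical ToyResults class. statsmodels uses its own caching decorators, so treat this only as an illustration of the idea.

```python
from functools import cached_property


class ToyResults:
    """Hypothetical results class: residuals are computed on first access only."""

    def __init__(self, y, fitted):
        self.y = list(y)
        self.fitted = list(fitted)
        self.n_computations = 0  # counts how often the "expensive" step runs

    @cached_property
    def resid(self):
        self.n_computations += 1
        return [yi - fi for yi, fi in zip(self.y, self.fitted)]

    @cached_property
    def ssr(self):
        # sum of squared residuals, built on the cached residuals
        return sum(r * r for r in self.resid)
```

Accessing ssr triggers the residual computation once; later accesses to resid or ssr reuse the stored values, so results objects stay cheap to create even when many statistics are available.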

*New*

* add four new datasets

  - A dataset from the American National Election Studies (1996)
  - Grunfeld (1950) investment data
  - Spector and Mazzeo (1980) program effectiveness data
  - A US macroeconomic dataset

* add four new maximum likelihood estimators for models with a discrete
  dependent variable, with examples

  - Logit
  - Probit
  - MNLogit (multinomial logit)
  - Poisson
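The Logit model listed above is estimated by maximum likelihood. As a minimal standard-library sketch, not the statsmodels optimizer, the following applies Newton-Raphson to a logit with an intercept and one regressor. The function name fit_logit, the zero starting values and the convergence tolerance are assumptions of this sketch.

```python
import math


def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Hypothetical helper: MLE of P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))).

    Uses Newton-Raphson on the log-likelihood; returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # score (gradient) and information matrix entries
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("information matrix is singular")
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1
    raise RuntimeError("Newton-Raphson did not converge")
```

With a binary regressor the fit reproduces the group log-odds: for x = [0, 0, 0, 0, 1, 1, 1, 1] and y = [0, 0, 0, 1, 0, 1, 1, 1], the estimates are b0 = log(1/3) and b1 = 2 log(3). Perfectly separable data has no finite MLE, and this sketch would then fail to converge.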

*Sandbox*

* add qqplot in sandbox.graphics
* add sandbox.tsa (time series analysis) and sandbox.regression (anova)
* add principal component analysis in sandbox.tools
* add Seemingly Unrelated Regression (SUR) and Two-Stage Least Squares
  for systems of equations in sandbox.sysreg.Sem2SLS
* add restricted least squares (RLS)
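Two-stage least squares, added to the sandbox above for systems of equations, reduces in the simplest case (one endogenous regressor, one instrument, plus an intercept) to the instrumental-variables estimate slope = cov(z, y) / cov(z, x). The standard-library sketch below covers only that single-equation, just-identified case, not the Sem2SLS class, and iv_estimate is a hypothetical name.

```python
def iv_estimate(z, x, y):
    """Hypothetical helper: just-identified IV / 2SLS estimate of y = b0 + b1*x + u.

    The instrument z must be correlated with x and (by assumption)
    uncorrelated with the error u. Returns (b0, b1).
    """
    n = len(z)
    if len(x) != n or len(y) != n:
        raise ValueError("z, x and y must have the same length")
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    # sample covariances up to the common 1/n factor, which cancels in the ratio
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    if szx == 0:
        raise ValueError("instrument is uncorrelated with x")
    b1 = szy / szx
    b0 = y_bar - b1 * x_bar
    return b0, b1
```

With z = [1, 2, 3, 4], x = [1, 3, 2, 5] and y = [4, 6, 4, 12] (built as y = 1 + 2x + u with u = [1, -1, -1, 1], which is uncorrelated with z but not with x), iv_estimate recovers (1.0, 2.0), while an OLS slope of y on x would be about 2.11.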


0.1.0b1
-------
* initial release



Download files

Source Distributions

* statsmodels-0.4.1.zip (4.4 MB)
* statsmodels-0.4.1.tar.gz (4.1 MB)

Built Distributions

* statsmodels-0.4.1.win-amd64-py3.2.exe (3.5 MB)
* statsmodels-0.4.1.win-amd64-py2.7.exe (3.5 MB)
* statsmodels-0.4.1.win-amd64-py2.6.exe (3.5 MB)
* statsmodels-0.4.1.win32-py3.2.exe (3.5 MB)
* statsmodels-0.4.1.win32-py2.7.exe (3.5 MB)
* statsmodels-0.4.1.win32-py2.6.exe (3.5 MB)

File details

Details for the file statsmodels-0.4.1.zip.

File metadata

  • Download URL: statsmodels-0.4.1.zip
  • Size: 4.4 MB
  • Tags: Source

File hashes

Hashes for statsmodels-0.4.1.zip
Algorithm Hash digest
SHA256 4ab595780eb1fa3725a4b585febe5c13d26b3edbec40f838b53703464f584969
MD5 7f8e5849c90121a5901ec18b91555167
BLAKE2b-256 ab4800a9dd2d57ec60410e38bc4aeb7de975415089ed275a01041f07e34d4b5a


File details

Details for the file statsmodels-0.4.1.tar.gz.

File metadata

  • Download URL: statsmodels-0.4.1.tar.gz
  • Size: 4.1 MB
  • Tags: Source

File hashes

Hashes for statsmodels-0.4.1.tar.gz
Algorithm Hash digest
SHA256 c1c959edb7314a132b65239b5ece103f0c1cb0fc1cf7d3a538040cf53dbcfd0e
MD5 7aca703462c90676faa90a1a823f4ad7
BLAKE2b-256 92dfa6be8bc880acdb2b05c1f8854d1c1885f34f17988b8014c00772cc861b9f


File details

Details for the file statsmodels-0.4.1.win-amd64-py3.2.exe.

File hashes

Hashes for statsmodels-0.4.1.win-amd64-py3.2.exe
Algorithm Hash digest
SHA256 7c3a780798e7422d460b48f041936e6dbdd65ba45a14d7349c26bc72088d49d3
MD5 4e144dff239f345a43792aea124db0bf
BLAKE2b-256 bf8880f17534eb38a56bba6e6828f3f4586f889b40a0d5607f13e6c400a42613


File details

Details for the file statsmodels-0.4.1.win-amd64-py2.7.exe.

File hashes

Hashes for statsmodels-0.4.1.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 86c043c53ec38dceae6619cb9814a5ab337932212393fff7da90841e5abc4626
MD5 7f15eaf568dda6fdaa213df16c8b5a61
BLAKE2b-256 b5416645e18c904e5c5fd8d10f20a3a2db00e5c8a32fb5ba0aa9b8d0c534f6ac


File details

Details for the file statsmodels-0.4.1.win-amd64-py2.6.exe.

File hashes

Hashes for statsmodels-0.4.1.win-amd64-py2.6.exe
Algorithm Hash digest
SHA256 fd5d7e6143cde02d9fa5e0baf3ce9b3741462f5fee48f05518aebaed799d519d
MD5 0b7921b7c79ccbadb2442028c79cfdb4
BLAKE2b-256 64cd4f0d8d7d50c4116503f8cffae15649e6a22b677e477163cefd8481cf8706


File details

Details for the file statsmodels-0.4.1.win32-py3.2.exe.

File hashes

Hashes for statsmodels-0.4.1.win32-py3.2.exe
Algorithm Hash digest
SHA256 7adb618f23f9659138b0d648edf083359c5339ea5fb0c8ed60a5d28a6e9f7199
MD5 8d3c209da20093b5ef0d70b6117f1a23
BLAKE2b-256 803202e307002de2d25123bfe8074a978a770812fec7d763bc7ea6a9bb9aa502


File details

Details for the file statsmodels-0.4.1.win32-py2.7.exe.

File hashes

Hashes for statsmodels-0.4.1.win32-py2.7.exe
Algorithm Hash digest
SHA256 62e8cbe3f6b3496cc345cfce7c8e731584a9c374baf04807c89195dfaa3f9692
MD5 6afd5a4d46b48ce25fef3139d99efb07
BLAKE2b-256 f6274980cfa781f1b73912940dff6a9090e9ade8dcd4ea390009248f63996f21


File details

Details for the file statsmodels-0.4.1.win32-py2.6.exe.

File hashes

Hashes for statsmodels-0.4.1.win32-py2.6.exe
Algorithm Hash digest
SHA256 64019468fdcf1471044d961e186e11139ac35443c9dc5442861d8a1a3258ca08
MD5 be2b68bb241c5e3b1292405ad931061b
BLAKE2b-256 c6f3c4b01588e8c1ffa70b74d1a84a050202c1da6a1a5b834db0d6a9d01f1035

