

Specification Curve


Specification Curve is a Python (3.6+) package that performs specification curve analysis.

Quickstart

You can try this package right now in Google Colab.

Running

from specification_curve import specification_curve as specy
from specification_curve import example as scdata
df = scdata.load_example_data1()
y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group2'])
sc.fit()
sc.plot()

produces

[Figure: the resulting specification curve plot]

Grey squares (black lines when there are many specifications) show whether a variable is included in a specification. Blue markers and error bars indicate coefficients that are significant at the 0.05 level.
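The significance encoding can be read as a two-sided test at the 0.05 level. The sketch below is an assumption about how such a rule might work, not the package's own code: it uses a normal approximation and a hypothetical helper, `is_significant`, that takes a coefficient and its standard error.

```python
from statistics import NormalDist

def is_significant(coef: float, std_err: float, alpha: float = 0.05) -> bool:
    """Hypothetical helper: True if the (1 - alpha) confidence interval excludes zero.

    Uses a normal approximation; the package itself may rely on the
    p-values or intervals reported by statsmodels instead.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    lower = coef - z_crit * std_err
    upper = coef + z_crit * std_err
    return lower > 0 or upper < 0

print(is_significant(0.8, 0.3))  # True: interval is roughly (0.21, 1.39)
print(is_significant(0.1, 0.2))  # False: interval is roughly (-0.29, 0.49)
```

Checking whether the interval excludes zero is equivalent to a two-sided z-test at the same level, which is why a single interval per specification is enough to colour its marker.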

Here’s another example:

from specification_curve import specification_curve as specy
import numpy as np
import pandas as pd
n_samples = 300
np.random.seed(1332)
x_1 = np.random.random(size=n_samples)
x_2 = np.random.random(size=n_samples)
x_3 = np.random.random(size=n_samples)
x_4 = np.random.randint(2, size=n_samples)
y = (0.8*x_1 + 0.1*x_2 + 0.5*x_3 + x_4*0.6 +
     2*np.random.randn(n_samples))
df = pd.DataFrame([x_1, x_2, x_3, x_4, y],
                  ['x_1', 'x_2', 'x_3', 'x_4', 'y']).T
# Set x_4 as a categorical variable
df['x_4'] = df['x_4'].astype('category')
sc = specy.SpecificationCurve(df, 'y', 'x_1', ['x_2', 'x_3', 'x_4'],
                              cat_expand=['x_4'])
sc.fit()
sc.plot()
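For reference, the simulated outcome in this example is generated as follows, where x_1, x_2 and x_3 are uniform on [0, 1), x_4 is 0 or 1 with equal probability, and ε is standard normal noise from np.random.randn. Because x_1 is drawn independently of the other regressors, omitting any of them should not bias its coefficient, so the estimates across specifications should be centred near 0.8, up to sampling noise.

```latex
y_i = 0.8\,x_{1,i} + 0.1\,x_{2,i} + 0.5\,x_{3,i} + 0.6\,x_{4,i} + 2\,\varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, 1)
```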

Features

These examples use the first set of example data:

from specification_curve import specification_curve as specy
from specification_curve import example as scdata
df = scdata.load_example_data1()
  • Expand fixed effects into mutually exclusive groups using cat_expand

y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group1', 'group2'])
sc.fit()
sc.plot()
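To make the idea of expanding a categorical variable into mutually exclusive groups concrete, here is a plain-Python sketch. It assumes, as the feature description suggests, that each level of the column becomes its own indicator variable and that those indicators form one mutually exclusive group. `expand_categorical` is a hypothetical helper and the level values are toy data, not part of the package or its example data.

```python
def expand_categorical(name, values):
    """Hypothetical helper: turn one categorical column into 0/1 indicator columns.

    Returns a dict mapping new column names to indicator lists, plus the list
    of those names, which would be treated as one mutually exclusive group.
    """
    levels = sorted(set(values))
    dummies = {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }
    return dummies, list(dummies)

# Toy values standing in for a categorical column
dummies, group = expand_categorical("group2", ["a", "b", "a", "c"])
print(group)                # ['group2_a', 'group2_b', 'group2_c']
print(dummies["group2_a"])  # [1, 0, 1, 0]
```

Every row has exactly one indicator set to 1, which is what makes the resulting columns mutually exclusive.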
  • Mutually exclude two variables using exclu_grps

y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              exclu_grps=[['c1', 'c2']])
sc.fit()
sc.plot()
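One way to see what an exclusion group does: start from every subset of the controls and drop any subset that contains more than one member of a group. This is a sketch assuming that subsets of all sizes, including the empty set, are considered; the package's internal enumeration may differ, and `control_sets` is a hypothetical helper.

```python
from itertools import combinations

def control_sets(controls, exclu_grps=()):
    """Hypothetical helper: yield every subset of `controls` that uses
    at most one variable from each mutually exclusive group."""
    for size in range(len(controls) + 1):
        for combo in combinations(controls, size):
            chosen = set(combo)
            # Keep the subset only if no group contributes two or more variables
            if all(len(chosen & set(grp)) <= 1 for grp in exclu_grps):
                yield combo

controls = ['c1', 'c2', 'group1', 'group2']
all_sets = list(control_sets(controls))
allowed = list(control_sets(controls, exclu_grps=[['c1', 'c2']]))
print(len(all_sets), len(allowed))  # 16 12
```

Of the 16 possible control sets, the 4 that contain both c1 and c2 are dropped.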
  • Use multiple independent or dependent variables

x_exog = ['x1', 'x2']
y_endog = 'y1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
sc.fit()
sc.plot()
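With lists of dependent or independent variables, each specification pairs one outcome with one regressor of interest and one set of controls. A minimal sketch of that cross product, assuming every subset of the controls is tried (the package may enumerate differently); `specifications` is a hypothetical helper:

```python
from itertools import combinations, product

def specifications(y_endog, x_exog, controls):
    """Hypothetical helper: yield (outcome, regressor, control_set) triples.

    Assumes every subset of the controls, including the empty one, is used.
    """
    # Accept a single column name or a list of names, as the examples do
    ys = [y_endog] if isinstance(y_endog, str) else list(y_endog)
    xs = [x_exog] if isinstance(x_exog, str) else list(x_exog)
    control_subsets = [
        combo
        for size in range(len(controls) + 1)
        for combo in combinations(controls, size)
    ]
    yield from product(ys, xs, control_subsets)

specs = list(specifications('y1', ['x1', 'x2'], ['c1', 'c2', 'group1', 'group2']))
print(len(specs))  # 32: 1 outcome x 2 regressors x 16 control sets
print(specs[0])    # ('y1', 'x1', ())
```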
  • Save plots to file (format is inferred from file extension)

sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group1'])
sc.fit()
sc.plot(save_path='test_fig.pdf')
  • Specification results are stored in the output DataFrame df_r

sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
sc.fit()
print(sc.df_r)
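A quick way to summarise a specification curve is the median coefficient and the share of specifications that are significant. In the sketch below, the records are toy numbers standing in for rows of df_r, and the keys `coef` and `pvalue` are hypothetical; check sc.df_r.columns for the real column names before adapting it.

```python
from statistics import median

# Toy results standing in for rows of sc.df_r; the keys are hypothetical
results = [
    {"coef": 0.72, "pvalue": 0.01},
    {"coef": 0.65, "pvalue": 0.04},
    {"coef": 0.30, "pvalue": 0.20},
    {"coef": 0.81, "pvalue": 0.002},
]

def summarise(rows, alpha=0.05):
    """Return the median coefficient and the share of rows with p < alpha."""
    coefs = [r["coef"] for r in rows]
    n_sig = sum(r["pvalue"] < alpha for r in rows)
    return median(coefs), n_sig / len(rows)

med, share_sig = summarise(results)
print(med, share_sig)
```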
  • Other statsmodels estimators (OLS is the default) can be used

import numpy as np
import pandas as pd
import statsmodels.api as sm
n_samples = 1000
x_2 = np.random.randint(2, size=n_samples)
x_1 = np.random.random(size=n_samples)
x_3 = np.random.randint(3, size=n_samples)
x_4 = np.random.random(size=n_samples)
x_5 = x_1 + 0.05*np.random.randn(n_samples)
x_beta = -1 + 3.5*x_1 + 0.2*x_2 + 0.3*x_3
prob = 1/(1 + np.exp(-x_beta))
y = np.random.binomial(n=1, p=prob, size=n_samples)
y2 = np.random.binomial(n=1, p=prob*0.98, size=n_samples)
df = pd.DataFrame([x_1, x_2, x_3, x_4, x_5, y, y2],
                  ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'y', 'y2']).T
y_endog = ['y', 'y2']
x_exog = ['x_1', 'x_5']
controls = ['x_3', 'x_2', 'x_4']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand='x_3')
sc.fit(estimator=sm.Logit)  # sm.Probit also works
sc.plot()
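The binary outcomes in this example are simulated from a logistic model, which is why a logit (or probit) estimator suits them. Reading the probability off the code above, y is drawn as below, and y2 uses 0.98 times the same probability. x_5 is a noisy copy of x_1 (x_5 = x_1 + 0.05 ε), so the two regressors of interest should give similar results, while x_4 does not enter the model at all.

```latex
\Pr(y_i = 1) = \frac{1}{1 + \exp\!\left(-\left(-1 + 3.5\,x_{1,i} + 0.2\,x_{2,i} + 0.3\,x_{3,i}\right)\right)}
```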
  • The style of the specification plot adapts to very large numbers of specifications

n_samples = 400
# Number of dimensions of continuous
# random variables
n_dim = 8
c_rnd_vars = np.random.random(size=(n_dim, n_samples))
c_rnd_vars_names = [f'c_{i}' for i in range(np.shape(c_rnd_vars)[0])]
y_1 = (0.3*c_rnd_vars[0, :] +
       0.5*c_rnd_vars[1, :])
y_2 = y_1 + 0.05*np.random.randn(n_samples)
df = pd.DataFrame([y_1, y_2], ['y1', 'y2']).T
for i, col_name in enumerate(c_rnd_vars_names):
    df[col_name] = c_rnd_vars[i, :]
controls = c_rnd_vars_names[1:]
sc = specy.SpecificationCurve(df, ['y1', 'y2'], c_rnd_vars_names[0],
                              controls)
sc.fit()
sc.plot()
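To see why this example counts as large: assuming every subset of the controls is estimated, including the empty one, there are two outcomes (y1, y2), one regressor of interest (c_0), and seven controls (c_1 to c_7), giving

```latex
N = |Y| \times |X| \times 2^{|C|} = 2 \times 1 \times 2^{7} = 256
```

specifications, which is too many for individual grey squares to stay legible.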
  • Always include regressors using the always_include keyword argument

df = scdata.load_example_data1()
x_exog = 'x1'
y_endog = 'y1'
controls = ['c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              always_include='c1')
sc.fit()
sc.plot()
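Conceptually, an always-included regressor is added to every control set rather than being switched on and off. A plain-Python sketch under that assumption; `with_always_include` is a hypothetical helper, not part of the package:

```python
from itertools import combinations

def with_always_include(controls, always_include):
    """Hypothetical helper: every subset of `controls`, each prefixed
    by the regressors that must always be present."""
    fixed = [always_include] if isinstance(always_include, str) else list(always_include)
    return [
        tuple(fixed) + combo
        for size in range(len(controls) + 1)
        for combo in combinations(controls, size)
    ]

sets = with_always_include(['c2', 'group1', 'group2'], 'c1')
print(len(sets))  # 8
print(sets[0])    # ('c1',)
```

The number of specifications depends only on the varying controls (here 2^3 = 8), since c1 appears in all of them.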

Similar Packages

In R, there are specr (which inspired many design choices in this package) and spec_chart. Some of the example data in this package are the same as in specr.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.2.6 (2021-05-05)

  • Updated some dev packages

  • Released on Zenodo to create a DOI

0.2.5 (2020-11-22)

  • Added option to always include some regressors

  • Added option to label preferred specification in plots

0.2.4 (2020-09-15)

  • Further fix to pick up example data

0.2.3 (2020-09-14)

  • Fixed examples

  • Bug fix for including CSV data in the PyPI distribution

0.2.2 (2020-09-02)

  • More badges in readme

  • Dropped support for Python 3.5

0.2.1 (2020-09-02)

  • Switched to object oriented design

  • Now supports a range of statsmodels estimators

  • Example showing how to save plots to file in docs

  • Example showing where estimation results are stored in docs

  • Docs example of very large number of specifications

0.1.1 (2020-08-01)

  • Multiple independent, dependent, and control variables implemented as lists. Mutually exclusive control variables implemented. Expansions of categorical variables into mutually exclusive fixed effects implemented.

0.1.0 (2020-07-27)

  • First release on PyPI.



Download files


Source Distribution

specification_curve-0.2.6.tar.gz (177.9 kB)

Built Distribution

specification_curve-0.2.6-py2.py3-none-any.whl (39.6 kB), for Python 2 and Python 3

File details

Details for the file specification_curve-0.2.6.tar.gz.

File metadata

  • Download URL: specification_curve-0.2.6.tar.gz
  • Upload date:
  • Size: 177.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.7

File hashes

Hashes for specification_curve-0.2.6.tar.gz
  • SHA256: 0118338e64d1b131bac09c2000838e7eecbdaeb6da06da75f1aeea67890f149f
  • MD5: 57dfa8db772e3fe5747dfd5f5dc5ed0d
  • BLAKE2b-256: b859b537380e32450a42cc0cd156ffdd728513b1d53cdd1d73ad598c1838863a
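To check a downloaded file against its published SHA256 digest, Python's standard hashlib module is enough. A sketch, assuming the source distribution was saved under its original file name in the current directory:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "0118338e64d1b131bac09c2000838e7eecbdaeb6da06da75f1aeea67890f149f"
# Assumed location of the downloaded source distribution
path = Path("specification_curve-0.2.6.tar.gz")
if path.exists():
    print(sha256_of(path) == expected)  # True if the download is intact
```

Reading in chunks keeps memory use constant regardless of file size; the same function works for the wheel with its own digest.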


File details

Details for the file specification_curve-0.2.6-py2.py3-none-any.whl.

File metadata

  • Download URL: specification_curve-0.2.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 39.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.7

File hashes

Hashes for specification_curve-0.2.6-py2.py3-none-any.whl
  • SHA256: f7ecb3c47c3e10f78ef7a9573ee495a70d173943450defb181cc228208efa945
  • MD5: 7c13eb99557d064adff70bfac00eee5e
  • BLAKE2b-256: 686e57cb0d596831db4b58650498f4bc6caded0e4028efa51a6c016d865fd3b9

