Specification Curve
Specification Curve is a Python (3.6+) package that performs specification curve analysis.
Free software: MIT license
Documentation: https://specification-curve.readthedocs.io.
Quickstart
You can run this package right now in Google Colab.
Running
from specification_curve import specification_curve as specy
from specification_curve import example as scdata
df = scdata.load_example_data1()
y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group2'])
sc.fit()
sc.plot()
produces a specification curve plot.
Grey squares (black lines when there are many specifications) show whether a variable is included in a specification. Blue markers and error bars indicate coefficients that are significant at the 0.05 level.
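As a rough illustration of what significance at the 0.05 level means for one coefficient, the sketch below runs a two-sided test under a normal approximation. The helper name is_significant is hypothetical, and the package's own significance flags come from whatever the fitted estimator reports, which this sketch does not reproduce.

```python
from statistics import NormalDist


def is_significant(coef, std_err, alpha=0.05):
    """Two-sided z-test of coef == 0 under a normal approximation
    (hypothetical helper, for illustration only)."""
    z = coef / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


print(is_significant(0.5, 0.1))  # True: z = 5
print(is_significant(0.1, 0.1))  # False: z = 1
```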
Here’s another example:
from specification_curve import specification_curve as specy
import numpy as np
import pandas as pd
n_samples = 300
np.random.seed(1332)
x_1 = np.random.random(size=n_samples)
x_2 = np.random.random(size=n_samples)
x_3 = np.random.random(size=n_samples)
x_4 = np.random.randint(2, size=n_samples)
y = (0.8*x_1 + 0.1*x_2 + 0.5*x_3 + x_4*0.6
     + 2*np.random.randn(n_samples))
df = pd.DataFrame([x_1, x_2, x_3, x_4, y],
                  ['x_1', 'x_2', 'x_3', 'x_4', 'y']).T
# Set x_4 as a categorical variable
df['x_4'] = df['x_4'].astype('category')
sc = specy.SpecificationCurve(df, 'y', 'x_1', ['x_2', 'x_3', 'x_4'],
                              cat_expand=['x_4'])
sc.fit()
sc.plot()
Features
These examples use the first set of example data:
from specification_curve import specification_curve as specy
from specification_curve import example as scdata
df = scdata.load_example_data1()
Expand fixed effects into mutually exclusive groups using cat_expand
y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group1', 'group2'])
sc.fit()
sc.plot()
Mutually exclude two variables using exclu_grps
y_endog = 'y1'
x_exog = 'x1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              exclu_grps=[['c1', 'c2']])
sc.fit()
sc.plot()
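To make the effect of an exclusion group concrete, the sketch below enumerates control sets in plain Python and drops any set that contains both members of a group. It assumes that every subset of the controls, including the empty set, is a candidate specification. The helper name control_sets is hypothetical, and this is not the package's internal code.

```python
from itertools import combinations


def control_sets(controls, exclu_grps=()):
    """Yield every subset of `controls` that contains at most one member
    of each exclusion group (hypothetical helper, for illustration only)."""
    for size in range(len(controls) + 1):
        for subset in combinations(controls, size):
            chosen = set(subset)
            if all(len(chosen & set(grp)) <= 1 for grp in exclu_grps):
                yield subset


controls = ['c1', 'c2', 'group1', 'group2']
allowed = list(control_sets(controls, exclu_grps=[['c1', 'c2']]))
print(len(allowed))  # 12 of the 16 subsets never pair c1 with c2
```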
Use multiple independent or dependent variables
x_exog = ['x1', 'x2']
y_endog = 'y1'
controls = ['c1', 'c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
sc.fit()
sc.plot()
Save plots to file (format is inferred from file extension)
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand=['group1'])
sc.fit()
sc.plot(save_path='test_fig.pdf')
Specification results are stored in the output DataFrame df_r
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
sc.fit()
print(sc.df_r)
Other statsmodels estimators (OLS is the default) can be used
import numpy as np
import pandas as pd
import statsmodels.api as sm
n_samples = 1000
x_2 = np.random.randint(2, size=n_samples)
x_1 = np.random.random(size=n_samples)
x_3 = np.random.randint(3, size=n_samples)
x_4 = np.random.random(size=n_samples)
x_5 = x_1 + 0.05*np.random.randn(n_samples)
x_beta = -1 + 3.5*x_1 + 0.2*x_2 + 0.3*x_3
prob = 1/(1 + np.exp(-x_beta))
y = np.random.binomial(n=1, p=prob, size=n_samples)
y2 = np.random.binomial(n=1, p=prob*0.98, size=n_samples)
df = pd.DataFrame([x_1, x_2, x_3, x_4, x_5, y, y2],
                  ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'y', 'y2']).T
y_endog = ['y', 'y2']
x_exog = ['x_1', 'x_5']
controls = ['x_3', 'x_2', 'x_4']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              cat_expand='x_3')
sc.fit(estimator=sm.Logit)  # sm.Probit also works
sc.plot()
The plot style adapts when there is a very large number of specifications
n_samples = 400
# Number of dimensions of continuous
# random variables
n_dim = 8
c_rnd_vars = np.random.random(size=(n_dim, n_samples))
c_rnd_vars_names = [f'c_{i}' for i in range(np.shape(c_rnd_vars)[0])]
y_1 = (0.3*c_rnd_vars[0, :] +
       0.5*c_rnd_vars[1, :])
y_2 = y_1 + 0.05*np.random.randn(n_samples)
df = pd.DataFrame([y_1, y_2], ['y1', 'y2']).T
for i, col_name in enumerate(c_rnd_vars_names):
    df[col_name] = c_rnd_vars[i, :]
controls = c_rnd_vars_names[1:]
sc = specy.SpecificationCurve(df, ['y1', 'y2'], c_rnd_vars_names[0],
                              controls)
sc.fit()
sc.plot()
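A rough count shows why this example produces so many specifications. The sketch assumes that each specification pairs one dependent variable and one independent variable with any subset of the controls. That is an assumption about how the combinations are formed, and the helper name n_specifications is hypothetical.

```python
def n_specifications(n_y, n_x, n_controls):
    """Count specifications under the stated assumption:
    one y, one x, and any subset of the optional controls."""
    return n_y * n_x * 2 ** n_controls


# The example above: 2 dependent variables, 1 independent variable (c_0),
# and 7 controls (c_1 to c_7).
print(n_specifications(n_y=2, n_x=1, n_controls=7))  # 256
```

Under this assumption each extra control doubles the count, which is why the plot needs a more compact style as the control list grows.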
Always include regressors using the always_include keyword argument
df = scdata.load_example_data1()
x_exog = 'x1'
y_endog = 'y1'
controls = ['c2', 'group1', 'group2']
sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                              always_include='c1')
sc.fit()
sc.plot()
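One way to picture always_include is as a fixed prefix on every set of regressors, with only the remaining controls varying. The plain-Python sketch below assumes every subset of the optional controls is tried. It illustrates the idea and is not the package's implementation.

```python
from itertools import combinations

controls = ['c2', 'group1', 'group2']
always_include = ['c1']

# Every subset of the optional controls, with the fixed regressors prepended.
regressor_sets = [
    always_include + list(subset)
    for size in range(len(controls) + 1)
    for subset in combinations(controls, size)
]

print(len(regressor_sets))  # 8
print(regressor_sets[0])    # ['c1']
```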
Similar Packages
In R, there are specr (which inspired many design choices in this package) and spec_chart. Some of the example data in this package is the same as in specr.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.2.5 (2020-11-22)
Added option to always include some regressors
Added option to label preferred specification in plots
0.2.4 (2020-09-15)
Further fix to pick up example data
0.2.3 (2020-09-14)
Fixed examples
Bug fix for including csv data in pypi distribution
0.2.2 (2020-09-02)
More badges in readme
Dropped support for python 3.5
0.2.1 (2020-09-02)
Switched to object oriented design
Now supports range of statsmodels estimators!
Example showing how to save plots to file in docs
Example showing where estimation results are stored in docs
Docs example of very large number of specifications
0.1.1 (2020-08-01)
Multiple independent, dependent, and control variables implemented as lists
Mutually exclusive control variables implemented
Expansions of categorical variables into mutually exclusive fixed effects implemented
0.1.0 (2020-07-27)
First release on PyPI.