Regress Out Covariates
regressout
regressout removes the linear effect of observed covariates from a feature
matrix. It provides RegressOutCovariates, a scikit-learn-style estimator
that residualizes each feature column against a covariate matrix.
Why it exists
Some modeling workflows need the variation explained by known covariates removed from the features before any downstream analysis. For example, a feature matrix may need to be adjusted for observed variables such as age, sex, ethnicity, batch, site, or other metadata. This package fits those adjustments and returns the residual feature matrix.
How it works
RegressOutCovariates uses scikit-learn naming, but with domain-specific
meaning:
- X is the covariate or observation matrix: the variables to regress out.
- y is the feature matrix to residualize.
On fit(X=covariates, y=features), it fits one
sklearn.linear_model.LinearRegression model per feature column:
feature_j ~ covariates
On predict(X=covariates, y=features), it predicts the covariate contribution
for each feature and returns:
feature_j - predicted_feature_j
If y is a pandas DataFrame, the returned residuals are also a DataFrame
with the same index and columns. Otherwise, residuals are returned as a NumPy
array.
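The fit and predict steps described above can be sketched with plain scikit-learn. This is a minimal illustration of the technique, not the package's actual implementation: the helper name `residualize` and the synthetic data are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def residualize(covariates: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Illustrative helper (not part of regressout): remove linear covariate effects."""
    residuals = {}
    for name in features.columns:
        # One ordinary least-squares model per feature column: feature_j ~ covariates
        model = LinearRegression().fit(covariates, features[name])
        # Residual = observed feature minus the part predicted from the covariates
        residuals[name] = features[name] - model.predict(covariates)
    # Keep the original index and column order, as the package does for DataFrames
    return pd.DataFrame(residuals, index=features.index)


# Synthetic data, made up for this sketch
rng = np.random.default_rng(0)
covariates = pd.DataFrame(
    {"age": rng.normal(50, 10, 30), "sex_M": rng.integers(0, 2, 30)}
)
features = pd.DataFrame(
    {
        "feat1": 0.1 * covariates["age"] + rng.normal(size=30),
        "feat2": 2.0 * covariates["sex_M"] + rng.normal(size=30),
    }
)
residuals = residualize(covariates, features)
```

Because each model includes an intercept, the residual columns have mean zero on the fitting data and are uncorrelated with every covariate column.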
Installation
pip install regressout
For local development from this repository:
pip install -r requirements_dev.txt
pip install -e .
The runtime dependencies declared by the package are numpy, pandas, and
scikit-learn; Python 3.8 or newer is required.
Usage
import pandas as pd
from regressout import RegressOutCovariates

# Covariates to regress out (already numeric), one row per sample
covariates = pd.DataFrame(
    {
        "age": [25, 49, 60, 50],
        "sex_M": [1, 0, 1, 0],
    },
    index=["sample1", "sample2", "sample3", "sample4"],
)

# Features to residualize, sharing the covariates' row index
features = pd.DataFrame(
    {
        "feat1": [1.2, 2.5, 2.9, 3.1],
        "feat2": [0.4, 0.7, 1.4, 1.6],
    },
    index=covariates.index,
)

residualizer = RegressOutCovariates()
residualizer.fit(X=covariates, y=features)

# Both arguments are required: predict returns features minus the fitted covariate effect
residualized_features = residualizer.predict(X=covariates, y=features)
When covariates need preprocessing, put the preprocessing steps before
RegressOutCovariates in a scikit-learn pipeline. The tests show this pattern
with categorical encoding, column matching, scaling, and then residualization.
Important behavior and limitations
- Covariates must already be numeric when they reach RegressOutCovariates. Encode categorical variables, impute missing values, or scale covariates in earlier pipeline steps as needed.
- The estimator performs independent linear regression for each feature column; it does not model nonlinear effects unless you add nonlinear covariate features before fitting.
- When fitted with pandas DataFrames, it validates row indexes and column order on later predictions where that metadata is available.
- The number of rows in X and y must match. The number and order of covariate and feature columns must match what was seen during fit.
- Unlike a standard scikit-learn estimator, both fit and predict take two arguments (predict(X=covariates, y=features)); a single-argument predict(X) call will not work, and the class is a predictor rather than a transform-style transformer.
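To illustrate the point about nonlinear effects, the sketch below uses plain scikit-learn on synthetic data (all names and numbers are made up for the example). It compares the residual spread after regressing on age alone with the spread after also supplying a squared-age covariate.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=200)
# Synthetic feature with a purely quadratic dependence on age, plus noise
feature = 0.01 * (age - 50) ** 2 + rng.normal(scale=0.5, size=200)


def residual_std(covariates, target):
    """Standard deviation of target after removing a linear fit on covariates."""
    fitted = LinearRegression().fit(covariates, target).predict(covariates)
    return float(np.std(target - fitted))


linear_only = age.reshape(-1, 1)               # age as the only covariate
with_square = np.column_stack([age, age ** 2])  # age plus an explicit nonlinear term

std_linear = residual_std(linear_only, feature)
std_quadratic = residual_std(with_square, feature)
```

In this setup `std_quadratic` comes out well below `std_linear`, because the linear-only fit leaves the quadratic age effect in the residuals. The same idea applies to any nonlinear term you add as an extra covariate column before fitting.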
Development
make test
make lint
make docs
The package is MIT licensed.
Changelog
0.0.1
- First release on PyPI.
File details
Details for the file regressout-0.0.2.tar.gz.
File metadata
- Download URL: regressout-0.0.2.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a65d7a835b55d2be54e469d1fd48fb7d561a4197d28f1bda6aafd06f305822f |
| MD5 | 42692477f42c60c57e8cc41f43bef947 |
| BLAKE2b-256 | fe4962c041370dd60088760655bd7962cdacba71aa61377ee88f9decd76e363b |
File details
Details for the file regressout-0.0.2-py2.py3-none-any.whl.
File metadata
- Download URL: regressout-0.0.2-py2.py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df2ff8235f0125f7368903249dd44e86cb2f986eb1cc9e0e5735cd252a9e78ce |
| MD5 | 3dd6277cabe80268985ac05a5ac1d1e4 |
| BLAKE2b-256 | ad857f16882b49ed5a89e09432876f58d8899efbc0a38960ad34cc9251fb7a6d |