A smart and lightweight toolkit for regression modeling, statistical summaries, and Excel annotations.
regmonkey
regmonkey is a lightweight Python package designed to streamline data analysis and regression modeling tasks. It simplifies tasks like descriptive statistics, dummy variable creation, regression model estimation, and exporting annotated Excel files.
Features
- 📊 Descriptive Statistics: Easily generate clean and rounded summary statistics from a DataFrame.
- 🧠 Smart Variable Parsing: Automatically interpret log transformations, polynomial terms, and interaction terms.
- 📦 Dummy Variable Creation: Quickly convert categorical variables into dummy/indicator variables.
- 📈 Regression Analysis: Run multiple linear regressions with support for log, power, and interaction terms, and export well-formatted result tables.
- 📄 Excel Footer Annotation: Add footnotes to Excel files automatically.
- 🌐 Multilingual Support: Specify variables and get results in English, Japanese, or Chinese, with automatic language detection and consistent output formatting.
Installation
pip install regmonkey
Functions
add_footer(file_path, value, sheet_name=None)
Adds a footer note to the last row of an Excel sheet.
Arguments:
- file_path (str): Path to the Excel file.
- value (str): The content of the footer note.
- sheet_name (str, optional): Name of the sheet to modify. Defaults to the first sheet.
Example:
from regmonkey.stats import add_footer
add_footer("example.xlsx", "Note: Data is preliminary.", sheet_name="Sheet1")
get_dummies(df, columns)
Converts categorical variables into dummy/indicator variables.
Arguments:
- df (DataFrame): The input DataFrame.
- columns (list of str): List of column names to convert.
Returns:
- A new DataFrame with dummy variables.
Example:
from regmonkey.stats import get_dummies
import pandas as pd
data = pd.DataFrame({"Year": ["2020", "2021", "2022"], "Value": [10, 20, 30]})
dummies = get_dummies(data, columns=["Year"])
print(dummies)
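To make the shape of the result concrete, here is a minimal plain-Python sketch of dummy (one-hot) encoding on the same sample values. It is an illustration only, not regmonkey's implementation: the helper name `one_hot` is hypothetical, and the `Column_value` naming is an assumption based on the `Year_1990` style names mentioned in the regression section.

```python
def one_hot(rows, column):
    """Expand one categorical column into 0/1 indicator columns.

    Plain-Python illustration only; this is not regmonkey's code.
    """
    # Collect the distinct category levels in a stable (sorted) order.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every column except the categorical one being expanded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per level, named "<column>_<level>" (assumed naming).
        for level in levels:
            new_row[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(new_row)
    return encoded


rows = [
    {"Year": "2020", "Value": 10},
    {"Year": "2021", "Value": 20},
    {"Year": "2022", "Value": 30},
]
encoded = one_hot(rows, "Year")
for row in encoded:
    print(row)
```

Each input row ends up with exactly one indicator set to 1. Whether regmonkey drops a reference level or keeps all levels is not stated here, so check the returned columns before adding them to a model.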
summary(df, var_list)
Generates descriptive statistics for a list of variables.
Arguments:
- df (DataFrame): The input DataFrame.
- var_list (list of str): List of variable names to summarize.
Returns:
- A DataFrame containing descriptive statistics.
Example:
from regmonkey.stats import summary
import pandas as pd
data = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
stats = summary(data, ["A", "B"])
print(stats)
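For orientation, the sketch below computes a few common rounded descriptive statistics for the same two columns using only the standard library. Which statistics `summary` actually reports is not listed in this README, so the set chosen here (count, mean, sample standard deviation, min, max) and the helper name `describe` are assumptions.

```python
import statistics


def describe(values, decimal_places=2):
    """Return a small dict of rounded descriptive statistics (illustrative)."""
    return {
        "count": len(values),
        "mean": round(statistics.mean(values), decimal_places),
        # statistics.stdev is the sample standard deviation (n - 1 denominator).
        "std": round(statistics.stdev(values), decimal_places),
        "min": min(values),
        "max": max(values),
    }


columns = {"A": [1, 2, 3], "B": [4, 5, 6]}
described = {name: describe(values) for name, values in columns.items()}
for name, result in described.items():
    print(name, result)
```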
regression(variables_dicts, df, decimal_places=2)
Performs regression analysis with support for log, power, and interaction terms.
Arguments:
- variables_dicts (list of dict): List of dictionaries specifying dependent and independent variables.
  - Multilingual Support: You can use different language keys for variables:
    - Japanese: {"被説明変数": "Y", "説明変数": ["X1", "X2"]}
    - English: {"y": "Y", "X": ["X1", "X2"]}
    - Chinese: {"被解释变量": "Y", "解释变量": ["X1", "X2"]}
  - Log terms: Use log(X1) for the natural logarithm of X1.
  - Power terms: Use X2**2 or X2**3 for squared or cubed terms.
  - Dummy variables: For binary variables, use them directly (e.g., ['Gender']). For categorical variables, preprocess the DataFrame using get_dummies (e.g., ['Year_1990', 'Year_1995']).
  - Interaction terms: Use X1:X2 for the interaction between X1 and X2, or combine transformations like X1**2:log(X2).
- df (DataFrame): The input data.
- decimal_places (int): Number of decimal places for results.
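The term syntax described above (log(...), ** powers, and : interactions) can be read as "split on the colon, transform each factor, multiply the results". The following is a minimal sketch of that reading for a single row of data. It is not regmonkey's parser; the names `eval_factor` and `eval_term` are hypothetical and the regular expressions only cover the forms shown in this README.

```python
import math
import re

# Patterns for the two transformed forms shown in the README.
LOG_RE = re.compile(r"^log\((\w+)\)$")
POWER_RE = re.compile(r"^(\w+)\*\*(\d+)$")


def eval_factor(factor, row):
    """Evaluate one factor (plain name, log(name), or name**k) for one row."""
    factor = factor.strip()
    match = LOG_RE.match(factor)
    if match:
        return math.log(row[match.group(1)])  # natural logarithm
    match = POWER_RE.match(factor)
    if match:
        return row[match.group(1)] ** int(match.group(2))
    return row[factor]  # untransformed variable


def eval_term(term, row):
    """Evaluate a term; 'a:b' is the product of its factors."""
    value = 1.0
    for factor in term.split(":"):
        value *= eval_factor(factor, row)
    return value


row = {"X1": 2.0, "X2": 3.0}
for term in ["X1", "log(X1)", "X2**2", "X1:X2", "X1**2:log(X2)"]:
    print(term, "=", round(eval_term(term, row), 4))
```

Note that log terms require strictly positive values; `math.log` raises `ValueError` for zero or negative inputs, and how regmonkey treats such rows is not described here.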
Returns:
- A tuple containing, in order:
  - Processed DataFrame.
  - Descriptive statistics for used variables.
  - Regression results table.
Example:
from regmonkey.stats import regression
import pandas as pd
# Sample data
data = pd.DataFrame({
"X1": [1, 2, 3],
"X2": [4, 5, 6],
"Y": [7, 8, 9],
"Category": ["A", "B", "A"]
})
# Preprocess categorical variables
data_with_dummies = pd.get_dummies(data, columns=["Category"])
# Define regression variables (Japanese)
variables_ja = [
{"被説明変数": "Y", "説明変数": ["X1", "X2", "log(X1)", "X2**2", "X1:X2"]}
]
# Define regression variables (English)
variables_en = [
{"y": "Y", "X": ["X1", "X2", "log(X1)", "X2**2", "X1:X2"]}
]
# Define regression variables (Chinese)
variables_zh = [
{"被解释变量": "Y", "解释变量": ["X1", "X2", "log(X1)", "X2**2", "X1:X2"]}
]
# Perform regression (using any of the above variable definitions)
df_processed, summary_result, regression_result = regression(variables_ja, data_with_dummies)
# Print results
print(regression_result)
In this example:
- The regression can be specified using any of the supported languages (Japanese, English, or Chinese).
- log(X1) computes the natural logarithm of X1.
- X2**2 computes the square of X2.
- X1:X2 computes the interaction between X1 and X2.
- Dummy variables for Category are created beforehand with pd.get_dummies; they are not listed among the explanatory variables in this example.
- The output labels (e.g., "観測数"/"N"/"样本数") will automatically match the language of the input keys.
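One way to picture the language matching is a lookup from the dictionary keys to a language code and a role (dependent or explanatory). The sketch below is an assumption about how such detection could work, not regmonkey's actual code; `KEY_MAP`, `N_LABEL`, and `normalize_spec` are hypothetical names, and rejecting mixed-language keys is a design choice made here for illustration.

```python
# Map each supported key to (language, role). Keys are taken from the README.
KEY_MAP = {
    "y": ("en", "y"),
    "X": ("en", "X"),
    "被説明変数": ("ja", "y"),
    "説明変数": ("ja", "X"),
    "被解释变量": ("zh", "y"),
    "解释变量": ("zh", "X"),
}

# Observation-count labels quoted in the README, keyed by language.
N_LABEL = {"en": "N", "ja": "観測数", "zh": "样本数"}


def normalize_spec(spec):
    """Return (language, spec with English-style keys) for one model spec."""
    languages = set()
    normalized = {}
    for key, value in spec.items():
        if key not in KEY_MAP:
            raise KeyError(f"Unrecognized key: {key!r}")
        language, role = KEY_MAP[key]
        languages.add(language)
        normalized[role] = value
    if len(languages) != 1:
        # Assumption for this sketch: one spec must use a single language.
        raise ValueError("Keys from more than one language in one specification")
    return languages.pop(), normalized


language, spec = normalize_spec({"被説明変数": "Y", "説明変数": ["X1", "X2"]})
print(language, spec, N_LABEL[language])
```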
Usage Example
import pandas as pd
from regmonkey.stats import get_dummies, regression, add_footer
# Load data
data = pd.DataFrame({
"X1": [1, 2, 3],
"X2": [4, 5, 6],
"Y": [7, 8, 9],
"Category": ["A", "B", "A"]
})
# Create dummy variables
data_with_dummies = get_dummies(data, columns=["Category"])
# Perform regression with log, power, and interaction terms (using English keys)
variables = [
{"y": "Y", "X": ["X1", "X2", "log(X1)", "X2**2", "X1:X2"]}
]
df_processed, summary_result, regression_result = regression(variables, data_with_dummies)
# Save regression results to Excel and add a footer
regression_result.to_excel("regression_results.xlsx", index=False)
add_footer("regression_results.xlsx", "Note: Regression results include log, power, and interaction terms.")
# Save summary statistics to Excel and add a footer
summary_result.to_excel("summary_statistics.xlsx", index=False)
add_footer("summary_statistics.xlsx", "Note: Summary statistics for all variables used in the regression analysis.")
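The three-row sample frame above is only meant to show the call sequence. In it, X2 equals X1 + 3 in every row, so the coefficients of the full specification are not uniquely determined by this data. As a sanity check on real data, a single-regressor fit can be verified by hand with the closed-form least-squares formulas. The sketch below does that for Y on X1 using the sample values; `simple_ols` is a hypothetical helper and not part of regmonkey.

```python
def simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x1 = [1, 2, 3]
y = [7, 8, 9]
intercept, slope = simple_ols(x1, y)
print("intercept:", intercept, "slope:", slope)
```

Comparing a hand-computed fit like this against the exported table is a quick way to confirm that variable names and transformations were specified as intended.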