A Python library for hypothesis testing with automated assumptions
Hypotest
A Python library for deterministic hypothesis testing with automatic assumption checking and optional LLM-based interpretation.
Hypotest provides a clean statistical engine designed for data scientists, researchers, and engineers who need reliable and reproducible statistical testing workflows.
Overview
Hypotest simplifies hypothesis testing by providing:
A deterministic statistical engine
Automatic assumption validation (normality, variance homogeneity)
Structured result objects with statistical metadata
Optional LLM-based interpretation layer
A safe Dataset abstraction for robust data handling
All statistical computations are deterministic and independent of LLM usage.
Installation
Once published:
pip install hypotest
Development install:
git clone https://github.com/chikku1234568/Unified-EDA-HypoTest-LM-Library
cd hypotest
pip install -e .
Optional LLM support:
pip install hypotest[llm]
Quick Start
Example: Independent t-test
import pandas as pd
import numpy as np

import hypotest
from hypotest.core.dataset import Dataset
from hypotest.tests.parametric.ttest import TTest

# Create example dataset
df = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "value": np.concatenate([
        np.random.normal(0, 1, 100),
        np.random.normal(1, 1, 100),
    ]),
})

# Wrap DataFrame in Dataset abstraction
dataset = Dataset(df)

# Run t-test
test = TTest()

result = test.execute(
    dataset=dataset,
    target="value",
    features=["group"],
)

print(result)
Output (exact values differ from run to run because the example data is drawn without a fixed seed):
TestResult(test='Independent t-test', feature='group', statistic=4.231, p=0.00003, significant)
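To cross-check a result outside Hypotest, the sketch below runs the same kind of comparison directly with SciPy. It assumes a seeded NumPy generator (the quick start above is unseeded) and relies on `scipy.stats.ttest_ind`. The helper name `independent_ttest` and its returned dictionary are hypothetical and are not Hypotest's internal implementation.

```python
import numpy as np
from scipy import stats


def independent_ttest(a, b, alpha=0.05):
    """Student's independent two-sample t-test (equal variances assumed)."""
    statistic, p_value = stats.ttest_ind(a, b, equal_var=True)
    return {
        "statistic": float(statistic),
        "p": float(p_value),
        "significant": bool(p_value < alpha),
    }


# A fixed seed makes the simulated data, and therefore the result, repeatable.
rng = np.random.default_rng(42)
group_a = rng.normal(0, 1, 100)
group_b = rng.normal(1, 1, 100)

outcome = independent_ttest(group_a, group_b)
print(outcome)
```

Re-running the script with the same seed reproduces the same statistic and p-value, which is one simple way to make an example like this repeatable.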
Automatic Assumption Checking
Hypotest automatically checks statistical assumptions before or during test execution.
for assumption in result.assumptions:
    print(assumption.assumption_name, assumption.passed)
Example output:
normality True
homoscedasticity False
Each assumption provides:
statistical result
interpretation
recommendation
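As an illustration of what normality and variance-homogeneity checks can look like, here is a minimal sketch built on SciPy's Shapiro-Wilk (`scipy.stats.shapiro`) and Levene (`scipy.stats.levene`) tests. The choice of these two tests, the `AssumptionCheck` container, and the recommendation strings are all assumptions made for this example; the source does not say which tests Hypotest uses internally.

```python
from dataclasses import dataclass

import numpy as np
from scipy import stats


@dataclass
class AssumptionCheck:
    # Hypothetical container; field names mirror the attributes printed above.
    assumption_name: str
    passed: bool
    statistic: float
    p_value: float
    recommendation: str


def check_normality(groups, alpha=0.05):
    """Shapiro-Wilk per group; report the group with the smallest p-value."""
    stat, p = min((stats.shapiro(g) for g in groups), key=lambda r: r[1])
    passed = bool(p >= alpha)
    advice = "none" if passed else "consider a non-parametric test such as Mann-Whitney U"
    return AssumptionCheck("normality", passed, float(stat), float(p), advice)


def check_homoscedasticity(groups, alpha=0.05):
    """Levene's test for equal variances across all groups."""
    stat, p = stats.levene(*groups)
    passed = bool(p >= alpha)
    advice = "none" if passed else "consider Welch's t-test"
    return AssumptionCheck("homoscedasticity", passed, float(stat), float(p), advice)


rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, 200), rng.normal(0, 5, 200)]  # very different spreads

for check in (check_normality(groups), check_homoscedasticity(groups)):
    print(check.assumption_name, check.passed, check.recommendation)
```

Reporting the smallest p-value across groups is a deliberately conservative choice: a single clearly non-normal group is enough to flag the assumption.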
Optional LLM Interpretation
Hypotest can generate natural-language explanations using any OpenAI-compatible provider.
Example using DeepSeek:
hypotest.configure(
    llm_api_key="your-api-key",
    llm_base_url="https://api.deepseek.com/v1",
    llm_model="deepseek-chat",
    enable_llm_interpretation=True,
)
print(result.explain())
Example output:
The independent t-test indicates a statistically significant difference between the two groups...
LLM interpretation is optional and does not affect statistical computation.
Configuration
Configure hypotest globally:
hypotest.configure(
    llm_api_key="your-key",
    llm_base_url="https://api.deepseek.com/v1",
    llm_model="deepseek-chat",
    enable_llm_interpretation=True,
)
View configuration:
print(hypotest.info())
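To keep API keys out of source files, the configuration values can be read from the environment first. The sketch below only builds the keyword arguments; the environment variable names (`HYPOTEST_LLM_API_KEY` and friends) and the helper `llm_settings_from_env` are hypothetical, while the keyword names are the ones shown in the `hypotest.configure(...)` call above.

```python
import os


def llm_settings_from_env(env=None):
    """Build keyword arguments for hypotest.configure() from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("HYPOTEST_LLM_API_KEY")  # hypothetical variable name
    if not api_key:
        # No key available: keep the LLM layer switched off.
        return {"enable_llm_interpretation": False}
    return {
        "llm_api_key": api_key,
        "llm_base_url": env.get("HYPOTEST_LLM_BASE_URL", "https://api.deepseek.com/v1"),
        "llm_model": env.get("HYPOTEST_LLM_MODEL", "deepseek-chat"),
        "enable_llm_interpretation": True,
    }


settings = llm_settings_from_env({"HYPOTEST_LLM_API_KEY": "your-api-key"})
print(sorted(settings))
# With Hypotest installed, the result could be passed on as:
# hypotest.configure(**settings)
```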
Dataset Abstraction
Hypotest uses a Dataset wrapper to provide safe data handling:
from hypotest.core.dataset import Dataset
dataset = Dataset(df)
This enables:
safe missing value handling
validation before test execution
future extensibility
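The idea behind such a wrapper can be sketched in a few lines of pandas. The class below is a hypothetical stand-in named `SafeDataset`, not Hypotest's actual `Dataset`; it assumes that "safe" handling means copying the input, validating column names, and dropping incomplete rows before a test runs.

```python
import pandas as pd


class SafeDataset:
    """Hypothetical stand-in for a Dataset wrapper; not Hypotest's actual class."""

    def __init__(self, df):
        if not isinstance(df, pd.DataFrame):
            raise TypeError("expected a pandas DataFrame")
        # Copy so later operations never mutate the caller's data.
        self._df = df.copy()

    def select(self, target, features):
        """Return the requested columns with incomplete rows dropped."""
        columns = [target, *features]
        missing = [c for c in columns if c not in self._df.columns]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        return self._df[columns].dropna()


df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1.0, None, 2.0, 3.0]})
clean = SafeDataset(df).select("value", ["group"])
print(len(clean))  # 3: the row with the missing value is dropped
```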
Supported Tests (Current MVP)
Independent t-test
Planned:
Welch's t-test
Mann-Whitney U test
ANOVA
Chi-square test
Correlation tests
Features
Core features implemented:
Deterministic statistical engine
Automatic assumption checking
Structured TestResult objects
Dataset abstraction layer
Plug-in test registry system
Optional LLM interpretation
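A plug-in registry of the kind listed above is commonly built as a decorator that records test classes under a name. The following is a generic sketch of that pattern; `TEST_REGISTRY`, `register_test`, `get_test`, and `TTestStub` are hypothetical names, and Hypotest's real registry may be organised differently.

```python
TEST_REGISTRY = {}


def register_test(name):
    """Class decorator that records a test class under a lookup name."""
    def decorator(cls):
        if name in TEST_REGISTRY:
            raise ValueError(f"test {name!r} is already registered")
        TEST_REGISTRY[name] = cls
        return cls
    return decorator


def get_test(name):
    """Instantiate a registered test, with a helpful error for unknown names."""
    try:
        return TEST_REGISTRY[name]()
    except KeyError:
        raise KeyError(
            f"unknown test {name!r}; available: {sorted(TEST_REGISTRY)}"
        ) from None


@register_test("ttest")
class TTestStub:
    """Placeholder test class used only to demonstrate registration."""

    def execute(self, dataset, target, features):
        raise NotImplementedError("statistical logic omitted in this sketch")


print(sorted(TEST_REGISTRY))  # ['ttest']
```

New tests can then be added without touching the lookup code: decorating a class is enough to make it discoverable by name.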
Planned features:
Automatic test recommendation
Effect size library
Automated reporting
Additional statistical tests
Example: Full Workflow
import pandas as pd
import numpy as np
import hypotest

from hypotest.core.dataset import Dataset
from hypotest.tests.parametric.ttest import TTest

hypotest.configure(enable_llm_interpretation=False)

df = pd.DataFrame({
    "group": ["A"] * 50 + ["B"] * 50,
    "value": np.random.randn(100),
})
dataset = Dataset(df)
test = TTest()
result = test.execute(dataset, "value", ["group"])
print(result)
for a in result.assumptions:
    print(a.assumption_name, a.passed)
print(result.explain())  # None if LLM disabled
Project Structure
hypotest/
├── core/
│   ├── dataset.py
│   ├── result.py
├── tests/
│   ├── parametric/
│   │   ├── ttest.py
├── assumptions/
│   ├── normality.py
│   ├── variance.py
├── llm/
│   ├── client.py
│   ├── interpreter.py
├── config/
│   ├── manager.py
├── info.py
Requirements
Python ≥ 3.10
pandas ≥ 1.5
numpy ≥ 1.21
scipy ≥ 1.9
Optional:
OpenAI-compatible client (for LLM interpretation)
Philosophy
Hypotest separates:
Deterministic statistical computation
Probabilistic natural-language interpretation
This ensures statistical correctness while enabling explainability.
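The separation can be pictured as a result object that is complete on its own, plus an optional interpreter that only reads it. This is a minimal sketch under that assumption; `StatResult`, the `explain` function, and the use of `str.upper` as a stand-in for an LLM call are hypothetical and do not reproduce Hypotest's classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StatResult:
    """Deterministic part: plain numbers, no LLM involved (hypothetical class)."""
    test: str
    statistic: float
    p: float
    alpha: float = 0.05

    @property
    def significant(self) -> bool:
        return self.p < self.alpha


def explain(result: StatResult,
            interpreter: Optional[Callable[[str], str]] = None) -> Optional[str]:
    """Probabilistic part: only reads the finished result, never changes it."""
    if interpreter is None:
        return None  # interpretation disabled
    prompt = (
        f"Explain in plain language: {result.test}, "
        f"statistic={result.statistic:.3f}, p={result.p:.5f}, "
        f"significant={result.significant}."
    )
    return interpreter(prompt)


result = StatResult(test="Independent t-test", statistic=4.231, p=0.00003)
print(explain(result))                         # None
print(explain(result, interpreter=str.upper))  # stand-in for a real LLM call
```

Because the result is frozen and fully computed before any interpreter sees it, swapping or disabling the interpreter cannot change the reported statistic or p-value.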
License
MIT License
File details
Details for the file lm_hypotest-0.1.0.tar.gz.
File metadata
- Download URL: lm_hypotest-0.1.0.tar.gz
- Upload date:
- Size: 33.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe76a933e85242e8eb9aabb8a38e2ac8a3b54f01daac69f3755cb56223072dbf |
| MD5 | f9aaca2c4495ff21169e6aea2ebeb4c4 |
| BLAKE2b-256 | f968f2e68e569cd96f7543a6d0b8dcb7e86bbb1b2764861ab1b4dba543bb53e4 |
File details
Details for the file lm_hypotest-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lm_hypotest-0.1.0-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 949779172164918cd7fce9e6925cd83373448438866b6b3ca66f11dfeb24ff4a |
| MD5 | a4f5ec30621dee3f190b106be7ddf567 |
| BLAKE2b-256 | 25799d32573555000f7b9a07af970a3b88fad3dd32442924f97c1ef69010ec1d |
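A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. The helper names `sha256_of` and `verify_download` are hypothetical, and the commented usage assumes the archive has been saved under its published file name in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """True when the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was downloaded to the working directory:
# verify_download(
#     "lm_hypotest-0.1.0.tar.gz",
#     "fe76a933e85242e8eb9aabb8a38e2ac8a3b54f01daac69f3755cb56223072dbf",
# )
```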