Modular pipeline for quantitative signal discovery and validation
Edge Research Pipeline
The Edge Research Pipeline is a modular, privacy-first research toolkit for rule mining, pattern discovery, and interpretable machine learning on tabular datasets. It supports automated feature engineering, target labeling, robust validation, and signal discovery workflows across domains including quantitative finance, structured data mining, and subgroup analysis. Its techniques apply to any domain involving structured data and statistical rule discovery.
Key Features
A flexible, modular Python library enabling you to:
- Clean, normalize, and transform tabular datasets
- Engineer features relevant to finance, statistics, and other structured-data domains
- Generate and label custom targets for supervised tasks
- Discover signals using rule mining and pattern search methods
- Perform robust validation tests (e.g., train/test splits, bootstrap, walk-forward analysis, false discovery rate)
- Reproduce results with complete configuration export and local-only processing
- Efficiently execute parameter grids via function calls or a CLI
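Two of the validation ideas listed above, walk-forward splitting and false discovery rate control, can be sketched in plain Python. This is an illustrative sketch only: `walk_forward_splits` and `benjamini_hochberg` are hypothetical names, not functions from this package, and the FDR control shown is the standard Benjamini-Hochberg step-up procedure, which may differ from what the pipeline implements.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through ordered data.

    Each test window starts right after its training window, and the whole
    window then advances by `test_size`, so test windows never overlap.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= k / m * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical usage: 10 time-ordered rows, 4 for training, 2 for testing.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), list(test))

# Hypothetical p-values for four candidate rules.
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.5], alpha=0.05))
```

Walk-forward windows keep every test row strictly later than its training rows. That ordering matters for time-ordered data, where a random split could leak future information into training.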
Privacy by Design
All computations run locally. No data ever leaves your environment. The toolkit is designed explicitly for regulated industries, confidential research, and reproducible workflows.
Installation
Install the latest release from PyPI:
pip install edge-research-pipeline
Advanced Option (Dev/Offline)
To install using the repo-based dependencies file:
pip install -r ./requirements.txt
Note: This file was generated via pipreqs and may need further validation in some environments.
Compatibility Notes & Optional Dependencies
This project includes optional support for advanced mining and synthetic data tools like orange3 and synthcity. These libraries are powerful but have strict, conflicting version requirements that cannot be satisfied simultaneously in a single install.
Known Conflicts
- orange3 requires xgboost >=1.7.4, <2.1
- synthcity requires xgboost >=2.1.0
- xgbse (a dependency of synthcity) enforces this version split
- Installing both libraries together will cause pip install to fail due to an irreconcilable conflict on xgboost
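To see why no single xgboost release can satisfy both libraries, here is a small standard-library sketch that encodes the two version ranges listed above. The helper names (`parse`, `fits_orange3`, `fits_synthcity`) are hypothetical and not part of this package, and the parser only handles plain dotted numeric versions, not the full PEP 440 grammar.

```python
def parse(version):
    """Turn '2.0.3' into (2, 0, 3), padding short versions such as '2.1' with zeros."""
    parts = [int(piece) for piece in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def fits_orange3(version):
    # Range stated above for orange3: xgboost >=1.7.4, <2.1
    return parse("1.7.4") <= parse(version) < parse("2.1")


def fits_synthcity(version):
    # Range stated above for synthcity: xgboost >=2.1.0
    return parse(version) >= parse("2.1.0")


# Illustrative version strings only; the two ranges never overlap.
for candidate in ["1.7.4", "2.0.3", "2.1.0", "2.1.4"]:
    print(candidate, fits_orange3(candidate), fits_synthcity(candidate))
```

The two ranges meet at 2.1 without overlapping, so any version accepted by one check is rejected by the other.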
Resolution
To avoid these conflicts:
- The core package does not include orange3 or synthcity by default
- You can install them separately using extras:
pip install edge-research-pipeline[orange] # for orange3-based rule data generation
pip install edge-research-pipeline[synth] # for synthetic data workflows
Note: Installing both orange3 and synthcity via extras will fail due to incompatible xgboost requirements.
If you need both, install the pipeline without either extra:
pip install edge-research-pipeline
Then manually install each library:
pip install orange3
pip install synthcity
Installing the two libraries in separate steps means pip never has to satisfy both xgboost constraints in a single resolution, so both can coexist, but you may need to manage compatibility manually.
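After a manual install like this, it can help to confirm which versions ended up in the environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of this package.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["orange3", "synthcity", "xgboost"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running `pip check` afterwards lists any installed packages whose declared requirements are not met, which shows which library's xgboost constraint is currently violated.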
Additional Dependency Warnings
Some third-party tools (e.g., torch, scipy, pandas, databricks, ydata-profiling) may also have mutually incompatible version constraints depending on your environment. We strongly recommend installing this package in a clean virtual environment to prevent dependency resolution issues:
python -m venv erp_env
.\erp_env\Scripts\activate # Windows
# source erp_env/bin/activate # macOS/Linux
pip install edge-research-pipeline
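Before installing, you may want to confirm that the interpreter is actually running inside the virtual environment. A minimal sketch follows; the function name `in_virtualenv` is hypothetical. It detects environments created with `venv` (and recent `virtualenv` releases) and does not detect conda environments.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```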
Quick Start Example
Run a full pipeline example via the command line:
python edge_research/pipeline/main.py params/grid_params.yaml
Or check the ready-to-run examples in the examples/ directory.
Project Structure
edge-research-pipeline
├── data/ # Sample datasets (sandbox only)
├── docs/ # Documentation per module
├── edge_research/ # Core logic modules
│   ├── logger/
│   ├── pipeline/
│   ├── preprocessing/
│   ├── rules_mining/
│   ├── statistics/
│   ├── utils/
│   └── validation_tests/
├── examples/ # Copy-pasteable usage examples
├── params/ # Configuration files
├── tests/ # Unit tests for major functions
├── LICENSE
├── README.md
└── requirements.txt
Detailed explanations for each subfolder are available within their respective READMEs.
Configuration Philosophy
Configuration is managed via YAML files within ./params/:
- default_params.yaml: Base configuration with mandatory default values (do not modify)
- custom_params.yaml: Override specific parameters from defaults
- grid_params.yaml: Parameters specifically for orchestrating grid pipeline runs
Precedence hierarchy:
- For pipeline runs (pipeline.py or CLI): grid_params > custom_params > default_params
- For direct function calls: custom_params > default_params
Parameters can also be directly overridden by passing a Python dictionary at runtime.
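The precedence rules above amount to merging layers of dictionaries, with later layers winning. The sketch below illustrates the idea and is not the package's actual loader: `merge_params` and all parameter names are hypothetical. It also rests on two assumptions, that nested sections merge recursively and that a runtime dictionary takes highest precedence.

```python
from copy import deepcopy


def merge_params(*layers):
    """Merge dicts left to right; later layers win, nested dicts merge recursively."""
    merged = {}
    for layer in layers:
        for key, value in (layer or {}).items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_params(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Hypothetical parameter names, for illustration only.
default_params = {"validation": {"n_bootstrap": 100, "alpha": 0.05}, "seed": 0}
custom_params = {"validation": {"n_bootstrap": 500}}
grid_params = {"seed": 42}
runtime_overrides = {"validation": {"alpha": 0.01}}

# Pipeline run: grid_params > custom_params > default_params, then runtime overrides.
pipeline_config = merge_params(default_params, custom_params, grid_params, runtime_overrides)

# Direct function call: custom_params > default_params, then runtime overrides.
function_config = merge_params(default_params, custom_params, runtime_overrides)

print(pipeline_config)
print(function_config)
```

Because each layer is copied rather than modified in place, the defaults stay untouched, which matches the instruction not to modify default_params.yaml.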
Testing
Unit tests cover all major logical functions, ensuring correctness and robustness. Tests are written using pytest. Short utility functions, simple wrappers, and internal helpers are generally not included.
Run tests via:
pytest tests/
Contributing
We welcome contributions! Follow these guidelines:
- Keep your commits focused and atomic
- Always provide clear, descriptive commit messages
- Add or update tests for any new feature or bug fix
- Follow existing code style (e.g., use black and flake8 for Python formatting)
- Document new functionality thoroughly within the relevant .md file in docs/
- Respect privacy-by-design principles: no logging or external data exposure
Feel free to open issues for discussions or submit pull requests directly.
License
This project is licensed under the Edge Research Personal Use License (ERPUL).
The Edge Research Pipeline is free for personal and academic use.
Commercial use requires a license.
See PRICING.md for full license tiers and support options.
- Free for personal, student, and academic use (with citation)
- Commercial use requires approval (temporarily waived)
- No redistribution without permission
See LICENSE for full terms.