Modular pipeline for quantitative signal discovery and validation

Project description

🧠 Edge Research Pipeline

The Edge Research Pipeline is a modular, privacy-first research toolkit for rule mining, pattern discovery, and interpretable machine learning on tabular datasets. It supports automated feature engineering, target labeling, robust validation, and signal discovery workflows in domains such as quantitative finance, structured data mining, and subgroup analysis. Its techniques apply broadly to any domain involving structured data and statistical rule discovery.


🚀 Key Features

A flexible, modular Python library enabling you to:

  • Clean, normalize, and transform tabular datasets
  • Engineer features relevant to finance, statistics, and other structured-data domains
  • Generate and label custom targets for supervised tasks
  • Discover signals using rule mining and pattern search methods
  • Perform robust validation tests (e.g., train/test splits, bootstrap, walk-forward analysis, false discovery rate)
  • Reproduce results with complete configuration export and local-only processing
  • Efficiently execute parameter grids via function calls or a CLI
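
To make the walk-forward idea from the validation bullet concrete, here is a minimal, library-independent sketch. The function name `walk_forward_splits` and its parameters are hypothetical and are not part of this package's API; the sketch only illustrates the general technique of training on a window and testing on the rows that immediately follow it.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that roll forward through ordered rows.

    Hypothetical helper for illustration only; not the pipeline's own API.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += test_size  # slide the window forward by one test block


# Ten time-ordered rows, train on 4, test on the next 2.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every test window sits strictly after its training window, so a rule is never evaluated on rows that precede the data it was mined from.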

🔒 Privacy by Design

All computations run locally; no data ever leaves your environment. The toolkit is designed explicitly for regulated industries, confidential research, and reproducible workflows.


📦 Installation

Install the latest release from PyPI:

pip install edge-research-pipeline

📦 View on PyPI →


🛠️ Advanced Option (Dev/Offline)

To install using the repo-based dependencies file:

pip install -r ./requirements.txt

Note: This file was generated via pipreqs and may need further validation in some environments.


⚠️ Compatibility Notes & Optional Dependencies

This project includes optional support for advanced mining and synthetic data tools like orange3 and synthcity. These libraries are powerful but have strict, conflicting version requirements that cannot be satisfied simultaneously in a single install.

🧨 Known Conflicts

  • orange3 requires xgboost >=1.7.4, <2.1
  • synthcity requires xgboost >=2.1.0
  • xgbse (a dependency of synthcity) enforces this version split
  • Installing both libraries together will cause pip install to fail due to an irreconcilable conflict on xgboost
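
The conflict can be checked by hand: the two version ranges listed above do not overlap. The following sketch uses only the bounds stated in this section; the candidate version strings are illustrative inputs, and the helper names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.1' or '2.1.0' into a comparable 3-part tuple such as (2, 1, 0)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '2.1' compares equal to '2.1.0'
    return tuple(parts)


def ok_for_orange3(version: str) -> bool:
    # Bound quoted above: xgboost >=1.7.4, <2.1
    return parse_version("1.7.4") <= parse_version(version) < parse_version("2.1")


def ok_for_synthcity(version: str) -> bool:
    # Bound quoted above: xgboost >=2.1.0
    return parse_version(version) >= parse_version("2.1.0")


# Illustrative candidates on both sides of the 2.1 boundary.
candidates = ["1.7.4", "1.7.6", "2.0.3", "2.1.0", "2.1.1"]
both = [v for v in candidates if ok_for_orange3(v) and ok_for_synthcity(v)]
print("versions acceptable to both:", both)
```

Because one range ends exactly where the other begins, no xgboost version can satisfy both requirements at once.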

✅ Resolution

To avoid these conflicts:

  • The core package does not include orange3 or synthcity by default
  • You can install them separately using extras:
pip install "edge-research-pipeline[orange]"     # for orange3-based rule data generation
pip install "edge-research-pipeline[synth]"      # for synthetic data workflows

⚠️ Note: Installing both orange3 and synthcity via extras will fail due to incompatible xgboost requirements. If you need both, install the pipeline without either extra:

pip install edge-research-pipeline

Then manually install each library:

pip install orange3
pip install synthcity

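Because each command is resolved on its own, pip does not reject the combination up front, so both libraries can be installed side by side. The second install may replace xgboost with a version the first library does not declare support for, and pip may warn about the resulting conflict, so you will need to manage compatibility manually.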


⚠️ Additional Dependency Warnings

Some third-party tools (e.g., torch, scipy, pandas, databricks, ydata-profiling) may also have mutually incompatible version constraints depending on your environment. We strongly recommend installing this package in a clean virtual environment to prevent dependency resolution issues:

python -m venv erp_env
.\erp_env\Scripts\activate      # Windows
# source erp_env/bin/activate   # macOS/Linux
pip install edge-research-pipeline
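
If you are unsure whether the environment is active before installing, a quick check from Python can help. This is a generic standard-library sketch, not something shipped with the package; the function name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```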

🧩 Quick Start Example

Run a full pipeline example via the command line:

python edge_research/pipeline/main.py params/grid_params.yaml

Or check the ready-to-run examples in the examples/ directory.


📁 Project Structure

edge-research-pipeline
├── data/                  # Sample datasets (sandbox only)
├── docs/                  # Documentation per module
├── edge_research/         # Core logic modules
│   ├── logger/
│   ├── pipeline/
│   ├── preprocessing/
│   ├── rules_mining/
│   ├── statistics/
│   ├── utils/
│   └── validation_tests/
├── examples/              # Copy-pasteable usage examples
├── params/                # Configuration files
├── tests/                 # Unit tests for major functions
├── LICENSE
├── README.md
└── requirements.txt

Detailed explanations for each subfolder are available within their respective READMEs.


⚙️ Configuration Philosophy

Configuration is managed via YAML files in ./params/:

  • default_params.yaml: Base configuration with mandatory default values (do not modify)
  • custom_params.yaml: Override specific parameters from defaults
  • grid_params.yaml: Parameters specifically for orchestrating grid pipeline runs

Precedence hierarchy:

  • For pipeline runs (pipeline.py or CLI): grid_params > custom_params > default_params
  • For direct function calls: custom_params > default_params

Parameters can also be directly overridden by passing a Python dictionary at runtime.
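
The precedence rules above can be pictured as a layered dictionary merge. The sketch below is an illustration under stated assumptions, not the package's own loader: the helper names (`deep_merge`, `resolve_params`) and the parameter keys are hypothetical, nested keys are assumed to merge recursively, and the runtime dictionary is assumed to take the highest precedence.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

Params = Dict[str, Any]


def deep_merge(base: Params, override: Params) -> Params:
    """Return a copy of `base` with values from `override` layered on top."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = deepcopy(value)  # override wins for plain values
    return merged


def resolve_params(
    default: Params,
    custom: Params,
    grid: Optional[Params] = None,
    runtime: Optional[Params] = None,
) -> Params:
    """Apply layers from lowest to highest precedence (assumed ordering)."""
    resolved = deep_merge(default, custom)
    if grid:  # only supplied for pipeline runs
        resolved = deep_merge(resolved, grid)
    if runtime:  # assumption: a runtime dict overrides everything else
        resolved = deep_merge(resolved, runtime)
    return resolved


# Hypothetical keys, standing in for the contents of the three YAML files.
default = {"validation": {"n_bootstrap": 100, "test_size": 0.2}, "seed": 0}
custom = {"validation": {"test_size": 0.3}}
grid = {"seed": 42}
runtime = {"validation": {"n_bootstrap": 500}}

direct_call = resolve_params(default, custom)
pipeline_run = resolve_params(default, custom, grid, runtime)
print(direct_call)
print(pipeline_run)
```

Applying the layers in a fixed order keeps the defaults file untouched while making it easy to see which layer supplied a given value.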


🧪 Testing

Unit tests, written with pytest, cover the major logical functions. Short utility functions, simple wrappers, and internal helpers are generally not covered.

Run tests via:

pytest tests/

🤝 Contributing

We welcome contributions! Follow these guidelines:

  • Keep your commits focused and atomic
  • Always provide clear, descriptive commit messages
  • Add or update tests for any new feature or bug fix
  • Follow existing code style (e.g., use black for formatting and flake8 for linting)
  • Document new functionality thoroughly within the relevant .md file in docs/
  • Respect privacy-by-design principles: no logging or external data exposure

Feel free to open issues for discussions or submit pull requests directly.


📄 License

This project is licensed under the Edge Research Personal Use License (ERPUL). The Edge Research Pipeline is free for personal and academic use.
Commercial use requires a license.

👉 See PRICING.md for full license tiers and support options.

  • ✅ Free for personal, student, and academic use (with citation)
  • 💼 Commercial use requires approval (temporarily waived)
  • 🔒 No redistribution without permission

See LICENSE for full terms.

Download files

Source Distribution

edge_research_pipeline-1.1.0.tar.gz (1.3 MB)

Uploaded Source

Built Distribution

edge_research_pipeline-1.1.0-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file edge_research_pipeline-1.1.0.tar.gz.

File metadata

  • Download URL: edge_research_pipeline-1.1.0.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for edge_research_pipeline-1.1.0.tar.gz
Algorithm Hash digest
SHA256 17dfd8a59a63478cc0fc7ebfa9e63a42c99dcea579fc4cd03e72f3541eb1e5ab
MD5 364609175006c77c45936ebcf06a8244
BLAKE2b-256 7b3232e184ff40766b8ad2106fbf6919a67e5cc2d04c111913932c3ff0e6ea7e
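
A downloaded file can be compared against the SHA256 digest listed above with the standard library. This is a minimal sketch: the helper names are hypothetical, and the local path assumes the source distribution was saved to the current directory under its published file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest copied from the listing above for the source distribution.
EXPECTED = "17dfd8a59a63478cc0fc7ebfa9e63a42c99dcea579fc4cd03e72f3541eb1e5ab"

# Assumed download location; adjust to wherever the file was saved.
sdist = Path("edge_research_pipeline-1.1.0.tar.gz")
if sdist.exists():
    print("hash matches:", matches_published_hash(sdist, EXPECTED))
else:
    print("file not found, nothing to verify")
```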

File details

Details for the file edge_research_pipeline-1.1.0-py3-none-any.whl.

File hashes

Hashes for edge_research_pipeline-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 66a89d33f72d3231097c3f416b3e1f8a16e4aa2d40ca1c39baf7c9a490db6d84
MD5 4cf07f44670e19a4c9e07b77d99291a8
BLAKE2b-256 be563013c8247710f367ae99d11a5d9207f23d5bce21aad3e7169508ce2b6ef4
