
Modular pipeline for quantitative signal discovery and validation

Reason this release was yanked:

Broken dependency on the Python 2-only Orange==2.7.8. Please use v0.1.1 or later.

Project description

🧠 Edge Research Pipeline

The Edge Research Pipeline is a modular, privacy-first research toolkit designed for discovering, validating, and analyzing patterns in tabular datasets. Originally built for quantitative finance, its techniques are broadly applicable to any domain involving structured data and statistical rule discovery.


🚀 Key Features

A flexible, modular Python library enabling you to:

  • Clean, normalize, and transform tabular datasets
  • Engineer features relevant to finance, statistics, and other structured-data domains
  • Generate and label custom targets for supervised tasks
  • Discover signals using rule mining and pattern search methods
  • Perform robust validation tests (e.g., train/test splits, bootstrap, walk-forward analysis, false discovery rate)
  • Reproduce results with complete configuration export and local-only processing
  • Efficiently execute parameter grids via function calls or a CLI
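False discovery rate control matters for rule mining because screening many candidate rules at a fixed p-value threshold all but guarantees some spurious hits. This page does not say which FDR procedure the library implements. As a minimal sketch, assuming the standard Benjamini-Hochberg step-up procedure, a hypothetical `benjamini_hochberg` helper (not the library's API) could look like this:

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis under Benjamini-Hochberg FDR control.

    Hypothetical helper for illustration. With sorted p-values p_(1) <= ... <= p_(m),
    every hypothesis ranked at or below the largest k satisfying
    p_(k) <= (k / m) * alpha is rejected (i.e. kept as a discovery).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest rank k whose p-value passes its threshold
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

For example, `benjamini_hochberg([0.02, 0.03, 0.04])` keeps all three rules even though 0.02 exceeds the first threshold (0.05/3 ≈ 0.0167), because the procedure steps up from the largest rank that passes (0.04 ≤ 0.05).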

🔒 Privacy by Design

All computations run locally, so no data ever leaves your environment. The toolkit is designed explicitly for regulated industries, confidential research, and reproducible workflows.


📦 Installation

Install required dependencies using:

pip install -r ./requirements.txt

Note: Dependencies were generated via pipreqs and may need further validation.


🧩 Quick Start Example

Run a full pipeline example via the command line:

python edge_research/pipeline/main.py params/grid_params.yaml

Or check the ready-to-run examples in the examples/ directory.


📁 Project Structure

edge-research-pipeline
├── data/                  # Sample datasets (sandbox only)
├── docs/                  # Documentation per module
├── edge_research/         # Core logic modules
│   ├── logger/
│   ├── pipeline/
│   ├── preprocessing/
│   ├── rules_mining/
│   ├── statistics/
│   ├── utils/
│   └── validation_tests/
├── examples/              # Copy-pasteable usage examples
├── params/                # Configuration files
├── tests/                 # Unit tests for major functions
├── LICENSE
├── README.md
└── requirements.txt

Detailed explanations for each subfolder are available within their respective READMEs.


⚙️ Configuration Philosophy

Configuration is managed through YAML files in ./params/:

  • default_params.yaml: Base configuration with mandatory default values (do not modify)
  • custom_params.yaml: Override specific parameters from defaults
  • grid_params.yaml: Parameters specifically for orchestrating grid pipeline runs

Precedence hierarchy:

  • For pipeline runs (pipeline.py or CLI): grid_params > custom_params > default_params
  • For direct function calls: custom_params > default_params

Parameters can also be directly overridden by passing a Python dictionary at runtime.
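The precedence rules above amount to layering dictionaries from lowest to highest priority. This page does not say how the library merges its files (for example, whether nested sections merge key by key or are replaced wholesale), so the following is only a minimal sketch assuming a recursive merge. `deep_merge` and `resolve_params` are hypothetical names, not the library's API, and ranking runtime overrides above grid parameters is likewise an assumption.

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which keys from `override` win.

    Nested dicts are merged recursively; inputs are never mutated.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_params(
    default: Dict[str, Any],
    custom: Dict[str, Any],
    grid: Optional[Dict[str, Any]] = None,
    runtime: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Apply layers in assumed order: default < custom < grid < runtime."""
    params = deep_merge(default, custom)
    if grid is not None:  # only present for pipeline / CLI grid runs
        params = deep_merge(params, grid)
    if runtime is not None:  # ad-hoc Python dictionary overrides (assumed highest)
        params = deep_merge(params, runtime)
    return params
```

In practice each layer would first be loaded from its YAML file, for instance with PyYAML's `yaml.safe_load`; a direct function call simply omits the `grid` layer.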


🧪 Testing

Unit tests, written with pytest, cover all major logic functions to check their correctness and robustness. Short utility functions, simple wrappers, and internal helpers are generally not tested.

Run tests via:

pytest tests/
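As an illustration of the kind of module that lives under tests/, here is a self-contained pytest file. `min_max_scale` is a hypothetical helper written only for this example and is not taken from the library.

```python
# Hypothetical file: tests/test_min_max_scale_example.py


def min_max_scale(values):
    """Hypothetical helper: rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale_maps_extremes_to_unit_interval():
    # Minimum maps to 0.0, maximum to 1.0, midpoint to 0.5
    assert min_max_scale([3.0, 7.0, 5.0]) == [0.0, 1.0, 0.5]


def test_min_max_scale_stays_in_bounds_and_keeps_positions():
    scaled = min_max_scale([10.0, -2.0, 4.0])
    assert all(0.0 <= v <= 1.0 for v in scaled)
    assert scaled == [1.0, 0.0, 0.5]
```

`pytest tests/` picks this file up because, by default, pytest collects files named `test_*.py` and functions whose names start with `test_`. Expected exceptions, such as the `ValueError` above, are usually checked with `pytest.raises`.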

🤝 Contributing

We welcome contributions! Follow these guidelines:

  • Keep your commits focused and atomic
  • Always provide clear, descriptive commit messages
  • Add or update tests for any new feature or bug fix
  • Follow the existing code style (e.g., format with black and lint with flake8)
  • Document new functionality thoroughly within the relevant .md file in docs/
  • Respect privacy-by-design principles: no logging or external data exposure

Feel free to open issues for discussions or submit pull requests directly.


📄 License

This project is licensed under the Edge Research Personal Use License (ERPUL).

  • ✅ Free for personal, student, and academic use (with citation)
  • 💼 Commercial use requires approval (temporarily waived)
  • 🔒 No redistribution without permission

See LICENSE for full terms.

License: ERPUL



Download files


Source Distribution

edge_research_pipeline-0.1.0.tar.gz (152.3 kB)

Uploaded Source

Built Distribution


edge_research_pipeline-0.1.0-py3-none-any.whl (111.8 kB)

Uploaded Python 3

File details

Details for the file edge_research_pipeline-0.1.0.tar.gz.

File metadata

  • Download URL: edge_research_pipeline-0.1.0.tar.gz
  • Size: 152.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for edge_research_pipeline-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        536e152ed65c3a92e0dc41eece7e029a1cb4908285ce7f54368a6bc232cfcd4c
MD5           0fa3f4431051b7e1c4d5a8e14d49064d
BLAKE2b-256   ef4ee5a8bb4c476769ac0489796c1ede9d0be5c3d2c221080f938b73e64cea40
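To confirm that a downloaded archive matches the published SHA256 digest, the file can be hashed locally with Python's standard-library hashlib. The helper names below are illustrative, and the script assumes the sdist sits in the current working directory.

```python
import hashlib
from pathlib import Path
from typing import Union

# Digest copied from the "File hashes" table for the source distribution.
EXPECTED_SDIST_SHA256 = "536e152ed65c3a92e0dc41eece7e029a1cb4908285ce7f54368a6bc232cfcd4c"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_sha256: str) -> bool:
    """Return True if the file's SHA-256 hex digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    sdist = Path("edge_research_pipeline-0.1.0.tar.gz")
    if sdist.exists():
        print("OK" if verify_download(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found in the current directory")
```

On Python 3.11 and later, `hashlib.file_digest(fh, "sha256")` performs the same chunked read. pip can also enforce digests at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file, or `--require-hashes`).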


File details

Details for the file edge_research_pipeline-0.1.0-py3-none-any.whl.


File hashes

Hashes for edge_research_pipeline-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        b1c2db63534698344e7b89d7988221ff5644095a006d1468ded08d2737745f44
MD5           15c91d1edb70e9db0b5c09beb12a2140
BLAKE2b-256   bc496ff14502a2e4054df596d0d9dd5a97c71ff485e46527f3487ec9ed07b1c5

