# Effects of structure on reasoning in instance-level Self-Discover

An open-source implementation for the research paper investigating the effects of structured versus unstructured reasoning in Large Language Models, built around iSELF-DISCOVER, an instance-level adaptation of the SELF-DISCOVER LLM reasoning framework. The repository also contains the original SELF-DISCOVER approach.
## 📖 About The Project
This repository contains the code and experimental setup for the research paper titled "Effects of structure on reasoning in instance-level Self-Discover". The project introduces iSELF-DISCOVER, an instance-level adaptation of the SELF-DISCOVER framework, to empirically evaluate the performance of dynamically generated structured JSON reasoning against its unstructured, natural language counterpart.
Our findings, particularly on benchmarks like MATH, BBH, and a replicated T4D, suggest a consistent advantage for unstructured reasoning plans. This work aims to provide insights into optimal plan generation granularity (instance-level vs. task-level) and the nuanced reliance on structured formats for complex LLM problem-solving.
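To make the contrast concrete, here is a purely illustrative sketch of the two plan formats. The task, keys, and step wording below are invented for illustration and are not taken from the paper's actual prompts:

```python
# Hypothetical example of the two reasoning-plan formats compared in the paper.
# The plan contents below are invented for illustration only.

# Structured plan: a JSON-like nested structure the model must fill in.
structured_plan = {
    "step_1": {"action": "Identify what the question asks for", "result": ""},
    "step_2": {"action": "Extract the relevant quantities", "result": ""},
    "step_3": {"action": "Apply the appropriate formula", "result": ""},
    "final_answer": "",
}

# Unstructured plan: the same guidance expressed as free-form natural language.
unstructured_plan = (
    "First, identify what the question asks for. Then extract the relevant "
    "quantities, apply the appropriate formula, and state the final answer."
)
```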
## 🚀 Key Features
- Implementation of the iSELF-DISCOVER framework.
- Support for both structured (JSON) and unstructured (natural language) reasoning plan generation and execution.
- Evaluation scripts for benchmarks: BBH, T4D (replicated), and MATH.
- Integration with models like LLaMA-3.1-405B-Instruct and Mistral-Large via APIs.
- Configuration for 0-shot and few-shot (e.g., 5-shot) guidance for plan generation.
## 🛠️ Getting Started
### Prerequisites
- Python (e.g., 3.9+)
- Poetry (for dependency management)
- API keys:
  - `MISTRAL_API_KEY`
  - `LAMBDA_LABS_API_KEY`
### Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/anonymous/self-discover.git
   cd self-discover
   ```

2. Set up your Python virtual environment. It's recommended to create and activate a virtual environment, for example:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies using Poetry:

   ```bash
   poetry install
   ```

4. Set up API keys. Create a `.env` file in the root of the project with your API keys:

   ```bash
   MISTRAL_API_KEY="YOUR_MISTRAL_API_KEY"
   LAMBDA_LABS_API_KEY="YOUR_LAMBDA_LABS_API_KEY"
   ```
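If you want to verify the keys are readable before launching a long run, a minimal check might look like the following. It assumes the `python-dotenv` package is available; whether the evaluation scripts themselves use `python-dotenv` is an assumption here, not a documented requirement of this repository:

```python
# Minimal sanity check that the .env keys are readable.
# Assumes the python-dotenv package; the repo's actual loading
# mechanism may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("MISTRAL_API_KEY", "LAMBDA_LABS_API_KEY"):
    print(key, "set" if os.getenv(key) else "MISSING")
```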
## ⚙️ Configuration
Experiments are primarily configured via `evals/config.toml`.
Key parameters in `evals/config.toml`:
```toml
[MODEL]
# Specifies the model family to use.
# "mistral" will use "mistral-large-2407".
# "llama" will use "llama3.1-405b-instruct-fp8".
model_type = "mistral"

[EVAL]
# batch_size controls the number of instances processed in a single batch.
# This is useful for managing API rate limits or resource usage.
batch_size = 5

# wait_time is the duration (in seconds) to wait if an exception arises
# during evaluation, giving breathing room when unexpected API errors occur.
wait_time = 1
```
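For reference, a file like this can be read with the standard library's `tomllib` (Python 3.11+; on 3.9/3.10 the third-party `tomli` package offers the same API). This is a sketch of how such a config is typically consumed, not necessarily how the repository's own code loads it:

```python
# Sketch: reading evals/config.toml with the standard library.
# tomllib requires Python 3.11+; the repo's own loader may differ.
import tomllib

with open("evals/config.toml", "rb") as f:
    config = tomllib.load(f)

model_type = config["MODEL"]["model_type"]  # e.g. "mistral"
batch_size = config["EVAL"]["batch_size"]   # instances per batch
wait_time = config["EVAL"]["wait_time"]     # seconds to back off on errors
print(model_type, batch_size, wait_time)
```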
## 🔬 Running Research Experiments
To reproduce the experiments presented in the paper:
1. Ensure your Poetry environment is active. If you haven't already, activate it:

   ```bash
   source .venv/bin/activate  # Or your chosen environment activation command
   # Alternatively, you can prefix commands with `poetry run`
   ```

2. Prepare the log directory. The evaluation scripts are designed to check for current progress in `evals/logs` and resume (see the sketch after this list). If you want to run experiments from scratch, ensure the `evals/logs` directory is deleted or empty.

3. Run the evaluation scripts (from the project root directory):

   - To evaluate the iSELF-DISCOVER approach (our proposed method), use `evals/iself_discover_eval.py`. Key arguments include:
     - `--structured`: (flag, no value) use structured JSON reasoning. If omitted, defaults to unstructured.
     - `--few_shot_examples <N>`: number of few-shot examples to use (e.g., `0` for zero-shot, `5` for five-shot). Defaults to `0`.
     - `--stream`: (flag, no value) stream LangGraph steps one by one and log debug messages for each output from every stage. Defaults to `False`.

     Example (unstructured, 0-shot, run from root):

     ```bash
     python evals/iself_discover_eval.py
     ```

     Example (structured, 5-shot, with streaming, run from root):

     ```bash
     python evals/iself_discover_eval.py --structured --few_shot_examples 5 --stream
     ```

   - To evaluate the original SELF-DISCOVER approach (baseline), use `evals/self_discover_eval.py`. Key arguments include:
     - `--phase <PHASE_VALUE>`: (required for research experiments) specify the stage of the SELF-DISCOVER framework to run. Use `1` for Phase I (task-specific reasoning structure discovery) and `2` for Phase II (solving instances using a discovered structure). While the script might default to running both if this argument is omitted, for research evaluation purposes you must explicitly specify either `1` or `2`.
     - `--stream`: (flag, no value) stream LangGraph steps and log debug messages. Defaults to `False`.

     Example (running Phase I, no streaming, run from root):

     ```bash
     python evals/self_discover_eval.py --phase 1
     ```

     Example (running Phase II with streaming, run from root):

     ```bash
     python evals/self_discover_eval.py --phase 2 --stream
     ```
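The exact resume logic lives in the evaluation scripts themselves. As a rough sketch of the idea (the log layout and file names here are hypothetical, not the repository's actual scheme), resuming amounts to skipping instances whose results already exist under `evals/logs`:

```python
# Hypothetical sketch of resume-from-logs behavior; the real scripts'
# log layout and file names may differ.
from pathlib import Path

LOG_DIR = Path("evals/logs")

def pending_instances(instance_ids):
    """Yield only instances that do not already have a logged result."""
    for instance_id in instance_ids:
        if not (LOG_DIR / f"{instance_id}.json").exists():  # hypothetical layout
            yield instance_id

# Deleting or emptying evals/logs therefore forces a run from scratch.
```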
### Datasets
The necessary datasets are available locally in the `data/` folder:

- T4D dataset (`data/t4d/t4d.csv`): 564 samples of the replicated T4D benchmark
- BBH dataset (`data/bbh/bbh.csv`): 6,511 samples across 25 BIG-Bench Hard subsets
- MATH dataset (`data/math/math.csv`): 200 samples of the MATH benchmark subsample
The evaluation scripts are configured to load these datasets automatically and sequentially. No manual downloading or placement is required; the scripts load the local CSV files using the HuggingFace datasets library.
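The scripts handle this for you; as a reference, loading one of these CSVs with the HuggingFace `datasets` library looks like the following (the column names are illustrative and not guaranteed to match these files):

```python
# Sketch: loading one of the local benchmark CSVs with HuggingFace datasets.
# The evaluation scripts do this automatically; shown here for reference.
from datasets import load_dataset

bbh = load_dataset("csv", data_files="data/bbh/bbh.csv")["train"]
print(len(bbh))           # expected: 6511 rows
print(bbh.column_names)   # actual columns depend on the CSV
```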
### Expected Output
Experimental results, detailed logs, and any generated reasoning traces are stored in the `evals/logs` directory. The structure within this directory allows results to be identified by benchmark, model, and experimental configuration.
## 📜 License
Distributed under the MIT License. See LICENSE file for more information.
## 🙏 Acknowledgements
- Authors of the original SELF-DISCOVER paper (Zhou et al., 2024).
- Mistral AI for providing generous free-tier access.
- The open-source community for tools and libraries that made this work possible.