An implementation of iSelf-Discover, an instance-level adaptation of the Self-Discover LLM-based reasoning framework. This repository also contains the original Self-Discover approach.

Effects of structure on reasoning in instance-level Self-Discover

An open-source implementation accompanying the research paper investigating the effects of structured versus unstructured reasoning in Large Language Models using instance-level Self-Discover.


📖 About The Project

This repository contains the code and experimental setup for the research paper titled "Effects of structure on reasoning in instance-level Self-Discover". The project introduces iSELF-DISCOVER, an instance-level adaptation of the SELF-DISCOVER framework, to empirically evaluate the performance of dynamically generated structured JSON reasoning against its unstructured, natural language counterpart.

Our findings, particularly on benchmarks like MATH, BBH, and a replicated T4D, suggest a consistent advantage for unstructured reasoning plans. This work aims to provide insights into optimal plan generation granularity (instance-level vs. task-level) and the nuanced reliance on structured formats for complex LLM problem-solving.


🚀 Key Features

  • Implementation of the iSELF-DISCOVER framework.
  • Support for both structured (JSON) and unstructured (natural language) reasoning plan generation and execution (see the illustrative sketch after this list).
  • Evaluation scripts for benchmarks: BBH, T4D (replicated), and MATH.
  • Integration with models like LLaMA-3.1-405B-Instruct and Mistral-Large via APIs.
  • Configuration for 0-shot and few-shot (e.g., 5-shot) guidance for plan generation.
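
To make the comparison concrete, here is a hypothetical instance-level reasoning plan in both formats. The plan is generated by the model at run time, so this is only an illustrative sketch, not the framework's fixed schema:

    # Illustrative only: hypothetical plans for a simple math word problem.
    structured_plan = {
        "step_1": "Identify the quantities given in the problem.",
        "step_2": "Set up an equation relating those quantities.",
        "step_3": "Solve the equation and verify the result.",
    }
    unstructured_plan = (
        "First identify the quantities given in the problem, then set up an "
        "equation relating them, and finally solve it and verify the result."
    )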

🛠️ Getting Started

Prerequisites

  • Python 3.9+
  • Poetry (for dependency management)
  • API Keys:
    • MISTRAL_API_KEY
    • LAMBDA_LABS_API_KEY

Installation

  1. Clone the repository:

    git clone https://github.com/anonymous/self-discover.git
    cd self-discover
    
  2. Set up a Python virtual environment: create and activate one, for example:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    
  3. Install dependencies using Poetry:

    poetry install
    
  4. Set up API Keys: Create a .env file in the root of the project with your API keys:

    MISTRAL_API_KEY="YOUR_MISTRAL_API_KEY"
    LAMBDA_LABS_API_KEY="YOUR_LAMBDA_LABS_API_KEY"
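
Before launching a long run, you can quickly verify that the keys are picked up. A minimal sketch, assuming the project reads the .env file via python-dotenv (a common pattern; adjust if the repository loads keys differently):

    import os

    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # reads MISTRAL_API_KEY and LAMBDA_LABS_API_KEY from .env
    for key in ("MISTRAL_API_KEY", "LAMBDA_LABS_API_KEY"):
        # Fail early with a clear message rather than mid-evaluation.
        assert os.getenv(key), f"{key} is not set"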
    

⚙️ Configuration

Experiments are primarily configured via evals/config.toml.

Key parameters in evals/config.toml:

[MODEL]
# Specifies the model family to use.
# "mistral" will use "mistral-large-2407".
# "llama" will use "llama3.1-405b-instruct-fp8".
model_type = "mistral"

[EVAL]
# batch_size controls the number of instances processed in a single batch.
# This is useful for managing API rate limits or resource usage.
batch_size = 5
# wait_time is the duration (in seconds) to wait when an exception occurs during
# evaluation, giving breathing room after unexpected API errors.
wait_time = 1
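
For reference, these values can be read with Python's standard tomllib module (Python 3.11+); a minimal sketch, independent of whatever loader the project itself uses:

    import tomllib  # standard library in Python 3.11+

    with open("evals/config.toml", "rb") as f:  # tomllib requires binary mode
        config = tomllib.load(f)

    model_type = config["MODEL"]["model_type"]  # "mistral" or "llama"
    batch_size = config["EVAL"]["batch_size"]   # instances per batch
    wait_time = config["EVAL"]["wait_time"]     # seconds to wait after an exception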

🔬 Running Research Experiments

To reproduce the experiments presented in the paper:

  1. Ensure your Poetry environment is active: If you haven't already, activate it:

    source .venv/bin/activate # Or your chosen environment activation command
    # Alternatively, you can prefix commands with `poetry run`
    
  2. Prepare the log directory: The evaluation scripts check evals/logs for existing progress and resume from it. To run experiments from scratch, delete the evals/logs directory or make sure it is empty. A sketch of this resume check follows below.
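
As an illustration of the resume behaviour, the check amounts to collecting the instance identifiers already present in evals/logs and skipping them. A minimal sketch with a hypothetical JSON-lines log layout (the actual scripts define their own file names and format):

    import json
    from pathlib import Path

    def completed_ids(log_dir: str = "evals/logs") -> set[str]:
        """Collect instance ids already logged so finished work can be skipped."""
        done: set[str] = set()
        for log_file in Path(log_dir).glob("*.jsonl"):  # hypothetical layout
            with log_file.open() as f:
                for line in f:
                    done.add(json.loads(line)["instance_id"])  # hypothetical key
        return done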

  3. Run evaluation scripts (from the project root directory):

    • To evaluate the iSELF-DISCOVER approach (our proposed method): Use evals/iself_discover_eval.py. Key arguments include:

      • --structured: (Flag, no value) Use structured JSON reasoning. If omitted, defaults to unstructured.
      • --few_shot_examples <N>: Number of few-shot examples to use (e.g., 0 for zero-shot, 5 for five-shot). Defaults to 0.
      • --stream: (Flag, no value) Stream LangGraph steps one by one and log debug messages for each output from every stage. Defaults to False.

      Example (unstructured, 0-shot, run from root):

      python evals/iself_discover_eval.py
      

      Example (structured, 5-shot, with streaming, run from root):

      python evals/iself_discover_eval.py --structured --few_shot_examples 5 --stream
      
    • To evaluate the original SELF-DISCOVER approach (baseline): Use evals/self_discover_eval.py. Key arguments include:

      • --phase <PHASE_VALUE>: (Required for research experiments) Specify which stage of the SELF-DISCOVER framework to run.
        • Use 1 for Phase I (discovering the task-level reasoning structure).
        • Use 2 for Phase II (solving instances using the discovered structure). The script may run both phases if this argument is omitted, but for reproducing the research evaluation you must explicitly specify either 1 or 2.
      • --stream: (Flag, no value) Stream LangGraph steps and log debug messages. Defaults to False.

      Example (running Phase I, no streaming, run from root):

      python evals/self_discover_eval.py --phase 1
      

      Example (running Phase II with streaming, run from root):

      python evals/self_discover_eval.py --phase 2 --stream
      

Datasets

The necessary datasets are available locally in the data/ folder:

  • T4D Dataset (data/t4d/t4d.csv): 564 samples of the replicated T4D benchmark
  • BBH Dataset (data/bbh/bbh.csv): 6,511 samples across 25 BIG-Bench Hard subsets
  • MATH Dataset (data/math/math.csv): a 200-sample subset of the MATH benchmark

The evaluation scripts load these datasets automatically and in sequence. No manual dataset placement or downloading is required; the scripts read the local CSV files using the HuggingFace datasets library, as sketched below.
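
Concretely, loading one of the local CSVs with the HuggingFace datasets library looks like this (the column names in each file depend on its actual schema):

    from datasets import load_dataset

    # Load a local CSV as a HuggingFace dataset; nothing is downloaded.
    bbh = load_dataset("csv", data_files="data/bbh/bbh.csv", split="train")
    print(len(bbh))  # 6,511 samples across the 25 BBH subsets
    print(bbh[0])    # first instance as a dict of column -> value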

Expected Output

Experimental results, detailed logs, and any generated reasoning traces will be stored in the evals/logs directory. The structure within this directory should allow for identification of results based on the benchmark, model, and experimental configuration.


📜 License

Distributed under the MIT License. See the LICENSE file for more information.


🙏 Acknowledgements

  • Authors of the original SELF-DISCOVER paper (Zhou et al., 2024).
  • Mistral AI for providing generous free-tier access.
  • The open-source community for tools and libraries that made this work possible.
