IdeaSearch Framework - An AI-powered research idea generation and optimization system
Project description
About IdeaSearch
Project Overview
IdeaSearch framework is an AI-powered Research Idea Generation and Optimization System. In IdeaSearch, an Idea specifically refers to a text file with the .idea extension, which contains the creative content readable and processable by the system. It is inspired by the FunSearch framework, launched by Google in 2023. The FunSearch framework pioneered the discovery of new mathematical structures and algorithms by combining large language models with evaluation programs. Building upon this foundation, IdeaSearch aims to construct a more user-friendly and streamlined integrated framework, with the goal of supporting various fields in scientific research and education.
Compared to FunSearch, IdeaSearch introduces several innovative features that significantly enhance the system's flexibility and exploratory capabilities, including:
- Generation Bonus (generation_bonus): Provides additional score rewards for newly generated Ideas. This mechanism encourages the system to continuously explore and produce novel, more vibrant Ideas, effectively preventing it from getting stuck in local optima too early, and promoting broad exploration of the Idea space.
- Mutation (mutation): Introduces randomness, allowing minor modifications and perturbations to existing Ideas. This injects serendipity and diversity into the Idea search process, helping to discover unexpected new directions or optimize existing Ideas, even in seemingly saturated Idea spaces.
- Crossover (crossover): Generates new hybrid Ideas by combining elements from two or more existing Ideas. This classic operation from genetic algorithms is enhanced in
IdeaSearch; it facilitates more complex evolutionary paths, capable of merging the strengths of different excellent Ideas to produce novel combinations beyond the limitations of single Ideas. - Prompt Prologue (prologue_section) and Epilogue (epilogue_section): Allow users to more flexibly and modularly define the opening and closing parts of the prompts sent to the large language model. This makes it easy for users to provide context, set task objectives, or guide output format without rewriting the entire prompt each time. Furthermore, if the user chooses, they can completely control the prompt generation logic through a custom
generate_prompt_funcfunction, offering great flexibility for various complex scenarios. - Evaluator Information (evaluator info): Beyond providing a quantified score for an Idea, the evaluation function can now return additional string information. This allows users not only to know the quantitative 'goodness' of an Idea but also to understand 'why it's good,' 'where it can be improved,' or 'what makes it unique' through this supplementary information, providing richer context and deeper insights for subsequent Idea optimization and system analysis.
Key Features
- Multi-Island Parallel Search: Supports the creation of multiple independent "islands," each equipped with its own Samplers and Evaluators, to explore the Idea space in parallel, enhancing search efficiency and diversity.
- Large Language Model (LLM) Integration: Manages various LLM models via
ModelManagerand dynamically selects models for Idea generation. - Evolutionary Strategies:
- Sampling: Selects high-quality Ideas as references for generating new ones, based on their scores and temperature parameters.
- Evaluation: Scores each generated Idea using a user-defined
evaluate_func, with optional additional information return. - Mutation: Modifies Ideas slightly through a user-defined
mutation_functo introduce diversity. - Crossover: Combines existing Ideas using a user-defined
crossover_functo produce new hybrid Ideas.
- Dynamic Model Selection and Assessment: Adjusts the probability of models being selected in future rounds based on their performance in generating high-quality Ideas, encouraging better-performing models. Provides visualization of model scores.
- System Overall Assessment and Visualization: Periodically evaluates the overall quality of the entire Idea database using a user-defined
assess_funcand generates charts to display quality trends throughout the evolutionary process. - Data Persistence and Backup: Automatically manages Idea files and score data, supporting backup functionalities to ensure data security during the search process.
- Highly Configurable: Offers a rich set of parameters (via
set_methods) for users to customize search behavior, including model temperatures, sampling strategies, evaluation intervals, similarity thresholds, and more. - Internationalization Support: Includes built-in
gettextinternationalization mechanism, supporting multiple languages (currentlyzh_CNfor Simplified Chinese andenfor English).
Core API
The following methods in the IdeaSearcher class are marked as "⭐️ Important" and constitute the primary interface for user interaction:
__init__():- Function: Initializes an
IdeaSearcherinstance. Sets all default parameters and initializes internal states such as locks, model manager, and island dictionary. - Importance: The class constructor, serving as the starting point for all search parameters.
- Function: Initializes an
run(additional_interaction_num: int):- Function: Initiates the Idea search process. This method initializes and runs the samplers for all islands, allowing each island to evolve for
additional_interaction_numepochs (rounds). - Importance: The core method to start the entire Idea search cycle.
- Function: Initiates the Idea search process. This method initializes and runs the samplers for all islands, allowing each island to evolve for
load_models():- Function: Loads API keys for all LLM models from the configuration file specified by
api_keys_path. - Importance: This method must be called before any LLM interaction can occur.
- Function: Loads API keys for all LLM models from the configuration file specified by
add_island():- Function: Adds a new island to the system and returns its
island_id. The first time an island is added, it performs necessary initialization cleanup (e.g., clearing logs, old Idea directories, and backups). - Importance: Defines the parallelism of the search and creates independent search units.
- Function: Adds a new island to the system and returns its
delete_island(island_id: int):- Function: Deletes the island with the specified
island_idfrom the system. - Importance: Allows for dynamic management of search resources and strategies.
- Function: Deletes the island with the specified
repopulate_islands():- Function: Redistributes Ideas among islands. This method sorts all islands by their "best score" and then copies the best Ideas from the top half of islands to the bottom half, promoting Idea sharing and preventing local optima stagnation.
- Importance: A crucial operation in evolutionary algorithms for global optimization.
get_best_score():- Function: Returns the highest Idea score across all islands.
- Importance: Retrieves the quality metric of the best Idea found so far.
get_best_idea():- Function: Returns the content of the Idea with the highest score across all islands.
- Importance: Retrieves the content of the best Idea found so far.
Important Configuration Parameters
IdeaSearcher provides a rich set of set_ methods to configure its behavior. The following are the most critical parameters that must be set before calling the run() method:
-
set_program_name(value: str): The name of the project, used for logging and identification. -
set_prologue_section(value: str): The fixed introductory part of the prompt for LLMs. -
set_epilogue_section(value: str): The fixed concluding part of the prompt for LLMs.- Note:
prologue_sectionandepilogue_sectioncan be replaced by a customgenerate_prompt_func. Ifgenerate_prompt_funcis set, these two parameters can beNone.
- Note:
-
set_database_path(value: str): The root path for the Idea database. The system will automatically create subdirectories under this path:ideas/(for Idea files),data/(for data files like evaluation results),pic/(for images like trend charts), andlog/(for logs). -
set_models(value: List[str]): A list of LLM model names participating in Idea generation. -
set_model_temperatures(value: List[float]): The sampling temperature for each model, corresponding to themodelslist. The length of this list must match the length of themodelslist. -
set_evaluate_func(value: Callable[[str], Tuple[float, Optional[str]]]): A function used to evaluate a single Idea. It accepts an Idea string as input and returns a tuple containing a float score and optional additional information (Optional[str]). -
set_api_keys_path(value: str): The path to the JSON/YAML file containing API key configurations.API Keys File Format (
api_keys.json): This file should be a JSON object where each top-level key corresponds to a unique model alias that you will use inset_models(). The value for each model alias is a list of dictionaries, where each dictionary represents an instance of that model. This allows you to configure multiple instances (e.g., with different API keys or base URLs) for the same logical model, and the system will automatically manage them.{ "Deepseek_V3": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat" } ], "Deepseek_R1": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://api.deepseek.com/v1", "model": "deepseek-reasoner" } ], "Qwen_Plus": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1", "model": "qwen-plus" } ], "Qwen_Max": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1", "model": "qwen-max" } ], "Gemini_2.0_Flash_Thinking_Experimental": [ { "api_key": "AIzaSyXXXX-XXXXXXXXXXXXXXXXXXXXXXXX", "base_url": "https://generativelanguage.googleapis.com/v1beta/", "model": "gemini-2.0-flash-thinking-exp" } ], "Gemini_1.5_Flash": [ { "api_key": "AIzaSyXXXX-XXXXXXXXXXXXXXXXXXXXXXXX", "base_url": "https://generativelanguage.googleapis.com/v1beta/", "model": "gemini-1.5-flash" } ], "Moonshot_V1_32k": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://api.moonshot.cn/v1", "model": "moonshot-v1-32k" } ], "Moonshot_V1_8k": [ { "api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "base_url": "https://api.moonshot.cn/v1", "model": "moonshot-v1-32k" } ] }
All parameters have corresponding get_ methods to retrieve their current values. For example, ideasearcher.get_program_name().
Workflow Overview
A typical IdeaSearch usage flow is as follows:
-
Instantiation:
from IdeaSearch.ideasearcher import IdeaSearcher ideasearcher = IdeaSearcher()
-
Configuration of Core Parameters:
# Example evaluation function def my_custom_evaluate_function(idea: str) -> Tuple[float, Optional[str]]: # Your evaluation logic goes here, returning a score and optional info score = len(idea) # Simple example: Longer Idea gets higher score info = "Idea length is " + str(len(idea)) return float(score), info ideasearcher.set_program_name("My AI Idea Project") ideasearcher.set_database_path("./my_idea_database") # Ensure this path exists and contains an ideas/initial_ideas/ subdirectory ideasearcher.set_models(["gpt-3.5-turbo", "gpt-4"]) ideasearcher.set_model_temperatures([0.7, 0.5]) ideasearcher.set_evaluate_func(my_custom_evaluate_function) ideasearcher.set_api_keys_path("./api_keys.json") # Assuming your API key file is here ideasearcher.set_prologue_section("Please generate innovative business Ideas for the following topic, be concise:") ideasearcher.set_epilogue_section("Each Idea must start with 'Idea:'.") # More optional configurations... ideasearcher.set_samplers_num(2) # 2 samplers per island ideasearcher.set_evaluators_num(1) # 1 evaluator per island ideasearcher.set_assess_interval(5) # Assess database every 5 rounds ideasearcher.set_generation_bonus(5.0) # New Ideas get a 5-point bonus # ... other set_ methods
-
Loading Model API Keys:
ideasearcher.load_models()
-
Adding Islands:
# Add an island. You can call add_island() multiple times to increase parallel search islands. # Idea files from the initial_ideas directory will be loaded into the first island. island_id_1 = ideasearcher.add_island() print(f"Island {island_id_1} added.")
-
Running the Search:
# Start the multi-threaded search process; each island will evolve for the specified number of rounds. print("Starting IdeaSearch...") ideasearcher.run(additional_interaction_num=50) # Each island runs for 50 epochs print("IdeaSearch finished.")
-
(Optional) Idea Repopulation Across Islands:
# Call during or after the search to facilitate the spread and merging of excellent Ideas. ideasearcher.repopulate_islands()
-
Retrieving Best Results:
best_score = ideasearcher.get_best_score() best_idea = ideasearcher.get_best_idea() print(f"\nCurrent Best Idea Score: {best_score}") print(f"Current Best Idea Content: {best_idea}")
-
Shutting Down Models:
ideasearcher.shutdown_models() print("Models shut down.")
Internationalization
IdeaSearch supports internationalization of its interface text.
You can set the system language using the set_language(value: str) method.
For example, ideasearcher.set_language('en') will switch the interface and log texts to English, while ideasearcher.set_language('zh_CN') will switch to Simplified Chinese.
The default language for IdeaSearch is Simplified Chinese (zh_CN).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ideasearch-0.1.1.tar.gz.
File metadata
- Download URL: ideasearch-0.1.1.tar.gz
- Upload date:
- Size: 139.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ea2c3fb57a4b4332ca6061812b3618d3a7791df5c0a2ba47afb43c5750ccf335
|
|
| MD5 |
41e264449d28c7d521ad3f549abc6035
|
|
| BLAKE2b-256 |
fa619e6ae9b6ab4c08ba9893029fcd62b98f2efc7722e5616bbba208a001aea6
|
File details
Details for the file ideasearch-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ideasearch-0.1.1-py3-none-any.whl
- Upload date:
- Size: 60.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9a7dd1e2d23cf202d3bce9aab74527e3bb8a70939078e72014b7d6cba54e1ebe
|
|
| MD5 |
d41aa61b1ee0aa90547d6f73b13685e0
|
|
| BLAKE2b-256 |
7bd58224e09253bfeb50899f0e0aa2735b9b22272f02927d19218f1b683c3b81
|