CLI research agent for restaurant web research

RestaurantRAG README

What it is

This is an agentic RAG (Retrieval-Augmented Generation) pipeline for researching restaurants. Essentially:

  1. You give it a CSV containing (restaurant name, general location of restaurant) pairs, e.g., (East is East, Vancouver Broadway)
  2. You give it a series of fields you want the AI to fill out after conducting research (e.g., menu items, price, address, website, etc.)
  3. For each restaurant in the CSV, the AI autonomously researches it by (1) making targeted web searches, (2) reading the results, (3) deciding whether it needs more info and, if so, searching again, and (4) eventually reaching a point where it can synthesise the findings into an answer
  4. The app returns the LLM-filled contents of the requested fields back to you in a uniform JSON structure (illustrated below)
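
For illustration, one restaurant's entry in that JSON might look like this (the field names here are hypothetical; the actual keys are whatever fields you define in Examples.md):

{
  "restaurant": "East is East",
  "location": "Vancouver Broadway",
  "fields": {
    "address": "...",
    "website": "...",
    "menu_items": ["...", "..."]
  }
}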

Background

I needed an agentic RAG system that would systematically gather specific info on a bunch of restaurants / places. I tried to build my own, but it ended up working horribly, and fixing it was like stuffing cash into a pocket with a hole in it.

Luckily, there was a pre-existing RAG system available on GitHub that I could use -> https://github.com/serpapi/web-research-agent (THIS PROJECT IS AN ADAPTATION OF THAT ORIGINAL REPO! Check the license)

However, there were two things I needed to resolve:

  1. This repo used OpenAI's (LLM provider) and SerpAPI's (web search tool provider) APIs. I used Hack Club AI (which is apparently largely compatible with OpenAI's API) and Serper.dev's API instead.
  2. This repo was meant to be used in a conversation-like style. I needed to fetch data in bulk. Hence, I needed a system where I can input restaurants and output a list of results featuring specific fields for each restaurant.

...and so that's exactly what I did.
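
For the first change, swapping providers mostly means pointing the OpenAI client at a different base URL. A minimal sketch, assuming Hack Club AI exposes an OpenAI-compatible endpoint (the base URL and model name below are assumptions; check Hack Club AI's docs):

import os
from openai import OpenAI

# Standard OpenAI client pointed at an OpenAI-compatible provider.
client = OpenAI(
    api_key=os.environ["HACKCLUB_API_KEY"],
    base_url="https://ai.hackclub.com",  # assumed endpoint; verify against Hack Club AI's docs
)

resp = client.chat.completions.create(
    model="gpt-4o",  # whatever model the provider actually exposes
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)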

File structure

Examples.md contains a section of the task prompt; this file stores and describes the fields which we want the AI to research and provide per restaurant.

fields.md is NOT it! (idk why I still have that; I think it was just inspirational ideas about what fields to add next?)

full_test_list.md contains a full roster of restaurants organized in the CSV format. Simply paste them under the CSV headers in restaurants.csv

Speaking of restaurants.csv: this is the file the program reads to know which restaurants to research. Be careful! The header naming and other restrictions for restaurants.csv are very strict! (See around lines 491 - 509 of research_agent.py.)
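
For reference, the sheet is a simple two-column CSV along these lines (the header names here are illustrative only; the real required headers are the ones enforced in research_agent.py around those lines):

restaurant_name,location
East is East,Vancouver Broadway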

issues.md is just commentary: future improvements and current speculation about potential failure modes. I find the program sufficiently good that it collects data thoroughly, and whenever it breaks, re-running it for the affected restaurant would probably work, so given the limited time and the APs I'm not going to bother touching it. (archive.txt is also commentary / archive.)

output.json is where the output data is stored, after the model researches the restaurants and comes up with its answers.

requirements.txt (duh)

research_agent.py THIS IS THE MAIN SCRIPT! See the explanation below.

Code structure

(In the outline below, a top-level bullet marks a big section of the script; the indented sub-bullets under it are the small sections inside that big section.)

  • Imports

  • Functions for parsing restaurants.csv
    • Mechanism and limitations for retrying
    • Validate empty or corrupted outputs (see the retry sketch after this list)

  • Core research agent class
    • Establish the Hack Club LLM endpoint, set API keys

  • Toolbox definition for the agent!

  • Web searching function
    • Function used for searching with the Serper.dev API (see the search sketch after this list)
    • Parsing search results (regular web results, Google answer box, Google knowledge panel, geographically local results)

  • Main loop for the agent
    • Logging (debug printing) helper function
    • Main infinite loop until an answer / conclusion is reached
    • Check + format LLM tool calls; execute the requested searches in parallel (# of concurrent searches customisable)
    • Results get added back to the conversation, the LLM reads them, decides what to do, and the main loop loops -> OR
    • When the LLM is satisfied and decides not to call tools anymore, extract, format, and log the final answer.

  • Command line interface
    • Parse arguments given on the command line
    • Helper function to build the task prompt for one specific restaurant
    • Helper function for parsing rows from the spreadsheet to extract restaurant name and rough location
    • Batch mode (with sheet) vs query mode (single query)
    • Print and save the result
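
Since the retry/validation helpers aren't shown in this README, here is a minimal sketch of the idea, with hypothetical names (run_once stands in for one full research pass on a single restaurant):

import json
import time

def research_with_retry(run_once, max_retries=3, delay=5):
    """Call run_once() until it yields valid JSON with at least one non-empty field."""
    for attempt in range(1, max_retries + 1):
        raw = run_once()
        try:
            data = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            data = None  # corrupted output: not even valid JSON
        # Reject empty outputs too, then back off and retry.
        if isinstance(data, dict) and any(v not in (None, "", []) for v in data.values()):
            return data
        time.sleep(delay)
    return None  # caller can re-run this restaurant later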
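
And a sketch of the search function: one Serper.dev call, with results sorted into the four buckets listed above (the response key names reflect Serper's documented fields as I understand them; verify against their current docs):

import os
import requests

def serper_search(query, top_n=10):
    """Run one Google search through Serper.dev and bucket the results by origin."""
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={
            "X-API-KEY": os.environ["SERPERDEV_API_KEY"],
            "Content-Type": "application/json",
        },
        json={"q": query, "num": top_n},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "web": data.get("organic", []),           # regular web results
        "answer_box": data.get("answerBox"),      # Google answer box
        "knowledge": data.get("knowledgeGraph"),  # Google knowledge panel
        "local": data.get("places", []),          # geographically local results
    }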

How it works (basically)

A question comes in -> the LLM gets the question (plus conversation history) and the tools -> "do I need to use my tools to get more knowledge?"

One of two decisions:

if LLM called tools: formulate the specific queries, perform searches with those queries, add the results back to the conversation, and the LLM gets the conversation history and tools again (LOOP)

if LLM doesn't call tools: it is done. Extract the output and return the final answer to the user.
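
In code terms, that decision loop condenses to something like the sketch below, using OpenAI-style tool calling and the hypothetical serper_search sketch from earlier (the real loop in research_agent.py additionally handles retries, logging, and running the searches in parallel):

import json

def run_agent(client, model, messages, tools):
    """Loop until the model stops calling tools, then return its final answer."""
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no tool calls -> the LLM is satisfied
            return msg.content       # extract and return the final answer
        messages.append(msg)         # keep the assistant turn in the history
        for call in msg.tool_calls:  # sequential here for clarity; the script parallelises
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(serper_search(args["query"])),
            })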

How to set it up (deployment)

Since this is a command-line program and still requires environment setup, the closest thing to a "deployment" would be to set up the repo on the other person's computer. Hence,

On Windows, open up PowerShell:

  1. git clone https://github.com/fengyuan66/web-research-agent-main.git
  2. cd web-research-agent-main or whatever this directory may be for you
  3. py -3 -m venv .venv
  4. .venv\Scripts\Activate.ps1
  5. python -m pip install --upgrade pip
  6. pip install -r requirements.txt
  7. setx HACKCLUB_API_KEY "YOUR_HACKCLUB_KEY" (put your actual Hack Club AI key in place of "YOUR_HACKCLUB_KEY")
  8. setx SERPERDEV_API_KEY "YOUR_SERPERDEV_KEY" (put your actual Serper.dev key in place of "YOUR_SERPERDEV_KEY")
  9. Close PowerShell
  10. Open a new PowerShell window in the repo's folder
  11. .venv\Scripts\Activate.ps1
  12. python research_agent.py --sheet restaurants.csv --spec-file Examples.md -o output.json, ensuring that restaurants.csv and Examples.md exist and their contents are adjusted to your liking

How this could be used

I intend to feed it my list of Vancouver restaurants and use its results to craft a dataset that can then provide context for Voyago's agent, which will learn the user's restaurant preferences in coordination with knowledge of each restaurant's specific traits, and thus be able to recommend relevant restaurants to the user.

Comments guide

In general, THINGS TYPED IN UPPERCASE indicate some sort of important variable / customisable element.

things typed in lowercase (sometimes surrounding blocks of code) mainly just point out what's happening. This is used for maintenance purposes and to divide the code into clear regions.

AI declaration

AI is used mainly for four things in my project:

  1. Debugging (adding debugging statements that helped me find where the error was several times during development, and pointing out high-level issues that caused instability in the initial iteration of the RAG system)
  2. Prompt development (ChatGPT wrote prompts for the LLM because it's very good at prompt engineering)
  3. Difficult parsing (e.g., sorting info extracted from searches into buckets based on their origins for better digestion). Some of this I found to be very tedious relative to its contribution to the program's function, mainly because although there are tutorials online, adapting the parsing to my own use case is a task in itself.
  4. I used ChatGPT to help me summarise implementation documentation for particular APIs, namely the Serper.dev (web browsing tool) API.

AI was not used to write this very README!!!

Overall I can confidently say that less than 30% of my code was written by AI, and it was used in a purposeful and reasonable manner.

DEMO:

Here is a brief video demo showcasing one run-through of the pipeline on a small list of restaurants:

https://www.youtube.com/watch?v=_eLFlo78jj0

How other people can contribute

This pipeline is designed in a way that makes it universal to many tasks. If you wish to use RAG to research cars, for example, you would only need to change the fields and the prompts (see the illustrative spec below). You can also experiment with different web-browsing APIs, LLM APIs, or prompts to see which works best for you, and share the insights with the community.
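
As a purely hypothetical example, a car-research spec could swap the restaurant fields for something like this (follow whatever format your Examples.md task prompt already uses):

Fields to research per car:
- model years available
- base price (MSRP)
- engine / powertrain type
- safety rating
- manufacturer website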

One more thing...

Dear HCTG orgs, please note that I am 100% committed to attending HCTG, so if my projects are unable to be approved by May 8th for the 20 hour reduction, pls be soft on the deadline cuz I'll 100% be coming and make the 40 hour total by the later deadline.

BELOW IS THE OLD REPO'S README!



Research Agent

LLM-powered researcher that combines OpenAI chat models with Google results via SerpAPI. The agent asks the model to emit all needed searches at once, runs them concurrently, feeds snippets back, and returns a well‑cited answer. Includes a simple CLI.

Features

  • OpenAI Chat Completions with function calling
  • Batches 2–50 search_web tool calls in one turn
  • Concurrent Google searches via SerpAPI
  • Optional JSON trace (--outfile) with steps and final answer

Requirements

  • Python 3.9+ (3.10+ recommended)
  • OPENAI_API_KEY
  • SERPAPI_API_KEY

Install

Assuming you have Python 3.9+ and virtual env installed:

git clone https://github.com/vladm-serpapi/web-research-agent
cd web-research-agent
pip install -r requirements.txt

Setup API keys

Option A — export in your shell (recommended):

export OPENAI_API_KEY="sk-..."
export SERPAPI_API_KEY="..."

Option B — .env file (don’t commit this file):

# .env
export OPENAI_API_KEY="sk-..."
export SERPAPI_API_KEY="..."
# load it
source .env

Security: Never share or commit your keys.

Quick start

python research_agent.py -q "What are the latest approaches to retrieval‑augmented generation in 2025?"
# Save full JSON trace
python research_agent.py -q "State of LLM reasoning benchmarks in 2025" --outfile trace.json

CLI

python research_agent.py -h
# usage: research_agent.py [-h] -q QUERY [-m {o3,o4-mini,gpt-4o}] [-n TOPN] [-o OUTFILE] [-d]
#   -q, --query        Research question (required)
#   -m, --model        o3 (default) | o4-mini | gpt-4o
#   -n, --topn         Organic results per search (default: 10)
#   -o, --outfile      Write JSON trace to file
#   -d, --debug        Print debug logs

How it works (brief)

  • System prompt asks the model to emit all search_web calls first
  • Agent executes all requested Google searches concurrently (SerpAPI)
  • Results are passed back as tool messages; model produces a final, cited answer
  • Note on model tool behavior: o3 / o4-mini reasoning models prefer to output a single tool call per prompt, so gpt-4o is preferred when many queries are required
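
For reference, the concurrent step can be as simple as a thread pool over the requested queries. A minimal sketch using the google-search-results SerpAPI client (simplified relative to the actual agent):

import os
from concurrent.futures import ThreadPoolExecutor

from serpapi import GoogleSearch

def run_search(query, topn=10):
    """One Google search via SerpAPI; keep just the organic results."""
    search = GoogleSearch({
        "q": query,
        "num": topn,
        "api_key": os.environ["SERPAPI_API_KEY"],
    })
    return search.get_dict().get("organic_results", [])

def run_all(queries):
    # Execute every search the model emitted in one batch, concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(run_search, queries))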

Examples

python research_agent.py -q "Compare FAISS vs. Milvus vs. Qdrant for RAG (2025)" -m o3 -n 8 -o rag_db_trace.json

With debug mode

 python research_agent.py -q "airlines industry trend 2025, compare multiple trends by impact and research each deeper to provide a comprehensive picture" --outfile trace.json --debug --model gpt-4o

Sample output

 python research_agent.py -q "research the nuclear energy sector in 2025 and build a comprehensive thesis / report on it. I want this report to cover AI, uranium, energy, etc. Financial projections, key players, companies, etc. Do the research in iterative fashion, after each round of searches and getting new information, do another round of searches to dive deeper into each specific topic. Don't stop on surface findings. Think and analyze what data are you missing, and proceed to research it deeper." --outfile trace.json --debug --model gpt-4o
[DEBUG]  OpenAI chat.completions.create request
[DEBUG]  SerpAPI query: 'nuclear energy sector 2025 overview'
[DEBUG]  SerpAPI query: 'AI in nuclear energy 2025'
[DEBUG]  SerpAPI query: 'uranium market 2025'
[DEBUG]  SerpAPI query: 'key companies in nuclear energy 2025'
[DEBUG]  OpenAI chat.completions.create request
[DEBUG]  SerpAPI query: 'financial projections nuclear energy 2025'
[DEBUG]  SerpAPI query: 'nuclear energy policies 2025'
[DEBUG]  SerpAPI query: 'AI-driven nuclear technologies 2025'
[DEBUG]  SerpAPI query: 'key innovations in nuclear technology 2025'
[DEBUG]  OpenAI chat.completions.create request …
…

JSON trace example (with --outfile)

{
  "question": "...",
  "answer": "...",
  "steps": [
    { "type": "tool_call", "query": "first search" },
    { "type": "tool_result", "content": "- Title: snippet ..." },
    { "type": "assistant_answer", "content": "final answer text" }
  ]
}

Programmatic use

from research_agent import ResearchAgent

agent = ResearchAgent(model="o3", topn=10, debug=False)
result = agent.run("Summarize the most cited papers on RAG.")
print(result["answer"])  # final answer
print(len(result["steps"]))

Troubleshooting

  • "OPENAI_API_KEY and SERPAPI_API_KEY must be set." → export both keys or source your .env
  • Model not available → switch to a supported one (o3, o4-mini, gpt-4o)
  • Empty/failed searches → check SerpAPI key/quota and network settings

License

MIT License — see LICENSE.
