OpenAlex-based deep research agent with an OpenAI-compatible LLM interface.
Open Deep Research
Open Deep Research is a small, practical repo for building a scholarly "deep research" workflow on top of OpenAlex and an OpenAI-compatible LLM.
It does four things:
- plans search queries from a research question
- searches and expands papers through OpenAlex references and citations
- fetches open-access text when available
- writes a Markdown literature review with explicit paper citations
The project is intentionally simple enough to teach in an Information Retrieval course and strong enough to serve as a working baseline for assignments.
Why this stack
- OpenAlex is the discovery graph and metadata backbone.
- OpenAI-compatible chat models handle planning, reranking, and synthesis.
- Local scoring and trace logging keep the retrieval decisions inspectable.
Repository layout
open_deep_research/
  src/open_deep_research/
    api.py
    cli.py
    config.py
    fetchers.py
    llm.py
    models.py
    openalex.py
    planner.py
    reporting.py
    research.py
  tests/
  .env.example
  pyproject.toml
Quickstart
- Create a virtual environment.
- Install the package.
- Set your API keys.
- Run a research job.
cd /path/to/open_deep_research
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
open-deep-research research "How do retrieval-augmented generation systems reduce hallucinations?" --output-dir outputs/rag
If you also want PDF extraction support:
pip install -e '.[pdf]'
Install directly from GitHub without cloning:
pip install "open-deep-research-cli @ git+https://github.com/BirgerMoell/open-deep-research.git"
Install from PyPI:
pip install open-deep-research-cli
Environment variables
- OPENALEX_MAILTO: recommended for OpenAlex polite-pool access
- OPENALEX_API_KEY: optional OpenAlex premium key
- OPENAI_BASE_URL: defaults to https://api.openai.com/v1
- OPENAI_API_KEY: required for hosted OpenAI, often omitted for local OpenAI-compatible servers
- OPENAI_MODEL: defaults to gpt-4o-mini
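The variables and defaults listed above could be read roughly as follows. This is a minimal sketch, not the contents of the repo's config.py: the Settings class and load_settings function are hypothetical names, and the sketch reads only from the process environment (it does not parse a .env file).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    openalex_mailto: Optional[str]
    openalex_api_key: Optional[str]
    openai_base_url: str
    openai_api_key: Optional[str]
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables and apply the documented defaults.

    Empty strings are treated the same as unset variables.
    """
    return Settings(
        openalex_mailto=env.get("OPENALEX_MAILTO") or None,
        openalex_api_key=env.get("OPENALEX_API_KEY") or None,
        openai_base_url=env.get("OPENAI_BASE_URL") or "https://api.openai.com/v1",
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_model=env.get("OPENAI_MODEL") or "gpt-4o-mini",
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in a test or a classroom exercise.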
Commands
Research and write a report:
open-deep-research research "What are the main evaluation methods for neural information retrieval?" --final-papers 8
Read the question from stdin and print only the report body, which is the most convenient mode for agent skills:
printf '%s' "How are citation graphs used in scientific literature retrieval?" | \
open-deep-research research --stdin --format report
Disable the LLM and run the retrieval-only pipeline:
open-deep-research research "What are the main evaluation methods for neural information retrieval?" --no-llm
Inspect the query plan only:
open-deep-research plan "How do agentic retrieval systems differ from standard RAG?"
Print only the planned queries:
open-deep-research plan "How do agentic retrieval systems differ from standard RAG?" --format queries
Run the local JSON API:
open-deep-research serve --host 127.0.0.1 --port 8080
Example request:
curl -X POST http://127.0.0.1:8080/research \
-H 'Content-Type: application/json' \
-d '{"question": "What are the main design patterns in deep research systems?", "final_papers": 6}'
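The same request can be sent from Python with only the standard library. This is a sketch under assumptions: the function names are hypothetical, the server is assumed to be running on the host and port from the serve example, and the response is assumed to be a JSON body whose exact shape is not documented here.

```python
import json
import urllib.request


def build_research_request(question, final_papers=6, base_url="http://127.0.0.1:8080"):
    """Build the POST request shown in the curl example (hypothetical helper)."""
    payload = json.dumps({"question": question, "final_papers": final_papers})
    return urllib.request.Request(
        url=f"{base_url}/research",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_research(question, final_papers=6, timeout=600):
    """Send the request and decode the body, assuming the server replies with JSON."""
    request = build_research_request(question, final_papers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A research job can take a while, so the sketch uses a generous timeout; the value is a guess, not a documented limit.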
Outputs
Each run writes:
- report.md: literature review in Markdown
- papers.json: normalized paper metadata and scores
- trace.json: planned queries, expansion edges, and selection decisions
The research command also supports skill-friendly stdout modes:
- --format json: full structured result
- --format paths: just the output file locations
- --format report: print report.md
- --format papers: print papers.json
- --format trace: print trace.json
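To inspect a finished run from Python, the three output files can be loaded directly. The sketch below assumes only the file names listed above; load_run is a hypothetical helper, and the internal structure of papers.json and trace.json is deliberately left uninterpreted.

```python
import json
from pathlib import Path


def load_run(output_dir):
    """Load report.md, papers.json, and trace.json from one run directory."""
    root = Path(output_dir)
    return {
        "report": (root / "report.md").read_text(encoding="utf-8"),
        "papers": json.loads((root / "papers.json").read_text(encoding="utf-8")),
        "trace": json.loads((root / "trace.json").read_text(encoding="utf-8")),
    }
```

For the Quickstart example, this would be called as load_run("outputs/rag").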
Deep research workflow
question
-> query plan
-> OpenAlex search
-> reference/citation expansion
-> heuristic scoring
-> optional LLM reranking
-> OA text fetch
-> report synthesis
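The reference/citation expansion step can be illustrated against the public OpenAlex API, where a work record lists its outgoing references in referenced_works and incoming citations are found with the cites filter on the works endpoint. This is a sketch of that idea, not the code in openalex.py; the helper names are hypothetical.

```python
from urllib.parse import urlencode

OPENALEX_API = "https://api.openalex.org"


def short_id(openalex_id):
    """Turn 'https://openalex.org/W123' (or 'W123') into 'W123'."""
    return openalex_id.rstrip("/").rsplit("/", 1)[-1]


def referenced_ids(work):
    """Backward expansion: IDs of works that this work cites."""
    return [short_id(ref) for ref in work.get("referenced_works", [])]


def citing_works_url(work_id, mailto=None, per_page=25):
    """Forward expansion: URL listing works that cite the given work."""
    params = {"filter": f"cites:{short_id(work_id)}", "per-page": per_page}
    if mailto:
        # Identifies the caller for OpenAlex polite-pool access.
        params["mailto"] = mailto
    return f"{OPENALEX_API}/works?{urlencode(params)}"
```

Keeping URL construction separate from the HTTP call makes each expansion edge easy to log, which fits the goal of inspectable retrieval decisions.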
Notes
- This repo is designed for open scholarly discovery, not closed publisher access.
- OpenAlex does not contain all full texts. The pipeline therefore falls back to abstracts when open text cannot be fetched.
- For large-scale ingestion, OpenAlex also provides snapshots and an official CLI: OpenAlex CLI.
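The abstract fallback mentioned in the notes has one practical wrinkle: OpenAlex serves abstracts as an abstract_inverted_index that maps each word to its positions, so the plain text has to be rebuilt. A minimal sketch follows, with hypothetical function names; it is an assumption that the pipeline handles the fallback in this way.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain abstract text from an OpenAlex inverted index."""
    if not inverted_index:
        return ""
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


def best_available_text(full_text, work):
    """Prefer fetched open-access text; otherwise fall back to the abstract."""
    if full_text and full_text.strip():
        return full_text
    return abstract_from_inverted_index(work.get("abstract_inverted_index"))
```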
Codex skill use
This repo now includes a minimal skill template at codex_skill/open-deep-research/SKILL.md.
That template assumes the CLI is installed and then uses stdin plus explicit output modes, which is the cleanest way for an agent to call the tool:
printf '%s' "$QUESTION" | open-deep-research research --stdin --format report
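An agent written in Python can make the same call through subprocess. The sketch below assumes the CLI is installed and on PATH and uses only the flags shown above; research_report and REPORT_COMMAND are hypothetical names, and the timeout is an arbitrary choice.

```python
import subprocess

# The documented stdin + report-only invocation.
REPORT_COMMAND = ["open-deep-research", "research", "--stdin", "--format", "report"]


def research_report(question, command=REPORT_COMMAND, timeout=900):
    """Pipe the question to the CLI on stdin and return the report from stdout."""
    completed = subprocess.run(
        command,
        input=question,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout
```

Passing the question on stdin rather than as an argument avoids shell-quoting problems with long or multi-line questions.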