Skip to main content

OpenAlex-based deep research agent with an OpenAI-compatible LLM interface.

Project description

Open Deep Research

Open Deep Research is a small, practical repo for building a scholarly "deep research" workflow on top of OpenAlex and an OpenAI-compatible LLM.

It does four things:

  • plans search queries from a research question
  • searches and expands papers through OpenAlex references and citations
  • fetches open-access text when available
  • writes a Markdown literature review with explicit paper citations

The project is intentionally simple enough to teach in an Information Retrieval course and strong enough to serve as a working baseline for assignments.

Why this stack

  • OpenAlex is the discovery graph and metadata backbone.
  • OpenAI-compatible chat models handle planning, reranking, and synthesis.
  • Local scoring and trace logging keep the retrieval decisions inspectable.

Repository layout

open_deep_research/
  src/open_deep_research/
    api.py
    cli.py
    config.py
    fetchers.py
    llm.py
    models.py
    openalex.py
    planner.py
    reporting.py
    research.py
  tests/
  .env.example
  pyproject.toml

Quickstart

  1. Create a virtual environment.
  2. Install the package.
  3. Set your API keys.
  4. Run a research job.
cd /Users/birger/Documents/uppsala_lektorat/Information_Retrieval_Course/open_deep_research
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
open-deep-research research "How do retrieval-augmented generation systems reduce hallucinations?" --output-dir outputs/rag

If you also want PDF extraction support:

pip install -e '.[pdf]'

Install directly from GitHub without cloning:

pip install "open-deep-research-cli @ git+https://github.com/BirgerMoell/open-deep-research.git"

Install from PyPI:

pip install open-deep-research-cli

Environment variables

  • OPENALEX_MAILTO: recommended for OpenAlex polite-pool access
  • OPENALEX_API_KEY: optional OpenAlex premium key
  • OPENAI_BASE_URL: defaults to https://api.openai.com/v1
  • OPENAI_API_KEY: required for hosted OpenAI, often omitted for local OpenAI-compatible servers
  • OPENAI_MODEL: defaults to gpt-4o-mini

Commands

Research and write a report:

open-deep-research research "What are the main evaluation methods for neural information retrieval?" --final-papers 8

Read the question from stdin and print only the report body, which is the most convenient mode for agent skills:

printf '%s' "How are citation graphs used in scientific literature retrieval?" | \
  open-deep-research research --stdin --format report

Disable the LLM and run the retrieval-only pipeline:

open-deep-research research "What are the main evaluation methods for neural information retrieval?" --no-llm

Inspect the query plan only:

open-deep-research plan "How do agentic retrieval systems differ from standard RAG?"

Print only the planned queries:

open-deep-research plan "How do agentic retrieval systems differ from standard RAG?" --format queries

Run the local JSON API:

open-deep-research serve --host 127.0.0.1 --port 8080

Example request:

curl -X POST http://127.0.0.1:8080/research \
  -H 'Content-Type: application/json' \
  -d '{"question": "What are the main design patterns in deep research systems?", "final_papers": 6}'

Outputs

Each run writes:

  • report.md: literature review in Markdown
  • papers.json: normalized paper metadata and scores
  • trace.json: planned queries, expansion edges, and selection decisions

research also supports skill-friendly stdout modes:

  • --format json: full structured result
  • --format paths: just the output file locations
  • --format report: print report.md
  • --format papers: print papers.json
  • --format trace: print trace.json

Deep research workflow

question
  -> query plan
  -> OpenAlex search
  -> reference/citation expansion
  -> heuristic scoring
  -> optional LLM reranking
  -> OA text fetch
  -> report synthesis

Notes

  • This repo is designed for open scholarly discovery, not closed publisher access.
  • OpenAlex does not contain all full texts. The pipeline therefore falls back to abstracts when open text cannot be fetched.
  • For large-scale ingestion, OpenAlex also provides snapshots and an official CLI: OpenAlex CLI.

Codex skill use

This repo now includes a minimal skill template at codex_skill/open-deep-research/SKILL.md.

That template assumes the CLI is installed and then uses stdin plus explicit output modes, which is the cleanest way for an agent to call the tool:

printf '%s' "$QUESTION" | open-deep-research research --stdin --format report

Official references

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

open_deep_research_cli-0.1.1.tar.gz (18.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

open_deep_research_cli-0.1.1-py3-none-any.whl (19.6 kB view details)

Uploaded Python 3

File details

Details for the file open_deep_research_cli-0.1.1.tar.gz.

File metadata

  • Download URL: open_deep_research_cli-0.1.1.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for open_deep_research_cli-0.1.1.tar.gz
Algorithm Hash digest
SHA256 21154993e0cd2102eef6d55b1188ada14b71c693b4e4cbbf51adbe3e5e37c690
MD5 cfa3c8ce2fff8b06db9ca1c3647318a3
BLAKE2b-256 67a084e8bfad10b07c56aa2e8a0cddf18fb93e25e0a93b13e7631397d2b9ae22

See more details on using hashes here.

File details

Details for the file open_deep_research_cli-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for open_deep_research_cli-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bdb2fa6bf425d3c41c42ce9f4e7225b8208356844b19c61ce9c08fe9d46d99c6
MD5 301333fec6cb06f00470318cf14478ba
BLAKE2b-256 b081f98a4a3dc2f8f1295cb2aafe1299642bb371de681ecca408cabca353ecc4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page