INSPIRE HEP tools and API wrapper for LangChain.

INSPIRE HEP Tools for LangChain

Integration with INSPIRE HEP, the trusted community hub for high energy physics research literature.

Overview

This contribution adds three tools for searching and retrieving physics papers from the INSPIRE HEP database:

  • Search Literature: Find physics papers by topic, sorted by most recent or most cited
  • Get Author Papers: Retrieve an author's publications (requires INSPIRE identifiers)
  • Get Paper Details: Fetch complete information about a specific paper by record ID

Features

✅ Search by topic, author, or citation count
✅ Sort results by most recent or most cited
✅ Configurable result limits
✅ Comprehensive error handling
✅ Unit and integration tests (15 unit tests + 5 integration tests)

Installation

pip install langchain-community langchain-community-inspire-hep

Quick Start

Basic Usage

from langchain_community.tools.inspire_hep import INSPIRESearchLiteratureTool

tool = INSPIRESearchLiteratureTool()

# Search for papers
result = tool.invoke({"query": "quantum field theory"})
print(result)

# Search with sorting
result = tool.invoke({
    "query": "string theory",
    "sort": "mostcited"  # or "mostrecent"
})

All Three Tools

from langchain_community.tools.inspire_hep import (
    INSPIRESearchLiteratureTool,
    INSPIREGetAuthorPapersTool,
    INSPIREGetPaperDetailsTool,
)

# Search for papers on a topic
search_tool = INSPIRESearchLiteratureTool()
result = search_tool.invoke({
    "query": "quantum gravity",
    "sort": "mostrecent"
})

# Get an author's papers (requires INSPIRE identifier)
author_tool = INSPIREGetAuthorPapersTool()
result = author_tool.invoke({
    "author_name": "Witten.Edward.1",
    "sort": "mostcited"
})

# Get details of a specific paper
details_tool = INSPIREGetPaperDetailsTool()
result = details_tool.invoke({
    "record_id": "451647"  # Maldacena's AdS/CFT paper
})

Using with AI Agents

from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.tools.inspire_hep import (
    INSPIRESearchLiteratureTool,
    INSPIREGetAuthorPapersTool,
)

# Create tools
tools = [
    INSPIRESearchLiteratureTool(),
    INSPIREGetAuthorPapersTool(),
]

# Create LLM (use models with good tool calling support)
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Create prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a physics research assistant with access to INSPIRE HEP."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

# Create agent
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Use the agent
result = agent_executor.invoke({
    "input": "What are the most cited papers on string theory?"
})
print(result['output'])

Direct API Access (Without Agents)

Use the wrapper directly when you don't need LangChain's tool or agent interfaces:

from langchain_community.utilities.inspire_hep import INSPIREHEPAPIWrapper

# Create wrapper with custom settings
wrapper = INSPIREHEPAPIWrapper(top_k_results=5)

# Search papers
papers = wrapper.search_literature("quantum gravity", sort="mostcited")
print(papers)

# Get author papers
author_papers = wrapper.get_author_papers("Witten.Edward.1", sort="mostrecent")
print(author_papers)

# Get paper details
details = wrapper.get_paper_details("451647")
print(details)
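For reference, INSPIRE's public REST API serves literature search at https://inspirehep.net/api/literature and accepts q, sort, and size query parameters. The sketch below builds such a request URL with the standard library only. It illustrates the public API and is not a description of how INSPIREHEPAPIWrapper works internally; build_literature_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public INSPIRE literature search endpoint.
INSPIRE_LITERATURE_URL = "https://inspirehep.net/api/literature"

# Sort values documented for the tools in this README.
VALID_SORTS = {"mostrecent", "mostcited"}


def build_literature_url(query: str, sort: str = "mostrecent", size: int = 5) -> str:
    """Build a literature search URL (hypothetical helper, for illustration)."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}, got {sort!r}")
    if size < 1:
        raise ValueError("size must be a positive integer")
    # urlencode escapes spaces as "+" and special characters such as "+" as "%2B".
    params = {"q": query, "sort": sort, "size": size}
    return f"{INSPIRE_LITERATURE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # https://inspirehep.net/api/literature?q=quantum+gravity&sort=mostcited&size=5
    print(build_literature_url("quantum gravity", sort="mostcited"))
```

Here size plays roughly the role that top_k_results plays for the wrapper. The resulting URL can be opened in a browser to inspect the raw JSON the tools summarize.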

Sorting Options

Both the search and author tools accept a sort parameter with two values:

  • mostrecent (default for search): newest papers first
  • mostcited (default for author): most-cited papers first

# Find recent breakthroughs
tool.invoke({"query": "quantum computing", "sort": "mostrecent"})

# Find influential papers
tool.invoke({"query": "supersymmetry", "sort": "mostcited"})
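If you wrap these tools in your own code, it can help to normalize the sort argument first, so that a missing value falls back to the default listed above and a typo fails early instead of reaching the API. This is a minimal sketch; resolve_sort and the "search"/"author" labels are hypothetical names, not part of the package.

```python
from typing import Optional

# Per-tool defaults as described above (search: newest first, author: most cited first).
DEFAULT_SORT = {"search": "mostrecent", "author": "mostcited"}
VALID_SORTS = ("mostrecent", "mostcited")


def resolve_sort(tool_kind: str, sort: Optional[str] = None) -> str:
    """Return the sort order to use, falling back to the per-tool default.

    Hypothetical helper for illustration; tool_kind is "search" or "author".
    """
    if tool_kind not in DEFAULT_SORT:
        raise ValueError(f"unknown tool kind: {tool_kind!r}")
    if sort is None:
        return DEFAULT_SORT[tool_kind]
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
    return sort
```

For example, tool.invoke({"query": "supersymmetry", "sort": resolve_sort("search", user_choice)}) passes through a valid user_choice and uses "mostrecent" when user_choice is None.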

Finding Author Identifiers

The author papers tool requires INSPIRE identifiers (format: Lastname.Firstname.N), not plain names:

  1. Go to https://inspirehep.net/authors
  2. Search for the author by name
  3. Click on their profile
  4. Use the identifier shown (e.g., Witten.Edward.1)

Why? Plain names are ambiguous (many physicists share the same name), while INSPIRE identifiers are unique.

Advanced Search Syntax

INSPIRE HEP supports advanced search queries:

# Highly cited papers (1000+ citations)
tool.invoke({"query": "topcite 1000+"})

# Papers by specific author
tool.invoke({"query": "author:Witten"})

# Papers in date range
tool.invoke({"query": "date 2020->2024"})

# Combine criteria
tool.invoke({"query": "quantum gravity topcite 500+"})

See the INSPIRE HEP search guide for more syntax.
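When queries are assembled from user input, a small builder keeps the clauses consistent. The sketch below reuses only the topcite and date forms shown above and joins clauses with INSPIRE's boolean and operator; the space-separated form in the last example above is an alternative. build_inspire_query and its parameters are hypothetical.

```python
from typing import List, Optional


def build_inspire_query(
    topic: Optional[str] = None,
    min_citations: Optional[int] = None,
    date_from: Optional[int] = None,
    date_to: Optional[int] = None,
) -> str:
    """Combine criteria into one INSPIRE query string (hypothetical helper)."""
    parts: List[str] = []
    if topic:
        parts.append(topic)
    if min_citations is not None:
        # "topcite N+" matches records with at least N citations.
        parts.append(f"topcite {min_citations}+")
    if date_from is not None and date_to is not None:
        parts.append(f"date {date_from}->{date_to}")
    elif date_from is not None or date_to is not None:
        raise ValueError("give both date_from and date_to, or neither")
    if not parts:
        raise ValueError("at least one criterion is required")
    # Join clauses with an explicit boolean "and".
    return " and ".join(parts)
```

For instance, build_inspire_query("quantum gravity", min_citations=500, date_from=2020, date_to=2024) yields "quantum gravity and topcite 500+ and date 2020->2024", which can be passed as the query to the search tool.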

API Rate Limiting

INSPIRE HEP enforces a rate limit of 15 requests per 5 seconds per IP address. The wrapper includes basic rate limiting, but you should still avoid sending many requests in quick succession.
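That limit averages out to three requests per second. If you call the tools in a loop (for example, batch lookups of record IDs), a client-side limiter keeps you under the quota regardless of what the wrapper does internally. Below is a minimal sliding-window sketch; SlidingWindowLimiter is hypothetical, and its 15-per-5-seconds defaults come from the figure above. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window` seconds (client-side sketch)."""

    def __init__(
        self,
        max_calls: int = 15,
        window: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.window:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.window - (now - self._calls[0]))
```

Call limiter.acquire() immediately before each tool.invoke(...). Share one limiter across all three tools, since the quota applies per IP address rather than per tool.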

Testing

This contribution includes 20 tests:

  • 15 unit tests: Test wrapper and tools with mocked API responses
  • 5 integration tests: Test with real API calls

Run tests:

# Unit tests (fast, no internet required)
pytest tests/unit_tests/test_inspire_hep.py -v

# Integration tests (requires internet)
pytest tests/integration_tests/test_inspire_hep_integrations.py -v

# All tests
pytest tests/ -v
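The unit tests rely on mocked API responses so they run without network access. The pattern looks roughly like this pytest-style sketch. summarize_hits is a hypothetical function, not the wrapper's real code, and the payload keys (hits, metadata, titles, citation_count) are assumptions based on the shape of INSPIRE's public JSON responses.

```python
from typing import Any, Dict, List
from unittest.mock import Mock


def summarize_hits(session: Any, url: str) -> List[Dict[str, Any]]:
    """Fetch a literature search and return title/citation summaries.

    Hypothetical helper; `session` is anything with a requests-style `get`.
    """
    response = session.get(url, timeout=10)
    response.raise_for_status()
    payload = response.json()
    summaries = []
    for hit in payload.get("hits", {}).get("hits", []):
        metadata = hit.get("metadata", {})
        titles = metadata.get("titles") or [{}]
        summaries.append(
            {
                "title": titles[0].get("title", "(untitled)"),
                "citations": metadata.get("citation_count", 0),
            }
        )
    return summaries


def test_summarize_hits_with_mocked_response() -> None:
    # Canned payload shaped like an INSPIRE literature search response.
    fake_response = Mock()
    fake_response.json.return_value = {
        "hits": {
            "total": 1,
            "hits": [{"metadata": {"titles": [{"title": "Example paper"}], "citation_count": 42}}],
        }
    }
    session = Mock()
    session.get.return_value = fake_response

    result = summarize_hits(session, "https://inspirehep.net/api/literature?q=test")

    assert result == [{"title": "Example paper", "citations": 42}]
    session.get.assert_called_once_with("https://inspirehep.net/api/literature?q=test", timeout=10)
```

Injecting the session rather than patching a module-level import keeps the test independent of which HTTP library the real code uses.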

Known Limitations

  1. Author identifiers required: The author papers tool works reliably only with INSPIRE identifiers, not plain names. Users must look up identifiers at https://inspirehep.net/authors.

  2. LLM compatibility: Agent performance depends on the LLM's tool-calling capabilities. Works best with OpenAI GPT-4, Anthropic Claude, and other models with strong structured output support.

Example Use Cases

Research Assistant

"What are the most influential papers on the AdS/CFT correspondence?"
→ Uses search_literature with sort="mostcited"

Literature Review

"Find recent papers on quantum entanglement from the last year"
→ Uses search_literature with sort="mostrecent"

Author Research

"What are Edward Witten's most cited contributions?"
→ Uses get_author_papers with an author identifier

Paper Deep Dive

"Tell me about INSPIRE record 451647"
→ Uses get_paper_details for full information

Citation

If you use INSPIRE HEP in your research, please cite:

@article{Moskovic:2021zjs,
    author = "Moskovic, Micha",
    title = "{The INSPIRE REST API}",
    url = "https://github.com/inspirehep/rest-api-doc",
    doi = "10.5281/zenodo.5788550",
    month = "12",
    year = "2021"
}

Contributing

This is an open-source contribution to LangChain. Future enhancements could include:

  • Job search functionality
  • Conference search
  • Advanced filtering options
  • Citation graph traversal
  • Batch operations

License

This contribution follows LangChain's MIT License.

Download files

Source Distribution

langchain_community_inspire_hep-0.1.0.tar.gz (10.6 kB)

Built Distribution

langchain_community_inspire_hep-0.1.0-py3-none-any.whl (10.3 kB)

File details

Details for the file langchain_community_inspire_hep-0.1.0.tar.gz.

File hashes

Hashes for langchain_community_inspire_hep-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8bc4c9d980694686787a03e04cfce67d0f8a5557fb583cbc79c989f7cbcaa676
MD5 51b62326da2781f57e95482a0708b9e8
BLAKE2b-256 97f92e60d1443e9bda70a84984424b4f312d027eb681ffde252ca952c81ce6ac

File details

Details for the file langchain_community_inspire_hep-0.1.0-py3-none-any.whl.

File hashes

Hashes for langchain_community_inspire_hep-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62df564ced77a9719d0e58dac0af377e1b395d0ae877d5c219ee806a7693addf
MD5 01bba6f44a75be586eba125d416cb4c6
BLAKE2b-256 69d0fdc69e6f9616753e0924aaeb7cafb0eed8f269b99707aef7580657a9a8ac
