LangChain integration for Built-Simple research APIs (PubMed & ArXiv)

langchain-builtsimple

LangChain integration for Built-Simple research APIs, providing easy access to PubMed and ArXiv scientific literature.

Features

PubMed Retriever & Tool - Search 4.5M+ peer-reviewed biomedical articles
ArXiv Retriever & Tool - Search 2.7M+ preprints in physics, math, CS, and ML
Combined Retriever & Tool - Search both sources simultaneously
RAG-ready - Documents include full metadata for citations
Agent-compatible - Tools work with LangChain agents out of the box

What Data is Included

PubMed Documents

page_content: Title + abstract (default) OR full article text (with include_full_text=True)
metadata:
- pmid - PubMed ID
- title - Article title
- journal - Journal name
- pub_year - Publication year
- doi - DOI identifier
- url - Link to PubMed
- has_full_text - Boolean indicating if full text was fetched
- full_text_length - Character count when available
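Because each PubMed document carries pmid, title, journal, pub_year, doi and url, a citation line can be built directly from doc.metadata. The following is a minimal sketch. format_pubmed_citation is a hypothetical helper, not part of langchain-builtsimple. The code assumes that doi holds a bare identifier such as 10.1000/xyz, and it passes the value through unchanged if it is already a URL.

```python
def format_pubmed_citation(metadata: dict) -> str:
    """Build a one-line citation from the PubMed metadata keys listed above."""
    parts = [str(metadata.get("title", "Untitled")).rstrip(".")]
    journal, year = metadata.get("journal"), metadata.get("pub_year")
    if journal and year:
        parts.append(f"{journal} ({year})")
    elif journal or year:
        parts.append(str(journal or year))
    # Prefer a resolvable DOI link. Assumption: doi is a bare identifier,
    # but a value that already looks like a URL is kept as-is.
    doi = metadata.get("doi")
    if doi:
        parts.append(doi if str(doi).startswith("http") else f"https://doi.org/{doi}")
    elif metadata.get("url"):
        # Fall back to the PubMed page link when no DOI is present.
        parts.append(metadata["url"])
    if metadata.get("pmid"):
        parts.append(f"PMID: {metadata['pmid']}")
    return ". ".join(parts)
```

Calling format_pubmed_citation(doc.metadata) on each retrieved document yields one citation per paper. Missing fields are skipped.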

🔥 Full text available! Unlike most research APIs, Built-Simple can return complete article text when you set include_full_text=True:

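from langchain_builtsimple import BuiltSimplePubMedRetriever

# Fetch full articles (typically 15K-70K characters each)
retriever = BuiltSimplePubMedRetriever(limit=5, include_full_text=True)
docs = retriever.invoke("cancer immunotherapy")

for doc in docs:
    # has_full_text reports whether full text was actually fetched for this article
    status = "full text" if doc.metadata.get("has_full_text") else "no full text"
    print(f"{doc.metadata['title']}: {len(doc.page_content)} chars ({status})")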
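At 15K-70K characters per article, five full-text documents can add up to a few hundred thousand characters. That may not fit in a model's context window in a single prompt. One option is to split each article into overlapping chunks before embedding or prompting. The sketch below is a plain character-window splitter. The name chunk_text and the sizes 4000 and 200 are illustrative assumptions. LangChain's RecursiveCharacterTextSplitter (in langchain_text_splitters) is a more complete alternative that prefers natural break points.

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size, each overlapping the previous by overlap chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, so no tiny tail chunk is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, you would call chunk_text(doc.page_content) on each doc whose metadata["has_full_text"] is true. Character windows ignore sentence boundaries. The overlap means a sentence cut at one boundary still appears whole in a neighbouring chunk, provided it is no longer than the overlap.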

ArXiv Documents

page_content: Title + authors + abstract
metadata:
- arxiv_id - ArXiv ID (e.g., "2301.12345")
- title - Paper title
- authors - Author list
- year - Publication year
- url - ArXiv page link
- pdf_url - Direct PDF link

⚠️ Abstracts only: full PDFs are not downloaded. Use metadata['pdf_url'] to fetch a PDF yourself if needed.
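Only pdf_url is returned, so fetching the paper itself is up to you. The following is a standard-library sketch. pdf_filename, download_pdf, the papers default directory and the User-Agent value are all hypothetical choices, not part of langchain-builtsimple. Old-style ArXiv identifiers such as hep-th/9901001 contain a slash, so the ID is sanitized before it is used as a filename.

```python
import urllib.request
from pathlib import Path


def pdf_filename(arxiv_id: str) -> str:
    """Make a safe filename from an ArXiv ID; old-style IDs like 'hep-th/9901001' contain a slash."""
    return arxiv_id.replace("/", "_") + ".pdf"


def download_pdf(pdf_url: str, arxiv_id: str, dest_dir: str = "papers") -> Path:
    """Download the PDF at pdf_url into dest_dir and return the saved path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    path = dest / pdf_filename(arxiv_id)
    # Identify the client; this header value is an arbitrary example.
    request = urllib.request.Request(pdf_url, headers={"User-Agent": "builtsimple-example/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        path.write_bytes(response.read())
    return path
```

A typical call is download_pdf(doc.metadata["pdf_url"], doc.metadata["arxiv_id"]) for each ArXiv document. If you download many papers, pausing between requests keeps the load on arxiv.org reasonable.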

Installation

pip install langchain-builtsimple

For development with examples:

pip install "langchain-builtsimple[dev]"

Quick Start

Basic Retrieval

from langchain_builtsimple import BuiltSimplePubMedRetriever, BuiltSimpleArxivRetriever

# Search PubMed
pubmed = BuiltSimplePubMedRetriever(limit=5)
docs = pubmed.invoke("CRISPR gene therapy")

for doc in docs:
    print(f"Title: {doc.metadata['title']}")
    print(f"Journal: {doc.metadata['journal']}")
    print(f"URL: {doc.metadata['url']}\n")

# Search ArXiv
arxiv = BuiltSimpleArxivRetriever(limit=5)
docs = arxiv.invoke("transformer neural networks")

for doc in docs:
    print(f"Title: {doc.metadata['title']}")
    print(f"Authors: {doc.metadata['authors']}")
    print(f"ArXiv ID: {doc.metadata['arxiv_id']}\n")

RAG Chain with ChatOpenAI

from langchain_builtsimple import BuiltSimplePubMedRetriever
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Create retriever
retriever = BuiltSimplePubMedRetriever(limit=5)

# Format documents for context
def format_docs(docs):
    return "\n\n".join(
        f"[{i+1}] {doc.metadata['title']} ({doc.metadata.get('pub_year', 'N/A')})\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )

# Create RAG prompt
prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following research papers. 
Cite papers by number [1], [2], etc.

Papers:
{context}

Question: {question}

Answer:""")

# Build chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Ask a question
answer = chain.invoke("What are the latest developments in CAR-T cell therapy?")
print(answer)

Agent with Research Tools

from langchain_builtsimple import BuiltSimplePubMedTool, BuiltSimpleArxivTool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Create tools
tools = [
    BuiltSimplePubMedTool(),  # For biomedical research
    BuiltSimpleArxivTool(),   # For CS/ML/physics papers
]

# Create agent
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a research assistant with access to scientific databases.
    Use pubmed_search for medical/biological topics.
    Use arxiv_search for AI/ML/physics/math topics.
    Always cite your sources."""),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run agent
response = executor.invoke({
    "input": "Find recent papers on using transformers for drug discovery"
})
print(response["output"])

API Reference

Retrievers

All retrievers inherit from langchain_core.retrievers.BaseRetriever and return List[Document].

BuiltSimplePubMedRetriever

BuiltSimplePubMedRetriever(
    base_url: str = "https://pubmed.built-simple.ai",
    limit: int = 10,
    timeout: float = 30.0,
    include_full_text: bool = False
)

Document Metadata:

source: "pubmed"
pmid: PubMed ID
title: Paper title
journal: Journal name
pub_year: Publication year
doi: DOI (if available)
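url: Link to PubMed page
has_full_text: Whether full text was fetched (with include_full_text=True)
full_text_length: Character count of the full text (when available)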

BuiltSimpleArxivRetriever

BuiltSimpleArxivRetriever(
    base_url: str = "https://arxiv.built-simple.ai",
    limit: int = 10,
    timeout: float = 30.0
)

Document Metadata:

source: "arxiv"
arxiv_id: ArXiv identifier
title: Paper title
authors: List of author names
year: Publication year
url: Link to ArXiv page
pdf_url: Direct PDF link
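The two sources store the year under different keys: pub_year for PubMed and year for ArXiv. A format_docs function written for one source, like the one in the RAG example, therefore needs adjusting when documents are mixed, for example with BuiltSimpleResearchRetriever. The following is a minimal sketch that branches on the source field. format_mixed_docs is a hypothetical name, and treating every non-PubMed document as ArXiv is an assumption.

```python
def format_mixed_docs(docs) -> str:
    """Number documents for a RAG prompt, reading the metadata keys that match each source."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        meta = doc.metadata
        if meta.get("source") == "pubmed":
            year = meta.get("pub_year", "N/A")
            venue = meta.get("journal", "PubMed")
        else:  # assumption: anything that is not PubMed came from ArXiv
            year = meta.get("year", "N/A")
            venue = f"arXiv:{meta.get('arxiv_id', '?')}"
        blocks.append(f"[{i}] {meta.get('title', 'Untitled')} ({venue}, {year})\n{doc.page_content}")
    return "\n\n".join(blocks)
```

Any object with metadata and page_content attributes works, so LangChain Document objects can be passed directly. In the RAG chain, the function could stand in for format_docs: retriever | format_mixed_docs.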

BuiltSimpleResearchRetriever

Searches both PubMed and ArXiv, interleaving results.

BuiltSimpleResearchRetriever(
    pubmed_url: str = "https://pubmed.built-simple.ai",
    arxiv_url: str = "https://arxiv.built-simple.ai",
    limit_per_source: int = 5,
    timeout: float = 30.0
)
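The description above says only that results are interleaved; the exact order is not documented. Assuming a simple round-robin that starts with PubMed, the merge can be sketched as follows. interleave is a hypothetical helper, not the library's internal code. If each source returns at most limit_per_source documents, the combined list would hold at most twice that many, so at most 10 with the default of 5.

```python
from itertools import chain, zip_longest

# Sentinel used to pad the shorter list; it cannot collide with real results (even None).
_MISSING = object()


def interleave(*result_lists):
    """Round-robin merge: first item of each list, then the second of each, and so on."""
    merged = chain.from_iterable(zip_longest(*result_lists, fillvalue=_MISSING))
    return [item for item in merged if item is not _MISSING]
```

For example, interleave(pubmed_docs, arxiv_docs) alternates between the two lists. Once the shorter list runs out, it continues with the remaining items from the longer one.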

Tools

All tools inherit from langchain_core.tools.BaseTool and can be used with LangChain agents.

BuiltSimplePubMedTool

Name: pubmed_search
Description: Search PubMed for peer-reviewed biomedical literature
Input: query (str), limit (int, default=5)

BuiltSimpleArxivTool

Name: arxiv_search
Description: Search ArXiv for preprints in physics, math, CS, ML
Input: query (str), limit (int, default=5)

BuiltSimpleResearchTool

Name: research_search
Description: Search both PubMed and ArXiv simultaneously
Input: query (str), limit (int, default=5)

Examples

See the examples/ directory for complete working examples:

basic_retrieval.py - Simple retriever usage
rag_chain.py - RAG chain with ChatOpenAI
agent_with_tools.py - Agent with research tools

License

MIT License - see LICENSE for details.

Release history

This version: 0.1.0 (released Jan 31, 2026)

Download files

Source distribution: langchain_builtsimple-0.1.0.tar.gz (10.2 kB, uploaded Jan 31, 2026)

Built distribution: langchain_builtsimple-0.1.0-py3-none-any.whl (10.1 kB, Python 3, uploaded Jan 31, 2026)

File details

Details for the file langchain_builtsimple-0.1.0.tar.gz.

File metadata

Download URL: langchain_builtsimple-0.1.0.tar.gz
Upload date: Jan 31, 2026
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for langchain_builtsimple-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c8b7145a9ba86fe73a8b7f90fd8b9fe5d7fa590254fc61a399afd1e6a065517e`
MD5	`da28ebccdae11c0ab7ff0e73a0ea6a81`
BLAKE2b-256	`4c7d910908e5833082a23d418a0b6010f600c43bd5ba79486f0946d1faf1d0fc`


File details

Details for the file langchain_builtsimple-0.1.0-py3-none-any.whl.

File metadata

Download URL: langchain_builtsimple-0.1.0-py3-none-any.whl
Upload date: Jan 31, 2026
Size: 10.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for langchain_builtsimple-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5580901a584b8912c4ffee509e0b5496f24dc7f8a28024f128a33ef46225c951`
MD5	`6edd01d8e4733f890c9c4527055f3a1c`
BLAKE2b-256	`6ca339a8addc02a6b9879dc31312ef7c59bf3b90c7235e7f706dd4b3ed271538`
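To check a downloaded file against the SHA256 digests published above, hash it locally and compare. The following is a minimal standard-library sketch. EXPECTED_SHA256 copies the two digests listed on this page. sha256_of and verify_download are hypothetical names.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for release 0.1.0.
EXPECTED_SHA256 = {
    "langchain_builtsimple-0.1.0.tar.gz": "c8b7145a9ba86fe73a8b7f90fd8b9fe5d7fa590254fc61a399afd1e6a065517e",
    "langchain_builtsimple-0.1.0-py3-none-any.whl": "5580901a584b8912c4ffee509e0b5496f24dc7f8a28024f128a33ef46225c951",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Return True only if the file name is a known 0.1.0 artifact and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can also enforce this check during installation through its hash-checking mode. Pin langchain-builtsimple==0.1.0 with a --hash=sha256:<digest> entry in a requirements file, then install with --require-hashes.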
