Haystack integration for Built-Simple research APIs (PubMed, ArXiv)

haystack-builtsimple

Haystack integration for Built-Simple research APIs. Search PubMed and ArXiv scientific literature directly from your Haystack pipelines.

Features

🔬 PubMed Retriever - Hybrid search over 35M+ biomedical articles
📄 ArXiv Retriever - Search preprints in physics, math, CS, and more
🔗 Combined Retriever - Search both sources simultaneously
📖 Full Text Support - Optionally fetch full article text (PubMed)
⚡ Pipeline Ready - Drop-in components for Haystack 2.x pipelines

Installation

pip install haystack-builtsimple

Or with development dependencies:

pip install "haystack-builtsimple[dev]"

Quick Start

Basic Usage

from haystack_builtsimple import BuiltSimplePubMedRetriever, BuiltSimpleArxivRetriever

# Search PubMed
pubmed = BuiltSimplePubMedRetriever(top_k=5)
results = pubmed.run(query="CRISPR gene therapy clinical trials")
for doc in results["documents"]:
    print(f"[PMID {doc.meta['pmid']}] {doc.meta['title']}")

# Search ArXiv
arxiv = BuiltSimpleArxivRetriever(top_k=5)
results = arxiv.run(query="large language models reasoning")
for doc in results["documents"]:
    print(f"[{doc.meta['arxiv_id']}] {doc.meta['title']}")

In a Haystack Pipeline

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_builtsimple import BuiltSimplePubMedRetriever

# Create a RAG pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", BuiltSimplePubMedRetriever(top_k=5))
pipeline.add_component("prompt", PromptBuilder(template="""
Based on these research papers:
{% for doc in documents %}
- {{ doc.meta.title }} (PMID: {{ doc.meta.pmid }})
  {{ doc.content[:500] }}
{% endfor %}

Answer: {{ query }}
"""))
# OpenAIGenerator reads the OPENAI_API_KEY environment variable by default
pipeline.add_component("llm", OpenAIGenerator())

pipeline.connect("retriever.documents", "prompt.documents")
pipeline.connect("prompt", "llm")

# Run
result = pipeline.run({
    "retriever": {"query": "mRNA vaccine efficacy"},
    "prompt": {"query": "What factors affect mRNA vaccine efficacy?"}
})
print(result["llm"]["replies"][0])

Combined Search

Search both PubMed and ArXiv at once:

from haystack_builtsimple import BuiltSimpleCombinedRetriever

retriever = BuiltSimpleCombinedRetriever(
    top_k=10,
    merge_strategy="score",  # or "interleave", "pubmed_first", "arxiv_first"
)

results = retriever.run(query="machine learning drug discovery")
for doc in results["documents"]:
    source = doc.meta["source"]  # "pubmed" or "arxiv"
    print(f"[{source}] {doc.meta['title']}")

Components

BuiltSimplePubMedRetriever

Retrieves documents from PubMed using hybrid search (semantic + keyword).

Parameters:

Parameter	Type	Default	Description
`api_base`	str	`https://pubmed.built-simple.ai`	API base URL
`top_k`	int	10	Number of documents to retrieve
`fetch_full_text`	bool	False	Fetch full article text
`timeout`	float	30.0	Request timeout in seconds

Outputs:

documents: List of Haystack Document objects

Document Metadata:

pmid - PubMed ID
title - Article title
authors - Comma-separated author names
journal - Journal name
year - Publication year
doi - DOI if available
source - Always "pubmed"
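
The metadata fields above are enough to build a short citation line for each hit. The helper below is a minimal sketch and not part of haystack-builtsimple: `format_pubmed_citation` is a hypothetical name, it assumes `meta` is a plain dict with the keys listed above, and the values in `example` are placeholders rather than a real article.

```python
def format_pubmed_citation(meta: dict) -> str:
    """Build a one-line citation from PubMed document metadata.

    Hypothetical helper: assumes the keys documented above and
    tolerates missing or empty values.
    """
    parts = []
    if meta.get("authors"):
        parts.append(f"{meta['authors']}.")
    if meta.get("title"):
        # Avoid a doubled period when the title already ends with punctuation.
        title = str(meta["title"]).rstrip()
        parts.append(title if title.endswith((".", "?", "!")) else f"{title}.")
    venue = " ".join(str(meta[k]) for k in ("journal", "year") if meta.get(k))
    if venue:
        parts.append(f"{venue}.")
    if meta.get("pmid"):
        parts.append(f"PMID: {meta['pmid']}.")
    if meta.get("doi"):
        parts.append(f"doi:{meta['doi']}")
    return " ".join(parts)


# Placeholder metadata, shaped like the fields listed above.
example = {
    "pmid": "12345678",
    "title": "An example title",
    "authors": "Doe J, Roe R",
    "journal": "Example Journal",
    "year": 2020,
    "doi": "10.1000/example",
    "source": "pubmed",
}
print(format_pubmed_citation(example))
```

With a retriever result, the same helper could be applied as `format_pubmed_citation(doc.meta)` for each `doc` in `results["documents"]`.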

BuiltSimpleArxivRetriever

Retrieves documents from ArXiv.

Parameters:

Parameter	Type	Default	Description
`api_base`	str	`https://arxiv.built-simple.ai`	API base URL
`top_k`	int	10	Number of documents to retrieve
`timeout`	float	30.0	Request timeout in seconds

Outputs:

documents: List of Haystack Document objects

Document Metadata:

arxiv_id - ArXiv paper ID
title - Paper title
authors - Comma-separated author names
categories - ArXiv categories
published - Publication date
url - Link to ArXiv abstract page
source - Always "arxiv"
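
A common follow-up step is to keep only papers from one subject area. The sketch below filters on the `categories` field. This README does not say whether that field arrives as a string or a list, so the hypothetical helpers `arxiv_categories` and `in_category` accept both; the IDs in the example are placeholders.

```python
def arxiv_categories(meta: dict) -> list[str]:
    """Return the categories as a list, whatever shape the field arrives in."""
    raw = meta.get("categories") or []
    if isinstance(raw, str):
        # Accept both "cs.CL cs.AI" and "cs.CL, cs.AI".
        raw = raw.replace(",", " ").split()
    return [str(c).strip() for c in raw if str(c).strip()]


def in_category(meta: dict, prefix: str) -> bool:
    """True if a category equals `prefix` or sits under it ("cs" matches "cs.CL")."""
    return any(
        c == prefix or c.startswith(prefix + ".") for c in arxiv_categories(meta)
    )


# Placeholder metadata covering both assumed shapes of the field.
metas = [
    {"arxiv_id": "0000.00001", "categories": "cs.CL cs.AI"},
    {"arxiv_id": "0000.00002", "categories": ["math.PR"]},
]
cs_only = [m["arxiv_id"] for m in metas if in_category(m, "cs")]
print(cs_only)
```

Matching on `prefix + "."` rather than a bare `startswith(prefix)` keeps a short prefix from matching an unrelated archive name.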

BuiltSimpleCombinedRetriever

Searches both PubMed and ArXiv, merging results.

Parameters:

Parameter	Type	Default	Description
`top_k`	int	10	Total documents to return
`pubmed_weight`	float	1.0	Score weight for PubMed results
`arxiv_weight`	float	1.0	Score weight for ArXiv results
`merge_strategy`	str	"score"	How to merge: "score", "interleave", "pubmed_first", "arxiv_first"
`fetch_full_text`	bool	False	Fetch full text for PubMed
`timeout`	float	30.0	Request timeout in seconds

Advanced Usage

Full Text Retrieval

For PubMed articles, you can fetch full text when available:

from haystack_builtsimple import BuiltSimplePubMedRetriever

pubmed = BuiltSimplePubMedRetriever(
    top_k=3,
    fetch_full_text=True,  # Slower, but includes full text when available
)

Custom Merge Strategies

When using the combined retriever:

from haystack_builtsimple import BuiltSimpleCombinedRetriever

# Prioritize PubMed results
retriever = BuiltSimpleCombinedRetriever(
    merge_strategy="pubmed_first"
)

# Weight ArXiv higher
retriever = BuiltSimpleCombinedRetriever(
    merge_strategy="score",
    pubmed_weight=0.8,
    arxiv_weight=1.2
)
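
To make the four strategy names concrete, here is a small standalone sketch of what each one could do to two ranked result lists. It is one plausible reading of the names and weights, not the package's actual implementation: `merge_results` is a hypothetical function, and the hits are plain dicts with made-up IDs and scores rather than Haystack Document objects.

```python
from itertools import zip_longest


def merge_results(pubmed, arxiv, strategy="score", top_k=10,
                  pubmed_weight=1.0, arxiv_weight=1.0):
    """Merge two ranked lists of {"id", "score"} dicts.

    Illustrative only: an assumed interpretation of the strategy names.
    """
    if strategy == "pubmed_first":
        merged = pubmed + arxiv
    elif strategy == "arxiv_first":
        merged = arxiv + pubmed
    elif strategy == "interleave":
        # Alternate sources, starting with PubMed; drop the None padding
        # that zip_longest adds when one list is shorter.
        merged = [hit for pair in zip_longest(pubmed, arxiv)
                  for hit in pair if hit is not None]
    elif strategy == "score":
        weighted = (
            [{**hit, "score": hit["score"] * pubmed_weight} for hit in pubmed]
            + [{**hit, "score": hit["score"] * arxiv_weight} for hit in arxiv]
        )
        # sorted() is stable, so ties keep their original relative order.
        merged = sorted(weighted, key=lambda hit: hit["score"], reverse=True)
    else:
        raise ValueError(f"unknown merge strategy: {strategy!r}")
    return merged[:top_k]


# Made-up hits for illustration.
pubmed_hits = [{"id": "p1", "score": 0.9}, {"id": "p2", "score": 0.5}]
arxiv_hits = [{"id": "a1", "score": 0.8}, {"id": "a2", "score": 0.7}]

by_score = merge_results(pubmed_hits, arxiv_hits, "score",
                         pubmed_weight=0.8, arxiv_weight=1.2)
print([hit["id"] for hit in by_score])  # ['a1', 'a2', 'p1', 'p2']
```

In this reading, the weights only matter for the "score" strategy; the other three ignore scores and depend purely on list order.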

Using with DocumentJoiner

For more control, use separate retrievers with Haystack's DocumentJoiner:

from haystack import Pipeline
from haystack.components.joiners import DocumentJoiner
from haystack_builtsimple import BuiltSimplePubMedRetriever, BuiltSimpleArxivRetriever

pipeline = Pipeline()
pipeline.add_component("pubmed", BuiltSimplePubMedRetriever(top_k=5))
pipeline.add_component("arxiv", BuiltSimpleArxivRetriever(top_k=5))
pipeline.add_component("joiner", DocumentJoiner())

pipeline.connect("pubmed.documents", "joiner.documents")
pipeline.connect("arxiv.documents", "joiner.documents")

result = pipeline.run({
    "pubmed": {"query": "protein folding"},
    "arxiv": {"query": "protein folding"},
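})

# DocumentJoiner exposes the merged list under its "documents" output
for doc in result["joiner"]["documents"]:
    print(f"[{doc.meta['source']}] {doc.meta['title']}")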

Examples

See the examples/ directory for complete working examples:

basic_retrieval.py - Simple standalone usage
rag_pipeline.py - Full RAG pipeline with LLM
combined_search.py - Multi-source search patterns

API Reference

Built-Simple APIs

This package uses Built-Simple's hosted research APIs:

PubMed API: https://pubmed.built-simple.ai
  - POST /hybrid-search - Hybrid semantic + keyword search
  - GET /article/{pmid}/full_text - Fetch full text

ArXiv API: https://arxiv.built-simple.ai
  - GET /api/search?q=...&limit=N - Search papers
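
If you want to call the two GET endpoints without Haystack, the URLs can be assembled with the standard library. The sketch below uses only the paths and query parameters listed above. The helper names are hypothetical, and `get_json` assumes the responses are JSON, which this README does not state. The request body for POST /hybrid-search is not documented here, so it is left out.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

PUBMED_API = "https://pubmed.built-simple.ai"
ARXIV_API = "https://arxiv.built-simple.ai"


def arxiv_search_url(query: str, limit: int = 10) -> str:
    """URL for GET /api/search with the q and limit parameters."""
    return f"{ARXIV_API}/api/search?{urlencode({'q': query, 'limit': limit})}"


def pubmed_full_text_url(pmid) -> str:
    """URL for GET /article/{pmid}/full_text."""
    return f"{PUBMED_API}/article/{quote(str(pmid), safe='')}/full_text"


def get_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(arxiv_search_url("large language models reasoning", limit=5))
# https://arxiv.built-simple.ai/api/search?q=large+language+models+reasoning&limit=5
```

Calling `get_json(arxiv_search_url("protein folding"))` would perform the actual request; inspect the returned structure before relying on any field names.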

No API key required. Rate limits apply for heavy usage.
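
Because rate limits apply, batch jobs may want to retry failed calls with increasing delays. The following is a generic sketch of exponential backoff, not something the package provides. `call_with_backoff` is a hypothetical helper, and since this README does not describe how a rate-limited request fails, it simply retries on any exception.

```python
import random
import time


def call_with_backoff(fn, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait and retry with exponential backoff.

    Waits base_delay * 2**attempt plus up to 10% jitter between tries.
    The last failure is re-raised once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:  # assumption: any failure is worth retrying
            if attempt == retries:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a stand-in function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky, base_delay=0.5, sleep=waits.append))  # ok
```

With a retriever, the call would look like `call_with_backoff(lambda: pubmed.run(query="protein folding"))`. In real use, narrow the `except` clause to whatever error the retriever actually raises when throttled.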

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

License

MIT License - see LICENSE for details.


Version 0.1.0, released Feb 7, 2026

