LlamaIndex readers for Built-Simple research APIs (PubMed, ArXiv, Wikipedia)

llama-index-readers-builtsimple

LlamaIndex readers for Built-Simple research APIs, providing semantic search over scientific literature.

Features

PubMed Reader - 4.5M+ biomedical articles with hybrid semantic/keyword search
ArXiv Reader - 2.7M+ preprints in physics, math, CS, and ML
Wikipedia Reader - Semantic search over Wikipedia articles
No API key required - Free tier available for all endpoints
Rich metadata - Full citation info for all documents

What Data is Included

PubMed Reader

Each document contains:

Text: Title + abstract (default) OR full article text (with include_full_text=True)
Metadata:
- pmid - PubMed ID (e.g., "31041627")
- title - Full article title
- journal - Publication journal name
- year - Publication year
- doi - DOI identifier
- doi_url - Direct DOI link
- url - Link to PubMed page
- has_full_text - Boolean indicating if full text was fetched
- full_text_length - Character count of full text (when available)
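
The metadata fields above are enough to build a short citation line for each document. The helper below is a minimal sketch, not part of the package: `format_citation` is a hypothetical name, and it assumes the metadata is a plain dict with the keys listed above. It accepts either `year` or `pub_year` for the publication year, since both spellings may be encountered.

```python
def format_citation(metadata: dict) -> str:
    """Build a one-line citation from PubMed-style metadata (hypothetical helper)."""
    title = metadata.get("title", "Untitled")
    # Accept either spelling of the publication-year key.
    year = metadata.get("year") or metadata.get("pub_year")

    parts = [title.rstrip(".") + "."]
    if metadata.get("journal"):
        parts.append(f"{metadata['journal']}.")
    if year:
        parts.append(f"{year}.")
    if metadata.get("pmid"):
        parts.append(f"PMID: {metadata['pmid']}.")
    if metadata.get("doi_url"):
        parts.append(metadata["doi_url"])
    return " ".join(parts)
```

Usage would look like `print(format_citation(doc.metadata))` inside a loop over the loaded documents.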

🔥 FULL TEXT AVAILABLE! Unlike most research APIs that only provide abstracts, Built-Simple has full article text for millions of papers:

from llama_index.readers.builtsimple import BuiltSimplePubMedReader

# Get full article text (15K-70K chars per article)
reader = BuiltSimplePubMedReader(include_full_text=True)
docs = reader.load_data("cancer immunotherapy", limit=5)

for doc in docs:
    print(f"Full text length: {len(doc.text)} chars")  # ~15,000-70,000 chars!
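
Because `has_full_text` records whether full text was actually fetched, it may be useful to separate full-text documents from abstract-only ones before indexing. The following is a sketch under that assumption; `split_by_full_text` is a hypothetical helper that only relies on each document exposing a `metadata` dict.

```python
from typing import Any, Iterable, List, Tuple


def split_by_full_text(docs: Iterable[Any]) -> Tuple[List[Any], List[Any]]:
    """Return (full_text_docs, abstract_only_docs) based on the has_full_text flag."""
    full_text_docs: List[Any] = []
    abstract_only_docs: List[Any] = []
    for doc in docs:
        # A missing flag is treated the same as False.
        if doc.metadata.get("has_full_text"):
            full_text_docs.append(doc)
        else:
            abstract_only_docs.append(doc)
    return full_text_docs, abstract_only_docs
```

For example, `full, abstracts = split_by_full_text(docs)` lets long full-text documents be chunked differently from short abstracts.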

ArXiv Reader

Each document contains:

Text: Title + authors + full abstract
Metadata:
- arxiv_id - ArXiv identifier (e.g., "2301.12345" or "cs/0308031")
- title - Paper title
- authors - Author names
- year - Publication year
- url - Link to ArXiv abstract page
- pdf_url - Direct PDF download link
- similarity_score - Semantic relevance score (0-1)

Note: Full paper PDFs are NOT downloaded—only abstracts. Use pdf_url to fetch the full PDF if needed.
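
As a rough illustration of fetching a PDF via `pdf_url`, the sketch below uses only the standard library. `pdf_filename` and `download_pdf` are hypothetical helpers, not part of the package, and the `pdfs` directory name is an arbitrary choice. Old-style identifiers such as "cs/0308031" contain a slash, so the file name is sanitized first.

```python
import urllib.request
from pathlib import Path


def pdf_filename(arxiv_id: str) -> str:
    """Turn an ArXiv ID into a safe file name (old-style IDs contain '/')."""
    return arxiv_id.replace("/", "_") + ".pdf"


def download_pdf(metadata: dict, dest_dir: str = "pdfs", timeout: int = 30) -> Path:
    """Download the PDF referenced by metadata['pdf_url'] into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / pdf_filename(metadata["arxiv_id"])
    with urllib.request.urlopen(metadata["pdf_url"], timeout=timeout) as resp:
        target.write_bytes(resp.read())
    return target
```

A call such as `download_pdf(doc.metadata)` would then save one paper; when downloading many, pause between requests to stay polite to the upstream server.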

Wikipedia Reader

Each document contains:

Text: Article title + summary/intro section
Metadata:
- title - Article title
- url - Link to Wikipedia page

Note: Only article summaries are returned, not full articles.

Installation

pip install llama-index-readers-builtsimple

Quick Start

Basic Usage

from llama_index.readers.builtsimple import (
    BuiltSimplePubMedReader,
    BuiltSimpleArxivReader,
)

# Search PubMed for medical literature
pubmed_reader = BuiltSimplePubMedReader()
pubmed_docs = pubmed_reader.load_data("CRISPR gene therapy", limit=10)

for doc in pubmed_docs:
    print(f"Title: {doc.metadata['title']}")
    print(f"Journal: {doc.metadata['journal']}")
    # Publication year; fall back to 'year' if 'pub_year' is absent
    print(f"Year: {doc.metadata.get('pub_year', doc.metadata.get('year'))}")
    print(f"URL: {doc.metadata['url']}\n")

# Search ArXiv for ML papers
arxiv_reader = BuiltSimpleArxivReader()
arxiv_docs = arxiv_reader.load_data("transformer architecture attention", limit=10)

for doc in arxiv_docs:
    print(f"Title: {doc.metadata['title']}")
    print(f"Authors: {doc.metadata['authors']}")
    print(f"ArXiv ID: {doc.metadata['arxiv_id']}\n")
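
ArXiv documents carry a `similarity_score` between 0 and 1, which can be used to keep only the strongest matches. This is a minimal sketch: `top_matches` is a hypothetical helper, the 0.5 threshold is an arbitrary assumption, and documents without a score are treated as scoring 0.

```python
from typing import Any, Iterable, List


def top_matches(docs: Iterable[Any], min_score: float = 0.5) -> List[Any]:
    """Keep documents at or above min_score, best match first."""
    def score(doc: Any) -> float:
        # Documents without a score are treated as 0.0.
        return doc.metadata.get("similarity_score", 0.0)

    kept = [doc for doc in docs if score(doc) >= min_score]
    return sorted(kept, key=score, reverse=True)
```

For instance, `top_matches(arxiv_docs, min_score=0.7)` would pass only the closest results on to an index.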

Build a RAG Index

from llama_index.core import VectorStoreIndex
from llama_index.readers.builtsimple import BuiltSimplePubMedReader

# Load documents
reader = BuiltSimplePubMedReader()
documents = reader.load_data("immunotherapy cancer treatment", limit=20)

# Build index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the side effects of CAR-T therapy?")
print(response)

Combine Multiple Sources

from llama_index.core import VectorStoreIndex
from llama_index.readers.builtsimple import (
    BuiltSimplePubMedReader,
    BuiltSimpleArxivReader,
)

# Load from multiple sources
pubmed = BuiltSimplePubMedReader()
arxiv = BuiltSimpleArxivReader()

# Combine documents
documents = []
documents.extend(pubmed.load_data("drug discovery machine learning", limit=10))
documents.extend(arxiv.load_data("drug discovery deep learning", limit=10))

# Build unified index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query(
    "How is machine learning being used for drug discovery?"
)
print(response)
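
When the same paper appears in both PubMed and ArXiv results, the combined list may contain near-duplicates. One possible approach, sketched below, is to drop documents whose normalized title has already been seen and to count what remains per `source`. Both helpers are hypothetical, and title matching is a heuristic rather than a reliable identity check.

```python
from collections import Counter
from typing import Any, Iterable, List


def dedupe_by_title(docs: Iterable[Any]) -> List[Any]:
    """Keep the first document for each case- and whitespace-normalized title."""
    seen = set()
    unique: List[Any] = []
    for doc in docs:
        key = " ".join(doc.metadata.get("title", "").lower().split())
        if key and key in seen:
            continue  # Same title already kept from an earlier source.
        seen.add(key)
        unique.append(doc)
    return unique


def count_by_source(docs: Iterable[Any]) -> Counter:
    """Count documents per 'source' metadata value."""
    return Counter(doc.metadata.get("source", "unknown") for doc in docs)
```

Calling `documents = dedupe_by_title(documents)` before `VectorStoreIndex.from_documents(documents)` keeps the first copy of each title, so the order in which sources are loaded decides which copy survives.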

API Reference

BuiltSimplePubMedReader

BuiltSimplePubMedReader(
    api_key: Optional[str] = None,  # Optional for higher rate limits
    include_full_text: bool = False,  # Fetch full article text instead of title + abstract
    timeout: int = 30,
)

def load_data(
    query: str,
    limit: int = 10,
) -> List[Document]

Document Metadata:

source: "builtsimple-pubmed"
pmid: PubMed ID
title: Paper title
journal: Journal name
pub_year: Publication year
doi: DOI identifier
url: Link to PubMed

BuiltSimpleArxivReader

BuiltSimpleArxivReader(
    api_key: Optional[str] = None,
    timeout: int = 30,
)

def load_data(
    query: str,
    limit: int = 10,
) -> List[Document]

Document Metadata:

source: "builtsimple-arxiv"
arxiv_id: ArXiv identifier (e.g., "2301.12345")
title: Paper title
authors: Author list
year: Publication year
url: Link to ArXiv

BuiltSimpleWikipediaReader

BuiltSimpleWikipediaReader(
    api_key: Optional[str] = None,
    timeout: int = 30,
)

def load_data(
    query: str,
    limit: int = 10,
) -> List[Document]

Document Metadata:

source: "builtsimple-wikipedia"
title: Article title
url: Link to Wikipedia
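
All three constructors accept the same `api_key` and `timeout` arguments, so the settings can be gathered in one place. The sketch below reads the key from an environment variable; the variable name `BUILTSIMPLE_API_KEY` and the `reader_kwargs` helper are assumptions made for illustration, not something the package defines.

```python
import os
from typing import Any, Dict


def reader_kwargs(env_var: str = "BUILTSIMPLE_API_KEY", timeout: int = 30) -> Dict[str, Any]:
    """Collect constructor arguments shared by the three readers."""
    # An unset or empty variable falls back to None (free tier, no key).
    api_key = os.environ.get(env_var) or None
    return {"api_key": api_key, "timeout": timeout}
```

A reader could then be created with `BuiltSimpleWikipediaReader(**reader_kwargs())`, which keeps the key out of source code.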

Rate Limits

Tier	Rate Limit	Notes
Free	10 req/min	No API key needed
Pro	100 req/min	Requires API key

Get an API key at pubmed.built-simple.ai or arxiv.built-simple.ai.
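
To stay under the free-tier limit of 10 requests per minute, a client-side throttle can be placed in front of each `load_data` call. The class below is a simple sliding-window sketch, not part of the package; `RateLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable only to make it easy to test.

```python
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(
        self,
        max_calls: int = 10,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: deque = deque()  # Timestamps of recent calls, oldest first.

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) >= self._max_calls:
            # Sleep until the oldest recorded call expires, then drop it.
            self._sleep(self._period - (now - self._calls[0]))
            self._calls.popleft()
            now = self._clock()
        self._calls.append(now)
```

Typical use would be to create one `limiter = RateLimiter()` and call `limiter.wait()` immediately before every `reader.load_data(...)`; with a Pro key, `max_calls=100` matches the higher tier.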

Why Built-Simple?

Compared with scraping or calling the upstream APIs directly:

Pre-indexed vectors - Semantic search runs against existing vectors, so there are no embedding costs at query time
Hybrid search - Combines BM25 + vector similarity
Always available - Queries are not subject to upstream providers' rate limits
Structured data - Clean JSON responses with full metadata

Contributing

This package is part of the LlamaIndex ecosystem. To contribute:

Fork the repo
Create a feature branch
Submit a PR to run-llama/llama_index

License

MIT License - see LICENSE for details.

This version: 0.1.0 (released Jan 31, 2026)

Download files

Source Distribution

llama_index_readers_builtsimple-0.1.0.tar.gz (12.4 kB)

Uploaded Jan 31, 2026 Source

Built Distribution

llama_index_readers_builtsimple-0.1.0-py3-none-any.whl (16.2 kB)

Uploaded Jan 31, 2026 Python 3

File details

Details for the file llama_index_readers_builtsimple-0.1.0.tar.gz.

File metadata

Download URL: llama_index_readers_builtsimple-0.1.0.tar.gz
Upload date: Jan 31, 2026
Size: 12.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for llama_index_readers_builtsimple-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`01e2e66076c3f3925d92a712d56f2899e214d465f8ddb86e276fe4c20d843e6e`
MD5	`3044814f3184c6aa6a3bbfc4f64ab3d2`
BLAKE2b-256	`b56d97eec0de38e83d7af72abb0f7e28d4baeaeadf2ac09a674ab301768c23da`

File details

Details for the file llama_index_readers_builtsimple-0.1.0-py3-none-any.whl.

File metadata

Download URL: llama_index_readers_builtsimple-0.1.0-py3-none-any.whl
Upload date: Jan 31, 2026
Size: 16.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for llama_index_readers_builtsimple-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d84406aec57f519dca41766bca57fb4990b6910a795f364f68662fd4bf96b1af`
MD5	`f9b89728c133b549990096a07a95bb99`
BLAKE2b-256	`d7f5cf4a868e689e05041bd71415400a3fdf88e65961b3ce58d2c6f9381a5ef8`
