LLM-assisted biomedical literature screening and structured extraction for PubMed and GEO.

biolit

LLM-assisted biomedical literature screening and structured extraction. Accepts PubMed alert emails, plain PMID lists, or GEO accession lists. Supports multiple LLM providers and optional full-text retrieval.

Setup

Requirements: Python 3.8+

Install the package in editable mode from a source checkout (creates the biolit command):

pip install -e .

Copy .env.example to .env and add your API key:

cp .env.example .env
# edit .env and set ANTHROPIC_API_KEY (or OPENAI_API_KEY)

Usage

The tool accepts several input formats, auto-detected by file extension or content:

Input	How to pass	Example
PubMed alert email	positional `.eml` file	`alert.eml`
PMID list (file)	positional plain-text file, one PMID per line	`pmids.txt`
GEO accession list (file)	positional plain-text file, one accession per line	`geo_accessions.txt`
PMIDs (inline)	`--pmids` flag, comma-separated	`--pmids 41795042,41792186`
GEO accessions (inline)	`--accessions` flag, comma-separated	`--accessions GSE53987,GSE12345`
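
The auto-detection described above might work roughly as follows. This is a minimal sketch, not biolit's actual logic: the function names (`classify_token`, `detect_input_kind`) and the exact rules (digits-only means PMID, a GSE/GDS/GSM/GPL prefix followed by digits means GEO accession) are assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed token shapes: PMIDs are plain integers, GEO accessions are a
# known prefix followed by digits. These rules are illustrative only.
PMID_RE = re.compile(r"^\d+$")
GEO_RE = re.compile(r"^(GSE|GDS|GSM|GPL)\d+$", re.IGNORECASE)


def classify_token(token: str) -> str:
    """Classify one line of a plain-text input file."""
    token = token.strip()
    if PMID_RE.match(token):
        return "pmid"
    if GEO_RE.match(token):
        return "geo"
    return "unknown"


def detect_input_kind(path: str, text: str) -> str:
    """Guess the input kind from the file extension, then from the content."""
    if Path(path).suffix.lower() == ".eml":
        return "eml"
    kinds = {classify_token(line) for line in text.splitlines() if line.strip()}
    if kinds == {"pmid"}:
        return "pmid_list"
    if kinds == {"geo"}:
        return "geo_list"
    return "unknown"  # empty or mixed files are not guessed at


if __name__ == "__main__":
    print(detect_input_kind("pmids.txt", "41795042\n41792186\n"))
    print(detect_input_kind("geo_accessions.txt", "GSE53987\nGSE12345\n"))
```

Refusing to guess on mixed files keeps the sketch conservative; the real tool may handle that case differently.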

Use --default to run with schizophrenia genomics defaults (no prompts):

biolit alert.eml --default
biolit pmids.txt --default
biolit geo_accessions.txt --default
biolit --pmids 41795042,41792186 --default
biolit --accessions GSE53987 --default

Or specify the criterion and fields as flags:

biolit pmids.txt \
  --criterion "Is this about treatment-resistant schizophrenia?" \
  --fields "methodology, sample_size, treatment, outcomes"

Or run interactively; you are prompted for the criterion and fields if they are not provided:

biolit alert.eml

Single-record screening

Use biolit screen to quickly check one paper or GEO record for relevance without running the full extraction pipeline:

biolit screen --pmid 41627908 --default
biolit screen --accession GSE53987 --default
biolit screen --pmid 41627908 --criterion "Is this about treatment-resistant schizophrenia?"
biolit screen --pmid 41627908 --fulltext --default

Output is a single line to stdout:

RELEVANT [abstract] — Paper uses GWAS to investigate schizophrenia risk loci.
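
If you call `biolit screen` from a script, the single-line output can be split into its parts. The sketch below assumes every line follows the shape of the one example above (verdict, text source in square brackets, a dash separator, then the reason); that layout is inferred from the example, and `parse_screen_line` is a hypothetical helper, not part of biolit.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the single example line; \u2014 is the dash separator.
LINE_RE = re.compile(
    r"^(?P<verdict>.+?) \[(?P<source>[^\]]+)\] \u2014 (?P<reason>.*)$"
)


class ScreenLine(NamedTuple):
    verdict: str
    source: str
    reason: str


def parse_screen_line(line: str) -> Optional[ScreenLine]:
    """Return the parsed parts, or None if the line has a different shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return ScreenLine(match["verdict"], match["source"], match["reason"])


if __name__ == "__main__":
    example = (
        "RELEVANT [abstract] \u2014 "
        "Paper uses GWAS to investigate schizophrenia risk loci."
    )
    print(parse_screen_line(example))
```

The verdict is captured as free text because the label used for non-relevant records is not shown here.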

GEO accession input

Pass a file of GEO accessions (GSE, GDS, GSM, or GPL prefixes) to screen GEO records directly. The tool fetches each record's MINiML XML, extracts the summary, overall design, experiment type, and organism, then runs the same LLM screening and extraction pipeline.

biolit geo_accessions.txt \
  --criterion "Does this study perturb a transcription factor?" \
  --fields "organism, experiment_type, tf_perturbed, perturbation_method, summary"

GEO results include geo_accession and pmids (linked PubMed IDs) columns in place of pmid.
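
To illustrate the kind of metadata extraction described above, here is a rough sketch that pulls the summary, overall design, experiment type, and organism out of a MINiML document with the standard library. It is not biolit's parser. The element names (`Series`, `Summary`, `Overall-Design`, `Type`, `Organism`) and the namespace URI reflect my understanding of the MINiML schema and should be checked against a real record; the inline sample and its accessions are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the MINiML schema; verify against a real record.
MINIML_URI = "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"
NS = {"m": MINIML_URI}

# Placeholder document: accessions and text are made up for the example.
SAMPLE_XML = """<MINiML xmlns="http://www.ncbi.nlm.nih.gov/geo/info/MINiML">
  <Sample iid="GSM0000001">
    <Channel position="1">
      <Organism taxid="9606">Homo sapiens</Organism>
    </Channel>
  </Sample>
  <Series iid="GSE0000001">
    <Title>Example series</Title>
    <Summary>Knockdown of a transcription factor in cultured cells.</Summary>
    <Overall-Design>Three knockdown and three control replicates.</Overall-Design>
    <Type>Expression profiling by high throughput sequencing</Type>
  </Series>
</MINiML>
"""


def parse_miniml(xml_text: str) -> dict:
    """Extract the fields used for screening from a MINiML document."""
    root = ET.fromstring(xml_text)
    series = root.find("m:Series", NS)

    def series_text(tag: str) -> str:
        element = series.find(f"m:{tag}", NS) if series is not None else None
        return (element.text or "").strip() if element is not None else ""

    organisms = sorted(
        {el.text.strip() for el in root.iter(f"{{{MINIML_URI}}}Organism") if el.text}
    )
    return {
        "accession": series.get("iid", "") if series is not None else "",
        "summary": series_text("Summary"),
        "overall_design": series_text("Overall-Design"),
        "experiment_type": series_text("Type"),
        "organisms": organisms,
    }


if __name__ == "__main__":
    print(parse_miniml(SAMPLE_XML))
```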

Full-text retrieval (PubMed inputs only)

Use --fulltext to screen and extract from full text instead of just the abstract. The pipeline tries each source in order and falls back to the abstract if none of the full-text sources is available:

PMC JATS XML (open access)
Preprint XML (bioRxiv / medRxiv)
Unpaywall PDF (requires --unpaywall-email)
Abstract fallback

biolit alert.eml --default --fulltext --unpaywall-email you@example.com
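
The ordered fallback above can be pictured as a loop over fetchers that stops at the first one returning text. This is a generic sketch of that pattern, not biolit's implementation: `first_available` and the stub fetchers are hypothetical, and only the source labels are taken from the `text_source` values listed under Output.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher takes a PMID and returns text, or None when nothing is available.
Fetcher = Callable[[str], Optional[str]]


def first_available(
    pmid: str, sources: List[Tuple[str, Fetcher]]
) -> Tuple[str, Optional[str]]:
    """Try each (label, fetcher) pair in order; return the first hit."""
    for label, fetch in sources:
        try:
            text = fetch(pmid)
        except Exception:
            # A failing source should not abort the whole chain.
            continue
        if text:
            return label, text
    return "none", None


if __name__ == "__main__":
    # Stub fetchers standing in for real network calls.
    sources = [
        ("pmc_fulltext", lambda pmid: None),
        ("preprint_fulltext", lambda pmid: None),
        ("unpaywall_pdf", lambda pmid: None),
        ("abstract", lambda pmid: f"Abstract text for {pmid}"),
    ]
    print(first_available("41627908", sources))
```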

Limit which sections are sent to the LLM:

biolit alert.eml --default --fulltext --sections methods,results
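
As a rough picture of what section filtering could look like, the sketch below keeps only sections whose title contains one of the requested names. How biolit actually splits and matches sections is not documented here, so the data shape (a title-to-text mapping) and the substring matching in `select_sections` are assumptions.

```python
from typing import Dict, Iterable


def select_sections(sections: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Keep sections whose title contains any requested name (case-insensitive)."""
    wanted_lower = [name.strip().lower() for name in wanted if name.strip()]
    return {
        title: body
        for title, body in sections.items()
        if any(name in title.lower() for name in wanted_lower)
    }


if __name__ == "__main__":
    paper = {
        "Introduction": "...",
        "Materials and Methods": "...",
        "Results": "...",
        "Discussion": "...",
    }
    # Mirrors the comma-separated value passed to --sections.
    print(list(select_sections(paper, "methods,results".split(","))))
```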

LLM providers

The tool supports Anthropic (default), OpenAI, and local Ollama models:

# OpenAI
biolit pmids.txt --default --provider openai --model gpt-4o

# Ollama (local)
biolit pmids.txt --default --provider ollama --model llama3

You can also set LLM_PROVIDER and LLM_MODEL as environment variables.
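
One plausible way to combine flags and environment variables is sketched below. The precedence (explicit flag first, then environment variable, then the Anthropic default) is an assumption, and `resolve_llm` is a hypothetical helper; only the variable names `LLM_PROVIDER` and `LLM_MODEL` and the default provider come from this README.

```python
import os
from typing import Optional, Tuple


def resolve_llm(
    provider_flag: Optional[str] = None, model_flag: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Pick provider and model: flag, then environment, then default."""
    provider = provider_flag or os.environ.get("LLM_PROVIDER") or "anthropic"
    # No default model is assumed; None means "let the provider decide".
    model = model_flag or os.environ.get("LLM_MODEL")
    return provider.lower(), model


if __name__ == "__main__":
    os.environ["LLM_PROVIDER"] = "ollama"
    os.environ["LLM_MODEL"] = "llama3"
    print(resolve_llm())                    # values from the environment
    print(resolve_llm("openai", "gpt-4o"))  # explicit flags take precedence
```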

Output

Each run creates a timestamped directory (e.g. run_20260313_142000/) containing:

results.csv — one row per relevant record
artifacts/<id>/ — per-record folder with the text sent to the LLM, metadata, and any retrieved full-text files
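
Because run directories are timestamped, a script can find the most recent one by parsing the directory name. The sketch below assumes the name always follows the `run_YYYYMMDD_HHMMSS` pattern of the example above; `parse_run_timestamp` and `latest_run_dir` are hypothetical helpers.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def parse_run_timestamp(name: str) -> Optional[datetime]:
    """Parse 'run_YYYYMMDD_HHMMSS'; return None for any other name."""
    try:
        return datetime.strptime(name, "run_%Y%m%d_%H%M%S")
    except ValueError:
        return None


def latest_run_dir(parent: Path) -> Optional[Path]:
    """Return the newest run directory under parent, judged by its name."""
    stamped = [
        (parse_run_timestamp(path.name), path)
        for path in parent.iterdir()
        if path.is_dir()
    ]
    stamped = [(ts, path) for ts, path in stamped if ts is not None]
    if not stamped:
        return None
    return max(stamped, key=lambda item: item[0])[1]


if __name__ == "__main__":
    newest = latest_run_dir(Path("."))
    if newest is not None:
        print(newest / "results.csv")
```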

With --default on PubMed inputs, the CSV columns are:

Column	Description
`title`	Paper title
`url`	PubMed link
`pmid`	PubMed ID
`doi`	DOI
`text_source`	Where the text came from (`abstract`, `pmc_fulltext`, `preprint_fulltext`, `unpaywall_pdf`)
`methodology`	General method (e.g. GWAS, scRNA-seq, proteomics)
`sample_type`	Tissue/sample type and origin
`causal_claims`	Statements about causes of schizophrenia inferred from the data
`genetics_claims`	Claims about specific genes, loci, or pathways
`summary`	2-3 sentence plain-language summary for triage

For GEO inputs, pmid is replaced by geo_accession and pmids.

The CSV can be imported directly into Google Sheets (File → Import).
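
The CSV can also be post-processed in Python. The sketch below loads rows with the standard `csv` module and counts them by `text_source`. The column names come from the table above; the sample rows are placeholders rather than real results, and the helper names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO

# Placeholder rows using the --default PubMed columns; values are made up.
SAMPLE_CSV = (
    "title,url,pmid,doi,text_source,methodology,sample_type,"
    "causal_claims,genetics_claims,summary\n"
    "Placeholder A,,00000001,,abstract,GWAS,blood,,,Placeholder summary\n"
    "Placeholder B,,00000002,,pmc_fulltext,scRNA-seq,brain tissue,,,Placeholder summary\n"
)


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read every row of a results CSV into a list of dicts."""
    return list(csv.DictReader(handle))


def count_by_source(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count how many records were screened from each text source."""
    return Counter(row["text_source"] for row in rows)


if __name__ == "__main__":
    # For a real run, replace the StringIO with
    # open("run_.../results.csv", newline="", encoding="utf-8").
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(count_by_source(rows))
```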

MCP server

biolit ships an MCP server that exposes the pipeline as tools for any MCP-compatible client (Claude Desktop, Claude CLI, OpenAI Agents SDK, etc.).

Start the server:

biolit-mcp

Or test interactively with the MCP inspector:

mcp dev biolit/mcp_server.py

Configure Claude Desktop

On macOS, add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "biolit": {
      "command": "biolit-mcp"
    }
  }
}

Restart Claude Desktop. The tools will appear in the tool picker.

Configure Claude CLI

Add a .mcp.json in your project root:

{
  "mcpServers": {
    "biolit": {
      "command": "biolit-mcp"
    }
  }
}
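
If a config file already lists other MCP servers, editing it by hand risks overwriting them. The sketch below merges the biolit entry into an existing JSON file and leaves other entries in place. `add_mcp_server` is a hypothetical helper written for this example; it assumes the file is either absent, empty, or valid JSON with the `mcpServers` layout shown above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one server entry, preserving any existing entries."""
    text = config_path.read_text(encoding="utf-8").strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Writes or updates .mcp.json in the current directory.
    print(add_mcp_server(Path(".mcp.json"), "biolit", "biolit-mcp"))
```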

Available tools

Batch pipelines (equivalent to the biolit CLI):

Tool	Description
`run_pipeline`	Screen + extract a list of PMIDs, write results CSV
`run_geo_pipeline`	Screen + extract a list of GEO accessions, write results CSV

Single-record (equivalent to biolit screen):

Tool	Description
`screen_by_pmid`	Fetch + screen a PubMed paper in one call
`screen_by_geo`	Fetch + screen a GEO record in one call

Low-level (for custom workflows):

Tool	Description
`search_pubmed`	Fetch PubMed metadata by PMID
`fetch_geo_record`	Fetch and parse a GEO record by accession
`fetch_fulltext`	Retrieve full text for a PMID
`screen_paper`	LLM relevance screen given pre-fetched text
`extract_fields`	Structured field extraction given pre-fetched text
`read_pmids_from_eml`	Parse PMIDs from a PubMed alert `.eml` file

Use as a Python library

The pipeline functions are importable directly:

from biolit.pipeline import screen_by_pmid, screen_by_geo, run, run_geo
from biolit.llm import get_llm_client

client = get_llm_client("anthropic")

# Single-record screen
result = screen_by_pmid(client, "41627908", "Is this about schizophrenia genomics?")
# {"relevant": True, "reason": "...", "text_source": "abstract"}

# Batch pipeline
run(client, pmids=["41627908", "33741721"], criterion="...", fields_description="methodology, summary", output_path="results.csv")
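
For ad hoc scripts, the single-record call can be wrapped in a loop that collects only the relevant records. The sketch below takes the screening function as a parameter so it can be exercised with a stub; `screen_many` is a hypothetical helper, and the result keys (`relevant`, `reason`, `text_source`) follow the example dictionary shown above.

```python
from typing import Callable, Dict, Iterable, List

# Any callable that maps a PMID to a result dict like the one shown above.
ScreenFn = Callable[[str], Dict[str, object]]


def screen_many(pmids: Iterable[str], screen: ScreenFn) -> List[Dict[str, object]]:
    """Screen each PMID and return the relevant results, tagged with the PMID."""
    relevant = []
    for pmid in pmids:
        try:
            result = screen(pmid)
        except Exception as exc:
            # Report and move on, so one failure does not stop the batch.
            print(f"{pmid}: screening failed ({exc})")
            continue
        if result.get("relevant"):
            relevant.append({"pmid": pmid, **result})
    return relevant


if __name__ == "__main__":
    # Stub for demonstration. With biolit installed, one could pass
    #   lambda pmid: screen_by_pmid(client, pmid, criterion)
    def fake_screen(pmid: str) -> Dict[str, object]:
        return {"relevant": pmid.endswith("8"), "reason": "stub", "text_source": "abstract"}

    print(screen_many(["41627908", "33741721"], fake_screen))
```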

Known limitations

Papers without abstracts or accessible full text are skipped silently.
Full-text retrieval (--fulltext) applies to PubMed inputs only; GEO records use the record metadata directly.

Release history

Version	Release date
0.1.32	May 6, 2026
0.1.31	May 4, 2026
0.1.30	May 4, 2026
0.1.29	May 2, 2026
0.1.28	May 1, 2026
0.1.27	Apr 30, 2026
0.1.26	Apr 30, 2026
0.1.25	Apr 1, 2026
0.1.24	Mar 31, 2026
0.1.23	Mar 29, 2026
0.1.22	Mar 27, 2026
0.1.21	Mar 24, 2026
0.1.20	Mar 24, 2026
0.1.19	Mar 19, 2026
0.1.18	Mar 19, 2026
0.1.17	Mar 19, 2026
0.1.16	Mar 18, 2026
0.1.15	Mar 18, 2026
0.1.14	Mar 18, 2026
0.1.13	Mar 18, 2026
0.1.12	Mar 17, 2026
0.1.11	Mar 17, 2026
0.1.10	Mar 17, 2026
0.1.9	Mar 17, 2026
0.1.8	Mar 17, 2026
0.1.7	Mar 17, 2026
0.1.6	Mar 17, 2026
0.1.5	Mar 16, 2026
0.1.3	Mar 16, 2026
0.1.2 (this version)	Mar 15, 2026
0.1.1	Mar 15, 2026
0.1.0	Mar 14, 2026

Download files

Source Distribution

biolit-0.1.2.tar.gz (30.6 kB)

Uploaded Mar 15, 2026 Source

Built Distribution

biolit-0.1.2-py3-none-any.whl (27.8 kB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file biolit-0.1.2.tar.gz.

File metadata

Download URL: biolit-0.1.2.tar.gz
Upload date: Mar 15, 2026
Size: 30.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for biolit-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`02e3c99194f31f7f62ccd26bae189482e863449a7e0e55950b70550b56957d0a`
MD5	`818180e77b2700d48b4ee1395c18bbd8`
BLAKE2b-256	`aa0c82e5c91494864d278ad79d093fc9008dbf438dfc85fa251fb13e9774edd1`
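
A downloaded file can be checked against the SHA256 digest listed above with the standard `hashlib` module. This is a generic sketch; `sha256_of` and `matches` are hypothetical helper names, and the file path in the usage example assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("biolit-0.1.2.tar.gz")  # assumed local download location
    expected = "02e3c99194f31f7f62ccd26bae189482e863449a7e0e55950b70550b56957d0a"
    if archive.exists():
        print("OK" if matches(archive, expected) else "MISMATCH")
```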

File details

Details for the file biolit-0.1.2-py3-none-any.whl.

File metadata

Download URL: biolit-0.1.2-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for biolit-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26d2b08b3a09dacb2b48ce880c85c80a963a0399ed79402886ff4c7401257c8a`
MD5	`5bbfd6c2cb47d5cb6dda21cbc533c28a`
BLAKE2b-256	`e1360a102d4f10761038e11a2e23f2620ce05044d9480f22db2507ca35714086`
