aiva-agent
Clinical-genomics agent: ask natural-language questions over a local VCF and get annotated, literature-grounded answers.
A standalone CLI clinical-genomics agent. Ask natural-language questions about a local VCF, gather variant annotations, search literature, find clinical trials, prioritize genes from HPO phenotypes, and run ACMG/AMP variant classification — using any OpenAI-compatible provider (OpenAI, Anthropic, xAI Grok, Together, Fireworks, OpenRouter, etc.).
```shell
export LLM_MODEL=gpt-5.5
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY=sk-...

aiva_agent --vcf data/test.vcf.gz \
  --prompt "How many PASS variants on chr1?"
```
Overview
The agent runs locally over a tabix-indexed VCF and orchestrates a curated set of tools to answer questions about variants, retrieve supporting literature, surface clinical trials, prioritize candidate genes from phenotype terms, and run ACMG/AMP variant classification. Tools are exposed under the following names and can be selectively turned off via --disable:
| Tool | What it does |
|---|---|
| vcf | Queries over your tabix-indexed .vcf.gz file. |
| annotate | Variant annotation. Supports human and plant species. |
| literature | PubMed / PMC search with gene / disease / variant / chemical entity annotations. |
| trials | Clinical-trials search by condition, intervention, gene/variant, phase, recruiting status; full-detail retrieval by NCT ID. |
| phen2gene | Ranks candidate genes for a list of HPO phenotype terms; negative terms can be used to exclude genes. |
| web | Web search and clean content extraction from any URL. |
| classify | ACMG/AMP 2015 (germline) and AMP/ASCO/CAP 2017 (somatic) classification; returns a JSON classification. |
Prerequisites
- Python 3.11+ with pip ≥ 24.
- htslib tools (bgzip, tabix), only needed for preparing VCFs: brew install htslib on macOS.
- A model provider's API key; see the Environment variables section below. The agent works with any OpenAI-compatible endpoint.
Install
```shell
pip install aiva-agent
```
Quick start
Prepare a tabix-indexed VCF (only needed once per file):
```shell
bgzip -k path/to/sample.vcf
tabix -p vcf path/to/sample.vcf.gz
```
Tip: prefer a pre-annotated VCF. If you've already run your VCF through a
variant-effect annotator (VEP, SnpEff, ANNOVAR, …), those INFO fields are
auto-flattened into queryable columns by the vcf tool — the agent can read
gene, consequence, allele frequency, and other annotations directly out of the
file without spending API calls on the annotate tool. Pre-annotation is
strongly recommended for large or sensitive cohorts where you want fewer
external calls.
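As an illustration, the flattening of a pre-annotated INFO field can be sketched in a few lines. This is a simplified model, not the vcf tool's actual code, and the field names GENE/CSQ/AF are hypothetical examples rather than a fixed schema:

```python
def flatten_info(info: str) -> dict:
    """Split a VCF INFO string (semicolon-separated key=value pairs)
    into a flat dict; flag-style entries without '=' become True."""
    out = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        out[key] = value if sep else True
    return out

# A pre-annotated record exposes gene, consequence, and AF directly:
row = flatten_info("GENE=TP53;CSQ=missense_variant;AF=0.0001;DB")
print(row["GENE"], row["CSQ"], row["AF"])  # TP53 missense_variant 0.0001
```

Once flattened like this, annotations become ordinary queryable columns, which is why pre-annotation saves annotate-tool calls.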
Set provider credentials:
```shell
# Example for OpenAI
export LLM_MODEL=gpt-5.5
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY=sk-...
```
Ask a question:
```shell
aiva_agent --vcf path/to/sample.vcf.gz \
  --prompt "List pathogenic variants related to breast cancer"
```
Environment variables
You can drive everything via flags or env vars. Most users put the LLM credentials in .env once and skip the flags on every run — copy .env.example to .env to start.
| Variable | Purpose | Default |
|---|---|---|
| LLM_MODEL | Model ID for the provider (e.g. gpt-5.5, claude-opus-4-7). | — |
| LLM_BASE_URL | Provider's OpenAI-compatible base URL. | — |
| LLM_API_KEY | Provider API key. | — |
| AIVA_VCF | Default VCF path or alias=path,... spec. | unset (vcf tool auto-disables) |
| AIVA_DISABLE | Comma-separated tools to disable. | unset (all tools on) |
| AIVA_MAX_TURNS | Max agent turns per run. | 25 |
| AIVA_FORCE | Overwrite -o destination without passing --force. Accepts 1/true/yes/on. | unset |
| AIVA_SESSION_ID | Conversation session ID; persists history across runs in ~/.aiva/sessions.db. | new UUID per run |
Precedence: CLI flag > shell export > .env value. So you can override per-shell or per-run without editing the file.
Security note: prefer export LLM_API_KEY=... over --api-key sk-... so the secret doesn't leak into shell history or ps.
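The precedence rule above can be sketched as a small resolver. This is a minimal model of the behavior, assuming the .env file has already been parsed into a dict (the function name is illustrative):

```python
import os

def resolve(flag, env_name, dotenv):
    """CLI flag beats an exported env var, which beats the .env value."""
    if flag is not None:
        return flag
    if env_name in os.environ:
        return os.environ[env_name]
    return dotenv.get(env_name)

dotenv = {"LLM_MODEL": "gpt-5.5"}       # values parsed from .env
os.environ["LLM_MODEL"] = "from-shell"  # a shell export overrides .env
print(resolve(None, "LLM_MODEL", dotenv))       # from-shell
print(resolve("per-run", "LLM_MODEL", dotenv))  # per-run
```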
Usage
```shell
aiva_agent [--disable <tools>] [--vcf PATH] [--prompt TEXT | --prompt-file PATH]
           [--model ID] [--base-url URL] [--api-key KEY]
           [-o OUTPUT] [--force] [--session-id ID]
```
--disable
All tools are ON by default. Pass --disable a,b to drop tools from the agent's palette.
```shell
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Annotate rs113488022 and find recent papers."

# Multi-tool disable: comma-separated, no spaces
aiva_agent --disable phen2gene,web,trials --vcf data/test.vcf.gz --model gpt-5.5 \
  --prompt "Classify the most pathogenic chr1 variant."
```
--disable falls back to the AIVA_DISABLE env var (e.g. AIVA_DISABLE=vcf in .env). The flag, when given, replaces (rather than merges with) the env value.
VCF path resolution
The vcf tool needs a path. Provide it via --vcf PATH or the AIVA_VCF env var (set in .env); the flag wins if both are set. If neither is set, the vcf tool auto-disables itself with a one-line stderr warning and the rest of the palette runs as normal. If a path is provided but the file doesn't exist, the CLI exits with an error — that case clearly signals user intent and shouldn't be silently swallowed. Pass --disable vcf to skip the resolution dance entirely.
Multiple VCFs (trio / tumor-normal)
The --vcf flag accepts a comma-separated alias=path list. Each entry becomes a separately named view; the agent JOINs them at query time on (CHROM, POS, REF).
```shell
aiva_agent --vcf "proband=trio/proband.vcf.gz,father=trio/father.vcf.gz,mother=trio/mother.vcf.gz" \
  --prompt "Count de novo het variants in the proband (parents both 0/0)."
```
AIVA_VCF accepts the same syntax (AIVA_VCF=proband=p.vcf.gz,father=f.vcf.gz in .env).
Prefer a multisample VCF when you have one. If joint calling has already produced a single multisample file, pass it as one VCF. Joint-called VCFs record missing genotypes explicitly as ./.; an outer JOIN across separate per-sample files turns those into NULL, which the agent must defensively coalesce(..., './.') back to ./. to read correctly. For trio analyses, the joint-called single-file route is the genomically more correct one.
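The coalescing issue can be illustrated with a minimal sketch over hypothetical joined genotype rows (this is a model of the logic, not the agent's actual query):

```python
def coalesce(gt):
    """An outer JOIN yields None where a sample's file lacks the site;
    fold that back into the explicit missing genotype './.'."""
    return gt if gt is not None else "./."

def is_de_novo(child, father, mother):
    """Child is het (0/1) and both parents are confidently 0/0;
    a missing parental genotype disqualifies the site."""
    return (coalesce(child) == "0/1"
            and coalesce(father) == "0/0"
            and coalesce(mother) == "0/0")

# Hypothetical (child, father, mother) genotypes after joining on (CHROM, POS, REF):
rows = [("0/1", "0/0", "0/0"),  # de novo candidate
        ("0/1", None, "0/0"),   # site absent from father's file: not de novo
        ("0/1", "0/1", "0/0")]  # inherited from father
print(sum(is_de_novo(*row) for row in rows))  # 1
```

Without the coalesce step, a NULL from a missing parental row could be mistaken for an absent variant rather than an uncalled genotype, which is exactly why the joint-called single file is safer.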
Three ways to pass a prompt
```shell
# 1. Inline
aiva_agent --vcf data/test.vcf.gz --prompt "How many variants on chr1?"

# 2. From a file (UTF-8, trailing whitespace stripped)
aiva_agent --vcf data/test.vcf.gz --prompt-file prompts/chr1_audit.txt

# 3. From stdin (Unix pipe)
echo "How many variants on chr2?" | aiva_agent --vcf data/test.vcf.gz --prompt -
```
Writing the answer to a file
```shell
# Convenience flag: creates parent dirs, refuses to clobber unless --force
aiva_agent --model gpt-5.5 --disable vcf \
  --prompt "..." -o reports/answer.md

# Or shell redirection (warnings go to stderr)
aiva_agent --model gpt-5.5 --disable vcf \
  --prompt "..." > answer.md
```
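The -o semantics (create parent dirs, refuse to clobber unless forced) can be sketched like this; a simplified model, not the CLI's actual implementation:

```python
import os
import tempfile
from pathlib import Path

def write_answer(dest: str, text: str, force: bool = False) -> None:
    """Mimic the -o flag: create parent directories as needed and
    refuse to overwrite an existing file unless force is set
    (--force on the CLI, or a truthy AIVA_FORCE)."""
    path = Path(dest)
    if path.exists() and not force:
        raise SystemExit(f"refusing to overwrite {path}; pass --force")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")

out = os.path.join(tempfile.mkdtemp(), "reports", "answer.md")
write_answer(out, "# Answer\n")  # 'reports/' is created on the fly
```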
Conversation sessions
By default each aiva_agent invocation is stateless. To carry context across calls, set --session-id (or AIVA_SESSION_ID) to any string you like; history is stored in a single sqlite file at ~/.aiva/sessions.db — no server, no setup. If you don't set one, the CLI prints an auto-generated ID on stderr that you can copy to resume:
```shell
export AIVA_SESSION_ID=case-2026-05
aiva_agent --prompt "Patient has chr7:117559590 G>A in CFTR. Remember it."
aiva_agent --prompt "What variant did I just mention, and what gene?"
```
Each session ID is independent — pick a fresh one per case to keep histories from bleeding into each other.
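The session-store idea is small enough to sketch. This uses an in-memory SQLite database rather than ~/.aiva/sessions.db, and the schema is illustrative, not the CLI's actual one:

```python
import sqlite3

class SessionStore:
    """Append-only conversation history keyed by session ID.
    In-memory here; the CLI persists to a file such as ~/.aiva/sessions.db."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS turns (session TEXT, role TEXT, content TEXT)")

    def append(self, session: str, role: str, content: str) -> None:
        self.db.execute("INSERT INTO turns VALUES (?, ?, ?)",
                        (session, role, content))
        self.db.commit()

    def history(self, session: str) -> list:
        cur = self.db.execute(
            "SELECT role, content FROM turns WHERE session = ?", (session,))
        return cur.fetchall()

store = SessionStore()
store.append("case-2026-05", "user", "Remember chr7:117559590 G>A.")
print(store.history("case-2026-05"))  # [('user', 'Remember chr7:117559590 G>A.')]
```

Because rows are keyed by session, separate IDs naturally keep their histories isolated.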
Use from a notebook or Python script
Set env vars in one cell, call aiva_agent(...) in the next; calls within the same kernel automatically share a session, so follow-up questions remember the prior turn.
```python
# cell 1
import os
os.environ["LLM_API_KEY"] = "sk-..."
os.environ["LLM_BASE_URL"] = "https://api.openai.com/v1"
os.environ["LLM_MODEL"] = "gpt-5.5"
os.environ["AIVA_VCF"] = "data/sample.vcf.gz"

# cell 2
from aiva_agent import aiva_agent
print(aiva_agent("List 3 likely-pathogenic variants from vcf."))
print(aiva_agent("Of those, which is in a recessive disease gene?"))  # remembers
```
To start a fresh conversation mid-notebook, call reset_session(). To pin a specific ID (e.g. resume across kernel restarts), set AIVA_SESSION_ID in env or pass session_id="my-case" to aiva_agent. Per-call kwargs vcf=, disable=, model=, base_url=, api_key= override the corresponding env vars.
Examples
```shell
# All tools are on by default — pass --disable vcf for prompts that don't need
# a local VCF, or set AIVA_DISABLE / AIVA_VCF in .env to make it permanent.

# Variant annotation
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Annotate rs113488022. Report ClinVar significance and population AF."

# Literature search
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Find 3 recent papers on TP53 R175H in lung cancer."

# Clinical trials
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Find phase 2 recruiting BRAF V600E melanoma trials."

# HPO -> genes
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Rank candidate genes for HP:0001250 + HP:0001263."

# Web search + scrape (free, no API key)
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Find and scrape the latest NCCN melanoma guideline summary."

# ACMG/AMP classification
aiva_agent --disable vcf --model gpt-5.5 \
  --prompt "Classify rs113488022 (BRAF V600E) under AMP for melanoma on GRCh38."

# VCF query + literature (vcf tool turns on automatically once a path is provided)
aiva_agent --vcf data/test.vcf.gz --model gpt-5.5 \
  --prompt "Find any TP53 variants and pull supporting literature."
```
License
Apache-2.0. See LICENSE.
File details
Details for the file aiva_agent-0.2.2.tar.gz.
File metadata
- Download URL: aiva_agent-0.2.2.tar.gz
- Upload date:
- Size: 60.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98005a8a45c45b2adcaf52b2a9a59313390b0a4b92c8c7479174316b6eebe8c6 |
| MD5 | dd4935d5e3a255a786a35aacc20a5e99 |
| BLAKE2b-256 | 7d4ce2ac471de1bb72227e228fdf7cae8aae84d7a07e271c6d402cc268df3dac |
Provenance
The following attestation bundles were made for aiva_agent-0.2.2.tar.gz:
Publisher: publish.yml on MHSPL/aiva-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aiva_agent-0.2.2.tar.gz
- Subject digest: 98005a8a45c45b2adcaf52b2a9a59313390b0a4b92c8c7479174316b6eebe8c6
- Sigstore transparency entry: 1421953798
- Sigstore integration time:
- Permalink: MHSPL/aiva-agent@6841c5463b143acb43a5f8a35596b2c81b3aa813
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/MHSPL
- Access: internal
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6841c5463b143acb43a5f8a35596b2c81b3aa813
- Trigger Event: release
File details
Details for the file aiva_agent-0.2.2-py3-none-any.whl.
File metadata
- Download URL: aiva_agent-0.2.2-py3-none-any.whl
- Upload date:
- Size: 49.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21ae2397a9c0517567b1272042477a703492264a6271ccd38988de24a2155de9 |
| MD5 | 0e1119ac363c389c73ddfd819f0f563a |
| BLAKE2b-256 | d38beb361503d77e67ab2bb40d3d471e00127bf79e58e9650b9aca09bf206379 |
Provenance
The following attestation bundles were made for aiva_agent-0.2.2-py3-none-any.whl:
Publisher: publish.yml on MHSPL/aiva-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aiva_agent-0.2.2-py3-none-any.whl
- Subject digest: 21ae2397a9c0517567b1272042477a703492264a6271ccd38988de24a2155de9
- Sigstore transparency entry: 1421953868
- Sigstore integration time:
- Permalink: MHSPL/aiva-agent@6841c5463b143acb43a5f8a35596b2c81b3aa813
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/MHSPL
- Access: internal
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6841c5463b143acb43a5f8a35596b2c81b3aa813
- Trigger Event: release