aiva-agent
Clinical-genomics agent: ask natural-language questions over a local VCF and get literature-grounded answers.
A standalone clinical-genomics agent. Ask natural-language questions about a local pre-annotated VCF, gather variant annotations, search literature, find clinical trials, prioritize genes from HPO phenotypes, and run ACMG/AMP variant classification — using any OpenAI-compatible provider (OpenAI, Anthropic, xAI Grok, Together, Fireworks, OpenRouter, etc.).
export LLM_MODEL=gpt-5.5
export LLM_BASE_URL=https://api.openai.com/v1
export LLM_API_KEY=sk-...
aiva_agent --vcf data/test.vcf.gz \
--prompt "3yo female with developmental regression, hand stereotypies, and acquired microcephaly. Find candidate variants and propose the most likely diagnosis with supporting evidence."
What it does
The agent runs locally over a pre-annotated tabix-indexed VCF and orchestrates a curated set of tools to answer questions about variants, retrieve supporting literature, surface clinical trials, prioritize candidate genes from phenotype terms, and run ACMG/AMP variant classification. Tools are exposed under the following names and can be selectively turned off via --disable:
| Tool | What it does |
|---|---|
| `vcf` | Queries over your tabix-indexed `.vcf.gz` file. |
| `annotate` | Variant annotation. Supports human and plant species. |
| `literature` | PubMed / PMC search with gene / disease / variant / chemical entity annotations. |
| `trials` | Clinical-trials search by condition, intervention, gene/variant, phase, recruiting status; full-detail retrieval by NCT ID. |
| `rank_genes` | Ranks candidate genes for a list of HPO phenotype terms; negative terms can be used to exclude genes. |
| `web` | Web search and clean content extraction from any URL. |
| `classify` | ACMG/AMP 2015 (germline) and AMP/ASCO/CAP 2017 (somatic) classification; returns a JSON classification. |
| `bash` | Runs shell commands in your working directory. Lets the agent peek at any file the other tools don't cover (CSV, TSV, Excel, JSON, parquet, plain text) and use whatever tools your environment has on PATH. Scope with `--workdir` or `AIVA_WORKDIR`; defaults to the current directory. |
| `manage_todos` | Plans and tracks multi-step work within a single run. The agent creates a todo list up front, marks items in progress as it starts each one, and completes them as it goes. State is per-run (not persisted across sessions). |
Prerequisites
- Python 3.11+ with `pip` ≥ 24 (for `pip install`).
- htslib tools (`bgzip`, `tabix`) — only needed for preparing VCFs (`brew install htslib` on macOS).
- Linux: glibc ≥ 2.28 (Ubuntu ≥ 20.04, RHEL/Rocky ≥ 8, Debian ≥ 10) for the `vcf` tool. Older hosts can use the container image — see Running on HPC.
- Internet connectivity — for HPC/cloud/remote environments and jobs, ensure outbound HTTPS to your LLM provider and to the public APIs the agent queries (NCBI E-utilities, ClinicalTrials.gov, DuckDuckGo, etc.). Behind a corporate proxy, set `HTTPS_PROXY`/`HTTP_PROXY`/`NO_PROXY` (Python clients honor these automatically); for MITM proxies, also set `REQUESTS_CA_BUNDLE` to your org's root cert. Only the local `vcf` tool runs without network.
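A proxy setup might look like the following (the proxy host and CA-bundle path are placeholders — substitute your org's values):

```shell
# Placeholder values — replace with your organization's proxy and root cert.
export HTTPS_PROXY=http://proxy.example.org:8080
export HTTP_PROXY=http://proxy.example.org:8080
export NO_PROXY=localhost,127.0.0.1
# Only needed behind a TLS-intercepting (MITM) proxy:
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/corp-root.pem
```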
Quickstart
1. Install (pick one)
pip — fastest if you have Python 3.11+ on a modern Linux/macOS host:
pip install aiva-agent
Docker — works anywhere a container runs (Docker, Podman, Apptainer/Singularity). Useful when your host can't install Python or has older glibc (see Running on HPC):
docker pull mhspl/aiva-agent:latest
Python import — call from a notebook or script (see Notebook usage):
from aiva_agent import aiva_agent
2. Configure
Set three env vars for your model provider. Drop them in a .env file in your working directory and the agent loads them automatically:
LLM_MODEL=gpt-5.5
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
For one-off shells you can export the same variables instead. The agent works with any OpenAI-compatible endpoint. For the full list of env vars and CLI flags, see Full configuration.
Security note: prefer .env or export LLM_API_KEY=... over --api-key sk-... so the secret doesn't leak into shell history or ps.
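One way to keep the key out of shell history and off other users' screens is to create the `.env` file with owner-only permissions (the values below are placeholders):

```shell
umask 077            # files created below are owner-only (0600)
cat > .env <<'EOF'
LLM_MODEL=gpt-5.5
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-replace-me
EOF
chmod 600 .env       # belt and braces in case .env already existed
```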
3. Run
Prepare a tabix-indexed VCF (only once per file):
bgzip -k path/to/sample.vcf
tabix -p vcf path/to/sample.vcf.gz
Tip: prefer a pre-annotated VCF. If you've already run your VCF through a variant-effect annotator (VEP, SnpEff, ANNOVAR, …), the agent can read those annotations directly from the file. Pre-annotation is recommended for variant prioritization and classification.
Then ask a question:
aiva_agent --vcf path/to/sample.vcf.gz \
--prompt "List pathogenic variants"
Same invocation with Docker (bind-mount the VCF directory):
docker run --rm --env-file .env -v "$PWD/data:/work" \
mhspl/aiva-agent:latest \
--vcf /work/sample.vcf.gz --prompt "List pathogenic variants"
Examples
# All tools are on by default — pass --disable vcf for prompts that don't need
# a local VCF, or set AIVA_DISABLE / AIVA_VCF in .env to make it permanent.
# Variant annotation
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Annotate rs113488022. Report ClinVar significance and population AF."
# Literature search
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Find 3 recent papers on TP53 R175H in lung cancer."
# Clinical trials
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Find phase 2 recruiting BRAF V600E melanoma trials."
# HPO -> genes
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Rank candidate genes for HP:0001250 + HP:0001263."
# Web search + scrape (free, no API key)
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Find and scrape the latest NCCN melanoma guideline summary."
# ACMG/AMP classification
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Classify rs113488022 (BRAF V600E) under AMP for melanoma on GRCh38."
# VCF query + literature (vcf tool turns on automatically once a path is provided)
aiva_agent --vcf data/test.vcf.gz --model gpt-5.5 \
--prompt "Find any TP53 variants and pull supporting literature."
Reference
Full configuration
You can drive everything via flags or env vars. Precedence: CLI flag > shell export > .env value.
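That precedence chain amounts to a first-explicitly-set-value lookup, sketched here for illustration (not the agent's actual implementation):

```python
def resolve(cli_flag=None, shell_env=None, dotenv_value=None, default=None):
    """Return the first value that was explicitly set: CLI flag > export > .env."""
    for value in (cli_flag, shell_env, dotenv_value):
        if value is not None:
            return value
    return default

# A CLI flag beats an exported variable, which beats the .env file.
print(resolve(cli_flag="gpt-5.5", shell_env="grok-2"))      # gpt-5.5
print(resolve(shell_env="grok-2", dotenv_value="gpt-5.5"))  # grok-2
```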
| Variable | CLI flag | Purpose | Default |
|---|---|---|---|
| `LLM_MODEL` | `--model` | Model ID for the provider (e.g. `gpt-5.5`, `claude-opus-4-7`). | — |
| `LLM_BASE_URL` | `--base-url` | Provider's OpenAI-compatible base URL. | — |
| `LLM_API_KEY` | `--api-key` | Provider API key. | — |
| `AIVA_VCF` | `--vcf` | Default VCF path or `alias=path,...` spec. | unset (`vcf` tool auto-disables) |
| `AIVA_WORKDIR` | `--workdir` | Working directory for the `bash` tool. | current directory at invocation |
| `AIVA_DISABLE` | `--disable` | Comma-separated tools to disable. | unset (all tools on) |
| `AIVA_MAX_TURNS` | — | Max agent turns per run. | 25 |
| `AIVA_FORCE` | `--force` | Overwrite the `-o` destination without passing `--force`. Accepts 1/true/yes/on. | unset |
| `AIVA_SESSION_ID` | `--session-id` | Conversation session ID; persists history across runs in `~/.aiva/sessions.db`. | new UUID per run |
| `AIVA_STREAM` | `--stream` | Force streaming on (1/true/yes/on) or off (0/false/no/off). | auto (on if stdout is a TTY and `--output` is unset) |
| `AIVA_STREAM_TOOL_OUTPUT` | — | When streaming, also dump each tool's output to stderr (debug aid). Accepts 1/true/yes/on. | unset (off) |
| `AIVA_STREAM_TOOL_OUTPUT_MAX` | — | Cap on chars per tool output when the dump above is on. 0 means unlimited. | 2000 |
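Several of the variables above share the same truthy/falsy spellings (1/true/yes/on vs. 0/false/no/off). A parser for that convention might look like this (a sketch, not the package's own code):

```python
def env_flag(value, default=False):
    """Interpret 1/true/yes/on as True and 0/false/no/off as False."""
    if value is None:
        return default
    v = value.strip().lower()
    if v in {"1", "true", "yes", "on"}:
        return True
    if v in {"0", "false", "no", "off"}:
        return False
    return default

print(env_flag("YES"))                # True (case-insensitive)
print(env_flag("off"))                # False
print(env_flag(None, default=True))   # True (unset falls back to the default)
```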
Copy-paste template for your .env file:
# aiva-agent uses any OpenAI-compatible LLM provider.
# Model ID exactly as the provider expects.
# Examples: gpt-5.5, claude-opus-4-7, grok-2, meta-llama/Llama-3.1-70B-Instruct
LLM_MODEL=gpt-5.5
# Provider base URL — examples:
# OpenAI: https://api.openai.com/v1
# Anthropic: https://api.anthropic.com/v1
# xAI Grok: https://api.x.ai/v1
# OpenRouter: https://openrouter.ai/api/v1
# Together: https://api.together.xyz/v1
# Fireworks: https://api.fireworks.ai/inference/v1
# DeepSeek: https://api.deepseek.com/v1
# Groq: https://api.groq.com/openai/v1
LLM_BASE_URL=https://api.openai.com/v1
# API key for the chosen provider. Keep the real .env OUT of source control.
LLM_API_KEY=your-provider-api-key
# Optional: default VCF spec. Two forms:
# single: AIVA_VCF=data/sample.vcf.gz
# multi: AIVA_VCF=proband=p.vcf.gz,father=f.vcf.gz,mother=m.vcf.gz
# For trios/cohorts, prefer a joint-called multisample VCF (single path) when you have one.
# --vcf overrides this per-run; pass --disable vcf to skip the tool entirely.
# AIVA_VCF=data/sample.vcf.gz
# Optional: comma-separated tools to disable by default.
# Choices: vcf, annotate, literature, trials, rank_genes, web, classify, bash, manage_todos
# --disable on the CLI replaces (does not merge with) this value.
# AIVA_DISABLE=vcf            # single-tool disable
# AIVA_DISABLE=web,annotate   # multi-tool disable
# Optional: max agent turns per run. Default 25.
# AIVA_MAX_TURNS=40
# Optional: always overwrite --output destination without --force. Useful in
# CI / scripted pipelines where re-runs are expected. Accepts 1/true/yes/on.
# AIVA_FORCE=1
# Optional: conversation session ID. Reuse the same value across `aiva` runs to
# continue a conversation; history is stored in ~/.aiva/sessions.db (local sqlite
# database, no server). If unset, each run gets a fresh UUID and is effectively stateless.
# AIVA_SESSION_ID=my-case-2026-05
# Optional: stream the agent's response live as it runs tools and replies,
# instead of waiting for the full answer. Also toggled by the --stream CLI flag.
# Defaults on in a terminal, off when piping or using --output.
# Accepts 1/true/yes/on or 0/false/no/off.
# AIVA_STREAM=1
# Optional: when streaming, also dump each tool's output after the
# `[tool] done` marker. Off by default since outputs can be very large
# (multi-MB JSON). Accepts 1/true/yes/on. Useful for debugging tool behavior.
# AIVA_STREAM_TOOL_OUTPUT=1
# Optional: cap on chars per tool output when AIVA_STREAM_TOOL_OUTPUT is on.
# Default 2000. Set to 0 for unlimited (warning: may flood your terminal).
# AIVA_STREAM_TOOL_OUTPUT_MAX=2000
CLI-only flags (per-invocation, no env equivalent):
| Flag | Purpose |
|---|---|
| `--prompt TEXT` / `--prompt-file PATH` | The question to ask. One of these is required. `--prompt -` reads from stdin. |
| `-o OUTPUT` | Write the agent's final answer to a file instead of stdout. |
CLI flags in detail
--disable
All tools are ON by default. Pass --disable a,b to drop tools from the agent's palette.
aiva_agent --disable vcf --model gpt-5.5 \
--prompt "Annotate rs113488022 and find recent papers."
# Multi-tool disable: comma-separated, no spaces
aiva_agent --disable rank_genes,web,trials --vcf data/test.vcf.gz --model gpt-5.5 \
--prompt "Classify the most pathogenic chr1 variant."
--disable falls back to the AIVA_DISABLE env var (e.g. AIVA_DISABLE=vcf in .env). The flag, when given, replaces (rather than merges with) the env value.
--vcf path resolution
The vcf tool needs a path. Provide it via --vcf PATH or the AIVA_VCF env var; the flag wins if both are set. If neither is set, the vcf tool auto-disables itself with a one-line stderr warning and the rest of the palette runs as normal. If a path is provided but the file doesn't exist, the CLI exits with an error. Pass --disable vcf to skip the resolution dance entirely.
Multiple VCFs (trio / tumor-normal): --vcf accepts a comma-separated alias=path list. Each entry becomes a separately named handle the agent can query and cross-reference.
aiva_agent --vcf "proband=trio/proband.vcf.gz,father=trio/father.vcf.gz,mother=trio/mother.vcf.gz" \
--prompt "Count de novo het variants in the proband (parents both 0/0)."
AIVA_VCF accepts the same syntax (AIVA_VCF=proband=p.vcf.gz,father=f.vcf.gz in .env).
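For illustration, the `alias=path` spec could be parsed like this (a hypothetical helper, not the agent's internal code — a bare path gets a default alias):

```python
def parse_vcf_spec(spec):
    """Parse 'data/s.vcf.gz' or 'proband=p.vcf.gz,father=f.vcf.gz' into {alias: path}."""
    handles = {}
    for i, entry in enumerate(s.strip() for s in spec.split(",")):
        if "=" in entry:
            alias, path = entry.split("=", 1)
        else:
            # Bare path: fall back to a default alias (assumption for this sketch).
            alias, path = ("vcf" if i == 0 else f"vcf{i}"), entry
        handles[alias.strip()] = path.strip()
    return handles

print(parse_vcf_spec("proband=p.vcf.gz,father=f.vcf.gz"))
# {'proband': 'p.vcf.gz', 'father': 'f.vcf.gz'}
```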
Prefer a multisample VCF when you have one. If you've run joint calling and produced a single multisample file, pass it as one VCF — joint-called files preserve missing-genotype information consistently across samples, which is what trio analyses depend on. The multi-VCF path above works fine, but joint calling is genomically more correct when you can do it.
--workdir (bash tool)
The bash tool runs shell commands locally so the agent can work with files the structured tools don't cover. Scope it with --workdir:
aiva_agent --workdir ./data \
--prompt "How many rows are in panel.csv, and which genes appear most often?"
Resolution: --workdir flag → AIVA_WORKDIR env → the directory you ran aiva_agent from. The bash tool inherits your current shell environment, so the agent uses whatever python, duckdb, awk, etc. binaries are on your PATH. The agent will not install packages without asking first. Pass --disable bash to drop it from the palette.
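That resolution chain can be sketched as follows (a hypothetical helper mirroring the documented `--workdir` → `AIVA_WORKDIR` → cwd order, not the actual source):

```python
import os
from pathlib import Path

def resolve_workdir(flag=None, env=None):
    """--workdir flag, else AIVA_WORKDIR, else the invocation directory."""
    env = os.environ if env is None else env
    raw = flag or env.get("AIVA_WORKDIR") or os.getcwd()
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"workdir does not exist: {path}")
    return path

print(resolve_workdir(env={}))  # falls back to the current directory
```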
Prompt sources
# 1. Inline
aiva_agent --vcf data/test.vcf.gz --prompt "List pathogenic variants"
# 2. From a file (UTF-8, trailing whitespace stripped)
aiva_agent --vcf data/test.vcf.gz --prompt-file prompts/chr1_audit.txt
# 3. From stdin (Unix pipe)
echo "List pathogenic variants" | aiva_agent --vcf data/test.vcf.gz --prompt -
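Prompt files are plain UTF-8 text, so they are easy to generate from scripts; the filename below is illustrative:

```shell
mkdir -p prompts
cat > prompts/chr1_audit.txt <<'EOF'
List pathogenic chr1 variants.
For each, cite the ClinVar accession and one supporting paper.
EOF
```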
Writing to a file
# Convenience flag — creates parent dirs, refuses to clobber unless --force
aiva_agent --model gpt-5.5 --disable vcf \
--prompt "..." -o reports/answer.md
# Or shell redirection (warnings go to stderr)
aiva_agent --model gpt-5.5 --disable vcf \
--prompt "..." > answer.md
Streaming output
Long agent runs (multi-tool prompts, classification) can take a while. By default, when stdout is an interactive terminal, the CLI streams as it goes: tool-call markers like [tool] vcf_query… / [tool] done go to stderr the moment each call fires, and the final answer streams to stdout token-by-token. When stdout is piped or --output is set, streaming is automatically off so consumers receive a single clean string.
# Auto-on in a terminal:
aiva_agent --vcf data/test.vcf.gz --prompt "List pathogenic variants"
# Force on (e.g. when piping but you still want progress on stderr):
AIVA_STREAM=1 aiva_agent --prompt "..." | tee out.txt
# Force off in a terminal:
AIVA_STREAM=0 aiva_agent --prompt "..."
# Or use the explicit flag:
aiva_agent --stream --prompt "..."
For debugging, set AIVA_STREAM_TOOL_OUTPUT=1 to also print each tool's output (truncated to AIVA_STREAM_TOOL_OUTPUT_MAX chars, default 2000) to stderr after the [tool] done marker.
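The documented auto behavior (on for a TTY, off when piping or using `--output`, overridable by `AIVA_STREAM`) amounts to roughly the following (a sketch, not the actual implementation):

```python
import os
import sys

def streaming_enabled(output_path=None, env=None):
    """AIVA_STREAM wins if set; otherwise stream only to an interactive terminal."""
    env = os.environ if env is None else env
    override = env.get("AIVA_STREAM")
    if override is not None:
        return override.strip().lower() in {"1", "true", "yes", "on"}
    return output_path is None and sys.stdout.isatty()

print(streaming_enabled(env={"AIVA_STREAM": "1"}))  # True even when piped
```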
Conversation sessions
By default each aiva_agent invocation is stateless. To carry context across calls, set --session-id (or AIVA_SESSION_ID) to any string you like; history is stored locally at ~/.aiva/sessions.db — no server, no setup. If you don't set one, the CLI prints an auto-generated ID on stderr that you can copy to resume:
export AIVA_SESSION_ID=case-2026-05
aiva_agent --prompt "Patient has chr7:117559590 G>A in CFTR. Remember it."
aiva_agent --prompt "What variant did I just mention, and what gene?"
Each session ID is independent — pick a fresh one per case to keep histories from bleeding into each other.
Notebook usage
Set env vars in one cell, call aiva_agent(...) in the next; calls within the same kernel automatically share a session, so follow-up questions remember the prior turn.
# cell 1
import os
os.environ["LLM_API_KEY"] = "sk-..."
os.environ["LLM_BASE_URL"] = "https://api.openai.com/v1"
os.environ["LLM_MODEL"] = "gpt-5.5"
os.environ["AIVA_VCF"] = "data/sample.vcf.gz"
# cell 2
from aiva_agent import aiva_agent
print(aiva_agent("List 3 likely-pathogenic variants from vcf."))
print(aiva_agent("Of those, which is in a recessive disease gene?")) # remembers
To start a fresh conversation mid-notebook, call reset_session(). To pin a specific ID (e.g. resume across kernel restarts), set AIVA_SESSION_ID in env or pass session_id="my-case" to aiva_agent. Per-call kwargs vcf=, disable=, model=, base_url=, api_key= override the corresponding env vars.
Running on HPC / older glibc
If your host has glibc < 2.28 (typical on CentOS/RHEL 7-era HPC nodes), the bundled DuckDB extension can't load and the vcf tool will refuse to start with a hint to use the container image instead. The image (mhspl/aiva-agent:latest) ships the extension pre-baked. Apptainer/Singularity can pull it directly:
apptainer pull aiva-agent.sif docker://mhspl/aiva-agent:latest
apptainer exec --bind "$PWD:/work" aiva-agent.sif \
aiva_agent --vcf /work/sample.vcf.gz --prompt "..."
Submit the job on a node that satisfies the outbound-HTTPS prerequisite above — the container doesn't change network requirements.
License
Apache-2.0. See LICENSE.
File details
Details for the file aiva_agent-0.2.8.tar.gz.
File metadata
- Download URL: aiva_agent-0.2.8.tar.gz
- Upload date:
- Size: 80.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `34c863e8793dc23eb949ef591c0b7d5913f80be5437746d6d37426a568ee98ef` |
| MD5 | `bdc0226cd7d73fe165c5e3808e046d42` |
| BLAKE2b-256 | `68d6ef56cf6ba5b4f5f7535c1ddbe094aba340ccbe101f33e7fcad66c7d42fb4` |
Provenance
The following attestation bundles were made for aiva_agent-0.2.8.tar.gz:
Publisher: publish.yml on MHSPL/aiva-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aiva_agent-0.2.8.tar.gz
- Subject digest: 34c863e8793dc23eb949ef591c0b7d5913f80be5437746d6d37426a568ee98ef
- Sigstore transparency entry: 1435682424
- Sigstore integration time:
- Permalink: MHSPL/aiva-agent@6f0b1109a00be8409df8ff3898790840778f4900
- Branch / Tag: refs/tags/v0.2.8
- Owner: https://github.com/MHSPL
- Access: internal
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f0b1109a00be8409df8ff3898790840778f4900
- Trigger Event: release
File details
Details for the file aiva_agent-0.2.8-py3-none-any.whl.
File metadata
- Download URL: aiva_agent-0.2.8-py3-none-any.whl
- Upload date:
- Size: 63.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `5ff1c2628122d93bb08880841d2bd5478aebab88f8cbf350d32d45861fccb42a` |
| MD5 | `8e87824bcf7ce6a9587b9181a2cf9f98` |
| BLAKE2b-256 | `057ea995b59e3f09982bfb67a0db6145e5b4f2b735a42d8e2c47e464e40a2a4e` |
Provenance
The following attestation bundles were made for aiva_agent-0.2.8-py3-none-any.whl:
Publisher: publish.yml on MHSPL/aiva-agent
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aiva_agent-0.2.8-py3-none-any.whl
- Subject digest: 5ff1c2628122d93bb08880841d2bd5478aebab88f8cbf350d32d45861fccb42a
- Sigstore transparency entry: 1435682441
- Sigstore integration time:
- Permalink: MHSPL/aiva-agent@6f0b1109a00be8409df8ff3898790840778f4900
- Branch / Tag: refs/tags/v0.2.8
- Owner: https://github.com/MHSPL
- Access: internal
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f0b1109a00be8409df8ff3898790840778f4900
- Trigger Event: release