LITS: LLM Inference via Tree Search — modular agentic reasoning framework.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

LiTS — Language Inference via Tree Search

A modular Python framework for LLM reasoning and planning with tree search (e.g., MCTS, BFS and other custom search algorithms) and chain reasoning (e.g.,ReAct).

Why LiTS?
Installation
Quick Start — 5-Minute Demo
CLI Commands
🎬 2.5-Minute Demo Video
More CLI Examples
Quick Start — Python API
Architecture
- Extending with Custom Components
Task Types
Project Structure
Documentation
Citation
License

Why LiTS?

Concern	Challenge	LiTS Solution
Reusability	Reimplementing search algorithms for each new task	Task-agnostic data structures (`Action → Step → State → Node`) that hide search procedures from task-specific logic
Extensibility	Adding new tasks requires modifying many files	Modular components (`Policy`, `Transition`, `RewardModel`) + decorator-based registry — add a task by registering prompts and a transition
Observability	Tree search is expensive and hard to debug	Built-in `InferenceLogger` tracks token usage at component, instance, and search-phase levels; incremental checkpointing for fault tolerance

Installation

pip install lits-llm          # from PyPI
# or
pip install -e .              # editable install from source

Requires Python >= 3.11.

Quick Start — 5-Minute Demo

Three steps: install, configure an LLM, run a search and see the artifacts.

1. Install

pip install lits-llm

2. Configure an LLM provider

LiTS needs an LLM for policy (action generation) and reward (scoring). Pick one provider:

OpenAI — set OPENAI_API_KEY:

export OPENAI_API_KEY="sk-..."
MODEL="openai/gpt-4o-mini"

AWS Bedrock — configure AWS credentials (SSO or env vars):

aws sso login --profile default   # or export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
MODEL="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"

Groq (free tier available) — set GROQ_API_KEY:

export GROQ_API_KEY="gsk_..."
MODEL="groq/llama-3.1-8b-instant"

Local HuggingFace — no API key needed (for testing only; prefer GPU or Apple Silicon for larger LLMs):

MODEL="Qwen/Qwen2.5-0.5B-Instruct"  # auto-downloads from HuggingFace
# pass device via --search-arg: device=cuda (default), device=mps (Apple Silicon), or device=cpu

Any OpenAI-compatible API (vLLM, Together AI, Fireworks, etc.):

export OPENAI_API_KEY="your-key"           # use "EMPTY" for local vLLM
export OPENAI_API_BASE="http://localhost:8000/v1"  # vLLM / provider's base URL
MODEL="openai/meta-llama/Llama-3-8B-Instruct"

3. Run MCTS on a math problem

Save this as my_benchmark.py (the .py extension matters) in your working directory:

from lits.registry import register_dataset

@register_dataset("my_math", task_type="language_grounded")
def load_my_math(**kwargs):
    return [
        {
            "question": (
                "The proper divisors of 12 are 1, 2, 3, 4 and 6. "
                "A proper divisor of an integer $N$ is a positive divisor of $N$ "
                "that is less than $N$. What is the sum of the proper divisors "
                "of the sum of the proper divisors of 284?"
            ),
            "answer": "284",
        }
    ]

Then run MCTS:

lits-search --include my_benchmark \
    --dataset my_math \
    --policy-model "$MODEL" \
    --search-arg roll_out_steps=2 n_iters=50 force_terminating_on_depth_limit=false n_actions=3 max_steps=10 \
    -o demo_results --override

What you should see

demo_results/
├── checkpoints/           # Intermediate tree states per iteration
├── terminal_nodes/        # All terminal nodes found
├── config.json            # Full config (reproducible)
├── execution.log          # Execution log
└── inferencelogger.log    # Per-call token usage with component/phase tags

(Optional) 4. Evaluate

lits-eval --result_dir demo_results

The evaluation report will be saved to demo_results/eval.log

Validate config without LLM calls (no API key needed)

lits-search --include my_benchmark \
    --dataset my_math --dry-run

This prints the resolved components, dataset info, and first example — useful for checking your setup before a real run.

CLI Commands

lits-search       # Run tree search (MCTS, BFS)
lits-eval         # Evaluate tree search results
lits-chain        # Run chain agents (ReAct, EnvChain)
lits-eval-chain   # Evaluate chain results

🎬 2.5-Minute Demo Video

See the full walkthrough — MCTS on math, Crosswords (environment-grounded), and evaluation:

👉 https://youtu.be/nRGX43YrR3I

The commands demonstrated in the video are listed below for direct copy-paste.

More CLI Examples

All commands below assume cd demos and a configured $MODEL (see Quick Start above).

MCTS on MATH500

lits-search --include lits_benchmark.math_qa \
    --dataset math500 \
    --policy concat --transition concat --reward generative \
    --search-arg n_iters=50 n_actions=3 max_steps=10 \
    --var limit=5

Swap to RAP (different components, same algorithm)

lits-search --include lits_benchmark.math_qa lits_benchmark.formulations.rap \
    --dataset math500 \
    --policy rap --transition rap --reward rap \
    --search-arg n_iters=10 n_confidence=3

Swap to BFS (different algorithm, same components)

lits-search --include lits_benchmark.math_qa \
    --dataset math500 \
    --cfg search_algorithm=bfs \
    --policy concat --transition concat --reward generative \
    --search-arg roll_out_steps=2 n_actions=3 max_steps=10

Environment-grounded task (BlocksWorld)

lits-search --include lits_benchmark.blocksworld \
    --dataset blocksworld \
    --transition blocksworld \
    --search-arg max_steps=6 n_iters=50

Tool-use task (MapEval-SQL)

lits-search --include lits_benchmark.mapeval \
    --dataset mapeval-sql

No component flags needed — the framework auto-selects tool-use components.

Chain-in-Tree: collapse redundant branches with a BN evaluator

When sampled candidate actions agree, tree search wastes LLM calls exploring identical children. The Branching Necessity (BN) evaluator gates a continuation phase: it chains forward greedily while the policy agrees, and only branches when genuine diversity appears. Selectable via --search-arg bn_method:

# Exact string self-consistency — no extra LLM calls (best for tool-use / env-grounded)
lits-search --include lits_benchmark.math_qa --dataset math500 \
    --search-arg add_continuation=true bn_method=sc_exact reward_gamma=0.5

# LLM-based variants: sc (semantic self-consistency), entropy (clustering), direct (1–4 score)
lits-search --include lits_benchmark.math_qa --dataset math500 \
    --search-arg add_continuation=true bn_method=entropy n_actions_for_bne=3 reward_gamma=0.5

See the BN Evaluator guide for the four methods, task-type compatibility, and the --bn-model flag.

Cross-trajectory memory: learn from prior attempts

LiTS agents can carry knowledge across trajectories of the same task — extracting atomic facts (environmental knowledge: schemas, API responses) or strategy-level reflections from completed trajectories and injecting them into the policy prompt on later attempts. Enabled with --memory-arg:

# Cross-trajectory fact memory (pass@5 ReAct) — facts extracted by Sonnet, shared across attempts
lits-chain --include lits_benchmark.dbbench --dataset dbbench \
    --cfg n_attempts=5 --cfg temperature=0.9 \
    --memory-arg backend=local augmentors=fact skip_similarity_filtering=true batch=true \
    --memory-arg model=bedrock/us.anthropic.claude-sonnet-4-6

# Reflection memory (LATS-style) — reflect on failed trajectories, inject lessons
lits-chain --include lits_benchmark.dbbench --dataset dbbench \
    --cfg n_attempts=5 --cfg temperature=0.9 \
    --memory-arg augmentors=reflection model=bedrock/us.anthropic.claude-sonnet-4-6

Sibling-aware expansion: cross-branch diversity in one tree

Within a single MCTS/BFS tree, sibling-aware expansion injects each sibling's prior actions into the next sibling's prompt, biasing candidates away from redundant (or give-up) actions — a cross-branch memory that costs no extra LLM calls:

lits-search --include lits_benchmark.dbbench --dataset dbbench --dataset-arg database=wikisql \
    --cfg search_algorithm=mcts_sibling_aware \
    --search-arg n_actions=3 n_iters=5

These features are studied in When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents? (see Context Augmentor, Fact Memory, and Reflection).

Quick Start — Python API

Tree search algorithms are class-based, inheriting from BaseTreeSearch:

from lits.agents.tree.mcts import MCTSSearch, MCTSConfig
from lits.lm import get_lm
from lits.components.policy.concat import ConcatPolicy
from lits.components.transition.concat import ConcatTransition
from lits.components.reward.generative import GenerativePRM
from lits.agents.tree.common import extract_answers_from_terminal_nodes
from lits.components.utils import get_fn_retrieve_answer

MODEL_NAME = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"

# Load model
model = get_lm(MODEL_NAME)

# Configure search
config = MCTSConfig(
    max_steps=3,
    n_actions=2,
    n_iters=3,
)

# Create components
policy = ConcatPolicy(base_model=model, n_actions=config.n_actions)
transition = ConcatTransition(base_model=model)
reward = GenerativePRM(base_model=model)

# Create search instance with components
search = MCTSSearch(
    config=config,
    policy=policy,           # generates candidate actions
    world_model=transition,  # executes actions, produces new states
    reward_model=reward,     # evaluates action quality
)

# Run search
result = search.run(query="What is 25 * 17?", query_idx=0)

# Extract answers from terminal nodes
retrieve_answer_fn = get_fn_retrieve_answer(model)
vote_answers, answer_rewards, best_node, trace = extract_answers_from_terminal_nodes(
    terminal_nodes_collected=result.terminal_nodes_collected,
    retrieve_answer=retrieve_answer_fn,
    query="What is 25 * 17?"
)

print(f"Vote answers: {vote_answers}")
print(f"Answer rewards: {answer_rewards}")

Logging

To see search progress logs, add this before running:

import logging
logging.basicConfig(level=logging.INFO, format='%(name)s - %(message)s')

Checkpoints

To save incremental checkpoints per iteration, pass checkpoint_dir:

search = MCTSSearch(
    config=config,
    policy=policy,
    world_model=transition,
    reward_model=reward,
    checkpoint_dir="./my_checkpoints",  # saves tree state per iteration as JSON
)

Inference Cost Tracking

To track token usage per component and search phase, attach an InferenceLogger to the model before running:

from lits.lm import InferenceLogger

inference_logger = InferenceLogger(root_dir="./my_results", override=True)
model.inference_logger = inference_logger

# ... run search ...

# After search, inspect usage breakdowns
print(inference_logger.get_metrics_by_component())  # policy, prm, dynamics
print(inference_logger.get_metrics_by_phase())       # expand, simulate, continuation
print(inference_logger.get_metrics_by_instance())    # per example

ReAct Agent (tool use)

from lits.agents import create_tool_use_agent

agent = create_tool_use_agent(tools=tool_list, max_iter=50)
state = agent.run(query="Find restaurants near Sydney Opera House")

Supported LLM Providers

from lits.lm import get_lm

model = get_lm("bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0")  # AWS Bedrock
model = get_lm("openai/gpt-4")                                         # OpenAI
model = get_lm("Qwen/Qwen2.5-0.5B-Instruct", device="cuda")           # HuggingFace
model = get_lm("groq/llama-3.1-8b-instant")                            # Groq
model = get_lm("tgi:///meta-llama/Meta-Llama-3-8B")                    # TGI

Architecture

Three core component abstractions compose into agents:

Policy          →  generates candidate actions from states
Transition      →  executes actions, produces new states
RewardModel     →  evaluates action quality (optional)

Search frameworks bundle these with an algorithm:

Framework	Algorithm	Components
ReST-MCTS*	MCTS	ConcatPolicy + ConcatTransition + GenerativePRM
RAP	MCTS	RAPPolicy + RAPTransition + RapPRM
ToT-BFS	BFS	ConcatPolicy + ConcatTransition + GenerativePRM

Extending with Custom Components

from lits.components.registry import register_transition, register_dataset

@register_transition("my_domain")
class MyTransition(Transition):
    def step(self, example, state, action, **kwargs):
        ...
    def is_terminal(self, state, example, **kwargs):
        ...

@register_dataset("my_dataset", task_type="env_grounded")
def load_my_dataset(**kwargs):
    ...

Then use via CLI:

lits-search --include my_package \
    --dataset my_dataset --transition my_domain

Task Types

Task Type	State Space	Examples
`language_grounded`	Text context	Math reasoning (GSM8K, MATH500)
`env_grounded`	Symbolic/physical state	BlocksWorld, Crosswords
`tool_use`	Context + tool state	SQL queries, web search, APIs

Project Structure

lits/                    # Core framework
├── agents/             # MCTS, BFS, ReAct, EnvChain
├── components/         # Policy, Transition, RewardModel
├── lm/                 # Multi-provider LLM interface
├── structures/         # State, Action, Step, Node
├── cli/                # CLI entry points
├── eval/               # Evaluation utilities
└── tools/              # Tool implementations

Documentation

Architecture Overview
Tree Search Guide
CLI Reference
CLI–Registry Protocol — dataset, resource, and evaluator contracts (including stateful tools)
ReAct Agent
Component API
BN Evaluator (Chain-in-Tree continuation)
Context Augmentor — cross-trajectory memory framework
- Fact Memory · Reflection
Learning Loop
Custom Evaluators
Tree Visualization

Citation

@misc{li2026litsmodularframeworkllm,
      title={LiTS: A Modular Framework for LLM Tree Search}, 
      author={Xinzhe Li and Yaguang Tao},
      year={2026},
      eprint={2603.00631},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2603.00631}, 
}

<!-- Chain-in-Tree -->
@inproceedings{li2026chainintree,
  title={Chain-in-Tree: Back to Sequential Reasoning in {LLM} Tree Search},
  author={Li, Xinzhe},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2026},
  year={2026},
  url={https://openreview.net/forum?id=l4YrnqAogl}
}

<!-- Cross-Trajectory Memory -->
@article{li2026does,
  title={When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?},
  author={Li, Xinzhe and Tao, Yaguang},
  journal={arXiv preprint arXiv:2605.28224},
  year={2026}
}

License

Apache License 2.0

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

xinzheli212

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.4.0

Jun 4, 2026

0.3.2

Feb 28, 2026

0.3.1

Feb 18, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lits_llm-0.4.0.tar.gz (322.1 kB view details)

Uploaded Jun 4, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lits_llm-0.4.0-py3-none-any.whl (376.5 kB view details)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file lits_llm-0.4.0.tar.gz.

File metadata

Download URL: lits_llm-0.4.0.tar.gz
Upload date: Jun 4, 2026
Size: 322.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lits_llm-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`c9dc05dac7c81b69d1e94e7cb5ca2a3b903911b37c983d75ebeb04d1c13aa4a6`
MD5	`0f4663ad9103fad24aab93372e2efc32`
BLAKE2b-256	`53ae253ed4001ce50cdff2bf7e9e66cf7c18cac8e667ae4dcdb0947cdc68dea1`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lits_llm-0.4.0.tar.gz:

Publisher: publish.yml on xinzhel/lits-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lits_llm-0.4.0.tar.gz
- Subject digest: c9dc05dac7c81b69d1e94e7cb5ca2a3b903911b37c983d75ebeb04d1c13aa4a6
- Sigstore transparency entry: 1716518226
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: xinzhel/lits-llm@8b76b9ace95dd9ba684278a4a76cb9931541cba1
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/xinzhel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8b76b9ace95dd9ba684278a4a76cb9931541cba1
- Trigger Event: push

File details

Details for the file lits_llm-0.4.0-py3-none-any.whl.

File metadata

Download URL: lits_llm-0.4.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 376.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lits_llm-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fad419d823c858ac2039a352bad5212e7a98f7a76da92e91c2c2c768171bfc90`
MD5	`11411b263e870a386efdcc6f67044f28`
BLAKE2b-256	`e729c782a08297c8a378be662701125394e10a0358dabc9d63515ccb6f735b4b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for lits_llm-0.4.0-py3-none-any.whl:

Publisher: publish.yml on xinzhel/lits-llm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lits_llm-0.4.0-py3-none-any.whl
- Subject digest: fad419d823c858ac2039a352bad5212e7a98f7a76da92e91c2c2c768171bfc90
- Sigstore transparency entry: 1716518304
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: xinzhel/lits-llm@8b76b9ace95dd9ba684278a4a76cb9931541cba1
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/xinzhel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8b76b9ace95dd9ba684278a4a76cb9931541cba1
- Trigger Event: push

lits-llm 0.4.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

LiTS — Language Inference via Tree Search

Table of Contents

Why LiTS?

Installation

Quick Start — 5-Minute Demo

1. Install

2. Configure an LLM provider

3. Run MCTS on a math problem

(Optional) 4. Evaluate

Validate config without LLM calls (no API key needed)

CLI Commands

🎬 2.5-Minute Demo Video

More CLI Examples

MCTS on MATH500

Swap to RAP (different components, same algorithm)

Swap to BFS (different algorithm, same components)

Environment-grounded task (BlocksWorld)

Tool-use task (MapEval-SQL)

Chain-in-Tree: collapse redundant branches with a BN evaluator

Cross-trajectory memory: learn from prior attempts

Sibling-aware expansion: cross-branch diversity in one tree

Quick Start — Python API

Logging

Checkpoints

Inference Cost Tracking

ReAct Agent (tool use)

Supported LLM Providers

Architecture

Extending with Custom Components

Task Types

Project Structure

Documentation

Citation

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance