Vision-language models as state estimators for PDDL-based planning
S3E: Semantic Symbolic State Estimation
Overview
s3e is a Python package for estimating grounded PDDL state predicates from images using vision-language models (VLMs).
It is designed for workflows that need to connect visual observations to symbolic planning. Given a PDDL domain and problem, s3e enumerates grounded predicates, translates them into model-friendly queries, and returns either boolean state assignments, per-predicate probabilities, or normalized model outputs suitable for inspection and debugging.
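To make the grounding step concrete, here is a minimal, standard-library sketch of enumerating grounded predicates over typed objects. It is not s3e's implementation: the `ground_predicates` helper and the dictionary encoding of the domain are assumptions made for illustration, and whether s3e filters out groundings such as `on(a,a)` is not stated here.

```python
from itertools import product


def ground_predicates(predicates, objects_by_type):
    """Enumerate every grounding of each predicate over typed objects.

    predicates: dict mapping predicate name -> tuple of parameter types.
    objects_by_type: dict mapping type name -> list of object names.
    (Both encodings are hypothetical, chosen only for this sketch.)
    """
    grounded = []
    for name, param_types in predicates.items():
        # One candidate list per parameter, in declaration order.
        candidates = [objects_by_type[t] for t in param_types]
        for args in product(*candidates):
            grounded.append(f"{name}({','.join(args)})")
    return grounded


# Blocksworld signature with two blocks, encoded by hand for the sketch.
predicates = {"on": ("block", "block"), "clear": ("block",)}
objects_by_type = {"block": ["a", "b"]}
grounded = ground_predicates(predicates, objects_by_type)
print(grounded)
```

Each resulting string is one yes/no question that a VLM would need to answer about the image.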
The package integrates naturally with Unified Planning / PDDL-based systems and supports both HuggingFace and OpenAI-backed VLMs, as well as custom backends.
For a longer tutorial, see the tutorial notebook.
Features
- Estimate boolean symbolic states or probabilistic predicate values from one or more images.
- Parse PDDL domains and problems from strings or .pddl files.
- Automatically ground predicates over the current problem objects.
- Translate predicates with pluggable strategies: IdentityTranslator, TemplateTranslator, PrewrittenTranslator, and LLMTranslator.
- Use HuggingFace VLMs, OpenAI VLMs, or custom implementations via the VLMBackend interface.
- Support multi-image estimation with either single-pass or per-image averaging.
- Expose normalized VLMOutput objects for prompt tuning and backend inspection.
- Convert estimated states back into Unified Planning-compatible state objects.
- Cache LLM-generated predicate translations for reuse across runs.
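As an illustration of what per-image averaging could look like, the sketch below averages per-predicate probabilities obtained independently from each image. The `average_probabilities` helper is hypothetical and only shows the idea; how s3e combines images internally may differ.

```python
def average_probabilities(per_image_probs):
    """Average per-predicate probabilities across images.

    per_image_probs: list of dicts, one per image, all with the same keys.
    """
    if not per_image_probs:
        raise ValueError("need at least one image")
    n = len(per_image_probs)
    keys = per_image_probs[0].keys()
    return {k: sum(p[k] for p in per_image_probs) / n for k in keys}


# Two hypothetical views of the same scene.
view_1 = {"on(a,b)": 0.9, "clear(a)": 0.4}
view_2 = {"on(a,b)": 0.7, "clear(a)": 0.8}
averaged = average_probabilities([view_1, view_2])
print(averaged)
```

Averaging costs one model pass per image, whereas a single-pass strategy sends all images in one query.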
Installation
Prerequisites
- Python >=3.10
- pip
- git if installing from source
- For larger HuggingFace VLMs, a GPU-capable PyTorch environment is recommended
Install from source
git clone https://github.com/CLAIR-LAB-TECHNION/s3e.git
cd s3e
pip install -e .
You can also install directly from the GitHub repository without cloning:
pip install "git+https://github.com/CLAIR-LAB-TECHNION/s3e.git"
Optional dependencies
Install OpenAI support:
pip install -e '.[openai]'
Install development dependencies:
pip install -e '.[dev]'
Optional acceleration for supported HuggingFace models:
FlashAttention installation is platform- and hardware-dependent. If your chosen model and environment support it, follow the installation guide to set it up.
Quick Start / Usage
The example below uses a small HuggingFace model and template-based predicate translation.
from PIL import Image
from s3e import SemanticStateEstimator, TemplateTranslator

# PDDL domain: two predicates over a single "block" type.
domain_pddl = """
(define (domain blocksworld)
  (:requirements :typing)
  (:types block)
  (:predicates
    (on ?x - block ?y - block)
    (clear ?x - block)
  )
)
"""

# PDDL problem: two blocks, a and b.
problem_pddl = """
(define (problem bw-2)
  (:domain blocksworld)
  (:objects a b - block)
  (:init (on a b) (clear a))
  (:goal (on b a))
)
"""

# One question template per predicate; {0}, {1} are the predicate arguments.
translator = TemplateTranslator(
    {
        "on": "Is the {0} block on top of the {1} block?",
        "clear": "Is the {0} block clear?",
    }
)

estimator = SemanticStateEstimator(
    domain_pddl,
    problem_pddl,
    vlm="HuggingFaceTB/SmolVLM-256M-Instruct",
    query_translator=translator,
    user_prompt_template="Answer yes or no only: {query}",
)

images = [Image.open("scene.png")]
state = estimator(images)  # boolean state
probabilities = estimator.estimate_probabilities(images)  # per-predicate probabilities

print(state)
print(probabilities)
You can also inspect normalized backend outputs directly:
raw_outputs = estimator.estimate_raw(images)
print(raw_outputs["on(a,b)"])
To convert the boolean state back into a Unified Planning state object:
from s3e.pddl.up_utils import state_dict_to_up_state
up_state = state_dict_to_up_state(estimator.up_problem, state)
For OpenAI-backed models, install the optional dependency and use an OpenAI/-prefixed model ID, for example "OpenAI/gpt-4o".
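The prefix convention can be pictured with the following sketch. The `select_backend` function is hypothetical, and stripping the prefix before passing the model name on is an assumption, not documented s3e behavior.

```python
OPENAI_PREFIX = "OpenAI/"


def select_backend(model_id):
    """Return (backend_name, model_name) for a model string.

    Assumption: the "OpenAI/" prefix is removed before the name is used.
    """
    if model_id.startswith(OPENAI_PREFIX):
        return "openai", model_id[len(OPENAI_PREFIX):]
    # Any other string is treated as a HuggingFace model ID.
    return "huggingface", model_id


print(select_backend("OpenAI/gpt-4o"))
print(select_backend("HuggingFaceTB/SmolVLM-256M-Instruct"))
```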
API Reference / Configuration
Core estimator
SemanticStateEstimator(domain, problem, vlm, ...) is the main entry point.
Key arguments:
- domain, problem: PDDL domain and problem, provided either as strings or file paths.
- vlm: a VLMBackend instance or a model string. Strings prefixed with OpenAI/ select the OpenAI backend; all other strings select the HuggingFace backend.
- query_translator: translation strategy used to convert grounded predicates into queries.
- confidence: default threshold used when converting probabilities into booleans.
- multi_image_strategy: either "single" or "average".
- probability_method: either "logprobs" or "text_match".
- true_tokens, false_tokens: optional token groups used for probability extraction.
- batch_size: number of predicate queries grouped into each backend batch.
- user_prompt_template: format string for each translated query; must contain {query}.
- additional_instructions: additional text appended to the system prompt.
- vlm_kwargs: keyword arguments forwarded when vlm is provided as a model string.
- inference_kwargs: per-query inference arguments forwarded to backend query/query_batch calls.
  - For OpenAI models, these are request arguments for chat.completions.create (for example temperature, max_completion_tokens).
  - For HuggingFace models, these are forwarded to model(...) in logprobs mode and model.generate(...) in generation mode.
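To show how confidence, true_tokens, and false_tokens could interact, here is a hedged sketch of turning next-token log-probabilities into a probability and then into a boolean. The helper names, the default token groups, the 0.5 fallback, and the `>=` comparison are all assumptions for illustration; s3e's actual extraction logic may differ.

```python
import math


def yes_probability(token_logprobs, true_tokens=("yes", "Yes"), false_tokens=("no", "No")):
    """Normalize probability mass over the true/false token groups.

    token_logprobs: dict mapping candidate token -> log-probability.
    """
    p_true = sum(math.exp(lp) for tok, lp in token_logprobs.items() if tok in true_tokens)
    p_false = sum(math.exp(lp) for tok, lp in token_logprobs.items() if tok in false_tokens)
    total = p_true + p_false
    if total == 0.0:
        return 0.5  # assumed fallback when neither group appears
    return p_true / total


def to_boolean_state(probabilities, confidence=0.5):
    """Threshold per-predicate probabilities (assumes >= counts as True)."""
    return {pred: p >= confidence for pred, p in probabilities.items()}


# Hypothetical log-probabilities for one query; "maybe" is ignored.
logprobs = {"yes": math.log(0.6), "no": math.log(0.2), "maybe": math.log(0.2)}
p = yes_probability(logprobs)
state = to_boolean_state({"on(a,b)": p, "clear(a)": 0.3}, confidence=0.5)
print(p, state)
```

Normalizing over only the two token groups keeps the result in [0, 1] even when the model places some mass on unrelated tokens.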
vlm_kwargs and inference_kwargs are intentionally different:
- vlm_kwargs configure backend/client construction.
  - OpenAI backend: forwarded to openai.OpenAI(...) (for example api_key, base_url, timeout).
  - HuggingFace backend: forwarded to backend/model construction (for example device_map, torch_dtype, attn_implementation).
- inference_kwargs configure runtime inference and are forwarded on every query.
Example:
estimator = SemanticStateEstimator(
    domain_pddl,
    problem_pddl,
    vlm="OpenAI/gpt-4o",
    vlm_kwargs={"api_key": "..."},  # client construction
    inference_kwargs={"temperature": 0.2, "max_completion_tokens": 200},  # sent with every query
)
For HuggingFace generation mode (probability_method="text_match"), s3e applies a deterministic default (do_sample=False) unless overridden via inference_kwargs. No default generation cap is imposed; set max_new_tokens in inference_kwargs if you want an explicit cap.
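The "default unless overridden" behavior described above can be sketched as a plain dictionary merge. The `resolve_generation_kwargs` helper is hypothetical and only illustrates the precedence rule; it is not taken from the s3e source.

```python
def resolve_generation_kwargs(inference_kwargs=None):
    """Merge user kwargs over a deterministic default (illustrative only)."""
    defaults = {"do_sample": False}
    # Later entries win, so user-supplied values override the default.
    return {**defaults, **(inference_kwargs or {})}


print(resolve_generation_kwargs())
print(resolve_generation_kwargs({"max_new_tokens": 16}))
print(resolve_generation_kwargs({"do_sample": True, "temperature": 0.7}))
```

Note that no max_new_tokens key appears unless the caller supplies one, which mirrors the absence of a default generation cap.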
Common methods:
- estimator(images) -> dict[str, bool]: return a boolean symbolic state.
- estimate_probabilities(images) -> dict[str, float]: return per-predicate probabilities.
- estimate_raw(images) -> dict[str, VLMOutput]: return normalized backend outputs.
- swap_problem(domain, problem): rebuild the estimator for a new planning problem.
Translators
- IdentityTranslator: use grounded predicates as-is.
- TemplateTranslator: format grounded predicates with per-predicate templates.
- PrewrittenTranslator: provide explicit prompts for each grounded predicate.
- LLMTranslator: generate natural-language prompts with an LLM and optionally cache them.
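For intuition, template-based translation can be approximated with the standard library alone. The sketch below assumes grounded predicates are written as `name(arg1,arg2)` and that templates use positional `{0}`, `{1}` placeholders, as in the Quick Start; `translate_with_template` is a hypothetical stand-in, not the TemplateTranslator implementation.

```python
import re

# Matches strings such as "on(a,b)" or "clear(a)" (assumed key format).
_GROUNDED = re.compile(r"^\s*([\w-]+)\s*\(([^)]*)\)\s*$")


def translate_with_template(grounded, templates):
    """Fill the predicate's template with its arguments, in order."""
    match = _GROUNDED.match(grounded)
    if match is None:
        raise ValueError(f"not a grounded predicate: {grounded!r}")
    name, arg_str = match.groups()
    args = [a.strip() for a in arg_str.split(",") if a.strip()]
    return templates[name].format(*args)


templates = {
    "on": "Is the {0} block on top of the {1} block?",
    "clear": "Is the {0} block clear?",
}
query = translate_with_template("on(a,b)", templates)
# Wrap the query the same way a user_prompt_template with {query} would.
prompt = "Answer yes or no only: {query}".format(query=query)
print(prompt)
```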
Environment variables and optional configuration
- OPENAI_API_KEY: required for OpenAIVLM and OpenAI-backed LLMTranslator usage.
- cache_dir on LLMTranslator: enables on-disk caching of generated predicate translations.
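The value of on-disk caching is that a second run does not need to call the LLM again for predicates it has already translated. The sketch below shows that idea with a JSON file; the `TranslationCache` class, the file name, and the `fake_llm` stand-in are all hypothetical, and s3e's actual cache layout is not described here.

```python
import json
import tempfile
from pathlib import Path


class TranslationCache:
    """Tiny JSON-backed cache keyed by grounded predicate (illustrative only)."""

    def __init__(self, cache_dir):
        self.path = Path(cache_dir) / "translations.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Load earlier translations if a previous run left a cache file behind.
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get_or_create(self, predicate, generate):
        """Return the cached translation, calling generate() only on a miss."""
        if predicate not in self._data:
            self._data[predicate] = generate(predicate)
            self.path.write_text(json.dumps(self._data, indent=2))
        return self._data[predicate]


calls = []


def fake_llm(predicate):
    # Stand-in for an LLM call; records how often it is invoked.
    calls.append(predicate)
    return f"Is {predicate} true in the image?"


with tempfile.TemporaryDirectory() as cache_dir:
    first = TranslationCache(cache_dir).get_or_create("on(a,b)", fake_llm)
    # A fresh instance simulates a second run that reuses the on-disk cache.
    second = TranslationCache(cache_dir).get_or_create("on(a,b)", fake_llm)

print(first, len(calls))
```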
Contributing
Install development dependencies:
pip install -e '.[dev]'
Run the fast test loop:
pytest -m "not slow"
Run the full test suite:
pytest
To contribute:
- Fork the repository and create a feature branch.
- Add or update tests for behavioral changes.
- Run the relevant test commands before submitting.
- Open a pull request with a concise description of the change and its motivation.
License
This project is licensed under the MIT License. See LICENSE for details.
Citation
@inproceedings{azranS3ESemanticSymbolic2025,
  title = {{{S3E}}: {{Semantic Symbolic State Estimation With Vision-Language Foundation Models}}},
  shorttitle = {{{S3E}}},
  booktitle = {{{AAAI}} 2025 {{Workshop LM4Plan}}},
  author = {Azran, Guy and Goshen, Yuval and Yuan, Kai and Keren, Sarah},
  year = 2025,
}