Vision-language models as state estimators for PDDL-based planning

S3E: Semantic Symbolic State Estimation

Overview

s3e is a Python package for estimating grounded PDDL state predicates from images using vision-language models (VLMs).

It is designed for workflows that need to connect visual observations to symbolic planning. Given a PDDL domain and problem, s3e enumerates grounded predicates, translates them into model-friendly queries, and returns one of three things: boolean state assignments, per-predicate probabilities, or normalized model outputs suitable for inspection and debugging.
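
To make the "enumerates grounded predicates" step concrete, here is a minimal sketch in plain Python. It does not use s3e; the dictionaries, the ground_predicates helper, and the "name(arg1,arg2)" string format are assumptions made for illustration, and the package's own grounding may differ (for example, in whether it keeps groundings such as on(a,a)).

```python
from itertools import product

# Hypothetical, simplified inputs (not s3e data structures):
# predicate name -> parameter types, and type name -> objects of that type.
predicates = {"on": ["block", "block"], "clear": ["block"]}
objects = {"block": ["a", "b"]}


def ground_predicates(predicates, objects):
    """Enumerate every type-consistent grounding as a 'name(arg1,arg2)' string."""
    grounded = []
    for name, param_types in predicates.items():
        # One candidate list per parameter, then their Cartesian product.
        candidates = [objects[t] for t in param_types]
        for args in product(*candidates):
            grounded.append(f"{name}({','.join(args)})")
    return grounded


grounded = ground_predicates(predicates, objects)
print(grounded)
# ['on(a,a)', 'on(a,b)', 'on(b,a)', 'on(b,b)', 'clear(a)', 'clear(b)']
```

Each grounded predicate then becomes one yes/no question for the VLM, so the number of queries grows with the number of objects raised to the predicate's arity.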

The package works with Unified Planning and other PDDL-based systems. It supports HuggingFace and OpenAI-backed VLMs, as well as custom backends.

For a longer tutorial, see the tutorial notebook.

Features

Estimate boolean symbolic states or probabilistic predicate values from one or more images.
Parse PDDL domains and problems from strings or .pddl files.
Automatically ground predicates over the current problem objects.
Translate predicates with pluggable strategies: IdentityTranslator, TemplateTranslator, PrewrittenTranslator, and LLMTranslator.
Use HuggingFace VLMs, OpenAI VLMs, or custom implementations via the VLMBackend interface.
Support multi-image estimation with either single-pass or per-image averaging.
Expose normalized VLMOutput objects for prompt tuning and backend inspection.
Convert estimated states back into Unified Planning-compatible state objects.
Cache LLM-generated predicate translations for reuse across runs.
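
As an illustration of the per-image averaging option listed above, the sketch below averages per-predicate probabilities obtained separately for each image. It is plain Python with made-up numbers; average_probabilities is a hypothetical helper, and s3e's own aggregation may differ in detail.

```python
# Hypothetical per-image probabilities for two grounded predicates.
per_image = [
    {"on(a,b)": 0.9, "clear(a)": 0.8},  # image 1
    {"on(a,b)": 0.7, "clear(a)": 0.4},  # image 2
]


def average_probabilities(per_image):
    """Average each predicate's probability across all images."""
    if not per_image:
        raise ValueError("at least one image result is required")
    names = per_image[0].keys()
    return {
        name: sum(result[name] for result in per_image) / len(per_image)
        for name in names
    }


averaged = average_probabilities(per_image)
print(averaged)
```

In contrast, a single-pass strategy would send all images to the model in one query and read one probability per predicate, which costs fewer model calls but relies on the model handling multiple images well.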

Installation

Prerequisites

Python >=3.10
pip
git if installing from source
For larger HuggingFace VLMs, a GPU-capable PyTorch environment is recommended

Install from source

git clone https://github.com/CLAIR-LAB-TECHNION/s3e.git
cd s3e
pip install -e .

You can also install directly from the GitHub repository without cloning:

pip install "git+https://github.com/CLAIR-LAB-TECHNION/s3e.git"

Optional dependencies

Install OpenAI support:

pip install -e '.[openai]'

Install development dependencies:

pip install -e '.[dev]'

Optional acceleration for supported HuggingFace models:

FlashAttention installation is platform- and hardware-dependent. If your chosen model and environment support it, follow the FlashAttention installation guide to set it up.

Quick Start / Usage

The example below uses a small HuggingFace model and template-based predicate translation.

from PIL import Image

from s3e import SemanticStateEstimator, TemplateTranslator

domain_pddl = """
(define (domain blocksworld)
  (:requirements :typing)
  (:types block)
  (:predicates
    (on ?x - block ?y - block)
    (clear ?x - block)
  )
)
"""

problem_pddl = """
(define (problem bw-2)
  (:domain blocksworld)
  (:objects a b - block)
  (:init (on a b) (clear a))
  (:goal (on b a))
)
"""

translator = TemplateTranslator(
    {
        "on": "Is the {0} block on top of the {1} block?",
        "clear": "Is the {0} block clear?",
    }
)

estimator = SemanticStateEstimator(
    domain_pddl,
    problem_pddl,
    vlm="HuggingFaceTB/SmolVLM-256M-Instruct",
    query_translator=translator,
    user_prompt_template="Answer yes or no only: {query}",
)

images = [Image.open("scene.png")]

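# Boolean symbolic state: dict[str, bool], one entry per grounded predicate
state = estimator(images)
# Per-predicate probabilities: dict[str, float]
probabilities = estimator.estimate_probabilities(images)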

print(state)
print(probabilities)
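
To show what the templates and user_prompt_template in the example amount to, here is a rough stand-alone sketch of template-based translation using only str.format. The translate helper and the "name(arg1,arg2)" input format are assumptions for illustration; this is not the TemplateTranslator implementation.

```python
import re

# Same templates as in the quick-start example.
templates = {
    "on": "Is the {0} block on top of the {1} block?",
    "clear": "Is the {0} block clear?",
}


def translate(grounded, templates):
    """Fill a predicate's template with its arguments (hypothetical helper)."""
    match = re.fullmatch(r"([^()\s]+)\((.*)\)", grounded)
    if match is None:
        raise ValueError(f"not a grounded predicate: {grounded!r}")
    name, arg_str = match.groups()
    args = [a.strip() for a in arg_str.split(",")] if arg_str else []
    # Positional placeholders {0}, {1}, ... map to the predicate arguments.
    return templates[name].format(*args)


query = translate("on(a,b)", templates)
prompt = "Answer yes or no only: {query}".format(query=query)
print(prompt)
# Answer yes or no only: Is the a block on top of the b block?
```

The output shows why object names matter: a bare name such as "a" yields a stilted question, so descriptive object names (or a PrewrittenTranslator or LLMTranslator) may produce prompts the model handles better.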

You can also inspect normalized backend outputs directly:

raw_outputs = estimator.estimate_raw(images)
print(raw_outputs["on(a,b)"])

To convert the boolean state back into a Unified Planning state object:

from s3e.pddl.up_utils import state_dict_to_up_state

up_state = state_dict_to_up_state(estimator.up_problem, state)

For OpenAI-backed models, install the optional dependency and use an OpenAI/-prefixed model ID, for example "OpenAI/gpt-4o".
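
The prefix convention can be pictured with the small sketch below. select_backend is a hypothetical function, and stripping the prefix before passing the model name on is an assumption; the source only states that the OpenAI/ prefix selects the OpenAI backend and every other string selects HuggingFace.

```python
OPENAI_PREFIX = "OpenAI/"


def select_backend(model_id: str) -> tuple[str, str]:
    """Return (backend name, model name) for a model string (illustrative only)."""
    if model_id.startswith(OPENAI_PREFIX):
        # Assumption: the prefix is removed before the name reaches the API.
        return "openai", model_id[len(OPENAI_PREFIX):]
    # Any other string is treated as a HuggingFace model ID.
    return "huggingface", model_id


print(select_backend("OpenAI/gpt-4o"))
# ('openai', 'gpt-4o')
print(select_backend("HuggingFaceTB/SmolVLM-256M-Instruct"))
# ('huggingface', 'HuggingFaceTB/SmolVLM-256M-Instruct')
```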

API Reference / Configuration

Core estimator

SemanticStateEstimator(domain, problem, vlm, ...) is the main entry point.

Key arguments:

domain, problem: PDDL domain and problem, provided either as strings or file paths.
vlm: a VLMBackend instance or a model string. Strings prefixed with OpenAI/ select the OpenAI backend; all other strings select the HuggingFace backend.
query_translator: translation strategy used to convert grounded predicates into queries.
confidence: default threshold used when converting probabilities into booleans.
multi_image_strategy: either "single" or "average".
probability_method: either "logprobs" or "text_match".
true_tokens, false_tokens: optional token groups used for probability extraction.
batch_size: number of predicate queries grouped into each backend batch.
user_prompt_template: format string for each translated query; must contain {query}.
additional_instructions: additional text appended to the system prompt.
vlm_kwargs: keyword arguments forwarded when vlm is provided as a model string.
inference_kwargs: per-query inference arguments forwarded to backend query/query_batch calls.
- For OpenAI models, these are request arguments for chat.completions.create (for example temperature, max_completion_tokens).
- For HuggingFace models, these are forwarded to model(...) in logprobs mode and model.generate(...) in generation mode.
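
The interplay of true_tokens, false_tokens, and confidence can be sketched as follows. This is one plausible reading, not s3e's confirmed implementation: it assumes the probability mass of the true tokens is normalized against the combined true and false mass, and that a predicate is true when its probability is at least the threshold. Both function names are hypothetical.

```python
import math


def probability_true(logprobs, true_tokens, false_tokens):
    """Normalize true-token mass against true + false mass (assumed scheme)."""
    p_true = sum(math.exp(logprobs[t]) for t in true_tokens if t in logprobs)
    p_false = sum(math.exp(logprobs[t]) for t in false_tokens if t in logprobs)
    total = p_true + p_false
    if total == 0.0:
        return 0.5  # no evidence either way; a neutral fallback for this sketch
    return p_true / total


def to_boolean_state(probabilities, confidence):
    """Threshold per-predicate probabilities into booleans."""
    return {name: p >= confidence for name, p in probabilities.items()}


# Made-up log-probabilities for the first generated token.
logprobs = {"yes": math.log(0.6), "Yes": math.log(0.2), "no": math.log(0.2)}
p = probability_true(logprobs, true_tokens=["yes", "Yes"], false_tokens=["no", "No"])
state = to_boolean_state({"on(a,b)": p, "clear(b)": 0.3}, confidence=0.5)
print(round(p, 3), state)
```

Grouping several surface forms ("yes", "Yes") into one token group keeps the estimate from depending on how the model happens to capitalize its answer.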

vlm_kwargs and inference_kwargs are intentionally different:

vlm_kwargs configure backend/client construction.
- OpenAI backend: forwarded to openai.OpenAI(...) (for example api_key, base_url, timeout).
- HuggingFace backend: forwarded to backend/model construction (for example device_map, torch_dtype, attn_implementation).
inference_kwargs configure runtime inference and are forwarded on every query.

Example:

estimator = SemanticStateEstimator(
    domain_pddl,
    problem_pddl,
    vlm="OpenAI/gpt-4o",
    vlm_kwargs={"api_key": "..."},
    inference_kwargs={"temperature": 0.2, "max_completion_tokens": 200},
)

For HuggingFace generation mode (probability_method="text_match"), s3e applies a deterministic default (do_sample=False) unless overridden via inference_kwargs. No default generation cap is imposed; set max_new_tokens in inference_kwargs if you want an explicit cap.
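
The override behavior described here follows the usual "defaults first, user values last" dictionary merge. The sketch below shows that pattern in isolation; build_generate_kwargs is a hypothetical helper name, not a function exported by s3e.

```python
def build_generate_kwargs(inference_kwargs=None):
    """Merge user-supplied kwargs over a deterministic default (illustrative)."""
    defaults = {"do_sample": False}
    # Later entries win, so anything in inference_kwargs overrides the default.
    return {**defaults, **(inference_kwargs or {})}


print(build_generate_kwargs())
# {'do_sample': False}
print(build_generate_kwargs({"max_new_tokens": 16}))
# {'do_sample': False, 'max_new_tokens': 16}
print(build_generate_kwargs({"do_sample": True, "temperature": 0.7}))
# {'do_sample': True, 'temperature': 0.7}
```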

Common methods:

estimator(images) -> dict[str, bool]: return a boolean symbolic state.
estimate_probabilities(images) -> dict[str, float]: return per-predicate probabilities.
estimate_raw(images) -> dict[str, VLMOutput]: return normalized backend outputs.
swap_problem(domain, problem): rebuild the estimator for a new planning problem.

Translators

IdentityTranslator: use grounded predicates as-is.
TemplateTranslator: format grounded predicates with per-predicate templates.
PrewrittenTranslator: provide explicit prompts for each grounded predicate.
LLMTranslator: generate natural-language prompts with an LLM and optionally cache them.

Environment variables and optional configuration

OPENAI_API_KEY: required for OpenAIVLM and OpenAI-backed LLMTranslator usage.
cache_dir on LLMTranslator: enables on-disk caching of generated predicate translations.
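
For intuition about what on-disk caching of translations buys you, here is a small self-contained sketch. The TranslationCache class, the translations.json file name, and the fake_llm stand-in are all hypothetical; the actual cache layout used by LLMTranslator is not documented here and may differ.

```python
import json
import tempfile
from pathlib import Path


class TranslationCache:
    """Hypothetical JSON-file cache keyed by grounded predicate."""

    def __init__(self, cache_dir):
        self.path = Path(cache_dir) / "translations.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Load earlier translations if a cache file already exists.
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get_or_create(self, predicate, generate):
        if predicate not in self.entries:
            # Cache miss: call the (expensive) generator and persist the result.
            self.entries[predicate] = generate(predicate)
            self.path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[predicate]


calls = []


def fake_llm(predicate):
    """Stand-in for an LLM call; records how often it is invoked."""
    calls.append(predicate)
    return f"Is it true that {predicate}?"


with tempfile.TemporaryDirectory() as tmp:
    first = TranslationCache(tmp).get_or_create("clear(a)", fake_llm)
    # A fresh cache object simulates a second run of the program.
    second = TranslationCache(tmp).get_or_create("clear(a)", fake_llm)

print(first == second, len(calls))
# True 1
```

The second lookup is served from disk, so the generator runs only once even across separate runs that share the same directory.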

Contributing

Install development dependencies:

pip install -e '.[dev]'

Run the fast test loop:

pytest -m "not slow"

Run the full test suite:

pytest

To contribute:

Fork the repository and create a feature branch.
Add or update tests for behavioral changes.
Run the relevant test commands before submitting.
Open a pull request with a concise description of the change and its motivation.

License

This project is licensed under the MIT License. See LICENSE for details.

Citation

@inproceedings{azranS3ESemanticSymbolic2025,
  title = {{{S3E}}: {{Semantic Symbolic State Estimation With Vision-Language Foundation Models}}},
  shorttitle = {{{S3E}}},
  booktitle = {{{AAAI}} 2025 {{Workshop LM4Plan}}},
  author = {Azran, Guy and Goshen, Yuval and Yuan, Kai and Keren, Sarah},
  year = 2025,
}

Release history

0.2.0: May 12, 2026
0.1.0: May 7, 2026 (this version)
