ROS log processing and dataset conversion

Hephaes

Python package for turning raw ROS/MCAP logs into standardized datasets with consistent schemas across runs. The package helps you:

ingest ROS1 .bag and ROS2 .mcap logs
inspect topics, rates, and recording time ranges
synchronize asynchronous sensor streams onto a shared timeline (downsample or interpolate)
convert logs into wide dataset files such as Parquet and TFRecord
standardize dataset schemas with explicit topic-to-field mappings

Current Scope

The library is intentionally focused on the core dataset-prep path.

Input formats: ROS1 .bag, ROS2 .mcap
Input paths must be files, not bag directories
Output formats: one wide Parquet or TFRecord file per input log
Interface: Python library
Python: 3.11+
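
The input constraints above can be checked before any log is opened. The following is a minimal sketch, not part of the hephaes API: `check_input_path` and `SUPPORTED_SUFFIXES` are hypothetical names, and the suffix-to-ROS-version table simply restates the two formats listed in this section.

```python
from pathlib import Path

# Hypothetical lookup table restating the supported formats listed above.
SUPPORTED_SUFFIXES = {".bag": "ROS1", ".mcap": "ROS2"}


def check_input_path(path):
    """Return the ROS flavor implied by the file suffix, or raise ValueError.

    Illustrative helper only; hephaes may validate inputs differently.
    """
    p = Path(path)
    # Bag directories are rejected: inputs must be single files.
    if p.is_dir():
        raise ValueError(f"{p} is a directory; pass the log file itself")
    suffix = p.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported log format: {p.name}")
    return SUPPORTED_SUFFIXES[suffix]


if __name__ == "__main__":
    print(check_input_path("data/run_001.mcap"))
```

The sketch does not check that the file exists, so it can be used to pre-filter a list of candidate paths cheaply.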

If you need the same dataset schema across different robots or recording setups, you can map multiple possible source topics to the same target field. The converter will use the first topic that exists in each log.
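
The "first topic that exists" rule can be sketched in plain Python. This is an illustration of the selection logic only, not the converter's actual code: `resolve_sources` is a hypothetical helper, and returning `None` for a field with no matching topic is an assumption.

```python
def resolve_sources(candidates_by_field, available_topics):
    """Pick, for each target field, the first candidate topic present in the log.

    Hypothetical helper for illustration; a field with no match maps to None
    (an assumption about how missing sources might be represented).
    """
    available = set(available_topics)
    resolved = {}
    for field, candidates in candidates_by_field.items():
        # Candidates are tried in the order they were listed.
        resolved[field] = next((t for t in candidates if t in available), None)
    return resolved


if __name__ == "__main__":
    schema = {
        "front_camera": ["/camera/front/image_raw", "/sensors/front_cam"],
        "imu": ["/imu/data", "/sensors/imu"],
        "vehicle_twist": ["/cmd_vel", "/vehicle/twist"],
    }
    topics_in_log = ["/sensors/front_cam", "/imu/data", "/sensors/imu"]
    print(resolve_sources(schema, topics_in_log))
```

Because candidates are tried in list order, putting the preferred topic name first keeps the choice deterministic when a log contains several of them.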

Installation

Install from PyPI:

pip install hephaes

Install from source:

pip install .

For local development and tests:

pip install -r requirements.txt

Or install the dev extra directly:

pip install -e ".[dev]"

Quick Start

1. Profile a log

Use Profiler to inspect timing metadata and topic inventory before deciding how to map the log.

from hephaes import Profiler

profile = Profiler(["data/run_001.mcap"], max_workers=1).profile()[0]

print(profile.ros_version)
print(profile.duration_seconds)
print(profile.start_time_iso, profile.end_time_iso)
print([(topic.name, topic.message_type, topic.rate_hz) for topic in profile.topics])

2. Define a standardized schema

You can auto-generate a mapping from discovered topics:

from hephaes import build_mapping_template

mapping = build_mapping_template(profile.topics)
print(mapping.root)

Or define a stable schema explicitly. This is the main mechanism for dataset schema standardization.

from hephaes import build_mapping_template_from_json

mapping = build_mapping_template_from_json(
    profile.topics,
    {
        "front_camera": ["/camera/front/image_raw", "/sensors/front_cam"],
        "imu": ["/imu/data", "/sensors/imu"],
        "vehicle_twist": ["/cmd_vel", "/vehicle/twist"],
    },
    strict_unknown_topics=False,
)

In the example above, front_camera, imu, and vehicle_twist become the canonical dataset fields. Each field can list fallback source topics, which is useful when topic names vary across robots, fleets, or recording versions.

3. Convert logs into Parquet or TFRecord

Use Converter to write one dataset file per input log. Parquet is the default output format; the example below passes TFRecordOutputConfig() to write TFRecord instead.

from hephaes import Converter, ResampleConfig, TFRecordOutputConfig

converter = Converter(
    ["data/run_001.mcap"],
    mapping,
    output_dir="dataset/processed",
    output=TFRecordOutputConfig(),
    resample=ResampleConfig(freq_hz=10.0, method="interpolate"),
    max_workers=1,
)

dataset_paths = converter.convert()
print(dataset_paths[0])

4. Stream the output rows

from hephaes import stream_tfrecord_rows

for row in stream_tfrecord_rows(dataset_paths[0]):
    print(row)
    break

Synchronization Modes

hephaes supports three practical ways to align asynchronous topics:

Mode	Configuration	Behavior
Preserve original timestamps	`resample=None`	Writes rows at the union of observed message timestamps.
Downsample to a fixed rate	`ResampleConfig(freq_hz=10.0, method="downsample")`	Buckets messages on a regular grid and keeps the latest payload seen in each bucket.
Interpolate to a fixed rate	`ResampleConfig(freq_hz=10.0, method="interpolate")`	Builds a regular timestamp grid and linearly interpolates numeric JSON leaves between samples.

Interpolation is intended for numeric sensor payloads. Non-numeric leaves fall back to the earlier sample.
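
The two fixed-rate modes described above can be sketched without the library. This is a simplified illustration under stated assumptions, not the hephaes implementation: `downsample`, `interpolate`, and `interpolate_leaf` are hypothetical names, samples are `(timestamp_ns, payload)` pairs sorted by time, the grid starts at the first sample, and each downsample bucket is labeled with its start time.

```python
from bisect import bisect_right


def downsample(samples, freq_hz):
    """Keep the latest payload seen in each fixed-width time bucket."""
    step_ns = round(1e9 / freq_hz)
    start_ns = samples[0][0]
    buckets = {}
    for ts, payload in samples:
        # Later samples overwrite earlier ones that fall in the same bucket.
        buckets[(ts - start_ns) // step_ns] = payload
    return [(start_ns + i * step_ns, buckets[i]) for i in sorted(buckets)]


def interpolate_leaf(a, b, frac):
    """Linearly interpolate numeric leaves; otherwise keep the earlier value."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a  # booleans are not treated as numbers here
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + (b - a) * frac
    if isinstance(a, dict) and isinstance(b, dict):
        return {k: interpolate_leaf(v, b[k], frac) if k in b else v
                for k, v in a.items()}
    return a  # non-numeric leaf: fall back to the earlier sample


def interpolate(samples, freq_hz):
    """Evaluate payloads on a regular grid spanning the first to last sample."""
    times = [ts for ts, _ in samples]
    step_ns = round(1e9 / freq_hz)
    out = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_right(times, t) - 1  # last sample at or before t
        if i + 1 < len(samples):
            (t0, p0), (t1, p1) = samples[i], samples[i + 1]
            out.append((t, interpolate_leaf(p0, p1, (t - t0) / (t1 - t0))))
        else:
            out.append((t, samples[i][1]))  # grid point on the final sample
        t += step_ns
    return out


if __name__ == "__main__":
    imu = [(0, {"x": 0.0, "frame": "a"}), (200_000_000, {"x": 2.0, "frame": "b"})]
    for row in interpolate(imu, freq_hz=10.0):
        print(row)
```

In this sketch the midpoint grid row gets `x` halfway between the two samples, while the string leaf `frame` keeps the value from the earlier sample, matching the fallback rule stated above.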

For Parquet output, preserve/downsample modes store raw message bytes as base64-wrapped JSON strings, while interpolate stores normalized JSON payloads derived from deserialized messages. For TFRecord output, all modes deserialize messages and emit flattened typed features.
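
To make "flattened typed features" concrete, here is a minimal sketch of how a nested, deserialized payload could be turned into double-underscore feature names. `flatten` is a hypothetical helper written for illustration; the library's actual flattening rules (for lists, type mapping, and so on) are not shown in this document and may differ.

```python
def flatten(payload, prefix):
    """Flatten a nested dict into {"prefix__a__b": leaf} entries.

    Illustrative only: handles nested dicts and leaves leaf values untouched.
    """
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}__{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    imu_message = {"orientation": {"x": 0.1, "y": 0.2}, "frame_id": "base"}
    print(flatten(imu_message, "imu"))
```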

Output Format

Each input log becomes one dataset file named like:

episode_0001.parquet
episode_0002.parquet
episode_0003.tfrecord

The logical row schema is wide and simple:

timestamp_ns: int64
front_camera: string
imu: string
vehicle_twist: string
...

Notes:

timestamp_ns is always present.
Parquet keeps one nullable column per mapping target and stores each mapped field as a JSON string.
Raw byte payloads are wrapped as base64-encoded JSON objects shaped like {"__bytes__": true, "encoding": "base64", "value": "..."}.
TFRecord expands each mapping target into flattened typed feature names, such as imu__orientation__x, derived from deserialized messages.
TFRecord uses float_list, int64_list, and bytes_list features, plus companion <field>__present flags for nulls.
Image-like payload bytes are written as raw bytes_list features alongside their metadata fields.
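
The wrapped byte payload shape noted above can be decoded with the standard library alone. This is a small sketch under the assumption that a Parquet cell holds exactly the JSON object described in the notes; `wrap_bytes` and `unwrap_bytes` are hypothetical helper names, not hephaes functions.

```python
import base64
import json


def wrap_bytes(raw):
    """Build the wrapper object described in the notes (for round-trip testing)."""
    return {
        "__bytes__": True,
        "encoding": "base64",
        "value": base64.b64encode(raw).decode("ascii"),
    }


def unwrap_bytes(cell):
    """Return raw bytes for a wrapped payload, or the parsed JSON otherwise."""
    obj = json.loads(cell) if isinstance(cell, str) else cell
    if (
        isinstance(obj, dict)
        and obj.get("__bytes__") is True
        and obj.get("encoding") == "base64"
    ):
        return base64.b64decode(obj["value"])
    return obj


if __name__ == "__main__":
    cell = json.dumps(wrap_bytes(b"\x00\x01raw message"))
    print(unwrap_bytes(cell))
```

Cells that are not wrapped byte payloads, such as the normalized JSON written in interpolate mode, pass through as ordinary parsed JSON.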

This makes the output easy to stream, inspect, and hand off to downstream ETL, analysis, or ML pipelines while preserving source payload fidelity.

Direct Log Access

If you want to read logs directly instead of converting them immediately, use RosReader.

from hephaes import RosReader

with RosReader.open("data/run_001.bag") as reader:
    print(reader.topics)

    for message in reader.read_messages(topics=["/cmd_vel"]):
        print(message.timestamp, message.topic, message.data)
        break

Development

Run the test suite with:

pytest

Build a wheel locally with:

python -m build

License

MIT. See LICENSE.

Release history

0.3.0: Mar 30, 2026
0.2.2: Mar 25, 2026
0.2.1: Mar 12, 2026 (this version)
0.2.0: Mar 12, 2026
0.1.1: Mar 12, 2026
