Resource manager for MolmoSpaces

MolmoSpaces-resources

Resource manager for MolmoSpaces. Downloads, caches, and symlinks versioned data archives from remote buckets.

  • Versioned, read-only download cache with symlink-based install directories
  • Eager or on-demand archive extraction with parallel downloads
  • Pluggable archive indexing for fast asset lookup
  • Fork-resilient: safe to use across multiprocessing workers

Installation

pip install molmospaces-resources

Setup example

By default, data is served from a gated Hugging Face dataset repo, so you need to pass a valid access token (read here from the HF_TOKEN environment variable):

import os
from molmospaces_resources import HFRemoteStorage, setup_resource_manager

source = HFRemoteStorage("allenai/molmospaces", repo_prefix="mujoco", token=os.environ["HF_TOKEN"])
# SYMLINK_DIR, ASSETS_VERSIONS and CACHE_DIR are project-specific values you define yourself.
mgr = setup_resource_manager(
  source,
  symlink_dir=SYMLINK_DIR,
  versions=ASSETS_VERSIONS,
  cache_dir=CACHE_DIR,
)

Alternatively (legacy), you can use R2RemoteStorage for direct bucket access:

from molmospaces_resources import R2RemoteStorage, setup_resource_manager

source = R2RemoteStorage("mujoco-thor-resources")  # known bucket name or full URL
mgr = setup_resource_manager(
  source,
  symlink_dir=SYMLINK_DIR,
  versions=ASSETS_VERSIONS,
  cache_dir=CACHE_DIR,
)

Optionally, run project-specific logic after setup via post_setup:

def my_post_setup(manager):
  ## Install all scenes (while skipping per-file symlinking):
  # manager.install_all_for_data_type("scenes", skip_linking=True)
  
  ## Install all objects (with per-dataset symlinking):
  # manager.install_all_for_data_type("objects")
  pass

mgr = setup_resource_manager(
  source,
  symlink_dir=SYMLINK_DIR,
  versions=ASSETS_VERSIONS,
  cache_dir=CACHE_DIR,
  post_setup=my_post_setup,
  force_post_setup=True,  # run post_setup even if all data sources are configured
)
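The comment on force_post_setup suggests that, by default, post_setup is skipped when every data source is already configured. The sketch below is a hypothetical model of that decision, useful for reasoning about when your hook fires; maybe_run_post_setup and all_configured are illustrative names, not part of molmospaces_resources.

```python
from typing import Any, Callable, Optional


def maybe_run_post_setup(
    manager: Any,
    all_configured: bool,
    post_setup: Optional[Callable[[Any], None]] = None,
    force_post_setup: bool = False,
) -> bool:
    """Hypothetical model of the documented post_setup/force_post_setup behavior.

    Returns True if the hook was called.
    """
    if post_setup is None:
        return False
    # Assumption: when nothing new had to be configured, the hook is skipped unless forced.
    if all_configured and not force_post_setup:
        return False
    post_setup(manager)
    return True


calls = []
print(maybe_run_post_setup("mgr", all_configured=True, post_setup=calls.append))  # False
print(maybe_run_post_setup("mgr", all_configured=True, post_setup=calls.append, force_post_setup=True))  # True
```

Under this reading, the example above, which passes force_post_setup=True, calls my_post_setup on every setup_resource_manager call, so the hook should be safe to run repeatedly.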

Logging

By default, Python's logging shows WARNING and above, so the library will be quiet. To see detailed progress messages, either configure the root logger:

import logging
logging.basicConfig(level=logging.DEBUG)

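or set the molmospaces_resources logger level explicitly. If no handler is configured anywhere (e.g. via logging.basicConfig), Python's last-resort handler still drops records below WARNING, so attach a handler as well:

import logging
logger = logging.getLogger("molmospaces_resources")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())  # omit if your application already configures handlers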

FAQ

FAQ 1. What is the difference between cache_dir and symlink_dir?

The resource manager uses two separate directory trees that must not overlap:

  • cache_dir is the versioned download cache. Archives are extracted into a <data_type>/<source>/<version>/ hierarchy, and multiple versions can coexist side by side. Files here are set read-only to prevent accidental modification. This directory can be safely shared across containers or workers.

  • symlink_dir is the user-facing install directory. It presents a flat <data_type>/<source>/ layout with no version in the path — the version is hidden behind symlinks that point into cache_dir. This allows application code to use stable, version-agnostic paths while the underlying data can be upgraded by simply re-pointing the symlinks.
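The sketch below illustrates the two-tree layout using only the standard library. It is not the package's implementation: add_cached_file and install_version are hypothetical helpers that follow the <data_type>/<source>/<version>/ and <data_type>/<source>/ conventions described above, and the atomic re-pointing via os.replace assumes a POSIX filesystem.

```python
import os
import stat
import tempfile
from pathlib import Path


def add_cached_file(cache_dir, data_type, source, version, name, content):
    """Hypothetical: place a file in cache_dir/<data_type>/<source>/<version>/ and make it read-only."""
    version_dir = Path(cache_dir) / data_type / source / version
    version_dir.mkdir(parents=True, exist_ok=True)
    path = version_dir / name
    path.write_text(content)
    path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only, like the cache described above
    return path


def install_version(cache_dir, symlink_dir, data_type, source, version):
    """Hypothetical: point symlink_dir/<data_type>/<source> at one cached version."""
    target = (Path(cache_dir) / data_type / source / version).resolve()
    link = Path(symlink_dir) / data_type / source
    link.parent.mkdir(parents=True, exist_ok=True)
    # Build the new link under a temporary name, then rename it over the old one,
    # so readers never see a missing path during an upgrade (POSIX rename semantics).
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink():
        tmp.unlink()  # leftover from an interrupted earlier attempt
    os.symlink(target, tmp, target_is_directory=True)
    os.replace(tmp, link)
    return link


root = Path(tempfile.mkdtemp())
cache, install = root / "cache", root / "install"
add_cached_file(cache, "scenes", "demo", "v1", "info.txt", "version 1")
add_cached_file(cache, "scenes", "demo", "v2", "info.txt", "version 2")

install_version(cache, install, "scenes", "demo", "v1")
print((install / "scenes" / "demo" / "info.txt").read_text())  # version 1
install_version(cache, install, "scenes", "demo", "v2")  # upgrade by re-pointing the symlink
print((install / "scenes" / "demo" / "info.txt").read_text())  # version 2
```

Application code only ever opens install/scenes/demo/..., so the upgrade from v1 to v2 needs no path changes, and both versions remain in the cache side by side.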

These two directories must resolve to different physical locations and neither can be nested inside the other. The manager validates this at construction time and raises an error if the paths overlap.
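A check along these lines could enforce that rule. This is a minimal sketch assuming the comparison is done on fully resolved paths; dirs_overlap and validate_dirs are hypothetical helpers, not the manager's actual validation code.

```python
from pathlib import Path


def dirs_overlap(a, b) -> bool:
    """Return True if two directories resolve to the same place or one contains the other."""
    pa, pb = Path(a).resolve(), Path(b).resolve()  # follows symlinks and normalizes ".."
    return pa == pb or pa in pb.parents or pb in pa.parents


def validate_dirs(cache_dir, symlink_dir) -> None:
    if dirs_overlap(cache_dir, symlink_dir):
        raise ValueError(f"cache_dir {cache_dir!r} and symlink_dir {symlink_dir!r} must not overlap")


validate_dirs("/data/cache", "/data/install")  # OK: siblings
# validate_dirs("/data", "/data/install")      # would raise ValueError: nested
```

Resolving before comparing matters: a symlink_dir that is itself a symlink into cache_dir would pass a naive string comparison but is caught here.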

FAQ 2. A process hangs waiting for a lock on a shared filesystem (e.g. WekaFS, NFS).

The resource manager uses file-based locks (.lock files) to coordinate concurrent access to the cache and symlink directories. On local filesystems, the OS automatically releases these locks when a process dies. On shared or networked filesystems such as WekaFS or NFS, if a container is destroyed without a clean shutdown (e.g. killed by the orchestrator or lost in a node crash), the lock may not be released immediately. The filesystem typically detects the dead client eventually, but how long that takes is unpredictable.
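For intuition, the sketch below shows how an advisory file lock with a timeout behaves, using POSIX fcntl.flock. It is an illustration under assumptions, not the manager's locking code: file_lock and its timeout/poll parameters are hypothetical, and flock semantics on network filesystems depend on client and server configuration. A timeout turns an indefinite hang into an error that names the lock file to inspect.

```python
import fcntl
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path, timeout=30.0, poll=0.1):
    """Hold an exclusive advisory lock on `path`, giving up after `timeout` seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking attempt
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path} within {timeout}s; is it stale?")
                time.sleep(poll)
        yield
    finally:
        # Closing the descriptor releases the lock; the kernel also does this when
        # a process dies, which is exactly what can be delayed on shared filesystems.
        os.close(fd)


lock_path = os.path.join(tempfile.gettempdir(), "example.lock")
with file_lock(lock_path, timeout=5.0):
    pass  # critical section: e.g. extract an archive, create symlinks
```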

If setup appears stuck and you have confirmed that no live process still holds the lock, you can manually remove the stale lock file (typically in the shared cache directory):

rm /path/to/cache_dir/.lock

If the interrupted process was mid-install, the cache may contain partially extracted archives. The safest recovery is to delete the entire affected <data_type>/<source>/<version>/ directory under cache_dir, remove the corresponding entry from the local manifest files, and re-run setup. Partial cleanup (e.g. removing individual archives) is possible but requires checking the .complete_extract / .complete_links flag files as well.
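The directory part of that recovery can be scripted. Because cache files and directories may be read-only, a plain recursive delete can fail, so the hypothetical helper below restores write permission first. It does not touch the manifest files or the .complete_extract / .complete_links flags, whose exact format is not documented here; treat it as a sketch and double-check the path before running it on a shared cache.

```python
import os
import shutil
import stat
from pathlib import Path


def remove_cached_version(cache_dir, data_type, source, version) -> bool:
    """Hypothetical: delete cache_dir/<data_type>/<source>/<version>/ even if it is read-only."""
    target = Path(cache_dir) / data_type / source / version
    if target.is_symlink() or not target.is_dir():
        return False  # nothing to delete (or not a real directory)
    for dirpath, _dirnames, filenames in os.walk(target):
        # Removing entries requires write permission on their parent directory.
        os.chmod(dirpath, os.stat(dirpath).st_mode | stat.S_IWUSR)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # never chmod through a symlink
                os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
    shutil.rmtree(target)
    return True
```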

In shared-cache scenarios with sufficient storage, we recommend installing all data eagerly with a single leader process and letting workers use the pre-populated cache without locks via cache_lock=False.
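One way to implement that leader/worker split is a ready marker: the leader installs everything (with locking) and then publishes a marker file on the shared filesystem, and workers wait for it before setting up their own manager with cache_lock=False. The sketch covers only the coordination; mark_ready, wait_until_ready and the marker file are hypothetical, and the setup calls are left as comments because their exact arguments depend on your project.

```python
import os
import time
from pathlib import Path


def mark_ready(marker) -> None:
    """Leader: publish the marker atomically once eager installation has finished."""
    marker = Path(marker)
    tmp = marker.with_name(marker.name + ".tmp")
    tmp.write_text(str(os.getpid()))
    os.replace(tmp, marker)  # workers never observe a half-written marker


def wait_until_ready(marker, timeout=3600.0, poll=1.0) -> None:
    """Worker: block until the leader has published the marker."""
    deadline = time.monotonic() + timeout
    while not Path(marker).exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"leader did not finish setup within {timeout}s")
        time.sleep(poll)


# Leader process (hypothetical wiring):
#   mgr = setup_resource_manager(source, ..., post_setup=install_everything)
#   mark_ready(MARKER)
# Worker processes (cache_lock assumed to be a setup_resource_manager argument):
#   wait_until_ready(MARKER)
#   mgr = setup_resource_manager(source, ..., cache_lock=False)
```

On networked filesystems, attribute caching can delay when workers see the marker, so a poll interval of a second or more is usually sufficient.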

License

Apache 2.0 — see LICENSE for details.

Download files

Source Distribution

molmospaces_resources-0.0.1b0.tar.gz (26.4 kB)

Uploaded Source

Built Distribution

molmospaces_resources-0.0.1b0-py3-none-any.whl (28.8 kB)

Uploaded Python 3

File details

Details for the file molmospaces_resources-0.0.1b0.tar.gz.

File metadata

  • Download URL: molmospaces_resources-0.0.1b0.tar.gz
  • Size: 26.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for molmospaces_resources-0.0.1b0.tar.gz
  • SHA256: 50292cf040d3cb1d2017842be4c527597980ff1ec7fb972016fd83b296705c5d
  • MD5: cdd11144c827b04d20703f3f6b6f5056
  • BLAKE2b-256: 067e4e095ade837e3e5261de7a569bd443596e299266d84ca10351341341f10a

File details

Details for the file molmospaces_resources-0.0.1b0-py3-none-any.whl.

File hashes

Hashes for molmospaces_resources-0.0.1b0-py3-none-any.whl
  • SHA256: d92e0bc55c73caa0d4c8b37a3845bd26fe2ee1169feead04ceb0792821ada87e
  • MD5: f2d13e0d8d611b9977a24c709ec7e19c
  • BLAKE2b-256: ee8d29e86c5efd2829af7c363a7c336094ac44467d7594206095ab0ccc97183a
