High-performance inverse kinematics solver optimized for cross-embodiment VLA/AI applications
EmbodiK: Cross-Embodiment Inverse Kinematics with Nanobind
EmbodiK is a high-performance inverse kinematics (IK) library for cross-embodiment VLA/AI applications.
- The core is implemented in C++, with Python bindings created using Nanobind.
- EmbodiK delivers robust and high-performance IK behaviors, particularly optimized for humanoid robots and AI/VLA integrations.
- The name "EmbodiK" highlights its focus on supporting various kinematic structures across different embodiment types.
- The library handles diverse constraint types, supporting both single-task and multi-task velocity IK solvers.
- Advanced inverse methods provide singularity-robustness.
- Features include self-collision avoidance and interactive 3D visualization tools.
Author: Andy Park andypark.purdue@gmail.com
Features
- High Performance: C++ core with optimized Eigen linear algebra
- Python Integration: Seamless numpy array support via Nanobind
- Multiple Solvers: Single-step and full multi-task velocity IK
- Singularity Robust: Advanced inverse methods for stable solutions
- Constraint Support: Joint limits and operational space constraints
- Lie-Group Integration: Manifold-aware integrate()/difference() for floating-base, quaternion, and continuous joints
- Joint Index Access: Per-joint config/velocity space indexing (idx_q, nq, idx_v, nv) without importing Pinocchio
- Inverse Dynamics: Gravity compensation, RNEA, mass matrix, and Coriolis via native C++ bindings
- Limit Recovery: Configurable joint limit recovery gain when outside bounds
- Collision Avoidance: Self-collision detection and avoidance
- Visualization: Interactive 3D visualization with Viser
- Robot Models: Built-in support for common robots (Panda, IIWA)
- GPU Acceleration: Batched velocity IK via CusADi for massive parallelism (roughly 2x to 20x faster than sequential CPU in the benchmarks reported in this README, increasing with batch size)
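As a rough illustration of what manifold-aware integrate()/difference() means for a continuous joint, the sketch below stores the joint as (cos θ, sin θ), the convention Pinocchio uses for unbounded revolute joints (two configuration values, one velocity value). The helper names are hypothetical and this is not EmbodiK's implementation.

```python
import math

def integrate_continuous(q, v):
    """Advance a (cos, sin) joint configuration by a tangent-space step v (radians)."""
    c, s = q
    cv, sv = math.cos(v), math.sin(v)
    # Rotating the unit vector keeps it on the circle; plain addition would not.
    return (c * cv - s * sv, s * cv + c * sv)

def difference_continuous(q0, q1):
    """Tangent-space step v such that integrate_continuous(q0, v) equals q1."""
    c0, s0 = q0
    c1, s1 = q1
    # Signed angle from q0 to q1, always in (-pi, pi].
    return math.atan2(c0 * s1 - s0 * c1, c0 * c1 + s0 * s1)

q_start = (math.cos(3.0), math.sin(3.0))
q_end = integrate_continuous(q_start, 0.5)     # crosses the +/-pi wrap-around
step = difference_continuous(q_start, q_end)   # recovers the 0.5 rad step
```

Subtracting raw angles across the wrap-around would give a step of nearly a full turn, which is why the difference is taken on the circle instead.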
Installation
Note (v0.4.0+): EmbodiK no longer requires the Python pin package at runtime. All Pinocchio functionality is exposed through native C++ bindings. This resolves numpy dependency conflicts when using EmbodiK alongside other packages.
Option A: Fresh Environment (No existing Pinocchio)
If you don't have Pinocchio/Boost installed locally, installation is straightforward:
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
# Install build dependencies (pin is needed for build only, not runtime)
pip install pin scikit-build-core nanobind cmake ninja
# Set CMAKE_PREFIX_PATH and install
export CMAKE_PREFIX_PATH=$(python -c "import pinocchio, pathlib; print(pathlib.Path(pinocchio.__file__).resolve().parents[4])")
pip install --no-build-isolation embodik
# Verify (no pin import needed!)
python -c "import embodik; print(embodik.__version__, embodik.RobotModel)"
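The CMAKE_PREFIX_PATH one-liner climbs four directory levels above the pinocchio package directory. The sketch below walks through that on a hypothetical install path; the cmeel.prefix layout and the Python version are assumptions, so check the real location in your own environment.

```python
from pathlib import PurePosixPath

# Hypothetical install location of the PyPI pin wheel inside a virtualenv.
# The real path depends on your Python version and environment.
pinocchio_init = PurePosixPath(
    ".venv/lib/python3.12/site-packages/cmeel.prefix/"
    "lib/python3.12/site-packages/pinocchio/__init__.py"
)

# parents[0] is the pinocchio package directory; each further index climbs one level.
for depth in range(5):
    print(depth, pinocchio_init.parents[depth])

# parents[4] is the install prefix that CMake needs to locate the Pinocchio config.
cmake_prefix = pinocchio_init.parents[4]
```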
macOS Apple Silicon note:
- Install Xcode command-line tools first: xcode-select --install
- Install Eigen (required by CMake find_package(Eigen3)): brew install eigen
- Before pip install, set export Eigen3_DIR="$(brew --prefix eigen)/share/eigen3/cmake" (the PyPI pin wheel may not ship Eigen’s CMake package).
- The same pin + CMAKE_PREFIX_PATH flow above works on macosx_arm64.
Option B: Robotics Environment (Existing Pinocchio/ROS)
If you have local Pinocchio/Boost builds (e.g., from source or ROS), you must clear conflicting paths first:
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
# IMPORTANT: Clear local Pinocchio paths to avoid library conflicts
unset LD_LIBRARY_PATH DYLD_LIBRARY_PATH CMAKE_PREFIX_PATH pinocchio_DIR
# Install build dependencies (pin is needed for build only, not runtime)
pip install pin scikit-build-core nanobind cmake ninja
# Set CMAKE_PREFIX_PATH to the PyPI pin package
export CMAKE_PREFIX_PATH=$(python -c "import pinocchio, pathlib; print(pathlib.Path(pinocchio.__file__).resolve().parents[4])")
# Install embodik
pip install --no-build-isolation embodik
# Verify (no pin import needed!)
python -c "import embodik; print(embodik.__version__, embodik.RobotModel)"
Running Examples
pip install "embodik[examples]"
embodik-examples --copy
cd embodik_examples
python 01_basic_ik_simple.py --robot panda
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| ImportError: libboost_*.so... | LD_LIBRARY_PATH points to local Pinocchio | unset LD_LIBRARY_PATH |
| ImportError: Library not loaded: @rpath/... | DYLD_LIBRARY_PATH points to a conflicting local Pinocchio/Boost on macOS | unset DYLD_LIBRARY_PATH |
| Could not find Eigen3 / Eigen3Config.cmake (macOS) | Eigen not installed or Eigen3_DIR unset | brew install eigen then export Eigen3_DIR="$(brew --prefix eigen)/share/eigen3/cmake" |
| CMake cannot find pinocchio | Build can't find Pinocchio config | Set CMAKE_PREFIX_PATH (see above) |
| Cannot import scikit_build_core | Missing build deps with --no-build-isolation | pip install scikit-build-core nanobind cmake ninja |
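Several of these errors trace back to environment variables left over from a local Pinocchio or ROS setup. A small check like the one below can flag them before you build. The helper name is hypothetical and it is not part of EmbodiK. CMAKE_PREFIX_PATH is left out on purpose because the install steps set it deliberately.

```python
import os

# Variables the installation steps clear before building.
SUSPECT_VARS = ("LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "pinocchio_DIR")

def find_conflicting_vars(env=None):
    """Return {name: value} for suspect variables that are set and non-empty."""
    env = os.environ if env is None else env
    return {name: env[name] for name in SUSPECT_VARS if env.get(name)}

if __name__ == "__main__":
    conflicts = find_conflicting_vars()
    for name, value in conflicts.items():
        print(f"{name} is set to {value!r}; consider `unset {name}` before installing")
    if not conflicts:
        print("No conflicting library paths detected")
```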
For Developers
See docs/installation.md for development setup with Pixi.
Optional: For Seer controller teleop (03_teleop_ik.py): pixi install -e teleop then pixi run -e teleop demo-teleop.
See PUBLISHING.md for wheel building and PyPI publishing.
Quick Start
import embodik
import numpy as np
# Load robot model from URDF
robot = embodik.RobotModel("path/to/robot.urdf", floating_base=False)
# Create kinematics solver
solver = embodik.KinematicsSolver(robot)
# Add a frame task for end-effector control
frame_task = solver.add_frame_task("ee_task", "end_effector")
frame_task.priority = 0
frame_task.weight = 1.0
# Set target velocity (6D: 3 linear + 3 angular)
target_velocity = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
frame_task.set_target_velocity(target_velocity)
# Solve velocity IK
q = np.zeros(robot.nq)
result = solver.solve_velocity(q, apply_limits=True)
if result.status == embodik.SolverStatus.SUCCESS:
    print(f"Joint velocities: {result.joint_velocities}")
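A velocity IK solve returns joint velocities for one instant, so reaching a pose means solving and integrating repeatedly. The sketch below shows that loop on a toy planar 2-link arm using only numpy and a plain pseudoinverse. It does not call EmbodiK, and the gain, time step, and arm geometry are illustrative assumptions.

```python
import numpy as np

def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm."""
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q, l1=1.0, l2=1.0):
    """2x2 position Jacobian of the same arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])

q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])
gain, dt = 5.0, 0.1   # illustrative values, not EmbodiK defaults

for _ in range(100):
    error = target - forward_kinematics(q)
    target_velocity = gain * error                       # desired task-space velocity
    dq = np.linalg.pinv(jacobian(q)) @ target_velocity   # velocity IK step
    q = q + dq * dt                                      # integrate joint velocities

final_error = np.linalg.norm(target - forward_kinematics(q))
```

With a real robot the pseudoinverse line would be replaced by the solver call, and the integration step would need to respect joint limits and any non-Euclidean joints.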
API Overview
Native Math Utilities
EmbodiK provides native bindings for rotation and pose math (no Python pin package needed):
import embodik as eik
import numpy as np
# Rotation matrix to axis-angle (replaces pin.log3)
R = np.eye(3)
omega = eik.log3(R) # Returns [0, 0, 0]
# Axis-angle to rotation matrix (replaces pin.exp3)
omega = np.array([0, 0, np.pi/4])
R = eik.exp3(omega)
# Rotation matrix to quaternion (wxyz format)
w, x, y, z = eik.matrix_to_quaternion_wxyz(R)
# Quaternion to rotation matrix
R = eik.quaternion_wxyz_to_matrix(w, x, y, z)
# Create SE3 transform
T = eik.Rt(R=R, t=np.array([1, 0, 0]))
# Collision distance (no pin needed)
robot = eik.RobotModel("robot.urdf")
q = np.zeros(robot.nq)  # example configuration; replace with your joint values
robot.update_configuration(q)
min_distance = robot.compute_min_collision_distance()
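For readers unfamiliar with the exponential and logarithm maps on rotations, the following numpy sketch shows the math that functions like exp3 and log3 conventionally compute (Rodrigues' formula and its inverse). The function names are hypothetical stand-ins and this is not EmbodiK's code.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix such that skew(w) @ x == np.cross(w, x)."""
    return np.array([
        [0.0, -w[2], w[1]],
        [w[2], 0.0, -w[0]],
        [-w[1], w[0], 0.0],
    ])

def exp3_sketch(omega):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(omega / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log3_sketch(R):
    """Rotation matrix -> axis-angle vector (valid away from theta = pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * vee

omega = np.array([0.0, 0.0, np.pi / 4])
R = exp3_sketch(omega)           # 45 degree rotation about z
omega_back = log3_sketch(R)      # round trip back to the axis-angle vector
```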
High-Level API (Recommended)
EmbodiK provides a high-level API built on top of Pinocchio for easy robot modeling and IK solving:
import embodik
import numpy as np
# Create robot model
robot = embodik.RobotModel("robot.urdf", floating_base=False)
# Create solver
solver = embodik.KinematicsSolver(robot)
# Add tasks
frame_task = solver.add_frame_task("task1", "end_effector")
posture_task = solver.add_posture_task("posture")
# Configure tasks
frame_task.priority = 0
frame_task.weight = 1.0
posture_task.priority = 1
posture_task.weight = 0.1
# Solve
q = np.zeros(robot.nq)
result = solver.solve_velocity(q, apply_limits=True)
Low-Level API
For advanced users, EmbodiK also provides low-level multi-task velocity IK functions:
import embodik as eik
import numpy as np
# Multiple tasks with constraints
goals = [np.array([0.1, -0.2]), np.array([0.3])]
jacobians = [
    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0]])
]
# Constraint matrix and limits
C = np.eye(3)
lower = np.array([-1e6, -1e6, -1e6])
upper = np.array([1e6, 1e6, 1e6])
params = {
    "epsilon": 1e-6,
    "regularization_factor": 1e-1,
}
result = eik.solve_velocity_ik_multi_task_np(
    goals, jacobians, C, lower, upper, params
)
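To show what task priority means in a multi-task velocity solve, the sketch below runs the same goals and Jacobians through a textbook null-space projection loop in plain numpy. It ignores the constraint matrix and limits and uses an undamped pseudoinverse, so it is only an unconstrained reference and not what solve_velocity_ik_multi_task_np does internally.

```python
import numpy as np

goals = [np.array([0.1, -0.2]), np.array([0.3])]
jacobians = [
    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0]]),
]

n_dof = jacobians[0].shape[1]
dq = np.zeros(n_dof)
N = np.eye(n_dof)   # null-space projector of all higher-priority tasks

for goal, J in zip(goals, jacobians):
    JN = J @ N                            # task Jacobian restricted to remaining freedom
    JN_pinv = np.linalg.pinv(JN)
    dq = dq + JN_pinv @ (goal - J @ dq)   # correct only what is still unmet
    N = N - JN_pinv @ JN                  # remove the directions this task now uses
```

Here the two tasks act on disjoint joints, so both goals are met exactly and no freedom remains afterwards. When tasks conflict, the lower-priority one is satisfied only within the null space of the higher-priority one.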
Examples
The repository includes several example scripts:
| Script | Description |
|---|---|
| 01_basic_ik_simple.py | Basic IK solving with interactive visualization |
| 02_collision_aware_IK.py | Collision-aware IK with self-collision avoidance + GPU benchmark panel |
| 04_gpu_batch_ik.py | GPU-accelerated batched velocity IK benchmark |
| 05_gpu_collision_batch.py | GPU-accelerated batch collision detection |
| 06_gpu_solver_demo.py | Comprehensive GPU solver demonstration and benchmark |
| 07_parallel_trajectory_tracking.py | 100 robots tracking different trajectories in parallel (GPU demo) |
| 08_com_constraint_example.py | CoM support-polygon constraint with Viser visualization |
| 09_dual_arm_ects.py | Dual-arm ECTS (Orthogonal + ECTS modes), collision avoidance, mode snap |
| robot_model_example.py | Robot model usage and configuration |
| visualization_example.py | Interactive 3D visualization examples |
| scripts/benchmark_fi_pesns.py | FI-PeSNS vs CPU accuracy and performance benchmark |
| scripts/benchmark_pph_sns_comparison.py | FI-PeSNS vs PPH-SNS solver comparison (CPU + GPU) |
| scripts/benchmark_pph_sns_batched.py | Batched GPU benchmark for both solvers |
Running Examples
For pip-installed users:
# Install with example dependencies
pip install "embodik[examples]"
# Copy examples to a local directory
embodik-examples --copy
# Run examples
cd embodik_examples
python 01_basic_ik_simple.py --robot panda
python 02_collision_aware_IK.py --robot panda
For developers (from repository):
# Install example dependencies
pixi run install
# Run basic IK example
pixi run python examples/01_basic_ik_simple.py
# Run collision-aware IK example
pixi run python examples/02_collision_aware_IK.py --robot panda
# Run GPU examples (requires cuda environment)
pixi run -e cuda demo-gpu # GPU solver benchmark
pixi run -e cuda demo-ik-gpu # Interactive IK with GPU panel
pixi run -e cuda benchmark-gpu # Batch IK benchmark
pixi run -e cuda benchmark-collision # Collision detection benchmark
See the Examples Documentation for detailed guides.
GPU Acceleration
Note: GPU solvers (FI-PeSNS, PPH-SNS) are experimental and require further validation. Use with caution in production systems.
EmbodiK supports GPU-accelerated batched velocity IK solving for massive parallelism, ideal for:
- RL Training: 4096+ parallel environments in Isaac Gym/Orbit
- Motion Planning: Batch trajectory validation
- Dataset Generation: Offline batch processing
Performance
| Batch Size | CPU Sequential | GPU Batched | Speedup | Per-Sample | Constraint Sat |
|---|---|---|---|---|---|
| 100 | 3.3 ms | 1.6 ms | 2x | 16 µs | 100% |
| 1,000 | 29 ms | 3.1 ms | 9x | 3 µs | 100% |
| 10,000 | 300 ms | 15 ms | 20x | 1.5 µs | 100% |
Benchmarks on NVIDIA RTX A2000 8GB. FI-PeSNS solver with k_max=12, 7-DOF robot, 6D task.
Key Results:
- ~670,000 IK solves/second at batch size 10,000
- 100% constraint satisfaction with zero violations
- Speedup scales with batch size due to GPU parallelism
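The derived columns follow directly from the measured times. The short script below recomputes speedup, per-sample time, and throughput from the table above; the function name is just for illustration.

```python
def gpu_metrics(batch, cpu_ms, gpu_ms):
    """Return (speedup, per-sample microseconds, solves per second) for one row."""
    speedup = cpu_ms / gpu_ms
    per_sample_us = gpu_ms * 1_000 / batch      # ms -> us, spread across the batch
    solves_per_sec = batch / (gpu_ms / 1_000)   # batch size divided by seconds
    return speedup, per_sample_us, solves_per_sec

# (batch size, CPU sequential ms, GPU batched ms) copied from the table above
rows = [(100, 3.3, 1.6), (1_000, 29.0, 3.1), (10_000, 300.0, 15.0)]

for batch, cpu_ms, gpu_ms in rows:
    speedup, per_sample_us, solves_per_sec = gpu_metrics(batch, cpu_ms, gpu_ms)
    print(f"{batch:>6}: {speedup:4.1f}x, {per_sample_us:5.2f} us/sample, "
          f"{solves_per_sec:,.0f} solves/s")
```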
Quick Start (GPU)
from embodik import solve_velocity_batched
# Batch of IK problems (e.g., 1000 parallel environments)
result = solve_velocity_batched(
    targets_batch,       # List of (task_dim,) arrays
    jacobians_batch,     # List of (task_dim, n_dof) arrays
    constraints_batch,   # List of (n_dof, n_dof) arrays
    lower_bounds_batch,
    upper_bounds_batch,
    use_gpu=True,
    casadi_path="path/to/fn_velocity_solve.casadi"
)
velocities = result.velocities # (batch_size, n_dof)
Setup
- Install CUDA environment:
  cd embodik
  pixi install -e cuda
  pixi run -e cuda install      # Install embodik in cuda env
  pixi run -e cuda check-cuda   # Verify PyTorch CUDA
- Install CusADi (one-time):
  pixi run -e cuda install-cusadi   # Clones to ~/.local/cusadi and installs
  pixi run -e cuda check-gpu        # Verify all GPU components
  # Output: CasADi: True, CusADi: True, CUDA: True
- Export and compile CasADi function:
  # Export symbolic function
  pixi run -e cuda export-casadi
  # Compile to CUDA kernel
  mv fn_velocity_solve.casadi ~/.local/cusadi/src/casadi_functions/
  cd ~/.local/cusadi
  python run_codegen.py --fn=fn_velocity_solve
- Run GPU demos:
  pixi run -e cuda demo-gpu              # Comprehensive benchmark
  pixi run -e cuda demo-ik-gpu           # Interactive IK with GPU panel
  pixi run -e cuda benchmark-gpu         # Batch IK benchmark
  pixi run -e cuda benchmark-collision   # Collision benchmark
Available GPU Tasks
| Task | Description |
|---|---|
| pixi run -e cuda check-cuda | Verify PyTorch CUDA availability |
| pixi run -e cuda check-gpu | Verify CasADi + CusADi + CUDA |
| pixi run -e cuda install-cusadi | Install CusADi from GitHub |
| pixi run -e cuda export-casadi | Export FI-PeSNS velocity solve function |
| pixi run -e cuda export-pph-sns | Export PPH-SNS velocity solve function |
| pixi run -e cuda benchmark-solver-comparison | Compare FI-PeSNS vs PPH-SNS (CPU + GPU) |
| pixi run -e cuda benchmark-solver-batched | Batched GPU benchmark for both solvers |
| pixi run -e cuda demo-gpu | Run GPU solver demo/benchmark |
| pixi run -e cuda demo-ik-gpu | Interactive IK with GPU benchmark panel |
| pixi run -e cuda benchmark-gpu | Batch IK performance benchmark |
| pixi run -e cuda benchmark-gpu-batched | GPU batched IK benchmark (100/1000/10000) |
| pixi run -e cuda benchmark-fi-pesns | FI-PeSNS vs CPU accuracy benchmark |
| pixi run -e cuda benchmark-collision | Collision detection benchmark |
| pixi run -e cuda demo-parallel-tracking | 100 robots tracking trajectories in parallel |
| pixi run -e cuda test-gpu | Run GPU-specific tests |
GPU Solvers: FI-PeSNS and PPH-SNS
EmbodiK provides two GPU-optimized velocity IK solvers, both suitable for CusADi compilation:
| Solver | Description | Best For |
|---|---|---|
| FI-PeSNS | Fixed-Iteration Penalized eSNS | Default choice, proven accuracy |
| PPH-SNS | Parallel Penalized Hierarchical SNS | Alternative with soft top-k violation selection |
Both achieve 100% constraint satisfaction with zero violations. FI-PeSNS is typically ~7% faster at large batch sizes; PPH-SNS offers a different formulation with limited rank-1 projector updates.
Benchmark (10,000 instances, 7-DOF Panda):
| Solver | Time | Throughput |
|---|---|---|
| FI-PeSNS | 14.8 ms | 675,000 solves/sec |
| PPH-SNS | 15.8 ms | 632,000 solves/sec |
# Compare both solvers
pixi run -e cuda benchmark-solver-comparison
pixi run -e cuda benchmark-solver-batched
FI-PeSNS: Fixed-Iteration Penalized eSNS
FI-PeSNS is the primary GPU solver. It is a variant of eSNS that trades exact constraint saturation for simpler, parallelizable penalty-based enforcement.
Key Features:
- SRINV: Singularity-Robust Inverse for numerical stability
- Analytical Scaling: Computes feasible task scales without iterative saturation
- Penalty Gradient: Nudges solution toward feasibility each iteration
- Fixed Iterations: Predictable compute time, ideal for real-time RL
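One common way to build a singularity-robust inverse is damped least squares. The numpy sketch below assumes a fixed damping value and a hypothetical function name; EmbodiK's SRINV may use a different or adaptive formulation.

```python
import numpy as np

def damped_pinv(J, damping=1e-2):
    """Damped least-squares inverse: J^T (J J^T + damping^2 I)^-1."""
    m = J.shape[0]
    regularized = J @ J.T + (damping ** 2) * np.eye(m)
    return J.T @ np.linalg.solve(regularized, np.eye(m))

# A nearly rank-deficient Jacobian: the two rows are almost parallel.
J = np.array([[1.0, 0.0, 0.0],
              [1.0, 1e-8, 0.0]])
target = np.array([0.1, 0.2])

dq_plain = np.linalg.pinv(J) @ target   # grows very large near the singularity
dq_damped = damped_pinv(J) @ target     # stays bounded

print(np.linalg.norm(dq_plain), np.linalg.norm(dq_damped))
```

The damped solution no longer meets the target exactly, which is the usual trade-off: a small task error in exchange for bounded joint velocities.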
Algorithm:
for i in range(k_max):
    P = I  # Reset projector
    for each task:
        J_pinv = srinv(J @ P)
        delta = J_pinv @ (target - J @ dq)
        scale = get_feasible_scale(...)
        dq += scale * delta
        P -= J_pinv @ J @ P
    # Penalty nudge toward feasibility
    violation = max(0, max(lower - C@dq, C@dq - upper))
    dq += eta * mu * C.T @ grad_violation
    mu *= gamma  # Ramp penalty
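The pseudocode leaves get_feasible_scale unspecified. A standard way to compute such a scale for box constraints is to find the largest step fraction that keeps every constraint inside its bounds, as sketched below. The function name and signature are assumptions and this is not necessarily how EmbodiK computes it.

```python
import numpy as np

def feasible_scale_sketch(C, dq, delta, lower, upper):
    """Largest s in [0, 1] with lower <= C @ (dq + s * delta) <= upper.

    Assumes the current dq already satisfies the bounds.
    """
    base = C @ dq      # constraint values before the step
    step = C @ delta   # how each constraint moves per unit of s
    scale = 1.0
    for b, d, lo, hi in zip(base, step, lower, upper):
        if d > 1e-12:       # moving toward the upper bound
            scale = min(scale, (hi - b) / d)
        elif d < -1e-12:    # moving toward the lower bound
            scale = min(scale, (lo - b) / d)
    return max(scale, 0.0)

C = np.eye(3)
lower = np.array([-1.0, -1.0, -1.0])
upper = np.array([1.0, 1.0, 1.0])
dq = np.zeros(3)
delta = np.array([2.0, 0.5, -4.0])   # would violate limits on joints 0 and 2

s = feasible_scale_sketch(C, dq, delta, lower, upper)
dq_scaled = dq + s * delta           # the most restrictive joint lands on its bound
```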
Benchmark (7-DOF Panda, 6D task):
| Mode | Batch | Time | Per-Sample | Max Violation | Constraint Sat |
|---|---|---|---|---|---|
| CPU Sequential | 100 | 3.3 ms | 33 µs | 0.0 | 100% |
| CPU Sequential | 1,000 | 29 ms | 29 µs | 0.0 | 100% |
| GPU Batched | 100 | 1.6 ms | 16 µs | 0.0 | 100% |
| GPU Batched | 1,000 | 3.1 ms | 3 µs | 0.0 | 100% |
| GPU Batched | 10,000 | 15 ms | 1.5 µs | 0.0 | 100% |
GPU benchmarks on NVIDIA RTX A2000 8GB with CusADi-compiled CUDA kernels.
PPH-SNS: Parallel Penalized Hierarchical SNS
PPH-SNS is an alternative GPU-native design with:
- Soft top-k violation selection using softmax weights
- Limited rank-1 projector updates (1–2 violators per iteration)
- Aggressive penalty ramping (γ=3.0)
- Fixed-depth unrolling for CusADi compilation
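Soft top-k selection replaces a hard choice of the worst violators with smooth weights, which keeps the computation branch-free. The sketch below is one plausible reading of that idea using a temperature-scaled softmax; the temperature value and function name are assumptions, since the exact PPH-SNS formulation is not given here.

```python
import numpy as np

def soft_topk_weights(violations, temperature=0.05):
    """Softmax weights that concentrate on the largest constraint violations."""
    v = np.asarray(violations, dtype=float)
    logits = (v - v.max()) / temperature   # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Per-constraint violation magnitudes (0 means the constraint is satisfied).
violations = np.array([0.0, 0.30, 0.02, 0.25, 0.0, 0.0, 0.0])
weights = soft_topk_weights(violations)
top_two = np.argsort(weights)[-2:][::-1]   # indices of the two dominant violators
```

Lowering the temperature pushes the weights toward a hard argmax, while raising it spreads them across more constraints.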
# Export PPH-SNS (writes to ~/.local/cusadi/src/casadi_functions/)
pixi run -e cuda export-pph-sns
# Compile to CUDA kernel
cd ~/.local/cusadi && python run_codegen.py --fn=fn_pph_sns_velocity_solve
from embodik.gpu.casadi_pph_sns import build_pph_sns_single_task
fn = build_pph_sns_single_task(
    n_dof=7, task_dim=6, n_constraints=7,
    k_max=14, m_max=2,  # Outer iterations, max saturations per iteration
)
velocity, scales = fn(target, jacobian.flatten(), C, lower, upper)
Parallel Trajectory Tracking Demo
Visualize GPU parallelization with 100 robot instances simultaneously tracking different trajectories:
# Run the interactive demo (requires viser)
pixi run -e cuda demo-parallel-tracking
# Run benchmark only (no visualization)
pixi run -e cuda demo-parallel-tracking-benchmark
Each robot tracks a unique trajectory (circles, figure-8s, spirals, hearts) while the GPU solver computes all 100 IK solutions in parallel. With GPU acceleration, this achieves ~50,000+ IK solves/second.
Usage:
from embodik.gpu.casadi_fi_pesns import build_fi_pesns_single_task
# Build solver
fn = build_fi_pesns_single_task(
    n_dof=7, task_dim=6, n_constraints=7,
    k_max=10,   # Fixed iterations
    mu0=1e-2,   # Initial penalty
    gamma=2.0,  # Penalty growth
    eta=0.1,    # Gradient step
)
# Solve
velocity, scales = fn(target, jacobian.flatten(), C, lower, upper)
Export for CusADi:
# Export FI-PeSNS for CusADi compilation
pixi run -e cuda python -m embodik.gpu.export_casadi_velocity_solve \
--robot panda --k_max 10 \
--out fn_velocity_solve.casadi
# Compile to CUDA kernel
mv fn_velocity_solve.casadi ~/.local/cusadi/src/casadi_functions/
cd ~/.local/cusadi && python run_codegen.py --fn=fn_velocity_solve
GPU Collision Detection (Experimental)
EmbodiK also supports GPU-accelerated collision detection via NVIDIA Warp:
from embodik.gpu.warp_collision import compute_collision_distances_batched
# Batch collision queries
result = compute_collision_distances_batched(
    robot_model,
    q_batch,  # (batch_size, n_dof) configurations
    use_gpu=True
)
distances = result.distances # (batch_size,) minimum distances
See docs/installation.md for detailed GPU setup instructions.
Testing
# Run all tests
pixi run test
# Run tests with verbose output
pixi run test-verbose
# Run tests with coverage
pixi run test-cov
Architecture
embodik/
├── cpp_core/ # C++ core implementation
│ ├── include/embodik/ # Header files
│ └── src/ # Implementation files
├── python_bindings/ # Nanobind C++ bindings
│ └── src/ # Binding code
├── python/embodik/ # Python package
│ ├── utils.py # Utility functions
│ └── visualization.py # Visualization support
├── examples/ # Example scripts
│ ├── 01_basic_ik_simple.py
│ ├── 02_collision_aware_IK.py
│ └── robot_models/ # Robot URDF files
├── docs/ # Documentation (MkDocs)
└── test/ # Test suite
Documentation
Full documentation is available at: https://robodreamer.github.io/embodik/
- Installation Guide - Detailed installation instructions
- Quickstart - Get started in 5 minutes
- GPU Solvers - FI-PeSNS and PPH-SNS GPU-accelerated solvers
- API Reference - Complete API documentation
- Examples - Example code and tutorials
- Development Guide - Contributing and development
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Key principles:
- Follow the existing code style
- Add tests for new functionality
- Ensure numerical accuracy and stability
- Update documentation for API changes
License
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Andy Park andypark.purdue@gmail.com
The MIT License is a permissive license that allows:
- Commercial use
- Modification
- Distribution
- Private use
It also limits the authors' liability, which makes it well suited to open-source projects that want to encourage widespread adoption and contribution.
Source Distribution
File details
Details for the file embodik-0.14.3.tar.gz.
File metadata
- Download URL: embodik-0.14.3.tar.gz
- Upload date:
- Size: 773.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a31383b980dd6b63b0a4c6fb40d85593b0b0a3ab6c1e7aaaf8bd12d20a6f1b55 |
| MD5 | 0572a10fd90feb1ab6c2f80ac4a29332 |
| BLAKE2b-256 | bd622adac40dbb17ad6e6d561d760d26acd6f031c0f0db08d11eadf585e2bd8c |