
MIKASA-Robo major update — now supports VLA research

We have released a major update of MIKASA-Robo, extending the benchmark to the Vision-Language-Action (VLA) setting. MIKASA-Robo-VLA preserves the original benchmark’s focus on memory-intensive tabletop manipulation, while broadening the task suite, introducing language-conditioned evaluation, and providing standardized data export for modern VLA training pipelines.

📚 Documentation: https://mikasarobo.github.io/
💻 Code: https://github.com/CognitiveAISystems/MIKASA-Robo

What changed from MIKASA-Robo (RL release)

  • The task set grows from 32 to 90 registered environments covering 10 memory types (vs. 4 in the RL release).
  • Every task ships a natural-language LANGUAGE_INSTRUCTION for VLA conditioning.
  • Episodes are grouped into three horizon splits (Short / Medium / Long) so multi-task training and evaluation are tractable.
  • 22,500 oracle trajectories (more than 6 million transitions), collected with PPO and motion-planning oracles, are released on Hugging Face in RLDS and LeRobotDataset v3 formats, so no further conversion is needed.
  • Dense and normalised-dense rewards are calibrated for every task, enabling both offline imitation learning and online RL.
  • The original 32-task RL implementation is available from the mikasa-robo-rl branch and remains under mikasa_robo_suite/rl/ for backwards compatibility.
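
To give a rough sense of the dataset's scale, the figures above can be turned into per-task and per-trajectory averages. This is a back-of-the-envelope sketch, not part of the MIKASA-Robo API: the function name `dataset_summary` is hypothetical, and the even split of trajectories across tasks is an assumption that the release notes do not state.

```python
# Figures quoted in the release notes.
NUM_TRAJECTORIES = 22_500
NUM_TASKS = 90
MIN_TRANSITIONS = 6_000_000  # "more than 6 million" is a lower bound


def dataset_summary(num_trajectories: int, num_tasks: int, min_transitions: int):
    """Return (trajectories per task, lower bound on mean trajectory length).

    ASSUMPTION: trajectories are split evenly across tasks. The release
    notes do not confirm this; the real per-task counts may differ.
    """
    per_task = num_trajectories / num_tasks
    mean_length = min_transitions / num_trajectories
    return per_task, mean_length


per_task, mean_length = dataset_summary(NUM_TRAJECTORIES, NUM_TASKS, MIN_TRANSITIONS)
print(f"Trajectories per task (if evenly split): {per_task:.0f}")
print(f"Mean trajectory length (at least): {mean_length:.0f} steps")
```

Because the transition count is quoted only as a lower bound, the mean length computed here is also a lower bound; it falls inside the 25–2160 step horizon range reported for the tasks.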

Pick your path

  • “I want to evaluate my VLA model” → Benchmarking (CLI, JSON output, Python API) and the canonical Evaluation Protocol.
  • “I want to fine-tune a VLA model” → Datasets (RLDS, LeRobotDataset v3) and Observation and Action Space.
  • “I want to explore tasks” → Environments & Tasks (per-task pages with previews, language instructions, horizons, and setup parameters).
  • “I want to know what makes the benchmark important” → Core Concepts (memory taxonomy, episode structure).

Key features

  • 90 memory tasks across 10 memory types, horizons 25–2160 steps, multiple difficulty levels.
  • The public benchmark grows from 32 (RL release) to 90 tasks with language instructions for every task.
  • Three horizon splits (Short / Medium / Long) for structured multi-task evaluation.
  • Trajectory collection via PPO oracles and motion planning.
  • 22,500 trajectories (>6 M timesteps) in RLDS and LeRobotDataset v3 formats on Hugging Face.
  • Physics fixes, dense / normalised-dense rewards, and full GPU-parallelised simulation via ManiSkill.