Dynamic Execution Horizon Prediction for Chunk-based Robot Policies

¹University of Toronto, ²Vector Institute, ³Acceleration Consortium, ⁴Georgia Institute of Technology, ⁵NVIDIA
* Equal contribution

DEHP learns when to commit to a chunk and when to replan, improving frozen chunk-based robot policies without changing their action generator.

Abstract

Action chunking has become a standard design in modern robot policies, where the policy predicts a sequence of actions and executes a fixed number of them before replanning. This fixed execution horizon creates a trade-off: long horizons improve smoothness and efficiency, while short horizons improve reactivity during fine-grained manipulation.

Dynamic Execution Horizon Prediction (DEHP) trains a lightweight execution-horizon head with online reinforcement learning while keeping the pretrained chunk policy completely frozen. The horizon head conditions on the current observation and the full predicted action chunk, then chooses how many actions to execute before the next policy query.

Across high-precision and long-horizon manipulation tasks, DEHP improves success over tuned fixed-horizon baselines. Qualitative analysis shows that it predicts shorter horizons during grasp alignment and insertion, and longer horizons during free-space motion.

Method

DEHP leaves the base action-chunking policy fixed. At each decision point, the base policy predicts an action chunk of length H, and DEHP samples an execution horizon h in {1, ..., H} from a categorical distribution. The robot executes only the first h actions of the chunk, discards the rest, and then queries the policy again.
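The decision loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `base_policy`, `horizon_head`, `env_step`, and `run_episode` are hypothetical names, the horizon head is assumed to return H unnormalized logits, and the environment is assumed to return an `(observation, done)` pair.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]


def sample_horizon(logits: Sequence[float], rng: random.Random) -> int:
    """Sample h in {1, ..., H} from a categorical distribution over H logits."""
    m = max(logits)
    # Softmax weights, shifted by the max logit for numerical stability.
    weights = [math.exp(l - m) for l in logits]
    idx = rng.choices(range(len(logits)), weights=weights, k=1)[0]
    return idx + 1  # horizons are 1-indexed


def run_episode(
    env_step: Callable[[Action], Tuple[object, bool]],      # hypothetical env interface
    base_policy: Callable[[object], List[Action]],          # frozen, returns a length-H chunk
    horizon_head: Callable[[object, List[Action]], Sequence[float]],  # returns H logits
    obs: object,
    max_steps: int,
    rng: random.Random,
) -> Tuple[List[int], int, int]:
    """Roll out one episode; return (chosen horizons, env steps, policy queries)."""
    horizons: List[int] = []
    steps, queries, done = 0, 0, False
    while not done and steps < max_steps:
        chunk = base_policy(obs)  # the frozen policy is only queried, never updated
        queries += 1
        # The head conditions on the observation and the full predicted chunk.
        h = sample_horizon(horizon_head(obs, chunk), rng)
        h = min(h, max_steps - steps)  # do not run past the step budget
        horizons.append(h)
        for action in chunk[:h]:  # execute the first h actions, discard the rest
            obs, done = env_step(action)
            steps += 1
            if done:
                break
    return horizons, steps, queries
```

A fixed execution horizon is the special case in which the head always puts all of its probability on a single h, so the same loop covers both inference modes.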

Inference comparison between fixed execution horizon and dynamic execution horizon.

Frozen Base Policy

The action generator is treated as a black box, making the method compatible with pretrained chunk policies such as Diffusion Policy.

Chunk-Level PPO

The horizon head is optimized from sparse task return using a semi-Markov chunk-level PPO formulation.
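One way to read "semi-Markov chunk-level" is that each executed chunk counts as a single decision whose duration is its horizon h, so rewards inside the chunk are discounted per step and the bootstrap value is discounted by gamma^h. The sketch below illustrates that reading together with the standard PPO clipped surrogate. The page does not give the exact formulation, so the function names, the use of GAE, and the choice to apply lambda once per chunk are all assumptions.

```python
from typing import List, Sequence


def chunk_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of the per-step rewards collected while executing one chunk."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def smdp_gae(
    chunk_rewards: Sequence[float],  # one discounted reward per executed chunk
    horizons: Sequence[int],         # number of actions executed in each chunk
    values: Sequence[float],         # K + 1 value estimates; last is the bootstrap (0.0 if terminal)
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Chunk-level advantages with duration-aware discounting (assumed variant of GAE)."""
    num_chunks = len(horizons)
    advantages = [0.0] * num_chunks
    running = 0.0
    for k in reversed(range(num_chunks)):
        discount = gamma ** horizons[k]  # a chunk of h steps discounts by gamma^h
        delta = chunk_rewards[k] + discount * values[k + 1] - values[k]
        running = delta + discount * lam * running  # lambda applied once per chunk (assumption)
        advantages[k] = running
    return advantages


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single horizon decision (to be maximized)."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

With a sparse task reward, most `chunk_rewards` entries are zero and the learning signal reaches earlier horizon choices only through the discounted recursion, which is why the per-chunk discount matters.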

Adaptive Replanning

Long horizons are used when motion is stable; short horizons are used when contact and alignment require frequent feedback.

Results

In multi-stage peg insertion, DEHP improves overall success by 23.03% over fixed-horizon behavior cloning, averaged across action-noise levels, with the largest gains on the tightest insertion stage. The adaptive horizon also improves performance as the amount of demonstration data increases.

Multi-stage insertion success rates under action noise and demonstration data scale.

Dynamic execution horizons also transfer beyond the multi-stage insertion setting. With the base policy kept frozen, DEHP improves over the best fixed-horizon baselines on long-horizon FurnitureBench assembly tasks and on the high-precision bimanual needle-syringe task, indicating that a learned replanning frequency helps both long-horizon task progress and contact-sensitive manipulation.

Learning curves across one-leg, round-table, and needle-syringe tasks.

Rollouts

These rollouts illustrate DEHP across fine-grained insertion and long-horizon assembly tasks. The learned execution horizon allows the robot to move efficiently through stable phases while replanning more frequently around contact-rich alignment and insertion.

Bimanual Needle-Syringe

Multi-Stage Insertion

One-Leg Assembly

Citation

@misc{zhao2026dynamicexecutionhorizonprediction,
      title={Dynamic Execution Horizon Prediction for Chunk-based Robot Policies}, 
      author={Yuchi Zhao and Miroslav Bogdanovic and Arjun Sohal and Liyu Tao and Kourosh Darvish and Alán Aspuru-Guzik and Florian Shkurti and Animesh Garg},
      year={2026},
      eprint={2606.11408},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.11408}, 
}