World Models: From Prediction to Planning, HWM and the Challenge of Long-Horizon Control

Introduction

A model's ability to predict does not equate to an ability to handle long-horizon tasks. In multi-stage control, systems typically face two pressures. First, prediction errors accumulate over long rollouts (sequential multi-step simulation), so the planned path drifts further and further from the target. Second, the action search space grows rapidly with the horizon (planning distance), so planning costs keep rising. HWM does not change how the underlying world model is learned; instead, it adds a hierarchical planning structure on top of an existing action-conditioned world model, so that the system first lays out a path of stages and then handles local actions.

From a technical perspective, V-JEPA 2 (https://ai.meta.com/research/vjepa/) focuses on world representation and basic prediction, HWM focuses on long-horizon planning, and WAV (World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry, https://arxiv.org/abs/2604.01985) focuses on a model's ability to identify and correct its own prediction distortions. These three lines of work are gradually converging. The focus of world model research has shifted from merely predicting the future to turning predictive capability into executable, correctable, and verifiable system capabilities.

1. Why Long-Horizon Control Remains a Bottleneck for World Models

The difficulty of long-horizon control is clearest in robot tasks. Take robotic arm manipulation: picking up a cup and placing it in a drawer is not a single action but a sequence of steps. The system must approach the object, adjust its posture, complete the grasp, move to the target location, and then handle the drawer and the placement. Once the chain becomes long, two problems appear together: prediction errors accumulate along the rollout, and the action search space expands rapidly.
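Both pressures can be put into rough numbers. The sketch below is illustrative only: the 2% per-step error, the assumption that errors compound multiplicatively, and the four actions per step are values chosen for the example, not figures from the HWM paper.

```python
def rollout_error(per_step_error: float, horizon: int) -> float:
    """Accumulated relative error if each step's error compounds multiplicatively."""
    return (1.0 + per_step_error) ** horizon - 1.0


def num_action_sequences(actions_per_step: int, horizon: int) -> int:
    """Number of distinct action sequences a flat planner could consider."""
    return actions_per_step ** horizon


# Assumed values: 2% error per step, 4 discrete actions per step.
for horizon in (5, 20, 50):
    print(
        horizon,
        round(rollout_error(0.02, horizon), 3),
        num_action_sequences(4, horizon),
    )
```

With four actions per step, a 5-step horizon already has 1024 candidate sequences, and the count multiplies by four with every additional step.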

What the system typically lacks is not local prediction ability, but the ability to organize distant goals into stage paths. Many actions, when viewed locally, may seem to deviate from the goal, but are actually intermediate steps required to complete it. For example, raising the arm before grasping, or moving back slightly and adjusting the angle before opening a drawer.

In demonstration tasks, world models already provide coherent predictions. In real control scenarios, however, performance declines. The pressure comes not only from the representation itself but also from a planning layer that is not yet mature.

2. How HWM Restructures the Planning Process

HWM splits the originally single-layer planning process into two layers. The upper layer is responsible for stage direction on a longer timescale, while the lower layer is responsible for local execution on a shorter timescale. The model plans not at a single rhythm but simultaneously at two different temporal rhythms.

When handling long tasks, single-layer methods typically have to search the entire action chain directly in the low-level action space. The longer the task, the higher the search cost, and the more easily prediction errors spread along multi-step rollouts. With HWM's split, the high level only selects a route on the longer timescale, and the low level only completes the actions in the current segment. The long task is broken into several shorter ones, which reduces planning complexity.
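The effect of splitting a long search into short segments can be sketched on a toy problem. Everything here is an assumption made for illustration: the one-dimensional state, the three-action set, the trivial `step` function standing in for a learned world model, and the evenly spaced subgoals standing in for a learned high-level planner. It is not HWM's actual planner.

```python
from itertools import product
from typing import List, Tuple

ACTIONS = (-1, 0, 1)  # hypothetical low-level action set


def step(state: int, action: int) -> int:
    """Toy stand-in for a learned action-conditioned world model."""
    return state + action


def search(start: int, goal: int, horizon: int) -> Tuple[List[int], int]:
    """Exhaustively try every action sequence of length `horizon`.

    Returns the sequence ending closest to `goal` and the number of
    world-model calls spent finding it.
    """
    best_seq, best_dist, calls = (), None, 0
    for seq in product(ACTIONS, repeat=horizon):
        state = start
        for action in seq:
            state = step(state, action)
            calls += 1
        dist = abs(goal - state)
        if best_dist is None or dist < best_dist:
            best_seq, best_dist = seq, dist
    return list(best_seq), calls


def hierarchical_plan(
    start: int, goal: int, horizon: int, segment: int
) -> Tuple[List[int], int]:
    """Pick one subgoal per segment, then search only within each segment."""
    n_segments = horizon // segment
    plan, calls, state = [], 0, start
    for i in range(1, n_segments + 1):
        # High level (assumption): evenly spaced subgoals stand in for a
        # learned high-level planner.
        subgoal = start + (goal - start) * i // n_segments
        seq, used = search(state, subgoal, segment)  # low level
        for action in seq:
            state = step(state, action)
        plan += seq
        calls += used
    return plan, calls


def final_state(start: int, plan: List[int]) -> int:
    """Roll a plan forward through the toy model and return the end state."""
    for action in plan:
        start = step(start, action)
    return start


flat_plan, flat_calls = search(0, 6, horizon=8)
hier_plan, hier_calls = hierarchical_plan(0, 6, horizon=8, segment=2)
print("flat:", final_state(0, flat_plan), "calls:", flat_calls)
print("hierarchical:", final_state(0, hier_plan), "calls:", hier_calls)
```

In this toy setting both planners reach the goal, but the flat search spends 3^8 × 8 = 52,488 model calls while the segmented search spends 4 × 3^2 × 2 = 72. The gap is exaggerated by the exhaustive search and says nothing about the savings measured in the paper; it only shows why shorter segments shrink the search.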

One design choice is key here: a high-level action is not simply the difference between two states. Instead, an encoder compresses a segment of low-level actions into a higher-level action representation. For long tasks, what matters is not just the difference between the start and end points but also how the intermediate steps are organized. If the high level only looked at displacement differences, it would easily lose the path information within that action chain.
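A small example shows what a pure state difference discards. The `encode_segment` function below is a hand-written stand-in for a learned encoder, and the `(dx, dz)` action format and the two segments are hypothetical; the point is only that two segments with the same net displacement can follow different paths.

```python
from typing import List, Tuple

Action = Tuple[int, int]  # hypothetical low-level action: (dx, dz) of the gripper


def state_difference(segment: List[Action]) -> Tuple[int, int]:
    """Net displacement only: what an end-minus-start representation keeps."""
    return (sum(dx for dx, _ in segment), sum(dz for _, dz in segment))


def encode_segment(segment: List[Action]) -> Tuple[int, int, int, int]:
    """Hand-written stand-in for a learned segment encoder.

    Keeps net displacement plus two path features: total path length
    and the peak height reached along the way.
    """
    net_dx, net_dz = state_difference(segment)
    path_length = sum(abs(dx) + abs(dz) for dx, dz in segment)
    height, peak = 0, 0
    for _, dz in segment:
        height += dz
        peak = max(peak, height)
    return (net_dx, net_dz, path_length, peak)


slide = [(1, 0), (1, 0)]                       # drag straight across the table
lift_over = [(0, 1), (1, 0), (1, 0), (0, -1)]  # raise, move, lower

# Same net displacement, so the state difference cannot tell them apart.
print(state_difference(slide) == state_difference(lift_over))
# The segment encoding differs because it sees the intermediate steps.
print(encode_segment(slide), encode_segment(lift_over))
```

A learned encoder would produce a dense vector rather than these hand-picked features, but the same property is what matters: the representation depends on the whole segment, not only on its endpoints.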

HWM embodies a hierarchical task organization approach. Faced with a multi-stage job, the system no longer unfolds all actions at once but first forms a coarser stage path, then executes and corrects segment by segment. Once this hierarchical relationship is incorporated into the world model, predictive capabilities begin to transform more stably into planning capabilities.

3. From 0% to 70%: What the Experimental Results Indicate

In the real-world grasp-and-place tasks set up in the paper, the system receives only the final goal condition, with no manually decomposed intermediate goals. Under these conditions, HWM achieved a 70% success rate, while the single-layer world model achieved 0%. A long task that was almost impossible to complete became one that succeeds most of the time once hierarchical planning was introduced.

The paper also tested simulation tasks such as object pushing and maze navigation. The results show that hierarchical planning not only improved success rates but also reduced computational costs during the planning phase. In some environments, planning phase computational costs could be reduced to about a quarter of the original while maintaining higher or comparable success rates.

4. From V-JEPA to HWM to WAV

V-JEPA 2 represents the world representation path. It was pre-trained on over 1 million hours of internet video and then post-trained (targeted training after pre-training) on fewer than 62 hours of robot video. The result is a latent action-conditioned world model (a world model that predicts in an abstract representation space incorporating action information) usable for understanding, prediction, and planning in the physical world. It demonstrates that models can acquire world representations through large-scale observation and transfer them to robot planning.

HWM is the next step. The model already has world representation and basic prediction capabilities, but once it enters multi-stage control, error accumulation and search space expansion become acute. HWM does not change the underlying representation learning path; instead, it adds a multi-timescale planning structure on top of existing action-conditioned world models. It addresses how the model organizes a distant goal into a set of intermediate steps and then advances segment by segment.

WAV shifts the focus further, to verification. For world models to be used in policy optimization and deployment, prediction alone is not enough; they must also identify where their predictions are prone to distortion and correct accordingly. WAV focuses on how the model checks itself.
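One generic way a model can check itself is a forward-inverse consistency test: predict the next state from an action, then ask an inverse model which action would explain that transition, and flag the prediction if the two disagree. The sketch below illustrates that general idea only. It is not the WAV method, whose details are not given here, and both models, the deliberate bias in `forward_model`, and the tolerance are hypothetical.

```python
def forward_model(state: float, action: float) -> float:
    """Hypothetical learned forward model, deliberately wrong for large actions."""
    if abs(action) <= 1.0:
        return state + action
    return state + 0.5 * action  # simulated distortion outside the familiar range


def inverse_model(state: float, next_state: float) -> float:
    """Hypothetical inverse model: infers the action that explains a transition."""
    return next_state - state


def is_suspicious(state: float, action: float, tolerance: float = 0.1) -> bool:
    """Flag a prediction when the recovered action disagrees with the one taken."""
    predicted = forward_model(state, action)
    recovered = inverse_model(state, predicted)
    return abs(recovered - action) > tolerance


print(is_suspicious(0.0, 0.5))  # small action: forward and inverse agree
print(is_suspicious(0.0, 2.0))  # large action: the forward prediction is flagged
```

A planner could use such a flag to avoid trusting rollouts through regions where the check fails, or to collect more data there.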

V-JEPA 2 leans towards world representation, HWM towards task planning, and WAV towards result verification. Although the three have different focuses, their general direction is consistent. The next phase for world models is no longer just internal prediction, but the gradual integration of prediction, planning, and verification into a cohesive system capability.

5. From Internal Prediction to Executable Systems

Much past world model work aimed at improving the continuity of future state predictions or the stability of internal world representations. The current research focus has begun to change. Systems must now form judgments about the environment, translate those judgments into actions, and keep correcting the next steps once results come in. To get closer to real-world deployment, they need to control error propagation, narrow the search scope, and reduce inference costs in long-horizon tasks.

Such changes will also affect AI agents. Many agent systems can already complete short-chain tasks, such as calling tools, reading files, and executing multi-step instructions. But once tasks become long-chain, multi-stage, and require mid-course replanning, performance declines. This is not fundamentally different from the difficulties in robot control; both stem from insufficient high-level path organization capability, leading to a disconnect between local execution and overall goals.

HWM's hierarchical approach makes the high level responsible for paths and stage goals and the low level responsible for local actions and feedback processing, combined with result verification. This type of hierarchical structure will continue to appear in more systems. The next phase for world models will no longer focus solely on predicting the future, but on organizing prediction, execution, and correction into a path that can actually be run.
