Posted 2026-02-05 · Updated 2026-03-01 · Note · 21 minutes read (About 3181 words)

Fusing Model Predictive Control (MPC) with Learning Methods

Overview

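Model Predictive Control (MPC) is an optimization-based control method: it predicts the system's future states and optimizes a sequence of control inputs to reach its objectives. Combining MPC with modern learning methods (vision-language-action (VLA) models, Diffusion Policy) pairs the perception capabilities of learned models with MPC's physical-feasibility guarantees, which hold with respect to the model and constraints the MPC is given.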

MPC Fundamentals

Core Idea

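MPC uses a receding-horizon strategy:

Measure the current state $x(t)$
Predict the state trajectory over the next $N$ steps
Solve an optimization problem for the control sequence
Apply only the first control input $u(t)$
Repeat the process at the next time step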
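The five steps can be sketched for a scalar linear system $x_{k+1} = a x_k + b u_k$. This is a minimal illustration under stated assumptions, not a production controller: without constraints, the $N$-step quadratic problem is solved exactly by a backward Riccati recursion, and the input bound is then imposed by clipping, which is a shortcut rather than a true constrained solve. With $a = 1$, any constant $x_{ref}$ is an equilibrium at $u = 0$, so the error $x - x_{ref}$ can be regulated directly. Function names and numbers are illustrative.

```python
def finite_horizon_gains(a, b, q, r, p, N):
    """Backward Riccati recursion for x_{k+1} = a x_k + b u_k with stage cost
    q e^2 + r u^2 and terminal cost p e^2, where e = x - x_ref.

    Without constraints this solves the N-step MPC problem exactly:
    the optimal inputs are u_k = -gains[k] * e_k.
    """
    P = p
    gains = [0.0] * N
    for k in reversed(range(N)):
        K = b * P * a / (r + b * P * b)
        P = q + a * P * a - a * P * b * K
        gains[k] = K
    return gains


def mpc_step(x, x_ref, a, b, q, r, p, N, u_max):
    """Steps 1-4: from the measured state, optimize over N steps, return only u_0."""
    K0 = finite_horizon_gains(a, b, q, r, p, N)[0]
    u0 = -K0 * (x - x_ref)
    # Simplification: clip the unconstrained optimum instead of solving a constrained QP
    return max(-u_max, min(u_max, u0))


def run_receding_horizon(x0, x_ref, steps=50, a=1.0, b=0.1, q=1.0, r=0.1,
                         p=10.0, N=20, u_max=2.0):
    x, history = x0, [x0]
    for _ in range(steps):
        u = mpc_step(x, x_ref, a, b, q, r, p, N, u_max)
        x = a * x + b * u   # the plant advances one step
        history.append(x)   # step 5: repeat from the newly measured state
    return history
```

Because the system is time-invariant the gains never change, so a real implementation would compute them once; recomputing them in every `mpc_step` simply mirrors the loop above.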

Mathematical Formulation

The discrete-time MPC optimization problem:

$$
\begin{aligned}
\min_{u_0, \ldots, u_{N-1}} \quad & \sum_{k=0}^{N-1} \left( \|x_k - x_{ref}\|_Q^2 + \|u_k\|_R^2 \right) + \|x_N - x_{ref}\|_P^2 \\
\text{subject to} \quad & x_{k+1} = f(x_k, u_k) \quad \text{(dynamics constraint)} \\
& x_k \in \mathcal{X} \quad \text{(state constraints)} \\
& u_k \in \mathcal{U} \quad \text{(input constraints)} \\
& x_0 = x(t) \quad \text{(initial condition)}
\end{aligned}
$$

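Notation:

$x_k$: state at step $k$
$u_k$: control input at step $k$
$N$: prediction horizon (number of steps)
$Q, R, P$: weight matrices for the state, input, and terminal costs
$\mathcal{X}, \mathcal{U}$: feasible sets for states and inputs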
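For linear dynamics $f(x_k, u_k) = A x_k + B u_k$ with the state and input constraints dropped, the problem above becomes an unconstrained quadratic program in the stacked inputs $U = (u_0, \ldots, u_{N-1})$. The NumPy sketch below, written under those simplifying assumptions, builds the prediction matrices ("condensing") and solves the normal equations directly. The $k = 0$ state term is omitted because $x_0 = x(t)$ is fixed and only adds a constant.

```python
import numpy as np

def condensed_mpc(A, B, Q, R, P, x0, x_ref, N):
    """Unconstrained linear MPC solved in closed form; returns u_0..u_{N-1} as (N, m)."""
    n, m = B.shape
    # Prediction: x_k = A^k x0 + sum_{j<k} A^(k-1-j) B u_j for k = 1..N
    Sx = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Su = np.zeros((N * n, N * m))
    for k in range(1, N + 1):
        for j in range(k):
            Su[(k - 1) * n:k * n, j * m:(j + 1) * m] = np.linalg.matrix_power(A, k - 1 - j) @ B
    # Weight Q on x_1..x_{N-1} and terminal weight P on x_N
    Qbar = np.kron(np.eye(N), Q)
    Qbar[-n:, -n:] = P
    Rbar = np.kron(np.eye(N), R)
    # Cost = (Sx x0 + Su U - Xref)^T Qbar (...) + U^T Rbar U; set its gradient to zero
    Xref = np.tile(x_ref, N)
    H = Su.T @ Qbar @ Su + Rbar
    g = Su.T @ Qbar @ (Sx @ x0 - Xref)
    U = np.linalg.solve(H, -g)
    return U.reshape(N, m)
```

Reintroducing $\mathcal{X}$ and $\mathcal{U}$ turns this into a constrained QP, which is where a solver such as OSQP comes in.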

MPC in Humanoid Robots

1. Gait-Planning MPC

Simplified model: the Linear Inverted Pendulum Model (LIPM)

$$
\ddot{x} = \frac{g}{h}(x - p)
$$

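$x$: horizontal position of the center of mass (CoM)
$h$: CoM height (held constant)
$p$: support point position (foot placement)
$g$: gravitational acceleration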
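For a fixed support point $p$ the LIPM equation has a closed-form solution, which is what makes it cheap enough to use inside a gait MPC. With $\omega = \sqrt{g/h}$, the CoM moves as $x(t) = p + (x_0 - p)\cosh(\omega t) + (\dot{x}_0/\omega)\sinh(\omega t)$. A minimal sketch (the function name and numbers are illustrative):

```python
import math

def lipm_rollout(x0, v0, p, h, t, g=9.81):
    """Closed-form LIPM state at time t for a support point p held fixed."""
    w = math.sqrt(g / h)  # omega^2 = g / h
    x = p + (x0 - p) * math.cosh(w * t) + (v0 / w) * math.sinh(w * t)
    v = (x0 - p) * w * math.sinh(w * t) + v0 * math.cosh(w * t)
    return x, v

# CoM 5 cm ahead of the support point, at rest, 0.9 m high: it accelerates away
print(lipm_rollout(0.05, 0.0, 0.0, 0.9, 0.3))
```

Any offset between the CoM and $p$ grows roughly like $e^{\omega t}$, which is why the controller must keep choosing new foot placements rather than rely on a single one.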

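Control variables:

Foot placement positions
Contact force distribution

State variables:

CoM position and velocity
Robot orientation

Objectives:

Track a desired velocity
Maintain balance and stability

Prediction horizon: typically 0.5-2 s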

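2. Contact-Force Optimization MPC

A typical formulation, of the kind used on humanoids such as Boston Dynamics' Atlas (written here for a single time step; an MPC version repeats it over the horizon):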

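# Optimization problem (pseudocode)
minimize    sum_i ||f_i - f_i_desired||^2   # deviation from the desired contact forces
subject to:
    # Centroidal dynamics (g: gravity vector; r_i: contact point relative to the CoM;
    # the angular equation neglects the gyroscopic term)
    m * a_com = sum_i f_i + m * g
    I * alpha = sum_i (r_i × f_i)

    # Friction cone (often linearized to a pyramid to keep the problem a QP)
    ||f_i_tangent|| <= mu * f_i_normal

    # Unilateral contact: contacts can push but not pull
    f_i_normal >= 0

    # ZMP constraint
    ZMP in support_polygon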
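The equality part of this problem has a closed-form solution: minimizing $\sum_i \|f_i - f_i^{des}\|^2$ subject to linear constraints $G f = b$ is a Euclidean projection of the desired forces onto an affine set. The NumPy sketch below does only that and checks the friction cone afterwards; it does not enforce the inequality constraints (friction cone, unilateral contact, ZMP), which in practice requires a QP solver such as OSQP. Contact points, mass, and the convention that $z$ is the contact normal are illustrative assumptions.

```python
import numpy as np

def skew(r):
    """Cross-product matrix: skew(r) @ f == np.cross(r, f)."""
    return np.array([[0.0, -r[2], r[1]], [r[2], 0.0, -r[0]], [-r[1], r[0], 0.0]])

def distribute_contact_forces(m, a_com, I_alpha, r_list, f_desired,
                              g=np.array([0.0, 0.0, -9.81])):
    """Forces closest to f_desired that satisfy both dynamics equalities exactly."""
    k = len(r_list)
    # Stack G f = b:  sum_i f_i = m (a_com - g)  and  sum_i r_i x f_i = I alpha
    G = np.vstack([np.hstack([np.eye(3)] * k), np.hstack([skew(r) for r in r_list])])
    b = np.concatenate([m * (a_com - g), I_alpha])
    f_d = np.concatenate(f_desired)
    # Projection of f_d onto the affine set {f : G f = b}
    f = f_d - np.linalg.pinv(G) @ (G @ f_d - b)
    return f.reshape(k, 3)

def friction_ok(f, mu):
    """Check ||f_t|| <= mu * f_n for every contact (this also implies f_n >= 0)."""
    return bool(np.all(np.linalg.norm(f[:, :2], axis=1) <= mu * f[:, 2]))
```

With four contacts at the corners of two feet, the six equality constraints are independent, so the projection always succeeds; with only two point contacts some accelerations cannot be produced at all, and the least-squares result would then only approximate them.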

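3. Whole-Body MPC

Optimizes the CoM trajectory and the joint motions together
Accounts for dynamic coupling
Handles multi-contact scenarios (two feet, or hands and feet together)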

Architectures for Fusing MPC with Learning

Architecture 1: Hierarchical Control

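┌───────────────────────────────────────┐
│  VLA / Diffusion Policy (high level)  │
│  - Visual perception                  │
│  - Language understanding             │
│  - Task planning                      │
└────────────┬──────────────────────────┘
             │ Output: reference trajectory / goal
             ↓
┌───────────────────────────────────────┐
│            MPC (low level)            │
│  - Trajectory tracking                │
│  - Constraint satisfaction            │
│  - Stability guarantees               │
└────────────┬──────────────────────────┘
             │ Output: joint torques
             ↓
┌───────────────────────────────────────┐
│            Robot execution            │
└───────────────────────────────────────┘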

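Advantages:

The high level handles perception and decision-making
The low level enforces physical feasibility
Complexity is decoupled between the two levels

Diffusion Policy → MPC Integration

Option A: Trajectory-Level Integration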

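class DiffusionMPCController:
    # Sketch: DiffusionPolicy, MPCController and the helper methods are hypothetical placeholders
    def __init__(self):
        self.diffusion_policy = DiffusionPolicy()
        self.mpc = MPCController(horizon=20, dt=0.05)

    def control_step(self, observation):
        # The Diffusion Policy samples candidate action sequences
        candidates = self.diffusion_policy.predict(
            observation,
            horizon=16,     # predict 16 steps ahead
            num_samples=10  # sample 10 trajectories
        )

        # Select one candidate (optional: a single sample also works)
        best_trajectory = self.select_best(candidates)

        # Convert to an MPC state reference. The policy predicts 16 steps but the
        # MPC horizon is 20, so pad the reference (and resample it if the policy's
        # step size differs from the MPC's dt = 0.05 s)
        x_ref = self.action_to_state(best_trajectory)
        x_ref = self.pad_to_horizon(x_ref, 20)

        # MPC tracks the reference
        u_opt = self.mpc.solve(
            x_current=self.get_state(),
            x_ref=x_ref,
            constraints={
                'joint_limits': True,
                'friction_cone': True,
                'stability': True
            }
        )

        return u_opt[0]  # apply only the first step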

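Key points:

The Diffusion Policy provides longer-horizon planning
MPC provides precise short-horizon control
Asynchronous updates: the diffusion policy can run at a lower rate than the MPC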
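The helper methods in the class above are left abstract. One possible way to implement `select_best`, plus padding the 16-step plan to the MPC's 20-step horizon, is sketched below, assuming the policy returns a NumPy array of shape `(num_samples, horizon, action_dim)`. The scoring rule (smoothness plus optional consistency with the previous plan) is an illustrative choice, not something Diffusion Policy prescribes.

```python
import numpy as np

def select_best(candidates, prev_plan=None):
    """Pick one of K sampled action sequences, shape (K, H, d), by a heuristic score."""
    # Smoothness: sum of squared second differences along time
    score = np.sum(np.diff(candidates, n=2, axis=1) ** 2, axis=(1, 2))
    if prev_plan is not None:
        # Temporal consistency: prefer samples close to the previously chosen plan
        score = score + np.sum((candidates - prev_plan) ** 2, axis=(1, 2))
    return candidates[np.argmin(score)]

def pad_to_horizon(traj, N):
    """Extend an (H, d) reference to N rows by holding its last row (or truncate)."""
    if len(traj) >= N:
        return traj[:N]
    return np.concatenate([traj, np.repeat(traj[-1:], N - len(traj), axis=0)])
```

Holding the last target is the simplest padding; extrapolating the final velocity is an alternative when the plan ends mid-motion.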

Option B: Goal-Level Integration

# Diffusion Policy outputs a high-level goal (predict_goal and the mpc methods are hypothetical)
goal = diffusion_policy.predict_goal(observation)
# e.g. {"grasp_position": [x, y, z],
#       "contact_points": [...],
#       "gait_parameters": {...}}

# MPC uses it as a terminal constraint
mpc.set_terminal_constraint(goal)
u_opt = mpc.solve(x_current)

Advantages:

A higher level of abstraction
MPC has more freedom in its optimization
Lower output dimensionality for the Diffusion Policy
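A goal-level interface leaves the MPC free to choose how to reach the goal. In the simplest linear case this can be made concrete: with $x_{k+1} = A x_k + B u_k$ and a hard terminal constraint $x_N = x_{goal}$, the minimum-energy input sequence is the least-norm solution of one linear system. The NumPy sketch below assumes the system is controllable in $N$ steps and ignores all other constraints; the function name is illustrative.

```python
import numpy as np

def reach_goal_min_energy(A, B, x0, x_goal, N):
    """Minimum-norm inputs u_0..u_{N-1} satisfying the terminal constraint x_N = x_goal."""
    n, m = B.shape
    # x_N = A^N x0 + sum_j A^(N-1-j) B u_j = A^N x0 + C @ U
    C = np.hstack([np.linalg.matrix_power(A, N - 1 - j) @ B for j in range(N)])
    # Least-norm U with C @ U = x_goal - A^N x0; intermediate states stay free
    U = np.linalg.pinv(C) @ (x_goal - np.linalg.matrix_power(A, N) @ x0)
    return U.reshape(N, m)
```

Only the endpoint is pinned down, so the optimizer decides the path; a trajectory-level interface would instead fix every intermediate state.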

VLA → MPC Integration

Typical Pipeline

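┌────────────────────────────────────────────┐
│  Visual input + language command           │
│  "Pick up the red cup"                     │
└──────────┬─────────────────────────────────┘
           │
           ↓
┌────────────────────────────────────────────┐
│  VLA model (e.g. RT-2, OpenVLA)            │
│  - Vision encoder                          │
│  - Language encoder                        │
│  - Action decoder                          │
└──────────┬─────────────────────────────────┘
           │ Output
           ↓
┌────────────────────────────────────────────┐
│  - Target end-effector pose                │
│  - Desired contact-force direction         │
│  - Gait parameters                         │
└──────────┬─────────────────────────────────┘
           │
           ↓
┌────────────────────────────────────────────┐
│  MPC optimizer                             │
│  Objective: track the VLA output           │
│  Constraints: dynamics, stability, limits  │
└──────────┬─────────────────────────────────┘
           │
           ↓
┌────────────────────────────────────────────┐
│  Safe control commands                     │
└────────────────────────────────────────────┘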

Code Example

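class VLAMPCController:
    # Sketch: VLAModel, MPCController and inverse_kinematics are hypothetical placeholders
    def __init__(self):
        self.vla = VLAModel()  # e.g. RT-2, OpenVLA
        self.mpc = MPCController(horizon=20, dt=0.05)
        self.control_freq = 100  # control-loop rate (Hz), assumed
        self.update_freq = 10    # VLA update rate (Hz)
        self.frame = 0           # number of control steps taken so far

    def control_step(self, obs, language_command):
        # High-level policy update (slow): once every control_freq // update_freq steps.
        # A synchronous call like this blocks the fast loop; see asynchronous execution below
        if self.frame % (self.control_freq // self.update_freq) == 0:
            # VLA prediction
            vla_output = self.vla.predict(
                image=obs['camera'],
                language=language_command,
                proprioception=obs['joint_states']
            )

            # Convert the VLA output into an MPC reference
            reference_traj = self.parse_vla_output(vla_output)
            self.mpc.set_reference(reference_traj)
        self.frame += 1

        # Low-level optimization (fast): runs every control step
        current_state = obs['robot_state']
        u_opt = self.mpc.solve(
            current_state,
            constraints={
                'friction_cone': True,
                'joint_limits': True,
                'stability': True,
                'collision_avoidance': True
            }
        )

        return u_opt[0]  # apply only the first input (receding horizon)

    def parse_vla_output(self, vla_output):
        """Convert the VLA output into an MPC reference trajectory."""
        # Example: the VLA outputs a target end-effector pose
        ee_pose = vla_output['end_effector_pose']

        # Use inverse kinematics to generate a joint trajectory
        joint_traj = self.inverse_kinematics(ee_pose)

        return joint_traj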
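`parse_vla_output` relies on `self.inverse_kinematics`, which the example leaves abstract. For a planar two-link arm, a textbook simplification assumed here purely for illustration, IK has a closed form; the sketch then interpolates linearly in joint space to produce a reference the MPC can track. Link lengths, the elbow choice, the interpolation scheme, and both function names are assumptions; a humanoid arm would need a numerical IK solver.

```python
import math

def ik_2link(x, y, l1, l2, elbow=1):
    """Closed-form IK for a planar 2-link arm; returns (q1, q2) in radians or None.

    elbow = +1 or -1 selects which of the two mirror-image solutions is returned.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        return None  # target outside the reachable annulus
    q2 = elbow * math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2

def ee_target_to_joint_traj(q_start, ee_goal, l1, l2, n_steps):
    """Hypothetical parse step: IK on the VLA's end-effector target, then joint-space interpolation."""
    q_goal = ik_2link(ee_goal[0], ee_goal[1], l1, l2)
    if q_goal is None:
        raise ValueError("VLA target is outside the arm's workspace")
    return [
        tuple(qs + (qg - qs) * k / (n_steps - 1) for qs, qg in zip(q_start, q_goal))
        for k in range(n_steps)
    ]
```

Rejecting unreachable targets at this stage keeps an infeasible VLA output from ever reaching the MPC.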

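Technical Challenges and Solutions

1. Time-Scale Mismatch

Problem (typical latencies; they depend heavily on model size and hardware):

Diffusion Policy generation: 50-200 ms
VLA inference: 100-500 ms
MPC solve: 1-10 ms

Solutions:

Asynchronous execution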

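import threading
import time

class AsyncController:
    # Sketch: diffusion_policy and mpc are hypothetical placeholder objects
    def __init__(self, diffusion_policy, mpc):
        self.diffusion_policy = diffusion_policy
        self.mpc = mpc
        self.obs = None
        self.reference_traj = None
        self.lock = threading.Lock()
        self.stop_event = threading.Event()

        # Start the high-level policy thread (daemon, so it cannot block shutdown)
        self.policy_thread = threading.Thread(
            target=self.update_policy_loop, daemon=True
        )
        self.policy_thread.start()

    def update_policy_loop(self):
        while not self.stop_event.is_set():
            with self.lock:
                obs = self.obs
            if obs is not None:
                # Slow inference runs outside the lock so control_step never waits on it
                new_traj = self.diffusion_policy.predict(obs)
                with self.lock:
                    self.reference_traj = new_traj
            time.sleep(0.1)  # at most ~10 Hz (inference time adds to the period)

    def control_step(self, obs, state):
        with self.lock:
            self.obs = obs
            ref = self.reference_traj

        if ref is None:
            return None  # no reference yet: the caller should hold a safe default
        # MPC uses the cached reference trajectory
        return self.mpc.solve(state, ref)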

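Predicting a longer horizon

The Diffusion Policy predicts 2-5 s ahead
The MPC consumes the predicted trajectory step by step
The cost of each policy call is amortized over many control steps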
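Consuming a long plan means the MPC reads a window of it at its own, faster rate. A possible NumPy sketch, assuming the policy returns an `(H, d)` array of targets spaced `dt_plan` apart starting at time `t_plan`, interpolates each dimension onto the MPC's time grid; all names are illustrative.

```python
import numpy as np

def reference_window(traj, t_plan, dt_plan, t_now, N, dt_mpc):
    """Cut an (N, d) MPC reference for times t_now + k * dt_mpc, k = 1..N, out of a long plan."""
    t_traj = t_plan + dt_plan * np.arange(len(traj))
    t_query = t_now + dt_mpc * np.arange(1, N + 1)
    # Linear interpolation per dimension; np.interp holds the end values past the plan
    return np.stack([np.interp(t_query, t_traj, traj[:, j]) for j in range(traj.shape[1])], axis=1)
```

Because the window is indexed by wall-clock time, the MPC keeps advancing along a stale plan between policy updates instead of restarting it from the beginning.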

Fast sampling

Use DDIM (Denoising Diffusion Implicit Models)
Reduce the number of denoising steps: 50 → 10
Trade a small loss in sample quality for speed
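DDIM's speed-up comes from visiting only a subsequence of the training timesteps with a deterministic update. A NumPy sketch of that update (with $\eta = 0$) is below; `eps_model` stands for the trained noise-prediction network and is an assumed interface. In practice, the `diffusers` library's `DDIMScheduler` provides this sampler, with `set_timesteps(num_inference_steps)` selecting the step count.

```python
import numpy as np

def ddim_sample(eps_model, x_T, alphas_cumprod, num_steps):
    """Deterministic DDIM (eta = 0) over num_steps of the T training timesteps."""
    T = len(alphas_cumprod)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = eps_model(x, t)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)      # predicted clean sample
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps  # jump to the earlier step
    return x
```

Each skipped timestep saves one network evaluation, so going from 50 to 10 steps cuts inference cost about fivefold; the quality loss comes from the larger jumps, which a perfect noise predictor would not suffer.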

2. Feasibility Guarantees

Problem: the outputs of a learned model may violate physical constraints

Solution A: MPC as a Projection Operator

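import cvxpy as cp

def project_to_feasible(x_ref_infeasible, x0, A, B, x_min, x_max, u_max):
    """Project an infeasible (N+1, n) reference onto the feasible set (linear-model sketch).

    Assumes dynamics x_{k+1} = A x_k + B u_k; box bounds stand in for joint limits,
    and task-specific stability constraints are omitted.
    """
    N = x_ref_infeasible.shape[0] - 1
    n, m = B.shape
    x, u = cp.Variable((N + 1, n)), cp.Variable((N, m))
    constraints = [x[0] == x0, cp.abs(u) <= u_max]
    for k in range(N):
        constraints += [x[k + 1] == A @ x[k] + B @ u[k]]       # dynamics
        constraints += [x[k + 1] >= x_min, x[k + 1] <= x_max]  # joint limits
    cp.Problem(cp.Minimize(cp.sum_squares(x - x_ref_infeasible)), constraints).solve()
    return x.value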
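Ignoring the bounds, projecting onto dynamically feasible trajectories has a closed form: the closest trajectory that a linear model $x_{k+1} = A x_k + B u_k$ can produce from $x_0$ is a linear least-squares fit over the input sequence. A NumPy sketch under that linear-model assumption, with the reference rows taken as targets for $x_1, \ldots, x_N$ (the function name is illustrative):

```python
import numpy as np

def project_to_dynamics(A, B, x0, x_ref):
    """Closest trajectory to x_ref (N, n) that x_{k+1} = A x_k + B u_k can produce from x0."""
    N, n = x_ref.shape
    m = B.shape[1]
    # Stacked prediction X = Sx x0 + Su U, as in condensed MPC
    Sx = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Su = np.zeros((N * n, N * m))
    for k in range(1, N + 1):
        for j in range(k):
            Su[(k - 1) * n:k * n, j * m:(j + 1) * m] = np.linalg.matrix_power(A, k - 1 - j) @ B
    # Least squares over the inputs; the result is dynamically consistent by construction
    U, *_ = np.linalg.lstsq(Su, x_ref.reshape(-1) - Sx @ x0, rcond=None)
    X = (Sx @ x0 + Su @ U).reshape(N, n)
    return X, U.reshape(N, m)
```

Joint limits and other inequality bounds turn this into a QP and need a solver.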

Solution B: Soft Constraints

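$$
\min_{u_0, \ldots, u_{N-1}} \quad \sum_{k=0}^{N-1} \left( \|x_k - x_{ref}\|_Q^2 + \|u_k\|_R^2 \right) + \|x_N - x_{ref}\|_P^2 + \lambda \sum_{k=0}^{N-1} \left\| \max\big(0,\ g(x_k, u_k)\big) \right\|^2
$$

where the constraints $x_k \in \mathcal{X}$, $u_k \in \mathcal{U}$ are written as $g(x_k, u_k) \le 0$, so the penalty vanishes whenever they hold.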

Constraints may be violated slightly
The weight $\lambda$ balances tracking against feasibility
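A one-dimensional case shows how $\lambda$ trades tracking against feasibility. Minimizing $(u - u_{ref})^2 + \lambda \max(0, u - u_{max})^2$ with an infeasible reference $u_{ref} > u_{max}$ gives $u^* = (u_{ref} + \lambda u_{max}) / (1 + \lambda)$: the bound is violated less as $\lambda$ grows but is never met exactly for finite $\lambda$. A small Python sketch (the name and values are illustrative):

```python
def soft_constrained_input(u_ref, u_max, lam):
    """argmin_u (u - u_ref)^2 + lam * max(0, u - u_max)^2, in closed form."""
    if u_ref <= u_max:
        return u_ref  # feasible reference: the penalty is inactive
    # Setting 2 (u - u_ref) + 2 lam (u - u_max) = 0 gives a weighted average
    return (u_ref + lam * u_max) / (1.0 + lam)

for lam in (0.0, 1.0, 9.0, 99.0):
    print(lam, soft_constrained_input(2.0, 1.0, lam))
```

With $u_{ref} = 2$ and $u_{max} = 1$, $\lambda = 0, 1, 9, 99$ gives $u^* = 2, 1.5, 1.1, 1.01$.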

Solution C: Constraint-Aware Training

# Add physics-constraint penalties to the training loss (all terms are placeholders)
loss = reconstruction_loss + \
       lambda_dynamics * dynamics_violation + \
       lambda_stability * stability_violation
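The loss above is written with placeholder terms. A concrete NumPy version, assuming a known linear model $x_{k+1} = A x_k + B u_k$ for the dynamics residual and using a joint-limit hinge as a stand-in for the stability term (both assumptions), could look like this; in training, the same expression would be written in an autodiff framework so the penalties shape the gradients.

```python
import numpy as np

def constraint_aware_loss(x_pred, u_pred, x_demo, A, B, q_max, lam_dyn=1.0, lam_lim=1.0):
    """Reconstruction loss plus physics penalties for a predicted trajectory.

    x_pred: (T+1, n) predicted states, u_pred: (T, m) predicted inputs,
    x_demo: (T+1, n) demonstration states.
    """
    reconstruction = np.mean((x_pred - x_demo) ** 2)
    # Dynamics violation: how far consecutive predicted states are from the model
    residual = x_pred[1:] - (x_pred[:-1] @ A.T + u_pred @ B.T)
    dynamics_violation = np.mean(residual ** 2)
    # Stand-in for the stability term: squared hinge on |x| <= q_max
    limit_violation = np.mean(np.maximum(0.0, np.abs(x_pred) - q_max) ** 2)
    return reconstruction + lam_dyn * dynamics_violation + lam_lim * limit_violation
```

A prediction that matches the demonstration, follows the model, and stays within limits scores zero, so the penalties only act on physically inconsistent outputs.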

3. Closing the Feedback Loop

Problem: the learned model needs to know what the MPC actually achieved

Solution:

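class ClosedLoopController:
    # Sketch: policy, mpc and mpc.last_cost are hypothetical placeholders
    def __init__(self, policy, mpc, x_ref_init):
        self.policy = policy
        self.mpc = mpc
        self.x_ref = x_ref_init  # reference the MPC is currently tracking

    def control_step(self, obs):
        # Tracking error: commanded reference vs. the state actually reached
        x_actual = obs['robot_state']
        tracking_error = self.x_ref - x_actual

        # Feed the error back as part of the observation
        augmented_obs = {
            'vision': obs['camera'],
            'proprioception': obs['joint_states'],
            'tracking_error': tracking_error,  # new
            'mpc_cost': self.mpc.last_cost     # new
        }

        # The policy adjusts its output based on this feedback
        new_ref = self.policy.predict(augmented_obs)
        self.x_ref = new_ref  # remember what was commanded for the next step

        return new_ref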

Case Studies

MIT Cheetah 3

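Architecture:

Convolutional network (terrain perception, which the blind-locomotion results below do not use)
    ↓
MPC (ground reaction forces over the horizon; footholds from a heuristic)
    ↓
WBC (whole-body control)

Results: blind locomotion, running and jumping, stair climbing

Tesla Optimus (speculative)

Architecture:

Neural-network policy (trained on teleoperation data)
    ↓ outputs desired joint positions/velocities
Whole-body controller (MPC-like)
    ↓ accounts for torque limits and balance constraints
Execution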

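ETH Zurich's Work

Paper: "Learning Agile and Dynamic Motor Skills for Legged Robots" (Hwangbo et al., Science Robotics, 2019)

Method:

A reinforcement-learning policy trained in simulation outputs joint position targets for the ANYmal quadruped
A learned actuator network models the real actuators to narrow the sim-to-real gap
The paper itself uses no MPC layer; placing MPC as a safety filter over RL actions is a separate, complementary design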

Advanced Topics

1. Differentiable MPC

Treating MPC as a neural-network layer:

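import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

class DifferentiableMPC(torch.nn.Module):
    # Sketch: linear dynamics (A, B), quadratic cost, input bounds
    def __init__(self, A, B, N, r=0.1, u_max=1.0):
        super().__init__()
        n, m = B.shape
        x0, x_ref = cp.Parameter(n), cp.Parameter(n)
        x, u = cp.Variable((N + 1, n)), cp.Variable((N, m))
        cost = sum(cp.sum_squares(x[k] - x_ref) for k in range(N + 1)) + r * cp.sum_squares(u)
        cons = [x[0] == x0, cp.abs(u) <= u_max]
        cons += [x[k + 1] == A @ x[k] + B @ u[k] for k in range(N)]
        # The problem must be DPP-compliant; gradients then flow to x0 and x_ref
        self.layer = CvxpyLayer(cp.Problem(cp.Minimize(cost), cons), parameters=[x0, x_ref], variables=[u])

    def forward(self, x_current, x_ref):
        (u_opt,) = self.layer(x_current, x_ref)
        return u_opt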

Advantages:

End-to-end training
Gradients can back-propagate into the policy network
Perception and control are optimized jointly
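Why gradients can flow through an MPC layer is easiest to see in a case small enough to solve by hand. For a one-step scalar problem $\min_u q(a x + b u - r)^2 + \rho u^2$, the solution $u^* = q b (r - a x) / (q b^2 + \rho)$ is differentiable in the reference $r$, so a loss on $u^*$ can be back-propagated into a policy that produced $r$. The sketch below (all names and values illustrative) applies the chain rule by hand; libraries such as cvxpylayers do the equivalent for general convex problems by differentiating the optimality conditions.

```python
def mpc_layer(x, r, a=1.0, b=0.5, q=1.0, rho=0.1):
    """One-step MPC: argmin_u q (a x + b u - r)^2 + rho u^2, solved in closed form."""
    return q * b * (r - a * x) / (q * b * b + rho)

def mpc_layer_grad_r(a=1.0, b=0.5, q=1.0, rho=0.1):
    """d u* / d r: the layer is linear in r, so the derivative is a constant."""
    return q * b / (q * b * b + rho)

def policy_loss_and_grad(theta, feature, x, u_target):
    """Policy outputs r = theta * feature; loss = (u* - u_target)^2; chain rule into theta."""
    u = mpc_layer(x, theta * feature)
    loss = (u - u_target) ** 2
    grad = 2 * (u - u_target) * mpc_layer_grad_r() * feature
    return loss, grad
```

A gradient step on `theta` then moves the policy's reference so that the MPC's resulting action gets closer to the target, without ever differentiating through an iterative solver.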

2. Learning MPC Parameters

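import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedMPC(nn.Module):
    # Sketch: mpc_solve is a hypothetical differentiable MPC solver (x, Q, R) -> u
    def __init__(self, obs_dim, state_dim, action_dim, mpc_solve):
        super().__init__()
        # Learn the cost weights; predicting only diagonals keeps them easy to constrain
        self.Q_net = nn.Linear(obs_dim, state_dim)
        self.R_net = nn.Linear(obs_dim, action_dim)
        self.mpc_solve = mpc_solve

    def forward(self, obs, x_current):
        # Adjust the weights from the observation; softplus keeps them positive
        Q = torch.diag_embed(F.softplus(self.Q_net(obs)))
        R = torch.diag_embed(F.softplus(self.R_net(obs)) + 1e-3)  # R strictly positive definite

        # Solve the MPC with the learned weights
        return self.mpc_solve(x_current, Q, R)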

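Applications:

Task adaptation
Environment adaptation
Personalized control

3. Implicit MPC

Idea: a neural network directly learns the mapping from the problem data to the MPC's optimal solution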

$$
u^* = \pi_\theta(x, x_{ref})
$$

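Training:

# sample_state, sample_reference, mpc_solve and model are placeholders
dataset = []

# Generate training data offline
for _ in range(num_samples):
    x = sample_state()
    x_ref = sample_reference()
    u_opt = mpc_solve(x, x_ref)  # exact (slow) MPC solve

    dataset.append((x, x_ref, u_opt))

# Fit the neural network to the solver's outputs
model.fit(dataset)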

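Advantages:

Fast inference (no online optimization)
Imitates the MPC's control law, although constraint satisfaction is only approximated rather than guaranteed
Can handle high-dimensional problems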
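The training recipe above can be tried end to end on a scalar example. For linear dynamics, a quadratic cost, and box input constraints, the exact MPC law is a continuous piecewise-affine function of the state, so here a lookup table with linear interpolation (`np.interp`) stands in for the neural network to keep the sketch dependency-free. The "exact" solver is projected gradient descent on the condensed QP; the system, weights, grid, and function names are illustrative assumptions.

```python
import numpy as np

def mpc_solve(x, x_ref=0.0, a=1.0, b=0.1, q=1.0, r=0.1, N=10, u_max=1.0, iters=500):
    """Expert solver: box-constrained scalar MPC by projected gradient descent; returns u_0."""
    # Condensed prediction x_{i+1} = a^(i+1) x + sum_{j<=i} a^(i-j) b u_j, i = 0..N-1
    Sx = a ** np.arange(1, N + 1)
    Su = np.array([[b * a ** (i - j) if j <= i else 0.0 for j in range(N)] for i in range(N)])
    L = 2 * (q * np.linalg.norm(Su, 2) ** 2 + r)  # Lipschitz constant of the gradient
    U = np.zeros(N)
    for _ in range(iters):
        grad = 2 * q * Su.T @ (Sx * x + Su @ U - x_ref) + 2 * r * U
        U = np.clip(U - grad / L, -u_max, u_max)  # gradient step, then project onto the box
    return U[0]

# 1) Generate training data offline with the exact solver (x_ref = 0 here)
x_grid = np.linspace(-2.0, 2.0, 401)
u_grid = np.array([mpc_solve(x) for x in x_grid])

# 2) "Train": a piecewise-linear lookup table stands in for the neural network
def implicit_mpc(x):
    return np.interp(x, x_grid, u_grid)
```

At run time `implicit_mpc` costs one interpolation instead of hundreds of solver iterations; a neural network plays the same role when the state is too high-dimensional for a grid.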

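Implementation Recommendations

Choosing an Integration Approach

Scenario	Recommended approach	Rationale
Highly dynamic motion	MPC-led + learned compensation	Requires precise dynamics control
Perception-heavy tasks	VLA-led + MPC safety layer	Perception is the bottleneck
Dexterous manipulation	Diffusion Policy + MPC tracking	Requires multimodal actions
Strict real-time requirements	Implicit MPC	Avoids online optimization
Abundant data	End-to-end learning + differentiable MPC	Joint optimization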

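Debugging Tips

Validate separately: first make sure the MPC and the learned model each work on their own
Visualize reference trajectories: check that the learned model's outputs are reasonable
Monitor constraint violations: log how well the MPC satisfies its constraints
Integrate incrementally: start with simple scenarios and add complexity step by step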

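Open-Source Implementations

MPC libraries

MuJoCo MPC: Google DeepMind's MPC implementation
MPPI (Model Predictive Path Integral): a sampling-based MPC algorithm
acados: a fast solver for nonlinear MPC

Learning frameworks

Diffusion Policy: official implementation
OpenVLA: open-source VLA model
LeRobot: Hugging Face's robot learning library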

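Integration Example

# Install dependencies
pip install mujoco mujoco-mpc
pip install diffusers transformers
pip install cvxpy osqp cvxpylayers torch

# Example code (the demo script name is illustrative; check the repository's
# python/ directory for the current build steps and available examples)
git clone https://github.com/google-deepmind/mujoco_mpc
cd mujoco_mpc/python
python examples/humanoid_walk.py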

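Future Directions

World models + MPC: learn environment dynamics for use in MPC prediction
Multimodal MPC: handle uncertainty in contact-mode switching
Distributed MPC: coordinated control of multiple robots
Neuro-symbolic fusion: combine symbolic reasoning with neural networks
Lifelong learning: continually improve MPC parameters and models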

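References

Papers

“Learning Agile and Dynamic Motor Skills for Legged Robots” (Hwangbo et al., ETH Zurich, Science Robotics, 2019)
“Diffusion Policy: Visuomotor Policy Learning via Action Diffusion” (Chi et al., Columbia University, RSS 2023)
“RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control” (Brohan et al., Google DeepMind, 2023)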