Uncertainty Mitigation via Offline Robust Planning

Robust optimization has a long history as a mathematical framework for finding solutions whose performance does not degrade significantly in the presence of uncertainty. Although most robust optimization problems share the general form of a minimax problem, the fundamental technical difficulty lies in how each specific minimax problem should be solved. My research has developed algorithmic solutions for robust planning under several types of uncertainty that arise in stochastic multi-agent systems.
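
As a generic template (the notation here is mine, not taken from any of the papers below), a robust decision problem can be written as the minimax program

    \min_{x \in \mathcal{X}} \; \max_{u \in \mathcal{U}} \; f(x, u),

where x is the decision, u is the uncertain parameter, and the uncertainty set \mathcal{U} encodes what is known about u. The difficulty lies in evaluating or reformulating the inner maximization for the specific structure of f and \mathcal{U}.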

Distributional uncertainty: Stochasticity in many engineering systems can be difficult to characterize exactly by standard textbook distributions. Examples include wind and solar power generation in energy systems, rider demands in transportation systems, job requests submitted to a data center, and consumer demands in supply chains. To reduce the influence of modeling errors in the probability distribution, I have worked on distributionally robust optimization, in which the underlying probability distribution is assumed to belong to an ambiguity set of distributions, typically constructed from data.
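
As a concrete illustration (the Wasserstein ball below is one common data-driven construction, not necessarily the one used in the papers), a distributionally robust program replaces a single nominal distribution with the worst case over an ambiguity set:

    \min_{x \in \mathcal{X}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}_{\xi \sim P}\!\left[\ell(x, \xi)\right],
    \qquad
    \mathcal{P} = \{ P : W(P, \hat{P}_N) \le \epsilon \},

where \hat{P}_N is the empirical distribution formed from N samples, W is a statistical distance such as the Wasserstein metric, and \epsilon controls how much modeling error is tolerated.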

System state uncertainty: When sensing is limited, decision-making algorithms often only have noisy observations of the system state. While the case of “simple” noise distributions (e.g., Gaussian) admits well-known solutions, the case where the noise or perturbation is chosen adversarially remains challenging in general, especially for systems composed of multiple interacting agents in a game-theoretic setting.
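
One way to formalize the adversarial case, written for a single agent for brevity (a sketch of the standard state-adversarial formulation, not the exact multi-agent model studied in the papers below), is to let an adversary perturb the observation fed to the policy:

    \max_{\pi} \; \min_{\nu} \; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\right],
    \qquad a_t \sim \pi(\cdot \mid \nu(s_t)), \quad \nu(s_t) \in B(s_t),

where the adversary \nu maps the true state s_t to a perturbed observation inside a bounded set B(s_t), while the dynamics and reward still depend on the true state.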

Type uncertainty: Another common source of uncertainty in multi-agent systems is insufficient knowledge of the utilities (typically referred to as the types) of other agents. A standard formulation, known as Bayesian games, models the type uncertainty as a (prior) probability distribution over the set of possible types. However, our recent work shows that decision-making under a misspecified prior can be significantly suboptimal. Motivated by this brittleness of the Bayesian approach, we adopt an alternative approach based on robust optimization, which minimizes the worst-case regret against candidate utility functions.
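
Concretely (the notation is illustrative), given a set \Theta of candidate opponent types, worst-case regret minimization chooses a strategy via

    \min_{\pi} \; \max_{\theta \in \Theta} \; \left( U_{\theta}(\pi^{\star}_{\theta}) - U_{\theta}(\pi) \right),

where U_{\theta}(\pi) is the planner's payoff when the opponent has type \theta and \pi^{\star}_{\theta} is the strategy that would be optimal if \theta were known; unlike the Bayesian formulation, no prior over \Theta is required.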

  1. Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, and Fei Miao, “What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?,” Transactions on Machine Learning Research, Feb. 2024. [pdf]
  2. Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, and Fei Miao, “Robust Multi-Agent Reinforcement Learning with State Uncertainty,” Transactions on Machine Learning Research, Jun. 2023. [pdf]
  3. Haoxiang Ma, Shuo Han, Charles A. Kamhoua, and Jie Fu, “Optimizing Sensor Allocation Against Attackers With Uncertain Intentions: A Worst-Case Regret Minimization Approach,” IEEE Control Systems Letters, vol. 7, pp. 2863–2868, 2023. [pdf]
  4. Shuo Han, Molei Tao, Ufuk Topcu, Houman Owhadi, and Richard M. Murray, “Convex optimal uncertainty quantification,” SIAM Journal on Optimization, vol. 25, no. 3, pp. 1368–1387, 2015. [pdf]

Uncertainty Mitigation via Online Learning and Adaptation

For large uncertainties, approaches based on offline robust planning tend to yield overly conservative decisions. This motivates the study of methods that mitigate uncertainties adaptively by learning the unknown environment during online interactions.

Fast online adaptation: Compared with learning in computer vision and natural language processing, online decision-making under uncertainty typically operates in the “small data” regime, where the amount of data collected from interacting with the environment is significantly more limited. We have studied how to accelerate online learning for adapting to model uncertainty, i.e., uncertainty in the transition dynamics, by leveraging side information or useful structure in the problem. One situation is when an inaccurate model is available; we have demonstrated the benefits of using such a model to guide model-free reinforcement learning. Another situation is when a hypothesis set of possible models is provided.
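
The following is a minimal sketch of the first idea on a toy quadratic problem (the objective, step sizes, and function names are illustrative assumptions, not taken from the papers): the gradient of an inaccurate model supplies the main search direction, while cheap finite-difference estimates of the model error keep the iterates consistent with the true objective.

    import numpy as np

    # "True" objective, available only through function evaluations.
    A_true = np.array([[3.0, 0.4], [0.4, 1.0]])
    def f_true(x):
        return 0.5 * x @ A_true @ x

    # Inaccurate but differentiable model of the objective.
    A_model = np.array([[2.0, 0.0], [0.0, 1.5]])
    def f_model(x):
        return 0.5 * x @ A_model @ x

    def grad_model(x):
        return A_model @ x

    def grad_residual_fd(x, eps=1e-4):
        """Central-difference estimate of the gradient of (f_true - f_model)."""
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = ((f_true(x + e) - f_model(x + e))
                    - (f_true(x - e) - f_model(x - e))) / (2 * eps)
        return g

    # Composite update: cheap model gradient plus an estimated correction.
    x = np.array([5.0, -3.0])
    for _ in range(200):
        x = x - 0.1 * (grad_model(x) + grad_residual_fd(x))

    print(x)  # converges near the true minimizer (the origin)

Because the model gradient is exact and cheap, the zeroth-order correction only needs to account for the (hopefully small) model error, which is what makes this kind of guidance sample-efficient.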

Online learning against learning opponents: Multi-agent environments pose an additional challenge to online learning. Aside from the planning (ego) agent, other self-interested agents may also adjust their strategies over time according to certain learning rules, which must be accounted for when designing online learning algorithms. Moreover, the learning rules used by the other agents are typically unknown to the decision-making agent. We view the unknown learning rule as a nonparametric, uncertain dynamical system and use tools from control theory to provide guidelines for designing learning algorithms. In particular, we have studied two representative cases: two-player zero-sum games, and two-player Stackelberg games in which the (unknown) follower learns much faster than the leader.
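
The toy sketch below illustrates the Stackelberg case (the quadratic costs and the two-point update are assumptions made for this example, not the algorithm analyzed in the papers): the follower reaches an approximate best response between leader updates, and the leader improves its induced cost from function evaluations alone, without modeling the follower.

    def follower_response(x_leader, steps=200, lr=0.1):
        """Fast inner learner: gradient descent on the follower's strongly
        convex cost (y - 0.5 * x_leader)**2, which is unknown to the leader."""
        y = 0.0
        for _ in range(steps):
            y -= lr * 2.0 * (y - 0.5 * x_leader)
        return y

    def leader_cost(x_leader):
        # The leader only observes its own cost after the follower has adapted.
        y = follower_response(x_leader)
        return (x_leader - 2.0) ** 2 + (y - 1.0) ** 2

    # Zeroth-order (two-point) leader updates: no model of the follower is used.
    x, delta, lr = 0.0, 1e-2, 0.2
    for _ in range(100):
        grad_est = (leader_cost(x + delta) - leader_cost(x - delta)) / (2 * delta)
        x -= lr * grad_est

    print(x, follower_response(x))  # approaches the Stackelberg solution (2.0, 1.0)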

  1. Yansong Li and Shuo Han, “Efficient Collaboration with Unknown Agents: Ignoring Similar Agents without Checking Similarity,” in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2024. (extended abstract) [pdf]
  2. Yansong Li and Shuo Han, “Solving Strongly Convex and Smooth Stackelberg Games Without Modeling the Follower,” in American Control Conference, 2023. [pdf]
  3. Yansong Li and Shuo Han, “Accelerating Model-Free Policy Optimization Using Model-Based Gradient: A Composite Optimization Perspective,” in Proceedings of The 4th Annual Learning for Dynamics and Control Conference, 2022, pp. 304–315. [pdf]
  4. Shuo Han, “Gradient Methods With Dynamic Inexact Oracles,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1163–1168, Oct. 2021. [pdf]

Uncertainty Mitigation by Creating Disinformation

In a competitive multi-agent setting, the planning agent can counteract her opponent by creating disinformation through deception. The disinformation misleads the opponent and prevents the opponent from adopting strategies that exploit any information advantage.

Capability deception: In a competitive setting, the planning agent may gain an advantage by capability deception, where the agent chooses to hide her capabilities in order to create a false image of weakness. Doing so can make her opponent (incorrectly) exploit the misperceived weakness and choose a suboptimal strategy against the planning agent’s true capabilities. We have studied the problem of planning with action deception in sequential decision-making, where the planning agent may choose to initially hide certain actions, only to use (and hence reveal) them when it is beneficial.

Reward manipulation: Another way to exploit an unknown opponent is to manipulate the perceived reward of the opponent. In cyber defense, for instance, this can be done by introducing additional computer hosts camouflaged as valid targets in the eyes of potential attackers. When the unmanipulated reward of the opponent is known, we show that an optimal manipulation strategy can be computed by solving a mixed-integer linear program. We further extend the results to settings in which the unmanipulated reward is partially or completely unknown.
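
As a much-simplified illustration of the mixed-integer step (the host names, values, and budget below are invented for this example, and the actual formulations additionally encode the attacker's sequential behavior), a toy decoy-placement problem can be posed and solved with an off-the-shelf solver, here through the PuLP modeling library:

    from pulp import LpProblem, LpVariable, LpMaximize, lpSum, PULP_CBC_CMD

    # Hypothetical hosts that can be camouflaged as decoys, each with a
    # deployment cost and a "lure value" (value diverted from true targets).
    hosts = ["h1", "h2", "h3", "h4"]
    lure = {"h1": 5.0, "h2": 3.0, "h3": 4.0, "h4": 2.0}
    cost = {"h1": 2.0, "h2": 1.0, "h3": 3.0, "h4": 1.0}
    budget, max_decoys = 4.0, 2

    prob = LpProblem("toy_decoy_placement", LpMaximize)
    x = {h: LpVariable(f"x_{h}", cat="Binary") for h in hosts}

    prob += lpSum(lure[h] * x[h] for h in hosts)            # total diverted value
    prob += lpSum(cost[h] * x[h] for h in hosts) <= budget  # deployment budget
    prob += lpSum(x[h] for h in hosts) <= max_decoys        # decoy count limit

    prob.solve(PULP_CBC_CMD(msg=False))
    print({h: int(x[h].value()) for h in hosts})  # selected decoy hosts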

  1. Haoxiang Ma, Shuo Han, Ahmed Hemida, Charles A. Kamhoua, and Jie Fu, “Adaptive Incentive Design for Markov Decision Processes with Unknown Rewards,” Aug. 2024.
  2. Shuo Wu, Haoxiang Ma, Jie Fu, and Shuo Han, “Robust Reward Design for Markov Decision Processes,” Jun. 2024. [pdf]
  3. Haoxiang Ma, Chongyang Shi, Shuo Han, Michael Dorothy, and Jie Fu, “Covert Planning Against Imperfect Observers,” in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2024. [pdf]
  4. Haoxiang Ma, Shuo Han, Charles A. Kamhoua, and Jie Fu, “Optimal Resource Allocation for Proactive Defense with Deception in Probabilistic Attack Graphs,” in Conference on Decision and Game Theory for Security (GameSec), 2023. [pdf]
  5. Chongyang Shi, Shuo Han, and Jie Fu, “Quantitative Planning with Action Deception in Concurrent Stochastic Games,” in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2023. [pdf]

Past Research