Chapter 2
Beyond Normal Form
In this section, we will show how to construct \(\Phi\)-regret minimizers for strategy sets \(\mathcal{X}\) that are more complex than the simplex. To do so, we first define the sets of deviations that we work with.
Definition 2.1 ([ZAT+25a[ZAT+25a] B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Learning and Computation of \(\Phi\)-Equilibria at the Frontier of Tractability. Economics and Computation.]) .
We will assume without loss of generality that \(\Phi^{q}\) contains the identity map.
Example 2.2 .
In general, we will describe algorithms that have complexity polynomial in the dimension \(k\). Note that, in the special case of degree-\(β\) polynomials, these algorithms will therefore run in time polynomial in \(d^{β}\).
In these settings, we run into two fundamental issues when trying directly to generalize the GGM construction. Recall that the GGM construction requires two ingredients. We will describe why both ingredients are problematic.
-
Fixed-point oracle: Finding a fixed point of a continuous function \(\phi : \mathcal{X} \to \mathcal{X}\) is \(\text{PPAD}\)-complete, and therefore is unlikely to be possible efficiently. Even more generally, if \(\phi\) contains discontinuous maps, \(\phi\) may not even admit any approximate fixed points!
-
External regret minimizer for \(\Phi^{q}\): External regret minimization implies the ability to (approximately) perform linear optimization over \(\Phi^{q}\). But linear optimization over \(\Phi^{q}\) can be hard, in two ways:
-
If \(\Phi^{q}\) contains nonlinear functions, then that this optimization problem is \(\text{NP}\)-hard even when \(\mathcal{X}\) is the hypercube \([0, 1]^{d}\) [ZAFS24[ZAFS24] B. H. Zhang, I. Anagnostides, G. Farina and T. Sandholm. (2024). Efficient \(\Phi\)-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games. Advances in Neural Information Processing Systems.].
-
Even if \(\Phi^{q}\) is the set of linear endomorphisms, if \(\mathcal{X}\) itself is only available via an oracle (e.g., a separation oracle), then optimizing a linear function over \(\Phi^{q}\) can in general be hard [DFF+25[DFF+25] C. Daskalakis, G. Farina, M. Fishelson, C. Pipis and J. Schneider. (2025). Efficient Learning and Computation of Linear Correlated Equilibrium in General Convex Games. Symposium on Theory of Computing (STOC).].
-
Remark 2.3 .
Remark 2.4 .
We will show how to circumvent both of the above issues. We start with the fixed point computation.
2.1 Expected fixed points
Given that computing a fixed point is generally hard (when \(\phi\) is continuous but nonlinear) or even impossible (when \(\phi\) is discontinuous), we need a different notion of fixed point. Fortunately, by examining the proof of the GordonβGreenwaldβMarks framework, it turns out that a weaker condition suffices.
Definition 2.5 (Expected fixed point [ZAFS24]) .
Given a function \(\phi : \mathcal{X} \to \mathcal{X}\), an \(\epsilon\)-approximate expected fixed point of \(\phi\) is a distribution11Throughout this document, to avoid measure-theoretic issues, we only consider finite-support distributions. \(\mu \in \Delta (\mathcal{X})\) such that
As we will see shortly, expected fixed points fix both problems with fixed point computation: an (approximate) expected fixed point always exists, and moreover, there is an efficient algorithm for computing one!
Theorem 2.6 (Expected fixed point computation) .
The existence of this algorithm proves, as a corollary, that \(\epsilon\)-approximate expected fixed points always exist for every \(\epsilon > 0\). Although we will not need it here, it also turns out to be true that exact expected fixed points also always exist (even when \(\phi\) is discontinuous) [ZAT+25a]. The βeither-orβ nature of the above theorem will become useful shortly.
Proof.
Let \(\boldsymbol{x}^{(1)} \in \mathcal{X}\) be arbitrary, and consider the sequence of points \(\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\) where \(\boldsymbol{x}^{(k)} = \phi (\boldsymbol{x}^{(k-1)})\) for each \(k > 0\). Let \(\mu\) be the uniform distribution on \(\{\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\}\). Then, by a telescoping sum, we have
where \(\boldsymbol{x}^{(K+1)} \coloneqq \phi (\boldsymbol{x}^{(K)})\) for notational simplicity. But the right-hand side has norm at most \(\text{diam}(\mathcal{X})/K\), so setting \(K = \text{diam}(\mathcal{X})/\epsilon\) completes the proof.
2.2 Semi-separation
Theorem 2.6 also, conveniently, solves the second problem. The key observation, as it turns out, is that it is not actually necessary to be able to efficiently optimize/separate over \(\Phi^{q}\). In fact, it is sufficient to be able to perform the following task, which we call semi-separation.
Definition 2.7 (Semi-separation) .
In the semi-separation problem we are given a function \(\phi : \mathcal{X} \to \mathbb{R}^{d}\) and we have to compute either
- an \(\epsilon\)-expected fixed point \(\mu \in \Delta (\mathcal{X})\) of \(\phi\), or
- a point \(\boldsymbol{x} \in \mathcal{X}\) such that \(\phi (\boldsymbol{x}) \notin \mathcal{X}\).
The proof of Theorem 2.6 also gives a solution to the semi-separation problem: indeed, while constructing the sequence \(\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\), if any \(\boldsymbol{x}^{(k)}\) is not in \(\mathcal{X}\), we can immediately output \(\boldsymbol{x}^{(k-1)}\).
It is instructive to contemplate briefly the implications of this result. A point \(\boldsymbol{x} \in \mathcal{X}\) such that \(\phi (\boldsymbol{x}) \notin \mathcal{X}\) gives a separating direction \(\boldsymbol{a}^{β€} \phi (\boldsymbol{x}) \le b\) separating \(\phi (\boldsymbol{x})\) from \(\mathcal{X}\); the same constraint can be used to separate \(\phi\) from \(\Phi^{q}\). Determining whether a given function \(\phi : \mathcal{X} \to \mathbb{R}^{d}\) actually has image in \(\mathcal{X}\) is a hard problem, and indeed a semi-separation algorithm does not achieve this. Semi-separation only guarantees an expected fixed point or a separation certificate; in particular, it is possible that semi-separation algorithms output an expected fixed point even when \(\phi \notin \Phi^{q}\).
2.3 Efficient regret minimization via semi-separation
We now briefly sketch how to use a semi-separation oracle to construct an efficient regret minimizer over any set of functions \(\Phi^{q}\) expressible in the form given by Definition 2.1.
At a high level, the algorithm relies on an online algorithm called Shell gradient descent, developed by [DFF+25]. This is just projected gradient descent, but with the twist that the underlying constraint setβthe βshell setβ \(\tilde{\Phi}\)βis changing from round to round. The key invariance which guarantees correctness is that \(\tilde{\Phi}\) always contains the constraint set of interestβnamely, the set of endomorphisms. Whatβs more, unlike the set of endomorphisms \(\Phi\), \(\tilde{\Phi}\) always admits efficient oracle access.
Now, for a given function \(\phi\), the algorithm of [DFF+25] first attempts to find a fixed point of \(\phi\) in \(\mathcal{X}\) through a semi-separation oracle. If it succeeds, we can safely output such a fixed point even though \(\phi\) may not necessarily be an endomorphism. Otherwise, the semi-separation oracle yields a separating direction, thereby providing sufficient information to shrink the shell set \(\tilde{\Phi}\). Moreover, shell gradient descent hinges on a projection oracle, which again crucially uses the semi-separation oracle developed above. For the details, we refer the interested readers to [DFF+25], as well as [ZAT+25a] for the extension beyond linear swap regret.
Bibliography for this chapter
| [ZAT+25a] | B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Learning and Computation of Ξ¦-Equilibria at the Frontier of Tractability. Economics and Computation. |
| [ZAFS24] | B. H. Zhang, I. Anagnostides, G. Farina and T. Sandholm. (2024). Efficient Ξ¦-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games. Advances in Neural Information Processing Systems. |
| [DFF+25] | C. Daskalakis, G. Farina, M. Fishelson, C. Pipis and J. Schneider. (2025). Efficient Learning and Computation of Linear Correlated Equilibrium in General Convex Games. Symposium on Theory of Computing (STOC). |
| [ZAT+25b] | B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Expected Variational Inequalities. International Conference on Machine Learning, 74422β74446. |
| [ZFS24] | B. H. Zhang, G. Farina and T. Sandholm. (2024). Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Sequential Games. International Conference on Learning Representations. |
Notes
1Throughout this document, to avoid measure-theoretic issues, we only consider finite-support distributions.