Chapter 2

Beyond Normal Form

In this section, we will show how to construct \(\Phi\)-regret minimizers for strategy sets \(\mathcal{X}\) that are more complex than the simplex. To do so, we first define the sets of deviations that we work with.

Definition 2.1 ([ZAT+25a[ZAT+25a] B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Learning and Computation of \(\Phi\)-Equilibria at the Frontier of Tractability. Economics and Computation.]) .

Given a feature map \(q : \mathcal{X} \to \mathbb{R}^{k}\), the set of deviations \(\Phi^{q}\) is the set of all maps \(\phi_{\mathbf{M}}: \mathcal{X} \to \mathcal{X}\) that can be expressed as the matrix-vector product \(\mathbf{M}(\phi ) q(\boldsymbol{x})\) for some matrix \(\mathbf{M} \in \mathbb{R}^{d \times k}\) and vector \(\boldsymbol{c} \in \mathbb{R}^{d}\).

We will assume without loss of generality that \(\Phi^{q}\) contains the identity map.

Example 2.2 .

If \(q(\boldsymbol{x}) = (1, \boldsymbol{x}) \in \mathbb{R}^{d+1}\), then \(\Phi^{q}\) is the set of all linear maps \(\phi : \mathcal{X} \to \mathcal{X}\). More generally, if \(q(\boldsymbol{x}) = (1, \boldsymbol{x})^{βŠ— β„“} \in \mathbb{R}^{k}\) where \(k = \binom{d}{\le β„“}\) is the set of all monomials in \(\boldsymbol{x}\) up to degree \(β„“\), then \(\Phi^{q}\) is the set of all degree-\(β„“\) polynomials in \(\boldsymbol{x}\).

In general, we will describe algorithms that have complexity polynomial in the dimension \(k\). Note that, in the special case of degree-\(β„“\) polynomials, these algorithms will therefore run in time polynomial in \(d^{β„“}\).

In these settings, we run into two fundamental issues when trying directly to generalize the GGM construction. Recall that the GGM construction requires two ingredients. We will describe why both ingredients are problematic.

  1. Fixed-point oracle: Finding a fixed point of a continuous function \(\phi : \mathcal{X} \to \mathcal{X}\) is \(\text{PPAD}\)-complete, and therefore is unlikely to be possible efficiently. Even more generally, if \(\phi\) contains discontinuous maps, \(\phi\) may not even admit any approximate fixed points!

  2. External regret minimizer for \(\Phi^{q}\): External regret minimization implies the ability to (approximately) perform linear optimization over \(\Phi^{q}\). But linear optimization over \(\Phi^{q}\) can be hard, in two ways:

    1. If \(\Phi^{q}\) contains nonlinear functions, then that this optimization problem is \(\text{NP}\)-hard even when \(\mathcal{X}\) is the hypercube \([0, 1]^{d}\) [ZAFS24[ZAFS24] B. H. Zhang, I. Anagnostides, G. Farina and T. Sandholm. (2024). Efficient \(\Phi\)-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games. Advances in Neural Information Processing Systems.].

    2. Even if \(\Phi^{q}\) is the set of linear endomorphisms, if \(\mathcal{X}\) itself is only available via an oracle (e.g., a separation oracle), then optimizing a linear function over \(\Phi^{q}\) can in general be hard [DFF+25[DFF+25] C. Daskalakis, G. Farina, M. Fishelson, C. Pipis and J. Schneider. (2025). Efficient Learning and Computation of Linear Correlated Equilibrium in General Convex Games. Symposium on Theory of Computing (STOC).].

Remark 2.3 .

When \(\mathcal{X}\) is represented by an explicit set of linear constraints, i.e., \(\mathcal{X} = \{\boldsymbol{x} : \mathbf{A} \boldsymbol{x} \le \boldsymbol{b}\}\), the set of linear endomorphisms can also be expressed as an explicit set of linear constraints [ZAT+25b[ZAT+25b] B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Expected Variational Inequalities. International Conference on Machine Learning, 74422–74446.]. This is true, for instance, for extensive-form games. Thus, in this particular case, linear regret minimization can be accomplished without all the machinery of this section.

Remark 2.4 .

For extensive-form games in particular, the set of linear endomorphisms has a clean game-theoretic interpretation [ZFS24[ZFS24] B. H. Zhang, G. Farina and T. Sandholm. (2024). Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Sequential Games. International Conference on Learning Representations.]. We will not detail it here, though.

We will show how to circumvent both of the above issues. We start with the fixed point computation.

2.1 Expected fixed points

Given that computing a fixed point is generally hard (when \(\phi\) is continuous but nonlinear) or even impossible (when \(\phi\) is discontinuous), we need a different notion of fixed point. Fortunately, by examining the proof of the Gordon–Greenwald–Marks framework, it turns out that a weaker condition suffices.

Definition 2.5 (Expected fixed point [ZAFS24]) .

Given a function \(\phi : \mathcal{X} \to \mathcal{X}\), an \(\epsilon\)-approximate expected fixed point of \(\phi\) is a distribution11Throughout this document, to avoid measure-theoretic issues, we only consider finite-support distributions. \(\mu \in \Delta (\mathcal{X})\) such that

\[\displaystyle \left\Vert \mathop{\mathbb{E}}\limits_{\boldsymbol{x} \sim \mu} \left[\phi \left(\boldsymbol{x}\right) - \boldsymbol{x}\right]\right\Vert_{2} \le \epsilon .\]

As we will see shortly, expected fixed points fix both problems with fixed point computation: an (approximate) expected fixed point always exists, and moreover, there is an efficient algorithm for computing one!

Theorem 2.6 (Expected fixed point computation) .

There is an algorithm with runtime \(\text{poly}(d, 1/\epsilon )\) that, given query access to \(\phi : \mathcal{X} \to \mathcal{X}\) and oracle access to \(\mathcal{X}\), outputs an \(\epsilon\)-approximate expected fixed point of \(\phi\).

The existence of this algorithm proves, as a corollary, that \(\epsilon\)-approximate expected fixed points always exist for every \(\epsilon > 0\). Although we will not need it here, it also turns out to be true that exact expected fixed points also always exist (even when \(\phi\) is discontinuous) [ZAT+25a]. The β€œeither-or” nature of the above theorem will become useful shortly.

Proof.

Let \(\boldsymbol{x}^{(1)} \in \mathcal{X}\) be arbitrary, and consider the sequence of points \(\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\) where \(\boldsymbol{x}^{(k)} = \phi (\boldsymbol{x}^{(k-1)})\) for each \(k > 0\). Let \(\mu\) be the uniform distribution on \(\{\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\}\). Then, by a telescoping sum, we have

\[\displaystyle \mathop{\mathbb{E}}\limits_{\boldsymbol{x} \sim \mu} \left[\phi \left(\boldsymbol{x}\right) - \boldsymbol{x}\right] = \frac{1}{K} \sum_{k=1}^{K} \left(\phi \left(\boldsymbol{x}^{\left(k\right)}\right) - \boldsymbol{x}^{\left(k\right)}\right) = \frac{1}{K} \sum_{k=1}^{K} \left(\boldsymbol{x}^{\left(k+1\right)} - \boldsymbol{x}^{\left(k\right)}\right) = \frac{1}{K} \left(\boldsymbol{x}^{\left(K+1\right)} - \boldsymbol{x}^{\left(1\right)}\right)\]

where \(\boldsymbol{x}^{(K+1)} \coloneqq \phi (\boldsymbol{x}^{(K)})\) for notational simplicity. But the right-hand side has norm at most \(\text{diam}(\mathcal{X})/K\), so setting \(K = \text{diam}(\mathcal{X})/\epsilon\) completes the proof.

2.2 Semi-separation

Theorem 2.6 also, conveniently, solves the second problem. The key observation, as it turns out, is that it is not actually necessary to be able to efficiently optimize/separate over \(\Phi^{q}\). In fact, it is sufficient to be able to perform the following task, which we call semi-separation.

Definition 2.7 (Semi-separation) .

In the semi-separation problem we are given a function \(\phi : \mathcal{X} \to \mathbb{R}^{d}\) and we have to compute either

  • an \(\epsilon\)-expected fixed point \(\mu \in \Delta (\mathcal{X})\) of \(\phi\), or
  • a point \(\boldsymbol{x} \in \mathcal{X}\) such that \(\phi (\boldsymbol{x}) \notin \mathcal{X}\).

The proof of Theorem 2.6 also gives a solution to the semi-separation problem: indeed, while constructing the sequence \(\boldsymbol{x}^{(1)}, \dots , \boldsymbol{x}^{(K)}\), if any \(\boldsymbol{x}^{(k)}\) is not in \(\mathcal{X}\), we can immediately output \(\boldsymbol{x}^{(k-1)}\).

It is instructive to contemplate briefly the implications of this result. A point \(\boldsymbol{x} \in \mathcal{X}\) such that \(\phi (\boldsymbol{x}) \notin \mathcal{X}\) gives a separating direction \(\boldsymbol{a}^{⊀} \phi (\boldsymbol{x}) \le b\) separating \(\phi (\boldsymbol{x})\) from \(\mathcal{X}\); the same constraint can be used to separate \(\phi\) from \(\Phi^{q}\). Determining whether a given function \(\phi : \mathcal{X} \to \mathbb{R}^{d}\) actually has image in \(\mathcal{X}\) is a hard problem, and indeed a semi-separation algorithm does not achieve this. Semi-separation only guarantees an expected fixed point or a separation certificate; in particular, it is possible that semi-separation algorithms output an expected fixed point even when \(\phi \notin \Phi^{q}\).

2.3 Efficient regret minimization via semi-separation

We now briefly sketch how to use a semi-separation oracle to construct an efficient regret minimizer over any set of functions \(\Phi^{q}\) expressible in the form given by Definition 2.1.

At a high level, the algorithm relies on an online algorithm called Shell gradient descent, developed by Daskalakis, Farina, Fishelson, Pipis and Schneider [DFF+25]. This is just projected gradient descent, but with the twist that the underlying constraint setβ€”the β€œshell set” \(\tilde{\Phi}\)β€”is changing from round to round. The key invariance which guarantees correctness is that \(\tilde{\Phi}\) always contains the constraint set of interestβ€”namely, the set of endomorphisms. What’s more, unlike the set of endomorphisms \(\Phi\), \(\tilde{\Phi}\) always admits efficient oracle access.

Now, for a given function \(\phi\), the algorithm of Daskalakis, Farina, Fishelson, Pipis and Schneider [DFF+25] first attempts to find a fixed point of \(\phi\) in \(\mathcal{X}\) through a semi-separation oracle. If it succeeds, we can safely output such a fixed point even though \(\phi\) may not necessarily be an endomorphism. Otherwise, the semi-separation oracle yields a separating direction, thereby providing sufficient information to shrink the shell set \(\tilde{\Phi}\). Moreover, shell gradient descent hinges on a projection oracle, which again crucially uses the semi-separation oracle developed above. For the details, we refer the interested readers to Daskalakis, Farina, Fishelson, Pipis and Schneider [DFF+25], as well as Zhang, Anagnostides, Tewolde, Berker, Farina, Conitzer and Sandholm [ZAT+25a] for the extension beyond linear swap regret.

Figure 1. Taken from Daskalakis, Farina, Fishelson, Pipis and Schneider [DFF+25]. The set of linear endomorphisms on \(\mathcal{X}\) is denoted by \(\Phi\). \(\Phi_{\text{FP}}\) is the set of transformations containing a fixed point in \(\mathcal{X}\). Clearly \(\Phi \subseteq \Phi_{\text{FP}}\). The algorithm operates over a changing β€œshell set” \(\tilde{\Phi}\) which, unlike \(\Phi\), admits an efficient description. Given a function \(\phi\), the key idea is that if \(\phi\) does not admit a fixed point in \(\mathcal{X}\), one can shrink \(\tilde{\Phi}\) by a significant margin. Otherwise, the algorithm can safely output a fixed point of \(\phi\).

Bibliography for this chapter

[ZAT+25a] B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Learning and Computation of Ξ¦-Equilibria at the Frontier of Tractability. Economics and Computation.
[ZAFS24] B. H. Zhang, I. Anagnostides, G. Farina and T. Sandholm. (2024). Efficient Ξ¦-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games. Advances in Neural Information Processing Systems.
[DFF+25] C. Daskalakis, G. Farina, M. Fishelson, C. Pipis and J. Schneider. (2025). Efficient Learning and Computation of Linear Correlated Equilibrium in General Convex Games. Symposium on Theory of Computing (STOC).
[ZAT+25b] B. H. Zhang, I. Anagnostides, E. Tewolde, R. E. Berker, G. Farina, V. Conitzer and T. Sandholm. (2025). Expected Variational Inequalities. International Conference on Machine Learning, 74422–74446.
[ZFS24] B. H. Zhang, G. Farina and T. Sandholm. (2024). Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Sequential Games. International Conference on Learning Representations.

Notes

1Throughout this document, to avoid measure-theoretic issues, we only consider finite-support distributions.