Chapter 5
TreeSwap: Efficient Swap Regret Minimization
So far, we have seen that no-regret learning, and efficient equilibrium computation, are essentially possible whenever the set of deviations \(\Phi\) is “efficiently representable”, in particular, whenever external regret minimization is possible over \(\Phi\). What about beyond? In particular, we have not yet addressed the strongest possible notion of \(\Phi\)-regret, namely, the case where \(\Phi\) contains all functions \(\phi : \mathcal{X} \to \mathcal{X}\)—known as swap regret.
The fact that algorithms for \(\Phi\)-regret minimization grow increasingly complex as \(\Phi\) itself grows in complexity may lead one to believe that efficient swap regret minimization is hopeless. For example, the set of all functions \(\phi : \mathcal{X} \to \mathcal{X}\) has infinite dimension, so the techniques we have covered so far do not work. If \(\mathcal{X}\) is a polytope, we could run the Blum-Mansour algorithm over its vertices—this would indeed succeed, but it has the drawback that the number of vertices of \(\mathcal{X}\) is generally not polynomial in the dimension \(d\) of \(\mathcal{X}\), and hence the algorithm is not efficient. Hence, for a long time, the possibility of swap regret minimization—and correspondingly, correlated equilibrium computation beyond normal-form games—remained a major open question.
The question has since essentially been fully resolved. In a simultaneous major breakthrough, [PR24[PR24] B. Peng and A. Rubinstein. (2024). Fast Swap Regret Minimization and Applications to Approximate Correlated Equilibria. Symposium on Theory of Computing (STOC).] and [DDFS24[DDFS24] Y. Dagan, C. Daskalakis, M. Fishelson and N. Golowich. (2024). From External to Swap Regret 2.0: An Efficient Reduction for Large Action Spaces. Symposium on Theory of Computing (STOC).], using essentially an identical algorithm now known as TreeSwap, showed that, if there is a regret minimizer on \(\mathcal{X}\) that achieves external regret \(\epsilon\) after \(M\) rounds, then there is a regret minimizer on \(\mathcal{X}\) that achieves \(\epsilon T\) swap regret after \(M^{1/\epsilon}\) rounds, and therefore efficient swap regret minimization ks in fact possible, at least when \(\epsilon\) is a constant. They also showed nearly-matching lower bounds for normal-form games, which [DFG+24[DFG+24] C. Daskalakis, G. Farina, N. Golowich, T. Sandholm and B. H. Zhang. (2024). A Lower Bound on Swap Regret in Extensive-Form Games. arXiv preprint arXiv:2406.13116.] later extended to extensive-form games as well, thereby precluding the possibility of \(\text{poly}(d, 1/\epsilon )\)-time algorithms for swap regret beyond normal-form games. In this chapter, we will go over the upper bound.
5.1 The TreeSwap Algorithm
We now describe the TreeSwap algorithm, following [PR24] and [DDFS24]. Let \(\mathcal{R}\) be a no-regret algorithm on \(\mathcal{X}\) whose regret is bounded by \(\epsilon T\) after \(M\) rounds. Of course, \(\mathcal{R}\) will not necessarily have small swap regret. So, how can we hope to bound swap regret?
One thing to try might be the following. Instead of having \(\mathcal{R}\) update its strategy on every iteration, it will be lazy, and only update its strategy every \(M\) iterations, using the average utility on those \(M\) iterations. That is, at time \(M\), it receives utility \(\frac{1}{M} \sum_{t=1}^{M} \boldsymbol{u}^{(t)}\), at time \(2M\) it receives utility \(\frac{1}{M} \sum_{t=M+1}^{2M} \boldsymbol{u}^{(t)}\), and so on. Then, between those \(M\) iterations, that is, while \(\mathcal{R}\) is playing the same strategy, we initialize another copy of the same regret minimizer, which we will call \(\mathcal{R}'\), which updates its strategy on every iteration.
The purpose of \(\mathcal{R}'\) is to bound the value of the most profitable deviation from every single strategy that \(\mathcal{R}\) plays. Therefore, \(\mathcal{R}'\) can ensure that no strategy played by \(\mathcal{R}\) incurs large swap regret. Of course, \(\mathcal{R}'\) itself will have high swap regret… but this can be solved by introducing yet more layers!
The TreeSwap algorithm uses \(K\) layers of regret minimizers \(\mathcal{R}_{0}, \dots , \mathcal{R}_{K-1}\), each responsible for ensuring that there is no profitable swap deviation for the regret minimizers in the next layer. \(\mathcal{R}_{0}\) updates on every iteration. Each regret minimizer \(\mathcal{R}_{k}\) updates \(M\) times slower than the previous regret minimizer \(\mathcal{R}_{k-1}\), and resets every \(M\) updates. Hence, this algorithm runs for \(M^{K}\) steps. The strategy output by TreeSwap is the uniform distribution (not to be confused with the average) over the strategies \((\boldsymbol{x}_{0}^{(t)}, \dots , \boldsymbol{x}_{K-1}^{(t)})\) output by the \(K\) regret minimizers. For simplicity of notation, we will zero-index timesteps and use the notation \([n] = \{0, \dots , n-1\}\).
TreeSwap
else if \(M^{k} | t\) then pass to \(\mathcal{R}_{k}\) the average of the last \(M^{k}\) utilities,
Theorem 5.1 .
TreeSwap is bounded by \(T \cdot (\epsilon + 1/K)\). In particular, taking \(K = 1/\epsilon\) and \(M = \text{poly}(d, 1/\epsilon )\) (as is achieved by any reasonable external-regret-minimizing algorithm), TreeSwap achieves swap regret \(\epsilon T\) after \(M^{K} = d^{\tilde{O}(1/\epsilon )}\) iterations.Remark 5.2 .
Remark 5.3 .
We now sketch a proof of Theorem 5.1.
Proof Sketch.
In this proof, timesteps are zero-indexed, and \([n] \coloneqq \{0, \dots , n-1\}\). Let \(\phi : \mathcal{X} \to \mathcal{X}\) be any function. Then the (time-averaged) regret against \(\phi\) is given by
where, in the second-to-last line, the bound on the first term comes from the regret bound of each phase between resets (of length \(M\) updates), noting that \(\phi (\boldsymbol{x}_{k}^{(t)})\) is constant within any given phase and thus the regret against \(\phi (\boldsymbol{x}_{k}^{(t)})\)̧ is bounded by the external regret during the phase.
Bibliography for this chapter
| [PR24] | B. Peng and A. Rubinstein. (2024). Fast Swap Regret Minimization and Applications to Approximate Correlated Equilibria. Symposium on Theory of Computing (STOC). |
| [DDFS24] | Y. Dagan, C. Daskalakis, M. Fishelson and N. Golowich. (2024). From External to Swap Regret 2.0: An Efficient Reduction for Large Action Spaces. Symposium on Theory of Computing (STOC). |
| [DFG+24] | C. Daskalakis, G. Farina, N. Golowich, T. Sandholm and B. H. Zhang. (2024). A Lower Bound on Swap Regret in Extensive-Form Games. arXiv preprint arXiv:2406.13116. |