## Introduction

self-learning algorithm

• learns strategy by repeatedly playing against itself
• initialized with uniformly random
• playing every action at every decision point with equal probability
• play the action with maximum regret
• it will converge to optimal strategy that can do no worse than tie against any opponent

## Implementation

• Summing total regret for each action at each decision point

• regret: how much better if just always played this one action at this decision, instead of previous choices?
• Positive regret means that we would have done better if we had taken that action more often
• Negative regret means that we would have done better by not taking that action at all
• 愈多regret，代表此選項要多選
• do actions with probabilities proportional to their positive regret
• after each game, update regret values
• Counter-intuitively, sequence of strategies does not necessarily converge to anything useful

• But it now does so in practice
• in a two-player zero-sum game, if you compute the average strategy over those billions of strategies in the sequence, then that average strategy will converge towards Nash equilibrium of the game

## Nash equilibrium(納許均衡)

• Do no worse than tie against any other strategy
• Plays perfect defence
• Just wins when the opponent makes mistakes
• since attempting to find and exploit an opponent’s mistakes usually makes it possible for an even smarter opponent to exploit your new strategy
• exploitability(利用度)
• maximum expectation that a perfect counter-strategy could win
• exploitability = 0 when Nash equilibrium
• CFR can make average strategy’s exploitability converges towards zero

## Result

• best poker programs started beating the world’s best human players in heads-up limit hold’em in 2008, even though there were still massively exploitable by this worst-case measure
• In January 2015, we’ve essentially weakly solved the game
• a strategy with such a low exploitability (0.000986 big blinds per game)
• have 95% statistical confidence that they were actually winning against everyone

## 參考資料

• (CFR)Regret Minimization in Games with Incomplete Information
• (CFR+)Solving Large Imperfect Information Games Using CFR+
• (CFR)Explanation of CFR by inventor himself
• poker AI news
• poker AI news2
• (Implementation)openCFR