## Introduction

self-learning algorithm

- learns strategy by repeatedly playing against itself
- initialized with uniformly random
- playing every action at every decision point with
**equal**probability

- playing every action at every decision point with
- play the action with maximum regret
- it will converge to optimal strategy that can do no worse than tie against any opponent

## Implementation

Summing total regret for each action at each decision point

- regret: how much better if just
**always**played this one action at this decision, instead of previous choices?- Positive regret means that we would have done better if we had taken that action more often
- Negative regret means that we would have done better by not taking that action at all
- 愈多regret，代表此選項要多選

- do actions with probabilities proportional to their positive regret
- after each game, update regret values

- regret: how much better if just
Counter-intuitively, sequence of strategies

**does not necessarily converge to anything useful**- But it now does so in practice
**in a two-player zero-sum game**, if you compute the average strategy over those billions of strategies in the sequence, then that average strategy will converge towards**Nash equilibrium**of the game

## Nash equilibrium(納許均衡)

- Do no worse than tie against any other strategy
- Plays perfect defence
- Just wins when the opponent makes mistakes
- since attempting to find and exploit an opponent’s mistakes usually makes it possible for an even smarter opponent to exploit your new strategy

- Just wins when the opponent makes mistakes
- exploitability(利用度)
- maximum expectation that a perfect counter-strategy could win
- exploitability = 0 when Nash equilibrium
- CFR can make average strategy’s exploitability converges towards zero

## Result

- best poker programs started beating the world’s best human players in heads-up limit hold’em in 2008, even though there were still massively exploitable by this worst-case measure
- In January 2015, we’ve essentially
**weakly solved**the game- a strategy with such a low exploitability (0.000986 big blinds per game)
- have 95% statistical confidence that they were actually winning against everyone

- a strategy with such a low exploitability (0.000986 big blinds per game)

## Algorithm Implementation

## Example Code

## Summary

## 參考資料

- (CFR)Regret Minimization in Games with Incomplete Information
- (CFR+)Solving Large Imperfect Information Games Using CFR+
- (CFR)Explanation of CFR by inventor himself
- poker AI news
- poker AI news2
- (Implementation)openCFR