Payment matrix. Lower and upper price of the game

Let's consider the steam room the end game. Let the player A has T personal strategies, which we denote

Let the player IN available P personal strategies, let us designate them. They say the game has dimensions T X P.

As a result of the players’ choice of any pair of strategies, the outcome of the game is uniquely determined, i.e. winnings A;. player A(positive or negative) and loss (-ay) player IN. Let's assume that the values A.. known for any pair of strategies (A:, B;.). Matrix P =(a..), i = = 1, 2, ..., m j = 1, 2, ..., P, the elements of which are winnings corresponding to the strategies A. And Bj, called payment matrix, or matrix of the game. General form such a matrix is presented in table. 12.1. The rows of this table correspond to the player's strategies A, and the columns – the player’s strategies IN.

Table 12.1

Let's create a payment matrix for the next game.

12.1. Search game.

Player A can hide in one of two shelters (I and II); player IN looking for a player A, and if he finds it, he receives a fine of 1 den. units from A, otherwise pays the player A 1 day units It is necessary to build the payment matrix of the game.

SOLUTION. To compile payment matrix the behavior of each player should be analyzed. Player A can hide in shelter I - we denote this strategy by A v or to the shelter II – strategy A. d Player IN can look for the first player in the shelter I - strategy IN(or to the shelter II - strategy IN.,. If the player A is located in Vault I and is discovered by the player there IN, those. a couple of strategies are being implemented (Α ν IN{), then the player A pays a fine, i.e. A n = -1. Similarly we get A. n = -1 (A 2, IN.,). It is obvious that strategies (A, IN.,) and (L2, /1,) give the player A payoff is 1, so A P = a. n = I. Thus, for a search game of size 2x2, we obtain the payoff matrix:

Consider the game T X P with matrix P =a j) , i = 1,2, ..., τη; j= 1, 2, ..., and and determine the best among strategies A at A v..., A t. Choosing a strategy A jy player A must expect that the player IN will answer it using one of the strategies IN., for which the payoff for the player A minimal (player IN seeks to "harm" the player A).

Let us denote by a; player's smallest winnings A when he chooses strategy L; for all possible player strategies IN(smallest number in i-th line payment matrix), i.e.

Among all numbers a (r = 1,2,..., T) Let's choose the largest: . Let's call and the lower price of the game, or maximum winnings (maximin). This guaranteed win player A for any strategy of player B. Hence,

(12.2)

The strategy corresponding to maximin is called maximin strategy. Player IN interested in reducing the player's winnings A; choosing a strategy IN., it takes into account the maximum possible gain for A. Let's denote

Among all the numbers β. let's choose the smallest one,

and call β top price of the game, or minimax win (minimax). This guaranteed loss for player B. Hence,

(12.4)

The strategy corresponding to minimax is called minimax strategy.

The principle that dictates that players choose the most “cautious” minimax and maximin strategies is called the principle minimax. This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent. Let us determine the lower and upper prices of the game and the corresponding strategies in Problem 12.1. Consider the payment matrix

from problem 12.1. When choosing strategy L, (first row of the matrix), the minimum payoff is equal to a, =min(-l; 1) = -1 and corresponds to the player’s strategy β1 IN. When choosing a strategy L 2 (second row of the matrix) the minimum winning is A 2 = min(l; -1) = -1, it is achieved with the strategy IN.,.

Guaranteeing yourself maximum win for any player strategy IN, i.e. lower price of the game a = max(a, a2) = = max(-l; -1) = -1, player A can choose any strategy: Aj or A 2, i.e. any of his strategies is maximin.

Choosing strategy B, (column 1), the player IN understands that the player A will respond with a strategy A 2 to maximize your winnings (losing IN). Therefore, the player's maximum loss is IN when he chooses strategy B, is equal to β, = check(-1; 1) = 1.

Similarly, the maximum loss of player B (win A) when he chooses strategy B2 (column 2) is equal to β2 = max(l; -1) = 1.

Thus, for any player strategy A the guaranteed minimum loss of player B is equal to β = = πιίη(β1, β2) = min(l; 1) = 1- top price games.

Any strategy of player B is minimax. Having added the table 12.1 line β; and column a;, we get table. 12.2. At the intersection of the additional rows and columns we will write down the upper and lower prices of the games.

Table 12.2

In Problem 12.1 discussed above, the upper and lower prices of the game are different: a F β.

If the upper and lower prices of the game coincide, then general meaning the upper and lower prices of the game α = β = υ is called the pure price of the game, or at the cost of the game. Minimax strategies corresponding to the price of the game are optimal strategies, and their totality - the optimal solution, or decision games. In this case the player A receives the maximum guaranteed (independent of the player’s behavior) IN) the payoff is υ, and the player IN achieves the minimum guaranteed (regardless of the behavior of player A) loss υ. They say that the solution to the game has stability, those. if one of the players sticks to his optimal strategy, then it cannot be profitable for the other to deviate from his optimal strategy.

Pair pure strategies A. and B. gives an optimal solution to the game if and only if the corresponding element y is simultaneously the largest in its column and the smallest in its row. This situation, if it exists, is called saddle point(similar to the surface of a saddle, which curves up in one direction and down in the other).

Let's denote A* And IN*– a pair of pure strategies that achieve a solution to the game in the saddle point problem. Let us introduce the payoff function of the first player for each pair of strategies: P(A:, IN-) = and y. Then from the optimality condition in saddle point the double inequality holds: P(Aj, B*)<Р(А*, В*)<Р(А", В ), which is fair for everyone i = 1, 2, ..., m;j = 1, 2, ..., P. Indeed, the choice of strategy A* the first player with an optimal strategy IN" the second player maximizes the minimum possible payoff: P(A*, B")> P(A G IN"), and the choice of strategy B" the second player, with the optimal strategy of the first, minimizes the maximum loss: P(D, IN*)<Р(А", В).

12.2. Determine the lower and upper price of the game given by the payment matrix

Does the game have a saddle point?

Table 12. 3

Solution. It is convenient to carry out all calculations in a table, which, in addition to the matrix R, column a is entered; and string)