Which matrix is ​​called the payoff matrix, payoff matrix. The concept of game models

Lecture 9. The concept of game models. Payment matrix.

§ 6 ELEMENTS OF GAME THEORY

6.1 The concept of game models.

The mathematical model of a conflict situation is called game , parties involved in the conflict - players, and the outcome of the conflict is win .

For each formalized game, rules , those. a system of conditions that determines: 1) options for players’ actions; 2) the amount of information each player has about the behavior of their partners; 3) the gain that each set of actions leads to. Typically, winning (or losing) can be quantified; for example, you can value a loss as zero, a win as one, and a draw as 1/2. Quantifying the results of a game is called payment .

The game is called steam room , if it involves two players, and multiple , if the number of players is more than two. We will only consider doubles games. They involve two players A And IN, whose interests are opposite, and by game we mean a series of actions on the part of A And IN.

The game is called zero sum game or antagonistic sky , if the gain of one of the players is equal to the loss of the other, i.e. the sum of the winnings of both sides is zero. To complete the game task, it is enough to indicate the value of one of them . If we designate A– winnings of one of the players, b the other's winnings, then for a zero-sum game b =A, therefore it is enough to consider, for example A.

The choice and implementation of one of the actions provided for by the rules is called progress player. The moves may be personal And random . Personal move this is a conscious choice by the player of one of the possible actions (for example, a move to chess game). The set of possible options for each personal move is regulated by the rules of the game and depends on the totality of previous moves on both sides.

Random move it is a randomly chosen action (for example, choosing a card from a shuffled deck). For a game to be mathematically defined, the rules of the game must indicate for each random move probability distribution possible outcomes.

Some games may consist only of random moves (so-called pure gambling) or only of personal moves (chess, checkers). Most card games belong to mixed type games, that is, they contain both random and personal moves. In the future, we will consider only the personal moves of the players.

Games are classified not only by the nature of moves (personal, random), but also by the nature and amount of information available to each player regarding the actions of the other. A special class of games consists of the so-called “games with complete information». A game with complete information is a game in which each player, with each personal move, knows the results of all previous moves, both personal and random. Examples of games with complete information include chess, checkers, and famous game"crosses and toes". Most games of practical importance do not belong to the class of games with complete information, since uncertainty about the actions of the enemy is usually an essential element of conflict situations.

One of the main concepts of game theory is the concept strategies .

Strategy A player is a set of rules that determine the choice of his action at each personal move, depending on the current situation. Usually during the game, with each personal move, the player makes a choice depending on the specific situation. However, it is in principle possible that all decisions are made by the player in advance (in response to any given situation). This means that the player has chosen a specific strategy, which can be specified as a list of rules or a program. (This way you can play the game using a computer.) The game is called ultimate , if each player has a finite number of strategies, and endless .– otherwise.

In order to decide game , or find game solution , for each player we should choose a strategy that satisfies the condition optimality , those. one of the players must receive maximum win, when the second one sticks to his strategy, At the same time the second player must have minimum loss , if the first one sticks to his strategy. Such strategies are called optimal . Optimal strategies must also satisfy the condition sustainability , those. It must be disadvantageous for either player to abandon their strategy in this game.

If the game is repeated quite a few times, then players may not be interested in winning and losing in each specific game, but Aaverage win (loss) in all batches.

The goal of game theory is to determine the optimal strategy for each player.

6.2. Payment matrix. Lower and top price games

The ultimate game in which the player A It has T strategies, and the player V – p strategies is called a game.

Consider the game
two players A And IN(“we” and “enemy”).

Let the player A has T personal strategies, which we denote
. Let the player IN available n personal strategies, let's designate them
.

Let each side choose a specific strategy; for us it will be , for the enemy . As a result of the players choosing any pair of strategies And (
) the outcome of the game is uniquely determined, i.e. winnings player A(positive or negative) and loss
player IN.

Let's assume that the values known for any pair of strategies ( ,). Matrix
,
, the elements of which are winnings corresponding to the strategies And , called payment matrix or matrix of the game. The rows of this matrix correspond to the player's strategies A, and the columns – the player’s strategies B. These strategies are called pure.

Game Matrix
has the form:

Consider the game
with matrix

and determine the best among strategies
. Choosing a strategy , player A must expect that the player IN will answer it using one of the strategies , for which the payoff for the player A minimal (player IN seeks to "harm" the player A).

Let us denote by player's smallest winnings A when choosing a strategy for all possible player strategies IN(smallest number in i th row of the payment matrix), i.e.

(1)

Among all the numbers (
) choose the largest:
.

Let's call
lowest price ngra, or maximum winnings (maxmin). This guaranteed win player A for any strategy of player B. Hence,

. (2)

The strategy corresponding to maximin is called maximin strategy . Player IN interested in reducing the player's winnings A, choosing a strategy , it takes into account the maximum possible gain for A. Let's denote

. (3)

Among all the numbers let's choose the smallest

and let's call top at the cost of the game or minimax win (minimax). Ego guaranteed loss of player B . Therefore,

. (4)

The strategy corresponding to minimax is called minimax strategy.

The principle that dictates players to choose the most “cautious” minimax and maximin strategies is called minimax principle . This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent.

Theorem.The lower price of the game always does not exceed the upper price of the game
.

If the upper and lower prices of the game are the same, then the total value of the upper and lower prices of the game
called the pure price of the game, or at the cost of the game. Minimax strategies corresponding to the price of the game are optimal strategies , and their totality - optimal solution or solution of the game. In this case the player A receives the maximum guaranteed (independent of the player’s behavior) IN) winnings v, and the player IN achieves the minimum guaranteed (regardless of the player’s behavior A) losing v. They say that the solution to the game has stability , those. if one of the players sticks to his optimal strategy, then it cannot be profitable for the other to deviate from his optimal strategy.

If one of the players (for example A) sticks to his optimal strategy, and the other player (IN) will deviate from its optimal strategy in any way, then for the player who made the deviation, it can never be profitable; such player deviation IN can at best leave the winnings unchanged. and in worst case– increase it.

On the contrary, if IN adheres to its optimal strategy, and A deviates from its own, then this can in no way be beneficial for A.

A couple of pure strategies And gives an optimal solution to the game if and only if the corresponding element is both the largest in its column and the smallest in its row. This situation, if it exists, is called saddle point. In geometry, a point on a surface that has the property of having a simultaneous minimum in one coordinate and a maximum in another is called saddle point, by analogy this term is used in game theory.

The game for which
,
called game with a saddle point. Element , which has this property, is a saddle point of the matrix.

So, for every game with a saddle point, there is a solution that determines a pair of optimal strategies for both sides, differing in the following properties.

1) If both sides stick to their optimal strategies, then the average payoff is equal to the net cost of the game v, which is simultaneously its lower and upper price.

2) If one of the parties adheres to its optimal strategy, and the other deviates from its own, then the deviating party can only lose and in no case can increase its winnings.

The class of games that have a saddle point is of great interest from both theoretical and practical points of view.

In game theory, it is proven that, in particular, every game with complete information has a saddle point, and, therefore, every such game has a solution, i.e., there is a pair of optimal strategies of both sides, giving an average payoff equal to the cost of the game. If a game with complete information consists only of personal moves, then when each side applies its optimal strategy, it should always end in a well-defined outcome, namely, a win exactly equal to the cost of the game.

http://emm. *****/lect/lect5.html#vopros2

(looks better on the Internet than this copy)

Elements of Game Theory

1. Basic concepts and definitions. Subject of game theory.
2. Zero-sum games. The solution is in pure strategies.
3. Solving games in mixed strategies.
4. Geometric interpretation of games.
5. Reducing the pairs game to the problem linear programming.
6. General scheme solutions to zero-sum paired games.
7. Use of alternative criteria for determining optimal strategies.

1. Basic concepts and definitions. Subject of game theory

Quite often in my practical activities a person has to face problems in which it is necessary to make a decision in conditions where two or more parties pursue different goals, and the results of any action of each party depend on the activities of the partner. Such situations, which arise, for example, when playing chess, checkers or dominoes, are classified as conflictual: The result of each player's move depends on the opponent's counter move. What this response will be is not known in advance, so they say that the decision has to be made under conditions of uncertainty. The goal of the game is for one of the participants to win.

In economics, conflict situations occur frequently and are diverse in nature. These include, for example, relationships between supplier and consumer, buyer and seller, bank and client. In these examples, the conflict situation is generated by the difference in the interests of the partners and the desire of each of them to make decisions that realize their goals to the greatest extent. At the same time, everyone has to take into account not only their own goals, but also the goals of their partner, and take into account the decisions unknown in advance that these partners will make.

The situation is called conflict, if it involves parties whose interests are completely or partially opposite.

There are scientifically based methods for rationally solving problems with conflict situations. Such methods have been developed mathematical theory conflict situations, which is called game theory.

A game- this is an actual or formal conflict in which there are at least two participants (players), each of whom strives to achieve their own goals.

The permissible actions of each player aimed at achieving a certain goal are called rules of the game.

The game is called steam room, if it involves two players, and multiple, if the number of players is more than two. In what follows, we will consider only doubles games. This game involves two players - A and B, whose interests are opposite. The game (game process) will be understood as a series of actions on the part of A and B.

Quantifying the results of a game is called payment.

The doubles game is called zero sum game, or antagonistic, if the amount of payments is zero, that is, the gain of one player is equal to the loss of the other. In this case for complete task game, it is enough to indicate one of the quantities. If, for example, a– winnings of one of the players, b- the other's winnings, then for a zero-sum game b = -a, therefore it is enough to consider, for example, a.

Within this course We will consider zero-sum paired games.

The choice and implementation of one of the actions provided for by the rules is called progress player. Moves can be personal and random.

Personal move is a conscious choice by a player of one of the possible actions (for example, a move in a chess game).

Random move is a randomly chosen action (for example, choosing a card from a shuffled deck).

In the future, we will consider only the personal moves of the players.

Player strategy is a set of rules that determine the choice of his action at each personal move, depending on the current situation.

Usually during the game, with each personal move, the player makes a choice depending on the specific situation. However, in principle, it is possible that decisions are made by the player in advance (in response to any given situation). This means that the player has chosen a specific strategy, which can be specified as a list of rules or a program.

The game is called ultimate, if each player has a finite number of strategies, and endless- otherwise.

The player's strategy is called optimal, if it provides the player with the maximum win (or, what is the same, the minimum loss), provided that the second player adheres to his strategy.

If the game is repeated many times, then players may be interested not in winning and losing in each specific game, but in average win (loss) in all batches.

In order to solve the game, or find a solution to the game, it is necessary for each player to choose the optimal strategy.

Thus, subject of game theory constitute methods for finding optimal player strategies.

When choosing an optimal strategy, it is natural to assume that both players behave reasonably in terms of their interests. The most important limitation of game theory is the uniqueness of payoff as an indicator of efficiency, while in most real economic problems there is more than one indicator of efficiency. In addition, in economics, as a rule, there are tasks in which the interests of partners are not necessarily antagonistic. However, solving games in the presence of many participants with consistent interests is a much more difficult task.

We will limit ourselves to considering zero-sum paired games.

2. Zero-sum games. Solution in pure strategies

Let's consider the steam room the end game.

Let player A have m personal strategies: A1, A2, …, Am. Let player B have n personal strategies. Let us denote them B1, B2, …, Bn. In this case, the game has a dimension mxn..gif" width="39" height="17 src=">) the outcome of the game is uniquely determined, i.e. the winning aij of player A (positive or negative) and the loss (- aij) of player IN.

Let us assume that the values ​​of aij are known for any pair of strategies (Ai, Bj).

Matrix A = (aij), 230" style="width:172.5pt">

a11 a12 ... a1n
a21 a22 ... a2n

The payment matrix is ​​also often presented in table form (see Table 5.1).

Table 5.1 - General form payment matrix

The rows of matrix A correspond to the strategies of the first player, and the columns correspond to the strategies of the second.

These strategies are called clean.

Example 5.1. Create a payoff matrix for the next game (Search game).

Player A can hide in one of two shelters (I or II); player B looks for player A, and if he finds it, he receives a fine of 1 monetary unit from A, otherwise he pays player A 1 monetary unit.

Solution.

In order to create a payment matrix, you should analyze the behavior of each player. Player A can hide in shelter I - let's denote this strategy by A1, or in shelter II - strategy A2.

Player B can look for the first player in shelter I - strategy B1, or in shelter II - strategy B2. If player A is in shelter I and player B discovers him there, i.e., a pair of strategies (A1, B1) is implemented, then player A pays a penalty, i.e. a11 = -1. Similarly a22 = -1.

Obviously, combinations of strategies (A1, B2) and (A2, B1) give player A a payoff equal to one, so a12 = a21 = 1.

Thus, for the game "Search" size 2x2 we get the following payment matrix:

A (hiding) =

Let's consider a game of size mxn with matrix A = (aij), https://pandia.ru/text/78/456/images/image002_132.gif" width="39" height="17 src=">and determine the best among strategies A1, A2, …, Am.

When choosing a strategy Ai, player A must expect that player B will respond to it with one of the strategies Bj for which the payoff of player A is minimal (player B seeks to “harm” player A).

Let's denote MsoNormalTable">

Among the numbers https://pandia.ru/text/78/456/images/image010_40.gif" width="44" height="16 src=">) we will choose the largest. Let's call lower price of the game or maximum winnings (maximin). This guaranteed payoff of player A for any strategy of player B.

The final formula can be written as follows:

The strategy corresponding to maximin is called maximin strategy.

Similar reasoning can be carried out in relation to player B.

Player B is interested in reducing Player A's payoff.

When choosing strategy Bj, he takes into account that player A will strive for the maximum win.

Let's denote https://pandia.ru/text/78/456/images/image015_30.gif" width="10" height="17 src=">.gif" width="58 height=23" height="23" >and let's call top price of the game or minimax. This minimum guaranteed loss for player B.

Thus:

The strategy corresponding to minimax is called minimax strategy.

The principle that dictates players to choose the most “cautious” maximin and minimax strategies is called minimax principle. This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent.

The player chooses his actions, assuming that the enemy will act in an unfavorable manner, that is, he will try to “harm.”

Let's go back to example 5.1 and determine the lower and upper price of the game in the “Search” problem.

Consider the payment matrix:

When choosing strategy A1 (the first row of the matrix), the minimum payoff is https://pandia.ru/text/78/456/images/image012_33.gif" width="10" height="8 src=">2 = min (- 1; 1) = -1, it is achieved when player B uses strategy B2.

Guaranteeing yourself the maximum win for any strategy of player B, i.e..gif" width="10" height="8 src=">.gif" width="7" height="14">1 = max (-1 ; 1) = 1.

Similarly, the maximum loss of player B when he chooses strategy B2 (second column) is 2 = max (1; -1) = 1.

Thus, for any strategy of player A, the guaranteed minimum loss of player B is = min (1, 2) = min (1, 1) = 1 - the upper price of the game.

Any strategy of player B is minimax.

We summarize the results of our reasoning in table 5.2, which is a payment matrix with an additional row j and column i. At their intersection we will write down the upper and lower prices of the game.

Table 5.2 - Payment matrix Search game with extra row and column

Thus, in the problem under consideration, the lower and upper prices of the game are different: https://pandia.ru/text/78/456/images/image017_28.gif" height="14 src=">.

If the upper and lower prices of the game coincide, then the total value of the upper and lower prices v = https://pandia.ru/text/78/456/images/image017_28.gif" height="14 src=">is called at the pure price of the game, or simply at the cost of the game. The maximin and minimax strategies corresponding to the price of the game are optimal strategies, and their totality – optimal solution, or simply game solution.

In this case, player A receives the maximum guaranteed (independent of the behavior of player B) winnings v, and player B achieves the minimum guaranteed (independent of the behavior of player A) loss v. They say that the solution to the game has stability, i.e., if one of the players adheres to its optimal strategy, then it cannot be profitable for the other to deviate from its optimal strategy.

A pair of pure strategies Ai and Bj gives an optimal solution to the game if and only if its corresponding element aij is both the largest in its column and the smallest in its row.

This situation, if it exists, is called saddle point(similar to the surface of a saddle, which curves up in one direction and down in the other).

Thus, for a game with a saddle point, finding a solution involves choosing a maximin and minimax strategy, which are optimal.

Example 5.2. Determine the lower and upper price of the game, which is given by the following payoff matrix:

0,5 0,6 0,8
0,9 0,7 0,8
0,7 0,6 0,6

Solution.

Let's find out whether the game has a saddle point. It is convenient to carry out the solution in a table. Table 5.3 includes the game's payoff matrix as well as an additional row and column that illustrate the process of finding optimal strategies.

Table 5.3 - Payment matrix of example 5.2 with additional row and column

Let's give some explanations.

Column https://pandia.ru/text/78/456/images/image012_33.gif" width="10" height="8 src=">1 = 0.5; 2 = 0.7; 3 = 0, 6 - minimum numbers in lines.

Similarly, https://pandia.ru/text/78/456/images/image017_28.gif" width="7 height=14" height="14">2 = 0.7; 3 = 0.8 - maximum numbers in columns.

3. Solving games in mixed strategies

So, for a game with a saddle point, finding a solution consists of choosing the maximin and minimax strategies, which are optimal.

If the game does not have a saddle point, then the use of pure strategies does not provide an optimal solution to the game. For example, in the game "Search" ( example 5.1) there is no saddle point.

In this case, you can obtain the optimal solution by alternating pure strategies.

A mixed strategy for player A is the use of pure strategies A1, A2, …, Am with probabilities u1, u2, …, um.

Usually the mixed strategy of the first player is denoted as a vector: U = (u1, u2, ..., um), and the strategy of the second player as a vector: Z = (z1, z2, ..., zm).

It's obvious that:

ui ≥ 0, ,
zj ≥ 0, ,
ui = 1, zj = 1.

Optimal game solution(or simply - game solution) is a pair of optimal strategies U*, Z*, generally mixed, having the following property: if one of the players adheres to his optimal strategy, then it cannot be profitable for the other to deviate from his. The payoff corresponding to the optimal solution is called at the cost of the game v. The price of the game satisfies the inequality:

The following fundamental theorem of game theory is true.

Neumann's theorem. Every finite zero-sum game has a mixed strategy solution..

Let U* = (, https://pandia.ru/text/78/456/images/image030_23.gif" width="15" height="17 src=">) and Z* = (, https:// pandia.ru/text/78/456/images/image033_22.gif" width="13" height="17 src=">) - a pair of optimal strategies. If a pure strategy is included in an optimal mixed strategy with probability different from zero, then it is called active.

Active Strategies Theorem. If one of the players sticks to his optimal mixed strategy, then the payoff remains unchanged and equal to the game price v, if the second player does not go beyond the limits of his active strategies..

This theorem is of great practical importance - it provides specific models for finding optimal strategies in the absence of a saddle point.

Consider the game of size 2 x2 .

Such a game is the simplest case of a finite game. If such a game has a saddle point, then the optimal solution is a pair of pure strategies corresponding to this point.

For a game in which there is no saddle point in accordance with Neumann’s theorem, an optimal solution exists and is determined by a pair of mixed strategies U* = (https://pandia.ru/text/78/456/images/image029_24.gif" width="13 " height="17 src=">) and Z* = (, https://pandia.ru/text/78/456/images/image029_24.gif" width="13" height="17"> = v. Considering that + = 1, we obtain a system of equations:

Consider a game with a matrix

The letter i will denote the number of our strategy, and the letter the number of the enemy’s strategy.

Let's discard the question of mixed strategies and consider only pure ones for now. Let's set the task: to determine the best among our strategies. Let's analyze each of them sequentially, starting with and ending with. When choosing, we must expect that the enemy will respond to it with the strategy for which our winnings are minimal. Let's find the smallest number in the line and denote it

(the sign indicates the minimum value of this parameter for all possible

Let's write down the numbers (row minimums) next to the matrix on the right in the form of an additional column:

When choosing a strategy, we must count on the fact that as a result of the enemy’s reasonable actions we will only win. Naturally, acting most carefully (i.e., avoiding any risk), we must prefer to others the strategy for which the number is maximum. Let's denote this maximum value

or. taking into account formula (4.1),

The value a is called the lower price of the game, otherwise the maximin payoff or maximin. The strategy of player A that corresponds to maximin a is called maximin strategy.

Obviously, if we adhere to the maximin strategy, then regardless of the enemy’s behavior we are guaranteed a win, at least not less than a. Therefore, the value a is called the “lower price of the game.” This is the guaranteed minimum that we can provide for ourselves by adhering to our most cautious (“reinsurance”) strategy.

Obviously, a similar reasoning can be carried out for opponent B. He is interested in reducing our winnings to a minimum; this means that he must look through all his strategies, highlighting the maximum winning value for each of them. Let us write down the matrix (4.2) maximum values ​​by columns:

and find their minimum:

(4.4)

The value is called the upper price of the game, otherwise the minimax win or minimax. The opponent's winning strategy is called his minimax strategy. By sticking to his most cautious minimax strategy, the opponent is guaranteed that in any case he will lose no more than p.

The principle of caution, which dictates that players choose appropriate strategies (maximin and minimax), is fundamental in game theory and is called the minimax principle. It follows from the assumption that each player is reasonable, striving to achieve a goal opposite to the opponent’s goal. The most “cautious” maximin and minimax strategies are often denoted general term"minimax strategies".

Let us determine the lower and upper prices of the game, as well as minimax strategies, for the three examples discussed in the previous paragraph.

Example 1. (Search game). Determining the row minima and column maxima we get

Since the values ​​, are constant and equal to -1, respectively, and the lower and upper prices of the game are also equal to -1 and

Any strategy of player A is his maximin strategy, and any strategy of player B is his minimax strategy. The conclusion is trivial: by adhering to any of his strategies, player A can guarantee that he will lose no more than 1 ruble; Player B can guarantee the same for any of his strategies.

Example 2. (Three fingers game). By writing out the minimums of the rows and maximums of the columns, we will find the lower price of the game and the upper one (highlighted in bold in the table). Our maximin strategy (by applying it systematically, we guarantee that we will win no less than -3, i.e. we will lose no more than 3).

The enemy’s minimax strategy is any of the strategies, using them systematically, he can guarantee that he will not give up more than 4. If we deviate from our maximin strategy (for example, choose A 2), then the enemy can “punish” us for this by applying and reducing our winning and the opponent's retreat from his minimax strategy can be “punished” by increasing his loss to 6.

Let us pay attention to the fact that minimax strategies in in this case not stable. Indeed, let, for example, the opponent choose one of his minimax strategies and stick to it. Having learned this, we will move on to strategy and win 4. The enemy will respond with strategy and win 5; to this we, in turn, will respond with a strategy and win 4, etc. Thus, the situation in which both players use their minimax strategies is unstable and can be violated by the received information about the strategy used by the other side. However, such instability is not always observed; We will see this in the following example.

Example 3. (Game “weapons and aircraft”). Determine row minimums and column maximums:

In this case lowest price game is equal to the top:

Minimax strategies are stable: if one of the players adheres to his minimax (maximin) strategy, then the other player cannot improve his position by deviating from his.

Thus, we see that there are games for which the lower price is equal to the upper:

These games occupy a special place in game theory and are called saddle point games. In the matrix of such a game there is an element that is both minimal in its row and maximal in its column; such an element is called a saddle point” (by analogy with a saddle point on a surface, where a minimum is achieved along one coordinate and a maximum along another).

The total value of the lower and upper price of the game

called the net price of the game.

The saddle point corresponds to a pair of minimax strategies; these strategies are called optimal, and their combination is called a solution to the game. The solution of the game has the following property: if one of the players adheres to his optimal strategy, then it cannot be profitable for the other to deviate from his optimal one (such a deviation will either leave the situation unchanged or worsen it).

Indeed, in a game with a saddle point, let player A stick to his optimal strategy, and player B stick to his. As long as this is so, the payoff remains constant and equal to the game price v. Now suppose that B deviated from his optimal strategy. Since v is the minimum element in its row, such a deviation cannot be beneficial to B; Likewise, for A, if B adheres to his optimal strategy, deviation from his cannot be beneficial.

We see that for a game with a saddle point, minimax strategies are stable. A pair of optimal strategies in a game with a saddle point is like an equilibrium position: a deviation from the optimal strategy causes a change in the payoff that is disadvantageous for the deviating player and forces him to return to his optimal strategy.

The net price of play v in a saddle point game is the value of the payoff that, in a game against a reasonable opponent, player A cannot increase and player B cannot decrease.

Note that there may be more than one saddle point in the payment matrix, but several.

For example, there are six saddle points in the matrix, with general meaning winnings and corresponding pairs of optimal strategies: It is easy to prove (we will not do this) that if there are several saddle points in the game matrix, then they all give the same payoff value.

Example. Side A - air defense systems - defends a section of territory from an air raid, having two guns No. 1 and No. 2, the coverage areas of which do not overlap (Fig. 9.1). Each weapon can only fire at an aircraft passing through its coverage area, but to do this it must track it in advance (before the target enters the area) and generate targeting data. If the target is fired at, it is hit with a probability. Side B has two aircraft, each of which can be directed to any zone At the moment when side A carries out target distribution (assigns which weapon to shoot at which target), the movement of target aircraft No. 1 is directed into the range of action of gun No. 1, and target No. 2 is directed into the range of action of gun No. 2 However, after making a decision on target distribution, each target can maneuver using a “deception maneuver” (see dotted arrows in Figure 9.1).

The task of side A is to maximize, and side B’s task is to minimize the number of targets hit. Find a solution to the game (optimal strategies of the parties)

Solution. Side A (air defense weapons) has four possible strategies - each gun monitors the target heading into its zone,

The guns track targets “crosswise” (each one tracking a target heading towards its neighbor),

Both guns are tracking target No. 1,

Both guns are tracking target No. 2. Side B (target aircraft) also has four strategies:

Both intact do not change direction,

Both targets use deception.

The first target uses a deception maneuver, but the second does not,

The second target uses a deception maneuver, but the first does not.

The result is a 4X4 game, the matrix of which is given in the table:

By finding the minima of the rows and the maxima of the columns, we are convinced that the lower price of the game is equal to the upper price of the game: this means that the game has a saddle point and a solution in pure strategies, leading to the net price of the game. In this case, there is not one, but four saddle points. Each of them corresponds to a pair of optimal strategies that give a solution to the game. The price of the game means that with optimal behavior of the parties, the planes will inevitably lose one plane, and no tricks will help them lose less, and the means Air defense - shoot down more than one aircraft This state of equilibrium is achieved when both sides use their optimal strategies: both guns track the same aircraft (any), and the aircraft are sent after target distribution to the same zone (any)

The class of games that have a saddle point is very interesting from both theoretical and practical points of view. In particular, all so-called “games with complete information” belong to it.

A game with complete information is a game in which each player, with each personal move, knows the results of all previous moves - both personal and random. Examples of games with complete information include: checkers, chess, the well-known game of tic-tac-toe, etc.

In game theory it is proven that every game with complete information has a saddle point and, therefore, a solution in pure strategies. In other words, in every game with complete information, there is a pair of optimal strategies on both sides that give a stable payoff equal to the net cost of the game. If a game with complete information consists only of personal moves, then when each side applies its optimal strategy, the game should always end with a well-defined outcome equal to the cost of the game

As an example, consider the following game with complete information. Two players alternately place identical coins on round table, choosing the position of the coin arbitrarily (mutual overlap of coins is not allowed). The winner is the one who puts in the last coin (when there is no room left for others). It is not difficult to see that the outcome of this game is predetermined, and there is a certain strategy that ensures a certain win for the player who puts in the coin first. Namely, he must place the coin in the center of the table for the first time, and then respond to each opponent’s move with a symmetrical move. Obviously, no matter how the enemy behaves, he cannot avoid losing. Therefore, the game makes sense only for people who do not know its solution. The situation is exactly the same with chess and other games with complete information; any of these games has a saddle point and, therefore, a solution that indicates to each player his optimal strategy, so the game makes sense only as long as the solution is unknown. A solution to the chess game has not been found (and is unlikely to be found in the foreseeable future) only because the number of strategies (combinations of moves) in chess is too large to be able to construct a payoff matrix and find a saddle point in it.

Lecture 4

Topic: “ELEMENTS OF GAME THEORY”

The concept of game models

In practice, we often encounter problems in which it is necessary to make decisions under conditions of uncertainty, i.e. Situations arise in which two (or more) parties pursue different goals, and the results of any action of each party depend on the activities of the partner. Such situations that arise when playing chess, checkers, dominoes, etc., are considered conflict situations: the result of each player’s move depends on the opponent’s response move, the goal of the game is to win one of the partners. In economics, conflict situations occur very often and are of a diverse nature. These include, for example, relationships between supplier and consumer, buyer and seller, bank and client. In all these examples, the conflict situation is generated by the difference in the interests of the partners and the desire of each of them to make optimal decisions that realize their goals to the greatest extent. At the same time, everyone has to take into account not only their own goals, but also the goals of their partner, and take into account the decisions unknown in advance that these partners will make.

To competently solve problems with conflict situations, scientifically based methods are needed. Such methods are developed by the mathematical theory of conflict situations, which is called game theory .

Let's get acquainted with the basic concepts of game theory. The mathematical model of a conflict situation is called game , parties involved in the conflict - players And, and the outcome of the conflict is win . For each formalized game, rules A, those. system of conditions defining:

1) options for players’ actions;

2) the amount of information each player has about the behavior of their partners;

3) the gain that each set of actions leads to. Typically, winning (or losing) can be quantified; for example, you can value a loss as zero, a win as one, and a draw as 1/2.



The game is called steam room , if it involves two players, and multiple , if the number of players is more than two. We will only consider doubles games. They involve two players A And IN, whose interests are opposite, and by game we mean a series of actions on the part of A And IN.

The game is called zero sum game , or antagonistic th, if the gain of one of the players is equal to the loss of the other, i.e. To complete the game task, it is enough to indicate the value of one of them. If we designate A - winnings of one of the players, b - the other's winnings, then for a zero-sum game b = -A, therefore it is enough to consider, for example A .

The choice and implementation of one of the actions provided for by the rules is called progress player. Moves can be personal and random. Personal move - this is a conscious choice by the player of one of the possible actions (for example, a move in a chess game). Random move - it is a randomly chosen action (for example, choosing a card from a shuffled deck). In the future, we will consider only the personal moves of the players.

Strategy A player is a set of rules that determine the choice of his action at each personal move, depending on the current situation. Usually during the game, with each personal move, the player makes a choice depending on the specific situation. However, in principle, it is possible that all decisions are made by the player in advance (in response to any given situation). This means that the player has chosen a specific strategy, which can be specified as a list of rules or a program. (This way you can play the game using a computer.) The game is called Certainly th, if each player has a finite number of strategies, and endless - otherwise.

In order to decide game, or find a solution to the game, you should choose a strategy for each player that satisfies the condition optimality And, those. one of the players must receive maximum win , when the second one sticks to his strategy. At the same time, the second player must have minimum loss , if the first one sticks to his strategy. Such strategists And are called optimal. Optimal strategies must also satisfy the condition sustainability , those. It must be disadvantageous for either player to abandon their strategy in this game.

If the game is repeated quite a few times, then players may be interested not in winning and losing in each specific game, but in average win (loss) in all batches.

The goal of game theory is to determine the optimal strategy for each player. When choosing an optimal strategy, it is natural to assume that both players behave reasonably in terms of their interests. The most important limitation of game theory is the uniqueness of payoff as an indicator of efficiency, while in most real economic problems there is more than one indicator of efficiency. In addition, in economics, as a rule, problems arise in which the interests of partners are not necessarily antagonistic. The development of game theory apparatus for solving problems with many participants having consistent interests is beyond the scope of the lecture.

Payment matrix.

Lower and upper price of the game

Consider a paired finite game. Let the player A has T personal strategies, which we denote by . Let the player IN available P personal strategies, let's denote them . They say the game has dimensions t x p . As a result of the players’ choice of any pair of strategies, the outcome of the game is uniquely determined, i.e. player's winnings A (positive or negative) and player loss IN . Assume that the values ​​are known for any pair of strategies . Matrix Р=(), i = 1,2, ..., t; j = 1, 2, ..., n , elements of which are winnings corresponding to strategies and , called payment matrix or matrix of the game . The general appearance of such a matrix is ​​presented in the table:

The rows of this table correspond to the player's strategies A , and the columns - the player's strategies IN .

Example.Create a payoff matrix for the next game. Search game.

Player A can hide in one of two shelters (I and II); player IN looking for a player A , and if he finds it, he receives a fine of 1 den. units from A , otherwise pays the player A 1 day units It is necessary to build the payment matrix of the game.

Solution. To compile a payment matrix, you should analyze the behavior of each player. Player A can hide in shelter I - let's denote this strategy by or in shelter II - strategy.

Player IN can look for the first player in the shelter I - strategy , or to the shelter II - strategy. If the player A is located in Vault I and is discovered by the player there IN , those. a pair of strategies is implemented then the player A pays a fine, i.e. . Similarly we get . Obviously, strategies give the player A win 1, therefore . So for the game" search"of size 2x2 we get a payment matrix

Consider the game t p with matrix, i =1, 2, ..., T ; j =1, 2, ..., P and determine the best among strategies . Player choosing strategy A must expect that the player IN will answer it using one of the strategies , for which the payoff for the player A minimal (player IN seeks to "harm" the player A ).

Let us denote by , the player’s smallest payoff A when choosing a strategy , for all possible player strategies IN (smallest number in i -and the line of the payment matrix), i.e.

, j =1,…n . (1)

Among all the numbers Let's choose the largest: . Let's call the lowest price of the game, or maximum winnings (maximin). This guaranteed winnings for the player A for any player strategy IN. Hence,

. (2)

The strategy corresponding to maximin is called maximin strategy . Player IN interested in reducing the player's winnings A ; choosing a strategy , it takes into account the maximum possible gain for A . Let's denote

Among all the numbers, choose the smallest one and call it top price of the game or minimax win (minimax). This guaranteed player loss IN. Therefore,

(4)

The strategy corresponding to minimax is called minimax strategy.

The principle that dictates players to choose the most “cautious” minimax and maximin strategies is called the minimax principle. This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent.

Example. Determine the lower and upper prices of the game and the corresponding strategies in the game " Search" Consider the payment matrix:

When choosing a strategy (first row of the matrix), the minimum payoff is and corresponds to the player's strategy IN . When choosing a strategy (second row of the matrix), the minimum payoff is equal to , it is achieved with the strategy.

Guaranteeing yourself the maximum win for any player strategy IN, those. lower price of the game, player A can choose any strategy: or , those. any of his strategies is maximin.

When choosing a strategy (column 1), the player IN understands that the player A will respond with a strategy to maximize his gain (loss IN ). Therefore, the player's maximum loss is IN when choosing a strategy it is equal to max (-l; 1) = 1.

Similarly, the maximum loss of a player IN (winning A ) when he chooses a strategy (column 2) is equal to .

Thus, for any player strategy A guaranteed minimum player loss IN equal to the top price of the game.

Any player strategy IN is minimax. By adding a row and a column to the table, we get a new table:

-1 -1
-1 -1
1

At the intersection of the additional rows and columns we will write down the upper and lower prices of the games.

A zero-sum game in which each player has a finite set of strategies at his disposal. Rules matrix game is determined by a payment matrix, the elements of which are the winnings of the first player, which are also the losses of the second player.

Matrix game is an antagonistic game. The first player receives the maximum guaranteed (independent of the behavior of the second player) winnings, equal to the price of the game; similarly, the second player achieves the minimum guaranteed loss.

Under strategy is understood as a set of rules (principles) that determine the choice of action for each personal move of the player, depending on the current situation.

Now about everything in order and in detail.

Payment matrix, pure strategies, game price

IN matrix game its rules are determined payment matrix .

Consider a game in which there are two participants: the first player and the second player. Let the first player have at his disposal m pure strategies, and at the disposal of the second player - n pure strategies. Since the game is being considered, it is natural that in this game there are wins and there are losses.

IN payment matrix the elements are numbers expressing the players' wins and losses. Wins and losses can be expressed in points, amount of money or other units.

Let's create a payment matrix:

If the first player chooses i-th pure strategy, and the second player - j th pure strategy, then the payoff of the first player will be aij units, and the loss of the second player is also aij units.

Because aij + (- a ij) = 0, then the described game is a zero-sum matrix game.

The simplest example of a matrix game is coin toss. The rules of the game are as follows. The first and second players throw a coin and the result is either heads or tails. If "heads" and "heads" or "tails" or "tails" are thrown at the same time, then the first player will win one unit, and in other cases he will lose one unit (the second player will win one unit). The same two strategies are at the disposal of the second player. The corresponding payment matrix will be as follows:

The task of game theory is to determine the choice of the first player's strategy, which would guarantee him the maximum average win, as well as the choice of the second player's strategy, which would guarantee him the maximum average loss.

How do you choose a strategy in a matrix game?

Let's look at the payment matrix again:

First, let's determine the amount of winnings for the first player if he uses i th pure strategy. If the first player uses i th pure strategy, then it is logical to assume that the second player will use such a pure strategy due to which the first player’s payoff would be minimal. In turn, the first player will use such a pure strategy that would provide him with the maximum win. Based on these conditions, the winnings of the first player, which we denote as v1 , called maximin winnings or lower price of the game .

At for these values, the first player should proceed as follows. From each line, write down the value of the minimum element and select the maximum one from them. Thus, the first player's winnings will be the maximum of the minimum. Hence the name - maximin winning. The line number of this element will be the number of the pure strategy that the first player chooses.

Now let’s determine the amount of loss for the second player if he uses j th strategy. In this case, the first player uses his own pure strategy in which the loss of the second player would be maximum. The second player must choose a pure strategy in which his loss would be minimal. The loss of the second player, which we denote as v2 , called minimax loss or top price of the game .

At solving problems on the price of the game and determining the strategy To determine these values ​​for the second player, proceed as follows. From each column, write down the value of the maximum element and select the minimum from them. Thus, the loss of the second player will be the minimum of the maximum. Hence the name - minimax win. The column number of this element will be the number of the pure strategy that the second player chooses. If the second player uses "minimax", then regardless of the choice of strategy by the first player, he will lose no more than v2 units.

Example 1.

.

The largest of the smallest elements of the rows is 2, this is the lower price of the game, the first row corresponds to it, therefore, the maximin strategy of the first player is the first. The smallest of the largest elements of the columns is 5, this is the upper price of the game, the second column corresponds to it, therefore, the minimax strategy of the second player is the second.

Now that we have learned to find the lower and upper price of the game, the maximin and minimax strategies, it’s time to learn how to formally define these concepts.

So, the guaranteed win for the first player is:

The first player must choose a pure strategy that would provide him with the maximum of the minimum winnings. This gain (maximin) is denoted as follows:

.

The first player uses his pure strategy so that the loss of the second player is maximum. This loss is indicated as follows:

The second player must choose his pure strategy so that his loss is minimal. This loss (minimax) is indicated as follows:

.

Another example from the same series.

Example 2. Given a matrix game with a payoff matrix

.

Determine the maximin strategy of the first player, the minimax strategy of the second player, the lower and upper price of the game.

Solution. To the right of the payment matrix, we will write out the smallest elements in its rows and note the maximum of them, and below the matrix - the largest elements in the columns and select the minimum of them:

The largest of the smallest elements of the lines is 3, this is the lower price of the game, the second line corresponds to it, therefore, the maximin strategy of the first player is the second. The smallest of the largest elements of the columns is 5, this is the upper price of the game, the first column corresponds to it, therefore, the minimax strategy of the second player is the first.

Saddle point in matrix games

If the upper and lower prices of the game are the same, then the matrix game is considered to have a saddle point. The converse is also true: if a matrix game has a saddle point, then the upper and lower prices of the matrix game are the same. The corresponding element is both the smallest in the row and the largest in the column and is equal to the price of the game.

Thus, if , then is the optimal pure strategy of the first player, and is the optimal pure strategy of the second player. That is, equal lower and upper game prices are achieved using the same pair of strategies.

In this case matrix game has a solution in pure strategies .

Example 3. Given a matrix game with a payoff matrix

.

Solution. To the right of the payment matrix, we will write out the smallest elements in its rows and note the maximum of them, and below the matrix - the largest elements in the columns and select the minimum of them:

The lower price of the game coincides with the upper price of the game. Thus, the price of the game is 5. That is . The price of the game is equal to the value of the saddle point. The first player's maxin strategy is the second pure strategy, and the second player's minimax strategy is the third pure strategy. This matrix game has a solution in pure strategies.

Solve a matrix game problem yourself, and then look at the solution

Example 4. Given a matrix game with a payoff matrix

.

Find the lower and upper price of the game. Does this matrix game have a saddle point?

Matrix games with optimal mixed strategy

In most cases, a matrix game does not have a saddle point, so the corresponding matrix game has no solutions in pure strategies.

But it has a solution in optimal mixed strategies. To find them, you need to assume that the game is repeated a sufficient number of times so that, based on experience, you can guess which strategy is more preferable. Therefore, the decision is associated with the concept of probability and average (mathematical expectation). In the final solution there is both an analogue of the saddle point (that is, the equality of the lower and upper prices of the game), and an analogue of the strategies corresponding to them.

So, in order for the first player to get the maximum average win and for the second player to have a minimum average loss, pure strategies should be used with a certain probability.

If the first player uses pure strategies with probabilities , then the vector is called a mixed first player strategy. In other words, it is a “mixture” of pure strategies. In this case, the sum of these probabilities is equal to one:

.

If the second player uses pure strategies with probabilities , then the vector is called a second player mixed strategy. In this case, the sum of these probabilities is equal to one:

.

If the first player uses a mixed strategy p, and the second player - a mixed strategy q, then it makes sense expected value the first player's win (the second player's loss). To find it, you need to multiply the first player's mixed strategy vector (which will be a one-row matrix), the payoff matrix and the second player's mixed strategy vector (which will be a one-column matrix):

.

Example 5. Given a matrix game with a payoff matrix

.

Determine the mathematical expectation of the first player's win (the second player's loss), if the first player's mixed strategy is , and the second player's mixed strategy is .

Solution. According to the formula for the mathematical expectation of the first player’s win (the second player’s loss), it is equal to the product of the first player’s mixed strategy vector, the payment matrix and the second player’s mixed strategy vector:

The first player is called such a mixed strategy that would provide him with the maximum average payoff if the game is repeated a sufficient number of times.

Optimal mixed strategy the second player is called such a mixed strategy that would provide him with a minimum average loss if the game is repeated a sufficient number of times.

By analogy with the maximin and minimax notations, in the case of pure strategies the optimal mixed strategies are designated as follows (and are linked to the mathematical expectation, that is, the average, of the first player’s win and the second player’s loss):

,

.

In this case, for the function E there is a saddle point , which means equality.

In order to find optimal mixed strategies and a saddle point, that is, solve a matrix game in mixed strategies , you need to reduce the matrix game to a linear programming problem, that is, to an optimization problem, and solve the corresponding linear programming problem.

Reducing a matrix game to a linear programming problem

In order to solve a matrix game in mixed strategies, you need to construct a straight line linear programming problem And dual task. In the dual problem, the extended matrix, which stores the coefficients of the variables in the system of constraints, free terms and coefficients of the variables in the objective function, is transposed. In this case, the minimum of the goal function of the original problem is matched to the maximum in the dual problem.

Goal function in a direct linear programming problem:

.

System of constraints in a direct linear programming problem:

The goal function in the dual problem is:

.

System of restrictions in the dual problem:

The optimal plan for a direct linear programming problem is denoted by

,

and the optimal plan for the dual problem is denoted by

We denote the linear forms for the corresponding optimal plans by and ,

and they need to be found as sums of the corresponding coordinates of optimal plans.

In accordance with the definitions of the previous paragraph and the coordinates of optimal plans, the following mixed strategies of the first and second players are valid:

.

Theoretical mathematicians have proven that game price is expressed in the following way through the linear forms of optimal plans:

,

that is, it is the reciprocal of the sums of coordinates of optimal plans.

We, practitioners, can only use this formula to solve matrix games in mixed strategies. Like formulas for finding optimal mixed strategies the first and second players respectively:

in which the second factors are vectors. Optimal mixed strategies are also, as we already defined in the previous paragraph, vectors. Therefore, multiplying the number (game price) by a vector (with the coordinates of optimal plans) we also obtain a vector.

Example 6. Given a matrix game with a payoff matrix

.

Find the price of the game V and optimal mixed strategies and .

Solution. We create a linear programming problem corresponding to this matrix game:

We obtain a solution to the direct problem:

.

We find the linear form of the optimal plans as the sum of the found coordinates.