Paired finite game with a payoff matrix. Payment matrix

PRACTICAL WORK No. 3

Game theory models

The concept of game models

Game theory develops recommendations for making decisions in conflict situations. When conflict situations are formalized mathematically, they can be represented as a game of two, three or more players, each of whom seeks to maximize their own winnings at the expense of the others. The mathematical model of a conflict situation is called a game, the parties involved in the conflict are called players, and the outcome of the conflict is called the winnings (payoff). Every formalized game has rules, i.e. a system of conditions defining:

1. options for players’ actions;

2. the amount of information each player has about the behavior of their partners;

3. the gain that each set of actions leads to.

As a rule, the winnings can be specified quantitatively (for example, a loss is 0, a win is 1, a draw is ½). A game is called paired (two-person) if it involves two players, and multiple (multi-person) if the number of players is more than two. A game is called a zero-sum game if the gain of one player is equal to the loss of the other. The choice and implementation of one of the actions provided for by the rules is called a move of the player. Moves can be personal or random. A personal move is a conscious choice by the player of one of the possible actions (for example, a move in a chess game); a random move is a randomly selected action (for example, choosing a card from a shuffled deck).

A player's strategy is a set of rules that determine the choice of action at each personal move, depending on the current situation. A game is called finite if each player has a finite number of strategies, and infinite otherwise.

To solve a game, or to find a solution of the game, one chooses for each player a strategy that satisfies the optimality condition: one player must receive the maximum winnings when the second sticks to his strategy, and at the same time the second player must have the minimum loss when the first sticks to his strategy. Such strategies are called optimal. The purpose of game theory is to determine the optimal strategy for each player. When choosing an optimal strategy, it is natural to assume that both players behave rationally with respect to their own interests.

Payment matrix. Lower and upper price of the game

Consider a paired finite game. Let player A have m personal strategies, denoted $A_1, A_2, \dots, A_m$, and let player B have n personal strategies, denoted $B_1, B_2, \dots, B_n$. The game is then said to have dimensions m×n. When the players choose any pair of strategies $A_i$ and $B_j$, the outcome of the game is uniquely determined: player A receives the winnings $a_{ij}$ (positive or negative) and player B receives $-a_{ij}$ (i.e. loses $a_{ij}$). The matrix $P = (a_{ij})$, whose elements are the winnings corresponding to the strategies $A_i$ and $B_j$, is called the payment matrix (payoff matrix), or the matrix of the game.

A_i \ B_j   B_1    B_2    ...   B_n
A_1         a_11   a_12   ...   a_1n
A_2         a_21   a_22   ...   a_2n
...         ...    ...    ...   ...
A_m         a_m1   a_m2   ...   a_mn

Example - the game "Search"

Player A can hide in shelter 1 (strategy $A_1$) or in shelter 2 (strategy $A_2$). Player B can look for the first player in shelter 1 (strategy $B_1$) or in shelter 2 (strategy $B_2$). If player A is in shelter 1 and is discovered there by player B, i.e. the pair of strategies $(A_1, B_1)$ is played, then player A pays a fine, so $a_{11} = -1$. Similarly, $a_{22} = -1$. The strategy pairs $(A_1, B_2)$ and $(A_2, B_1)$ give player A a payoff of 1, so $a_{12} = a_{21} = 1$. Thus, we obtain the payment matrix

$$P = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Consider an m×n game with matrix $P = (a_{ij})$ and determine the best of player A's strategies. When choosing strategy $A_i$, player A must expect that player B will respond with the strategy $B_j$ for which A's payoff is minimal (player B seeks to "harm" player A).

Let $\alpha_i$ denote the smallest winnings of player A when choosing strategy $A_i$ over all possible strategies of player B (the smallest number in the i-th row of the payment matrix), i.e. $\alpha_i = \min_{j} a_{ij}$.

Among all the numbers $\alpha_i$ we choose the largest: $\alpha = \max_{i} \alpha_i$. We call $\alpha$ the lower price (lower value) of the game, or the maximin winnings (maximin). This is the guaranteed winnings of player A for any strategy of player B. Hence, $\alpha = \max_{i} \min_{j} a_{ij}$.

The strategy corresponding to the maximin is called the maximin strategy. Player B is interested in reducing player A's winnings; when choosing strategy $B_j$, he takes into account the maximum possible winnings of A. Let us denote $\beta_j = \max_{i} a_{ij}$.

Among all the numbers $\beta_j$ we choose the smallest, $\beta = \min_{j} \beta_j$, and call $\beta$ the upper price (upper value) of the game, or the minimax winnings (minimax). This is the guaranteed loss of player B for any strategy of player A: B loses no more than $\beta$. Hence, $\beta = \min_{j} \max_{i} a_{ij}$.

The strategy corresponding to the minimax is called the minimax strategy. The principle that dictates that players choose the most cautious minimax and maximin strategies is called the minimax principle.
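The row-minimum and column-maximum procedure described above can be sketched in a few lines of Python. This is only an illustration applied to the "Search" game; the function name `lower_upper_price` is a hypothetical helper, not part of the original text.

```python
def lower_upper_price(matrix):
    """Return (alpha, beta) for a payment matrix given as a list of rows.

    alpha = max over rows of the row minimum (maximin),
    beta  = min over columns of the column maximum (minimax).
    """
    row_minima = [min(row) for row in matrix]            # alpha_i
    column_maxima = [max(col) for col in zip(*matrix)]   # beta_j
    return max(row_minima), min(column_maxima)


# Payment matrix of the "Search" game: rows A_1, A_2; columns B_1, B_2.
search_game = [[-1, 1],
               [1, -1]]

alpha, beta = lower_upper_price(search_game)
print("lower price:", alpha, "upper price:", beta)
```

For the "Search" game the two prices differ, so the cautious strategies alone do not settle the game.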

Statistical games

In many tasks that lead to games, uncertainty is caused by the lack of information about the conditions under which the action is carried out. These conditions depend not on the conscious actions of the other player, but on objective reality, which is commonly called “nature.” Such games are called games with nature (statistical games).

Task

After several years of operation, industrial equipment may be in one of the following states: $B_1$ – the equipment can be used in the coming year after preventive maintenance; $B_2$ – for trouble-free operation of the equipment in the future, individual parts and assemblies should be replaced; $B_3$ – the equipment requires major repairs or replacement.

Depending on the situation $B_1$, $B_2$, $B_3$, the management of the enterprise can make the following decisions: $A_1$ – have the equipment repaired by factory specialists, which requires costs of $a_1 = 6$, $a_2 = 10$, $a_3 = 15$ monetary units, respectively; $A_2$ – call in a special team of repairmen, in which case the costs will be $b_1 = 15$, $b_2 = 9$, $b_3 = 18$ monetary units; $A_3$ – replace the equipment with new equipment, selling the obsolete equipment at its residual value. The total costs of this measure will be $c_1 = 13$, $c_2 = 24$, $c_3 = 12$ monetary units, respectively.

Exercise

1. Represent the described situation as a game: identify its participants and indicate the possible pure strategies of each side.

2. Create a payment matrix, explaining the meaning of the elements a ij of the matrix (why are they negative?).

3. Determine which decision on operating the equipment in the coming year should be recommended to the management of the enterprise in order to minimize losses under the following assumptions: a) experience accumulated at the enterprise in operating similar equipment shows that the probabilities of the indicated equipment states are, respectively, $q_1 = 0.15$, $q_2 = 0.55$, $q_3 = 0.3$ (apply the Bayes criterion); b) existing experience indicates that all three possible states of the equipment are equally probable (apply the Laplace criterion); c) nothing definite can be said about the probabilities of the equipment states (apply the Wald, Savage and Hurwicz criteria). The value of the parameter in the Hurwicz criterion is $\gamma = 0.8$.

Solution

1) The described situation is a statistical game.

The statistician is the management of the enterprise, which can make one of the following decisions: repair the equipment on its own (strategy $A_1$), call in repairmen (strategy $A_2$), or replace the equipment with new equipment (strategy $A_3$).

The second player, nature, is taken to be the set of factors influencing the condition of the equipment: the equipment can be used after preventive maintenance (state $B_1$); individual components and parts of the equipment need to be replaced (state $B_2$); major repairs or replacement of the equipment are required (state $B_3$).

2) Let's create the payment matrix of the game:

$$P = \begin{pmatrix} -6 & -10 & -15 \\ -15 & -9 & -18 \\ -13 & -24 & -12 \end{pmatrix}.$$

The element $a_{ij}$ of the payment matrix shows the costs incurred by the enterprise if, with the chosen strategy $A_i$, the equipment ends up in state $B_j$. The elements of the payment matrix are negative because, whichever strategy is chosen, the enterprise has to bear costs.

a) the experience accumulated at the enterprise in operating similar equipment shows that the probabilities of equipment states are equal to q 1 = 0.15; q 2 =0.55; q 3 =0.3.

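Let's present the payment matrix together with the probabilities of the states of nature and the average winnings of each strategy:

Strategies of the statistician A_i (rows), states of nature B_j (columns)
A_i     B_1     B_2     B_3     Average winnings
A_1     -6      -10     -15     -10.9
A_2     -15     -9      -18     -12.6
A_3     -13     -24     -12     -18.75
q_j     0.15    0.55    0.3

where the average winnings of strategy $A_i$ are $\bar{a}_i = \sum_{j=1}^{3} q_j a_{ij}$ ($i = 1, 2, 3$).

According to the Bayes criterion, the optimal pure strategy $A_i$ is the one that maximizes the average winnings of the statistician, i.e. the one for which $\bar{a} = \max_{i} \bar{a}_i$. Here $\bar{a} = \max(-10.9, -12.6, -18.75) = -10.9$.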

The optimal strategy according to Bayes is strategy A 1 .
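The Bayes averages can be checked with a short sketch. The names `bayes_averages`, `payments` and `q` are chosen here for illustration only; the numbers are the ones given in the task.

```python
def bayes_averages(matrix, probabilities):
    """Average winnings of each strategy: sum over j of q_j * a_ij."""
    return [sum(q_j * a_ij for q_j, a_ij in zip(probabilities, row))
            for row in matrix]


# Rows: strategies A_1..A_3; columns: states of nature B_1..B_3.
payments = [[-6, -10, -15],
            [-15, -9, -18],
            [-13, -24, -12]]
q = [0.15, 0.55, 0.3]

averages = bayes_averages(payments, q)
# Index of the strategy with the largest average winnings (zero-based).
best = max(range(len(averages)), key=lambda i: averages[i])
print([round(a, 2) for a in averages], "-> strategy A_%d" % (best + 1))
```

Because the probabilities are floating-point numbers, the averages are rounded only for display.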

b) Existing experience indicates that all three possible states of the equipment are equally probable, i.e. $q_j = 1/3$.

The average winnings are:

$\bar{a}_1 = \frac{1}{3}(-6 - 10 - 15) = -31/3 \approx -10.33$;

$\bar{a}_2 = \frac{1}{3}(-15 - 9 - 18) = -42/3 = -14$;

$\bar{a}_3 = \frac{1}{3}(-13 - 24 - 12) = -49/3 \approx -16.33$.

The optimal Laplace strategy is strategy A 1 .
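The same averages can be reproduced exactly with rational arithmetic. This is a minimal sketch; the variable names are illustrative, and using `fractions.Fraction` is simply one convenient way to avoid rounding.

```python
from fractions import Fraction

# Rows: strategies A_1..A_3; columns: states of nature B_1..B_3.
payments = [[-6, -10, -15],
            [-15, -9, -18],
            [-13, -24, -12]]

n_states = len(payments[0])
# Laplace criterion: every state has probability 1/n, so the average
# winnings of a strategy are the row sum divided by n.
laplace = [Fraction(sum(row), n_states) for row in payments]
best = max(range(len(laplace)), key=lambda i: laplace[i])

for i, value in enumerate(laplace, start=1):
    print("A_%d: %s (about %.2f)" % (i, value, float(value)))
print("Laplace choice: A_%d" % (best + 1))
```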

c) Nothing definite can be said about the probabilities of the equipment states.

According to the Wald criterion, the optimal pure strategy is the one that guarantees the maximum winnings under the worst conditions, i.e.

$\max_{i} \min_{j} a_{ij} = \max(-15, -18, -24) = -15.$

Thus, strategy A 1 is optimal.

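Let's build the risk matrix $R = (r_{ij})$, where $r_{ij} = \beta_j - a_{ij}$ and $\beta_j = \max_{i} a_{ij}$ is the largest winnings attainable in state $B_j$.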
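A minimal sketch of how the three criteria of part c) could be evaluated for this payment matrix. Two points are assumptions rather than statements from the text: the risk is taken as the column maximum minus the element, and the Hurwicz score is taken as $\gamma \min_j a_{ij} + (1 - \gamma) \max_j a_{ij}$, with $\gamma$ weighting the worst case. Some textbooks attach $\gamma$ to the best case instead.

```python
# Rows: strategies A_1..A_3; columns: states of nature B_1..B_3.
payments = [[-6, -10, -15],
            [-15, -9, -18],
            [-13, -24, -12]]
gamma = 0.8  # Hurwicz parameter given in the task


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def argmin(values):
    return min(range(len(values)), key=lambda i: values[i])


row_min = [min(row) for row in payments]
row_max = [max(row) for row in payments]

# Wald: maximize the worst-case winnings.
wald_choice = argmax(row_min)

# Savage: risk = best winnings in the column minus the actual winnings
# (assumed definition); minimize the largest risk of each strategy.
col_max = [max(col) for col in zip(*payments)]
risks = [[b - a for a, b in zip(row, col_max)] for row in payments]
max_risk = [max(row) for row in risks]
savage_choice = argmin(max_risk)

# Hurwicz: weighted mix of worst and best case (assumed convention:
# gamma weights the worst case).
hurwicz = [gamma * lo + (1 - gamma) * hi for lo, hi in zip(row_min, row_max)]
hurwicz_choice = argmax(hurwicz)

print("risk matrix:", risks)
print("Wald: A_%d, Savage: A_%d, Hurwicz: A_%d"
      % (wald_choice + 1, savage_choice + 1, hurwicz_choice + 1))
```

With these numbers the choice does not depend on which Hurwicz convention is used, since strategy $A_1$ has both the largest row minimum and the largest row maximum.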

Game theory is a mathematical discipline whose subject of study is decision-making methods in conflict situations.

A situation is called a conflict situation if the interests of several (usually two) parties pursuing opposing goals collide. Each party can take a number of actions to achieve its goals, and the success of one party means the failure of the other.

In economics, conflict situations occur very often (relationships between supplier and consumer, buyer and seller, banker and client). Conflict situations occur in many other areas.

A conflict situation is generated by the difference in interests of partners and the desire of each of them to make optimal decisions that realize their goals to the greatest extent. At the same time, everyone has to take into account not only their own goals, but also the goals of their partner, and take into account the decisions unknown in advance that the partners will make.

Typically, conflict situations are difficult to analyze directly because of the many secondary factors involved. To make mathematical analysis of a conflict situation possible, it is necessary to simplify it, taking into account only the main factors. A simplified formalized model of a conflict situation is called a game, the parties involved in the conflict are called players, and the outcome of the conflict is called the winnings. Typically, winnings (or losses) can be quantified; for example, a loss can be valued as zero, a win as one, and a draw as 1/2.

A game is a collection of rules describing the behavior of the players. Each instance of playing the game in some specific way from beginning to end is a play (round) of the game. The choice and implementation of one of the actions provided for by the rules is called a move of the player. Moves can be personal or random. A personal move is a conscious choice by the player of one of the possible actions (for example, a move in a chess game). A random move is also a choice of one of many options, but here the option is selected not by the player but by some random selection mechanism (tossing a coin, choosing a card from a shuffled deck).

A player's strategy is a set of rules that determine the choice of action at each personal move, depending on the current situation.



If the game consists only of personal moves, then the outcome of the game is determined if each player chooses his own strategy. However, if the game has random moves, then the game will be probabilistic in nature and the choice of players’ strategies will not yet ultimately determine the outcome of the game.

To solve a game, or to find a solution of the game, one chooses for each player a strategy that satisfies the optimality condition, i.e. one player must receive the maximum winnings when the second sticks to his strategy, and at the same time the second player must have the minimum loss when the first sticks to his strategy. Such strategies are called optimal. Optimal strategies must also satisfy the stability condition, i.e. it must be disadvantageous for either player to abandon his strategy in this game.

The goal of game theory is to determine the optimal strategy for each player.

Consider a paired finite game. Let player A have m personal strategies, denoted $A_1, A_2, \dots, A_m$, and let player B have n personal strategies, denoted $B_1, B_2, \dots, B_n$. The game is said to have dimensions m×n. When the players choose any pair of strategies $A_i$ and $B_j$ ($i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$), the outcome of the game is uniquely determined: player A receives the winnings $a_{ij}$ (positive or negative) and player B receives $-a_{ij}$ (i.e. loses $a_{ij}$). Assume that the values $a_{ij}$ are known for every pair of strategies $(A_i, B_j)$. The matrix $P = (a_{ij})$, $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$, whose elements are the winnings corresponding to the strategies $A_i$ and $B_j$, is called the payment matrix, or the matrix of the game. The general form of such a matrix is presented in Table 3.1.

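Table 3.1

A_i \ B_j   B_1    B_2    ...   B_n
A_1         a_11   a_12   ...   a_1n
A_2         a_21   a_22   ...   a_2n
...         ...    ...    ...   ...
A_m         a_m1   a_m2   ...   a_mn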

The rows of this table correspond to the strategies of player A, and the columns to the strategies of player B. Let's create a payment matrix for the next game.

Consider an m×n game with matrix $P = (a_{ij})$, $i = 1, 2, \dots, m$; $j = 1, 2, \dots, n$, and determine the best of the strategies $A_1, A_2, \dots, A_m$. When choosing strategy $A_i$, player A must expect that player B will respond with the strategy $B_j$ for which A's payoff is minimal (player B seeks to "harm" player A). Let $\alpha_i$ denote the smallest winnings of player A when choosing strategy $A_i$ over all possible strategies of player B (the smallest number in the i-th row of the payment matrix), i.e. $\alpha_i = \min_{j} a_{ij}$. The largest of these numbers, $\alpha = \max_{i} \alpha_i = \max_{i} \min_{j} a_{ij}$, is the lower price of the game, or maximin.

The strategy corresponding to the maximin is called the maximin strategy. Player B is interested in reducing player A's winnings; when choosing strategy $B_j$, he takes into account the maximum possible winnings of A. Let us denote $\beta_j = \max_{i} a_{ij}$. The smallest of these numbers, $\beta = \min_{j} \beta_j = \min_{j} \max_{i} a_{ij}$, is the upper price of the game, or minimax.

The strategy corresponding to the minimax is called the minimax strategy. The principle that dictates that players choose the most "cautious" minimax and maximin strategies is called the minimax principle. This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent. Let us determine the lower and upper prices of the game and the corresponding strategies in the problem.

If the upper and lower prices of the game coincide, their common value $\alpha = \beta = v$ is called the pure price of the game, or simply the price (value) of the game. The minimax strategies corresponding to the price of the game are optimal strategies, and together they form the optimal solution, or solution of the game. In this case player A receives the maximum guaranteed winnings $v$ (independent of player B's behavior), and player B achieves the minimum guaranteed loss $v$ (independent of player A's behavior). The solution of the game is said to be stable: if one player sticks to his optimal strategy, it cannot be profitable for the other to deviate from his own optimal strategy.

A pair of pure strategies $A_i$ and $B_j$ gives an optimal solution of the game if and only if the corresponding element $a_{ij}$ is both the largest in its column and the smallest in its row. Such a situation, if it exists, is called a saddle point (by analogy with the surface of a saddle, which curves up in one direction and down in the other).
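The saddle-point condition translates directly into a search over the cells of the matrix. The sketch below is illustrative only: `find_saddle_points` is a hypothetical helper, and the 3×3 matrix is an invented example, not one taken from the text.

```python
def find_saddle_points(matrix):
    """Return zero-based (row, column) positions of all saddle points.

    A cell is a saddle point if its element is the smallest in its row
    and at the same time the largest in its column.
    """
    row_minima = [min(row) for row in matrix]
    column_maxima = [max(col) for col in zip(*matrix)]
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, a in enumerate(row)
            if a == row_minima[i] and a == column_maxima[j]]


# Hypothetical payment matrix used only to exercise the function.
example = [[4, 5, 3],
           [6, 7, 4],
           [5, 2, 3]]

print(find_saddle_points(example))
```

An empty result means the game has no saddle point in pure strategies.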

Basic concepts of the inventory management model.

In both business and manufacturing, it is common practice to maintain a reasonable inventory of material resources or components to ensure continuity of the production process. Traditionally, inventory is viewed as an unavoidable cost: inventory levels that are too low lead to costly production shutdowns, while levels that are too high tie up ("deaden") capital. The goal of inventory management is to determine the level of inventory that balances these two extreme cases.

Let's consider the main characteristics of inventory management models.

Demand. The demand for the stocked product can be deterministic (in the simplest case, constant in time) or random. The randomness of demand is described either by a random moment of demand or by a random amount of demand at deterministic or random times.

Warehouse replenishment. Replenishment of the warehouse can be carried out either periodically at certain intervals, or as stocks are exhausted, i.e. reducing them to a certain level.

Order quantity. With periodic replenishment and random depletion of stocks, the order quantity may depend on the state observed at the time the order is placed. An order is usually placed for the same quantity when the stock reaches a given level, the so-called order point.

Delivery time. Idealized inventory management models assume that ordered replenishment is delivered to the warehouse instantly. Other models consider delays in deliveries for a fixed or random period of time.

Delivery cost. As a rule, it is assumed that the cost of each delivery consists of two components - one-time costs that do not depend on the volume of the ordered batch, and costs that depend (most often linearly) on the volume of the batch.

Storage costs. In most inventory management models, the warehouse volume is considered to be practically unlimited, and the volume of stored inventory serves as the controlling variable. It is assumed that a certain fee is charged for storing each unit of stock per unit of time.

Shortage penalty. Any warehouse is created in order to prevent shortages of a certain type of product in the system it serves. Lack of stock at the right time leads to losses associated with equipment downtime, irregular production, etc. These losses are called the shortage penalty.

Stock nomenclature. In the simplest cases, it is assumed that the warehouse stores a stock of similar products or a homogeneous product. In more complex cases a multi-item stock is considered.

Structure of the warehouse system. Mathematical models of a single warehouse are the most fully developed. However, in practice there are also more complex structures: hierarchical warehouse systems with different replenishment periods and order delivery times, with the possibility of exchanging stocks between warehouses of the same hierarchy level, etc.

The criterion for the effectiveness of the adopted inventory management strategy is the cost function, which represents the total costs of supplying the stocked product, storing it, and paying shortage penalties.

Inventory management consists of finding a strategy for replenishment and consumption of inventories in which the cost function takes on a minimum value.
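As an illustration only, the verbal description of the cost function can be written schematically. All symbols here ($C$, $C_{\text{delivery}}$, $C_{\text{storage}}$, $C_{\text{penalty}}$, $K$, $c$, $q$) are assumptions introduced for this sketch and are not notation from the source; the linear form of the delivery cost reflects the "most often linearly" remark above.

```latex
C = C_{\text{delivery}} + C_{\text{storage}} + C_{\text{penalty}} \;\to\; \min,
\qquad
C_{\text{delivery}}(q) = K + c\,q
\quad \text{for one delivery of a batch of size } q .
```

Here $K$ stands for the one-time cost that does not depend on the batch size and $c$ for the cost per unit of the batch.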

Let three functions of time express, respectively, the replenishment of stocks, the consumption of stocks, and the demand for the stocked product over a period of time.

Inventory management models usually use the derivatives of these functions with respect to time, which are called, respectively, the intensities (rates) of replenishment, consumption, and demand.

Let us find the best strategy of player A by analyzing all his strategies in turn. When choosing strategy $A_i$, we must expect that player B will respond with the strategy $B_j$ for which A's winnings are minimal. Therefore, among the numbers in the first row we select the smallest, denote it $\alpha_1$, and write it in an additional column. Similarly, for each strategy $A_i$ we choose $\alpha_i = \min_{j} a_{ij}$, i.e. $\alpha_i$ is the minimum winnings when applying strategy $A_i$.
In example 1:
$\alpha_1 = \min(0, -1, -2) = -2$;
$\alpha_2 = \min(1, 0, -1) = -1$;
$\alpha_3 = \min(2, 1, 0) = 0$.
We write these numbers in the additional column. Which strategy should player A choose? Naturally, the one for which $\alpha_i$ is largest. Let us denote $\alpha = \max_{i} \alpha_i$. This is the guaranteed winnings that player A can secure for himself, i.e. $\alpha = \max_{i} \min_{j} a_{ij}$; it is called the lower price of the game, or maximin. The strategy $A_i$ that secures the lower price of the game is called the maximin (play-safe) strategy. If player A adheres to this strategy, he is guaranteed winnings of at least $\alpha$ for any behavior of player B.
In example 1, $\alpha = \max(-2, -1, 0) = 0$. This means that if A writes "3", then at least he will not lose. Player B is interested in reducing A's winnings. When choosing strategy $B_1$, for reasons of caution he takes into account the maximum possible winnings of A. Let us denote it $\beta_1 = \max_{i} a_{i1}$. Similarly, when choosing strategy $B_j$ the maximum possible winnings of A are $\beta_j = \max_{i} a_{ij}$; we write these numbers in an additional row. To reduce A's winnings, B should choose the smallest of the numbers $\beta_j$. The number $\beta = \min_{j} \beta_j = \min_{j} \max_{i} a_{ij}$ is called the upper price of the game, or minimax. This is the guaranteed loss of player B (i.e. he will lose no more than $\beta$). The strategy of player B that secures him a payoff of at least $-\beta$ is called his minimax strategy.
In example 1:
$\beta_1 = \max(0, 1, 2) = 2$;
$\beta_2 = \max(-1, 0, 1) = 1$;
$\beta_3 = \max(-2, -1, 0) = 0$;
$\beta = \min(2, 1, 0) = 0$.
This means that the optimal strategy of B is to write "3"; then at least he will not lose.

A_i \ B_j   B_1   B_2   B_3   α_i
A_1          0    -1    -2    -2
A_2          1     0    -1    -1
A_3          2     1     0     0
β_j          2     1     0    α = β = 0
The principle that dictates that players choose the most "cautious" minimax and maximin strategies is called the minimax principle. This principle follows from the reasonable assumption that each player strives to achieve a goal opposite to that of his opponent.
It can be proven that $\alpha \le \beta$, i.e. $\max_{i} \min_{j} a_{ij} \le \min_{j} \max_{i} a_{ij}$.
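One way to see why the lower price of the game cannot exceed the upper price uses only the definitions of the row minima $\alpha_i$ and the column maxima $\beta_j$. This is a short sketch; the auxiliary index $k$ is introduced here.

```latex
\alpha_i = \min_{k} a_{ik} \;\le\; a_{ij} \;\le\; \max_{k} a_{kj} = \beta_j
\quad \text{for all } i, j
\;\Longrightarrow\;
\alpha = \max_{i} \alpha_i \;\le\; \min_{j} \beta_j = \beta .
```

Every row minimum is bounded by every column maximum through the element where that row and column meet, so the largest row minimum is bounded by the smallest column maximum.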
In example 1, $\alpha = \beta$. If $\alpha = \beta$, i.e. the minimax coincides with the maximin, the game is called a game with a saddle point. The saddle point is the pair of optimal strategies $(A_i, B_j)$. In example 1 the game has the saddle point $(A_3, B_3)$. In this case the number $\alpha = \beta$ is called the (pure) price of the game (the lower and upper prices of the game coincide). This means that the matrix contains an element that is the minimum in its row and at the same time the maximum in its column. In example 1 this is the element 0. The price of the game is 0.
Optimal strategies in any game have an important property: stability. This means that neither player has an interest in deviating from his optimal strategy, since doing so is unprofitable for him. A deviation of player A from his optimal strategy leads to a decrease in his winnings, and a unilateral deviation of player B leads to an increase in his loss. The saddle point is said to give an equilibrium position.
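The stability property can be checked directly on the matrix of example 1 by listing what each player gets after a unilateral deviation from the saddle point $(A_3, B_3)$. This is an illustrative sketch; the variable names are chosen here.

```python
# Payment matrix of example 1: rows A_1..A_3, columns B_1..B_3.
example_1 = [[0, -1, -2],
             [1, 0, -1],
             [2, 1, 0]]

i_opt, j_opt = 2, 2                    # saddle point (A_3, B_3), zero-based
price = example_1[i_opt][j_opt]        # price of the game

# A deviates while B keeps B_3: A's winnings for the other rows.
a_deviations = [example_1[i][j_opt] for i in range(3) if i != i_opt]
# B deviates while A keeps A_3: B's loss for the other columns.
b_deviations = [example_1[i_opt][j] for j in range(3) if j != j_opt]

a_cannot_gain = all(winnings <= price for winnings in a_deviations)
b_cannot_gain = all(loss >= price for loss in b_deviations)

print("A's winnings after deviating:", a_deviations)
print("B's loss after deviating:", b_deviations)
print("stable:", a_cannot_gain and b_cannot_gain)
```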
Example 2. The first side (player A) selects one of three types of weapons, $A_1$, $A_2$, $A_3$, and the opponent (player B) selects one of three types of aircraft, $B_1$, $B_2$, $B_3$. The goal of B is to break through the defense front; the goal of A is to destroy the aircraft. The probability that aircraft $B_1$ is hit by weapon $A_1$ is 0.5, that aircraft $B_2$ is hit by weapon $A_1$ is 0.6, that aircraft $B_3$ is hit by weapon $A_1$ is 0.8, and so on; i.e. the element $a_{ij}$ of the payment matrix is the probability that aircraft $B_j$ is hit by weapon $A_i$. The payoff matrix has the form:

$$P = \begin{pmatrix} 0.5 & 0.6 & 0.8 \\ 0.9 & 0.7 & 0.8 \\ 0.7 & 0.5 & 0.6 \end{pmatrix}.$$

Solve the game, i.e. find the lower and upper prices of the game and the optimal strategies.
Solution. In each row we find the minimum element and write it in the additional column. In each column we find the maximum element and write it in the additional row.

A_i \ B_j   B_1   B_2   B_3   α_i
A_1         0.5   0.6   0.8   0.5
A_2         0.9   0.7   0.8   0.7
A_3         0.7   0.5   0.6   0.5
β_j         0.9   0.7   0.8   α = β = 0.7

In the additional column we find the maximum element, α = 0.7; in the additional row we find the minimum element, β = 0.7.
Answer: α = β = 0.7. The optimal strategies are $A_2$ and $B_2$.
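The "additional column and additional row" method used in this solution can be sketched as follows. The helper name `augmented_table` is hypothetical; the matrix is the one from example 2.

```python
def augmented_table(matrix):
    """Append the row minimum to each row and build the row of column maxima."""
    rows_with_alpha = [list(row) + [min(row)] for row in matrix]
    beta_row = [max(col) for col in zip(*matrix)]
    return rows_with_alpha, beta_row


# Probabilities of hitting aircraft B_j (columns) with weapon A_i (rows).
weapons_vs_aircraft = [[0.5, 0.6, 0.8],
                       [0.9, 0.7, 0.8],
                       [0.7, 0.5, 0.6]]

rows, beta_row = augmented_table(weapons_vs_aircraft)
alpha = max(row[-1] for row in rows)   # largest entry of the extra column
beta = min(beta_row)                   # smallest entry of the extra row

for label, row in zip(("A_1", "A_2", "A_3"), rows):
    print(label, *row)
print("beta_j", *beta_row)
print("alpha =", alpha, "beta =", beta)
```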

Example 3. A coin-matching game. Each player can choose one of two strategies on his move: heads or tails. If the chosen strategies coincide, A receives a payoff of +1; if they do not, B receives a payoff of 1 (i.e. A receives a payoff of -1). Payment matrix:

$$P = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
Find the lower and upper price of the game. Does the game have a saddle point?

Solution.

A_i \ B_j   B_1   B_2   α_i
A_1          1    -1    -1
A_2         -1     1    -1
β_j          1     1    α = -1, β = 1

α = -1, β = 1, i.e. A will lose no more than 1, and B will lose no more than 1. Since α ≠ β, the game has no saddle point. There is no equilibrium position in this game, and an optimal solution cannot be found in pure strategies.

Example. Find the lower price of the game, the upper price of the game, determine saddle points, optimal pure strategies and game price (if they exist).