Paper: Session-based Recommendation with Graph Neural Networks (SR-GNN)
GitHub: https://github.com/CRIPAC-DIG/SR-GNN
Session-based recommendation generally models the sequence of clicks in a session, encodes the whole session into a hidden vector, and then uses this hidden vector to predict the next click. However, this approach does not take the direct and complex transitions between items into account: besides the temporal order of the clicks in a session, the items also form a directed graph with complex connectivity, so the earlier methods are not enough to model the click sequence well.
Existing session-based recommendation methods mainly rely on recurrent neural networks and Markov chains. The paper points out two shortcomings of these methods:
1) When the number of user actions in a session is very limited, these methods struggle to obtain an accurate representation of user behavior. For example, with an RNN the user representation is simply the output of the last unit, which is not very accurate.
2) Previous work has found that the transition patterns between items are a very important feature in session-based recommendation, but RNNs and Markov processes only model the one-way transition between two adjacent items, ignoring the other items in the session.
To overcome these shortcomings, the paper proposes to model the user's sessions with a graph neural network (SR-GNN):
Below, the notation and the construction of the session graph are introduced in detail. Let V = {v1, v2, ..., vm} denote the set of all items, and let s = [v_{s,1}, v_{s,2}, ..., v_{s,n}] denote the items clicked in one session, ordered by time. The goal of the paper is to predict the user's next click, v_{s,n+1}.
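Written out a little more formally (this is just my restatement of the paper's problem setup; the score vector over all m items is my notation for the model output, from which the top-K items are recommended):

$$V = \{v_1, v_2, \dots, v_m\}, \qquad s = [v_{s,1}, v_{s,2}, \dots, v_{s,n}], \quad v_{s,i} \in V$$
$$\hat{y} \in \mathbb{R}^m, \qquad \text{predicted next click} = \arg\max_i \hat{y}_i$$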
We construct a subgraph for each session and derive its corresponding outgoing and incoming connection matrices.
Suppose a click sequence is v1 → v2 → v4 → v3; the subgraph it produces is shown in the red part of the figure below:
Another example: if the click sequence is v1 → v2 → v3 → v2 → v4, its subgraph is as follows:
At the same time, we construct an outgoing and an incoming matrix for each subgraph and normalize each of their rows. For our sequence v1 → v2 → v3 → v2 → v4, the corresponding matrices are as follows:
How are the values in this matrix calculated? Let's talk about it below:
Look at the outgoing matrix first. The row for v1 is 0 1 0 0: v1 points to only one item, v2, so that entry is 1. The row for v2 is 0 0 1/2 1/2: v2 has edges pointing to both v3 and v4, so after row normalization each entry becomes 1/2. The incoming matrix is computed in the same way, so I won't repeat it.
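As a concrete check, here is a small NumPy sketch (my own code, not from the paper's repo; the [outgoing | incoming] ordering follows the example in this post and may differ from the official implementation) that builds the normalized matrices for v1 → v2 → v3 → v2 → v4 and concatenates them into the connection matrix:

```python
import numpy as np

# Click sequence v1 -> v2 -> v3 -> v2 -> v4, encoded by 0-based node index.
clicks = [0, 1, 2, 1, 3]
n = len(set(clicks))                    # number of distinct items in the session

# Raw adjacency of the session graph: one edge per consecutive click pair.
adj = np.zeros((n, n))
for src, dst in zip(clicks, clicks[1:]):
    adj[src, dst] = 1                   # a repeated edge is only counted once

# Row-normalize by out-degree / in-degree (guarding against division by zero).
out_deg = adj.sum(axis=1, keepdims=True)
in_deg = adj.sum(axis=0, keepdims=True)
A_out = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg != 0)
A_in = np.divide(adj.T, in_deg.T, out=np.zeros_like(adj), where=in_deg.T != 0)

# Connection matrix: row i is [outgoing row of v_i | incoming row of v_i].
A_s = np.concatenate([A_out, A_in], axis=1)   # shape (n, 2n)
print(A_out)    # row for v2 is [0, 0, 0.5, 0.5]
print(A_in)     # row for v2 is [0.5, 0, 0.5, 0]
print(A_s[1])   # [0, 0, 0.5, 0.5, 0.5, 0, 0.5, 0]
```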
In this paper a GRU-style unit is used for the sequence modeling, and the graph information is embedded into the network so that the GRU can fully learn the relationships between items. A traditional GRU can only learn the relationship between two adjacent items; once the graph information is added, it can learn from the whole session subgraph.
The calculation formula is as follows:
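(The formula image from the original post is not reproduced here; what follows is my reconstruction of equations (1)–(5) of the SR-GNN paper, writing v_i^t for the vector of item v_i after propagation step t:)

$$a_{s,i}^{t} = A_{s,i:}\,[v_1^{t-1}, \dots, v_n^{t-1}]^{\top} H + b \tag{1}$$
$$z_{s,i}^{t} = \sigma\!\left(W_z a_{s,i}^{t} + U_z v_i^{t-1}\right) \tag{2}$$
$$r_{s,i}^{t} = \sigma\!\left(W_r a_{s,i}^{t} + U_r v_i^{t-1}\right) \tag{3}$$
$$\tilde{v}_i^{t} = \tanh\!\left(W_o a_{s,i}^{t} + U_o\,(r_{s,i}^{t} \odot v_i^{t-1})\right) \tag{4}$$
$$v_i^{t} = (1 - z_{s,i}^{t}) \odot v_i^{t-1} + z_{s,i}^{t} \odot \tilde{v}_i^{t} \tag{5}$$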
To understand this calculation, let's stick with the earlier example v1 → v2 → v3 → v2 → v4 and analyze the input and output step by step.
(1) a_{s,i}^t is the input corresponding to the i-th click of session s at step t. A_s is an n × 2n matrix, i.e. the complete connection matrix of the session subgraph, and A_{s,i:} is the row corresponding to item v_i, of size 1 × 2n; n is the number of distinct items in the sequence.
Following the example, if i is 2, the row is [0, 0, 1/2, 1/2, 1/2, 0, 1/2, 0].
Further, it can be split into the outgoing part [0, 0, 1/2, 1/2] and the incoming part [1/2, 0, 1/2, 0].
(2) v_i^{t-1} can be understood as the embedding vector of the i-th item in the sequence, which is updated as the model trains.
(3) H is a d × 2d weight matrix. It can also be viewed as a block matrix, H = [H_in | H_out], where each block is a d × d matrix.
Then let's look at the calculation process:
1) [v_1^{t-1}, ..., v_n^{t-1}] is a d × n matrix; after transposition it is an n × d matrix, which we denote M.
2) A_{s,i:} M H is computed block-wise: split A_{s,i:} into its outgoing and incoming halves and H into its two d × d blocks, multiply each half by M and by its corresponding block, and concatenate the two results, so the output is a 1 × 2d vector.
The above is the complete calculation of the input for the i-th click. As you can see, before entering the GRU computation the graph information is embedded into the network by multiplying with the A_{s,i:} row, which strengthens the item-to-item interaction information the network learns.
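A minimal NumPy sketch of this input computation (variable names are mine, the [out | in] ordering follows the example above, and the parameters are random just to show the shapes):

```python
import numpy as np

n, d = 4, 5                          # distinct items in the session, embedding size
rng = np.random.default_rng(0)

V = rng.normal(size=(n, d))          # node vectors [v_1^{t-1}; ...; v_n^{t-1}], one row per item
H_out = rng.normal(size=(d, d))      # the two d x d blocks of H
H_in = rng.normal(size=(d, d))
b = np.zeros(2 * d)

# The normalized outgoing / incoming matrices built earlier for v1->v2->v3->v2->v4.
A_out = np.array([[0, 1, 0, 0], [0, 0, .5, .5], [0, 1, 0, 0], [0, 0, 0, 0]])
A_in  = np.array([[0, 0, 0, 0], [.5, 0, .5, 0], [0, 1, 0, 0], [0, 1, 0, 0]])

# Input a_{s,i}^t for every node i at once: shape (n, 2d).
a = np.concatenate([A_out @ V @ H_out, A_in @ V @ H_in], axis=1) + b
print(a.shape)                       # (4, 10): one 2d-dimensional input per item
```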
Next comes the GRU computation itself, which differs from the original GRU only in that the input is changed from x_t to a_{s,i}^t, which carries the graph information.
As in a standard GRU there are an update gate and a reset gate, and they are computed in exactly the same way as in the original GRU.
v_i^t here is essentially the hidden state h_t of the original GRU, except that in SR-GNN the index i does not change within one round of computation: each item goes through the GRU update and gets its own vector, and these vectors keep changing over the GRU steps. It's easier to understand if you look at the source code:
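Below is a simplified PyTorch paraphrase of the propagation step (a sketch under assumptions: I use nn.GRUCell instead of the hand-written gates in the official CRIPAC-DIG/SR-GNN code, and the [out | in] slicing order of A follows this post, which may differ from the repo):

```python
import torch
import torch.nn as nn

class GNNCell(nn.Module):
    """One propagation step over the session graph (simplified sketch, not the repo's exact code)."""
    def __init__(self, d):
        super().__init__()
        self.edge_out = nn.Linear(d, d)     # plays the role of H_out
        self.edge_in = nn.Linear(d, d)      # plays the role of H_in
        self.gru = nn.GRUCell(2 * d, d)     # gates over the 2d-dimensional graph input

    def forward(self, A, hidden):
        # A: (batch, n, 2n) connection matrix, hidden: (batch, n, d) node vectors v_i^{t-1}
        n = A.shape[1]
        input_out = torch.matmul(A[:, :, :n], self.edge_out(hidden))
        input_in = torch.matmul(A[:, :, n:], self.edge_in(hidden))
        a = torch.cat([input_out, input_in], dim=-1)   # a_{s,i}^t for every item i
        # every item is updated in parallel; GRUCell expects 2-D tensors
        return self.gru(a.reshape(-1, a.shape[-1]),
                        hidden.reshape(-1, hidden.shape[-1])).reshape(hidden.shape)
```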
hidden in the code is the v_i^t in the formula, and it is updated at every GRU step. One point worth noting: since all the hidden vectors are updated, all the items in the sequence are pushed through the GRU in parallel, each step producing a new vector for every item; the next step is then computed from these updated vectors, and so on.
The calculation process is probably as follows:
There are four GRU updates running in parallel; each one updates its own hidden state at every step, and the input at every step takes all of the hidden states together with the graph information into account.
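Using the GNNCell sketched above, the propagation loop would look roughly like this (the number of steps is a hyper-parameter; the value and the random connection matrix here are only illustrative):

```python
import torch

d, n, steps = 5, 4, 2
cell = GNNCell(d)
hidden = torch.randn(1, n, d)        # v_i^0: initial item embeddings (batch of one session)
A = torch.rand(1, n, 2 * n)          # stands in for the normalized connection matrix
for _ in range(steps):               # every item's vector is refreshed at each step
    hidden = cell(A, hidden)
print(hidden.shape)                  # (1, 4, 5): one d-dimensional vector per item
```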
From the figure above, each item goes through t steps to obtain its own item vector, so after t steps we have the vectors of all the items in the sequence, i.e. the vectors drawn in blue boxes in the figure. With these vectors in hand, how do we get the prediction results? That leads to the next question.
Looking at the model structure above, we can see an attention module. The idea is that not all the item vectors in a session affect the prediction equally: some items have a large impact on the result and others only a small one, so a weighted sum is taken. At the same time, the paper treats the last item's vector as especially important for the session, so it is taken out separately:
Formula (6) is a simple attention operation: it computes a weight between each v_i and the last vector s_l = v_n, and then takes the weighted sum.
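(The formula image is missing here as well; this is my reconstruction from the paper, with q, W_1, W_2, c the attention parameters and W_3 the projection that mixes the local and global embeddings:)

$$\alpha_i = q^{\top}\,\sigma\!\left(W_1 v_n + W_2 v_i + c\right), \qquad s_g = \sum_{i=1}^{n} \alpha_i v_i \tag{6}$$
$$s_l = v_n, \qquad s_h = W_3\,[s_l ; s_g]$$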
At the final output layer, the score of each item is the inner product of s_h with that item's embedding; here v_i should be the vector from the item embedding layer, not the hidden vector:
Finally, the click probability of each item is obtained through a softmax.
The loss function is cross-entropy.
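Putting the readout together, a short PyTorch sketch of the scoring step (names like session_emb and item_emb are my own, and the shapes are only illustrative):

```python
import torch
import torch.nn.functional as F

m, d = 100, 5                           # number of candidate items, embedding size
item_emb = torch.randn(m, d)            # embedding-layer vectors of all items
session_emb = torch.randn(1, d)         # s_h for one session

scores = session_emb @ item_emb.t()     # inner product with every candidate item
probs = F.softmax(scores, dim=-1)       # click probability of each item
target = torch.tensor([7])              # index of the item actually clicked next
loss = F.cross_entropy(scores, target)  # cross-entropy computed on the raw scores
```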
Judging from the reported results, SR-GNN outperforms the classic GRU4Rec.
The paper cleverly embeds the graph information into the neural network, so that the GRU can learn the relationships between items more thoroughly and is no longer limited to adjacent items. In recent years, the ideas and methods of graph neural networks have been used again and again in recommender systems, and learning graph neural networks well should be the next wave in recommendation.