Establish user preference model based on user behavior analysis

We often reduce the idea of personalized recommendation to one sentence: infer a user's interests from the user's behavior, then recommend items that match those interests.

To do that, we need to build a user preference model (preference here means interest) through user behavior analysis. The model contains one or more preferences for each user.

A digression on terms like "user behavior" and "user interests". Most people have a default understanding of these words, one that may even have hardened into common sense, so I rarely see articles explain these words when they use them.

I feel that, when it comes to algorithmic models, an unqualified, broad reading of these words easily gets in the way of understanding the model in depth and leaves us with a vague picture without our noticing. Different people may share the same basic understanding of these words and still extend them in different ways.

This article therefore uses a restricted definition: the user behavior discussed here is behavior on a network (a telecommunications network or the Internet).

Concept Explanation

Entity domain: When we build a user preference model based on user behavior analysis, we must restrict user behavior and interest topics to one entity domain.

Personalized recommendation is always realized as concrete recommendations within a particular entity domain.

For example, for a reading website, the entity domain includes all books, which we can call the book domain.

Others include personalized music recommendations, personalized movie recommendations, personalized information recommendations, etc.

User behavior: Clicking on and commenting on information on a portal, posting and commenting on statuses on a social networking site, browsing, purchasing, and reviewing products on an e-commerce site, and the various actions taken on other kinds of websites are all user behaviors.

In this article, user behavior means a user's behavior within a particular entity domain.

For example, user behaviors in the book domain include reading, purchasing, rating, commenting, etc.

Interest topic: A user's interest dimensions are likewise limited to a particular entity domain and can usually be expressed as tags.

For example, for book reading, interest topics can be classification tags such as "suspense", "technology", and "emotion".

It is worth mentioning that interest topics are just dimensions of interest abstracted from user behavior, and there is no unified standard.

For example, the book classification labels of QQ Reading and Douban Reading are quite different.

The granularity of interest dimensions is not fixed either. A portal website, for example, has first-level categories such as "News", "Sports", and "Entertainment"; under News there are second-level categories such as "Domestic", "Social", and "International", and under Entertainment there are second-level categories such as "Stars", "Zodiac signs", and "Gossip".

Which granularity of interest space we choose depends on what we require of the user preference model.

An interest space is a set of interest dimensions at the same level. For example, in Douban Reading, "newly released", "popular", "special price", and "free" can form an interest space (this is only a hypothetical; such a space would be far too coarse to represent user interests). Alternatively, "novel", "fantasy", "computer", "technology", "history", ..., "food" can form an interest space.

These are two different classification dimensions.

If "newly released" is also added to the latter set, it will obviously be a bit confusing.

That said, it is not impossible; it depends on how we view the set. If we treat it not as a content-based classification but as a book tag library, then adding "newly released" is feasible and can even help build a better model. I come back to this later in the article.
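The ideas of granularity and interest space can be made concrete with a small sketch. It models the portal example as a two-level category tree and returns the interest space at a chosen level. The names `CATEGORY_TREE` and `interest_space` are hypothetical, and the empty list for "Sports" only reflects that the text gives no second-level categories for it.

```python
# Hypothetical two-level category tree built from the portal example.
# The text lists no second-level categories for "Sports", so it is left empty.
CATEGORY_TREE = {
    "News": ["Domestic", "Social", "International"],
    "Sports": [],
    "Entertainment": ["Stars", "Zodiac signs", "Gossip"],
}


def interest_space(tree, level):
    """Return the interest dimensions found at one level of the tree.

    level 1 gives the first-level categories, level 2 the second-level ones.
    """
    if level == 1:
        return list(tree)
    if level == 2:
        return [child for children in tree.values() for child in children]
    raise ValueError("this sketch only models two levels")


coarse_space = interest_space(CATEGORY_TREE, 1)
fine_space = interest_space(CATEGORY_TREE, 2)
print(coarse_space)
print(fine_space)
```

Each call returns dimensions from a single level only, which is the property that keeps an interest space from mixing classification dimensions.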

User behavior data

Xiang Liang gives a detailed introduction to user behavior data in Section 2.1 of his "Recommendation System Practice".

Raw behavior logs are usually aggregated and processed into something easier to work with: a session log that describes user behavior.

This kind of log records various user behaviors. For example, in a book reading app, these behaviors mainly include clicks, trial readings, purchases, reading (in local apps, reading behaviors may not be tracked), ratings, and comments.
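As a rough illustration of what such a session log might look like, the sketch below groups raw behavior records by user and session. The record layout `(user_id, session_id, action, book_id)`, the action names, and the function `build_session_log` are all assumptions made for this example and are not taken from the book or from any real logging system.

```python
from collections import defaultdict

# Hypothetical raw behavior records: (user_id, session_id, action, book_id).
raw_log = [
    ("u1", "s1", "click", "b1"),
    ("u1", "s1", "trial_read", "b1"),
    ("u1", "s1", "purchase", "b1"),
    ("u2", "s2", "click", "b2"),
    ("u2", "s2", "rate", "b2"),
]


def build_session_log(records):
    """Group raw records into sessions.

    Returns a dict mapping (user_id, session_id) to the ordered list of
    (action, book_id) pairs observed in that session.
    """
    sessions = defaultdict(list)
    for user_id, session_id, action, book_id in records:
        sessions[(user_id, session_id)].append((action, book_id))
    return dict(sessions)


session_log = build_session_log(raw_log)
for key, actions in session_log.items():
    print(key, actions)
```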

Establishing a user preference model

The core task of building a user preference model from user behavior analysis is to convert user behavior into user preferences.

We think in terms of matrix operations and take book reading as the example.

(Figure: the user set.)

(Figure: the book (item) set.)

The user behavior matrix can then be expressed with rows representing users and columns representing books. (Figure: the user behavior matrix.) For the time being we consider only purchase behavior and treat a purchased book as one the user has read: 1 means the user has read the book, and 0 means the user has not.
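Since the original figures are not available, here is a minimal sketch of such a 0/1 behavior matrix. The user IDs, book IDs, and purchase events are invented for illustration, and `behavior_matrix` is a hypothetical helper.

```python
# Hypothetical users, books, and purchase events, invented for illustration.
users = ["u1", "u2", "u3"]
books = ["b1", "b2", "b3", "b4"]
purchases = {("u1", "b1"), ("u1", "b3"), ("u2", "b2"), ("u3", "b1"), ("u3", "b4")}


def behavior_matrix(users, books, events):
    """Build a users x books matrix: 1 if the (user, book) event exists, else 0."""
    return [[1 if (u, b) in events else 0 for b in books] for u in users]


R = behavior_matrix(users, books, purchases)
for user, row in zip(users, R):
    print(user, row)
# u1 [1, 0, 1, 0]
# u2 [0, 1, 0, 0]
# u3 [1, 0, 0, 1]
```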

How do we convert the user behavior matrix above into a user interest matrix, in which rows represent users and columns represent interest dimensions? An obvious method is to first determine the correspondence matrix between books and interest dimensions.

The prerequisite is that we have already decided which interest space to use.
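One natural reading of this conversion, in the spirit of matrix operations, is a matrix product: multiply the users x books behavior matrix by a books x dimensions correspondence matrix. The text does not spell out the operation, so treat the product as an assumption. The matrices `R` and `M`, the tag assignments, and the helper `matmul` are all invented for this sketch.

```python
# Hypothetical behavior matrix (rows: users u1..u3, columns: books b1..b4).
R = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
]

# Assumed interest space and book-to-dimension correspondence matrix
# (rows: books b1..b4, columns: interest dimensions).
dimensions = ["suspense", "technology", "emotion"]
M = [
    [1, 0, 0],  # b1: suspense
    [0, 1, 0],  # b2: technology
    [1, 0, 1],  # b3: suspense and emotion
    [0, 0, 1],  # b4: emotion
]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Each entry counts how many of the user's books carry that interest dimension.
P = matmul(R, M)
for row in P:
    print(dict(zip(dimensions, row)))
# {'suspense': 2, 'technology': 0, 'emotion': 1}
# {'suspense': 0, 'technology': 1, 'emotion': 0}
# {'suspense': 1, 'technology': 0, 'emotion': 1}
```

The raw counts could be normalized per user if comparable preference strengths are needed; that choice is outside what the text specifies.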

A common approach is for experts to classify some samples, which gives training data in the usual sense. A classification algorithm is then used to train a classification model, and the model is applied to classify the large amount of remaining data.
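The workflow above can be sketched with a deliberately simple classifier. The text does not name a specific algorithm, so this example uses 1-nearest-neighbor over word overlap (Jaccard similarity) purely as a stand-in. The book descriptions, labels, and function names are hypothetical.

```python
def tokens(text):
    """Lowercase a description and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical expert-labeled samples: (book description, interest dimension).
labeled = [
    ("a detective hunts a killer in the fog", "suspense"),
    ("murder mystery on a night train", "suspense"),
    ("learn programming with python examples", "technology"),
    ("a guide to machine learning algorithms", "technology"),
]


def classify(description, samples):
    """Return the label of the most similar labeled sample (1-nearest-neighbor)."""
    words = tokens(description)
    _, best_label = max(samples, key=lambda s: jaccard(words, tokens(s[0])))
    return best_label


# Remaining, unlabeled books get the label of their nearest labeled sample.
unlabeled = [
    "python algorithms for programming interviews",
    "a killer hides on the train",
]
predicted = [classify(d, labeled) for d in unlabeled]
print(predicted)  # ['technology', 'suspense']
```

In practice a stronger classification model trained on far more labeled samples would take the place of this toy classifier, but the shape of the workflow is the same: label a sample, fit, then predict the rest.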