Label system for user portrait
First, why do you need labels?

With the rise of the Internet, a huge volume of video content is produced and uploaded to the major platforms every day. Faced with this mass of content, every major platform has to work out how to distribute it intelligently and efficiently.

To achieve this, the first step is to know our users better. Building a user portrait is essentially a process of labeling user information. A label system brings two benefits. First, it makes the data readable and easy to understand, which is convenient for the business. Second, the label category system organizes and arranges the labels so that they can be matched to changing business scenarios in a more suitable form later on. How well the label system is planned has a large influence on product operations, so labeling is a key part of product strategy.

Second, what is a label?

The definition of a label (or tag) often differs from scenario to scenario. If you fixate on a single conceptual definition, you will not be able to move the actual business forward. All technical and business work serves business objectives, so it should be practical and applicable rather than purely academic.

Generally speaking, a label is "readable, understandable and valuable data that is produced from raw data through some processing logic and can be used directly by the business."

There are two ways to organize a label system: structured labels and semi-structured or unstructured labels.

A structured label system is a hierarchy of labels built on some classification scheme: each upper-level label is the parent node of the labels below it, and its audience coverage contains theirs. Brand-oriented advertising often uses this kind of structured system for audience targeting. Note that the labels in such a system are defined according to the logic of the demand side. Some labels that matter a great deal to the media, such as "military", should not appear in the system because no clear demand corresponds to them.
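The containment property of a structured label system can be sketched as a small tree. This is only an illustration: the class `LabelNode`, the label names and the user IDs are all hypothetical and do not come from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class LabelNode:
    """One node in a hierarchical (structured) label system."""
    name: str
    users: set = field(default_factory=set)       # users tagged directly with this label
    children: list = field(default_factory=list)  # lower-level labels

    def coverage(self) -> set:
        """Audience of this label: its own users plus all descendants' users."""
        covered = set(self.users)
        for child in self.children:
            covered |= child.coverage()
        return covered


# Hypothetical labels and user IDs, for illustration only.
suv = LabelNode("SUV", users={"u1", "u2"})
sedan = LabelNode("Sedan", users={"u3"})
auto = LabelNode("Automobile", users={"u4"}, children=[suv, sedan])

# The parent's audience always contains each child's audience.
parent_contains_children = all(
    child.coverage() <= auto.coverage() for child in auto.children
)
print(sorted(auto.coverage()), parent_contains_children)
```

Computing coverage bottom-up is one simple way to guarantee that a parent label never covers fewer people than any of its children.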

The other way to organize interest tags is to define tags for specific needs. Such tags cannot all be described within one classification system, and they have no clear parent-child relationships. A semi-structured or unstructured tag system usually contains a set of fairly precise tags, so it suits scenarios where several goals coexist, especially precise content delivery that also has performance goals.

Whether to choose a structured or an unstructured interest tag system depends mainly on the business scenario. When tags are only intermediate variables in the delivery system, used as inputs to CTR prediction or other modules, a structured system is unnecessary: the tags should be planned or mined in a purely performance-driven way, with no hierarchical constraints between them.

Keywords are another, special form of label. Segmenting the audience and placing advertisements directly by the keywords of the content that users search for or browse often gives more precise results. A keyword label system has no hierarchy and is completely unstructured. It is easy to understand but not easy to operate. However, because search is so important on the Internet, specialized techniques for selecting and optimizing keywords are well developed, so this kind of tag is also widely used in practice.
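As a rough sketch of keyword-based segmentation, the snippet below groups users by the keywords found in their search queries. The function `segment_by_keywords`, the log format and the sample queries are assumptions made for illustration; real systems use far more elaborate keyword selection and matching.

```python
from collections import defaultdict


def segment_by_keywords(search_logs, keywords):
    """Map each keyword to the set of users whose queries contain it.

    search_logs: iterable of (user_id, query) pairs (hypothetical format).
    keywords: lowercase single-word keywords to match exactly.
    """
    segments = defaultdict(set)
    for user_id, query in search_logs:
        words = query.lower().split()
        for keyword in keywords:
            if keyword in words:
                segments[keyword].add(user_id)
    return dict(segments)


# Hypothetical search log.
logs = [
    ("u1", "best noise cancelling headphones"),
    ("u2", "cheap Headphones"),
    ("u3", "mortgage rates"),
]
segments = segment_by_keywords(logs, ["headphones", "mortgage"])
print(segments)
```

Note that the result is a flat mapping with no parent-child structure, which is what makes a keyword system completely unstructured.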

Third, how to build a labeling system?

1. Determine the object

To build labels, we must first know what kind of object we are labeling, that is, we must determine the object. An object is an abstraction of something being studied in the objective world, and it can be physical or virtual. Many objects can be abstracted from the way an enterprise operates. These objects are cross-related in different business scenarios and are important enterprise assets that need to be fully described and understood.

Summarizing experience from many industries and many label systems, objects can be divided into three categories: people, things and relationships. The three differ. "People" have initiative and intelligence, can take part in social activities and drive them forward, and are usually the senders of relationships. "Things" are usually passive. They include raw materials, equipment, buildings, simple tools and function sets, and they are the recipients of relationships. When a piece of equipment in the conventional sense has enough artificial intelligence to become a robot, it belongs to the "people" category. "People" and "things" are both physical, tangible objects, while a "relationship" is a virtual object that defines a connection between two physical entities. Relationships matter so much that most of what an enterprise does is define, repeat, record, analyze and optimize them, so a dedicated "relationship" object is needed to describe and study them. Relationships can be further divided into factual relationships and attribution relationships. A factual relationship produces a quantifiable measure, whereas an attribution relationship only states that one object belongs to another.

Once objects have been defined and classified, we can decide which objects need a label system according to business needs. Content-based objects are so numerous that it is impossible to build an independent label system for every one of them. In general, we rank candidates by business traffic demand, number of submissions, similarity between categories and the relationships between categories, and use that ranking to decide the priority and necessity of each label system.
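The three object types and the two relationship kinds can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class and field names (`Person`, `Thing`, `Relationship`, `measure`) are hypothetical, and the rule that only factual relationships carry a measure is my reading of the distinction above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class RelationKind(Enum):
    FACT = "fact"                # produces a quantifiable measure
    ATTRIBUTION = "attribution"  # only states that one object belongs to another


@dataclass(frozen=True)
class Person:
    person_id: str  # identifier column for a "people" object


@dataclass(frozen=True)
class Thing:
    thing_id: str   # identifier column for a "things" object


@dataclass(frozen=True)
class Relationship:
    """Virtual object linking two physical objects."""
    sender: Person                  # people are usually the senders
    receiver: Union[Person, Thing]  # things are the recipients
    kind: RelationKind
    measure: Optional[float] = None  # e.g. an order amount (hypothetical)

    def __post_init__(self):
        # Assumption: only factual relationships carry a measure.
        if self.kind is RelationKind.ATTRIBUTION and self.measure is not None:
            raise ValueError("an attribution relationship has no measure")


# Hypothetical instances.
purchase = Relationship(Person("u1"), Thing("sku9"), RelationKind.FACT, measure=59.0)
ownership = Relationship(Person("u1"), Thing("sku9"), RelationKind.ATTRIBUTION)
print(purchase.measure, ownership.measure)
```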

2. Design framework

Generally speaking, Internet products use a large number of tag categories. Once the number of labels passes a certain point, it becomes troublesome for business staff to use or find them, and managing them becomes very difficult. The author therefore borrows a classic method from library management: a large collection needs a dedicated classification system that numbers the books and shelves them in numerical order. Readers can quickly find the books they need through the numbered index, and librarians can arrange the whole collection conveniently and effectively.

To build a label category system, we must first determine the root directory. The root directory is the object discussed above, so there are three root directories: people, things and relationships. Like the root of a tree, the root directory directly determines what kind of tree grows from it.

If the root directory is "people", the label category system is a label category system for people. Every root directory has an identifier column that uniquely identifies a specific object. The "people" category has two sub-roots: natural persons and legal persons (companies). A group of natural persons or a group of legal persons can also be treated as a secondary root within "people". Instances of natural persons include consumers, employees and franchisees, so label category systems can be formed for consumers, for employees and for franchisees. Similarly, legal persons can be subdivided into entity companies, marketing companies, transportation companies and so on. Everything from the top-level "people" root, through the sub-roots "natural person / legal person / natural person group / legal person group", down to instances such as "user / employee / franchisee" belongs to the root directory.

Similarly, "things" can be subdivided into sub-roots such as items, physical objects, collections of items and collections of physical objects, and each sub-root can be subdivided further. "Relationships" can be subdivided into "relationship records" and "relationship sets".

A label category system uses a classification scheme to design, assign and classify the labels that the business needs. A category system is itself a way of organizing one kind of target into classes, and the levels are usually named first-level, second-level and third-level categories.

The category structure can be compared to a tree. The branches that grow directly from the root are first-level categories, the branches that grow from a first-level branch are second-level categories, and the branches that grow from a second-level branch are third-level categories. A three-level hierarchy is usually enough. A category with no further subdivision is called a leaf category, and the individual leaves hanging on a leaf category are the labels.

Note that the category framework is generally built around the business, because the core purpose of a category system is to help users find and manage data and tags quickly.

The following figure shows the customer label category system built by a bank. The customer is the root directory and is uniquely identified by custom_id. Under the root there are first-level categories such as basic characteristics, asset characteristics, behavior characteristics, preference characteristics, value characteristics, risk characteristics and marketing characteristics. The first-level category "basic characteristics" is divided into four second-level categories: ID card information, demographic information, address information and occupation information. The second-level category "address information" is further divided into four third-level categories: billing address, home address, work address and mobile phone address. Under the third-level category "billing address" hang labels such as "detailed billing address", "billing address postcode" and "province where billing address is located".

Once the label categories have been designed, the framework of the whole label system is in place. The next step is to fill each leaf category with labels that have commercial value and can actually be produced, which completes the design of the whole label system.
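The bank example can be written down as a nested structure and flattened into full label paths. This is only a sketch: the nested-dict representation and the helper `label_paths` are assumptions for illustration, and the categories whose contents the text does not list are left empty rather than guessed.

```python
# Nested dicts are categories; a list marks a leaf category holding labels.
# Only the branches described in the text are filled in.
bank_customer_tree = {
    "basic characteristics": {
        "ID card information": {},
        "demographic information": {},
        "address information": {
            "billing address": [
                "detailed billing address",
                "billing address postcode",
                "province where billing address is located",
            ],
            "home address": [],
            "work address": [],
            "mobile phone address": [],
        },
        "occupation information": {},
    },
    "asset characteristics": {},
    "behavior characteristics": {},
    "preference characteristics": {},
    "value characteristics": {},
    "risk characteristics": {},
    "marketing characteristics": {},
}


def label_paths(tree, prefix=()):
    """Yield one tuple per label: (root, level 1, level 2, level 3, label)."""
    for name, sub in tree.items():
        path = prefix + (name,)
        if isinstance(sub, dict):
            yield from label_paths(sub, path)  # descend into sub-categories
        else:
            for label in sub:                  # leaf category: emit its labels
                yield path + (label,)


paths = list(label_paths(bank_customer_tree, prefix=("customer",)))
for path in paths:
    print(" > ".join(path))
```

Each printed path runs from the root object through the three category levels down to the label, which is the kind of lookup route a numbered library index provides.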

3. Fill in the contents

Label category design gives us a label system framework for an object, but no concrete labels yet. Label design means designing suitable labels and mounting them under the label categories. In this part, the author tries to analyze how to "label" from both a technical and a product point of view.

First, how to break the content down. The breakdown starts from three root directories: user, content and relationship. Under the "people" root, the categories can include demographic attributes, interest attributes, behavior preferences, posting time and so on. Similarly, content can be divided into "statistics", "quality" and "vector". We then split out the second-level categories. For example, the statistics category includes click-through rate, duration, completion rate, favorable comments and bounce rate.
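Two of the statistical labels above can be computed from raw counters as a quick illustration. The formulas here are common working definitions that I am assuming (click-through rate as clicks over impressions, completion rate as completed plays over plays); the article does not define them, and the function name and sample numbers are hypothetical.

```python
def content_stats(impressions, clicks, plays, completed_plays):
    """Compute statistical label values for one piece of content."""

    def ratio(numerator, denominator):
        # Guard against division by zero for content with no exposure yet.
        return numerator / denominator if denominator else 0.0

    return {
        "click_through_rate": ratio(clicks, impressions),
        "completion_rate": ratio(completed_plays, plays),
    }


# Hypothetical counters for one video.
stats = content_stats(impressions=1000, clicks=50, plays=40, completed_plays=10)
print(stats)
```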

It is important to note that what we usually call "labeling" someone is not designing labels at all, but assigning feature values. For example, describing someone as "female, 20-30 years old, white-collar, lively and cheerful" gives the specific feature values of four labels: gender, age, occupation and personality.
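The difference between a label and its feature value can be made concrete with a small sketch. The labels and values are the ones from the example above; the dictionary layout and the helper `matches` are assumptions for illustration.

```python
# Labels are the designed dimensions; they are the same for every user.
LABELS = ("gender", "age_range", "occupation", "personality")

# Feature values are what one particular user carries under each label.
user_profile = {
    "gender": "female",
    "age_range": "20-30",
    "occupation": "white-collar",
    "personality": "lively and cheerful",
}


def matches(profile, **conditions):
    """Return True if the profile has the given feature value for every label."""
    return all(profile.get(label) == value for label, value in conditions.items())


print(matches(user_profile, gender="female", age_range="20-30"))
print(matches(user_profile, gender="male"))
```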

Features can also be crossed with one another, which gives them additional meaning. For example, crossing the user portrait with the content portrait yields the match between content and the user's long-term and short-term interests, generalized matching of session interests, and the preference of each age group or gender for particular content categories. Crossing user features with the context of the request shows where the user lives and how the user's interests change over time. For example, some users watch news in the morning and entertainment in the evening. Scene matters too: some users like to watch videos on the subway but prefer pictures and text at work. Combining these feature values divides the user population as efficiently as possible, which is what makes precise content distribution achievable.
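Feature crossing can be sketched by joining single feature values into a combined key and counting how often each combination occurs. The helper `cross`, the key format and the click log below are hypothetical, and counting clicks is only one simple way to estimate the preferences described above.

```python
from collections import Counter


def cross(**features):
    """Join several single feature values into one crossed feature key."""
    return "|".join(f"{name}={value}" for name, value in sorted(features.items()))


# Hypothetical click log: user feature, request context, content feature.
click_log = [
    {"gender": "female", "time_of_day": "morning", "category": "news"},
    {"gender": "female", "time_of_day": "evening", "category": "entertainment"},
    {"gender": "male", "time_of_day": "morning", "category": "news"},
    {"gender": "female", "time_of_day": "morning", "category": "news"},
]

# User portrait x content portrait: gender preference per category.
gender_x_category = Counter(
    cross(gender=e["gender"], category=e["category"]) for e in click_log
)

# Request context x content portrait: how interest shifts during the day.
time_x_category = Counter(
    cross(time_of_day=e["time_of_day"], category=e["category"]) for e in click_log
)

print(gender_x_category.most_common(1))
print(time_x_category.most_common(1))
```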

We now know how to build a label system and how to divide user groups with it. But to label well, we must not only work backwards from demand to the technology; we must also start from "good content". In this part, the author briefly analyzes how to make a "good label" from the perspective of operations and creators.

If you want a label that resonates with people, you must first understand the user and address the user's pain points.

How can we understand users? One way is to switch roles and put yourself in the user's shoes: imagine you are a complete novice who knows nothing, then look at the problem from that angle.

For example, suppose you are an uploader (an "UP owner") and you receive a marketing order to promote noise-canceling headphones. Your task is to get users to place orders, turning the content into value. How should the story be designed?

The following is a reference copy: You are a bank manager. Maintaining customer relationships is very hard, and your job is not secure. You have a mortgage and a car loan, and you repay 5,000 yuan a month. Your child is bad at math. Your wife works as a nurse at the Municipal People's Hospital. Her mother has uremia and has been on dialysis for years. She doesn't love you. When you were young you thought you would achieve something, and this is how it turned out. All your friends are doing better than you. Life is so hard that you need a space of your own to let your emotions out. So you put on your noise-canceling headphones.

This is a typical "user perspective": it describes a scene. Viewers feel strongly immersed, are involuntarily affected by the content, and experience an emotional swing. Driven by that emotion, they place an order and the content is turned into value.

Besides the experience-based annotation described above, there is another approach: the "feature values" mentioned earlier. High-precision content tags generated by algorithms are generally based on video frames, titles, authors, content attributes, geographical attributes, time and so on. These algorithmically generated content tags can replace manual labeling, which saves labor costs and makes tag production more efficient. At present, content tagging technology has reached an accuracy of more than 90%, and some tag values are generated automatically by having algorithms analyze the content.

For example, for the video above, the generated tag values might be: country dogs, rural areas, million-plus plays, dogs, Huanong Brothers, cute pets, and animals in China.

After determining the object, designing the framework, designing the categories, designing the labels and assigning label values, we have built the whole label system. This article is fairly brief and is meant to serve as a guide.

Fourth, some problems.

Many problems come up when a label system is put into practice. The following are ones the author has been thinking about. If you have good suggestions, you can add the author on WeChat to discuss: shmusk

Timeliness of content: every piece of content, video or text and images, has a life cycle, and some life cycles are long while others are short. Predicting the life cycle of a piece of content is very hard, whether by algorithms or by other techniques. Even if the life cycle were known, exposing the content effectively within that window is another hard problem. Balancing the two matters a great deal, because recommending content after its life cycle has ended is pointless and makes for a poor user experience.

Determination of content quality: how do we judge the quality of a piece of content? What is the standard for "good"? How should it be modeled? If it can be modeled, which features are involved, and how can they be used effectively to build the model?

Cold start problem: this splits into content cold start and user cold start. Content cold start means that a new piece of content has entered the platform and has not yet been distributed. User cold start means that a user is new, so interaction and behavior data are very sparse. How can we make better recommendations in these cases, encourage denser follow-up interaction and increase stickiness, so as to improve the user experience and better meet users' needs?