Sharing guest: Dr. Zhang Hongzhi, Algorithm Expert at Meituan
Editing: Liao Yuanyuan, Midea Group
Production platform: DataFunTalk
Introduction: Meituan, as the largest online local life service platform in China, connects hundreds of millions of users and tens of millions of merchants, and contains rich knowledge related to daily life. Since 2018, the Meituan Knowledge Graph team has focused on knowledge graph construction and on using knowledge graphs to empower the business and improve user experience. Specifically, "Meituan Brain" is a knowledge brain for the life service domain, built by deeply understanding the tens of millions of merchants, billions of commodities and dishes, billions of user comments, and millions of scenes in Meituan's business. At present, Meituan Brain covers billions of entities and tens of billions of triples, and has verified the effectiveness of knowledge graphs in catering, take-away, hotel, and comprehensive service scenarios. Today we introduce the construction and application of the life service knowledge graph in Meituan Brain, focusing on the following three aspects:
What is Meituan Brain?
The following is the overall roadmap of Meituan Brain. Construction of the catering knowledge graph started in 2018, with preliminary mining of Meituan's rich structured data and user behavior data, and further exploration of some important data dimensions, such as sentiment analysis of user comments on dishes. In 2019, represented by the tag graph, unstructured user comments were deeply mined. From 2021 onward, in-depth data mining and construction is being carried out domain by domain, combined with the characteristics of each field, covering commodities, food, hotel and travel, a comprehensive graph, and cross-domain graphs.
In search, users usually need to abstract their intent into a series of refined keywords that the search engine supports. The tag knowledge graph carries user needs through "tags", thereby improving the search experience. For example, with the tag knowledge graph, users can directly search for "taking children" or "dating couples" and get back suitable merchants/content supply. From the perspective of information gain, the unstructured text of user comments contains a great deal of knowledge (such as the scenes, crowds, and environments a merchant suits), and mining this unstructured data realizes that gain. The team takes the massive review data in the life service domain as the main knowledge source and, through key technologies such as tag mining, tag-tag relationship mining, and tag-merchant association, combs user needs, scenes, and main concerns from the bottom up to complete the graph construction.
The construction of the tag knowledge graph is divided into four parts: knowledge extraction, relationship mining, graph tagging, and graph application.
① Knowledge extraction
Tag mining adopts a simple sequence-labeling framework, covering both single-span tag mining and discontinuous-span (skip) tag mining. In addition, it combines semantic discrimination and context discrimination, obtaining more accurate tags through distant supervision and result voting.
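As an illustrative sketch (not Meituan's actual pipeline), single-span tag extraction from a sequence-labeling model's output can be done by collecting BIO-labeled spans; the token and label scheme here is assumed:

```python
def extract_spans(tokens, labels):
    """Collect contiguous B-TAG/I-TAG spans from BIO output into surface tags."""
    spans, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B-TAG":
            if current:                      # close the previous span
                spans.append("".join(current))
            current = [tok]
        elif lab == "I-TAG" and current:     # continue an open span
            current.append(tok)
        else:
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans

# "适合带娃的餐厅" -> the mined tag is "带娃" (taking children)
tokens = ["适", "合", "带", "娃", "的", "餐", "厅"]
labels = ["O", "O", "B-TAG", "I-TAG", "O", "O", "O"]
assert extract_spans(tokens, labels) == ["带娃"]
```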
② Relationship mining
Synonym mining: given a pool of N words and M business tag words, synonym mining is defined as finding, for each of the M tag words, its synonyms among the N words. Existing synonym mining methods include search-log mining, encyclopedia data extraction, and rule-based similarity calculation, all of which lack generality. Our current goal is a tag synonym mining method that is universal and can be applied to large-scale data sets.
The following is the specific synonym-mining scheme. First, tags from the offline tag pool or online queries are represented as vectors to build a vector index; approximate recall via vector hashing then generates Top-N candidate synonym pairs for each tag; finally, a synonym discrimination model filters the candidates. The advantage of this scheme is reduced computational complexity and improved efficiency. Compared with inverted-index candidate generation, it can recall synonyms with no character overlap, with high accuracy and simple parameter control.
For labeled data, the mainstream tag embedding methods include word2vec, BERT, and so on. Word2vec is simple to implement, but averaging the word vectors ignores word order; BERT captures richer semantic representations through pre-training, but directly taking the [CLS] vector performs no better than word2vec. Sentence-BERT improves on BERT: the vectors of tagA and tagB are obtained through a two-tower pre-trained model, and the cosine similarity of the two vectors then measures the semantic similarity of the two tags.
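The two-tower scoring step reduces to a cosine similarity between tag vectors. A minimal sketch with a toy embedding table (the vectors and tags below are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def avg_embedding(words, table):
    """word2vec-style representation: average the vectors of known words."""
    vecs = [table[w] for w in words if w in table]
    dim = len(next(iter(table.values())))
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 2-d embeddings: "亲子" (parent-child) and "带娃" (taking kids) are
# near-synonyms, "情侣" (couples) is not.
table = {"亲子": [1.0, 0.0], "带娃": [0.9, 0.1], "情侣": [0.0, 1.0]}
assert cosine(table["亲子"], table["带娃"]) > cosine(table["亲子"], table["情侣"])
```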
For unlabeled data, sentence representations can be obtained through contrastive learning. As shown in the figure, the original BERT model yields high vector similarity even for sentences of varying similarity; after contrastive-learning adjustment, the vector similarity better reflects text similarity.
Contrastive learning model design: given a sentence, the sample is perturbed to generate a sample pair, typically by adding an adversarial perturbation at the embedding level, shuffling at the lexical level, or dropping some words. During training, the similarity between the two views of the same sample within a batch is maximized, while the similarity to other samples in the batch is minimized. The final results show that unsupervised learning can, to a certain extent, match the effect of supervised learning, and that unsupervised plus supervised learning improves significantly over supervised learning alone.
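The in-batch objective described above is commonly an InfoNCE-style loss. A minimal numerical sketch (the temperature value and toy vectors are assumptions, and real systems compute this over encoder outputs, not fixed vectors):

```python
import math

def info_nce(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: anchor i's positive is positives[i];
    every other positive in the batch serves as a negative."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    loss = 0.0
    for i, a in enumerate(anchors):
        sims = [cos(a, p) / temperature for p in positives]
        # softmax cross-entropy: maximize similarity to own positive
        log_prob = sims[i] - math.log(sum(math.exp(s) for s in sims))
        loss -= log_prob
    return loss / len(anchors)

# Loss is low when each anchor matches its own augmented view,
# and high when the pairing is scrambled.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
assert aligned < swapped
```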
Synonym discrimination model design: the two tag words are concatenated and fed into a BERT model, and the label is obtained through multi-layer semantic interaction.
Tag hyponymy mining: lexical inclusion is the most important source for hyponymy mining, which can also be combined with semantic or statistical methods. The current difficulty is that hypernym-hyponym standards are hard to unify, so the algorithm's mining results usually need revision according to the needs of the domain.
③ Graph tagging: how to construct relationships between tags and merchant supply?
Given a tag set, candidate tag-POI pairs can be obtained by applying a frequency threshold to how often each tag and its synonyms appear in a merchant's UGC/deal listings. Since high frequency does not necessarily mean relevance, bad cases must then be filtered out by a merchant-tagging discrimination module.
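The frequency-threshold candidate generation can be sketched as follows (the data layout, synonym table, and threshold are illustrative assumptions):

```python
def candidate_tag_pois(ugc_by_poi, tag_synonyms, min_freq=3):
    """Count mentions of each tag (plus its synonyms) in a POI's UGC text
    and keep (tag, poi, freq) triples at or above a frequency threshold."""
    candidates = []
    for poi, reviews in ugc_by_poi.items():
        text = " ".join(reviews)
        for tag, syns in tag_synonyms.items():
            freq = sum(text.count(s) for s in [tag, *syns])
            if freq >= min_freq:
                candidates.append((tag, poi, freq))
    return candidates

# Toy data: "带娃" (taking kids) with synonym "亲子" (parent-child)
ugc = {"poi1": ["适合带娃", "带娃很好", "亲子友好"], "poi2": ["环境不错"]}
syns = {"带娃": ["亲子"]}
assert candidate_tag_pois(ugc, syns, min_freq=3) == [("带娃", "poi1", 3)]
```

A downstream discrimination model (next step) then prunes candidates whose frequency is high but whose association is spurious.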
Merchant tagging considers information at three levels: tag-merchant, user comments, and the merchant taxonomy. Specifically, at the tag-merchant granularity, the tag is concatenated with merchant information (merchant name, third-level category, merchant top tags) and fed into a BERT model for discrimination.
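The concatenation step might look like the sketch below; the exact fields, separators, and special tokens are assumptions for illustration, not Meituan's actual input format:

```python
def build_tag_poi_input(tag, poi_name, category, top_tags):
    """Splice a candidate tag with merchant context into one cross-encoder
    input string, BERT sentence-pair style."""
    context = "；".join([poi_name, category, "、".join(top_tags)])
    return f"[CLS]{tag}[SEP]{context}[SEP]"

text = build_tag_poi_input("带娃", "某餐厅", "美食>火锅>川味火锅", ["聚会"])
assert text == "[CLS]带娃[SEP]某餐厅；美食>火锅>川味火锅；聚会[SEP]"
```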
At the micro granularity of user comments, each tag's relationship to the comments that mention it (called evidence) can be judged as positive, negative, irrelevant, or uncertain, so this becomes a four-class discrimination model. We had two options. The first is based on multi-task learning; its disadvantage is the high cost of adding a tag, since new training data must be added for each new tag. We ultimately adopted a discrimination model based on semantic interaction, which takes the tag itself as an input, so that the model discriminates based on semantics and thus supports dynamic addition of tags.
The semantic-interaction discrimination model, which first builds vector representations, then performs interaction, and finally aggregates the comparison results, is faster to compute, while the BERT-based method costs more computation but achieves higher accuracy. We balance accuracy against speed: for example, when a POI has more than 30 pieces of evidence, we tend to use the lightweight method; a POI with only a few pieces of evidence can be judged with the higher-accuracy one.
From a macro perspective, there are three main relations: definitely not, probably, and definitely. The merchant-level result is generally obtained by voting over the evidence-level association results, supplemented by some rules; when high precision is required, manual review can be added.
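One simple voting rule consistent with the description, purely as an assumed illustration (the actual thresholds and rules are not given in the talk):

```python
def aggregate_poi_tag(evidence_labels, min_support=3):
    """Vote evidence-level labels into a POI-level relation.
    Rule (illustrative): more negatives than positives -> definitely_not;
    strong positive support -> definitely; otherwise -> probably."""
    pos = evidence_labels.count("positive")
    neg = evidence_labels.count("negative")
    if neg > pos:
        return "definitely_not"
    if pos >= min_support and pos > 2 * neg:
        return "definitely"
    return "probably"

assert aggregate_poi_tag(["positive"] * 5) == "definitely"
assert aggregate_poi_tag(["negative", "negative", "positive"]) == "definitely_not"
assert aggregate_poi_tag(["positive", "irrelevant"]) == "probably"
```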
④ Graph application: direct application of the mined data, or application of the knowledge vector representations
In merchant knowledge question-answering scenarios, we answer users' questions based on the merchant tagging results and the evidence corresponding to each tag.
First, the tags in the user's query are recognized and mapped to ids, which are passed to the index layer through search recall or the ranking layer, so that merchants with tagging results are recalled and shown to C-end users. A/B experiments show that the search experience for users' long-tail needs improved significantly. In addition, online experiments in hotel search show clear improvement in search results through supplementary recall methods such as synonym mapping.
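The tag-to-id mapping plus index-layer recall amounts to an inverted-index lookup; a toy sketch (all ids and names are invented):

```python
def recall_pois(query_tags, tag2id, tagid2pois):
    """Map recognized query tags to ids, then recall POIs carrying those
    tags via an inverted index (toy version of the index layer)."""
    pois = set()
    for tag in query_tags:
        tid = tag2id.get(tag)
        if tid is not None:
            pois.update(tagid2pois.get(tid, ()))
    return sorted(pois)

tag2id = {"带娃": 7, "情侣约会": 8}
tagid2pois = {7: ["poiB", "poiA"], 8: ["poiC"]}
assert recall_pois(["带娃"], tag2id, tagid2pois) == ["poiA", "poiB"]
assert recall_pois(["不存在"], tag2id, tagid2pois) == []
```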
This is mainly realized with a GNN model. Two kinds of edges are constructed in the graph: Query-POI click behavior and Tag-POI association; GraphSAGE is used for graph learning. The learning objective is to judge whether a Tag and a POI are associated, or whether a Query led to a click on a POI, with sampling weighted by association strength. Online results show that building the graph with Query-POI information alone brings no online gain, while introducing Tag-POI association information improves the online effect significantly. This may be because the ranking model already learns from Query-POI click behavior, so applying GraphSAGE to the same signal merely changes the learning method with little information gain, whereas Tag-POI edges introduce new knowledge and therefore bring significant improvement.
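The core of GraphSAGE is aggregating neighbor features into each node's representation. A single mean-aggregation step, stripped of the learned weight matrices and nonlinearity for clarity (so this is a structural sketch, not the trained model):

```python
def sage_mean_step(feats, adj):
    """One GraphSAGE-style propagation step: each node's new feature is
    the concatenation of its own feature with the mean of its neighbors'
    features. Learned projections are omitted for illustration."""
    out = {}
    for node, feat in feats.items():
        nbrs = adj.get(node, [])
        if nbrs:
            mean = [sum(feats[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(feat))]
        else:
            mean = [0.0] * len(feat)
        out[node] = list(feat) + mean
    return out

# Tiny bipartite graph: a query node connected to a POI node.
feats = {"q": [1.0, 0.0], "p": [0.0, 1.0]}
adj = {"q": ["p"], "p": ["q"]}
assert sage_mean_step(feats, adj)["q"] == [1.0, 0.0, 0.0, 1.0]
```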
In addition, feeding in only the Query-POI vector similarity does not improve the effect, but the effect improves significantly once the Query and POI vectors themselves are added. This may be because search features are high-dimensional and a single similarity feature is easily ignored, while splicing in the Query and POI vectors raises the feature dimensionality.
This task predicts a masked item clicked by the user from the currently known items. For example, when obtaining the context representation of an item, the associated attribute information is also represented as vectors, so as to judge whether the item carries that attribute.
In addition, Masked Item Attribute prediction can be performed, thereby integrating the tag knowledge graph information into the sequential recommendation task. Experimental results show that introducing knowledge information brings significant accuracy improvements across different data sets. We have also done online engineering, using the item representations for vector recall: specifically, recalling the top-N items similar to those the user clicked historically supplements the online recommendation results and significantly improves the food-list recommendation page.
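The top-N item-to-item vector recall can be sketched as ranking candidates by cosine similarity to the centroid of the user's clicked items (item names and vectors below are invented):

```python
import math

def topn_similar_items(item_vecs, clicked_items, n=2):
    """Rank unclicked items by cosine similarity to the centroid of the
    user's historically clicked items - a toy item-to-item vector recall."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    dim = len(next(iter(item_vecs.values())))
    centroid = [sum(item_vecs[i][d] for i in clicked_items) / len(clicked_items)
                for d in range(dim)]
    candidates = [i for i in item_vecs if i not in clicked_items]
    return sorted(candidates, key=lambda i: cos(item_vecs[i], centroid),
                  reverse=True)[:n]

item_vecs = {"hotpot": [1.0, 0.0], "bbq": [0.9, 0.1], "salad": [0.0, 1.0]}
assert topn_similar_items(item_vecs, ["hotpot"], n=1) == ["bbq"]
```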
The goal of building the dish knowledge graph is, on the one hand, to build a systematic understanding of dishes and, on the other, to build a relatively complete dish knowledge graph. Here the construction strategies are explained level by level.
**Understanding of dish names**
Dish names carry the most accurate and lowest-cost information, and understanding them is the premise of subsequent explicit knowledge reasoning and generalization. First, the essential words/main-dish words in a dish name are extracted; then sequence labeling identifies each component of the name. Different models are designed for the two scenarios. When word segmentation is available, the segmentation boundaries are added to the model as special symbols, and the model identifies the type of each token. Without segmentation, a Span-Trans task is performed first, after which the segmented-input module is reused.