Editor: Liao Yuanyuan, Midea Group
Production platform: DataFunTalk
Introduction: Meituan, as the largest online local life service platform in China, connects hundreds of millions of users with tens of millions of merchants and contains rich knowledge about daily life. Starting in 2018, the Meituan knowledge graph team began to focus on graph construction, empowering the business with knowledge graphs and enhancing user experience. Specifically, "Meituan Brain" is a knowledge brain for the life-service domain, formed through a deep understanding of the tens of millions of merchants, billions of merchant dishes, billions of user reviews, and millions of scenes in Meituan's business. At present, "Meituan Brain" covers billions of entities and tens of billions of relationships, and has verified the effectiveness of knowledge graphs in catering, take-out, hotels, and comprehensive services. Today we introduce the construction and application of the life-service knowledge graph in Meituan Brain, focusing on the following three aspects:
What is "Meituan Brain"?
The following is the overall roadmap of "Meituan Brain". Construction began in 2018 with the catering knowledge graph, which first mined Meituan's rich structured data and user behavior data, and dug deeply into important data dimensions such as sentiment analysis of users' food reviews. In 2019, the focus shifted to the tag graph, deeply mining unstructured user reviews. Since 2020, graphs have been built domain by domain according to the characteristics of each field, including commodities, food, wine and travel, comprehensive services, and cross-domain graphs.
In search, users usually need to abstract their intent into a series of refined keywords that the search engine can support. The tag knowledge graph carries user needs through "tags", thereby improving the search experience. For example, with the tag knowledge graph, users can directly search for "bring the kids" or "couple's date" and get back suitable merchants/content. From the perspective of information gain, the unstructured text of user reviews contains a great deal of knowledge (such as the scenes, crowds, and environments a merchant is suitable for), so mining unstructured data yields information gain. Taking the massive review data in the life-service domain as the main knowledge source, the team combed through user needs, scenes, and main concerns from the bottom up via key techniques such as tag mining, tag relationship mining, and tag-merchant association, and completed the graph construction.
The construction of the tag knowledge graph is divided into four parts: knowledge extraction, relationship mining, graph tagging, and graph application.
① Knowledge extraction
Tag mining adopts a simple sequence labeling architecture, covering both single-span and cross-span tag mining. In addition, semantic or context discrimination is combined with distant supervision and result voting to obtain more accurate tags.
② Relationship mining
Synonym mining: synonym mining is defined as follows: given a pool of n words and m merchant tag words, find the synonyms of each word. Existing methods include search-log mining, encyclopedia data extraction, and rule-based similarity calculation, all of which lack generality. Our current goal is a universal tag synonym mining method that works on large-scale data sets.
The specific synonym-mining scheme is as follows. First, the offline tag pool or online query tag is represented as vectors to build a vector index; then vector hashing generates the top-N candidate synonym pairs for each tag; finally, a synonym discrimination model filters the candidates. The advantages of this scheme are reduced computational complexity and better operational efficiency: compared with generating candidates from an inverted index, it can recall synonyms with no lexical overlap, with high accuracy and simple parameter control.
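The candidate-generation step can be sketched as a nearest-neighbor search over tag vectors. This is a brute-force stand-in for the production vector-hash/ANN index described above; the tag names and vectors are made up for illustration.

```python
import numpy as np

def topn_synonym_candidates(query_vec, pool_vecs, pool_tags, n=3):
    """Return the n pool tags most cosine-similar to the query tag vector.

    Brute-force illustration of candidate generation; a production system
    would use an approximate index (e.g. vector hashing) instead.
    """
    q = query_vec / np.linalg.norm(query_vec)
    p = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = p @ q                    # cosine similarity to every pool tag
    order = np.argsort(-sims)[:n]   # indices of the n most similar tags
    return [(pool_tags[i], float(sims[i])) for i in order]
```

The returned candidates would then be passed to the synonym discrimination model for filtering.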
For labeled data, the mainstream tag embedding methods include word2vec and BERT. Word2vec is simple to implement, averaging word vectors and ignoring word order; BERT can capture richer semantic representations through pre-training, but directly taking the [CLS] vector performs about the same as word2vec. Sentence-BERT improves on BERT: the tagA and tagB vectors are produced by a two-tower pre-trained model, and the cosine similarity of the two vectors measures the semantic similarity of the two tags.
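A tiny illustration of the word2vec averaging limitation mentioned above: mean-pooled word vectors are invariant to word order, so reordered phrases collapse to the same representation. The toy vectors below are invented for the demonstration.

```python
import numpy as np

# Toy word vectors (made up); averaging them discards token order.
vecs = {
    "dog": np.array([1.0, 0.0]),
    "bites": np.array([0.0, 1.0]),
    "man": np.array([1.0, 1.0]),
}

def mean_pool(tokens):
    """word2vec-style sentence embedding: average of token vectors."""
    return np.mean([vecs[t] for t in tokens], axis=0)

a = mean_pool(["dog", "bites", "man"])
b = mean_pool(["man", "bites", "dog"])
assert np.allclose(a, b)  # identical embeddings despite different meaning
```

This order-blindness is one reason contextual encoders such as BERT/Sentence-BERT can represent tags more faithfully.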
For unlabeled data, sentence representations can be obtained through contrastive learning. As shown in the figure, the original BERT model assigns high vector similarity even to sentences with different meanings; after contrastive fine-tuning, vector similarity better reflects text similarity.
Contrastive learning model design: first, given a sentence, perturb it to generate a sample pair. Typically the pair is formed by an adversarial attack on the embedding layer, or by shuffling word order or dropping words at the lexical level. During training, the similarity of a sample to its pair within the batch is maximized, and its similarity to the other samples in the batch is minimized. The final results show that unsupervised contrastive learning can approach the effect of supervised learning to a certain extent, and unsupervised plus supervised learning is significantly better than supervised learning alone.
Synonym discrimination model design: the two tag words are concatenated and fed into a BERT model, and the label is obtained through multi-layer semantic interaction.
Mining hypernym-hyponym relations between tags: lexical inclusion is the most important source for hyponymy mining, and semantic or statistical methods can also be used. The current difficulty is that hypernym-hyponym standards are hard to unify, and the algorithm's mining results usually need to be revised according to the needs of the domain.
③ Graph tagging: how to construct the relationship between tags and merchants?
Given a set of tags, candidate tag-POI pairs can be obtained by thresholding the frequency with which the tags and their synonyms appear in a merchant's UGC / deal lists. A problem arises: even a high-frequency tag is not necessarily relevant, so bad cases must be filtered by a merchant-tag discrimination module.
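The frequency-thresholding step can be sketched as follows; the tags, synonyms, and reviews are invented for illustration, and real matching would be over tokenized Chinese text rather than substrings.

```python
from collections import Counter

def candidate_tags(reviews, tag_synonyms, min_freq=2):
    """Count how many of a merchant's reviews mention each tag (or any of
    its synonyms) and keep the tags above a frequency threshold.

    High-frequency tags are only *candidates*: a downstream discrimination
    model must still filter out irrelevant but frequent matches.
    """
    counts = Counter()
    for review in reviews:
        for tag, synonyms in tag_synonyms.items():
            if any(s in review for s in [tag, *synonyms]):
                counts[tag] += 1
    return {t: c for t, c in counts.items() if c >= min_freq}
```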
Merchant tagging considers three levels of information: tag and merchant, user comments, and merchant category. Specifically, at tag-merchant granularity, the tag and the merchant information (merchant name, third-level category, top merchant tags) are concatenated and fed into a BERT model for discrimination.
At the micro granularity of user comments, the model judges whether the relationship between a tag and each comment mentioning it is positive, negative, irrelevant, or uncertain (each judged comment is called evidence), so this can be cast as a four-class discrimination model. There were two options. The first is based on multi-task learning; its drawback is that the cost of adding tags is very high, because each new tag requires new training data. We finally adopted a discrimination model based on semantic interaction, which takes the tag as an input so that the model discriminates based on semantics, thereby supporting dynamic addition of tags.
The semantic-interaction discrimination model (vector representation first, then interaction, then aggregation of the comparison results) is faster, while the BERT-based method is computationally heavier but more accurate. We balance accuracy and speed: for example, when a POI has more than 30 pieces of evidence, we tend to use the lightweight method; when a POI has only a little evidence, the heavier model can judge it with high accuracy.
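One plausible way to aggregate the per-comment evidence into a merchant-level decision is a majority vote over the informative classes; this is a sketch of the aggregation idea, not Meituan's exact rule, which may weight evidence differently.

```python
from collections import Counter

def aggregate_evidence(evidence_labels):
    """Aggregate per-comment evidence labels (positive / negative /
    irrelevant / uncertain) into a merchant-level tag decision.

    Irrelevant and uncertain evidence is ignored; the remaining labels
    are resolved by majority vote.
    """
    votes = Counter(l for l in evidence_labels if l in ("positive", "negative"))
    if not votes:
        return "uncertain"
    return votes.most_common(1)[0][0]
```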
From a macro perspective, there are three possible relationships between a tag and a category: definitely not, possibly, and definitely. The result is generally obtained by voting over the merchant-level association results, with some rules added; when the accuracy requirement is high, manual review can be carried out.
④ Graph application: directly applying the mined data, or applying the knowledge vector representations.
In the merchant knowledge Q&A scenario, we answer users' questions based on the merchant tagging results and the evidence corresponding to each tag.
First, the tags in the user's query are identified and mapped to IDs, then passed through the search recall or ranking layer to the index layer, so that merchants with tagging results are recalled and shown to C-end users. A/B experiments show that the search experience for users' long-tail needs improved significantly. In addition, online experiments in hotel search showed significant improvement through supplementary recall methods such as synonym mapping.
This is mainly realized with a GNN model. Two kinds of edges are used in the graph: Query-POI click behavior and Tag-POI association information; GraphSAGE is used for graph learning. The learning objective is to judge whether a tag and a POI are related, or whether a query led to a click on a POI, with sampling according to the strength of the association. Online results show no gain when only Query-POI edges are used, but a significant improvement after Tag-POI edges are introduced. This may be because the ranking model already learns from Query-POI click behavior, so GraphSAGE over the same edges merely changes the learning method and adds little information; introducing Tag-POI edges brings in new knowledge, and therefore a significant improvement.
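The core of GraphSAGE is neighborhood aggregation: each node's new representation combines its own features with an aggregate (here, the mean) of its neighbors' features. A minimal single-layer sketch; the production model adds neighbor sampling, multiple layers, and the ranking objective described above.

```python
import numpy as np

def sage_mean_layer(node_feats, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator.

    node_feats: (num_nodes, d) feature matrix.
    neighbors:  list of neighbor-index lists, one per node.
    """
    out = []
    for i, feat in enumerate(node_feats):
        if len(neighbors[i]):
            neigh = node_feats[neighbors[i]].mean(axis=0)  # mean of neighbors
        else:
            neigh = np.zeros_like(feat)
        h = W_self @ feat + W_neigh @ neigh
        out.append(np.maximum(h, 0.0))  # ReLU nonlinearity
    return np.vstack(out)
```

In the setting above, the node set would contain queries, tags, and POIs, with edges from click behavior and tag-POI association.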
In addition, feeding in only the Query-POI vector similarity did not improve the effect, but the effect improved significantly after the query and POI vectors themselves were fed in. This may be because the feature dimensionality in search is high and a single similarity feature is easily ignored; splicing in the query and POI vectors raises the feature's weight.
This task predicts the masked items a user clicked from the currently known items. When obtaining the context representation of an item, the related attribute information is also represented as vectors, so the model can judge whether an item has a given attribute.
In addition, the model can predict the attributes of masked items, thereby integrating the tag knowledge graph into the sequence recommendation task. Experimental results show that adding the knowledge information improves accuracy across different data sets. We also converted this to an online setting, using the item representations as a vector memory: recalling the top-N items according to the items a user clicked in the past supplements the online recommendation results and significantly improved the food list recommendation page.
The goals of building a dish knowledge graph are, on the one hand, a systematic understanding of dishes, and on the other, a relatively complete dish knowledge graph. The construction strategy is explained below at different levels.
**Understanding of dish names**
The dish name carries the most accurate information about the dish at the lowest cost, and understanding dish names is also the premise for the generalization ability of subsequent explicit knowledge reasoning. First, keywords / main-dish words are extracted from the dish name, and then the components of the name are identified by sequence labeling. Different models are designed for the two situations. When word segmentation is available, the segmentation marks are added to the model as special symbols, and the model identifies the type of each token. Without segmentation, a Span-Trans task is done first, and then the segmented-case module is reused.
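The sequence labeling output is typically decoded from BIO tags into typed components of the dish name. A small decoding sketch; the component types (METHOD, FOOD) are illustrative, not Meituan's actual label set.

```python
def decode_bio(tokens, tags):
    """Decode BIO sequence labels into (component_type, text) spans,
    e.g. which part of a dish name is the cooking method vs. the
    main ingredient."""
    spans, cur_type, cur_toks = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type:
                spans.append((cur_type, "".join(cur_toks)))
            cur_type, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_toks.append(tok)
        else:  # "O" tag or inconsistent continuation: close the open span
            if cur_type:
                spans.append((cur_type, "".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type:
        spans.append((cur_type, "".join(cur_toks)))
    return spans
```

For example, the dish name 红烧肉 (braised pork) decomposes into a cooking-method span 红烧 and an ingredient span 肉.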
Dish-name understanding is an important source of information, but the knowledge it provides is relatively limited, so a preliminary literal reasoning model based on deep learning is proposed, which can generalize over different literal expressions. However, it performs poorly when professional knowledge is needed, and occasionally degenerates into exact literal matching.
Basic recipe knowledge is mined from knowledge-rich texts to build a source knowledge base, which is then mapped to specific SKUs through generalized reasoning. Take ingredient reasoning for a dish like braised pork: according to statistics, among 10 recipes, 4 refer to pork belly and 6 refer to skin-on pork belly, so the ingredient is normalized to skin-on pork belly. Similarly, Buddha Jumps Over the Wall has many recipes: by computing the probability of each ingredient and applying a threshold, the recipe can be derived.
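The probability-threshold step can be sketched as counting how often each ingredient appears across recipe variants and keeping the ones above a cutoff; the recipes below are invented to mirror the 6-of-10 example above.

```python
from collections import Counter

def infer_ingredients(recipe_ingredient_lists, threshold=0.5):
    """Infer a dish's canonical ingredient set from many recipe variants:
    keep each ingredient whose appearance probability across recipes
    exceeds the threshold."""
    counts = Counter()
    for ingredients in recipe_ingredient_lists:
        counts.update(set(ingredients))  # count each recipe at most once
    n = len(recipe_ingredient_lists)
    return {ing for ing, c in counts.items() if c / n > threshold}
```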
Multi-source data mining constructs entity knowledge triples based on the dish-name understanding results, and also relies on generalization rules over those results. This strategy is mainly suitable for tags such as ingredients, efficacy, and suitable crowds. It has good accuracy and generalization ability, but low coverage.
The business also contains useful training data, such as the self-consistent in-store classification trees edited by over a million merchants. Based on these data, 500 million pairs (about 30 GB of corpus) can be generated. During model training, the category tags / store names attached to a dish are randomly replaced and the model judges whether they were replaced; when only the dish name is input, the store name is dropped with 50% probability to make the model robust. The model is further improved by treating classification tags as BERT tokens. This method is suitable for downstream models: with 100k training examples, the accuracy of the hypernym/synonym model over dish names improves by 1.8%.
First, ResNet is used to encode dish images and a BERT model to encode the menu text; through a contrastive learning loss, the model learns the matching between text and in-store dish images. A two-tower model is adopted here. On the one hand, downstream application is more convenient: each tower can be used independently, and dish image representations can be inferred once and cached. On the other hand, the image content is simple, so interactive modeling is unnecessary. The training objectives are to match pictures with dishes, align pictures and dishes, and align tags.
Based on multimodal information, the category of a dish can be predicted and menu information completed. For example, predicting "pork with cabbage" is more intuitive and accurate with image information. Based on text and visual modalities, multi-view semi-supervised menu attribute extraction is carried out. Taking cooking-method extraction as an example: first, training samples of cooking methods are generated (e.g. braised pork → "braised"); then a CNN model is trained to predict the cooking method from images, while a BERT model (a fine-tuned text model or a multimodal model) predicts it from merchant / tab / dish and review information; finally, the two models vote, or their features are spliced for prediction.
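The voting option can be sketched as late fusion: average the class probabilities from the two predictors and take the argmax. The class names and weights below are illustrative; the alternative mentioned above is to splice the two feature vectors and train a single classifier.

```python
def fuse_predictions(text_probs, image_probs, w_text=0.5):
    """Late-fusion vote between two attribute predictors (e.g. a text
    model and an image model for cooking-method extraction): weighted
    average of class probabilities, then argmax."""
    fused = {
        c: w_text * text_probs[c] + (1 - w_text) * image_probs[c]
        for c in text_probs
    }
    return max(fused, key=fused.get)
```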
To sum up the construction of the dish knowledge graph: dish-name understanding is most suitable for SKU initialization; the deep learning reasoning model and the explicit reasoning model are more suitable for synonyms, hyponyms, and cuisines. Finally, multimodal and structured pre-training and reasoning aim to solve the problems of incomplete information, many attribute dimensions, and limited labeled data, so that approach is suitable for almost all scenarios.
Today's sharing is over. Thank you.
Guest speaker: