This article is mainly compiled from the speech given by Mo Wen, a senior technical expert of Alibaba's Computing Platform Division, at the Yun Qi Conference.
Small saplings grow into big trees; A small acorn can grow into a towering tree.
With the advent of the era of artificial intelligence and the explosion of data volume, in the typical big data business scenario, the most common data business model is to use batch technology to process total data and use stream computing to process real-time incremental data. In most business scenarios, the user's business logic is often the same in batch processing and stream processing. However, the two sets of computing engines used by users for batch processing and stream processing are different.
Therefore, users usually need to write two sets of code. Undoubtedly, this brings some extra burdens and costs. Alibaba's commodity data processing often needs to face two different business processes, incremental and full, so Ali is wondering if we can have a unified big data engine technology, and users only need to develop a set of codes according to their own business logic. In such different scenarios, whether it is full data, incremental data or real-time processing, there can be a complete solution to support it, which is also the background and original intention of Ali's choice of Flink.
At present, there are many options for open source big data computing engines, such as Storm, Samza, Flink, KafkaStream and so on. And batch processing of Spark, Hive, Pig and Flink. However, there are only two choices for computing engines that support both stream processing and batch processing: one is ApacheSpark, and the other is ApacheFlink.
Comprehensive consideration of technology, ecology and other aspects. First of all, the technical idea of Spark is to simulate the flow calculation based on batch. Flink, on the other hand, uses stream-based computing to simulate batch computing.
From the perspective of technological development, there are some technical limitations in simulating the process through batch processing, which may be difficult to break through. Flink is based on stream simulation batch processing and has better scalability in technology. In the long run, Ali decided to use Flink as a unified and universal big data engine as a future choice.
Flink is a unified big data computing engine with low latency and high throughput. In Alibaba's production environment, Flink's computing platform can process hundreds of millions of messages or events per second, with a delay of milliseconds. At the same time, Flink provides one-time consistency semantics. And the correctness of the data is ensured. In this way, the Flink big data engine can provide financial data processing capabilities.
Flink's current situation in Ali
The platform built in Alibaba based on ApacheFlink was officially launched on 20 16, and it was implemented in two scenarios of Alibaba's search and recommendation. At present, all Alibaba businesses, including all Alibaba subsidiaries, have adopted a real-time computing platform based on Flink. At the same time, Flink computing platform runs on the open source Hadoop cluster. YARN of Hadoop is used as resource management scheduling, and HDFS is used as data storage. So Flink can seamlessly interface with Hadoop, an open source big data software.
At present, this real-time computing platform based on Flink not only serves Alibaba Group, but also provides Flink-based cloud product support to the entire developer ecosystem through Alibaba Cloud's cloud product API.
How about the large-scale application of Flink in Alibaba?
Scale: whether a system is mature or not, scale is an important indicator. Flink initially launched Alibaba with only a few hundred servers, and now it has reached tens of thousands of servers, one of the few in the world;
Status data: Based on Flink, the internal accumulated status data is already PB scale;
Event: Today, more than one trillion pieces of data are processed on Flink's computing platform every day;
PS: It can undertake more than 472 million visits per second during the peak period. The most typical application scenario is Alibaba's dual 1 1 big screen;
Flink's road to development
Next, from the perspective of open source technology, let's talk about how ApacheFlink was born and how it grew. And how did Ali come in at this critical moment of growth? What contribution and support have you made to it?
Flink was born in the stratosphere of the European big data research project. This project is a research project of Berlin Technical University. Flink did batch computing in the early days, but in 20 14, the core members of the stratosphere hatched Flink, donated Flink to Apache in the same year, and later became Apache's top big data project. At the same time, the mainstream direction of Flink computing is streaming, that is, streaming computing is used to calculate all big data. This is the background of the birth of Flink technology.
20 14 As a big data engine focusing on streaming computing, Flink began to emerge in the open source big data industry. Different from stream computing engines such as Storm and SparkStreaming, it is not only a high-throughput and low-latency computing engine, but also provides many advanced functions. For example, it provides state calculation, supports state management, supports strong consistency of data semantics, supports event time, and watermarks are out of order.
Flink core concepts and basic concepts
What distinguishes Flink from other stream computing engines is actually state management.
What's the situation? For example, when developing a flow calculation system or doing data processing, you may often need to do statistics on data, such as Sum, Count, Min and Max, and these values need to be stored. Because they are constantly updated, these values or variables can be understood as a state. If the data source is reading Kafka and RocketMQ, you may need to record the reading location and offset. These offset variables are the states to be calculated.
Flink provides built-in state management, which can be stored inside Flink without being stored in an external system. This has the following advantages: first, it reduces the dependence and deployment of the computing engine on external systems, and the operation and maintenance are simpler; Secondly, the performance has been greatly improved: if it is accessed from the outside, such as Redis, HBase must be accessed through the network and RPC. If Flink accesses these variables internally, it only accesses them through its own process. At the same time, Flink periodically keeps checkpoints in these states and stores them in a distributed persistent system, such as HDFS. In this way, when Flink's task goes wrong, it will restore the state of the whole stream from the latest checkpoint, and then continue to run its stream processing. There is no data impact on users.
How does Flink ensure that there is no data loss or redundancy during checkpoint recovery? Make sure the calculation is accurate?
The reason is that Flink uses a set of classic Chandy-Lamport algorithm, and its core idea is to regard this traffic calculation as a traffic topology, regularly insert special obstacles from the source point at the head of this topology, and broadcast the obstacles from upstream to downstream. When each node receives all fences, it will take a snapshot of the status. After each node completes the snapshot, the whole topology will be regarded as a complete checkpoint. Next, no matter what happens, it will be restored from the nearest checkpoint.
Flink uses this classic algorithm to ensure strong semantic consistency. This is also the core difference between Flink and other stateless flow computing engines.
The following is Flink's solution to the disorder problem. For example, if you look at the sequence of Star Wars according to the release time, you may find that the story is jumping.
In traffic calculation, it is very similar to this example. The arrival time of all messages is inconsistent with the actual time in the source online system log. In the process of stream processing, it is hoped that messages will be processed according to the order in which they actually appear at the source, not according to the time when they actually arrive at the program. Flink provides some advanced event time and watermarking technology to solve the problem of disorder. So that users can process messages in an orderly manner. This is a very important feature of Flink.
Next, I will introduce the core concepts and ideas when Flink started, which is the first stage of Flink's development. The second stage is 20 15 and 20 17. This stage is also the time for the development of Flink and the intervention of Alibaba. The story comes from a survey we did in the search division in 20 15 years. At that time, Ali had his own batch processing technology and stream computing technology, both self-developed and open source. However, in order to think about the direction and future trend of the next generation big data engine, we have done a lot of research on new technologies.
Combined with a lot of research results, we finally come to the conclusion that solving the general needs of big data computing and integrating the computing engine of batch flow are the development direction of big data technology, and finally we choose Flink.
But Flink of 20 15 is not mature enough, and its scale and stability have not been put into practice. Finally, we decided to set up a Flink branch in Ali and make a lot of modifications and improvements to Flink to adapt to the super-large business scene of Alibaba. In this process, our team not only improved and optimized the performance and stability of Flink, but also made a lot of innovations and improvements in the core architecture and functions, making contributions to the community, such as Flink's brand-new distributed architecture, incremental checkpoint mechanism, credit-based network traffic control mechanism, streaming SQL and so on.
Alibaba's contribution to Flink community
Let's look at two design cases. The first one is that Alibaba has reconstructed Flink's distributed architecture, clearly layering and decoupling Flink's job scheduling and resource management. The first advantage of this is that Flink can run locally on various open source resource managers. After the improvement of this distributed architecture, Flink can run natively on HadoopYarn and Kubernetes, the two most common resource management systems. At the same time, Flink's task scheduling is changed from centralized scheduling to distributed scheduling, which enables Flink to support larger clusters and obtain better resource isolation.
The other is to implement incremental checkpoint mechanism, because Flink provides stateful calculation and regular checkpoint mechanism. If there are more and more internal data, the checkpoint will become bigger and bigger, which may eventually lead to failure. After providing incremental checkpoints, Flink will automatically find out which data is changed incrementally and which data is modified. At the same time, only these modified data are persisted. In this way, with the passage of time, checkpoints will not become more and more difficult, and the performance of the whole system will be very stable, which is also a very important feature we have contributed to the community.
After the streaming media capability of Flink was upgraded from 20 15 to 20 17, the Flink community gradually matured. Flink has also become the most mainstream computing engine in the streaming media field. Because Flink originally wanted to be a big data engine with unified streaming and batch processing, this work started from 20 18. In order to achieve this goal, Alibaba proposed a new unified API architecture and a unified SQL solution. At the same time, after the various functions of stream computing have been improved, we think that batch computing also needs various improvements. No matter in task scheduling layer or data shuffling layer, there is a lot of work to be improved in fault tolerance and ease of use.
Investigate its reason, here are two main points to share with you:
● Unified API stack
● Unified SQL scheme
Let's take a look at the status quo of FlinkAPI stack. Developers who have studied Flink or used Flink should know. Flink has two basic APIs, one is data stream and the other is data set. The data stream API is provided to stream users and the data set API is provided to batch users, but the execution paths of these two APIs are completely different, and even different tasks need to be generated to perform. So this is in conflict with the unified API, and the unified API is not perfect, and it is not the final solution. On top of the runtime, there should be a unified basic API layer for batch processing integration, and we hope that the API layer can be unified.
Therefore, we will adopt a DAG (finite acyclic graph) API as the unified API layer of batch process in the new architecture. For this finite acyclic graph, batch calculation and flow calculation do not need to be explicitly expressed. Developers can plan whether the data is a stream attribute or a batch attribute only by defining different attributes at different nodes and different edges. The whole topology is a unified semantic expression, which can integrate batch flows. The whole calculation does not need to distinguish between flow calculation and batch calculation, but only needs to express its own needs. With this API, Flink's API stack is unified.
In addition to the unified basic API layer and unified API stack, the SQL solution is also unified at the upper level. For batch SQL, we can think that there are data sources of stream calculation and batch calculation, and these two data sources can be simulated as data tables. It can be considered that the data source of stream data is a constantly updated data table, while the data source of batch data can be considered as a relatively static table with no updated data table. The whole data processing can be regarded as a query of SQL, and the final result can also be simulated as a result table.
For flow calculation, the result table is a constantly updated result table. For batch processing, the result table is equivalent to the updated result table. From the whole SOL semantic expression, flow and batch can be unified. In addition, both stream SQL and batch SQL can use the same query to represent reuse. In this way, all stream batches can be optimized or parsed by the same query. Even many stream and batch operators can be reused.
The future direction of flink
First of all, Alibaba should be an all-round unified big data computing engine based on the essence of Flink. Put it on the ground of ecology and scene. At present, Flink is the mainstream streaming computing engine, and many Internet companies have reached a consensus that Flink is the future of big data and the best streaming computing engine. The next important task is to make Flink make a breakthrough in batch computing. In more scenarios, it has become the mainstream batch computing engine. Then the flow and batch switch seamlessly, and the boundary between flow and batch becomes more and more blurred. With Flink, both flow calculation and batch calculation can be performed in one calculation.
The second direction is that Flink is supported by more languages, not only Java, Scala, but also Python and Go for machine learning. In the future, I hope to develop Flink computing tasks in richer languages, describe computing logic and connect more ecosystems.
Finally, I have to say AI, because many big data computing needs and data volumes are supporting very popular AI scenarios. Therefore, on the basis of improving the batch ecology of Flink, we will continue to go up and improve the machine learning algorithm library of upper Flink. At the same time, Flink will learn from mature machines and learn deeply. For example, Tensorflow on Flink can integrate ETL data processing of big data with feature calculation, feature calculation and training calculation of machine learning, so that developers can enjoy the benefits brought by multiple ecosystems at the same time.
What is the Alibaba Cloud scene?
This is an information system product made by Alibaba, mainly aimed at small and medium-sized enterprises. Its server hardware support is put in the cloud by Ali. It is equivalent to Ali helping you manage the data. The customer's hardware investment is very low, and the system management cost is also very low, usually in the form of annual fee.
Why do companies such as Alibaba and Tencent put their servers in the United States?
In order for Americans to enjoy the services of Alibaba and Tencent, in addition, domestic people need related services when they go to the United States.
Which industry does Alibaba belong to?
Alibaba's main business belongs to e-commerce, including internet finance, electronic payment and logistics. At the same time, Alibaba's continuous development also involves a wider range of fields, such as media and Internet of Things.
Alibaba Group has its own industries: Alibaba, Taobao, Alipay, Ali Software, Ali Mama, Word of Mouth, Alibaba Cloud, China Yahoo, Ceramics, Taobao Mall, China Wang Wan, Juhua, Yunfeng Fund and Ant Financial.
How big a server does an online shop need?
Taobao shop does not need to apply, because it is a virtual space. It can be used directly in the computer room built by Alibaba itself, without bringing its own server. Directly decorate the store and put the goods on the shelves.
Is the data center of Qiandao Lake in Ali built at the bottom of the lake?
Yes, a server center of Alibaba is at the bottom of Qiandao Lake. Alibaba Cloud Qiandao Lake Data Center has a building area of 30,000 square meters, with a floor of * *1/kloc-0, which can accommodate at least 50,000 devices. As a template for the construction of water-cooled industrial data center, it is very innovative and representative. Data centers don't need electric cooling 90% of the time. Deep lake water flows through the data center through a completely closed pipeline to help the server cool down, and then flows through Zhongsuxi, which is 2.5 kilometers away from Qingxi New City, as an urban landscape. After natural cooling, return to Qiandao Lake.