"The best network is like a deity: it is everywhere, yet you do not feel her presence... It is both complicated and simple, and that is the direction of our efforts." (Sun Chenghao, senior network technology expert, Alibaba Cloud)
In August, at the main venue of the Hangzhou conference, an Alibaba Cloud product director surnamed He introduced Apsara (Tian Fei) 2.0, a fully upgraded version of the cloud operating system independently developed by Alibaba Cloud. As one of the core components of Apsara 2.0, Luoshen made her first public appearance. At the subsequent special session on future network technology, Sun Chenghao, a senior network technology expert at Alibaba Cloud, explained Luoshen in more detail. This article presents, for the first time, a systematic account of the concept, evolution, architecture and characteristics of Apsara Luoshen.
What is Apsara Luoshen?
There is a virtual network layer between the physical network and the network that users perceive. At Alibaba Cloud, we named this virtual network system Luoshen. Luoshen is the Apsara subsystem responsible for virtual networking. She provides Alibaba Cloud customers with a rich set of network products, such as VPC and SLB, and she is the network infrastructure for more than 100 cloud products, including ECS, RDS, OSS and NAS. She also supports many businesses of Alibaba Group and Ant Financial, such as e-commerce, payment and logistics. Globally, Luoshen serves more than one million users across all industries. During Internet traffic peaks such as Double Eleven, the World Cup and the Spring Festival travel rush, she quietly safeguards a smooth network experience for every consumer.
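One way to picture the virtual layer between the physical network and the user-perceived network is as a directory that maps each tenant-visible address to its current physical location. The sketch below is a minimal illustration under stated assumptions: the names `OverlayDirectory`, `Location` and the `vni` field are hypothetical and do not describe Luoshen's actual data structures. It shows why two VPCs can reuse the same private address, and why a VM can move to another host without changing the address users see.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical physical placement of a virtual NIC."""
    host_ip: str  # underlay (physical network) address of the server
    vni: int      # hypothetical virtual network identifier of the tenant's VPC


class OverlayDirectory:
    """Hypothetical map from (vpc_id, virtual IP) to the host currently holding it."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], Location] = {}

    def bind(self, vpc_id: str, vip: str, loc: Location) -> None:
        # Binding, or re-binding after a migration, changes only the physical
        # side; the address the user sees stays the same.
        self._table[(vpc_id, vip)] = loc

    def resolve(self, vpc_id: str, vip: str) -> Optional[Location]:
        return self._table.get((vpc_id, vip))


if __name__ == "__main__":
    d = OverlayDirectory()
    # Two tenants reuse 192.168.0.10 without conflict: the key includes the VPC.
    d.bind("vpc-a", "192.168.0.10", Location("10.1.1.5", 100))
    d.bind("vpc-b", "192.168.0.10", Location("10.1.2.7", 200))
    # Migrating vpc-a's VM only updates its physical location.
    d.bind("vpc-a", "192.168.0.10", Location("10.1.3.9", 100))
    print(d.resolve("vpc-a", "192.168.0.10"))
```

Keying on the VPC as well as the address is the design choice that lets tenants pick overlapping private ranges.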
Many people know that Alibaba Cloud has the Apsara system, whose components are named after deities from Chinese mythology, such as Pangu, Fuxi, Nuwa and Shennong. Pangu is the distributed file system and Fuxi is the distributed scheduling system. Why is the virtual network system called Luoshen? In ancient times, rivers were a vital means of transportation, much as networks are today. So when naming the virtual network system, we chose Luoshen, the goddess of the Luo River.
The architecture of Apsara Luoshen
Luoshen is part of Apsara, Alibaba Cloud's distributed operating system. In the Apsara infrastructure, the top layer consists of all kinds of cloud products, including familiar ones such as RDS, ECS, VPC and SLB. Beneath them are Apsara's three foundational components: the storage system Pangu, the resource management system Fuxi, and the network management system Luoshen. In other words, Luoshen not only underpins Alibaba Cloud's networking products but also provides the network infrastructure for the other cloud products.
In terms of technical architecture, the Luoshen system consists of three modules: the data plane, the control plane and the management plane.
The data plane processes packets in the cloud network. Like network cables and routing and switching equipment in the physical world, it carries packets from sender to destination efficiently and with low latency. Luoshen's data plane likewise contains components with different functions: virtual switches supporting various compute form factors, DCN gateways for interconnecting data centers, Internet gateways connecting the cloud network to the public network, hybrid cloud gateways for connecting to on-premises networks, load-balancing gateways, and intelligent access gateways that provide terminal access. To improve the forwarding performance of these components, Luoshen uses not only software forwarding but also, extensively, combined software-hardware and even pure hardware forwarding.
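A common way for a forwarding element to decide which component should handle a packet is a longest-prefix-match lookup on the destination address. The sketch below assumes a hypothetical route table whose entries point at the gateway types listed above; the prefixes, component names and `select_component` function are illustrative, and the article does not say how Luoshen's own lookup is implemented.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical route table: (destination prefix, data-plane component).
RouteTable = List[Tuple[ipaddress.IPv4Network, str]]


def build_table(entries: List[Tuple[str, str]]) -> RouteTable:
    return [(ipaddress.ip_network(prefix), comp) for prefix, comp in entries]


def select_component(table: RouteTable, dst: str) -> str:
    """Longest-prefix match: the most specific prefix containing dst wins."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, comp in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, comp)
    if best is None:
        raise LookupError(f"no route for {dst}")
    return best[1]


TABLE = build_table([
    ("0.0.0.0/0", "internet-gateway"),          # default: public network
    ("10.0.0.0/16", "vswitch"),                 # local VPC
    ("10.8.0.0/16", "dcn-gateway"),             # VPC in another data center
    ("172.16.0.0/12", "hybrid-cloud-gateway"),  # on-premises network
    ("10.0.5.100/32", "slb-gateway"),           # load balancer address
])

if __name__ == "__main__":
    for dst in ["10.0.1.7", "10.0.5.100", "172.20.3.4", "8.8.8.8"]:
        print(dst, "->", select_component(TABLE, dst))
```

The linear scan keeps the example readable; production forwarding planes use tries or hardware TCAM so that lookup cost does not grow with the number of routes.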
The control plane decides how packets are processed; it is Luoshen's business brain. Technically, Luoshen's control plane is a hierarchical, distributed control system. At the bottom, device controllers control and manage the components of the data plane. Above them, each region has a virtual network controller, and there is one global routing controller worldwide. The regional virtual network controller manages and schedules the local cloud network, while the global routing controller coordinates and schedules resources across regions to form a global cloud network. Built on the virtual network controllers and the global routing controller, the NFV controller configures and abstracts advanced virtual network functions for VPN and other products.
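The three-level hierarchy can be sketched as follows, assuming a much simplified model in which the global controller only exchanges each region's prefixes with every other region and each regional controller fans the result out to its device controllers. All class names, region names and the `cross-region:` next-hop label are hypothetical; real controllers also handle scheduling, policy and failure handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceController:
    """Programs the data-plane components it is responsible for (simplified)."""
    name: str
    installed: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop

    def install(self, prefix: str, next_hop: str) -> None:
        self.installed[prefix] = next_hop


@dataclass
class RegionalController:
    """Manages the cloud network of a single region."""
    region: str
    devices: List[DeviceController] = field(default_factory=list)
    local_prefixes: List[str] = field(default_factory=list)

    def push(self, prefix: str, next_hop: str) -> None:
        # Fan one routing decision out to every device controller in the region.
        for dev in self.devices:
            dev.install(prefix, next_hop)


class GlobalRoutingController:
    """Coordinates regions so that their prefixes become mutually reachable."""

    def __init__(self) -> None:
        self.regions: Dict[str, RegionalController] = {}

    def register(self, rc: RegionalController) -> None:
        self.regions[rc.region] = rc

    def converge(self) -> None:
        # Each region learns every other region's prefixes via a cross-region hop.
        for src in self.regions.values():
            for dst in self.regions.values():
                if src is dst:
                    continue
                for prefix in dst.local_prefixes:
                    src.push(prefix, f"cross-region:{dst.region}")


if __name__ == "__main__":
    hz = RegionalController("hangzhou", [DeviceController("hz-dev1")], ["10.0.0.0/16"])
    sg = RegionalController("singapore", [DeviceController("sg-dev1")], ["10.1.0.0/16"])
    g = GlobalRoutingController()
    g.register(hz)
    g.register(sg)
    g.converge()
    print(hz.devices[0].installed)
```

Separating the levels means a device controller never needs a global view; it only applies what its regional controller pushes.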
Luoshen's management plane is the center of network operations and maintenance (O&M), managing network elements and users at massive scale: tens of millions of virtual machines and millions of network elements. To achieve this, Luoshen's management platform is built on big data and machine learning. It performs real-time and offline computation and data modeling on the massive data generated while the network runs, which drives advance planning of network resources, day-to-day maintenance of network systems, and intelligent operation of network products. The management plane includes a high-performance distributed data analysis system, whose results feed an intelligent O&M system covering the whole lifecycle of network products: resource planning, network construction, system changes, real-time monitoring, fault escape, and product operation. The end result is unattended network changes, problems found before users notice them, efficient and simple fault escape, and richer products and user operations.
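To make "finding problems before users" concrete, here is one basic technique a real-time analysis pipeline might apply: flagging a metric sample whose z-score against a sliding window of recent samples exceeds a threshold. This is a deliberately simple statistical method swapped in for illustration, not the machine-learning models the article refers to; the class name, window size and threshold are assumptions.

```python
import statistics
from collections import deque
from typing import Deque, Optional


class RollingAnomalyDetector:
    """Flags a sample that deviates strongly from its recent history."""

    def __init__(self, window: int = 30, threshold: float = 3.0) -> None:
        self.window: Deque[float] = deque(maxlen=window)  # recent samples only
        self.threshold = threshold

    def observe(self, value: float) -> Optional[float]:
        """Return the z-score if value is anomalous, else None; then record it."""
        score = None
        if len(self.window) >= 2:  # stdev needs at least two samples
            mean = statistics.fmean(self.window)
            stdev = statistics.stdev(self.window)
            if stdev > 0:
                z = (value - mean) / stdev
                if abs(z) > self.threshold:
                    score = z
        self.window.append(value)
        return score


if __name__ == "__main__":
    det = RollingAnomalyDetector(window=10, threshold=3.0)
    # Hypothetical per-second packet-loss readings (percent) with one spike.
    for sample in [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0]:
        z = det.observe(sample)
        if z is not None:
            print(f"alarm: sample {sample} has z-score {z:.1f}")
```

A fixed window adapts to slow drift in normal traffic while still reacting to sudden jumps, which is the property an early-warning alarm needs.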
The technical evolution of Luoshen
Luoshen did not become one of the four pillars of Apsara overnight. Her evolution has gone through four stages.
The first is the classic network stage. At that stage, there was only one network concept: public network bandwidth. The problem with the classic network was that users could not customize the network topology, so they could not connect their cloud resources with their off-cloud networks to build a hybrid cloud. To solve this, Luoshen entered the second stage, the VPC stage, in which she virtualizes millions of networks in each region and users can fully customize their own networks. As networks grew in scale, Luoshen moved from regional networks to the third stage, the global network stage, whose main problem is how to better manage very large-scale networks. Cloud Enterprise Network and Cloud Connect Network are the two defining features of this third generation of Luoshen.
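In the VPC stage the user, not the provider, chooses the address space and how it is divided. As an illustration only, the hypothetical helper below uses Python's standard `ipaddress` module to carve equally sized subnets out of a user-chosen VPC CIDR block; it is not an Alibaba Cloud API, and the CIDR values are examples.

```python
import ipaddress
from itertools import islice
from typing import List


def plan_vpc_subnets(vpc_cidr: str, new_prefix: int, count: int) -> List[str]:
    """Carve `count` equally sized subnets out of a user-chosen VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() lazily yields consecutive blocks of the requested prefix length.
    subnets = [str(n) for n in islice(vpc.subnets(new_prefix=new_prefix), count)]
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} /{new_prefix} subnets")
    return subnets


if __name__ == "__main__":
    # Two tenants may choose the same range; overlay isolation keeps them apart.
    print(plan_vpc_subnets("192.168.0.0/16", 24, 3))
```

Because each VPC is a separate virtual network, the same plan can be reused by many tenants without address conflicts.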
After meeting the needs of major customers, we began to think about how to further improve the user experience. What do users fundamentally want from the network? In fact, their strongest wish is for the network to be robust and reliable enough that problems never arise. Just as with water and electricity, users should not need to know where the power stations and pumping stations are. Luoshen therefore aims for a network that is ubiquitous yet imperceptible to users. The development of Luoshen is an evolution from 0 to 1, to 100, and then back to 0. This is the direction of our work on the next generation of Luoshen, and it is also the thinking behind the concept of a "netless" cloud, which we were the first in the industry to propose.
Characteristics of Apsara Luoshen
Luoshen's key features are security, elasticity and reliability, which are also the key properties of her ultimate netless state.
Security is the foundation. Because overlay technology logically isolates networks, users' networks cannot reach one another at all by default. Luoshen also offers various encryption services that give users deeper security. Elasticity has two dimensions. The first is second-level elasticity of forwarding performance: Luoshen can scale from 1 MB to 1 TB within one second. The second is scale elasticity: a single Luoshen network supports 100,000 compute nodes. As a result, Luoshen can support services as small as a virtual web host as well as massive traffic peaks such as midnight on Double 11. By reliability, we mean the annual average failure time: the failure time of a single instance caused by Luoshen is only 50 ms, which is extremely short.
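The isolation that overlay technology provides comes from tagging every encapsulated frame with a virtual network identifier. The article does not say which encapsulation Luoshen uses, so the sketch below assumes VXLAN (RFC 7348) as a representative example: it builds and parses the 8-byte VXLAN header and shows a receiver accepting only frames carrying its own tenant's VNI. The function names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def encap_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header carrying a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits. Both in network byte order.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def decap_vni(header: bytes) -> int:
    """Extract the VNI, rejecting headers whose VNI-valid flag is not set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8


def deliver(header: bytes, local_vni: int) -> bool:
    """Isolation check: a tenant endpoint only accepts frames from its own VNI."""
    return decap_vni(header) == local_vni


if __name__ == "__main__":
    h = encap_header(5001)
    print(h.hex(), deliver(h, 5001), deliver(h, 5002))
```

A 24-bit identifier allows about 16 million separate virtual networks, far more than the 4,094 usable IDs of traditional VLANs, which is why overlays suit multi-tenant clouds.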
Key design
Next, we analyze in detail the key designs behind Luoshen's elasticity and reliability. The data plane of the Luoshen system is, in effect, one huge switch. As is well known, a switch's forwarding chip processes packets in a pipeline, and the hardware never stops processing; the same is true of Luoshen's data plane. From the moment a packet enters the Luoshen system until it leaves, no component along the way is interrupted, so a data plane that does only this one thing must do it efficiently. To that end, Luoshen's data plane combines software and hardware in both its forwarding technology and its architecture. In addition, the Luoshen network is never interrupted for maintenance, which means that every component inside Luoshen supports hot upgrades.
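The article does not describe how Luoshen performs hot upgrades. One common pattern is to build the new version of the forwarding logic off to the side and publish it with a single reference swap, so each packet is handled entirely by either the old or the new version (similar in spirit to read-copy-update). The sketch below assumes that pattern; `Pipeline` and `HotSwappableForwarder` are hypothetical names. In CPython the attribute rebinding is effectively atomic under the GIL; a C data plane would use an atomic pointer store or RCU instead.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Pipeline:
    """One immutable version of the forwarding logic (hypothetical)."""
    version: str
    routes: Mapping[str, str]  # destination IP -> egress port, read-only


class HotSwappableForwarder:
    """Keeps forwarding packets while its pipeline is replaced."""

    def __init__(self, initial: Pipeline) -> None:
        self._active = initial

    def forward(self, dst: str) -> Tuple[str, str]:
        # Take one reference per packet, so a packet is handled entirely by
        # either the old or the new pipeline, never a mix of both.
        p = self._active
        return p.version, p.routes.get(dst, "drop")

    def upgrade(self, new: Pipeline) -> None:
        # Publishing the new version is a single reference assignment; the old
        # object stays valid for any lookup already holding it.
        self._active = new


if __name__ == "__main__":
    fw = HotSwappableForwarder(Pipeline("v1", MappingProxyType({"10.0.0.2": "port1"})))
    print(fw.forward("10.0.0.2"))
    fw.upgrade(Pipeline("v2", MappingProxyType({"10.0.0.2": "port1", "10.0.0.3": "port2"})))
    print(fw.forward("10.0.0.3"))
```

Making each pipeline version immutable is what makes the swap safe: readers never observe a table being edited in place.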
In terms of reliability, multi-room disaster tolerance is the basis of high availability. When a new Alibaba Cloud data center room is brought up, the physical facilities are deployed first, followed by the Luoshen system. At that point the room contains compute clusters, gateways and control platforms, and our virtual switch components run on the compute clusters. The key nodes of the data plane and the control plane are all deployed as clusters, so the failure of a single service node has no impact on users. When the host of a virtual machine suffers a serious problem such as downtime, the VM can be migrated within the room, and the migration itself does not affect the VM's network attributes or connectivity. Cluster gateways and controller nodes are deployed in every cloud room. As more rooms are added, they automatically form a circular backup relationship: when a new room is built and Luoshen is deployed in it, the room automatically joins this backup chain. Thus, if the key nodes in one room fail abnormally, service automatically switches to its backup room, and the Luoshen system in the backup room takes over. This multi-level disaster recovery mechanism ensures that users can resume business within a short time.
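The circular backup chain can be modeled as a ring in which each room is backed up by the next one and a newly built room simply joins the ring. The sketch below assumes that interpretation, and that failover walks forward along the ring until it reaches a healthy room; the class and room names are hypothetical and the real health checks and state transfer are omitted.

```python
from typing import List, Set


class BackupRing:
    """Rooms form a circular backup chain: each room is backed up by the next."""

    def __init__(self) -> None:
        self.rooms: List[str] = []
        self.healthy: Set[str] = set()

    def join(self, room: str) -> None:
        # A newly built room is appended and automatically becomes part of the ring.
        self.rooms.append(room)
        self.healthy.add(room)

    def backup_of(self, room: str) -> str:
        i = self.rooms.index(room)
        return self.rooms[(i + 1) % len(self.rooms)]

    def mark_failed(self, room: str) -> None:
        self.healthy.discard(room)

    def serving_room(self, room: str) -> str:
        """Walk the ring from `room` until a healthy room is found."""
        i = self.rooms.index(room)
        for step in range(len(self.rooms)):
            candidate = self.rooms[(i + step) % len(self.rooms)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy room in the ring")


if __name__ == "__main__":
    ring = BackupRing()
    for r in ["room-a", "room-b", "room-c"]:
        ring.join(r)
    ring.mark_failed("room-b")
    print(ring.serving_room("room-b"))  # traffic for room-b is served by its backup
```

A ring spreads backup duty evenly: every room protects exactly one neighbor, so no single room has to hold standby capacity for all the others.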
Besides multi-room disaster tolerance, another key to reliability is finding bugs and recovering quickly. To address this, Luoshen first designed a flow-based dyeing (marking) system. If the Luoshen system is viewed as one big switch, then in terms of features it is a switch that supports traffic tracing and a rich variety of policies. Beneath the Luoshen system are the devices and switches of the physical network. Using flow marking together with policies configured in Luoshen, specific flows can be dyed, mirrored, sampled and traced in both the physical and the virtual network. The logs produced by these operations are collected and processed in real time. If traffic is abnormal, an alarm and a log are generated and sent to the administrator, and some alarms can trigger automatic fault handling and recovery. Other data is processed into reports and user profiles, and can also drive an eye-catching big-screen dashboard for users. In essence, this is the capability to digitize the network.
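The dye-and-trace loop can be sketched as three steps: a policy marks packets of a chosen flow, every hop logs the marked packets it sees, and a collector compares the logged hops with the expected path to raise alarms. The sketch below is a minimal model under assumptions: the policy matches only destination address and port, and the classes, hop names and alarm text are hypothetical rather than Luoshen's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FiveTuple = Tuple[str, str, int, int, str]  # src, dst, sport, dport, proto


@dataclass
class Packet:
    flow: FiveTuple
    color: Optional[int] = None  # trace mark; None means "not dyed"


@dataclass
class DyePolicy:
    """Dye every packet whose destination address and port match."""
    dst: str
    dport: int
    color: int

    def apply(self, pkt: Packet) -> None:
        if pkt.flow[1] == self.dst and pkt.flow[3] == self.dport:
            pkt.color = self.color


@dataclass
class TraceCollector:
    """Gathers per-hop logs for dyed packets and checks the expected path."""
    logs: Dict[int, List[str]] = field(default_factory=dict)

    def record(self, hop: str, pkt: Packet) -> None:
        if pkt.color is not None:  # only dyed packets are logged
            self.logs.setdefault(pkt.color, []).append(hop)

    def check(self, color: int, expected_path: List[str]) -> List[str]:
        """Return one alarm per expected hop that never saw the dyed flow."""
        seen = set(self.logs.get(color, []))
        return [f"color {color}: no trace at {hop}" for hop in expected_path if hop not in seen]


if __name__ == "__main__":
    policy = DyePolicy(dst="10.0.0.8", dport=443, color=7)
    pkt = Packet(("10.0.0.5", "10.0.0.8", 50000, 443, "tcp"))
    policy.apply(pkt)
    coll = TraceCollector()
    for hop in ["vswitch-src", "physical-tor"]:  # the packet is lost after the ToR
        coll.record(hop, pkt)
    print(coll.check(7, ["vswitch-src", "physical-tor", "vswitch-dst"]))
```

Because only dyed packets are logged, tracing cost stays proportional to the flows under investigation rather than to all traffic, and the first missing hop points directly at where the fault lies.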
Concluding remarks
Apsara Luoshen's mission is to make the network simpler. As is well known, AWS put forward the concept of serverless computing. Similarly, Luoshen takes the "netless" concept as its design goal. We hope that users will no longer need to care about network topology, network bandwidth, network addresses and other specialist details, so that they are not even aware that the network exists. Netless is achieved first by continuously improving the elasticity and reliability of the network. Beyond that, the key enabler is NaaS (Network as a Service), which lets users care only about network communication without having to deal with the individual components of the network.