Crawlers, also known as web crawlers or web robots, are programs that automatically collect and organize data from the Internet on a person's behalf.
You can think of a crawler as a spider crawling across the Internet: the Internet is a big web, and the crawler is a spider moving around on it. When it encounters its prey (the resources it needs), it grabs them.
For example, a crawler can collect the calorie counts of various foods and the ingredients of various dishes. Once you master crawling, you can build your own database, write a program to filter out the foods that meet your calorie requirements, and then use a random function to generate a menu to choose from, as sketched below.
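Here is a toy sketch of that menu idea; the food names and calorie values are made up, standing in for records a crawler would have collected:

    import random

    # Hypothetical sample data: (name, calories per serving).
    foods = [
        ("oatmeal", 150),
        ("fried chicken", 430),
        ("salad", 120),
        ("steak", 679),
    ]

    def daily_menu(limit_kcal, picks=2):
        """Keep foods under the calorie limit, then pick a random menu."""
        allowed = [f for f in foods if f[1] <= limit_kcal]
        return random.sample(allowed, min(picks, len(allowed)))

    print(daily_menu(200))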
Crawlers can crawl the content of a website or an application and extract useful values in batches. For example, you might want to save all the highly upvoted answers to a Zhihu question locally, collect flight prices from many air-ticket websites for comparison, run public-opinion analysis on forums, stock boards, Weibo, and WeChat public accounts, or scrape high-frequency CET-4 exam vocabulary.
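As a minimal sketch of such batch extraction, the code below fetches one page and pulls out every link, using only the Python standard library; the URL is a placeholder, not one of the sites named above:

    from html.parser import HTMLParser
    from urllib.request import urlopen, Request

    class LinkExtractor(HTMLParser):
        """Collect every href attribute found in <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    # Placeholder target and user agent; swap in the pages you actually need.
    req = Request("https://example.com", headers={"User-Agent": "toy-crawler/0.1"})
    html = urlopen(req, timeout=10).read().decode("utf-8", errors="ignore")
    parser = LinkExtractor()
    parser.feed(html)
    print(parser.links)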
Crawler composition:
The function of a Web crawler system is to download web page data and provide a data source for a search engine system. Many large-scale search engines are built on Web data collection, which shows the importance of Web crawlers in search engines.
In the system framework of a web crawler, the main process consists of three parts: the controller, the parser, and the resource library. The controller's main job is to assign work tasks to the crawler threads in a multi-threaded setup. The parser's main job is to download web pages and process them, stripping out content such as JS script tags, CSS code, whitespace characters, and HTML tags. The resource library stores the downloaded and processed page data.
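A minimal sketch of that controller / parser / resource-library split, again assuming only the Python standard library; names like resource_library and the thread count are illustrative, not part of any specific framework:

    import re
    import threading
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen, Request

    # Resource library: stores processed page text. A lock guards it
    # because several crawler threads write to it concurrently.
    resource_library = {}
    library_lock = threading.Lock()

    def parse(url):
        """Parser: download a page, then strip JS/CSS blocks, HTML tags,
        and extra whitespace before storing the text."""
        req = Request(url, headers={"User-Agent": "toy-crawler/0.1"})
        html = urlopen(req, timeout=10).read().decode("utf-8", errors="ignore")
        html = re.sub(r"(?s)<(script|style).*?</\1>", "", html)  # JS and CSS
        text = re.sub(r"<[^>]+>", "", html)                      # HTML tags
        text = re.sub(r"\s+", " ", text).strip()                 # whitespace
        with library_lock:
            resource_library[url] = text

    def controller(urls, workers=4):
        """Controller: hand each URL to a crawler thread in the pool."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            pool.map(parse, urls)

    if __name__ == "__main__":
        controller(["https://example.com"])
        for url, text in resource_library.items():
            print(url, "->", text[:80])

Splitting the roles this way keeps the pieces independently replaceable: the controller only schedules work, the parser only fetches and cleans pages, and the resource library is a passive store the rest of the system reads from.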