The following tools are recommended:
1. God's Archer Cloud Crawler
God's Archer Cloud Crawler is a big-data application development platform. It gives developers a complete set of tools for data collection, data analysis, and machine learning, and provides enterprises with specialized services for data crawling, real-time data monitoring, and data analysis. Its features are extensive, covering cloud crawlers, APIs, machine learning, data cleaning, data sales, data customization, and private deployment.
2. Octopus
Octopus is a data collection system built around a fully self-developed distributed cloud computing platform. In a very short time it can easily obtain large volumes of standardized data from many different websites and web pages, helping any customer who needs information from the web to automate data collection, editing, and normalization. This frees users from dependence on manual searching and gathering, reducing the cost of acquiring information and improving efficiency.
3. GooSeeker
GooSeeker's obvious advantage is its versatility. For simple websites, once you define the extraction rules and obtain the generated XSLT file, the crawler code hardly needs to change; it can also be combined with Scrapy to improve crawling speed, as sketched below.
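To illustrate that point, here is a minimal sketch of how an XSLT rule file in GooSeeker's style could be plugged into a Scrapy spider. It assumes the lxml library; the file name rule.xslt, the start URL, and the <item> element in the transformed output are hypothetical placeholders, not GooSeeker's actual output format.

    import scrapy
    from lxml import etree

    class XsltSpider(scrapy.Spider):
        # Hypothetical spider: field extraction lives in the XSLT
        # rule file, so this code stays the same across simple sites.
        name = "xslt_demo"
        start_urls = ["https://example.com/list"]  # placeholder URL

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # rule.xslt stands in for the rule file exported from GooSeeker
            self.transform = etree.XSLT(etree.parse("rule.xslt"))

        def parse(self, response):
            # Apply the rule to the downloaded HTML; the XSLT, not the
            # spider, decides which fields are extracted.
            result = self.transform(etree.HTML(response.body))
            for item in result.xpath("//item"):
                yield {child.tag: child.text for child in item}

Swapping in a different rule file targets a different site without touching the spider code, which is the versatility described above.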
Introduction:
A web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web chaser) is a program or script that automatically crawls information from the World Wide Web according to certain rules. Other, less frequently used names include ant, automatic indexer, emulator, and worm.
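As a concrete illustration of "automatically crawling according to certain rules", here is a minimal breadth-first crawler sketch in Python. It assumes the requests and BeautifulSoup (bs4) libraries; the seed URL and page limit are placeholders.

    from collections import deque
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed, max_pages=10):
        # Rule: fetch a page, collect its links, then visit each unseen
        # link in turn (breadth-first), up to max_pages pages.
        seen, queue, fetched = {seed}, deque([seed]), 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                resp = requests.get(url, timeout=5)
            except requests.RequestException:
                continue  # skip unreachable pages
            fetched += 1
            print(fetched, url, resp.status_code)
            soup = BeautifulSoup(resp.text, "html.parser")
            for a in soup.find_all("a", href=True):
                link = urljoin(url, a["href"])
                if link.startswith("http") and link not in seen:
                    seen.add(link)
                    queue.append(link)

    crawl("https://example.com")  # placeholder seed URL

Real crawlers add politeness rules on top of this loop, such as respecting robots.txt and rate-limiting requests, which the tools recommended above handle for you.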