Because the volume of data is too large for a single machine to process, we use distributed storage and distributed computing.
To get better performance on top of this, you need to master the principles of distributed computing. For example, with MapReduce you need to know how data flows through the map, shuffle, and reduce stages.
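The data flow through MapReduce can be illustrated with a minimal single-process sketch, assuming word count as the example job. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and do not belong to any framework's API. Each input split goes to one map task, the shuffle routes every key to a reducer by hashing it, and each reducer combines all values for its keys.

```python
from collections import defaultdict

def map_phase(split):
    """Map (hypothetical helper): turn one input split (a list of lines) into (word, 1) pairs."""
    for line in split:
        for word in line.split():
            yield (word, 1)

def shuffle(mapped_outputs, num_reducers):
    """Shuffle (hypothetical helper): route each pair to a reducer by hashing its key,
    then group the values by key within that reducer's partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partition):
    """Reduce (hypothetical helper): combine all values for each key, here by summing."""
    return {key: sum(values) for key, values in partition.items()}

def run_job(splits, num_reducers=2):
    """Run the whole job in one process; a real cluster runs each task on a different machine."""
    mapped = [list(map_phase(s)) for s in splits]  # one map task per split
    partitions = shuffle(mapped, num_reducers)
    result = {}
    for p in partitions:
        result.update(reduce_phase(p))  # each key lands in exactly one partition
    return result

splits = [["a b a"], ["b c"], ["a c c"]]
print(sorted(run_job(splits).items()))  # [('a', 3), ('b', 2), ('c', 3)]
```

Because the shuffle sends every occurrence of a key to the same reducer, each reducer can produce final results without consulting the others.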
Distributed analytics
Distributed analytics tools are basically built on this paradigm. Although using them feels the same as working on a single machine, to write efficient algorithms you must understand the principles of distributed computing.
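One such principle is reducing the amount of data moved during the shuffle. The sketch below assumes the same word-count setup and pre-aggregates counts on each mapper before the shuffle (what Hadoop calls a combiner), so fewer records cross the network. The functions are hypothetical illustrations, not a framework API.

```python
from collections import Counter

def map_without_combiner(split):
    """Emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for line in split for word in line.split()]

def map_with_combiner(split):
    """Sum counts locally on the mapper, emitting one pair per distinct word."""
    local = Counter(word for line in split for word in line.split())
    return list(local.items())

def shuffled_records(splits, mapper):
    """Count the (key, value) records that would be sent across the network."""
    return sum(len(mapper(s)) for s in splits)

def final_counts(splits, mapper):
    """Reduce side: sum the counts for each word across all mappers."""
    total = Counter()
    for s in splits:
        for word, count in mapper(s):
            total[word] += count
    return dict(total)

splits = [["a b a a"], ["b b c"], ["a c c c"]]
print(shuffled_records(splits, map_without_combiner))  # 11
print(shuffled_records(splits, map_with_combiner))     # 6
```

Both mappers yield the same final counts; only the volume of shuffled data changes. This works because summation is associative and commutative; an operation such as computing an exact median cannot be pre-aggregated this way.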