Introduction to the big data training course: what do you learn in a big data course?
The following is an easy-to-understand introduction to each stage of a zero-based (beginner) big data engineer course, so that everyone can better understand the big data curriculum. The framework described here is that of a zero-based big data engineer course.

Stage 1: Static web page foundations (HTML+CSS)

1. Difficulty: one star

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: common HTML tags, common CSS layouts, styles, positioning, static page design and production methods, etc.

4. The description is as follows:

Technically, the code used at this stage is simple, easy to learn, and easy to understand. Looking at the later courses, because we focus on big data, we need to exercise programming skills and thinking early on. According to our project managers, who have many years of development and teaching experience, J2EE is currently the best technology on the market for meeting these two goals, and J2EE is inseparable from page technology. So in the first stage, our focus is page technology, using the market mainstream: HTML+CSS.

Stage 2: Java SE + Java Web

1. Difficulty: two stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Java basic syntax; Java object orientation (classes, objects, encapsulation, inheritance, polymorphism, abstract classes, interfaces, common classes, inner classes, common modifiers, etc.); exceptions; collections; files; IO; MySQL (basic SQL statement operations, multi-table queries, subqueries, stored procedures, transactions, distributed transactions); JDBC; and so on.

4. The description is as follows:

This stage is known as the Java foundation: technical points progress from simple to deep, with module analysis of real business projects and the design and implementation of various storage methods. This is the most important of the first four stages, because all the later stages build on it, and it is also the stage most closely tied to learning big data. Here, students work in a team for the first time to develop a real project with both a front end and a back end (a comprehensive application of stage 1 and stage 2 technologies).
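To make the stage-2 topics concrete, here is a minimal, self-contained Java sketch (all class names are invented for illustration) showing classes, encapsulation, inheritance, polymorphism, and collections working together:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of stage-2 topics: classes, encapsulation,
// inheritance, polymorphism, and collections. Names are invented.
public class OopDemo {
    // Abstract base class: an encapsulated field plus an abstract method
    static abstract class Shape {
        private final String name;                 // encapsulation
        Shape(String name) { this.name = name; }
        String getName() { return name; }
        abstract double area();                    // subclasses must implement
    }

    static class Circle extends Shape {            // inheritance
        private final double r;
        Circle(double r) { super("circle"); this.r = r; }
        @Override double area() { return Math.PI * r * r; }
    }

    static class Square extends Shape {
        private final double side;
        Square(double side) { super("square"); this.side = side; }
        @Override double area() { return side * side; }
    }

    // Polymorphism: the total is computed through the base-class reference
    static double totalArea(List<Shape> shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();    // dynamic dispatch
        return sum;
    }

    public static void main(String[] args) {
        List<Shape> shapes = new ArrayList<>();
        shapes.add(new Circle(1.0));
        shapes.add(new Square(2.0));
        System.out.println(totalArea(shapes));     // pi + 4, about 7.14
    }
}
```

Calling `totalArea` through the `Shape` reference is exactly the polymorphism exercised throughout this stage: the correct `area()` runs without the caller knowing the concrete class.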

Stage 3: Front-end frameworks

1. Difficulty: two stars.

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability): 64 class hours.

3. The main technologies include: Java and jQuery used together; annotations and reflection; XML and XML parsing with dom4j and JAXB; new features of JDK 8.0; SVN; Maven; and EasyUI.

4. The description is as follows:

Building on the first two stages, turning static pages into dynamic ones enriches the content of our web pages. Of course, from a market standpoint there are professional front-end designers; our goal at this stage is that front-end technology more intuitively exercises a person's thinking and design ability. At the same time, we integrate the advanced features of the second stage into this stage, taking learners one step further.
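As a taste of the XML-parsing topic in this stage, here is a minimal sketch using only the JDK's built-in DOM parser (dom4j, which the course teaches, wraps the same ideas in a more convenient API; the sample XML is invented):

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Minimal DOM parsing sketch using only the JDK.
public class XmlDemo {
    // Returns the text of the first element with the given tag, or null.
    static String firstTagText(String xml, String tag) {
        try {
            DocumentBuilder b =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = b.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        } catch (Exception e) {                     // parse/config errors
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<course><stage>front-end</stage><hours>64</hours></course>";
        System.out.println(firstTagText(xml, "hours")); // prints 64
    }
}
```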

Stage 4: Enterprise-level development frameworks

1. Difficulty: three stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Hibernate; Spring; Spring MVC; log4j/slf4j integration; myBatis; Struts2; Shiro; Redis; the Activiti process engine; crawler technology (Nutch, Lucene); web services with CXF; Tomcat clustering and hot standby; and MySQL read-write separation.

4. The description is as follows:

If you compare the whole Java course to a pastry shop, then with the first three stages you can make a Wu Dalang sesame cake (purely by hand, which is too troublesome), while learning the frameworks lets you open a Starbucks (high-tech equipment, saving time and effort). As far as the requirements of a J2EE development engineer position are concerned, the technologies in this stage must be mastered, and our course goes beyond the market (the market has three mainstream frameworks; we teach seven framework technologies), driven by real commercial projects. Requirements documents, general design, detailed design, source code, testing, deployment, installation manuals, etc. are all covered.
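The "saving time and effort" that frameworks provide can be glimpsed in miniature. The sketch below hand-rolls the dependency-injection idea that Spring automates; every class name is invented, and in a real project the Spring container would do this wiring from configuration:

```java
// Hand-rolled sketch of the inversion-of-control idea that Spring
// automates: the service receives its dependency from outside instead
// of constructing it itself. All names are invented.
public class IocDemo {
    interface UserDao { String findName(int id); }  // data-access layer

    // An in-memory stand-in for a real MySQL-backed DAO
    static class InMemoryUserDao implements UserDao {
        public String findName(int id) { return "user-" + id; }
    }

    static class UserService {
        private final UserDao dao;                   // dependency
        UserService(UserDao dao) { this.dao = dao; } // injected, not new'ed here
        String greet(int id) { return "hello, " + dao.findName(id); }
    }

    public static void main(String[] args) {
        // A framework container performs this wiring automatically.
        UserService service = new UserService(new InMemoryUserDao());
        System.out.println(service.greet(7));        // hello, user-7
    }
}
```

Because `UserService` depends only on the `UserDao` interface, the MySQL implementation can be swapped for a test double without touching the service; that is the decoupling the frameworks in this stage industrialize.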

Stage 5: Getting to know big data

1. Difficulty: three stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: big data fundamentals (what big data is, application scenarios, how to learn big data, virtual machine concepts and installation, etc.); common Linux commands (file management, system management, disk management); Linux shell programming (shell variables, loop control, applications); introduction to Hadoop (Hadoop components, standalone environment, directory structure, HDFS interface, MR interface, simple shell, accessing Hadoop from Java); HDFS (introduction, shell, use of the IDEA development tool, building a fully distributed cluster); MapReduce applications (intermediate computation process, driving MapReduce from Java, program running, log monitoring); advanced Hadoop applications (introduction to the YARN framework, configuration items and optimization, introduction to CDH, environment setup); and extensions (map-side optimization, COMBINER usage, TOP K, SQOOP export, virtual machine snapshots, permission management commands).

4. The description is as follows:

This stage is designed to give newcomers a relative concept of big data. Relative to what? After studying Java in the preceding courses, you understand how a program runs on a single machine. What about big data? Big data means running programs on large clusters of machines for processing. And since big data is about processing data, data storage likewise changes from single-machine storage to large-scale multi-machine cluster storage.

(You ask me what a cluster is? OK: I have a big pot of rice. I could finish it by myself, but it would take a long time. Now I ask everyone to eat it together. One person alone is an individual; what about many people? That's a crowd, right?)

Big data can be roughly divided into big data storage and big data processing. So at this stage, our course covers the de facto standard of big data: HADOOP. Big data does not run on the WINDOWS 7 or W10 we use every day, but on the system most widely used for it: LINUX.
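The cluster idea can be previewed on a single machine. The sketch below imitates the two phases of the classic Hadoop word count in plain Java; real Hadoop runs the same map and reduce logic spread across many machines:

```java
import java.util.HashMap;
import java.util.Map;

// Single-machine sketch of the MapReduce idea behind Hadoop word count:
// a "map" phase emits (word, 1) pairs and a "reduce" phase sums them
// per key. Hadoop distributes both phases across a cluster.
public class WordCountSketch {
    static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) { // map: emit pairs
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);               // reduce: sum per key
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = wordCount("big data big cluster");
        System.out.println(c.get("big"));  // 2
    }
}
```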

Stage 6: Big data databases

1. Difficulty: four stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: introduction to Hive (overview, Hive usage scenarios, environment setup, architecture and working mechanism); Hive shell programming (table creation, query statements, partitioning and bucketing, index management and views); advanced Hive applications (DISTINCT implementation, GROUP BY, JOIN, SQL transformation principles, Java programming, configuration and optimization); introduction to HBase; HBase shell programming (DDL, DML, and table creation, queries, compression, and filters from Java); HBase modules in detail (introduction to REGION, HREGION SERVER, HMASTER, and ZOOKEEPER; ZOOKEEPER configuration; integrating HBASE with ZOOKEEPER); and advanced HBase features (read and write flows, data model, schema design for read/write hotspots, optimization and configuration).

4. The description is as follows:

This stage is designed to let everyone understand how big data handles large-scale data, simplifying programming and improving query speed.

How to simplify? In the previous stage, if complex business associations and data mining are needed, writing MR programs by yourself is very complicated. So at this stage we introduce HIVE, the data warehouse of big data. Here is a keyword: data warehouse. I know you're going to ask, so I'll say it first: a data warehouse is typically a huge data center used for data-mining analysis. It stores data in large databases such as ORACLE or DB2, and those databases are usually also used for real-time online business.

In a word, analyzing data through a data warehouse is relatively slow. But the convenience is that as long as you are familiar with SQL, it is relatively simple to learn, and HIVE is exactly such a tool: a SQL query tool built on big data. This stage also covers HBASE, the database of big data. You may wonder: didn't you just learn a data "warehouse" called HIVE? HIVE is based on MR, so queries are quite slow; HBASE can query data in real time on big data. One is mainly for analysis, the other mainly for queries.
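To see what HIVE saves you, compare a hand-written aggregation with the HiveQL one-liner it corresponds to (the table and column names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// What a HiveQL one-liner saves you: the aggregation below corresponds
// roughly to
//   SELECT category, SUM(amount) FROM orders GROUP BY category;
// (table and column names are invented). Hive generates the equivalent
// MR job from the SQL so you never write this loop by hand.
public class GroupBySketch {
    static Map<String, Double> sumByCategory(String[][] orders) {
        Map<String, Double> totals = new HashMap<>();
        for (String[] row : orders) {              // row = {category, amount}
            totals.merge(row[0], Double.parseDouble(row[1]), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] orders = {{"food", "3.5"}, {"book", "10"}, {"food", "1.5"}};
        System.out.println(sumByCategory(orders).get("food")); // 5.0
    }
}
```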

Stage 7: Real-time data collection

1. Difficulty: four stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Flume log collection; introduction to KAFKA (message queues, application scenarios, cluster setup); KAFKA in detail (partitions, topics, producers, consumers, integration with ZOOKEEPER, shell development, shell debugging); advanced use of KAFKA (Java development, main configuration, project optimization); data visualization (introduction to graphs and charts, classification of charting tools, bar and pie charts, 3D graphs and maps); introduction to STORM (design ideas, application scenarios, processing flow, cluster installation); STORM development (STORM Maven development, writing local STORM programs); advanced STORM (Java development, main configuration, project optimization); timeliness of KAFKA asynchronous and batch sending; globally ordered KAFKA messages; and STORM multi-concurrency optimization.

4. The description is as follows:

The data sources of the previous stages were existing large-scale data sets, and the results of processing and analysis carry a certain delay; usually the data processed is the previous day's data.

Example scenarios: website hotlink protection, abnormal customer accounts, real-time credit checks. If these scenarios were analyzed on the previous day's data, wouldn't it be too late? So in this stage we introduce real-time data collection and analysis, mainly including: FLUME real-time data collection, which supports a wide range of sources; KAFKA data receiving and sending; and STORM real-time data processing, with second-level latency.
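The queueing pattern at the heart of KAFKA can be sketched on one machine with a JDK `BlockingQueue`: a producer thread pushes events while the consumer takes them independently, decoupling the two sides just as a Kafka topic does (event names are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Single-machine sketch of the message-queue pattern KAFKA implements
// at cluster scale: producer and consumer run independently, connected
// only by the queue.
public class QueueSketch {
    // A producer thread pushes n events; the caller consumes them.
    static List<String> produceAndConsume(int n) {
        BlockingQueue<String> topic = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try { topic.put("event-" + i); }       // like a Kafka send
                catch (InterruptedException e) { return; }
            }
        });
        producer.start();
        List<String> consumed = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) consumed.add(topic.take()); // like a poll
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(3)); // [event-0, event-1, event-2]
    }
}
```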

Stage 8: SPARK data analysis

1. Difficulty: five stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: introduction to SCALA (data types, operators, control statements, basic functions); intermediate SCALA (data structures, classes, objects, traits, pattern matching, regular expressions); advanced SCALA (higher-order functions, curried functions, partial functions, tail recursion, built-in higher-order functions, etc.); introduction to SPARK (environment setup, infrastructure, operating modes); SPARK SQL; advanced SPARK (DATA FRAME, DATASET, SPARK STREAMING principles, SPARK STREAMING supported sources, KAFKA and SOCKET integration, programming model); SPARK advanced programming (Spark-GraphX, Spark-MLlib machine learning); advanced SPARK applications (system architecture, main configuration and performance optimization, fault and stage recovery); the SPARK ML KMEANS algorithm; and advanced features such as SCALA implicit conversions.

4. The description is as follows:

About the earlier stages, mainly the HADOOP stage: HADOOP is relatively slow when analyzing large-scale data sets with MR, including for machine learning and artificial intelligence, and it is not suitable for iterative computation. SPARK is the replacement for MR in analysis. How does it replace it? Consider their operating mechanisms: HADOOP analysis is based on disk storage, while SPARK analysis is based on memory. To put it more vividly, it is like traveling from Beijing to Shanghai by train: MR is the old green-skinned train, while SPARK is the high-speed rail or maglev. SPARK is developed in the SCALA language and of course supports SCALA best, so the course teaches the SCALA language first.
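The memory-versus-disk point can be illustrated in miniature. The sketch below loads a data set once, the way Spark's cached RDDs keep data in memory, and then runs an iterative update over it; an MR job would re-read the data from disk on every iteration. The update rule and numbers are invented for illustration:

```java
import java.util.List;

// Why SPARK's in-memory model helps iterative jobs: the data set is
// loaded once (like caching an RDD) and every later pass reads it from
// memory, whereas an MR job re-reads it from disk each iteration.
// The update rule here is an invented toy example.
public class InMemorySketch {
    // One iteration: move the centroid halfway toward the data mean.
    static double step(List<Double> cached, double centroid) {
        double sum = 0;
        for (double p : cached) sum += p;          // pass over in-memory data
        double mean = sum / cached.size();
        return (centroid + mean) / 2;
    }

    static double iterate(List<Double> cached, double centroid, int rounds) {
        for (int i = 0; i < rounds; i++) centroid = step(cached, centroid);
        return centroid;
    }

    public static void main(String[] args) {
        List<Double> cached = List.of(1.0, 2.0, 3.0); // "cached" data, mean = 2
        System.out.println(iterate(cached, 10.0, 20)); // converges toward 2.0
    }
}
```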

The design of Keda's big data course basically covers all the technologies required by positions in the market. Moreover, it does not simply cover job requirements: the course itself is a complete big data project flow from beginning to end.

For example, from the storage and analysis of historical data (HADOOP, HIVE, HBASE) to real-time data storage (FLUME, KAFKA) and analysis (STORM, SPARK): these are interdependent in real projects.