I. What Is Statistical Modeling
Statistical modeling is the process of building statistical models and exploring and processing batches of data with statistical analysis methods, using statistical software as the tool. Its purpose is to reveal the factors behind the data, to interpret social and economic phenomena, or to make predictions and judgments about economic and social development. With the rapid spread of computer and network technology, we face an explosion of data and information. How to turn data quickly and effectively into information, knowledge and insight is an important question for statisticians. Statistical modeling combines statistical methods with computer technology, promotes statistical thinking driven by data analysis, uncovers the regularities behind the data, and supplies more and better statistical information for economic and social development.
The contest topics generally come from practical problems in the social, economic and management sciences that have been suitably simplified. Participants are not required to master in-depth domain expertise in advance; they only need to know the basic content of statistics, be fluent in statistical analysis methods, and have some experience of statistical work. The topics leave contestants considerable room to exercise their creativity. According to the requirements of the topic, participants complete a paper (that is, the answer sheet) covering the model assumptions, the construction and solution of the model, the design and computer implementation of the calculation method, the analysis and testing of the results, and the improvement of the model. Awards are based on the reasonableness of the assumptions, the creativity of the modeling, the correctness of the results and the clarity of the writing.
Let's look at what statistical modeling is through the following example.
Case: What conclusions can be drawn from traffic accident data?
Basic data: traffic accident data for all provinces, municipalities and autonomous regions since the reform and opening up. The data should cover the parties involved: motor vehicles (freight vehicles, buses, cars, agricultural vehicles, tractors, motorcycles, engineering vehicles, etc.), non-motor vehicles (bicycles and tricycles), other vehicles (such as electric bicycles and motor tricycles, even though some may be illegal), vehicles for the disabled, animal-powered vehicles, pedestrians, etc. The data should also include the accident level, the number of accidents, the number of deaths, the number of injured people, property losses, etc., together with the occupation, age, driving experience and education level of those involved, whether alcohol was involved (very important!), whether the driver was fatigued, whether a mobile phone was in use, the speed, the road type (street, ordinary highway, graded highway, expressway), the time period of the accident, etc. (these are standard records kept by the traffic control department). The data should cover at least 11 years (preferably with monthly data).
Additional data: the economic data of each province, municipality and autonomous region for the corresponding years, including the mileage of the various types of road and the number of the various types of motor vehicle.
Questions:
1. Find the probability of each type of accident for each type of vehicle, together with the variables that influence that probability and the number of such accidents (such as age, alcohol use, mountainous or downtown areas, time period, road type, vehicle type, etc.).
2. Find which factors (variables) are most likely to cause accidents, which are most likely to cause serious personal injury, and which cause the greatest property losses.
3. Find the characteristics of accidents in each province, municipality and autonomous region, classify the regions by accident pattern, and compare this classification with an economic classification. Explain the relationship between traffic accidents and economic development.
4. Find the accident trends in the different regions and in the country as a whole, and the relationship between these trends and the economy (including road mileage, number of motor vehicles, etc.). Use them to predict future accidents.
5. Rank the provinces, municipalities and autonomous regions according to the various variables related to traffic accidents.
Requirements: everything must be based on the data. For every statistical method adopted, state its conditions and assumptions. Every output must be explained.
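As an illustration of Question 5, the following is a minimal sketch of ranking regions by a rate rather than by raw counts. The region names and all figures are invented for this example, and the choice of "events per 10,000 registered motor vehicles" as the ranking variable is an assumption; the case itself does not prescribe a particular indicator.

```python
# Hypothetical records: (region, accidents, deaths, registered motor vehicles).
# All figures are invented for illustration; they are not real statistics.
records = [
    ("Region A", 1200, 300, 2_000_000),
    ("Region B", 800, 100, 1_000_000),
    ("Region C", 1500, 450, 5_000_000),
]


def rate_per_10k(count, vehicles):
    """Events per 10,000 registered motor vehicles."""
    return count / vehicles * 10_000


def rank_regions(rows, variable):
    """Return (region, rate) pairs sorted from the highest rate to the lowest."""
    column = {"accidents": 1, "deaths": 2}[variable]
    scored = [(row[0], rate_per_10k(row[column], row[3])) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


accident_ranking = rank_regions(records, "accidents")
death_ranking = rank_regions(records, "deaths")

for name, rate in accident_ranking:
    print(f"{name}: {rate:.2f} accidents per 10,000 vehicles")
```

Normalizing by the number of vehicles is one possible design choice: it keeps a region with many vehicles from ranking high only because of its size. In line with the requirements above, a paper would state this assumption explicitly and explain the resulting ranking.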
From the case above it is not hard to conclude that statistical modeling is, in a sense, an essay on an assigned topic, with the following characteristics:
First, statistical modeling starts from the actual situation of economic and social development and identifies the trends and regularities in how things develop. Detached from this, statistical modeling loses its meaning.
Second, statistical modeling starts from data, finds the relationships within the data, and lets the data speak. This is its most distinctive feature.
Third, statistical modeling combines statistical analysis methods with computer technology, including the collection and analysis of data with statistical software.
Fourth, statistical modeling involves collecting, organizing and analyzing data, so it makes comprehensive demands on the modeler's abilities.
II. The Process of Statistical Modeling
(1) Problem clarification: statistical modeling is problem-oriented, so the problem to be solved must be made clear first.
(2) Information collection: once the problem is clear, collect and organize the necessary information from the available databases according to the requirements of the topic.
(3) Model hypothesis: make the necessary and reasonable assumptions about the problem, guided by statistical analysis methods, so as to highlight its main features and set aside its secondary aspects.
(4) Model construction: based on the assumptions made and the relationships between the things involved, establish the relationships between the various quantities, turn the problem into a statistical analysis problem, and take care to use appropriate statistical models and methods wherever possible.
(5) Model solution: use the constructed model to carry out the calculations and obtain information relevant to the problem. If necessary, simplify the problem further or make further assumptions.
(6) Model analysis: analyze the information obtained to form a judgment, paying special attention to whether the results remain stable when the data change.
(7) Result test: examine the practical meaning of the results and compare them with the actual situation to see whether they agree with reality. If the agreement is unsatisfactory, modify or supplement the model, or rebuild it.
(8) Paper writing: write up the work above as a paper, which should include the statement of the problem, the description of the assumptions, the process of model construction, the results of solving the model, the main conclusions and an evaluation of the conclusions.
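Steps (5) to (7) can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the data are invented, the model is a simple straight line fitted by ordinary least squares, stability in step (6) is checked with a bootstrap of the slope, and step (7) uses R-squared as a rough agreement measure. The function names `fit_line`, `r_squared` and `bootstrap_slopes` are hypothetical helpers written for this sketch; a real paper would choose the model and the tests to suit the problem.

```python
import random

# Hypothetical yearly observations (invented for illustration only):
# motor vehicles in millions and accidents in thousands.
vehicles = [1.0, 1.5, 2.1, 2.8, 3.6, 4.5, 5.5, 6.6]
accidents = [2.1, 2.9, 4.2, 5.5, 7.3, 8.8, 11.2, 13.0]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(x, y, a, b):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def bootstrap_slopes(x, y, n_boot=500, seed=0):
    """Refit the line on resampled data to see how much the slope moves."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: the slope is undefined
        ys = [y[i] for i in idx]
        slopes.append(fit_line(xs, ys)[1])
    return slopes


# (5) Model solution: estimate the parameters.
a, b = fit_line(vehicles, accidents)

# (6) Model analysis: is the slope stable when the data change?
slopes = sorted(bootstrap_slopes(vehicles, accidents))
low = slopes[int(0.025 * len(slopes))]
high = slopes[int(0.975 * len(slopes)) - 1]

# (7) Result test: how well does the fitted line agree with the observations?
r2 = r_squared(vehicles, accidents, a, b)

print(f"intercept={a:.3f}, slope={b:.3f}")
print(f"approximate 95% bootstrap range for the slope: [{low:.3f}, {high:.3f}]")
print(f"R-squared={r2:.4f}")
```

A narrow bootstrap range suggests that the estimated relationship does not hinge on a few observations. A wide range, or a poor fit in step (7), would be a signal to revisit the assumptions in step (3) or the model in step (4).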
III. Basic Contents of Statistical Modeling Papers
The submitted papers should include three parts:
(1) Title and abstract part
Title: write an accurate title.
Abstract: 211-311 words, covering the main features of the model, the modeling methods and the main results.
(2) Main part
1. Problem statement and problem analysis.
2. Model establishment:
(1) State the assumptions, define the concepts and introduce the parameters;
(2) Model construction;
(3) Model solution.
3. Calculation method design and computer implementation.
4. Main conclusions or findings.
5. Result analysis and test.
6. Discussion: advantages and disadvantages of the model, and significance of the results.
7. References.
(3) Appendix
Calculation programs and block diagrams.
Detailed solution steps and intermediate calculation results.
Various graphs and tables.
As the saying goes, what is hard has not been learned, and what has been learned is not hard. It is difficult to set an exact standard for every case. What is certain, however, is that with study the work becomes easy and without study it stays difficult. Let this serve as encouragement.