Most people get confused by the word "model". It's really just a convenient approximation of reality that science gives us: sometimes something very precise, sometimes more of an average, and sometimes just a convenient shape (like a bell curve for many kinds of biological data) that lets us quickly grasp what looking at pages and pages of numbers won't. My main regret is that science and math are communicated badly...
A "model" is an equation, or some other simpler, portable substitute, that stands in for a larger set of cumbersome data points.
The terms "mathematical model", "statistical model", and "data model" are often used interchangeably, but they mean different things.
A mathematical model is typically more precise than a statistical model because it states an exact rule, but it is also usually harder to create and use.
A statistical model is typically less precise than a mathematical model because it describes tendencies and uncertainty rather than exact outcomes, but it is more flexible and can represent more complex systems.
A data model does not predict anything by itself; it only describes how the data is organized, which is why it is usually the easiest of the three to create and use.
To understand the difference between the models, let's use toys as an example:
1. Mathematical Modeling:
- Imagine you have a toy car. You push it, and it moves a certain distance. If you push it harder, it goes further. A mathematical model is like a rule that says, "If you push this hard, the car will go this far."
2. Statistical Modeling:
- Now, imagine you have a bag of different colored marbles. You want to guess which color you'll pick without looking. A statistical model is like a magic trick that helps you guess the color based on how many of each color are in the bag.
3. Data Modeling:
- Think of your toy box. You have different sections for cars, dolls, blocks, and so on. Data modeling is like organizing your toy box so you know where each toy goes and can find them easily.
In short:
- Mathematical: rules for how toys work.
- Statistical: educated guesses about toys.
- Data: organizing a toy box.
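If you like code better than toys, here is a rough sketch of the same three ideas. Everything in it is made up for illustration: the names `car_distance`, `color_probabilities`, `Toy`, and `organize` are hypothetical, and the "half a metre per unit of push" rule is an assumption, not real physics.

```python
from collections import Counter
from dataclasses import dataclass


# Mathematical model: a fixed rule. The proportionality constant is an
# invented number, used only to show the idea of "push this hard, go this far".
def car_distance(push_strength, metres_per_unit_push=0.5):
    return metres_per_unit_push * push_strength


# Statistical model: a probability for each color, estimated from the counts
# of marbles in the bag.
def color_probabilities(marbles):
    counts = Counter(marbles)
    total = len(marbles)
    return {color: n / total for color, n in counts.items()}


# Data model: a description of what is stored and where it belongs.
@dataclass
class Toy:
    name: str
    section: str  # for example "cars", "dolls", "blocks"


def organize(toys):
    """Group toy names by the section of the toy box they belong to."""
    box = {}
    for toy in toys:
        box.setdefault(toy.section, []).append(toy.name)
    return box


# A made-up bag of marbles: 6 red, 3 blue, 1 green.
bag = ["red"] * 6 + ["blue"] * 3 + ["green"] * 1
probs = color_probabilities(bag)
best_guess = max(probs, key=probs.get)  # the most likely color to be drawn

toy_box = organize([Toy("racer", "cars"), Toy("truck", "cars"), Toy("cube", "blocks")])
```

Notice that only the first two make a prediction. The third one just tells you where things live.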
Statistical model example:
Imagine you're tracking the salaries of employees in a company over several years. As time goes on, some employees get raises, some stay the same, and some might even take a pay cut.
1. Collecting Sample Points:
- Instead of analyzing the salary of every employee in the company, you decide to look at a few of them (this is your sample). You record their salaries month by month over several years.
2. Drawing the Regression Line:
- Picture a graph where you plot each month on the x-axis and the corresponding salary on the y-axis. For each employee in your sample, you place a dot based on their salary for that month.
- Now, you draw a straight line that best fits these dots. This line represents the general trend of salaries over time.
3. Using the Equation of the Line:
- The line you drew has a mathematical formula. This formula can predict the average salary of an employee based on how long they've worked, counted in months (or in years, as an aggregate of months).
In the business world, we assume that the sample of employees we've chosen represents the entire company. If our sample is random and large enough, and the company's salary trends are consistent, our predictions will likely be reasonably accurate!
So, instead of discussing each employee's salary individually, you can use this model (equation) to get a general idea. It's a simpler way to understand the overall salary trend in the company!
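The three steps above can be sketched in a few lines of Python. This is a minimal example, assuming ordinary least squares as the way to draw the "best fit" line (the answer does not name a method), and the sample numbers are invented purely for illustration. The names `fit_line` and `predict_salary` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x varies on its own, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 1: a made-up sample of (months worked, monthly salary) points.
sample = [(6, 3100), (12, 3150), (24, 3400), (36, 3500), (48, 3800), (60, 3900)]
months = [m for m, _ in sample]
salaries = [s for _, s in sample]

# Step 2: draw the regression line through the dots.
slope, intercept = fit_line(months, salaries)


# Step 3: use the equation of the line to predict.
def predict_salary(months_worked):
    return intercept + slope * months_worked
```

Calling `predict_salary(30)` gives the model's estimate for someone 30 months in, even though nobody in the sample sits exactly at that point. That is the whole appeal: two numbers (slope and intercept) stand in for the entire table.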
Data mining and data modeling are two closely related terms in the field of data science. However, they have different meanings and serve different purposes.
Data mining is the process of extracting knowledge from large datasets. This can be done by identifying patterns, trends, outliers, or associations. Data mining is often used to make predictions about future events or to improve decision-making.
Data modeling is the process of creating a representation of data. This can be done by creating a logical model, a physical model, or a visual model. Data modeling is used to understand the data, to make it easier to manage, and to create data-driven applications.
In general, data mining identifies patterns in the data, and modeling then captures those patterns in a form that can be reused to predict future events or to improve decision-making.
For example, a hospital might use data mining to identify patients who are at risk for a disease. This information could then be used to create a model that predicts which patients are most likely to develop the disease. The model could then be reused to find the right patients for preventive measures, such as early screening or treatment.
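Here is a toy sketch of that hospital workflow, under heavy assumptions: the records are invented (not real patient data), the two risk factors are picked arbitrarily, and the "model" is nothing more than a lookup of the disease rate seen in each group. The names `mine_disease_rates`, `predicted_risk`, and `needs_screening` are hypothetical.

```python
from collections import defaultdict

# Invented toy records, not real patient data.
records = [
    {"smoker": True,  "over_60": True,  "disease": True},
    {"smoker": True,  "over_60": True,  "disease": True},
    {"smoker": True,  "over_60": False, "disease": True},
    {"smoker": True,  "over_60": False, "disease": False},
    {"smoker": False, "over_60": True,  "disease": True},
    {"smoker": False, "over_60": True,  "disease": False},
    {"smoker": False, "over_60": False, "disease": False},
    {"smoker": False, "over_60": False, "disease": False},
]


def mine_disease_rates(records):
    """Mining step: find how often the disease shows up in each group."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for r in records:
        group = (r["smoker"], r["over_60"])
        totals[group] += 1
        cases[group] += r["disease"]  # True counts as 1, False as 0
    return {group: cases[group] / totals[group] for group in totals}


rates = mine_disease_rates(records)


def predicted_risk(smoker, over_60):
    """Model step: reuse the mined rate as a risk estimate for a new patient.

    Groups never seen in the data default to 0.0, which is a simplification.
    """
    return rates.get((smoker, over_60), 0.0)


def needs_screening(smoker, over_60, threshold=0.5):
    """Flag a patient for early screening when the estimated risk is high."""
    return predicted_risk(smoker, over_60) >= threshold
```

A real system would use far more data and a proper statistical model, but the split is the same: first dig the pattern out of the records, then package it as something you can apply to the next patient.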
Hope it's clear now.
One more thing I wanted to add: even when we are using the laws of physics, chaos theory tells us that slightly different initial states can produce very different model runs. That's why, when you see the forecast path of a hurricane, you see multiple paths; this is called an ensemble. Reality is complex, and even with our best understanding we are only scratching the surface. That's why I'm so vocal about science and math education, and against the dogma-filled religious indoctrination that wastes and warps minds. We need more rational minds that can focus on reality rather than on the bullshit and human hubris all around us. We are apes with just a small window of opportunity: if we can take it, we'll do well; otherwise we'll perish together like a layer of scum on this rock.
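Going back to the hurricane ensemble for a moment: you can see the same effect with a much simpler system. The sketch below uses the logistic map, a standard textbook example of chaos, as a stand-in; it is an assumption chosen for brevity and has nothing to do with how real weather models work. The names `logistic_path` and `spread` are hypothetical.

```python
def logistic_path(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), a simple rule known to behave chaotically at r = 4."""
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        path.append(r * x * (1.0 - x))
    return path


# Five starting points that differ only in the sixth decimal place,
# like five slightly different measurements of "today's weather".
starts = [0.2 + i * 1e-6 for i in range(5)]
ensemble = [logistic_path(x0, steps=50) for x0 in starts]


def spread(step):
    """Gap between the highest and lowest ensemble member at a given step."""
    values = [path[step] for path in ensemble]
    return max(values) - min(values)
```

Compare `spread(0)` with `spread(40)`: the runs start out practically identical and end up all over the place, even though every one of them follows exactly the same rule. That is why forecasters show a fan of paths instead of a single line.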