===Simple Linear Regression===

https://www.youtube.com/watch?v=nk2CQITm_eo&t=267s

In general, there are 3 main stages in Linear regression:

: '''1'''. Using '''Least-squares''' to fit a line to the data

: '''2'''. Calculating <math>R^2</math>

: '''3'''. Calculating a <math>p-value</math> for <math>R^2</math>

<br />
: '''1. Using Least-squares to fit a line to the data'''
<blockquote>
[[File:Linear_regression1.png|400px|thumb|right|Taken from https://www.youtube.com/watch?v=nk2CQITm_eo&t=267s]]
<!-- [[File:SimpleLinearRegression2.png|600px|center|]] -->

:* First, draw a line through the data.

:* Second, calculate the '''Residual sum of squares (RSS)''': measure the distance from the line to each data point, square each distance, and then add them up.
::: The vertical distance from the line to a data point is called a '''residual'''.

:* Then, we rotate the line a little bit and calculate the RSS again. We do this many times.
:* ...

:* Finally, the line that represents the linear regression is the one corresponding to the rotation with the least RSS. The regression equation:

:: <math> y = a + bx </math>

:: The equation is composed of 2 parameters:

::* Slope: <math> b </math>
::: The slope is the amount of change in units of <math>y</math> for each unit change in <math>x</math>.

::* The <math>y</math>-axis intercept: <math> a </math>
</blockquote>
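The try-many-lines procedure above can be sketched as a brute-force grid search over candidate intercepts and slopes. This is only an illustration of the idea, not the video's code, and the data values below are made up:

```python
# Brute-force sketch of the least-squares idea: try many candidate
# lines y = a + b*x and keep the one with the smallest Residual Sum
# of Squares (RSS). Toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. mouse weight
y = [1.2, 1.9, 3.2, 3.8, 5.1]   # e.g. mouse size

def rss(a, b):
    # residual = vertical distance from the line y = a + b*x to each point
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# search intercepts a and slopes b on a grid of step 0.01 over [-2, 2]
candidates = ((a / 100, b / 100)
              for a in range(-200, 201) for b in range(-200, 201))
a_best, b_best = min(candidates, key=lambda ab: rss(*ab))
print(a_best, b_best)  # the (a, b) pair with the least RSS on the grid
```

In practice the best-fit line is computed directly with the closed-form least-squares formulas (or a library routine such as <code>numpy.polyfit</code>) rather than by literally rotating the line, but the grid search mirrors the intuition described above.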
<br />
: '''2. Calculating <math>R^2</math>'''
<blockquote>

In the following example, they use different terminology from the one we saw in Section [[Data_Science#The_coefficient_of_determination_R.5E2]]

It is very important to note how to interpret the result of <math>R^2</math>. In our example, <span style='color: red'>'''there is a 60% reduction in variance when we take the mouse weight into account'''</span>, or equivalently, <span style='color: red'>'''mouse weight "explains" 60% of the variation in mouse size.'''</span>

[[File:Linear_regression2.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=nk2CQITm_eo&t=267s]]

[[File:Linear_regression3.png|600px|thumb|center|Taken from https://www.youtube.com/watch?v=nk2CQITm_eo&t=267s]]

</blockquote>
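The percentage reading of <math>R^2</math> comes from comparing the variation of <math>y</math> around its mean with the variation around the fitted line. A minimal sketch of that calculation, using made-up toy data (so the resulting <math>R^2</math> differs from the 60% in the example):

```python
# Sketch of the R^2 calculation: R^2 is the fraction of the variation
# in y around its mean that the fitted line removes.
# Toy data, made up for illustration; the line is fit in closed form.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# closed-form least-squares slope and intercept for y = a + b*x
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

ss_mean = sum((yi - y_bar) ** 2 for yi in y)                    # variation around the mean
ss_fit = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation around the line

r_squared = (ss_mean - ss_fit) / ss_mean
print(r_squared)
```

An <math>R^2</math> of 0.6 would mean the fitted line removes 60% of the variation around the mean, which is exactly the "mouse weight explains 60% of the variation in mouse size" reading above.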
<br />
: '''3. Calculating a <math>p-value</math> for <math>R^2</math>'''
<blockquote>
We need a way to determine whether the <math>R^2</math> value is statistically significant. So, we need a <math>p-value</math>.

</blockquote>
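A common way to get that <math>p-value</math> is an F-test: compare the variation the fit explains (per extra parameter) with the variation it leaves over, and look the resulting F statistic up in an F distribution. This is an assumed approach, not taken verbatim from this page, and the data values are made up:

```python
# Sketch of the F statistic behind a p-value for R^2 in simple linear
# regression (assumed standard F-test; toy data, made up here).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

ss_mean = sum((yi - y_bar) ** 2 for yi in y)                    # variation around the mean
ss_fit = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation around the line

p_fit, p_mean = 2, 1   # parameters in the fitted line vs. in the mean
f_stat = ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))
print(f_stat)  # large F => the fit explains far more than it leaves over
```

The <math>p-value</math> is then the tail probability of this statistic under an F distribution with <math>(p_{fit} - p_{mean},\ n - p_{fit})</math> degrees of freedom; with SciPy installed that would be <code>scipy.stats.f.sf(f_stat, 1, n - 2)</code>, and <code>scipy.stats.linregress</code> reports the p-value directly.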

<br />