Test page
Contents
- 1 Correlation
- 1.1 Measuring Correlation
- 1.1.1 Pearson correlation coefficient - Pearson's r
- 1.1.2 The coefficient of determination $R^2$
- 1.1.3 Testing the "generalizability" of the correlation
- 1.2 Correlation $\neq$ Causation
- 1.3 Examples
Correlation
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. https://en.wikipedia.org/wiki/Correlation_and_dependence
Where a moderate to strong correlation is found, we can use it to predict the value of one variable when the value of the other is known.
The following are examples of correlations:
- Ice cream sales and temperature
- Blood alcohol level and the odds of being involved in a traffic accident
- Phytoplankton population at a given latitude and surface sea temperature
Measuring Correlation
Pearson correlation coefficient - Pearson's r
The Pearson correlation coefficient (PCC), also referred to as Pearson's r or the Pearson product-moment correlation coefficient (PPMCC), is named after Karl Pearson (1857-1936).
The Pearson correlation coefficient is a measure of the strength of the linear relationship between two variables. It provides an exact way of determining the direction and degree of a linear correlation between two variables.
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
- Where $\bar{x}$ and $\bar{y}$ are the means of the $x$ (independent) and $y$ (dependent) variables, respectively, and $x_i$ and $y_i$ are the individual observations for each variable.
- Values of Pearson's r range between -1 and +1.
- The direction of the correlation:
- Values greater than zero indicate a positive correlation, with 1 being a perfect positive correlation.
- Values less than zero indicate a negative correlation, with -1 being a perfect negative correlation.
- The degree of the correlation:
| Degree of correlation (absolute value of r) | Interpretation |
| --- | --- |
| 0.8 to 1.0 | Very strong |
| 0.6 to 0.8 | Strong |
| 0.4 to 0.6 | Moderate |
| 0.2 to 0.4 | Weak |
| 0 to 0.2 | Very weak or non-existent |
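The formula for Pearson's r can be sketched directly in code. The following is a minimal illustration in plain Python, not a reference implementation: the function name `pearson_r` and the temperature and sales figures are hypothetical and were invented for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed directly from the product-moment definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of the deviations from each mean
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Hypothetical observations, invented for illustration only
temperature_c = [14, 16, 20, 23, 25, 30]
ice_cream_sales = [215, 325, 410, 520, 540, 610]

r = pearson_r(temperature_c, ice_cream_sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 for this made-up data would fall in the "very strong" positive band; the sign gives the direction and the absolute value gives the degree.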
The coefficient of determination $R^2$
[Noel] https://en.wikipedia.org/wiki/Coefficient_of_determination https://en.wikipedia.org/wiki/Total_sum_of_squares https://en.wikipedia.org/wiki/Residual_sum_of_squares
$R^2$ is a measure of how well the regression predictions approximate the actual data values. An $R^2$ of 1 means that the predicted values perfectly fit the actual data.
$R^2$ is termed the coefficient of determination because it measures the proportion of variance in the dependent variable that is determined by its relationship with the independent variables. It is calculated from two values: [Noel]
- The total sum of squares: $SS_{tot} = TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
- This is the sum of the squared differences between the actual $y$ values and their mean.
- It is proportional to the variance of the data.
- The residual sum of squares: $SS_{res} = RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2$
- This is the sum of the squared differences between the actual $y$ values and their respective predicted values ($\hat{y}_i$).
- The coefficient of determination:
- $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
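The two sums of squares and the resulting coefficient of determination can be computed in a few lines. This is a minimal sketch in plain Python; the function name `r_squared` and the actual and predicted values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty, equal-length sequences")
    mean_y = sum(actual) / len(actual)
    # Total sum of squares: actual values against their mean
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    # Residual sum of squares: actual values against the predictions
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical values, invented for illustration only
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

# Here SS_tot = 20 and SS_res = 1, so R^2 is approximately 0.95
print(r_squared(actual, predicted))
```

When the predictions equal the actual values, the residual sum of squares is 0 and the function returns 1, matching the statement that an $R^2$ of 1 is a perfect fit.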
Testing the "generalizability" of the correlation
Having determined the value of the correlation coefficient (r) for a pair of variables, you should next determine the likelihood that the value of r occurred purely by chance. In other words, what is the likelihood that the relationship in your sample reflects a real relationship in the population?
Before carrying out any test, the alpha ($\alpha$) level should be set. This is a measure of how willing we are to be wrong when we say that there is a relationship between two variables. A commonly used $\alpha$ level in research is 0.05.
An $\alpha$ level of 0.05 means that you could possibly be wrong up to 5 times out of 100 when you state that there is a relationship in the population based on a correlation found in the sample.
In order to test whether the correlation in the sample can be generalized to the population, we must first identify the null hypothesis $H_0$ and the alternative hypothesis $H_A$.
This is a test against the population correlation coefficient ($\rho$), so these hypotheses are:
- $H_0 : \rho = 0$ - There is no correlation in the population
- $H_A : \rho \neq 0$ - There is a correlation in the population
Next, we calculate the value of the test statistic using the following equation:
$$t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
So for a correlation coefficient $r$ value of -0.8 (an $r^2$ value of 0.64) and a sample size of 102, this would be:
$$t^* = \frac{-0.8\sqrt{100}}{\sqrt{0.36}} = \frac{-8}{0.6} = -13.33333$$
Checking the t-tables for an $\alpha$ level of 0.05, 100 degrees of freedom ($n - 2$) and a two-tailed test (because we are testing if $\rho$ is less than or greater than 0) we get a critical value of 1.984. As the absolute value of the test statistic (13.33333) is greater than the critical value, we can reject the null hypothesis and conclude that there is likely to be a correlation in the population.
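The test can also be sketched in code. The example below is a minimal illustration in plain Python that takes r = -0.8 and a sample size of 102 as inputs. The function name `correlation_t_statistic` is hypothetical, and the critical value of 1.984 is an assumption of this sketch: it is the tabulated two-tailed value for an alpha level of 0.05 with n - 2 = 100 degrees of freedom, typed in by hand rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """Test statistic t* = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    if n <= 2:
        raise ValueError("sample size must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = -0.8
n = 102
t_star = correlation_t_statistic(r, n)

# Assumed critical value from a t-table: two-tailed, alpha = 0.05, df = 100
critical_value = 1.984

# Two-tailed test: compare the absolute value of t* with the critical value
reject_null = abs(t_star) > critical_value
print(f"t* = {t_star:.3f}, reject H0: {reject_null}")
```

Because the test is two-tailed, the sign of the statistic only reflects the direction of the correlation; the decision depends on its absolute value.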
Correlation $\neq$ Causation
Even if you find the strongest of correlations, you should never interpret it as more than just that... a correlation.
Causation indicates a relationship between two events where one event is affected by the other. In statistics, when the value of one event, or variable, increases or decreases as a result of other events, it is said there is causation.
Let's say you have a job and get paid a certain rate per hour. The more hours you work, the more income you will earn, right? This means there is a relationship between the two events and also that a change in one event (hours worked) causes a change in the other (income). This is causation in action! https://study.com/academy/lesson/causation-in-statistics-definition-examples.html
Given any two correlated events A and B, the following relationships are possible:
- A causes B
- B causes A
- A and B are both the product of a common underlying cause, but do not cause each other
- Any relationship between A and B is simply the result of coincidence.
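The third case in the list above, a common underlying cause, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names, the noise levels and the random seed are all hypothetical. Both `a` and `b` are built from the same hidden variable plus independent noise, so neither causes the other, yet they come out strongly correlated (in theory r is about 0.8 for these settings).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's r from the product-moment definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den


random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Hidden common cause, plus independent noise for each observed variable
common_cause = [random.gauss(0, 1) for _ in range(n)]
noise_a = [random.gauss(0, 0.5) for _ in range(n)]
noise_b = [random.gauss(0, 0.5) for _ in range(n)]

# A and B each depend on the common cause, but not on each other
a = [c + e for c, e in zip(common_cause, noise_a)]
b = [c + e for c, e in zip(common_cause, noise_b)]

r_ab = pearson_r(a, b)
# With the common cause removed, only the independent noise is left
r_noise = pearson_r(noise_a, noise_b)

print(f"correlation of A and B: {r_ab:.2f}")
print(f"correlation once the common cause is removed: {r_noise:.2f}")
```

The second figure should be close to zero, which shows that the correlation between A and B comes entirely from the shared cause.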
A correlation between two variables could possibly indicate the presence of
- a causal relationship between the variables in either direction (x causes y, or y causes x); or
- the influence of one or more confounding variables, that is, other variables that influence both of the correlated variables.
However, it can also indicate the absence of any connection. In other words, it can be entirely spurious, the product of pure chance. In the following slides, we will look at a few examples...
Examples
Causality or coincidence?