A theory or hypothesis often predicts a relationship between two variables. How do scientists evaluate whether data support or contradict that relationship? One answer is a statistical technique called regression. The most common form is linear regression, in which the "best straight line" is calculated for a set of x-y data and used to describe the relationship between them. For example, the ideal gas law predicts that the pressure of a gas held at constant volume increases linearly with its temperature. The data below show the results of an experiment that measured pressure over a range of temperatures.
Finding the best fit
The blue line represents the best straight line that fits the data. In a regression analysis, "best" means the line that gives the smallest sum of the squared vertical distances between the line and the data points (the "least squares" criterion). It is easy to "eyeball" a straight line through the data with a ruler, but most data software has a function that calculates the best straight line. The diagram shows the calculation on a spreadsheet. In the formulas, a line on top of x or y denotes its average, and the Greek letter sigma, ∑, means "the sum of," so ∑x² means "sum all the values of x²."
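The calculation can also be sketched in a few lines of Python. This is a minimal sketch, assuming the standard least-squares formulas for the slope and intercept written in terms of ∑x, ∑y, ∑xy, and ∑x². The function name fit_line and the sample values are illustrative only and are not taken from the experiment.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the x-y data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x-y pairs of equal length")

    # The same column sums a spreadsheet would tabulate.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    slope = (n * sum_xy - sum_x * sum_y) / denominator
    # The best-fit line always passes through the point (average x, average y).
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Illustrative values only (not measured data): points lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

For points that lie exactly on a line, the function recovers that line's slope and intercept. For scattered data it returns the line that minimizes the sum of squared vertical distances.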
The correlation coefficient, R
The "goodness" of the fit is described by the correlation coefficient, R, which ranges from -1 to +1. A value of R = +1 indicates a perfect straight line with a positive slope (a perfect direct x-y relationship). A value of R = -1 describes a perfect straight line with a negative slope (a perfect inverse relationship). As R gets closer to zero, the linear relationship weakens. When R is equal to 0, the data show no linear relationship. The diagram below shows the patterns associated with different values for the correlation coefficient.
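The behavior of R at its extremes can be checked numerically. The following is a minimal sketch, assuming R is the usual Pearson correlation coefficient. The function name correlation_coefficient and the sample values are illustrative only.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the Pearson correlation coefficient R for paired x-y data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x-y pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products of deviations from the averages.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("R is undefined when x or y does not vary")

    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative values only (not measured data).
x = [1, 2, 3, 4]
print(correlation_coefficient(x, [3, 5, 7, 9]))    # perfect line, positive slope
print(correlation_coefficient(x, [9, 7, 5, 3]))    # perfect line, negative slope
print(correlation_coefficient(x, [1, -1, -1, 1]))  # symmetric U-shaped pattern
```

The third data set has a clear pattern but no linear trend. A value of R near zero therefore rules out a straight-line relationship, not every kind of relationship.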