Better Regression Analysis
Few tools are as abused as poor regression analysis. How many times have we seen a shotgun scatter plot and a misguided researcher insisting there is a relationship? Too many to count. Admittedly, there is a certain perfection to regression that appeals to engineers. From chaos comes order; from a few data points, a master equation. A master equation that can prove conclusively that line speed does indeed affect the coating thickness, or that wall thickness affects instrument response. Unfortunately, nature is not so accommodating. In my experience performing regression analysis, several pitfalls come up again and again.
No research question defined
Not defining a research question is a fundamental mistake, and it is most common when working with secondary data. A proper research question is specific: "Does a change in temperature affect the conductivity of the material?" or "Is there a relationship between the number of customer service representatives and customer satisfaction?" Poor research questions are statements like "would be interesting to know if something affects instrument response." The answer is undoubtedly yes: something in the universe affects instrument response. But so what? What is important to the problem at hand?
Not plotting data
Regression has the annoying habit of capitalizing upon random variation. It is possible to get a decent Pearson r without a true relationship, for instance when one or two extreme points dominate the calculation. Plotting the data will show the outliers that are skewing the result.
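To make this concrete, here is a minimal sketch in Python with NumPy. The data are invented for illustration (they are not from any real study), and the helper name pearson_r is an assumption of this example. Ten points with essentially no relationship are compared with the same ten points plus one extreme value.

```python
import numpy as np

# Hypothetical data: ten points with essentially no relationship between x and y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 4.7, 5.1, 4.9, 5.0])

# The same data with one extreme point appended (for example, a mis-keyed reading).
x_out = np.append(x, 30.0)
y_out = np.append(y, 15.0)


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


r_clean = pearson_r(x, y)
r_with_outlier = pearson_r(x_out, y_out)

print(f"r without the extreme point: {r_clean:.2f}")
print(f"r with the extreme point:    {r_with_outlier:.2f}")
```

The single added point is enough to turn a near-zero r into a strong one. A scatter plot of x_out against y_out would show that point sitting far away from a flat cluster.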
Plotting with unequal axes
To suggest relationships that are not there, some people will purposely stretch or compress an axis. For a random scatter, stretching one axis while leaving the other fixed flattens the cloud of points into a band that looks linear. Conversely, compressing an axis for a truly linear relationship tends to make it look more random, and can lead a researcher to conclude there is no relationship even though there may be one.
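One way to guard against a misleading picture is to remember that Pearson r does not depend on the scale of either axis. The sketch below uses simulated random data with an arbitrary seed, and the scale factors are chosen only for illustration. Rescaling the data, which is what stretching or compressing a plot axis amounts to visually, leaves r unchanged.

```python
import numpy as np

# Simulated, unrelated variables (seed chosen arbitrarily for repeatability).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

# r on the original scale.
r_original = np.corrcoef(x, y)[0, 1]

# "Stretch" the x axis by a factor of 100 and "compress" and shift the y axis.
r_stretched = np.corrcoef(100.0 * x, y)[0, 1]
r_compressed = np.corrcoef(x, 0.01 * y + 5.0)[0, 1]

# All three values agree up to floating-point rounding.
print(r_original, r_stretched, r_compressed)
```

If a plot looks linear but r says otherwise, or the reverse, the axis scaling is worth a second look.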
Arbitrary removal of outliers
Purging outliers to get a higher correlation coefficient in support of a preferred theory is a sin most common among PhDs. Outliers should be eliminated only if an investigation concludes that something was definitely incorrect about them. For example, in customer satisfaction research, a survey response from a customer who is also a major stockholder could be considered an outlier, since that respondent is biased. A manufacturing outlier could be a physical measurement taken with an out-of-calibration instrument. Otherwise, outliers should be left in, since they represent the full span of possible variation in the relationship.
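The size of the temptation is easy to demonstrate. The following sketch uses invented data, and the choice of which two points to drop is deliberately arbitrary, with no investigation behind it. It reports r both with and without the "inconvenient" points.

```python
import numpy as np

# Hypothetical measurements; the points at x = 4 and x = 6 do not fit the trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.2, 1.9, 3.1, 8.0, 5.0, 0.5, 7.1, 7.9])

# Mask marking the points an analyst might be tempted to delete.
# No investigation has shown these readings to be wrong; that is the problem.
dropped = np.isin(x, [4.0, 6.0])

r_all = np.corrcoef(x, y)[0, 1]
r_trimmed = np.corrcoef(x[~dropped], y[~dropped])[0, 1]

print(f"r with all {x.size} points:        {r_all:.2f}")
print(f"r after dropping {dropped.sum()} points:  {r_trimmed:.2f}")
```

Deleting two of eight points moves r from moderate to nearly perfect. Unless there is a documented reason to reject those readings, the first number is the honest one to report.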