In a well-run manufacturing facility, a very large number of properties and process variables will be measured on a continuous basis. These variables can range from raw material properties to batching data to intermediate product measurements to final inspection data, and a lot more. The databases that can be generated often become enormous and offer only raw data, not yet information. Are there correlations between any of these properties or variables? Do finished product defects correlate to raw material properties, batching information or results from individual workers, such as slip casting employees?
Finding a Connection
The computer, via a good software package, can perform multivariate correlation analyses to give an initial analysis of which, if any, variables show correlation with one another. For example, the correlation analysis in Figure 1 shows whether or not various types of cracks on finished product correlate to one another and to total defects. For the purpose of brevity, we have selected only crack losses from a larger body of defect data.
The graphical output of this particular program (in the format selected) includes a table of multivariate correlations comparing defects by pairs, a scatterplot matrix and pairwise correlation comparisons. The scatterplot matrix consists of pairwise plots of all the data, with a statistically generated ellipse surrounding the points. Pairs with greater correlation will have narrower, more elongated ellipses. Lack of correlation will generate a circle (if the x-y axes are equal in the plot). Such a plot helps us visualize the data better.
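The link between correlation and ellipse shape can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how its program draws the ellipse, so the sketch assumes a bivariate-normal density ellipse plotted on standardized, equal-scale axes. The function name `ellipse_axis_ratio` is hypothetical.

```python
import math


def ellipse_axis_ratio(r):
    """Minor-to-major axis ratio of a density ellipse.

    Assumes both variables are standardized, so the covariance matrix is
    [[1, r], [r, 1]] with eigenvalues 1 + r and 1 - r. The axis lengths
    are proportional to the square roots of those eigenvalues.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return math.sqrt((1 - abs(r)) / (1 + abs(r)))


# A ratio of 1 is a circle; a ratio near 0 is a thin, elongated ellipse.
for r in (0.0, 0.5, 0.9, -0.9):
    print(r, round(ellipse_axis_ratio(r), 3))
```

Under these assumptions the sign of the correlation changes only the direction in which the ellipse tilts, not how narrow it is.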
The pairwise correlation table and graph give more information about the correlation, or lack thereof, between the various pairs through a correlation coefficient. The closer this value is to 1 or -1, the more strongly the variables are correlated; a value near 0 indicates little or no linear relationship. The significance probability gives a fractional value (multiply by 100 to get a percentage) estimating how likely it is that a correlation this strong would appear by chance if the variables were in fact unrelated. Low numbers indicate stronger evidence of correlation. Remember, we are always dealing with statistical probabilities, not proven fact.
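A pairwise table of this kind can be approximated with the Python standard library alone. The sketch below is not the program used for Figure 1. It computes the Pearson correlation coefficient for each pair and estimates the significance probability with a permutation test, which is a swapped-in technique chosen because it needs no statistics package. All function names are hypothetical, and the loss figures are made up purely for illustration.

```python
import itertools
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Estimate a two-sided significance probability by shuffling y.

    Counts how often a random re-pairing of the data gives a correlation
    at least as strong as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_shuffles + 1)


def pairwise_correlations(columns, n_shuffles=5000, seed=0):
    """Return {(name_a, name_b): (r, p)} for every pair of columns."""
    results = {}
    for a, b in itertools.combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        p = permutation_p_value(columns[a], columns[b], n_shuffles, seed)
        results[(a, b)] = (r, p)
    return results


# Made-up weekly loss percentages, for illustration only (not plant data).
losses = {
    "foot": [1.2, 1.8, 2.5, 1.1, 3.0, 2.2, 1.5, 2.8],
    "rim": [0.9, 1.4, 2.1, 1.0, 2.6, 1.9, 1.2, 2.4],
    "pedestal": [2.0, 1.6, 1.1, 2.2, 0.8, 1.3, 1.9, 0.9],
}
for pair, (r, p) in pairwise_correlations(losses).items():
    print(pair, round(r, 3), round(p, 4))
```

Each printed row corresponds to one line of a pairwise table: the pair of defects, the correlation coefficient and the estimated significance probability. With so few data points the estimate is coarse, which is one more reason to treat the result as a prompt for further investigation.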
For this example, there is a strong correlation between foot cracks and rim cracks, and between foot cracks and body cracks. When foot cracks are high, so are rim and body cracks, and vice versa. The analysis indicates that there is a 3+% chance that these types of cracks are not statistically correlated. What does this imply? Are these types of cracks caused by the same phenomena? There is also a good negative correlation between pedestal cracks and body and foot cracks. When pedestal cracks drop, body and foot cracks tend to rise. What does this mean? Correlation analysis cannot answer such questions; all it can tell us is that on a statistical basis, these loss measurements appear to be correlated.
Understanding the Connection
Correlation analysis can help us discover connections that might otherwise, and often do, remain hidden. It can allow us to ask questions that we can then test by additional experiments or analysis. It cannot prove cause and effect. It can hint at it, but additional work must be done to show true cause and effect relationships and to justify appropriate action. In one example, a company was able to demonstrate that financial losses increased when the specific surface area of the casting slip decreased. A batching problem was suspected, but since there were no records whatsoever of the batching of plastic (clay) materials, there was no way to directly show that suspected batching problems had any correlation with plant performance.
Did the correlation analysis absolutely prove the connection? In all fairness, it did not. But there was no logical explanation for a decrease in measured casting slip surface area other than a batching problem, considering that the measured surface areas of all raw materials were factored in. And the statistical correlation was very high. A strong case was made for installing better batching systems for ball and china clays. Only with such improvements would the nature of the causes of such correlation become clearer.
Correlation is a powerful statistical tool, but like all such tools, it must be used wisely and with great care. You can prove almost anything with statistics if they are inappropriately applied or used. And it is easy to assume that cause and effect relationships are proven by correlation analysis when in fact the data are reflecting the results of some other unknown cause providing parallel effects. In that case, the data you have "correlated" show that correlation only because both share the same causal root. Can correlation analysis uncover all correlated phenomena? Certainly not. Can it help shed light on relationships that are not obvious, expected or easily seen? You bet! And that makes it a very useful tool.
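The common-cause trap described above can be demonstrated with a small simulation. The sketch below uses synthetic data only: a hypothetical hidden driver (think of an unrecorded batching variation) feeds two measured losses that have no direct influence on each other. The partial correlation formula used to remove the driver's effect is the standard first-order one; the variable names are assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(42)
n = 2000

# Hypothetical hidden driver that nobody recorded.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Two measured losses: each depends on the hidden driver plus its own
# independent noise. Neither loss influences the other.
loss_a = [h + rng.gauss(0, 0.5) for h in hidden]
loss_b = [h + rng.gauss(0, 0.5) for h in hidden]

print("r(loss_a, loss_b):", round(pearson_r(loss_a, loss_b), 3))
print("r with hidden driver removed:", round(partial_r(loss_a, loss_b, hidden), 3))
```

The two losses correlate strongly on their own, yet the correlation largely disappears once the hidden driver is accounted for. In a real plant the driver is often not recorded at all, which is why correlation alone cannot settle cause and effect.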