Process Control: An Introduction to Statistics
Central Tendency

First, we need to know where our values are centered so that we can make comparisons. The most commonly used measure of central tendency is the mean, or average, value. If we have n values, the average is defined as shown in Equation 1, where x_ave is the population mean and n is the number of values in the population. Simply stated, we add up all the values in our population and divide by the number of values.
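As a quick sketch, the mean of Equation 1 can be computed in a few lines of Python (the strength values below are made-up illustrations, not data from the article):

```python
# Mean (Equation 1): add up all the values, divide by the count.
# These strength readings (psi) are made-up illustrations.
strengths_psi = [34100, 35900, 35300, 36200, 34000]

n = len(strengths_psi)
x_ave = sum(strengths_psi) / n  # population mean

print(x_ave)  # → 35100.0
```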
Another measure of central tendency is the median value, or the value at the halfway point of our data. If we have 101 data points and we rank them from lowest to highest, the median value is the 51st value: fifty values will be lower, and fifty will be higher. If we have an even number of values, 100 for example, the median is the average of the middle two values (the 50th and 51st). In a true normal distribution, the mean and median will be equal. In an actual set of data, however, the two numbers will be similar but not identical; as the number of values increases, the difference will decrease.
The mode or modal value is the most commonly occurring value in the data group. It is the peak value in the frequency distribution of the data.
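Python's standard `statistics` module handles both the odd- and even-count median cases, as well as the mode. A small sketch with made-up numbers:

```python
import statistics

# Made-up data with an even count: the median is the average
# of the two middle values after sorting.
data = [7, 3, 9, 3, 5, 8]          # sorted: 3, 3, 5, 7, 8, 9
print(statistics.median(data))     # (5 + 7) / 2 → 6.0
print(statistics.mode(data))       # most commonly occurring value → 3

# Odd count: the median is the middle value itself.
print(statistics.median([7, 3, 9, 5, 8]))  # sorted middle value → 7
```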
Variability or Dispersion

Dispersion is the measure of how our data are distributed about their center. We will generally have scatter or variability. If we have a lot of data, and we plot the frequency of occurrence of this data about the mean, we will get a frequency distribution. The most commonly occurring distribution in real life is the normal or Gaussian probability distribution, which is shown in Figure 1. In this distribution, the data are uniformly or symmetrically distributed about the mean, which is also the most common value. Be aware that not all distributions we will encounter in manufacturing are of this type. Mechanical strength data, for example, are often not symmetric but are skewed to one side.
The graph shown in Figure 1 also allows us to introduce the statistical measures of dispersion, the prime one being the standard deviation, a numerical measure of the spread or dispersion in our data. The normal curve, often referred to as the "bell-shaped curve," changes from convex (looking from above) to concave at a point on either side of the mean; the distance from the mean to this inflection point is one standard deviation. These points are indicated by the s and -s symbols in the figure.
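We can check this inflection-point claim numerically. The sketch below approximates the second derivative of a standard normal curve (mean 0, standard deviation 1) just inside and just outside x = 1s and confirms that the curvature changes sign there:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) probability density function."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

inside = second_derivative(normal_pdf, 0.9)   # inside ±1s: curve still bends downward (negative)
outside = second_derivative(normal_pdf, 1.1)  # outside ±1s: curve bends upward (positive)
print(inside < 0, outside > 0)  # → True True: curvature flips at one standard deviation
```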
Two different types of standard deviation are the population standard deviation (s), which divides the summed squared deviations by n, and the sample standard deviation (S), which divides by n - 1 because the mean itself is estimated from the same data. Any scientific calculator or computer spreadsheet will calculate either value very accurately in microseconds.
Another measure of dispersion or variability is the range, defined as the highest value minus the lowest value. This is most useful for small data sets, where the calculated standard deviation is less meaningful from a mathematical standpoint. For large data sets, however, the range can be misleading, since a single unusually low or high value can distort it.
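Both standard deviations and the range are available in a few lines of Python; note that the sample value (n - 1 divisor) always comes out slightly larger than the population value. The readings below are made-up illustrations:

```python
import statistics

# Made-up strength readings (psi).
data = [34100, 35900, 35300, 36200, 34000]

pop_sd = statistics.pstdev(data)    # population standard deviation (s): divides by n
samp_sd = statistics.stdev(data)    # sample standard deviation (S): divides by n - 1
data_range = max(data) - min(data)  # range: highest value minus lowest value

print(pop_sd, samp_sd, data_range)  # samp_sd is slightly larger than pop_sd
```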
A Simple Example

Let’s suppose we want to compare the mechanical strength of two ceramic compositions processed through our production facility. We have produced a large number of samples of each composition and measured the strength of 25 samples randomly selected from each lot. By adding up all the values for composition A and dividing by 25, we find that it has an average strength of 35,300 psi (243 MPa). The average strength of composition B is found to be 40,100 psi (276 MPa). Composition B is, on average, stronger than composition A.
Let’s further suppose that we use a calculator to determine the sample standard deviation for each composition and get a value of 2,118 psi (14.6 MPa) for A and 4,411 psi (30.4 MPa) for B. We can now see that although composition B is stronger, it has quite a bit more variability. We can make some predictions if we assume the data follow the normal distribution shown in Figure 1 (we would want to test this assumption). In Figure 1, we see some fractional numbers just above the horizontal axis. These give the proportion of values we would expect to find in each segment of a normally distributed database. For example, we would expect 34.1% of the data to be found between the average value and the average value plus one standard deviation (+1s). We would expect 68.2% of our data to be within ±1s of the average. In our example, we would expect 68.2% of our strength values for composition A to fall in the range of 35,300 ± 2,118 psi, or between 33,182 and 37,418 psi. Similarly, we would expect 68.2% of the values for B to fall in the range of 40,100 ± 4,411 psi, or between 35,689 and 44,511 psi.
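The ±1s intervals above are simple arithmetic, so they are easy to reproduce (and double-check) in code. A sketch using the article's numbers:

```python
# Reproduce the ±1s intervals from the example (values in psi).
mean_a, sd_a = 35300, 2118   # composition A
mean_b, sd_b = 40100, 4411   # composition B

low_a, high_a = mean_a - sd_a, mean_a + sd_a
low_b, high_b = mean_b - sd_b, mean_b + sd_b

print(low_a, high_a)  # → 33182 37418: expect ~68.2% of A's strengths here
print(low_b, high_b)  # → 35689 44511: expect ~68.2% of B's strengths here
```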
We can take this one step further and look at the proportion of data we would expect to find within ±3s of the average. Adding up the fractional values for the 1s, 2s, and 3s segments on both sides of the average, we predict that 99.73% of our data should fall within ±3s of the average value (the numbers in Figure 1 lack enough decimal places to give this figure exactly). For composition A, virtually all of our values should be between 35,300 ± 6,354 psi (3 x 2,118), or between 28,946 and 41,654 psi. For B, our values can be expected to lie within the range of 40,100 ± 13,233 psi, or between 26,867 and 53,333 psi. On the low-strength end, we can actually expect a few of the composition B samples to have lower strengths than the A group, and we could predict how many. If we had a critical application where a few lower strength pieces might be catastrophic, we might actually pick composition A over B even though it has the lower average strength. Remember, we tested only 25 samples (it is a destructive test), but if we had produced 1,000 pieces of each, we would predict only about three values in each group to fall outside the 3s ranges given above.
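For a normal distribution, the exact fraction within ±k standard deviations is given by the error function, P = erf(k/√2), which is in Python's standard math library. A sketch that computes the ±3s coverage and the expected number of out-of-range pieces per 1,000:

```python
import math

def coverage(k):
    """Fraction of a normal distribution within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))

within_3s = coverage(3)
print(round(within_3s * 100, 2))         # → 99.73 (% of data within ±3s)
print(round(1000 * (1 - within_3s), 1))  # → 2.7 pieces per 1,000 expected outside ±3s
```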
These predictions are based on the assumption that we have normal distributions, which can be difficult to determine using only 25 pieces in each group. However, these analyses can still provide us with quite a bit of information about our compositions.
Please be forewarned that statistical analysis in its fullest is complicated, abstract, and fraught with many avenues of misapplication and misanalysis. Our series of articles will not make you enough of an expert to apply statistical tools to complex ceramic manufacturing processes; however, it should give you a basis for understanding this useful tool.