How many times have you been faced with estimating the average weight, length, width, or specific gravity of molded parts in a supplier's lot? While 100% inspection is often recommended, it is not an option when the test is destructive or when time is short. Likewise, an engineer troubleshooting a process would usually rather have a rough answer quickly than a precise answer in a day or two. Both situations can be addressed with effective sampling.
To determine the most effective sample size, one needs to know how accurate the determination must be and how much variation exists within the population. Accuracy is expressed in the units of measurement. For example, if one is measuring the temperature of a water bath, the accuracy could be to the nearest 5°F, 1°F, 0.1°F, 0.01°F, etc. The higher the required accuracy, the more samples are needed. If the required accuracy is unknown, base it on the specification. If a part has a specification of 5 to 10 units, then the variable should be known to the nearest 1 unit, not the nearest 0.1, since the extra precision presumably adds no value.
Next, one must decide how confident one needs to be in the final determination. This is the level of confidence. The most common values are 90%, 95%, 99%, and 99.9%. For typical troubleshooting, 90% is usually more than sufficient. For a product specification question, 95% is generally considered acceptable, with 99% for critical values. A 99.9% level of confidence should be used only when the estimate must be correct 999 times in 1000; in practice it is rarely needed.
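To make the meaning of a level of confidence concrete, here is a small Python simulation. It is a sketch under stated assumptions: measurements are normally distributed with a known standard deviation, and the mean of 10, standard deviation of 1, and subsample size of 10 are arbitrary illustrative values, as is the hypothetical helper `coverage`. For each simulated sample, an interval of the sample mean ± z·σ/√n is built, where z is the two-sided standard normal value for the chosen confidence; the fraction of intervals that contain the true mean should come out close to the level of confidence.

```python
import random
from statistics import NormalDist, fmean


def coverage(confidence, mu=10.0, sigma=1.0, n=10, trials=10_000, seed=1):
    """Fraction of simulated samples whose interval mean +/- z*sigma/sqrt(n) contains mu.

    Hypothetical helper; assumes normal data with a known standard deviation.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= half_width:  # interval contains the true mean
            hits += 1
    return hits / trials


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence -> {coverage(c):.3f} of intervals contain the true mean")
```

At 90% confidence, roughly one interval in ten misses the true mean; at 99% it is roughly one in a hundred.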
Finally, the standard deviation of the population must be estimated. This can be quite tricky. A larger standard deviation requires more samples, so if there is no prior knowledge, try estimating the standard deviation as 10% of the target value. This is usually more than enough to cover the worst-case variation. If historical data for the process exist, use the standard deviation from that data. If there is time and money, 15 or 30 samples can be measured and their sample standard deviation used to represent the population standard deviation. If the samples are truly random, this is usually the best alternative.
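The choices described above can be sketched as a small Python helper. This is an illustration under stated assumptions, not a prescribed method: the function name `estimate_sigma` and its parameters are hypothetical, and ranking a random pilot sample ahead of historical data and the 10%-of-target rule is one reading of the text. `statistics.stdev` computes the sample standard deviation with an n - 1 denominator.

```python
from statistics import stdev


def estimate_sigma(pilot=None, target=None, history_sigma=None):
    """Return a standard deviation estimate (hypothetical helper).

    pilot:         measurements from a random pilot sample (e.g. 15 or 30 parts)
    target:        nominal/target value, used for the 10%-of-target fallback
    history_sigma: standard deviation from historical process data, if any
    """
    if pilot is not None and len(pilot) >= 2:
        return stdev(pilot)  # sample standard deviation (n - 1 denominator)
    if history_sigma is not None:
        return history_sigma  # use the process history when no pilot sample exists
    if target is not None:
        return 0.10 * abs(target)  # conservative rule of thumb: 10% of target
    raise ValueError("need a pilot sample, a historical sigma, or a target value")
```

For example, `estimate_sigma(target=50.0)` falls back to 5.0, while a pilot sample takes precedence whenever one is supplied. In practice the pilot list would hold the 15 or 30 measurements mentioned above.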
Using these three quantities (accuracy, level of confidence, and standard deviation), the number of samples can be calculated from the rearranged formula for a confidence interval:
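$$n = \left(\frac{z\,s}{d}\right)^2$$

where s is the estimated standard deviation, d is the required accuracy (the largest acceptable difference between the sample mean and the true mean, in the same units as s), and z is the two-sided standard normal value for the chosen level of confidence. Because n must be a whole number, round the result up. For reference, some values of z and their corresponding levels of confidence are: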
| Level of confidence | z |
|---|---|
| 90% | 1.65 |
| 95% | 1.96 |
| 99% | 2.58 |
| 99.9% | 3.29 |
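The formula and table can be combined into a short Python sketch. The helper names `z_value` and `sample_size` are hypothetical. `z_value` derives the two-sided critical value from the standard normal distribution with `statistics.NormalDist().inv_cdf` (Python 3.8 or later) instead of looking it up, which agrees with the tabulated values to within about 0.01.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def sample_size(s, d, confidence=0.95):
    """Samples needed to estimate a mean to within +/- d: n = (z * s / d) ** 2, rounded up."""
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    z = z_value(confidence)
    return math.ceil((z * s / d) ** 2)


for c in (0.90, 0.95, 0.99, 0.999):
    print(f"{c:.1%}: z = {z_value(c):.2f}")
```

For example, with s = 5 (the 10%-of-target estimate for a target of 50) and an accuracy of d = 1, `sample_size(5, 1, 0.95)` returns 97 and `sample_size(5, 1, 0.90)` returns 68. Because the code uses the unrounded z, a hand calculation with a table's rounded value can differ by one: at 90%, z = 1.65 gives (1.65 × 5 / 1)² = 68.06, which rounds up to 69.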