You Can Rely on It
Improving equipment reliability is a key component of any lean manufacturing initiative. Unreliable equipment is often compensated for with larger inventories. If a machine can be down for as much as a week, then most planners will add that much coverage to their safety stocks. Poor equipment reliability also reduces available capacity. Most equipment is budgeted with a certain amount of downtime. Oftentimes, this is based on manufacturers' or engineering recommendations. If the equipment is less reliable than expected, then overall capacity is diminished and must be compensated for with options such as more overtime. A lengthening of the overall supply chain is another method that planners use to compensate for poor equipment reliability. By forcing orders to be placed further out, planners can reduce the risk of inadequate customer service brought on by unreliable equipment.

Measure the Reliability
The first step in improving equipment reliability is measuring the overall system availability. System availability is the probability that a piece of equipment will be ready to run when it is scheduled. Measuring system availability is relatively simple. Over a period of a few production runs, the time of every start and stop is collected. This can be done manually with pen and paper or electronically with a programmable controller. About thirty runs over a series of days, shifts, and products will give the best representation of the data without excessive cost.

The difference between each start and its respective stop defines the run length. The average of all of these is the mean time between failures (MTBF). This represents the average time the equipment can run before it is stopped. Some of this will be driven by the physics of the equipment: it runs out of materials every four hours and must be stopped to replenish, or a jam happens every five minutes. Regardless, all stoppages must be documented.

The average difference between the stop time and the next start time is the mean time to repair (MTTR). This represents the length of time it takes to bring equipment back up and running. This also can have a great deal of variability due to the nature of the stoppage. Even a simple equipment jam can take minutes or days to clear depending on how the equipment was designed.

With the MTBF and the MTTR, the system availability is computed as follows:

A = MTBF / (MTTR+MTBF)

Larger values of A indicate more reliable equipment, while smaller values indicate less reliable equipment. For example, a system with an availability of 0.90 has a 90% probability of being up when needed. Another way to interpret this is that the equipment is “up” 90% of the time and unavailable 10% of the time.
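The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name, the representation of each run as a (start, stop) pair in hours from an arbitrary zero, and the sample log are assumptions made for the example, not part of any particular data collection system.

```python
from statistics import mean

def availability(runs):
    """Compute MTBF, MTTR, and availability from a log of production runs.

    runs: chronological list of (start, stop) pairs, in hours measured
    from an arbitrary zero (an assumed format for this sketch).
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to observe a repair interval")
    # Run length: each stop minus its own start.
    run_lengths = [stop - start for start, stop in runs]
    # Repair time: each stop to the next start.
    repair_times = [runs[i + 1][0] - runs[i][1] for i in range(len(runs) - 1)]
    mtbf = mean(run_lengths)
    mttr = mean(repair_times)
    return mtbf, mttr, mtbf / (mttr + mtbf)

# Hypothetical log: three runs of 3, 5, and 4 hours with 1-hour repairs between.
sample_log = [(0.0, 3.0), (4.0, 9.0), (10.0, 14.0)]
mtbf, mttr, a = availability(sample_log)
print(f"MTBF={mtbf:.2f} h, MTTR={mttr:.2f} h, A={a:.2f}")
```

In practice the log would cover the thirty or so runs suggested earlier; three runs are used here only to keep the arithmetic easy to check by hand.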

The system availability may be very different from the measures already reported by the manufacturing operation. Many manufacturers remove lunch breaks, planned downtime, weekends, etc. from their production numbers. This is not an accident. With management pressures to continuously improve, there is a tremendous incentive to maximize yield in a monthly report. System availability as measured here includes everything, which in some ways is more accurate. Just because manufacturing chose not to operate a piece of equipment does not imply that the equipment is not taking up air-conditioned space on a plant floor. There are definite fixed costs associated with system availability that are sometimes forgotten.

Increase the Run Time
Based on the equation for system availability, one method of improving the availability is to increase the mean time between failures, that is, the run length. It will come as no surprise to many, but increasing the preventive maintenance frequency on the system is one method to improve system reliability. This may seem obvious, since almost all equipment has preventive maintenance scheduled and completed. But how many times is the schedule increased in response to failure data? The schedule is almost never increased. Because this is a source of “downtime,” most manufacturing operations try to reduce preventive maintenance. Another reason why preventive maintenance is ineffective is that it is usually scheduled when convenient for the operation, but not necessarily when the equipment requires it. Many times preventive maintenance is forced into a quarterly, semi-annual, or annual schedule although it may be required hourly, daily, or weekly. How does one know this? By looking at the repair logs and spare parts usage for a piece of equipment, one can determine if the preventive maintenance frequency must be increased. If a machine goes down every four months with an O-ring failure, and it is on a semi-annual preventive maintenance schedule, then there is a disconnect between the “voice” of the process and the plant maintenance procedures.
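The O-ring comparison can be made routine by checking the repair log against the maintenance schedule. The following is a minimal sketch; the function name, the choice of days as the unit, and the failure dates are hypothetical and stand in for whatever the plant's repair log actually records.

```python
from statistics import mean

def pm_schedule_gap(failure_days, pm_interval_days):
    """Compare the observed failure interval with the PM interval.

    failure_days: chronological day numbers on which the same failure
    recurred (assumed to come from the repair log).
    Returns (mean days between failures, True if PM is too infrequent).
    """
    if len(failure_days) < 2:
        raise ValueError("need at least two failures to compute an interval")
    intervals = [later - earlier
                 for earlier, later in zip(failure_days, failure_days[1:])]
    mean_interval = mean(intervals)
    # PM that comes around less often than the part fails cannot prevent the failure.
    return mean_interval, pm_interval_days > mean_interval

# Hypothetical log: an O-ring failing roughly every four months,
# on equipment with a semi-annual (about 182-day) PM schedule.
interval, too_infrequent = pm_schedule_gap([0, 118, 243, 360], 182)
print(f"mean failure interval: {interval} days, PM too infrequent: {too_infrequent}")
```

A flag like this only shows that the schedule lags the failure data; how much more frequent the maintenance should be is still a judgment for the people who know the equipment.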

Another method for increasing the mean time between failures is to add redundant systems. For example, a client had a piece of large electronic equipment that required external cooling. Every time the air conditioning failed, the machine overheated, blew a card, and then went down for weeks. If the problem was caught before the failure, a portable cooling system (i.e., a window fan) would be used. Why not install a permanent fan so that when the air conditioning died, the system would not go down? Redundant systems do not necessarily require expensive tooling or machine upgrades. When applied as inexpensive workarounds or assists, they can substantially improve reliability without too much interference. Another example of a redundant system is enabling equipment to be restocked while it is running. Many times, stable runs are interrupted when a material outage occurs. This can be avoided by having two bins, one actively being used by the system and the second in a safe area so it can be restocked.

The third method to improve run times is to optimize the run conditions. Most equipment has a preferred operating range for given material and system parameters. Oftentimes, the highest power or fastest line speeds are chosen as “optimal,” even though the system may have triple the stoppages due to jams or equipment breakdown. Conversely, if there is insufficient demand to fill available capacity, shops with hourly pay scales may reduce equipment speeds to compensate. Setpoints should be chosen to maximize overall yield over the long term. This can be accomplished through techniques known as statistically designed experiments or design of experiments (DOE). Historical data can also be used to determine the optimal run parameters; however, this assumes that at some time in the past the optimal run was determined, which is often not the case.
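To see why the fastest setting is not automatically the best, it helps to weight each candidate speed by the availability measured at that speed. The sketch below is a simple comparison of trial results, not a designed experiment; the field names and all of the trial figures are hypothetical.

```python
def availability(mtbf, mttr):
    """System availability, A = MTBF / (MTTR + MTBF)."""
    return mtbf / (mttr + mtbf)

def net_output(rate, mtbf, mttr):
    """Long-run units per scheduled hour: running rate times availability."""
    return rate * availability(mtbf, mttr)

def best_setpoint(trials):
    """Return the trial with the highest long-run net output."""
    return max(trials, key=lambda t: net_output(t["rate"], t["mtbf"], t["mttr"]))

# Hypothetical trial data: rate in units per running hour, MTBF and MTTR in hours.
trials = [
    {"rate": 100, "mtbf": 4.0, "mttr": 0.25},
    {"rate": 150, "mtbf": 2.0, "mttr": 0.25},
    {"rate": 200, "mtbf": 0.4, "mttr": 0.25},  # fastest, but jams far more often
]

for t in trials:
    print(t["rate"], round(net_output(t["rate"], t["mtbf"], t["mttr"]), 1))
print("best rate:", best_setpoint(trials)["rate"])
```

With these made-up numbers the middle speed delivers the most output over time, because the fastest setting spends too much of its schedule stopped. A proper DOE would vary several parameters together rather than speed alone.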

Reduce the Repair Time
As an alternative to increasing the run time, the repair time can be shortened so that recovery from failures is quicker. The computer industry is an excellent example of this. Because individual components are very complex, each personal computer is composed of removable cards and components. When the system fails, the components can be quickly swapped out until the failed item is discovered and isolated. In most warranty situations, the actual failure mode is not even identified, as the cost of discovering the failure is higher than the repair cost.

One of the easiest ways to reduce repair times is to maintain an appropriate inventory of spare parts. Spare parts management is often neglected in modern manufacturing and left to individual maintenance personnel or a central parts shop. In both cases, personnel have to get approvals to purchase any item and often stock based on price, not need. For example, parts shops often have a plethora of tie wraps, O-rings, wiring, and other low-cost items, but how often do they stock multiple drives, programmable controllers, power supplies, or other long-lead-time or high-cost items? They may carry one at best. Most times, they will not carry any, as they are rewarded for minimizing inventories but not penalized for excessive equipment downtime. The best method to address this problem is for an organization to first recognize that this is a logistics area and must be managed like one. That means incorporating lead times, failure rates, and inventories into the management of spare parts.
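One way to combine lead times and failure rates into a stocking decision is sketched below. It assumes that failures of a part arrive independently at a steady rate (a Poisson model), which is a modeling assumption of this example and not something the repair logs guarantee; the function name, the 95% target, and the sample figures are likewise hypothetical.

```python
import math

def spares_needed(failures_per_year, lead_time_days, service_level=0.95):
    """Smallest stock covering lead-time demand with the target probability.

    Assumes Poisson-distributed failures (an assumption of this sketch).
    """
    if not 0 < service_level < 1:
        raise ValueError("service_level must be between 0 and 1")
    # Expected number of failures while a replacement is on order.
    expected = failures_per_year * lead_time_days / 365.0
    stock = 0
    term = math.exp(-expected)  # probability of zero failures in the lead time
    cumulative = term
    while cumulative < service_level:
        stock += 1
        term *= expected / stock  # probability of exactly `stock` failures
        cumulative += term
    return stock

# Hypothetical part: a drive that fails about 3 times a year, 60-day lead time.
print(spares_needed(failures_per_year=3, lead_time_days=60))
```

Under this model a part with a long lead time can justify holding more than one spare even when it fails only a few times a year, which is the opposite of stocking by price.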

Another method for improving repair times is the implementation of “quick change” parts and procedures. Quick change parts are usually modifications to the system design to make the substitution of components quicker. This can include replacing solid parts with assemblies, installing additional valves to quickly route or drain fluids, or substituting screws with clamps. Procedures usually entail making sequentially executed operations parallel. Instead of just delegating the repair or changeover to a single person, procedures can be developed that enable a team of two or three to execute the repair five or six times faster by doing operations in parallel. For example, if a change from one product to another requires rerouting some piping, hand cleaning a tank, and restocking the machine with new labeling, a single person would do each one at a time, clean up after each step, and take longer than three individual people doing each step simultaneously.

Maintenance personnel and engineers are oftentimes tempted to make ad hoc system upgrades or improvements without any measurable benefit. This “helpful elf” syndrome is often sold under the guise of continuous improvement. In reality, most of these changes make no difference and consume both money and time to implement. How can one identify operations where this is prevalent? This is very easy; maintenance personnel and engineers are often proud of their achievements and never bashful about discussing them. If the business does not show reduced downtime or increased yields, then their changes had no effect. Even worse, without a policy of verifying the effectiveness of equipment changes, changes will sometimes be implemented that increase downtime. However, this is usually quickly spotted and such changes are not in place for long. The larger issue is how changes are being implemented without verification. This problem can also be aggravated by a compensation system that rewards based on tasks and not results. If an engineer knows his or her bonus is tied to the completion of more projects, then there is a large incentive to initiate projects regardless of the need.

Conclusions
In conclusion, equipment reliability can be measured and improved by addressing its average time between failures and its average time to repair. By implementing or improving preventive maintenance schedules, optimizing run conditions, and using redundant systems, the average run length can be extended. Likewise, increasing part inventories, developing quick change parts and procedures, and eliminating needless system upgrades can improve the average repair time, leading to an overall improvement in system availability.