Running any operation is no simple task.
Malfunctions, breakdowns, and even failure in processes can throw a wrench into the works of your company and cause a wide array of problems, including delays, unfulfilled orders, loss in revenue, and more.
Unfortunately, it’s not always clear where those wrenches originated, making fixing and preventing similar situations much harder.
Root Cause Analysis (RCA) is the process of finding underlying causes for an observable effect, like in the cases described above.
So, in other words, RCA is what allows you to put your Sherlock Holmes cap on and start finding solutions for the issues your company faces.
In this article, we’ll discuss everything you need to know about Root Cause Analysis to help you run a reliable operation, including what it is, how it’s used, its types, and how to perform each one.
Root Cause Analysis explained
As explained above, Root Cause Analysis does exactly what it says on the tin – analyzes the root causes of an observable fact or issue. It seeks to answer 3 questions:
1.) What happened?
2.) Why did it happen?
3.) How can we prevent it from happening again?
Initially, RCA is a reactive process investigating the source of a problem.
However, following successful identification and resolution of the issue, RCA becomes a proactive mechanism, seeking to prevent the same thing from happening again.
It’s worth mentioning that RCA can also be used to identify and replicate desirable outcomes.
This process is often called the “3 R’s of RCA”:
1.) Recognize an issue and investigate its root cause
2.) Rectify the underlying problem
3.) Replicate the process to prevent similar situations or improve processes
Typically, RCA will determine the cause to be either physical, human, or organizational.
What are the limitations of Root Cause Analysis?
You should keep three limitations in mind if you’re considering using Root Cause Analysis for your business.
Firstly, RCA is a complex process that requires a lot of field experience and resources for data collection, review, and resolution.
Secondly, RCA can often establish only a strong correlation between a suspected cause and the problem, not proven causation.
Thirdly, if you only fix the symptom of an issue, there is a good chance the problem will pop up again, and you’ll need to start the entire RCA process from scratch.
Where is Root Cause Analysis used?
Root Cause Analysis can be helpful in a variety of industries. However, it’s primarily used in more technical fields. These include:
- Risk & Safety
- Manufacturing
- Change Management
- Information Technology
- Pharmaceutical Research
- Complex Event Processing
- Industrial Engineering & Robotics
- Industrial Processes & Quality Control
- Disaster Management & Accident Analysis
What are the types of Root Cause Analysis?
Root Cause Analysis can be divided into 5 types based on the processes and problems it focuses on. These are:
- Safety-based RCA focuses on health risks and safety and analyzes workplace accidents.
- Production-based RCA focuses on manufacturing and quality assurance.
- Process-based RCA focuses on business and manufacturing processes.
- Failure-based RCA focuses on engineering and maintenance and analyzes equipment breakdowns.
- Systems-based RCA combines multiple RCA types, typically investigating the relationship between equipment failure and human or process error.
When should you do a Root Cause Analysis?
Typically, you’ll want to carry out a Root Cause Analysis if you detect an issue in an aspect of your business related to the ones described above.
However, RCA works especially well for persistent faults, critical failures, and exploring failure impacts.
How to do Root Cause Analysis
Root Cause Analysis is a complex process. The first step is knowing what it can do and when it should be used.
But now that we’ve gotten that out of the way, it’s time to really get into the meat of things.
In this section, we’ll outline the 4 core steps to carry out a Root Cause Analysis successfully.
1.) Define the problem
The first thing you should do when starting RCA is define the problem itself, its symptoms, scope, and the direction of the analysis you want to take (i.e., specific aspects of the machine or process you want to investigate).
As part of this initial step, we highly recommend you write down a “problem statement” that answers the following questions:
- How would you describe the problem?
- How does it manifest? / What can you see happening?
- What are the symptoms?
2.) Collect data
The second thing you should do is collect all the relevant data you can.
This includes proof of the problem, duration, impact, and anything else you can think of.
As we mentioned earlier, this can take up a significant amount of time, so our tip is to utilize some of the other technologies and processes you have at your disposal.
Predictive and preventive maintenance checklists, in particular, can prove extremely helpful, as they can either identify issues ahead of time or offer insights into causes present in previously solved breakdowns.
3.) Map out the events
Next, you should create a chronological timeline of the events before and after the problem’s manifestation, as this can offer vital information about potential factors and impacts.
Furthermore, it also helps differentiate causal events from non-causal ones.
Ask yourself the following three questions to get started:
- What sequence of events made the problem possible?
- What conditions are or were present?
- What other issues influence the situation?
Once you have a good grasp of the situation, we highly recommend you create a causal graph like the one shown below.
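A causal graph can also be captured in a few lines of code once the events are mapped. The sketch below is only an illustration: the events, the `causes` dictionary, and the `find_root_causes` helper are hypothetical names we made up, and it assumes that any event with no recorded cause of its own counts as a candidate root cause.

```python
# Hypothetical causal graph: each key is an event,
# each value lists the events recorded as its direct causes.
causes = {
    "conveyor stopped": ["motor overheated"],
    "motor overheated": ["cooling fan failed", "high ambient temperature"],
    "cooling fan failed": ["fan bearing not lubricated"],
    "fan bearing not lubricated": [],
    "high ambient temperature": [],
}


def find_root_causes(graph, problem):
    """Walk back from the problem and collect events that have no recorded cause."""
    roots, stack, seen = [], [problem], set()
    while stack:
        event = stack.pop()
        if event in seen:
            continue
        seen.add(event)
        parents = graph.get(event, [])
        if not parents and event != problem:
            roots.append(event)
        stack.extend(parents)
    return sorted(roots)


root_causes = find_root_causes(causes, "conveyor stopped")
print(root_causes)
```

In practice, the candidates this returns are only as good as the timeline behind them, so each one still needs to be checked against the collected data.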
4.) Solve the root issue
Finally, having identified the root cause of the issue, all you need to do is fix it.
However, if the same problem pops up again down the line, it should signal that you made a mistake somewhere in the process, and you will have to do the Root Cause Analysis all over again.
Root Cause Analysis techniques
Having explained how to carry out a Root Cause Analysis, it’s time to look at some of the most effective RCA techniques.
Their usefulness varies across industries and specific use cases, as each has its own “pros and cons”.
The goal of this section is not to show you the best single solution but to help you compile a set of RCA techniques that best match your business.
In this section, we discuss the following 5 Root Cause Analysis techniques:
- The “5 Why’s” Analysis
- Fishbone / Ishikawa Diagram
- Failure Mode and Effect Analysis (FMEA)
- Fault Tree Analysis (FTA)
- Pareto Charts
The “5 Why’s” analysis
Initially created by Sakichi Toyoda for Toyota’s Root Cause Analysis, the entire concept of this technique is based on emulating a child-like mentality and asking “why” until you reach the answer you’re looking for.
Benefits of the “5 Why’s Analysis”:
It is a simple and effective approach to identifying a root cause. Furthermore, it helps reveal chains of events and relationships between different factors and problems.
When to use the “5 Why’s Analysis”:
This technique is best suited for simple to moderately-complex issues. It is most effective in situations where human error is involved.
For example, Nicole Zabel, a software developer at Key2Act, uses a process similar to the 5 Why’s when finding and resolving bugs.
“When working on software bugs, I keep asking why something is happening until the main root of the issue is identified. I try to break things down into smaller increments, start with changes at a simple level, prove them out, and then slowly and methodically increase the complexity,” says Nicole Zabel, Software Developer at Key2Act.
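To make the idea concrete, here is a minimal sketch of a “5 Why’s” chain. The problem, the answers, and the `ask_why` helper are all hypothetical; in a real analysis, each answer comes from your team’s investigation rather than a lookup table.

```python
# Hypothetical answers gathered by asking "why?" about each statement in turn.
why_answers = {
    "Order shipped late": "Packing line was stopped",
    "Packing line was stopped": "Label printer jammed",
    "Label printer jammed": "Wrong label stock was loaded",
    "Wrong label stock was loaded": "Stock bins are not labeled",
}


def ask_why(problem, answers, max_whys=5):
    """Follow the chain of answers until none is left or max_whys is reached."""
    chain = [problem]
    while len(chain) <= max_whys and chain[-1] in answers:
        chain.append(answers[chain[-1]])
    return chain


chain = ask_why("Order shipped late", why_answers)
for depth, statement in enumerate(chain):
    print("  " * depth + statement)

root_cause = chain[-1]  # the last answer is the candidate root cause
```

Note that the chain here ends after four “whys”. Five is a guideline, not a rule: you stop when the answer points to something you can actually fix.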
Fishbone / Ishikawa diagram
Another Japanese invention, this technique gets its name from its creator Kaoru Ishikawa, who created it as a way of assessing process quality in the shipbuilding industry.
Its other name is based on the distinct shape, as you can see below.
The Fishbone / Ishikawa Diagram considers that multiple factors can contribute to the same problem.
This is sometimes also called the 5 “M” Framework, as it focuses on man / mind power, machines, measurement, methods, and material.
Benefits of the Fishbone / Ishikawa diagram:
This technique is excellent for brainstorming, as it helps with visualization. It can also reveal bottlenecks and room for improvement in the process.
When to use the Fishbone / Ishikawa diagram:
It is best suited to solving complex issues and finding alternative viewpoints.
Failure mode and effect analysis (FMEA)
The Failure Mode and Effect Analysis is a proactive process that combines reliability and safety engineering with quality control to predict issues by analyzing past data.
It is a highly complex technique that demands the assembly of a diverse, cross-functional team to assess each aspect, system, and process individually.
In the process, it considers function, necessity, etc.
FMEA assigns each aspect a Risk Priority Number (RPN) that reflects how severe a failure would be, how likely it is to occur, and how hard it is to detect.
The RPN is calculated as Severity x Occurrence x Detection.
Each business sets its own RPN threshold, and if an aspect’s RPN rises above that boundary, the business knows that aspect needs attention.
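The RPN calculation is easy to sketch in code. In the example below, the failure modes, their ratings, and the threshold of 100 are invented for illustration, and we assume the commonly used 1 to 10 rating scale for each factor; your own scales and threshold may differ.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (critical)
    occurrence: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (easy to detect) to 10 (hard to detect)

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


RPN_THRESHOLD = 100  # hypothetical threshold; each business sets its own

failure_modes = [
    FailureMode("Seal leak", severity=7, occurrence=4, detection=5),
    FailureMode("Sensor drift", severity=4, occurrence=3, detection=6),
    FailureMode("Belt wear", severity=5, occurrence=6, detection=2),
]

# Rank from highest to lowest risk and flag anything above the threshold.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)
flagged = [fm.name for fm in ranked if fm.rpn > RPN_THRESHOLD]

for fm in ranked:
    print(f"{fm.name}: RPN {fm.rpn}")
```

With these made-up ratings, only the seal leak (7 x 4 x 5 = 140) crosses the threshold, so that is where the team would focus first.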
Benefits of the failure mode and effect analysis (FMEA)
This technique enables the early identification of issues.
It also helps capture collective or tribal knowledge, reduces process development time and cost, and improves quality, reliability, and safety.
When to use the failure mode and effect analysis (FMEA)
This technique is perfect for designing new processes or updating old ones, planning for quality improvements, or just better understanding process issues.
Fault Tree Analysis (FTA)
This technique uses Boolean logic, combining lower-level events with the operators “and”, “or”, and “not” to show how they lead to a top-level failure.
It was designed to help map out the relationships between faults and the subsystems of a machine.
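A small fault tree can be expressed directly with these operators. The following is only a sketch: the cooling system, its basic events, and the `cooling_lost` function are hypothetical and stand in for whatever machine you are analyzing.

```python
# Hypothetical basic events: True means the event has occurred.
events = {
    "primary_pump_failed": True,
    "backup_pump_failed": True,
    "power_lost": False,
    "manual_override_active": False,
}


def cooling_lost(e):
    """Evaluate the top event 'cooling lost' from the basic events."""
    # AND gate: pumping is lost only if both pumps have failed.
    no_pumping = e["primary_pump_failed"] and e["backup_pump_failed"]
    # OR gate: either lost pumping or lost power stops the cooling.
    # NOT: an active manual override keeps the cooling running.
    return (no_pumping or e["power_lost"]) and not e["manual_override_active"]


top_event = cooling_lost(events)
print("Cooling lost:", top_event)
```

Flipping individual events between True and False shows which combinations trigger the top event, which is exactly the kind of relationship a fault tree is meant to expose.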
Benefits of Fault Tree Analysis (FTA)
One of FTA’s biggest advantages is that it uses deduction to find causes.
It highlights critical elements related to machine failure, helps visualize the relationships, promotes effective communication, and even accounts for human error.
When to use Fault Tree Analysis (FTA)
FTA works best when we know the effect of a particular failure. It is helpful for designing new solutions and finding faults in “fault-tolerant systems”.
Pareto charts
Based on the Pareto principle, also known as the 80-20 rule, this technique shows the frequency of defects and their cumulative effect.
The underlying idea is that roughly 80% of all malfunctions are caused by about 20% of parts. Pareto Charts can also help with identifying a common theme between issues.
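The numbers behind a Pareto Chart are simple to compute. The sketch below uses an invented defect log and an assumed 80% cutoff; it counts each defect type, sorts by frequency, and tracks the cumulative share to find the “vital few” causes.

```python
from collections import Counter

# Hypothetical defect log: one entry per recorded defect.
defect_log = (
    ["scratch"] * 42
    + ["misalignment"] * 31
    + ["dent"] * 12
    + ["discoloration"] * 9
    + ["crack"] * 6
)

counts = Counter(defect_log).most_common()  # sorted from most to least frequent
total = sum(n for _, n in counts)

# Build (defect, count, cumulative percentage) rows, as plotted on a Pareto chart.
pareto = []
cumulative = 0
for defect, n in counts:
    cumulative += n
    pareto.append((defect, n, round(100 * cumulative / total, 1)))

# The "vital few": the most frequent defects that together reach the 80% cutoff.
vital_few = []
for defect, _, cumulative_pct in pareto:
    vital_few.append(defect)
    if cumulative_pct >= 80:
        break

for row in pareto:
    print(row)
```

In this made-up log, three of the five defect types account for 85% of all defects, so those three would be the first candidates for a deeper Root Cause Analysis.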
Benefits of Pareto charts
This technique helps rank defects by how often they occur or how much they cost, and it shows their cumulative effect.
When to use Pareto charts
Pareto Charts are best used in analyzing problems in processes that deal with the frequency of failure, time, and cost. They can also help with narrowing down a list of potential causes.
How to get started with Root Cause Analysis
Unlike traditional paper-based methods, digital solutions can be deployed at a company-wide scale at nearly a moment’s notice, help prevent human error, withstand harsh weather conditions, and are eco-friendly to boot.
Therefore, these technologies can play a vital role in modern Root Cause Analysis.
By using digital forms and questionnaires, you can vastly improve the efficiency and accuracy of your data collection.
They integrate with your Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems to create a holistic view of all your vital data in one place.
Thanks to that, you can make important data-driven decisions and, yes, even carry out Root Cause Analyses more efficiently.
So our last tip for today is this: before you start with Root Cause Analysis, take a moment to think about the tools and systems you’re currently using.
And if you’re somehow missing a CRM, ERP, or a means for digital data collection, you may want to invest there first.