Managing Alarms for Best Results

Alarm Decision-Making Logic

Predictive maintenance systems are built on condition-based monitoring systems that generate alarms to signal deviations from a baseline of normal, healthy operation. These anomalies are the predictive "failure" or asset health component of the system. The challenge is that not all anomalies are failure related, so the key is determining whether an alarm is failure related. This requires additional inspections and diagnostics to determine whether a failure is underway and when to release the asset for repair. This process was summarized in the PRIOR POST: The Monitron Pilot: Managing Alarms for Best Results - Part 1 and is shown in detail below in the form of a complex logic diagram. This logic is applicable to all sensor-based predictive maintenance systems.

WHY IT MATTERS: All Predictive Maintenance (PdM) systems face this challenge, regardless of the underlying AI-based algorithmic technique or Python library used. Managing this variability is the key to success, and it involves combining AI-generated alarms with human technicians who make the final decisions using additional data. It is this human interpretation that makes alarm management as much art as science.

REVIEW of PART 1, which sets the context for the detailed logic shown below in PART 2

Alarm problem categories - the job of alarm management is further complicated by the following:

Not all alarms are failure related, even when they are triggered by a valid anomaly.
Valid alarms can easily be generated 30 days in advance of the actual repair requirement.
In the early weeks of a valid alarm, the cause can be undetectable, which makes it difficult to determine whether the alarm is valid.
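To put the 30-day figure and the "early weeks" problem into numbers for your own alarm history, the lead time of an alarm is simply the gap between when it was raised and when a repair was actually needed. The sketch below is a minimal illustration, assuming you record both dates yourself (in a CMMS or spreadsheet); the function names and the 14-day early window are hypothetical choices, not Monitron settings.

```python
from datetime import date

def alarm_lead_time_days(alarm_raised: date, repair_needed: date) -> int:
    """Days from the first alarm to the point a repair was actually required."""
    if repair_needed < alarm_raised:
        raise ValueError("repair date is earlier than the alarm date")
    return (repair_needed - alarm_raised).days

def in_early_window(alarm_raised: date, today: date, window_days: int = 14) -> bool:
    """True while the alarm is young enough that its cause may not yet be detectable.

    window_days is a hypothetical site setting, not a Monitron parameter.
    """
    return (today - alarm_raised).days < window_days
```

For example, `alarm_lead_time_days(date(2023, 3, 1), date(2023, 3, 31))` returns 30, the kind of lead time described above.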

The Monitron alarm system has been set up to support productive decision making in this art and science. The first step is understanding how alarm states flow through the Monitron system and how they drive decision making.

NOTE: Use the Monitron online documentation as your primary reference. See Amazon Documentation: Understanding warnings and alerts. The following tips are meant to be supplemental to the Monitron documentation.

Alarms Show in the Monitron Dashboard - they are the key indicator to alert technicians to a possible anomaly. Alarms must be validated by supporting data.

Sample Monitron Mobile Dashboards Showing Alarm States

Alarm States – The following describes each alarm status a sensor can report. In Monitron, each sensor is assigned to a position on the asset.

Healthy – The sensor indicates this position on the asset is healthy. All measured values are within their normal range and no alarms have been triggered.

Warning – A warning has been triggered by the sensor at this position, indicating early signs of a potential failure condition. We recommend that you monitor the equipment closely and initiate an investigation during an upcoming planned maintenance event.

Alarm – An alarm has been triggered by the sensor, indicating that the machine vibration or temperature is out of the normal range. We recommend investigating the issue at the earliest opportunity. An equipment failure might occur if the issue isn't addressed.

Maintenance – The alarm state of the sensor at this position has been acknowledged by a technician, but the issue has not yet been resolved. It signals to the maintenance team that a more formal investigation will begin.
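To make the four states concrete, they can be modeled as a small enumeration with the recommended response attached to each. This is a minimal Python sketch; `AlarmState`, `RECOMMENDED_ACTION` and `needs_acknowledgement` are hypothetical names for illustration, not part of any Monitron API.

```python
from enum import Enum

class AlarmState(Enum):
    """The four states a sensor position can report (illustrative names)."""
    HEALTHY = "Healthy"
    WARNING = "Warning"
    ALARM = "Alarm"
    MAINTENANCE = "Maintenance"

# Recommended response for each state, paraphrasing the state descriptions.
RECOMMENDED_ACTION = {
    AlarmState.HEALTHY: "No action: all measured values are within their normal range.",
    AlarmState.WARNING: "Monitor closely; investigate at an upcoming planned maintenance event.",
    AlarmState.ALARM: "Investigate at the earliest opportunity; failure may follow if ignored.",
    AlarmState.MAINTENANCE: "Acknowledged by a technician; formal investigation under way.",
}

def needs_acknowledgement(state: AlarmState) -> bool:
    """Warning and Alarm are the states still waiting for a technician."""
    return state in (AlarmState.WARNING, AlarmState.ALARM)
```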

Changing Alarm States in the Dashboard – The Monitron dashboard is used to manage the state of alarms.

To confirm that you are aware of the alarm issue, choose Acknowledge to place the alarm in the Maintenance state.

After an abnormality has been acknowledged and repaired, Resolve the issue in the mobile app.
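The two dashboard actions behave like transitions in a small state machine. The sketch below assumes exactly the transitions described in the text (Acknowledge moves Warning or Alarm to Maintenance; Resolve moves Maintenance back to Healthy). The names are hypothetical, and this is not how Monitron is implemented internally.

```python
from enum import Enum

class AlarmState(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    ALARM = "Alarm"
    MAINTENANCE = "Maintenance"

# (current state, dashboard action) -> next state; anything else is rejected.
TRANSITIONS = {
    (AlarmState.WARNING, "acknowledge"): AlarmState.MAINTENANCE,
    (AlarmState.ALARM, "acknowledge"): AlarmState.MAINTENANCE,
    (AlarmState.MAINTENANCE, "resolve"): AlarmState.HEALTHY,
}

def apply_action(state: AlarmState, action: str) -> AlarmState:
    """Return the state after pressing Acknowledge or Resolve."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} from {state.value}") from None
```

A lookup table keeps the allowed moves in one place, so an invalid action such as resolving a Healthy position fails loudly instead of silently changing state.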

Alarm State Progression – Changes to alarm states occur in two ways:

Automated Changes – Steps 1 & 2 – these alarms are generated and managed by the platform. For Monitron, alarm states are set by ISO vibration standards and by machine learning algorithms for vibration and temperature. Decisions are based on inspections shown in the yellow highlighted area.
Technician Changes – Steps 3 & 4 – these are managed by a human maintenance technician. In the Monitron dashboard, the Acknowledge button changes the alarm state from Warning / Alarm to Maintenance. The Resolve Alarm button closes out the Maintenance alarm state and changes it back to Healthy. Decisions are based on diagnostic inspections shown in the blue highlighted area.
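On the automated side, the state comes from two kinds of checks: a standards-based vibration threshold and a machine learning model. The sketch below is a minimal illustration of combining them, assuming each check reports its own severity and the more severe of the two wins. That combination rule and the function names are assumptions, not Monitron's documented logic, and real threshold values should come from the applicable ISO standard for the machine class.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Ordered so that a larger value means a more severe automated state."""
    HEALTHY = 0
    WARNING = 1
    ALARM = 2

def threshold_severity(velocity_rms_mm_s: float, warn_at: float, alarm_at: float) -> Severity:
    """Compare an overall vibration reading with warning and alarm limits.

    warn_at and alarm_at are inputs: take them from the standard that applies
    to your machine class rather than from this sketch.
    """
    if alarm_at <= warn_at:
        raise ValueError("alarm limit must be above the warning limit")
    if velocity_rms_mm_s >= alarm_at:
        return Severity.ALARM
    if velocity_rms_mm_s >= warn_at:
        return Severity.WARNING
    return Severity.HEALTHY

def automated_state(threshold: Severity, model: Severity) -> Severity:
    """Assumed rule: report whichever automated check is more severe."""
    return max(threshold, model)
```

Technician changes (Steps 3 & 4) deliberately stay outside this function, since only a person should acknowledge or resolve an alarm.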

START of PART 2

Maintenance Alarm Detailed Decision Logic – Five key decisions are required for every alarm. Completing these steps, ending with the Resolve button, resets the alarm to Healthy.

Step 1 – Issue or Update Work Order: There are two approaches based on maintenance culture:

Structured approach: a formalized WO is issued and/or updated every time a technician visits an asset. With multiple visits to an asset, these are typically updates to the existing WO. This is managed within the CMMS.
Unstructured approach: a more informal approach with ad-hoc spreadsheet documentation. Some organizations do not want a WO released until a specific repair is planned or structured PM cycles are in place.
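The two work order approaches differ mainly in where each visit gets recorded. The sketch below is a minimal illustration, assuming a dictionary stands in for the CMMS and a list of rows stands in for the spreadsheet; `WorkOrder` and `record_visit` are hypothetical names, not a CMMS API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorkOrder:
    """Hypothetical stand-in for a CMMS work order."""
    asset_id: str
    notes: list[str] = field(default_factory=list)

def record_visit(asset_id: str, note: str, structured: bool,
                 open_wos: dict[str, WorkOrder],
                 spreadsheet: list[tuple[str, str]]) -> Optional[WorkOrder]:
    """Document one technician visit to an asset (Step 1).

    Structured: issue a WO on the first visit, then update that same WO.
    Unstructured: append an ad-hoc row; no WO until a repair is planned.
    """
    if not structured:
        spreadsheet.append((asset_id, note))
        return None
    wo = open_wos.get(asset_id)
    if wo is None:                      # first visit: issue a new WO
        wo = open_wos[asset_id] = WorkOrder(asset_id)
    wo.notes.append(note)               # every visit updates the WO
    return wo
```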

Step 2 – Inspect Asset: The technician performs a mix of classic diagnostics to complement the sensor data, including:

Vibration-based spectral frequencies
Acoustical frequencies
Heat or infrared signatures
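Of the classic diagnostics listed, the vibration spectrum is the easiest to illustrate: a developing fault often shows up as energy at a particular frequency. The sketch below finds the dominant non-DC frequency in a vibration signal with a naive discrete Fourier transform from the Python standard library. It is for illustration only; the function name is hypothetical, and a real analysis would use an FFT library or the technician's vibration analyzer.

```python
import cmath
import math

def dominant_frequency(samples: list[float], sample_rate_hz: float) -> float:
    """Return the frequency (Hz) of the largest non-DC peak in the magnitude spectrum.

    Uses a naive O(n^2) discrete Fourier transform for clarity; use an FFT
    library for real data.
    """
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):     # skip k = 0 (DC offset), stop at Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * sample_rate_hz / n  # bin index -> Hz
```

With a 256 Hz sample rate and 256 samples, each DFT bin is 1 Hz wide, so a 30 Hz component lands exactly on bin 30.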

Step 3 – Failure Found: the inspection verifies that a failure is underway, leading to one of two decisions:

Schedule the repair during a planned downtime (unless failure is imminent). If using an unstructured WO approach, issue a repair WO.
Continue monitoring the failure underway when it is too early to commit to a repair or to rule out a false positive. Many alarms start well in advance of the need to start a repair.
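The Step 3 choices can be written as a small decision function. The sketch below assumes the technician supplies two judgements from the inspection: whether failure is imminent, and whether the failure is far enough along to repair. Treating "imminent" as a separate repair-now outcome is one reading of the parenthetical in the first decision, and all names are hypothetical.

```python
from enum import Enum

class FailureFoundDecision(Enum):
    REPAIR_NOW = "repair now: failure is imminent"
    SCHEDULE_REPAIR = "schedule repair at a planned downtime"
    CONTINUE_MONITORING = "continue monitoring the developing failure"

def decide_failure_found(imminent: bool, ready_for_repair: bool) -> FailureFoundDecision:
    """Both inputs are technician judgements from the Step 2 inspection."""
    if imminent:
        return FailureFoundDecision.REPAIR_NOW
    if ready_for_repair:
        return FailureFoundDecision.SCHEDULE_REPAIR
    return FailureFoundDecision.CONTINUE_MONITORING

def issue_repair_wo(decision: FailureFoundDecision, structured_wo: bool) -> bool:
    """Unstructured shops release a repair WO only once a repair is decided;
    structured shops already have an open WO that they update instead."""
    repair = decision in (FailureFoundDecision.REPAIR_NOW, FailureFoundDecision.SCHEDULE_REPAIR)
    return repair and not structured_wo
```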

Step 4 – No Failure Found: one of two decisions follows:

Continue monitoring - wait for a failure to develop. It is common for the analytics in the vibration platform to indicate a failure underway weeks before it shows up in diagnostic data.
Determine if false positive (false alarm) - after a period of monitoring, additional diagnostics support that no failure was underway and that the alarm was likely caused by other factors, such as transients in operation.
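Step 4 hinges on how long to keep monitoring before calling an alarm a false positive. The sketch below assumes a site policy expressed as a minimum monitoring period and a minimum number of clean diagnostic inspections; both thresholds and the function name are hypothetical, and the 30-day default simply echoes the lead time mentioned earlier in this post.

```python
from datetime import date

def decide_no_failure_found(alarm_raised: date, today: date, clean_inspections: int,
                            min_days: int = 30, min_clean: int = 2) -> str:
    """Return "false positive" only once the alarm has been watched long enough
    and enough follow-up diagnostics have come back clean; otherwise keep monitoring.

    min_days and min_clean are hypothetical site policies, not Monitron settings.
    """
    days_monitored = (today - alarm_raised).days
    if days_monitored >= min_days and clean_inspections >= min_clean:
        return "false positive"
    return "continue monitoring"
```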

Step 5 – Resolve Alarm Status: switch the alarm state from Maintenance back to Healthy.

Document the alarm case for leadership.
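Step 5 pairs the state change with a record for leadership. The sketch below assumes each alarm is tracked as a small record; `AlarmCase`, its fields and the summary format are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class AlarmCase:
    """Hypothetical record of one alarm, kept for the leadership summary."""
    asset_id: str
    raised: date
    state: str = "maintenance"          # an acknowledged alarm awaiting resolution
    outcome: Optional[str] = None       # e.g. "repaired" or "false positive"
    resolved_on: Optional[date] = None

    def resolve(self, outcome: str, on: date) -> str:
        """Switch Maintenance back to Healthy and return a one-line case summary."""
        if self.state != "maintenance":
            raise ValueError("only an alarm in the Maintenance state can be resolved")
        self.state, self.outcome, self.resolved_on = "healthy", outcome, on
        days_open = (on - self.raised).days
        return f"{self.asset_id}: {outcome}, {days_open} days from alarm to resolution"
```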

PRIOR POST: The Monitron Pilot: Managing Alarms for Best Results - Part 1

NEXT POST: The Monitron Pilot: Baxter Realizes 60x ROI

GetIQ.ai is a blog about building AI solutions that augment decision making and empower the people who make those decisions. It's authored by the engineers at DecisionIQ.

ABOUT DECISIONIQ: We are "boots on the ground" factory engineers who are experts in the adoption of AI and machine learning in operations. As consultants and system integrators, we bring our experience through a mix of well-structured programs that enable our clients to produce winning results and maximum ROI. You will learn to use AI to build competitive advantage by increasing plant uptime, quality and yield. With 25 POCs and Pilots and 16 deployments, we have achieved ROIs as high as 60x for our clients and a cumulative $22 million in savings.

AN AWS PARTNER: We provide capability in the design of AI solutions for industrial applications. We work with the AWS Industrial IoT services stack, cloud migrations, and Amazon Monitron Pilots and deployments. For qualifying customers, many of these capabilities are eligible for AWS subsidized funding.
