The intelligent alarm management system - IEEE Software - IEEE Xplore

feature

user interface design

The Intelligent Alarm Management System

Jun Liu, Khiang Wee Lim, Weng Khuen Ho, Kay Chen Tan, Rajagopalan Srinivasan, and Arthur Tay, National University of Singapore

Nuisance alarms often clutter and obscure a plant operator’s view of critical information, with potentially severe consequences. The Intelligent Alarm Management System suppresses nuisance alarms and provides advisory information to help operators focus on important alarm information and take correct, quick actions.

IEEE SOFTWARE

Alarm systems are the most important tools that process plant operators use to improve plant performance and monitor plant safety.1–4 However, a recent survey by the UK Health & Safety Executive (HSE) of alarm systems in the chemical and power industries shows that

■ Process plant operators are often heavily burdened with alarms both during steady-state operation and following plant upsets (one alarm every two minutes in normal operation and more than 100 alarms in the 10 minutes after an upset).
■ Most alarms are of little value to operators and are considered nuisance alarms (operators would do nothing except press the button to acknowledge and silence them).
■ Few standing alarms (those that remain in alarm status), perhaps only about 6 percent during normal operation, relate to active operational problems.5

This survey also shows that poor performance of existing alarm systems actually puts lives at risk and contributes to major plant damage, production loss, and environmental impact, leading to losses of millions of UK pounds. A typical oil refinery suffers an estimated loss of three to 10 million pounds per year that might be avoidable with better alarm systems. Similar findings in other countries corroborate that inadequate alarm systems are a crucial problem.6 The Abnormal Situation Management Solution Concepts Team estimated that the US petrochemical industry alone could save up to $10 billion annually by avoiding or at least better dealing with abnormal situations.6 So, introducing intelligent alarm-handling knowledge to manage alarm systems is becoming increasingly important.

We’ve developed the Intelligent Alarm Management System (IAMS) for suppressing nuisance alarms and providing advisory information to help panel operators focus quickly on important alarm information and take quick, correct action. This research is a collaboration between the Chemical and Process Engineering Centre at the National University of Singapore, a major Singapore refinery, and Yokogawa Engineering Asia. This multidisciplinary project brings together plant personnel, electrical and chemical engineers, computer scientists, and a control system vendor.7–10 Results show that IAMS can effectively suppress nuisance alarms, thus increasing the alarm system’s efficiency.

Published by the IEEE Computer Society
0740-7459/03/$17.00 © 2003 IEEE

IAMS

We developed IAMS using Microsoft Visual C++ and incorporating special-purpose algorithms, process knowledge, and control system expertise. IAMS runs around the clock on the Human Interface Station, a dedicated Windows NT PC on top of a distributed control system (DCS). Figure 1 shows the IAMS framework and the hardware environment.

The alarm system consists of three main blocks: the DCS, IAMS, and a failure protection system (FPS). A Yokogawa CENTUM CS/CS-PLUS DCS controls the refinery plant. Vnet, a Yokogawa proprietary protocol, is the control bus for the DCS internal communication. Two OPC (OLE for Process Control) servers, HISOPC and EXAOPC, let IAMS and the FPS access real-time process data from the DCS. The FPS is also written in C++ and runs as a background task on the NT station (on top of the DCS) that acts as the EXAOPC server. Communication between IAMS and the FPS occurs over Ethernet using TCP/IP.

Figure 1. The Intelligent Alarm Management System (a) framework and (b) hardware environment. (Diagram labels in (a): Yokogawa CENTUM CS/CS-PLUS distributed control system, Vnet, Data I/O through the HISOPC server, the six IAMS subblocks, the Alarm/Trend/Knowledge Database, the IAMS graphical user interface, Ethernet, Data I/O through the EXAOPC server, the failure protection system, engineering workstations, and an information command station. Diagram labels in (b): CENTUM CS/CS3000 field control stations, Vnet, HISOPC server, EXAOPC server, FPS interface, IAMS, FPS, and Ethernet using TCP/IP.)

IAMS consists of the GUI, the Data I/O function, the Alarm/Trend/Knowledge Database, and six dedicated subblocks:

■ Statistical Analysis counts the alarm numbers in real time using a moving window for different time periods, different tags, different message types, and different alarm statuses.
■ Nuisance HI/LO Analysis analyzes process high or low alarms and suppresses repeating high or low alarms. (For more on process alarms, see the sidebar.)
■ IOP Analysis identifies the cause of input open (IOP) alarms (see the sidebar) in real time and suppresses nuisance IOP alarms.
■ Criticality Analysis gives a criticality tag (very important, important, less important, or calculation-related) to each alarm message.
■ Standing Alarm Analysis shows standing (current) alarms, warns the operator of ramping alarms (which are triggered by a process variable that is in high or low alarm status and keeps increasing or decreasing), and resets standing nuisance alarms.
■ Monitor & Recover shows changes made to DCS alarm settings and automatically restores alarm settings when the nuisance status is cleared or a measurement instrument is repaired.

IAMS communicates with the DCS through the Data I/O function, which reads real-time process trend data and alarm messages from the DCS. The Alarm/Trend/Knowledge Database uses the real-time process data and alarm messages to build a historical database. Then, the six subblocks use the historical database together with process knowledge (such as predefined key process variables) to count alarm numbers, do criticality analysis, identify nuisance alarms, and so on. Finally, the subblocks send new alarm settings down to the DCS through the Data I/O function to suppress nuisance alarms and provide online advisory information through the GUI to assist the operators.

Sidebar: Process Alarms

Most process variables have three upper and three lower limits. For the upper limits, the lowest limit gives the high alarm, the middle limit gives the high-high alarm, and the highest limit (which is usually the end of the measurement scale) gives an input open alarm. For the lower limits, the highest limit gives the low alarm, the middle one gives the low-low alarm, and the lowest limit (usually the scale’s other end) gives an input open alarm. A process alarm is related to a process variable other than the manipulated variable.

March/April 2003
http://computer.org/software

The GUI

The GUI (see Figure 2) gives access to all IAMS functions. It displays guidance information, criticality, statistics, and an alarm management overview, and lets the operator suppress nuisance IOP alarms. It consists of the information window, toolbar, button area, and menu bar. The toolbar and button area have the same function: from left to right, the toolbar buttons correspond to the buttons in the button area. When the user moves the cursor to a button in the toolbar or button area, an explanation of the button appears in the window’s bottom line. Furthermore, buttons in the button area are grouped for easy operation:

■ STA starts the system; STP stops it.
■ SPR enables and RST disables the alarm suppression function. Both become invalid when communication between IAMS and the FPS fails, because the alarm suppression function is then automatically disabled.
■ CFG initiates configuration work.
■ SFT provides a control loop status report over the last work shift.
■ MTN provides maintenance information.
■ MON provides monitoring information (which alarm settings have changed or need to be changed).
■ IOP suppresses nuisance IOP alarms.
■ GID shows guidance information (that is, the information in Figure 2) and all critical alarms that occurred.
■ AMO shows an overview of the alarm management system.
■ SEC, MIN, HR, DAY, and SPE show alarm statistics reports for the last 10 seconds, 5 minutes, hour, day, or a specific time period.
■ CRIT, CAL, ORD, IMP, and EMG show all, calculation-related, ordinary, important, or critical alarm messages.
■ STD shows standing alarms.

Figure 2. The IAMS graphical user interface: (a) information window, (b) toolbar, (c) button area, and (d) menu bar.

Database management

An external or network database for real-time alarm messages and second-interval process trend data is not directly available. So, the Alarm/Trend/Knowledge Database incorporates its own databases for alarm messages and trend data. The alarm message database is updated every few seconds, and it records all alarm information for the last hour. The process trend database is also updated every few seconds; it records only the necessary trend data for the last few minutes for those process variables being monitored (including alarm status, alarm suppression status, recently occurring status, and so on). When a process variable is no longer being monitored, IAMS automatically removes the related trend data from the process trend database.
In this way, the optimized built-in database will not be large even if there are many process variables and alarms. So, IAMS overcomes the scalability problem and works for any large-scale process plant.
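As a rough illustration of the retention scheme described above, the following Python sketch keeps alarm messages for one hour and trend samples for a few minutes per monitored tag. The class name RollingAlarmStore, its method names, the tag names, and the five-minute trend retention are assumptions made for this example; the article does not describe the actual IAMS data structures, and IAMS itself was written in Visual C++.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollingAlarmStore:
    """Bounded in-memory store (hypothetical; not the IAMS implementation)."""
    alarm_retention_s: float = 3600.0  # "the last hour" of alarm messages
    trend_retention_s: float = 300.0   # "the last few minutes" (assumed: 5 min)
    alarms: deque = field(default_factory=deque)  # (timestamp, tag, message)
    trends: dict = field(default_factory=dict)    # tag -> deque of (timestamp, value)

    def add_alarm(self, ts, tag, message):
        self.alarms.append((ts, tag, message))
        self._trim(self.alarms, ts - self.alarm_retention_s)

    def monitor(self, tag):
        # Start keeping trend data for a process variable.
        self.trends.setdefault(tag, deque())

    def unmonitor(self, tag):
        # Dropping the tag discards its trend data immediately.
        self.trends.pop(tag, None)

    def add_trend(self, ts, tag, value):
        series = self.trends.get(tag)
        if series is None:
            return  # unmonitored variables are not stored at all
        series.append((ts, value))
        self._trim(series, ts - self.trend_retention_s)

    @staticmethod
    def _trim(series, cutoff):
        # Entries arrive in time order, so expired ones are always at the left.
        while series and series[0][0] < cutoff:
            series.popleft()
```

Because every write trims expired entries and unmonitored tags hold no data, memory use in this sketch depends on the retention windows and the number of monitored variables rather than on how long the system has been running.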

Alarm analysis and suppression

Our study of refinery alarms revealed two types of nuisance alarms:

■ Repeating nuisance alarms. The same alarm continually arises and clears over a period of time.
■ Standing nuisance alarms. The alarm signal persists for a long time owing to the dead band (the difference between the alarm occurring level and the alarm recovering level).

The DCS often uses the dead band to suppress repeating nuisance alarms. However, the dead band cannot be too small or too large: a small dead band lets repeating alarms through, whereas a large one delays alarm recovery and causes standing nuisance alarms. The current DCS alarm system has a dead band of 2 percent of the scale yet still has many repeating and standing nuisance alarms. So, we developed advanced algorithms and approaches for IAMS to further suppress repeating and standing nuisance alarms on the basis of real-time process data and statistics.

Repeating nuisance alarms include repeating process high or low alarms as well as repeating IOP alarms. Three instances of the same alarm within five minutes are considered repeating nuisance alarms. Every 10 seconds, IAMS counts the alarm number for each alarm over the last five minutes and calculates the online moving average and standard deviation of the process variable related to the alarm. As long as IAMS detects repeating process high or low alarms, it dynamically changes their alarm setting according to the moving average and standard deviation to keep the process variable in alarm status. IAMS will restore the alarm setting to its original value in either of these situations:

■ The process value leaves the original alarm setting.
■ The original setting is better than the new one for a given time period.
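The repeating-alarm rule above can be sketched in Python as follows. The five-minute window and the three-alarm threshold come from the text; everything else is an assumption for illustration, including the class name RepeatingAlarmMonitor, the factor k, and the rule that a high-alarm setting is moved to the moving average minus k standard deviations. The article does not give the actual IAMS formula. The restore logic is also simplified: here the setting reverts as soon as the repeat condition no longer holds, and the two restore conditions listed above are not modeled.

```python
from collections import deque
from statistics import mean, pstdev

WINDOW_S = 300     # five-minute window (from the text)
REPEAT_COUNT = 3   # three instances of the same alarm (from the text)


class RepeatingAlarmMonitor:
    """Tracks one high alarm on one process variable (hypothetical sketch)."""

    def __init__(self, original_hi, k=3.0):
        self.original_hi = original_hi
        self.k = k                   # assumed tuning factor, not from the article
        self.alarm_times = deque()   # timestamps of alarm occurrences
        self.samples = deque()       # (timestamp, process value)

    def record_alarm(self, ts):
        self.alarm_times.append(ts)

    def record_value(self, ts, value):
        self.samples.append((ts, value))

    def evaluate(self, now):
        """Return the high-alarm setting to use at `now` (call every 10 s)."""
        cutoff = now - WINDOW_S
        while self.alarm_times and self.alarm_times[0] < cutoff:
            self.alarm_times.popleft()
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        if len(self.alarm_times) < REPEAT_COUNT or not self.samples:
            return self.original_hi  # no repeating nuisance detected
        values = [v for _, v in self.samples]
        # Assumed rule: lower the limit below the band in which the value
        # oscillates, so the alarm stays active instead of chattering.
        candidate = mean(values) - self.k * pstdev(values)
        return min(self.original_hi, candidate)  # never looser than the original
```

Taking the minimum with the original limit means this sketch can only tighten a high alarm, never relax it; a low alarm would mirror the rule with the signs reversed.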

Restoring the original setting in these situations ensures that normal alarm function (a true alarm rather than a repeating nuisance) is unaffected, so plant safety is not compromised.

IAMS uses a different method to suppress IOP nuisance alarms because their generation mechanisms are different. Once IAMS has identified an IOP nuisance alarm, the operator can choose to put the corresponding process variable into alarm-off status to mask it. This variable will automatically revert to alarm-on status if it goes back to normal status (within scale range) for a period of time. The operator can also manually turn an IOP alarm to the alarm-on status whenever desired. Again, this ensures plant safety.

A process high or low alarm that stands for more than six hours with its process trend data in the dead band for the last five minutes is considered a standing nuisance alarm. Every 10 seconds, IAMS checks standing alarms and the corresponding process trend data to detect such alarms. As soon as IAMS detects a standing nuisance alarm, it automatically resets the alarm online by temporarily changing the alarm setting to counteract the dead-band effect and then quickly reverting the setting to its original value. In this way, IAMS removes standing nuisance alarms from the standing alarm list on the panel without any negative effect.

In this article, alarm suppression has different meanings for different nuisance alarms. Suppression of repeating high or low nuisance alarms means reducing or eliminating the repeats (one alarm is still standing, but fewer or no repeats occur). Suppression of a standing nuisance alarm means resetting or clearing it. Suppression of repeating IOP nuisance alarms means masking them (alarms are still generated but are invisible from the operator panel).

Intelligent advisory information

This information guides the panel operator toward the most appropriate response. Such guidance includes






■ Telling the operator which alarms are emergent or critical
■ Helping the operator differentiate between signals indicating that a process value is out of range and nuisance signals due to faulty instruments
■ Giving the operator early warning of alarms that will lead to violation of high-high or low-low limits (for more on these limits, see the sidebar)
■ Helping the operator with shift changes (handover) by providing alerts on loops requiring attention (for example, control loops that changed and should be reset)
■ Providing an online maintenance report of problems such as a bad transmitter, a broken wire, or an outdated alarm setting
■ Providing complete alarm statistics and an alarm management overview that are useful for analysis, evaluation, and reconfiguration

Failure protection system

IAMS will send the previous alarm settings to the FPS whenever it changes an alarm setting on the DCS. IAMS will also check the connection status with the FPS regularly. If the connection is down, IAMS restores all the alarm settings that it has changed, switches off all nuisance alarm suppression functions, and informs the operator. As soon as the connection is reestablished, IAMS resumes nuisance alarm suppression. The FPS also checks the IAMS watchdog tag on the DCS. When IAMS fails to communicate with the DCS, the watchdog tag is not updated. The FPS detects this and restores the changed IAMS alarm settings to their original values according to the original alarm settings stored in the FPS.

Results

Since June 2000, IAMS has been operating continuously on two OPC stations in a major Singapore refinery. The system monitors thousands of process variables and alarm messages. As Figure 3 shows, IAMS has greatly reduced the number of alarms during normal operating conditions.

Figure 3. The reduction in number of alarms achieved by IAMS at a Singapore refinery. (Bar chart of the number of alarms, on an axis from 0 to 4,000, for three periods: before the project, during IAMS development and test, and continuous running after testing.)

During normal, steady operating conditions, the average alarm rate was approximately

■ One alarm per minute during the month before our project, which the HSE survey considers “difficult to cope with”
■ One alarm per 1.5 minutes during the IAMS test (about 14 months), which the survey considers “likely to be overdemanding” and close to the industrial average
■ One alarm per three minutes when IAMS ran continuously, which the survey considers “should be manageable”

Furthermore, during normal, steady operating conditions, IAMS reduced alarms that were standing for more than half an hour by approximately 20 percent every day. During abnormal operating conditions (such as shutdowns in August 1999 and November 2000), the daily alarm number decreased from 11,000 to 5,600. Figure 4 compares alarm numbers with and without IAMS implementation. IAMS decreased the daily alarm number during normal and abnormal operating conditions. The surge in the alarm number on 9 November corresponds to one abnormal operating condition.

Figure 4. Alarm numbers with and without IAMS. (Plot of the number of alarms, on an axis from 0 to 10,000, for selected dates from 4/15 to 11/19, with and without IAMS.)
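To put the quoted alarm rates on a common footing, the short Python calculation below converts them to alarms per day and computes the relative reductions. The figures it uses are the ones reported above; the daily totals and percentages are simple arithmetic on those figures, not values reported by the authors, and the function names are chosen only for this example.

```python
MINUTES_PER_DAY = 24 * 60


def alarms_per_day(minutes_per_alarm):
    """Convert 'one alarm per N minutes' into alarms per day."""
    return MINUTES_PER_DAY / minutes_per_alarm


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Average rates during normal, steady operation (minutes per alarm).
rates = {"before project": 1.0, "IAMS test": 1.5, "IAMS continuous": 3.0}
for period, minutes in rates.items():
    print(f"{period}: {alarms_per_day(minutes):.0f} alarms/day")

normal = percent_reduction(alarms_per_day(1.0), alarms_per_day(3.0))
abnormal = percent_reduction(11_000, 5_600)
print(f"normal operation: {normal:.1f}% fewer alarms")
print(f"abnormal operation: {abnormal:.1f}% fewer alarms")
```

On these numbers, the normal-operation rate falls from about 1,440 to about 480 alarms per day, roughly a two-thirds reduction, and the abnormal-condition daily count falls by about 49 percent.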

Plant operators have widely welcomed and accepted IAMS, which has eased their job. With IAMS running, the process alarm rate has greatly decreased for both normal and abnormal operating conditions. Although giving the exact cost saved is difficult, because incidents tend to be unique, random, and unpredictable, IAMS has played an important role in preventing incidents and economic loss and has had a positive impact on the plant’s safety.

Acknowledgments

We performed this research at the National University of Singapore’s Chemical and Process Engineering Centre.

References

1. P. Andow, “Abnormal Situation Management: A Major US Programme to Improve Management of Abnormal Situations,” Proc. IEE Colloquium Best Practices in Alarm Management (Digest 1998/279), Institution of Electrical Engineers, London, 1998, pp. 4/1–4/4.
2. M. Wilson, “Alarm Management and Its Importance in Ensuring Safety,” Proc. IEE Colloquium Best Practices in Alarm Management (Digest 1998/279), Institution of Electrical Engineers, London, 1998, pp. 6/1–6/3.
3. D.C. Brown and M. O’Donnell, “Too Much of a Good Thing? Alarm Management Experience in BP Oil, Part 1: Generic Problems with DCS Alarm Systems,” Proc. IEE Colloquium Stemming the Alarm Flood (Digest 1997/136), Institution of Electrical Engineers, London, 1997, pp. 5/1–5/6.
4. M. O’Donnell and D.C. Brown, “Too Much of a Good Thing? Alarm Management Experience in BP Oil, Part 2: Implementation of Alarm Management at Grangemouth Refinery,” Proc. IEE Colloquium Stemming the Alarm Flood (Digest 1997/136), Institution of Electrical Engineers, London, 1997, pp. 5/7–5/9.
5. M.L. Bransby and J. Jenkinson, “Alarm Management in the Chemical and Power Industries: Results of a Survey for the HSE,” Proc. IEE Colloquium Best Practices in Alarm Management (Digest 1998/279), Institution of Electrical Engineers, London, 1998, pp. 5/1–5/10.
6. I. Nimmo, “Adequately Address Abnormal Situation Operations,” Chemical Eng. Progress, vol. 91, no. 9, Sept. 1995, pp. 36–45.
7. J. Liu et al., “An Intelligent Alarm Management System in a Refinery Plant,” Proc. Chemical and Process Eng. Conf. 2000 (CPEC 2000) (in conjunction with Regional Symp. Chemical Eng. 2000), CD-ROM, Chemical and Process Eng. Centre, Nat’l Univ. of Singapore, 2000, pp. 1–8 (Process Design and Development Session 7, paper 4).
8. J. Liu et al., “An Algorithm for Reducing Repeating Nuisance Alarms in a Refinery Plant,” Proc. 3rd Asian Control Conf., CD-ROM, Daheng Electronic Press, Beijing, 2000, pp. 13–16.
9. J. Liu et al., “Intelligent Alarm Management in a Refinery through Nuisance Alarm Suppression and Advisory Support,” Proc. 4th IFAC Workshop On-Line Fault Detection & Supervision, Pergamon, New York, 2001, pp. 81–86.
10. Y.X. Xi et al., “Fault Diagnosability of Finite-State Automaton Models,” Proc. 3rd Asian Control Conf., CD-ROM, Daheng Electronic Press, Beijing, 2000, pp. 7–12.

About the Authors

Jun Liu is a research fellow at the Institute of Chemical and Engineering Sciences. His research interests include process control, alarm management, software development, supervisory control, and nonlinear control. He received his BEng, MEng, and PhD in control engineering from Southeast University in Nanjing, China. Contact him at ICES, Block 28, Ayer Rajah Crescent, Unit #02-08, Singapore 139959; [email protected].

Khiang Wee Lim is the executive director of the Singapore Institute of Manufacturing Technology, on leave from the National University of Singapore’s Department of Electrical and Computer Engineering. His research interests are in supervisory control and fault diagnosis. He received his BEng in electrical engineering from the University of Malaya and his DPhil in engineering science from Oxford University. He is a senior AdCom member of the IEEE Industrial Electronics Society. Contact him at the Science and Eng. Research Council, A*STAR, 10 Science Park Rd., #01-01 The Alpha, Singapore 117684; [email protected].

Weng Khuen Ho is an associate professor in the National University of Singapore’s Department of Electrical and Computer Engineering. His research interests include process control, alarm management, and control for semiconductor manufacturing. He received his BEng (Hons) and PhD in electrical engineering from the National University of Singapore. Contact him at the Dept. of Electrical and Computer Eng., Nat’l Univ. of Singapore, Singapore 119260; [email protected].

Kay Chen Tan is an assistant professor in the National University of Singapore’s Department of Electrical and Computer Engineering. His research interests include evolutionary computation, intelligent control and design automation, system identification, and multiobjective optimization. He received his BEng and PhD in electronics and electrical engineering from the University of Glasgow. He is a member of the IEEE. Contact him at the Dept. of Electrical and Computer Eng., Nat’l Univ. of Singapore, Singapore 119260; [email protected].

Rajagopalan Srinivasan is an assistant professor in the National University of Singapore’s Department of Chemical and Environmental Engineering. His research interests include abnormal-situation management, design of safe and environmentally benign chemical processes, and business process optimization. He received his BTech from the Indian Institute of Technology, Madras, and his PhD from Purdue University, both in chemical engineering. Contact him at the Dept. of Chemical and Environmental Eng., Nat’l Univ. of Singapore, 10 Kent Ridge Crescent, Singapore 119260; [email protected].

Arthur Tay is an assistant professor in the National University of Singapore’s Department of Electrical and Computer Engineering. His research interests include knowledge-based control, process control, and semiconductor manufacturing. He received his BEng and PhD in electrical and computer engineering from the National University of Singapore. He is a member of the IEEE and of SPIE. Contact him at the Dept. of Electrical and Computer Eng., Nat’l Univ. of Singapore, Singapore 119260; [email protected].
