UNIVERSIDADE DE LISBOA INSTITUTO SUPERIOR TÉCNICO

Machine Learning and Deep Learning for Prognostics and Predictive Maintenance of Aeronautical Equipment

Márcia Lourenço Baptista

Supervisor: Doctor Elsa Maria Pires Henriques Co-Supervisor: Doctor Helmut Prendinger

Thesis approved in public session to obtain the PhD Degree in Leaders for Technical Industries Jury final classification: Pass with Distinction and Honour

2018


Jury Chairperson: Doctor Rogério Anacleto Cordeiro Colaço, Instituto Superior Técnico, Universidade de Lisboa Members of the Committee: Doctor Luís Manuel Braga da Costa Campos, Instituto Superior Técnico, Universidade de Lisboa Doctor Elsa Maria Pires Henriques, Instituto Superior Técnico, Universidade de Lisboa Doctor Marco Paulo Seabra dos Reis, Faculdade de Ciências e Tecnologia, Universidade de Coimbra Doctor Francisco António Chaves Saraiva de Melo, Instituto Superior Técnico, Universidade de Lisboa Doctor Kai Frank Goebel, Lulea University of Technology, Sweden Doctor Ricardo José Nunes dos Reis, Embraer Portugal S.A., Individualidade reconhecida na área em que se insere a tese

Funding Institutions Fundação para a Ciência e a Tecnologia (FCT) - MIT Portugal Program 2018

thesis on natural and exact sciences

Machine Learning and Deep Learning for Prognostics and Predictive Maintenance of Aeronautical Equipment

márcia lourenço baptista

To my parents and my sister.

CONTENTS

summary v
sumário vi
list of publications vii
list of conference presentations vii
list of figures ix
list of tables x
preface xi
acknowledgements xiv
acronyms xvii

thesis 1

1 introduction 1
  1.1 The Science of Prognostics 2
  1.2 Prognostics and Predictive Maintenance 3
  1.3 Modeling Approaches in Prognostics 4
  1.4 The Aircraft System 6
  1.5 Problem and Research Question 7
  1.6 Contribution 9
  1.7 Outline 13

2 background 15
  2.1 Prognostics and Health Management 16
  2.2 Historical Evolution of Prognostics 17
  2.3 The Relationship between Diagnostics and Prognostics 21
  2.4 Basic Concepts of Prognostics 27
  2.5 Prognostics Approaches 30

3 case study of engine bleed valve 37
  3.1 Background 37
  3.2 Data 40
  3.3 Classical Machine Learning 41
  3.4 Kalman Filtering 45
  3.5 Deep Learning 48

4 case study of a gas turbine engine 57
  4.1 Background 57
  4.2 Data 58
  4.3 Classical Machine Learning 58
  4.4 Kalman Filtering 60
  4.5 Deep Learning 61

5 cmapss 67
  5.1 Background 68
  5.2 Data 70
  5.3 Results 72
    5.3.1 Regime detection and baselining 72
    5.3.2 Noise Filtering 74
    5.3.3 Elbow Detection 75
    5.3.4 Elbow detection and RUL Prediction 76

6 conclusion 77
  6.1 The Importance of Case Studies to Prognostics 78
  6.2 Technical Dimension 78
  6.3 Economic Dimension 79
  6.4 Social Dimension 80

bibliography 83
publication i 93
publication ii 109
publication iii 123
publication iv 141
publication v 157
publication vi 173
publication vii 189
publication viii 217
publication ix 235
publication x 251

SUMMARY

Many processes in aeronautics involve valuable assets such as turbines, engines, and motors. Reducing the cost of repairing or replacing such capital-intensive assets makes predictive maintenance essential. This dissertation explores the concepts of predictive maintenance and prognostics, applying computational data-driven techniques to model complex technical problems. Experiments on three real-world industrial cases from aeronautics reveal the importance of advanced machine learning methods for predicting fault incidence. A wide range of models, from classical machine learning to deep learning, are compared in their ability to estimate and predict important health events such as failure. These developments enable the identification of the variables most influential to the maintenance outcome. The identified risk factors are used in a systematic, reproducible way to gain new insights into the studied applications. In particular, past experience and health monitoring data are found useful for predicting damage trajectories and incipient faults. The experimental results provide evidence of the benefit of combining different sources of data. The results also suggest that data from the aircraft on-board systems, if well exploited in a maintenance environment, could play a significant role in the prediction of condition events. Because the dynamics of engineering systems complicate their analysis, a recursive solution based on Kalman filtering is applied to deal with the temporal dependencies of the predictions. Deep learning methodologies such as recurrent neural networks are also considered to compose dynamic models of degradation behavior that best capture the temporal and causal aspects of physical transformation. Promising results are obtained with these deep learning techniques. Overall, this doctoral work highlights the importance of data-driven approaches in predicting the maintenance needs of complex and valuable technical systems. It also provides a comprehensive comparison of different data-driven techniques, from classical machine learning to deep learning. The proposed solutions are general enough to be applicable to other industrial cases involving the prognostics of system health events.


SUMÁRIO

In the aeronautical industry there are many capital-intensive assets, such as turbines, engines, and highly critical valves. This equipment requires predictive maintenance regimes that allow maintenance actions to be scheduled in a timely manner so as to avoid unnecessary costs. Cost reduction at this level makes predictive maintenance essential to the protection and safety of assets and services. This thesis addresses the topic of predictive maintenance and centers on the use of machine learning methodologies for condition prognostics. The thesis focuses on three case studies from the field of aeronautical maintenance, whose analysis shows the importance of using advanced incident prediction techniques. A broad range of models is explored in order to estimate different maintenance needs. One of the advantages of these methods lies in their ability to identify the critical variables of the predictive practice. Making use of this ability, the identified risk factors are systematically employed in the investigation of several hypotheses. The various experiments show that past experience and condition data are essential in predicting degradation trajectories and failure processes. The experimental results quantitatively demonstrate the advantages of combining different types of data in the same prognostic model. In particular, diagnostic system data, such as the degradation messages generated by the aircraft's central maintenance computer, prove to play a decisive role in the prediction of condition events. The dynamics underlying engineering systems make their analysis complex, and a recursive solution based on Kalman filters is used to linearize the final predictions of the models. Several deep learning techniques, such as recurrent neural networks, are also investigated. These techniques frame prognostics as an exercise in capturing temporal and causal information. In summary, this doctoral work aims to highlight the importance of using data-driven techniques in predicting the maintenance needs of complex, highly critical systems. The contribution of this work lies in the extensive comparison of different data-driven techniques, from the classical machine learning approach to deep learning.


LIST OF PUBLICATIONS

I Baptista, M., Sankararaman, S., de Medeiros, I. P., Nascimento Jr, C., Prendinger, H., & Henriques, E. M. (2017). Forecasting Fault Events for Predictive Maintenance using Data-driven Techniques and ARMA Modeling. Computers & Industrial Engineering. In press.

II Baptista, M., de Medeiros, I. P., Malere, J. P., Nascimento, C., Prendinger, H., & Henriques, E. M. (2017). Comparative case study of life usage and data-driven prognostics techniques using aircraft fault messages. Computers in Industry, 86, 1-14.

III Baptista, M., Henriques, E. M., de Medeiros, I. P., Malere, J. P., Nascimento, C., & Prendinger, H. (2018). Remaining useful life estimation in aeronautics: Combining data-driven and Kalman filtering. Reliability Engineering & System Safety, 1-14. In press.

LIST OF CONFERENCE PRESENTATIONS

I Pereira, S., Baptista, M., Mizioka, L., & Henriques, E. Data-Driven Quality Prognostics for Automated Riveting Processes. In IEEE Aerospace Conference, 2018 (pp. 1-12). IEEE.

II Baptista, M., Nascimento Jr, C. L., Prendinger, H., & Henriques, E. A Case for the Use of Data-driven Methods in Gas Turbine Prognostics. In Annual Conference of the Prognostics and Health Management Society, 2017 (pp. 1-11). PHM Society.

III Baptista, M. L., de Medeiros, I. P., Malere, J. P., Nascimento, C. L., Prendinger, H., & Henriques, E. (2017, March). Aircraft on-condition reliability assessment based on data-intensive analytics. In IEEE Aerospace Conference, 2017 (pp. 1-12). IEEE.

IV Baptista, M., de Medeiros, I. P., Malere, J. P., Prendinger, H., Nascimento Jr, C. L., & Henriques, E. A Comparison of Data-driven Techniques for Engine Bleed Valve Prognostics using Aircraft-derived Fault Messages. In European Conference of the Prognostics and Health Management Society, 2016 (pp. 1-12). PHM Society.

V Baptista, M., de Medeiros, I. P., Malere, J. P., Prendinger, H., Nascimento Jr, C. L., & Henriques, E. Improved Time-Based Maintenance in Aeronautics with Regressive Support Vector Machines. In Annual Conference of the Prognostics and Health Management Society, 2016 (pp. 1-12). PHM Society.

LIST OF FIGURES

Figure 1 Putting prognostics in the loop. 5
Figure 2 Taxonomy of prognostics approaches. 5
Figure 3 Benefits of prognostics for different stakeholders. 8
Figure 4 Bathtub curve. 18
Figure 5 Age reliability patterns. 19
Figure 6 Maintenance strategies of RCM. 20
Figure 7 RCM Decision Logic Tree. 21
Figure 8 Difference between prognostics and diagnostics. 23
Figure 9 Basic steps of prognostics and diagnostics. 26
Figure 10 The concepts of fault and failure. 28
Figure 11 Representation of the prognostics process. 29
Figure 12 Model-based prognostics architecture. 31
Figure 13 Data-driven prognostics architecture. 34
Figure 14 Bleed system schematic. 39
Figure 15 Pictures of engine bleed valves. 40
Figure 16 Histogram of unscheduled removals of engine bleed valve. 41
Figure 17 Mean absolute error for Case-EBV. 44
Figure 18 Mean Absolute Error for Case-EBV with and without KF. 46
Figure 22 Median Absolute Deviation for Case-EBV with and without KF. 48
Figure 23 Predictions vs Actuals for sample removals (Models of type III). 52
Figure 24 Predictions vs Actuals for sample removals (Models of type III) (continued). 53
Figure 25 Predictions vs Actuals for sample removals (Models of type III) (KF vs DL). 54
Figure 26 Predictions vs Actuals for sample removals (Models of type III) (KF vs DL) (continued). 55
Figure 27 Mean Absolute Error (MAE) over time for the models of type IV. 56
Figure 28 Median absolute error over time of machine learning methods. 59
Figure 29 Mean absolute error over time of machine learning methods. 60
Figure 30 Simplified schematics of CMAPSS simulated engine. 71
Figure 31 Regime detection and baselining. 73
Figure 32 Results of denoising using Wavelet neural networks. 75
Figure 33 Example of RUL Estimation for a certain removal. 76
Figure 34 Sample Removal # 542 (Deep Learning). 79
Figure 35 Sample Removal # 444 (Machine Learning and Kalman Filtering). 80

LIST OF TABLES

Table 1 Definitions of prognostics and diagnostics. 25
Table 2 Quantitative results for Case-EBV. 43
Table 3 Quantitative results for Case-EBV (No Filter / Filter). 45
Table 4 Quantitative results for Case-EBV. 50
Table 5 Quantitative results for Case-EBV. 62
Table 6 Quantitative results for Case-EBV. 64
Table 7 Quantitative results for Case-EBV. 65
Table 8 21 measurement variables of CMAPSS datasets. 71
Table 9 Description of the CMAPSS datasets. 72

PREFACE

This dissertation contains selected results from my doctoral studies. The main focus of these studies has been on applying data-driven methods to predict maintenance necessities in aeronautics. I started these studies four years ago as a PhD candidate in the MIT Portugal Program at Instituto Superior Técnico. This thesis is the report of that process. This journey started with the work of a former colleague of mine, Jean-Loup Loyer. His thesis on the use of empirical methods to model the life-cycle costs of jet engines grabbed my attention ever since Professor Relógio Ribeiro first mentioned it. Until then I was mostly interested in predicting human factors and had never thought about the possibility of applying predictive techniques to engineering systems. The work of Jean-Loup raised interest at Embraer Brazil and motivated the proposal of a similar problem related to data-driven approaches in predictive maintenance. Here, I must first thank Embraer Brazil and Professor Elsa Henriques for believing in my abilities to handle the project despite my initially limited knowledge of aeronautics and prognostics.

In the beginning stages of this project, a collaboration with Embraer Brazil was formally established to develop new approaches to maintenance. Concretely, I was challenged with the problem of developing data-driven models to predict the replacements and repairs of a critical engine valve of the aircraft bleed system. During the first two years of the project, I traveled to Brazil twice, spending in total 6 months at Instituto Tecnológico de Aeronáutica in São José dos Campos, in the laboratory of Intelligent Machines headed by Professor Cairo Lúcio Nascimento Júnior. This experience was quite rewarding. As a result of this internship I developed a broader knowledge of the industrial reality of Embraer and gained deeper contact with the field of prognostics. Some of my initial data-driven prognostics models were developed during this period, and several insights were gained into the field of predictive maintenance. The collaboration established at this stage led to two papers at conferences sponsored by the Prognostics and Health Management Society (Publications I and II), and one article in Computers in Industry (Publication III).

Upon my return to Portugal, the work on data-driven models for predictive maintenance continued. Despite the positive initial results in scientific terms, I felt that the expectations regarding industrial application were not yet met. The obtained accuracy was significantly better than that of the classical experience-based models of Weibull analysis, but the models were not able to derive interconnected temporal predictions, and their prediction error still fell short of the high standards of the aeronautics industry. At this time I was facing the major challenge of increasing both the accuracy and the precision of the prognostics models. After months of trial and error, it became clear that the traditional data-driven techniques, based on classical machine learning, were probably not able to capture the exact degradation behavior observed in the data. After a year of fine-tuning and improvements, the developed models started to evolve into more advanced forms of prognostics. An article in Computers and Industrial Engineering was published documenting part of this progress (Publication V). Feature engineering contributed largely to the performance improvements; having more sophisticated predictor variables greatly helped capture the damage progression. The combination of different sources of data was also found to yield very positive results both in terms of precision and accuracy. The results of this combined approach were presented at the Aerospace Conference (Publication IV). In addition to this improvement, a recursive solution based on Kalman filtering was applied to achieve time-dependent predictions. Kalman filtering allowed a better inference of the dynamics of the valve under study. The results of this dynamic approach to data-driven prognostics were presented in Publication VI and were another important contribution of this doctoral work. In 2017 I had the opportunity to travel to Japan, where I started working under the supervision of Professor Helmut Prendinger in the Digital Content and Media Sciences Research Division of the National Institute of Informatics.
During a period of 6 months I was fortunate to be in a dynamic and exciting environment where I was exposed to learning approaches very different from what I had previously been working on. Importantly, I met a team of deep learning enthusiasts who helped me understand the possible benefits and also the limitations of these novel methodologies. After six months of intensive work, industrially relevant results started to appear. Deep learning techniques such as long short-term memory networks showed promising and exciting potential. Two papers were a direct result of this work (Publications VII and VIII).

Back in Portugal, following the positive results of the previous case study, I started working on another real-world maintenance scenario from aeronautics. Here, the goal was not to predict replacement and repair but instead the overhaul of a modern commercial engine. As the data-driven framework developed in Japan was general enough to be applied to different data domains, it was easy to adapt it to the reality of overhaul prediction. This work resulted in the publication of another paper at the 2017 annual conference of the Prognostics and Health Management Society (Publication IX). Initially, I expected this study to have less positive results than the former case. However, the results were very positive. The higher degree of complexity of the system was to a certain extent compensated by the high quality of the available data and by the useful predictors that could be extracted from it.

In the last year of my thesis I was invited to visit the NASA prognostics group in California, USA. This was an enriching experience, as I got to meet new people working in the field and to discuss research. During the presentation of my work, interest was piqued when the deep learning results were shown. In the many discussions I had during my stay, it became clear to me that other recursive techniques, such as the unscented Kalman filter, could be a better solution for the first case study. Further research on deep learning methods was also encouraged. The last three months of my thesis were spent at NASA researching deep learning methods under the supervision of Dr. Shankar Sankararaman and Dr. Kai Goebel. The preliminary results of these studies are also enclosed in this document (Publication X).

The research of the past three years has been quite rewarding. I am a believer that research and industry should be close to each other. The opportunity to work on an applied project was a good fit for me both personally and academically. It allowed me to work with many smart people on a range of different and complex problems. Also, I could travel to many interesting places with diverse cultures and learning possibilities. I cannot thank my country Portugal and Fundação para a Ciência e a Tecnologia (FCT) enough for sponsoring this life-changing journey.

I hope you enjoy your reading,

Márcia Baptista
Lisbon, December 31, 2017


ACKNOWLEDGEMENTS

My curiosity towards the world of computer science started when I accidentally broke my sister's computer by leaving a floppy disk inside it. A first thank you must go to my parents and sister for their understanding of this incident. The unconditional support of my parents and sister for my studies and interests was essential to this thesis. I am thankful for their help and guidance in everything I have accomplished.

No dissertation can be complete without a lot of both technical and emotional support. I was fortunate to have met some of the most intelligent and caring people during this thesis. Diverse perspectives, cultures and wisdom were translated into these doctoral studies. Perhaps the most important contributor to this work is Professor Elsa Maria Pires Henriques. Professor Elsa Henriques has an uncanny ability to immediately understand how to solve analytical problems, and a unique ability to quickly adapt to new industrial realities. She can connect with people on several levels with her reasonableness, integrity, and fairness. She is very special, and her research and teaching abilities express her creative mind, warm heart and feminine strength to encourage and support students and colleagues. She accepted the role of being my main supervisor and never wavered in her support of my work at any of the numerous bumpy points along this research path. No student could ask for a more talented and supportive supervisor. Also, I will be forever in her debt for her true friendship. I owe more of what I am to her than I realized or had a chance to acknowledge.

My gratitude also belongs to my supervisor in Japan, Professor Helmut Prendinger, for his efforts in ensuring that I was always on the right path. Words cannot express how thankful I am for his guidance and support during my thesis and previous research. I can honestly say that I partly owe this doctoral thesis to him, as he was relentless in convincing me to continue with my doctoral studies. Also, his Austrian accent and his trademark sentence "Oh really?" kept me in balance through many situations.

I would also like to give credit to my supervisor in Brazil, Professor Cairo Lúcio Nascimento Júnior, whose knowledge in many fields contributed to improving my problem-solving skills. His enthusiasm and energy were crucial in the first steps of this journey. A personal thank you goes to his patience and encouragement whenever my enthusiasm was waning. His practical suggestions, together with a healthy dose of motivational vigor and energy, greatly helped me face my initial fears and difficulties.

I am also grateful to Dr. Shankar Sankararaman for offering me the opportunity to visit NASA during one week in California and for stepping up to help me in the last months of my thesis. I was received with such a warm welcome by all the members of the prognostics team that I do not know how to thank them enough. A special thank you also goes to Dr. Kai Goebel. His invaluable advice and support were also important.

I would also like to acknowledge Embraer and the many people who made this collaboration possible. First, to Dr. Ricardo Reis from Embraer Portugal for organizing the first contacts with Embraer Brazil and for helping to establish this important bond. While in Brazil I was received with great affability and warm smiles by all the members of the prognostics team at Embraer. I have to note the patience and openness of all staff in helping to explain all the industrial and technological concepts involved in the industrial problem. A special thank you goes to my industrial tutors at Embraer Brazil, Ivo Paixão de Medeiros and João Pedro Pinheiro Malere, for their thoughtful guidance and continuous availability throughout this project. Their counseling has been instrumental in my work. I can still remember the long hours that Professor Elsa Henriques and I spent discussing several technical issues with them over the teleconference line. Our monthly meetings were integral in developing ideas, providing feedback and advancing the research. While in Brazil I was fortunate to have worked in a laboratory that strives for academic excellence. It was a pleasure being part of the restricted group of students of Instituto Tecnológico de Aeronáutica. Without them my moments in Brazil would have been much more dull and probably less successful.
Many thanks to Felipe César and José Brito. It has also been a pleasure working together with all the members of Prendinger's lab, especially my friend Mohammadamin Barekatain.

Not less importantly, I must thank my Professors, secretarial staff and colleagues in Portugal. I would like to thank my colleagues from MIT Portugal with whom I shared classes during my first year of doctoral studies. My gratitude goes to João Reis, João Manaia and Vítor Anes for being the closest of friends during this first year. From Instituto de Engenharia Mecânica (IST), thank you to Professor Inês Esteves Ribeiro and Professor Paulo Peças. A special acknowledgment must go to Professor António Relógio Ribeiro for being one of the most caring and helpful Professors of IST. Also, many thanks to Paula Taborda for helping me arrange my trips and for her cheerful support. A very special thank you goes to Sara Pereira for making this journey more enjoyable and introducing me to the world of data-driven manufacturing.

Financial support from the Fundação para a Ciência e a Tecnologia (scholarship reference SFRH/BD/52347/2013), Instituto de Engenharia Mecânica (IDMEC) (Polo IST) as well as the National Institute of Informatics is appreciated.


ACRONYMS

PHM  Prognostics and Health Management
SVR  Support Vector Regression
NN   Neural Networks
KNN  K-Nearest Neighbors
PCA  Principal Component Analysis
RF   Random Forests
LR   Linear Regression
LU   Life Usage
RNN  Recurrent Neural Network
ML   Machine Learning
DL   Deep Learning
AI   Artificial Intelligence
DD   Data-driven

THESIS

1 INTRODUCTION

1.1 The Science of Prognostics 2
1.2 Prognostics and Predictive Maintenance 3
1.3 Modeling Approaches in Prognostics 4
1.4 The Aircraft System 6
1.5 Problem and Research Question 7
1.6 Contribution 9
1.7 Outline 13

What if we could improve our ability to predict the future? Every time we make a grocery list, choose a route to work or talk about the weather, we are making a prediction about the future. These predictions define almost everything we do, from future plans to daily life. The science of prediction is at the center of society, but even the simplest systems can pose the most difficult forecasting problems - for example, calculating the physics of a bouncing ball is a non-trivial task. While it is possible to predict the ball's direction and velocity, these predictions are probabilistic in nature and are still beset with a great deal of uncertainty. Deterministic systems can be so sensitive to tiny variations in the initial conditions that their observed evolution often appears random. Small causes can and often do have large effects on the behavior of systems, not only in physics but also in other fields. If prediction permeates everyday life, science, with its preoccupation with everything that is empirically observable, is particularly amenable to prediction. The simple act of advancing a hypothesis about the natural world is itself a prediction: any scientific hypothesis yields predictions by deductive or inductive reasoning [27]. Prediction is therefore a central concern of scientific domains. An emerging and particularly important such domain is engineering. Here, the subject of prediction is engineered systems: combinations of components that work in synergy to collectively perform a useful function. Examples of engineered systems include machinery, infrastructure, and electronic equipment. The correct operation of these systems tends to be critical to the sustainability of many organisations and industries. Note that prediction here is related to the term engineered system and not engineering system. The latter term designates complex systems composed of many parts whose management involves social, economical and technical dimensions [16]. Although prediction in engineering is deeply rooted in the field of engineering systems, it can also be applied to any engineered system, even simple ones, such as a simple mechanical valve.

This chapter introduces the topic of this thesis: the science of prognostics and predictive maintenance in engineering. It starts by explaining the science of prognostics in Section 1.1. In Section 1.2 the importance of prognostics to predictive maintenance is discussed. Section 1.3 provides an overview of the major modeling techniques in prognostics. Section 1.4 concerns aeronautics and aircraft systems, the field of application of this work. Section 1.5 describes the problem and defines the research questions. Section 1.6 summarizes the methods used and the conclusions of each part of the thesis. Section 1.7 presents the thesis outline.

1.1 the science of prognostics

In engineering, there is a specific terminology for the development of prediction models. The act of making predictions about engineered systems is called prognostics [25]. Prognostics comes from the Greek feminine noun “gnosis”, which means “knowledge”. “Gnosis” itself is a special type of knowledge: it usually refers to a direct communion with the divine. This association is understandable. The ability to predict the future has always fascinated the human imagination. We are beings intrigued by the things that will or can happen - it is part of our spirit. Pro-gnosis, the science of “knowing ahead”, is important to understand the physical world around us. The goal here is to have systems that are fully automated and reliable, that can perform their operating functions according to design specifications. Ultimately, the idea is to have machines capable of learning from their mistakes, all the while optimizing their operations and maintenance as autonomous asset units. These autonomous systems are expected to have far-reaching consequences for many industry sectors and represent an important opportunity for companies to increase their efficiency and reduce costs. Prognostics has a wide range of drivers, from safety of operations and high performance to mission completion, among others. Prognostics and predictive models are central to engineering, particularly to maintenance decision-making. It is important to be able to predict equipment behavior, namely events related to the condition of the equipment, such as failure. In the past, estimates of condition events were typically at the discretion of professional engineers, based on individual experience and expertise. Estimates of condition events may also be derived via statistics from a large population. As the complexity of the available data increases, the science of prognostics can assist equipment owners and manufacturers in making accurate predictions based on past and present data. Nowadays, the use of these models is likely to increase, as equipment operators and manufacturers seek to improve their decision processes to achieve better performance and reduce overall costs.

1.2 prognostics and predictive maintenance

A field where prognostics plays an important role is maintenance. It is critical to maintain systems in their best shape and avoid downtime that can cause large-scale disruption as well as social and financial consequences. And as modern technologies allow for increasingly complex systems, maintenance techniques and tools necessarily need to move towards prescriptive analytics. Indeed, over the past few years, the need for predictive maintenance has been reinforced by several findings [38, 39, 51, 52] demonstrating that prognostication can reduce the number and impact of failures, extend the time between maintenance intervals, maximize operational performance, and reduce overall life-cycle costs. The use of prognostics in maintenance is described as predictive maintenance. Predictive maintenance, also known as condition-based maintenance, is critical to minimize system failure and to increase reliability and availability. The main promise of this type of maintenance is the optimal scheduling of corrective maintenance, and the prevention of unexpected equipment failure or malfunction. The goal of predictive maintenance is to make engineered systems more reliable and predictable through the continuous monitoring of performance in real time and the estimation of future maintenance needs. Predictive maintenance works by fitting a network of sensors directly onto machines or engineering structures to measure condition signals such as the load on motors, oil quality, vibration readings and so on. Technologies such as thermal imaging, sensor technologies and sonic and ultrasonic analysis make it possible to detect signs of damage or degradation and to plan repairs and replacements ahead of time, before actual failure or other problems occur. All of these signals are automatically processed by a prognostics model that is responsible for generating estimates of the next time to failure/inspection.


To better understand the science of prognostics in maintenance it is important to consider its fundamental assumptions [88]:

• The connections between the condition of engineered systems, aging, and longevity are manifold and various;
• Engineered systems are affected not only by the passage of time but also by usage factors and by environmental conditions;
• System aging and degradation is a monotonic and possibly non-linear process that manifests itself in several ways (there are some exceptions to the monotonicity assumption, for example batteries, semiconductors, etc.);
• There are early signs of aging that can be detected before the end of life of the system (i.e., before loss of function);
• It is possible to model the wear process of the equipment;
• The performance of prognostics improves as more data become available. In other words, the uncertainty of prediction can be reduced as time goes by and more information is available.

Without these assumptions, the discipline of predictive maintenance could not exist. These notions take a pivotal role in maintenance, as without them the behavior of engineered systems would be in many respects chaotic and unsafe. Fundamentally, it is important to acknowledge that there is not always a direct connection between aging and the condition of systems, but that it is possible to model the process of degradation and estimate its outcome. If these assumptions are met, it is possible to perform prognostics. Fig. 1 illustrates how prognostics fits within the maintenance loop. The figure shows the flow of information between prognostics and decision planning. The flow starts with the engineered system sending information to the prognostics model. In between, some kind of data pre-processing may occur. The prognostics module is then responsible for generating predictions that are used in maintenance planning, either by humans alone or by humans aided by computers.

1.3 modeling approaches in prognostics

The close relationship between maintenance and prognostics has been studied by many researchers and has resulted in the development of several models. Generally, prognostics can be realized using two methods, namely, model-based and data-driven methods (see Fig. 2). The distinction between these two modeling approaches is not always clear-cut. The widespread assumption is that model-based


Figure 1 – Putting prognostics in the loop. Prognostics is integral to predictive maintenance. In this context, the main function of prognostics is to generate predictions from condition-monitoring data to help decide when to perform the next maintenance action. Some predictive maintenance loops include a pre-processing stage that converts the condition monitoring data to knowledge representations (features).

Figure 2 – Taxonomy of prognostics approaches. There are fundamentally two approaches to prognostics: model-based and data-driven modeling. These are competing approaches, but in practice can be combined, which is often known as hybrid prognostics.

approaches refer to models derived from first principles (e.g., physics-based) and data-driven approaches refer to models learned from data. Despite the ubiquity of these definitions, these notions are ambiguous, as both the data-driven and the model-based approaches use models of some kind, and both are driven by some kind of data. In practice, all prognostics models are in fact developed from a combination of system knowledge and system data. Actually, despite being competing approaches, it is possible to combine data-driven and model-based approaches, which is known as hybrid prognostics [53]. The key to understanding the difference between data-driven and model-based methodologies lies in the meaning of data. In model-based approaches, data by themselves have no specific meaning and are only given meaning when put into context by a well-formed theoretical model. Here, data are used to design the initial theory or model, which slowly and conditionally adapts to new data. The model is the central aspect, being used to predict when the underlying theory should be altered. If the data happen to not fit the model, instead of questioning the model, one may, at least initially, question the data. The importance that data-driven approaches give to data is the opposite: the data-driven approach finds value in collecting as much data as possible in order to build a model grounded on empirical evidence. This view puts data at the center of attention, as the element primarily responsible


for model development and improvement. The understanding is that data in and of themselves can support the theory/model and that interpretation improves mostly with data. In data-driven approaches, there is no such strong notion of theory, as the model is grounded mostly on data. A large number of studies have been devoted to the prognostics of system damage and failure. Some of the most important reviews in the field include the works in [31, 38, 39, 45, 76, 77, 92]. There has been intensive research in both model-based and data-driven methods, while hybrid systems remain a maturing field. This dissertation is concerned with data-driven methods for engineered systems. The choice of data-driven techniques can be explained by three main reasons. First, because engineered systems are often characterized by a complex and dynamic structure, the development of model-based methods for the predictive maintenance of these systems can represent an arduous task. Second, complex and critical systems are usually equipped with sensor devices that record and collect large volumes of Condition Monitoring (CM) data. Given the growing availability of data, it makes sense to use a methodology focused on empirically driven models. Third, data-driven techniques have seen a great deal of improvement over the last few years. The breakthroughs in machine learning and artificial intelligence, such as deep learning and reinforcement learning, have significantly raised the bar in terms of the accuracy of several pattern recognition tasks, such as object detection or activity recognition [49]. In prediction specifically, several advances have been made. For instance, it is now possible to predict traffic flow using approaches driven by big data [55]. In biology, deep learning has helped predict the structure of proteins [83]. Prediction driven by data is expected to be a promising endeavor.
Supported by these reasons, data-driven modeling techniques take a central role in this work.

1.4 the aircraft system

Few engineered systems challenge the field of prognostics in their design and implementation as much as aircraft systems. The commercial aircraft, with its several interacting subsystems, such as the propulsion, avionics, bleed, and flight control subsystems, is a complex system. As a complex engineered system, the aircraft poses a challenging problem [16]. The underlying technical problem is difficult, as the aircraft is a system composed of several parts whose degradation and faults can affect other components in the aircraft. Multiple fault symptoms may be generated at the same time, leading to the system sensors reporting anomalous or faulty behaviors. Combining this information in a way that leads to accurate and long-term


fault prognostics is a hard task. The social and economic aspects of aircraft systems are also far-reaching. For instance, commercial aviation is a fast-growing sector associated with numerous social and economic interactions, expected benefits, and a wide range of environmental consequences. These tensions make the aircraft system a challenging problem from a socio-economic perspective.

1.5 problem and research question

To improve asset performance at reduced cost, strategies like predictive maintenance have evolved with the passage of time. In recent years, a considerable number of data-driven prognostics models have been proposed [76, 77, 87]. In aeronautics, research in data-driven prognostics is also extensive [31, 66]. However, prognostics has not yet attained a high technology readiness level [75]. Building accurate and computationally efficient prognostics is a challenging task [19, 31, 32, 84]. The question that this thesis aims to address concerns the improvement of prognostics performance in challenging industrial problems. In concrete terms, the specific goal is to increase both the accuracy and the precision of data-driven prediction methods in complex maintenance problems of engineered systems:

RQ1: How to improve the performance of prognostics in complex maintenance problems in aeronautics?

To answer this question, hypothesis H1 is proposed:

H1: Adopting a data-driven approach to prognostics has a direct and positive impact on the maintenance of complex systems.

This hypothesis drives the entire research project. Focus is put on exploring both classical and more advanced data-driven methods in several use cases. The obtained scientific and industrial results will show a clear improvement in the accuracy and precision of the predictions made. With this work the aim is to make a contribution not only to aeronautics but also to the general field of prognostics. This doctoral thesis is framed within the doctoral program of Engineering Design and Advanced Manufacturing (EDAM) / Leaders for Technical Industries (LTI) supported by the MIT Portugal program. This program is focused on the development of scientific research and of the socio-economic context of Portugal. Within this context, the present work deals with problems that have social, economic and technical dimensions.
To address these perspectives, three case studies are presented and used to explore the different methodologies proposed in this thesis. Two of these cases are real-world industrial problems. The Turbofan dataset CMAPSS from the NASA repository is also used to test the models.


Figure 3 – Benefits of prognostics for different stakeholders. Prognostics has a role not only in maintenance but also in other social, economic and technical aspects.

The CMAPSS dataset [74] has been extensively studied [66] and is a reference for the evaluation of data-driven prognostics algorithms. The first two cases are important to investigate the industrial relevance of the work and to test the applicability of the results to real-world problems. The use of a benchmark dataset is important to validate the technical results. A brief description of each case study follows:

• Case-EBV: this real-world case study involves field data describing the unscheduled replacements and repairs of a critical component of the air management system of the aircraft.
• Case-Engine: this case study describes the analysis of a real-world dataset of scheduled overhauls of a modern aircraft engine.
• Case-CMAPSS: this case explores the benchmark dataset CMAPSS from the NASA repository (Turbofan failures).

The method and the results of the three case studies have a range of implications, as shown in Fig. 3. The figure shows the relevance of prognostics to different areas. Concretely, the key dimensions include maintenance, engineering, customer satisfaction and logistics. In regards to the dimension of maintenance itself, prognostics can lead to better scheduling of maintenance actions, an increase in system uptime (or at least an increase in the system's available time), and a reduction in the number of unnecessary interventions. These effects influence not only maintenance but also other relevant areas. They have an impact on the engineering process, as the increased availability and reliability of the system can help validate the design of the product. Also, it is possible to gain insight into


the design requirements of new products by using prognostics algorithms, such as random forests or principal component analysis, to identify the underlying factors behind the process of damage and failure. Better maintenance processes also affect the customers of the final system. To meet customer expectations it is necessary to continuously monitor and improve processes to enhance product reliability. For instance, in aeronautics, situations of airplane on ground (AOG), i.e., situations where the aircraft cannot fly due to the non-availability of a critical component, can be highly detrimental to the image of the airline operating the aircraft. An AOG can also cost an airline hundreds of thousands of dollars in lost revenue, per aircraft, per day. In the dimension of logistics, better prediction of unexpected events leads to a reduced inventory of spare parts and to a smaller carbon footprint on the environment. Given the importance of the described dimensions, these perspectives are considered and discussed throughout this thesis.

1.6 contribution

As the availability of sensor data increases and technology gets more sophisticated, several data-driven methods are emerging, opening space for new developments in predictive maintenance. However, most of the studies in the field explore only one technique and do not examine the similarities and differences of the estimators [26]. This lack of comparative works might be explained by the difficulty of excelling in a wide range of data-driven methodologies. Machine learning techniques differ, and each requires a specific mindset; to understand a technique it is necessary to invest time in its unique mechanics. Also, some authors that explore different techniques opt for ensemble methods rather than the comparative research method [35, 64]. Nevertheless, comparative studies are needed to identify the strengths and limitations of each method, and the comparison of different methods is important to elucidate the most effective methods for the application in question. This doctoral work addresses this issue by carrying out a comprehensive analysis of data-driven methods. Throughout this thesis, regardless of the case study and research question, the comparison of different techniques is always an implicit goal. As a baseline, this dissertation focuses on classical machine learning. In particular, the focus is on how data-driven methods based on classical machine learning can be used to improve predictive maintenance. The contribution of classical machine learning to the field of maintenance has been shown in numerous works, including [35, 64, 82, 94], to name a few. Methods employed include neural networks [30], random forests [6], support vector regression [18, 91], and instance-based methods [2]. The main results of this effort suggest that classical machine learning is well suited for prognostics. Concretely, it was found that:

• The data-driven approach can outperform traditional scheduling methods in maintenance.
– Run-to-failure data can be better captured by classical machine learning approaches than by traditional reliability methods (Weibull analysis [1]). The same is true for condition monitoring data and data from the diagnostics systems on board the aircraft;
– The combination of different sources of data yields better prognostics than the singular use of each one;
– Different techniques show different accuracy and precision according to the case under study.

The studied data-driven methods produced good results. However, while the initial results in the first case (Case-EBV) with classical data-driven methods were promising in terms of scientific results, they were not fully satisfactory from an industrial point of view. The target goal of 10 days (mean absolute error) was not achieved in the first experiments. The second and third cases (Case-Engine and Case-CMAPSS) yielded more reasonable outcomes at this stage. To address the performance issues of the classical machine learning methods, a recursive technique based on linear Kalman filtering was selected [57]. In all case studies the level of noise in the predictions was considerable. This follows from the way classical machine learning operates when dealing with temporal data. In traditional methods such as random forests, support vector regression or neural networks, there is no notion of temporal dependence between the predictions: the equipment is not modeled as a dynamic system. Instead, each prediction is derived with no knowledge of the previous calculations. In other words, predictions are point estimates disconnected in time.
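To make the recursive denoising idea concrete, the following is a minimal sketch of a one-dimensional linear Kalman filter smoothing a sequence of noisy RUL (remaining useful life) point estimates. This is an illustration only, not the filter configuration used in the case studies: the process model (RUL dropping one unit per cycle) and the noise variances `q` and `r` are assumed values chosen for readability.

```python
# Minimal 1-D linear Kalman filter that smooths noisy RUL point
# estimates produced by a classical machine learning model.
# Assumptions (illustrative, not tuned values from any case study):
#   - process model: RUL decreases by one unit per cycle
#   - q: process noise variance, r: measurement noise variance

def kalman_smooth_rul(noisy_rul, q=0.01, r=4.0):
    """Return Kalman-filtered RUL estimates for a list of noisy ones."""
    x = noisy_rul[0]          # initial state: first model prediction
    p = r                     # initial state variance (simple heuristic)
    smoothed = [x]
    for z in noisy_rul[1:]:
        # Predict: RUL is expected to drop by 1 each cycle.
        x_pred = x - 1.0
        p_pred = p + q
        # Update: blend the prediction with the new measurement z.
        k = p_pred / (p_pred + r)       # Kalman gain
        x = x_pred + k * (z - x_pred)
        p = (1.0 - k) * p_pred
        smoothed.append(x)
    return smoothed

# Hypothetical noisy predictions from a regression model, in cycles:
raw = [50.0, 51.0, 46.0, 49.0, 44.0, 46.0, 42.0]
print(kalman_smooth_rul(raw))
```

The filtered sequence varies less from cycle to cycle than the raw predictions, which is precisely the precision improvement the recursive solution targets; in practice `q` and `r` would have to be set per application, which is also the initialization weakness discussed below.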
Recursive solutions are desirable for online state filtering problems as they can improve performance by denoising the end estimates of the data-driven models. In particular, the Kalman filter is a simple and linear solution that has already been applied with success in prognostics [4, 64]. In this work this solution is used to address the noise of the predictions in order to increase the performance of classical data-driven methods. The main results of these studies suggest that combining Kalman filtering with classical machine learning yields positive results. Secondary findings include:


• The use of linear Kalman filtering can help improve the performance of data-driven prognostics methods.
– Kalman filtering improved the precision of predictions over time;
– Models, regardless of the source of data (condition monitoring, data from diagnostics systems or a combination of the two), can benefit from Kalman filtering;
– The model that resulted from the combination of all data sources yielded the best results.

The first finding is especially important in industrial terms: the ability to provide a clear vision of the predictions is of fundamental importance. The second finding meets the expectation that different techniques have distinct ways of dealing with data. The third finding validates the use of different input data sources in the same model. Despite the promise of classical machine learning, these methods need to be further improved to better cope with the scale, diversity and distributed nature of condition monitoring data [41, 78]. The volume and temporal nature of these data are already exposing the limitations of traditional approaches to prognostics [58]. Some of the major challenges include dealing with distributed requirements, heterogeneous sources of data, and learning from high-resolution temporal data [41, 78]. Existing aviation decision-support systems are expected to evolve towards the effective handling of big data in maintenance operations, as the engineers who monitor snapshot data from aircraft need more advanced statistical tools to reliably analyze full flight data. The Kalman filtering framework presented positive results in these terms. However, this recursive solution has three major disadvantages. First, the initial conditions to start the recursive process are not defined automatically. The filter is designed in such a way that its initial configurations are often determined by a trial-and-error approach.
Kalman filter performance is thus constrained by the expertise of the model designer, which greatly impacts the prognostics. Second, Kalman filtering is a recursive technique, which means that it is assumed that parameter variations have a stationary distribution. However, and especially for degradation behavior, health monitoring parameter changes can be highly non-stationary, i.e., there are periods where the parameters change frequently and other periods in which the parameters are constant. To deal with the previous issues it is important to have a variable estimator memory. A class of data-driven techniques that addresses this issue is recurrent neural networks. These networks work by keeping a continuous, latent state that persists. This latent state is advantageous to exploit the information provided by the


preceding processing steps. This forms a kind of short- and long-term memory that helps to make sense of a single estimate in a time series of predictions. Put differently, the memory of the recurrent neural network provides a context for the calculations. The promising results obtained from applying recurrent neural networks to prognostics demonstrated the power of these models to capture the meaningful temporal information found in sensor data. The recurrent methods explored here include the long short-term memory (LSTM) [34], gated recurrent unit (GRU) [11, 12] and the simple recurrent neural network [33]. The main results of these studies suggest that deep learning is a promising alternative to classical machine learning. Secondary results include:

• Recurrent neural networks (deep learning) can outperform classical data-driven methods.
– Recurrent neural networks with more layers (deep learning) perform better than recurrent neural networks with a single layer;
– The best performing recurrent neural network architecture (simple recurrent network, LSTM or GRU) depends on the industrial case.

Given the positive results with the recurrent neural networks, the final part of this thesis focuses on the development of additional deep learning methods for prognostics. A wide range of methods are used and studied with application to the different stages of prognostics development. Here the main findings are that neural networks and deep learning are a powerful solution for most prognostics problems. Secondary findings include:

• Deep learning methods can be applied with success to the problems of the several stages of prognostics:
– Self-organizing maps are useful for regime clustering;
– Multi-layer perceptron networks can successfully perform data baselining;
– Wavelet neural networks, radial-basis function networks and supervised multi-layer perceptrons can successfully denoise health monitoring signals;
– Recurrent neural networks are useful for detecting the change from linear to exponential degradation.

To sum up, the results of this dissertation represent a broad range of research on different aspects of prognostics and predictive maintenance, with substantial contributions being made both to improve existing methods and to propose novel techniques. The main contributions of this work are the following:

• Comprehensive comparison of classical data-driven methods for prediction in maintenance;
• Proposal of two data-driven models to forecast maintenance events;
• Comparison of classical data-driven methods with traditional Weibull analysis;
• Proposal and comparison of models using different data sources, namely run-to-failure data, condition monitoring data and data from the on-board diagnostics systems of an aircraft;
• Development of a recursive solution to denoise and filter the predictions of data-driven models;
• Proposal of a recurrent neural network architecture to predict maintenance needs;
• Proposal of several deep learning solutions for the different stages of prognostics.

These contributions aim to fill a number of gaps in prognostics and in predictive maintenance, by addressing a broad range of industrial topics, many of which are yet to be fully developed.
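The persistent latent state that gives recurrent networks their memory, discussed above, can be illustrated with a single LSTM cell written in plain NumPy. This is a didactic sketch under stated assumptions - the weights are random and the dimensions arbitrary - and it is not the recurrent architecture proposed in this thesis; the gate equations follow the standard LSTM formulation.

```python
# One LSTM cell processing a sequence of (hypothetical) sensor
# readings. The hidden state h and cell state c persist across time
# steps, providing the "context" that classical point estimators lack.
# Weights here are random (untrained) purely for illustration.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4n, d), U: (4n, n), b: (4n,)."""
    n = h.shape[0]
    z = W @ x + U @ h + b            # pre-activations of all four gates
    i = sigmoid(z[0:n])              # input gate
    f = sigmoid(z[n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])      # output gate
    g = np.tanh(z[3 * n:4 * n])      # candidate cell state
    c_new = f * c + i * g            # cell state: long-term memory
    h_new = o * np.tanh(c_new)       # hidden state: short-term output
    return h_new, c_new

rng = np.random.default_rng(0)
d, n, T = 3, 8, 20                   # 3 sensors, 8 hidden units, 20 cycles
W = rng.normal(scale=0.5, size=(4 * n, d))
U = rng.normal(scale=0.5, size=(4 * n, n))
b = np.zeros(4 * n)

h, c = np.zeros(n), np.zeros(n)
for _ in range(T):                   # feed a sequence of sensor readings
    x = rng.normal(size=d)
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)                       # latent state summarizing the history
```

In a trained prognostics model, a regression layer on top of `h` would map this history-aware state to an RUL estimate, which is what makes recurrent estimates temporally connected rather than disconnected point predictions.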

1.7 outline

The remainder of this manuscript first describes the background of this thesis in Chapter 2. The following chapters focus on each of the three case studies. The results of these chapters may differ from the published results, as improvements have been made since the publication of the papers. Nevertheless, the reader is invited to consult the appended publications to gain more insight into the developed work. Chapter 6 discusses and concludes the thesis.


2 BACKGROUND

2.1 Prognostics and Health Management
2.2 Historical Evolution of Prognostics
2.3 The Relationship between Diagnostics and Prognostics
2.4 Basic Concepts of Prognostics
2.5 Prognostics Approaches

To support the execution of predictive operations, maintenance relies on the disciplines of diagnostics and prognostics. Diagnostics relates to methods for detecting and/or measuring faults or failures in engineered systems. Prognostics, on the other hand, deals with the prediction of health condition and remaining useful life based on the present state and previous history, as well as on the anticipated future load. The main difference between diagnostics and prognostics lies in the timing: diagnostics happens after a fault or failure, while prognostics is an a priori activity. Although sharing the same principles, diagnostics focuses on identifying the nature or cause of damage and failure, while prognostics involves a prediction of how failure will evolve. Diagnostics and prognostics are integrated in the framework of Prognostics and Health Management (PHM), a set of methodologies and tools that enables the assessment and prediction of the reliability of systems. PHM comprises monitoring technologies, data collection methods, and advanced statistical analysis of environmental and operational parameters.

As the value and the complexity of engineered systems increase, performing PHM efficiently and effectively is becoming a complex challenge. Within PHM, prognostics is one of the most difficult problems to deal with. In this chapter a review of the main contributions to the field is provided. The chapter starts with a brief description of how prognostics integrates within the framework of PHM in Section 2.1. A brief historical review of prognostics is provided in Section 2.2. The relationship between diagnostics and prognostics is discussed in Section 2.3.


Some of the basic concepts of prognostics are illustrated in Section 2.4. The two main approaches to prognostics are discussed in Section 2.5.

2.1 prognostics and health management

Prognostics and Health Management (PHM) is an enabling technology that provides solutions for monitoring and managing the health of engineered systems. This discipline is recent but has already dramatically changed the nature of maintenance in a broad range of applications [52, 79] across fields including aircraft, electronics, computers, manufacturing, civil and military infrastructure, and bio-implantable systems. Concretely, PHM has been formally applied to aeronautics and the military ever since the beginning of the Joint Strike Fighter (JSF) program in 2001 [24]. This military aircraft program was one of the most complex and important efforts of the 21st century to date. A central theme of the JSF program was the development of autonomic logistics with minimal human intervention and a sophisticated PHM system. The PHM system was responsible for detecting and predicting incipient faults ahead of time in order to prevent their progression to system failure. The monitoring and predictive capabilities of the F-35 Joint Strike Fighter were central to enabling a more effective health management of assets. The JSF program was fundamental in laying out the main ideas of PHM:

1. the use of sensor technologies is important to obtain reliable condition monitoring information, and
2. advanced algorithms for diagnostics and prognostics are key for health management.

Several companies have started applying PHM concepts in aeronautics. For instance, Rolls-Royce [9], Pratt & Whitney [21] and General Electric [59] have been using PHM for the health management of their engines. Also, aircraft manufacturers such as Embraer Brazil [56] and Airbus [29] are developing their own PHM systems. Beyond the field of aeronautics, many other areas have adopted PHM methodologies. For example, in electronics, companies such as Compaq, Cisco, Dell and HP are investing in PHM solutions to address the increasing demands for more reliable and affordable systems.
Advances in electronics are expected to have a considerable impact on various fields, from automotive to military. Here, the major PHM challenge is the management of the soft faults and intermittent failures that characterize the operation of electronic devices. In the computer field specifically,


PHM is expected to improve reliability and availability both for critical applications and for overall operation. PHM has also played, and will continue to play, a role in Industry 4.0 [50]. The manufacturing industry has seen a major revolution in maintenance, with a natural evolution from preventive to predictive maintenance that has led to significant improvements in maintenance and logistics, reducing downtime and enhancing product quality. Besides the more developed fields of aeronautics, electronics, and manufacturing, other domains, such as the smart grid and infrastructure management, are taking steps to implement technologies for real-time condition monitoring, diagnostics and prognostics. These technologies are expected to play a major role in the reliability and safety of valuable assets such as wind turbines, bridges, and civil and military buildings. Further into the future, ultra-high reliability systems, such as implantable biological systems, will require PHM to be sustainable.

2.2 historical evolution of prognostics

Prognostics is becoming increasingly important in maintenance policies, as it is now one of the central aspects of reliability centered maintenance (RCM). The idea behind RCM was first developed in the context of the airline industry to optimize maintenance procedures and minimize related costs. This effort was made possible, in great part, by United Airlines, which, throughout the 1960s, invested in a complete review of its preventive maintenance strategies and of ways to improve them [80]. The assessment was led by a core group at United, headed by Bill Mentzer, Tom Matteson, Stan Nowlan, and Harold Heap [80]. The result of this organization-wide collaboration was a novel framework for maintenance that soon became popular and quickly revolutionized the airline industry [80]. The findings of this pioneering work were documented in 1978 in “Reliability-Centered Maintenance”, by Nowlan and Heap, a book that became one of the most important works in reliability and laid out the principles of modern maintenance [61]. From the airline industry, RCM would soon be adopted in other areas, such as the military and the utility (electric power generation) industry [80], and more recently manufacturing [65] and transportation [23]. Perhaps the most influential finding of Nowlan and Heap’s study [61] was the evidence against the supposed direct relationship between age and failure rate. Their findings contradicted the long-held assumption about the effectiveness of time-based maintenance. Concretely, the study showed that only a small percentage (4%) of equipment actually replicated the behavior of the bathtub curve, one of


Figure 4 – Bathtub curve. The bathtub curve describes the traditional product life cycle in terms of reliability. At the beginning of a component’s life, various defects can pass initial screening and lead to failure in a short time. Then, random failures can occur until the components start to age out.

the most prevalent concepts of reliability theory. The bathtub curve is a common model to describe the failure rate of a population over time. The model postulates that the failure rate of most products follows a curve resembling the cross-section of a bathtub, with three distinct regions: (1) infant mortality, (2) random failure and (3) wear out (see Fig. 4). In addition to the finding that the bathtub curve was not always the most appropriate reliability model, the study of Nowlan and Heap [61] also identified other important degradation curves. Fig. 5 illustrates the different types of age reliability curves and their corresponding probabilities. As shown, the majority of the equipment (68%) followed scheme F, with an infant mortality period followed by a constant or slowly increasing failure rate. Scheme F, scheme D (increasing followed by constant failure rate) and scheme E (constant failure rate) together accounted for 89% of the equipment. Due to their patterns, this considerable proportion of equipment needed a more sophisticated approach to maintenance than the simple setting of an operating age limit; only 11% of the equipment showed a degradation pattern that could fit that strategy. These findings were quite surprising at the time, and subsequent studies conducted by the US Department of Defense and nuclear utilities confirmed Nowlan and Heap’s work [80]. Until the release of Nowlan and Heap’s book, time-based maintenance was the most advanced maintenance strategy available. Sometimes also called preventive [60], experience-based or hard-time maintenance [80], this kind of strategy consists of scheduling a routine plan of activities based on usage (e.g. airplane flights) or time measures (e.g. calendar time). By definition, a time-based maintenance plan is based on the analysis of the past history of maintenance


Figure 5 – Age reliability patterns for non-structural aircraft equipment (adapted from [61]).

activities or run-to-failure data. Some authors argue that time-based maintenance already uses some form of prognostics (prediction) to derive its interval decisions [20, 76, 77, 86]. However, time-based maintenance exhibits two characteristics that make it unsuitable for prognostics: first, it is very unlikely that the time-based interval of a system is updated over its lifetime; second, there is no concept of prediction involved, but rather one of scheduling. The problem of time-based maintenance is its lack of precision: interventions can occur either too late or too early. Unscheduled interventions are especially serious in safety-critical industries, such as aeronautics or nuclear energy, where the cost of failure is extremely high. It was essentially to help solve these challenges that RCM was created. RCM was born out of the need for a structured process that, based on real-time data about system performance, could advise on the most effective approach to maintenance. Note that, even though closely related, RCM is not the same as predictive maintenance or PHM: it is rather an umbrella term for a variety of methods, tools, and principles, all with the purpose of improving maintenance. Please refer to [60] for a comprehensive description of the framework and its use. RCM is an ongoing process that aims to identify the most adequate maintenance strategy at each moment in time. It seeks the optimal


Figure 6 – Maintenance strategies of reliability centered maintenance (RCM) (adapted from [60]). RCM seeks to optimize the use of corrective, time-based, predictive and proactive maintenance. Corrective maintenance is applied when there are redundant mechanisms in place and the asset is non-critical. Time-based maintenance is used for aging items, while predictive maintenance is for items that exhibit a random failure pattern with regard to age. More proactive forms of maintenance act at the product design level.

combination of (1) corrective, (2) time-based, (3) predictive and (4) proactive maintenance, in increasing order of complexity. The optimal mix of the various types of maintenance should be driven by the system’s risk tolerance and budget. Fig. 6 illustrates the main strategies of RCM, with the principal features of each strategy shown below each block.

The simplest strategy, corrective maintenance, also known as reactive, repair, fix-when-fail, or run-to-failure maintenance, occurs only when the equipment fails. In this approach there is no preceding prognostics activity: if the item fails and there are no spare parts, operation delays can occur. There is also no way to influence when the failure occurs, as there is no failure prevention. This kind of maintenance is characterized by a high percentage of unscheduled maintenance actions and a large inventory of spare parts. Examples of aeronautics components where corrective maintenance is applicable include non-critical assets, such as comfort cooling, coffee machines and water heaters, or items with some kind of redundancy mechanism.

Time-based maintenance, also known as preventive or interval-based maintenance, is a more complex form of maintenance. It also does not involve any prognostics activity, as interventions are based solely on regularly scheduled inspections/replacements. As predictive maintenance becomes more accurate, time-based maintenance is expected to be increasingly abandoned in its favor. Nevertheless, for highly critical assets, the time-based strategy, when applied in combination with predictive maintenance, can help reduce the frequency and seriousness of unscheduled failures.

Predictive maintenance, or condition-based maintenance, is the third maintenance generation. Based on condition assessment, this strategy uses prognostics to avoid unscheduled maintenance events and to plan replacements/repairs in advance.
Proactive maintenance consists of collecting data on the results achieved and feeding this data back to improve design and future maintenance. Failure Modes and Effects Analysis (FMEA), Failure Modes and Effects Criticality Analysis (FMECA) and Root-Cause Failure Analysis (RCFA) are among the analytic tools used by this approach.
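As a concrete illustration of one such tool, FMEA commonly ranks failure modes by a Risk Priority Number (RPN), the product of severity, occurrence and detection scores on the conventional 1–10 scales. The failure modes and scores in the sketch below are invented examples, not data from any cited study.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number used in FMEA: each factor is scored on a
    1-10 scale, with higher values meaning worse (more severe, more
    frequent, harder to detect)."""
    for score in (severity, occurrence, detection):
        assert 1 <= score <= 10, "FMEA scores are conventionally 1-10"
    return severity * occurrence * detection

# Invented failure modes for a hypothetical pump:
failure_modes = {
    "seal leakage":    rpn(severity=7, occurrence=4, detection=3),
    "bearing wear":    rpn(severity=8, occurrence=3, detection=6),
    "impeller damage": rpn(severity=9, occurrence=2, detection=5),
}

# Rank the modes so the highest-risk one is addressed first.
ranked = sorted(failure_modes, key=failure_modes.get, reverse=True)
```

In this toy ranking, "bearing wear" (RPN 144) would be prioritized over the other modes, which is exactly the kind of design-level feedback proactive maintenance relies on.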


Figure 7 – Reliability Centered Maintenance (RCM) Decision Logic Tree (adapted from [60]).

Fig. 7 depicts the iterative RCM process used for the majority of NASA facilities and collateral equipment [60]. The decision tree shown is general enough to be applied to other domains. The chart shows how, in RCM, the maintenance strategies, rather than being applied independently, build on top of each other to take advantage of their relative strengths while avoiding their weakest points. Corrective maintenance is applied when there is no need for complex strategies; non-critical or redundant items usually fall into this category. Preventive maintenance can also be used in combination with predictive maintenance to ensure a high level of safety and reliability. Predictive maintenance is at the center of RCM, often aided by proactive maintenance to improve reliability and minimize costs. Predictive maintenance is hence a key enabler of RCM. In this work, the focus is on predictive maintenance modeling and the related prognostics activities. The goal is to understand and improve the existing prognostics approaches to predictive maintenance, thereby advancing research both in prognostics and in RCM.
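The decision logic of Fig. 7 can be paraphrased as a short rule cascade. The function below is a loose, simplified sketch of that logic, not a transcription of the NASA chart; the predicates (criticality, redundancy, monitorability, age-related failure pattern) and the returned strategy names are illustrative choices.

```python
def rcm_strategy(critical, redundant, condition_monitorable, age_related):
    """Simplified RCM decision logic (loosely inspired by [60]).

    Returns a suggested maintenance strategy for an item based on a
    cascade of yes/no questions, from simplest to most complex."""
    if not critical or redundant:
        # Run-to-failure is acceptable for non-critical or redundant items.
        return "corrective"
    if condition_monitorable:
        # Condition data is available: predict failures and plan ahead.
        return "predictive"
    if age_related:
        # Failures correlate with age: schedule interval-based actions.
        return "time-based"
    # No usable condition data or age pattern: act at the design level.
    return "proactive"

# A redundant, non-critical item (e.g. a coffee machine) is simply
# fixed when it fails:
strategy = rcm_strategy(critical=False, redundant=True,
                        condition_monitorable=False, age_related=False)
```

The ordering of the rules mirrors the idea in the text that the strategies build on each other: cheap options are ruled out first, and predictive maintenance sits at the center for critical, monitorable assets.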

2.3 the relationship between diagnostics and prognostics

Diagnostics and prognostics are two important activities of PHM. However, there appears to be no consensus on the boundaries between the two fields [79]. This topic has been thoroughly discussed in the literature, with authors usually


opting for different definitions [79]. This lack of agreement can be explained in part by the fact that PHM has been applied to many different domains. Over time, each field of study has developed its own terminology, leading to a multitude of sometimes conflicting definitions. The object of study of PHM may also differ from domain to domain. In electronics, for instance, interest is usually in the event of discharge [92]. In infrastructure management, the major concern is the degree of fault [46]. In the more traditional fields of PHM, such as aeronautics or transportation, interest is in predicting the end-of-life (EOL) of the equipment, that is, the point where the system no longer meets its standard specifications [14]. In manufacturing, the goal is usually to predict when a machine will perform incorrectly, such as when a measurement/task is incorrect [28].

To avoid confusion, it is important to propose a definition of prognostics/diagnostics general enough to apply to any field. Tab. 1 reviews some of the most important definitions of prognostics and diagnostics. As shown, the definitions usually refer to faults/final failures. The most encompassing definition is the one in [25], which states that any deviation from nominal operational performance can be subject to prediction/diagnostics. Here, the term nominal refers to a “normal” or “expected” condition. However, the use of the term nominal condition can be misleading, as this term is commonly used to designate an expressed value, by opposition to an exact real value. For instance, to say that a car battery has a nominal voltage of 12 V means that the battery voltage fluctuates around 12 V. In this thesis, it is proposed that any health event of an engineered system, either simple or complex, can be prognosticated/diagnosed. Here, a health event is defined as any event that influences the health condition of the system.
Regarding the actual definitions of prognostics and diagnostics, it is important to understand the source of misunderstanding by examining Tab. 1. From an analysis of the proposed definitions, it is possible to identify some points of agreement and disagreement. Common assumptions include: 1) prognostics involves some form of prediction, 2) diagnostics and prognostics are closely related, and 3) both prognostics and diagnostics relate to a health phenomenon of the system. Most authors agree that diagnostics consists of fault detection and isolation. However, in [25] it is considered that prognostics is also responsible for tracking the evolution of the fault. This lack of agreement follows from the fact that the distinction between the two fields is vague, and authors often practice in both fields during the course of their research. Some authors go so far as to consider that prognostics capabilities are an evolution of diagnostics [37]. For instance, it is stated in [8] that the diagnostic state of the system should be extended with prognostics capabilities. An even more encompassing definition is presented in [25].

Figure 8 – Difference between prognostics and diagnostics. The goal of diagnostics is an in-depth exploration of the causes of a health event at the present moment, while the goal of prognostics is to predict the event ahead of time. Time is critical to distinguish these two activities: prognostics concerns the future, while diagnostics begins after an event/need has been detected.

According to the authors, prognostics also consists in the detection of the source of the problem. This conception conflicts with the traditional definition of diagnostics, in which diagnostics is separated from prognostics. In this work, prognostics is defined as the science of making predictions about the health condition of engineered systems. As in [25], it is considered that prognostics can encompass the detection and isolation of failure, but in the future. While diagnostics happens in the present, prognostics foresees the future, attempting to predict the timing of the event of interest and related information, such as which component(s) will fail and to what extent. The main activities of prognostics are the same as those of diagnostics (detection, isolation and identification), but they involve a prediction/forecast. In diagnostics, the same activities happen at the present moment: when it is necessary to diagnose a fault/event, the detection of the problem occurs without prediction. In other words, diagnostics provides a retrospective analysis of the current problem, while prognostics anticipates a future one. It is also important to note that diagnostics can happen at any moment, and not necessarily after prognostics, as many authors contend. For instance, it may be necessary to perform a diagnostic of the current situation as a preventive measure, despite prognostics signaling the problematic event much later. Fig. 8 illustrates the distinction between prognostics and diagnostics. As shown, prognostics encompasses a period of time: a full prognostics predicts not only the timing of the next event but also what happens between the current moment and the event, such as the degradation pathway of the equipment. Prognostics, like diagnostics, can happen at any moment.

To sum up, the following definition of prognostics is proposed: prognostics concerns the prediction, isolation and identification of a future health event of an engineered system. The definition of diagnostics follows: diagnostics concerns the detection, isolation and identification of the current condition of an engineered system. Note that these definitions concern engineered systems, and not engineering systems as some authors contend (Tab. 1). The latter term is used to define complex systems composed of many parts whose operation and management involve social, economical and technical dimensions [16]. Although PHM is deeply rooted in the field of engineering systems, the proposed definitions are general enough to be applied to any engineered system, even simple ones, such as in an Internet of Things (IoT) scenario [93].

In Fig. 8, two important events are identified: the EOL of the equipment and the event of failure. Note the difference between the two: EOL is concerned with the time when the equipment no longer operates according to the minimum standards, while failure refers to the point in time where the equipment condition reaches zero. Theoretically, it is desirable that a PHM system is able to predict/diagnose the EOL correctly before the final failure. In this work, the event of interest is the EOL, but this may coincide with the actual failure. For the sake of clarity, the term fault/failure will be used extensively to denote the event of interest throughout this thesis.

Some authors, such as [77], argue that diagnostics is dependent on prognostics and that the opposite is also true. The view of this work is somewhat different. Diagnostics should happen (but not only) at the moment where prognostics has detected a problem. Here, the desirable outcome is that the predictions of prognostics are confirmed by the detection mechanisms of diagnostics.
Also, the number of diagnostics interventions (preventive actions) before the prognosticated event should converge to zero as prognostics capabilities improve. In this work, and in contrast with other definitions, there is no notion of diagnostics preceding prognostics. It is argued that diagnostics should happen at the moment when the event is predicted to happen, although other diagnostics interventions may be necessary. As diagnostics and prognostics share the same activities, and both use health monitoring data from the present and the past, it is often easy to develop a diagnostics model from a prognostics model. In a certain sense, prognostics is a more developed form of diagnostics that includes some extra activities, such as the estimation


Table 1 – Definitions of prognostics and diagnostics.

[8] – “(diagnostics) employ several fault detection and diagnostic technologies ranging from simple thresholding to rule-based algorithms. However, these technologies have not specifically focused on the ability to predict the future condition (prognostics) of a machine based on the current diagnostic state of the machinery and its available operating and failure history data. An advanced prognostic capability is desired because the ability to forecast this future condition enables a higher level of condition-based maintenance for optimally managing total Life Cycle Costs (LCC).”

[7] – “Fault detection / diagnostic reasoning... determines if an engine component has moved away (degraded) from 100% along a known path... to a point where engine performance may be compromised... Prognosis is the assessment of the engine’s current health and a prediction of the engine’s future health.”

[5] – “Diagnostics has traditionally been defined as the ability to detect and sometimes isolate a faulted component and/or failure condition. Prognostics build upon the diagnostic assessment and are defined here as the capability to predict the progression of this fault condition to component failure and estimate the remaining useful life (RUL).”

[32] – “Essentially, prognostics provides the predictive part of... a comprehensive health management system and so complements the diagnostic capabilities, that detect, isolate and quantify the fault, and prognostics in turn is dependent on the quality of the diagnostics system... what distinguishes prognostics from diagnostics is the provision of a lead-time or warning time to the useful life or failure, and that this time window is far enough ahead for the appropriate action to be taken... In summary, predictive prognostics is that part of the overall PHM capability that provides a prediction of the lead-time to a fault / failure event in sufficient time for it to be acted on.”

[77] – “We define diagnostics to include fault isolation and fault identification, so that full diagnostics requires determining the specific fault mode, rather than just reporting which sensor has an unusual value. We define prognostics to be detecting the precursors of a failure, and predicting how much time remains before a likely failure. Prognostics is the most difficult of these tasks. One must be able to detect faults before one can diagnose them. Similarly, one must be able to diagnose faults before one can perform prognostics.”

[52] – “Diagnostics can be summarized as the process of identifying and determining the relationship between cause and effect in that its function is to isolate faults and identify failure root causes. Prognostics can be interpreted as the process of health assessment and prediction, which includes detecting incipient failure and predicting RUL.”

[36] – Prognostics is “analysis of the symptoms of faults to predict future condition and residual life within design parameters”. Diagnostics is “the examination of symptoms and syndromes to determine the nature of faults or failures (kind, situation, extent)”.

[25] – “Prognostics is the science of making predictions of engineering systems. It is part of a suite of techniques that determine whether a system is behaving within nominal operational performance and – if it does not – that determine what is wrong and how long it will take until the system no longer fulfills certain functional requirements.”


Figure 9 – Basic steps of prognostics and diagnostics. The first step of prognostics is to predict the health event, such as end of charge, fault or failure. The following steps are the isolation and identification of the future problem. The final step is to estimate the confidence of the predictions. Diagnostics can happen at any time and it involves the detection of the current condition of the equipment. It also determines which component is failing and due to which cause(s).

of the time to failure and uncertainty management of the predictions. Fig. 9 shows the basic steps of both paradigms. As shown, diagnostics and prognostics have related but different architectures. The main activities of prognostics are (1) event prediction, (2) fault isolation and identification, and (3) uncertainty management. The first activity, event prediction, tries to answer the question of how the health process is expected to progress. For example, in a battery application, the question tends to be how and when the discharge is going to happen. For other engineered systems, such as an aircraft, the question is usually when the end-of-life will be reached and how the degradation process is going to evolve. The steps of fault isolation and identification try to answer the questions of which component will be most affected, to what extent, and what the origin of the problem will be. The activity of uncertainty management attempts to measure the uncertainty associated with the prediction(s). As it is not possible to provide exact predictions, it is important to estimate the degree of confidence. Diagnostics is restricted to the activities of fault detection, isolation and identification, as it does not involve prediction. Currently, as more sophisticated prediction technologies become available, research efforts are focusing on prognostics [52]. Following this trend, this work


focuses on prognostics in order to limit its scope. Diagnostics is not considered in depth.

2.4 basic concepts of prognostics

In this work, prognostics is defined as the prediction, isolation and identification of a future health event of an engineered system. Here, health event means any phenomenon related to the condition of the equipment/system. Traditionally, events of interest include failures or faults. Inconsistency in the definition of these and other terms has led to varying interpretations of the same concepts. For instance, in [85] a failure is defined as “an internal action in the system generated by a malfunction in one or more subsystems that try to take it to an undesired state”. In contrast, failure is defined in [40] as “the abnormal condition [of an equipment/system] whose performance has degraded such that it is functionally unacceptable”. These authors include failure in the broader class of faults, which are defined as “known (or previously analyzed) and repeatable abnormal condition(s)”. An anomaly is defined as an “unknown (i.e., not seen before, not fully understood, or intermittent) abnormal condition(s)”. The most common definition of failure is provided in [17], where failure is defined as “when a component, machine, or a process fails and everything stops”. This lack of agreement on terminology presents a major challenge to the standardization of the field. One of the most important attempts to provide standard definitions in PHM is ISO 13372:2012 [36]. This work follows this standard for the definitions of failure, fault and anomaly. According to [36], a failure is a deviation of the service provided by a system from the correct service. A complete failure of the main capability of a machine is defined as a catastrophic failure. Since a complete failure may not imply a catastrophe (e.g. power discharge), the term complete/final/terminal failure is preferred here to designate such failures.
ISO 13372:2012 defines fault as a degradation or abnormal behavior (a deviation from the standard condition, i.e. an anomaly), which may, but does not necessarily, lead to failure. The concepts of fault and failure are closely related: a fault can exist without resulting in a failure, but there cannot be a failure without a fault. Fig. 10 illustrates the relationship between these concepts. As shown, there can be an anomaly or degradation in the system, i.e. a fault, without the deviation/degradation being critical enough to result in a failure. Conversely, a fault is a precursor of a failure, and therefore a failure cannot exist without a preceding fault. There are two ways to prevent and mitigate faults: 1) fault avoidance, where the goal is to avoid the existence of faults during any phase of the operational lifecycle


Figure 10 – The concepts of fault and failure. A fault is an abnormal condition, i.e. an anomaly, or a degradation, which may lead to a failure.

and 2) fault tolerance, which consists in finding ways to continue to perform a required function in the presence of faults. It is also important to emphasize that faults can be caused both by an anomaly, i.e. a deviation from normal operation [36], and by a degradation process. An anomaly can manifest itself in a variety of ways, corresponding to the failure modes of the system. It is possible to identify two major types of fault/failure modes: 1) service faults/failures, when the service provided by the system deviates from the expected one, and 2) timing faults/failures, when the delivery time of the service is not correct. An important concept in prognostics that relates directly to the concepts of fault and failure is the end-of-life (EOL). This concept refers to the time by which the system no longer fulfills at least one of its requirements, leading to failure. Formally, an anomaly occurs when the performance of the system lies outside some desired region of acceptable behavior; outside this region, there is a failure. Following the notation of [15], the desired performance is expressed through a set of $c$ constraints,

$$C = \{C_i\}_{i=1}^{c}, \quad \text{where } C_i \in \mathbb{B} = \{0, 1\}  \tag{2.1}$$

where $C_i = 1$ if the constraint $C_i$ is satisfied and $C_i = 0$ otherwise. These individual constraints may be combined into a threshold function $T_{EOL}$:

$$T_{EOL}(t) = \begin{cases} 1, & \text{if } 0 \in \{C_i(t)\}_{i=1}^{c} \\ 0, & \text{otherwise} \end{cases}  \tag{2.2}$$

where, if $T_{EOL} = 1$, an anomaly is detected and a failure occurs.
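Taking Eq. 2.2 literally, the threshold function can be evaluated in a few lines of code. The two constraint checks below (a hypothetical temperature limit and efficiency floor) are invented placeholders used only to exercise the formalism.

```python
def t_eol(constraints):
    """Threshold function T_EOL of Eq. 2.2: returns 1 if at least one
    constraint C_i is violated (C_i = 0), and 0 otherwise."""
    return 1 if 0 in constraints else 0

def evaluate_constraints(temperature, efficiency):
    """Illustrative constraint set {C_1, C_2}; the limits are made up."""
    c1 = 1 if temperature < 120.0 else 0   # C_1: temperature below a limit
    c2 = 1 if efficiency > 0.6 else 0      # C_2: efficiency above a floor
    return [c1, c2]

# Nominal operation: no constraint violated, so T_EOL = 0 (no failure).
nominal = t_eol(evaluate_constraints(temperature=90.0, efficiency=0.8))
# Over-temperature: C_1 = 0, so T_EOL = 1 and a failure is flagged.
overheated = t_eol(evaluate_constraints(temperature=130.0, efficiency=0.8))
```

The Boolean encoding matches Eq. 2.1: each constraint maps the monitored quantities to an element of B = {0, 1}, and the threshold function only asks whether any constraint has been violated.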


Figure 11 – Representation of the prognostics process. In prognostics the goal is usually to calculate the end-of-life (EOL) of the equipment, i.e. the point where at least one requirement is not fulfilled. At any point between t0 and terminal failure it is possible to predict the EOL and the associated remaining useful life (RUL). These predictions may or may not correspond to the true EOL and RUL.

Using $T_{EOL}$, it is possible to formally define the EOL as the earliest time at which $T_{EOL}$ is met:

$$EOL(t_P) \triangleq \inf\{t \in \mathbb{R} : t \geq t_P \wedge T_{EOL}(t) = 1\}  \tag{2.3}$$

where $t_P$ is a given point in time. Note that reaching the EOL does not necessarily mean complete failure: the system might reach the EOL, i.e. no longer work as expected, without complete failure (when all constraints are violated) occurring. The failure might introduce some noticeable degradation of the ability of the system to perform, but it does not necessarily imply a terminal failure, i.e. a ceasing of operation. A concept related to the EOL is the remaining useful life (RUL), defined in terms of the EOL as

$$RUL(t_P) \triangleq EOL(t_P) - t_P  \tag{2.4}$$

To provide probabilistic estimates of the EOL and RUL, the following distributions are defined:

$$p(EOL_{t_P} \mid y_{0:t_P}) \in [0, 1] \quad \text{and} \quad p(RUL_{t_P} \mid y_{0:t_P}) \in [0, 1]  \tag{2.5}$$

where $t_P$ represents a given prediction point.

Fig. 11 illustrates the typical process of prognostics. After a failure, or when the system operates for the first time, the equipment condition is at its relative maximum (point $t_0$). From there on, the degradation path follows a stochastic process such as the one depicted in Fig. 11. The equipment condition deteriorates until it reaches the point of functional failure (EOL). Along the degradation curve, it is possible to predict the EOL point and the RUL of the equipment. It is expected that, as degradation progresses, it becomes easier to predict the EOL and RUL, since more information about the evolution of the failure becomes available.
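The process in Fig. 11 and the quantities of Eqs. 2.3–2.5 can be illustrated with a toy Monte Carlo simulation: a hypothetical health index degrades linearly with random-walk noise until it crosses a failure threshold, and the spread of the simulated EOL samples approximates the distributions of Eq. 2.5. All parameter values here (initial condition, degradation rate, noise level, threshold) are invented for illustration only.

```python
import random

def simulate_eol(t_p=0, health0=1.0, rate=0.01, noise=0.005,
                 threshold=0.2, max_t=10_000, rng=None):
    """Propagate a noisy degradation path from prediction time t_p and
    return the first time the health index crosses the EOL threshold,
    i.e. one sampled realization of EOL(t_P) (Eq. 2.3)."""
    rng = rng or random.Random()
    health, t = health0, t_p
    while health > threshold and t < max_t:
        health -= rate + rng.gauss(0.0, noise)
        t += 1
    return t

def rul_samples(t_p=0, n=500, seed=42):
    """Monte Carlo approximation of the RUL distribution (Eqs. 2.4-2.5):
    RUL(t_P) = EOL(t_P) - t_P for each simulated trajectory."""
    rng = random.Random(seed)
    return [simulate_eol(t_p=t_p, rng=rng) - t_p for _ in range(n)]

ruls = rul_samples()
mean_rul = sum(ruls) / len(ruls)
# With these invented parameters the health index must lose about 0.8
# at roughly 0.01 per step, so the mean RUL is on the order of 80 steps.
```

The histogram of `ruls` is the empirical counterpart of $p(RUL_{t_P} \mid y_{0:t_P})$: as degradation progresses (larger `t_p` with a lower observed `health0`), the distribution tightens, mirroring the remark above that predictions improve as more failure information becomes available.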

2.5 prognostics approaches

Mathematical models are an important tool for prognostics. They take different forms: some models are deterministic, others stochastic; some are discrete, others continuous; some follow parameterized dynamical laws, while others describe exact algebraic relationships between important variables. Because natural phenomena are complex and any physical observation is subject to random observational errors, it is unrealistic to expect a mathematical model to perfectly reproduce the real world. Models are imperfect by definition: they only shed light on a fraction of the total picture, and simplifying assumptions are necessary to reproduce and predict the likely behavior of a physical system. Prognostics model development proceeds from a preliminary conceptual model through formalization into a mathematical model. It is an iterative process with adaptive refinement as more data becomes available and assumptions are developed, reviewed and approved. If the model fits the data well and the assumptions are adequately met, the resulting estimates can, in theory, provide an accurate description of the system’s behavior. Prognostics models fall into two general classes, statistical (empirical) or mechanistic (deductive), or they may be a combination of the two [48]. Empirical models focus on describing the data, with the specification of very few a priori assumptions about the data being analyzed. An example of an empirical model is Peck’s temperature-humidity relationship [63]. The model empirically determines an acceleration formula that provides a direct correlation of the effects of temperature and humidity on epoxy over-molded electronic components. On the other hand, mechanistic models attempt to incorporate knowledge of the system into the model, while describing the available data. In such models, mathematical equations govern the evolution of natural processes in space and time.
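Peck's relationship, mentioned above as the empirical example, is commonly written as an acceleration factor between an accelerated-test condition and a use condition, AF = (RH_s/RH_u)^n exp[(E_a/k)(1/T_u - 1/T_s)]. The sketch below uses typical literature values for the humidity exponent n and the activation energy E_a; they are illustrative placeholders, not the values fitted in [63], and in practice both must be estimated from test data.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def peck_acceleration_factor(rh_stress, temp_stress_k, rh_use, temp_use_k,
                             n=2.7, ea_ev=0.79):
    """Peck-style temperature-humidity acceleration factor:

        AF = (RH_s / RH_u)^n * exp((Ea / k) * (1/T_u - 1/T_s))

    Humidities are in %RH, temperatures in kelvin. The defaults for
    n and ea_ev are illustrative placeholders, not fitted values."""
    humidity_term = (rh_stress / rh_use) ** n
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV)
                            * (1.0 / temp_use_k - 1.0 / temp_stress_k))
    return humidity_term * thermal_term

# A standard 85 degC / 85% RH accelerated test versus field use at
# 25 degC / 50% RH: with these placeholder parameters the stress
# condition ages the part several hundred times faster than field use.
af = peck_acceleration_factor(85.0, 358.15, 50.0, 298.15)
```

This illustrates the point made in the text: the model is purely descriptive, a formula fitted to observed failure data, with no mechanistic claim about why temperature and humidity accelerate the degradation.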
They provide a source of general knowledge that can be used to improve the understanding of systems. In recent years both empirical and mechanistic techniques have been proposed in the field of prognostics [52] in diverse areas from


Figure 12 – Model-based prognostics architecture. Model-based prognostics is generally divided into two sequential problems: a joint state-parameter estimation problem, in which, using the model, the health of a system or component is determined based on the observations; and a prediction problem, in which, using the model, the state-parameter distribution is simulated forward in time to compute end of life and remaining useful life. The first problem is typically solved using a filter solution.

mechanical to electrical and civil engineering. In this work, a concise review of mainstream models in prognostics is provided, covering past research on data-driven (empirical) and model-based (deductive) approaches. To begin, the general architecture of each approach is presented. In the most widely used model-based architecture, the system is described using a general nonlinear model:

ẋ(t) = f(t, x(t), θ(t), u(t), v(t))    (2.6)

y(t) = h(t, x(t), θ(t), u(t), n(t))    (2.7)

where x(t) ∈ R^nx is the nx-dimensional state vector, and θ(t) ∈ R^nθ is the nθ-dimensional vector of model damage parameters, whose true values are unknown and possibly time-varying. Here, u(t) ∈ R^nu is the input vector, v(t) ∈ R^nv is the process noise vector, and f is the state equation that propagates the state at time t to t + 1. y(t) ∈ R^ny is the output vector, n(t) ∈ R^nn is the measurement noise vector, and h is the output equation. In this architecture, the solution to the prognostics problem takes the perspective of joint state-parameter estimation. In discrete time t, the state x(t) and parameters θ(t) are estimated, and these estimates are used to predict EOL and RUL at desired time points. The prognostics architecture is shown in Fig. 12. The system is provided with inputs u(t) and provides measured outputs y(t). Prognostics may begin at t = 0, with the damage estimation module determining estimates of the states and unknown parameters, represented as a probability distribution p(x(tP), θ(tP) | y0:tP) given the discrete sequence of observations up to time tP, denoted y0:tP. The prediction module uses the joint state-parameter distribution, along with hypothesized future inputs, to compute EOL and RUL as probability


background

distributions p(EOLtP | y0:tP) and p(RULtP | y0:tP) at given prediction times tP. For a more detailed explanation of a model-based framework, see [42]. Model-based approaches rely on a physical damage model, which is combined with measured data to predict the future EOL and RUL. The behavior of the physical model depends on wear parameters that are estimated and updated together with the state vector based on the measured data. The RUL is predicted by propagating the state vector until it reaches a threshold. A central aspect of model-based approaches is the validation of the physics model describing the behavior of damage. There is some literature on the general topic, such as hypothesis testing and Bayesian methods. Classical hypothesis testing is a standard technique that uses t-tests to compare the sample mean with the population mean for a known variance. Bayesian methodologies use Bayes' theorem to estimate prior probabilities of model acceptance. Dealing with a case study of mine hunting at sea (a model for the Dutch navy), an important work in model validation is [47]. Here, the author emphasizes that validation is a continuing process (the environment keeps changing, so the model must be updated and revalidated) and points to the difficulty of assuring that the simulated environment matches the real environment as the major challenge in validation. Sargent is another author who has made important contributions [71–73] to the study of how to validate and improve simulation models. In the automotive industry, the authors in [67] test various statistical methodologies for model validation, based on classical hypothesis testing, Bayesian statistics and principal component analysis (pattern recognition techniques). A fatigue life prediction model for composite materials and a residual stress prediction model for a spot-weld joint in an automotive structure are validated using the proposed methodologies.
The same authors also propose [68] a Bayesian methodology to calibrate and improve simulation models with experimental data. Despite these efforts and the importance of model validation, this topic is rarely discussed thoroughly in prognostics. This can be explained by the growing complexity of physics models and hence the increasing difficulty of performing model validation. Still, this topic is bound to assume greater importance in the coming years. Model parameter estimation is another central issue in model-based prognostics. There are several estimation algorithms, such as Bayesian inference, the Kalman filter [44], the particle filter [89], and the extended Kalman filter [43]. Here, observations are used to estimate and update unknown parameters in the form of a probability density function. This updated distribution is called the posterior distribution, which is obtained by multiplying the prior distribution, i.e. the prior


knowledge of the unknown parameters, and the likelihood function, that is, the probability distribution function value of the measured data conditional on the given parameters. There are several studies concerning parameter estimation in prognostics. The simplest form of parameter estimation is the linear Kalman filter. For example, in [81], a linear Kalman filter based on a reduced-order electrochemical model is used to estimate a battery's internal electrochemical state from external current and voltage measurements. Linear Kalman filters are also particularly popular in engine health estimation [3, 54]. Despite its simplicity, this recursive solution has two main limitations: 1) it assumes the system has linear behavior, and 2) it assumes all of its distributions, i.e., the measurement noise, the process disturbance and the main state variable, are Gaussian. These two restrictions make the linear Kalman filter a mathematically elegant and computationally efficient solution; however, they also restrict its use to a small subset of linear problems with Gaussian distributions. A variant of the Kalman filter which solves some of the problems of the linear Gaussian filter is the extended Kalman filter, which addresses the linearity assumption by linearizing a nonlinear function about its first-order Taylor series expansion. Several authors rely on extended Kalman filter solutions in their model-based approaches. The particle filter is another recursive solution, which overcomes both limitations of the Kalman filter. This filtering method makes no assumptions about the dynamics of the state space or the density function: the state-space model can take any nonlinear form, and the initial state and noise do not need to follow a Gaussian distribution. Given these desirable properties, the particle filter is one of the most common approaches to model-based prognostics.
Important contributions to the field include the work in [62, 70, 95]. There are, however, limitations to the model-based approach which make it unsuitable in many situations. Developing a high-fidelity model can be very costly, time-consuming and computationally intensive. Further, there are situations in which, due to the complexity of the actual system, it is not possible to develop a physics model at all. Even when such a model is developed, this expensive investment tends to be system specific and to have low re-usability. In view of these limitations, data-driven modeling is often a more suitable approach for prognostics, for several reasons. First, data-driven models are flexible, as it is possible to adapt them to new applications without much effort. Also, the low cost of algorithm development and the little or no expert knowledge required tend to make this approach the preferable choice.
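The model-based pipeline of Fig. 12 can be illustrated with a minimal bootstrap particle filter. The sketch below is a toy example under assumed dynamics (a linear degradation model with an unknown wear rate θ), not any model from the cited works: the filter jointly estimates the damage state x(t) and θ from noisy observations y(t), then simulates the particles forward to a failure threshold to obtain an RUL distribution. All numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic "true" system used to generate measurements y(t) ---
true_theta, threshold = 0.02, 1.0          # wear rate and functional-failure level
sigma_v, sigma_n = 0.002, 0.05             # process and measurement noise
x_true = np.cumsum(true_theta + sigma_v * rng.standard_normal(30))
y = x_true + sigma_n * rng.standard_normal(30)

# --- joint state-parameter estimation (filtering stage) ---
n_p = 2000
x = np.zeros(n_p)                          # state particles x(t)
theta = rng.uniform(0.0, 0.05, n_p)        # parameter particles theta(t)
for yt in y:
    x = x + theta + sigma_v * rng.standard_normal(n_p)   # propagate via f(.)
    w = np.exp(-0.5 * ((yt - x) / sigma_n) ** 2) + 1e-12 # likelihood from h(.)
    w /= w.sum()
    idx = rng.choice(n_p, size=n_p, p=w)                 # resample by weight
    x, theta = x[idx], theta[idx]

# --- prediction stage: propagate each particle to the threshold ---
steps = np.zeros(n_p)
xf = x.copy()
for k in range(1, 500):
    xf = xf + theta + sigma_v * rng.standard_normal(n_p)
    newly = (steps == 0) & (xf >= threshold)
    steps[newly] = k                       # first-crossing time = particle RUL
    if np.all(steps > 0):
        break

print("median RUL (steps):", np.median(steps[steps > 0]))
```

The spread of `steps` approximates the RUL distribution p(RUL | y0:tP); a real application would replace the toy state equation with a validated physics-of-failure model.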


Figure 13 – Data-driven prognostics architecture. Data-driven prognostics is generally divided into two sequential stages: a training stage, in which a model (or hypothesis) is constructed from a training dataset using a machine learning algorithm; and a prediction stage, in which, using the model and based on a new dataset of observations, predictions of the RUL and EOL are made.

Fig. 13 depicts the general architecture of the data-driven approach. The overall concept is simple: a model is learned from a set of data, known as the training data. This model, also called a hypothesis, is then used to make new predictions of the RUL and EOL based on a new set of data. The distinguishing characteristic between the two is that the training dataset contains information about the RUL/EOL of each observation, while in the new data this information does not exist, as it is what must be predicted. There are two ways to do RUL estimation using the data-driven approach: 1) the direct approach or 2) the indirect approach. In the first, the model of system behavior directly calculates the remaining useful life; in the second, the model is used to estimate the system degradation, and the RUL is estimated as the point where the degradation curve intersects a failure threshold. The use of the first approach generally depends on the availability of run-to-failure data, i.e., time series of actual failure times, which are not available for many systems in statistically meaningful quantities, despite an abundance of normal data (which are useless for prognostics). Without this information it is necessary to extrapolate the actual failure times using indirect techniques such as the threshold technique. Data-driven prognostics also faces several challenges. Because this approach deals with multivariate and noisy data, it requires the use of robust algorithms. Missing data and outliers are important problems in this approach. These kinds of models are often perceived as black boxes whose inner workings are too complex and abstract to be fully understood. As a result, the outputs may not be entirely intuitive, and understanding the system may require additional effort. Further, these methods often deal with large datasets, which makes prognostics a computationally challenging task. Well-designed algorithms and suitable resources can help solve this problem, but this characteristic of data-driven models remains a challenge nonetheless. Also importantly, finding the right balance between over-fitting and over-generalization while training these models is often a major challenge, as it is not always easy to calibrate them.
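The direct approach can be sketched end to end. The example below is illustrative only: synthetic run-to-failure histories are generated (the two "sensors" and their trends are invented for the example), every observation is labelled with its remaining useful life, and a regressor maps sensor features straight to RUL, following the training/prediction split of Fig. 13.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def make_history(life):
    """One synthetic run-to-failure history: features X and RUL labels."""
    t = np.arange(life)
    sensor1 = 0.8 * t / life + 0.05 * rng.standard_normal(life)      # drifting signal
    sensor2 = 0.3 * (t / life) ** 2 + 0.05 * rng.standard_normal(life)
    X = np.column_stack([sensor1, sensor2])
    rul = life - 1 - t                     # label: time left until failure
    return X, rul

# Training set: run-to-failure data, where every row carries its RUL label.
train = [make_history(int(rng.integers(80, 120))) for _ in range(20)]
X_train = np.vstack([h[0] for h in train])
y_train = np.concatenate([h[1] for h in train])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Prediction stage: a new, unseen unit with no RUL labels available.
X_new, y_new = make_history(100)
mae = np.mean(np.abs(model.predict(X_new) - y_new))
print(f"MAE on new unit: {mae:.1f} time units")
```

When run-to-failure labels are unavailable, the indirect variant would instead regress a degradation index and intersect it with a failure threshold, as described above.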


3 CASE STUDY OF ENGINE BLEED VALVE

3.1 Background . . . 37
3.2 Data . . . 40
3.3 Classical Machine Learning . . . 41
3.4 Kalman Filtering . . . 45
3.5 Deep Learning . . . 48

This work builds on three cases to increase knowledge of prognostics and predictive maintenance in aeronautics. This chapter provides an introduction to the case that deals with the unscheduled removals (actual failures) of a critical valve of the aircraft engine. This work has been done in collaboration with the Brazilian aircraft manufacturer Embraer and is based on real data. This chapter documents the most important results of this case. It starts with the industrial background in Section 3.1, briefly explains the data and pre-processing steps in Section 3.2, and documents the major findings in Sections 3.3 to 3.5.

3.1

background

The environmental control system is the aircraft system responsible for maintaining a comfortable environment for a given payload, either goods, living matter, or people. This system manages the temperature, pressure and air flow of the aircraft pressure vessel which includes the cockpit, cabin and aircraft interior compartments. Despite the variety of airplanes, the basic design of environmental control systems is remarkably similar [13]. In simplified terms, in these systems, air is first compressed to high pressure and temperature and then conditioned, before delivery to the cabin and cockpit to maintain a comfortable environment, i.e. to keep temperature, pressure, and composition, within acceptable limits.


case study of engine bleed valve

An important subsystem of the environmental control system is the bleed air system. This system, also called pneumatic system, is in charge of the air supply from the compressor stage to the downstream systems. The bleed system supplies the air required by the downstream consumers while regulating the pressure and temperature of the air from the engines. Bleed air is supplied in flight by the engines or the auxiliary power unit. The bleed system that is studied in this work is similar to that of an E135/E145 depicted in Fig. 14. As shown, the bleed system is a complex network of valves and other components that regulate flow between the engine and the aircraft.


Figure 14 – Bleed system schematic. The bleed system of an Embraer E135/E145 (Source: www.embraer.com).


(a) Rolls-Royce Engine Bleed Valve

(b) Pratt & Whitney JT8D Aircraft Engine Bleed Valve

Figure 15 – Pictures of engine bleed valves. Bleed valves control the transfer of air between the engine and the remaining aircraft compartments (Source: www.ebay.com).

One of the most important and critical components of the bleed system is the engine bleed valve (EBV), a small valve that, although not directly part of the core engine, is strictly necessary to sustain its operation. The EBV controls the air supply from the engine to the aircraft air conditioning system. Fig. 15 displays two real bleed valves, one from Rolls-Royce and another from Pratt & Whitney.

3.2

data

In this study, data from 588 removals of EBVs are considered. Here, a removal is a maintenance procedure in which the original unit is removed and replaced by a new or repaired unit. The studied removals are, in most cases, unscheduled removals, meaning they were unexpected events that led to emergency interventions. Given the nature of the removals, we can consider that the majority of the studied data correspond to actual valve failures. Fig. 16 shows a histogram of the original data. As shown, there is a large dispersion of time between removals: after a removal, a valve can last between 0 and 1134 days. The Medcouple rule [90] was used to remove outliers from the original data. Fig. 16 also shows kernel density estimates [69] and a Weibull fit on the cleaned data; as shown in the figure, the data are well fitted by the Weibull distribution. More details about the data pre-processing are given in Publications I and III. In addition to the removal data, there are health monitoring signals of the engine bleed system, a multivariate time series of around 100 GB.

Figure 16 – Histogram of unscheduled removals of engine bleed valve (EBV). The engine bleed valve in study shows a large dispersion of time between removals, ranging from 0 to 1134 days. This wide range of failure times makes the case a challenging problem.

The signals are transmitted in digital format from the aircraft to the ground station every second. There is also a dataset of asynchronous fault events generated by the maintenance central computer of the aircraft. Here, a fault event is generated whenever a component of the bleed system is not operating according to recommended guidelines. Fault events represent mild or moderate deviations from normal behavior, but do not necessarily indicate failure or the need for an immediate repair. Each fault event is characterized by an aircraft ID, an event datetime and a fault code. This work explores 90 codes that relate to the entire bleed system. Due to confidentiality issues, the meaning of each code, in physical terms, was not disclosed.
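The pre-processing step above (outlier removal followed by a Weibull fit) can be sketched on synthetic removal intervals. This is not the thesis data: the intervals are drawn from an assumed Weibull distribution, and a plain Tukey boxplot fence is used for brevity where the thesis applies the Medcouple-adjusted rule.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for 588 time-between-removal intervals (days).
days = stats.weibull_min.rvs(1.3, scale=300, size=588, random_state=7)

# Outlier screening with a plain Tukey fence (the thesis uses the Medcouple rule).
q1, q3 = np.percentile(days, [25, 75])
iqr = q3 - q1
clean = days[(days >= q1 - 1.5 * iqr) & (days <= q3 + 1.5 * iqr)]

# Fit a two-parameter Weibull to the cleaned intervals (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(clean, floc=0)
print(f"Weibull shape={shape:.2f}, scale={scale:.0f} days")
```

A shape parameter above 1 would indicate wear-out behavior (increasing failure rate), which is the regime where prognostics is most useful.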

3.3

classical machine learning

This section presents the results of applying traditional machine learning algorithms to different data sources on the Case-EBV1. Concretely, five machine learning approaches are tested: Linear Regression (LR), K-Nearest Neighbors (KNN), Neural Networks (NN), Support Vector Regression (SVR) and Random Forests (RF). Four different classes of data are tested. Models of type I use only maintenance data, models of type II use maintenance data and fault events, models of type III use maintenance data and sensory signals, and models of type IV use all data. Quantitative results are presented in Tab. 2 for the metrics of Mean Error (ME), Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Median Absolute Deviation (MAD).

1 Please refer to Publications I, III and IV for more details.

Table 2 – Quantitative results for Case-EBV.

Metric  Data    LU              LR              KNN             NN              SVR             RF
ME      I       6.00 ± 34.60    -0.33 ± 34.28   0.40 ± 35.67    1.23 ± 32.33    37.81 ± 33.30   -7.50 ± 33.12
        II      -               -0.46 ± 28.71   -3.30 ± 27.86   22.70 ± 36.86   30.90 ± 24.42   21.23 ± 29.32
        III     -               1.03 ± 20.01    2.50 ± 21.23    -0.27 ± 19.80   11.54 ± 18.40   9.15 ± 18.94
        IV      -               -0.16 ± 9.77    -0.72 ± 7.46    0.00 ± 8.90     11.53 ± 8.56    -2.23 ± 8.89
MAE     I       76.69 ± 21.54   79.56 ± 20.81   82.89 ± 18.97   77.90 ± 21.71   70.32 ± 26.73   89.17 ± 22.32
        II      -               71.86 ± 13.61   73.75 ± 15.70   71.03 ± 15.51   66.44 ± 16.62   81.56 ± 20.44
        III     -               30.31 ± 12.72   32.15 ± 13.24   29.95 ± 13.49   28.05 ± 15.16   30.42 ± 15.29
        IV      -               29.94 ± 4.71    30.85 ± 5.16    29.63 ± 4.78    27.97 ± 5.90    22.70 ± 4.23
RMSE    I       110.72 ± 36.57  111.48 ± 34.96  115.15 ± 32.14  110.80 ± 35.04  116.45 ± 42.88  120.49 ± 34.05
        II      -               92.46 ± 22.53   97.89 ± 22.01   96.57 ± 22.30   95.73 ± 26.26   108.16 ± 27.37
        III     -               42.74 ± 19.70   46.05 ± 19.84   42.29 ± 20.53   43.31 ± 22.18   45.90 ± 21.21
        IV      -               41.86 ± 10.01   42.82 ± 8.76    41.97 ± 10.43   43.12 ± 11.39   30.89 ± 7.35
MAD     I       0.00 ± 0.00     6.63 ± 1.35     26.75 ± 3.31    4.67 ± 5.27     0.14 ± 0.03     32.36 ± 5.52
        II      -               65.85 ± 15.76   84.54 ± 11.82   77.88 ± 16.74   72.23 ± 15.49   69.18 ± 10.44
        III     -               66.56 ± 24.44   68.32 ± 25.04   67.08 ± 24.57   66.99 ± 24.57   67.30 ± 24.44
        IV      -               35.04 ± 14.77   35.73 ± 13.58   34.83 ± 14.52   34.82 ± 14.50   41.76 ± 17.03

Figure 17 – Mean absolute error for Case-EBV. Mean absolute error (MAE) of the different methods (LU - Life Usage, LR - Linear Regression, KNN - K-Nearest Neighbors, NN - Neural Networks, SVR - Support Vector Regression and RF - Random Forests) using different data sources (I, II, III and IV).

As can be observed from Tab. 2, the models with the worst performance belong to type I (models based only on removal times). As shown in Fig. 17, the red bars (models of type I) are consistently higher than the remaining bars, meaning these models exhibit the largest mean absolute errors. Interestingly, the baseline model (the life usage, LU, model) is surpassed in performance by the I-SVR. This happens because the SVR model is able to provide better estimates of future removal times. This result is further explored in Publication II and Publication V. Unsurprisingly, the models of type II (removal times and fault events) and type III (removal times and sensory signals) report better performance than the models of type I (removal times only). This reinforces the hypothesis that including several sources of information in the data-driven models can lead to better forecasts of failure times. This can be observed from Fig. 17: the plot shows that the mean absolute error is lower when information about sensory signals and/or fault events is considered in the models. The error decrease is especially substantial in the case of sensory signals. The results confirm the intuition that it is easier to extract information from sensory signals than from fault events. Fault events are computed to be understood by humans, not by machines. In contrast, sensory signals are continuous variables by nature, a type of information that is easy for computers to process and understand. It is also important to note that fault events are usually not a high-integrity source of information, as they are computed from heuristics provided by manufacturing experts. These heuristics can fail to accurately capture the reality


Table 3 – Quantitative results for Case-EBV (No Filter / Filter).

Metric  Data    LR              KNN             NN              SVR             RF
ME      II      -0.46 / 0.81    -3.30 / 2.10    22.70 / 22.10   30.90 / 32.31   21.23 / 22.39
        III     1.03 / 1.63     2.50 / 3.15     -0.27 / 0.62    11.54 / 12.16   9.15 / 9.30
        IV      -0.16 / 0.28    -0.72 / -0.34   0.00 / -0.24    11.53 / 11.96   -2.23 / -1.85
MAE     II      71.86 / 74.64   73.75 / 70.45   71.03 / 74.63   66.44 / 73.41   81.56 / 83.28
        III     30.31 / 30.55   32.15 / 31.58   29.95 / 29.68   28.05 / 31.29   30.42 / 31.03
        IV      29.94 / 21.69   30.85 / 22.34   29.63 / 21.13   27.97 / 20.95   22.70 / 15.18
MAD     II      65.85 / 13.86   84.54 / 15.97   77.88 / 16.36   72.23 / 16.50   69.18 / 13.17
        III     66.56 / 6.34    68.32 / 7.68    67.08 / 6.15    66.99 / 6.80    67.30 / 7.25
        IV      35.04 / 4.56    35.73 / 5.85    34.83 / 4.46    34.82 / 4.98    41.76 / 4.60

of damage or failure behavior and can result in false alarms or missed events. In regard to the models of type IV, it can be seen that these models show good performance in terms of mean absolute error (MAE) and root mean square error (RMSE), often comparable to the models of type III. The best results are attained by the random forests, with a mean absolute error of 22.70 days and a root mean square error of 30.89 days. These results show that the RF algorithm is capable of successfully combining fault events and sensory signals. This can be explained by the ease with which decision trees integrate information in the form of thresholds and if-else statements.
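The four metrics reported in Tab. 2 can be computed as follows; the arrays of predicted and actual removal times are invented for illustration, and the MAD shown is one common definition (median absolute deviation of the errors about their median).

```python
import numpy as np

def metrics(pred, actual):
    """ME, MAE, RMSE and MAD of a set of removal-time predictions (days)."""
    e = np.asarray(pred, float) - np.asarray(actual, float)
    return {
        "ME": e.mean(),                              # signed bias
        "MAE": np.abs(e).mean(),                     # average error magnitude
        "RMSE": np.sqrt((e ** 2).mean()),            # penalizes large errors
        "MAD": np.median(np.abs(e - np.median(e))),  # spread, i.e. precision
    }

m = metrics([120, 95, 60, 30], [100, 100, 50, 45])
print(m)  # ME 2.5, MAE 12.5, RMSE ~13.69, MAD 12.5
```

Note how RMSE exceeds MAE whenever the errors are uneven, which is why both are reported: MAE summarizes typical accuracy, RMSE flags occasional large misses, and MAD captures precision independently of bias.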

3.4

kalman filtering

This section presents the results of applying traditional machine learning algorithms combined with Kalman filtering to different data sources on the Case-EBV2. Concretely, five machine learning approaches are tested: Linear Regression (LR), K-Nearest Neighbors (KNN), Neural Networks (NN), Support Vector Regression (SVR) and Random Forests (RF). Models of type II use maintenance data and fault events, models of type III use maintenance data and sensory signals, and models of type IV use all data. Quantitative results are presented in Tab. 3 for the metrics of Mean Error (ME), Mean Absolute Error (MAE) and Median Absolute Deviation (MAD). As can be observed from Fig. 18, the models of type II (fault events) and III (sensory signals) register a small decrease in performance when Kalman filtering is
2 Please refer to Publication VI for more details.


Figure 18 – Mean Absolute Error for Case-EBV. Mean Absolute Error (MAE) of the different methods (LU - Life Usage, LR - Linear Regression, KNN - K-Nearest Neighbors, NN - Neural Networks, SVR - Support Vector Regression and RF - Random Forests) using different data sources (II, III and IV) with and without Kalman filter (KF).

applied, except for the K-nearest neighbors. In contrast, for the models of type IV there is a significant increase in performance. In fact, the best model is the RF IV (all data) with Kalman filter, which attains an error of 15 days on average. These results suggest that Kalman filtering is most effective when all sources of data are considered. To better understand the results, consider Figs. 19, 20 and 21, which show the evolution of the Mean Absolute Error (MAE) over time for the models of type II, III and IV respectively. The plots contrast the error when the Kalman filter is not applied (plots on the left) and when it is (plots on the right). As shown, the performance of the methods is similar with or without the Kalman filter except near the removal, where the models with Kalman filter present better results. This performance increase is especially noticeable for the models of type II (Fig. 19). This result is important, as it shows that the application of the Kalman filter can have a positive impact upon the most important fraction of the life of the equipment: the end of life. Note that a large error far from the removal is not very critical in absolute terms, but a large error near the removal is. For example, imagine that there are 100 days left until the removal and the model predicts 80. This is not as critical as when there are 20 days left and the model predicts 40. In absolute terms the difference is the same, 20 days, but the impact of an early prediction is different from that of a late one. Accordingly, it is possible to argue that even for models of type II and III, the application of the Kalman filter is beneficial.
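The idea of post-filtering the machine learning outputs can be illustrated with a scalar Kalman filter. The sketch below is a simplified stand-in for the approach evaluated here (full details are in Publication VI): the state model assumes the RUL decreases by one day per day, and the raw model predictions act as noisy measurements. The noise magnitudes and variances (`q`, `r`) are assumptions for the example.

```python
import numpy as np

def kalman_smooth_rul(raw_rul, q=1.0, r=100.0):
    """Scalar Kalman filter over a daily sequence of raw RUL predictions."""
    x, p = raw_rul[0], r              # initial state estimate and variance
    out = []
    for z in raw_rul:
        x, p = x - 1.0, p + q         # predict: RUL drops by 1 per day
        k = p / (p + r)               # Kalman gain
        x = x + k * (z - x)           # update with the raw model output z
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(3)
true_rul = np.arange(200, 0, -1.0)                # a 200-day life, day by day
raw = true_rul + 30 * rng.standard_normal(200)    # noisy daily predictions
smoothed = kalman_smooth_rul(raw)
print(np.abs(raw - true_rul).mean(), np.abs(smoothed - true_rul).mean())
```

The filter effectively averages recent predictions while enforcing the physically sensible one-day-per-day decay, which is consistent with the reduced scatter (lower MAD) observed in Tab. 3.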


(a) MAE over time (No Kalman Filter)

(b) MAE over time (Kalman Filter)

Figure 19 – Mean absolute error over time. Mean Absolute Error (MAE) over time for the models of type II with and without Kalman filter (KF).

(a) MAE over time (No Kalman Filter)

(b) MAE over time (Kalman Filter)

Figure 20 – Mean absolute error over time. Mean Absolute Error (MAE) over time for the models of type III with and without Kalman filter (KF).

(a) MAE over time (No Kalman Filter)

(b) MAE over time (Kalman Filter)

Figure 21 – Mean absolute error over time. Mean Absolute Error (MAE) over time for the models of type IV with and without Kalman filter (KF).


Figure 22 – Median Absolute Deviation for Case-EBV. Median Absolute Deviation (MAD) of the different methods (LU - Life Usage, LR - Linear Regression, KNN - K-Nearest Neighbors, NN - Neural Networks, SVR - Support Vector Regression and RF - Random Forests) using different data sources (II, III and IV) with and without Kalman filter (KF).

The most positive influence of the Kalman filter is perhaps on the precision of the models, as can be observed from Tab. 3 and Fig. 22. The low Median Absolute Deviation (MAD) of the Kalman-based models in comparison to the models without Kalman filter confirms the expectation that Kalman filtering can significantly improve the precision of the models and hence their interpretability. Note that it is as important to have precise models as accurate ones: models should report good estimates on average, but the estimates should also agree with each other. In industrial terms, precise predictions are necessary so that the human operator can more easily decide which is the most reliable prediction on which to base his or her final maintenance decision. Less precise predictions mean more scattered predictions, which can confuse the operator and make his or her job more difficult.

3.5

deep learning

This section presents the results of applying different deep learning models to different data sources on the Case-EBV3. Concretely, six Recurrent Neural Network (RNN) models are tested: the shallow and deep versions of the simple Recurrent Neural Network (SRNN), the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). Models of type II use maintenance data and fault events, models of type III use maintenance data and sensory signals, and
3 Please refer to Publications VII and VIII for more details.


models of type IV use all data. Quantitative results are presented in Tab. 4 for the metrics of Mean Error (ME), Mean Absolute Error (MAE) and Median Absolute Deviation (MAD). Deep learning is applied to each type of model (II, III and IV), and hereafter the results are compared per type of model.
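The core computation shared by these recurrent models can be shown with a numpy-only forward pass of a simple RNN: a hidden state carries memory across the sequence of sensor vectors, and a readout emits one RUL estimate per time step. The weights here are random, so this only illustrates the mechanics, not a trained model; LSTM and GRU cells add gating on top of the same recurrence, and the dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8                                    # 4 sensor signals, 8 hidden units

W_x = rng.standard_normal((n_hidden, n_in)) * 0.3        # input-to-hidden weights
W_h = rng.standard_normal((n_hidden, n_hidden)) * 0.3    # hidden-to-hidden (memory)
w_out = rng.standard_normal(n_hidden) * 0.3              # hidden-to-RUL readout

def srnn_forward(seq):
    """Forward pass over a (time_steps, n_in) sensor sequence; one RUL per step."""
    h = np.zeros(n_hidden)
    ruls = []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)                 # recurrent state update
        ruls.append(w_out @ h)                           # scalar RUL estimate
    return np.array(ruls)

seq = rng.standard_normal((50, n_in))                    # 50 time steps of 4 signals
out = srnn_forward(seq)
print(out.shape)
```

Because the hidden state is a function of the whole history, such models can exploit temporal patterns in the sensory signals that the per-observation methods of Section 3.3 cannot.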

Table 4 – Quantitative results for Case-EBV (machine learning methods reported as No Filter / Filter).

Metric  Data    LR              KNN             NN              SVR             RF              SSRNN   SLSTM   SGRU    DSRNN   DLSTM   DGRU
ME      II      -0.46/0.81      -3.30/2.10      22.70/22.10     30.90/32.31     21.23/22.39     -38.54  -49.84  -47.61  -50.21  -42.30  -40.16
        III     1.03/1.63       2.50/3.15       -0.27/0.62      11.54/12.16     9.15/9.30       -3.84   10.51   11.18   -7.27   9.98    6.87
        IV      -0.16/0.28      -0.72/-0.34     0.00/-0.24      11.53/11.96     -2.23/-1.85     13.14   0.96    -3.58   -3.25   -30.85  -9.34
MAE     II      71.86/74.64     73.75/70.45     71.03/74.63     66.44/73.41     81.56/83.28     62.14   65.96   68.55   64.50   71.45   65.84
        III     30.31/30.55     32.15/31.58     29.95/29.68     28.05/31.29     30.42/31.03     29.04   35.46   34.99   26.78   37.28   30.79
        IV      29.94/21.69     30.85/22.34     29.63/21.13     27.97/20.95     22.70/15.18     24.07   21.15   21.69   25.11   28.95   22.43
MAD     II      65.85/13.86     84.54/15.97     77.88/16.36     72.23/16.50     69.18/13.17     42.98   43.74   41.22   42.75   49.65   43.87
        III     66.56/6.34      68.32/7.68      67.08/6.15      66.99/6.80      67.30/7.25      22.90   27.22   27.70   19.58   33.64   23.11
        IV      35.04/4.56      35.73/5.85      34.83/4.46      34.82/4.98      41.76/4.60      18.31   14.87   13.61   17.58   19.22   13.61

Considering the models of type II (fault events), the results of applying deep learning are positive. In respect to Mean Absolute Error (MAE) (Tab. 4), the deep learning models of type II present better results than the conventional models of type II. This is attested by the result of 62 days of the shallow SRNN, the best deep learning method of type II, against the result of 64 days of the SVR of type II. The improvement is small but positive. Considering the models of type III (sensory signals), the results are also positive: the deep learning models of type III achieve better performance than the models of type II and the machine learning models of type III. This is attested by the results of 26 and 29 days of the deep SRNN and the shallow SRNN, respectively. The improvement is, however, rather small (around 2 days compared to the best result of the SVR, with 28 days), and it is difficult to argue which approach is the best, traditional machine learning or deep learning. The plots of Figs. 23 and 24 compare the forecasts against the actual values for different removals of type III (sensory signals), contrasting the deep learning models with the traditional machine learning methods. As shown, the deep learning models produce estimates that are easier to interpret than those of traditional machine learning. Considering the models of type IV (all data), the results are also promising. Looking at Tab. 4 in terms of Mean Error (ME), it can be seen that the models of type IV exhibit a small bias (except the DLSTM model). In respect to Mean Absolute Error (MAE), the deep learning models of type IV show a considerable improvement over the models of type III, even though it is not enough to beat the performance attained by the random forests model with Kalman filter. The plots of Fig. 27 show the evolution of the MAE over time.
As shown, the results are exceptionally good near the removal. The precision of the deep learning models is also good (Tab. 4). Overall, we can conclude that deep learning models present a promising alternative to traditional machine learning models, especially for prediction near the potential failure.
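The "MAE over time" analysis used in Figs. 19-21 and 27 can be sketched as follows: errors are grouped by the true time remaining to removal, so the resulting curve shows how accuracy evolves as the equipment approaches failure. The data below are synthetic, with a noise level that is assumed to shrink near the removal, simply to illustrate the computation.

```python
import numpy as np

def mae_vs_time_to_removal(true_rul, pred_rul, bin_days=30):
    """MAE per bin of days-to-removal; returns {bin start: MAE}."""
    true_rul, pred_rul = np.asarray(true_rul), np.asarray(pred_rul)
    bins = (true_rul // bin_days).astype(int)
    curve = {}
    for b in np.unique(bins):
        mask = bins == b
        curve[int(b) * bin_days] = np.abs(pred_rul[mask] - true_rul[mask]).mean()
    return curve

rng = np.random.default_rng(5)
true_rul = np.tile(np.arange(120, 0, -1), 5)                    # five 120-day removals
pred_rul = true_rul + (5 + 0.2 * true_rul) * rng.standard_normal(true_rul.size)
curve = mae_vs_time_to_removal(true_rul, pred_rul)
print(curve)
```

Reading the curve from right to left mirrors the plots in this chapter: the error should decrease as the removal approaches, which is precisely the regime where accurate predictions matter most for maintenance decisions.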


Figure 23 – Predictions vs actuals for sample removals #1–#3 (models of type III). Panels (a), (c), (e): traditional machine learning (no Kalman filter); panels (b), (d), (f): deep learning.


3.5 deep learning

Figure 24 – Predictions vs actuals for sample removals #4–#6 (models of type III, continued). Panels (a), (c), (e): traditional machine learning (no Kalman filter); panels (b), (d), (f): deep learning.


Figure 25 – Predictions vs actuals for sample removals #1–#3 (models of type III). Panels (a), (c), (e): traditional machine learning with Kalman filter; panels (b), (d), (f): deep learning.


Figure 26 – Predictions vs actuals for sample removals #4–#6 (models of type III, continued). Panels (a), (c), (e): traditional machine learning with Kalman filter; panels (b), (d), (f): deep learning.


Figure 27 – Mean Absolute Error (MAE) over time for the models of type IV. Panels (a)–(b): no Kalman filter (full view and zoom); panels (c)–(d): Kalman filter (full view and zoom); panels (e)–(f): deep learning (full view and zoom).


4 CASE STUDY OF A GAS TURBINE ENGINE

4.1 Background
4.2 Data
4.3 Classical Machine Learning
4.4 Kalman Filtering
4.5 Deep Learning

This chapter summarizes the work on a case study that deals with the major overhauls of a modern gas turbine engine from commercial aviation. This work is also based on real data. The chapter documents the most important results of this case. As in the previous case, it starts with the industrial background in Section 4.1, describes the data in Section 4.2 and documents the major findings in Sections 4.3 to 4.5.

4.1

background

The jet engine (gas turbine engine), the prime power source of an aircraft, is one of its most valuable and critical systems. Jet engines provide primary power (thrust) and secondary power (flight control, air management, electrical power) to the aircraft. They contribute to a large extent to the overall aircraft value and are a competitive edge for both aircraft operators and manufacturers. Periodically, the jet engine undergoes an overhaul. The term “engine overhaul” refers to the process of maintaining and restoring the engine and its subsystems to serviceable condition. The overhaul is a more thorough and comprehensive maintenance intervention than a standard inspection or repair. The process involves the 1) disassembly of the engine, followed by an 2) inspection to detect any part that may be damaged or defective, 3) the repair or replacement of these parts, and


finally 4) the reassembly, testing and trial run to return the system to its full operating level. The serviceable life of a jet engine involves a great deal of uncertainty. Many factors come into play when determining the engine life. The metal components are subject to heating and cooling cycles due to the combustion as well as to the cool-down periods. These thermal fatigue cycles contribute to the degradation of the engine. Metal-to-metal wear also takes a toll on the engine. Added to this are the operational conditions, such as the interval of use and the environmental conditions. All of these factors play a role in the reliability of the engine. To account for them, overhauls are scheduled at fixed time intervals or when a problem is discovered by the monitoring services or in a hot-section inspection.

4.2

data

The data describe the evolution of the performance of a set of commercial jet engines over approximately ten years, in different intervals of time for each engine. Formally, the data consist of a cross-sectional time series in the sense that, for each engine, there is a multivariate series that represents the temporal progression of the engine sensor signals. These signals are measured at three different flight phases: one measurement is taken at take-off, another at climb and a third at cruise. Overall, we analyze around 3 GB of raw data. In addition to the performance signals, there is also information about the engine overhauls. An engine overhaul can be defined as a comprehensive inspection that involves removing and disassembling the engine, testing all its sub-systems, cleaning and replacing parts as needed, and then reassembling the engine. The dataset includes both fixed-interval and condition-based overhauls.

4.3

classical machine learning

This section presents the results of applying traditional machine learning algorithms to the Case-Engine. Concretely, the same five machine learning approaches of the Case-EBV are tested, namely Linear Regression (LR), K-Nearest Neighbors (KNN), Neural Networks (NN), Support Vector Regression (SVR) and Random Forests (RF). Two different classes of data are tested: models of type I concern only maintenance data, and models of type II concern maintenance data and sensory signals. Quantitative results are presented in Tab. 6 for the metrics of Mean Error (ME), Median Absolute Error (MdAE), Root Mean Square Error (RMSE), Relative Weighted


Figure 28 – Median absolute error over time of machine learning methods.

Accuracy (RWA), Median Absolute Deviation (MAD) and Computational Performance (Time). As shown in Tab. 6, the models of type II (overhaul data and sensory signals) have a better performance than the models of type I (overhaul data). The improvement is quite significant, reaching the value of 176 days for the best performing method, the SVR. Of note is the fact that the machine learning methods have approximately the same performance. This hints at a good configuration of the models: despite having different operational principles, the models are able to reach approximately the same output. These results suggest that the studied data-driven methods may reach similar performance when well configured. Some of the data-driven methods of type I are able to beat the performance of the baseline method (life usage); for example, the SVR of type I has a lower median absolute error than the life usage model. Despite being a small improvement, it is an important incentive for the use of data-driven methods on maintenance data instead of the traditional life usage model. Regarding the models of type II, it is arguable what the best choice of model is for this kind of application. The chart of Fig. 28 shows the evolution of the MdAE metric over the lifetime of the equipment. As shown, the model with the lowest error near the end of life is actually the neural networks model. It can also be seen that the support vector regression, the best performing model in average terms, actually has the worst performance near the overhaul. This can be explained by the fact that this model has some tendency to be flat and hence to output predictions around the median value of the dataset of overhaul times. This fact


Figure 29 – Mean absolute error over time of machine learning methods.

explains why the model has a worse performance near the overhaul than in the mid-life of the equipment. Finally, a general drawback of the models of type II compared to the models of type I is computational performance. Since the models of type I handle only maintenance data, their computation is quicker. Among themselves, the models of type II also show substantial differences in computational performance: the support vector regression is the method that takes the longest to run and, not surprisingly, the linear regression is the fastest model of type II.
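Of the five methods, the K-nearest neighbors regressor is the simplest to sketch. The following is a minimal, pure-NumPy illustration of the idea of predicting time to overhaul from historical snapshots; the feature columns and values are hypothetical and do not come from the Case-Engine data:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict remaining time to overhaul as the mean target of the
    k nearest training snapshots (Euclidean distance in feature space)."""
    X_train, X_query = np.atleast_2d(X_train), np.atleast_2d(X_query)
    preds = []
    for x in X_query:
        d = np.linalg.norm(X_train - x, axis=1)   # distance to every snapshot
        nearest = np.argsort(d)[:k]               # indices of the k closest
        preds.append(y_train[nearest].mean())
    return np.array(preds)

# Hypothetical snapshots: [cycles since overhaul, normalized health index]
X = np.array([[100, 0.9], [200, 0.7], [300, 0.5], [400, 0.3]])
y = np.array([900.0, 700.0, 500.0, 300.0])    # days to next overhaul
print(knn_predict(X, y, [[250, 0.6]], k=2))   # averages the two closest rows
```

In practice the features would be standardized first, since raw cycle counts would otherwise dominate the distance computation.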

4.4

kalman filtering

This section presents the results of applying machine learning algorithms combined with Kalman filtering to the Case-Engine. Models of type I concern only maintenance data, models of type II concern maintenance data and sensory signals, and models of type III result from the application of Kalman filtering to the end estimates of the models of type II. Quantitative results are presented in Tab. 7 for the metrics of Mean Error (ME), Median Absolute Error (MdAE), Root Mean Square Error (RMSE), Relative Weighted Accuracy (RWA), Median Absolute Deviation (MAD) and Computational Performance (Time). Tab. 7 compares the models in terms of MdAE. As shown, the models of type III (overhaul data and sensory signals combined with Kalman filtering) have a better performance than the models of type I (overhaul data) and type II (overhaul data and sensory signals). The improvement is especially significant for the support vector regression, with this method reaching an average error of 150 days. The chart of Fig. 29 shows the evolution of the MdAE metric over the lifetime of the equipment. As shown, and as expected, with the application of the Kalman filter the methods exhibit a decreasing error towards the end of the life of the equipment. Here, it is difficult to select a winning method, as the methods exhibit a similar average error over the life of the equipment. It can be said that the support vector regression is the best model, but the difference from the remaining methods is not dramatic. A drawback of the support vector regression is that, from 100% to 80% of the life of the equipment, the method shows the worst performance. Again, the precision of the models of type III (overhaul data and sensory signals combined with Kalman filter) is higher than the precision of the models of type II (overhaul data and sensory signals). The models exhibit different precision, with the K-nearest neighbors having the lowest average error. Finally, in regards to computational performance, the performance of the models does not deteriorate significantly with the application of the Kalman filter.
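The mechanics of the type III models can be sketched as a scalar Kalman filter applied to the stream of point estimates produced by a type II model. The state model assumed below (a RUL that counts down by one unit per step) and the noise variances are illustrative choices, not the configuration used in the case study:

```python
import numpy as np

def kalman_smooth_rul(raw_estimates, dt=1.0, q=1.0, r=400.0):
    """Scalar Kalman filter over a stream of RUL point estimates.

    State model: the true RUL decreases by dt each step (x_k = x_{k-1} - dt).
    Measurement model: each raw model output observes the state directly.
    q, r: process and measurement noise variances (assumed, not fitted).
    """
    x, p = raw_estimates[0], r            # initialize from first measurement
    out = [x]
    for z in raw_estimates[1:]:
        x, p = x - dt, p + q              # predict: deterministic countdown
        k = p / (p + r)                   # Kalman gain
        x = x + k * (z - x)               # update: blend prediction and data
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

noisy = np.array([200.0, 260.0, 150.0, 197.0, 140.0])   # hypothetical outputs
smooth = kalman_smooth_rul(noisy)
```

The filtered sequence varies less from step to step than the raw estimates, which is the effect exploited by the type III models.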

4.5

deep learning

This section presents the results of applying deep learning algorithms to the Case-Engine, in comparison with the previous approaches. Models of type I concern only maintenance data, models of type II concern maintenance data and sensory signals, models of type III result from the application of Kalman filtering to the end estimates of the models of type II, and models of type IV represent the deep learning methods. The quantitative results are presented in Tab. 5 for the metrics of Mean Error (ME), Median Absolute Error (MdAE), Root Mean Square Error (RMSE), Relative Weighted Accuracy (RWA), Median Absolute Deviation (MAD) and Computational Performance (Time). In general, the models of type IV exhibit considerable bias, with the exception of the DSLSTM. This type of bias is especially severe as it signals a systematic over-prediction (prediction > actual). In terms of MdAE, the best model of type III (overhaul data and sensory signals combined with Kalman filtering), the SVR, has a worse performance than the best models of type IV (deep learning), namely the DSRNN and the DSLSTM. This result is interesting in several ways. First, the DSRNN stands out as a best performing model despite its relative simplicity in comparison with the DSLSTM and the DGRU. Also surprising is the modest computational effort that is required


Table 5 – Quantitative results for Case-Engine.

Metric       Type  LU              LR                KNN               NN                 SVR                 RF
ME (days)    I     38.67 ± 46.45   0.01 ± 46.14      2.97 ± 31.60      6.34 ± 57.13       -60.50 ± 41.33      -20.28 ± 40.67
             II    -               149.30 ± 11.92    149.79 ± 24.03    136.62 ± 83.95     150.34 ± 14.28      156.61 ± 17.95
             III   -               105.89 ± 13.55    112.26 ± 20.65    121.24 ± 56.54     80.31 ± 28.95       112.86 ± 24.43
MdAE (days)  I     241.38 ± 22.00  234.03 ± 24.05    237.33 ± 27.75    236.68 ± 25.84     229.52 ± 25.77      230.79 ± 25.06
             II    -               177.05 ± 11.39    183.74 ± 19.16    180.72 ± 66.52     176.42 ± 7.36       184.47 ± 16.49
             III   -               170.05 ± 14.27    169.36 ± 14.03    181.17 ± 34.85     152.12 ± 9.87       173.38 ± 19.21
RMSE (days)  I     300.80 ± 29.43  297.76 ± 34.06    300.70 ± 35.21    299.55 ± 35.64     303.55 ± 36.43      297.49 ± 35.39
             II    -               199.87 ± 15.93    202.22 ± 19.84    201.93 ± 64.68     203.64 ± 8.19       204.60 ± 17.14
             III   -               164.10 ± 9.26     177.30 ± 13.77    174.70 ± 14.29     184.48 ± 35.26      160.75 ± 8.87
RWA (%)      I     48.62 ± 6.94    50.88 ± 6.79      50.39 ± 6.92      50.04 ± 5.95       51.19 ± 6.32        51.16 ± 6.26
             II    -               40.63 ± 15.64     40.53 ± 16.36     39.72 ± 10.99      36.71 ± 19.49       41.84 ± 15.92
             III   -               36.72 ± 22.35     36.50 ± 23.14     38.35 ± 18.98      39.93 ± 24.03       38.12 ± 22.64
MAD (days)   I     NA              NA                NA                NA                 NA                  NA
             II    -               52.06 ± 6.05      55.31 ± 5.62      57.67 ± 3.69       71.43 ± 9.52        54.31 ± 5.53
             III   -               25.55 ± 3.80      25.59 ± 4.89      19.29 ± 2.01       35.62 ± 5.24        25.80 ± 4.79
Time (s)     I     0.24 ± 0.13     9.19 ± 1.36       0.48 ± 0.07       16.30 ± 1.82       17.43 ± 2.78        49.22 ± 6.45
             II    -               0.69 ± 0.08       4.92 ± 1.46       135.92 ± 116.78    2062.94 ± 101.37    226.61 ± 2.86
             III   -               1.63 ± 0.20       5.80 ± 3.23       129.55 ± 67.52     2248.88 ± 90.59     337.84 ± 12.91

Metric       Type  SSRNN             SLSTM             SGRU              DSRNN             DSLSTM            DGRU
ME (days)    IV    207.92 ± 325.71   155.51 ± 249.91   195.53 ± 308.61   116.98 ± 117.94   54.52 ± 145.46    177.69 ± 260.42
MdAE (days)  IV    242.75 ± 273.19   201.19 ± 184.06   231.33 ± 240.80   116.98 ± 117.94   123.60 ± 81.29    217.45 ± 189.76
RMSE (days)  IV    300.21 ± 256.90   244.48 ± 176.61   262.28 ± 227.48   148.18 ± 128.04   151.49 ± 102.12   251.85 ± 177.37
RWA (%)      IV    16.77 ± 14.38     29.54 ± 17.52     26.23 ± 19.40     39.11 ± 34.91     42.45 ± 29.08     29.74 ± 17.33
MAD (days)   IV    65.29 ± 78.34     50.99 ± 68.44     54.36 ± 63.01     46.14 ± 66.57     71.98 ± 69.28     44.73 ± 64.81
Time (s)     IV    44.81             131.64            85.08             96.12             261.85            217.35

to attain the deep learning results. Here, the SVR falls behind, with a much more computationally intensive training process. Another result of these experiments is the better performance of the deep methods compared with the shallow ones, which hints that deep architectures are needed to solve this kind of problem.
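The relative simplicity of the SRNN can be seen in its forward pass, sketched below in NumPy: a single tanh-activated state and no gates, in contrast to the gating machinery of LSTM and GRU cells. The weights here are random; this only illustrates the cell structure, not the trained models of this case:

```python
import numpy as np

def srnn_forward(x_seq, w_in, w_rec, b):
    """Forward pass of a simple (Elman) recurrent cell:
    h_t = tanh(W_in x_t + W_rec h_{t-1} + b).
    Unlike LSTM/GRU cells there are no gates, so the whole history is
    squeezed through a single tanh state at every step."""
    h = np.zeros(w_rec.shape[0])
    for x in x_seq:
        h = np.tanh(w_in @ x + w_rec @ h + b)
    return h  # final hidden state, e.g. fed to a linear RUL head

rng = np.random.default_rng(0)
w_in, w_rec, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
h = srnn_forward(rng.normal(size=(10, 3)), w_in, w_rec, b)
```

A deep SRNN stacks several such cells, each consuming the hidden-state sequence of the previous one.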

Table 6 – Quantitative results for Case-Engine.

Metric       Type  LU              LR               KNN              NN                SVR                RF
ME (days)    I     38.67 ± 46.45   0.01 ± 46.14     2.97 ± 31.60     6.34 ± 57.13      -60.50 ± 41.33     -20.28 ± 40.67
             II    -               149.30 ± 11.92   149.79 ± 24.03   136.62 ± 83.95    150.34 ± 14.28     156.61 ± 17.95
MdAE (days)  I     241.38 ± 22.00  234.03 ± 24.05   237.33 ± 27.75   236.68 ± 25.84    229.52 ± 25.77     230.79 ± 25.06
             II    -               177.05 ± 11.39   183.74 ± 19.16   180.72 ± 66.52    176.42 ± 7.36      184.47 ± 16.49
RMSE (days)  I     300.80 ± 29.43  297.76 ± 34.06   300.70 ± 35.21   299.55 ± 35.64    303.55 ± 36.43     297.49 ± 35.39
             II    -               199.87 ± 15.93   202.22 ± 19.84   201.93 ± 64.68    203.64 ± 8.19      204.60 ± 17.14
RWA (%)      I     48.62 ± 6.94    50.88 ± 6.79     50.39 ± 6.92     50.04 ± 5.95      51.19 ± 6.32       51.16 ± 6.26
             II    -               40.63 ± 15.64    40.53 ± 16.36    39.72 ± 10.99     36.71 ± 19.49      41.84 ± 15.92
MAD (days)   I     NA              NA               NA               NA                NA                 NA
             II    -               52.06 ± 6.05     55.31 ± 5.62     57.67 ± 3.69      71.43 ± 9.52       54.31 ± 5.53
Time (s)     I     0.24 ± 0.13     9.19 ± 1.36      0.48 ± 0.07      16.30 ± 1.82      17.43 ± 2.78       49.22 ± 6.45
             II    -               0.69 ± 0.08      4.92 ± 1.46      135.92 ± 116.78   2062.94 ± 101.37   226.61 ± 2.86

Table 7 – Quantitative results for Case-Engine.

Metric       Type  LU              LR               KNN              NN                SVR                RF
ME (days)    I     38.67 ± 46.45   0.01 ± 46.14     2.97 ± 31.60     6.34 ± 57.13      -60.50 ± 41.33     -20.28 ± 40.67
             II    -               149.30 ± 11.92   149.79 ± 24.03   136.62 ± 83.95    150.34 ± 14.28     156.61 ± 17.95
             III   -               105.89 ± 13.55   112.26 ± 20.65   121.24 ± 56.54    80.31 ± 28.95      112.86 ± 24.43
MdAE (days)  I     241.38 ± 22.00  234.03 ± 24.05   237.33 ± 27.75   236.68 ± 25.84    229.52 ± 25.77     230.79 ± 25.06
             II    -               177.05 ± 11.39   183.74 ± 19.16   180.72 ± 66.52    176.42 ± 7.36      184.47 ± 16.49
             III   -               170.05 ± 14.27   169.36 ± 14.03   181.17 ± 34.85    152.12 ± 9.87      173.38 ± 19.21
RMSE (days)  I     300.80 ± 29.43  297.76 ± 34.06   300.70 ± 35.21   299.55 ± 35.64    303.55 ± 36.43     297.49 ± 35.39
             II    -               199.87 ± 15.93   202.22 ± 19.84   201.93 ± 64.68    203.64 ± 8.19      204.60 ± 17.14
             III   -               164.10 ± 9.26    177.30 ± 13.77   174.70 ± 14.29    184.48 ± 35.26     160.75 ± 8.87
RWA (%)      I     48.62 ± 6.94    50.88 ± 6.79     50.39 ± 6.92     50.04 ± 5.95      51.19 ± 6.32       51.16 ± 6.26
             II    -               40.63 ± 15.64    40.53 ± 16.36    39.72 ± 10.99     36.71 ± 19.49      41.84 ± 15.92
             III   -               36.72 ± 22.35    36.50 ± 23.14    38.35 ± 18.98     39.93 ± 24.03      38.12 ± 22.64
MAD (days)   I     NA              NA               NA               NA                NA                 NA
             II    -               52.06 ± 6.05     55.31 ± 5.62     57.67 ± 3.69      71.43 ± 9.52       54.31 ± 5.53
             III   -               25.55 ± 3.80     25.59 ± 4.89     19.29 ± 2.01      35.62 ± 5.24       25.80 ± 4.79
Time (s)     I     0.24 ± 0.13     9.19 ± 1.36      0.48 ± 0.07      16.30 ± 1.82      17.43 ± 2.78       49.22 ± 6.45
             II    -               0.69 ± 0.08      4.92 ± 1.46      135.92 ± 116.78   2062.94 ± 101.37   226.61 ± 2.86
             III   -               1.63 ± 0.20      5.80 ± 3.23      129.55 ± 67.52    2248.88 ± 90.59    337.84 ± 12.91

5 CMAPSS

5.1 Background
5.2 Data
5.3 Results
5.3.1 Regime detection and baselining
5.3.2 Noise Filtering
5.3.3 Elbow Detection
5.3.4 Elbow detection and RUL Prediction

Benchmarking datasets are important to provide a basis for comparison. In prognostics, a relevant contribution to this end are the CMAPSS datasets¹. These datasets are the result of a compilation effort by NASA’s Prognostics Center of Excellence (CoE) with the goal of promoting research in the field of prognostics and health management. The CMAPSS datasets represent challenging degradation scenarios where advanced topics of prognostics can be explored, including sensor noise, the effects of different operating conditions and the presence of multiple fault modes. This chapter provides an introduction to the CMAPSS case. This work was done in collaboration with NASA Ames Research Center, and its goal is to provide a deep learning solution for RUL estimation. The chapter documents the most important results of this case. It starts in Section 5.1 with a background on deep learning, briefly explains the data in Section 5.2 and documents the major findings in Section 5.3.

¹ The CMAPSS datasets are available at: http://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/


5.1

background

Despite the early success of neural networks during the 1960s and the beginning of the 1970s, the ascent of methods such as random forests and support vector machines, coupled with the failure of backpropagation to scale during the late 1990s, made this period a difficult one for research in the field. At the time, there was some research on neural networks, but the community was small and interest in the topic remained relatively low for several years, as can be seen in the citation counts of that phase. Overall, there was a widespread negative sentiment towards neural networks, as they were seen as a bit of a “dead end”. While theoretically sound, in practice the techniques lacked applicability, as in most cases it was not possible to train a network with more than a couple of layers. This considerably hindered progress in the field. Despite the slow pace of research, a group of researchers from the University of Toronto, Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh, published in 2006 a breakthrough paper that would bring back interest in neural nets. The paper was titled “A fast learning algorithm for deep belief nets” and introduced a new algorithm to train neural networks more effectively. The idea at its core was that multi-layer nets could be trained more effectively by initializing the weights with an unsupervised pre-training procedure as opposed to random initialization. This was an important advance. Even though the proposed approach was deeply rooted in previous work, the interest in what is today called “deep learning” can be considered to have started with this fundamental paper. The basic idea of the work was to first pre-train each layer of a deep belief network (DBN) with unsupervised training, initializing the weights in a way better than random assignment, and then complete the training of the network with supervised learning.
Each layer of the DBN starts out as a Restricted Boltzmann Machine (RBM), a Boltzmann Machine without connections within each layer (no hidden–hidden or visible–visible connections). An RBM is an energy-based model for unsupervised learning that consists of one layer of visible units, which represents the input data, and one layer of hidden units. The proposed pre-training algorithm, Contrastive Divergence (CD), had been previously introduced in 2002 in “Training Products of Experts by Minimizing Contrastive Divergence”, also by Hinton. Using the CD algorithm, the layers of DBNs can be trained in a greedy way by adjusting the weights until each layer is able to approximate the input from the previous RBM. This pre-training process requires no labels, and its objective is to put the weights in an appropriate range for further optimization. After this initial process, and because the selection of weights


is meant to approximate the input data, the network performs much better in the subsequent supervised learning stages. Following this paper came a succession of important contributions to the field. For instance, in 2007, Bengio et al. in “Greedy Layer-Wise Training of Deep Networks” discussed how deep networks are more capable of generating complex representations of data, and hence of solving complex problems, than shallow methods such as one-layer nets or support vector machines. Importantly, the authors also argued in favor of pre-training algorithms as a means to improve the efficiency of the training process and to generate more useful hierarchical intermediate representations of the data. The use of RBM pre-training is not the central aspect here, as pre-training with other building blocks (such as autoencoders) also proved to work well in practice. The key point was to have many layers of increasingly complex representations, against the traditional approach of hand-designing features with signal processing techniques or general feature extraction methods. Importantly, these steps were essential to disprove the long-held belief that neural networks could not be trained well. Yoshua Bengio and Yann LeCun made this clear in “Scaling Learning Algorithms towards AI”: “Until recently, many believed that training deep architectures was too difficult an optimization problem. However, at least two different approaches have worked well in training such architectures: simple gradient descent applied to convolutional networks [LeCun et al., 1989, LeCun et al., 1998] (for signals and images), and more recently, layer-by-layer unsupervised learning followed by gradient descent [Hinton et al., 2006, Bengio et al., 2007, Ranzato et al., 2006]. Research on deep architectures is in its infancy, and better learning algorithms for deep architectures remain to be discovered. Taking a larger perspective on the objective of discovering learning principles that can lead to AI has been a guiding perspective of this work.
We hope to have helped inspire others to seek a solution to the problem of scaling machine learning towards AI.” These advances were important to the emergence of deep learning and to counter the belief that neural networks could not be trained effectively. However, perhaps the most important step towards efficient neural network training was the increase in available computational power. The weak CPU parallelism of the time would be replaced during the 1990s and the beginning of the 2000s with the parallel computing resources of GPUs. A student in Hinton’s lab, George Dahl, would soon demonstrate how important computational power was when he discovered how to train and simulate neural networks using his computer’s graphics card. At the time, GPUs were roughly 70 times faster than CPUs. Resorting to GPUs made it possible to turn weeks of work into days


and days into a couple of hours. Despite the relative importance of the algorithmic developments, this result was perhaps even more important, as it paved the way for novel advances based on brute force alone coupled with parallelized computation. As the age of big data was starting and big companies such as Microsoft and Google were trying to find new ways to deal with the massive amounts of data being collected, the deep learning movement that began in 2006 would timidly but rapidly make its way into industry. A series of scientific contributions followed. Importantly, it soon became clear that unsupervised pre-training was not essential to deep learning: purely supervised training with the correct starting weight scales and activation function could also yield promising results. So the central question arises: why did purely supervised learning with backpropagation work so poorly in the past? Hinton summarized the findings in four essential points: 1) the working datasets were thousands of times too small, 2) computers were millions of times too slow, 3) the weights were not being properly initialized, and 4) the wrong type of non-linear activation was being used. To summarize, in the past, the popularity of artificial neural networks was low due to poor performance and scalability. However, thanks to the advancement of computing power (GPUs), the availability of big data, and the development of techniques for training deep neural networks on large datasets, interest in neural networks and deep learning was revived.
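A minimal sketch of one CD-1 update for a binary RBM is shown below; the toy dimensions, learning rate and the single repeated training pattern are illustrative assumptions, not taken from the original papers:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(W, bv, bh, v0, lr=0.1):
    """One Contrastive Divergence (CD-1) step for a binary RBM:
    move the parameters toward the data statistics <v h> and away from
    the statistics of a one-step Gibbs reconstruction."""
    ph0 = sigmoid(v0 @ W + bh)                  # hidden probs given data
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden states
    pv1 = sigmoid(h0 @ W.T + bv)                # reconstruct the visibles
    ph1 = sigmoid(pv1 @ W + bh)                 # hidden probs given recon
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    bv += lr * (v0 - pv1)
    bh += lr * (ph0 - ph1)
    return pv1

# Toy pre-training: drive the RBM to reconstruct one repeated pattern
W, bv, bh = np.zeros((6, 3)), np.zeros(6), np.zeros(3)
v = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
for _ in range(500):
    recon = cd1_update(W, bv, bh, v)
```

In a DBN, the hidden activities produced by one trained RBM become the "data" for pre-training the next layer, which is the greedy layer-wise scheme described above.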

5.2

data

The CMAPSS data consist of run-to-failure time series from a turbofan simulation environment [22] that represents an engine model of the 90,000 lb thrust class. Fig. 30 shows a simplified diagram of the engine simulated in CMAPSS. Failure is induced in diverse sections of the engine system under different operational and environmental conditions (various altitudes and temperatures). The aim is to simulate continuous degradation trends that result in functional failure. Overall, the simulation records 21 health indicators, listed in Tab. 8, that reflect the degradation of different subsystems of the engine. The characteristics of the datasets are described in Tab. 9. As shown, dataset #1 is the simplest dataset, with one operating condition and one fault mode. Datasets #2 and #4 are the most challenging, with six operating conditions and one and two fault modes, respectively. Dataset #3 presents two fault modes and one operating condition.
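Assuming the column layout documented in the repository readme (unit id, cycle, three operational settings, 21 sensor readings), run-to-failure RUL labels can be derived by subtracting each cycle from the unit's last recorded cycle. A sketch, shown here on a tiny synthetic stand-in rather than an actual CMAPSS file:

```python
import numpy as np

def add_rul_labels(data):
    """Append a RUL column (cycles to failure) to a CMAPSS-style array.

    Assumed layout: col 0 = unit id, col 1 = cycle, remaining columns =
    operational settings and sensors. In the training sets each unit runs
    to failure, so RUL at a row is the unit's last cycle minus the row's
    cycle; only the first two columns are used here.
    """
    unit, cycle = data[:, 0], data[:, 1]
    rul = np.empty(len(data))
    for u in np.unique(unit):
        m = unit == u
        rul[m] = cycle[m].max() - cycle[m]
    return np.column_stack([data, rul])

# Tiny synthetic stand-in with 2 units and a single sensor column
toy = np.array([[1, 1, 0.5], [1, 2, 0.4], [1, 3, 0.1],
                [2, 1, 0.6], [2, 2, 0.2]])
labeled = add_rul_labels(toy)
```

For a real file, the `data` array would be obtained with something like `np.loadtxt("train_FD001.txt")`.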


Table 8 – 21 measurement variables of CMAPSS datasets.

Measurement variable   Description                      Units
T2                     Total temperature at fan inlet   ºR
T24                    Total temperature at LPC outlet  ºR
T30                    Total temperature at HPC outlet  ºR
T50                    Total temperature at LPT outlet  ºR
P2                     Pressure at fan inlet            psia
P15                    Total pressure in bypass-duct    psia
P30                    Total pressure at HPC outlet     psia
Nf                     Physical fan speed               rpm
Nc                     Physical core speed              rpm
Epr                    Engine pressure ratio (P50/P2)   -
Ps30                   Static pressure at HPC outlet    psia
Phi                    Ratio of fuel flow to Ps30       pps/psi
NRf                    Corrected fan speed              rpm
NRc                    Corrected core speed             rpm
BPR                    Bypass ratio                     -
FarB                   Burner fuel-air ratio            -
HtBleed                Bleed enthalpy                   -
Nf dmd                 Demanded fan speed               rpm
PCNfR dmd              Demanded corrected fan speed     rpm
W31                    HPT coolant bleed                lbm/s
W32                    LPT coolant bleed                lbm/s

Figure 30 – Simplified schematics of CMAPSS simulated engine (Source: [22]).


Table 9 – Description of the turbofan degradation datasets available from the NASA repository (CMAPSS).

Dataset   Nº of fault modes   Nº of conditions   Nº of train units   Nº of test units
#1        1                   1                  100                 100
#2        1                   6                  260                 259
#3        2                   1                  100                 100
#4        2                   6                  249                 248

5.3

results

Deep learning represents a promising alternative for analyzing and modeling mechanical big data for RUL prediction. Importantly, neural networks have the ability to construct increasingly complex knowledge representations in an automated way, without the need for direct human intervention: they allow for knowledge discovery that would otherwise require a tedious process of feature engineering. This is especially important in the field of prognostics, where the quality of the RUL estimates is strongly dependent upon signal processing techniques that perform denoising, change detection, selection or compression of the health monitoring signals. To reach accurate prognostics, these stages tend to take a lot of time and entail a high degree of effort and expertise. In this section, the results of applying deep learning methods to each stage of the prognostics endeavor are presented.

5.3.1

Regime detection and baselining

Machine operating regimes are characterized by repeatable and relatively stable operational conditions, which help to compare different equipment as well as to follow up on units operating under similar conditions. Usually, when we refer to the operating regimes of a commercial flight engine, we mean the typical speed–altitude operating regimes of the various classes of civil aircraft. However, an operating regime can be characterized by any set of operating conditions and/or flight maneuvers. Identifying the operating regimes is especially important when one is developing local models for each regime or a global model that requires an initial baselining of the data. Fig. 31a shows the original signal with all regimes combined. In this plot, the degradation trend is not as clear as in the plots of the individual signals, such as Fig. 31d.


Figure 31 – Regime detection and baselining. Panel (a): original signal; panels (b)–(d): signals 1–3.


Traditionally, clustering algorithms such as K-means are used to perform this characterization process. Experience has shown that a small number of candidate parameters is usually sufficient to achieve an acceptable regime disaggregation. However, this process requires selecting, and knowing in advance, the number of operational conditions present in the data. To address this generalization issue, this thesis proposes an unsupervised neural network solution to the regime identification problem. Self-organizing maps are compared to other clustering solutions such as DBSCAN and K-means. This solution is found not to require setting the number of regimes in advance and to perform well even in the presence of uninformative features.
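The traditional pipeline described above can be sketched as K-means clustering on the operating-condition parameters followed by per-regime z-score baselining of each sensor. The implementation below is a minimal illustration on synthetic two-regime data, not the SOM-based solution proposed in the thesis:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means for regime detection on operating-condition
    parameters (e.g. altitude and Mach number)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])  # keep empty clusters
    return labels, centers

def baseline(signal, labels):
    """Z-score each sensor reading against its own regime's statistics so
    that readings from different regimes become comparable."""
    out = np.empty_like(signal, dtype=float)
    for j in np.unique(labels):
        m = labels == j
        out[m] = (signal[m] - signal[m].mean()) / (signal[m].std() + 1e-9)
    return out

# Two well-separated synthetic regimes of operating points
ops = np.vstack([np.random.default_rng(1).normal([0, 0], 0.1, (50, 2)),
                 np.random.default_rng(2).normal([5, 5], 0.1, (50, 2))])
labels, _ = kmeans(ops, k=2)
```

Note that `k` must be chosen in advance, which is exactly the limitation that motivates the self-organizing-map alternative.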

5.3.2

Noise Filtering

Signal processing is a task of central importance in prognostics. Health monitoring variables convey informative signals that represent the degradation process but also contain noise from sensor sources, varying environmental conditions, and other inspection factors. A deep learning solution is proposed for processing a sequence of damage signals from a health monitoring system in order to reduce the noise and enhance the damage estimate. Despite having been reported in many filtering and denoising applications, there is not much literature on the use of wavelet neural networks in prognostics. It is hypothesized that wavelet neural networks (WNNs) can be of use for the denoising of health monitoring signals. Specifically, a WNN is proposed where the input is a normalized version of the x coordinates of the signal (the time coordinate) and the output is the original raw signal. As the output of a wavelet neural network is a linear weighted combination of wavelet basis functions, it is easy for the network to learn how to denoise the signals with minimal mean square error. A direct connection is added to the traditional WNN model to account for the linear portions of the signal. Different wavelet functions are tested, such as the exponential, Morlet and Mexican hat wavelets. The model is compared against a wide range of other methods, ranging from the simple moving average to more standard filtering techniques. In this work, Differential Evolution (DE) is used for the optimization of the WNN architecture. The proposed approach shows near perfect results when the noise is additive, and poorer results when the noise is proportional. The results are validated on synthetic data resembling the datasets from the NASA Ames CoE (see Fig. 32).
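The core of the approach, a linear weighted combination of wavelet basis functions plus a direct linear connection, can be sketched with fixed Mexican-hat atoms whose output weights are solved by least squares. This is a simplified stand-in: the thesis optimizes the WNN architecture with Differential Evolution, which is not reproduced here, and the grid of centers and the scale below are illustrative choices:

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet, the mother wavelet assumed here."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def wnn_denoise(x, y, n_atoms=12, scale=0.08):
    """Denoise a 1-D signal with a wavelet-network-style expansion:
    y_hat(x) = sum_i w_i * psi((x - c_i) / s) + a*x + b.
    Centers sit on a fixed grid; only the linear output weights are
    solved (ridge-regularized least squares). The a*x + b terms play
    the role of the direct linear connection."""
    centers = np.linspace(x.min(), x.max(), n_atoms)
    design = np.column_stack(
        [mexican_hat((x - c) / scale) for c in centers] + [x, np.ones_like(x)]
    )
    w = np.linalg.solve(design.T @ design + 1e-6 * np.eye(design.shape[1]),
                        design.T @ y)
    return design @ w

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 300)
clean = 0.5 * x + 0.2 * np.sin(6 * np.pi * x)   # drifting damage trend
noisy = clean + rng.normal(0, 0.05, x.size)      # additive sensor noise
denoised = wnn_denoise(x, noisy)
```

Because the expansion is linear in the output weights, the additive-noise case reduces to a projection onto a small smooth basis, which is consistent with the near-perfect results reported for additive noise.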


Figure 32 – Results of denoising using Wavelet neural networks.

5.3.3 Elbow Detection

Fault forecasting is an important consideration in prognostics and health management. A little-explored way of detecting faults ahead of time is to use trend shift detection to identify when the equipment starts to deviate from its nominal state. This work presents a trend shift detection approach that employs deep learning to estimate the passage from nominal to faulty condition of a given technical system. This moment is designated the “elbow point”, given the shape of the intersection between the linear and exponential segments of the degradation curve. Specifically, different kinds of recurrent neural networks are compared in their ability to detect the passage from linear degradation to a state of exponential degradation. Uni-directional and bi-directional versions of Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks, and simple recurrent neural networks (SRNN) are tested. The goal of the models is to classify a sequence as being in the nominal (positive label) or faulty (negative label) condition. The models are first evaluated on their ability to detect faults with zero false positives. The Area Under the ROC Curve (AUC), the error distribution, and other important metrics are also evaluated, with the Long Short-Term Memory network showing the best performance. The baseline model is the sliding-window z-score statistical test, which has shown good performance in previous applications. The data used to test the methods are synthetic, generated to mimic the CMAPSS dataset from the NASA Ames CoE. The level of noise and the curve-fitting parameters are estimated from this dataset using standard procedures.


Figure 33 – Example of RUL estimation for a certain removal: (a) without elbow detection; (b) with elbow detection.

The results suggest that the deep learning models can outperform the baseline, reaching a mean absolute error of around 10 days with zero false positives, versus 20 days for the z-score method. Visual comparison on the actual CMAPSS data also attests to the superiority of the LSTM method. The uni-directional methods proved more effective than the bi-directional ones. Despite using two time directions, the bidirectional networks do not seem to overcome the limitations of each single uni-directional network. Interestingly, the uni-directional SRNN performs well compared to the other methods. This result suggests that even a simple recurrent network can detect the elbow easily.
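For reference, the baseline can be sketched as follows: a line fitted to an assumed-nominal prefix of the signal models the linear phase, and an alarm is raised at the first sliding window whose mean residual from the extrapolated line exceeds a z-score threshold. Window lengths, the threshold, and the synthetic signal parameters are illustrative assumptions.

```python
import numpy as np

def zscore_shift_detector(signal, ref_len=150, win=10, k=5.0):
    """Sliding-window z-score trend-shift test (baseline sketch).

    The first `ref_len` samples are assumed nominal; a line fitted to them
    models the linear degradation phase. An alarm is raised at the first
    window whose mean residual from the extrapolated line exceeds k
    standard errors. All parameter values are illustrative assumptions.
    """
    t_ref = np.arange(ref_len)
    slope, intercept = np.polyfit(t_ref, signal[:ref_len], 1)
    se = np.std(signal[:ref_len] - (slope * t_ref + intercept)) / np.sqrt(win)
    for i in range(ref_len + win, len(signal) + 1):
        t_win = np.arange(i - win, i)
        resid = signal[i - win:i] - (slope * t_win + intercept)
        if abs(np.mean(resid)) > k * se:
            return i - 1  # index of the first alarm: estimated elbow point
    return None

# Synthetic signal: linear (nominal) phase followed by exponential growth.
rng = np.random.default_rng(2)
n, elbow = 300, 200
t = np.arange(n, dtype=float)
clean = 0.001 * t
clean[elbow:] += np.exp(0.03 * (t[elbow:] - elbow)) - 1.0
signal = clean + rng.normal(0.0, 0.02, n)

detected = zscore_shift_detector(signal)
```

Detrending against the fitted nominal line is what keeps the slow linear drift itself from triggering false alarms.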

5.3.4 Elbow Detection and RUL Prediction

RUL prediction is the most important step of prognostics. Our thesis regarding RUL prediction was that a prognostics system with an efficient elbow point model could perform RUL prediction more accurately. To test this hypothesis, we integrated our best stand-alone elbow point detection model, the LSTM elbow detector, within a prognostics system and compared it against a prognostics system without an elbow detection model. The integration yielded a significant overall performance increase of around 30% in terms of absolute error in RUL estimation on the C-MAPSS dataset. As an example, Fig. 33 compares an engine degradation captured by a prognostics system that takes elbow detection into account and by a system that ignores this perspective. Publication X provides more detail on this experiment.
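The benefit of gating RUL estimation on elbow detection can be illustrated with a noise-free sketch: a naive linear fit over the full history is badly biased by the long nominal segment, while an exponential fit restricted to post-elbow samples recovers the true RUL almost exactly. All parameters are illustrative assumptions, and the elbow index is taken as given (in the thesis it would come from the LSTM detector).

```python
import numpy as np

# Synthetic degradation: flat (nominal) until the elbow, then exponential.
# All parameters below are illustrative assumptions, not thesis values.
elbow, r, threshold = 100, 0.05, 20.0
t = np.arange(150)                      # observed history up to "now"
h = np.where(t >= elbow, np.exp(r * (t - elbow)) - 1.0, 0.0)
true_failure = elbow + np.log(threshold + 1.0) / r   # threshold crossing
true_rul = true_failure - t[-1]

def rul_no_elbow(t, h):
    """Naive estimate: linear fit over the full history, extrapolated."""
    a, b = np.polyfit(t, h, 1)
    return (threshold - b) / a - t[-1]

def rul_with_elbow(t, h, elbow_idx):
    """Gated estimate: exponential fit restricted to post-elbow samples."""
    tt, hh = t[elbow_idx:], h[elbow_idx:]
    rate, _ = np.polyfit(tt - tt[0], np.log(hh + 1.0), 1)
    return tt[0] + np.log(threshold + 1.0) / rate - t[-1]

naive = rul_no_elbow(t, h)
gated = rul_with_elbow(t, h, elbow)
# The gated estimate recovers the true RUL almost exactly here, while the
# naive fit is badly biased by the long nominal segment.
```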


6 CONCLUSION

6.1 The Importance of Case Studies to Prognostics
6.2 Technical Dimension
6.3 Economic Dimension
6.4 Social Dimension

The value of predictive maintenance is well known to the major players in the aeronautics industry. However, the challenges of big data in commercial aviation are still daunting. Leveraging the opportunities offered by big data and real-time analytics is one of the most important topics within the industry. Increasingly, airlines, manufacturers, and maintenance providers are investing in predictive techniques to improve their maintenance practices and adopt advanced prognostics and health management technology. The goal is to minimize costs, convert unscheduled maintenance into expected events, and reduce asset downtime. Here, machine learning methods are emerging for classification and prediction purposes. These methods learn computational models that reason about the behavior of the equipment without relying on hard-coded rules or explicitly programmed instructions, opening space for new and promising developments. In this thesis, it is the hope of the author to have contributed to a broader understanding of machine learning and deep learning methods in the field of prognostics and predictive maintenance. This chapter discusses the major difficulties faced during this process and the importance of the contribution. It starts, in Section 6.1, by analyzing the importance of case studies for prognostics. It then delves into the technical, economic, and social dimensions of the prognostics problem in Sections 6.2 to 6.4.


6.1 the importance of case studies to prognostics

Case studies have an important place in prognostics research. They enable the study of how a certain phenomenon is influenced by the context it is embedded in, and they provide detailed information about a particular scenario, enabling an in-depth analysis of a specific situation. The purpose of using case studies in prognostics is to research the uniqueness of a real-world industrial case. The goal is to understand the peculiarities of the case in order to extrapolate conclusions to other situations. Case studies are particular, but they can be helpful in showing relationships between the studied variables and models. Even though they tend to be highly context-specific, case studies can serve as a basis for analytic generalization, meaning some of their findings can be extrapolated to a broader context. Also, cases of real damage or failure are important, as their results often stimulate further research and investment.

In aeronautics, case studies can be difficult to carry out and analyze. Even though case studies are the way of bringing reality into prognostics practice, it is not always easy to investigate the failure and damage behavior of certain aeronautical components such as engines and other critical parts. Access to field data is usually a significant barrier to research development. Data can be protected by confidentiality agreements as well as privacy laws. Also, run-to-failure data are often difficult to come by, making the analysis of actual failures a challenge. Despite these limitations, case studies are of paramount importance to validate and evaluate the research done in prognostics, as they are the closest to reality. A brief discussion of the results of each case study and their socio-economic and technical implications is provided in this chapter.

6.2 technical dimension

There are several technical challenges in developing prognostics models for maintenance. Data-driven methods rely heavily on large amounts of data describing equipment degradation progression. This requirement is a major challenge, since it is often difficult to obtain data of failure progression. Critical systems such as commercial aircraft are not allowed to run until failure for obvious safety reasons. It is possible to experiment in the laboratory, but not in the field under real-world conditions. More often than not, it is only possible to have access to maintenance data, which result from expert yet potentially inaccurate recommendations. Since the quality of the data greatly determines the outcome of data-driven methods, having data that depend on human judgment complicates the analysis considerably. To start,


Figure 34 – Sample Removal # 542 (Deep Learning).

there may be maintenance actions that are registered with the wrong dates. For instance, consider Fig. 34, which shows the predictions for one removal sample of the Case-EBV. Here, four days before the removal, the model DSRNN (dark blue dots) outputs some deviant predictions. These predictions do not necessarily mean that the model is incorrect; they may instead indicate that a removal happened four days before the recorded date. Besides incorrect recording, there may also be maintenance actions that occur but are not recorded. For instance, in the Case-EBV there are several actions that can be carried out on the valve that do not correspond to a valve removal. As an electro-mechanical device, the valve may be reset electronically, which can influence both the fault events and the health monitoring events of the system. To illustrate this situation, please consider Fig. 35. Here, the several models alert to a possible failure around 70 days before the actual removal. Hypothetically, the models may be correct: an intervention such as a reset may have happened at that point, after which the system was ready to operate again. All these predicaments complicate the problem and should be accounted for technically.

6.3 economic dimension

Predictive maintenance brings the promise of increasing system availability and safety while reducing costs. This follows from better product quality, reduced maintenance backlog and enhanced logistics. While the benefits of predictive maintenance are difficult to quantify, it is clear that this practice is advantageous both from a maintenance viewpoint and from the broader perspective of the entire supply chain. For example, in the aeronautical industry, the largest possible


Figure 35 – Sample Removal # 444 (Machine Learning and Kalman Filtering).

saving comes from the possible avoidance of an aircraft accident. Just one avoided accident can pay for the cost of the prognostics system. Moreover, it is not possible to put a value on preventing “critical failures”, since such events involve human life. Prognostics systems are growing in popularity mostly due to the reduction of costs and the increase in revenue that they can bring. However, these systems can be subject to statistical error. Errors can be of two kinds: underestimation or overestimation of the actual failure time. If undetected, these errors can lead to unnecessary maintenance actions, in the case of underestimation, or ultimately to system failure, in the case of overestimation. For instance, a comparative study [10] found that predictive maintenance can improve system performance by as much as 10 to 15% beyond what is achieved using preventive maintenance. However, the results also suggest that moderate statistical error can render predictive maintenance inferior to preventive maintenance, and that severe statistical error can render it inferior to run-to-failure. It is therefore important to note that prognostics systems can indeed bring economic benefits, but only if they attain considerable levels of accuracy.

6.4 social dimension

Despite progress in data gathering, full utilization of the capabilities provided by data is still incipient. This is particularly true in aeronautics, where existing prognostics systems are far from achieving the degree of accuracy and robustness required for full automation. The human factor is of central importance here. Without the help of specialized labor, it would not be possible to


efficiently track problems in critical systems such as the air management system or the aircraft engine. It is foreseen that the human factor will continue to play a major role even as prognostics systems improve. The years of experience and the creativity of the human mind will be difficult to replace with a fully automated system. Instead, it is advisable to combine the two sources of knowledge to produce more reliable and accurate maintenance programs. The vision is to develop prognostics systems that bring increased productivity to the human element.

With automation comes a series of advantages and disadvantages. If a prognostics system can reduce unnecessary workload and bring relief from small human errors, it can also create automation-induced failures, produce false alarms and, perhaps more importantly, increase the mental workload necessary to perform the additional monitoring of the system. To achieve the correct balance between automation and the human factor, it is important to ensure that technology assists the expert naturally while letting him/her remain the final authority in determining failure. The operator should be able to override the system's recommendations and use his/her own skills and experience when necessary.

The difficult balance between human intervention and automation is more easily attained if the systems are equipped with models that are easy to interpret and understand. In this thesis, special attention was given to this dimension of the problem. Adding a sense of temporal order to the estimates was a driving goal of this work. Here, both the Kalman filter solution and the deep learning methods showed positive results. It is hoped that this small contribution can help promote a more widespread awareness of this issue. Often, the focus is on the accuracy of the models, but their interpretability is of central importance too, especially in this context.
Dealing with datasets such as the ones from Case-EBV and Case-Engine means dealing with data where the “ground truth” is a blurry concept. In machine learning, it is critical to identify precisely what we aim to predict. When dealing with a record of maintenance actions, it is important to understand that what we are predicting is actually the human decision for a maintenance event, and not failure itself. A multitude of factors plays a role in the decision of the operator, some of which may be impossible to predict given the information provided. For instance, logistics and economic imperatives may prevent or cause a maintenance action without a clear rationale behind it. Some actions may be simply impossible to predict.


Just as there can be maintenance actions that are unpredictable, there may also be maintenance actions that happened but were not registered. For example, in the Case-EBV, the operator has the option of resetting the EBV valve. Even though this is not a maintenance action in the strict sense, it is an action upon the valve that is not registered in the maintenance logs and that can have a significant impact on the system. It is advisable that all actions upon the equipment, even apparently unimportant ones such as a reset, are registered and used in future prognostics models.


BIBLIOGRAPHY


[1] Robert B Abernethy, JE Breneman, CH Medlin, and Glenn L Reinman. Weibull analysis handbook. Technical report, Pratt and Whitney, 1983.
[2] David W Aha, Dennis Kibler, and Marc K Albert. Instance-based learning algorithms. Machine Learning, 6(1):37–66, 1991.
[3] Jeffrey B Armstrong and Donald L Simon. Implementation of an integrated onboard aircraft engine diagnostic architecture. National Aeronautics and Space Administration, Glenn Research Center, 2012.
[4] Piero Baraldi, Francesca Mangili, and Enrico Zio. A Kalman filter-based ensemble approach with application to turbine creep prognostics. IEEE Transactions on Reliability, 61(4):966–977, 2012.
[5] Pundarikaksha Baruah and Ratna B Chinnam. HMMs for diagnostics and prognostics in machining processes. International Journal of Production Research, 43(6):1275–1293, 2005.
[6] Leo Breiman. Random forests. UC Berkeley TR567, 1999.
[7] Tom Brotherton, Paul Grabill, Dariusz Wroblewski, Richard Friend, Bill Sotomayer, and John Berry. A testbed for data fusion for engine diagnostics and prognostics. In Aerospace Conference Proceedings, volume 6, pages 6–6. IEEE, 2002.
[8] Carl S Byington and Michael J Roemer. Prognostic enhancements to diagnostic systems for improved condition-based maintenance (military aircraft). In Aerospace Conference Proceedings, volume 6, pages 6–2815. IEEE, 2002.
[9] K Calhoun. Health management at Rolls-Royce. Last accessed 10 Jul 2017.
[10] Mauricio Carrasco and CR Cassady. A study of the impact of prognostic errors on system performance. In Reliability and Maintainability Symposium (RAMS'06), pages 1–6. IEEE, 2006.
[11] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.


[12] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
[13] National Research Council et al. The Airliner Cabin Environment: Air Quality and Safety. National Academies Press, 1986.
[14] Matthew Daigle and Chetan S Kulkarni. End-of-discharge and end-of-life prediction in lithium-ion batteries with electrochemistry-based aging models. 2016.
[15] Matthew J Daigle and Kai Goebel. A model-based prognostics approach applied to pneumatic valves. International Journal of Prognostics and Health Management, 2(2):84–99, 2011.
[16] Olivier L De Weck, Daniel Roos, and Christopher L Magee. Engineering Systems: Meeting Human Needs in a Complex Technological World. MIT Press, 2011.
[17] Daniel P Dennies. The organization of a failure investigation. Journal of Failure Analysis and Prevention, 2(3):11–16, 2002.
[18] Harris Drucker, Christopher JC Burges, Linda Kaufman, Alex J Smola, and Vladimir Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems, pages 155–161, 1997.
[19] Ömer Faruk Eker, Fatih Camci, and Ian K Jennions. Major challenges in prognostics: study on benchmarking prognostic datasets. 2012.
[20] Hatem M Elattar, Hamdy K Elminir, and AM Riad. Prognostics: a literature review. Complex & Intelligent Systems, 2(2):125–154, 2016.
[21] S Flarity. PHM experience at UTC and Pratt & Whitney: Challenges and opportunities. In PHM Society Conference, San Diego, CA, USA, volume 1, 2009.
[22] Dean K Frederick, Jonathan A DeCastro, and Jonathan S Litt. User's guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). 2007.
[23] Fausto Pedro García, Felix Schmid, Javier Conde Collado, et al. A reliability centered approach to remote condition monitoring. A railway points case study. Reliability Engineering & System Safety, 80(1):33–40, 2003.
[24] Jeremiah Gertler. F-35 Joint Strike Fighter (JSF) program: Background and issues for Congress. Library of Congress, Congressional Research Service, Washington DC, 2009.


[25] K. Goebel, M. Daigle, A. Saxena, S. Sankararaman, I. Roychoudhury, and J. Celaya. Prognostics: The Science of Making Predictions. 2017.
[26] Kai Goebel, Bhaskar Saha, and Abhinav Saxena. A comparison of three data-driven techniques for prognostics. In 62nd Meeting of the Society for Machinery Failure Prevention Technology (MFPT), pages 119–131, 2008.
[27] Vinod Goel, Brian Gold, Shitij Kapur, and Sylvain Houle. The seats of reason? An imaging study of deductive and inductive reasoning. NeuroReport, 8(5):1305–1310, 1997.
[28] KM Goh, Benny Tjahjono, Tim Baines, and S Subramaniam. A review of research in manufacturing prognostics. In International Conference on Industrial Informatics, pages 417–422. IEEE, 2006.
[29] Jayant Sen Gupta, Christian Trinquier, Kamal Medjaher, and Noureddine Zerhouni. Continuous validation of the PHM function in aircraft industry. In First International Conference on Reliability Systems Engineering (ICRSE), pages 1–7. IEEE, 2015.
[30] Simon Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, 1994.
[31] Aiwina Heng, Sheng Zhang, Andy CC Tan, and Joseph Mathew. Rotating machinery prognostics: State of the art, challenges and opportunities. Mechanical Systems and Signal Processing, 23(3):724–739, 2009.
[32] Andrew Hess, Giulio Calvello, Peter Frith, SJ Engel, and David Hoitsma. Challenges, issues, and lessons learned chasing the “big P”: real predictive prognostics. In Aerospace Conference, 2006 IEEE, pages 1–19. IEEE, 2006.
[33] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München, 1991.
[34] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[35] Chao Hu, Byeng D Youn, Pingfeng Wang, and Joung Taek Yoon. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life. Reliability Engineering & System Safety, 103:120–135, 2012.
[36] ISO. Condition monitoring and diagnostics of machines — vocabulary. International Standard ISO 13372:2012, 2012.


[37] Keith M Janasak and Raymond R Beshears. Diagnostics to prognostics - a product availability technology evolution. In Reliability and Maintainability Symposium (RAMS'07), pages 113–118. IEEE, 2007.
[38] Andrew KS Jardine, Daming Lin, and Dragan Banjevic. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20(7):1483–1510, 2006.
[39] Kamran Javed, Rafael Gouriveau, and Noureddine Zerhouni. State of the art and taxonomy of prognostics approaches, trends of prognostics applications and open issues towards maturity at different technology readiness levels. Mechanical Systems and Signal Processing, 94:214–236, 2017.
[40] Link C Jaw. Recent advancements in aircraft engine health management (EHM) technologies and recommendations for the next step. ASME Paper No. GT2005-68625, 2005.
[41] Saint John Walker. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Taylor & Francis, 2014.
[42] Marine Jouin, Rafael Gouriveau, Daniel Hissel, Marie-Cécile Péra, and Noureddine Zerhouni. Particle filter-based prognostics: Review, discussion and perspectives. Mechanical Systems and Signal Processing, 72:2–31, 2016.
[43] Simon J Julier and Jeffrey K Uhlmann. Unscented filtering and nonlinear estimation. Proceedings of the IEEE, 92(3):401–422, 2004.
[44] Rudolph Emil Kalman et al. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
[45] Man Shan Kan, Andy CC Tan, and Joseph Mathew. A review on prognostic techniques for non-stationary and non-linear rotating systems. Mechanical Systems and Signal Processing, 62:1–20, 2015.
[46] Srinivas Katipamula and Michael R Brambley. Methods for fault detection, diagnostics, and prognostics for building systems — a review, part I. HVAC&R Research, 11(1):3–25, 2005.
[47] Jack PC Kleijnen. Statistical validation of simulation models. European Journal of Operational Research, 87(1):21–34, 1995.
[48] Jerald F Lawless. Statistical Models and Methods for Lifetime Data, volume 362. John Wiley & Sons, 2011.
[49] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.


[50] Jay Lee, Behrad Bagheri, and Hung-An Kao. A cyber-physical systems architecture for Industry 4.0-based manufacturing systems. Manufacturing Letters, 3:18–23, 2015.
[51] Jay Lee, Jun Ni, Dragan Djurdjanovic, Hai Qiu, and Haitao Liao. Intelligent prognostics tools and e-maintenance. Computers in Industry, 57(6):476–489, 2006.
[52] Jay Lee, Fangji Wu, Wenyu Zhao, Masoud Ghaffari, Linxia Liao, and David Siegel. Prognostics and health management design for rotary machinery systems — reviews, methodology and applications. Mechanical Systems and Signal Processing, 42(1):314–334, 2014.
[53] Linxia Liao and Felix Köttig. Review of hybrid prognostics approaches for remaining useful life prediction of engineered systems, and an application to battery life prediction. IEEE Transactions on Reliability, 63(1):191–207, 2014.
[54] Feng Lu, Yiqiu Lv, Jinquan Huang, and Xiaojie Qiu. A model-based approach for gas turbine engine performance optimal estimation. Asian Journal of Control, 15(6):1794–1808, 2013.
[55] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2):865–873, 2015.
[56] Joao Pedro Pinheiro Malere and Luis Gustavo dos Santos. Challenges for costs and benefits evaluation of IVHM systems. SAE International Journal of Aerospace, 6(2013-01-2183):484–491, 2013.
[57] DQ Mayne. Optimal non-stationary estimation of the parameters of a linear system with Gaussian inputs. International Journal of Electronics, 14(1):101–112, 1963.
[58] William Q Meeker and Yili Hong. Reliability meets big data: opportunities and challenges. Quality Engineering, 26(1):102–116, 2014.
[59] T Mooney. Health management for in-service gas turbine engines, 27 Sept–1 Oct 2009.
[60] NASA. Reliability Centered Maintenance Guide for Facilities and Collateral Equipment, 2000.
[61] F Stanley Nowlan and Howard F Heap. Reliability-centered maintenance. Technical report, DTIC Document, 1978.


[62] Marcos Orchard, Gregory Kacprzynski, Kai Goebel, Bhaskar Saha, and George Vachtsevanos. Advances in uncertainty representation and management for particle filtering applied to prognostics. In International Conference on Prognostics and Health Management (PHM 2008), pages 1–6. IEEE, 2008.
[63] D Stewart Peck. Comprehensive model for humidity testing correlation. In 24th Annual Reliability Physics Symposium, pages 44–50. IEEE, 1986.
[64] Leto Peel. Data driven prognostics using a Kalman filter ensemble of neural network models. In International Conference on Prognostics and Health Management (PHM 2008), pages 1–6. IEEE, 2008.
[65] W Pujadas and F Frank Chen. A reliability centered maintenance strategy for a discrete part manufacturing facility. Computers & Industrial Engineering, 31(1-2):241–244, 1996.
[66] Emmanuel Ramasso and Abhinav Saxena. Review and analysis of algorithmic approaches developed for prognostics on CMAPSS dataset. In Annual Conference of the Prognostics and Health Management Society, 2014.
[67] Ramesh Rebba, Shuping Huang, Yongming Liu, and Sankaran Mahadevan. Statistical validation of simulation models. International Journal of Materials and Product Technology, 25(1-3):164–181, 2006.
[68] Ramesh Rebba, Sankaran Mahadevan, and Shuping Huang. Validation and error estimation of computational models. Reliability Engineering & System Safety, 91(10):1390–1397, 2006.
[69] Murray Rosenblatt et al. Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3):832–837, 1956.
[70] Bhaskar Saha, Kai Goebel, Scott Poll, and Jon Christophersen. Prognostics methods for battery health monitoring using a Bayesian framework. IEEE Transactions on Instrumentation and Measurement, 58(2):291–296, 2009.
[71] Robert G Sargent. Validation and verification of simulation models. In Proceedings of the 36th Conference on Winter Simulation, pages 17–28. Winter Simulation Conference, 2004.
[72] Robert G Sargent. Verification and validation of simulation models. In Proceedings of the 37th Conference on Winter Simulation, pages 130–143. Winter Simulation Conference, 2005.
[73] Robert G Sargent. Verification and validation of simulation models. Journal of Simulation, 7(1):12–24, 2013.


[74] Abhinav Saxena and Kai Goebel. C-MAPSS data set. NASA Ames Prognostics Data Repository, 2008.
[75] Abhinav Saxena, Indranil Roychoudhury, Jose R Celaya, Sankalita Saha, Bhaskar Saha, and Kai Goebel. Requirements specifications for prognostics: An overview. American Institute of Aeronautics and Astronautics, 2010.
[76] Mark Schwabacher. A survey of data-driven prognostics. In Infotech@Aerospace, page 7002, 2005.
[77] Mark Schwabacher and Kai Goebel. A survey of artificial intelligence for prognostics. In AAAI Fall Symposium, pages 107–114, 2007.
[78] Mohak Shah. Big data and the internet of things. In Big Data Analysis: New Algorithms for a New Society, pages 207–237. Springer, 2016.
[79] JZ Sikorska, Melinda Hodkiewicz, and Lin Ma. Prognostic modelling options for remaining useful life estimation by industry. Mechanical Systems and Signal Processing, 25(5):1803–1836, 2011.
[80] Anthony M Smith. Reliability-Centered Maintenance. McGraw-Hill, New York, 1993.
[81] Kandler A Smith, Christopher D Rahn, and Chao-Yang Wang. Model-based electrochemical estimation and constraint management for pulse operation of lithium ion batteries. IEEE Transactions on Control Systems Technology, 18(3):654–663, 2010.
[82] Abdenour Soualhi, Kamal Medjaher, and Noureddine Zerhouni. Bearing health monitoring based on Hilbert–Huang transform, support vector machine, and regression. IEEE Transactions on Instrumentation and Measurement, 64(1):52–62, 2015.
[83] Matt Spencer, Jesse Eickholt, and Jianlin Cheng. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1):103–112, 2015.
[84] Bo Sun, Shengkui Zeng, Rui Kang, and Michael G Pecht. Benefits and challenges of system prognostics. IEEE Transactions on Reliability, 61(2):323–335, 2012.
[85] Rocco Tarantino, Ferenc Szigeti, and Eliezer Colina-Morles. Generalized Luenberger observer-based fault-detection filter design: an industrial application. Control Engineering Practice, 8(6):665–671, 2000.


[86] Tiedo Tinga. Application of physical failure models to enable usage and load based maintenance. Reliability Engineering & System Safety, 95(10):1061–1075, 2010.
[87] Kwok L Tsui, Nan Chen, Qiang Zhou, Yizhen Hai, and Wenbin Wang. Prognostics and health management: A review on data driven approaches. Mathematical Problems in Engineering, 2015, 2015.
[88] Serdar Uckun, Kai Goebel, and Peter JF Lucas. Standardizing research methods for prognostics. In International Conference on Prognostics and Health Management, pages 1–10. IEEE, 2008.
[89] Rudolph Van Der Merwe, Arnaud Doucet, Nando De Freitas, and Eric A Wan. The unscented particle filter. In Advances in Neural Information Processing Systems, pages 584–590, 2001.
[90] E Vandervieren and Mia Hubert. An adjusted boxplot for skewed distributions. Proceedings in Computational Statistics, pages 1933–1940, 2004.
[91] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[92] Nikhil M Vichare and Michael G Pecht. Prognostics and health management of electronics. IEEE Transactions on Components and Packaging Technologies, 29(1):222–229, 2006.
[93] Xiaoli Xu, Tao Chen, and Mamoru Minami. Intelligent fault prediction system based on internet of things. Computers & Mathematics with Applications, 64(5):833–839, 2012.
[94] Zeqi Zhao, Bin Liang, Xueqian Wang, and Weining Lu. Remaining useful life prediction of aircraft engine based on degradation pattern learning. Reliability Engineering & System Safety, 164:74–83, 2017.
[95] Enrico Zio and Giovanni Peloni. Particle filtering prognostic estimation of the remaining useful life of nonlinear components. Reliability Engineering & System Safety, 96(3):403–409, 2011.


PUBLICATION I

Baptista, M., de Medeiros, I. P., Malere, J. P., Prendinger, H., Nascimento Jr, C. L., & Henriques, E. A Comparison of Data-driven Techniques for Engine Bleed Valve Prognostics using Aircraft-derived Fault Messages. Prognostics and Health Management Society, 2016


A Comparison of Data-driven Techniques for Engine Bleed Valve Prognostics using Aircraft-derived Fault Messages

Marcia Baptista (1), Ivo P. de Medeiros (2), Joao P. Malere (3), Helmut Prendinger (4), Cairo L. Nascimento Jr. (5), and Elsa Henriques (6)

(1, 6) Instituto Superior Tecnico, Universidade de Lisboa, Lisbon, 1049-001, Portugal
[email protected], [email protected]

(2, 3) Technol. Dev. Dept., Embraer SA, Sao Jose dos Campos, Brazil
[email protected], [email protected]

(4) National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
[email protected]

(5) Instituto Tecnologico de Aeronautica (ITA), 12228-900, Sao Jose dos Campos-SP, Brazil
[email protected]

ABSTRACT

Prognostics plays an increasingly important role in preventive maintenance and aircraft safety. An approach that has recently become popular in this field is the data-driven technique. This approach uses past data and advanced statistics to derive estimates of the reliability of an equipment without relying on any physics or engineering principle. Data-driven models have been based on two types of historical data: past failure times and health monitoring data. A kind of health monitoring data rarely used in data-driven models is aircraft-derived maintenance messages. These data consist of fault messages derived by the aircraft onboard systems to notify of any unexpected events or abnormal behavior, as well as to send warning signals of equipment degradation. Fault messages have not received much attention in aircraft prognostics, mostly due to their asynchronous and qualitative nature, which often causes difficulties of interpretation. The main goal of this paper is to show that data-driven models based on fault messages can provide better prognostics than traditional prognostics based on past failure times. We illustrate this comparison in an industrial case study involving a critical component of the engine bleed system. The novelty of our work is the combination of new predictors related to fault messages, and the comparison of data-driven methods such as neural networks and decision trees. Our experimental results show significant performance gains compared to the baseline approach.

Marcia Baptista et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. INTRODUCTION

Aircraft maintenance and repair operations, including unscheduled maintenance, account for 10-20% of the direct operating costs of an airline (Knotts, 1999). As a business with extremely thin margins (recorded in 2014 at 2.2%, IATA 2015 Annual Review), pressure to reduce costs while increasing service quality drives the need for better maintenance planning in the airline industry. An engineering discipline claimed to be able to reduce maintenance costs by 25% (Camci, 2005) is failure prognostics. This discipline attempts to identify the best timing for a maintenance action by predicting when the health condition of an equipment evolves beyond an acceptable threshold (Coble & Hines, 2011). Through this active role in preventing system failure, prognostics can decisively improve airplane reliability and lengthen maintenance check intervals in aviation.

A technique that has recently become popular in prognostics is the data-driven technique (Si, Wang, Hu, & Zhou, 2011; Schwabacher & Goebel, 2007). This approach is based on the assumption that the behaviors of a complex system cannot be fully grasped by a physically based model. Instead, advanced statistics and machine learning methods are used to learn a model directly from a set of data that is representative of all the behaviors found in the system (Schwabacher & Goebel, 2007). Data-driven prognostics has been based on two types of data: survival (time to event) data and health monitoring data (Si et al., 2011). Survival data usually consist of failure or replacement times (Moreira & Nascimento Jr., 2012). Health monitoring (HM) data consist of any data related to the estimation of the equipment's current degradation. This definition includes aircraft sensory signals as well as inferred degradation signals.

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2016

Figure 1. Fault messages registered between 01/01/11 10:42 and 29/12/11 22:33 for a commercial aircraft.

A kind of health monitoring data rarely studied in aircraft prognostics are aircraft-derived fault messages. These messages consist of early warnings derived asynchronously by the aircraft onboard systems, whose primary function is to signal a fault, that is, a deviation from standard operation (Isermann & Balle, 1997). Fault messages are derived from processing equations that range from simple constructions, such as when a sensory signal (e.g., temperature) registers a sudden spike or exceeds a predetermined threshold, to more elaborate combinations of sensory signals. Figure 1 shows an example file describing the fault messages of a commercial jet. In the file, each row represents a fault message that is characterized by a processing date and a fault class description (numeric code).

Research on the general field of fault-based prognostics is incipient for two main reasons. First, fault messages have been used mostly for diagnostics and fault analysis purposes (Strong, 2014). Second, these data are qualitative by nature and thus harder to analyze than quantitative data. The interpretation of a fault message is usually a nontrivial process, influenced by all the elements involved in its creation.

In this paper we investigate how fault messages can enhance prognostics in aviation. In particular, we aim to show that fault-based data-driven models can provide better performance than the simplest kind of prognostics, i.e., prognostics based on survival data. This comparison is illustrated in an industrial case study involving the maintenance life cycle of a critical component of the jet engine. The novelty of our work is the use of new predictors related to the fault messages of the aircraft maintenance system. Our proposed data-driven models combine fault data and a wide range of machine learning techniques. We study linear models, support vector machines, Bayesian models, instance-based learning, and decision trees. Please note that most of these techniques have already been studied in prognostics (Si et al., 2011), but not with fault data.

This paper is organized as follows. Section 2 provides an overview of related work. Section 3 describes our data set and methodology. Topics such as the algorithms used and the data processing steps are discussed. Section 4 presents our research hypothesis and results. A discussion of the results is presented in Section 5. Finally, Section 6 summarizes the major findings of this study and discusses future research work.

2. RELATED WORK

The first developments in aviation of prognostics and health management (PHM) technologies date back to the Joint Strike Fighter (JSF) program launched in 1993. This program, costing nearly $1 trillion over the course of its lifetime, was able to design fighter jets with unprecedented processing and reasoning capabilities that relied on advanced sensor technology (Steidle, 1997). Despite the initial efforts of the JSF program, only in recent years have the most advanced prognostics tools and methods been developed and implemented for the purposes of commercial aircraft maintenance (Heng, Zhang, Tan, & Mathew, 2009).

Several approaches to prognostics have been proposed, ranging from low-fidelity models, such as historical failure rate models, to high-fidelity physics-based models (Byington & Roemer, 2002). Life usage models, or failure rate models, are often considered the simplest and most widely used form of prognostics (Schwabacher & Goebel, 2007). This kind of model is useful when a physical model of the component is unfeasible and there is insufficient sensor data to assess the equipment condition. Life usage models rely on the survival times (time to event) of a large sample of components to predict the remaining time to a repair or failure of an individual component. Here, the variable of interest is the equipment operational life (Frangopol, Kallen, & Van Noortwijk, 2004); predictions are based on the passage of time and/or measures of usage, such as an airplane's number of flights (i.e., cycles).
Rausand and Høyland (2004) propose different statistical distributions to model life usage, such as the exponential, Weibull, normal, log-normal, logistic, log-logistic and Gamma distributions. In prognostics, the exponential and Weibull distributions are the most commonly used (Abernethy, 1996). The exponential distribution is simple and easy to apply, and the Weibull distribution has the ability to adjust to different reliability stages, namely the infant, mature and wear-out phases.

Another approach to prognostics is model-based algorithms. Here, "model-based" means a (more or less) hand-coded representation of human knowledge about the system (Schwabacher & Goebel, 2007). Traditional model-based techniques include state-space models (Isermann, 2006) and dynamic ordinary or partial differential equations (Vachtsevanos, Lewis, Roemer, Hess, & Wu, 2006). The most sophisticated model-based techniques include rule-based expert systems such as SHINE (James & Atkinson, 1990) and Gensym G2 (GensymWebsite, 2007). Other examples of model-based techniques are finite-state machines (Williams & Nayak, 1996; Kurien & Nayak, 2000) and qualitative reasoning (Weld & Kleer, 1989).

Unlike electronic or electrical systems, mechanical equipment typically fails slowly as structural failures progress to a critical level. Monitoring these trends provides an opportunity to assess degradation and better estimate the remaining useful life of the equipment over a period of time. Here, the typical scenario is a slow but progressive change to a major structural failure, mostly due to fatigue effects. Repetitive stresses induced by factors such as vibration loads (high cycle fatigue) or temperature cycles (low cycle fatigue) frequently form the basis for the equipment damage accumulation. This complex combination of variables often makes it difficult to develop a successful physics-based failure model (Mathur, Cavanaugh, Pattipati, Willett, & Galie, 2001). In such cases, empirical forecasting models, that is, data-driven approaches, are often preferred (Sankavaram et al., 2009).
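A Weibull life-usage model of the kind discussed above can be fitted to a sample of survival times by maximum likelihood. The following is a minimal illustrative sketch, not the implementation used in any of the cited works: it solves the profiled shape equation by bisection using numpy only.

```python
import numpy as np

def fit_weibull_mle(x, k_lo=0.01, k_hi=50.0, tol=1e-8):
    """Fit a two-parameter Weibull (shape k, scale lam) to positive survival
    times by maximum likelihood, solving the profiled shape equation
    sum(x^k log x)/sum(x^k) - 1/k - mean(log x) = 0 by bisection."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)

    def g(k):
        xk = x ** k
        return (xk * logx).sum() / xk.sum() - 1.0 / k - logx.mean()

    lo, hi = k_lo, k_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # g is negative for small k and positive for large k, so the
        # root stays bracketed by the standard sign test.
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    k = 0.5 * (lo + hi)
    lam = (x ** k).mean() ** (1.0 / k)  # MLE scale given the shape
    return k, lam
```

Once fitted, the shape parameter indicates the reliability stage (shape below 1: infant mortality; near 1: mature/random failures; above 1: wear-out), which is the property that makes the Weibull attractive for life-usage prognostics.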
Many data-driven techniques have been used in structural prognostics, from multivariate statistical methods, such as dynamic principal component analysis (PCA), linear and quadratic discriminants and partial least squares (PLS), to black-box methods, such as neural networks, support vector machines, decision tree classifiers, graphical models and fuzzy logic (Pecht, 2008). One of the most popular data-driven approaches to prognostics is to use artificial neural networks to model the system (Goebel, Saha, & Saxena, 2008). The statistical performance of this technique has been shown in a number of real-life applications (Brotherton, Jahns, Jacobs, & Wroblewski, 2000; Wang & Vachtsevanos, 2001). Furthermore, neural nets, such as radial basis function (RBF) networks, have been shown to be capable of "novelty detection" (identifying unexpected events in the face of past history) (Brotherton & Johnson, 2001).

Despite the considerable number of works proposing new data-driven approaches to fault prognostics, comprehensive studies comparing the different approaches are rare. The few studies put forth seem to indicate that performance depends highly on the application domain. For instance, the work of He and Shi (2002) appears to contradict the established idea that neural networks produce the most accurate results. When studying valves in reciprocating pumps, the authors found that support vector machines (SVMs) yielded better accuracy than traditional neural nets in fault detection. These findings, however, were only relative to their specific diagnostics field. Further research could determine whether and when approaches such as SVMs are able to produce more accurate prognostics results than neural nets.

Perhaps one of the most comprehensive comparison studies on data-driven techniques is the work of Loyer, Henriques, and Wiseall (2014). Here, the authors compare a wide range of binary classifiers, from linear regression to ensemble models, to predict the probability of servicing a jet engine component at a major shop visit. The distinct approaches are discussed according to how well they capture different perspectives of the data. The authors consider that tree-based ensemble models (random forests and boosted trees) present a good compromise between performance and interpretability, while neural nets offer the best absolute performance.

In most data-driven models, input comes directly from routinely monitored sensory signals, such as calorimetric, spectrometric and calibration data, or power, vibration, acoustics, temperature, pressure, oil debris, current and voltage data. Despite the importance of these data sources, some authors (Galar, Palo, Van Horenbeek, & Pintelon, 2012; Strong, 2014) have advocated the use of alternative data sources, such as the aircraft onboard control systems. Data derived from onboard control systems consist of fault messages that aim to detect damaged or faulty equipment.
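As a toy illustration of the simplest kind of fault-processing equation mentioned earlier (a sensory signal exceeding a predetermined threshold), the sketch below emits a (timestamp, code) fault message on each new exceedance. The threshold and code values are hypothetical, not taken from any real onboard system.

```python
def fault_messages(times, signal, threshold, code):
    """Toy fault-processing equation: emit one fault message
    (timestamp, code) each time the monitored signal rises above a
    fixed threshold (rising-edge detection, so a sustained exceedance
    produces a single message)."""
    messages = []
    above = False
    for t, v in zip(times, signal):
        if v > threshold and not above:
            messages.append((t, code))  # new exceedance event
        above = v > threshold
    return messages
```

Real processing equations combine several signals and flight conditions, but the output has the same asynchronous, categorical form as in this sketch, which is why fault data require different treatment than raw sensor streams.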
These messages are widely used to define maintenance and repair procedures but have seldom been used in aircraft prognostics. Please note that the term fault, as used here, means an unexpected deviation of at least one characteristic property or parameter of the system from the acceptable, or standard, condition (Isermann & Balle, 1997). This notion differs from that of failure, which is a permanent interruption of the equipment's ability to perform its function under its regular set of operating conditions (Isermann & Balle, 1997).

To isolate the failure of an aircraft equipment and eliminate as many cascading effects as possible, onboard systems use fault processing equations to combine and elaborate on the various sources of flight data. As a result, fault data differ from sensor data in the sense that they are categorical and more prone to subjective interpretation.

One of the few attempts to explore fault data in prognostics is the work of Strong (2014). Here, the author used fault


messages to estimate the remaining time to failure of two industrial devices in a nuclear plant: an actuator and a motor. To construct the prognostic parameters, Strong used simple methods based on averages and counts of the number of fault messages. A merging procedure was used to fuse the fault-based prognostic parameters with sensor data in a general path model (GPM) (Lu & Meeker, 1993). The main result of the study was that the integrated prognostic parameters had significantly higher accuracy than parameters based solely on the information of fault messages.

Despite the importance of Strong's (2014) work, research on this topic could be improved in several ways. First, other modeling techniques besides GPM could be investigated. The GPM technique has been subject to some discussion (Garvey & Hines, 2007; Coble & Hines, 2011) due to its use of a single degradation signal and its assumption of a failure threshold. A more complex combination of predictors could also provide more accurate results. It is along these lines that we present this work.

3. MATERIALS AND METHODS

In this section, we describe our data set and methodology. Section 3.1 presents an exploratory analysis of our data using statistical methods and unsupervised techniques such as K-means clustering. In Section 3.2 we briefly discuss some of the methods and techniques used in our experiments.

3.1. Dataset

The experiments in this study are based on a real-world data set from a major aircraft producer. The data set reports on the 588 removals of a system of two identical bleed valves and on the 700,000 fault messages recorded from 39 commercial jets (two airline companies). The removals were recorded between January 2010 and June 2015, while the messages were collected between October 2011 and November 2014. The two-valve system studied here is treated as a single system prone to failure, that is, as a multi-component system.
The two-valve system analyzed in this study is a critical element of the engine bleed air system: it allows the selection of either the left, the right or both engines as bleed air sources. Figure 2 illustrates, in a simplified way, the studied bleed system. The valves of interest, the engine bleed valves, are located between the compressor section of the engine(s) and the heat exchanger of the bleed system.

Figure 2. General schematics of an aircraft bleed air system.

Engine bleed valves are line-replaceable units (LRUs) requiring frequent removal: they are designed to be removed and replaced quickly at the operating level in order to restore the engine bleed system to an operationally ready condition. These valves are also rotables in the sense that they can be repeatedly and economically restored to a fully serviceable condition. Most often, these valves are replaced by new or repaired inventory items. Less frequently, engine bleed valves are removed, repaired on the fly and put back on the aircraft. This type of replacement accounts for around 20%-30% of all repairs in our data set.

The recording of removal dates is sometimes vulnerable to human mistakes by maintenance staff. Accordingly, we used a cleaning procedure to identify missing removals and other recording errors. In particular, the medcouple outlier method (Hubert & Vandervieren, 2008) was used to identify abnormally long/short times to removal. The results of this method were considered more plausible than the results of the traditional box plot method (Tukey, 1977). While the latter approach detected 49 outliers (time to removal above 277 days), the medcouple detected only 4 outliers (time to removal above 653 days), a more reasonable proportion of 0.68% outliers in the overall set of 588 removals.

In our data set, time to (next) removal is a random variable with a probability density that resembles a Weibull distribution, as shown in Figure 3. The cumulative distribution chart of Figure 4 illustrates how well our empirical data sample fits the theoretical Weibull model.

In addition to removal events, our data set also comprises fault data for the 39 jets. These data consist of all the automatic fault messages exchanged between the aircraft central maintenance computer (CMC) and ground facilities between October 2011 and November 2014. For each fault message, we have the following information: (1) date of message transmission and (2) processing code. These processing codes take 92 distinct numeric values. Please review Figure 1 for an example of fault messages. It is important to note that the analyzed fault messages do not provide direct information on the condition of the engine bleed valves.
Instead, the messages convey information about the overall health of the bleed air system, such as when the system's overall temperature goes beyond a given limit.
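The medcouple-based outlier cleaning referred to above can be sketched as follows. This is a naive O(n^2) re-implementation of the medcouple and of the skew-adjusted fences of Hubert and Vandervieren (2008), not the code used in the study; ties at the median are handled with a simplified kernel value of zero.

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].
    Pairs tied at the median get kernel value 0 (a simplification of
    the exact definition)."""
    x = np.sort(np.asarray(x, dtype=float))
    m = np.median(x)
    lo, hi = x[x <= m], x[x >= m]
    num = (hi[None, :] - m) - (m - lo[:, None])
    den = hi[None, :] - lo[:, None]
    h = np.where(den == 0.0, 0.0, num / np.where(den == 0.0, 1.0, den))
    return float(np.median(h))

def adjusted_fences(x):
    """Skew-adjusted outlier fences (Hubert & Vandervieren, 2008):
    for right-skewed data the upper fence is pushed outward so that
    legitimately long times to removal are not flagged."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr, mc = q3 - q1, medcouple(x)
    if mc >= 0:
        return q1 - 1.5 * np.exp(-4.0 * mc) * iqr, q3 + 1.5 * np.exp(3.0 * mc) * iqr
    return q1 - 1.5 * np.exp(-3.0 * mc) * iqr, q3 + 1.5 * np.exp(4.0 * mc) * iqr
```

On right-skewed data such as Weibull-like times to removal, the adjusted upper fence exceeds Tukey's classic q3 + 1.5 IQR fence, which is exactly why the medcouple method flags far fewer (4 vs. 49) observations as outliers.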


Figure 3. Probability Density Function (PDF) of time to next removal.

Since fault messages do not provide explicit information on the particular degradation of each unit in the bleed system, detecting whether and which assets require repair is a complex process. For instance, consider Figure 5, where we show the arrival of 7 different types of messages for aircraft 24. It shows that the rate of messages (for instance, messages 5290 and 5410) is often a trigger for a removal. However, it is not clear how the total number of messages of a specific kind triggers a removal. Take the example of message 5290: sometimes it triggers a removal at around 20 messages, while at other times this happens at 60 messages. This difference is often related to minimum equipment list (MEL) requirements.

Overall, our data set comprises around 500 thousand and 200 thousand messages for airline 1 and airline 2, respectively. From this set, we ignored messages for which there was no previous or next removal, as it was not possible to calculate accurate cumulative statistics for them. Overall, almost 150 thousand messages (19%) were disregarded in this process. For a graphical representation of this data cleaning process please see the event plot of Figure 6a. The plot illustrates the timeline of each of the 39 jets along the x-axis. Here, removals are marked as black circles, messages as black lines, and ignored removals and messages are marked in red.

Package NbClust (Charrad, Ghazzali, Boiteau, & Niknafs, 2012) combined with K-means detected two clusters of messages according to time to next message, as shown in Fig. 6b. In the first group are messages spaced less than 1.4 hours from the next message; these messages are exchanged during flight. Messages spaced more than 1.4 hours from their next message are rare (15%) and may be evidence of longer aircraft stops or an absence of warnings.
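The clustering step was performed in R (NbClust combined with K-means); an equivalent one-dimensional K-means can be sketched in plain numpy as below. The gap values in the usage example are illustrative, not from the actual data set.

```python
import numpy as np

def kmeans_1d(x, k=2, iters=100, seed=0):
    """Plain 1-D K-means (Lloyd's algorithm). Returns cluster centers
    in increasing order and the matching label for every point."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    order = np.argsort(centers)
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return centers[order], np.argsort(order)[labels]
```

Applied to inter-message times, the lower center corresponds to in-flight message bursts and the upper center to the rarer long gaps (aircraft stops or absence of warnings).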

Figure 4. Cumulative Distribution Function (CDF) of time to next removal.

Regarding the cumulative sum of all messages, Fig. 6c shows that it is not clear that a threshold for a removal exists. The high dispersion (standard deviation of 2198) around the expected value of the number of messages at removal (2016) seems to indicate that this factor may not be a good sole predictor of a removal. The same seems to be true for the number of messages of a given code at removal.

3.2. Methodology

In this section we describe the methodology followed to investigate the hypothesis of our study: predictive models based on fault data outperform traditional prognostics models based on survival analysis. Our methodology consisted in comparing two prognostics approaches: (a) life usage models based on time-to-removal data and (b) data-driven models based on fault messages. The first approach is the traditional approach used in maintenance, where maintenance decisions (e.g., preventive hard-time intervals) are determined based on statistical failure time analyses (Ahmad & Kamaruddin, 2012). The second (data-driven) approach combines a sophisticated type of health monitoring data, the fault messages derived in real time from the aircraft onboard systems, with advanced techniques from machine learning and artificial intelligence. The main goal here is to improve the accuracy and online predictive power of condition-based prognostics models in aeronautics.

In this study the target variable was the remaining time to a removal. Model accuracy was evaluated and compared in terms of mean absolute error (MAE), root mean squared error (RMSE),


and mean bias error (average residuals, ME). Table 1 details the metrics evaluated for both models.

Figure 5. Timeline of aircraft 24.

Table 1. Performance metrics.

Metric                    Abbr.   Formula
Mean error                ME      (1/N) * sum_{i=1}^{N} (T^_i - T_i)
Root mean squared error   RMSE    sqrt( (1/N) * sum_{i=1}^{N} (T^_i - T_i)^2 )
Mean absolute error       MAE     (1/N) * sum_{i=1}^{N} |T^_i - T_i|

Note: N stands for the number of observations. For each observation i, the model (either life usage or data-driven) predicts T^_i for the observed value T_i. Here, the variable T means remaining time to a valve removal.
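The three metrics of Table 1 take only a few lines to compute; a minimal numpy sketch, with residuals defined as predicted minus observed:

```python
import numpy as np

def prognostic_errors(t_observed, t_predicted):
    """ME, MAE and RMSE between observed and predicted remaining
    times to removal (same units as the inputs, here days)."""
    r = np.asarray(t_predicted, dtype=float) - np.asarray(t_observed, dtype=float)
    return {"ME": r.mean(),              # average bias (signed)
            "MAE": np.abs(r).mean(),     # typical absolute deviation
            "RMSE": np.sqrt((r ** 2).mean())}  # penalizes large errors more
```

ME exposes systematic over- or under-prediction that MAE and RMSE hide, which is why all three are reported for both the life usage and the data-driven models.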

The life usage approach consisted in applying the Weibull-Pareto distribution to our data set of removal times. To evaluate our Weibull analysis, 10-fold cross-validation was performed. In this procedure, each test fold was compared against a set of removal events generated from a Weibull model fit to the training data.

The data-driven approach was based on removal times and fault messages. Here, we developed distinct models using state-of-the-art data-driven techniques. In particular, we applied five of what have been considered the top 10 algorithms in data mining (Wu et al., 2008): k-nearest neighbor regression (KNN), regression trees (the RPART R package), linear support vector machines (SVM), Bayesian generalized linear models (Bayes), and gradient boosting with regression trees (Boost). We also applied linear regression (LM) and neural networks (NN).



Figure 6. Event plot of the data set (ignored messages and removals marked in red) and kernel density plots of time between messages and number of messages at removal.

The data frame of the data-driven models consisted of 520 thousand observations. Each observation described the arrival of a maintenance message and was characterized by 8 attributes: (1) past mean time to removal, (2) past variation of time to removal, (3) time since last message, (4) time since message of same code, (5) number of messages since removal, (6) number of messages of same code since removal, (7) previous time to removal, and (8) variation between the last two times to removal (drift). We found that two of these attributes, attributes 7 and 8, were highly correlated with each other, with an absolute correlation higher than 0.75. Accordingly, we considered as our predictors all features except attribute 7. For a more detailed description of our correlation analysis please refer to Table 2, where the matrix of Pearson's r correlation coefficients for all possible pairs of attributes is presented.

Since the maintenance message data consisted of time-wise dependent data, we could not use the classical 10-fold cross-validation scheme to evaluate the data-driven models (Arlot & Celisse, 2010, pp. 65-66). Instead, a stratified cross-validation scheme was used. In the devised method, all the observations corresponding to messages within the same removal interval were placed in a single fold. This way, it was ensured that each training set contained information that occurred only after the testing sets.
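The stratified scheme can be sketched with a hypothetical helper that assigns whole removal intervals (group ids) to folds, so that no interval straddles training and testing; this is a simplified re-creation, not the study's actual code.

```python
import numpy as np

def grouped_folds(groups, n_folds=10, seed=0):
    """Cross-validation folds in which every observation sharing a
    removal interval (group id) lands in the same fold, preventing
    leakage between training and testing of time-dependent message
    data. Returns a list of (train_idx, test_idx) pairs."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    fold = np.array([fold_of_group[g] for g in groups])
    return [(np.flatnonzero(fold != f), np.flatnonzero(fold == f))
            for f in range(n_folds)]
```

With classical random 10-fold splits, messages from the same removal interval would appear on both sides of the split and inflate the apparent accuracy; grouping by interval removes that shortcut.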

4. RESULTS

In this section, numerical results are presented to illustrate the superiority of data-driven models based on fault messages over traditional life usage prognostics (Weibull analysis). Table 3 presents a comparison of all tested approaches concerning mean absolute error (MAE), mean error (ME), root mean squared error (RMSE) and computational performance.

As illustrated in Table 3, our baseline, the Weibull model, had, as expected, the worst predictive result of all approaches, both in terms of MAE and RMSE. The best data-driven approaches, support vector machines (SVM) and boosted trees (Boost), represented performance gains of 41.16% and 47.51% in terms of MAE and RMSE, respectively. Concretely, and as shown in Figure 7, the MAE of 110.04 days of the Weibull model was significantly higher than the MAE of the remaining approaches, which ranged from 83.70 days for the k-nearest neighbors (worst data-driven MAE result) to 64.75 days for the support vector machines (best data-driven MAE result).

The RMSE results also favored the data-driven models. As shown in Figure 8, the RMSE of the Weibull model, 162.17, was considerably higher than the RMSE errors of the remaining approaches, which ranged from 117.88 for the Neural


Table 2. Correlation table of attributes (Pearson's r, lower triangle).

Attributes: (1) past mean time to removal; (2) past variation (std) of time to removal; (3) time since last message; (4) time since message of same code; (5) number of messages since removal; (6) number of messages of same code since removal; (7) previous time to removal; (8) previous variation of time to removals; (9) time to removal.

Correlations of attributes 2-9 with attribute 1: 0.70***, 0.01*, 0.02***, -0.07***, -0.13***, 0.27***, 0.08***, 0.13***
Of attributes 3-9 with attribute 2: 0.00, 0.01***, 0.07***, 0.07***, 0.29***, 0.26***, 0.21***
Of attributes 4-9 with attribute 3: 0.05***, -0.01***, -0.02***, 0.00, 0.00, 0.01*
Of attributes 5-9 with attribute 4: 0.04***, -0.07***, -0.02***, -0.01***, 0.02***
Remaining pairs among attributes 5-9: 0.57***, -0.13***, -0.05***, -0.09***, 0.01**, 0.92***, -0.06***, -0.01*, -0.11***, -0.09***

Note: asterisks denote the p-value significance level of each correlation.

as a convex optimization problem:

minimize (1/2) ||θ||^2 + C Σ_i (ξ_i + ξ_i*)    (10)

where ξ_i and ξ_i* are relaxation (slack) factors and C is a penalty factor. The applications of SVR to RUL prediction are still mostly restricted to the prognosis of bearings and batteries (Huang et al., 2015). An effort to apply this technique to prognostics in aeronautics is the work of Baptista et al. (2016b). In that study, SVR is shown to be an efficient method for engine bleed valve prognostics.

Other techniques that could benefit from further exploration are tree-based methods. In these methods, the predictor space is stratified or segmented into a number of simple regions whose splitting rules can be summarized in a tree. The more popular tree-based methods grow multiple trees, which are then combined to yield a single consensus prediction. Formally, such a tree-based approach assumes the existence of f such that

ŷ(t) = f(t, x(t), θ(t)) = (1/B) Σ_{b=1}^{B} f_b(x(t), θ(t))    (11)

where B is the number of decision or regression trees f_b, each trained on a sample b of (X, y).

Another approach that has yielded promising recent results is instance-based regression (Xue et al., 2008; Khelif, Malinowski, Chebel-Morello, & Zerhouni, 2014). In instance-based learning, new problem instances (x(t)) are compared with instances seen in training (X, y) using similarity-distance metrics.

Besides the techniques referred to above, a large number of machine learning methods can be found in the literature (Schwabacher & Goebel, 2007). As an exhaustive comparison of all these methods is not feasible, we selected the techniques that showed the most promise for prognostics (Schwabacher & Goebel, 2007) or that are considered among the most representative algorithms for machine learning (Wu et al., 2008).

3. CASE STUDY

In this section, the case study is introduced by describing its background and data.

3.1. Background

Engine prognostics, the forecasting of engine degradation that supports on-condition maintenance, is and will continue to be a challenging task, mostly due to the uncertainty associated with gas turbine design and with environmental and operating conditions. This study aims to explore the incorporation of statistical information to develop a data-driven approach to EOL and RUL estimation of modern aircraft gas turbine engines. To show the effectiveness of our approach we present a field application. In the application, the varying degradation conditions and maintenance actions that happen to the engines are considered.

3.2. Data

Common to data-driven approaches is the modeling of the desired response variable using large volumes of historical data (Goebel, Saha, & Saxena, 2008). In this case study, our data set describes the evolution of performance of a set of commercial jet engines over approximately ten years, in different intervals of time for each engine.
Concretely, the data consist of a cross-sectional time series: for each engine, and as exemplified in Figure 4, we have a multivariate series that represents the temporal progression of the engine HM signals. These signals are measured at three different flight phases: one measurement is taken at take-off, another at climb, and three others at cruise. Overall, we analyze data from 1.8 × 10^6 flights (i.e., cycles), an average of 3.35 × 10^3 ± 10^3 cycles per engine. This adds up to around 3 GB of raw data.

ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2017

Figure 4. Performance data. For each engine, the evolution of performance is described using a set of engine health monitoring indicators such as temperature, vibration and rotor speed. There is also information about the engine maintenance events.

In addition to performance signals, there is also information about the engine overhauls. An engine overhaul can be defined as a comprehensive inspection that involves removing and disassembling the engine, testing all its sub-systems, cleaning and replacing parts as needed, and then reassembling the engine (Seemann, Langhans, Schilling, & Gollnick, 2010).

For a better sense of the data, please also consider the histogram of total removal time shown in Figure 5. In the plot, the data are skewed to the right, indicating that the mean (354 ± 285 days) is higher than the median (306 days). This implies that short overhauls are more frequent than medium-to-long overhauls. The graph also illustrates how well the empirical data sample fits the theoretical Weibull distribution (black fitted line). This finding indicates that the Weibull distribution is most likely the best fit for the experience-based model.

4. METHODOLOGY

In order to understand the reliability patterns of jet engines, we perform several experiments using field data, where each experiment analyzes a different data-driven method. This section describes the methodology followed to validate the main hypothesis of this study:

H: Condition-based predictive data-driven models based on maintenance data and HM signals are able to outperform preventive experience-based models.

To investigate this hypothesis we compare two reliability approaches: (a) experience-based modeling based on the Weibull distribution (I) and (b) data-driven methods based on

Figure 5. Data exploration. This study analyses performance-level and full overhaul workscopes and ignores maintenance events lasting less than one week. The selected algorithms are used on a right-skewed data set of removals, shown in the top right plot. As shown, the Weibull distribution is a good fit for the data.

overhaul data and HM signals (II). We aim to show that, despite the good fit of the Weibull model to our maintenance data, data-driven models can derive better estimates of the equipment EOL and RUL.

4.1. Experience-based modeling

The experience-based approach consists in using 10-fold cross-validation to measure the fit of the Weibull distribution to the data set of overhaul times. We chose cross-validation because this sampling technique has been well studied as a basis for measuring predictive accuracy (Kohavi et al., 1995). In the implemented cross-validation, the whole data set of overhaul times {y_i}_{i=1}^{n} is randomly partitioned into k equal-sized subsamples (i.e., testing folds). For each fold k, a Weibull distribution (Equation 1) is fitted to the remaining k − 1 subsamples. From this fit, the hard-time interval T_k of fold k is computed with the corresponding parameter set to one (Equations 2 and 3). The performance of the k-th model is then computed using different quantitative measures of fit, such as the mean error

    Acf_{T_k} = (1/v_k) Σ_{i=1}^{v_k} ε_{y_i,ŷ_i} = (1/v_k) Σ_{i=1}^{v_k} (y_i − T_k)    (12)

where v_k is the number of testing observations in fold k and ε_{y_i,ŷ_i} is the model bias, given by the difference between the real value y_i and the predicted value ŷ_i of observation i. T_k is computed as γ + α Γ(β^{-1} + 1), where α, β, and γ are


Table 1. Description of covariates.

Type           Covariate                                   Unit       Alias
Environmental  Altitude                                    Feet       ALT
               Outside Air Temperature                     C          OAT
Mechanical     Vibration Intercase                         IPS        BB
               Nominal Shaft Speeds                        %          N1, N2, N3
               Marginal Shaft Speeds                       %          N1 MW, N2 MW, N3 MW
               Shaft Vibrations                            IPS        VB1, VB2, VB3
Performance    Delta pressure in several stations          %          DP160, DP20, DP25, DP3
               Delta temperatures in several stations      C          DT160, DT25, DT3
               Turbine Cooling Air Front and Rear          C          TCAF, TCAR
               Delta Fuel Flow                             %          DFF
               Engine Pressure Ratio                       Unit       EPR
Systemic       Oil Pressure                                PSI        OIP
               Oil Temperature                             C          OIT
               Margin in High-Pressure (HP) turbine temp.  C          TGT MW
Others         Time since removal                          Take-offs  TIME
               Number of past removals                     Unit       N REMOVALS

parameters of the Weibull distribution f_{T_k}(t) fit to fold k. The results from the k folds can then be averaged to produce a single performance estimate

    Acf_T = (1/k) Σ_{i=1}^{k} Acf_{T_i}    (13)
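The fold-level fit and hard-time computation can be sketched in Python with SciPy. The removal-time sample below is synthetic and only illustrative of the proprietary data, and the three-parameter fit (shape β, location γ, scale α) is an assumption about the thesis's actual fitting routine.

```python
import math
from scipy import stats

# Synthetic, right-skewed removal times (days), stand-ins for the real data.
removals = stats.weibull_min.rvs(c=1.3, loc=30, scale=330, size=500, random_state=0)

# Three-parameter Weibull fit: c is the shape (beta), loc the location (gamma),
# scale the characteristic life (alpha).
beta, gamma_loc, alpha = stats.weibull_min.fit(removals)

# Hard-time interval: the Weibull mean, T = gamma + alpha * Gamma(1/beta + 1).
T = gamma_loc + alpha * math.gamma(1.0 / beta + 1.0)
print(f"beta={beta:.2f}, gamma={gamma_loc:.1f}, alpha={alpha:.1f}, T={T:.1f}")
```

With the illustrative parameters above, T lands close to the sample mean, as expected of the Weibull mean.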

Algorithm 1 presents the pseudo-code of the approach.

4.2. Data-driven modeling

The data-driven approach (II) is based on maintenance data and HM signals of several aircraft engines. Five distinct models are constructed from state-of-the-art techniques: multiple linear regression (LR), K-nearest neighbors (K-NN), feed-forward neural networks (NN), random forests (RF) and linear support vector regression (SVR). Here, the target of prediction is the estimation of the remaining time to an overhaul (ŷ_i) at the i-th flight (i.e., cycle). Formally, we work on a set of past data (X, y) that consists of m observations. Each observation characterizes a flight for a given engine consisting of a set of n covariates as

              Cov_1    Cov_2   ...  Cov_n
    Cycle_1 ( x_{1,1}  x_{1,2} ...  x_{1,n} )
    Cycle_2 ( x_{2,1}  x_{2,2} ...  x_{2,n} )
      ...   (   ...      ...   ...    ...   )  = X^{m,n}    (14)
    Cycle_m ( x_{m,1}  x_{m,2} ...  x_{m,n} )

               RUL
    Cycle_1 ( y_{1,1} )
      ...   (   ...   )  = y^{m,1}    (15)
    Cycle_m ( y_{m,1} )

where X^{m,n} denotes the matrix of independent variables (covariates) and y^{m,1} the vector of RUL values for each cycle in matrix X.

Table 1 presents the list of considered covariates. The selection of these covariates was steered by discussion with domain experts within the limits of the available data. The predictors are grouped into five domains according to the type of effect exerted by the parameter on the engine: maintenance-related variables, environmental variables, mechanical effects related to moving engine parts (e.g., rotation, vibration), performance effects related to the thermodynamics and fluid mechanics of the engine (e.g., temperature, pressure), and systemic effects that affect the whole engine.

Before its use, the tuple (X, y) was submitted to a pre-processing procedure. First, and before any data transformation, there was a cleaning stage: the techniques of Tukey's boxplot (Tukey, 1977) and the medcouple-based outlier detection method (Brys, Hubert, & Struyf, 2004) were used to detect extremely long or short overhauls, which were treated as missing or incorrect data. The medcouple (MC) is a robust skewness measure that allows calculating the nominal data range:

    [Q1 − c·e^{a·MC}·IQD,  Q3 + c·e^{b·MC}·IQD],   MC ≥ 0    (16)
    [Q1 − c·e^{−b·MC}·IQD, Q3 + c·e^{−a·MC}·IQD],  MC < 0    (17)

where Q1 and Q3 represent the lower and upper quartiles, respectively, of the data distribution, and IQD = Q3 − Q1 is the interquartile distance, a measure of the spread of the data similar to the standard deviation. The rule threshold parameter is taken as c = 1.5, while the other two parameters are taken as a = −4 and b = 3 (typical values).
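The adjusted fences of Equations 16-17 can be sketched as follows. The naive O(n²) medcouple below is an assumption about the implementation (the original method is due to Brys et al.), and the sample is synthetic right-skewed data.

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple: median of the kernel h over pairs with
    x_i <= median <= x_j (a robust skewness measure in [-1, 1])."""
    x = np.sort(np.asarray(x, dtype=float))
    med = np.median(x)
    lo, hi = x[x <= med], x[x >= med]
    h = []
    for xi in lo:
        for xj in hi:
            if xj == xi:  # ties at the median are skipped in this sketch
                continue
            h.append(((xj - med) - (med - xi)) / (xj - xi))
    return float(np.median(h))

def adjusted_fences(x, c=1.5, a=-4.0, b=3.0):
    """Outlier fences of the medcouple-adjusted boxplot (Eqs. 16-17)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqd = q3 - q1
    mc = medcouple(x)
    if mc >= 0:
        return q1 - c * np.exp(a * mc) * iqd, q3 + c * np.exp(b * mc) * iqd
    return q1 - c * np.exp(-b * mc) * iqd, q3 + c * np.exp(-a * mc) * iqd

rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=150.0, size=300)  # right-skewed, MC > 0
low, high = adjusted_fences(sample)
print(f"fences: [{low:.1f}, {high:.1f}]")
```

For right-skewed data (MC > 0) the upper fence is pushed outward relative to the plain Tukey rule, so long-but-legitimate overhauls are not flagged as outliers.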



Algorithm 1 Cross-validation of model I
Precondition: n_k = 10 is the number of folds and y_i is the overhaul time at index i

 1: function CROSS-VALIDATION(n_k, {y_i}_{i=1}^{n})
 2:   Acf_T ← 0                                                  ▷ Let performance be zero
 3:   for k ← 1 to n_k do
 4:     trainSet_k, testSet_k ← PARTITION({y_i}_{i=1}^{n}, n_k, k)   ▷ Randomly split {y_i}_{i=1}^{n} into n_k equal-sized bins
 5:     α, β, γ ← fitWeibull(trainSet_k)
 6:     T ← γ + α·Γ(β^{-1} + 1)
 7:     Acf_T ← Acf_T + (1/v_k) Σ_{i=1}^{v_k} (y_i − T),  y_i ∈ testSet_k   ▷ Compute error rate using bias formula
 8:   end for
 9:   return (1/n_k)·Acf_T
10: end function

Algorithm 2 Cross-validation of model II
Precondition: n_k = 10 is the number of folds and y_i is the RUL at cycle i

1: function CROSS-VALIDATION(n_k, {y_i}_{i=1}^{n})
2:   Acf ← 0                                                     ▷ Let performance be zero
3:   for k ← 1 to n_k do
4:     trainSet_k, testSet_k ← PARTITION({y_i}_{i=1}^{n}, n_k, k)   ▷ Randomly split so that each removal is in a different bin
5:     f(t, x(t), θ(t)) ← fit(trainSet_k)
6:     Acf ← Acf + (1/v_k) Σ_{i=1}^{v_k} (y_i − ŷ_i),  y_i ∈ testSet_k   ▷ Bias formula with ŷ_i = f(t_i, x(t_i), θ(t_i))
7:   end for
8:   return (1/n_k)·Acf
9: end function
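Algorithm 1 can be implemented in a few lines. The fold partitioning via scikit-learn's KFold and the SciPy Weibull fit are assumptions about the actual implementation, and the removal times below are synthetic.

```python
import math
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold

def cross_validate_weibull(y, n_folds=10, seed=0):
    """10-fold cross-validated mean error (bias) of the Weibull hard-time
    interval T = loc + scale * Gamma(1/shape + 1), as in Algorithm 1."""
    y = np.asarray(y, dtype=float)
    fold_errors = []
    for train_idx, test_idx in KFold(n_folds, shuffle=True, random_state=seed).split(y):
        shape, loc, scale = stats.weibull_min.fit(y[train_idx])
        T = loc + scale * math.gamma(1.0 / shape + 1.0)
        fold_errors.append(np.mean(y[test_idx] - T))  # bias of fold k
    return float(np.mean(fold_errors))

# Synthetic overhaul times standing in for the field data.
removals = stats.weibull_min.rvs(c=1.3, loc=30, scale=330, size=400, random_state=2)
me = cross_validate_weibull(removals)
print(f"cross-validated mean error: {me:.1f} days")
```

Because the Weibull model always predicts the same hard-time T for a fold, the per-fold error reduces to the gap between the test-fold mean and T.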

Please note that the medcouple method is appropriate for our data, as we deal with skewed univariate data (Figure 5), while other outlier detection methods such as the interquartile range rule only apply to normally distributed data. After the cleaning procedure, data were standardized to zero mean and unit variance. Principal component analysis (PCA) (Dunteman, 1989) was then used for all methods except the neural network model.

After the pre-processing, the data (X, y) were used to evaluate the performance of the different regression schemes for engine overhaul estimation. The regression techniques were trained on only a subset of the observations to evaluate how well they could generalize to unseen observations. Evaluation on out-of-model observations was performed on the test set. Here, and as for the Weibull model, we compared the algorithms using a 10-fold cross-validation setup. In order to evaluate the predicted values, we compute accuracy using established metrics in the field, such as the mean error or bias (ME) and the median absolute error (MdAE) (Saxena et al., 2008). The mean error expresses error as

    ME = (1/v_y) Σ_{i=1}^{v_y} (y_i − ŷ_i)    (18)

where v_y is the number of testing observations and ME is the model bias, given by the difference between the real value y_i and the predicted value ŷ_i of observation i. The median absolute error expresses absolute error as

    MdAE = median {|y_i − ŷ_i|}_{i=1}^{v_k}    (19)

while relative accuracy is a customized metric from the prognostics field (Saxena et al., 2008) computed as

    RA(%) = (1/v_k) Σ_{i=1}^{v_k} 100 × (1 − |ŷ_i − median{y_i}_{i=1}^{v_y}| / y_i)    (20)

For error spread we compute the median absolute deviation (MAD) as

    MAD = median {|y_i − median({y_i}_{i=1}^{v_k})|}_{i=1}^{v_k}    (21)

To sum up, the data-driven architecture used is shown in Figure 2. The architecture is provided with inputs X and corresponding measured outputs y. With these data and the parameter vector θ, the system estimates the function f(t, x(t), θ(t)). After this, the prognostics module determines the RUL estimates, represented as ŷ(t_i). The scheme of 10-fold cross-validation is used to estimate and validate the model.
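The pipeline just described (standardization, PCA, regressor, 10-fold cross-validation) can be sketched with scikit-learn on synthetic data. The hyperparameters and the synthetic (X, RUL) data are illustrative, not the tuned values of the study, and the neural network model (which skips PCA in the paper) is omitted for brevity.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import LinearSVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 16))  # 16 covariates, as in Table 1
rul = 500 - (X[:, 0] * 40 + X[:, 1] * 25 + rng.normal(scale=20, size=400))

models = {
    "LR": LinearRegression(),
    "K-NN": KNeighborsRegressor(n_neighbors=5),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": LinearSVR(max_iter=10000),
}
for name, model in models.items():
    # Standardize, project onto principal components, then regress.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=8), model)
    scores = cross_val_score(pipe, X, rul, cv=10, scoring="neg_median_absolute_error")
    print(f"{name}: MdAE = {-scores.mean():.1f}")
```

Each model is evaluated with the same 10-fold scheme and the MdAE metric of Equation 19, mirroring the comparison reported in Table 2.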



5. RESULTS

After describing the data pre-processing and modeling approach, we report our findings in this section. Concretely, we report the results of using the comparative research method to test the main hypothesis H. The goal here is to find evidence that the use of advanced data-driven methods can benefit the field of jet engine prognostics.

5.1. Numerical Results

In Table 2 we present the numerical results of this case study, namely of the baseline method (Weibull analysis) and the data-driven models of linear regression (LR), K-nearest neighbors (K-NN), random forests (RF), support vector machines (SVM) and neural networks (NN). The first finding from Table 2 is that all data-driven algorithms seem to outperform the experience-based method (Weibull analysis) in accuracy as measured by the MdAE metric. Concretely, in our case the median absolute error (MdAE) of 273 days of the experience-based method was clearly surpassed by the data-driven algorithms of NN, LR, SVM, K-NN and RF. Here, the best performance was attained by the RF and the SVM, with an average median error of 141 calendar days.

The results regarding the metric of relative accuracy (RA) were not as expressive as for the median absolute error (MdAE). The highest accuracies were attained by the RF and the SVM models, with 72% accuracy. It is worthwhile to discuss why the models' relative accuracies did not differ much. Please note that RA is a metric closely related to the mean absolute percentage error (MAPE) (Saxena et al., 2008); in fact, RA = 1 − MAPE. Since the latter is scale-sensitive, it is important to note that RA will give more importance to incorrect predictions of small actual values than of large values, especially when working with low-volume data. Because the actual value is in the denominator of the equation, when the actual value is not zero but quite small, the MAPE, and consequently the RA, will often take on extreme values.
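The scale-sensitivity argument can be verified numerically: the same absolute error yields very different percentage errors depending on the actual value. The numbers below are illustrative, not taken from the study.

```python
# Same absolute error (5 days) at three different actual RUL values.
for actual in (500.0, 50.0, 2.0):
    predicted = actual - 5.0
    ape = 100.0 * abs(actual - predicted) / actual  # absolute percentage error
    print(f"actual={actual:6.1f}  APE={ape:6.1f}%")
```

A 5-day miss is a 1% error at 500 days of remaining life but a 250% error at 2 days, which is why percentage-based metrics explode near the removal.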
Since in our case the RA values are reasonable, they reinforce the notion that all the algorithms can produce a favorable RUL estimation (also when close to the removal). In regards to the precision of the methods, measured by the median absolute deviation (MAD), results were better for the data-driven methods. As the Weibull model outputs the same single value for every observation, the mean time between repairs (MTBR) of the training data set, its predictions of the overall removal time deviate considerably from each other, leading to a high MAD. The data-driven models were hence able to beat this score with lower errors, except for the K-nearest neighbors algorithm. To sum up, the data-driven models were better in absolute accuracy and also, but not as expressively, in relative accuracy. Results were also promising in regards to error spread. These results suggest that there is enough evidence to support hypothesis H.

5.2. Illustrative Example

In this section we provide an illustrative example of how the different tested algorithms compute the RUL of a jet engine. As an example, consider Figure 6, which shows the application of the algorithms to four different removals. In the figure, the time index i is shown on the x-axis and the predicted residual life on the y-axis. The diagonal RUL* depicts the true RUL. Also depicted in the plot are the predictions of the several methods: LR, K-NN, RF, SVM and NN. The closer these predictions are to the true RUL* curve, the better the model accuracy; the less dispersed the predictions around the RUL* curve, the higher the model precision.

Analyzing Figure 6, it can be seen that all algorithms can come up with reasonably good estimates of RUL, although these estimates can vary considerably from one algorithm to another. For instance, the LR, the simplest approach, has a tendency to output similar predictions for similar points (by similar points we mean observations close in time). This tendency results in the LR predictions usually following a more or less well-defined horizontal trajectory. In contrast with the LR, which works by finding the best fitting straight line, the K-NN method uses a similarity search to find the best neighbors. Here, predictions tend to have a higher degree of noise, as the output of the algorithm is the average of the k nearest neighbors. Nevertheless, the overall absolute accuracy of the model is considerably higher than that of the LR.

The predictions of the RF model have a distinct pattern from the remaining models: they are more dispersed along the RUL* line than those of the remaining methods. This follows from the way regression decision trees operate. Since decision trees work by a series of local decisions, and a random forest is an ensemble of randomly selected decision trees (to avoid overfitting), the output of the RF model can deviate considerably. Nevertheless, the model accuracy is good and tends to improve near the removal, as the error spread diminishes. The results of the NN algorithm in Figure 6 are also promising. The model is especially good at long-term predictions, even though it often fails to provide accurate estimates near the engine removal. Figure 6 also shows the predictions of the SVR, which also follow the targets well. As illustrated, this is the best model among the five, as its predictions deviate the least from the perfect correlation line of RUL*.



Table 2. Performance results.

                                        Experience-  Linear      K-Nearest  Neural     Random     Support Vector
                                        Based        Regression  Neighbors  Networks   Forests    Machines
                                        (Weibull)    (LR)        (K-NN)     (NN)       (RF)       (SVM)
Mean Error (ME)                         113.16       -113.29     -175.23    -164.57    -31.97     -153.18
Median Absolute Error (MdAE) (days)     206.26       166.91      165.97     141.57     141.26     171.20
Relative Accuracy (RA)                  63.88%       68.08%      68.94%     72.07%     71.89%     68.21%
Median Absolute Deviation (MAD) (days)  306.09       216.46      175.15     208.49     149.42     178.09

Figure 6. Remaining useful life prediction trajectory of the 5 data-driven algorithms for 4 removals.



6. CONCLUSION

Five different methods for jet engine prognostics were tested on a proprietary data set. The methods used were linear regression (LR), K-nearest neighbors (K-NN), neural networks (NN), random forests (RF) and support vector regression (SVR). All methods showed reasonably good performance. The predictions of the SVR method resulted in the best values for the median absolute error and median absolute deviation metrics. The good agreement between the modeled remaining life and the observations for the tested techniques leads to the conclusion that data-driven models can potentially be used as an alternative to traditional experience-based methods such as Weibull analysis. It can be said that data-driven techniques are worth further exploration in the field of engineering prognostics. The tested techniques have great potential for improvement, especially in parameter selection and generalization to other prognostics situations. As future research, we intend to study formal methodologies for validation of data-driven approaches and to investigate fusion of prognostic estimates, such as ensemble methods. It is also our intention to perform a deeper analysis of the tested data-driven models with metrics related to prediction horizon, sensitivity to damage state estimation, confidence distribution, evolution of the distribution around the actual time of failure, and stability/robustness of the prediction.

ACKNOWLEDGMENT

This work was partially supported by the MIT Portugal Program and FCT - Fundação para a Ciência e a Tecnologia (SFRH/BD/52347/2013).

NOMENCLATURE

EOL    End of Life
HM     Health Monitoring
K-NN   K-Nearest Neighbors
LR     Linear Regression
MAD    Median Absolute Deviation
MAPE   Mean Absolute Percentage Error
MdAE   Median Absolute Error
MTBR   Mean Time Between Repairs
NN     Neural Networks
RA     Relative Accuracy
RF     Random Forests
RUL    Remaining Useful Life
SVR    Support Vector Regression

REFERENCES

Baptista, M., de Medeiros, I. P., Malere, J. P., Nascimento, C., Prendinger, H., & Henriques, E. M. (2017). Comparative case study of life usage and data-driven prognostics techniques using aircraft fault messages. Computers in Industry, 86, 1–14.
Baptista, M., de Medeiros, I. P., Malere, J. P., Prendinger, H., Nascimento Jr, C. L., & Henriques, E. (2016a). A comparison of data-driven techniques for engine bleed valve prognostics using aircraft-derived fault messages. In Annual European Conference of the Prognostics and Health Management Society (Vol. 7, p. 13).
Baptista, M., de Medeiros, I. P., Malere, J. P., Prendinger, H., Nascimento Jr, C. L., & Henriques, E. (2016b). Improved time-based maintenance in aeronautics with regressive support vector machines. In Annual Conference of the Prognostics and Health Management Society 2016 (Vol. 7, p. 10).
Brotherton, T., Jahns, G., Jacobs, J., & Wroblewski, D. (2000). Prognosis of faults in gas turbine engines. In Aerospace IEEE Conference (Vol. 6, pp. 163–171).
Brys, G., Hubert, M., & Struyf, A. (2004). A robustification of the Jarque–Bera test of normality. In COMPSTAT 2004 Symposium, Section: Robustness.
DePold, H. R., & Gass, F. D. (1998). The application of expert systems and neural networks to gas turbine prognostics and diagnostics. In ASME 1998 International Gas Turbine and Aeroengine Congress and Exhibition.
Di Maio, F., & Zio, E. (2013). Failure prognostics by a data-driven similarity-based approach. International Journal of Reliability, Quality and Safety Engineering, 20(01), 1350001.
Drucker, H., Burges, C. J., Kaufman, L., Smola, A., & Vapnik, V. N. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155–161.
Dunteman, G. H. (1989). Principal Components Analysis (No. 69). Sage.
Ebden, M., Stranjak, A., & Roberts, S. (2010). Visualizing uncertainty in reliability functions with application to aero engine overhaul. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(1), 163–173.
Goebel, K., Saha, B., & Saxena, A. (2008).
A comparison of three data-driven techniques for prognostics. In 62nd Meeting of the Society for Machinery Failure Prevention Technology (MFPT) (pp. 119–131).
Huang, H.-Z., Wang, H.-K., Li, Y.-F., Zhang, L., & Liu, Z. (2015). Support vector machine based estimation of remaining useful life: Current research status and future trends. Journal of Mechanical Science and Technology, 29(1), 151–163.
Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20(7), 1483–1510.
Khelif, R., Malinowski, S., Chebel-Morello, B., & Zerhouni, N. (2014). RUL prediction based on a new similarity-instance based approach. In 2014 IEEE 23rd International Symposium on Industrial Electronics (ISIE) (pp. 2463–2468).
Kohavi, R., et al. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, pp. 1137–1145).
Kröse, B., & van der Smagt, P. (1993). An Introduction to Neural Networks. Citeseer.
Li, Y., & Nilkitsaranont, P. (2009). Gas turbine performance prognostic for condition-based maintenance. Applied Energy, 86(10), 2152–2161.
Riad, A., Elminir, H., & Elattar, H. (2010). Evaluation of neural networks in the subject of prognostics as compared to linear regression model. International Journal of Engineering & Technology, 10(1), 52–58.
Saxena, A., Celaya, J., Balaban, E., Goebel, K., Saha, B., Saha, S., & Schwabacher, M. (2008). Metrics for evaluating performance of prognostic techniques. In Prognostics and Health Management Conference (pp. 1–17).
Schwabacher, M. (2005). A survey of data-driven prognostics. In Proceedings of the AIAA Infotech@Aerospace Conference (pp. 1–5).
Schwabacher, M., & Goebel, K. (2007). A survey of artificial intelligence for prognostics. In AAAI Fall Symposium (pp. 107–114).
Seemann, R., Langhans, S., Schilling, T., & Gollnick, V. (2010). Modeling the life cycle cost of jet engine maintenance. Hamburg: Technische Universität Hamburg-Harburg.
Stranjak, A., Dutta, P. S., Ebden, M., Rogers, A., & Vytelingum, P. (2008). A multi-agent simulation system for prediction and scheduling of aero engine overhaul. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track (pp. 81–88).
Tobon-Mejia, D. A., Medjaher, K., Zerhouni, N., & Tripot, G. (2012). A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models. IEEE Transactions on Reliability, 61(2), 491–503.
Tukey, J. W. (1977). Exploratory Data Analysis.
Weckman, G. R., Marvel, J. H., & Shell, R. L. (2006).
Decision support approach to fleet maintenance requirements in the aviation industry. Journal of Aircraft, 43(5), 1352–1360.
Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 103, 293–297.
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... others (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1–37.
Xue, F., Bonissone, P., Varma, A., Yan, W., Eklund, N., & Goebel, K. (2008). An instance-based method for remaining useful life estimation for aircraft engines. Journal of Failure Analysis and Prevention, 8(2), 199–206.
Zaidan, M. A., Harrison, R. F., Mills, A. R., & Fleming, P. J. (2015). Bayesian hierarchical models for aerospace gas turbine engine prognostics. Expert Systems with Applications, 42(1), 539–553.
Zaidan, M. A., Mills, A. R., Harrison, R. F., & Fleming, P. J. (2016). Gas turbine engine prognostics using Bayesian hierarchical models: A variational approach. Mechanical Systems and Signal Processing, 70, 120–140.
Zaidan, M. A., Relan, R., Mills, A. R., & Harrison, R. F. (2015). Prognostics of gas turbine engine: An integrated approach. Expert Systems with Applications, 42(22), 8472–8483.

BIOGRAPHIES

Marcia Lourenco Baptista (BS and MSc in Informatics and Computer Engineering, Instituto Superior Tecnico, Lisbon, Portugal, September 2008) is a PhD candidate at the Engineering Design and Advanced Manufacturing (EDAM) program under the umbrella of the MIT Portugal Program. Her research focuses on the development of data-driven prognostics techniques for the aeronautical sector. Her research interests include machine learning applied to health monitoring systems and structural health management.

Cairo Lucio Nascimento Junior received the B.S. degree in electrical engineering from the Federal University of Uberlandia (UFU, Uberlandia, Brazil) in 1984, the M.S. degree in electronics engineering from the Instituto Tecnologico de Aeronautica (ITA, Sao Jose dos Campos, Brazil) in 1988, and the Ph.D. degree in electrical engineering from UMIST, Control Systems Centre (Manchester, UK) in 1994. Since 1986 he has been a lecturer with the Division of Electronic Engineering, ITA, and has supervised (or co-supervised) 9 Ph.D. and 27 M.Sc./Prof. M. students. He is the co-author of a book on intelligent systems and was the chairman of the 1999 Fourth Brazilian Conference on Neural Networks.
His current research interests include autonomous systems, artificial intelligence, mobile robotics, control engineering, the development of internet-based remote-access laboratories for engineering education, and data mining for knowledge extraction with application to fraud detection and PHM (Prognostics and Health Management).

Helmut Prendinger received his Master and Doctoral degrees in Logic and Artificial Intelligence from the University of Salzburg in 1994 and 1998, respectively. Since 2012, he is a full professor at the National Institute of Informatics (NII), Tokyo, after joining NII in 2004 as Associate Professor. Previously, he held positions as research associate (2000-2004) and JSPS postdoctoral fellow (1998-2000) at the University of Tokyo, Dept. of Information and Communication Engineering, Faculty of Engineering. In 1996-1997, he was a junior specialist at the University of California, Irvine. His research interests in artificial intelligence include machine learning, intelligent user interfaces, cyber-physical systems, and the melding of real and virtual worlds, in which areas he has published more than 220 peer-reviewed journal and conference papers. His vision is to apply his research to establishing the IT infrastructure for Unmanned Aerial Vehicles, or "drones". He is a member of IEEE and ACM.

Elsa Maria Pires Henriques has a doctorate degree in Mechanical Engineering and is an associate professor at Instituto Superior Tecnico in the University of Lisbon.

She is responsible for the "Engineering Design and Advanced Manufacturing (LTI/EDAM)" post-graduate program. During the last fifteen years she has participated in and/or coordinated several national and European R&D projects in collaboration with different industrial sectors, from tooling to automotive and aeronautics, mainly related to manufacturing, life-cycle-based decisions and the management of complex design processes. She has a large number of scientific and technical publications in national and international conferences and journals. She was a national delegate to the 7th Framework Programme of the EU.



PUBLICATION X

Baptista, M., Prendinger, H., & Henriques, E. Prognostics based on Elbow Point Detection using Neural Networks. Computers in Industry, 2018, submitted.


Prognostics based on Elbow Point Detection using Neural Networks

M. Baptista, H. Prendinger, E. Henriques

Abstract

The exercise of identifying when equipment starts to change away from a nominal to a faulty state is designated elbow point detection. This is an important but often overlooked exercise that can be used to enhance the accuracy of remaining useful life estimation. In this paper we address elbow point detection using a deep learning approach. Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks and standard recurrent neural networks (SRNN) are tested on their ability to classify a sequence of sensor measurements under the nominal (positive label) or faulty (negative label) condition. The baseline detection model is the sliding-window Z-test. Several experiments are performed on synthetic data resembling the C-MAPSS data taken from the NASA Ames Prognostics CoE Data Repository. The results are positive, suggesting that the proposed models can outperform the baseline and reach a mean absolute error of around 10 days with zero false positives, versus the 20 days of the Z-test method. When integrated within a prognostics system, the LSTM elbow point detection model, the best stand-alone classifier, yields a significant overall performance increase of around 30% in terms of absolute error in RUL estimation on the C-MAPSS dataset.

Keywords: Aircraft Prognostics; Elbow Point Detection; RUL Estimation; C-MAPSS


Preprint submitted to Journal of –

March 29, 2018

Figure 1: Elbow point. Technical systems undergo inherent degradation due to aging, corrosion, erosion and other physical processes. Often the rate of degradation to which the equipment is subject varies over its lifetime. Typically, the equipment is characterized by a "nominal" degradation rate, which defines its "nominal" condition, under which the equipment degrades at a low or controlled rate. After a certain event has occurred (marked in red in the picture), such as a crack, cell corrosion or another specific situation, the rate at which the equipment degrades can change significantly, placing the equipment in a "faulty" condition that brings it closer to its End of Life (EoL).

1. Introduction


Mechanical degradation is a gradual physical process which can be accelerated by different factors. The same system can exhibit different rates of degradation throughout its life cycle. The typical system will go through at least one change in its life cycle: from 'nominal' to 'faulty'. In the nominal stage of degradation, the rate at which the equipment deteriorates is typically low and the decline in performance is expected to be linear with respect to time. When entering the faulty stage, the system starts to degrade at a faster pace and failure becomes progressively more likely. Monitoring these changes in performance is important, as it allows one to better determine the onset of failure mechanisms and to more effectively estimate the remaining useful life (RUL) of the equipment. Prognostics, or the estimation of the remaining useful life (RUL) of a piece of equipment, is a complex exercise that involves a sequence of (optional and mandatory) steps [1]. One such step is the detection of the change point (often referred to as the elbow point) of the degradation rate of the system. This detection task typically consists in inferring, from the stream of sensor measurements, when the system changes away from a stable performance, in which any detectable degradation follows a linear trend, to a performance pattern in which degradation follows a roughly geometric trend. This task, formally designated elbow point detection [2], is in accordance with the basic p-f interval reasoning, which assumes that a potential failure (p) can be observed some time before the functional failure (f) occurs (see Fig. 1). The importance of elbow point detection follows from two main reasons. First, attempting to estimate the RUL before the elbow point occurs leads to inaccurate estimates, as it does not include data on the more intense degradation process that anticipates future failure. Accurate RUL estimation has to


be based on the degradation rate which occurs when the failure mechanisms are already taking place. Trying to estimate the remaining useful life before this change point, before any relevant degradation is present in the available sensor data, only introduces inaccuracies in the estimates. Estimations based on data before the elbow point of the equipment health signals will lead to estimated RULs quite large compared to the real ones, as the degradation is still linear and frequently very small. Accurate calculations have to be based on the higher degradation rate experienced by the equipment within the p-f interval. Secondly, and perhaps most importantly, the period of time until the elbow point should be modeled as having linear degradation (stable performance) to prevent miscalculations of the RUL after the elbow point. The p-f interval is one of the most important concepts of on-condition maintenance, and it should be modeled as accurately as possible to ensure the correct calculation of the RULs after the elbow point. In this article we argue and hypothesize that, by having a clearer notion of where the elbow point is, one can estimate the RUL more accurately, leading to better failure avoidance. To address the problem of elbow point detection, this paper proposes a deep learning approach. Here, we contrast with the previous literature on the subject [3], which either (a) disregards the elbow point detection step altogether, as the majority of works do, or (b) follows a statistical approach [2], where usually the variation of the mean, the standard deviation, or both over a sliding window is used to trigger an alarm for a pattern shift. This paper presents a novel model where recurrent neural networks (RNNs) are proposed to detect the change point of the trajectories of a set of simulated gas turbine engines.
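A minimal sketch of the statistical baseline just described: a synthetic two-regime degradation signal (linear, then accelerating after the elbow) and a sliding-window Z-test that flags the first window whose mean departs from the nominal reference. The signal shape, window length and threshold are illustrative choices, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(300)
elbow = 150
# Nominal regime: slow linear decline; faulty regime: accelerating decline.
health = 100.0 - 0.02 * t - np.where(t > elbow, 0.002 * (t - elbow) ** 2, 0.0)
signal = health + rng.normal(scale=0.3, size=t.size)

def ztest_detect(x, ref_len=100, win=20, z_thresh=5.0):
    """Flag the first sliding window whose mean departs from the nominal
    reference segment by more than z_thresh standard errors."""
    # Detrend against a linear fit of the assumed-nominal reference segment,
    # so only a change of regime (not the nominal slope) triggers the alarm.
    coeffs = np.polyfit(np.arange(ref_len), x[:ref_len], deg=1)
    resid = x - np.polyval(coeffs, np.arange(x.size))
    sigma = resid[:ref_len].std(ddof=2)  # noise level of the nominal segment
    for start in range(ref_len, x.size - win):
        z = abs(resid[start:start + win].mean()) / (sigma / np.sqrt(win))
        if z > z_thresh:
            return start  # index where the shift is first flagged
    return None

detected = ztest_detect(signal)
print("true elbow:", elbow, "| detected around:", detected)
```

Because the test needs the post-elbow deviation to accumulate past the noise level, the flag necessarily lags the true elbow by a few cycles; this detection delay is exactly what the learned detectors in this paper aim to reduce.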
Accordingly, and in order to quantify the importance of a sound change point detection process, we compare several RNN architectures combined with an Echo State Network (ESN) to assess the effect of elbow detection on the performance of RUL estimation. The main contributions of this paper are summarized as follows:


• Generation of synthetic data: It was necessary to evaluate the performance of the detection models on p-f curves with well-known elbow points. This step involved simulating p-f curves that resembled real-world curves as closely as possible. The C-MAPSS datasets from the NASA Ames Prognostics CoE Data Repository were used as models.


• Proposal of a novel architecture to perform elbow point detection: A novel architecture based on a recurrent neural network classifier was proposed, and different RNN architectures, namely the standard RNN, the Long Short-Term Memory (LSTM) network and the Gated Recurrent Unit (GRU), were tested on their ability to identify the elbow point of the synthetic data. The baseline here was the Z-test.

• Integration of the best performing elbow detection model in a prognostics system to perform RUL estimation on the C-MAPSS data: The proposed prognostics system was compared against a system without elbow detection to analyze the difference in RUL estimation performance on the C-MAPSS data.

The remainder of this paper is organized as follows. Section 2 discusses related work in the general field of change point detection and in the specific field of prognostics. Section 3 presents the modeling framework. Section 4 describes the methodology followed. Results of our experiments are presented in Section 5, and Section 6 concludes the paper.

2. Background


The discipline of change point detection covers a broad range of real-world problems and is an active topic in the statistics and machine learning communities. In this section we cover this general topic and also focus on elbow point detection in prognostics.

2.1. Supervised Change Point Detection

Change point detection is a complex and broad domain that has been studied in the context of a large number of applications. This breadth is reflected in the diverse set of formulations and definitions relevant to change detection. Common to all analyses is that temporal continuity plays a key role in every formulation and that unusual changes, sequences or temporal patterns in the data are the subject of study. Many machine learning and statistical algorithms have been designed for the general purpose of change point detection [4, 5, 6, 7]. Supervised and unsupervised approaches, likelihood ratio methods, as well as other techniques, have been tested with some success in the most diverse fields of application. Since we follow a supervised approach, in this section we focus on an overview of supervised techniques. Supervised learning is an approach to machine learning consisting of a set of algorithms that aim to learn the mapping from input to output data based on a set of labeled data [8]. In this case, supervised change point detection implies that there is training data in which the change points are known. Within supervised change point detection it is important to distinguish two main sub-approaches: binary and multi-class classification. If the number of possible states is known a priori, the detection can be multi-class and the algorithm can be trained to detect each state. In this approach, a sliding window is used to classify each data point as belonging to a given class. A change point is identified when two consecutive data points are classified as different states.
This approach has the downside of requiring a large amount of training data, and each class has to be represented in a balanced way. On the other hand, being able to detect each class provides information not only about the change point but also about its nature and the underlying process of change. A variety of classifiers can be used for this approach. Examples of techniques used in this field include random forests [9, 10], support vector machines [11], nearest neighbors [12] and hidden Markov models [13, 14].
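The multi-class sliding-window scheme just described can be sketched in a few lines. The per-window classifier below is a deliberately trivial stand-in based on the window mean; in practice it would be one of the trained models mentioned in the text (e.g., a random forest or an SVM):

```python
import numpy as np

def sliding_window_changepoints(signal, window, classify):
    """Label each sliding window with a state; report a change point
    wherever two consecutive windows receive different labels."""
    labels = [classify(signal[t:t + window])
              for t in range(len(signal) - window + 1)]
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]

def mean_state(window_vals):
    """Toy 3-state classifier: thresholds on the window mean."""
    m = window_vals.mean()
    return 0 if m < 0.5 else (1 if m < 1.5 else 2)

# Noiseless 3-state example: the state changes at indices 50 and 100.
sig = np.concatenate([np.zeros(50), np.ones(50), np.full(50, 2.0)])
cps = sliding_window_changepoints(sig, window=5, classify=mean_state)
```

The reported index is the start of the first window whose majority lies past the transition, so it falls slightly before the nominal change index; with noisy real data, this is also why balanced training data per class matters for this approach.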


Another alternative for treating change point detection is binary classification. In this approach, all the data points in the time series corresponding to change points represent one class, while the remaining sequences represent the second class. Even though only two classes need to be learned, this framing poses a more complex and challenging problem, especially because it suffers from extreme class imbalance. For this reason this alternative is usually less explored, even though there are some works such as [15, 16].

2.2. Change Detection in Prognostics


Change point detection is an important topic that can enable more accurate and precise prognostics. Applied within the scope of prognostics, change point detection can focus on detecting when the equipment departs from its nominal degradation rate. The goal here can be to detect the instant when the equipment is no longer degrading at a linear "nominal" rate but is instead degrading at a more alarming exponential rate, closer to failure and to the end of life. Despite its importance for prognostics, this step is ignored in most prognostics systems [3], and the literature on the topic is not extensive. One of the most important contributions to the field is the work in [2], which defines change point detection in the scope of prognostics as "elbow point detection", mostly due to the "elbow" shape that the sensor measurements follow around the change point (see Fig. 1). In that method the authors rely on the variance of the mean/standard deviation over a sliding window to identify the change point. Other authors have addressed the problem of change detection in prognostics using the same sliding window technique combined with the statistical hypothesis testing approach of [2]. For instance, the sequential probability ratio [17] and the rank permutation [18, 19, 20, 17] are commonly used tests in prognostics. Nevertheless, much more work could be developed in the field. The lack of interest might be explained by the traditional focus on the estimation of the RUL. The lack of supervised training algorithms for elbow point detection can also be explained by the difficulty of generating sufficient labeled data to develop the models.

3. Prognostics System and Detection Model


This paper presents a prognostics system based on neural network techniques. The novelty of the presented system lies in its elbow detection model based on recurrent neural networks. In this section we describe the prognostics system and the elbow detection model in detail.

3.1. Prognostics System


The prognostics architecture (PS1) used in this work is described in Fig. 2. The overall goal of this architecture is to receive as input a stream of n time series representing the sensor data and to estimate the RUL for each time point

Figure 2: General prognostics architecture.

t ∈ {1, . . . , T}. The proposed architecture is composed of four main building blocks (see Fig. 2). We briefly describe each of these:


• PS1.1 - Synthetic Data Generator: This module is responsible for generating streams of sensory data with well-known elbow points.

• PS1.2 - Pre-processing: Both the synthetic and the raw sensory series are subject to pre-processing (PS1.2) before being fed into the elbow point detection model (PS1.3) and the RUL estimation model (PS1.4). In the proposed architecture, simple procedures such as standard normalization and an exponential filter are used for pre-processing. The goal is to baseline and filter the time series.

• PS1.3 - Elbow Point Detection Model: The elbow point detection model (PS1.3) proceeds in two steps. In the first step it constructs the model (H) from the pre-processed synthetic data. In the second step, the model is ready to predict the time point (k*) at which the equipment degradation becomes traceable. Here, a deep learning solution is proposed. In the remainder of the procedure, only the part of the time series after this point is considered for RUL estimation purposes.

• PS1.4 - RUL Estimation Model: This module entails the prediction of the equipment RUL. It is based on the analysis of the time series after the elbow point. The architecture relies on an echo state network (ESN) to perform the final RUL estimations.


In the following sections each of these modules is explained in detail. Particular focus is given to the description of the elbow point detection model.

3.2. PS1.1 - Synthetic Data Generator


Before explaining the module responsible for generating the set of synthetic time series (PS1.1), it is important to explain the difference between 'raw' and 'synthetic' sensory time series. Raw sensory time series are the unprocessed series retrieved from the equipment sensors. In contrast, synthetic data are series synthesized from raw data in which the elbow point location is imposed and therefore known. These series are similar in every respect to the original raw series, but with known elbow points. The need for this synthesis follows from the fact that most available sensor data streams carry no information on the location of their elbow points, while supervised learning approaches to elbow detection require this labeling. The goal of module PS1.1 is to reproduce real-world (or simulated) sensory time series, fixing for each trajectory a new, known "elbow point". Knowing this change point makes it possible to develop a supervised model of change point detection. To generate the synthetic data, the module proceeds in three steps:

• Elbow point generation: First, the elbow point k* is generated based on a random seed.


• Pre-P-F interval: After this, the first k* points are generated according to the linear degradation rate and to the level of noise obtained from the first points of the raw data. Note that, to estimate the degree of noise, a translation is performed on the data to remove the possible initial linear degradation of the equipment.

• P-F interval: Finally, the third step involves generating a random number of points after the elbow point (imposing a limit on the total trajectory length). These points are generated according to the exponential degradation rate observed in the real data and to its level of noise.

The result of this procedure is a collection of trajectories simulating the real dataset, now with well-known elbow points.

3.3. PS1.2 - Pre-processing


Every prognostics system tends to have a pre-processing module whose goal is to baseline/filter the data and to mitigate the influence of noise and outliers. In our architecture we address this problem with two solutions. First, the sensor measurements are normalized according to the standard rule:

s(t)_norm = (s(t) − µ) / σ     (1)

where s(t)_norm represents the normalized signal at time t, s(t) is the original signal at time t, and µ and σ are the mean and the standard deviation of the signal.
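As a concrete illustration, the two pre-processing operations of this module — the standard normalization of Eq. (1) and the exponential filter of Eq. (2) — can be sketched as follows (a minimal numpy sketch; the function names and the sample signal are ours):

```python
import numpy as np

def normalize(s):
    """Eq. (1): zero-mean, unit-variance normalization."""
    return (s - s.mean()) / s.std()

def exponential_filter(s, w=0.2):
    """Eq. (2): s_smooth(t) = w * s(t) + (1 - w) * s_smooth(t - 1)."""
    out = np.empty_like(s, dtype=float)
    out[0] = s[0]
    for t in range(1, len(s)):
        out[t] = w * s[t] + (1 - w) * out[t - 1]
    return out

# Example: a noisy linear health signal.
rng = np.random.default_rng(0)
raw = np.linspace(0.0, 1.0, 200) + rng.normal(0.0, 0.3, 200)
clean = exponential_filter(normalize(raw), w=0.1)
```

A small w (here 0.1) yields heavy smoothing at the cost of lag; a w close to 1 tracks the input almost unchanged.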

Second, in order to reduce the level of noise and to eliminate possible outliers that may contaminate the data, we use the exponential filter:

s(t)_smooth = w · s(t) + (1 − w) · s(t−1)_smooth     (2)

where s(t)_smooth is the output of the filter at time t, s(t) is the input value at time t, s(t−1)_smooth is the previous output of the filter, and w is a weighting factor in the range [0, 1]. High values of w favor new data over old data: the output responds quickly to input changes but is less smoothed. Low values of w favor old data over new data, with the filter output being smoother and responding more slowly to changes in the input.

3.4. Detection Model


Formally, elbow point detection is a classification problem in which there is a training sample set D_train and a testing sample set D_test, and the problem is, based on the training set D_train, to find a function

H : X_train → L     (3)

This function H is the classification model and is used to predict the class labels of new instances as accurately as possible. To this end, the model proceeds in two steps: a) first it constructs a supervised elbow detection classifier H from synthetic data (PS1.3.1), and b) then it uses this model to classify the raw sensory time series (PS1.3.2) (see Fig. 2). Training the proposed model requires information on both the sensory signals and the elbow points of the sensory streams. Accordingly, D_train needs to be generated from the pre-processed streams of synthetic data. Concretely, the training sample set is such that D_train = (X_train, L), where X_train = {X_1, . . . , X_m} represents the set of training instances and L = {l_1, . . . , l_m} the finite set of class labels. Each instance X_i is a vector of size q that represents the set of synthetic sensor signals received at time t. In the training set, each instance X_i is assigned a class label l_i, where we consider the existence of only two classes: one positive and one negative (|L| = 2). The positive class indicates the nominal condition (before the elbow point), while the negative class indicates the faulty condition (after the elbow point). After training is complete, the elbow point detection model can proceed to the online classification of the raw sensory data (PS1.3.2). Here the model receives the testing sample data D_test = (Y_test) = {Y_1, . . . , Y_m}, where each Y_i is a vector of size q that represents the set of raw sensor signals received at time t. The model processes the data instances, classifying each data point as faulty or nominal. The first point classified as faulty is considered the elbow point.
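The online step (PS1.3.2) then reduces to scanning the stream and returning the first time step labeled faulty. In the sketch below the trained classifier H is replaced by a trivial threshold rule, and the q = 3 sensor stream is synthetic; both are illustrative stand-ins, not the paper's RNN:

```python
import numpy as np

def elbow_from_classifier(stream, classify_step):
    """Classify each time step of a (T x q) stream as nominal/faulty;
    the first step labeled faulty is returned as the elbow point k*."""
    for t in range(stream.shape[0]):
        if classify_step(stream[t]):
            return t
    return None  # no elbow detected in this stream

# Stand-in for H: flag a step when the mean sensor reading exceeds a level.
toy_H = lambda x: x.mean() > 0.5

# Synthetic degradation: linear up to cycle 60, exponential afterwards.
t = np.arange(100)
base = np.where(t < 60,
                0.002 * t,
                0.12 + 0.09 * (np.exp(0.05 * (t - 60)) - 1.0))
stream = np.tile(base[:, None], (1, 3))  # q = 3 identical sensors
k_star = elbow_from_classifier(stream, toy_H)
```

Because detection is defined as the first faulty label, a single early false positive fixes k* too soon — the motivation for the zero-false-positive analysis in Section 5.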

Figure 3: Proposed network architecture. The architecture seeks at each time t to analyse a set of measurements and classify them as faulty or nominal. The hidden layer is composed of three hierarchical and temporally connected sub-layers. The activation function is the sigmoid function.


As the concrete classifier we choose a recurrent neural network (RNN) architecture. The RNN is a type of neural network that has outperformed previous state-of-the-art algorithms in several domains such as speech recognition [21, 22] or handwriting recognition [23]. This kind of algorithm has the advantage of being able to model the temporal context, making it especially suitable for time series. The RNN model that we use is technically a 3-layer network classifier with dimensions 400-100-100. The architecture is represented in more detail in Fig. 3. For the hidden layer, we experimented with the standard RNN layer, the Long Short-Term Memory (LSTM) layer and the Gated Recurrent Unit (GRU) layer. The standard RNN stands out as the first proposal of an RNN model, the LSTM was selected for its popularity, and the GRU for its relative simplicity and training performance.

3.5. RUL Estimation


After being pre-processed, the prognostics signals are fed into a neural network model to perform the final RUL estimation. Here the RUL for a given time t_P is such that:

RUL(t_P) = EOL(t_P) − t_P     (4)

EOL is defined as the time point at which the equipment no longer meets at least one of a set of functional requirements. These requirements may be expressed using a threshold beyond which we say the component has failed. Formally, the EOL can be defined as:

EOL(t_P) = inf{ t ∈ R : t ≥ t_P ∧ T_EOL(t) = 1 }     (5)

where T_EOL(t) is the threshold indicator, equal to 1 once the failure threshold has been crossed.

As the neural network model for this work, we choose the echo state network, following the positive results obtained previously [2]. Note that, importantly, only the segments of the prognostics signals after the estimated elbow points can be considered for the calculation of the final RUL. This follows from the fact that it only makes sense to train a model on examples of exponential degradation, as failure is only expected to occur at this stage.

4. Methodology


This paper has two clear goals: first, to show that the proposed change point model detects the elbow point better than the baseline Z-test model; second, to show that a prognostics system integrated with our proposed detection model achieves better RUL estimation performance than a prognostics system without an elbow detection model. We describe the corresponding two experiments in the following sections.

4.1. Experiment A


It is important to quantify the degree to which the proposed elbow point detection model (PS1.3.1) can perform its function correctly, independently of how well the RUL is estimated. Accordingly, we devised an experiment to test the elbow point model alone. In this experiment, all the components of the prognostics system PS1 are used except the RUL estimation module (PS1.4). The experiment is described in Fig. 4. As shown, the architecture works on top of synthetic data. The training and testing of the classifier H is done with these data, as it is necessary to validate and compare the estimated elbow points against their true values. A 10-fold cross validation scheme is used to prevent loss of generalization. As the baseline model for this experiment we use the Z-test [2]. This test uses a sliding window and a two-sample Z-test to determine whether the means of the past and future segments of data are equal or unequal. This test is also run on top of synthetic data. The statistic used is the following:

t = (µ₁ − µ₂) / sqrt(σ₁²/n₁ + σ₂²/n₂)     (6)

where µ₁ and µ₂ are the means of the two samples, σ₁ and σ₂ their standard deviations, and n₁ and n₂ their sizes; the sensitivity of the test is set through the significance threshold applied to t.
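A minimal sketch of this baseline, assuming a symmetric sliding window of fixed half-length and a fixed critical value for the statistic (both constants are our choices; the text does not fix them):

```python
import numpy as np

def z_test_elbow(signal, half_window=20, z_crit=4.0):
    """Slide a two-sided window over the signal and flag the first index
    where the two-sample Z statistic of Eq. (6), computed between the
    past and future halves, exceeds the critical value."""
    n = len(signal)
    for t in range(half_window, n - half_window):
        past = signal[t - half_window:t]
        future = signal[t:t + half_window]
        denom = np.sqrt(past.var() / len(past) + future.var() / len(future))
        if denom == 0.0:
            continue
        if abs(future.mean() - past.mean()) / denom > z_crit:
            return t
    return None

# Toy stream: flat noise, then a mean shift at index 100.
rng = np.random.default_rng(1)
sig = np.concatenate([rng.normal(0.0, 0.1, 100), rng.normal(1.0, 0.1, 100)])
k = z_test_elbow(sig)
```

With a two-sided window the statistic starts reacting as soon as the future half begins to include post-change samples, so the flagged index lands near, not exactly at, the true change point.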


Figure 4: Diagram of Elbow Point Model in experiment A.


4.2. Experiment B

To support the claim of the superiority of prognostics systems based on elbow detection methods, we contrast the architecture of Fig. 2 against the architecture of Fig. 5. In the prognostics system of Fig. 5 there is no elbow detection model (PS1.3): the main elements of the architecture are solely the pre-processing module and the RUL estimation module based on the echo state network (PS1.4).

4.3. Data

The data that motivates this study is the C-MAPSS engine performance data from the NASA Ames Prognostics CoE Data Repository [24]. C-MAPSS is a tool coded in the MATLAB and Simulink environment that simulates a realistic large commercial turbofan engine. It mimics an engine model of the 90,000 lb thrust class and allows simulating operations at (i) altitudes ranging from sea level to 40,000 ft, (ii) Mach numbers from 0 to 0.90, and (iii) sea-level temperatures from −60 to 103 °F. The engine diagram in Fig. 6 shows the main elements of the engine model. From the four available C-MAPSS datasets, we considered only dataset 1, which is characterized by one operating condition and one failure mode. In C-MAPSS an engine is characterized by an initial rate of degradation, and its evolution through time is described by n = 21 sensory time series. Process and measurement noise has been added to the simulation, resulting in complex noise dynamics. The difficulty of detecting the elbow point largely stems from the fact that these sensor data are contaminated with noise and may also contain (often non-normally distributed) outliers.

Figure 5: Diagram of Baseline Prognostics System.

Figure 6: Simplified diagram of engine simulated in C-MAPSS [25].


Table 1: Elbow detection algorithm comparison.

              Standard RNN    GRU             LSTM            Z-Test
AUC           0.9973          0.9973          0.9974          NA
MAE (days)    12.83 ± 0.61    13.01 ± 0.64    15.82 ± 0.69    20.2 ± 2.31

5. Experiments and Results


The evaluation of our model proceeded in two main steps: a) first, the elbow point detection algorithm was evaluated on the synthetic data in a supervised way, and then b) the algorithm was evaluated within a prognostics system on the real C-MAPSS data. We dedicate the following two sections to the results of these two experiments.

5.1. Experiment A: Elbow Point Detection


We draw the ROC curve of our proposed elbow detection algorithm to calculate the accuracy, sensitivity, specificity and AUC of the classification. Analyzing the Receiver Operating Characteristic (ROC) curves of the three RNN models (see Fig. 7a), it can be seen that all the proposed RNNs achieve an acceptable and comparable performance in elbow point detection. Nevertheless, the LSTM shows the largest Area Under the Curve (AUC), as Fig. 7b shows more clearly.

Even though it is important to analyze the ROC curve and the AUC, it was also important to restrict the set of studied models to those that exhibited zero false positives. This follows from the fact that it is not admissible to have an algorithm that predicts the elbow point ahead of time: this kind of situation greatly affects the quality of the prognostics exercise and should be avoided altogether. We therefore report the MAE of the RNN models having zero false positives. The MAE indicates by how many days, on average, a model deviated from the true elbow point in its predictions. In contrast with the ROC analysis, here there is a marked difference between the RNN models, more specifically between the GRU and the remaining models. While the standard RNN model exhibits a performance close to that of the LSTM, the GRU falls somewhat behind. In Tab. 1 we also compare the RNN models against the baseline Z-test. As shown, the RNN models outperform the Z-test by a significant margin.

As a final note, consider the histograms of Fig. 8, which depict the distribution of the absolute errors of the different elbow point detection models. As before, the main difference is between the GRU model (Fig. 8c) and the remaining models (Fig. 8a and 8b): the GRU model shows a tendency towards larger errors, while the LSTM and the standard RNN have more similar profiles.
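The selection rule applied above — discard any model that ever fires ahead of the true elbow (a false positive) and score the surviving models by their MAE — can be written compactly; this helper is ours, not the authors' evaluation code:

```python
import numpy as np

def zero_fp_mae(true_elbows, pred_elbows):
    """MAE (in cycles/days) of predicted elbow points, or None when any
    prediction precedes its true elbow, i.e. the detector fired early."""
    true_elbows = np.asarray(true_elbows, dtype=float)
    pred_elbows = np.asarray(pred_elbows, dtype=float)
    if np.any(pred_elbows < true_elbows):
        return None  # disqualified: at least one false positive
    return float(np.mean(pred_elbows - true_elbows))

# Three trajectories, all predictions at or after the true elbow.
mae = zero_fp_mae([120, 95, 140], [126, 101, 149])
```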


(a) ROC Curve

(b) Zoom of ROC Curve

Figure 7: Receiver Operating Characteristic (ROC) curve of the elbow point detection algorithm using different versions of a recurrent hidden layer.

(a) LSTM

(b) Standard RNN

(c) GRU

Figure 8: Histogram of the absolute error of different elbow point detection models.


Table 2: Comparison of baseline and proposed model in Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE).

                     MAPE (%)
Life percentage    Baseline    Model
100%               0.26        0.19
90%                0.25        0.23
80%                0.28        0.25
70%                0.33        0.31
60%                0.37        0.31
50%                0.50        0.36
40%                0.57        0.36
30%                0.65        0.33
20%                0.63        0.34
10%                0.80        0.92
Average            0.46        0.36

                     MAE (days)
                   Baseline    Model
                   68.71       47.26

5.2. Experiment B: RUL Estimation

To evaluate the impact on RUL prediction of accurately estimating the elbow point, we tested two prognostics systems under the 10-fold cross validation scheme: a) a prognostics system without an elbow detection model and b) another with our proposed model. Tab. 2 compares the performance of the two systems in quantitative terms. In particular, it shows the average percentage error of the two systems along the lifetime of the equipment. The residual life of the equipment is expressed in percentage terms, with residual life here meaning the life from the beginning of the exponential degradation (after the elbow point has been detected).

Regarding the comparison of the two systems, Tab. 2 shows that the RUL estimation error is significantly lower in the prognostics system with the proposed model; on average this is an improvement of 10%. Note that the only difference between the two prognostics systems is the elbow detection model. The difference between the two systems is even more pronounced in absolute terms. As shown in Tab. 2, the mean absolute error of the prognostics system with elbow detection is significantly lower than that of the system without detection: here we observe a reduction of around 30%. These results strongly support the hypothesis that elbow point detection is essential for prognostics.

It is also possible to visually inspect the accuracy of the prognostics system with and without elbow detection through the predictive trajectories of the engines. In Fig. 9 we show 4 engines, one per row, where the left column shows the prediction without elbow detection and the right column shows the prediction with elbow detection. As shown, the system outputs predictions closer to the perfect correlation line when provided with an elbow detection model.


6. Conclusion

Being able to detect the change point of an equipment, in terms of degradation, is of paramount importance to prognostics. This follows from the fact that trying to estimate the remaining useful life (RUL) before the change point leads to inaccurate estimates, as accurate calculations have to be based on the correct degradation rate. With this work we seek to draw attention to this topic and to the potential of investing effort in it. A suite of recurrent neural network models was proposed for elbow point detection, i.e., to detect when engine performance data start to depart from their nominal level. Promising results were achieved with three classifier models (standard RNN, LSTM and GRU). The best performing RNN model, the LSTM, was further explored in a deep learning prognostics system. The results here were also promising, showing a significant increase in RUL estimation performance of around 30% in absolute error and around 10% in percentage error. These results encourage further research into more accurate and precise elbow point detection algorithms and the further integration of this type of algorithm within the prognostics exercise.

Bibliography

1. Jardine AK, Lin D, Banjevic D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing 2006;20(7):1483–510.

2. Rigamonti M, Baraldi P, Zio E, et al. Echo state network for the remaining useful life prediction of a turbofan engine. In: Annual Conference of the Prognostics and Health Management Society 2015. 2016:255–70.

3. Ramasso E, Saxena A. Performance benchmarking and analysis of prognostic methods for C-MAPSS datasets. International Journal of Prognostics and Health Management 2014;5(2):1–15.

4. Keogh E, Chu S, Hart D, Pazzani M. Segmenting time series: A survey and novel approach. In: Data Mining in Time Series Databases. World Scientific; 2004:1–21.

5. Chandola V, Banerjee A, Kumar V. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 2009;41(3):15.

6. Gupta M, Gao J, Aggarwal CC, Han J. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering 2014;26(9):2250–67.

(a) Engine #1 Baseline

(b) Engine #1 Elbow Detection

(c) Engine #2 Baseline

(d) Engine #2 Elbow Detection

(e) Engine #3 Baseline

(f) Engine #3 Elbow Detection

(g) Engine #4 Baseline

(h) Engine #4 Elbow Detection

Figure 9: Predicted vs actual plots for different engines of the C-MAPSS dataset.


7. Aminikhanghahi S, Cook DJ. A survey of methods for time series change point detection. Knowledge and Information Systems 2017;51(2):339–67.

8. Cook DJ, Krishnan NC. Activity Learning: Discovering, Recognizing, and Predicting Human Behavior from Sensor Data. John Wiley & Sons; 2015.

9. Reddy S, Mun M, Burke J, Estrin D, Hansen M, Srivastava M. Using mobile phones to determine transportation modes. ACM Transactions on Sensor Networks (TOSN) 2010;6(2):13.

10. Zheng Y, Chen Y, Li Q, Xie X, Ma WY. Understanding transportation modes based on GPS data for web applications. ACM Transactions on the Web (TWEB) 2010;4(1):1.

11. Zheng Y, Liu L, Wang L, Xie X. Learning transportation mode from raw GPS data for geographic applications on the web. In: Proceedings of the 17th International Conference on World Wide Web. ACM; 2008:247–56.

12. Wei L, Keogh E. Semi-supervised time series classification. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2006:748–53.

13. Cleland I, Han M, Nugent C, Lee H, McClean S, Zhang S, Lee S. Evaluation of prompted annotation of activity data recorded from a smart phone. Sensors 2014;14(9):15861–79.

14. Han M, Lee YK, Lee S, et al. Comprehensive context recognizer based on multimodal sensors in a smartphone. Sensors 2012;12(9):12588–605.

15. Feuz KD, Cook DJ, Rosasco C, Robertson K, Schmitter-Edgecombe M. Automated detection of activity transitions for prompting. IEEE Transactions on Human-Machine Systems 2015;45(5):575–85.

16. Desobry F, Davy M, Doncarli C. An online kernel change detection algorithm. IEEE Transactions on Signal Processing 2005;53(8):2961–74.

17. Ganesan R, Rao A, Das TK. A multiscale Bayesian SPRT approach for online process monitoring. IEEE Transactions on Semiconductor Manufacturing 2008;21(3):399–412.

18. Eklund NH, Goebel KF. Using neural networks and the rank permutation transformation to detect abnormal conditions in aircraft engines. In: Proceedings of the 2005 IEEE Mid-Summer Workshop on Soft Computing in Industrial Applications (SMCia/05). IEEE; 2005:1–5.

19. Hu X, Eklund N, Goebel K. Using rank permutation for aircraft engine fault diagnostics. Proceedings of 60th MFPT 2006.

20. Hu X, Eklund N, Goebel K, Cheetham W. Hybrid change detection for aircraft engine fault diagnostics. In: 2007 IEEE Aerospace Conference. IEEE; 2007:1–10.

21. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S. Recurrent neural network based language model. In: Eleventh Annual Conference of the International Speech Communication Association. 2010.

22. Graves A, Mohamed A-r, Hinton G. Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013:6645–9.

23. Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems. 2009:545–52.

24. Saxena A, Goebel K, Simon D, Eklund N. Damage propagation modeling for aircraft engine run-to-failure simulation. In: 2008 International Conference on Prognostics and Health Management (PHM 2008). IEEE; 2008:1–9.

25. Frederick DK, DeCastro JA, Litt JS. User's guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS). 2007.
