SECO: Machine Learning at the Edge: A few applicative cases of Novelty Detection on IIoT gateways

In this paper, we present the design and development of the Machine Learning (ML) modules for two case studies. In both cases, we developed an ML model that learns the system’s normal behaviour in order to identify any abnormal condition that may arise. Such a framework is usually referred to as Anomaly Detection (also known as Fault Detection or Novelty Detection). Our models succeeded at identifying the injected anomalies. In addition, no anomalies were observed when the model was fed with normal data. The results are discussed considering the trade-off between the type of sensors, learning algorithm, training effort and computational demands.

I. INTRODUCTION

The Internet of Things (IoT) has made it possible to collect a huge amount of data from real-world objects through its connected devices. Monitoring the environment and the machines that make up today's industrial plants has never been more feasible and affordable, thanks to the spread of low-cost, low-power sensors and of powerful single board computers (SBC) used as IoT gateways. However, mining the acquired data to exploit its hidden potential and turn it into gold is neither trivial nor easy to transfer into standard processes, given the variety of machinery that can be employed and the different ways of “sensing” it and checking its health condition.

II. CONDITION MONITORING AND FAULT DETECTION IN IOT SYSTEMS

More or less recent expressions like “fault detection and diagnosis”, “condition monitoring”, “predictive maintenance” and a copious amount of variations on the theme have become popular or gained a new meaning in the jargon of the Smart Factory. Although the boundaries among them are often ill-defined, they commonly refer to a set of strategies aimed at reducing both unexpected failures and a company’s maintenance costs through the early detection of potentially dangerous or even catastrophic faults. The interaction between the IoT and Artificial Intelligence (AI) has opened the door to new data-driven approaches to the safety and reliability of industrial processes [1]. Using different types of sensors, it is now possible to gather and process multifaceted information about the operating machinery and the surrounding environment: current, voltage and other electrical quantities, temperature, pressure and humidity, vibration and acoustic signals, electromagnetic radiation in various frequency ranges, and so on [2].

On-line condition monitoring (CM) involves measuring one or more variables (the parameters of the condition) on a machine while it is operating, with the idea of testing whether they, or other quantities derived from them, fall within “acceptable” bounds from a physical, design or safety perspective, or compared to their historical variations [3]. This test is conducted using approaches ranging from single-parameter threshold checks to rule-based expert systems, and from model-based (e.g., finite element) methods to sophisticated machine learning techniques [4]. In general, an alarm is generated if such a test fails, triggering a decision on whether to take corrective action and of which kind. In this sense, CM is the main requirement for predictive maintenance (PM), that is, the use of measured data to predict upcoming failures before they occur [5] and to determine when it is most cost-effective to intervene on a machine. Hence, maintenance and partial or full machine replacement operations can be scheduled based on the assessments provided by the condition monitoring system.
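The simplest form of the test described above, a single-parameter threshold check, can be sketched as follows. This is only an illustration: the function name `within_bounds`, the factor `k`, the hard limits and the sample readings are all hypothetical and are not taken from the case studies.

```python
from statistics import mean, stdev

def within_bounds(value, history, k=3.0, hard_limits=None):
    """Return True if `value` passes both checks.

    1. Optional hard limits (physical, design or safety bounds).
    2. Historical variation: the value must lie within k standard
       deviations of the mean of past readings.
    All thresholds here are illustrative assumptions.
    """
    if hard_limits is not None:
        low, high = hard_limits
        if not (low <= value <= high):
            return False
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) <= k * sigma

# Hypothetical temperature readings collected during normal operation.
history = [40.1, 39.8, 40.3, 40.0, 39.9, 40.2]

ok_reading = within_bounds(40.1, history, hard_limits=(0.0, 80.0))
drifted_reading = within_bounds(55.0, history, hard_limits=(0.0, 80.0))
unsafe_reading = within_bounds(90.0, history, hard_limits=(0.0, 80.0))
```

A failed check (a `False` result) corresponds to the alarm mentioned above; the ML techniques used in the case studies replace this fixed rule with a learned model of normal behaviour.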

A vast literature exists on these subjects, covering many perspectives and application fields. A decade-old summary by NASA researchers of AI techniques for prognostics, meaning “detecting the precursors of a failure, and predicting how much time remains before a likely failure”, can be found in [6]. A review of recent statistical data-driven solutions for estimating the remaining useful life (RUL), that is, the useful life left on an asset at a particular time of operation, which plays a crucial role in condition-based maintenance (CBM), can be found in [7]. A literature review of prognostic methodologies for CBM [8] classified them into four categories (physical, knowledge-based, data-driven and combination models), focusing on both parametric and non-parametric data-driven approaches. In addition to providing a detailed summary of “diagnostics and prognostics of mechanical systems implementing CBM with emphasis on models, algorithms and technologies for data processing and maintenance decision-making”, [9] also describes techniques for multiple-sensor data fusion. Among the most recent papers, [10] summarizes the applications of deep learning in machine health monitoring systems, with particular attention to Artificial Neural Networks (ANN), while [11] presents an overview of CBM and PM solutions over the last fifteen years and of the results they achieved on different machine parts or processes.

III. CASE STUDIES

The case studies were identified through our participation in a project supported by the Tuscany regional government to promote Industrial IoT (IIoT) solutions. The IIoT project is a two-year project, started in 2018 and funded by the Tuscany Region, focused on Industry 4.0. Several companies collaborate to create a new framework that helps companies migrate to the Industry 4.0 perspective, innovate their products and take advantage of the growing IIoT market.

The IIoT Project aims to design, produce and launch a new product on the market. It will include a set of embedded hardware tools, a software platform based on cloud services, ML and professional services. It is dedicated to all the small and medium-sized industries that lack the economic or human resources, knowledge or capabilities to qualify as a digital entity or a digital manufacturer. The system aims to reduce the initial knowledge gap for companies and also to spread the initial investment of the whole process. The final output will be a set of hardware (sensor bridges, gateway) fully integrated with a software environment designed to remotely manage all the devices, the data flow and the computational power at different levels. Moreover, a set of ML modules will be provided, with dedicated development processes, to solve the most common industrial applications.

The case studies are taken from two main industrial contexts: a kitchen hood manufacturer and an industrial solar power manufacturer.

A. Air ventilation – Kitchen Hood

In this section we describe the analysis of a playground with several interesting aspects, focusing on two details of remote device monitoring: the prediction of a discrete set of working states, and the detection of possible anomalies w.r.t. the standard working behaviour of each state.

The motivations for investigating such a problem can be summarized as follows: the real-time monitoring of sensors that provide high-frequency time series with fast dynamics, which can quickly degenerate into a fault situation; the analysis of a physical measurement that is not directly related to the phenomenon to be monitored; the investigation of an ensemble of models combining both supervised and unsupervised ML techniques; and the testing of edge computation performance while running a high-frequency computation on a low-power Single Board Computer.

The results of a first qualitative analysis show that the overall computational requirements are considerably lower than expected, and that the proposed algorithm is capable of both classifying the state of the system and detecting the anomalies with a high level of accuracy.

System description

The proposed analysis consists of the remote monitoring of a kitchen hood through an IoT gateway, with all the required computation performed entirely at the edge level. In particular, we would like to analyze the working conditions of the hood's electric motor, which can be taken as a general representation of any kind of industrial motor. The IoT gateway is equipped with an Artificial Intelligence algorithm capable of performing real-time data acquisition and analysis while sparing the network an intensive communication workload, which could lead to both traffic and security issues [12].

This approach can be useful in different situations. An example, which is the most interesting in our case, is when the system must analyze a phenomenon that produces data at such a high frequency that real-time monitoring on the cloud is impossible. In this case, the algorithm directly performs the required pre-processing and produces a summary of the environment (here, the current state and a possible fault).

The analysis is performed on the hood's electric motors, which are used to clean the air. Such a device has three different working speeds, which represent different working configurations. The first task of the system is to classify the current working state from the vibration signature. Of course, extracting this information is not crucial in itself, since it is already known from the hood’s PLC. However, analyzing the input signal together with a well-known piece of information provides a clearly defined way to study the data and discover the processing that best highlights the representative characteristics of the phenomenon. The vibration signature is acquired by a piezoelectric microphone, which can sample data at a high frequency while filtering out possible acoustic noise coming from the external environment. Acquiring data at this high level of resolution allowed us to perform a more complex analysis in a Novelty Detection (ND) configuration. Indeed, we tried to detect possible anomalies in the airflow filtering mechanism by analyzing changes in the vibration signature of the states. To this end, we found that it is difficult to train a single ND algorithm that can effectively monitor all the possible configurations. Since we had already trained the classifier that gives us the label of the current state, we decided to develop a dedicated agent for each state.

The global pipeline is depicted in Figure 1. Since, as already said, the system should be capable of analyzing data sampled at a high-frequency rate, the acquisition and the inference steps are interleaved after deployment, while data for the training step can be collected offline.

Once a buffer of samples of proper size has been acquired from the microphone, classic signal processing techniques based on Fourier analysis are applied to extract relevant information in the frequency domain. This produces a series of frames, each one carrying information on a small temporal window. The state classifier then performs a prediction on each of these frames, inferring a series of labels.

A majority voting approach is then used to produce a single robust label. The final decision is used to select the corresponding model to perform ND. The model takes the same frames as input and provides a novelty score based on the samples seen during the training phase. Since the novelty score is not perfect either, we run a smoothing window over the scores of the consecutive frames composing the buffer, which makes the anomaly score more robust.
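The voting and smoothing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: `classify` and `detectors` are hypothetical callables standing in for the trained state classifier and the per-state novelty detectors, and the smoothing is assumed to be a simple moving average (the text only says that a smoothing window is run on the scores).

```python
from collections import Counter, deque

def majority_vote(labels):
    """Return the most frequent label among the per-frame predictions."""
    return Counter(labels).most_common(1)[0][0]

def smooth_scores(scores, window=20):
    """Moving average over the last `window` novelty scores (assumed form)."""
    recent = deque(maxlen=window)
    smoothed = []
    for score in scores:
        recent.append(score)
        smoothed.append(sum(recent) / len(recent))
    return smoothed

def process_buffer(frames, classify, detectors, window=20):
    """Classify each frame, vote on the state, then score with that state's detector."""
    labels = [classify(frame) for frame in frames]
    state = majority_vote(labels)
    scores = [detectors[state](frame) for frame in frames]
    return state, smooth_scores(scores, window)

# Toy stand-ins for the trained models (purely hypothetical).
toy_classify = lambda frame: 1 if frame < 10 else 3
toy_detectors = {1: lambda frame: float(frame), 3: lambda frame: 0.0}

state, smoothed = process_buffer([1, 2, 30, 4], toy_classify, toy_detectors, window=2)
```

In the toy run, one frame is misclassified as state 3, but the vote still selects state 1, so the detector dedicated to state 1 scores the whole buffer.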

Results

All the experiments were performed on a SECO SBC-C23 gateway equipped with an NXP i.MX 6SoloX Applications Processor (Cortex A9) CPU [13]. The input is sampled at 44.1 kHz with 16-bit encoding. For each state, we recorded about one minute of data, which made up the dataset for training both the classifier and the ND algorithm. The test data for the anomaly detection task consists of tracks of about 10 seconds containing (separately) normal data or data recorded while partially occluding the exit of the air filter (for each state, one normal and one abnormal track were recorded). The input features for both algorithms are the logarithm of the power spectrogram, computed on windows of 512 samples with a 25% overlap (using the Librosa Python library [14,15]), which yields a 257-feature vector per frame, to which we add the RMSE of the signal.
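To make the feature layout concrete, the following NumPy-only sketch computes a comparable feature vector per frame. It is not the Librosa pipeline used in the experiments: the hop of 384 samples (our reading of "25% overlap"), the Hann window, the natural logarithm, the small constant `eps` and the absence of padding are all assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 44_100          # Hz, as in the experiments
N_FFT = 512                   # window length in samples
HOP = N_FFT - N_FFT // 4      # 384 samples, assuming a 25% overlap

def frame_features(signal, n_fft=N_FFT, hop=HOP, eps=1e-10):
    """Return one row per frame: 257 log-power bins followed by the frame RMS."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    feats = np.empty((n_frames, n_fft // 2 + 2))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + n_fft]
        power = np.abs(np.fft.rfft(frame * window)) ** 2   # n_fft // 2 + 1 = 257 bins
        feats[i, :-1] = np.log(power + eps)
        feats[i, -1] = np.sqrt(np.mean(frame ** 2))        # root-mean-square value
    return feats

# One second of a synthetic 1 kHz tone as a stand-in for the microphone signal.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)
features = frame_features(tone)
```

Each row has 258 entries: the 257 spectral bins mentioned above plus the additional energy feature.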

Speed	Classification accuracy on single frames (%)	Detected anomalies on sequences of frames (%)	False positive rate on sequences of frames (%)
1	99.79	99.85	1.09
2	97.21	98.79	0.56
3	73.21	72.01	0.00
Table 1: For each speed of the hood's motor, the first column reports the classification accuracy in normal conditions. The percentage of detected anomalies (second column) and the percentage of false positives (third column) refer to the anomaly detection task, in which anomalies were produced by partially occluding the exit of the air filter.

Both the classifier and the detector are based on Neural Networks (implemented in Keras [16]). The classifier is a feed-forward 2-layer neural network (with 100 hidden units, HU) performing a 3-class prediction. For each class, the ND algorithm is implemented as a simple Auto-Encoder (one hidden layer with 32 HU), using the reconstruction error as the abnormality score of each pattern. The smoothing window covers about 20 predictions. In this configuration, the models can be trained directly at the edge in less than half an hour.
In the test phase, the employed board provides predictions at about 2.5 Hz, with a CPU load of 27% and a memory consumption of 2% (out of 4 GB). Accuracy and detection performance are summarized in Table 1.
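The idea of scoring a pattern by its reconstruction error can be illustrated with a small NumPy stand-in for the Keras Auto-Encoder. This is a sketch under stated assumptions, not the model used in the experiments: the tanh activation, the plain gradient-descent training loop, the learning rate, the number of epochs and the synthetic data are all hypothetical choices.

```python
import numpy as np

def train_autoencoder(x, n_hidden=32, epochs=500, lr=0.01, seed=0):
    """Fit a one-hidden-layer auto-encoder on normal data by gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = x.shape
    w_enc = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b_dec = np.zeros(n_features)
    for _ in range(epochs):
        hidden = np.tanh(x @ w_enc + b_enc)
        recon = hidden @ w_dec + b_dec
        # Gradient of the squared reconstruction error, averaged over samples.
        grad_recon = 2.0 * (recon - x) / n_samples
        grad_hidden = (grad_recon @ w_dec.T) * (1.0 - hidden ** 2)
        w_dec -= lr * (hidden.T @ grad_recon)
        b_dec -= lr * grad_recon.sum(axis=0)
        w_enc -= lr * (x.T @ grad_hidden)
        b_enc -= lr * grad_hidden.sum(axis=0)
    return w_enc, b_enc, w_dec, b_dec

def reconstruction_error(params, x):
    """Novelty score: mean squared reconstruction error of each pattern."""
    w_enc, b_enc, w_dec, b_dec = params
    recon = np.tanh(x @ w_enc + b_enc) @ w_dec + b_dec
    return np.mean((recon - x) ** 2, axis=1)

# Synthetic "normal" patterns: a fixed spectral shape with varying amplitude.
rng = np.random.default_rng(1)
pattern = np.linspace(-0.5, 0.5, 8)
normal = rng.uniform(0.5, 1.5, size=(200, 1)) * pattern + rng.normal(0.0, 0.02, size=(200, 8))
abnormal = normal + 3.0       # hypothetical fault: strong offset on every feature

params = train_autoencoder(normal, n_hidden=4)
normal_err = reconstruction_error(params, normal)
abnormal_err = reconstruction_error(params, abnormal)
```

Patterns unlike those seen during training are reconstructed poorly, so their score is higher; in the setup described above, one such model is trained per working state.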

As we can see, the prediction performance is quite high, with a noticeable decrease in state 3, which has a higher speed and more dynamic frequency content. However, the majority voting approach allows us to provide the right classification on each buffer of samples, provided that no state change occurs within the buffer (each buffer covers only a fraction of a second in this setting). Obtaining the right label is a fundamental step in selecting the correct model to perform ND. For ND as well, the rate of detected anomalies relative to false alarms is high enough that we did not investigate thresholding further.

B. Water Pump

The second scenario is based on an all-in-one system capable of absorbing and storing solar energy and of reusing it for various services in public or private contexts, as well as for the purification of water in off-grid areas. The solution comes from a public utility context in disadvantaged areas of the planet, where the availability of drinking water and electricity is limited, but it is also suitable for domestic energy management in a private environment. In addition, it can serve as a Wi-Fi hotspot for the community in which it is placed: it can be equipped with a 2G/3G/4G data connection that can be shared wirelessly with the surrounding devices, becoming a social reference point.

From Business Questions to ML Problems to Sensor Design

To design the IoT solution for the Survivor Box, it was necessary to understand the business needs. Identifying the business problems allowed us to define a set of sensors placed in key positions of the mechanical system. Defining the sensor set, in turn, allowed us to translate the business questions into different ML problems. The one we describe here is fault detection for the water purification pump.

The algorithm used to learn the system’s normal behaviour was the Local Outlier Factor (LOF) [17]. During the training phase, LOF aims to characterize the dense regions (clusters) of “normal” points within the training set (here, a point is a 9-dimensional vector where each component corresponds to a specific sensor reading). Then, during the inference phase, the degree of abnormality of a new observation is computed based on how isolated the observation is with respect to its surrounding neighbourhood of points. Because we expected accelerometer readings to be very sensitive to imperceptible oscillations that characterize even the system’s normal behaviour, we chose not to infer the system’s state (normal or abnormal) from single predictions of our model; instead, we averaged the model’s predictions over the last 10 points. The model was implemented in Python with the Scikit-learn library [18]. Training the model on 2000 data points took approximately 20 seconds, with a CPU [19] consumption of 2% and a memory consumption of 0.1% (out of 2 GB). During the inference phase, the model was able to make about 50 predictions per second. The system’s performance was assessed by injecting anomalies and verifying that they were identified. Moreover, we verified that normal observations were not deemed anomalies.
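A minimal sketch of this scheme with Scikit-learn's `LocalOutlierFactor` is given below. The data are synthetic, and several details are assumptions rather than the settings used on the pump: `n_neighbors=20` is the library default, and the rule that maps the average of the last 10 predictions to a state (a negative average means "anomaly") is one possible reading of the averaging step.

```python
from collections import deque

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in for 2000 "normal" 9-dimensional sensor readings.
train = rng.normal(0.0, 1.0, size=(2000, 9))

# novelty=True enables predict() on observations not seen during fit().
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(train)

def rolling_state(model, stream, window=10):
    """Average the last `window` predictions and report a state per point."""
    recent = deque(maxlen=window)
    states = []
    for point in stream:
        # predict() returns +1 for inliers and -1 for outliers.
        recent.append(int(model.predict(point.reshape(1, -1))[0]))
        average = sum(recent) / len(recent)
        states.append("anomaly" if average < 0 else "normal")  # assumed decision rule
    return states

normal_stream = rng.normal(0.0, 1.0, size=(30, 9))
faulty_stream = normal_stream + 8.0   # hypothetical fault: large shift on all sensors

normal_states = rolling_state(lof, normal_stream)
faulty_states = rolling_state(lof, faulty_stream)
```

Averaging over a window means that an isolated outlier prediction, such as one caused by a brief oscillation, does not flip the reported state by itself.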

The two kinds of anomalies that we tested were:

accelerometer anomalies: we tapped on the surface where the system was placed, thus creating some vibrations;
water flow anomalies: we employed a small handle to adjust the water flow level. The anomaly was simulated by letting water flow at a different speed from the one used when training the model.

Our model identified the injected anomalies in 1988 of the 2000 abnormal points, i.e. 99.4% (Table 2). In addition, no anomalies were reported when the model was fed with normal data.

	Predicted Normal	Predicted Anomaly
True Normal	1000	0
True Anomaly	12	1988
Table 2: Confusion matrix of the model's predictions over normal and abnormal data (the latter almost evenly split between manual perturbation of the accelerometer and of the water flow).
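The summary figures implied by the confusion matrix can be derived directly from its four cells. The short sketch below does the arithmetic; the function name `detection_metrics` is ours, and the counts are those of Table 2.

```python
def detection_metrics(tn, fp, fn, tp):
    """Summary rates from a 2x2 confusion matrix (anomaly = positive class)."""
    return {
        "detection_rate": tp / (tp + fn),          # share of anomalies that were flagged
        "false_positive_rate": fp / (fp + tn),     # share of normal points flagged
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }

# Counts taken from Table 2.
metrics = detection_metrics(tn=1000, fp=0, fn=12, tp=1988)
```

With these counts, the detection rate is 1988/2000 = 99.4%, the false positive rate is 0/1000 = 0%, and the overall accuracy is 2988/3000 = 99.6%.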

Results

A variation in the environmental conditions, which in turn implies a change in the system’s “normal” operating conditions, may negatively affect our model. We believe that there are at least two possible solutions to this problem. The first requires the model to be trained on a very large dataset, so that possible changes in environmental conditions are “embedded” into the dataset; however, this would dramatically increase training time as well as inference time, since LOF is an instance-based learning algorithm. The second, which relies on the fact that training time is low when the number of training samples is low, consists in retraining the model whenever its normal operating conditions vary.

CONCLUSIONS

Monitoring industrial plants has never been so feasible and affordable, thanks to the spread of low-cost, low-power sensors and of powerful single board computers (SBC) used as IoT gateways. The interaction between the IoT and ML has opened the door to new data-driven approaches to the safety and reliability of industrial processes. In these two applications, we successfully tested different ML-based methods, either to predict the state of the system from a fixed set of states or to detect possible anomalous situations, and also combined the two approaches. The monitoring has been tested on both direct and indirect measurements w.r.t. the analyzed phenomenon. The system can perform the computation at the edge, allowing us to work with environments requiring very high-frequency data acquisition, which could be crucial in those applications where it is not possible to share all the data with a remote system. This preliminary work allowed us to explore some playgrounds to evaluate trade-offs between computational cost and performance on real tasks, laying the foundations for further investigations on more complex monitoring systems in different scenarios.

REFERENCES

[1] Al-Garadi, Mohammed Ali, et al. "A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security." arXiv preprint arXiv:1807.11023 (2018).

[2] Si, Jia, Yibin Li, and Sile Ma. "Intelligent fault diagnosis for industrial big data." Journal of Signal Processing Systems 90.8-9 (2018): 1221-1233.

[3] Gustafsson, J., and F. Sandin. "District heating monitoring and control systems." Advanced District Heating and Cooling (DHC) Systems. Woodhead Publishing, 2016. 241-258.

[4] Sin, M. L., W. L. Soong, and N. Ertugrul. "Induction machine on-line condition monitoring and fault diagnosis-A survey." Australasian universities power engineering conference. Vol. 28. 2003.

[5] Jahnke, Patrick. "Machine learning approaches for failure type detection and predictive maintenance." Technische Universität Darmstadt 19 (2015).

[6] Schwabacher, Mark, and Kai Goebel. "A survey of artificial intelligence for prognostics." AAAI Fall Symposium. 2007.

[7] Si, Xiao-Sheng, et al. "Remaining useful life estimation–a review on the statistical data driven approaches." European journal of operational research 213.1 (2011): 1-14.

[8] Peng, Ying, Ming Dong, and Ming Jian Zuo. "Current status of machine prognostics in condition-based maintenance: a review." The International Journal of Advanced Manufacturing Technology 50.1-4 (2010): 297-313.

[9] Jardine, Andrew KS, Daming Lin, and Dragan Banjevic. "A review on machinery diagnostics and prognostics implementing condition-based maintenance." Mechanical systems and signal processing 20.7 (2006): 1483-1510.

[10] Zhao, Rui, et al. "Deep learning and its applications to machine health monitoring." Mechanical Systems and Signal Processing 115 (2019): 213-237.

[11] Sakib, Nazmus, and Thorsten Wuest. "Challenges and Opportunities of Condition-based Predictive Maintenance: A Review." Procedia CIRP 78 (2018): 267-272.

[12] Čolaković, Alem, and Mesud Hadžialić. "Internet of Things (IoT): A review of enabling technologies, challenges, and open research issues." Computer Networks (2018).

[13] edge.seco.com/en/

[14] McFee, Brian, et al. "librosa: Audio and music signal analysis in python." Proceedings of the 14th python in science conference. 2015.

[15] github.com/librosa/librosa

[16] keras.io

[17] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM SIGMOD Record.

[18] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.