A fault-tolerant clustering algorithm for processing data from multiple streams

Multiple sensors now often provide real-time data about the same process. However, most data stream analysis algorithms operate on a single feature vector. A common workaround is to concatenate the features extracted from each sensor into a single vector. Yet the more sensors that monitor a process, the more likely a malfunction or a delay in data transmission becomes. Hence the interest in algorithms that can handle missing data and incorporate delayed data once it becomes available. This work presents a dynamic ensemble clustering algorithm based on evidence accumulation that can process multiple data streams. Each algorithm in the ensemble processes only one data stream, and obtaining the final partition does not require data from all the streams to be available. Final results can therefore be provided even when sensors malfunction or the network introduces delays. Furthermore, delayed data can be used to extract evidence from the moment it arrives. The algorithm was applied to identify arrhythmias in 71 standard 12-lead electrocardiogram recordings, achieving an error rate of 0.314% ± 0.010. Two strategies were compared: combining all the leads into a single feature vector and processing each lead independently; the second yielded better results.

Keywords: Dynamic clustering; Ensemble algorithms; Evidence accumulation; Arrhythmias
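
To make the fault-tolerance idea in the abstract concrete, the following is a minimal sketch of static evidence accumulation over several streams, some of which may be unavailable. It is not the dynamic algorithm presented in this work: the function names `co_association` and `consensus_labels`, the use of `None` to mark an unavailable stream, and the 0.5 threshold are all assumptions made for illustration. Each available stream contributes one partition, votes are averaged only over the streams that delivered data, and the final partition is read off as the connected components of the thresholded co-association matrix.

```python
import numpy as np

def co_association(partitions):
    """Fraction of available partitions in which each pair of samples co-clusters.

    `partitions` holds one 1-D label array per stream, or None when that
    stream delivered no data (hypothetical convention for this sketch).
    """
    available = [np.asarray(p) for p in partitions if p is not None]
    if not available:
        raise ValueError("at least one stream must be available")
    n = available[0].shape[0]
    votes = np.zeros((n, n))
    for labels in available:
        # One vote for every pair of samples sharing a cluster in this partition.
        votes += labels[:, None] == labels[None, :]
    # Normalise by the number of streams that actually voted.
    return votes / len(available)

def consensus_labels(coassoc, threshold=0.5):
    """Final partition: connected components of the graph coassoc > threshold."""
    n = coassoc.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(coassoc[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three streams over six samples; the second stream is unavailable.
stream_partitions = [
    np.array([0, 0, 0, 1, 1, 1]),
    None,
    np.array([0, 0, 1, 1, 1, 1]),
]
coassoc = co_association(stream_partitions)
final = consensus_labels(coassoc)
```

In this toy input the two available streams disagree only about the third sample, so it ends up in a cluster of its own while the remaining samples keep their agreed grouping. Dropping the missing stream from the normalisation is what lets a final partition be produced without waiting for it.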