Early detection of network element outages based on customer trouble calls

Authors:

Highlights:

• The sequence of reported failures has information potential for outage detection.

• We obtain empirical distributions of how quickly customers report a failure.

• We propose a hybrid detection model adapted to the specific environment.

• A two-stage detector, for outage detection and element isolation, is designed.

• The method presented can reduce the outage detection delay time by 2.33 h.

Abstract

This paper addresses the early detection of network element outages. The timeliness and accuracy of outage detection on equipment in a telecommunication network depend on the monitoring system used and its performance. The intent of this paper is to investigate and propose a complementary solution that enables the existing systems to detect faults earlier than they currently can. Two constraints shaped our approach: the existing operational environment cannot be changed, so threshold tuning and parameter changes are not possible, and no additional infrastructure investment is planned. Hence, our approach relies on an alternative method based on a two-stage hybrid statistical and diagnostic detector, designed to exploit additional available data and to avoid the imperfections of the alarm monitoring system. The role of this detector is twofold: early detection of network element outages based on customer trouble calls, and rule-based decision making for faulty-element isolation based on knowledge derived from fault and network management data. We present the results of a statistical analysis of trouble-reporting data. The analysis showed that the timing and content of customers' trouble reports have information potential that can be utilized for early detection of outages. The detector is explained in detail, and its accuracy and reduction in detection delay are evaluated. For the “difficulties in work” type of malfunction, the method presented can reduce the outage detection delay by 2.33 h on average, relative to an existing fault management process designed to detect outages solely on the basis of an alarm monitoring system. We attained an overall probability of correct detection of 95.3%.
Of all outages that could hypothetically be detected, the method detected 77.5% of cases 1 h before the existing alarm system raised the alarm, and 23% of cases 4 h before the actual alarm. The approach was tested on one year of real telecommunication network data.

Keywords: Fault management, Broadband network, Early fault detection, Alarm system, Fault detection delay

Review history: Received 13 February 2014, Revised 14 January 2015, Accepted 22 February 2015, Available online 3 March 2015.

Official paper URL: https://doi.org/10.1016/j.dss.2015.02.014