Diagnosis Tools for Telecommunication Network Traffic Management

Philippe LERAY [1], Patrick GALLINARI [1] and Elisabeth DIDELET [2]

[1] LAFORIA - IBP, Université Paris 6 - boîte 169, 4 Place Jussieu, 75252 Paris cedex 05, France
{leray, gallinari}@laforia.ibp.fr

[2] France Télécom - CNET, CNET PAA/ATR, 38-40 rue du général Leclerc, 92131 Issy les Moulineaux cedex, France
[email protected]

With the rapid evolution of telecommunication networks, real-time network traffic management is becoming more and more crucial. We propose here a modular system for performing diagnosis at different levels of the network. It is designed as an aid to the operator. We present results of different experiments with a first version of this system which operates at a local level.

1 Introduction

With the increasing complexity of telecommunication networks and the demand for new and sophisticated services, the role of real-time network traffic management is becoming more and more crucial. Both statistical and artificial intelligence tools have been proposed for network traffic management [1,2,3]. They are usually aimed at problems like traffic prediction, diagnosis, control or routing. Although this field is expanding rapidly, only a few systems have been implemented to deal with operational conditions, most of them having limited abilities. The reasons for this are the complexity of such tasks and the difficulty of analysing the large amount of data collected on a telecommunication network and of automating parts of the process in order to help the operator.

We will focus here on the implementation of a diagnosis system for the detection of abnormal situations via the observation of the telecommunication network. Treating this problem requires the analysis of a large amount of data collected at the different nodes of the network. This analysis involves low level processing techniques, for information selection or detection of local events, and high level methods for correlating different events, appearing at different times or locations, or for decision making. For now we have focused only on the low level stages of such a system, using neural network techniques. These methods have been used recently in different areas of telecommunications [2,4]. To build a diagnosis chain, neural networks are combined into modular systems where different specialized modules cooperate to solve a global task. Such modular systems can easily be adapted to new configurations of the network structure by replacing, updating or adding specialized modules.

In the following, we first describe the diagnosis problem and the simulation tool which has been used to emulate the network (Section 2). We then describe our modular architecture (Section 3) and present different experiments performed using a preliminary version of the system (Section 4).

2 Case study

Real world data are not yet available for training real-time diagnosis systems. In principle they could be collected at the management centres, where all the events appearing on the network are sent to the operator screen. However the corresponding amount of data is tremendous, since information about network elements arrives every 1 to 5 minutes. For now, only data for off-line diagnosis, whose periodicity is larger, are available at telecommunication centres. For real-time diagnosis, people rely on the use of network simulators and try to emulate real traffic conditions. We briefly describe the one which has been used in this study and the diagnosis problems we have been dealing with.

2.1 The network

The network model we consider here is based on the French long-distance network. It consists of 73 centres: 5 main transit centres (MTCs) and 68 secondary transit centres (STCs). In network management centres, diagnosis and control rely heavily on operators. Measurements from MTCs and STCs are aggregated in order to analyse their status, to detect any abnormal conditions such as traffic overloads and/or network failures and to activate traffic controls so as to minimize the effects of the disruption.

The various conditions we will study for each centre correspond to the following situations:
- O1 - Nominal situation: no abnormal condition.
- O2 - Outgoing overload: concentrated calls from a centre.
- O3 - Incoming overload: concentrated calls towards a centre.
- O4 - Overall overload: increase of traffic over the whole network.
- O5 - Regional overload: traffic increase in an MTC zone.

In the STC case, the O4 and O5 situations are quite similar: in the data space, these two classes overlap strongly.

2.2 The data

We have used data generated via the SuperMac simulator developed at CNET (Centre national d'études des télécommunications - France Télécom's Research Centre). This software makes it possible to emulate the main characteristics of a telecommunication network. In particular, it enables us to:
- set the nominal traffic at each centre. Figure 1 shows a nominal traffic profile which is typical of a weekday. For now we have used a similar profile for all days of the week. This is a simplification of the real traffic conditions. However, observation of real data clearly shows the existence of characteristic profiles for weekdays, Saturdays and Sundays. Nevertheless, the fluctuations from one day of the week to another are small compared to those corresponding to overload (abnormal) situations.
- generate data corresponding to the different overload situations described in 2.1 with different overload levels. The latter are expressed as a percentage of the nominal traffic conditions as they are defined at a given time of the day. For example we have generated data corresponding to O2 or O3 overloads ranging from 150 % to 1000 % perturbation of the nominal traffic. Similarly we made the O4 percentage vary between 125 % and 300 % and the O5 percentage between 125 % and 225 %. These values correspond to observed situations.

For each type of overload, 7 days have been simulated, with measurements every 4 minutes. Disturbances of 16 minutes are generated randomly during this period with a uniformly distributed overload percentage. All other measurements correspond to a nominal traffic situation. Since these nominal measurements are much more abundant than all others, some of them have been discarded in order to balance the data over the different situations. We have distributed the 7 days of measurements into two databases, which have been used respectively for training and testing. The first one contains 4 days, i.e. 4480 examples (1280 from O1 and 800 from each of the other classes); the second one corresponds to 3 days and is made of 3680 examples (1280 from O1 and 600 from each of the other classes). An illustrative data-generation sketch following this description is given after figure 1.

Fig.1. Nominal traffic profile for 24 hours. The y axis corresponds to a deviation (in %) of a 'standard' traffic; the x axis is the time of day, t (hours), from 00 to 24.
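
The data-generation procedure described above can be mimicked with a short simulation script. The sketch below is only illustrative: the traffic profile, the 18-indicator model, the number of disturbance windows per day and all function and variable names are simplified stand-ins of ours, not part of the SuperMac simulator.

    import numpy as np

    rng = np.random.default_rng(0)

    STEP_MIN = 4                          # one measurement every 4 minutes
    STEPS_PER_DAY = 24 * 60 // STEP_MIN   # 360 measurements per day
    OVERLOAD_STEPS = 16 // STEP_MIN       # disturbances last 16 minutes

    # Overload ranges, as percentages of the nominal traffic (from section 2.2)
    OVERLOAD_RANGES = {2: (150, 1000), 3: (150, 1000), 4: (125, 300), 5: (125, 225)}

    def nominal_profile(step):
        """Very rough stand-in for the weekday profile of figure 1 (in %)."""
        hour = step * STEP_MIN / 60.0
        return max(0.0, 100.0 * np.sin(np.pi * (hour - 6) / 14)) if 6 <= hour <= 20 else 5.0

    def simulate_day(overload_class, n_disturbances=5):
        """Simulate one day of measurements for one overload class (2..5)."""
        X, y = [], []
        starts = rng.choice(STEPS_PER_DAY - OVERLOAD_STEPS, size=n_disturbances, replace=False)
        low, high = OVERLOAD_RANGES[overload_class]
        for t in range(STEPS_PER_DAY):
            base = nominal_profile(t)
            in_overload = any(s <= t < s + OVERLOAD_STEPS for s in starts)
            factor = rng.uniform(low, high) / 100.0 if in_overload else 1.0
            # toy 18-indicator vector: noisy copies of the (possibly overloaded) traffic
            X.append(base * factor + rng.normal(0.0, 2.0, size=18))
            y.append(overload_class if in_overload else 1)    # class 1 stands for O1
        return np.array(X), np.array(y)

    X, y = simulate_day(overload_class=2)
    print(X.shape, np.bincount(y)[1:])    # (360, 18) and the O1 / O2 sample counts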

3 A modular architecture

Our modular system, inspired by the telephone network structure, is composed of two levels. At the local level, data from STCs and MTCs are processed in order to detect and classify perturbations which may be identified at this level. The global level will be used as a network management centre in order to make a final diagnosis, as shown in figure 2. For each STC and MTC, the diagnosis system can be divided into two modules (figure 3):
- a classification module (CLASSIF), which determines the centre status (O1 to O5);
- one module dedicated to each overload situation (EXPERT-i, where i = 2, 3, 4, 5), which indicates the corresponding overload percentage.
A schematic sketch of how these two kinds of modules can be combined is given after figure 3.

Fig.2. General diagnosis architecture. Each local module corresponds to a local diagnosis system (figure 3).

Fig.3. Local diagnosis architecture: the input data feed the CLASSIF module (outputs O1 to O5) and the EXPERT-2 to EXPERT-5 modules (outputs % O2 to % O5).
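
Figure 3 can be read as a simple composition rule: the CLASSIF output selects which EXPERT module, if any, is consulted for an overload percentage. The sketch below shows one possible way of wiring the two kinds of modules together; the class and function names are ours and the models are toy stand-ins, not the MLPs used in the paper.

    from dataclasses import dataclass
    from typing import Callable, Dict, Optional, Sequence, Tuple

    @dataclass
    class LocalDiagnosis:
        """Local module of figure 3: one classifier plus one expert per overload class."""
        classif: Callable[[Sequence[float]], int]                # returns a label in {1, ..., 5}
        experts: Dict[int, Callable[[Sequence[float]], float]]   # class -> overload % estimator

        def diagnose(self, x: Sequence[float]) -> Tuple[int, Optional[float]]:
            label = self.classif(x)
            if label == 1:                  # O1: nominal situation, no percentage needed
                return label, None
            return label, self.experts[label](x)

    # toy usage with hand-written stand-ins for the trained networks
    dummy_classif = lambda x: 2 if x[0] > 50 else 1
    dummy_experts = {k: (lambda x, k=k: 100.0 + 10.0 * k) for k in (2, 3, 4, 5)}
    module = LocalDiagnosis(dummy_classif, dummy_experts)
    print(module.diagnose([80.0]))          # (2, 120.0): outgoing overload at 120 %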

4 The STC diagnosis module

We now present the performances of our system for local diagnosis operations (CLASSIF and EXPERT modules) for STCs. The neural network used is a multi-layer perceptron with one hidden layer. The learning algorithm is a batch conjugate gradient. Learning is stopped either when a plateau is reached for the error on test data, or when overtraining is detected. An approximate code sketch of this training setup is given after table 1.

4.1 Classification of perturbations (CLASSIF module)

Our telephone network simulator produces a set of 18 indicators describing the STC status. Let X(t) denote the vector of these indicators measured at time t and Xo(t) the vector corresponding to a nominal situation at t, as shown in figure 1. Since we are interested in the detection of deviations from normal behavior, our overloads are relative to the nominal situation in a network location at a given time of the day, i.e. Xo(t). We performed experiments with different sets of input variables, with or without Xo(t), so as to test the importance of different input information for diagnosis. A first statistical variable selection was performed via the procedure DISCRIM of the statistical package SAS [5]. We give below (table 1) performances for the following choices of inputs:
- I1: 18 indicators of X(t),
- I2: 34 indicators of X(t) and Xo(t) (2 indicators of Xo(t) are equal to zero),
- I3: 12 indicators: 7 selected by SAS in I1 and the 5 corresponding ones in Xo(t),
- I4: 8 indicators selected by SAS in set I2.

Table 1. Best classification percentage on test data for different sets of input variables.

Network structure    after 200 iterations    after 1000 iterations
18-30-5 (I1)         72.4 %                  --
34-50-5 (I2)         81.6 %                  83.8 %
12-25-5 (I3)         77.2 %                  82.6 %
8-20-5 (I4)          77.3 %                  82.2 %
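
The classifiers of table 1 are plain one-hidden-layer MLPs (for instance 34 inputs, 50 hidden units and 5 outputs for I2). A rough modern equivalent can be set up as below. This is only an approximation of the original experiments: scikit-learn does not provide the batch conjugate-gradient training used here, so the sketch falls back on L-BFGS, and the data are random placeholders for the simulator indicators.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler

    # Placeholder data: X holds the 34 indicators (X(t) and Xo(t)), y the labels O1..O5.
    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(4480, 34)), rng.integers(1, 6, size=4480)
    X_test, y_test = rng.normal(size=(3680, 34)), rng.integers(1, 6, size=3680)

    scaler = StandardScaler().fit(X_train)

    # 34-50-5 architecture; L-BFGS as a stand-in for batch conjugate gradient.
    clf = MLPClassifier(hidden_layer_sizes=(50,), solver="lbfgs", max_iter=1000)
    clf.fit(scaler.transform(X_train), y_train)
    print("test accuracy:", clf.score(scaler.transform(X_test), y_test))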

The number of hidden units in the experiments has been roughly set by cross-validation. The best classifier in table 1 (34-50-5 (I2)) uses a large number (2005) of parameters. The Optimal Brain Damage (OBD) pruning method [6] has allowed us to reduce this number to 385 without any loss of performance.

Several variable selection methods have been proposed in the NN literature. In [7] a measure inspired by OBD is used for pruning input units. In this technique, the inputs are ordered according to a saliency measure Si (E1), and then pruned by comparing Si to a threshold set by cross-validation. C denotes the cost function and Wji the weight from unit j to unit i. Other methods based on probabilistic dependence measures have also been proposed. For example, in [8] it is proposed to measure the mutual information between a set of inputs and the output. The expression MI(Xi,Y) is given in (E2) for the case of one input variable Xi; P(z) denotes a probability density function over z.

S_i = \frac{1}{2} \sum_{j \in \mathrm{fan\_out}(i)} \frac{\partial^2 C}{\partial W_{ji}^2} W_{ji}^2    (E1)

MI(X_i, Y) = \sum_{y} \sum_{x_i} P(x_i, y) \log \frac{P(x_i, y)}{P(x_i) P(y)}    (E2)
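
Both selection criteria lend themselves to short numerical sketches. In the code below, the OBD-style saliency of (E1) is computed from a user-supplied estimate of the diagonal second derivatives (obtaining that estimate, e.g. via a Gauss-Newton approximation, is not shown and is our simplification of [7]), and the mutual information of (E2) is estimated from a discretised joint histogram, in the spirit of [8]; all names are illustrative.

    import numpy as np

    def input_saliencies(W1, H):
        """Saliency of each input unit, as in (E1).

        W1[j, i] is the weight from input i to hidden unit j, and H[j, i] an
        estimate of the second derivative d2C/dW_ji^2 accumulated over the data.
        """
        return 0.5 * np.sum(H * W1 ** 2, axis=0)    # sum over the fan-out of input i

    def mutual_information(x, y, n_bins=10):
        """Estimate MI(X_i, Y) of (E2) from samples, after discretising x."""
        x_bins = np.digitize(x, np.histogram_bin_edges(x, bins=n_bins))
        joint, _, _ = np.histogram2d(x_bins, y, bins=[n_bins + 2, len(np.unique(y))])
        p_xy = joint / joint.sum()
        p_x = p_xy.sum(axis=1, keepdims=True)
        p_y = p_xy.sum(axis=0, keepdims=True)
        nz = p_xy > 0
        return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

    rng = np.random.default_rng(0)
    y = rng.integers(0, 5, size=2000)                    # 5 classes, as O1..O5
    x_informative = y + rng.normal(0.0, 0.3, size=2000)  # correlated with the class
    x_noise = rng.normal(size=2000)                      # unrelated indicator
    print(mutual_information(x_informative, y), mutual_information(x_noise, y))

    W1, H = rng.normal(size=(50, 34)), np.abs(rng.normal(size=(50, 34)))
    print(np.round(input_saliencies(W1, H)[:5], 2))      # saliency of the first 5 inputs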

All these techniques are heuristics, since the problem of variable selection is combinatorial in the dimension of the input space. Their respective merits are discussed in [7]. We show in table 2 the importance of our 34 variables according to three different techniques: the statistical method (SAS results on I4), the saliency (Si) and the mutual information (MI). Performances with the input sets selected by SAS (I3 and I4) are not as good as those obtained with the whole input set (I2). MI and Si selections perform quite similarly for the X(t) components, but there are significant differences on Xo(t): MI does not select any variable in Xo(t) whereas Si does.

2

3

4

5

6

7

8

9

10

11 12

13 14 15

16 17

i 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 SAS Si MI Table 2. Importance of each variable Xi, computed by 3 different methods. White, grey and black denote respectively little, fair and high signification. Variables 1 to 18 correspond to X(t) and 19 to 34 to X o(t).

We can also use Si in an iterative way (OCD [7]): we prune the input with the weakest saliency, then train a new network, and so on. With this method we stopped after having pruned 14 inputs. Using the remaining 20 indicators, we obtained an 84.3 % classification rate on test data and the confusion matrix of table 3. A schematic version of this pruning loop is given after the table.

Table 3. Confusion matrix on test data for the best classifier (inputs selected with OCD). Each row gives the percentages obtained for one true class.

        O1      O2      O3      O4      O5
O1    97.4     0.5     0.9     0.6     0.6
O2     2.5    96.5     0.0     0.8     0.2
O3     2.8     0.0    96.4     0.5     0.3
O4     9.5     1.2     0.0    67.7    21.6
O5    18.0     0.0     0.3    33.0    48.7
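
The iterative procedure described above is easy to write down: at each round the least salient remaining input is removed and a new network is trained on the reduced input set. The code below is only a schematic version; train_network and compute_saliencies stand for the conjugate-gradient training and the saliency of (E1), the demonstration uses a logistic regression as a stand-in model, and the stopping rule (a validation-accuracy drop) is one plausible choice rather than the exact criterion used here.

    import numpy as np

    def iterative_input_pruning(X_train, y_train, X_val, y_val,
                                train_network, compute_saliencies, tolerance=0.01):
        """Iteratively remove the input with the weakest saliency (OCD-style).

        train_network(X, y) -> model with a .score(X, y) method;
        compute_saliencies(model, X, y) -> one saliency value per remaining input.
        """
        kept = list(range(X_train.shape[1]))       # indices of the surviving inputs
        model = train_network(X_train, y_train)
        best_acc = model.score(X_val, y_val)

        while len(kept) > 1:
            sal = compute_saliencies(model, X_train[:, kept], y_train)
            weakest = kept[int(np.argmin(sal))]    # input with the smallest saliency
            candidate = [i for i in kept if i != weakest]

            new_model = train_network(X_train[:, candidate], y_train)
            acc = new_model.score(X_val[:, candidate], y_val)
            if acc < best_acc - tolerance:         # pruning hurt too much: stop
                break
            kept, model, best_acc = candidate, new_model, max(best_acc, acc)
        return kept, model

    # toy demonstration: only the first 3 of 12 inputs are informative
    from sklearn.linear_model import LogisticRegression
    rng = np.random.default_rng(0)
    w = np.zeros(12); w[:3] = 2.0
    X, Xv = rng.normal(size=(600, 12)), rng.normal(size=(300, 12))
    y = (X @ w + rng.normal(size=600) > 0).astype(int)
    yv = (Xv @ w + rng.normal(size=300) > 0).astype(int)

    train = lambda Xs, ys: LogisticRegression(max_iter=200).fit(Xs, ys)
    saliency = lambda m, Xs, ys: (m.coef_ ** 2).sum(axis=0)   # stand-in saliency
    kept, _ = iterative_input_pruning(X, y, Xv, yv, train, saliency)
    print("kept inputs:", kept)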

We have analysed the errors of this classifier. The confusion matrix gives us some clues for improving the system. Good performances may be observed for the classification of the nominal situation and of incoming or outgoing overloads. Two causes of error can be observed:
- regional and overall overloads (O4 and O5) with a low or average percentage are confused (bottom-right of the confusion matrix). This can easily be explained by the definition of these overloads: some situations cannot be distinguished at the STC level. However, this could possibly be done at the MTC level.
- in low traffic periods (22h-8h) the number of errors increases: the system does not easily distinguish low percentage overloads. However, these errors are not really important, since small perturbations during the night are usually not relevant.

4.2 EXPERT modules

We describe below the experiments carried out with the EXPERT modules. We will only give as an example the performances for the O2 expert, whose role is to determine the overload percentage corresponding to the situation "outgoing overload". Figure 4 illustrates the results of an MLP predicting 24h of the test data (MSE = 0.044). During the day, as for the CLASSIF module, errors are larger during low traffic periods, since data corresponding to different percentages may overlap. As is the case for the classification of overloads, predicting the degree of overload is not crucial during these periods.

Fig.4. O2 expert: predicted value vs. real value for our best current architecture. The real value is shown as a solid line and the predicted value as a dashed line.
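
The O2 expert is, in effect, a regression network mapping the same indicators to an overload percentage. A minimal sketch of such a module is given below; as before, the L-BFGS solver and the synthetic data are stand-ins (the paper uses batch conjugate-gradient training on SuperMac data), and the scaling of the target to [0, 1] is our own assumption.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Placeholder data: 18 indicators and an outgoing-overload percentage (150 % - 1000 %).
    X = rng.normal(size=(3000, 18))
    overload_pct = 150.0 + 850.0 * rng.random(3000)
    y = (overload_pct - 150.0) / 850.0        # target scaled to [0, 1] (our assumption)

    scaler = StandardScaler().fit(X)
    expert_o2 = MLPRegressor(hidden_layer_sizes=(30,), solver="lbfgs", max_iter=1000)
    expert_o2.fit(scaler.transform(X), y)

    pred_pct = 150.0 + 850.0 * expert_o2.predict(scaler.transform(X[:5]))
    print(np.round(pred_pct, 1))              # predicted overload percentages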

5 Conclusion

We have presented results on the development of the first stages of a telecommunication diagnosis system and on the importance of our different parameters. Both data generation and system implementation still have to be improved. We are currently working on producing more realistic scenarios to simulate real world conditions. We will also take into account previous measurements of our parameters, so as to improve our current results and to correct some errors which are inherent to this first system. For the EXPERT modules, it will be useful to predict the conditional probability distribution of the overload percentage instead of simply predicting its mean, as is the case for now. This will allow a better use of these outputs in subsequent modules of the system. The next step will be the development of diagnosis tools at the global level.

Acknowledgement: This work has been performed with France Télécom CNET under Grant 94 1B 003. We would like to thank the PAA/ATR team and particularly D. Stern and L. De Bois.

References

[1] E. Didelet, B. Dubuisson, D. Stern, P. Chemouil, AIP approaches to diagnosis in network traffic management, in Qualitative Reasoning and Decision Technologies, Carreté/Singh ed., 1993.
[2] P. Chemouil, J. Filipiak, Supporting Network Management with Real-Time Traffic Models, IEEE Journal on Selected Areas in Communications, vol. 9, no. 2, pp. 151-156, 1991.
[3] D. Stern, A Statistical Study of Real-Time Telephone Traffic Variations for Network Management, ITC Krakow, 1991.
[4] Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications 2, Alspector/Goodman/Brown ed., 1995.
[5] SAS User's Guide: Statistics, 1982 edition.
[6] Y. LeCun, J. Denker, S. Solla, Optimal Brain Damage, in NIPS, vol. 2, pp. 598-605, 1990.
[7] T. Cibas, F. Fogelman Soulié, P. Gallinari, S. Raudys, Variable Selection with Neural Networks, to appear in Neurocomputing.
[8] R. Battiti, Using Mutual Information for Selecting Features in Supervised Neural Net Learning, IEEE Transactions on Neural Networks, vol. 5, no. 4, 1994.