A Variable Selection Method based on Tabu Search for Logistic Regression Models

Joaquín Pacheco (1) (*), Silvia Casado (1) and Laura Núñez (2)

(1) Department of Applied Economics, University of Burgos, Spain. jpacheco{scasado}@ubu.es
(2) Department of Finance, Instituto de Empresa Business School, Madrid, Spain. [email protected]

(*) Corresponding author. Fac. C. Económicas y Empresariales, Plaza Infanta Elena s/n, BURGOS 09001, Spain, [email protected] Tel. +34-947-25-90-21; fax +34-947-25-80-13

Abstract

A Tabu Search method to select variables that are subsequently used in Logistic Regression Models is proposed and analysed. The aim is to find from among a set of m variables a smaller subset which enables an efficient classification of cases. Reducing dimensionality has some advantages such as reducing the costs of data acquisition, better understanding of the final classification model, and an increase in the efficiency and efficacy of the model itself. The specific problem consists in finding, for a small integer value of p, the size p subset of original variables that yields the greatest percentage of hits in Logistic Regression. To solve this problem a technique based on the metaheuristic strategy Tabu Search is proposed. After performing some tests it is found that it obtains significantly better results than the Stepwise, Backward or Forward methods used by classic statistical packages. The way these methods work is illustrated with several examples.

Key Words: Variable Selection, Logistic Regression, Metaheuristics, Tabu Search.

1.- Introduction

The aim in the classification problem is to classify instances that are characterized by attributes or variables; that is, to determine which class every instance belongs to. Based on a set of examples (whose class is known), a set of rules is designed and generalised in order to classify the set of instances with the greatest possible precision. There are several methodologies for dealing with this problem: classic Discriminant Analysis, Logistic Regression, Neural Networks, Decision Trees, Instance-Based Learning, etc. Linear Discriminant Analysis and Logistic Regression methods search for linear functions and then use them for classification purposes. The use of linear functions enables better interpretation of the results (e.g., importance and/or significance of each variable in instance classification) by analysing the value of the coefficient obtained. Not every classification method is suited to this type of analysis and in fact some are classified as "black box" models. Thus, classic Discriminant Analysis and Logistic Regression continue to be interesting methodologies.

Before designing a classification method, when many variables are involved, only those variables that are really required should be selected; that is, the first step is to eliminate the less significant variables from the analysis. Thus, the problem consists in finding a subset of variables that can carry out the classification task in an optimum way. This problem is known as variable selection or feature selection. Research into this issue was started in the early 1960s by Lewis (1962) and Sebestyen (1962). According to Liu and Motoda (1998), feature selection provides some advantages such as reducing the costs of data acquisition, better understanding of the final classification model, and an increase in the efficiency and efficacy of such a model. Extensive research into variable selection has been carried out over the past four decades. Many studies on variable selection are related to medicine and biology, such as Sierra et al. (2001), Ganster et al. (2001), Inza et al. (2000), Lee et al. (2003), Shy and Suganthan (2003), and Tamoto et al. (2004).

From a computational point of view, variable selection is an NP-hard problem (Kohavi, 1995; Cotta et al., 2004) and therefore there is no guarantee of finding the optimum solution (NP = Nondeterministic Polynomial Time). This means that when the size of the problem is large, finding an optimum solution in practice is not feasible. Two different methodological approaches have been developed for variable selection problems: a) optimal or exact (enumerative) techniques, which are able to guarantee an optimal solution but are only applicable to small-sized sets; and b) heuristic techniques, which are able to find good solutions (although unable to guarantee the optimum) in a reasonable amount of time. Among the enumerative techniques, the Narendra and Fukunaga (1977) algorithm is one of the best known but, as pointed out by Jain and Zongker (1997), it is impractical for problems with very large feature sets. Recent references on implicit enumerative feature selection techniques adapted to regression models can be found in Gatu and Kontoghiorghes (2003, 2005, 2006). On the other hand, the quality of heuristic solutions varies strongly depending on the method used. As in other optimization problems, metaheuristic techniques have proved to be superior methodologies. Among the heuristic techniques we find works based on genetic algorithms (see Bala et al. (1996), Jourdan et al. (2001), Inza et al. (2001a, 2001b) and

Wong and Nandi (2004)), and the recent work by García et al. (2006), who present a method based on Scatter Search. Finally, Pacheco et al. (2006) proposed several methods adapted to discriminant analysis. All these methods search for subsets with greater classification capacity based on different criteria. However, none of them focuses on the subsequent use of the selected variables in logistic regression models. This work proposes a new "ad hoc" method based on Tabu Search and compares different variable selection methods for logistic regression models. For this specific purpose the Stepwise method (Efroymson, 1960) and all its variants, such as O'Gorman's (1993), as well as the Backward and Forward methods, can be found in the literature. These are simple selection procedures based on statistical criteria which have been incorporated into some of the best known statistical packages such as SPSS, BMDP, etc. As highlighted by Huberty (1989), these methods are not very efficient, and when there are many original variables the optimum is rarely achieved. The method proposed in this work yields significantly better results, as shown below. Different tests were used to analyse its efficacy and compare it with that of previous methods. Our method is designed for 2 classes ("binary logistic regression"), but in the future it could be adapted to a higher number of classes. It should be pointed out that in the case of databases with many more variables than cases (unlike the databases used in this work), in which problems of stability and collinearity frequently arise, Cessie (1992) proposed the use of ridge penalization in logistic regression. With penalization, only the coefficients of the most significant variables in the model remain large, and the instability is reduced. However, this is not a dimension reduction technique in its own right, because all variables stay in the model (Eilers et al., 2001). The remainder of this paper is organized as follows: the problem is modelled in Sect. 2; the Tabu Search procedure is described in Sect. 3; and in Sect. 4 the results of the computational experiments are presented. Finally, in Sect. 5 the main conclusions are offered.

2.- Modelling the problem

We can formulate the problem of selecting the subset of variables with superior classification performance in logistic regression as follows. Let V be a set of m variables, V = {1, 2, ..., m}, and let A be a set of n cases (also named the "training" set). For each case we also know the class it belongs to. Let p be the number of variables to be selected, where p ∈ N and p < m. We have to find a subset S ⊂ V of size p with the greatest classification capacity f(S) for logistic regression. To be precise, the function f(S) is defined as the percentage of hits in A obtained with the classifier built by the logistic regression method from the variables of S.

This classifier is obtained as follows: consider, without loss of generality, S = {1, 2, ..., p}; let x_ij be the value of variable j for case i, i ∈ A, j ∈ S, and let y_i = 1 if case i belongs to the first class and y_i = 0 otherwise. The classifier is obtained by computing the vector c = (c_0, c_1, ..., c_p) that maximizes the expression

L(c) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}

where p_i = \frac{1}{1 + e^{-z_i}} and z_i = c_0 + c_1 x_{i1} + c_2 x_{i2} + \dots + c_p x_{ip};

Then, once vector c is obtained, every case i is classified in class 1 if p_i > 0.5 and in class 2 otherwise. The function L is usually called the likelihood function. In the literature there are some methods (Stepwise, Backward and Forward) for solving the problem of finding S ⊂ V, |S| = p, that maximizes f(S). As pointed out in the previous section, these methods are not very efficient, and when the value of m is high the solutions are not very good. For this reason a more sophisticated method is proposed in this work; specifically, a procedure based on the metaheuristic strategy Tabu Search is described in the next section. Note that in this work p, the number of features to be selected, is an input of the variable selection method. The reasons are the following: first, some users need to select a prefixed number of variables; second, it permits the problem to be addressed from a bi-objective point of view (minimizing the number of selected variables and maximizing the classificatory capacity). Specifically, by considering a prefixed range of values for p and solving the problem for each of these values, an approximation of the efficiency curve can be obtained.
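As an illustration, the following Python sketch shows one way f(S) could be evaluated with standard tools. It is not the authors' code (their experiments were implemented in Borland Delphi); the names f, X, y and the use of scikit-learn are our assumptions, and a very large C is used so that the solver's default L2 penalty is negligible and the fit approximates plain maximum likelihood.

# Sketch: evaluate f(S), the hit rate on the training set A of a logistic
# regression classifier restricted to the variable subset S (illustration only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def f(S, X, y):
    """S: list of selected column indices; X: (n, m) data matrix; y: 0/1 labels."""
    model = LogisticRegression(C=1e6, max_iter=1000)   # approximately maximizes L(c)
    model.fit(X[:, S], y)
    p_hat = model.predict_proba(X[:, S])[:, 1]         # p_i = 1 / (1 + exp(-z_i))
    y_hat = (p_hat > 0.5).astype(int)                  # assign to class 1 when p_i > 0.5
    return float(np.mean(y_hat == y))                  # percentage of hits in A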

3.- Tabu Search


Tabu Search (TS) is a strategy proposed by Glover (1989; 1990). "Tabu Search is dramatically changing our possibilities of solving a host of combinatorial problems in different areas" (Glover and Laguna, 2002). This procedure explores the solution space beyond the local optimum. Once a local optimum is reached, moves that do not improve, or even worsen, the solution are allowed. Simultaneously, the most recent moves are marked as tabu during the following iterations to avoid cycling. Recent and comprehensive tutorials on Tabu Search that include all types of applications can be found in Glover and Laguna (1997; 2002). Our Tabu Search algorithm includes a method for building an initial solution, a basic procedure for exploring the space of solutions around this initial solution, and a diversification phase that builds a new initial solution in a different region of the solution space. The complete Tabu Search algorithm is outlined as follows:

Complete Tabu Search Procedure
  Build an initial solution
  Repeat
    Execute Basic Tabu Search
    Execute Diversification
  until a stopping condition is reached

In this work a limit of 30 minutes of computational time is used as the stopping condition. These elements are described next.
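To make the overall control flow concrete, a minimal Python sketch of this outer loop follows. It is illustrative only: the helper names f, build_initial_solution, basic_tabu_search and diversify are ours and correspond to the sketches given with Sect. 2 and Sects. 3.1-3.3 below; the 30-minute limit matches the stopping condition mentioned above.

# Sketch of the complete procedure (helper functions are illustrative names,
# defined in the sketches accompanying Sections 2 and 3.1-3.3).
import time

def tabu_variable_selection(X, y, p, time_limit=30 * 60):
    start = time.time()
    freq = [0] * X.shape[1]                    # long-term frequency memory (Sect. 3.3)
    S = build_initial_solution(X, y, p)        # greedy construction (Sect. 3.1)
    best_S, best_f = list(S), f(S, X, y)
    while time.time() - start < time_limit:    # 30-minute computational time limit
        S_star, f_star = basic_tabu_search(S, X, y, freq)   # Sect. 3.2 (updates freq)
        if f_star > best_f:
            best_S, best_f = list(S_star), f_star
        S = diversify(X, y, p, freq)           # rebuild in an unexplored region (Sect. 3.3)
    return best_S, best_f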

3.1. Initial solution

The initial solution is built as follows: starting from the empty solution (S = Ø), a variable is added in each iteration until the solution S reaches p variables (|S| = p). To decide which variable is added to the solution in each iteration, the value of f is used. The pseudo-code for this procedure is summarized as follows:

Start with S = Ø
Repeat
  (a) Compute Rj = f(S ∪ {j}) ∀ j ∈ V – S
  (b) Determine Rj* = max { Rj / j ∈ V – S }
  (c) Do S = S ∪ {j*}
until |S| = p
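A minimal Python sketch of this constructive phase (an illustration; it reuses the f() sketch from Sect. 2, and the function name is ours):

# Sketch of the constructive phase: greedily add, at each iteration, the
# variable that maximizes f until p variables are selected.
def build_initial_solution(X, y, p):
    S = []                                        # start from the empty solution
    candidates = set(range(X.shape[1]))           # V - S
    while len(S) < p:
        R = {j: f(S + [j], X, y) for j in candidates}   # R_j = f(S U {j})
        j_star = max(R, key=R.get)                # best candidate variable
        S.append(j_star)
        candidates.remove(j_star)
    return S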


3.2. Description of a basic algorithm

Next we design a basic Tabu Search algorithm that uses neighbourhood moves. These moves consist in exchanging, at each step, an element that is in solution S for an element outside it. In order to avoid repetitive looping, when a move exchanging j from S for j' from V – S is performed, element j is prevented from returning to S for a certain number of iterations. We define vector_tabu(j) = the number of the iteration in which element j leaves S.

Some 'tabu' moves can be permitted under specific conditions ("aspiration criterion"), for example, when they improve the best solution found. The basic Tabu Search method is described next, where S is the current solution and S* the best solution. The Tabu_Tenure parameter indicates the number of iterations during which an element is not allowed to return to S. After different tests, Tabu_Tenure was set to p.

Basic Tabu Search Procedure
  (a) Read initial solution S
  (b) Do vector_tabu(j) = -Tabu_Tenure, j = 1..m; niter = 0, iter_better = 0 and S* = S
  (c) Repeat
      (c.1) niter = niter + 1
      (c.2) Calculate vjj' = f(S ∪ {j'} – {j})
      (c.3) Determine vj*j'* = max { vjj' / ∀ j ∈ S, j' ∉ S verifying:
            niter > vector_tabu(j') + Tabu_Tenure or vjj' > f(S*) ('aspiration criterion') }
      (c.4) Do S = S ∪ {j'*} – {j*} and vector_tabu(j*) = niter
      (c.5) If f(S) > f(S*) then do: S* = S, f* = f(S) and iter_better = niter
      until niter > iter_better + 2·m

That is, the basic procedure terminates when 2·m iterations have taken place without improvement.
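A Python sketch of this basic phase follows (an illustration under our reading of the pseudo-code above: the tabu restriction is applied to the entering variable, as stated in the text; it reuses the f() sketch from Sect. 2 and updates the frequency memory used later for diversification):

# Sketch of the basic Tabu Search phase: swap moves between S and V - S, a tabu
# tenure of p iterations, an aspiration criterion, and termination after 2*m
# iterations without improvement.
def basic_tabu_search(S, X, y, freq):
    m = X.shape[1]
    tenure = len(S)                               # Tabu_Tenure is set to p
    tabu_until = [0] * m                          # iteration until which j may not re-enter S
    best_S, best_f = list(S), f(S, X, y)
    niter = iter_better = 0
    while niter <= iter_better + 2 * m:
        niter += 1
        best_move, best_v = None, -1.0
        for j in S:                               # candidate variable leaving S
            for j2 in set(range(m)) - set(S):     # candidate variable entering S
                v = f([k for k in S if k != j] + [j2], X, y)
                # allowed if the entering variable is not tabu, or if the move
                # improves the best solution found so far (aspiration criterion)
                if (niter > tabu_until[j2] or v > best_f) and v > best_v:
                    best_move, best_v = (j, j2), v
        if best_move is None:                     # no admissible move (degenerate case)
            break
        j_out, j_in = best_move
        S = [k for k in S if k != j_out] + [j_in]
        tabu_until[j_out] = niter + tenure        # j_out may not return for `tenure` iterations
        for k in S:
            freq[k] += 1                          # long-term memory used by diversification
        if best_v > best_f:
            best_S, best_f, iter_better = list(S), best_v, niter
    return best_S, best_f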

3.3. Diversification

In many tabu search applications this basic procedure can be reinforced by diversification strategies. Diversification consists in directing the search towards unexplored regions. For that purpose, we rebuild a new solution by modifying the method outlined in Sect. 3.1. Specifically, in step (a) we redefine Rj as

R_j = f(S \cup \{j\}) - R_{\max}\,\frac{freq(j)}{freq_{\max}},

where freq(j) is the number of times element j has appeared in the solutions visited in the basic phases so far, freq_max = max { freq(j) / j ∈ V } and R_max = max { f(S ∪ {j}) / j ∈ V – S }.

In this way, we penalise the choice of elements that have appeared more often and force the choice of others.
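A Python sketch of this diversified rebuilding follows (an illustration; it reuses the f() sketch from Sect. 2, and the function name is ours):

# Sketch of the diversification phase: rebuild a solution with the greedy method
# of Sect. 3.1, but penalizing variables that have appeared often in previous
# basic phases.
def diversify(X, y, p, freq):
    S = []
    candidates = set(range(X.shape[1]))
    freq_max = max(max(freq), 1)                  # guard against division by zero
    while len(S) < p:
        gains = {j: f(S + [j], X, y) for j in candidates}
        R_max = max(gains.values())
        # R_j = f(S U {j}) - R_max * freq(j) / freq_max
        R = {j: gains[j] - R_max * freq[j] / freq_max for j in candidates}
        j_star = max(R, key=R.get)
        S.append(j_star)
        candidates.remove(j_star)
    return S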

4.- Computational Results

To check and compare the efficacy of this new method a series of experiments was run with different test problems. We selected data sets with enough instances to build large training sets (at least 10 cases per degree of freedom) and 10 test sets from every data set. Using large training sets is recommended in order to obtain a trade-off between "optimization" and "generalization". Seven data sets were used. The first six can be found in the well-known data repository of the University of California, Irvine (UCI) (see Murphy and Aha, 1994), available at www.ics.uci.edu/~mlearn/MLRepository.html. The following databases were used:

- Covertype Database: a forestry database with 54 explanatory variables, 8 classes and more than 580,000 cases or instances. For the training set a random selection of 600 cases from each of the two first groups was made. Among the rest of the cases we randomly selected 10 sets of 200 cases for evaluating the model with independent data (test sets)¹.

- Mushrooms Database: 22 variables, 2 classes and 8,100 cases. The 22 nominal variables were transformed into 121 binary variables: one for each binary variable and one per possible answer for the remaining variables. 1,300 cases without missing data were randomly selected for the training set. Among the rest of the cases we randomly selected 10 sets of 200 cases as test sets for evaluating the model with independent data.

- Spambase Database: 57 variables, 2 classes and 4,601 cases. 600 cases were randomly selected as the training set. Among the rest of the cases we randomly selected 10 sets of 200 cases as test sets.

- Nursery Database: 8 nominal variables, 5 classes and 12,960 cases. The 8 nominal variables were transformed into 28 binary variables. The 5 classes were grouped into two classes ("not_recom" and the rest). 400 cases were randomly selected for the training set. Also 10 sets of 200 cases each were obtained as test sets.

- Conect-4 Opening Database: 42 nominal variables, 3 classes and 67,557 cases. The 42 nominal variables were transformed into 126 binary variables. We considered the two first classes. A random selection of 1,200 cases was made for the training set. Among the rest of the cases we randomly selected 10 sets of 200 cases for evaluating the model with independent data.

- Waveform Database: 40 variables with continuous values, 3 classes and 5,000 instances. We considered the two first classes. 400 cases were randomly selected from the two first groups as the training set and 10 sets of 200 cases were obtained as test sets.

¹ We use one training set and 10 test sets. These 11 sets are chosen randomly and are disjoint. Other well-known experimental designs, such as 10-fold cross-validation, would be too expensive in computational time for such large databases.

Also a data set with financial ratios (Financial Database) is used: 93 variables with continuous values (financial ratios), 17,108 cases (firms) and 2 classes (failed/healthy). 1,000 cases were randomly selected as the training set and 10 sets of 200 cases were obtained as test sets. This database is available to interested readers. The experiments were divided into two groups. The first group is devoted to comparing our Tabu Search algorithm with the traditional Stepwise, Forward and Backward methods. In the second group the models are evaluated on independent data. All the experiments were run on a Pentium IV 2.4 GHz PC using the Borland Delphi compiler (version 5.0).
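The sampling protocol described above can be sketched in Python as follows (an illustration only; the function and parameter names are ours, and n_train is set per database, e.g. 600 for Spambase):

# Sketch: one random training set plus 10 disjoint test sets of 200 cases
# drawn from the remaining data.
import numpy as np

def make_splits(n_cases, n_train, n_test_sets=10, test_size=200, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cases)               # shuffle all case indices
    train_idx = idx[:n_train]
    test_sets = [idx[n_train + k * test_size: n_train + (k + 1) * test_size]
                 for k in range(n_test_sets)]    # 10 disjoint blocks of 200 cases
    return train_idx, test_sets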

4.1. Tabu Search algorithm versus classic methods


Our Tabu Search algorithm is compared to the classic Stepwise, Backward and Forward procedures used in some well-known statistical software packages such as SPSS, BMDP, etc. The Forward method (or forward selection) begins by selecting the most discriminant variable according to some criterion. It continues by selecting the second most discriminant variable, and so on. The algorithm stops when none of the non-selected variables discriminates in a significant way. The Backward method (or backward elimination) works in the opposite way. It begins by selecting all the variables; the least discriminant variable is eliminated at each step. The algorithm stops when all the remaining variables discriminate significantly. The Stepwise variable selection procedure (or stepwise regression), originally proposed by Efroymson (1960), has been available in statistical software packages for many years. This method uses a combination of the two previous algorithms: at each step variables are introduced or eliminated depending on how significant their discriminating capacity is. It also allows decisions taken in previous steps to be changed, either by eliminating from the selected set a variable introduced in a previous step of the algorithm or by selecting a previously eliminated variable. Wald and likelihood ratio tests are perhaps the most widely used criteria for selecting variables.

The Backward, Forward and Stepwise methods were executed on the seven previous data sets. Table 1 presents a summary of the solutions obtained in the intermediate steps (classification capacity for each value of p considered). A column with the classification capacity obtained with our Tabu Search algorithm has also been added. The following considerations have to be taken into account: the stopping criteria of the classic algorithms were 'relaxed' to the maximum in order to go through a greater number of steps and in this way make more comparisons. The results of the Stepwise method are omitted because they are the same as those obtained by the Forward method. The best solutions for every case appear in bold.

Table 1 Comparison of Tabu Search Algorithm and traditional methods

Database    p   Tabu Search  Forward  Backward
Covertype   3   0.772        0.772    0.787
            4   0.765        0.765    0.788
            5   0.763        0.773    0.788
            6   0.792        0.760    0.765
            7   0.790        0.762    0.767
            8   0.790        0.780    0.762
Mushrooms   3   0.987        0.953    0.945
            4   0.999        0.958    0.949
            5   1.000        0.982    0.978
Spambase    3   0.895        0.867    0.856
            4   0.915        0.877    0.865
            5   0.920        0.883    0.881
            6   0.925        0.885    0.887
            7   0.933        0.888    0.888
            8   0.935        0.892    0.891
Conect-4    3   0.763        0.750    0.745
            4   0.774        0.761    0.754
            5   0.782        0.765    0.762
            6   0.789        0.778    0.764
            7   0.794        0.788    0.767
            8   0.799        0.795    0.774
            9   0.803        0.798    0.777
            10  0.815        0.800    0.770
            11  0.817        0.808    0.765
            12  0.819        0.809    0.780
Nursery     3   1.000        1.000    1.000
            4   1.000        1.000    1.000
            5   1.000        1.000    1.000
            6   1.000        1.000    1.000
Waveform    3   0.914        0.902    0.902
            4   0.928        0.912    0.912
            5   0.936        0.930    0.930
            6   0.940        0.930    0.930
            7   0.950        0.932    0.932
Financial   3   0.877        0.870    0.863
            4   0.879        0.871    0.872
            5   0.885        0.875    0.873
            6   0.887        0.879    0.880
            7   0.889        0.882    0.881
            8   0.889        0.883    0.884
            9   0.890        0.888    0.886
            10  0.890        0.888    0.888
            11  0.894        0.884    0.889

The following points can be made regarding Table 1:

- The Backward method seems to work worse than the Forward and Stepwise methods.

- Our Tabu Search algorithm significantly improves on the solutions of the classic methods in every case. Only for the Nursery database does the Forward method obtain the same results.

- Looking across the different databases, our Tabu Search method obtains similar results with fewer variables than the traditional methods. Thus, for example, for the Spambase data the Forward method yields its maximum classification capacity of f = 0.892 with p = 8, whereas the Tabu Search algorithm already exceeds this value (f = 0.895) for p = 3. Similar results are obtained with the remaining data sets.

Fig. 1 shows the solutions obtained by the Forward, Backward and Tabu Search algorithms for the Connect-4 database. We can observe very clearly how the solutions obtained by our Tabu Search dominate the solutions of the traditional methods (considering both objectives: maximizing classificatory capacity and minimizing the number of variables).

[Fig. 1. Classification capacity f obtained by the various techniques (Tabu, Forward, Backward) for the Connect-4 data set, plotted against the p values (p = 3, ..., 12).]

4.2. Model evaluation with independent data

In this section we evaluate the models previously obtained on independent data. Specifically, the data we use are the 10 test sets obtained from each database described at the beginning of Sect. 4. Table 2 shows, for every case, the mean hit rate obtained on these test sets. The best solutions for every case appear in bold.


Table 2 Comparison of Tabu Search Algorithm and traditional methods in test sets

Database    p   Tabu Search  Forward  Backward
Covertype   3   0.671        0.671    0.749
            4   0.735        0.740    0.749
            5   0.751        0.760    0.764
            6   0.747        0.750    0.755
            7   0.755        0.761    0.761
            8   0.740        0.741    0.759
Mushrooms   3   0.860        0.869    0.982
            4   0.828        0.828    0.995
            5   0.810        0.803    1.000
Spambase    3   0.834        0.832    0.867
            4   0.839        0.834    0.871
            5   0.855        0.852    0.877
            6   0.857        0.854    0.884
            7   0.868        0.864    0.888
            8   0.879        0.876    0.900
Conect-4    3   0.741        0.736    0.746
            4   0.737        0.747    0.747
            5   0.749        0.746    0.753
            6   0.757        0.742    0.765
            7   0.765        0.749    0.769
            8   0.766        0.745    0.777
            9   0.773        0.743    0.785
            10  0.776        0.740    0.779
            11  0.782        0.741    0.786
            12  0.773        0.742    0.791
Nursery     3   1.000        1.000    1.000
            4   1.000        1.000    1.000
            5   1.000        1.000    1.000
            6   1.000        1.000    1.000
Waveform    3   0.865        0.865    0.868
            4   0.752        0.752    0.892
            5   0.894        0.899    0.899
            6   0.898        0.899    0.899
            7   0.865        0.865    0.903
Financial   3   0.873        0.849    0.879
            4   0.867        0.869    0.871
            5   0.867        0.873    0.871
            6   0.866        0.872    0.870
            7   0.865        0.870    0.877
            8   0.861        0.869    0.872
            9   0.862        0.869    0.869
            10  0.864        0.869    0.869
            11  0.867        0.866    0.868


In Table 2 we can observe that the results obtained by Tabu Search on these independent data sets are similar to those obtained on the training set. Also, in general, on these test sets the results of Tabu Search are still better than the results of the traditional methods. To reinforce this conclusion, Table 3 shows, for every database, the mean and standard deviation of the hit rate of each method, together with the results of the t-tests (t value and probability) comparing our Tabu Search method against Forward and against Backward.

Table 3 Mean, standard deviation and t-test for every database

Database    Tabu             Forward          Backward         Tabu-Forw.        Tabu-Back.
            mean     std     mean     std     mean     std     t        prob     t        prob
Covertype   0.7482   0.0254  0.7567   0.0270  0.7557   0.0301  -3.423   0.001    -2.864   0.006
Mushroom    0.9923   0.0088  0.8321   0.0335  0.8333   0.0438  22.723   0.000    17.768   0.000
Spam        0.8809   0.0232  0.8549   0.0285  0.8518   0.0295  9.623    0.000    10.318   0.000
Conect-4    0.7689   0.0266  0.7629   0.0208  0.7462   0.0372  2.593    0.011    4.363    0.000
Nursery     1        0       1        0       1        0       (*)      (*)      (*)      (*)
Wave        0.8907   0.0327  0.8630   0.0660  0.8919   0.0330  3.252    0.002    -0.526   0.601
Financial   0.8715   0.0261  0.8656   0.0231  0.8674   0.0282  3.328    0.001    2.581    0.012
All         0.8544   0.0866  0.8042   0.0707  0.8036   0.0767  10.173   0.000    10.246   0.000

(*) No variance.

From Table 3 we can observe that only in the Covertype database do both traditional methods perform better than our Tabu Search. For the Waveform database our Tabu Search performs significantly better than Forward, and worse, though not significantly, than Backward. For the Nursery database all methods obtain 100% hits in all test sets, so there is no variance and the t-test cannot be applied. For the rest of the databases our method performs significantly better than both traditional methods. Also, for all databases together, our method performs significantly better.
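A minimal Python sketch of the kind of comparison summarized in Table 3 follows. It is illustrative only: the paper does not state whether a paired or an unpaired t-test was used, so a paired test over matched test-set hit rates is shown here, and the function name is ours.

# Sketch: mean, standard deviation and t-test comparing two methods' hit rates.
import numpy as np
from scipy import stats

def compare_methods(hits_tabu, hits_other):
    """hits_*: arrays of hit rates, one entry per (p value, test set) combination."""
    t, prob = stats.ttest_rel(hits_tabu, hits_other)     # paired t-test
    return np.mean(hits_tabu), np.std(hits_tabu, ddof=1), t, prob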

5.- Conclusions

This work approaches the problem of variable selection for logistic regression models. Although there are many references in the literature regarding variable selection for use in classification, there are very few key references on the selection of variables for use in logistic regression. In fact, the most well-known statistical packages continue to use classic selection methods. In this work we propose an alternative new method based on the metaheuristic strategy Tabu Search. This new method performs better than the classic ones: for all databases analysed our new algorithm obtains a set of solutions better than those obtained with the classic methods, since its classification capacity is greater with fewer variables. This helps interpretation and therefore leads to a more efficacious and efficient model. Also, the models obtained by our method perform significantly better than the models obtained by the traditional methods on independent test sets for most of the databases. It must be stated that the computation time of our method is greater: in this work 30 minutes was needed for each value of p, versus the few seconds used by the classic methods. However, for most applications and studies this has no special relevance if we take into account all the advantages presented above. Finally, large samples have been used; the reason is to obtain a trade-off between optimization and generalization.

Acknowledgements

The authors are grateful for financial support from the Spanish Ministry of Education and Science (National Plan of R&D, Projects SEJ2005-08923/ECON and SEJ2004-08176-02-01/ECON) and from the Regional Government of Castilla y León ("Consejería de Educación", Project BU008A06).

References

Bala J., Dejong K., Huang J., Vafaie H. and Wechsler H. (1996): Using Learning to Facilitate the Evolution of Features for Recognizing Visual Concepts, Evolutionary Computation 4(3), 297-311.
Cessie, S. (1992): Ridge Penalization in Logistic Regression. Applied Statistics 41, 191-201.
Cotta C., Sloper C. and Moscato P. (2004): Evolutionary Search of Thresholds for Robust Feature Set Selection: Application to the Analysis of Microarray Data, Lecture Notes in Computer Science 3005, 21-30.
Efroymson, M.A. (1960): Multiple Regression Analysis, in Ralston, A. and Wilf, H.S. (eds.), Mathematical Methods for Digital Computers, Vol. 1. Wiley, New York.
Eilers, P., Boer, J., Van Ommen, G.J. and Van Houwelingen, H. (2001): Classification of Microarray Data with Penalized Logistic Regression. In Proceedings of SPIE, Progress in Biomedical Optics and Images, Vol. 4266, 187-198.
Ganster H., Pinz A., Rohrer R., Wildling E., Binder M. and Kittler H. (2001): Automated Melanoma Recognition, IEEE Transactions on Medical Imaging 20(3), 233-239.
García F.C., García M., Melián B., Moreno J.A. and Moreno M. (2006): Solving Feature Selection Problem by a Parallel Scatter Search. European Journal of Operational Research 169(2), 477-489.
Gatu C. and Kontoghiorghes E.J. (2003): Parallel Algorithms for Computing all Possible Subset Regression Models Using the QR Decomposition. Parallel Computing 29, 505-521.
Gatu C. and Kontoghiorghes E.J. (2005): Efficient Strategies for Deriving the Subset VAR Models. Computational Management Science 2(4), 253-278.
Gatu C. and Kontoghiorghes E.J. (2006): Branch-and-Bound Algorithms for Computing the Best-Subset Regression Models. Journal of Computational and Graphical Statistics 15(1), 139-156.
Glover F. (1989): Tabu Search: Part I, ORSA Journal on Computing 1, 190-206.
Glover F. (1990): Tabu Search: Part II, ORSA Journal on Computing 2, 4-32.
Glover F. and Laguna M. (1997): Tabu Search, Kluwer Academic Publishers, Boston.
Glover F. and Laguna M. (2002): Tabu Search, in Pardalos P.M. and Resende M.G.C. (eds.), Handbook of Applied Optimization, Oxford University Press, 194-208.
Huberty C.J. (1989): Problems with Stepwise Methods: Better Alternatives. In Thompson B. (ed.), Advances in Social Science Methodology, Vol. 1, 43-70. JAI Press, Greenwich, CT.
Inza I., Larrañaga P., Etxeberria R. and Sierra B. (2000): Feature Subset Selection by Bayesian Networks Based Optimization, Artificial Intelligence 123, 157-184.
Inza I., Merino M., Larrañaga P., Quiroga J., Sierra B. and Girala M. (2001a): Feature Subset Selection by Genetic Algorithms and Estimation of Distribution Algorithms - A Case Study in the Survival of Cirrhotic Patients Treated with TIPS. Artificial Intelligence in Medicine 23(2), 187-205.
Inza I., Larrañaga P. and Sierra B. (2001b): Feature Subset Selection by Bayesian Networks: A Comparison with Genetic and Sequential Algorithms. International Journal of Approximate Reasoning 27(2), 143-164.
Jain A. and Zongker D. (1997): Feature Selection: Evaluation, Application, and Small Sample Performance, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(2), 153-158.
Jourdan L., Dhaenens C. and Talbi E. (2001): A Genetic Algorithm for Feature Subset Selection in Data Mining for Genetics, MIC 2001 Proceedings, 4th Metaheuristics International Conference, 29-34.
Kohavi R. (1995): Wrappers for Performance Enhancement and Oblivious Decision Graphs, Stanford University, Computer Science Department.
Lee S., Yang J. and Oh K.W. (2003): Prediction of Molecular Bioactivity for Drug Design Using a Decision Tree Algorithm, Lecture Notes in Artificial Intelligence 2843, 344-351.
Lewis P.M. (1962): The Characteristic Selection Problem in Recognition Systems, IEEE Transactions on Information Theory 8, 171-178.
Liu H. and Motoda H. (1998): Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic, Boston.
Murphy P.M. and Aha D.W. (1994): UCI Repository of Machine Learning. University of California, Department of Information and Computer Science, http://www.ics.uci.edu/~mlearn/MLRepository.html
Narendra P.M. and Fukunaga K. (1977): A Branch and Bound Algorithm for Feature Subset Selection, IEEE Transactions on Computers 26(9), 917-922.
O'Gorman T.W. and Woolson R.F. (1993): On the Efficacy of the Rank Transformation in Stepwise Logistic and Discriminant Analysis, Statistics in Medicine 12, 143-151.
Pacheco J., Casado S., Núñez L. and Gómez O. (2006): Analysis of New Variable Selection Methods for Discriminant Analysis. To appear in Computational Statistics and Data Analysis.
Sebestyen G. (1962): Decision-Making Processes in Pattern Recognition. MacMillan, New York.
Shy S. and Suganthan P.N. (2003): Feature Analysis and Classification of Protein Secondary Structure Data, Lecture Notes in Computer Science 2714, 1151-1158.
Sierra B., Lazkano E., Inza I., Merino M., Larrañaga P. and Quiroga J. (2001): Prototype Selection and Feature Subset Selection by Estimation of Distribution Algorithms. A Case Study in the Survival of Cirrhotic Patients Treated with TIPS. Lecture Notes in Artificial Intelligence 2101, 20-29.
Tamoto E., Tada M., Murakawa K., Takada M., Shindo G., Teramoto K., Matsunaga A., Komuro K., Kanai M., Kawakami A., Fujiwara Y., Kobayashi N., Shirata K., Nishimura N., Okushiba S.I., Kondo S., Hamada J., Yoshiki T., Moriuchi T. and Katoh H. (2004): Gene Expression Profile Changes Correlated with Tumor Progression and Lymph Node Metastasis in Esophageal Cancer. Clinical Cancer Research 10(11), 3629-3638.
Wong M.L.D. and Nandi A.K. (2004): Automatic Digital Modulation Recognition Using Artificial Neural Network and Genetic Algorithm. Signal Processing 84(2), 351-365.
