Competing intelligent search agents in global optimization

Simon Streltsov, Pirooz Vakili
[email protected], [email protected]
Boston University, Boston, MA

1 Introduction

In this paper we present a new search methodology that we view as a development of the intelligent-agent approach to the analysis of complex systems. The main idea is to treat the search process as a competition between concurrent adaptive intelligent agents. The agents cooperate in achieving a common search goal and at the same time compete with each other for computational resources. We propose a statistical selection approach to resource allocation between agents that leads to simple and, on average, efficient index allocation policies. We use global optimization as the most general setting that encompasses many types of search problems, and show how the proposed selection policies can be used to improve and combine various global optimization methods. This work opens a way to developing effective numerical procedures that reflect the qualitative knowledge absorbed by intelligent search architectures developed for particular applications. We discuss examples in the areas of manufacturing control and scheduling, optimization via simulation, classification, data mining, and multitarget tracking. We propose designing a new software package that will consist of a database of heuristic search methods for particular problems and a control engine that will utilize statistical procedures to distribute computational resources between different methods.

We describe the organization of the competing search processes in section 2. We analyze global optimization models in section 3 and list applications of the competing search methodologies in section 4.

Ilya Muchnik
[email protected]
Rutgers University, Piscataway, NJ

2 Organization of competition between search agents

A full probabilistic analysis of the search process is complex. By considering competition/cooperation between independent agents, we establish an efficient and simple resource allocation algorithm based on index policies computed independently for each agent. These indices allow us to simplify control over the search process and to increase the number of different search strategies that can be applied to a particular problem by establishing a hierarchical order among competing search strategies.

The competing search methodology proposed in this paper consists of

1. developing adaptive probabilistic models for the individual search agents, and
2. constructing a policy that allocates computational resources dynamically based on estimates of the stopping indices of the agents.

The stopping index for each agent is computed under the assumption of an infinite time horizon and takes into account

- the expectation of the final search reward,
- sampling costs, and
- the learning potential of the agent.

This strategy transforms our qualitative understanding of the behavior of the search process into a statistical setting by explicitly assigning probability measures to "realistic" configurations of the agents.
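The allocation policy described above can be sketched as follows. The agent model, the per-sample cost, and the optimistic learning bonus used in the index are illustrative placeholders, not the paper's exact formulas:

```python
import random

class SearchAgent:
    """Toy search agent; the index below is an illustrative stand-in
    for the stopping indices defined later in the paper."""
    def __init__(self, sample_fn, cost=0.01):
        self.sample_fn = sample_fn   # draws one candidate objective value
        self.cost = cost             # per-sample computational cost
        self.best = float("-inf")    # best value found so far
        self.samples = 0

    def index(self):
        if self.samples == 0:
            return float("inf")      # unsampled agents are tried first
        # Best value found, plus an optimistic learning bonus that
        # shrinks as the agent learns, minus the sampling cost.
        return self.best + 1.0 / (1 + self.samples) - self.cost

    def step(self):
        self.best = max(self.best, self.sample_fn())
        self.samples += 1

def compete(agents, budget):
    """At each step, give one sample to the agent with the largest index."""
    for _ in range(budget):
        max(agents, key=lambda a: a.index()).step()
    return max(a.best for a in agents)
```

The agents cooperate through the shared best value returned at the end, while the index decides, step by step, which of them receives the next unit of computation.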

August 1996. To appear in the proceedings of the NIST Conference "Intelligent Systems: A Semiotic Perspective"


3 Global optimization

Global optimization methodology relies on two steps:

1. description of a complex problem by an objective function designed in such a way that an argument of the extreme value of the function gives a solution of the problem, and
2. a technology to find a global optimum of a given objective function.

The history of fundamental science demonstrates the usefulness of the first part of this paradigm: classical mechanics and other areas of physics give many examples of successful application of optimization models. The modern applied sciences continue this tradition of formulating problems as optimization of particular objective functions, and we are confident that the "optimization language" will remain adequate for describing problems in the applied sciences in the future. On the other hand, progress in optimization techniques is not as fast as required by many challenging technical and scientific applications. In this paper we focus our attention on the new optimization techniques.

3.1 New optimization techniques

We first note that there is currently no universal procedure that can solve many different classes of optimization problems. We do, however, have substantial experience with particular classes of optimization problems where specific features of the problem help us to develop efficient algorithms. Linear and convex programming are good examples of problems where substantial progress has been made, and we can expect that the efficiency of solving these problems will increase even more in the future. Another example is combinatorial optimization, where matroid theory and related approaches lead to efficient solutions of many important applied problems ([8]). There are also new developments in the area of optimization of complex functions in multidimensional continuous spaces. Methods such as genetic algorithms and simulated annealing represent the search process by random walk processes and combinatorial algorithms in discrete time. Modifications of these popular procedures differ mostly in the a priori assumptions made about the objective function. This leads us to an observation known for many years in AI: if we have a good representation of data,

we can solve the corresponding search problem much more easily. Informally, we can view the optimization process as a combination of

1. preprocessing algorithms that transform the original data from the unknown objective function into a simpler structure with known properties, and
2. search algorithms matching this structure.

For example, LP and convex programming represent limited classes of problems with known efficient search strategies. On the other side of the spectrum, global optimization represents much richer classes of problems, but the existing search algorithms are inefficient or rely on additional assumptions about the structure of the objective function. In this paper we present a methodology for constructing efficient search algorithms that can be used in conjunction with complex probabilistic structures of the objective functions. We consider a stochastic global optimization problem

    max_{θ ∈ Θ} F(θ) = E[f(θ, ω)],    (1)

where f(θ, ω) is a stochastic function of the unknown parameter θ defined on a closed subset Θ of n-dimensional space. No assumptions about the convexity of F(θ) are made and, therefore, it may have several local maxima. Some information may or may not be available about the smoothness of F, the values and smoothness of its derivatives, a priori bounds on F* = max F(θ), the number of local optima, etc.
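For a concrete toy instance of problem (1), the expectation can be estimated by Monte Carlo averaging and Θ searched by pure random sampling. The objective function and all names below are invented for illustration:

```python
import math
import random

def f(theta, omega):
    # Toy stochastic objective with several local maxima (illustrative).
    return math.sin(5 * theta) + 0.5 * math.sin(13 * theta) + 0.1 * omega

def F_hat(theta, n=200, rng=random):
    """Monte Carlo estimate of F(theta) = E[f(theta, omega)], omega ~ N(0, 1)."""
    return sum(f(theta, rng.gauss(0, 1)) for _ in range(n)) / n

def random_search(candidates=100, rng=random):
    """Pure random search over Theta = [0, 1]; with no convexity
    assumption we sample globally instead of following gradients."""
    best_theta, best_val = None, float("-inf")
    for _ in range(candidates):
        theta = rng.uniform(0, 1)
        val = F_hat(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```

Because F is only observed through noisy samples of f, every candidate θ costs a batch of simulations; this is exactly the sampling cost that the stopping indices of section 3.3 charge for.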

3.2 Review of global optimization methodologies

The complexity of global optimization is mainly due to the fact that there are no algorithms that can ensure sequential approximation to the global optimum, and there are no local criteria that can verify whether the current best value is indeed the global optimum. Thus, the extensive apparatus developed for local optimization is not sufficient in the nonconvex case. One approach to overcoming this difficulty is to repeatedly apply optimization methods that work well for convex problems: each run of the algorithm may lead to the same or a different local optimum. If the number of optima is small and the sizes of all areas of attraction are large, we can quickly enumerate all local optima. Clustering methods are used in order to identify searches that lead to the same local


optimum. Still, when the number of local optima is large (or even small but unknown), the required number of iterations becomes prohibitive.
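The multistart-with-clustering idea above can be sketched as follows; the hill-climbing routine and the rounding-based clustering are simplifications invented for illustration:

```python
import random

def hill_climb(f, x, step=0.01, iters=200):
    """Simple local ascent, standing in for a convex-style local method."""
    for _ in range(iters):
        for nxt in (x - step, x + step):
            if f(nxt) > f(x):
                x = nxt
    return x

def multistart(f, starts, rng=random):
    """Run local searches from random starting points and cluster
    results that converge to (numerically) the same local optimum."""
    optima = set()
    for _ in range(starts):
        x = hill_climb(f, rng.uniform(0, 1))
        optima.add(round(x, 2))   # crude clustering by rounding
    return optima
```

When the number of local optima is small, the set of clustered optima stabilizes quickly; when it is large or unknown, the number of restarts needed to be confident that no optimum was missed grows prohibitively, as noted above.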


Another approach is to forgo the local stage and to perform an exhaustive search over the whole space in order to make sure that we do not miss the global optimum. This approach usually requires Lipschitz assumptions and considers worst-case scenarios. The number of required sample points grows exponentially in high-dimensional spaces. The appeal of the average-case approach is in the possibility of creating algorithms that work efficiently for typical problems without paying a high premium for worst-case guarantees. The drawback of average-case algorithms is that they lead to complex probabilistic structures, and it is difficult to construct efficient search strategies on these structures. One simple probabilistic model is partitioned random search: the search space is divided into several disjoint regions and sampling is uniform in each region. At each sample step we can choose which region to sample. Typical algorithms produce a number of initial sample points in each region in order to estimate the distribution of function values and an index for each region, and then sample in the region(s) with the largest values of the index. Branch-and-bound methods start as partitioned random search but later partition the most prospective regions into smaller sub-regions. A more general approach assumes that the objective function is a realization of some unknown stochastic field (in practice, only Gaussian fields are used). Then, after each sample point we can compute or approximate the joint distribution of the function values at all sample points. Different algorithms vary in their choice of the trade-off between the desired complexity of the model of the objective function and the amount of computation required for model updates and search. Here is how the conditional mean m(x) and variance var(x) of the function values look in the one-dimensional case:

[Figure: conditional mean m(x) and variance var(x) for the one-dimensional Brownian model]


Even for the simplest stochastic model of global optimization, partitioned random search, the construction of a sampling policy is a challenge: under a typical assumption of a fixed total number of samples, resource allocation reduces to a dynamic programming problem that is expensive to solve. Therefore, index selection policies are used to allocate computational resources. Promising indices are computed independently for each sample candidate based on estimates of the efficiency of future sampling in a region. A standard approach used in a number of different models is to start by computing the promising index as a myopic reward and then modify it heuristically in order to take into account the long-term reward ([9]). A general selection problem that arises in global optimization can be described as follows: at each step, select between a number of adaptive search agents. The structure of each agent may be quite complex, but it is usually parameterized by a limited number of adaptive parameters. All agents cooperate in searching for the overall best function value and compete for limited computational resources. In the short term, agents often behave independently of each other, although in the long term the learning processes of different agents affect each other.
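A minimal sketch of partitioned random search with a myopic index follows. The particular index used here (region mean plus a crude spread-based learning term) is an invented placeholder for the indices discussed in the text:

```python
import random

def partitioned_random_search(f, regions, init=5, budget=100, rng=random):
    """Sample each region a few times, then repeatedly sample the
    region whose (myopic) index is largest."""
    values = {name: [] for name in regions}
    for name, (lo, hi) in regions.items():       # initial samples per region
        for _ in range(init):
            values[name].append(f(rng.uniform(lo, hi)))

    def index(name):
        vs = values[name]
        mean = sum(vs) / len(vs)
        spread = (max(vs) - min(vs)) / len(vs)   # crude learning potential
        return mean + spread

    for _ in range(budget - init * len(regions)):
        name = max(regions, key=index)
        lo, hi = regions[name]
        values[name].append(f(rng.uniform(lo, hi)))
    return max(v for vs in values.values() for v in vs)
```

Each region here plays the role of one competing agent: the index trades off the observed reward against a rough estimate of what further sampling in that region might still reveal.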

3.3 Stopping indices

Instead of the traditional optimization criteria (maximizing the best found value Z_N after N samples, or minimizing the number of samples N required to reach the true maximum), we define an average expected reward of the optimization algorithm that takes into account the cost of computations (as in [14]):


    max J_N = E[ Z_N − Σ_{t=1}^{N} c_t ],    (2)

where N is the stopping time and c_t is a fixed cost incurred by sampling at time t. Our goal is to find an optimization algorithm that maximizes the average expected reward of sampling on some a priori defined class of functions f(x). In this setting we can compute optimal, or asymptotically optimal, index sampling algorithms assuming independent sampling from different agents. These indices strike a balance between the immediate sampling reward, the computational costs, and the learning potential of the competing agents. Including costs in the optimization criterion allows us to compute stopping rules and to combine several optimization algorithms. This is especially important because global optimization algorithms vary widely in complexity and computational requirements. To compute a stopping index for an agent A, we define the expected reward J_A(z) of an auxiliary process Z̄_A(τ, z) = max{Z_A(τ), z}, where the trajectory τ belongs to T(A), the tree of all adaptive trajectories of agent A, and Z_A(τ) is the best found value on the trajectory τ, as:

    J_A(z) = sup_{τ ∈ T(A)} [ Z̄_A(τ, z) − C_τ ],    (3)

where C_τ is the total cost of sampling trajectory τ. The stopping index of the agent A is then defined as a solution of the equation

    z*(A) = inf { z : J_A(z) = z },    (4)

i.e. z*(A) defines a value process Z̄_A(z*(A)) with expected reward equal to z*(A).
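Equation (4) can be solved numerically once J_A can be evaluated. Below is a sketch for a toy agent that pays cost c for one more uniform(0, 1) sample and collects max{X, z}; both the toy agent and the bisection tolerance are illustrative assumptions, not the paper's model:

```python
def stopping_index(J, lo=0.0, hi=1.0, tol=1e-10):
    """Solve J(z) = z by bisection, assuming J(z) - z changes sign
    exactly once on [lo, hi]."""
    for _ in range(200):
        if hi - lo < tol:
            break
        mid = (lo + hi) / 2
        if J(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy agent: one more sample X ~ U(0, 1) at cost c, reward max{X, z},
# so J(z) = E[max{X, z}] - c = (1 + z^2)/2 - c.
c = 0.02
z_star = stopping_index(lambda z: (1 + z * z) / 2 - c)
# Analytic fixed point on [0, 1]: z* = 1 - sqrt(2c) = 0.8.
```

The fixed point has the natural reading of a stopping rule: if the best value already in hand exceeds z*, one more sample is not worth its cost.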


Theorem 6.7.2 in [3] shows that if infinitely many independent agents with equal initial stopping indices are available, then the index selection policy according to z*(A) is optimal. These index policies can be used to improve a number of global optimization algorithms. We can apply policy (4) directly to a set of fully independent agents, or use it as a one-step approximation in a more complex adaptive model ([12]). Here is how the stopping index and the one-step reward look for the one-dimensional Brownian model described above (note that the two indices recommend different sampling points, according to their corresponding maximal values arg max_x J(x) and arg max_x z*(x)):

[Figure: stopping index z*(x) and one-step reward J(x) for the one-dimensional Brownian model]


4 Applications

Many large-scale problems can be adequately modeled as global optimization. Each of these problems has a particular structure of the objective function and an approach to modeling the search processes. If we can construct probabilistic models for agents specific to a given problem based on experimental data, we can use the model of competing search processes to organize the overall search. We can see a parallel between this optimization technology and an expert-system architecture:

1. data presentation is a core of the optimization technology, and it is also a core of knowledge-base design;
2. a knowledge base should be convenient for search procedures, and, again, such procedures are also a part of the optimization technology.

4.1 Production control and scheduling with setups

We consider a flexible manufacturing system with external demand and setup costs and times. The goal of the control policy is to schedule production of different part types in order to minimize total backlog and inventory costs. In certain cases the structure of the solution can be used to reduce the problem to an optimization problem over possible setup moments or over the dual variables ([5]). When the problem is presented as optimization over the setup moments, the objective function is high-dimensional, the areas of attraction around local optima are large, and derivatives exist everywhere except at the boundaries between them. Organization of search


in this model requires both global random search, in order to find prospective search areas, and multiple local search agents that run concurrently according to their stopping indices. When the representation by the dual variables is used, the objective function becomes low-dimensional but irregular. Local search cannot be used efficiently, and we use global search methods that correspond to the Gaussian model of the objective function.

In many scheduling, design, and control applications the objective function is available only via a "black box", i.e., simulation. In certain cases, we can influence the measures of a stochastic simulation by imposing correlations between simulations of different inputs or between several stochastic realizations of the same input ([13]). It is often not clear how to weigh the benefits of the achieved variance reduction against the additional computations required to impose correlations. From the optimization point of view, algorithms that impose correlations between different inputs represent yet another search model that can be included in the competing search paradigm: a stopping index for the correlation method characterizes the trade-off between variance reduction and computational costs.
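The correlation-between-inputs idea can be illustrated with common random numbers; the simulation model, the parameter values, and the reward form below are all invented for the sketch:

```python
import random

def simulate(theta, rng):
    """Hypothetical one-replication simulation of a system configuration."""
    demand = rng.gauss(10, 2)          # random demand realization
    return -(demand - theta) ** 2      # reward of configuration theta

def compare(theta_a, theta_b, runs=1000, common=True):
    """Estimate the performance gap between two configurations; with
    common=True the same random numbers drive both simulations."""
    diffs = []
    for i in range(runs):
        if common:
            # Same seed -> same demand realization for both configurations.
            a = simulate(theta_a, random.Random(i))
            b = simulate(theta_b, random.Random(i))
        else:
            a = simulate(theta_a, random.Random(2 * i))
            b = simulate(theta_b, random.Random(2 * i + 1))
        diffs.append(a - b)
    mean = sum(diffs) / runs
    var = sum((d - mean) ** 2 for d in diffs) / (runs - 1)
    return mean, var
```

With these choices, the difference under common random numbers is a linear function of the shared demand, so its variance is markedly smaller than with independent streams; whether that reduction is worth the bookkeeping is exactly what a stopping index for the correlation method would weigh.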

4.2 Classification

Classification problems are usually solved by greedy procedures. The success of this approach is due to the heuristic models that were carefully developed for each particular class of problems ([7]). The neural network model is an example of a general representation of the objective function that can be used to model many classification problems. [15] describes a global optimization setting in the neural learning context: the objective function is almost linear in the area of small weight radius R and increasingly nonlinear for large R. This leads to a ravine-type structure of the objective function, where prospective candidates can be found in the areas of small R and then followed by a local search method into the nonlinear areas. This representation requires searching by a combination of specialized local agents ([4]) and global algorithms that search for significant directions ([11]).

4.3 Data mining

Data mining is defined as the search for previously unknown meaningful dependencies in huge datasets. To illustrate the arising opportunities, we provide an example of such analysis using the AltaVista Web search engine. The World Wide Web allows easy access to large amounts of data. The question arises: can we extract meaningful statistics from this huge, assorted, and unstructured dataset? We put ourselves in the shoes of a disgruntled New Yorker who is considering possible places to move. We compare lifestyles in several cities (Table 1) by analyzing easily available Web information: we simply construct queries of the form "<topic> near <city>" and compare the relative number of hits for different queries. The results are, of course, not precise, but we get the overall picture ([10]).

Table 1: Lifestyle comparison via AltaVista

                              New York   San Francisco   Boston   Cambridge, MA   Indy
  Fun                         1          2.3             2.8      0.75            1.3
  Theater                     1          0.47            0.7      0.24            0.7
  Football                    1          2.1             1.75     0.1             3.5
  Correlates with New York    N/A        0.33            1        0.375           0.17
  Cobol jobs-to-resumes ratio 2.15       4.71            2.75     2.67            1.83
  Clinton-to-Dole ratio       2.86       6.12            2.7      20.67           2.45

We can see that the aggregate representation of the data allows us to define areas where future search should be conducted. Again, we conduct many searches in parallel, allocating resources according to stopping indices. The novelty of the data mining setting is in the potential for full automation of computations. When huge datasets are used both to produce and to check statistical hypotheses, the costs of the various stages of statistical analysis become comparable, because most operations do not involve human interaction. In addition, due to the variety and flexibility of the arising problems, it becomes much harder to rely on heuristic algorithms developed for fixed classes of problems. As a result, the issue of the trade-off between different search strategies becomes critical in data mining, both because the set of possible search strategies has increased and because it becomes less evident a priori


which methods are efficient. Therefore, an efficient data mining algorithm should utilize many search strategies concurrently.
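The query-ratio computation behind Table 1 can be sketched as follows. AltaVista is no longer available, so `hit_counts` here is a hypothetical mapping from (topic, city) queries to hit counts, with made-up numbers in the example:

```python
def lifestyle_table(hit_counts, base_city):
    """Normalize raw hit counts by a base city, producing ratios in the
    style of Table 1. `hit_counts` maps (topic, city) -> number of hits."""
    topics = {t for t, _ in hit_counts}
    cities = {c for _, c in hit_counts}
    return {
        t: {c: round(hit_counts[(t, c)] / hit_counts[(t, base_city)], 2)
            for c in cities}
        for t in topics
    }
```

Normalizing by a base city turns incomparable raw counts into the relative lifestyle ratios shown in Table 1.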

4.4 Multitarget tracking

One of the challenging resource allocation problems is distributing sensor resources in order to track many targets simultaneously ([1]). We can allocate sensor resources between the search for new targets and the exploration of previously identified targets. Exploration of a target can, in turn, include multiple hypothesis testing. The problem is additionally complicated by the fact that the search agents often have to work under time constraints. In this case, we need to implement the paradigm of competing agents in the context of multiresolution models ([6], [2]).

5 Conclusions

We have defined a competing search methodology that represents a unified approach to many classes of optimization and search problems. This approach relies on asymptotically optimal adaptive algorithms of multiple selection and allows concurrent application of different search strategies to the solution of complex large-scale problems. To fully benefit from the proposed methodologies, developing adequate probabilistic representations of the specific applications becomes of vital importance. In the area of methodology, future work should focus on developing probabilistic models for typical search agents, such as local search and one-dimensional Brownian motion, and on analysis of selection models that take into account potential interaction between agents, restrictions on resources, and a finite time horizon.

References

[1] Bar-Shalom, Y., Ed. Multitarget-Multisensor Tracking: Advanced Applications. Norwood, MA: Artech House, 1992.
[2] Benveniste, A., Nikoukhah, R., and Willsky, A. Systems and signals: Multiscale system theory. IEEE Transactions on Circuits & Systems, Part 1, 41, 1 (1994), 2.
[3] Bergman, S. Acceptance Sampling: The Buyer's Problem. PhD thesis, Yale University, 1981.
[4] Bertsekas, D. P. Nonlinear Programming. Athena Scientific, 1995.
[5] Khmelnitsky, E., Maimon, O., and Streltsov, S. Numerical solution of a deterministic scheduling problem for a multi-product flexible manufacturing system with setup time. In preparation, http://cad.bu.edu/go/simon.html (1996).
[6] Luettgen, M., and Willsky, A. Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination. IEEE Transactions on Image Processing 4, 2 (1995), 194.
[7] Mirkin, B. Mathematical Clustering and Classification. 1996.
[8] Muchnik, I., and Shvartzer, L. Submodular set functions and monotonic systems in aggregation problems. Automation & Remote Control 87, 5-6 (1987), 678-689, 821-828.
[9] Pinter, J. Convergence qualification of adaptive partition algorithms in global optimization. Mathematical Programming, Series A 56, 3 (Oct. 1992), 343.
[10] Streltsov, S. Data mining using web search engines: Lifestyle comparison via AltaVista. http://cad.bu.edu/go/mining.html (June 1995).
[11] Streltsov, S., and Muchnik, I. Global optimization and line search. Working paper, http://cad.bu.edu/go/mining.html (1993).
[12] Streltsov, S., and Vakili, P. Multiple selection setting for statistical global optimization. Submitted to Journal of Global Optimization (May 1996).
[13] Streltsov, S., and Vakili, P. Parallel replicated simulation of Markov chains: Parallel implementation and variance reduction. Discrete Event Dynamic Systems: Theory and Applications 6, 2 (1996), 159-180.
[14] Tang, Z. Adaptive partitioned random search to global optimization. IEEE Transactions on Automatic Control 32, 11 (Nov. 1994), 2235.
[15] Vysniaskas, V. Searching for minimum in neural networks. Informatica 5, 1-2 (1994), 241-255.

Additional information is available at http://cad.bu.edu/go/simon.html
