  
		Basic Ideas
		
		
		
		In this page, we 
		describe the basic ideas of Combinatorial Optimization
		Problem (COP), Stochastic Local Search (SLS) 
		algorithms, Fitness Landscape 
		and Search Trajectory (FLST) visualization, the 
		implementation of FLST visualization in Viz, and the Integrated White 
		and Black Box Approach. This page is just a 
		summary and written in an informal tone, targeted for wider audience. More details and formal 
		presentations can be found in the main author future thesis (later) and 
		in our recent papers. 
		
		
		Table of Contents 
		
		1. Combinatorial 
		Optimization Problem (COP): definition, examples, and algorithms to 
		attack these problems 
		2. Stochastic 
		Local Search (SLS) Algorithm: definition, examples, SLS engineering, 
		SLS design and tuning problem 
		
		3. Fitness Landscape and Search Trajectory (FLST) visualization: the 
		SLS visualization ideas 
		
		4. Implementation of FLST visualization in SLS engineering suite: Viz 
		5. The 
		Integrated White and Black Box Approach (IWBBA): combining the 
		strength of both worlds 
		6. References 
		
		
		1. 
		Combinatorial Optimization Problem (COP) 
		
		
		
		Combinatorial Optimization Problem (COP) is a class of problem where 
		the solution(s) are combinatoric, e.g. a permutation, an assignment, 
		etc. The term optimization implies that one is interested in finding the 
		maximum (or minimum) of a set of feasible combinatorial solutions. In 
		general, COPs are classified as
		NP-hard, 
		which in loose term means that we will need enormous computation time in 
		order to find the best solution for these problems. 
		
		Examples of COPs include the 
		Traveling Salesman (TSP), 
		Quadratic 
		Assignment (QAP), 
		Vehicle Routing (VRP),
		
		Knapsack, etc. 
		
		To attack these COPs, people devised various computer algorithms that 
		can be classified into two major extremes: exact (complete search) 
		algorithms versus non-exact (incomplete) algorithms. There are pros and 
		cons between these two approaches which are summarized in the following 
		table. 
		
		
		
		2. Stochastic 
		Local Search (SLS) Algorithm 
		
		
		
		Stochastic Local Search (SLS), also known as
		
		Metaheuristic, is a non-exact algorithm that can be loosely 
		described as follows: 
		
			
				
					
					
		 1. SLS() { 
 2.  
		start from any solution of the COP instance; 
		 3.   while (not finished) { 
 4.     locally modify (search around) the current 
					solution with some heuristic/stochastic rule; 
 5.
    if (the newly found solution is better than the best solution so 
		far) 
 6. 
    update the best solution status; 
		 7.  
		} 
 8.   return the best found solution that it manages to find throughout 
		the search process; 
 9. } | 
				 
			 
		 
		
		SLS algorithms have been shown to be quite successful in attacking 
		various COPs. Examples of SLS algorithms include: 
		Tabu Search (TS), 
		Iterated Local Search (ILS), 
		Simulated Annealing (SA), 
		Ants Colony 
		Optimization (ACO), 
		Genetic Algorithms 
		(GA), 
		Variable Neighborhood Search (VNS),
		Guided 
		Local Search (GLS), etc. 
		
		While the basic version of an SLS algorithm is relatively simple, once 
		we want to make it performs 'well', we will need to design 
		(choose appropriate SLS components and search strategies), implement the 
		algorithm properly using the best data structures, tune the SLS 
		parameters, and analyze its performance. This SLS engineering process is 
		not a straightforward task and often consumes a lot of development time! 
		This issue is even more important within people without strong 
		background in local search techniques. This hinders the adoption of SLS 
		methods to wider community. 
		
		With this motivation, the main author proposed "to study the ways to 
		address this issue of designing an effective SLS algorithm and tuning 
		the corresponding SLS implementation for various COPs" in his PhD 
		thesis. 
		
		
		
		
		3. Fitness Landscape and Search Trajectory (FLST) visualization  
		
		This section has been 
		presented in UIST 
			2006, SLS 
			2007, and CP 
			2007.
		
		To address the SLS Design and Tuning Problem, one needs to analyze the 
		SLS behavior (which is closely related to its performance). However, analyzing SLS behavior is difficult.
		For example, if we look at this 
		
		RunLog file (a file that records the search information every iteration) alone, 
		we probably will not get many information from it. Thus people devise 
		SLS analysis techniques that can be classified as either 
		statistical 
		techniques or 
		information visualization techniques. 
		With such analysis, people hope to gain insights on how to further 
		improve the SLS algorithms. 
		
		We propose FLST visualization as an extension cum combination of the 
		existing
		Fitness Distance Correlation (FDC) and 
		Run Time Distribution (RTD) analysis. 
		The basic ingredients to form this 
		visualization are the fitness landscape components (search space, 
		distance function, and objective function) of the COP instance and the solutions visited 
		by an individual SLS runs. Obviously, visualizing exponentially 
		large fitness landscape is not trivial. We explain the FLST visualization ideas using the series of 
		illustrations below: 
		
			
				
					| 
					 
					Explanation  | 
					
					 
					Illustration  | 
				 
				
					| 
					
		 
		We visualize 
		fitness landscape of a COP 
		instance like 
		this mountain range picture: 
		*search space (very big), is 
		visualized as a collection of a lot of points (solutions) 
		*distance function spatially separates one mountain (solution) 
		and the other mountains 
		*objective function determines the height of each mountain 
		(solution) 
		*global optima is the highest mountain (solution with the best 
		objective value) 
		*local optima is high mountains but not the highest 
					
		Notes: This fitness 
					landscape formulation was proposed by 
		P. Merz in his 
					PhD thesis. This definition is slightly different with 
					the one in
		H.H. Hoos and T. Stuetzle's book: SLS: 
					Foundations and Applications where the fitness landscape is 
					defined as <search space, neighborhood function, and 
					objective function>. We do not use neighborhood function as 
					it will cause our FLST visualization to be unstable 
		(changing every SLS iteration).  | 
					
					 
		   | 
				 
				
					| 
					
		 
		We visualize the search trajectory of 
		an individual SLS run as a movement of that SLS on the fitness landscape 
		of the COP instance being attacked. The movement can be due to a local move 
		within a local neighborhood or a non local move due to strong 
		diversification mechanism. The objective is to find the global optima: imagine that you are one tiny human in this mountain range and can only 
		see the surroundings within radius 1 km (local view) and you need to 
		navigate locally to find the highest mountain. 
		
		However, without any 'reference point', it 
		is quite hard to describe/explain what is going on here (see the 
		picture on the right and try your best to explain the movement)... 
					
		Notes: This search 
					trajectory formulation is defined in 
		SLS: Foundations and Applications. 
					 | 
					
					 
		   | 
				 
				
					| 
					
		 
		Now suppose that we record the solution 
		denoted by the yellow rectangle (see the picture on the right).
		We can now describe the same search trajectory 
		above as follows: 
		
		"The search trajectory once hits the yellow rectangle solution, then 
		it moves somewhere 
		else, then after certain number iterations,
		it hits the yellow rectangle solution 
		again. Is this a solution cycling phenomenon? Is the SLS trapped?". 
		
		See that now we can say more things 
		with the existence of a reference point.  | 
					
					 
		   | 
				 
				
					| 
					
		 
		To build the FLST visualization, we gather 
		a constant amount of high quality (usually local optima), diverse, frequently 
		visited solutions in the fitness landscape using the SLS algorithms 
		themselves! We know that we cannot expect to record all 
		points (exponential space!) therefore we should expect to miss 
		some good points (look at the small blue arrows pointing at the other two 
		mountain peaks at the background). Nevertheless, if the 
		anchor points collected are reasonably good, diverse, and -that is- the 
		important ones, we can say that we have a 
		reasonable approximation of the fitness landscape. 
		 
		In order to collect these points, we run the SLS with different 
		configurations, with longer run 
		times, and let it loose.
		It will then sample various points in the search space. We then filter 
		the interesting points which we called: the Anchor Points, 
		abbreviated as APs.  | 
					
					 
		   | 
				 
				
					| 
					 
					We can also add quality information by labeling these anchor points 
					with color+shape: 
					
					
					*black 
		dot-very bad, 
					*yellow rectangle-bad, 
					*green triangle-medium, 
					
					*blue circle-good. 
					
					We use both color and shape as color alone is hard to be 
					distinguished in black and white scientific papers! Remember 
					that not all scientific publications out there are in color 
					at this point of time.  | 
					
					 
		   | 
				 
				
					| 
					
		 
		So now, we have this Fitness Landscape 
		visualization based on these 4 selected APs. 
					
		With these APs, we can now describe the search trajectory: 
					
		In the picture on the right, the 
		pink search trajectory encounters 
		solution cycling in yellow/black (poor) APs.
		It fails to reach the 
		better green/blue (better) APs. 
					 | 
					
					 
		   | 
				 
				
					| 
					
		 
		And for this picture on the right, the 
		blue search trajectory 
		performs a diversification strategy after hitting an AP (local optima). 
		From this visualization, we see that it manages to reach the better
		green/blue APs 
		and we can say that it performs better than the
		pink search trajectory 
		above. 
		
		  
		
		This is our proposed FLST 
		visualization. 
		If you have any comments, please don't hesitate to email the main 
		author: 
		stevenhalim at gmail dot com 
					 | 
					
					 
		   | 
				 
			 
		 
		
		
		With FLST visualization, 
		one can answer these
		
		COP fitness landscape characteristics and SLS behaviors questions (not exhaustive): 
		
		The fitness landscape characteristics of 
		the COP instance: 
		
		1. Are the local optima clustered or 
		spread? 
		2. Does the fitness landscape smooth or rugged? 
		3. How many objective value levels in the fitness landscape? is it 
		discrete or continuous? 
		4. Are there infeasible regions and where are they located? 
		
		
		The search trajectory behaviors of the SLS 
		algorithm on a particular COP instance/fitness landscape: 
		1. Does the SLS behave like as what we intended? How does it make 
			progress? Does it quickly find the best known solution or wander around in other 
			regions? 
		2. How good is the SLS in intensification and diversification? 
		3. Is there any sign of cycling behavior (search stagnation)? 
		4. Where in the fitness landscape does the SLS 
			spend most of its time? 
		5. How far is the starting/initial/greedy 
			solution w.r.t the global optima/best known solution? 
		6. How wide is the SLS coverage? 
		7. What is the effect of modifying a certain 
			search parameter/component/strategy w.r.t the SLS behavior? 
		8. How do two different SLS algorithms (on the same COP fitness 
		landscape) compare? 
		
		Answering these questions can give insights on designing 
		better performing SLS algorithm to attack the COP at hand. 
		
		 
		
		
		
		4. Implementation of FLST visualization in SLS engineering suite: Viz 
		
		Parts of this section has 
		been presented in UIST 
			2006 and updated in SLS 
			2007.
		
		In section 3 above, we show the FLST visualization ideas. In this section, we 
		explain how we implement those ideas in a concrete visualization tool Viz. Viz consists of two main programs: Viz Experiment Wizard (EW) which 
		executes SLS algorithms and processes the RunLogs and Viz Single 
		Instance Multiple Runs Analyzer (SIMRA) which displays FLST 
		visualization and some other statistical information in a user friendly 
		GUI. 
		
		1. Potential AP Selection phase: 
		
		Here, Viz EW runs the SLS algorithms (each 
		with its selected configuration) on the same COP 
		instance (must be on the same fitness landscape, otherwise the collected points 
		will be irrelevant with each other) to collect potential local optima points visited by 
		those SLS. Of course, the SLS must have been augmented with
		simple codes to 
		record search information. We use some of the points found by the SLS algorithms 
		themselves to visually analyze the SLS performance. 
		
		Alternatively, the user can choose to feed Viz EW with RunLog 
		files generated by their SLS algorithms running in non Windows operating 
		system. That's it, Viz EW can be set to work with RunLog files only. The non-Windows 
		users do not need to rewrite or recompile their SLS code in Windows 
		platform to use this tool. They just need to install Viz on another Windows PC. 
		e.g. friends' 
			Windows PC, school's Windows PC, or company's Windows PC, etc and 
		process the RunLog files there. (however, to fully utilize the benefits of Viz EW, we suggest that 
		you write a Windows based wrapper code that perform these steps: (1) 
		login to other OS, (2) executes the SLS algorithm in other OS, and (3) 
		returns the log files back to Viz EW). 
		
		
		FAQs: 
					Q: Why do you use SLS algorithms to sample points in the search 
					space? 
					A: Obviously, we do not have the time to generate all points using 
					exact search just to analyze non-exact search trajectories. 
					The best way is to analyze points visited by the SLS 
					algorithms itself. Ensure that your SLS algorithm does not stuck 
					(simply add random restart) so that it samples enough 
					points. However, even if your SLS algorithm does stuck, this 
		FLST visualization will show it to you. 
					 
					Q: Why do you use log file (offline visualization) instead of 
					real-time visualization? 
					A: Offline visualization has more advantages. First, we can display the final AP 
					set so that the visualization is more stable rather than 
		updating the AP set on the fly. Second, by 
					communicating via log files, one does not need to depend on 
					any Operating System. Third, visualization takes 
					some CPU power to be processed, we simply do not want our 
		visualization algorithm compete 
					with the SLS algorithm in terms of CPU resource. 
		 
		Q: Wait, is 
		it true that Viz can only analyze 1 (ONE) 
			single COP instance at one time. Will that be myopic? 
		A: We agree that analyzing SLS algorithm must be done with 
		respect to several instances so that our observations are not instance 
		specific only. Viz is designed to analyze the details of various SLS 
		runs on a single/1/one COP instance. However, you can still run your SLS algorithm on several training instances 
		using Viz EW. You will  
		obviously see different 
		details from one instance with the other instances but you should have 
		identify more or less some general characteristics (e.g. all training 
		instances have "Big Valley" properties, etc).
		
		
		2. AP Update phase: 
		
		In this phase, after the potential APs have been collected, Viz EW filters them and pick a 
		small number of more diverse and 
		high quality APs to form an updated AP set. The reason of this step is 
		simply because we cannot 
		draw too many points on the screen. 
		
		  
		
		
		FAQs : 
					Q: How many points in the screen is 'too many'? 
					A: As soon as one needs to use zoom in/out, panning 
					(scroll left/right/up/down), or image distortion (e.g. 
		fisheye technique), then the points are too many. 
					Zooming, panning, image distortion, etc decrease the capability of the user in 
					finding information from visualized data. Therefore, the 
		number of points is set to be reasonably small, which is currently set 
		to be min(50,0.5*instance size n). 
		 
		Q: But, the number of points in the screen can be 'enlarged' if you use 
		larger monitor, why don't you assume that the end users are using large 
		monitor? 
		A: At this point of time, we designed Viz to work under 800*600 monitor 
		resolution. That gives us 480,000 pixels to play with. Soon, we will 
		increase Viz default window size to 1024*768, which is about 786,432 
		pixels, around 1.5 more pixels to work with. However, we cannot keep 
		enlarging Viz window size as currently the typical monitor resolution is 
		around [1024*768 .. 1200*800]. We want Viz to work for this kind of 
		monitor resolution and thus can be used by most people that own a 
		computer today! 
		 
					Q: How do you actually filter the points? 
					A: The actual strategy is a bit too technical and still 
		under research. In general, we select AP points based on these criteria: 
		(1). have high quality (randomly selected from the top K% of the 
		collected points), (2). increase diversity of the AP set (if we already 
		have good point X in the AP set, we do not include X' which is very 
		similar to X), (3). have some importance in the SLS runs used to form 
		the AP set (e.g. the best solution found by the SLS, the most frequently 
		visited solution, etc).
		
		3. AP Layout phase: 
		
		After the AP set is selected, Viz layouts the APs in the abstract 2-D space 
		using a spring model (also 
		called 
		
		force-directed layout algorithms). We measure the distance (bond-number 
		of different edges,
		Hamming-number 
		of different bits, etc) between APs to set up the spring model. The spring system 
		will try to stretch and shrink itself into its most natural state 
		(minimal tension) and that make APs that are close to each other to 
		appear close in the visualization and vice versa. 
		
		We found that presenting the fitness 
		landscape information in spatial manner like this is quite natural and 
		easy to 
		comprehend. 
		
		  
		
		FAQs: 
					Q: Why use spring model and not any other ways? 
					A: It is so far the most natural representation that we can 
		think of. We are also aware of other graph drawing technique like 
		constraint stress majorization algorithm and will explore other 
		techniques soon. 
		 
					Q: What if the distance metric for the solutions in my COP 
		are not Hamming or bond distance? 
					A: We are in process of extending the built-in distance 
		metric inside Viz. Please co-operate with us by e-mailing the main 
		author: stevenhalim at gmail dot com these information: the COP that you 
		are planning to attack and the algorithm to compute distance metric that 
		you think will be suitable for your COP. 
		
		4. AP Labeling phase: 
		
		After the AP layout phase, we further enhance the fitness landscape presentation by labeling the APs using color and shape labels 
		as shown in the legend below.
		Now with one glance, we know the rough distribution and 
		quality of the 
		anchor points. 
		
		  
		
		
		FAQs : 
					Q: Do you have any particular reason why you choose these 
		set of colors? 
					A: The color set is actually taken from the color set used 
		in geographical map where blue represents deep oceans, green represents 
		the forests, yellow represents the mountains, and white surroundings 
		around black dot represents the snowy peaks. 
					 
					Q: Why do you use double features: shape and color, for the 
		labels? 
					A: While color is the best means to group these information, it is quite hard to see in 
		black and white scientific papers. So we add the shape :)
		
		5. Search Trajectory Layout phase: 
		
		  
		
		Now that we have the abstract 2-D space of 
		anchor points ready, we can visualize the movements of 
		the current point/solution (which is actually an N-dimensional combinatorial solution) 
		of the SLS run as it pass through the COP Fitness Landscape. 
		We playback the individual SLS run by measuring the distance of the 
		current solution of an SLS run with the APs. If it is close, we draw a 
		circle (smaller circle: more similar, larger circle: more different). If 
		the current solution is nowhere near any of the selected APs, we do not 
		draw anything (no information can be gained from this iteration). The 
		movement (appearance and disappearance) of these circles tells us about 
		the SLS trajectory. 
		
		As an example, we explain the interpretation of the search 
		trajectory visualization of the four runs above (see step 1). The 
		interpretation depends on the position and quality of APs 
		visited by the search trajectory. These are the possible interpretations: 
		
		Run 1: SLS visits medium quality (green 
		triangle) APs 
			and encounter solution cycling issue near AP C. Perhaps AP C is too attractive 
		even though it is not the best quality (AP D has better quality as 
		indicated by its 
		blue circle lable). 
		 
		Run 2: SLS visits bad region (yellow 
		rectangle): AP B, 
			then somewhere near AP C (here we assume that solution F in run 2 is close to 
			AP C, just to illustrate the drawing of circle with larger diameter 
		around AP C), and finally arrive at AP A (very bad/black 
		dot). Perhaps this SLS is not doing a good intensification. 
		 
		Run 3: SLS starts from a very bad AP A 
		(black dot), gradually 
		moves to medium AP C (green triangle), then to good AP D (blue circle). 
		This SLS behavior is quite okay. 
		 
		Run 4: The SLS trajectory is not 
		close to any known AP. It is exploring somewhere else, but definitely 
		not somewhere near these selected anchor points, it only hits AP A (very 
			bad/black 
		dot AP). 
		
		FAQs: 
					Q: Do you know any other ways to visualize search trajectory? 
					A: To date this is our best way to visualize search 
		trajectory. We will keep exploring other alternatives. 
		 
					Q: Do you realize that when two people look at the same FLST 
		visualization, they may arrive at different interpretations? 
					A: We agree with that. There may be more than one way to 
		interpret the visualization and two algorithm designer may design two 
		different SLS algorithms based on the visualization. Eventually, we just 
		want improved SLS algorithms. If both interpretations yield 
		improvements, that is ok. 
		 
		Q: My SLS is not a trajectory based visualization! I am using population 
		based algorithm. So how to visualize the 'search trajectory' of a 
		population? 
					A: To date, our best way to overcome this situation is by 
		defining the search trajectory as the movement of the 'best solution 
		within the population' from one generation to the next... We are 
		exploring some visualization ideas to better visualize population based 
		SLS algorithm. 
		 
		
		 Q: How accurate is 
		this FLST visualization? 
		A: The accuracy of FLST visualization is still a research topic. 
		We know that the accuracy will depends on at least two things: (1). the 
		number of anchor points that the user wants to display on the screen 
		(more APs, more layout errors), (2). the way we select AP set (if we 
		miss important APs, the FLST visualization will be less accurate).
		
		
		
		Viz SIMRA, the program that displays this FLST visualization, has few 
		more features that have not been elaborated. We invite the reader the 
		visit our Viz version history to appreciate 
		the development of Viz SIMRA user interface. The GUI are designed with 
		end user in mind and it boasts the following features: coordinated 
		visualizations from various angle, visual comparison, animated search 
		playback, heavy usage of color and highlighting, multiple level of 
		details, and including some built-in statistical analysis. 
		 
		This concludes section 4. 
		
		
		5. 
		The Integrated White and Black Box Approach (IWBBA) 
		
		This section has been 
		presented in CP 
			2007. 
		
		FLST visualization can give us insights of the approximate fitness 
		landscape structures and SLS behavior. However, while that may give you 
		insights to design good performing SLS algorithm, it is still not enough if we 
		want the 'best' performing SLS algorithm for our COP. 
		
		We suggest that one combines the strength of the white-box approach with 
		black-box approach: 
		1. In white box approach, we analyze our SLS algorithm by means of 
		statistic or information visualization techniques to gain insights on 
		how to design better performing SLS algorithm that match the observed 
		fitness landscape characteristics. 
		2. In black-box approach, we use a specialized tuning algorithm which 
		will find the best configuration for our SLS algorithm given the 
		configuration space and the training set of COP instances. Examples of 
		such algorithms are:
		
		ParamILS (Hutter et al., 2007),
		
		F-Race (Birattari, 2004),
		
		CALIBRA (Adenso-Diaz and Laguna, 2006). 
		
		
		We invite the reader to 
		check our results page where we apply this 
		IWBBA using Viz on real COPs and 
		real SLS algorithms. 
		
		6. References 
		
		
		These basic ideas are explained in more 
		details in our publications (download the local copy of those papers
		here): 
		
			- 
			
			S. Halim, R. Yap, H.C. Lau. An Integrated White+Black Box Approach for 
			Designing and Tuning Stochastic Local Search. 
            In Principles and Practice of Constraint Programming (CP 
			2007, Providence, Rhode Island, USA, September 23-27, 2007): 
			332-347  
			- 
			
			S. Halim, R. Yap.
			Designing and Tuning SLS through Animation and Graphics: an 
			Extended Walk-through. 
            In Engineering Stochastic Local Search Workshop (SLS 
			2007, Brussels, Belgium, September 6-8, 2007): 16-30  
			- 
			
			S. Halim, R. Yap, H.C. Lau.
			 
			Viz: A 
            Visual Analysis Suite for Explaining Local Search Behavior. 
            In User Interface System and Technology (UIST 
			2006, Montreux, Switzerland, October 15-18, 2006): 57-66 
			 
		 
		 
         | 
	 
	
		| 
		
		
This document, basic_ideas.html, has been accessed 646 times since 25-Jun-24 11:57:13 +08.
This is the 5th time it has been accessed today.
 
A total of 331 different hosts have accessed this document in the
last 498 days; your host, nsrp-source.comp.nus.edu.sg, has accessed it 104 times.
 
If you're interested, complete statistics for
this document are also available, including breakdowns by top-level
domain, host name, and date.
 
  | 
	 
 
 |