Traveling Salesman Problem
Using Genetic Algorithms
Click the image to run the TSP Genetic Algorighm.
I have developed a solution to the Traveling Salesman Problem (TSP)
using a Genetic Algorithm (GA)
In the Traveling Salesman Problem, the goal is to find the shortest distance between N different cities.
The path that the salesman takes is called a tour
Testing every possibility for an N city tour would be N! math additions.
A 30 city tour would have to measure the total distance of be 2.65 X 1032
Assuming a trillion additions per second, this would take 252,333,390,232,297 years.
Adding one more city would cause the time to increase by a factor of 31.
Obviously, this is an impossible solution.
A genetic algorithm can be used to find a solution is much less time. Although it might
not find the best solution, it can find a near perfect solution for a 100 city tour in less than a minute.
There are a couple of basic steps to solving the traveling salesman problem using a GA.
- First, create a group of many random tours in what is called a population. This algorithm
uses a greedy initial propulation that gives preference to linking cities that are close to each other.
- Second, pick 2 of the better (shorter) tours parents in the population and combine them
to make 2 new child tours. Hopefully, these children tour will be better than either parent.
- A small percentage of the time, the child tours are mutated. This is done to prevent all
tours in the population from looking identical.
- The new child tours are inserted into the population replacing two of the longer tours.
The size of the population remains the same.
- New children tours are repeatedly created until the desired goal is reached.
As the name implies, Genetic Algorithms mimic nature and evolution using the principles of
Survival of the Fittest
The two complex issues with using a Genetic Algorithm to solve the Traveling Salesman Problem are
the encoding of the tour and the crossover
algorithm that is used to combine the two
parent tours to make the child tours.
In a standard Genetic Algorithm, the encoding is a simple sequence of numbers and Crossover
is performed by picking a random point in the parent's sequences and switching every number
in the sequence after that point. In this example, the crossover point is between the 3rd
item in the list. To create the children, every item in the parent's sequence
after the crossover point is swapped.
|Parent 1||F A B | E C G D|
|Parent 2||D E A | C G B F|
|Child 1 ||F A B | C G B F|
|Child 1 ||D E A | E C G D|
The difficulty with the Traveling Salesman Problem is that every city can only be used
once in a tour. If the letters in the above example represented cities, this child tours
created by this crossover operation would be invalid. Child 1 goes to city F & B twice, and
never goes to cities D or E.
The encoding cannot simply be the list of cities in the order they are traveled.
Other encoding methods have been created that solve the crossover problem.
Although these methods will not create invalid tours, they do not take into account the
fact that the tour "A B C D E F G" is the same as "G F E D C B A". To solve the problem properly
the crossover algorithm will have to get much more complicated.
My solution stores the links
in both directions for each tour. In the above tour example, Parent 1
would be stored as:
|City||First Connection||Second Connection|
The crossover operation is more complicated than combining 2 strings. The crossover
will take every link that exists in both parents and place those links in both children.
Then, for Child 1 it alternates between taking links that appear in Parent 2 and then Parent 1.
For Child 2, it alternates between Parent 2 and Parent 1 taking a different set of links.
For either child, there is a chance that a link could create an invalid tour where instead of
a single path in the tour there are several disconnected paths. These links must be rejected.
To fill in the remaining missing links, cities are chosen at random.
Since the crossover is not completely random, this is considered a greedy crossover.
Eventually, this GA would make every solution look identical. This is not ideal. Once
every tour in the population is identical, the GA will not be able to find a better solution.
There are two ways around this. The first is to use a very large initial population so that it
takes the GA longer to make all of the solutions the same. The second method is mutation
where some child tours are randomly altered to produce a new unique tour.
This Genetic Algorithm also uses a greedy initial population. The city links in the initial tours
are not completely random. The GA will prefer to make links between cities that are close to each other.
This is not done 100% of the time, becuase that would cause every tour in the initial population to
be very similar.
There are 6 parameters to control the operation of the Genetic Algorithm:
- Population Size - The population size is the initial number of random tours
that are created when the algorithm starts. A large population takes longer to find a result.
A smaller population increases the chance that every tour in the population will eventually
look the same. This increases the chance that the best solution will not be found.
- Neighborhood / Group Size - Each generation, this number of tours are randomly
chosen from the population. The best 2 tours are the parents. The worst 2 tours get replaced
by the children. For group size, a high number will increase the likelyhood that the really
good tours will be selected as parents, but it will also cause many tours to never be used
as parents. A large group size will cause the algorithm to run faster, but it might not
find the best solution.
- Mutation % - The percentage that each child after crossover will undergo mutation
When a tour is mutated, one of the cities is randomly moved from one point in the tour to another.
- # Nearby Cities - As part of a greedy initial population, the GA will prefer to link
cities that are close to each other to make the initial tours. When creating the initial population
this is the number of cities that are considered to be close.
- Nearby City Odds % - This is the percent chance that any one link in a random tour in
the initial population will prefer to use a nearby city instead of a completely random city. If the GA
chooses to use a nearby city, then there is an equally random chance that it will be any one of the
cities from the previous parameter.
- Maximum Generations - How many crossovers are run before the algorithm is terminated.
The other options that can be configured are (note: these are only available in the downloadable version):
- Random Seed - This is the seed for the random number generator. By having a fixed instead
of a random seed, you can duplicate previous results as long as all other parameters are the same.
This is very helpful when looking for errors in the algorithm.
- City List - The downloadable version allows you to import city lists from XML files.
Again, when debugging problems it is useful to be able to run the algorithm with the same exact parameters.
The starting parameter values are:
|Parameter ||Initial Value|
|Population Size ||10,000|
|Group Size ||5|
|Mutation ||3 %|
|# Nearby Cities ||5|
|Nearby City Odds ||90 %|
Note: I originally wrote this program in 1995 in straight C.
The tours in the population were stored as an array of 32 bit int's, where each bit indicated a connection.
Ex: If tour = 00000000000001000000010000000000 in binary, then city 0 connected to city 11 and 19.
That implementation was much faster than the current C# version.
The greedy part of crossover could be performed by doing a binary AND on the two tours.
While that code was very fast, it had a lot of binary operations, was limited in the number of cities it could support, and the code wasn't readable.
Hopefully, this new version will allow for more re-use.
You can download this program here:
Source Code - Microsoft Visual Studio 2005 .NET 2.0 C# Project
Run the online version
See some of the other AI projects on the site