I have developed a solution to the Traveling Salesman Problem (TSP) using a Genetic Algorithm (GA). In the Traveling Salesman Problem, the goal is to find the shortest distance between N different cities. The path that the salesman takes is called a tour.
Testing every possibility for an N city tour would be N! math additions. A 30 city tour would have to measure the total distance of be 2.65 X 1032 different tours. Assuming a trillion additions per second, this would take 252,333,390,232,297 years. Adding one more city would cause the time to increase by a factor of 31. Obviously, this is an impossible solution.
A genetic algorithm can be used to find a solution is much less time. Although it might not find the best solution, it can find a near perfect solution for a 100 city tour in less than a minute. There are a couple of basic steps to solving the traveling salesman problem using a GA.
As the name implies, Genetic Algorithms mimic nature and evolution using the principles of Survival of the Fittest.
The two complex issues with using a Genetic Algorithm to solve the Traveling Salesman Problem are the encoding of the tour and the crossover algorithm that is used to combine the two parent tours to make the child tours.
In a standard Genetic Algorithm, the encoding is a simple sequence of numbers and Crossover is performed by picking a random point in the parent's sequences and switching every number in the sequence after that point. In this example, the crossover point is between the 3rd and 4th item in the list. To create the children, every item in the parent's sequence after the crossover point is swapped.
Parent 1 | F A B | E C G D |
---|---|
Parent 2 | D E A | C G B F |
Child 1 | F A B | C G B F |
Child 1 | D E A | E C G D |
The difficulty with the Traveling Salesman Problem is that every city can only be used once in a tour. If the letters in the above example represented cities, this child tours created by this crossover operation would be invalid. Child 1 goes to city F & B twice, and never goes to cities D or E.
The encoding cannot simply be the list of cities in the order they are traveled. Other encoding methods have been created that solve the crossover problem. Although these methods will not create invalid tours, they do not take into account the fact that the tour "A B C D E F G" is the same as "G F E D C B A". To solve the problem properly the crossover algorithm will have to get much more complicated.
My solution stores the links in both directions for each tour. In the above tour example, Parent 1 would be stored as:
City | First Connection | Second Connection |
---|---|---|
A | F | B |
B | A | E |
C | E | G |
D | G | F |
E | B | C |
F | D | A |
G | C | D |
The crossover operation is more complicated than combining 2 strings. The crossover will take every link that exists in both parents and place those links in both children. Then, for Child 1 it alternates between taking links that appear in Parent 2 and then Parent 1. For Child 2, it alternates between Parent 2 and Parent 1 taking a different set of links. For either child, there is a chance that a link could create an invalid tour where instead of a single path in the tour there are several disconnected paths. These links must be rejected. To fill in the remaining missing links, cities are chosen at random. Since the crossover is not completely random, this is considered a greedy crossover.
Eventually, this GA would make every solution look identical. This is not ideal. Once every tour in the population is identical, the GA will not be able to find a better solution. There are two ways around this. The first is to use a very large initial population so that it takes the GA longer to make all of the solutions the same. The second method is mutation, where some child tours are randomly altered to produce a new unique tour.
This Genetic Algorithm also uses a greedy initial population. The city links in the initial tours are not completely random. The GA will prefer to make links between cities that are close to each other. This is not done 100% of the time, because that would cause every tour in the initial population to be very similar.
There are 6 parameters to control the operation of the Genetic Algorithm:
The other options that can be configured are (note: these are only available in the downloadable version):
The starting parameter values are:
Parameter | Initial Value |
---|---|
Population Size | 10,000 |
Group Size | 5 |
Mutation | 3 % |
# Nearby Cities | 5 |
Nearby City Odds | 90 % |
Note: I originally wrote this program in 1995 in straight C. The tours in the population were stored as an array of 32 bit int's, where each bit indicated a connection. Ex: If tour[0] = 00000000000001000000010000000000 in binary, then city 0 connected to city 11 and 19. That implementation was much faster than the current C# version. The greedy part of crossover could be performed by doing a binary AND on the two tours. While that code was very fast, it had a lot of binary operations, was limited in the number of cities it could support, and the code wasn't readable. Hopefully, this new version will allow for more re-use.