Genetic algorithm
|
A genetic algorithm (GA) is a heuristic used to find approximate solutions to difficult-to-solve problems through application of the principles of evolutionary biology to computer science. Genetic algorithms use biologically-derived techniques such as inheritance, mutation, natural selection, and recombination (or crossover). Genetic algorithms are a particular class of evolutionary algorithms.
Genetic algorithms are typically implemented as a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but different encodings are also possible. The evolution starts from a population of completely random individuals and happens in generations. In each generation, the fitness of the whole population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), modified (mutated or recombined) to form a new population, which becomes current in the next iteration of the algorithm.
Contents |
Operation of a GA
The problem to be solved is represented by a list of parameters which can be used to drive an evaluation procedure, called chromosomes or genomes. Chromosomes are typically represented as simple strings of data and instructions, in a manner not unlike instructions for a von Neumann machine, although a wide variety of other data structures for storing chromosomes have also been tested, with varying degrees of success in different problem domains.
Initially several such parameter lists or chromosomes are generated. This may be totally random, or the programmer may seed the gene pool with "hints" to form an initial pool of possible solutions. This is called the first generation pool.
During each successive generation, each organism (or individual) is evaluated, and a value of goodness or fitness is returned by a fitness function. The pool is sorted, with those having better fitness (representing better solutions to the problem) ranked at the top. Notice that "better" in this context is relative, as initial solutions are all likely to be rather poor.
The next step is to generate a second generation pool of organisms, which is done using any or all of the genetic operators: selection, crossover (or recombination), and mutation. A pair of organisms are selected for breeding. Selection is biased towards elements of the initial generation which have better fitness, though it is usually not so biased that poorer elements have no chance to participate, in order to prevent the solution set from converging too early to a sub-optimal or local solution. There are several well-defined organism selection methods; roulette wheel selection and tournament selection are popular methods.
Following selection, the crossover (or recombination) operation is performed upon the selected chromosomes. Most genetic algorithms will have a single tweakable probability of crossover (Pc), typically between 0.6 and 1.0, which encodes the probability that two selected organisms will actually breed. A random number between 0 and 1 is generated, and if it falls below the crossover threshold, the organisms are mated; otherwise, they are propagated into the next generation unchanged. Crossover results in two new child chromosomes, which are added to the second generation pool. The chromosomes of the parents are mixed in some way during crossover, typically by simply swapping a portion of the underlying data structure (although other, more complex merging mechanisms have proved useful for certain types of problems.) This process is repeated with different parent organisms until there are an appropriate number of candidate solutions in the second generation pool.
The next step is to mutate the newly created offspring. Typical genetic algorithms have a fixed, very small probability of mutation (Pm) of perhaps 0.01 or less. A random number between 0 and 1 is generated; if it falls within the Pm range, the new child organism's chromosome is randomly mutated in some way, typically by simply randomly altering bits in the chromosome data structure. Note however, that a very small mutation rate may lead to a de facto implementation of genetic drift (which is non-ergodic in nature) rather than to an implementation of a search algorithm that thoroughly explores the search space.
These processes ultimately result in a second generation pool of chromosomes that is different from the initial generation. Generally the average degree of fitness will have increased by this procedure for the second generation pool, since only the best organisms from the first generation are selected for breeding. The entire process is repeated for this second generation: each organism in the second generation pool is then evaluated, the fitness value for each organism is obtained, pairs are selected for breeding, a third generation pool is generated, etc. The process is repeated until an organism is produced which gives a solution that is "good enough".
A slight variant of this method of pool generation is to allow some of the better organisms from the first generation to carry over to the second, unaltered. This form of genetic algorithm is known as an elite selection strategy.
Observations
There are several general observations about the generation of solutions via a genetic algorithm:
- Unless the fitness function is handled properly, GAs may have a tendency to converge towards local optima rather than the global optimum of the problem.
- Operating on dynamic data sets is difficult, as genomes begin to converge early on towards solutions which may no longer be valid for later data. Several methods have been proposed to remedy this by increasing genetic diversity somehow and preventing early convergence, either by increasing the probability of mutation when the solution quality drops (called triggered hypermutation), or by occasionally introducing entirely new, randomly generated elements into the gene pool (called random immigrants).
- Selection is clearly an important genetic operator, but opinion is divided over the importance of crossover verses mutation. Some argue that crossover is the most important, while mutation is only necessary to ensure that potential solutions are not lost. Others argue that crossover in a largely uniform population only serves to propagate innovations originally found by mutation, and in a non-uniform population crossover is nearly always equivalent to a very large mutation (which is likely to be catastrophic).
- GAs can rapidly locate good solutions, even for difficult search spaces.
- A number of experts believe that simpler optimization algorithms can find better local optima than GAs (given the same amount of computation time). Practitioners may wish to try other algorithms in addition to GAs, especially random-restart hill climbing.
- GAs can not effectively solve problems in which there is no way to judge the fitness of an answer other than right/wrong, as there is no way to converge on the solution. These problems are often called "needle in a haystack" problems.
- As with all current machine learning problems it is worth tuning the parameters such as mutation probability and recombination probability to find reasonable setting for the problem class you are working on. There are theoretical but not yet pratical upper and lower bounds for these parameters that can help guide selection.
Variants
The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating point representations. The basic algorithm performs crossover and mutation at the bit level.
Other variants treat the chromosome as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects, or any other imaginable data structure. Crossover and mutation are performed so as to respect data element boundaries. For most data types, specific variation operators can be designed. Different chromosomal data types seem to work better or worse for different specific problem domains.
There have also been attempts to introduce other evolutionary operations such as movement of genes, in the manner of transposons. These movements change the schema of the chromosome making effects of linkage visible.
There are also parallel implementations of genetic algorithms that use computers as 'islands' and implement migrations of populations from one computer to another over a network.
Some other variants introduce a variable fitness function. In classical genetic algorithms, the fitness function is unchanged over time. In simulated annealing the fitness function is changed over time and in artificial life, the fitness of any individual is affected by all individuals in the population with which it interacts.
Efficiency
Genetic algorithms are known to produce good results for some problems. Their major disadvantage is that they are relatively slow, being very computationally intensive compared to other methods, such as random optimization.
Recent speed improvements have focused on speciation, where crossover can only occur if individuals are closely-enough related. Genetic algorithms are extremely easy to adapt to parallel computing and clustering environments. One method simply treats each node as a parallel population. Organisms are then migrated from one pool to another according to various propagation techniques.
Another method, the Farmer/worker architecture, designates one node the farmer, responsible for organism selection and fitness assignment, and the other nodes as workers, responsible for recombination, mutation, and function evaluation.
Problem domains
Problems which appear to be particularly appropriate for solution by genetic algorithms include timetabling and scheduling problems, and many scheduling software packages are based on GAs. GAs have also been applied to engineering. Genetic algorithms are often applied as an approach to solve global optimization problems. Genetic algorithms have been successfully applied to the study of neurological evolution (see NeuroEvolution of Augmented Topologies).
As a general rule of thumb genetic algorithms might be useful in problem domains that have a complex fitness landscape as recombination is designed to move the population away from local minima that a traditional hill climbing algorithm might get stuck in.
History
Genetic algorithms or "GAs" originated from the studies of cellular automata, conducted by John Holland and his colleagues at the University of Michigan. Research in GAs remained largely theoretical until the mid-1980s, when The First International Conference on Genetic Algorithms was held at The University of Illinois. As academic interest grew, the dramatic increase in desktop computational power allowed for practical application of the new technique. In 1989, The New York Times writer John Markoff wrote about Evolver, the first commercially available desktop genetic algorithm. Custom computer applications began to emerge in a wide variety of fields, and these algorithms are now used by a majority of Fortune 500 companies to solve difficult scheduling, data fitting, trend spotting, budgeting and virtually any other type of combinatorial optimization.
Pseudo-code algorithm
Choose initial population Repeat Evaluate the individual fitnesses of a certain proportion of the population Select best-ranking individuals to reproduce Mate pairs at random Apply crossover operator Apply mutation operator Until terminating condition (see below)
Terminating conditions often include:
- Fixed number of generations reached
- Budgeting: allocated computation time/money used up
- An individual is found that satisfies minimum criteria
- The highest ranking individual's fitness is reaching or has reached a plateau such that successive iterations are not producing better results anymore.
- Manual inspection. May require start-and-stop ability
- Combinations of the above
Applications
- Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and other characteristics.
- Automated design of mechatronic systems using bond graphs and genetic programming (NSF).
- Calculation of Bound States and Local Density Approximations.
- Configuration applications, particularly physics applications of optimal molecule configurations for particular systems like C60 (buckyballs).
- Container loading optimization.
- Distributed computer network topologies.
- Electronic circuit design, known as Evolvable hardware.
- File allocation for a distributed system.
- Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs.
- Game Theory Equilibrium Resolution.
- Learning Robot behavior using Genetic Algorithms.
- Learning fuzzy rule base using genetic algorithms.
- Mobile communications infrastructure optimization.
- Molecular Structure Optimization (Chemistry).
- Multiple population topologies and interchange methodologies.
- Protein folding and protein/ligand docking.
- Plant floor layout.
- Scheduling applications, including job-shop scheduling. The objective being to schedule jobs in a sequence dependent or non-sequence dependent setup environment for a minimal total tardiness.
- Solving the machine-component grouping problem required for cellular manufacturing systems.
- Tactical asset allocation and international equity strategies.
- Timetabling problems, such as designing a non-conflicting class timetable for a large university.
- Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution).
- Traveling Salesman Problem.
Related techniques
Genetic programming (GP) is a related technique popularized by John Koza, in which computer programs, rather than function parameters, are optimized. Genetic programming often uses tree-based internal data structures to represent the computer programs for adaptation instead of the list, or array, structures typical of genetic algorithms. GP algorithms typically require running time that is orders of magnitude greater than that for genetic algorithms, but they may be suitable for problems that are intractable with genetic algorithms.
Interactive genetic algorithms are genetic algorithms that use human evaluation. They are usually applied to domains where it is hard to design a computational fitness function, for example, evolving images, music, artistic designs and forms to fit users' aesthetic preference.
Simulated Annealing (SA) is a related global optimization technique which traverses the search space by testing random mutations on an individual solution. A mutation that increases fitness is always accepted. A mutation which lowers fitness is accepted probabilistically based on the difference in fitness and a decreasing temperature parameter. In SA parlance, one speaks of seeking the lowest energy instead of the maximum fitness. SA can also be used within a standard GA algorithm, simply by starting with a relatively high rate of mutation, which decreases over time along a given schedule.
References
- Goldberg, David E (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Kluwer Academic Publishers, Boston, MA.
- Goldberg, David E (2002), The Design of Innovation: Lessons from and for Competent Genetic Algorithms, Addison-Wesley, Reading, MA.
- Harvey, Inman (1992), Species Adaptation Genetic Algorithms: A basis for a continuing SAGA, in 'Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life', F.J. Varela and P. Bourgine (eds.), MIT Press/Bradford Books, Cambridge, MA, pp. 346-354.
- Koza, John (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection
- Michalewicz, Zbigniew (1999), Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag.
- Mitchell, Melanie, (1996), An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
- Schmitt, Lothar M (2001), Theory of Genetic Algorithms, Theoretical Computer Science (259), pp. 1-61
- Schmitt, Lothar M (2004), Theory of Genetic Algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling, Theoretical Computer Science (310), pp. 181-231
- Vose, Michael D (1999), The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA.
External links
- [1] (http://cs.felk.cvut.cz/~xobitko/ga/) - An online introduction to genetic algorithms with Java applets
- IlliGAL (http://www-illigal.ge.uiuc.edu/) - Illinois Genetic Algorithms Laboratory - Download technical reports and code
- Golem Project (http://demo.cs.brandeis.edu/golem/) - Automatic Design and Manufacture of Robotic Lifeforms
- Introduction to Genetic Algorithms Using RPL2 (http://www.epcc.ed.ac.uk/computing/training/document_archive/GAs-course/main.html)
- Talk.Origins FAQ on the uses of genetic algorithms, by Adam Marczyk (http://www.talkorigins.org/faqs/genalg/genalg.html)
- Genetic algorithm in search and optimization, by Richard Baker (http://www.fenews.com/fen5/ga.html)
- Genetic Algorithm and Markov chain Monte Carlo:Differential Evolution Markov chain makes Bayesian Computing easy (http://216.239.59.104/search?q=cache:83jMMRd9nRQJ:www.biometris.nl/Markov%2520Chain.pdf+%22Differential+Evolution+Markov+chain+makes+Bayesian+Computing+easy&hl=en). HTML Google cache of original .pdf (appears no longer to be available)
- Differential Evolution using Genetic Algorithm (http://www.icsi.berkeley.edu/~storn/code.html#hist)
- Introduction to Genetic Algorithms and Neural Networks (http://www.ai-junkie.com/ai-junkie.html) including an example windows program
- Genetic Algorithm Solves the Toads and Frogs Puzzle (http://www.cut-the-knot.org/SimpleGames/evolutions.shtml) (requires Java)
- Not-So-Mad Science: Genetic Algorithms and Web Page Design for Marketers (http://www.marketingprofs.com/5/syrett6.asp) by Matthew Syrett
- MIT OpenCourseWare | Health Sciences and Technology | HST.508 Genomics and Computational Biology, Fall 2002 | Lecture Notes (http://ocw.mit.edu/OcwWeb/Health-Sciences-and-Technology/HST-508Genomics-and-Computational-BiologyFall2002/LectureNotes/index.htm)de:Genetischer Algorithmus
es:Algoritmos genéticos fr:Algorithme génétique it:Algoritmo genetico nl:Genetisch algoritme ja:遺伝的アルゴリズム pl:Algorytm genetyczny ru:Генетический алгоритм th:ขั้นตอนวิธีเชิงพันธุกรรม