Composing with Evolutionary Algorithms
Inspired by Darwin’s theories regarding natural selection and evolution, Evolutionary Algorithms (EA) use natural selection to find an optimal solution to a given problem. There are a variety of different strategies and methods that may be used for each step of the way, but in general the EA begins with a population and evaluates the individuals with a fitness function. Individuals are then selected to reproduce. The new individuals are created using material from their parents with the addition of mutations. Old members from the population are then replaced with the new individuals, and the process repeats until an optimal solution to the problem is found (Husbands et al. 2007). In this manner, more adapted individuals are more likely to survive and reproduce, and the algorithm terminates when the “fittest” solution is found.
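The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an implementation from the cited literature: a phrase is assumed to be a list of MIDI pitch numbers, the fitness function (closeness to an arbitrary `TARGET` melody) is a toy stand-in, and the loop runs for a fixed number of generations rather than testing for an optimum.

```python
import random

# Toy stand-ins (all hypothetical): a phrase is a list of MIDI pitches and
# fitness is closeness to an arbitrary target melody.
TARGET = [60, 62, 64, 65, 67, 65, 64, 62]

def fitness(phrase):
    """Higher is better; 0 means the phrase matches TARGET exactly."""
    return -sum(abs(a - b) for a, b in zip(phrase, TARGET))

def crossover(mother, father, rng):
    """One-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]

def mutate(phrase, rng):
    """Nudge one randomly chosen note up or down by a semitone."""
    phrase = list(phrase)
    i = rng.randrange(len(phrase))
    phrase[i] += rng.choice([-1, 1])
    return phrase

def evolve(population, generations, rng):
    population = [list(p) for p in population]
    for _ in range(generations):
        # Evaluate: rank individuals from fittest to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        # Select: keep the better half as parents (a deliberately simple policy).
        parents = ranked[: len(ranked) // 2]
        # Reproduce: children mix material from two parents, plus a mutation.
        children = []
        while len(parents) + len(children) < len(ranked):
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        # Replace: the weaker half gives way to the new individuals.
        population = parents + children
    return max(population, key=fitness)

rng = random.Random(1)
start = [[rng.randint(55, 72) for _ in TARGET] for _ in range(20)]
best = evolve(start, generations=200, rng=rng)
print(fitness(best), best)
```

Because the fitter half always survives unchanged, the best fitness in this sketch never decreases from one generation to the next.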
There are a variety of strategies for selecting and replacing individuals based on their fitness. Automatically replacing the worst individuals with new ones or only letting the best individuals reproduce is not always desirable, since either policy may bias the search towards solutions that appear to be good but are not actually optimal. This can cause the program to terminate prematurely with a suboptimal solution (Husbands et al. 2007). Similarly, a variety of methods exist for combining individuals and for adding mutations, and some are more useful than others. The creation of new individuals is the more complex step, however, since rules must be implemented to ensure that the new individuals can be considered legal solutions or valid members of the population (Husbands et al. 2007).
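One common way to soften the bias described above is tournament selection, in which a few individuals are drawn at random and the best of that small group reproduces, so weaker individuals still get an occasional chance. The sketch below is illustrative only: tournament selection is named here as one example of such a strategy, and the legality rule (`is_legal`, a pitch-range check) and the toy fitness function `score` are hypothetical.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

def is_legal(phrase, low=48, high=84):
    """Hypothetical validity rule: non-empty and within a playable pitch range."""
    return len(phrase) > 0 and all(low <= pitch <= high for pitch in phrase)

def score(phrase):
    """Toy fitness: prefer phrases whose average pitch is near MIDI note 62."""
    return -abs(sum(phrase) / len(phrase) - 62)

population = [[60, 62, 64], [60, 61, 90], [55, 57, 59], [72, 74, 76]]
legal = [p for p in population if is_legal(p)]   # illegal offspring are rejected
parent = tournament_select(legal, score, k=2, rng=random.Random(0))
print(parent)
```

A small `k` keeps selection pressure low and preserves diversity; setting `k` to the population size reduces to always picking the single best individual.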
The EA strategy lends itself to musical composition in which the population in question is made up of musical samples, usually short phrases. The best phrases are selected to “reproduce.” Parent phrases are combined, and mutations are added to create new musical ideas that are then used to replace less desirable phrases from the original population. This can be particularly tricky since phrases have to be combined in ways that “make sense” musically. Depending on the style of music, the corresponding set of rules for theory can be used to define what music “is” in the sense of that particular style, and can then be used to ensure that the phrase created is legal. However, music theory generally dictates rules as well for what “good” music should be. These should be considered with some care. While these rules are good suggestions, they are hardly absolute, and in fact adhering too closely to them often creates worse music rather than better (Husbands et al. 2007). This point brings up one of the most difficult and also most interesting questions in musical composition, computerized or otherwise: what makes good music? Even when computers are not involved, answering this question is controversial at best.
For an EA to be successful, some way of testing individuals and rating their fitness must be incorporated. For an unsupervised approach, the computer must evaluate the new individuals on its own and assign them a fitness ranking. This approach is problematic since virtually all attempts to imbue a computer with aesthetic taste have resulted in varying degrees of failure. Attempts have been made using a heuristic approach that picks out “desirable features” and ranks the individual based on how many of these features it has. Unfortunately, this technique has been largely unsuccessful because “desirable features” are difficult to define and also because more of something is not always better. Another strategy is the rule-based approach, in which the algorithm tests the individual against traditional textbook rules of music theory for what makes “good” music. While somewhat more successful, this method can only theoretically identify music that should be good or should be bad. It cannot define gradations of better music within the broad category of “good” music that it has identified. Moreover, this method is highly prone to both false negatives and false positives. Many times the best or most interesting music is good because of how it breaks traditional rules. Likewise, following the rules should yield at least decent music, but even this is not always the case (as no doubt many a music theory teacher can attest). Another approach is learned fitness, which uses a set of training samples and is generally implemented using a neural network. While the most successful of the unsupervised techniques, learned fitness by its construction can only yield limited results, all based on the training set (Biles, 2007).
Because of the obvious difficulties with producing quality results from an unsupervised program, a supervised model gains definite appeal. Letting the program interact with a human mentor to determine the fitness of a new individual overcomes the problems of false positives and false negatives generated by both heuristic and rule-based approaches. Likewise, allowing a human judge removes the constraints generated by the training set when using learned fitness. However, using a supervised approach generates a new set of difficulties. By having a human judge, the computer will be biased by the individual or individuals giving feedback. This perhaps is a necessary evil since art by nature has no universal indicators for greatness. Aside from the inherent problems of using a human judge, namely the human’s various imperfections and inconsistencies in judging, the issue of efficiency is a more substantial difficulty. Humans are far slower at ranking each individual in the population than a computer attempting the same process. This difficulty, created by humanity’s limited ability to process and rank sizeable quantities of the musical phrases that make up the population, is known as the fitness bottleneck. While not insurmountable, the fitness bottleneck greatly slows the process of composition. However, the greatly improved results that can be achieved using interactive fitness often outweigh the drawback of the added time necessary to complete the task (Biles, 2007).
Since the difficulties of each method are so pronounced and the trade-off between length of time and quality of product cannot be avoided, often the specific type of solution being sought will dictate which method is preferred. Bruce Jacob focuses his efforts on producing a complete, finished product. Since a complete, finished piece tends to be more desirable if there is some limitation to variation, as this creates continuity of ideas throughout the piece, using a more learned approach as a starting point makes sense. Jacob’s complete process combines rule-based approaches and interactive approaches with the initial results from using a learned technique, which greatly reduces the difficulties from the bottleneck (Jacob, 1995). John Biles, however, who is interested in composing jazz improvisation, takes a different approach. Since jazz improvisation focuses on short, creative melodies, emphasis on variety is important, and so Biles uses an interactive program, GenJam, to compose. This is feasible because the melodies being tested are fairly short. Even so, the improvements gained by using a purely interactive technique may not be enough to offset the added time necessary for the human judge to rank each solution generated by the program. For this reason, Biles constructed an alternate form of GenJam that breaks through the constraints of genetic algorithms by removing the fitness test altogether when finding a solution. Using their respective strategies, Jacob and Biles have each found a viable way of accomplishing their goals (Biles, 2007).
Jacob’s solution breaks the problem down into three different stages, which are then dealt with separately using composer, ear, and arranger modules, respectively. The composer module begins with a set of phrases supplied by the user for their aesthetic quality, so the process initially begins using a learned approach. The ear module tests the new variations produced by the composer module for valid chord progressions. The ear module is initially trained interactively in order to learn rules for which chords appeal to the human agent. Once the ear is sufficiently “evolved,” it is able to work as a filter, testing and rejecting potential solutions that do not fit the testing criteria. This greatly reduces the bottleneck since the ear module is able to weed out obviously poor candidates so that the human agent does not have to deal with them. The results are sent back to the composer module, which is then able to evolve and generate better results by eliminating poor results, as identified by the ear module. Once a set of acceptable phrases has been found by the ear module, the set is sent to the arranger module, which arranges the phrases together to form a complete work of music. The results from the arranger module are then directly evaluated by a human agent. Results that are approved are then returned to the arranger for further exploration, to be rearranged to form new possibilities (Jacob, 1995).
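The division of labour among the three modules can be sketched as a simple pipeline. Everything in the sketch is an assumption made for illustration and is not Jacob's implementation: the `Ear` class here learns accepted melodic intervals from approved phrases as a crude stand-in for learned chord preferences, `compose` varies seed phrases by altering one note, and `arrange` simply concatenates a random selection. The feedback from ear to composer and the final human evaluation of the arrangement are omitted.

```python
import random

def intervals(phrase):
    """Set of melodic steps (in semitones) between consecutive notes."""
    return {b - a for a, b in zip(phrase, phrase[1:])}

class Ear:
    """Filter trained on phrases the listener approved (hypothetical stand-in)."""
    def __init__(self):
        self.approved = set()

    def train(self, phrase):
        self.approved.update(intervals(phrase))

    def accepts(self, phrase):
        # A phrase passes only if every step in it has been approved before.
        return intervals(phrase) <= self.approved

def compose(seeds, rng, count=30):
    """Composer: produce variations by altering one note of a seed phrase."""
    variations = []
    for _ in range(count):
        phrase = list(rng.choice(seeds))
        i = rng.randrange(len(phrase))
        phrase[i] += rng.choice([-2, -1, 1, 2])
        variations.append(phrase)
    return variations

def arrange(phrases, rng, length=4):
    """Arranger: string a few accepted phrases together into one piece."""
    chosen = rng.sample(phrases, min(length, len(phrases)))
    return [pitch for phrase in chosen for pitch in phrase]

rng = random.Random(0)
seeds = [[60, 62, 64, 65], [67, 65, 64, 62], [60, 64, 67, 72]]
ear = Ear()
for phrase in seeds:          # interactive training is simulated by approving the seeds
    ear.train(phrase)
candidates = compose(seeds, rng)
survivors = [p for p in candidates if ear.accepts(p)]   # the ear weeds out poor candidates
piece = arrange(survivors, rng)
print(len(candidates), len(survivors), piece)
```

The point of the design is visible in the two counts printed at the end: the human only needs to hear what the ear lets through, which is how the filter relieves the bottleneck.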
Since the final goal Jacob is seeking is a finished piece, the composer is initially given a learning set of closely related phrases. Since the starting points are all interrelated, the results will also be interrelated, as desired. Further, Jacob uses mutation methods that employ traditional techniques for developing music. The end result is a program that works closely with a composer to speed the compositional process. The program is able to produce and examine variations on a given theme much more quickly than a human composer could without the aid of the computer. The computer begins with user input, tests potential variations of the input using the ear module’s understanding of the human composer’s preferences, and then the arranger combines approved phrases into a piece for the human to judge. In this way, the program attempts to simulate a portion of the creative process of composing and allows the composer to progress towards a finished product more quickly than could otherwise be accomplished (Jacob, 1996).
Since Biles’ focus is not a complete finished musical piece, his compositional program GenJam takes a significantly different approach for finding solutions. The main focus of GenJam is to create interesting and appealing solos. It accomplishes this goal by breaking the process down into three stages: learning, breeding, and demo. GenJam begins by producing random phrases and then demos them for the human agent. GenJam learns when the human agent rates each phrase as good or bad. Individuals for breeding are selected with a bias towards the fittest. Half of the original population is then replaced by new individuals. The process then repeats with the program performing a demo of the phrases and the human agent rating them. Since only half the population is replaced at a time, the individuals will be rated more than once, so the fitness rating is incremented or decremented each time the human agent rates the musical phrase as good or bad, respectively. In this way, Biles implements an interactive genetic algorithm which learns to play improvisational jazz from a human mentor (Biles, 1994).
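The cycle of demo, rating, and half-replacement can be sketched as follows. This is a rough illustration under several assumptions, not GenJam's code: the `mentor` function stands in for the human listener, the lowest-rated half is assumed to be the half that is replaced, parents are picked by a two-way tournament, and the crossover and mutation operators are simple placeholders.

```python
import random

def rate_all(population, scores, rate):
    """Demo every phrase and add the mentor's +1/-1 verdict to its running score."""
    for i, phrase in enumerate(population):
        scores[i] += rate(phrase)

def pick_parent(scores, rng):
    """Two-way tournament: biased towards, but not restricted to, the fittest."""
    a, b = rng.sample(range(len(scores)), 2)
    return a if scores[a] >= scores[b] else b

def replace_half(population, scores, rng):
    """Keep the better-rated half (with its scores) and breed the other half anew."""
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    keep = order[: len(population) // 2]
    survivors = [population[i] for i in keep]
    kept = [scores[i] for i in keep]
    children = []
    while len(survivors) + len(children) < len(population):
        mother = survivors[pick_parent(kept, rng)]
        father = survivors[pick_parent(kept, rng)]
        cut = rng.randrange(1, len(mother))
        child = mother[:cut] + father[cut:]            # one-point crossover
        child[rng.randrange(len(child))] += rng.choice([-2, -1, 1, 2])  # mutation
        children.append(child)
    # New phrases start unrated; survivors carry their accumulated score forward.
    return survivors + children, kept + [0] * len(children)

def mentor(phrase):
    """Hypothetical listener: likes phrases that end on a C."""
    return 1 if phrase[-1] % 12 == 0 else -1

rng = random.Random(0)
population = [[rng.randint(55, 72) for _ in range(8)] for _ in range(8)]
scores = [0] * len(population)
for _ in range(5):
    rate_all(population, scores, mentor)
    population, scores = replace_half(population, scores, rng)
print(scores)
```

In real use `mentor` would block on human input for every phrase in every generation, which is exactly where the fitness bottleneck arises.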
While this method produced many interesting results, it had major drawbacks. Most notably, this method quickly runs into the difficulty of the bottleneck since the human agent is much slower than a computer when evaluating potential melodies. The original strategy was also limited because it failed to produce one of the most important features of improvisational jazz: trading fours, or the passing back and forth and development of a theme between soloists during concerts. This led Biles to alter GenJam so that the program could analyze another soloist’s performance and breed it with existing phrases from the population. With the addition of some mutations, GenJam could then respond to the original soloist with an appropriately developed musical phrase. Unfortunately, since GenJam must act in real time when trading fours, there is no longer any opportunity for human feedback to tell GenJam whether the solos it is improvising are good or bad. Initially Biles viewed this as a drawback, since in its early development GenJam always used a human judge as a fitness function. Having been unable to create a satisfactory fitness function to work in place of a human judge, Biles was understandably worried about the results of playing a real-time performance without giving GenJam any feedback. Surprisingly, however, the outlined strategy of listening to another soloist, breeding that phrase with an existing individual, and mutating it produced an excellent response (Biles, 2007).
Biles identified two main factors for this surprising result, the first being the importance of good methods for mutating and breeding. The second factor was that the solo being passed from the other performer to GenJam would already have passed the test as being good, so in a sense GenJam is actually learning good music from the other performer. These two factors combined allow GenJam to produce at worst decent music but usually excellent music in real time. Even early on in the process, before GenJam started trading fours, Biles was aware of the importance of good breeding and mutating techniques. Even with constant human feedback, poor techniques would slow the process impossibly, so Biles focused on creating an algorithm which would produce new individuals which “at least sound no worse than their predecessors” (Biles, 2007). As a result, Biles focused on using what he termed “musically meaningful mutations” which used established strategies to form new mutations (Biles, 2007). Likewise he used “intelligent” forms of crossover during the breeding process to ensure offspring that were no worse than their parents. Once he had established a set of strategies which guaranteed that the offspring would be no worse than their parents, allowing GenJam to interact with another soloist became essentially the same as adding the new phrases to GenJam’s population and then automatically selecting them to breed with preexisting individuals (Biles, 2007).
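To make the idea of a "musically meaningful" mutation concrete, the sketch below shows a few classical melodic transformations that preserve the shape of a phrase instead of changing notes at random. These particular operators are offered as illustrative examples of the general idea; the text above does not list GenJam's actual operators, and a phrase is again assumed to be a list of MIDI pitches.

```python
def transpose(phrase, semitones):
    """Shift every note by the same interval, keeping the contour intact."""
    return [pitch + semitones for pitch in phrase]

def retrograde(phrase):
    """Play the phrase backwards."""
    return phrase[::-1]

def invert(phrase):
    """Mirror the phrase around its first note: rises become falls."""
    if not phrase:
        return []
    axis = phrase[0]
    return [2 * axis - pitch for pitch in phrase]

def rotate(phrase, steps=1):
    """Move the first `steps` notes to the end of the phrase."""
    if not phrase:
        return []
    steps %= len(phrase)
    return phrase[steps:] + phrase[:steps]

motif = [60, 62, 64, 67]
print(transpose(motif, 5))   # [65, 67, 69, 72]
print(retrograde(motif))     # [67, 64, 62, 60]
print(invert(motif))         # [60, 58, 56, 53]
print(rotate(motif))         # [62, 64, 67, 60]
```

Each result is recognisably related to the original motif, which is what makes offspring produced this way unlikely to sound arbitrarily worse than their parents. Note that operators like these do not by themselves guarantee that a phrase stays within a given scale or range.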
This result led Biles to speculate that it might be possible to avoid the bottleneck by not testing for fitness at all, and making GenJam autonomous. He proposed to accomplish this by starting GenJam out with a population of pre-approved musical phrases. Biles, in fact, created an entire database for GenJam of musical phrases he obtained from a book entitled 1001 Jazz Licks (Schneidman, 2000). In this way Biles was able to start GenJam out with a repertoire of common phrases from jazz. Since GenJam was already designed to breed and mutate new individuals that were at least no worse than the originals, the new version of GenJam was able to operate autonomously, improvising and producing new phrases with musical interest. While this final version of GenJam no longer qualifies as a true EA, it is clearly inspired by concepts in evolutionary programming (Biles, 2007).
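A fitness-free variant of this kind can be sketched in a few lines: phrases are drawn from a trusted database, recombined, and occasionally transformed, with no evaluation step at all. The `LICKS` list below is invented placeholder data, not material from the book, and the crossover and mutation choices are assumptions made for illustration.

```python
import random

# Placeholder phrases of equal length (hypothetical data, not from any lick book).
LICKS = [
    [60, 63, 65, 66, 67, 70, 72, 70],
    [72, 70, 67, 65, 63, 60, 58, 60],
    [67, 69, 70, 72, 74, 72, 70, 67],
    [65, 63, 60, 58, 60, 63, 65, 67],
]

def improvise(licks, rng, measures=4):
    """Breed pre-approved phrases with no fitness test anywhere in the loop."""
    solo = []
    for _ in range(measures):
        mother, father = rng.sample(licks, 2)
        cut = rng.randrange(1, len(mother))
        child = mother[:cut] + father[cut:]     # one-point crossover
        if rng.random() < 0.5:
            child = child[::-1]                 # mutation: play it backwards
        solo.append(child)
    return solo

solo = improvise(LICKS, random.Random(0))
for measure in solo:
    print(measure)
```

The quality of the output rests entirely on the quality of the starting material and on operators that do not degrade it, which is why such a system no longer needs, or qualifies as having, a selection step.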
In general, evolutionary algorithms seem to be a viable solution for simulating the process of musical composition. By focusing on a particular compositional goal, different strategies can be combined to yield better results than any one approach could on its own, as seen with both Jacob’s and Biles’ work. In some cases, the inherent limitations can be made advantages, as when Jacob used the limited variation created from the learned approach to create musical cohesion within a finished piece. Perhaps more important is for the programmer to be flexible and open to new ideas, and to recognize that approaches used by others are good starting places but need not be ending places. Biles began by combining several established approaches, but was flexible enough to let GenJam expand beyond the limitations of textbook-style evolutionary algorithms. Perhaps the art of programming should itself be considered an evolutionary endeavor, selecting strategies that work, recombining them, and mutating them into new ideas. The many different strategies for composing music via artificial intelligence each have shortcomings of their own, but they also each have benefits. Many of the more successful programs, such as Jacob’s and Biles’ creations, combine strategies in unique ways. Perhaps finding the optimal composing program is simply a matter of finding the best combination of techniques for the particular compositional style that interests the programmer.