Primer

Genetic Programming

Genetic programming (GP) takes the evolutionary recipe one step further: instead of evolving a fixed list of numbers, it evolves programs. Candidate solutions are pieces of executable code — usually trees — that grow, recombine, and mutate until they compute what you need. It's evolution that writes its own logic.

From parameters to programs

A genetic algorithm evolves a genome of fixed shape — a vector of values. That's perfect when you already know the form of the answer and only need to find the right settings. But sometimes you don't know the form. You want the search to discover the structure of the solution itself: the equation, the rule, the control program.

That's the gap genetic programming fills. Pioneered as a distinct field by John Koza in the early 1990s, GP represents each candidate as a program tree whose size and shape can change from one individual to the next. Evolution then operates directly on code.

A genetic algorithm asks, "what are the best values?" Genetic programming asks, "what is the best program?"

How a program becomes a genome

The classic GP individual is a syntax tree. Internal nodes come from a function set (operations like +, , ×, if, and); leaves come from a terminal set (input variables and constants). Read the tree and you get an expression.

For example, the tree below represents (x × x) + (3 − y):

        (+)
       /   \
     (×)    (−)
    /  \    /  \
   x    x  3    y

Because trees can be any size, GP can discover compact answers and sprawling ones alike — which is both its power and the source of its main headache (see "bloat" below).

The vocabulary

Function set & terminal set

The building blocks evolution is allowed to use. Choose them to fit the problem: arithmetic and trig for symbolic regression; sensor readings and movement commands for a robot controller; logical operators for a classifier.

Subtree crossover

The signature GP operator: pick a random subtree in each parent and swap them. Two working programs exchange chunks of logic to produce offspring — sometimes nonsense, sometimes a leap forward.

Subtree mutation

Replace a random subtree with a freshly generated one. This injects brand-new structure and keeps the population from going stale.

Closure & sufficiency

Two design rules: every function must accept any value the others can produce (closure), and the chosen building blocks must be expressive enough to represent a solution at all (sufficiency).

The loop is the same — the pieces differ

GP runs the familiar evolutionary cycle — initialize, evaluate, select, vary, repeat — but with tree-shaped individuals and tree-aware operators:

  1. Initialize a population of random program trees (often with the "ramped half-and-half" method for varied sizes).
  2. Evaluate each program by running it on test cases and scoring the results with a fitness function.
  3. Select parents, biased toward higher fitness.
  4. Vary with subtree crossover and subtree mutation.
  5. Repeat across generations, keeping elites, until a program solves the task or the budget runs out.

Bloat — GP's characteristic problem

Left unchecked, evolved programs tend to grow ever larger without getting better — accumulating dead code that does nothing useful. This is called bloat. Practitioners limit it with size or depth caps, parsimony pressure (penalizing large trees in the fitness score), and specialized operators. Managing bloat is a core part of getting GP to work well.

What genetic programming is good at

  • Symbolic regression — discovering a human-readable formula that fits data, not just a black-box fit. This is GP's flagship application.
  • Classifiers and rules — evolving decision logic that you can read and audit.
  • Control programs — robot behaviors, game agents, and signal-processing pipelines.
  • Automated design & "invention" — Koza's work produced patent-competitive circuits and antennas evolved from scratch.
  • Program synthesis — generating small programs from input/output examples.
A standout feature: GP often produces interpretable results. A symbolic-regression formula or a small rule tree can be inspected and understood — a useful contrast to opaque models.

Beyond trees

Tree GP is the classic form, but the family is broader: linear GP (programs as instruction sequences), Cartesian GP (programs as graphs), grammatical evolution (programs from a grammar), and stack-based systems like PushGP. They share the same goal — evolving executable structure — with different representations.

Go deeper

The Library collects the foundational GP references — Koza's volumes and the free Field Guide to Genetic Programming — along with software like DEAP, gplearn, and PushGP that let you run GP today. New to the whole idea? Start with the genetic algorithms primer first; GP will click faster once the basic loop is familiar.