Regularization in Symbolic Regression by an Additional Fitness Objective
Main Author: | |
---|---|
Format: | Others |
Language: | en |
Published: | ScholarWorks @ UVM 2018 |
Subjects: | |
Online Access: | https://scholarworks.uvm.edu/graddis/965 https://scholarworks.uvm.edu/cgi/viewcontent.cgi?article=1965&context=graddis |
Summary: | Symbolic regression is a method for discovering functions that minimize error on a given dataset. Because such functions can overfit, regularization is of interest. This work attempts to regularize symbolic regression by adding a fitness objective, the Worst Neighbors (WN) score, which measures differences in approximate derivatives in the form of angles. To compute the WN score, place partition points between each pair of adjacent data points. For each pair of adjacent data points, compute the maximum angle between the line formed by that pair and the lines formed by adjacent partition points. The maximum of these per-pair maximum angles is the WN score. This method differs from other attempts to regularize symbolic regression because it considers the behavior of the evolved function between data points. A high WN score indicates that the function has overfit the data. A low WN score could indicate either an underfit or a well-fit solution, and the error objective is used to distinguish the two. WN can therefore reduce overfitting because it favors functions with both low error and a low WN score: the error objective guards against underfitting, and the WN score guards against overfitting. For target functions of higher dimension, select nearby points as neighbors and compute the WN score on the evolved function restricted to the plane formed by these neighbors and the output direction. In the one-dimensional case, WN shows promise in reducing error on unseen data: it achieves a small decrease in testing error on several target functions compared with Age-Fitness Pareto Optimization (AFPO). |
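The one-dimensional Worst Neighbors computation described in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function name `worst_neighbors_score`, the parameter `n_partitions`, and the choice of evenly spaced partition points that include both endpoints are all hypothetical. It further assumes the data are sorted by strictly increasing `x`, that the line for each pair of data points uses the observed `y` values, and that the partition-point lines use the evolved function `f`.

```python
import math


def worst_neighbors_score(f, xs, ys, n_partitions=10):
    """Illustrative WN score (radians) for a 1-D candidate function f.

    Assumptions (not taken from the source): xs is strictly increasing,
    partition points are evenly spaced and include both data x-values,
    and the data-pair line is built from the observed ys.
    """
    worst = 0.0
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        # Angle of the line through the two adjacent data points.
        data_angle = math.atan2(ys[i + 1] - ys[i], x1 - x0)

        # Evaluate the evolved function on the partition of [x0, x1].
        step = (x1 - x0) / n_partitions
        pts = [x0 + j * step for j in range(n_partitions + 1)]
        vals = [f(p) for p in pts]

        # Compare each line through adjacent partition points
        # against the data-pair line; keep the largest angle seen.
        for j in range(n_partitions):
            seg_angle = math.atan2(vals[j + 1] - vals[j], pts[j + 1] - pts[j])
            worst = max(worst, abs(seg_angle - data_angle))
    return worst


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    ys = [0.0, 1.0, 2.0]
    # Both candidates pass through every data point (zero training error),
    # but the second oscillates between the data points.
    smooth = worst_neighbors_score(lambda x: x, xs, ys)
    wiggly = worst_neighbors_score(
        lambda x: x + 0.5 * math.sin(2 * math.pi * x), xs, ys
    )
    print(smooth, wiggly)
```

In this sketch both candidates fit the data exactly, so the error objective alone cannot separate them. The oscillating candidate receives the larger score, which is the behavior between data points that the summary says WN is meant to penalize.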