Note

The documentation is under construction.

Introduction

Operon is a modern C++ framework for symbolic regression that uses genetic programming to explore a hypothesis space of possible mathematical expressions in order to find the best-fitting model for a given regression target. Its main purpose is to help develop accurate and interpretable white-box models in areas such as system identification.


Motivation

Operon was motivated by the need to have a flexible and performant system that works out of the box. Thus, it was developed with the following goals in mind:

Modern concurrency model

Traditional threading approaches are not optimal for today's many-core systems. Operon addresses this by designing the evolutionary main loop to avoid synchronisation overhead and to take advantage of C++17's parallel execution policies.

Performance

Operon achieves high performance by combining an efficient linear tree representation, in which each Node is a trivial type, with vectorized evaluation built on the Eigen library. The encoding consumes 40 bytes per tree node, allowing practitioners to work with very large populations.
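
To make the idea of a linear tree representation more concrete, here is a minimal sketch. It is not Operon's actual code. It assumes a hypothetical `Node` struct with an `Op` opcode, a variable index, and a `value` field (a constant, or a coefficient for a variable), nodes stored in postfix order, and plain `std::vector<double>` columns standing in for Eigen arrays. Each node is evaluated over all rows at once, which is the batch-wise pattern that vectorized evaluation relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical opcodes for illustration; Operon's real symbol set is larger.
enum class Op : std::uint8_t { Constant, Variable, Add, Mul };

// Hypothetical node layout (not Operon's Node). No constructors or default
// member initializers, so the type stays trivial and a tree is a flat array.
struct Node {
    Op op;
    std::size_t variable; // column index, used only when op == Op::Variable
    double value;         // constant value, or coefficient for a variable
};
static_assert(std::is_trivial_v<Node>, "Node should be a trivial type");

using Column = std::vector<double>; // stand-in for an Eigen array

// Evaluates a postfix-encoded tree over all rows at once: every node
// produces a whole column, so the inner loops run over contiguous data.
Column Evaluate(const std::vector<Node>& tree, const std::vector<Column>& data, std::size_t rows) {
    std::vector<Column> stack;
    for (const Node& n : tree) {
        switch (n.op) {
        case Op::Constant:
            stack.emplace_back(rows, n.value); // column filled with the constant
            break;
        case Op::Variable: {
            Column c(rows);
            for (std::size_t i = 0; i < rows; ++i) c[i] = n.value * data[n.variable][i];
            stack.push_back(std::move(c));
            break;
        }
        case Op::Add:
        case Op::Mul: {
            assert(stack.size() >= 2); // binary operators need two operands
            Column rhs = std::move(stack.back());
            stack.pop_back();
            Column& lhs = stack.back(); // result overwrites the left operand
            for (std::size_t i = 0; i < rows; ++i)
                lhs[i] = (n.op == Op::Add) ? lhs[i] + rhs[i] : lhs[i] * rhs[i];
            break;
        }
        }
    }
    assert(stack.size() == 1); // a well-formed tree leaves exactly one result
    return stack.back();
}
```

For example, the postfix sequence `x0 2 x1 * +` encodes x0 + 2·x1 and would be passed as `{{Op::Variable, 0, 1.0}, {Op::Constant, 0, 2.0}, {Op::Variable, 1, 1.0}, {Op::Mul, 0, 0.0}, {Op::Add, 0, 0.0}}`. Keeping `Node` trivial means a whole tree is a contiguous block of plain data that can be copied cheaply, which plausibly helps when working with very large populations.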

Ease of use

Operon (the core library) comes with a command-line client that just works: you pass it a dataset and it starts optimizing. Its behavior can be configured through command-line options, making it easy to integrate with any scripting environment or high-level language such as Python. A Python script is provided for running experiments and automatically aggregating the results.

For more advanced use cases, we provide C++ and Python APIs, briefly illustrated with some examples.

For an overview of Operon please have a look at the Features page.

The software was also presented at the GECCO 2020 EvoSoft workshop: https://dl.acm.org/doi/10.1145/3377929.3398099. If you want to reference it in your publication, please use:

Reference

@inproceedings{Burlacu:2020:GECCOcomp,
  author = {Bogdan Burlacu and Gabriel Kronberger and Michael Kommenda},
  title = {Operon C++: An Efficient Genetic Programming Framework for Symbolic Regression},
  year = {2020},
  editor = {Richard Allmendinger and others},
  isbn13 = {9781450371278},
  publisher = {Association for Computing Machinery},
  publisher_address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3377929.3398099},
  doi = {10.1145/3377929.3398099},
  booktitle = {Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion},
  pages = {1562--1570},
  size = {9 pages},
  keywords = {genetic algorithms, genetic programming, C++, symbolic regression},
  address = {internet},
  series = {GECCO '20},
  month = {July 8-12},
  organisation = {SIGEVO},
  abstract = {},
  notes = {Also known as \cite{10.1145/3377929.3398099}
           GECCO-2020
           A Recombination of the 29th International Conference on Genetic Algorithms (ICGA) and the 25th Annual Genetic Programming Conference (GP)},
}