Evaluators

Programs can be written manually or generated automatically to fulfill a certain purpose or to follow specific criteria. This applies to both their static structure and their runtime behavior. Evaluators can be used to judge programs against such generic or specific criteria.

Evaluators take a VmSession of an already executed program as input and inspect its runtime statistics as well as the associated program’s structure. Their applications range from generic judgments, such as how efficiently a program is executed (e.g., how much of the present code is actually executed) or how much of the code (static or executed) consists of OpCode::NoOp operators, to more sophisticated checks, such as how well a program sorts a given array of numbers or processes any other kind of input.

Currently, three generic classes of Evaluators are contained in the BEAST library:

  • Aggregation Evaluator: Aggregates the (weighted and optionally inverted) score values of one or more evaluators
  • Operator Usage Evaluator: Determines what fraction of the executed operators were of a specific type
  • Runtime Statistics Evaluator: Evaluates the static structure and dynamic behavior of a program

More specific evaluators will follow as needed. In any case, all evaluators must derive from the beast::Evaluator class and adhere to its interface.

class Evaluator

Base class for Program and Session evaluation.

Evaluators are used to score the static structure and dynamic runtime statistics of programs and execution sessions.

Author
Jan Winkler
Date
2023-01-25

Subclassed by beast::AggregationEvaluator, beast::OperatorUsageEvaluator, beast::RuntimeStatisticsEvaluator

Public Functions

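virtual ~Evaluator()

Virtual destructor performing no operation. Declaring it virtual ensures that derived evaluators are destroyed correctly when deleted through a pointer to Evaluator.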

virtual double evaluate(const VmSession &session) = 0

Determines the fitness score of a session object.

The fitness score is determined based on the static program contained in the session and the dynamic runtime statistics collected while the program was executed. Concrete implementations of this base class may put emphasis on different aspects of this and evaluate programs and sessions in different ways.

The score value returned is a value between 0.0 and 1.0, where 0.0 means “no fit at all” and 1.0 means “perfect fit”. All implementations of this interface must adhere to this rule.

Return
A score value from 0.0 (no fit) to 1.0 (perfect fit)
Parameters
  • session: The session object to base the score determination on
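To illustrate the contract, here is a minimal sketch of a custom evaluator. It mirrors the documented interface (virtual destructor, pure virtual evaluate() returning a score in [0.0, 1.0]), but VmSession is replaced by a hypothetical stand-in struct with a single steps_executed field, and ShortRunEvaluator is a hypothetical example, not part of BEAST.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for beast::VmSession; the real class carries the program
// and its runtime statistics. Only the field used below is modeled here.
struct VmSession {
  std::size_t steps_executed = 0;
};

// Mirrors the documented interface: a virtual destructor plus a pure virtual
// evaluate() that must return a score in [0.0, 1.0].
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual double evaluate(const VmSession& session) = 0;
};

// Hypothetical example evaluator: rewards programs that finish in few steps.
// Score = 1 / (1 + steps_executed), which always lies in (0.0, 1.0].
class ShortRunEvaluator : public Evaluator {
 public:
  double evaluate(const VmSession& session) override {
    const double score = 1.0 / (1.0 + static_cast<double>(session.steps_executed));
    // Clamp defensively so the [0.0, 1.0] contract holds even if the formula changes.
    return std::clamp(score, 0.0, 1.0);
  }
};
```

A session with 3 executed steps scores 1 / (1 + 3) = 0.25, and one with no steps scores 1.0.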

Aggregation Evaluator

Aggregates the score values of one or more arbitrary evaluators. In addition, evaluator scores can be weighted relative to each other, and their logic can optionally be inverted (1.0 becomes 0.0 and vice versa). This evaluator is useful for combining multiple independent criteria into a single aggregated value that reflects exactly the criteria a program should be scored by.

class AggregationEvaluator : public beast::Evaluator

Public Functions

addEvaluator(const std::shared_ptr<Evaluator> &evaluator, double weight, bool invert_logic)

Adds an evaluator to this aggregation evaluator.

Adds an evaluator to the aggregated list of evaluators. Its weight is relative: to give all contained evaluators the same weight, assign them all the same value (e.g., 1.0). Optionally, its logic can be inverted (1.0 becomes 0.0 and vice versa) to invert the specific evaluator’s effect. Weights must not be negative, and nullptr is not accepted as an evaluator pointer.

Parameters
  • evaluator: A pointer to the evaluator to add
  • weight: The relative weight this evaluator’s score should have in the aggregated score
  • invert_logic: Whether to invert this evaluator’s score before aggregating it
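The preconditions of addEvaluator can be sketched as follows. This is a hypothetical AggregationSketch class, not the BEAST implementation; VmSession is an empty stand-in, ConstantEvaluator is a hypothetical test helper, and reporting violations via std::invalid_argument is an assumption, since the documentation does not say how they are signaled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

struct VmSession {};  // hypothetical stand-in for beast::VmSession

class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual double evaluate(const VmSession& session) = 0;
};

// Hypothetical evaluator returning a fixed score, handy for testing aggregation.
class ConstantEvaluator : public Evaluator {
 public:
  explicit ConstantEvaluator(double score) : score_(score) {}
  double evaluate(const VmSession&) override { return score_; }

 private:
  double score_;
};

// One aggregated entry: the evaluator plus its relative weight and inversion flag.
struct WeightedEvaluator {
  std::shared_ptr<Evaluator> evaluator;
  double weight;
  bool invert_logic;
};

class AggregationSketch {
 public:
  // Enforces the documented preconditions (assumed exception type).
  void addEvaluator(const std::shared_ptr<Evaluator>& evaluator, double weight,
                    bool invert_logic) {
    if (evaluator == nullptr) {
      throw std::invalid_argument("evaluator must not be nullptr");
    }
    if (weight < 0.0) {
      throw std::invalid_argument("weight must not be negative");
    }
    evaluators_.push_back({evaluator, weight, invert_logic});
  }

  std::size_t size() const { return evaluators_.size(); }

 private:
  std::vector<WeightedEvaluator> evaluators_;
};
```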

evaluate(const VmSession &session)

Determines the aggregated score of all contained evaluators.

For each evaluator added, a weight and a logic inversion flag are stored. When evaluating a session, this evaluator iterates through all contained evaluators and sums their weighted scores. The weights are relative: before iterating, the total sum of weights is determined, and each evaluator’s score is multiplied by its own weight divided by the total weight. If an evaluator has the logic inversion flag set, 1.0 - score is used in place of score.

If no evaluators were added to this aggregation evaluator when calling this function, an exception is thrown.

Return
A score value from 0.0 to 1.0
Parameters
  • session: The session object to base the score determination on
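The weighting rule can be expressed as a small free function over already computed sub-scores. This is a sketch of the described arithmetic only: ScoredEntry and aggregate are hypothetical names, and the exception types (as well as treating an all-zero weight total as an error) are assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical record of one sub-evaluator's result and its settings.
struct ScoredEntry {
  double score;       // sub-evaluator result in [0.0, 1.0]
  double weight;      // relative, non-negative weight
  bool invert_logic;  // use 1.0 - score instead of score
};

// Computes sum_i s_i' * w_i / W, where W = sum_i w_i and s_i' is the
// (optionally inverted) score of entry i.
double aggregate(const std::vector<ScoredEntry>& entries) {
  if (entries.empty()) {
    throw std::runtime_error("no evaluators to aggregate");
  }
  double total_weight = 0.0;
  for (const auto& e : entries) total_weight += e.weight;
  if (total_weight == 0.0) {
    // Assumption: the documentation does not cover an all-zero weight total.
    throw std::runtime_error("total weight must be positive");
  }
  double result = 0.0;
  for (const auto& e : entries) {
    const double s = e.invert_logic ? 1.0 - e.score : e.score;
    result += s * e.weight / total_weight;
  }
  return result;
}
```

For example, a score of 0.8 with weight 3.0 plus an inverted score of 0.2 with weight 1.0 gives 0.8 * 3/4 + (1.0 - 0.2) * 1/4 = 0.6 + 0.2 = 0.8 (up to floating-point rounding).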

Operator Usage Evaluator

This evaluator counts how many of the executed operators in a program were of a specific type and divides that number by the total number of steps executed. This results in a score from 0.0 (no operators of the given type) to 1.0 (only operators of that type). Depending on the use case, emphasis may be put on either extreme of this score (if required, invert the logic accordingly using an AggregationEvaluator).

class OperatorUsageEvaluator : public beast::Evaluator

Public Functions

OperatorUsageEvaluator(OpCode opcode)

Constructs the evaluator with the specific operator type whose usage is counted.

Parameters
  • opcode: The operator code to count

evaluate(const VmSession &session)

Determines the fraction of executed operators that were of the specific type.

Evaluation formula: score = specific operations / total operations. If no operations were performed at all, 0.0 is returned.

Return
A score value from 0.0 (specific op not found) to 1.0 (only specific op)
Parameters
  • session: The session object to base the score determination on
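A minimal sketch of this formula follows, assuming the per-opcode execution counts are available as a map. The OpCode enum here is a hypothetical three-value subset (only NoOp appears in this documentation), and ExecutionCounts is a hypothetical stand-in for the statistics a VmSession collects.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical subset of opcodes; the real beast::OpCode enum has many more.
enum class OpCode { NoOp, Add, Jump };

// Hypothetical per-opcode execution counters standing in for VmSession statistics.
using ExecutionCounts = std::map<OpCode, std::uint32_t>;

// score = executions of `opcode` / total executions; 0.0 if nothing was executed.
double operatorUsageScore(const ExecutionCounts& counts, OpCode opcode) {
  std::uint64_t total = 0;
  for (const auto& entry : counts) total += entry.second;
  if (total == 0) return 0.0;
  const auto it = counts.find(opcode);
  const std::uint64_t specific = (it == counts.end()) ? 0 : it->second;
  return static_cast<double>(specific) / static_cast<double>(total);
}
```

With one NoOp, two Add, and one Jump executed, the NoOp score is 1 / 4 = 0.25.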

Runtime Statistics Evaluator

This evaluator inspects both the dynamic runtime behavior of an executed program and its static structure. To obtain a good measure of efficiency in both static program structure and dynamic runtime behavior, three criteria must be met:

  • As few executed operators as possible should be noop operators. This means that no time is wasted on empty cycles and execution is efficient. The variable holding a measure of this is steps_executed_noop_fraction.
  • As many operators present in the program code as possible should be noop operators. This means that the program’s function is implemented with a slender static structure rather than bloated code. The variable holding a measure of this is total_steps_noop_fraction.
  • As few individual operators as possible should actually be executed. This means that a program makes good use of iterations and jumps rather than relying on overfitting to a particular task. The variable holding a measure of this is program_executed_fraction.
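The three variables are named but not formally defined here. The following sketch assumes one plausible reading: noop steps over all executed steps, noop operators over program size, and program positions executed at least once over program size. RawStats, Fractions, safeRatio, and computeFractions are hypothetical names, and returning 0.0 for an empty denominator is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical raw counts; the names are illustrative, not BEAST API.
struct RawStats {
  std::size_t steps_executed = 0;         // operators executed at runtime (with repeats)
  std::size_t noop_steps_executed = 0;    // of which were NoOp
  std::size_t program_size = 0;           // operators present in the program
  std::size_t noop_in_program = 0;        // of which are NoOp
  std::size_t distinct_ops_executed = 0;  // program positions executed at least once
};

struct Fractions {
  double steps_executed_noop_fraction;
  double total_steps_noop_fraction;
  double program_executed_fraction;
};

// Division that yields 0.0 for an empty denominator (assumed convention).
double safeRatio(std::size_t num, std::size_t den) {
  return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
}

Fractions computeFractions(const RawStats& s) {
  return {safeRatio(s.noop_steps_executed, s.steps_executed),
          safeRatio(s.noop_in_program, s.program_size),
          safeRatio(s.distinct_ops_executed, s.program_size)};
}
```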

These aspects need to be combined into a single score value. Given the considerations above, the score is computed as follows:

prg_exec_weight = 1.0 - dyn_noop_weight - stat_noop_weight
                  with dyn_noop_weight + stat_noop_weight <= 1.0

score =
    dyn_noop_weight * (1.0 - steps_executed_noop_fraction) +
    stat_noop_weight * total_steps_noop_fraction +
    prg_exec_weight * (1.0 - program_executed_fraction)

The weight values dyn_noop_weight and stat_noop_weight are initialization parameters of the evaluator and must be chosen carefully so that programs can converge to a good balance between runtime efficiency and static structure efficiency.
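The score formula above translates directly into code. This sketch takes the three fractions as plain inputs; runtimeStatisticsScore is a hypothetical free function, not the BEAST class, and it assumes the caller has already checked that dyn_noop_weight + stat_noop_weight does not exceed 1.0.

```cpp
#include <cassert>

// score = dyn * (1 - steps_executed_noop_fraction)
//       + stat * total_steps_noop_fraction
//       + (1 - dyn - stat) * (1 - program_executed_fraction)
double runtimeStatisticsScore(double dyn_noop_weight, double stat_noop_weight,
                              double steps_executed_noop_fraction,
                              double total_steps_noop_fraction,
                              double program_executed_fraction) {
  // The remaining weight goes to the "how much of the program was executed" term.
  const double prg_exec_weight = 1.0 - dyn_noop_weight - stat_noop_weight;
  return dyn_noop_weight * (1.0 - steps_executed_noop_fraction) +
         stat_noop_weight * total_steps_noop_fraction +
         prg_exec_weight * (1.0 - program_executed_fraction);
}
```

With weights 0.5 and 0.25 (so prg_exec_weight = 0.25), a program that executes no noops, consists only of noops, and executes none of its code scores 1.0; all three fractions at 0.5 give 0.25 + 0.125 + 0.125 = 0.5.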

class RuntimeStatisticsEvaluator : public beast::Evaluator

Public Functions

RuntimeStatisticsEvaluator(double dyn_noop_weight, double stat_noop_weight)

Constructs the evaluator.

The two weights dyn_noop_weight and stat_noop_weight determine how much emphasis this evaluator puts on the dynamic runtime behavior of a program versus its static structure. Their sum must not exceed 1.0; any difference between their sum and 1.0 is used to put emphasis on how much of the program was actually executed.

Parameters
  • dyn_noop_weight: How much emphasis to put on dynamic runtime behavior (0.0 - 1.0)
  • stat_noop_weight: How much emphasis to put on static program structure (0.0 - 1.0)
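The constructor constraints can be sketched as a small validating holder. RuntimeWeights is a hypothetical class, not the BEAST constructor, and signaling violations with std::invalid_argument is an assumption; the documentation only states the constraints themselves.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical holder for the validated weights; not the BEAST class itself.
class RuntimeWeights {
 public:
  RuntimeWeights(double dyn_noop_weight, double stat_noop_weight)
      : dyn_(dyn_noop_weight), stat_(stat_noop_weight) {
    // Each weight must lie in [0.0, 1.0] (assumed exception type).
    if (dyn_ < 0.0 || dyn_ > 1.0 || stat_ < 0.0 || stat_ > 1.0) {
      throw std::invalid_argument("weights must lie in [0.0, 1.0]");
    }
    // Their sum must not exceed 1.0.
    if (dyn_ + stat_ > 1.0) {
      throw std::invalid_argument("dyn_noop_weight + stat_noop_weight must not exceed 1.0");
    }
  }

  double dynNoopWeight() const { return dyn_; }
  double statNoopWeight() const { return stat_; }
  // The remainder emphasizes how much of the program was actually executed.
  double programExecutedWeight() const { return 1.0 - dyn_ - stat_; }

 private:
  double dyn_;
  double stat_;
};
```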

double evaluate(const VmSession &session)

Determines the fitness score of a session object.

The fitness score is determined based on the static program contained in the session and the dynamic runtime statistics collected while the program was executed, combined according to the weighted formula given above.

The score value returned lies between 0.0 (“no fit at all”) and 1.0 (“perfect fit”), as required by the Evaluator interface.

Return
A score value from 0.0 (no fit) to 1.0 (perfect fit)
Parameters
  • session: The session object to base the score determination on