Evaluators
Programs can be written manually or generated automatically to fulfill a certain purpose or to follow specific criteria. This applies to both their static structure and their runtime behavior. Evaluators can be used to judge programs against generic or specific criteria.
Evaluators take the VmSession of an already executed program as input and inspect both its runtime statistics and the structure of the associated program. Their applications range from generic judgments, such as how efficiently a program is executed (e.g., how much of the present code is actually executed) or how much of the code (static or executed) consists of OpCode::NoOp operators, to more sophisticated checks, such as how well a program sorts a given array of numbers or processes any other kind of input.
Currently, the BEAST library contains three generic classes of Evaluators:
- Aggregation Evaluator: Aggregates the (weighted and optionally inverted) score values of one or more evaluators
- Operator Usage Evaluator: Determines what fraction of the executed operators were of a specific type
- Runtime Statistics Evaluator: Evaluates the static structure and dynamic behavior of a program
More specific evaluators will follow on a per-need basis. In any case, all evaluators need to derive from the beast::Evaluator class and adhere to its interface.
class Evaluator
Base class for Program and Session evaluation.
Evaluators are used to score the static structure and dynamic runtime statistics of programs and execution sessions.
Author: Jan Winkler
Date: 2023-01-25
Subclassed by beast::AggregationEvaluator, beast::OperatorUsageEvaluator, beast::RuntimeStatisticsEvaluator
Public Functions
virtual ~Evaluator()
Virtual destructor performing no operation. It ensures that derived evaluators are destroyed correctly when they are deleted through a pointer to Evaluator.
virtual double evaluate(const VmSession &session) const = 0
Determines the fitness score of a session object.
The fitness score is determined based on the static program contained in the session and the dynamic runtime statistics collected while the program was executed. Concrete implementations of this base class may put emphasis on different aspects of this and evaluate programs and sessions in different ways.
The score value returned is a value between 0.0 and 1.0, where 0.0 means “no fit at all” and 1.0 means “perfect fit”. All implementations of this interface must adhere to this rule.
Returns: A score value from 0.0 (no fit) to 1.0 (perfect fit)
Parameters:
- session: The session object to base the score determination on
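To illustrate the interface contract, the following sketch derives a custom evaluator from a simplified mirror of beast::Evaluator. It does not use the real BEAST headers: the VmSession struct, its steps_executed field, and the StepBudgetEvaluator class are hypothetical stand-ins introduced only for this example. What it shows is that an override must map every session to a score within 0.0 to 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for beast::VmSession; the real class exposes its
// program and runtime statistics through its own accessors.
struct VmSession {
  std::size_t steps_executed = 0;
};

// Simplified mirror of the documented beast::Evaluator interface.
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  // Must return a score from 0.0 (no fit) to 1.0 (perfect fit).
  virtual double evaluate(const VmSession& session) const = 0;
};

// Hypothetical evaluator: the fewer steps a session needed relative to a
// budget, the higher its score. Sessions at or above the budget score 0.0.
class StepBudgetEvaluator : public Evaluator {
 public:
  explicit StepBudgetEvaluator(std::size_t step_budget) : step_budget_(step_budget) {}

  double evaluate(const VmSession& session) const override {
    if (step_budget_ == 0) {
      return 0.0;  // avoid division by zero for a degenerate budget
    }
    const double used = static_cast<double>(session.steps_executed) /
                        static_cast<double>(step_budget_);
    // used is never negative, so 1.0 - used never exceeds 1.0; clamping at
    // 0.0 keeps the result inside the documented range.
    return std::max(0.0, 1.0 - used);
  }

 private:
  std::size_t step_budget_;
};
```

With a budget of 100 steps, a session that ran 25 steps scores 0.75, and any session that ran 100 or more steps scores 0.0.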
Aggregation Evaluator
This evaluator aggregates the score values of one or more arbitrary evaluators. In addition, the evaluator scores can be weighted relative to each other, and their logic can optionally be inverted (1.0 becomes 0.0 and vice versa). This evaluator is useful for combining multiple independent criteria into a single aggregated value that reflects exactly the criteria a program should be scored by.
class AggregationEvaluator : public beast::Evaluator
Public Functions
Adds an evaluator to this aggregation evaluator.
Adds an evaluator to the aggregated list of evaluators. The evaluator can be weighted (relatively; to give all contained evaluators the same weight, assign them all the same value, e.g. 1.0), and its logic can optionally be inverted (1.0 becomes 0.0 and vice versa, which inverts the specific evaluator's effect). Weights must not be negative, and nullptr is not accepted as an evaluator pointer.
Parameters:
- evaluator: A pointer to the evaluator to add
- weight: The relative weight this evaluator's score should have in the aggregated score
- invert_logic: Whether to invert this evaluator's score before aggregating it
double evaluate(const VmSession &session) const
Determines the aggregated score of all contained evaluators.
For each added evaluator, a weight and a logic inversion flag are stored. When evaluating a session, this evaluator iterates through all contained evaluators and adds up their weighted scores. The weights are relative: before iterating, the total sum of weights is determined, and each evaluator's score is multiplied by own weight / total weight. If an evaluator has the logic inversion flag set, 1.0 - score is used in place of its score.
If no evaluators were added to this aggregation evaluator when this function is called, an exception is thrown.
Returns: A score value from 0.0 to 1.0
Parameters:
- session: The session object to base the score determination on
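The aggregation rule described above can be sketched as follows. This is not the BEAST implementation: the addEvaluator name, the use of std::shared_ptr, the exception types, and the ConstantEvaluator helper are assumptions made so that the example compiles on its own. Rejecting a total weight of zero is likewise an assumption of this sketch, added to avoid dividing by zero.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for beast::VmSession (contents irrelevant here).
struct VmSession {};

// Simplified mirror of the documented beast::Evaluator interface.
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual double evaluate(const VmSession& session) const = 0;
};

// Hypothetical helper that always returns the same score.
class ConstantEvaluator : public Evaluator {
 public:
  explicit ConstantEvaluator(double score) : score_(score) {}
  double evaluate(const VmSession&) const override { return score_; }

 private:
  double score_;
};

// Sketch of the weighted, optionally inverted aggregation described above.
class AggregationEvaluatorSketch : public Evaluator {
 public:
  void addEvaluator(std::shared_ptr<Evaluator> evaluator, double weight, bool invert_logic) {
    if (!evaluator) throw std::invalid_argument("evaluator must not be null");
    if (weight < 0.0) throw std::invalid_argument("weight must not be negative");
    entries_.push_back(Entry{std::move(evaluator), weight, invert_logic});
  }

  double evaluate(const VmSession& session) const override {
    if (entries_.empty()) throw std::runtime_error("no evaluators added");

    // Relative weights: normalize by the sum of all weights.
    double total_weight = 0.0;
    for (const Entry& entry : entries_) total_weight += entry.weight;
    // Assumption of this sketch: an all-zero weight set is rejected.
    if (total_weight <= 0.0) throw std::runtime_error("total weight must be positive");

    double score = 0.0;
    for (const Entry& entry : entries_) {
      double s = entry.evaluator->evaluate(session);
      if (entry.invert_logic) s = 1.0 - s;  // 1.0 becomes 0.0 and vice versa
      score += (entry.weight / total_weight) * s;
    }
    return score;
  }

 private:
  struct Entry {
    std::shared_ptr<Evaluator> evaluator;
    double weight;
    bool invert_logic;
  };
  std::vector<Entry> entries_;
};
```

For example, combining a constant score of 0.8 with weight 3.0 and a constant score of 0.2 with weight 1.0 and inverted logic yields 0.75 * 0.8 + 0.25 * (1.0 - 0.2) = 0.8.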
Operator Usage Evaluator
This evaluator counts how many of the executed operators in a program were of a specific type and divides that number by the total number of steps executed. This results in a score from 0.0 (no operators of the given type) to 1.0 (only operators of the given type). Depending on the use case, emphasis may be put on either extreme of this score (if required, invert the logic accordingly using an AggregationEvaluator).
class OperatorUsageEvaluator : public beast::Evaluator
Public Functions
OperatorUsageEvaluator(OpCode opcode)
Constructs the evaluator with the specific operator type whose usage is counted.
Parameters:
- opcode: The operator code to count
double evaluate(const VmSession &session) const
Determines the fraction of executed operations that used the specific operator.
Evaluation formula:
score = specific operations / total operations
If no operations were performed at all, 0.0 is returned.
Returns: A score value from 0.0 (specific operator not found) to 1.0 (only the specific operator)
Parameters:
- session: The session object to base the score determination on
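A minimal sketch of the evaluation formula, assuming the session can report how often each operator was executed. The stand-in OpCode enum, its Add and Jump members, the executed_counts map, and OperatorUsageEvaluatorSketch are hypothetical placeholders; only OpCode::NoOp is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Stand-in operator codes: NoOp comes from the text, Add and Jump are
// hypothetical placeholders for other BEAST operators.
enum class OpCode { NoOp, Add, Jump };

// Hypothetical stand-in for beast::VmSession: how often each operator ran.
struct VmSession {
  std::map<OpCode, std::size_t> executed_counts;
};

// Simplified mirror of the documented beast::Evaluator interface.
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual double evaluate(const VmSession& session) const = 0;
};

// Sketch of: score = specific operations / total operations.
class OperatorUsageEvaluatorSketch : public Evaluator {
 public:
  explicit OperatorUsageEvaluatorSketch(OpCode opcode) : opcode_(opcode) {}

  double evaluate(const VmSession& session) const override {
    std::size_t total = 0;
    std::size_t specific = 0;
    for (const auto& entry : session.executed_counts) {
      total += entry.second;
      if (entry.first == opcode_) specific += entry.second;
    }
    if (total == 0) return 0.0;  // no operations performed at all
    return static_cast<double>(specific) / static_cast<double>(total);
  }

 private:
  OpCode opcode_;
};
```

A session that executed three NoOp and one Add operation scores 0.75 for OpCode::NoOp. To reward programs that rarely execute NoOp instead, the score can be inverted through an AggregationEvaluator, as suggested above.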
Runtime Statistics Evaluator
This evaluator inspects the dynamic runtime behavior of an executed program as well as its static structure. To obtain a good measure of efficiency in both the static program structure and the dynamic runtime behavior, three criteria must be met:
- As few of the executed operators as possible should be noop operators. This means that no time is wasted on empty cycles and the program executes efficiently. The variable holding a measure of this is steps_executed_noop_fraction.
- As many operators actually present in the program code as possible should be noop operators. This means that the program’s function can be executed with a slender static structure rather than bloated code. The variable holding a measure of this is total_steps_noop_fraction.
- As few individual operators as possible should actually be executed. This means that a program makes good use of iterations and jumps, and doesn’t rely on overfitting to any particular task. The variable holding a measure of this is program_executed_fraction.
These aspects need to be combined into a single score value. Given the considerations above, the formula for this score value becomes:
prg_exec_weight = 1.0 - dyn_noop_weight - stat_noop_weight
with dyn_noop_weight + stat_noop_weight <= 1.0
score =
dyn_noop_weight * (1.0 - steps_executed_noop_fraction) +
stat_noop_weight * total_steps_noop_fraction +
prg_exec_weight * (1.0 - program_executed_fraction)
The weight values dyn_noop_weight and stat_noop_weight are initialization parameters to the evaluator and must be chosen carefully so that programs can correctly converge to a good balance between runtime and static structure efficiency.
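The score formula above translates directly into a small function. This is a sketch under assumptions: RuntimeFractions and runtimeStatisticsScore are hypothetical names, the three fractions are taken as already computed (the real evaluator derives them from a VmSession), and rejecting negative weights goes beyond the documented rule that the two weights must not sum to more than 1.0.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical container for the three fractions named in the text. The real
// evaluator derives them from a beast::VmSession; here they are given directly.
struct RuntimeFractions {
  double steps_executed_noop_fraction;  // share of executed steps that were noops
  double total_steps_noop_fraction;     // share of the static program that is noops
  double program_executed_fraction;     // assumed: share of the static program that was executed
};

// Computes the documented score for already-determined fractions.
double runtimeStatisticsScore(const RuntimeFractions& f, double dyn_noop_weight,
                              double stat_noop_weight) {
  // Documented rule: the two weights must not sum to more than 1.0.
  // Rejecting negative weights is an additional assumption of this sketch.
  if (dyn_noop_weight < 0.0 || stat_noop_weight < 0.0 ||
      dyn_noop_weight + stat_noop_weight > 1.0) {
    throw std::invalid_argument("invalid weights");
  }
  const double prg_exec_weight = 1.0 - dyn_noop_weight - stat_noop_weight;
  return dyn_noop_weight * (1.0 - f.steps_executed_noop_fraction) +
         stat_noop_weight * f.total_steps_noop_fraction +
         prg_exec_weight * (1.0 - f.program_executed_fraction);
}
```

With dyn_noop_weight = 0.5 and stat_noop_weight = 0.25, the remaining 0.25 goes to prg_exec_weight, so fractions of 0.5, 0.25 and 0.5 give 0.5 * 0.5 + 0.25 * 0.25 + 0.25 * 0.5 = 0.4375.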
class RuntimeStatisticsEvaluator : public beast::Evaluator
Public Functions
RuntimeStatisticsEvaluator(double dyn_noop_weight, double stat_noop_weight)
Constructs the evaluator.
The two weights dyn_noop_weight and stat_noop_weight determine how much emphasis this evaluator puts on the dynamic runtime behavior of a program and on its static structure, respectively. Their sum must not exceed 1.0; any difference between their sum and 1.0 is used to put emphasis on how much of the program was actually executed.
Parameters:
- dyn_noop_weight: How much emphasis to put on dynamic runtime behavior (0.0 - 1.0)
- stat_noop_weight: How much emphasis to put on static program structure (0.0 - 1.0)
double evaluate(const VmSession &session) const
Determines the fitness score of a session object.
The fitness score is determined based on the static program contained in the session and the dynamic runtime statistics collected while the program was executed, combined according to the score formula given above.
The score value returned is a value between 0.0 and 1.0, where 0.0 means “no fit at all” and 1.0 means “perfect fit”.
Returns: A score value from 0.0 (no fit) to 1.0 (perfect fit)
Parameters:
- session: The session object to base the score determination on