Rewards¶
Interface¶
- class ecole.typing.RewardFunction(*args, **kwargs)[source]¶
Class responsible for extracting rewards.
Reward functions are objects given to the Environment to extract the reward used for learning.
This class presents the interface expected to define a valid reward function. It is not necessary to inherit from this class, as reward functions are defined by structural subtyping. It exists to support Python type hints.
Note
Rewards, or rather reward offsets, are also extracted on reset(). This has no use for learning (since no action has been taken), but is useful when using the cumulative reward sum as a metric.
See also
DataFunction
Reward functions are a specific type of generic data function where the data extracted are rewards of type float.
- __init__(*args, **kwargs)¶
Initialize self. See help(type(self)) for accurate signature.
- before_reset(model: ecole.scip.Model) → None[source]¶
Reset internal data at the start of episodes.
The method is called on new episodes by reset(), right before the MDP is actually reset, that is, right before the environment calls reset_dynamics().
It is usually used to reset the internal data.
- Parameters
model – The Model defining the current state of the solver.
- extract(model: ecole.scip.Model, done: bool) → float[source]¶
Extract the reward on the given state.
Extract the reward after transitioning to the new state given by model. The function is responsible for keeping track of relevant information from previous states. This can safely be done in this method as it will only be called once per state, i.e., this method is not a getter and can have side effects.
- Parameters
model – The Model defining the current state of the solver.
done – A flag indicating whether the state is terminal (as decided by the environment).
- Returns
The return is passed to the user by the environment.
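Because reward functions are defined by structural subtyping, any object exposing before_reset() and extract() with these signatures can be used. The following is a minimal sketch of a custom reward function; the class name and the idea of a fixed step penalty are purely illustrative and not part of Ecole:

```python
import ecole


class StepPenalty:
    """Toy reward function: 0 on the reset offset, then -1 per transition.

    It does not inherit from ecole.typing.RewardFunction; implementing
    before_reset() and extract() is enough (structural subtyping).
    """

    def before_reset(self, model: ecole.scip.Model) -> None:
        # Reset internal data at the start of every episode.
        self.n_extract_calls = 0

    def extract(self, model: ecole.scip.Model, done: bool) -> float:
        # Called once per state, so updating internal state here is safe.
        self.n_extract_calls += 1
        if self.n_extract_calls == 1:
            return 0.0  # reward offset extracted on reset()
        return -1.0
```

Such an object can then be given to an environment through its reward_function parameter, e.g. ecole.environment.Branching(reward_function=StepPenalty()).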
Listing¶
The list of reward functions relevant to users is given below.
Is Done¶
- class ecole.reward.IsDone¶
Single reward on terminal states.
- __init__(self: ecole.reward.IsDone) → None¶
- before_reset(self: ecole.reward.IsDone, model: ecole.scip.Model) → None¶
Do nothing.
- extract(self: ecole.reward.IsDone, model: ecole.scip.Model, done: bool = False) → float¶
Return 1 if the episode is on a terminal state, 0 otherwise.
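As a usage sketch, with IsDone the cumulative reward of an episode is 1, since only the terminal transition is rewarded. The environment, instance generator, and action choice below are assumptions for illustration, not prescribed by this reward function:

```python
import ecole

env = ecole.environment.Branching(reward_function=ecole.reward.IsDone())
instances = ecole.instance.SetCoverGenerator(n_rows=100, n_cols=200)

obs, action_set, reward_offset, done, info = env.reset(next(instances))
cumulative_reward = reward_offset  # 0 unless the instance is already solved
while not done:
    obs, action_set, reward, done, info = env.step(action_set[0])
    cumulative_reward += reward
# cumulative_reward is now 1: only the terminal transition was rewarded.
```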
LP Iterations¶
- class ecole.reward.LpIterations¶
LP iterations difference.
The reward is defined as the number of iterations spent in solving the Linear Programs associated with the problem since the previous state.
- __init__(self: ecole.reward.LpIterations) → None¶
- before_reset(self: ecole.reward.LpIterations, model: ecole.scip.Model) → None¶
Reset the internal LP iterations count.
- extract(self: ecole.reward.LpIterations, model: ecole.scip.Model, done: bool = False) → float¶
Update the internal LP iteration count and return the difference.
The difference in LP iterations is computed in between calls.
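Since fewer LP iterations are generally preferable, a common pattern is to scale this reward by a negative constant so that maximizing the reward minimizes LP effort. A minimal sketch, assuming scalar multiplication of reward functions (see the Arithmetic utility below) and the Branching environment's reward_function parameter:

```python
import ecole

# Negatively scaled LP iteration count: a larger reward means fewer LP iterations.
reward_function = -1.0 * ecole.reward.LpIterations()
env = ecole.environment.Branching(reward_function=reward_function)
```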
NNodes¶
- class ecole.reward.NNodes¶
Number of nodes difference.
The reward is defined as the total number of nodes processed since the previous state.
- __init__(self: ecole.reward.NNodes) → None¶
- before_reset(self: ecole.reward.NNodes, model: ecole.scip.Model) → None¶
Reset the internal node count.
- extract(self: ecole.reward.NNodes, model: ecole.scip.Model, done: bool = False) → float¶
Update the internal node count and return the difference.
The difference in number of nodes is computed in between calls.
Solving Time¶
- class ecole.reward.SolvingTime¶
Solving time difference.
The reward is defined as the number of seconds spent solving the instance since the previous state. The solving time is specific to the operating system: it includes time spent in reset() and time spent waiting on the agent.
- __init__(self: ecole.reward.SolvingTime, wall: bool = False) → None¶
Create a SolvingTime reward function.
- Parameters
wall – If True, the wall time will be used. If False (default), the process time will be used.
- before_reset(self: ecole.reward.SolvingTime, model: ecole.scip.Model) → None¶
Reset the internal clock counter.
- extract(self: ecole.reward.SolvingTime, model: ecole.scip.Model, done: bool = False) → float¶
Update the internal clock counter and return the difference.
The difference in solving time is computed in between calls.
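The wall parameter selects which clock is used; a short sketch:

```python
import ecole

# Process time is used by default.
process_time = ecole.reward.SolvingTime()
# Pass wall=True to measure wall-clock time instead.
wall_time = ecole.reward.SolvingTime(wall=True)
```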
Utilities¶
The following reward functions are used internally by Ecole.
Constant¶
- class ecole.reward.Constant¶
Constant Reward.
Always returns the value passed to the constructor.
- __init__(self: ecole.reward.Constant, constant: float = 0.0) → None¶
- before_reset(self: ecole.reward.Constant, model: ecole.scip.Model) → None¶
Do nothing.
- extract(self: ecole.reward.Constant, model: ecole.scip.Model, done: bool = False) → float¶
Return the constant value.
Arithmetic¶
- class ecole.reward.Arithmetic¶
Proxy class for doing arithmetic on reward functions.
An object of this class is returned by reward function operators to forward calls to the reward function parameters of the operator.
- __init__(self: ecole.reward.Arithmetic, arg0: object, arg1: list, arg2: str) → None¶
- before_reset(self: ecole.reward.Arithmetic, model: object) → None¶
Reset the reward functions of the operator.
Calls before_reset on all reward function parameters that were used to create this object.
- extract(self: ecole.reward.Arithmetic, model: object, done: bool = False) → float¶
Obtain the reward of the result of the operator.
Calls extract on all reward function parameters that were used to create this object and computes the operation on the results.
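For illustration, this proxy is what Python operator expressions on reward functions evaluate to; users normally obtain it implicitly rather than constructing it directly. A hedged sketch, assuming the operators and the Branching environment shown here (based on the description above, not a prescribed API):

```python
import ecole

# Combining reward functions with operators yields an Arithmetic proxy that
# forwards before_reset()/extract() to its operands and combines the results.
combined = 0.1 * ecole.reward.LpIterations() + ecole.reward.NNodes()

env = ecole.environment.Branching(reward_function=combined)
```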