Rewards¶
Protocol¶
The protocol that a valid reward function is expected to implement is given below.

class ecole.typing.RewardFunction(*args, **kwds)[source]¶
Class responsible for extracting rewards.
Reward functions are objects given to the EnvironmentComposer to extract the reward used for learning.
obtain_reward(model: ecole.scip.Model, done: bool) → float[source]¶
Extract reward for arriving on given state.
Extract the reward for arriving on the state given by model. A reward is typically computed by transitioning from a state S1 to a state S2. For performance reasons, intermediate states are not kept. The reward function is responsible for keeping track of relevant information from previous states. This can safely be done in this method, as it will only be called once per state; i.e., this method is not a getter and can have side effects.
Note that the method is also called during reset(), after this function's own reset(), to obtain the reward_offset.
Parameters
model – The SCIP model defining the current state of the solver.
done – A flag indicating whether the state is terminal (as decided by the environment).
Returns
The return value is passed to the user by the environment.

reset(model: ecole.scip.Model) → None[source]¶
Reset internal data at the start of episodes.
The method is called by reset() on new episodes with the initial state. It is usually used to reset the reward function's internal data.
Parameters
model – The SCIP model defining the current state of the solver.
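
As an illustration, here is a minimal sketch of a custom reward function conforming to this protocol. The class and its attribute names are hypothetical, and subclassing RewardFunction is assumed to be optional: any object with these two methods conforms.

    import ecole.scip
    from ecole.typing import RewardFunction

    class StepPenalty(RewardFunction):
        """Hypothetical reward function: -1 on every transition."""

        def reset(self, model: ecole.scip.Model) -> None:
            # Called at the start of each episode: clear per-episode state.
            self.n_steps = 0

        def obtain_reward(self, model: ecole.scip.Model, done: bool = False) -> float:
            # Called once per state (including once after reset(), to
            # produce the reward_offset), so side effects are safe here.
            self.n_steps += 1
            return -1.0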

Listing¶
The list of reward functions relevant to users is given below.
Is Done¶

class ecole.reward.IsDone¶
Single reward on terminal states.

obtain_reward(self: ecole.reward.IsDone, model: ecole.scip.Model, done: bool = False) → float¶
Return 1 if the episode is on a terminal state, 0 otherwise.

reset(self: ecole.reward.IsDone, model: ecole.scip.Model) → None¶
Do nothing.
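
As a usage sketch (hedged: the ecole.environment.Branching class, its reward_function argument, and the reset() return shape are assumed from later Ecole releases, which replace the EnvironmentComposer named above; the instance file name is hypothetical):

    import ecole

    # IsDone yields 0 on every intermediate transition and 1 on the
    # terminal one, so the cumulative episode reward is exactly 1.
    env = ecole.environment.Branching(reward_function=ecole.reward.IsDone())

    # reset() also calls the reward function to compute reward_offset.
    obs, action_set, reward_offset, done, info = env.reset("problem.lp")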

LP Iterations¶

class ecole.reward.LpIterations¶
LP iteration difference.
The reward is defined as the number of iterations spent in solving the Linear Programs associated with the problem since the previous state.

obtain_reward(self: ecole.reward.LpIterations, model: ecole.scip.Model, done: bool = False) → float¶
Update the internal LP iteration count and return the difference.
The difference in LP iterations is computed between calls.

reset(self: ecole.reward.LpIterations, model: ecole.scip.Model) → None¶
Reset the internal LP iterations count.
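
Because each reward is a difference, summing the rewards over an episode recovers the total LP iterations. A sketch under the same assumptions as above (ecole.environment.Configuring, the reset() and step() return shapes, and the instance path are all assumptions):

    import ecole

    env = ecole.environment.Configuring(reward_function=ecole.reward.LpIterations())

    obs, action_set, reward_offset, done, info = env.reset("problem.lp")
    total_lp_iterations = reward_offset  # offset computed on the initial state
    while not done:
        # An empty parameter dict keeps SCIP defaults for this episode.
        obs, action_set, reward, done, info = env.step({})
        total_lp_iterations += reward  # per-step differences sum to the total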

NNodes¶

class ecole.reward.NNodes¶
Number of nodes difference.
The reward is defined as the total number of nodes processed since the previous state.

obtain_reward(self: ecole.reward.NNodes, model: ecole.scip.Model, done: bool = False) → float¶
Update the internal node count and return the difference.
The difference in the number of nodes is computed between calls.

reset(self: ecole.reward.NNodes, model: ecole.scip.Model) → None¶
Reset the internal node count.

Utilities¶
The following reward functions are used internally by Ecole.
Constant¶

class ecole.reward.Constant¶
Constant reward.
Always returns the value passed to the constructor.

obtain_reward(self: ecole.reward.Constant, model: ecole.scip.Model, done: bool = False) → float¶
Return the constant value.

reset(self: ecole.reward.Constant, model: ecole.scip.Model) → None¶
Do nothing.

Arithmetic¶

class ecole.reward.Arithmetic¶
Proxy class for doing arithmetic on reward functions.
An object of this class is returned by reward function operators to forward calls to the reward function operands of the operator.

obtain_reward(self: ecole.reward.Arithmetic, model: object, done: bool = False) → float¶
Obtain the reward resulting from the operation.
Calls obtain_reward on all reward function operands that were used to create this object and computes the operation on the results.

reset(self: ecole.reward.Arithmetic, model: object) → None¶
Reset the reward functions of the operator.
Calls reset on all reward function operands that were used to create this object.

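Arithmetic objects are not built directly; they result from applying Python operators to reward functions. A brief sketch (the exact set of supported operators is an assumption consistent with this proxy design):

    import ecole

    # Produces an ecole.reward.Arithmetic proxy: reset() and
    # obtain_reward() are forwarded to LpIterations, and the scaling
    # and exponentiation are applied to its result.
    # (Assumption: * and ** are among the supported operators.)
    reward_function = -3.5 * ecole.reward.LpIterations() ** 2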