Rewards
Interface
- class ecole.typing.RewardFunction(*args, **kwargs)[source]
Class responsible for extracting rewards.
Reward functions are objects given to the
Environment
to extract the reward used for learning. This class presents the interface expected to define a valid reward function. It is not necessary to inherit from this class, as reward functions are defined by structural subtyping; the class exists to support Python type hints.
Note
Rewards, or rather the reward offset, are also extracted on
reset()
. This has no use for learning (since no action has been taken), but it is useful when using the cumulative reward sum as a metric.
See also
DataFunction
Reward functions are a specific type of generic data function where the extracted data are rewards of type
float
.
- __init__(*args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
- before_reset(model: ecole.scip.Model) → None[source]
Reset internal data at the start of episodes.
The method is called on new episodes by
reset()
, right before the MDP is actually reset, that is, right before the environment calls
reset_dynamics()
. It is usually used to reset the internal data.
- Parameters
model – The
Model
, the model defining the current state of the solver.
- extract(model: ecole.scip.Model, done: bool) → float[source]
Extract the reward on the given state.
Extract the reward after transitioning to the new state given by
model
. The function is responsible for keeping track of relevant information from previous states. This can safely be done in this method, as it will only be called once per state, i.e., this method is not a getter and can have side effects.
- Parameters
model – The
Model
, the model defining the current state of the solver.
done – A flag indicating whether the state is terminal (as decided by the environment).
- Returns
The returned value is passed to the user by the environment.
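Since reward functions are defined by structural subtyping, any object providing matching before_reset and extract methods is a valid reward function. A minimal sketch (the step-counting reward below is purely illustrative, and the model parameter, an ecole.scip.Model in practice, is never inspected):

```python
# Minimal sketch of a custom reward function. No inheritance from
# ecole.typing.RewardFunction is required: providing before_reset and
# extract with the expected signatures is enough (structural subtyping).
class StepCounter:
    """Illustrative reward: the number of transitions taken this episode."""

    def before_reset(self, model):
        # Reset internal data at the start of each episode.
        self.count = 0

    def extract(self, model, done):
        # Called exactly once per state, so side effects are safe here.
        self.count += 1
        return float(self.count)
```

An instance of such a class could then be passed to an environment in place of a built-in reward function.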
Listing
The list of reward functions relevant to users is given below.
Is Done
- class ecole.reward.IsDone
Single reward on terminal states.
- __init__(self: ecole.reward.IsDone) → None
- before_reset(self: ecole.reward.IsDone, model: ecole.scip.Model) → None
Do nothing.
- extract(self: ecole.reward.IsDone, model: ecole.scip.Model, done: bool = False) → float
Return 1 if the episode is on a terminal state, 0 otherwise.
LP Iterations
- class ecole.reward.LpIterations
LP iterations difference.
The reward is defined as the number of iterations spent in solving the Linear Programs associated with the problem since the previous state.
- __init__(self: ecole.reward.LpIterations) → None
- before_reset(self: ecole.reward.LpIterations, model: ecole.scip.Model) → None
Reset the internal LP iterations count.
- extract(self: ecole.reward.LpIterations, model: ecole.scip.Model, done: bool = False) → float
Update the internal LP iteration count and return the difference.
The difference in LP iterations is computed between calls.
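The difference-between-calls pattern used here (and by NNodes and SolvingTime below) can be sketched in pure Python. FakeModel and its lp_iterations attribute are hypothetical stand-ins for querying the cumulative count from the SCIP model, not Ecole APIs:

```python
class FakeModel:
    """Hypothetical stand-in for ecole.scip.Model, exposing a cumulative
    LP iteration counter for illustration only."""

    def __init__(self):
        self.lp_iterations = 0

class LpIterationsSketch:
    """Sketch of a reward equal to LP iterations since the previous state."""

    def before_reset(self, model):
        # A new episode starts from a fresh count.
        self.last = 0

    def extract(self, model, done):
        # Return the increment since the last call, then remember the total.
        delta = model.lp_iterations - self.last
        self.last = model.lp_iterations
        return float(delta)
```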
NNodes
- class ecole.reward.NNodes
Number of nodes difference.
The reward is defined as the total number of nodes processed since the previous state.
- __init__(self: ecole.reward.NNodes) → None
- before_reset(self: ecole.reward.NNodes, model: ecole.scip.Model) → None
Reset the internal node count.
- extract(self: ecole.reward.NNodes, model: ecole.scip.Model, done: bool = False) → float
Update the internal node count and return the difference.
The difference in number of nodes is computed between calls.
Solving Time
- class ecole.reward.SolvingTime
Solving time difference.
The reward is defined as the number of seconds spent solving the instance since the previous state. The solving time is specific to the operating system: it includes time spent in
reset()
and time spent waiting on the agent.
- __init__(self: ecole.reward.SolvingTime, wall: bool = False) → None
Create a SolvingTime reward function.
- Parameters
wall – If True, the wall time will be used. If False (default), the process time will be used.
- before_reset(self: ecole.reward.SolvingTime, model: ecole.scip.Model) → None
Reset the internal clock counter.
- extract(self: ecole.reward.SolvingTime, model: ecole.scip.Model, done: bool = False) → float
Update the internal clock counter and return the difference.
The difference in solving time is computed between calls.
Primal and Dual Integrals
- class ecole.reward.PrimalIntegral
Primal integral difference.
The reward is defined as the primal integral since the previous state, where the integral is computed with respect to the solving time. The solving time is specific to the operating system: it includes time spent in
reset()
and time spent waiting on the agent.
- __init__(self: ecole.reward.PrimalIntegral, wall: bool = False, bound_function: Callable[[ecole.scip.Model], Tuple[float, float]] = None) → None
Create a PrimalIntegral reward function.
- Parameters
wall – If True, the wall time will be used. If False (default), the process time will be used.
bound_function – A function which takes an ecole model and returns the offset and the initial primal bound with respect to which the primal integral is computed, ordered as (offset, initial_primal_bound). The default function returns (0, -1e20) if the problem is a maximization and (0, 1e20) otherwise.
- before_reset(self: ecole.reward.PrimalIntegral, model: ecole.scip.Model) → None
Reset the internal clock counter and the event handler.
- extract(self: ecole.reward.PrimalIntegral, model: ecole.scip.Model, done: bool = False) → float
Computes the current primal integral and returns the difference.
The difference is computed based on the primal integral between sequential calls.
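A custom bound_function might look like the following sketch; tight_primal_bounds is a hypothetical name, and the returned values are hard-coded for illustration rather than read from the model:

```python
def tight_primal_bounds(model):
    # Returns (offset, initial_primal_bound) for the primal integral.
    # Suppose the instance is a minimization problem whose objective is
    # known to lie in [0, 1e6]; these tighter values replace the
    # default (0, 1e20).
    return (0.0, 1e6)

# The function would then be passed at construction, e.g.:
# reward_function = ecole.reward.PrimalIntegral(bound_function=tight_primal_bounds)
```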
- class ecole.reward.DualIntegral
Dual integral difference.
The reward is defined as the dual integral since the previous state, where the integral is computed with respect to the solving time. The solving time is specific to the operating system: it includes time spent in
reset()
and time spent waiting on the agent.
- __init__(self: ecole.reward.DualIntegral, wall: bool = False, bound_function: Callable[[ecole.scip.Model], Tuple[float, float]] = None) → None
Create a DualIntegral reward function.
- Parameters
wall – If True, the wall time will be used. If False (default), the process time will be used.
bound_function – A function which takes an ecole model and returns the offset and the initial dual bound with respect to which the dual integral is computed, ordered as (offset, initial_dual_bound). The default function returns (0, 1e20) if the problem is a maximization and (0, -1e20) otherwise.
- before_reset(self: ecole.reward.DualIntegral, model: ecole.scip.Model) → None
Reset the internal clock counter and the event handler.
- extract(self: ecole.reward.DualIntegral, model: ecole.scip.Model, done: bool = False) → float
Computes the current dual integral and returns the difference.
The difference is computed based on the dual integral between sequential calls.
- class ecole.reward.PrimalDualIntegral
Primal-dual integral difference.
The reward is defined as the primal-dual integral since the previous state, where the integral is computed with respect to the solving time. The solving time is specific to the operating system: it includes time spent in
reset()
and time spent waiting on the agent.
- __init__(self: ecole.reward.PrimalDualIntegral, wall: bool = False, bound_function: Callable[[ecole.scip.Model], Tuple[float, float]] = None) → None
Create a PrimalDualIntegral reward function.
- Parameters
wall – If True, the wall time will be used. If False (default), the process time will be used.
bound_function – A function which takes an ecole model and returns a tuple of an initial primal bound and dual bound. Values should be ordered as (initial_primal_bound, initial_dual_bound). The default function returns (-1e20, 1e20) if the problem is a maximization and (1e20, -1e20) otherwise.
- before_reset(self: ecole.reward.PrimalDualIntegral, model: ecole.scip.Model) → None
Reset the internal clock counter and the event handler.
- extract(self: ecole.reward.PrimalDualIntegral, model: ecole.scip.Model, done: bool = False) → float
Computes the current primal-dual integral and returns the difference.
The difference is computed based on the primal-dual integral between sequential calls.
Utilities
The following reward functions are used internally by Ecole.
Constant
- class ecole.reward.Constant
Constant Reward.
Always returns the value passed in constructor.
- __init__(self: ecole.reward.Constant, constant: float = 0.0) → None
- before_reset(self: ecole.reward.Constant, model: ecole.scip.Model) → None
Do nothing.
- extract(self: ecole.reward.Constant, model: ecole.scip.Model, done: bool = False) → float
Return the constant value.
Arithmetic
- class ecole.reward.Arithmetic
Proxy class for doing arithmetic on reward functions.
An object of this class is returned by reward function operators to forward calls to the reward function parameters of the operator.
- __init__(self: ecole.reward.Arithmetic, arg0: object, arg1: list, arg2: str) → None
- before_reset(self: ecole.reward.Arithmetic, model: object) → None
Reset the reward functions of the operator.
Calls
before_reset
on all reward function parameters that were used to create this object.
- extract(self: ecole.reward.Arithmetic, model: object, done: bool = False) → float
Obtain the reward resulting from the operator.
Calls
extract
on all reward function parameters that were used to create this object and computes the operation on the results.
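The forwarding behaviour can be sketched in pure Python. This is a simplified illustration, not the real implementation: ecole.reward.Arithmetic is a compiled extension type, is obtained through operators on reward functions rather than constructed by hand, and also accepts plain numbers as operands. ConstantSketch and ArithmeticSketch are hypothetical names:

```python
import operator

class ConstantSketch:
    """Stand-in operand returning a fixed reward, for illustration."""

    def __init__(self, value):
        self.value = value

    def before_reset(self, model):
        pass

    def extract(self, model, done=False):
        return self.value

class ArithmeticSketch:
    """Forwards before_reset/extract to its operands, then combines."""

    def __init__(self, operation, functions):
        self.operation = operation
        self.functions = functions

    def before_reset(self, model):
        # Reset every reward function used to build this expression.
        for function in self.functions:
            function.before_reset(model)

    def extract(self, model, done=False):
        # Extract each operand's reward, then apply the operation.
        return self.operation(*(f.extract(model, done) for f in self.functions))
```

Under these assumptions, the sum of two reward functions would behave like ArithmeticSketch(operator.add, [lhs, rhs]).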