Create New Functions
ObservationFunction and RewardFunction objects can be adapted and created from Python.
At the core of the environment, a SCIP Model (an abstraction equivalent to a pyscipopt.Model in Python or a SCIP* in C) describes the state of the environment.
The idea of observation and reward functions is to have a function that takes a Model as input and returns the desired value (an observation or a reward).
The environment itself does nothing more than call these functions and forward their output to the user.
Practically speaking, it is more convenient to implement such functions as a class rather than a function, as a class makes it easier to keep information between states.
Extending a Function
To reuse a function, Python inheritance can be used. For example, the method of an observation function that extracts the features from the model is called extract().
In the following example, we will extend the NodeBipartite observation function by overloading its extract() method to scale the features by their maximum absolute value.
import numpy as np
from ecole.observation import NodeBipartite


class ScaledNodeBipartite(NodeBipartite):
    def extract(self, model, done):
        # Call the parent method to get the original observation
        obs = super().extract(model, done)
        # Scale each feature column by its maximum absolute value
        column_max_abs = np.abs(obs.column_features).max(0)
        obs.column_features[:] /= column_max_abs
        row_max_abs = np.abs(obs.row_features).max(0)
        obs.row_features[:] /= row_max_abs
        # Return the updated observation
        return obs
By using inheritance, we used NodeBipartite's own extract() to do the heavy lifting, only appending the additional scaling code.
The resulting ScaledNodeBipartite class is a perfectly valid observation function that can be given to an environment.
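The scaling step itself relies only on NumPy. As a minimal standalone sketch (with a made-up feature matrix), this is what dividing by np.abs(...).max(0) does:

```python
import numpy as np

# Hypothetical feature matrix: 2 rows, 2 feature columns
features = np.array([[1.0, -4.0],
                     [3.0, 2.0]])

# Maximum absolute value of each column (reduction over axis 0)
column_max_abs = np.abs(features).max(0)  # array([3., 4.])

# Dividing by it maps every column into the [-1, 1] range
scaled = features / column_max_abs
```

The same per-column broadcasting is what the example above applies to obs.column_features and obs.row_features.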
As an additional example, instead of scaling by the maximum absolute value, one might want to use a scaling factor smoothed by an exponential moving average with some coefficient α. This illustrates how the class paradigm is useful for saving information between states.
class MovingScaledNodeBipartite(NodeBipartite):
    def __init__(self, alpha, *args, **kwargs):
        # Construct the parent class with the other parameters
        super().__init__(*args, **kwargs)
        self.alpha = alpha

    def before_reset(self, model):
        super().before_reset(model)
        # Reset the exponential moving average (EMA) on new episodes
        self.column_ema = None
        self.row_ema = None

    def extract(self, model, done):
        obs = super().extract(model, done)
        # Compute the max absolute vector for the current observation
        column_max_abs = np.abs(obs.column_features).max(0)
        row_max_abs = np.abs(obs.row_features).max(0)
        if self.column_ema is None:
            # Start a new exponential moving average on a new episode
            self.column_ema = column_max_abs
            self.row_ema = row_max_abs
        else:
            # Update the exponential moving average
            self.column_ema = self.alpha * column_max_abs + (1 - self.alpha) * self.column_ema
            self.row_ema = self.alpha * row_max_abs + (1 - self.alpha) * self.row_ema
        # Scale the features and return the new observation
        obs.column_features[:] /= self.column_ema
        obs.row_features[:] /= self.row_ema
        return obs
Here, you can notice how we used the constructor to customize the coefficient of the exponential moving average.
Note also that we overrode the before_reset() method, which does not return anything: this method is called at the beginning of each episode by reset() and is used to reinitialize the class' internal attributes on new episodes.
Finally, extract() is also called during reset(), hence the if/else condition.
Both methods call the parent method to let it do its own initialization and resetting.
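The EMA update rule itself is plain arithmetic; a small standalone sketch (scalar values and the numbers are illustrative) of the smoothing used above:

```python
def ema_update(ema, value, alpha):
    """One EMA step: weight the new value by alpha, the running average by (1 - alpha)."""
    if ema is None:  # the first observation of an episode starts the average
        return value
    return alpha * value + (1 - alpha) * ema

ema = None
for value in [10.0, 20.0, 30.0]:
    ema = ema_update(ema, value, alpha=0.5)
# ema is now 22.5: recent values weigh more than older ones
```

The `None` sentinel plays the same role as the attributes cleared in before_reset().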
Warning
The scaling shown in this example is a naive implementation meant to showcase the use of observation functions. For proper scaling functions, consider Scikit-Learn scalers.
Writing a Function from Scratch
The ObservationFunction
and RewardFunction
classes don’t do
anything more than what is explained in the previous section.
This means that to create new function in Python, one can simply create a class with the previous
methods.
For instance, we can create a StochasticReward function that will wrap any given RewardFunction and, with some probability, return either the given reward or 0.
import random


class StochasticReward:
    def __init__(self, reward_function, probability=0.05):
        self.reward_function = reward_function
        self.probability = probability

    def before_reset(self, model):
        self.reward_function.before_reset(model)

    def extract(self, model, done):
        # Unconditionally extract the reward, as reward_function.extract may have side effects
        reward = self.reward_function.extract(model, done)
        if random.random() < self.probability:
            return 0.0
        else:
            return reward
The resulting class is a perfectly valid reward function which can be used in any environment, for example as follows.
>>> stochastic_lpiterations = StochasticReward(-ecole.reward.LpIterations(), probability=0.1)
>>> env = ecole.environment.Branching(reward_function=stochastic_lpiterations)
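Because only the method names matter, the wrapper can also be exercised without a solver by pairing it with a dummy reward function. A sketch (the StochasticReward class is repeated so the snippet is self-contained; ConstantReward and the fixed seed are made up for illustration):

```python
import random


class StochasticReward:
    # Same wrapper as above, repeated so this sketch is self-contained
    def __init__(self, reward_function, probability=0.05):
        self.reward_function = reward_function
        self.probability = probability

    def before_reset(self, model):
        self.reward_function.before_reset(model)

    def extract(self, model, done):
        reward = self.reward_function.extract(model, done)
        return 0.0 if random.random() < self.probability else reward


class ConstantReward:
    # Hypothetical stand-in for a real Ecole reward function
    def before_reset(self, model):
        pass

    def extract(self, model, done):
        return 1.0


random.seed(0)  # fixed seed, for illustration only
wrapped = StochasticReward(ConstantReward(), probability=0.5)
wrapped.before_reset(None)  # no solver needed for this sketch
rewards = [wrapped.extract(None, False) for _ in range(100)]
# rewards is a mix of 0.0 (zeroed out) and 1.0 (passed through)
```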
Using PySCIPOpt
The extraction functions described on this page, by definition, aim to extract information from the solver about the state of the process. An excellent reason to create or extend a reward function is to access information not provided by the default functions in Ecole. To do so in Python, one might want to use PySCIPOpt, the official Python interface to SCIP.
In PySCIPOpt, the state of the SCIP solver is stored in a pyscipopt.Model object. This is closely related to, but not quite the same as, Ecole's Model class. For a number of reasons (such as C++ compatibility), the two classes don't coincide. However, for ease of use, it is possible to convert back and forth without any copy.
Using ecole.scip.Model.as_pyscipopt(), one can get a pyscipopt.Model that shares its internal data with the ecole.scip.Model.
Conversely, given a pyscipopt.Model, it is possible to create an ecole.scip.Model using the static method ecole.scip.Model.from_pyscipopt().
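For example, a reward function might read a solver statistic that the default functions do not expose. A sketch under these assumptions: the class name is made up, the environment passes an ecole.scip.Model as the model argument, and getNNodes() is the PySCIPOpt method returning the number of processed branch-and-bound nodes:

```python
class NodeCountReward:
    # Hypothetical reward function: the negative number of processed
    # branch-and-bound nodes, read through PySCIPOpt.
    def before_reset(self, model):
        pass

    def extract(self, model, done):
        # as_pyscipopt() shares the underlying SCIP data, no copy is made
        pyscipopt_model = model.as_pyscipopt()
        return -pyscipopt_model.getNNodes()
```

Like the StochasticReward above, this class needs no inheritance; having before_reset() and extract() is enough to pass it as reward_function to an environment.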