Create New Environments
Environment Structure
In Ecole, it is possible to customize the reward or observation returned by the environment. These components are structured in RewardFunction and ObservationFunction classes that are independent from the rest of the environment. We call what is left, that is, the environment without rewards or observations, the environment's Dynamics. In other words, the dynamics define the bare-bones transitions of the Markov Decision Process.
Dynamics have an interface similar to environments, but with different input parameters and return types. In fact, environments are wrappers around dynamics classes that drive the following orchestration:
1. Environments store the state as a Model;
2. Then, they forward the Model to the Dynamics to start a new episode or perform a transition, and receive an action set;
3. Next, they forward the Model to the RewardFunction and ObservationFunction to receive an observation and reward;
4. Finally, they return everything to the user.
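The orchestration above can be pictured with a small pure-Python sketch. It does not use Ecole: ToyModel, CountdownDynamics, ValueObservation, StepPenalty and ToyEnvironment are hypothetical stand-ins, and the returned tuples are simplified compared with what Ecole environments actually return.

```python
class ToyModel:
    """Hypothetical stand-in for the state that Ecole stores as a Model."""

    def __init__(self, value):
        self.value = value


class CountdownDynamics:
    """Hypothetical dynamics: bare transitions, without reward or observation."""

    def reset_dynamics(self, model):
        # Start an episode and report the done flag and the action set.
        return model.value <= 0, [1, 2]

    def step_dynamics(self, model, action):
        # Apply the action to the state, then report done flag and action set.
        model.value -= action
        return model.value <= 0, [1, 2]


class ValueObservation:
    """Hypothetical observation function: expose the remaining value."""

    def extract(self, model, done):
        return model.value


class StepPenalty:
    """Hypothetical reward function: a constant cost per call."""

    def extract(self, model, done):
        return -1.0


class ToyEnvironment:
    """Hypothetical wrapper that drives the orchestration listed above."""

    def __init__(self, dynamics, observation_function, reward_function):
        self.dynamics = dynamics
        self.observation_function = observation_function
        self.reward_function = reward_function
        self.model = None

    def _extract(self, done, action_set):
        # Forward the state to the reward and observation functions,
        # then hand everything back to the caller.
        reward = self.reward_function.extract(self.model, done)
        observation = self.observation_function.extract(self.model, done)
        return observation, action_set, reward, done

    def reset(self, start_value):
        self.model = ToyModel(start_value)  # store the state
        done, action_set = self.dynamics.reset_dynamics(self.model)
        return self._extract(done, action_set)

    def step(self, action):
        done, action_set = self.dynamics.step_dynamics(self.model, action)
        return self._extract(done, action_set)


env = ToyEnvironment(CountdownDynamics(), ValueObservation(), StepPenalty())
observation, action_set, reward, done = env.reset(3)
while not done:
    # Always pick the first available action in this toy episode.
    observation, action_set, reward, done = env.step(action_set[0])
```

Note that the dynamics never compute a reward or an observation; swapping StepPenalty or ValueObservation for another class leaves the transitions untouched.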
One substantial difference between the environment and the dynamics is the seeding behavior. Because seeding is a subtle topic, it is discussed separately in Seeding.
Creating Dynamics
Reset and Step
Creating dynamics is very similar to creating reward and observation functions.
It can be done from scratch or by inheriting an existing one.
The following examples show how we can inherit from the BranchingDynamics class to deactivate cutting planes and presolving in SCIP.
Note
One can also more directly deactivate SCIP parameters through the environment constructor.
Given that there is a large number of parameters to change, we want to use one of SCIP's default modes by calling SCIPsetPresolving and SCIPsetSeparating through PySCIPOpt (SCIP doc).
We will do so by overriding reset_dynamics(), which gets called by reset(). The similar method step_dynamics(), which is called by step(), does not need to be changed in this example, so we do not override it.
import ecole
from pyscipopt.scip import PY_SCIP_PARAMSETTING


class SimpleBranchingDynamics(ecole.dynamics.BranchingDynamics):
    def reset_dynamics(self, model):
        # Share memory with Ecole model
        pyscipopt_model = model.as_pyscipopt()

        pyscipopt_model.setPresolve(PY_SCIP_PARAMSETTING.OFF)
        pyscipopt_model.setSeparating(PY_SCIP_PARAMSETTING.OFF)

        # Let the parent class get the model at the root node and return
        # the done flag / action_set
        return super().reset_dynamics(model)
With our SimpleBranchingDynamics class we have defined what we want the solver to do. Now, to use it as a full environment that can manage observations and rewards, we wrap it in an Environment.
class SimpleBranching(ecole.environment.Environment):
    __Dynamics__ = SimpleBranchingDynamics
The resulting SimpleBranching class is then an environment as valid as any other in Ecole.
Passing parameters
We can make the previous example more flexible by deciding what we want to disable. To do so, we will take parameters in the constructor.
class SimpleBranchingDynamics(ecole.dynamics.BranchingDynamics):
    def __init__(self, disable_presolve=True, disable_cuts=True, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.disable_presolve = disable_presolve
        self.disable_cuts = disable_cuts

    def reset_dynamics(self, model):
        # Share memory with Ecole model
        pyscipopt_model = model.as_pyscipopt()

        if self.disable_presolve:
            pyscipopt_model.setPresolve(PY_SCIP_PARAMSETTING.OFF)
        if self.disable_cuts:
            pyscipopt_model.setSeparating(PY_SCIP_PARAMSETTING.OFF)

        # Let the parent class get the model at the root node and return
        # the done flag / action_set
        return super().reset_dynamics(model)


class SimpleBranching(ecole.environment.Environment):
    __Dynamics__ = SimpleBranchingDynamics
The extra constructor arguments are forwarded from the environment's __init__() to the dynamics constructor:
env = SimpleBranching(observation_function=None, disable_cuts=False)
Similarly, extra arguments given to the environment's reset() and step() are forwarded to the associated Dynamics methods.
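The forwarding described above can be sketched without Ecole as follows. ToyDynamics, ToyEnvironment, and the step_size, offset and repeat parameters are hypothetical names chosen for illustration; the sketch only assumes that the environment keeps its own keyword arguments and passes everything else on to the dynamics.

```python
class ToyDynamics:
    """Hypothetical dynamics with one constructor parameter."""

    def __init__(self, step_size=1):
        self.step_size = step_size

    def reset_dynamics(self, state, offset=0):
        # `offset` is an extra argument received from the environment's reset().
        state["value"] += offset
        return state["value"] <= 0

    def step_dynamics(self, state, action, repeat=1):
        # `repeat` is an extra argument received from the environment's step().
        state["value"] -= action * self.step_size * repeat
        return state["value"] <= 0


class ToyEnvironment:
    """Hypothetical environment that forwards unknown arguments."""

    __Dynamics__ = ToyDynamics

    def __init__(self, observation_function=None, **dynamics_kwargs):
        # The environment consumes its own arguments...
        self.observation_function = observation_function
        # ...and forwards the remaining ones to the dynamics constructor.
        self.dynamics = self.__Dynamics__(**dynamics_kwargs)
        self.state = None

    def reset(self, start_value, *dynamics_args, **dynamics_kwargs):
        self.state = {"value": start_value}
        return self.dynamics.reset_dynamics(
            self.state, *dynamics_args, **dynamics_kwargs
        )

    def step(self, action, *dynamics_args, **dynamics_kwargs):
        return self.dynamics.step_dynamics(
            self.state, action, *dynamics_args, **dynamics_kwargs
        )


env = ToyEnvironment(observation_function=None, step_size=2)
done_after_reset = env.reset(10, offset=-4)  # the dynamics receive offset=-4
done_after_step = env.step(1, repeat=2)  # the dynamics receive repeat=2
```

Keeping the forwarding generic means that a new parameter added to the dynamics needs no change in the wrapping environment.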