Create New Environments
Environment Structure
In Ecole, it is possible to customize the reward or
observation returned by the environment. These components are structured in
RewardFunction
and ObservationFunction
classes that are
independent from the rest of the environment. We call what is left, that is, the environment without rewards
or observations, the environment’s Dynamics
.
In other words, the dynamics define the bare bone transitions of the Markov Decision Process.
Dynamics have an interface similar to environments, but with different input parameters and return types. In fact environments are wrappers around dynamics classes that drive the following orchestration:
Environments store the state as a
Model
;Then, they forward the
Model
to theDynamics
to start a new episode or transition to receive an action set;Next, they forward the
Model
to theRewardFunction
andObservationFunction
to receive an observation and reward;Finally, return everything to the user.
One susbtantial difference between the environment and the dynamics is the seeding behavior. Given that this is not an easy topic, it is discussed in Seeding.
Creating Dynamics
Reset and Step
Creating dynamics is very similar to creating reward and observation functions.
It can be done from scratch or by inheriting an existing one.
The following examples show how we can inherit a BranchingDynamics
class to
deactivate cutting planes and presolving in SCIP.
Note
One can also more directly deactivate SCIP parameters through the environment constructor.
Given that there is a large number of parameters to change, we want to use one of SCIP default’s modes
by calling SCIPsetPresolving
and SCIPsetSeparating
through PyScipOpt
(SCIP doc).
We will do so by overriding reset_dynamics()
, which
gets called by reset()
.
The similar method step_dynamics()
, which is called
by step()
, does not need to be changed in this
example, so we do not override it.
import ecole
from pyscipopt.scip import PY_SCIP_PARAMSETTING
class SimpleBranchingDynamics(ecole.dynamics.BranchingDynamics):
def reset_dynamics(self, model):
# Share memory with Ecole model
pyscipopt_model = model.as_pyscipopt()
pyscipopt_model.setPresolve(PY_SCIP_PARAMSETTING.OFF)
pyscipopt_model.setSeparating(PY_SCIP_PARAMSETTING.OFF)
# Let the parent class get the model at the root node and return
# the done flag / action_set
return super().reset_dynamics(model)
With our SimpleBranchingDynamics
class we have defined what we want the solver to do.
Now, to use it as a full environment that can manage observations and rewards, we wrap it in an
Environment
.
class SimpleBranching(ecole.environment.Environment):
__Dynamics__ = SimpleBranchingDynamics
The resulting SimpleBranching
class is then an environment as valid as any other in Ecole.
Passing parameters
We can make the previous example more flexible by deciding what we want to disable. To do so, we will take parameters in the constructor.
class SimpleBranchingDynamics(ecole.dynamics.BranchingDynamics):
def __init__(self, disable_presolve=True, disable_cuts=True, *args, **kwargs):
super().__init__(*args, **kwargs)
self.disable_presolve = disable_presolve
self.disable_cuts = disable_cuts
def reset_dynamics(self, model):
# Share memory with Ecole model
pyscipopt_model = model.as_pyscipopt()
if self.disable_presolve:
pyscipopt_model.setPresolve(PY_SCIP_PARAMSETTING.OFF)
if self.disable_cuts:
pyscipopt_model.setSeparating(PY_SCIP_PARAMSETTING.OFF)
# Let the parent class get the model at the root node and return
# the done flag / action_set
return super().reset_dynamics(model)
class SimpleBranching(ecole.environment.Environment):
__Dynamics__ = SimpleBranchingDynamics
The constructor arguments are forwarded from the __init__()
constructor:
env = SimpleBranching(observation_function=None, disable_cuts=False)
Similarily, extra arguments given to the environemnt reset()
and
step()
are forwarded to the associated
Dynamics
methods.
Using Control Inversion
When using a traditional SCIP callback, the user has to add the callback to SCIP, call SCIPsolve
, and wait for the
solving process to terminate.
We say that SCIP has the control.
This has some downsides, such a having to forward all the data the agent will use to the callback, making it harder to
stop the solving process, and reduce interactivity.
For instance when using a callback in a notebook, if the user forgot to fetch some data, then they have to re-execute
the whole solving process.
On the contrary, when using an Ecole environment such as Branching
, the environment
pauses on every branch-and-bound node (i.e. every branchrule callback call) to let the user make a decision,
or inspect the Model
.
We say that the user (or the agent) has the control.
To do so, we did not reconstruct the solving algorithm SCIPsolve
to fit our needs.
Rather, we have implemented a general inversion of control mechanism to let SCIP pause and be resumed on every
callback call (using a form of stackful coroutine).
We call this approach iterative solving and it runs exactly the same SCIPsolve
algorithm, without noticable
overhead, while perfectly forwarding all information available in the callback.
To use this tool, the user start by calling ecole.scip.Model.solve_iter()
, with a set of call callback
constructor arguments.
Iterative solving will then add these callbacks, start solving, and return the first time that one of these callback
is executed.
The return value describes where the solving has stopped, and the parameters of the callback where it has stopped.
This is the time for the user to perform whichever action they would have done in the callback.
Solving can be resumed by calling ecole.scip.Model.solve_iter_continue()
with the
ecole.scip.callback.Result
that would have been set in the callback.
Solving is finished when one of the iterative solving function returns None
.
The ecole.scip.Model
can safely be deleted an any time (SCIP termination is handled automatically).
For instance, iterative solving an environement while pausing on branchrule and heuristic callbacks look like the following.
model = ecole.scip.Model.from_file("path/to/file")
# Start solving until the first pause, if any.
fcall = model.solve_iter(
# Stop on branchrule callback.
ecole.scip.callback.BranchruleConstructor(),
# Stop on heuristic callback after node.
ecole.scip.callback.HeuristicConstructor(timing_mask=ecole.scip.HeurTiming.AfterNode),
)
# While solving is not finished, `fcall` contains information about the current stop.
while fcall is not None:
# Solving stopped on a branchrule callback.
if isinstance(fcall, ecole.scip.callback.BranchruleCall):
# Perform some branching (through PyScipOpt).
...
# Resume solving until next pause.
fcall = model.solve_iter_continue(ecole.scip.callback.Result.Branched)
# Solving stopped on a heurisitc callback.
elif isinstance(fcall, ecole.scip.callback.HeuristicCall):
# Return as no heuristic was performed (only data collection)
fcall = model.solve_iter_continue(ecole.scip.callback.Result.DidNotRun)
See BranchruleConstructor
, HeuristicConstructor
for
callback constructor parameters, as well as BranchruleCall
and
BranchruleCall
for callbacks functions parameters passed by SCIP to the callback
methods.
Note
By default callback parameters such as priority
, frequency
, and max_depth
taht control how when
the callback are evaluated by SCIP are set to run as often as possible.
However, it is entirely possible to run it with lower priority or frequency for create specific environments or
whatever other purpose.
To create dynamics using iterative solving, one should call ecole.scip.Model.solve_iter()
in
reset_dynamics()
and ecole.scip.Model.solve_iter_continue()
in
step_dynamics()
.
For instance, a branching environment could be created with the following dynamics.
class MyBranchingDynamics:
def __init__(self, pseudo_candidates=False, max_depth=ecole.scip.callback.max_depth_none):
self.pseudo_candidates = pseudo_candidates
self.max_depth = max_depth
def action_set(self, model):
if self.pseudo_candidates:
return model.as_pyscipopt().getPseudoBranchCands()
else:
return model.as_pyscipopt().getLPBranchCands()
return ...
def reset_dynamics(self, model):
fcall = model.solve_iter(
ecole.scip.callback.BranchruleConstructor(max_depth=self.max_depth)
)
return (fcall is None), self.action_set(model)
def step_dynamics(self, model, action):
model.as_pyscipopt().branchVar(action)
fcall = model.solve_iter_continue(ecole.scip.callback.Result.Branched)
return (fcall is None), self.action_set(model)