Environments
Interface
- class ecole.environment.Environment(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)
Ecole Partially Observable Markov Decision Process (POMDP).
Similar to OpenAI Gym, environments represent the task that an agent is supposed to solve. For maximum customizability, different components are composed/orchestrated in this class.
- __init__(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs) -> None
Create a new environment object.
- Parameters
observation_function – An object of type ObservationFunction used to customize the observation returned by reset() and step().
reward_function – An object of type RewardFunction used to customize the reward returned by reset() and step().
information_function – An object of type InformationFunction used to customize the additional information returned by reset() and step().
scip_params – Parameters set on the underlying Model at the start of every episode.
**dynamics_kwargs – Other arguments are passed to the constructor of the Dynamics.
- reset(instance, *dynamics_args, **dynamics_kwargs)
Start a new episode.
This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.
- Parameters
instance – The combinatorial optimization problem to tackle during the newly started episode. Either a file path to an instance that can be read by SCIP, or a Model whose problem definition data will be copied.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.
- Returns
observation – The observation extracted from the initial state. Typically used to take the next action.
action_set – An optional subset that defines which actions are accepted in the next transition. For some environments, the action set may change at every transition.
reward_offset – An offset on the total cumulated reward, a.k.a. the initial reward. This reward does not impact learning (as no action has yet been taken) but can nonetheless be used for evaluation purposes. For example, in the total cumulated reward of an episode one may want to account for computations that happened during reset() (e.g. computation time, number of LP iterations in presolving…).
done – A boolean flag indicating whether the current state is terminal. If this flag is true, then the current episode is finished, and step() cannot be called any more.
info – A collection of environment-specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
- seed(value: int) -> None
Set the random seed of the environment.
The random seed is used to seed the environment RandomGenerator. At every call to reset(), the random generator is used to create new seeds for the solver. Setting the seed once will ensure determinism for the next trajectories. By default, the random generator is initialized by the random module.
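The seeding behaviour described for seed() can be illustrated with a small sketch. It does not use Ecole: `SeededEnv` is a hypothetical stand-in, and Python's `random.Random` plays the role of the environment RandomGenerator. It only shows why seeding once makes a whole sequence of episodes reproducible.

```python
import random


class SeededEnv:
    """Hypothetical stand-in: draws a new solver seed from its generator at every reset()."""

    def __init__(self):
        # Seeded from system entropy unless seed() is called explicitly.
        self.rng = random.Random()

    def seed(self, value):
        self.rng.seed(value)

    def reset(self):
        # Each episode gets its own solver seed, derived from the environment generator.
        return self.rng.randrange(2**31)


def solver_seeds(seed_value, n_episodes):
    """Seed a fresh environment once, then collect the solver seed of each episode."""
    env = SeededEnv()
    env.seed(seed_value)
    return [env.reset() for _ in range(n_episodes)]


first_run = solver_seeds(42, n_episodes=3)
second_run = solver_seeds(42, n_episodes=3)
```

Both runs produce the same list of per-episode seeds, because the only source of randomness is the generator that was seeded once.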
- step(action, *dynamics_args, **dynamics_kwargs)
Transition from one state to another.
This method takes a user action to transition from the current state to the next. The method cannot be called if the environment has not been reset since its instantiation or since a terminal state has been reached.
- Parameters
action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset()), then the action must comply with the action set.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.
- Returns
observation – The observation extracted from the current state. Typically used to take the next action.
action_set – An optional subset that defines which actions are accepted in the next transition. For some environments, the action set may change at every transition.
reward – A real number to use for reinforcement learning.
done – A boolean flag indicating whether the current state is terminal. If this flag is true, then the current episode is finished, and step() cannot be called any more.
info – A collection of environment-specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
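The reset()/step() contract can be sketched as a standard episode loop. The block below is a minimal sketch, not Ecole itself: `ToyEnvironment` is a hypothetical stand-in that only mimics the five-value return tuples documented here, so the loop runs without SCIP. The path "instance.lp" is a placeholder. With Ecole, a concrete environment such as ecole.environment.Branching would take the place of the stand-in.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in that mimics the documented reset()/step() return tuples."""

    def __init__(self, n_steps=3):
        self.n_steps = n_steps
        self.remaining = 0

    def reset(self, instance):
        # A real environment would load `instance` into the solver here.
        self.remaining = self.n_steps
        observation, action_set = None, [0, 1, 2]
        reward_offset, done, info = 0.0, self.remaining == 0, {}
        return observation, action_set, reward_offset, done, info

    def step(self, action):
        if self.remaining == 0:
            raise RuntimeError("step() called on a terminal state; call reset() first")
        self.remaining -= 1
        done = self.remaining == 0
        action_set = None if done else [0, 1, 2]
        return None, action_set, -1.0, done, {}


def run_episode(env, instance, rng):
    """Run one episode and return the total cumulated reward and the step count."""
    observation, action_set, reward_offset, done, info = env.reset(instance)
    total_reward, n_steps = reward_offset, 0
    while not done:
        action = rng.choice(action_set)  # the action must comply with the action set
        observation, action_set, reward, done, info = env.step(action)
        total_reward += reward
        n_steps += 1
    return total_reward, n_steps


total, steps = run_episode(ToyEnvironment(n_steps=3), "instance.lp", random.Random(0))
```

The loop checks `done` right after reset() because an episode may already be terminal before any action is taken, and it starts the cumulated reward from `reward_offset` so that work done during reset() is accounted for.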
Protocol
- class ecole.typing.Dynamics(*args, **kwargs)
Dynamics are raw environments.
The class is a bare ecole.environment.Environment without rewards, observations, and other utilities. It defines the state transitions of a Markov Decision Process, that is, the series of steps and possible actions of the environment.
- __init__(*args, **kwargs)
- reset_dynamics(model: ecole.scip.Model) -> Tuple[bool, ecole.typing.ActionSet]
Start a new episode.
This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.
- Parameters
model – The SCIP model that will be used through the episode.
- Returns
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step_dynamics() cannot be called.
action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.
- set_dynamics_random_state(model: ecole.scip.Model, rng: ecole.RandomGenerator) -> None
Set the random state of the episode.
This method is called by reset() to set all the random elements of the dynamics for the upcoming episode. The random generator is kept between episodes in order to sample different episodes.
- Parameters
model – The SCIP model that will be used through the episode.
rng – The random generator used by the environment from which random numbers can be extracted.
- step_dynamics(model: ecole.scip.Model, action: ecole.typing.Action) -> Tuple[bool, ecole.typing.ActionSet]
Transition from one state to another.
This method takes the user action to transition from the current state to the next. The method cannot be called if the dynamics has not been reset since its instantiation or is in a terminal state.
- Parameters
action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset_dynamics()), then the action must be in that set.
- Returns
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset_dynamics() has been called.
action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.
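The three methods of the Dynamics protocol can be sketched on a toy problem. This is an illustration under stated assumptions, not an Ecole implementation: `CountdownDynamics` is hypothetical, a plain dict stands in for ecole.scip.Model, `random.Random` stands in for ecole.RandomGenerator, and calling set_dynamics_random_state() before reset_dynamics() is an assumed order.

```python
import random


class CountdownDynamics:
    """Hypothetical dynamics following the documented protocol, on a toy dict 'model'."""

    def __init__(self, n_steps=2):
        self.n_steps = n_steps

    def set_dynamics_random_state(self, model, rng):
        # Draw the episode's random elements from the generator owned by the environment.
        model["seed"] = rng.randrange(2**31)

    def reset_dynamics(self, model):
        model["remaining"] = self.n_steps
        done = model["remaining"] == 0
        action_set = None if done else ["decrement"]
        return done, action_set

    def step_dynamics(self, model, action):
        if action != "decrement":
            raise ValueError("action must be in the latest action set")
        model["remaining"] -= 1
        done = model["remaining"] == 0
        return done, (None if done else ["decrement"])


dynamics = CountdownDynamics(n_steps=2)
model = {}  # stands in for an ecole.scip.Model
dynamics.set_dynamics_random_state(model, random.Random(0))
done, action_set = dynamics.reset_dynamics(model)
trace = [done]
while not done:
    done, action_set = dynamics.step_dynamics(model, action_set[0])
    trace.append(done)
```

Note that the dynamics return only `done` and `action_set`; observations, rewards, and information are layered on top by the environment.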
Listing
Branching
- class ecole.environment.Branching(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)
- class ecole.dynamics.BranchingDynamics
Single variable branching Dynamics.
Based on a SCIP branching callback with maximal priority and no depth limit. The dynamics give the control back to the user every time the callback would be called. The user receives as an action set the list of branching candidates, and is expected to select one of them as the action.
- __init__(self: ecole.dynamics.BranchingDynamics, pseudo_candidates: bool = False) -> None
Create new dynamics.
- Parameters
pseudo_candidates – Whether the action set contains pseudo branching variable candidates (SCIPgetPseudoBranchCands) or LP branching variable candidates (SCIPgetLPBranchCands).
- reset_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model) -> Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]
Start solving up to first branching node.
Start solving with SCIP defaults (SCIPsolve) and give back control to the user on the first branching decision. Users can inherit from this dynamics to change the default settings such as presolving and cutting planes.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
- Returns
done – Whether the instance is solved. This can happen without branching, for instance if the instance is solved during presolving.
action_set – List of indices of branching candidate variables. Available candidates depend on parameters in __init__(). Variable indices (values in the action_set) are their position in the original problem (SCIPvarGetProbindex). Variable ordering in the action_set is arbitrary.
- set_dynamics_random_state(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) -> None
Set seeds on the Model.
Set seed parameters, including permutation, LP, and shift.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.
- step_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, action: Union[ecole.DefaultType, int]) -> Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]
Branch and resume solving until next branching.
Branching is done on a single variable using SCIPbranchVar. The control is given back to the user on the next branching decision or when done.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
action – The index of the LP column of the variable to branch on. One element of the action set. If an explicit ecole.Default is passed, then default SCIP branching is used, that is, the next branching rule is fetched by SCIP according to the rules' priorities.
- Returns
done – Whether the instance is solved.
action_set – List of indices of branching candidate variables. Available candidates depend on parameters in __init__(). Variable indices (values in the action_set) are their position in the original problem (SCIPvarGetProbindex). Variable ordering in the action_set is arbitrary.
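Because the action must be one element of the action set, a branching policy typically scores variables and then restricts the choice to the current candidates. The sketch below assumes a hypothetical per-variable `scores` list indexed by variable position; how such scores are obtained (for example from an observation function) is outside what this page documents.

```python
def select_branching_variable(action_set, scores):
    """Return the candidate from `action_set` with the highest score.

    `scores` is a hypothetical per-variable score list indexed by variable position.
    """
    if action_set is None or len(action_set) == 0:
        raise ValueError("no branching candidates available")
    # Only candidates in the action set are valid actions, whatever other scores exist.
    best = max(action_set, key=lambda var_index: scores[var_index])
    return int(best)  # the documented action type is a plain int


scores = [0.1, 0.9, 0.3, 0.7, 0.2]  # one score per variable
action_set = [4, 0, 3]              # candidate ordering is arbitrary
action = select_branching_variable(action_set, scores)
```

Variable 1 has the highest score overall but is not a candidate, so the selection falls on variable 3, the best-scoring entry of the action set.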
Configuring
- class ecole.environment.Configuring(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)
- class ecole.dynamics.ConfiguringDynamics
Setting solving parameters Dynamics.
These dynamics are meant to be used as a (contextual) bandit to find good parameters for SCIP.
- __init__(self: ecole.dynamics.ConfiguringDynamics) -> None
- reset_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model) -> Tuple[bool, None]
Does nothing.
Users can inherit from this dynamics to change the point in the solving process at which parameters are set (for instance, after presolving).
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
- Returns
done – Whether the instance is solved. Always false.
action_set – Unused.
- set_dynamics_random_state(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) -> None
Set seeds on the Model.
Set seed parameters, including permutation, LP, and shift.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.
- step_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, action: Dict[str, Union[bool, int, float, str]]) -> Tuple[bool, None]
Set parameters and solve the instance.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
action – A mapping of parameter names and values.
- Returns
done – Whether the instance is solved. Always true.
action_set – Unused.
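An action for these dynamics is a mapping from parameter names to values. The helper below is a hypothetical sketch that checks a mapping against the documented type Dict[str, Union[bool, int, float, str]] before it is passed on. The parameter names are illustrative SCIP parameters and should be checked against the SCIP version in use.

```python
def make_configuring_action(params):
    """Validate a parameter mapping so it matches Dict[str, Union[bool, int, float, str]]."""
    allowed = (bool, int, float, str)
    for name, value in params.items():
        if not isinstance(name, str):
            raise TypeError(f"parameter name must be a string, got {name!r}")
        if not isinstance(value, allowed):
            raise TypeError(f"unsupported value type for {name!r}: {type(value).__name__}")
    return dict(params)


# Illustrative parameter names and values; not a recommended configuration.
action = make_configuring_action({
    "branching/scorefunc": "s",
    "branching/scorefac": 0.1,
    "presolving/maxrounds": 0,
})
```

Since step_dynamics() always returns done as true, an episode consists of a single step: one reset followed by one call with such a mapping, which is what makes these dynamics suitable for a (contextual) bandit setting.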
PrimalSearch
- class ecole.environment.PrimalSearch(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)
- class ecole.dynamics.PrimalSearchDynamics
Search for primal solutions Dynamics.
Based on a SCIP primal heuristic callback with maximal priority, which executes after the processing of a node is finished (SCIP_HEURTIMING_AFTERNODE). The dynamics give the control back to the user a few times (trials) each time the callback is called. The agent receives as an action set the list of all non-fixed discrete variables at the current node (pseudo branching candidates), and is expected to give back as an action a partial primal solution, i.e., a value assignment for a subset of these variables.
- __init__(self: ecole.dynamics.PrimalSearchDynamics, trials_per_node: int = 1, depth_freq: int = 1, depth_start: int = 0, depth_stop: int = -1) -> None
Initialize new PrimalSearchDynamics.
- Parameters
trials_per_node – Number of primal searches performed at each node (or -1 for an infinite number of trials).
depth_freq – Depth frequency of when the primal search is called (HEUR_FREQ in SCIP).
depth_start – Tree depth at which the primal search starts being called (HEUR_FREQOFS in SCIP).
depth_stop – Tree depth after which the primal search stops being called (HEUR_MAXDEPTH in SCIP).
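How the three depth parameters interact can be sketched as a predicate on the node depth. This is an assumption-laden illustration, not the Ecole or SCIP implementation: it assumes the usual SCIP convention that a heuristic with frequency f and offset o runs at depths o, o + f, o + 2f, and so on, that a maximal depth of -1 means no limit, that a frequency of 0 restricts the call to the starting depth, and that a negative frequency disables it. The function name is hypothetical.

```python
def heuristic_runs_at_depth(depth, depth_freq=1, depth_start=0, depth_stop=-1):
    """Sketch of the depth schedule, assuming SCIP-like frequency/offset/maxdepth semantics."""
    if depth < depth_start:
        return False
    if depth_stop != -1 and depth > depth_stop:
        return False
    if depth_freq <= 0:
        # Assumption: frequency 0 restricts the call to the starting depth,
        # and a negative frequency disables the heuristic entirely.
        return depth_freq == 0 and depth == depth_start
    return (depth - depth_start) % depth_freq == 0


# Depths in 0..11 at which the search would run with freq=3, start=2, stop=9.
depths = [
    d for d in range(12)
    if heuristic_runs_at_depth(d, depth_freq=3, depth_start=2, depth_stop=9)
]
```

With the default values (frequency 1, start 0, stop -1), the predicate holds at every depth, which matches the documented defaults of calling the primal search throughout the tree.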
- reset_dynamics(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model) -> Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]
Start solving up to first primal heuristic call.
Start solving with SCIP defaults (SCIPsolve) and give back control to the user on the first heuristic call. Users can inherit from this dynamics to change the default settings such as presolving and cutting planes.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
- Returns
done – Whether the instance is solved. This can happen before the heuristic gets called, for instance if the instance is solved during presolving.
action_set – List of non-fixed discrete variables (SCIPgetPseudoBranchCands).
- set_dynamics_random_state(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) -> None
Set seeds on the Model.
Set seed parameters, including permutation, LP, and shift.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.
- step_dynamics(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model, action: Tuple[numpy.ndarray[numpy.uint64], numpy.ndarray[numpy.float64]]) -> Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]
Try to obtain a feasible primal solution from the given (partial) primal solution.
If the number of search trials per node is exceeded, then continue solving until the next time the heuristic gets called.
To obtain a complete feasible solution, variables are fixed to their partial assignment values, and the rest of the variable assignments are deduced by solving an LP in probing mode. If the provided partial assignment is empty, then nothing is done.
- Parameters
model – The state of the Markov Decision Process. Passed by the environment.
action – A subset of the variables given in the action set, and their assigned values.
- Returns
done – Whether the instance is solved.
action_set – List of non-fixed discrete variables (SCIPgetPseudoBranchCands).
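The action expected by step_dynamics() is a pair of aligned sequences: variable indices and their assigned values, restricted to the current action set. The helper below is a hypothetical sketch that builds such a pair from a {variable index: value} mapping using plain lists. The documented signature expects NumPy arrays (uint64 indices, float64 values), so the lists would still need converting, for example with `numpy.asarray(var_ids, dtype=numpy.uint64)`.

```python
def make_partial_solution(action_set, assignment):
    """Build a (variable indices, values) pair restricted to the current action set.

    `assignment` is a hypothetical {variable index: value} mapping chosen by the agent.
    """
    candidates = set(int(i) for i in action_set)
    unknown = sorted(set(assignment) - candidates)
    if unknown:
        raise ValueError(f"variables not in the action set: {unknown}")
    var_ids = sorted(assignment)
    values = [float(assignment[i]) for i in var_ids]
    return var_ids, values


action_set = [7, 2, 5, 11]
var_ids, values = make_partial_solution(action_set, {5: 1, 2: 0})
empty = make_partial_solution(action_set, {})
```

An empty assignment yields two empty sequences, which corresponds to the documented case where nothing is done for that trial.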