Environments

Interface

class ecole.environment.Environment(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)

Ecole Partially Observable Markov Decision Process (POMDP).

Similar to OpenAI Gym, environments represent the task that an agent is supposed to solve. For maximum customizability, different components are composed/orchestrated in this class.

__init__(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs) → None

Create a new environment object.

Parameters

observation_function – An object of type ObservationFunction used to customize the observation returned by reset() and step().
reward_function – An object of type RewardFunction used to customize the reward returned by reset() and step().
information_function – An object of type InformationFunction used to customize the additional information returned by reset() and step().
scip_params – Parameters set on the underlying Model at the start of every episode.
**dynamics_kwargs – Other arguments are passed to the constructor of the Dynamics.

reset(instance, *dynamics_args, **dynamics_kwargs)

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters

instance – The combinatorial optimization problem to tackle during the newly started episode. Either a file path to an instance that can be read by SCIP, or a Model whose problem definition data will be copied.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

observation – The observation extracted from the initial state. Typically used to take the next action.
action_set – An optional subset that defines which actions are accepted in the next transition. For some environments, the action set may change at every transition.
reward_offset – An offset on the total cumulated reward, a.k.a. the initial reward. This reward does not impact learning (as no action has yet been taken) but can nonetheless be used for evaluation purposes. For example, in the total cumulated reward of an episode one may want to account for computations that happened during reset() (e.g. computation time, number of LP iterations in presolving…).
done – A boolean flag indicating whether the current state is terminal. If this flag is true, then the current episode is finished, and step() cannot be called any more.
info – A collection of environment specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.

seed(value: int) → None

Set the random seed of the environment.

The random seed is used to seed the environment RandomGenerator. At every call to reset(), the random generator is used to create new seeds for the solver. Setting the seed once will ensure determinism for the next trajectories. By default, the random generator is initialized by the random module.

step(action, *dynamics_args, **dynamics_kwargs)

Transition from one state to another.

This method takes a user action to transition from the current state to the next. The method cannot be called if the environment has not been reset since its instantiation or since a terminal state has been reached.

Parameters

action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset()), then the action must comply with the action set.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

observation – The observation extracted from the current state. Typically used to take the next action.
action_set – An optional subset that defines which actions are accepted in the next transition. For some environments, the action set may change at every transition.
reward – A real number to use for reinforcement learning.
done – A boolean flag indicating whether the current state is terminal. If this flag is true, then the current episode is finished, and step() cannot be called any more.
info – A collection of environment specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
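
The reset() and step() contract described above can be exercised with a loop like the following. This is a minimal sketch, not Ecole code: run_episode and CountdownEnv are hypothetical names introduced here, and CountdownEnv is a stand-in that only mimics the five-value return shape so the loop runs without SCIP. An Ecole environment would be expected to fit in the same place, with a file path or Model as the instance.

```python
def run_episode(env, instance, choose_action):
    """Run one episode and return the total cumulated reward.

    `env` is any object exposing the reset()/step() methods described above.
    `choose_action` is a hypothetical callback: (observation, action_set) -> action.
    """
    observation, action_set, reward_offset, done, info = env.reset(instance)
    total = reward_offset  # reward accumulated before any action is taken
    while not done:  # step() must not be called once done is True
        action = choose_action(observation, action_set)
        observation, action_set, reward, done, info = env.step(action)
        total += reward
    return total


class CountdownEnv:
    """Stand-in environment (not part of Ecole): terminal after `instance` steps."""

    def reset(self, instance):
        self.remaining = instance
        return self._state(reward=0.5)  # 0.5 plays the role of reward_offset

    def step(self, action):
        if action not in (0, 1):
            raise ValueError("action must comply with the action set")
        self.remaining -= 1
        return self._state(reward=1.0)

    def _state(self, reward):
        done = self.remaining <= 0
        action_set = None if done else [0, 1]
        return None, action_set, reward, done, {}


# Always pick the first accepted action.
total_reward = run_episode(CountdownEnv(), 3, lambda obs, action_set: action_set[0])
print(total_reward)
```

Adding reward_offset to the total keeps evaluation metrics comparable across episodes even though the offset plays no role in learning.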

Protocol

class ecole.typing.Dynamics(*args, **kwargs)

Dynamics are raw environments.

The class is a bare ecole.environment.Environment without rewards, observations, and other utilities. It defines the state transitions of a Markov Decision Process, that is the series of steps and possible actions of the environment.

__init__(*args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

reset_dynamics(model: ecole.scip.Model) → Tuple[bool, ecole.typing.ActionSet]

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters

model – The SCIP model that will be used through the episode.

Returns

done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step_dynamics() cannot be called.
action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.

set_dynamics_random_state(model: ecole.scip.Model, rng: ecole.RandomGenerator) → None

Set the random state of the episode.

This method is called by reset() to set all the random elements of the dynamics for the upcoming episode. The random generator is kept between episodes in order to sample different episodes.

Parameters

model – The SCIP model that will be used through the episode.
rng – The random generator used by the environment from which random numbers can be extracted.

step_dynamics(model: ecole.scip.Model, action: ecole.typing.Action) → Tuple[bool, ecole.typing.ActionSet]

Transition from one state to another.

This method takes the user action to transition from the current state to the next. The method cannot be called if the dynamics has not been reset since its instantiation or is in a terminal state.

Parameters

action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset_dynamics()), then the action must be in that set.

Returns

done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset_dynamics() has been called.
action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.
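
To make the three protocol methods concrete, here is a toy implementation. It is a sketch under stated assumptions: CoinFlipDynamics is a hypothetical class, a plain dict stands in for ecole.scip.Model, and random.Random stands in for ecole.RandomGenerator. The driver at the bottom follows the call order described above, where the random state is set before the episode is reset.

```python
import random


class CoinFlipDynamics:
    """Toy dynamics (hypothetical, not shipped with Ecole).

    The "model" is a plain dict standing in for ecole.scip.Model.
    The episode ends after `n_steps` transitions.
    """

    def __init__(self, n_steps=2):
        self.n_steps = n_steps

    def set_dynamics_random_state(self, model, rng):
        # Draw the random elements of the upcoming episode from the shared generator.
        model["seed"] = rng.randrange(2**31)

    def reset_dynamics(self, model):
        model["steps_left"] = self.n_steps
        return self._done_and_action_set(model)

    def step_dynamics(self, model, action):
        if model["steps_left"] <= 0:
            raise RuntimeError("episode is over; call reset_dynamics() first")
        if action not in ("heads", "tails"):
            raise ValueError("action is not in the action set")
        model["steps_left"] -= 1
        return self._done_and_action_set(model)

    @staticmethod
    def _done_and_action_set(model):
        done = model["steps_left"] <= 0
        return done, (None if done else ("heads", "tails"))


rng = random.Random(0)  # stand-in for ecole.RandomGenerator
model = {}              # stand-in for ecole.scip.Model
dynamics = CoinFlipDynamics(n_steps=2)
dynamics.set_dynamics_random_state(model, rng)
done, action_set = dynamics.reset_dynamics(model)
n_transitions = 0
while not done:
    done, action_set = dynamics.step_dynamics(model, action_set[0])
    n_transitions += 1
```

Note that the dynamics return only done and action_set; observations, rewards, and information are left to the environment that wraps them.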

Listing

Branching

class ecole.environment.Branching(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)

class ecole.dynamics.BranchingDynamics

Single variable branching Dynamics.

Based on a SCIP branching callback with maximal priority and no depth limit. The dynamics give the control back to the user every time the callback would be called. The user receives as an action set the list of branching candidates, and is expected to select one of them as the action.

__init__(self: ecole.dynamics.BranchingDynamics, pseudo_candidates: bool = False) → None

Create new dynamics.

Parameters: pseudo_candidates – Whether the action set contains pseudo branching variable candidates (SCIPgetPseudoBranchCands) or LP branching variable candidates (SCIPgetLPBranchCands).

reset_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model) → Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]

Start solving up to first branching node.

Start solving with SCIP defaults (SCIPsolve) and give back control to the user on the first branching decision. Users can inherit from this dynamics to change the default settings such as presolving and cutting planes.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.

Returns

done – Whether the instance is solved. This can happen without branching, for instance if the instance is solved during presolving.
action_set – List of indices of branching candidate variables. Available candidates depend on parameters in __init__(). Variable indices (values in the action_set) are their position in the original problem (SCIPvarGetProbindex). Variable ordering in the action_set is arbitrary.

set_dynamics_random_state(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) → None

Set seeds on the Model.

Set seed parameters, including permutation, LP, and shift.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.
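
The seeding step can be pictured as drawing one fresh seed per randomization parameter from the environment's generator. The sketch below uses only the standard library. The helper draw_episode_seeds is hypothetical, random.Random stands in for ecole.RandomGenerator, and the three SCIP parameter names and the seed range are assumptions about which settings correspond to "permutation, LP, and shift".

```python
import random

MAX_SEED = 2**31 - 1  # assumed upper bound for SCIP integer seeds


def draw_episode_seeds(rng):
    """Draw one seed per SCIP randomization parameter from a shared generator."""
    return {
        "randomization/permutationseed": rng.randint(0, MAX_SEED),
        "randomization/lpseed": rng.randint(0, MAX_SEED),
        "randomization/randomseedshift": rng.randint(0, MAX_SEED),
    }


# Two generators seeded identically yield the same sequence of episode seeds,
# which is why seeding the environment once makes later trajectories repeatable.
rng_a, rng_b = random.Random(42), random.Random(42)
episodes_a = [draw_episode_seeds(rng_a) for _ in range(3)]
episodes_b = [draw_episode_seeds(rng_b) for _ in range(3)]
```

Because the generator is kept between episodes, consecutive episodes receive different seeds while the overall sequence stays reproducible.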

step_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, action: Union[ecole.DefaultType, int]) → Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]

Branch and resume solving until next branching.

Branching is done on a single variable using SCIPbranchVar. The control is given back to the user on the next branching decision or when done.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
action – The index of the LP column of the variable to branch on. One element of the action set. If an explicit ecole.Default is passed, then default SCIP branching is used, that is, the next branching rule, selected by SCIP according to the rules' priorities, is used.

Returns

done – Whether the instance is solved.
action_set – List of indices of branching candidate variables. Available candidates depend on parameters in __init__(). Variable indices (values in the action_set) are their position in the original problem (SCIPvarGetProbindex). Variable ordering in the action_set is arbitrary.
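
Since the action set lists variable indices in arbitrary order, a policy typically scores variables and picks the best candidate among those offered. The following is a minimal sketch: pick_candidate and the scores list are hypothetical, and plain Python lists are used in place of the NumPy arrays that the signatures above mention.

```python
def pick_candidate(action_set, scores):
    """Return the candidate variable index with the highest score.

    `action_set` holds variable indices; `scores` is a hypothetical sequence
    indexed by variable index (for example the output of a learned model).
    """
    if action_set is None or len(action_set) == 0:
        raise ValueError("no branching candidates available")
    return max(action_set, key=lambda var_index: scores[var_index])


scores = [0.1, 0.9, 0.3, 0.7, 0.2]  # one score per problem variable
action_set = [4, 0, 3]              # candidates arrive in arbitrary order
chosen = pick_candidate(action_set, scores)
```

Variable 1 has the highest score overall but is not a candidate, so the choice is restricted to the indices in the action set.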

Configuring

class ecole.environment.Configuring(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)

class ecole.dynamics.ConfiguringDynamics

Setting solving parameters Dynamics.

These dynamics are meant to be used as a (contextual) bandit to find good parameters for SCIP.

__init__(self: ecole.dynamics.ConfiguringDynamics) → None

reset_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model) → Tuple[bool, None]

Does nothing.

Users can inherit from this dynamics to change when in the solving process parameters will be set (for instance after presolving).

Parameters

model – The state of the Markov Decision Process. Passed by the environment.

Returns

done – Whether the instance is solved. Always false.
action_set – Unused.

set_dynamics_random_state(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) → None

Set seeds on the Model.

Set seed parameters, including permutation, LP, and shift.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.

step_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, action: Dict[str, Union[bool, int, float, str]]) → Tuple[bool, None]

Set parameters and solve the instance.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
action – A mapping of parameter names and values.

Returns

done – Whether the instance is solved. Always true.
action_set – Unused.
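
Because each episode consists of a single action (a parameter mapping), a bandit strategy is a natural fit. The sketch below shows an epsilon-greedy choice among a few candidate mappings. ARMS, choose_arm, and is_valid_action are hypothetical names. The parameter names are SCIP parameters, but the values and the reward estimates are illustrative only.

```python
import random

# Candidate parameter settings (arms); values are illustrative.
ARMS = [
    {"branching/scorefunc": "s", "presolving/maxrounds": 0},
    {"branching/scorefunc": "p", "presolving/maxrounds": -1},
]


def choose_arm(mean_rewards, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_rewards))
    return max(range(len(mean_rewards)), key=lambda i: mean_rewards[i])


def is_valid_action(action):
    """Check the mapping matches Dict[str, Union[bool, int, float, str]]."""
    return isinstance(action, dict) and all(
        isinstance(key, str) and isinstance(value, (bool, int, float, str))
        for key, value in action.items()
    )


rng = random.Random(0)
# With epsilon=0.0 the choice is purely greedy on the (made-up) reward estimates.
arm = choose_arm(mean_rewards=[-12.0, -7.5], epsilon=0.0, rng=rng)
action = ARMS[arm]  # would be passed as the single action of the episode
```

After the episode, the observed reward would be used to update the estimate for the chosen arm before the next reset.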

PrimalSearch

class ecole.environment.PrimalSearch(observation_function=Default, reward_function=Default, information_function=Default, scip_params=None, **dynamics_kwargs)

class ecole.dynamics.PrimalSearchDynamics

Search for primal solutions Dynamics.

Based on a SCIP primal heuristic callback with maximal priority, which executes after the processing of a node is finished (SCIP_HEURTIMING_AFTERNODE). The dynamics give the control back to the user a few times (trials) each time the callback is called. The agent receives as an action set the list of all non-fixed discrete variables at the current node (pseudo branching candidates), and is expected to give back as an action a partial primal solution, i.e., a value assignment for a subset of these variables.

__init__(self: ecole.dynamics.PrimalSearchDynamics, trials_per_node: int = 1, depth_freq: int = 1, depth_start: int = 0, depth_stop: int = -1) → None

Initialize new PrimalSearchDynamics.

Parameters

trials_per_node – Number of primal searches performed at each node (or -1 for an infinite number of trials).
depth_freq – Depth frequency of when the primal search is called (HEUR_FREQ in SCIP).
depth_start – Tree depth at which the primal search starts being called (HEUR_FREQOFS in SCIP).
depth_stop – Tree depth after which the primal search stops being called (HEUR_MAXDEPTH in SCIP).
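
One way to read the three depth parameters together is as a schedule over tree depths. The helper below is a simplified sketch of that reading and not SCIP's actual scheduling code: is_search_depth is a hypothetical name, a depth_stop of -1 is assumed to mean no depth limit, and SCIP's special handling of non-positive frequencies is not modelled.

```python
def is_search_depth(depth, depth_freq=1, depth_start=0, depth_stop=-1):
    """Return True if the primal search would run at tree depth `depth`.

    Assumed reading: run every `depth_freq` levels starting at `depth_start`,
    never deeper than `depth_stop` (-1 meaning no depth limit).
    """
    if depth_freq <= 0 or depth < depth_start:
        return False
    if depth_stop != -1 and depth > depth_stop:
        return False
    return (depth - depth_start) % depth_freq == 0


# Every second level, starting at depth 1, up to depth 6.
called_at = [d for d in range(10) if is_search_depth(d, depth_freq=2, depth_start=1, depth_stop=6)]
```

With the default arguments, the helper returns True at every depth, matching a search that is called at all nodes.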

reset_dynamics(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model) → Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]

Start solving up to first primal heuristic call.

Start solving with SCIP defaults (SCIPsolve) and give back control to the user on the first heuristic call. Users can inherit from this dynamics to change the default settings such as presolving and cutting planes.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.

Returns

done – Whether the instance is solved. This can happen before the heuristic gets called, for instance if the instance is solved during presolving.
action_set – List of non-fixed discrete variables (SCIPgetPseudoBranchCands).

set_dynamics_random_state(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model, rng: ecole.RandomGenerator) → None

Set seeds on the Model.

Set seed parameters, including permutation, LP, and shift.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
rng – The source of randomness. Passed by the environment.

step_dynamics(self: ecole.dynamics.PrimalSearchDynamics, model: ecole.scip.Model, action: Tuple[numpy.ndarray[numpy.uint64], numpy.ndarray[numpy.float64]]) → Tuple[bool, Optional[numpy.ndarray[numpy.uint64]]]

Try to obtain a feasible primal solution from the given (partial) primal solution.

If the number of search trials per node is exceeded, then continue solving until the next time the heuristic gets called.

To obtain a complete feasible solution, variables are fixed to their partial assignment values, and the rest of the variable assignments is deduced by solving an LP in probing mode. If the provided partial assignment is empty, then nothing is done.

Parameters

model – The state of the Markov Decision Process. Passed by the environment.
action – A subset of the variables given in the action set, and their assigned values.

Returns

done – Whether the instance is solved.
action_set – List of non-fixed discrete variables (SCIPgetPseudoBranchCands).
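
The action is a pair of parallel sequences: variable indices and the values assigned to them. A small helper can build such a pair while discarding variables that are not in the current action set. This is a sketch with hypothetical names (make_partial_solution, assignment). It uses plain lists, whereas the signature above indicates NumPy arrays of unsigned integers and floats.

```python
def make_partial_solution(action_set, assignment):
    """Build a (variable_indices, values) pair restricted to the action set.

    `assignment` is a hypothetical dict {variable_index: value}; entries for
    variables outside the action set are dropped.
    """
    allowed = set(action_set)
    var_ids = sorted(v for v in assignment if v in allowed)
    values = [float(assignment[v]) for v in var_ids]
    return var_ids, values


action_set = [2, 5, 7, 9]
# Variable 3 is not a candidate at this node, so it is filtered out.
var_ids, values = make_partial_solution(action_set, {5: 1, 9: 0, 3: 1})
```

An empty assignment yields two empty lists, which corresponds to the case where nothing is done for that trial.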