Environments

Interface

class ecole.environment.EnvironmentComposer(observation_function='default', reward_function='default', scip_params=None, **dynamics_kwargs)
reset(instance, *dynamics_args, **dynamics_kwargs)

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters
  • instance – The combinatorial optimization problem to tackle during the newly started episode.

  • dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.

  • dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

  • observation – The observation extracted from the initial state. Typically used to take the next action.

  • action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.

  • reward_offset – A reward offset associated with the initial state. This reward is not used for learning (as no action has yet been taken) but is used in evaluation for the sum of rewards when one needs to account for computations that happened during reset() (e.g. computation time, number of LP iterations in presolving…).

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step() cannot be called.
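
The four return values of reset() can be illustrated with a toy stand-in. This is only a sketch: ToyEnvironment and its presolve_cost parameter are hypothetical names invented for the illustration and are not part of Ecole. It assumes nothing beyond the return convention listed above, and shows the evaluation sum starting from reward_offset.

```python
from typing import List, Optional, Tuple


class ToyEnvironment:
    """Hypothetical stand-in mimicking the reset() return values (not Ecole code)."""

    def __init__(self, presolve_cost: float = 0.5) -> None:
        # Pretend cost of the work done during reset (e.g. presolving time).
        self.presolve_cost = presolve_cost

    def reset(
        self, instance: List[int]
    ) -> Tuple[List[int], Optional[List[int]], float, bool]:
        observation = list(instance)
        # An empty toy instance is already "solved": the initial state is terminal.
        done = len(observation) == 0
        action_set = None if done else list(range(len(observation)))
        reward_offset = -self.presolve_cost
        return observation, action_set, reward_offset, done


env = ToyEnvironment()
observation, action_set, reward_offset, done = env.reset([3, 1, 2])
# Evaluation starts the sum of rewards from the offset; learning ignores it.
total_reward = reward_offset

# A terminal initial state: step() must not be called for this episode.
_, empty_action_set, _, done_at_reset = env.reset([])
```

Checking done right after reset() matters because an instance may already be finished before any action is taken.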

seed(value: int) → None

Set the random seed of the environment.

The random seed is used to seed the environment's RandomEngine. At every call to reset(), the random engine is used to create new seeds for the solver. Setting the seed once ensures determinism for the subsequent trajectories. By default, the random engine is initialized by the random module.
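
The seeding scheme described above can be sketched as follows. This is an illustrative model only: SeededToyEnvironment and solver_seeds are hypothetical names, and Python's random.Random stands in for the environment's RandomEngine. The point is that seeding once fixes the whole sequence of per-episode solver seeds, while each episode still gets a different one.

```python
import random
from typing import List


class SeededToyEnvironment:
    """Hypothetical stand-in for the seeding behaviour (not Ecole code)."""

    def __init__(self) -> None:
        # Unseeded by default, so trajectories differ from run to run.
        self.random_engine = random.Random()

    def seed(self, value: int) -> None:
        # Seed the engine once; it is kept between episodes.
        self.random_engine.seed(value)

    def reset(self) -> int:
        # Each new episode draws a fresh solver seed from the same engine.
        return self.random_engine.randrange(2**31)


def solver_seeds(seed: int, episodes: int) -> List[int]:
    """Return the solver seeds used over a number of consecutive episodes."""
    env = SeededToyEnvironment()
    env.seed(seed)
    return [env.reset() for _ in range(episodes)]


first_run = solver_seeds(42, 3)
second_run = solver_seeds(42, 3)
```

Under these assumptions, first_run and second_run are identical lists, which is what makes the subsequent trajectories reproducible.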

step(action, *dynamics_args, **dynamics_kwargs)

Transition from one state to another.

This method takes a user action to transition from the current state to the next. The method cannot be called if the environment has not been reset since its instantiation or since a terminal state.

Parameters
  • action – The action to take as part of the Markov Decision Process. If an action set was given in the latest call (including calls to reset()), then the action must be in that set.

  • dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.

  • dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

  • observation – The observation extracted from the current state. Typically used to take the next action.

  • action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.

  • reward – A real number to use for reinforcement learning.

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset() has been called.

  • info – A collection of environment-specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
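
A full episode built on reset() and step() might look like the loop below. This is a minimal sketch under stated assumptions: CountdownEnvironment is a hypothetical toy (subtract 1 or 2 from a counter until it reaches zero), not an Ecole class, and the "largest accepted action" policy is a placeholder. It follows the rules stated above: actions come from the latest action set, and step() is refused before a reset or after a terminal state.

```python
from typing import Dict, List, Optional, Tuple


class CountdownEnvironment:
    """Hypothetical toy environment following the reset()/step() contract."""

    def __init__(self) -> None:
        self.counter = 0
        self.can_step = False  # no step() allowed before the first reset()

    def _action_set(self) -> Optional[List[int]]:
        if self.counter == 0:
            return None
        return [a for a in (1, 2) if a <= self.counter]

    def reset(self, instance: int) -> Tuple[int, Optional[List[int]], float, bool]:
        self.counter = instance
        done = self.counter == 0
        self.can_step = not done
        return self.counter, self._action_set(), 0.0, done

    def step(
        self, action: int
    ) -> Tuple[int, Optional[List[int]], float, bool, Dict[str, int]]:
        if not self.can_step:
            raise RuntimeError("reset() must be called before step()")
        accepted = self._action_set()
        if accepted is None or action not in accepted:
            raise ValueError("action is not in the latest action set")
        self.counter -= action
        done = self.counter == 0
        self.can_step = not done
        info = {"counter": self.counter}
        return self.counter, self._action_set(), -1.0, done, info


def run_episode(env: CountdownEnvironment, instance: int) -> float:
    """Play one episode and return the sum of rewards, offset included."""
    observation, action_set, reward_offset, done = env.reset(instance)
    total_reward = reward_offset
    while not done:
        action = max(action_set)  # placeholder policy: largest accepted action
        observation, action_set, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```

The loop tests done before the first step() so that an instance which is terminal right after reset() is handled without error.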

Protocol

class ecole.typing.Dynamics(*args, **kwds)

Dynamics are raw environments.

The class is a bare ecole.environment.EnvironmentComposer without rewards, observations, and other utilities. It defines the state transitions of a Markov Decision Process, that is the series of steps and possible actions of the environment.

reset_dynamics(model: ecole.scip.Model) → Tuple[bool, ActionSet]

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters

model – The SCIP model that will be used through the episode.

Returns

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step_dynamics() cannot be called.

  • action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.

set_dynamics_random_state(model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None

Set the random state of the episode.

This method is called by reset() to set all the random elements of the dynamics for the upcoming episode. The random engine is kept between episodes in order to sample different episodes.

Parameters
  • model – The SCIP model that will be used through the episode.

  • random_engine – The random engine used by the environment from which random numbers can be extracted.

step_dynamics(model: ecole.scip.Model, action: Action) → Tuple[bool, ActionSet]

Transition from one state to another.

This method takes the user action to transition from the current state to the next. The method cannot be called if the dynamics has not been reset since its instantiation or is in a terminal state.

Parameters

action – The action to take as part of the Markov Decision Process. If an action set was given in the latest call (including calls to reset_dynamics()), then the action must be in that set.

Returns

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset_dynamics() has been called.

  • action_set – An optional subset of accepted actions in the next transition. For some environments, this may change at every transition.
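
The shape of the three protocol methods can be sketched in plain Python. Everything below is hypothetical: ToyModel stands in for ecole.scip.Model, random.Random stands in for ecole.RandomEngine, and DynamicsLike, CountdownDynamics and start_episode are invented for the illustration. The order used in start_episode (random state first, then reset_dynamics) is an assumption about how an environment might combine the calls, not a statement about Ecole's implementation.

```python
import random
from typing import List, Optional, Protocol, Tuple, runtime_checkable

ActionSet = Optional[List[int]]


class ToyModel:
    """Hypothetical stand-in for a solver model: a counter plus a seed."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.remaining = size
        self.seed = 0


@runtime_checkable
class DynamicsLike(Protocol):
    """Hypothetical protocol mirroring the three methods documented above."""

    def reset_dynamics(self, model: ToyModel) -> Tuple[bool, ActionSet]: ...

    def set_dynamics_random_state(
        self, model: ToyModel, random_engine: random.Random
    ) -> None: ...

    def step_dynamics(self, model: ToyModel, action: int) -> Tuple[bool, ActionSet]: ...


class CountdownDynamics:
    """Toy dynamics: subtract 1 or 2 from the model counter until it is zero."""

    def set_dynamics_random_state(
        self, model: ToyModel, random_engine: random.Random
    ) -> None:
        # Draw the per-episode seed from the engine kept by the environment.
        model.seed = random_engine.randrange(2**31)

    def reset_dynamics(self, model: ToyModel) -> Tuple[bool, ActionSet]:
        model.remaining = model.size
        return self._state(model)

    def step_dynamics(self, model: ToyModel, action: int) -> Tuple[bool, ActionSet]:
        done, action_set = self._state(model)
        if done or action_set is None or action not in action_set:
            raise ValueError("terminal state or action outside the action set")
        model.remaining -= action
        return self._state(model)

    @staticmethod
    def _state(model: ToyModel) -> Tuple[bool, ActionSet]:
        done = model.remaining == 0
        action_set = None if done else [a for a in (1, 2) if a <= model.remaining]
        return done, action_set


def start_episode(
    dynamics: DynamicsLike, model: ToyModel, random_engine: random.Random
) -> Tuple[bool, ActionSet]:
    # Assumed order: fix the random state first, then build the initial state.
    dynamics.set_dynamics_random_state(model, random_engine)
    return dynamics.reset_dynamics(model)
```

Note that the dynamics return only done and action_set. Observations, rewards and info are layered on top by the environment.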

Listing

Branching

class ecole.environment.Branching(observation_function='default', reward_function='default', scip_params=None, **dynamics_kwargs)
class ecole.environment.BranchingDynamics
reset_dynamics(self: ecole.environment.BranchingDynamics, model: ecole.scip.Model) → Tuple[bool, Optional[xt::xtensor]]
set_dynamics_random_state(self: ecole.environment.BranchingDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None
step_dynamics(self: ecole.environment.BranchingDynamics, model: ecole.scip.Model, action: int) → Tuple[bool, Optional[xt::xtensor]]
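
According to the signatures above, a branching action is an int and the action set is an optional array. A policy therefore has to pick one entry of the action set, and handle the case where there is none. The helper below is a hedged sketch: select_action and the per-index scores list are hypothetical, and it assumes the action set holds integer indices into scores.

```python
from typing import Optional, Sequence


def select_action(
    action_set: Optional[Sequence[int]], scores: Sequence[float]
) -> Optional[int]:
    """Pick the accepted action with the highest score, if any is accepted."""
    if action_set is None or len(action_set) == 0:
        return None
    # Only entries of the action set are considered, never other indices.
    return int(max(action_set, key=lambda index: scores[index]))


# Index 1 has the best score overall but is not an accepted action.
chosen = select_action([0, 2, 3], [0.9, 5.0, 0.4, 0.7])
nothing = select_action(None, [])
```

Restricting the argmax to the action set keeps the chosen action valid even when the best-scoring index is not currently accepted.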

Configuring

class ecole.environment.Configuring(observation_function='default', reward_function='default', scip_params=None, **dynamics_kwargs)
class ecole.environment.ConfiguringDynamics
reset_dynamics(self: ecole.environment.ConfiguringDynamics, model: ecole.scip.Model) → Tuple[bool, None]
set_dynamics_random_state(self: ecole.environment.ConfiguringDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None
step_dynamics(self: ecole.environment.ConfiguringDynamics, model: ecole.scip.Model, action: Dict[str, Union[bool, int, float, str]]) → Tuple[bool, None]
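
The step_dynamics signature above takes an action of type Dict[str, Union[bool, int, float, str]]. A small check of that shape can be sketched as follows. The helper is_configuring_action is hypothetical, and the parameter names in the example are illustrative keys only. Whether a given solver accepts them is not verified here.

```python
from typing import Dict, Union

ParamValue = Union[bool, int, float, str]


def is_configuring_action(action: object) -> bool:
    """Return True if action maps string keys to bool, int, float or str values."""
    if not isinstance(action, dict):
        return False
    return all(
        isinstance(key, str) and isinstance(value, (bool, int, float, str))
        for key, value in action.items()
    )


# Illustrative action: parameter names mapped to the values to set.
example_action: Dict[str, ParamValue] = {
    "branching/scorefunc": "s",
    "presolving/maxrounds": 0,
    "limits/time": 60.0,
}
valid = is_configuring_action(example_action)
invalid = is_configuring_action({"limits/time": [60.0]})
```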