Environments

Interface

class ecole.environment.Environment(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]

Ecole Partially Observable Markov Decision Process (POMDP).

Similar to OpenAI Gym, environments represent the task that an agent is supposed to solve. For maximum customizability, different components are composed and orchestrated in this class.

__init__(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs) → None[source]

Create a new environment object.

Parameters
  • observation_function – An object of type ObservationFunction used to customize what observations are returned in reset() and step().

  • reward_function – An object of type RewardFunction used to customize what rewards are returned in reset() and step().

  • information_function – An object of type InformationFunction used to customize what additional information is returned in reset() and step().

  • scip_params – Parameters set on the underlying Model on every episode.

  • **dynamics_kwargs – Other arguments are passed to the constructor of the Dynamics.
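
For instance, a concrete environment such as Branching can be paired with built-in components. The snippet below is a sketch for illustration only: NodeBipartite and NNodes are built-in observation and reward functions, and the scip_params entry is an ordinary SCIP parameter.

import ecole

# Illustrative only: swap in any ObservationFunction / RewardFunction.
env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite(),  # bipartite graph observations
    reward_function=ecole.reward.NNodes(),                    # nodes created per transition
    scip_params={"limits/time": 3600},                        # SCIP parameters set on every episode
)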

reset(instance, *dynamics_args, **dynamics_kwargs)[source]

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters
  • instance – The combinatorial optimization problem to tackle during the newly started episode. Either a file path to an instance that can be read by SCIP, or a Model whose problem definition data will be copied.

  • dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.

  • dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

  • observation – The observation extracted from the initial state. Typically used to take the next action.

  • action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.

  • reward_offset – An offset on the initial state. This reward is not used for learning (as no action has yet been taken) but is used in evaluation for the sum of rewards when one needs to account for computations that happened during reset() (e.g. computation time, number of LP iterations in presolving…).

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step() cannot be called.

  • info – A collection of environment specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.

seed(value: int) → None[source]

Set the random seed of the environment.

The random seed is used to seed the environment's RandomEngine. At every call to reset(), the random engine is used to create new seeds for the solver. Setting the seed once will ensure determinism for the next trajectories. By default, the random engine is initialized by the random module.
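
In practice, seeding once before the first reset() is enough to make a run reproducible, provided the same instances and actions are used afterwards. A minimal sketch (Branching is used here purely as an example of a concrete environment):

import ecole

env = ecole.environment.Branching()
env.seed(42)  # subsequent episodes derive their solver seeds deterministically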

step(action, *dynamics_args, **dynamics_kwargs)[source]

Transition from one state to another.

This method takes a user action to transition from the current state to the next. The method cannot be called if the environment has not been reset since its instantiation or since reaching a terminal state.

Parameters
  • action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset()), then the action must be in that set.

  • dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.

  • dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.

Returns

  • observation – The observation extracted from the current state. Typically used to take the next action.

  • action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.

  • reward – A real number to use for reinforcement learning.

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset() has been called.

  • info – A collection of environment specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
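
Putting reset() and step() together, a typical episode loop looks as follows. This is a hedged sketch: Branching is used as the concrete environment, the instance path is a placeholder for any file readable by SCIP, and the policy (always taking the first element of the action set) is a deliberately naive stand-in for a learned agent.

import ecole

env = ecole.environment.Branching()
env.seed(42)

# "path/to/instance.lp" is a placeholder for any problem file readable by SCIP.
observation, action_set, reward_offset, done, info = env.reset("path/to/instance.lp")
while not done:
    action = action_set[0]  # naive policy: always branch on the first candidate
    observation, action_set, reward, done, info = env.step(action)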

Protocol

class ecole.typing.Dynamics(*args, **kwargs)[source]

Dynamics are raw environments.

The class is a bare ecole.environment.Environment without rewards, observations, and other utilities. It defines the state transitions of a Markov Decision Process, that is, the series of steps and possible actions of the environment.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

reset_dynamics(model: ecole.scip.Model) → Tuple[bool, ecole.typing.ActionSet][source]

Start a new episode.

This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.

Parameters

model – The SCIP model that will be used through the episode.

Returns

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step_dynamics() cannot be called.

  • action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.

set_dynamics_random_state(model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None[source]

Set the random state of the episode.

This method is called by reset() to set all the random elements of the dynamics for the upcoming episode. The random engine is kept between episodes in order to sample different episodes.

Parameters
  • model – The SCIP model that will be used through the episode.

  • random_engine – The random engine used by the environment from which random numbers can be extracted.

step_dynamics(model: ecole.scip.Model, action: ecole.typing.Action) → Tuple[bool, ecole.typing.ActionSet][source]

Transition from one state to another.

This method takes the user action to transition from the current state to the next. The method cannot be called if the dynamics has not been reset since its instantiation or is in a terminal state.

Parameters

action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset_dynamics()), then the action must be in that set.

Returns

  • done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset_dynamics() has been called.

  • action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.
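
To make the protocol concrete, here is a hypothetical dynamics whose episode is a single transition that solves the model, paired with the Environment class. Everything below is a sketch: the class names are invented, matching the protocol structurally (rather than inheriting from ecole.typing.Dynamics) is assumed to be sufficient, and binding the dynamics through a __Dynamics__ class attribute is an assumption about how the built-in environments are assembled, not part of the protocol above.

import ecole


class SolveOnceDynamics:
    """Hypothetical dynamics: the whole episode is one transition that solves the model."""

    def reset_dynamics(self, model):
        # Not a terminal state yet, and no restricted action set.
        return False, None

    def set_dynamics_random_state(self, model, random_engine):
        # A real dynamics would derive solver seeds from random_engine here.
        pass

    def step_dynamics(self, model, action):
        # Ignore the action and solve the instance through the PySCIPOpt interface.
        model.as_pyscipopt().optimize()
        return True, None


class SolveOnce(ecole.environment.Environment):
    __Dynamics__ = SolveOnceDynamics  # assumed binding, mirroring the built-in environments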

Listing

Branching

class ecole.environment.Branching(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]
class ecole.dynamics.BranchingDynamics
__init__(self: ecole.dynamics.BranchingDynamics, pseudo_candidates: bool = False) → None
reset_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model) → Tuple[bool, Optional[xt::xtensor]]
set_dynamics_random_state(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None
step_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, action: int) → Tuple[bool, Optional[xt::xtensor]]
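
In this environment the action is the index of the variable (column) to branch on, and the action set returned by reset() and step() lists the current branching candidates. A hedged sketch of a strong-branching-style agent, assuming the StrongBranchingScores observation function, which returns one score per column:

import ecole

env = ecole.environment.Branching(
    observation_function=ecole.observation.StrongBranchingScores(),
)

# Placeholder instance path; any file readable by SCIP works.
observation, action_set, _, done, _ = env.reset("path/to/instance.lp")
while not done:
    # Branch on the candidate with the highest strong-branching score.
    action = action_set[observation[action_set].argmax()]
    observation, action_set, reward, done, info = env.step(action)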

Configuring

class ecole.environment.Configuring(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]
class ecole.dynamics.ConfiguringDynamics
__init__(self: ecole.dynamics.ConfiguringDynamics) → None
reset_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model) → Tuple[bool, None]
set_dynamics_random_state(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None
step_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, action: Dict[str, Union[bool, int, float, str]]) → Tuple[bool, None]
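
Here the action is a dictionary of SCIP parameters to set before solving; the episode consists of a single step after which the instance is solved to completion. A hedged sketch follows: the parameter values are illustrative choices rather than recommendations, and the negated SolvingTime reward assumes the reward arithmetic supported by reward functions.

import ecole

env = ecole.environment.Configuring(reward_function=-ecole.reward.SolvingTime())

# Placeholder instance path; any file readable by SCIP works.
observation, action_set, reward_offset, done, info = env.reset("path/to/instance.lp")
# One step: pick parameters, then the solver runs to completion.
observation, action_set, reward, done, info = env.step(
    {
        "branching/scorefunc": "s",   # illustrative SCIP parameters
        "separating/maxrounds": 0,
    }
)
assert done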