Environments¶
Interface¶
- class ecole.environment.Environment(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]¶
Ecole Partially Observable Markov Decision Process (POMDP).
Similar to OpenAI Gym, environments represent the task that an agent is supposed to solve. For maximum customizability, different components are composed and orchestrated in this class.
- __init__(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs) → None[source]¶
Create a new environment object.
- Parameters
observation_function – An object of type ObservationFunction used to customize the observations returned by reset() and step().
reward_function – An object of type RewardFunction used to customize the rewards returned by reset() and step().
information_function – An object of type InformationFunction used to customize the additional information returned by reset() and step().
scip_params – Parameters set on the underlying Model on every episode.
**dynamics_kwargs – Other arguments are passed to the constructor of the Dynamics.
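For example, the components can be assembled as follows. This is a minimal sketch: the Branching environment, the NodeBipartite observation function, the LpIterations reward, and the SCIP parameter shown here are illustrative choices, not defaults implied by this signature.

```python
import ecole

# Illustrative sketch: assemble an environment from customizable components.
env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite(),
    reward_function=-1.0 * ecole.reward.LpIterations(),  # reward functions support arithmetic
    scip_params={"limits/time": 3600},  # set on the underlying Model at every episode
)
```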
- reset(instance, *dynamics_args, **dynamics_kwargs)[source]¶
Start a new episode.
This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.
- Parameters
instance – The combinatorial optimization problem to tackle during the newly started episode. Either a file path to an instance that can be read by SCIP, or a Model whose problem definition data will be copied.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.
- Returns
observation – The observation extracted from the initial state. Typically used to take the next action.
action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.
reward_offset – An offset on the initial state. This reward is not used for learning (as no action has yet been taken), but it is added to the sum of rewards in evaluation when one needs to account for computations that happened during reset() (e.g. computation time, number of LP iterations in presolving…).
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step() cannot be called.
info – A collection of environment-specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
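As a short sketch of how the returned tuple is unpacked (the instance path is a placeholder):

```python
# Start a new episode; "instance.lp" is a placeholder path to a problem file.
observation, action_set, reward_offset, done, info = env.reset("instance.lp")
```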
- seed(value: int) → None[source]¶
Set the random seed of the environment.
The random seed is used to seed the environment RandomEngine. At every call to reset(), the random engine is used to create new seeds for the solver. Setting the seed once ensures determinism for the next trajectories. By default, the random engine is initialized by the random module.
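For example, to make the upcoming episodes reproducible:

```python
env.seed(42)  # seeds the environment RandomEngine; solver seeds are derived from it at each reset()
```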
- step(action, *dynamics_args, **dynamics_kwargs)[source]¶
Transition from one state to another.
This method takes a user action to transition from the current state to the next. The method cannot be called if the environment has not been reset since its instantiation, or if it is in a terminal state.
- Parameters
action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset()), then the action must be in that set.
dynamics_args – Extra arguments are forwarded as is to the underlying Dynamics.
dynamics_kwargs – Extra arguments are forwarded as is to the underlying Dynamics.
- Returns
observation – The observation extracted from the current state. Typically used to take the next action.
action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.
reward – A real number to use for reinforcement learning.
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset() has been called.
info – A collection of environment-specific information about the transition. This is not necessary for the control problem, but is useful to gain insights about the environment.
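Together, reset() and step() form the usual agent loop. The sketch below uses a placeholder policy that simply picks the first action of the action set:

```python
# Placeholder policy: always pick the first accepted action.
observation, action_set, reward_offset, done, info = env.reset("instance.lp")
while not done:
    action = action_set[0]
    observation, action_set, reward, done, info = env.step(action)
```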
Protocol¶
- class ecole.typing.Dynamics(*args, **kwargs)[source]¶
Dynamics are raw environments.
The class is a bare ecole.environment.Environment without rewards, observations, and other utilities. It defines the state transitions of a Markov Decision Process, that is, the series of steps and possible actions of the environment.
- __init__(*args, **kwargs)¶
Initialize self. See help(type(self)) for accurate signature.
- reset_dynamics(model: ecole.scip.Model) → Tuple[bool, ecole.typing.ActionSet][source]¶
Start a new episode.
This method brings the environment to a new initial state, i.e. starts a new episode. The method can be called at any point in time.
- Parameters
model – The SCIP model that will be used through the episode.
- Returns
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and step_dynamics() cannot be called.
action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.
- set_dynamics_random_state(model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None[source]¶
Set the random state of the episode.
This method is called by reset() to set all the random elements of the dynamics for the upcoming episode. The random engine is kept between episodes in order to sample different episodes.
- Parameters
model – The SCIP model that will be used through the episode.
random_engine – The random engine used by the environment from which random numbers can be extracted.
- step_dynamics(model: ecole.scip.Model, action: ecole.typing.Action) → Tuple[bool, ecole.typing.ActionSet][source]¶
Transition from one state to another.
This method takes the user action to transition from the current state to the next. The method cannot be called if the dynamics has not been reset since its instantiation or is in a terminal state.
- Parameters
action – The action to take as part of the Markov Decision Process. If an action set has been given in the latest call (including calls to reset_dynamics()), then the action must be in that set.
- Returns
done – A boolean flag indicating whether the current state is terminal. If this is true, the episode is finished, and this method cannot be called until reset_dynamics() has been called.
action_set – An optional subset of actions accepted in the next transition. For some environments, this may change at every transition.
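A dynamics object can also be driven directly on an ecole.scip.Model, without the surrounding environment. The sketch below assumes a BranchingDynamics, a placeholder instance file, and that ecole.RandomEngine can be constructed from an integer seed:

```python
import ecole

model = ecole.scip.Model.from_file("instance.lp")  # placeholder path
dynamics = ecole.dynamics.BranchingDynamics()
dynamics.set_dynamics_random_state(model, ecole.RandomEngine(42))  # assumed seed constructor
done, action_set = dynamics.reset_dynamics(model)
while not done:
    done, action_set = dynamics.step_dynamics(model, action_set[0])  # placeholder policy
```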
Listing¶
Branching¶
- class ecole.environment.Branching(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]¶
- class ecole.dynamics.BranchingDynamics¶
- __init__(self: ecole.dynamics.BranchingDynamics, pseudo_candidates: bool = False) → None¶
- reset_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model) → Tuple[bool, Optional[xt::xtensor]]¶
- set_dynamics_random_state(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None¶
- step_dynamics(self: ecole.dynamics.BranchingDynamics, model: ecole.scip.Model, action: int) → Tuple[bool, Optional[xt::xtensor]]¶
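In the Branching environment, the action set contains the indices of the branching candidates, and the action is one of those indices. A common sketch pairs it with the StrongBranchingScores observation function to branch on the best-scoring candidate (the instance generator and its parameters are illustrative):

```python
import ecole

env = ecole.environment.Branching(
    observation_function=ecole.observation.StrongBranchingScores(),
)
instances = ecole.instance.SetCoverGenerator(n_rows=100, n_cols=200)  # illustrative generator

scores, action_set, reward_offset, done, info = env.reset(next(instances))
while not done:
    # Branch on the candidate with the highest strong branching score.
    action = action_set[scores[action_set].argmax()]
    scores, action_set, reward, done, info = env.step(action)
```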
Configuring¶
- class ecole.environment.Configuring(observation_function='default', reward_function='default', information_function='default', scip_params=None, **dynamics_kwargs)[source]¶
- class ecole.dynamics.ConfiguringDynamics¶
- __init__(self: ecole.dynamics.ConfiguringDynamics) → None¶
- reset_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model) → Tuple[bool, None]¶
- set_dynamics_random_state(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, random_engine: ecole.RandomEngine) → None¶
- step_dynamics(self: ecole.dynamics.ConfiguringDynamics, model: ecole.scip.Model, action: Dict[str, Union[bool, int, float, str]]) → Tuple[bool, None]¶
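In the Configuring environment, an episode is a single transition: the action is a dictionary of SCIP parameters applied before the instance is solved. A minimal sketch (the instance path and the parameter value are illustrative):

```python
import ecole

env = ecole.environment.Configuring()

observation, action_set, reward_offset, done, info = env.reset("instance.lp")  # placeholder path
observation, action_set, reward, done, info = env.step(
    {"branching/scorefunc": "s"}  # illustrative SCIP parameter setting
)
```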