# Use Reward Functions

Similarly to observation functions, the reward received by the user for learning can be customized by changing the `RewardFunction` used by the solver.
In fact, the mechanism of reward functions is very similar to that of observation functions: environments do not compute the reward directly but delegate that responsibility to a `RewardFunction` object.
The object has complete access to the solver and extracts the data it needs.

Specifying a reward function is done by passing the `RewardFunction` object to the `reward_function` environment parameter.
For instance, specifying a reward function with the `Configuring` environment looks as follows:

```
>>> env = ecole.environment.Configuring(reward_function=ecole.reward.LpIterations())
>>> env.reward_function
ecole.reward.LpIterations()
>>> env.reset("path/to/problem")
(..., ..., 0.0, ..., ...)
>>> env.step({})
(..., ..., 45.0, ..., ...)
```

Environments also have a default reward function, which will be used if the user does not specify any.

```
>>> env = ecole.environment.Configuring()
>>> env.reward_function
ecole.reward.IsDone()
```

See the reference for the list of available reward functions, as well as the documentation for explanations on how to create one.
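As a taste of what creating one involves, here is a minimal sketch of a custom reward function in pure Python. It assumes Ecole's duck-typed function interface, in which the environment calls `before_reset(model)` when a new episode starts and `extract(model, done)` after every transition; the `StepCounter` class itself is a hypothetical example, so check the reference documentation for the exact signatures in your version.

```python
class StepCounter:
    """Hypothetical reward function returning the number of transitions so far."""

    def before_reset(self, model):
        # Called when a new episode starts: restart the count.
        self.count = 0

    def extract(self, model, done):
        # Called after every transition: return the reward.
        self.count += 1
        return self.count
```

An instance of this class could then be passed to an environment like any built-in reward function, e.g. `reward_function=StepCounter()`.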

## Arithmetic on Reward Functions

Reinforcement learning in combinatorial optimization solving is an active area of research, and there is at this point little consensus on which reward functions to use. In recognition of that fact, reward functions have been explicitly designed in Ecole to be easily combined with Python arithmetic.

For instance, one might want to minimize the number of LP iterations used throughout the solving process.
To achieve this with a standard reinforcement learning algorithm, one might use the negative
number of LP iterations between two steps as a reward: this can be achieved by negating the
`LpIterations` function.

```
>>> env = ecole.environment.Configuring(reward_function=-ecole.reward.LpIterations())
>>> env.reset("path/to/problem")
(..., ..., -0.0, ..., ...)
>>> env.step({})
(..., ..., -45.0, ..., ...)
```

More generally, any operation, such as

```
from ecole.reward import LpIterations
-3.5 * LpIterations() ** 2.1 + 4.4
```

is valid.

Note that this is a full reward *function* object that can be given to an environment:
it is equivalent to doing the following.

```
>>> env = ecole.environment.Configuring(reward_function=ecole.reward.LpIterations())
>>> env.reset("path/to/problem")
(..., ..., ..., ..., ...)
>>> _, _, lp_iter_reward, _, _ = env.step({})
>>> reward = -3.5 * lp_iter_reward ** 2.1 + 4.4
```

Arithmetic operations are even allowed between different reward functions,

```
from ecole.reward import LpIterations, IsDone
4.0 * LpIterations() ** 2 - 3 * IsDone()
```

which is especially powerful because otherwise it would *not* be possible to pass both `LpIterations` and `IsDone` to the environment.

All operations that are valid between scalars are valid between reward functions.

```
-IsDone() ** abs(LpIterations() // 4)
```
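One way to understand such expressions is as operator overloading that lazily builds new reward function objects, deferring all computation until a reward is actually extracted. The following is an illustrative, pure-Python sketch of that idea, not Ecole's actual implementation; the `RewardFunction` wrapper, its `extract` method, and the `lp_iterations` stand-in are all hypothetical.

```python
class RewardFunction:
    """Illustrative wrapper around a callable producing a reward from some state."""

    def __init__(self, fn):
        self.fn = fn

    def extract(self, state):
        return self.fn(state)

    # Arithmetic operators lazily build new RewardFunction objects.
    def __neg__(self):
        return RewardFunction(lambda s: -self.extract(s))

    def __mul__(self, other):
        return RewardFunction(lambda s: self.extract(s) * _value(other, s))

    __rmul__ = __mul__

    def __add__(self, other):
        return RewardFunction(lambda s: self.extract(s) + _value(other, s))

    __radd__ = __add__

    def __pow__(self, other):
        return RewardFunction(lambda s: self.extract(s) ** _value(other, s))


def _value(other, state):
    # Operands may be scalars or other reward functions.
    return other.extract(state) if isinstance(other, RewardFunction) else other


# A stand-in for LpIterations: reads the iteration count from a dict state.
lp_iterations = RewardFunction(lambda state: state["lp_iters"])

combined = -3.5 * lp_iterations ** 2 + 4.4
print(combined.extract({"lp_iters": 2.0}))  # -3.5 * 4.0 + 4.4
```

Each operator returns a fresh `RewardFunction`, which is why arbitrary expressions remain valid function objects that an environment could call later.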

In addition, not all commonly used mathematical operations have a dedicated Python operator: to
accommodate this, Ecole implements a number of other operations as methods of reward functions.
For instance, to get the exponential of `LpIterations`, one can use

```
LpIterations().exp()
```

This also works with reward functions created from arithmetic expressions.

```
(3 - 2 * LpIterations()).exp()
```

Finally, reward functions have an `apply` method to compose rewards with any function.

```
import math

LpIterations().apply(lambda reward: math.factorial(round(reward)))
```