In Gymnasium v1.0, significant changes were made to improve the `VectorEnv` implementation. One of these changes is how sub-environments are reset on termination (or truncation), referred to as the autoreset mode or API. As of Gymnasium v1.1, Gymnasium’s built-in vector environments and wrappers support all autoreset modes, though the default with `make_vec` is next-step. In this blog post, we explain the differences between the possible modes and how to use them, with example training code for each.
Vector environments allow multiple sub-environments to run in parallel, improving training efficiency in Reinforcement Learning by sampling multiple episodes at the same time. What should the vector environment do when one or more sub-environments terminate or truncate, given that each must be reset before further actions can be taken? There are three options, referred to as Next-Step, Same-Step and Disabled mode, visualised in the figure below.
Gymnasium’s built-in vector environment implementations, `SyncVectorEnv` and `AsyncVectorEnv`, support all three modes through the `autoreset_mode` argument, which expects a `gym.vector.AutoresetMode`, for example `SyncVectorEnv(..., autoreset_mode=gym.vector.AutoresetMode.NEXT_STEP)`. Further, most of Gymnasium’s vector wrappers support all modes. For external projects, however, there is no guarantee which autoreset modes are supported by the vector environments, wrapper implementations or training algorithms. To help users know which autoreset mode is being used, `VectorEnv.metadata["autoreset_mode"]` should be specified, and developers can state in their documentation which autoreset modes are supported.
For Gymnasium, some of the vector wrappers only support particular autoreset modes.
Vector wrapper name | Next-Step | Same-Step | Disabled |
---|---|---|---|
VectorObservationWrapper | ✔ | ✖ | ✔ |
TransformObservation | ✔ | ✖ | ✔ |
NormalizeObservation | ✔ | ✖ | ✖ |
VectorizeTransformObservation * | ✔ | ✔ | ✔ |
RecordEpisodeStatistics | ✔ | ✔ | ✔ |
* All wrappers inheriting from `VectorizeTransformObservation` are compatible (`FilterObservation`, `FlattenObservation`, `GrayscaleObservation`, `ResizeObservation`, `ReshapeObservation`, `DtypeObservation`).
In Next-Step mode, if a sub-environment terminates or truncates, it is reset on the next step call. Gymnasium’s Async and Sync vector environments default to this mode. When implementing training algorithms with Next-Step mode, beware of episode boundaries: the transition returned by the step call that performs the reset is not a real environment transition, so either do not add it to the replay buffer or mask out the corresponding errors in rollout buffers.
In Same-Step mode, if a sub-environment terminates or truncates, it is reset within the same step call. Beware that some vector wrappers do not support this mode, and that the observation returned by the step is then the reset observation, with the final observation of the ended episode stored in `info["final_obs"]`. This makes it a simple approach for training algorithms if value errors with truncation are skipped. See this for details.
In Disabled mode, no automatic resetting occurs and users need to manually reset the sub-environments through a mask passed in the reset options, `env.reset(options={"reset_mask": np.array([True, False, ...], dtype=bool)})`. The easiest way of generating this mask is `np.logical_or(terminations, truncations)`. This makes training code closer to single-environment training code; however, it can be slower in some cases due to the additional reset call.
The autoreset mode has a significant impact on how RL training algorithms sample from environments, and it is not possible to convert between modes. Gymnasium v1.1 now supports all three autoreset implementations, with most of the wrappers supporting all of them, providing more options to developers and greater backward compatibility with Gymnasium v0 vectorised training algorithms.
If there are missing details or you have questions, please raise them on the Farama Discord or GitHub.