gymbag.core

Simple, efficient OpenAI Gym environment recording and playback.
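
A minimal record-then-playback round trip, sketched under the assumption that gym and its 'CartPole-v1' environment are installed (neither is part of gymbag):

    import gym
    from gymbag.core import ListRecorder, PlaybackEnv, RecordEnv, drive_env

    # Record two episodes of random play to memory.
    base = gym.make('CartPole-v1')
    recorder = ListRecorder()
    env = RecordEnv(base, recorder)
    for episode in drive_env(env, episodes=2):
        for step in episode:      # steps must be consumed in order
            pass
    env.close()

    # Replay the first recorded episode; the actions passed to step() are ignored.
    playback = PlaybackEnv(recorder.data,
                           observation_space=base.observation_space,
                           action_space=base.action_space)
    obs = playback.reset()
    done = False
    while not done:
        obs, reward, done, info = playback.step(base.action_space.sample())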

Functions

deserialize_space(…)
Returns: a gym.Space reconstituted from bytes serialized by serialize_space().
drive_env(env: gym.core.Env, …) Run an env and return an iterable of episodes, each an iterable of step data.
generate_monotonic(…)
Returns: monotonically increasing environment data for testing.
generate_random(…)
Returns: environment data generated from the given functions.
get_actions(…)
Returns: an iterable of episode data containing just the valid actions (starting from the 2nd step) from episodes.
getshape(obj: typing.Union[gym.core.Space, …)
Returns: the shape of a gym.Space or np.ndarray, or () for a scalar.
null_sample(space: gym.core.Space) -> TObs
Returns: NaN or zero or whatever is appropriate as a null (invalid) sample for a given space.
record_from_iter(…) Read from an iterable of episode data and record to a recorder.
run_episode(env: gym.core.Env, …) Run an env for a single episode and return an iterable of step data.
serialize_space(space: gym.core.Space) -> bytes Serialize a description of a space to JSON so recorders can store metadata.
set_done(data: typing.Tuple[TObs, …)
Returns: a new data tuple with the done field set to True.

Classes

ListRecorder() -> None Record environment data to a list in memory.
PlaybackAgent(…) An agent that plays back recorded actions, ignoring observations.
PlaybackEnv(…) A gym environment that plays back a sequence of recorded observations, ignoring actions.
RandomAgent(action_space: gym.core.Space, …) An agent that samples randomly from its action space, ignoring observations.
RandomEnv(…) An environment that generates random observations and rewards for testing.
Reader(*args, **kwargs) -> None Abstract base class to iterate over recorded gym data as well as provide metadata like spaces and size.
RecordEnv(env: gym.core.Env, …) Wrap a gym environment and pass its data to a Recorder to record a specific format.
Recorder Abstract base class for Recorders.
gymbag.core.TObs = ~TObs

Type of observations (discrete, box)

gymbag.core.TAct = ~TAct

Type of actions (discrete, box)

gymbag.core.TStep

Type returned by step(): obs, reward, done, info

alias of Tuple

gymbag.core.TData

Type of data recorded at each step: time, obs, action, reward, done, info

alias of Tuple
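
For illustration, concrete instances of these tuples might look like this (all values are made up):

    import math

    # TStep -- what env.step() returns: (obs, reward, done, info)
    step = ([0.1, 0.2], 1.0, False, None)

    # TData -- what gets recorded: (time, obs, action, reward, done, info);
    # the reset row carries a null action and a NaN reward.
    reset_row = (1514764800.0, [0.0, 0.0], 0, math.nan, False, None)
    step_row = (1514764800.5, [0.1, 0.2], 1, 1.0, False, None)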

class gymbag.core.Recorder[source]

Abstract base class for Recorders. Subclass to record to a particular format.

In order to delineate episodes on read, it is important that the last step in an episode have done = True. This may necessitate buffering a row if it cannot be modified after it is written (see the sketch below).

on_reset(unix_time: float, observation: TObs, action: TAct, reward: float = nan, done: bool = False, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType] = None) → None[source]

Default implementation calls on_step().

Implementations can ignore action, reward, done, and info; they carry default values for fixed-column implementations that need to write something.

on_step(unix_time: float, observation: TObs, action: TAct, reward: float, done: bool, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]) → None[source]

Record step data. By default this is also called on reset with the first observation and NaN action and reward.

on_close() → None[source]

Close this Recorder.
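
As a sketch of the buffering pattern described above, here is a hypothetical CSV-backed Recorder; CSVRecorder and its file layout are invented for illustration and are not part of gymbag. It holds back one row so the final row of each episode can be written with done forced to True:

    import csv
    from gymbag.core import Recorder

    class CSVRecorder(Recorder):
        """Hypothetical recorder: buffers one row so the last row of each
        episode can be written with done forced to True."""

        def __init__(self, filename):
            self._file = open(filename, 'w', newline='')
            self._writer = csv.writer(self._file)
            self._pending = None        # last row seen, not yet written

        def on_reset(self, unix_time, observation, action, reward=float('nan'),
                     done=False, info=None):
            # A new episode starts: the buffered row was the previous
            # episode's last step, so flush it with done = True.
            self._flush(force_done=True)
            self._pending = [unix_time, observation, action, reward, done, info]

        def on_step(self, unix_time, observation, action, reward, done, info):
            # Another step in the same episode: the buffered row was not
            # the last one, so write it unchanged.
            self._flush(force_done=False)
            self._pending = [unix_time, observation, action, reward, done, info]

        def on_close(self):
            self._flush(force_done=True)    # last row of the final episode
            self._file.close()

        def _flush(self, force_done):
            if self._pending is not None:
                if force_done:
                    self._pending[4] = True     # index 4 is the done field
                self._writer.writerow(self._pending)
                self._pending = None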

class gymbag.core.RecordEnv(env: gym.core.Env, recorder: gymbag.core.Recorder[TObs, TAct]) → None[source]

Wrap a gym environment and pass its data to a Recorder to record a specific format.

Recorder callbacks are invoked after the corresponding environment call returns. Reset and step times are Unix UTC epoch seconds (float), taken after the corresponding environment call.

class gymbag.core.ListRecorder() → None[source]

Record environment data to a list in memory.

The list can be accessed with the .data attribute.

on_reset(unix_time: float, observation: TObs, action: TAct, reward: float = nan, done: bool = False, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType] = None) → None[source]

Start a new episode.

on_step(unix_time: float, observation: TObs, action: TAct, reward: float, done: bool, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]) → None[source]

Record step data.

data

The list of recorded episodes.
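
For example, a ListRecorder can be filled from generated data with record_from_iter() (documented below) and inspected directly:

    from gymbag.core import ListRecorder, generate_monotonic, record_from_iter

    rec = ListRecorder()
    record_from_iter(rec, generate_monotonic(episodes=3))
    for episode in rec.data:
        print(len(episode), episode[-1])   # each episode is a list of TData tuples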

class gymbag.core.Reader(*args, **kwargs) → None[source]

Abstract base class to iterate over recorded gym data as well as provide metadata like spaces and size.

observation_space = None

The observation space

action_space = None

The action space

nsteps = None

The total number of steps (in all episodes) in this Reader

close() → None[source]

Close this Reader.

class gymbag.core.PlaybackEnv(source: typing.Union[gymbag.core.Reader, typing.Iterable[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]], observation_space: typing.Union[gym.core.Space, NoneType] = None, action_space: typing.Union[gym.core.Space, NoneType] = None) → None[source]

A gym environment that plays back a sequence of recorded observations, ignoring actions.

End of playback can be detected by checking for env.played_out == True after each “done” step but before the next reset.

Parameters:
  • source – Reader or Iterable over episodes, each of which is an iterable over steps, each of which is a tuple of (time, obs, action, reward, done, info). The action and reward from the first item in each episode are discarded. Note reset() returns the observation from the first item in each episode iterable, while the first step() returns the reward, done, and info from the second.
  • observation_space – If given, override the observation_space attribute of the source.
  • action_space – If given, override the action_space attribute of the source.
metadata = {'render.modes': ['human']}
played_out = None

True when input iterable has been exhausted (no more data)
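
A sketch of the played_out check, replaying generated data (the spaces passed here follow generate_monotonic()'s documented spaces):

    from gym.spaces import Box, Discrete
    from gymbag.core import PlaybackEnv, generate_monotonic

    env = PlaybackEnv(generate_monotonic(episodes=2),
                      observation_space=Box(0, 2, shape=(2,)),
                      action_space=Discrete(2))
    while not env.played_out:
        obs = env.reset()
        done = False
        while not done:
            # Actions are ignored during playback; any sample will do.
            obs, reward, done, info = env.step(env.action_space.sample())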

class gymbag.core.RandomEnv(observation_space: gym.core.Space, action_space: typing.Union[gym.core.Space, NoneType] = None, episode_steps: int = 10) → None[source]

An environment that generates random observations and rewards for testing.

Rewards are uniform in the range [0, 1].

Parameters:
  • action_space – Not used by this environment, but can be set for agents.
  • episode_steps – Run each episode this number of steps.
spec
class gymbag.core.RandomAgent(action_space: gym.core.Space, seed: typing.Union[int, NoneType] = None) → None[source]

An agent that samples randomly from its action space, ignoring observations.

act(observation: typing.Union[~TObs, NoneType] = None) → TAct[source]
Returns: a random action sampled from the action space, ignoring observation.
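
RandomEnv and RandomAgent combine naturally for smoke tests; a minimal sketch with arbitrary spaces:

    from gym.spaces import Box
    from gymbag.core import RandomAgent, RandomEnv

    env = RandomEnv(Box(0.0, 1.0, shape=(3,)),
                    action_space=Box(-1.0, 1.0, shape=(1,)),
                    episode_steps=5)
    agent = RandomAgent(env.action_space, seed=0)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(agent.act(obs))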
class gymbag.core.PlaybackAgent(source: typing.Union[gymbag.core.Reader, typing.Iterable[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]]) → None[source]

An agent that plays back recorded actions, ignoring observations.

Call next_episode() at the start of each episode, then act() until done. next_episode() returns False when there are no more episodes.

next_episode() → bool[source]

Advance to the next episode, return False if there are no more episodes.

act(observation: typing.Union[~TObs, NoneType] = None) → TAct[source]
Returns: the next recorded action, ignoring observation.
done

True when there is no more data for the current episode.
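
The episode/act loop from the description above, sketched against generated data:

    from gymbag.core import PlaybackAgent, generate_monotonic

    agent = PlaybackAgent(generate_monotonic(episodes=2))
    while agent.next_episode():
        while not agent.done:
            action = agent.act()   # next recorded action; observation ignored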

gymbag.core.serialize_space(space: gym.core.Space) → bytes[source]

Serialize a description of a space to JSON so recorders can store metadata.

gymbag.core.deserialize_space(data: bytes) → gym.core.Space[source]
Returns: a gym.Space reconstituted from bytes serialized by serialize_space().
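
A round trip through the two functions, assuming gym.spaces is importable:

    from gym.spaces import Discrete
    from gymbag.core import deserialize_space, serialize_space

    blob = serialize_space(Discrete(4))    # JSON bytes describing the space
    space = deserialize_space(blob)        # equivalent gym.Space reconstituted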
gymbag.core.record_from_iter(recorder: gymbag.core.Recorder[TObs, TAct], data: typing.Iterable[typing.Iterable[typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]) → None[source]

Read from an iterable of episode data and record to a recorder. Can be used to convert formats or test Recorders.

gymbag.core.generate_random(sample_observation: typing.Callable[[], TObs], sample_action: typing.Callable[[], TAct], sample_reward: typing.Callable[[], float] = <built-in method rand of mtrand.RandomState object>, episodes: int = 10, steps_per_episode: typing.Callable[[], int] = <function <lambda>>) → typing.Iterable[typing.Iterable[typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]][source]
Returns: environment data generated from the given functions.

Note you must evaluate every step before advancing to the next, or the iterator will be invalid: don't store the lazy episode iterators themselves in a list; store all steps in a list of lists. Don't buffer or skip one episode and evaluate the next; iterate over all steps of each episode in order. sample_observation, sample_action, and sample_reward are called in that order. Each episode yields 1 more item than steps_per_episode to account for the initial reset.
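
A sketch of the correct consumption pattern; zero-argument sample callables are assumed here, matching the defaults:

    import numpy as np
    from gymbag.core import generate_random

    data = generate_random(
        sample_observation=lambda: np.random.rand(2),
        sample_action=lambda: np.random.randint(3),
        episodes=2,
        steps_per_episode=lambda: 4)

    # Right: materialize every step of each episode, in order.
    episodes = [[step for step in episode] for episode in data]

    # Wrong: list(data) would store lazy episode iterators without
    # evaluating their steps, invalidating them.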

gymbag.core.generate_monotonic(episodes: int = 10) → typing.Iterable[typing.Iterable[typing.Tuple[float, numpy.ndarray, int, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]][source]
Returns: monotonically increasing environment data for testing.

If episode is the zero-based episode number, and step is the zero-based step number within an episode: the observation is a pair of (episode, step); the action is step; the reward is step * 10, and the first reward is NaN. Episode i runs for i + 1 steps, counting the reset as the first step. The observation space is Box(0, episodes, shape=(2,)); the action space can be treated as either Discrete(episodes) or Box(0, episodes, shape=()).

gymbag.core.run_episode(env: gym.core.Env, actions: typing.Union[typing.Iterable[~TAct], NoneType] = None, max_steps: typing.Union[int, NoneType] = None) → typing.Union[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]], NoneType][source]

Run an env for a single episode and return an iterable of step data.

Note this may not mark done = True on the last step if the environment did not return done. The first step data returned has a “null” action and reward (they should be ignored). If actions is exhausted, None is returned.

Parameters:
  • actions – Step the environment with these actions and stop when exhausted. If None, sample actions randomly from the action space indefinitely.
  • max_steps – If given, run at most this many steps per episode.
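
A sketch of running one random episode, again assuming gym's 'CartPole-v1' is installed:

    import gym
    from gymbag.core import run_episode

    env = gym.make('CartPole-v1')
    steps = list(run_episode(env, max_steps=20))
    unix_time, obs, action, reward, done, info = steps[0]
    # The first tuple's action and reward are null placeholders; ignore them.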
gymbag.core.drive_env(env: gym.core.Env, actions: typing.Union[typing.Iterable[~TAct], NoneType] = None, episodes: typing.Union[int, NoneType] = None, steps_per_episode: typing.Union[int, NoneType] = None) → typing.Iterable[typing.Iterable[typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]][source]

Run an env and return an iterable of episodes, each an iterable of step data.

Note this may not mark done = True on the last step of each episode if the environment did not return done.

Parameters:
  • actions – Step the environment with these actions and stop when exhausted. If None, sample actions randomly from the action space indefinitely.
  • episodes – If given, run at most this many episodes.
  • steps_per_episode – If given, run at most this many steps per episode.
gymbag.core.getshape(obj: typing.Union[gym.core.Space, numpy.ndarray, float]) → typing.Tuple[int, ...][source]
Returns: the shape of a gym.Space or np.ndarray, or () for a scalar.
gymbag.core.null_sample(space: gym.core.Space) → TObs[source]
Returns: NaN or zero or whatever is appropriate as a null (invalid) sample for a given space.
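
For example (the exact null value returned for a given space is up to the library):

    from gym.spaces import Box, Discrete
    from gymbag.core import getshape, null_sample

    getshape(Box(0.0, 1.0, shape=(3,)))   # -> (3,)
    getshape(1.5)                          # -> () for a scalar
    null_sample(Discrete(5))               # a null (invalid) placeholder sample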
gymbag.core.get_actions(episodes: typing.Iterable[typing.Iterable[typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]) → typing.Iterable[typing.Iterable[TAct]][source]
Returns: an iterable of episode data containing just the valid actions (starting from the 2nd step) from episodes.
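
For instance, extracting the per-episode action streams from generated data:

    from gymbag.core import generate_monotonic, get_actions

    for actions in get_actions(generate_monotonic(episodes=3)):
        print(list(actions))   # valid actions only; the reset's null action is dropped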
gymbag.core.set_done(data: typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]) → typing.Tuple[float, TObs, TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]][source]
Returns: a new data tuple with the done field set to True.