gymbag.core¶
Simple, efficient OpenAI Gym environment recording and playback.
Functions

deserialize_space(…)
    Return a gym.Space reconstituted from bytes serialized by serialize_space().
drive_env(env: gym.core.Env, …)
    Run an env and return an iterable of episodes, each an iterable of step data.
generate_monotonic(…)
    Return monotonically increasing environment data for testing.
generate_random(…)
    Return environment data generated from the given functions.
get_actions(…)
getshape(obj: typing.Union[gym.core.Space, …)
    Return the shape of a gym.Space or np.ndarray, or () for a scalar.
null_sample(space: gym.core.Space) → TObs
    Return a null (invalid) sample appropriate for a given space.
record_from_iter(…)
    Read from an iterable of episode data and record to a recorder.
run_episode(env: gym.core.Env, …)
    Run an env for a single episode and return an iterable of step data.
serialize_space(space: gym.core.Space) → bytes
    Serialize a description of a space to JSON so recorders can store metadata.
set_done(data: typing.Tuple[TObs, …)
Classes

ListRecorder() → None
    Record environment data to a list in memory.
PlaybackAgent(…)
    An agent that plays back recorded actions, ignoring observations.
PlaybackEnv(…)
    A gym environment that plays back a sequence of recorded observations, ignoring actions.
RandomAgent(action_space: gym.core.Space, …)
    An agent that samples randomly from its action space, ignoring observations.
RandomEnv(…)
    An environment that generates random observations and rewards for testing.
Reader(*args, **kwargs) → None
    Abstract base class to iterate over recorded gym data and provide metadata like spaces and size.
RecordEnv(env: gym.core.Env, …)
    Wrap a gym environment and pass its data to a Recorder to record a specific format.
Recorder
    Abstract base class for Recorders.
-
gymbag.core.TObs = ~TObs¶ Type of observations (discrete, box)
-
gymbag.core.TAct = ~TAct¶ Type of actions (discrete, box)
-
gymbag.core.TStep¶ Type returned by step(): obs, reward, done, info. Alias of Tuple.
-
gymbag.core.TData¶ Type of data recorded at each step: time, obs, action, reward, done, info. Alias of Tuple.
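For illustration, a single TData row might look like the following (the values are hypothetical; the first row of an episode carries a placeholder action and a NaN reward):

```python
import math

# One recorded TData row: (unix_time, observation, action, reward, done, info).
row = (1514764800.0, [0.1, -0.2], None, math.nan, False, None)
unix_time, obs, action, reward, done, info = row
print(math.isnan(reward), done)  # True False
```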
-
class gymbag.core.Recorder[source]¶ Abstract base class for Recorders. Subclass to record to a particular format.
To delineate episodes on read, it is important that the last step in an episode have done = True. This may necessitate buffering a row if it cannot be modified after being written.
-
on_reset(unix_time: float, observation: TObs, action: TAct, reward: float = nan, done: bool = False, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType] = None) → None[source]¶ The default implementation calls on_step(). It may ignore action, reward, done, and info; they provide default values for fixed-column implementations that need to write something.
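The callback pattern can be sketched without gymbag installed. The class below is a hypothetical stand-in (not gymbag code) that mirrors the documented on_reset/on_step interface, including the default of on_reset delegating to on_step:

```python
import math

class ListLikeRecorder:
    """Hypothetical stand-in mirroring the documented Recorder callbacks."""

    def __init__(self):
        self.rows = []

    def on_reset(self, unix_time, observation, action=None,
                 reward=math.nan, done=False, info=None):
        # Mirrors the documented default implementation: just call on_step().
        self.on_step(unix_time, observation, action, reward, done, info)

    def on_step(self, unix_time, observation, action, reward, done, info):
        self.rows.append((unix_time, observation, action, reward, done, info))

rec = ListLikeRecorder()
rec.on_reset(0.0, [0.0])
rec.on_step(0.1, [0.1], 1, 1.0, True, None)  # last step must have done=True
print(len(rec.rows))  # 2
```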
-
class gymbag.core.RecordEnv(env: gym.core.Env, recorder: gymbag.core.Recorder[TObs, TAct]) → None[source]¶ Wrap a gym environment and pass its data to a Recorder to record a specific format.
Recorder callbacks are called after the corresponding environment call. Reset and step times are Unix UTC float epoch seconds, taken after the corresponding environment call.
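The wrap-and-forward pattern can be sketched with stand-in classes (a hypothetical sketch of the documented behavior, not gymbag's code — StubEnv, RecordingWrapper, and ListRec are illustrative names):

```python
import math
import time

class StubEnv:
    """Toy environment used only to exercise the wrapper below."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 2, {}

class RecordingWrapper:
    """Sketch of RecordEnv's pattern: forward each call to the wrapped env,
    then report the result to the recorder, timestamped after the env call."""
    def __init__(self, env, recorder):
        self.env, self.recorder = env, recorder
    def reset(self):
        obs = self.env.reset()
        self.recorder.on_reset(time.time(), obs, None, math.nan, False, None)
        return obs
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.recorder.on_step(time.time(), obs, action, reward, done, info)
        return obs, reward, done, info

class ListRec:
    def __init__(self):
        self.rows = []
    def on_reset(self, *row):
        self.rows.append(row)
    def on_step(self, *row):
        self.rows.append(row)

rec = ListRec()
env = RecordingWrapper(StubEnv(), rec)
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(0)
print(len(rec.rows))  # 3: the reset row plus two steps
```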
-
class gymbag.core.ListRecorder() → None[source]¶ Record environment data to a list in memory.
The list can be accessed with the .data attribute.
-
on_reset(unix_time: float, observation: TObs, action: TAct, reward: float = nan, done: bool = False, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType] = None) → None[source]¶ Start a new episode.
-
on_step(unix_time: float, observation: TObs, action: TAct, reward: float, done: bool, info: typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]) → None[source]¶ Record step data.
-
data¶ The list of recorded episodes.
-
class gymbag.core.Reader(*args, **kwargs) → None[source]¶ Abstract base class to iterate over recorded gym data as well as provide metadata like spaces and size.
-
observation_space = None¶ The observation space
-
action_space = None¶ The action space
-
nsteps = None¶ The total number of steps (in all episodes) in this Reader
-
class gymbag.core.PlaybackEnv(source: typing.Union[gymbag.core.Reader, typing.Iterable[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]], observation_space: typing.Union[gym.core.Space, NoneType] = None, action_space: typing.Union[gym.core.Space, NoneType] = None) → None[source]¶ A gym environment that plays back a sequence of recorded observations, ignoring actions.
End of playback can be detected by checking for env.played_out == True after each "done" step but before the next reset.
Parameters:
- source – Reader or Iterable over episodes, each of which is an iterable over steps, each of which is a tuple of (time, obs, action, reward, done, info). The action and reward from the first item in each episode are discarded. Note reset() returns the observation from the first item in each episode iterable, while the first step() returns the reward, done, and info from the second.
- observation_space – If given, override the observation_space attribute of the source.
- action_space – If given, override the action_space attribute of the source.
-
metadata = {'render.modes': ['human']}¶
-
played_out = None¶ True when the input iterable has been exhausted (no more data)
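The playback behavior can be sketched with a stand-in class (a hypothetical sketch, not gymbag code; for simplicity it buffers episodes in lists, and its exact played_out timing may differ from gymbag's):

```python
import math

class StubPlaybackEnv:
    """Stand-in for the documented playback behavior:
    replay recorded observations, ignore actions."""
    def __init__(self, source):
        self._episodes = [list(ep) for ep in source]
        self._idx = -1
        self.played_out = False

    def reset(self):
        self._idx += 1
        self._steps = iter(self._episodes[self._idx])
        # The first row's action and reward are discarded.
        _, obs, _, _, _, _ = next(self._steps)
        return obs

    def step(self, action):  # the action is ignored
        _, obs, _, reward, done, info = next(self._steps)
        if done and self._idx + 1 >= len(self._episodes):
            self.played_out = True  # no more recorded data
        return obs, reward, done, info

episodes = [[(0.0, "a", None, math.nan, False, None),
             (0.1, "b", 0, 1.0, True, None)]]
env = StubPlaybackEnv(episodes)
print(env.reset())     # a
print(env.step(None))  # ('b', 1.0, True, None)
print(env.played_out)  # True
```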
-
class gymbag.core.RandomEnv(observation_space: gym.core.Space, action_space: typing.Union[gym.core.Space, NoneType] = None, episode_steps: int = 10) → None[source]¶ An environment that generates random observations and rewards for testing.
Rewards are uniform in the range [0, 1].
Parameters: - action_space – Not used by this environment, but can be set for agents.
- episode_steps – Run each episode this number of steps.
-
spec¶
-
class gymbag.core.RandomAgent(action_space: gym.core.Space, seed: typing.Union[int, NoneType] = None) → None[source]¶ An agent that samples randomly from its action space, ignoring observations.
-
class gymbag.core.PlaybackAgent(source: typing.Union[gymbag.core.Reader, typing.Iterable[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]]]]) → None[source]¶ An agent that plays back recorded actions, ignoring observations.
Call next_episode() at the start of each episode, then act() until done. next_episode() returns False when there are no more episodes.
-
next_episode() → bool[source]¶ Advance to the next episode; return False if there are no more episodes.
-
act(observation: typing.Union[~TObs, NoneType] = None) → TAct[source]¶ Return the next recorded action, ignoring observation.
-
done¶ True when there is no more data for the current episode.
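The documented driving pattern (next_episode() per episode, then act() until done) can be exercised against a stand-in agent (a hypothetical sketch of the interface, not gymbag code):

```python
class StubPlaybackAgent:
    """Stand-in mirroring the documented PlaybackAgent interface."""
    def __init__(self, episodes):
        self._episodes = iter(episodes)
        self._actions = []
        self.done = True

    def next_episode(self):
        try:
            self._actions = list(next(self._episodes))
        except StopIteration:
            return False
        self.done = False
        return True

    def act(self, observation=None):
        action = self._actions.pop(0)
        if not self._actions:
            self.done = True  # no more data for this episode
        return action

# Drive two recorded episodes of actions: [0, 1] and [2].
agent = StubPlaybackAgent([[0, 1], [2]])
replayed = []
while agent.next_episode():
    while not agent.done:
        replayed.append(agent.act())
print(replayed)  # [0, 1, 2]
```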
-
gymbag.core.serialize_space(space: gym.core.Space) → bytes[source]¶ Serialize a description of a space to JSON so recorders can store metadata.
-
gymbag.core.deserialize_space(data: bytes) → gym.core.Space[source]¶ Return a gym.Space reconstituted from bytes serialized by serialize_space().
-
gymbag.core.record_from_iter(recorder: gymbag.core.Recorder[TObs, TAct], data: typing.Iterable[typing.Iterable[typing.Tuple[TObs, TAct]]]) → None[source]¶ Read from an iterable of episode data and record to a recorder. Can be used to convert formats or test Recorders.
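The replay presumably amounts to a loop like the one below: the first row of each episode goes to on_reset, the rest to on_step. This is a hypothetical sketch of that loop (record_episodes and CountingRecorder are illustrative names, not gymbag code):

```python
import math

def record_episodes(recorder, data):
    """Sketch of record_from_iter's documented behavior."""
    for episode in data:
        steps = iter(episode)
        recorder.on_reset(*next(steps))
        for step in steps:
            recorder.on_step(*step)

class CountingRecorder:
    def __init__(self):
        self.resets = 0
        self.steps = 0
    def on_reset(self, *row):
        self.resets += 1
    def on_step(self, *row):
        self.steps += 1

rec = CountingRecorder()
data = [[(0.0, (0, 0), None, math.nan, False, None),
         (0.1, (0, 1), 0, 10.0, True, None)],
        [(0.2, (1, 0), None, math.nan, False, None)]]
record_episodes(rec, data)
print(rec.resets, rec.steps)  # 2 1
```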
-
gymbag.core.generate_random(sample_observation: typing.Callable[TObs], sample_action: typing.Callable[TObs], sample_reward: typing.Callable[float] = <built-in method rand of mtrand.RandomState object>, episodes: int = 10, steps_per_episode: typing.Callable[int] = <function <lambda>>) → typing.Iterable[typing.Iterable[typing.Tuple[TObs, TAct]]][source]¶ Return environment data generated from the given functions.
Note you must evaluate every step before advancing to the next, or the iterator will be invalid. For example: don't store all episode iterators in a list; store all steps in a list of lists. Don't buffer or skip one episode and then evaluate the next; iterate over all steps of each episode in order. sample_observation, sample_action, and sample_reward are called in that order. Each episode returns one more item than steps_per_episode, to account for the initial reset.
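This single-pass constraint arises whenever nested generators share state (here, the live environment). A plain-Python illustration of why buffering episode iterators produces wrong data:

```python
def make_episodes():
    # A shared counter stands in for a live environment's state.
    counter = {"t": 0}
    def episode(ep):
        for _ in range(2):
            counter["t"] += 1
            yield (ep, counter["t"])
    for ep in range(2):
        yield episode(ep)

# Correct: consume all steps of each episode, in order.
in_order = [list(e) for e in make_episodes()]
# in_order == [[(0, 1), (0, 2)], [(1, 3), (1, 4)]]

# Incorrect: buffering the episode iterators and consuming them later
# reads the shared state at the wrong times.
buffered = list(make_episodes())
out_of_order = list(buffered[1])
# out_of_order == [(1, 1), (1, 2)] -- the counter values are wrong
```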
-
gymbag.core.generate_monotonic(episodes: int = 10) → typing.Iterable[typing.Iterable[typing.Tuple[numpy.ndarray, int]]][source]¶ Return monotonically increasing environment data for testing.
If episode is the zero-based episode number and step is the zero-based step number within an episode: the observation is the pair (episode, step); the action is step; the reward is step * 10, with the first reward NaN. Episode i runs for i + 1 steps, counting the reset as the first step. The observation space is Box(0, episodes, shape=(2,)). The action space can be treated as either Discrete(episodes) or Box(0, episodes, shape=()).
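The documented pattern can be reproduced in plain Python for reference (plain tuples stand in for numpy arrays; this is a sketch of the specification above, not gymbag's implementation):

```python
import math

def monotonic(episodes=3):
    for ep in range(episodes):
        def steps(ep=ep):
            for step in range(ep + 1):  # episode i runs i + 1 steps
                reward = math.nan if step == 0 else step * 10.0
                yield ((ep, step), step, reward)  # (observation, action, reward)
        yield steps()

data = [list(e) for e in monotonic()]
print(data[2])
# [((2, 0), 0, nan), ((2, 1), 1, 10.0), ((2, 2), 2, 20.0)]
```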
-
gymbag.core.run_episode(env: gym.core.Env, actions: typing.Union[typing.Iterable[~TAct], NoneType] = None, max_steps: typing.Union[int, NoneType] = None) → typing.Union[typing.Iterable[typing.Tuple[float, ~TObs, ~TAct, float, bool, typing.Union[typing.Dict[typing.Any, typing.Any], NoneType]]], NoneType][source]¶ Run an env for a single episode and return an iterable of step data.
Note this may not mark done = True on the last step if the environment did not return done. The first step data returned has a "null" action and reward (they should be ignored). If actions is exhausted, returns None.
Parameters:
- actions – Step the environment with these actions and stop when exhausted. If None, sample actions randomly from the action space indefinitely.
- max_steps – If given, run at most this many steps per episode.
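The documented row format and the leading "null" reset row can be sketched as a loop (a hypothetical sketch of the behavior described above, not gymbag's code; CountEnv is an illustrative stub):

```python
import math
import time

class CountEnv:
    """Toy stand-in for a gym env: counts to 3, then reports done."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

def run_one_episode(env, actions):
    """Yield (time, obs, action, reward, done, info) rows;
    the first row carries a null action and a NaN reward."""
    obs = env.reset()
    yield (time.time(), obs, None, math.nan, False, None)
    for action in actions:
        obs, reward, done, info = env.step(action)
        yield (time.time(), obs, action, reward, done, info)
        if done:
            break

rows = list(run_one_episode(CountEnv(), iter([0, 0, 0, 0])))
print(len(rows))  # 4: the reset row plus three steps
```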
-
gymbag.core.drive_env(env: gym.core.Env, actions: typing.Union[typing.Iterable[~TAct], NoneType] = None, episodes: typing.Union[int, NoneType] = None, steps_per_episode: typing.Union[int, NoneType] = None) → typing.Iterable[typing.Iterable[typing.Tuple[TObs, TAct]]][source]¶ Run an env and return an iterable of episodes, each an iterable of step data.
Note this may not mark done = True on the last step of each episode if the environment did not return done.
Parameters:
- actions – Step the environment with these actions and stop when exhausted. If None, sample actions randomly from the action space indefinitely.
- episodes – If given, run at most this many episodes.
- steps_per_episode – If given, run at most this many steps per episode.
-
gymbag.core.getshape(obj: typing.Union[gym.core.Space, numpy.ndarray, float]) → typing.Tuple[int, ...][source]¶ Return the shape of a gym.Space or np.ndarray, or () for a scalar.
-
gymbag.core.null_sample(space: gym.core.Space) → TObs[source]¶ Return NaN, zero, or whatever is appropriate as a null (invalid) sample for a given space.