Note here that the sequence of actions is precomputed by the agent.
Subsequent actions are taken from the front of the queue, without recomputation.
Only when the queue of actions is exhausted does the S-P-S-A compute new moves.
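
The queueing behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual implementation: the class name `QueueingAgent`, the `planner` callback, the `plan_calls` counter, and `toy_planner` are all hypothetical names introduced here. The planner is assumed to map a state to a list of actions.

```python
from collections import deque
from typing import Callable, Deque, Hashable, List, Optional


class QueueingAgent:
    """Hypothetical agent that plans once, then replays the plan from a queue."""

    def __init__(self, planner: Callable[[Hashable], List[str]]) -> None:
        self.planner = planner            # assumed: maps a state to a list of actions
        self.plan: Deque[str] = deque()   # actions still waiting to be executed
        self.plan_calls = 0               # how many times the planner has run

    def __call__(self, percept: Hashable) -> Optional[str]:
        # Plan only when the queue is empty; otherwise the percept is ignored.
        if not self.plan:
            self.plan_calls += 1
            self.plan.extend(self.planner(percept))
        if not self.plan:
            return None                   # the planner found nothing to do
        return self.plan.popleft()


def toy_planner(state: Hashable) -> List[str]:
    # Hypothetical planner: always returns the same three-step plan.
    return ["right", "right", "up"]


agent = QueueingAgent(toy_planner)
actions = [agent(state) for state in ["A", "B", "C", "D"]]
print(actions)            # the fourth action comes from a fresh plan
print(agent.plan_calls)   # the planner ran only twice for four actions
```

In this sketch the percepts "B" and "C" are never inspected, because the queue is still non-empty when they arrive. The agent only looks at the world again once the plan has been used up.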
To think about: what does this mean in terms of unexpected results of actions and noisy environments?