reinforcement learning, from David Silver's course. in autonomous vehicle training, more and more practice depends on simulator training, e.g. CARLA.
a commonly used open-source framework is OpenAI Gym.
each agent has a state, an observation, and actions, which can be defined through state_space, observation_space, and action_space.
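a minimal sketch of what such an environment can look like in Gym is below; the class name DrivingEnv, the 8-dimensional observation, the 4 discrete actions, and the dummy dynamics are illustrative assumptions, not a real CARLA wrapper:

```python
import numpy as np
import gym
from gym import spaces

class DrivingEnv(gym.Env):
    """Illustrative environment; names, shapes and dynamics are placeholders."""

    def __init__(self):
        # observation: a vector of sensor readings (continuous, here 8-dimensional)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)
        # action: turn left, turn right, accelerate, brake (discrete)
        self.action_space = spaces.Discrete(4)

    def reset(self):
        self._state = np.zeros(8, dtype=np.float32)
        return self._state

    def step(self, action):
        # dummy transition and reward, only to show the interface
        self._state = self._state + 0.01 * np.random.randn(8).astype(np.float32)
        reward = 0.0
        done = False
        return self._state, reward, done, {}
```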
background
1) infinite or finite :
in a real case, the state and observation spaces are easily infinite, e.g. in every frame the agent sees a different environment,
which means a different state. the elements in action_space may be limited, e.g. turn left, turn right, accelerate, brake, etc.
2) continuous or discrete :
when driving a car, the control is continuous, namely a continuous function of time t.
on the other hand, in a game like Go, the action is not tied to continuous time; each action is discrete from the next
(see the sketch after this list).
3) local or global perception
for a grid-world game, the agent knows the whole world; but when a car runs in a city, it can only perceive the environment locally.
this makes a big difference when making decisions.
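as a quick illustration of 1) and 2), here is how discrete and continuous action spaces look in Gym; the action ordering and the steering/throttle ranges are assumptions:

```python
import numpy as np
from gym import spaces

# discrete action space: 4 driving commands (illustrative ordering)
discrete_actions = spaces.Discrete(4)   # 0: left, 1: right, 2: accelerate, 3: brake

# continuous action space: steering in [-1, 1], throttle in [0, 1] (assumed ranges)
continuous_actions = spaces.Box(
    low=np.array([-1.0, 0.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
    dtype=np.float32)

print(discrete_actions.sample())     # e.g. 2
print(continuous_actions.sample())   # e.g. [0.13 0.78]
```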
the following talks about online policy RL, which means the agent follows some high-level policy, e.g. respect the velocity limit, do not collide with other cars, etc., while learning how to drive. we often need to define an environment reward, which is used to update the detail-level policy, e.g. when the agent is in stateA, policyI will be used. but keep in mind, neither the human nor the agent knows exactly at first which is the best detail-level value for the chosen policy, and the input (state, policy/action) is not even accurate. so the learning process is actually to iterate over episodes and update the table of (state, action) values, in which the policy is updated implicitly; this table is usually called the Q-value. after training, the (state, action) table handles the simulation environment better.
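as a concrete sketch, the per-step update of that (state, action) table usually looks like the standard tabular Q-learning rule; the learning rate alpha and discount gamma below are assumed values:

```python
from collections import defaultdict

alpha = 0.1    # learning rate (assumed value)
gamma = 0.99   # discount factor (assumed value)

# Q-table: maps a (state, action) pair to its estimated value
Q = defaultdict(float)

def update(state, action, reward, next_state, actions):
    """One tabular Q-learning update of the (state, action) table."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```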
so the high-level computing flow is an episode loop: observe the state, pick an action from the current policy, receive a reward, and update the (state, action) table. in the real case, this while loop basically updates the policy, so next time in a certain state, the agent will perform better. but there is a problem: most real problems don't have a Q-table, since the state space is infinite.
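a minimal sketch of that while loop with the Gym-style interface, reusing the Q table and update function from the sketch above; the episode count and the epsilon-greedy action choice are assumptions:

```python
import random

def train(env, episodes=500, epsilon=0.1):
    """High-level flow: run episodes, pick actions, update the Q-table."""
    actions = range(env.action_space.n)
    for _ in range(episodes):
        state = tuple(env.reset())          # make the state hashable for the table
        done = False
        while not done:
            # epsilon-greedy: mostly follow the current policy, sometimes explore
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done, _ = env.step(action)
            next_state = tuple(next_state)
            update(state, action, reward, next_state, actions)
            state = next_state
```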
1) linear space approximator
the real state space is high-dimensional, even infinite-dimensional. mathematically it is fine to use a linear space to approximate the real space (a Hilbert space) as closely as possible, e.g. in elastic mechanics, the finite element space can approximate the continuous elastic space to any accuracy.
so the idea is to construct the basis vectors of the linear space
from existing sample states; then every state can be described as a linear combination of the basis vectors.
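one possible sketch of this idea, assuming radial-basis features built from a handful of sample states as the basis vectors; the feature choice, dimensions and counts are all illustrative:

```python
import numpy as np

N_ACTIONS = 4                           # matches the driving actions above (assumed)
centers = np.random.randn(16, 8)        # 16 sample states of dimension 8 (assumed)
w = np.zeros(N_ACTIONS * len(centers))  # weights of the linear combination

def features(state, action):
    """Basis vector: radial-basis features around the sample states, one block per action."""
    phi = np.exp(-np.linalg.norm(centers - state, axis=1) ** 2)
    onehot = np.zeros(N_ACTIONS)
    onehot[action] = 1.0
    return np.outer(onehot, phi).ravel()

def q_value(state, action):
    # the value of a (state, action) pair is a linear combination of the basis features
    return w @ features(state, action)
```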
2) neural-network approximator
for nonlinear fitting, construct a neural network to describe the state space. how to train the network and how to make sure it is robust and convergent is another topic.
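a minimal sketch of such a network, written here with PyTorch and arbitrary layer sizes; training it (e.g. DQN-style) is the separate topic mentioned above:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a continuous state vector to one value per discrete action."""

    def __init__(self, state_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# greedy action for a single state; learning the weights is the separate topic
q_net = QNetwork()
state = torch.zeros(1, 8)
action = q_net(state).argmax(dim=1).item()
```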