Deep Q Learning¶
DQNAgent Class¶
class pykitml.DQNAgent(layer_sizes, mem_size=10000)¶

This class implements Double Deep Q Learning for reinforcement learning. It uses a neural network with Huber loss for predicting Q values.
__init__(layer_sizes, mem_size=10000)¶

Parameters:
- layer_sizes (list) – A list of integers describing the number of layers and the number of neurons in each layer. For example, [784, 100, 100, 10] describes a network with an input layer having 784 neurons, two hidden layers having 100 neurons each, and an output layer with 10 neurons.
- mem_size (int) – The size of the experience replay buffer.
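For instance, a minimal construction sketch (the layer sizes here are illustrative, not prescribed by the library):

import pykitml as pk

# Q network mapping a 4-dimensional state to Q values for 2 actions,
# with two 64-neuron hidden layers; replay buffer holds 10000 samples.
agent = pk.DQNAgent([4, 64, 64, 2], mem_size=10000)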
train(env, nepisodes, optimizer, batch_size=64, render=False, update_freq=1, explr_rate=1, explr_min=0.01, explr_decay=0.99, disc_factor=0.95)¶

Trains the agent on the given environment using deep Q learning. A usage sketch follows the parameter list below.
Parameters:
- env (obj) – Object that represents the environment. See Environment Class.
- nepisodes (int) – Number of episodes to train for.
- optimizer (any Optimizer object) – See Optimizers.
- batch_size (int) – How many samples from experience replay to train on at each step.
- render (bool) – If set to true, will call the render() method of the env object.
- update_freq (int) – How often, in episodes, to update the target model.
- explr_rate (float) – Initial exploration rate. The higher the exploration rate, the more random actions the agent will take.
- explr_min (float) – Minimum exploration rate.
- explr_decay (float) – Multiplication factor for reducing the exploration rate after each episode.
- disc_factor (float) – Discount factor; determines the value of future rewards.
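For example, a sketch of a training call, assuming env implements the Environment interface documented below (the layer sizes, learning rate, and episode count are illustrative):

import pykitml as pk

# env is assumed to implement the Environment interface below.
agent = pk.DQNAgent([4, 64, 64, 2])

# Train for 100 episodes with the Adam optimizer; the exploration rate
# decays from 1.0 towards 0.01 by a factor of 0.99 after each episode.
agent.train(
    env, 100, pk.Adam(0.001),
    batch_size=64, update_freq=1,
    explr_rate=1, explr_min=0.01, explr_decay=0.99,
    disc_factor=0.95
)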
exploit(env, render=False)¶

Exploits the trained model to make decisions. No training occurs. Use this to demo the trained agent.

Parameters:
- env (obj) – Object that represents the environment. See Environment Class.
- render (bool) – If set to true, will call the render() method of the env object.
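For instance, assuming agent has already been trained on env:

# Demo the trained agent; no training occurs.
agent.exploit(env, render=True)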
Environment Class¶

class pykitml.Environment¶
reset()¶

Resets the environment and returns the initial state.

Returns: initial_state – The initial state of the environment as a numpy array.
Return type: np.array
step(action)¶

Performs the given action, modifies the current state to the next state, and returns the next state, the reward, and a bool telling whether the episode has terminated.

Returns:
- next_state (np.array) – The next state of the environment as a numpy array.
- reward (int) – An integer telling the agent how well it performed.
- done (bool) – Flag telling whether the environment has reached a terminal state.
close()¶

Called after training is complete; properly closes/exits the environment.
render()¶

Renders the environment, showing a visual representation of it.
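Putting the interface together, a minimal custom environment might look like the following sketch (the toy counter dynamics are purely illustrative and not part of pykitml):

import numpy as np

class CounterEnvironment:
    # Toy environment: the agent should move a counter towards a target.
    def __init__(self, target=5):
        self._target = target
        self._count = 0

    def reset(self):
        self._count = 0
        return np.array([self._count])

    def step(self, action):
        # Action 1 increments the counter, action 0 decrements it.
        self._count += 1 if action == 1 else -1
        reward = 1 if self._count == self._target else 0
        done = abs(self._count) >= self._target
        return np.array([self._count]), reward, done

    def close(self):
        pass

    def render(self):
        print('count:', self._count)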
Example: Cartpole using OpenAI Gym¶
import numpy as np
import gym

import pykitml as pk

# Wrapper class around the environment
class Environment:
    def __init__(self):
        self._env = gym.make('CartPole-v1')

    def reset(self):
        return self._env.reset()

    def step(self, action):
        obs, reward, done, _ = self._env.step(action)
        # Reward function, from
        # https://github.com/keon/deep-q-learning/blob/master/ddqn.py
        x, _, theta, _ = obs
        r1 = (self._env.x_threshold - abs(x)) / self._env.x_threshold - 0.8
        r2 = (self._env.theta_threshold_radians - abs(theta)) / self._env.theta_threshold_radians - 0.5
        reward = r1 + r2
        return np.array(obs), reward, done

    def close(self):
        self._env.close()

    def render(self):
        self._env.render()

env = Environment()

# Create DQN agent and train it
agent = pk.DQNAgent([4, 64, 64, 2])
agent.set_save_freq(100, 'cartpole_agent')
agent.train(env, 500, pk.Adam(0.001), render=True)

# Plot reward graph
agent.plot_performance()
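After training, the agent can be demoed with exploit(); a possible continuation of the example:

# Watch the trained agent balance the pole; no training occurs.
agent.exploit(env, render=True)
env.close()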