Deep Q Learning

DQNAgent Class

class pykitml.DQNAgent(layer_sizes, mem_size=10000)

This class implements Double Deep Q Learning for reinforcement learning. It uses a neural network with Huber loss to predict Q values.

__init__(layer_sizes, mem_size=10000)
Parameters:
  • layer_sizes (list) – A list of integers describing the number of layers and the number of neurons in each layer. For example, [784, 100, 100, 10] describes a network with an input layer of 784 neurons, two hidden layers of 100 neurons each, and an output layer of 10 neurons.
  • mem_size (int) – The size of the experience replay memory.
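
For instance, a minimal construction sketch for an environment with 4 state variables and 2 possible actions (the hidden layer sizes here are illustrative):

import pykitml as pk

# Input layer matches the state size, output layer matches the
# number of possible actions; hidden layer sizes are illustrative.
agent = pk.DQNAgent([4, 64, 64, 2], mem_size=10000)
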
train(env, nepisodes, optimizer, batch_size=64, render=False, update_freq=1, explr_rate=1, explr_min=0.01, explr_decay=0.99, disc_factor=0.95)

Trains the agent on the given environment using deep Q learning.

Parameters:
  • env (obj) – Object that represents the environment. See Environment Class
  • nepisodes (int) – Number of episodes to train for.
  • optimizer (any Optimizer object) – See Optimizers
  • batch_size (int) – How many samples from the experience replay memory to train on at each step.
  • render (bool) – If set to true, will call the render() method in the env object.
  • update_freq (int) – How often (in episodes) to update the target model.
  • explr_rate (float) – Initial exploration rate. The higher the exploration rate, the more random actions the agent takes.
  • explr_min (float) – Minimum exploration rate.
  • explr_decay (float) – Multiplication factor for reducing the exploration rate after each episode.
  • disc_factor (float) – Discount factor for the value of future rewards.
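
As a sketch, a typical training call with the default hyperparameters spelled out, assuming agent and env have already been created (the episode count and learning rate are illustrative). With these defaults the exploration rate starts at 1 and is multiplied by 0.99 after each episode, never dropping below 0.01:

import pykitml as pk

# Train for 200 episodes with the Adam optimizer (values are illustrative)
agent.train(
    env, 200, pk.Adam(0.001),
    batch_size=64,       # samples drawn from replay memory per step
    update_freq=1,       # update the target model every episode
    explr_rate=1,        # start fully exploratory
    explr_min=0.01,      # never go below 1% random actions
    explr_decay=0.99,    # shrink exploration rate by 1% per episode
    disc_factor=0.95     # discount applied to future rewards
)
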
exploit(env, render=False)

Exploits the trained model to make decisions. No training occurs. Use this to demo the trained agent.

Parameters:
  • env (obj) – Object that represents the environment. See Environment Class
  • render (bool) – If set to true, will call the render() method in the env object.
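
For example, assuming a trained agent and an environment object:

# Run the trained agent on the environment and render each step;
# no learning or exploration happens here.
agent.exploit(env, render=True)
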
plot_performance(N=30)

Plots logged performance data after training. Should be called after train().

Parameters:
  • N (int) – How many points to take for the running mean.
Raises: AttributeError – If the model has not been trained, i.e. train() has not been called before.
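
A minimal usage sketch, assuming agent and env are already set up:

# Train first, then plot the logged rewards with a 30-point running mean
agent.train(env, 500, pk.Adam(0.001))
agent.plot_performance(N=30)
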

Environment Class

class pykitml.Environment
reset()

Resets the environment and returns the initial state.

Returns: initial_state – The initial state of the environment as a numpy array.
Return type: np.array
step(action)

Performs the given action, modifies the current state to the next state, and returns the next state, the reward, and a bool telling whether the episode has terminated.

Returns:
  • next_state (np.array) – The next state of the environment as a numpy array.
  • reward (int) – An integer telling the agent how well it performed.
  • done (bool) – Flag to tell whether the environment has reached a terminal state.
close()

Called after training is complete; properly closes/exits the environment.

render()

Renders the environment, i.e. shows a visual representation of it.
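
Any object providing these four methods can be used as an environment. Below is a minimal sketch of a custom environment implemented from scratch, a toy one-dimensional world where the agent moves left or right to reach a target; the class name, state encoding, and reward values are all illustrative.

import numpy as np

class LineWorld:
    # Toy environment: start at position 0, episode ends when
    # |position| >= 5, reward 1 only when the agent reaches +5.

    def reset(self):
        self._pos = 0
        return np.array([self._pos])

    def step(self, action):
        # Action 0 moves left, action 1 moves right
        self._pos += 1 if action == 1 else -1
        done = abs(self._pos) >= 5
        reward = 1 if self._pos >= 5 else 0
        return np.array([self._pos]), reward, done

    def close(self):
        pass

    def render(self):
        print('position:', self._pos)
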

Example: Cartpole using OpenAI Gym

import numpy as np
import gym
import pykitml as pk

# Wrapper class around the environment
class Environment:
    def __init__(self):
        self._env = gym.make('CartPole-v1')

    def reset(self):
        return self._env.reset()

    def step(self, action):
        obs, reward, done, _ = self._env.step(action)

        # Reward function, from
        # https://github.com/keon/deep-q-learning/blob/master/ddqn.py
        x, _, theta, _ = obs
        r1 = (self._env.x_threshold - abs(x)) / self._env.x_threshold - 0.8
        r2 = (self._env.theta_threshold_radians - abs(theta)) / self._env.theta_threshold_radians - 0.5
        reward = r1 + r2

        return np.array(obs), reward, done

    def close(self):
        self._env.close()

    def render(self):
        self._env.render()

env = Environment()

# Create DQN agent and train it
agent = pk.DQNAgent([4, 64, 64, 2])
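# Save the agent every 100 episodes under the name 'cartpole_agent'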
agent.set_save_freq(100, 'cartpole_agent')
agent.train(env, 500, pk.Adam(0.001), render=True)

# Plot reward graph
agent.plot_performance()
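
Once training finishes (or a periodic save exists), the saved agent can be loaded back with pk.load() and demoed with exploit(). The '.pkl' file name below is an assumption about what set_save_freq() writes to disk; adjust it to match the file actually produced.

# Load the saved agent and run it without further training
trained_agent = pk.load('cartpole_agent.pkl')
trained_agent.exploit(env, render=True)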