Deep Q Learning

DQNAgent Class

class pykitml.DQNAgent(layer_sizes, mem_size=10000)

This class implements Double Deep Q Learning for reinforcement learning (RL). It uses a neural network trained with Huber loss to predict Q values.
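As a rough illustration of the two ideas named above, the following NumPy sketch computes Double DQN targets and the Huber loss. It is not pykitml's internal code: the function names, the `delta` parameter, and the array layout (one row per sample, one column per action) are assumptions made for this example.

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Element-wise Huber loss: quadratic for small errors, linear for large ones."""
    err = pred - target
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, disc_factor=0.95):
    """Double DQN: the online network picks the next action,
    the target network supplies the value of that action."""
    best_actions = np.argmax(q_online_next, axis=1)
    rows = np.arange(len(best_actions))
    next_values = q_target_next[rows, best_actions]
    # No future value is added for transitions that ended the episode.
    return rewards + disc_factor * next_values * (1.0 - dones)
```

Splitting action selection and action evaluation between two networks is what distinguishes Double DQN from plain DQN, where the target network does both.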

__init__(layer_sizes, mem_size=10000)

Parameters:

layer_sizes (list) – A list of integers describing the number of layers and the number of neurons in each layer. For example, `[784, 100, 100, 10]` describes a network with one input layer having 784 neurons, two hidden layers having 100 neurons each, and an output layer with 10 neurons.
mem_size (int) – The size of the experience replay memory.
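To make the role of `mem_size` more concrete, here is a minimal sketch of a bounded experience replay memory. The `ReplayMemory` class and its method names are hypothetical and are not part of pykitml; the sketch only assumes that the memory holds at most `mem_size` transitions and discards the oldest ones first.

```python
import random
from collections import deque

class ReplayMemory:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, mem_size=10000):
        # A deque with maxlen drops the oldest entry once it is full.
        self._buffer = deque(maxlen=mem_size)

    def store(self, state, action, reward, next_state, done):
        self._buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Sample without replacement; never ask for more than is stored.
        k = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```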

train(env, nepisodes, optimizer, batch_size=64, render=False, update_freq=1, explr_rate=1, explr_min=0.01, explr_decay=0.99, disc_factor=0.95)

Trains the agent on the given environment using deep Q learning.

Parameters:

env (obj) – Object that represents the environment. See Environment Class
nepisodes (int) – Number of episodes to train for.
optimizer (any Optimizer object) – See Optimizers
batch_size (int) – How many samples from the experience replay memory to train on at each step.
render (bool) – If set to true, will call the render() method of the env object.
update_freq (int) – How often to update the target model, in episodes.
explr_rate (float) – Initial exploration rate. The higher the exploration rate, the more random actions the agent takes.
explr_min (float) – Minimum exploration rate.
explr_decay (float) – Multiplication factor for reducing exploration rate after each episode.
disc_factor (float) – Discount factor, i.e. how much future rewards are valued relative to immediate ones.
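The interaction of `explr_rate`, `explr_min` and `explr_decay` can be sketched as follows. This is an illustration based on the parameter descriptions above, not pykitml's actual code: the helper names are hypothetical, and clamping with `max()` is an assumption about how the minimum is enforced.

```python
import numpy as np

def exploration_schedule(nepisodes, explr_rate=1.0, explr_min=0.01, explr_decay=0.99):
    """Return the exploration rate used in each episode."""
    rates = []
    for _ in range(nepisodes):
        rates.append(explr_rate)
        # Decay after each episode, but never drop below the minimum (assumed clamp).
        explr_rate = max(explr_min, explr_rate * explr_decay)
    return rates

def choose_action(q_values, explr_rate, rng):
    """Epsilon-greedy choice: random action with probability explr_rate, else the best one."""
    if rng.random() < explr_rate:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rates = exploration_schedule(500)
print(rates[0], rates[-1])
```

With the default values, the rate starts at 1 (fully random actions) and settles at `explr_min` well before 500 episodes have passed.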

exploit(env, render=False)

Exploits the trained model to make decisions. No training occurs. Use this to demo the trained agent.

Parameters:

env (obj) – Object that represents the environment. See Environment Class
render (bool) – If set to true, will call the `render()` method in `env`.

plot_performance(N=30)

Plots logged performance data after training. Should be called after train().

Parameters:	N (int) – How many points to take for the running mean.
Raises:	`AttributeError` – If the model has not been trained, i.e. `train()` has not been called before.
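For readers unsure what `N` controls, the sketch below shows one common way to compute a running mean over `N` points. How pykitml computes it internally is not stated here, so the function name and the use of `np.convolve` are assumptions for illustration only.

```python
import numpy as np

def running_mean(values, N=30):
    """Average each window of N consecutive values (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if len(values) < N:
        # Not enough points to fill a single window.
        return np.array([])
    kernel = np.ones(N) / N
    # 'valid' keeps only positions where the window fits entirely.
    return np.convolve(values, kernel, mode='valid')
```

A larger `N` gives a smoother curve at the cost of hiding short-term changes in episode rewards.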

Environment Class

class pykitml.Environment

reset()

Resets the environment and returns initial state.

Returns:	initial_state – The initial state of the environment as a numpy array.
Return type:	np.array

step(action)

Performs the given action, advances the current state to the next state, and returns the next state, the reward, and a bool telling whether the episode has terminated.

Returns:

next_state (np.array) – The next state of the environment as a numpy array.
reward (int) – An integer telling the agent how well it performed.
done (bool) – Flag telling whether the environment has reached a terminal state.

close()

Called after training is complete; properly closes/exits the environment.

render()

Renders the environment, i.e. shows a visual representation of it.
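The interface above can also be implemented without Gym. The following sketch is a hypothetical toy environment (`LineWorld` is not part of pykitml) in which the agent walks along a line of cells and is rewarded for reaching the rightmost one. The one-hot state encoding and the step limit are choices made for this example.

```python
import numpy as np

class LineWorld:
    """Hypothetical toy environment with the reset/step/close/render interface."""

    def __init__(self, size=5, max_steps=20):
        self.size = size
        self.max_steps = max_steps
        self._pos = 0
        self._steps = 0

    def _state(self):
        # One-hot encoding of the agent's position.
        state = np.zeros(self.size)
        state[self._pos] = 1.0
        return state

    def reset(self):
        self._pos = 0
        self._steps = 0
        return self._state()

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        self._pos = int(np.clip(self._pos + move, 0, self.size - 1))
        self._steps += 1
        reached = self._pos == self.size - 1
        done = reached or self._steps >= self.max_steps
        reward = 1 if reached else 0
        return self._state(), reward, done

    def close(self):
        pass  # Nothing to release in this toy example.

    def render(self):
        print(''.join('A' if i == self._pos else '.' for i in range(self.size)))

# Roll out one episode with a fixed "always move right" policy.
env = LineWorld()
state = env.reset()
done = False
while not done:
    state, reward, done = env.step(1)
env.close()
```

An agent for this environment would presumably use 5 input neurons and 2 output neurons, following the same pattern as the 4-input, 2-output network in the CartPole example.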

Example: CartPole using OpenAI Gym

import numpy as np
import gym
import pykitml as pk

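# Wrapper class around the environment.
# Written against the classic Gym API (gym < 0.26), where reset() returns
# only the observation and step() returns four values.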
class Environment:
    def __init__(self):
        self._env = gym.make('CartPole-v1')

    def reset(self):
        return self._env.reset()

    def step(self, action):
        obs, reward, done, _ = self._env.step(action)

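        # Shaped reward replacing Gym's default: larger when the cart is
        # near the center of the track and the pole is close to upright. From
        # https://github.com/keon/deep-q-learning/blob/master/ddqn.py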
        x, _, theta, _ = obs
        r1 = (self._env.x_threshold - abs(x)) / self._env.x_threshold - 0.8
        r2 = (self._env.theta_threshold_radians - abs(theta)) / self._env.theta_threshold_radians - 0.5
        reward = r1 + r2

        return np.array(obs), reward, done

    def close(self):
        self._env.close()

    def render(self):
        self._env.render()

env = Environment()

# Create DQN agent and train it
agent = pk.DQNAgent([4, 64, 64, 2])
agent.set_save_freq(100, 'cartpole_agent')
agent.train(env, 500, pk.Adam(0.001), render=True)

# Plot reward graph
agent.plot_performance()