Deep Q Learning

DQNAgent Class

class pykitml.DQNAgent(layer_sizes, mem_size=10000)

This class implements Double Deep Q Learning for reinforcement learning (RL). It uses a neural network with Huber loss to predict Q values.
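As a rough illustration of the two ideas named above, the sketch below computes a Double DQN target and a Huber loss in plain Python. It is not pykitml's internal code: the function names, the `delta=1.0` threshold and the sample Q values are assumptions made for the example.

```python
def huber_loss(error, delta=1.0):
    """Quadratic for small errors, linear for large ones (delta is assumed)."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error ** 2
    return delta * (abs_err - 0.5 * delta)


def double_dqn_target(reward, done, q_online_next, q_target_next, disc_factor=0.95):
    """The online network picks the next action, the target network scores it."""
    if done:
        # No future reward after a terminal state
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + disc_factor * q_target_next[best_action]


# Hypothetical Q-value estimates for the next state (two actions)
q_online_next = [1.0, 2.0]   # online network prefers action 1
q_target_next = [1.5, 0.5]   # target network's value for action 1 is 0.5

target = double_dqn_target(1.0, False, q_online_next, q_target_next)
print(target)                            # 1.0 + 0.95 * 0.5
print(huber_loss(0.5), huber_loss(3.0))  # quadratic branch, linear branch
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, where the target network does both.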

__init__(layer_sizes, mem_size=10000)
Parameters:
  • layer_sizes (list) – A list of integers describing the number of layers and the number of neurons in each layer. For example, [784, 100, 100, 10] describes a network with one input layer having 784 neurons, two hidden layers having 100 neurons each and an output layer with 10 neurons.

  • mem_size (int) – Maximum number of experiences held in the experience replay memory.
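The two constructor parameters can be pictured with a small standalone sketch. It assumes the replay memory behaves like a bounded queue that discards its oldest entries, which is a common design but is not confirmed here for pykitml; the transition tuple layout is also an assumption.

```python
import random
from collections import deque

# layer_sizes: consecutive pairs give the (inputs, outputs) of each connection
layer_sizes = [784, 100, 100, 10]
connections = list(zip(layer_sizes, layer_sizes[1:]))
print(connections)

# mem_size: tiny on purpose here; the DQNAgent default is 10000
mem_size = 3
memory = deque(maxlen=mem_size)

# Hypothetical transitions: (state, action, reward, next_state, done)
for t in range(5):
    memory.append((t, 0, 1.0, t + 1, False))

# Only the 3 most recent transitions remain; the oldest were dropped
print(len(memory), memory[0][0])

# A training step would draw a random batch from the memory
batch = random.sample(list(memory), 2)
```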

train(env, nepisodes, optimizer, batch_size=64, render=False, update_freq=1, explr_rate=1, explr_min=0.01, explr_decay=0.99, disc_factor=0.95)

Trains the agent on the given environment using deep Q learning.

Parameters:
  • env (obj) – Object that represents the environment. See Environment Class

  • nepisodes (int) – Number of episodes to train for.

  • optimizer (any Optimizer object) – See Optimizers

  • batch_size (int) – How many samples from the experience replay memory to train on at each step.

  • render (bool) – If set to True, calls the render() method of the env object.

  • update_freq (int) – How often, in episodes, to update the target model.

  • explr_rate (float) – Initial exploration rate. The higher the exploration rate, the more random actions the agent takes.

  • explr_min (float) – Minimum exploration rate.

  • explr_decay (float) – Multiplication factor for reducing exploration rate after each episode.

  • disc_factor (float) – Discount factor: how much future rewards are valued relative to immediate rewards.
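The interplay of explr_rate, explr_min and explr_decay can be sketched as follows. This is an illustration under assumptions, not pykitml's code: the helper names are hypothetical, and clamping with `max` is one plausible way to enforce the minimum.

```python
import random


def decay_exploration(explr_rate, explr_min=0.01, explr_decay=0.99):
    """Multiply the rate by the decay factor, never going below the minimum."""
    return max(explr_min, explr_rate * explr_decay)


def choose_action(q_values, explr_rate, rng=random):
    """Epsilon-greedy: random action with probability explr_rate, else greedy."""
    if rng.random() < explr_rate:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With the default settings the rate hits the floor well before 500 episodes
explr_rate = 1.0
for episode in range(500):
    explr_rate = decay_exploration(explr_rate)
print(explr_rate)

# With no exploration left, the action with the highest Q value is chosen
print(choose_action([0.1, 0.9], explr_rate=0.0))
```

With explr_decay=0.99 the rate falls below 0.01 after roughly 460 episodes, so shorter training runs never reach explr_min.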

exploit(env, render=False)

Exploits the trained model to make decisions. No training occurs. Use this to demo the trained agent.

Parameters:
  • env (obj) – Object that represents the environment. See Environment Class

  • render (bool) – If set to true, will call the render() method in env

plot_performance(N=30)

Plots logged performance data after training. Should be called after train().

Parameters:

N (int) – How many points to take for the running mean.

Raises:

AttributeError – If the model has not been trained, i.e. train() has not been called before.
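To show what the N parameter controls, here is a minimal running-mean sketch. It assumes each plotted point is the mean of N consecutive logged values; the exact windowing used by plot_performance() may differ, and `running_mean` is a hypothetical helper.

```python
def running_mean(values, N=30):
    """Mean of every window of N consecutive values."""
    if N <= 0:
        raise ValueError('N must be positive')
    if len(values) < N:
        return []
    window_sum = sum(values[:N])
    means = [window_sum / N]
    for i in range(N, len(values)):
        # Slide the window: add the new value, drop the oldest one
        window_sum += values[i] - values[i - N]
        means.append(window_sum / N)
    return means


# Hypothetical per-episode rewards
rewards = [1, 2, 3, 4, 5, 6]
print(running_mean(rewards, N=3))  # [2.0, 3.0, 4.0, 5.0]
```

A larger N gives a smoother curve but hides short-term changes in performance.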

Environment Class

class pykitml.Environment
abstract reset()

Resets the environment and returns initial state.

Returns:

initial_state – The initial state of the environment as a numpy array.

Return type:

np.array

abstract step(action)

Performs the given action, advances the current state to the next state, and returns the next state, the reward and a bool telling whether the episode has terminated.

Returns:

  • next_state (np.array) – The next state of the environment as a numpy array.

  • reward (int) – An integer telling the agent how well it performed.

  • done (bool) – Flag telling whether the environment has reached a terminal state.

abstract close()

Called after training is complete; properly closes/exits the environment.

render()

Renders the environment, i.e. shows a visual representation of it.
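The methods above can be illustrated with a toy environment and a loop that consumes it. Everything here is hypothetical: `CountdownEnv` and `run_episode` are names invented for the example, and states are plain lists to keep the sketch dependency-free, whereas the interface above specifies numpy arrays.

```python
class CountdownEnv:
    """Hypothetical toy environment: reach zero by choosing action 1."""

    def __init__(self, start=3):
        self._start = start
        self._remaining = start
        self.closed = False

    def reset(self):
        # Restore the initial state and return it
        self._remaining = self._start
        return [float(self._remaining)]

    def step(self, action):
        # Action 1 moves one step towards the goal, action 0 waits
        if action == 1:
            self._remaining -= 1
        reward = 1 if action == 1 else -1
        done = self._remaining == 0
        return [float(self._remaining)], reward, done

    def render(self):
        print('remaining:', self._remaining)

    def close(self):
        self.closed = True


def run_episode(env, policy):
    """Runs one episode and returns the total reward collected."""
    state = env.reset()
    total, done = 0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


env = CountdownEnv()
total_reward = run_episode(env, lambda state: 1)  # always pick action 1
env.close()
print(total_reward)  # 3
```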

Example: CartPole using gymnasium

import gymnasium as gym
import pykitml as pk

# Wrapper class around the environment
class Environment:
    def __init__(self):
        self._env = gym.make('CartPole-v1', render_mode="human")

    def reset(self):
        return self._env.reset()[0]

    def step(self, action):
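        # gymnasium's step() returns (obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, _ = self._env.step(action)
        done = terminated or truncated

        x, _, theta, _ = obs
        # Read the thresholds from the unwrapped CartPole environment
        x_threshold = self._env.unwrapped.x_threshold
        theta_threshold_radians = self._env.unwrapped.theta_threshold_radians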

        # Reward function, from
        # https://github.com/keon/deep-q-learning/blob/master/ddqn.py            
        r1 = (x_threshold - abs(x)) / x_threshold - 0.8
        r2 = (theta_threshold_radians - abs(theta)) / theta_threshold_radians - 0.5
        reward = r1 + r2

        return obs, reward, done

    def close(self):
        self._env.close()

    def render(self):
        self._env.render()

env = Environment()

# Create DQN agent and train it
agent = pk.DQNAgent([4, 64, 64, 2])
agent.set_save_freq(100, 'cartpole_agent')
agent.train(env, 500, pk.Adam(0.001), render=True)

# Plot reward graph
agent.plot_performance()