Deep Q Learning
DQNAgent Class
- class pykitml.DQNAgent(layer_sizes, mem_size=10000)
This class implements Double Deep Q Learning for RL. It uses a neural network with Huber loss to predict Q values.
- __init__(layer_sizes, mem_size=10000)
- Parameters:
layer_sizes (list) – A list of integers describing the number of layers and the number of neurons in each layer. For example, [784, 100, 100, 10] describes a network with an input layer of 784 neurons, two hidden layers of 100 neurons each and an output layer of 10 neurons.
mem_size (int) – The size of the experience replay memory.
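To make the layer_sizes convention concrete, the sketch below pairs up consecutive entries of the list and counts the trainable parameters. It assumes fully connected layers with one bias per neuron; the helper names are hypothetical and pykitml may store its weights differently.

```python
def dense_layer_shapes(layer_sizes):
    """Return (inputs, outputs) for each pair of consecutive layers."""
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


def count_parameters(layer_sizes):
    """Count weights plus biases, assuming fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in dense_layer_shapes(layer_sizes))


shapes = dense_layer_shapes([784, 100, 100, 10])
print(shapes)                                   # [(784, 100), (100, 100), (100, 10)]
print(count_parameters([784, 100, 100, 10]))    # 89610
```

For a Q network, the first entry should match the size of the state array and the last entry the number of available actions.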
- train(env, nepisodes, optimizer, batch_size=64, render=False, update_freq=1, explr_rate=1, explr_min=0.01, explr_decay=0.99, disc_factor=0.95)
Trains the agent on the given environment using deep Q learning.
- Parameters:
env (obj) – Object that represents the environment. See Environment Class
nepisodes (int) – Number of episodes to train for.
optimizer (any Optimizer object) – See Optimizers
batch_size (int) – Number of samples drawn from the experience replay memory to train on at each step.
render (bool) – If set to True, will call the render() method of the env object.
update_freq (int) – How often, in episodes, to update the target model.
explr_rate (float) – Initial exploration rate. The higher the exploration rate, the more random actions the agent takes.
explr_min (float) – Minimum exploration rate.
explr_decay (float) – Multiplication factor for reducing exploration rate after each episode.
disc_factor (float) – Discount factor; how much future rewards are valued relative to immediate rewards.
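The exploration and discount parameters can be illustrated with a short NumPy sketch. It shows a per-episode multiplicative decay floored at explr_min, and the standard Double DQN target, where the online network picks the next action and the target network evaluates it. The function names are hypothetical, and the exact order of decay and clipping inside pykitml is an assumption.

```python
import numpy as np


def exploration_schedule(nepisodes, explr_rate=1.0, explr_min=0.01, explr_decay=0.99):
    """Exploration rate used in each episode (assumed: decay once per episode)."""
    rates = []
    for _ in range(nepisodes):
        rates.append(explr_rate)
        # Multiply by the decay factor, but never drop below the minimum
        explr_rate = max(explr_min, explr_rate * explr_decay)
    return rates


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, disc_factor=0.95):
    """Standard Double DQN targets for a batch of transitions.

    rewards, dones: arrays of shape (batch,)
    q_online_next, q_target_next: Q values of the next states, shape (batch, n_actions)
    """
    # The online network selects the action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network supplies its value
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    # Terminal transitions (done == 1) get no bootstrapped future value
    return rewards + disc_factor * next_values * (1.0 - dones)
```

With the default values the exploration rate reaches the 0.01 floor after roughly 460 episodes, so shorter training runs never become fully greedy.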
- exploit(env, render=False)
Exploits the trained model to make decisions. No training occurs. Use this to demo the trained agent.
- Parameters:
env (obj) – Object that represents the environment. See Environment Class
render (bool) – If set to True, will call the render() method of the env object.
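Conceptually, exploiting a trained model means always taking the action with the highest predicted Q value. The loop below is a minimal sketch of that idea, not the library's implementation; run_greedy_episode, q_values and max_steps are hypothetical names, and env is any object with the reset(), step() and render() methods described under Environment Class.

```python
import numpy as np


def run_greedy_episode(env, q_values, render=False, max_steps=1000):
    """Play one episode, always picking the action with the largest Q value.

    q_values: callable mapping a state array to an array of Q values,
              one per action (stands in for the trained network).
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        if render:
            env.render()
        # Greedy choice: no random exploration, no learning
        action = int(np.argmax(q_values(state)))
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```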
Environment Class
- class pykitml.Environment
- abstract reset()
Resets the environment and returns initial state.
- Returns:
initial_state – The initial state of the environment as a numpy array.
- Return type:
np.array
- abstract step(action)
Performs the given action, advances the current state to the next state, and returns the next state, the reward and a bool telling whether the episode has terminated.
- Returns:
next_state (np.array) – The next state of the environment as a numpy array.
reward (int) – An integer telling the agent how well it performed.
done (bool) – Flag telling whether the environment has reached a terminal state.
- abstract close()
Called after training is complete; properly closes/exits the environment.
- render()
Renders the environment, showing a visual representation of its current state.
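The interface above can be illustrated with a small self-contained environment. CorridorEnv is a hypothetical toy example: the agent starts at the left end of a corridor and is rewarded for reaching the right end. To keep the sketch runnable without pykitml installed it does not subclass pykitml.Environment; it only follows the same method contract.

```python
import numpy as np


class CorridorEnv:
    """Toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5, max_steps=20):
        self._length = length
        self._max_steps = max_steps
        self._pos = 0
        self._steps = 0

    def _state(self):
        # One-hot encoding of the agent's position
        state = np.zeros(self._length)
        state[self._pos] = 1.0
        return state

    def reset(self):
        self._pos = 0
        self._steps = 0
        return self._state()

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position
        move = 1 if action == 1 else -1
        self._pos = min(max(self._pos + move, 0), self._length - 1)
        self._steps += 1
        reached_goal = self._pos == self._length - 1
        reward = 1 if reached_goal else 0
        # End the episode at the goal or when the step budget runs out
        done = reached_goal or self._steps >= self._max_steps
        return self._state(), reward, done

    def close(self):
        # Nothing to release for this in-memory environment
        pass

    def render(self):
        print(''.join('A' if i == self._pos else '.' for i in range(self._length)))
```

An agent for this environment would need 5 inputs (the state size) and 2 outputs (the two actions), for example layer_sizes of [5, 32, 2].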
Example: CartPole using gymnasium
import gymnasium as gym
import pykitml as pk


# Wrapper class around the environment
class Environment:
    def __init__(self):
        self._env = gym.make('CartPole-v1', render_mode="human")

    def reset(self):
        # gymnasium returns (observation, info); only the observation is needed
        return self._env.reset()[0]

    def step(self, action):
        obs, reward, terminated, truncated, _ = self._env.step(action)
        # The episode ends on failure (terminated) or at the time limit (truncated)
        done = terminated or truncated

        x, _, theta, _ = obs
        # The thresholds are attributes of the unwrapped CartPole environment
        x_threshold = self._env.unwrapped.x_threshold
        theta_threshold_radians = self._env.unwrapped.theta_threshold_radians

        # Reward function, from
        # https://github.com/keon/deep-q-learning/blob/master/ddqn.py
        r1 = (x_threshold - abs(x)) / x_threshold - 0.8
        r2 = (theta_threshold_radians - abs(theta)) / theta_threshold_radians - 0.5
        reward = r1 + r2

        return obs, reward, done

    def close(self):
        self._env.close()

    def render(self):
        self._env.render()


env = Environment()

# Create DQN agent and train it
agent = pk.DQNAgent([4, 64, 64, 2])
agent.set_save_freq(100, 'cartpole_agent')
agent.train(env, 500, pk.Adam(0.001), render=True)

# Plot reward graph
agent.plot_performance()