K-Means Clustering¶

K-Means Function¶

pykitml.kmeans(training_data, nclusters, max_iter=1000, trials=50)¶

Identifies cluster centres on training data using k-means.

Parameters:

training_data (numpy.array) – Numpy array containing training data.
nclusters (int) – Number of cluster to find.
max_iter (int) – Maximum number of iterations to run per trial.
trials (int) – Number of times k-means should run, each with different random initialization.

Returns:

clusters (numpy.array) – Numpy array containing cluster centres.
cost (numpy.array) – The cost of converged cluster centres.

Example: S1 Dataset¶

Dataset

S1 Clustering - pykitml.datasets.s1clustering module

Training

import os

import pykitml as pk
from pykitml.datasets import s1clustering
import matplotlib.pyplot as plt

# Download the dataset
if not os.path.exists('s1.pkl'):
    s1clustering.get()

# Load the dataset
train_data = s1clustering.load()

# Run KMeans
clusters, cost = pk.kmeans(train_data, 15)

# Plot dataset, x and y
plt.scatter(train_data[:, 0], train_data[:, 1])

# Plot clusters, x and y
plt.scatter(clusters[:, 0], clusters[:, 1], c='red')

# Show graph
plt.show()

Scatter Plot