K Medoids Clustering is a clustering algorithm, have you tried to write it from scratch in Python?
But before that, if you are also looking for other algorithms from scratch then please follow along:
 Linear Regression from Scratch
 Logistic Regression from Scratch
 Logistic Regression with Different Loss Functions
 PCA from Scratch
 KMeans Clustering from Scratch
 Feed Forward Neural Network from Scratch
 Convolutional Neural Network from Scratch
 Image Processing Class from Scratch
 Machine Learning Optimizer from Scratch
 Naive Bayes from Scratch
 AutoEncoder from Scratch
 Basics of Stock Backtesting
Kmedoids are a prominent clustering algorithm as an improvement of the predecessor, the KMeans algorithm. Despite its being widely used and less sensitive to noises and outliers, the performance of the Kmedoids clustering algorithm is affected by the distance function. From here.
When the kmeans algorithm is not appropriate to make objects of the cluster to the data points then the kmedoid clustering algorithm is preferred. A medoid is an object of the cluster whose dissimilarity to all the objects in the cluster is minimum. The main difference between the Kmeans and the Kmedoid algorithm is that we work with an arbitrary matrix of distance instead of Euclidean distance. Kmedoid is a classical partitioning technique of clustering that cluster the dataset into a k cluster. It is more robust to noise and outliers because it may minimize the sum of pairwise dissimilarities however kmeans minimize the sum of squared Euclidean distances. The most common distances used in KMedoids clustering techniques are Manhattan distance or Minkowski distance and here we will use Manhattan distance.
Besides K Medoids Clustering from scratch, we have also written a blog about kmeans clustering from scratch. Please read it below:
Manhattan Distance
Of p1, p2 is: $(x2x1)+(y2y1)$.
Algorithm
 Step 1: Randomly select(without replacement) k of the n data points as the median.
 Step 2: Associate each data point to the closest median.

Step 3: While the cost of the configuration decreases:
 For each medoid
m
, for each nonmedoid data pointo
: Swap
m
ando
, recompute the cost.  If the total cost of the configuration increased in the previous step, undo the swap.
 Swap
Example
Cluster the following data set of ten objects into two clusters i.e. k =3. Data points are ${(1,3),(4,5),(6,3),(3,4),(2,1)}$
Solution
 For each medoid
Let $p_1=(1,3)$, $p_2 =(4,5)$, $p_3 =(6,3)$, $p_4=(3,4)$, $p_5 =(2,1)$. We are going to cluster the data into two clusters using the kmedoids algorithm.
Initial Step
Let $m_1= p_1=(1,3)$ and $m_2 = p_4 = (3,4)$ are two initial medoid
$ d(m_1, p_2) = mode(x_2x_1) + mode(y_2y_1)$
Iteration 1
Calculate the distance between cluster centers and each data point.
$d(m_1,p_1) = 0$
$d(m_2,p_1) = 3$
$d(m_1,p_2) = 3 + 2 = 5$
$d(m_2,p_2) = 1 + 1= 2$
$d(m_1,p_3) = 5 + 0 = 5$
$d(m_2,p_3) = 3 + 1 = 4$
$d(m_1,p_5) = 1 + 2= 3$
$d(m_2,p_5) = 1 + 3= 4$
Hence, after clusters are formed are
 Cluster 1 = {$p_1, p_5$}

Cluster 2 = {$P_2,p_3,p_4$}
Now, Total Cost = ${d(m_1,p_1) +d(m_1,p_5) + d(m_2, p_2) + d(m_2,p_3)+d(m_2,p_4)}$ = {0 + 2 + 4+3} = 9
Iteration 2
Now $m_1 = p_2=(4,5) $ and $m_2 = p_4 =(3,4) $ are two medoid
Let $p_1 =(1,3)$, $p_2 =(4,5)$, $p_3=(6,3)$ and $p_5=(2,1)$.
Calculate the distance between new clusters and each data point.
$d(m_1, p_1) = 3 + 2= 5$
$d(m_2, p_1) = 2 + 1= 3$
$d(m_1, p_3) = 2 + 2 = 4$
$d(m_2, p_3) = 3 + 1 = 4$
$d(m_1, p_5) = 2 + 4 = 6 $
$d(m_2, p_5) = 1 + 3= 5$
Thus after the second iteration
 Cluster 1 = {$p_2, p_3$}
 Cluster 2 = {$p_1, p_4, p_5$}
Total cost = ${d(m_1,p_2)+d(m_1,p_3) + d(m_2, p_1) + d(m_2,p_4)+d(m_2,p_5)}$ = 0 + 4 + 3 + 0 + 5 = 12
Total cost = 12.
Here total cost in iteration 2 is greater than in iteration 1 so, we don't go further.
Coding Part
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("seabornwhitegrid")
Basic Example
Let's take an example of data points where we have 5 points and we will try to make 2 clusters for those data.
 Initially, we will plot our data points by passing
x
andy
coordinates of all points.  Take points 1st and 4th as initial medoids points or medoids of two clusters.
data = [(1,3),(4,5),(6,3),(3,4),(2,1)]
points = np.array(data)
# print(points)
plt.figure(figsize=(5,5))
plt.scatter(x=points[:,0],y = points[:,1])
plt.title("Initially")
plt.show()
k = 2
c1 = points[0]
c2 = points[1]
plt.figure(figsize=(5,5))
plt.scatter(x=points[:,0],y = points[:,1])
plt.scatter([c1[0],c2[0]],[c1[1],c2[1]],marker="*")
plt.title("Step 1")
plt.show()
 Make a function
manhattan
andget_costs
to calculate the manhattan distance between points and a medoid of cluster and the minimum cost between particular data paints and medoids.  Get initial clusters and cost.
 Prepare colors for each cluster and show initial clusters.
 Now, we will loop until some internal break happens:
 Initially assume that there is no swap happened because if it is the case we must exit from the loop.
 Loop through numbers of sample
 If the current sample is not in medoid, then we will compare this sample's distance with each of the medoids.
 If We will swap the sample and medoids and calculate its cost. If the cost is smaller than the previous cost then we accept the swap and make it true as well as change the clusters. We will also plot the clusters.
 Else we ignore the swap and move on from there.
 If no swap has happened or we reached the number of iterations, then we exit from the loop.
def manhattan(p1, p2):
return np.abs((p1[0]p2[0])) + np.abs((p1[1]p2[1]))
def get_costs(data, medoids):
clusters = {i:[] for i in range(len(medoids))}
cst = 0
for d in data:
dst = np.array([manhattan(d, md) for md in medoids])
c = dst.argmin()
clusters[c].append(d)
cst+=dst.min()
clusters = {k:np.array(v) for k,v in clusters.items()}
return clusters, cst
def KMedoids(data, k, iters = 100):
medoids = np.array([data[i] for i in range(k)])
samples,_ = data.shape
clusters, cost = get_costs(data, medoids)
count = 0
colors = np.array(np.random.randint(0, 255, size =(k, 4)))/255
colors[:,3]=1
plt.title(f"Step : 0")
[plt.scatter(clusters[t][:, 0], clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(k)]
plt.scatter(medoids[:, 0], medoids[:, 1], s=200, color=colors)
plt.show()
while True:
swap = False
for i in range(samples):
if not i in medoids:
for j in range(k):
tmp_meds = medoids.copy()
tmp_meds[j] = i
clusters_, cost_ = get_costs(data, tmp_meds)
if cost_<cost:
medoids = tmp_meds
cost = cost_
swap = True
clusters = clusters_
print(f"Medoids Changed to: {medoids}.")
plt.title(f"Step : {count+1}")
[plt.scatter(clusters[t][:, 0], clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(k)]
plt.scatter(medoids[:, 0], medoids[:, 1], s=200, color=colors)
plt.show()
count+=1
if count>=iters:
print("End of the iterations.")
break
if not swap:
print("No changes.")
break
KMedoids(points,2)
No changes.
Now looking over the output of our code, it seems that it worked pretty well but what if our cluster size is variable and data also varies? So to make our code a little bit more awesome, we will do a few more tasks.
KMedoids
Class
Some of the reasons that we are using a class are:
 To make our code dynamic.
 To wrap useful methods.
 To make it reusable without changing a structure.
Let's take it into action.
Initializing
 Initialize the class by giving data, the number of clusters we want, and a number of items to loop.
 Make attributes of those values.
 Take k number of medoids serially for the sake of simplicity.
 Prepare random colors for each cluster.
class KMedoidsClass:
def __init__(self,data,k,iters):
self.data= data
self.k = k
self.iters = iters
self.medoids = np.array([data[i] for i in range(self.k)])
self.colors = np.array(np.random.randint(0, 255, size =(self.k, 4)))/255
self.colors[:,3]=1
Distance method
To find the distance between data points and medoids.
def manhattan(self,p1, p2):
return np.abs((p1[0]p2[0])) + np.abs((p1[1]p2[1]))
get_cost
Method
Just to find the cost minimum cost between the data points and medoids.
def get_costs(self, medoids):
tmp_clusters = {i:[] for i in range(len(medoids))}
cst = 0
for d in self.data:
dst = np.array([self.manhattan(d, md) for md in medoids])
c = dst.argmin()
tmp_clusters[c].append(d)
cst+=dst.min()
tmp_clusters = {k:np.array(v) for k,v in tmp_clusters.items()}
return tmp_clusters, cst
fit
Method
This method will implement everything we did in an above simple example.
def fit(self):
samples,_ = self.data.shape
self.clusters, cost = self.get_costs(data=self.data, medoids=self.medoids)
count = 0
colors = np.array(np.random.randint(0, 255, size =(self.k, 4)))/255
colors[:,3]=1
plt.title(f"Step : 0")
[plt.scatter(self.clusters[t][:, 0], self.clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(self.k)]
plt.scatter(self.medoids[:, 0], self.medoids[:, 1], s=200, color=colors)
plt.show()
while True:
swap = False
for i in range(samples):
if not i in self.medoids:
for j in range(self.k):
tmp_meds = self.medoids.copy()
tmp_meds[j] = i
clusters_, cost_ = self.get_costs(data=self.data, medoids=tmp_meds)
if cost_<cost:
self.medoids = tmp_meds
cost = cost_
swap = True
self.clusters = clusters_
print(f"Medoids Changed to: {self.medoids}.")
plt.title(f"Step : {count+1}")
[plt.scatter(self.clusters[t][:, 0], self.clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(self.k)]
plt.scatter(self.medoids[:, 0], self.medoids[:, 1], s=200, color=colors)
plt.show()
count+=1
if count>=self.iters:
print("End of the iterations.")
break
if not swap:
print("No changes.")
break
Combine all code
class KMedoidsClass:
def __init__(self,data,k,iters):
self.data= data
self.k = k
self.iters = iters
self.medoids = np.array([data[i] for i in range(self.k)])
self.colors = np.array(np.random.randint(0, 255, size =(self.k, 4)))/255
self.colors[:,3]=1
def manhattan(self,p1, p2):
return np.abs((p1[0]p2[0])) + np.abs((p1[1]p2[1]))
def get_costs(self, medoids, data):
tmp_clusters = {i:[] for i in range(len(medoids))}
cst = 0
for d in data:
dst = np.array([self.manhattan(d, md) for md in medoids])
c = dst.argmin()
tmp_clusters[c].append(d)
cst+=dst.min()
tmp_clusters = {k:np.array(v) for k,v in tmp_clusters.items()}
return tmp_clusters, cst
def fit(self):
samples,_ = self.data.shape
self.clusters, cost = self.get_costs(data=self.data, medoids=self.medoids)
count = 0
colors = np.array(np.random.randint(0, 255, size =(self.k, 4)))/255
colors[:,3]=1
plt.title(f"Step : 0")
[plt.scatter(self.clusters[t][:, 0], self.clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(self.k)]
plt.scatter(self.medoids[:, 0], self.medoids[:, 1], s=200, color=colors)
plt.show()
while True:
swap = False
for i in range(samples):
if not i in self.medoids:
for j in range(self.k):
tmp_meds = self.medoids.copy()
tmp_meds[j] = i
clusters_, cost_ = self.get_costs(data=self.data, medoids=tmp_meds)
if cost_<cost:
self.medoids = tmp_meds
cost = cost_
swap = True
self.clusters = clusters_
print(f"Medoids Changed to: {self.medoids}.")
plt.title(f"Step : {count+1}")
[plt.scatter(self.clusters[t][:, 0], self.clusters[t][:, 1], marker="*", s=100,
color = colors[t]) for t in range(self.k)]
plt.scatter(self.medoids[:, 0], self.medoids[:, 1], s=200, color=colors)
plt.show()
count+=1
if count>=self.iters:
print("End of the iterations.")
break
if not swap:
print("No changes.")
break
Test it
dt = np.random.randint(0,100, (100,2))
kmedoid = KMedoidsClass(dt,5,5)
kmedoid.fit()
Medoids Changed to: [[10 10]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[11 11]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[12 12]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[13 13]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[14 14]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[56 35]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[49 49]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[50 50]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[51 51]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[52 52]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[53 53]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[54 54]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[55 55]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[56 56]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[89 98]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[75 75]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[76 76]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[77 77]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[78 78]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[79 79]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[80 80]].
Medoids Changed to: [[18 18]
[19 68]
[57 57]
[87 32]
[81 81]].
Medoids Changed to: [[20 20]
[19 68]
[57 57]
[87 32]
[81 81]].
Medoids Changed to: [[21 21]
[19 68]
[57 57]
[87 32]
[81 81]].
Medoids Changed to: [[21 21]
[19 68]
[51 51]
[87 32]
[81 81]].
Medoids Changed to: [[21 21]
[19 68]
[52 52]
[87 32]
[81 81]].
Medoids Changed to: [[21 21]
[19 68]
[53 53]
[87 32]
[81 81]].
Medoids Changed to: [[16 16]
[19 68]
[53 53]
[87 32]
[81 81]].
Medoids Changed to: [[17 17]
[19 68]
[53 53]
[87 32]
[81 81]].
Medoids Changed to: [[18 18]
[19 68]
[53 53]
[87 32]
[81 81]].
No changes.
This ends the K Medoids Clustering from the the scratch blog. Thank you so much for reading this blog. In the next blog, we will do Hierarchical Clustering from Scratch.
You must log in to post a comment.