Gradient Descent in Machine Learning for Beginners
Gradient descent is an optimization algorithm used to find the parameters m and c such that the cost function is minimized.
Cost() function:
- The cost function measures how wrong the model is in finding a relationship between the input and the output.
- It tells us how badly the model is performing.
More on this can be found here.
When there are many data points and the value of m cannot be calculated analytically, we use this algorithm to find the minimum cost, where Cost_min and m_min are the minimum values of Cost and m respectively.
The idea is to select m randomly in the beginning and compute the cost. Then, we find the slope at that point.
- If the slope is positive, the selected m is to the right of m_min.
- If the slope is negative, the selected m is to the left of m_min.
Starting from this randomly chosen point, the update equation is

m = m - α * (∂Cost/∂m)

where α tells us how big a step to take at each update. It is a constant.
Similarly, the optimised intercept c can be calculated by:

c = c - α * (∂Cost/∂c)

where α is the learning rate.

It is easy to find the above two partial derivatives. Taking the derivative of Cost with respect to m gives us

∂Cost/∂m = (-2/M) * Σ (y_i - m*x_i - c) * x_i

Taking the derivative of Cost with respect to c gives us

∂Cost/∂c = (-2/M) * Σ (y_i - m*x_i - c)
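As a sanity check (not part of the original article), these two partial derivatives can be compared against numerical finite-difference gradients on a tiny made-up dataset:

```python
import numpy as np

def cost(points, m, c):
    # Mean squared error: (1/M) * sum((y - m*x - c)^2)
    x, y = points[:, 0], points[:, 1]
    return np.mean((y - m * x - c) ** 2)

def analytic_gradients(points, m, c):
    # The two partials derived above
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    dm = (-2 / M) * np.sum((y - m * x - c) * x)
    dc = (-2 / M) * np.sum(y - m * x - c)
    return dm, dc

# Tiny hypothetical dataset, used only for this check
pts = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 5.5]])
m, c, h = 0.5, 0.1, 1e-6
dm, dc = analytic_gradients(pts, m, c)
dm_num = (cost(pts, m + h, c) - cost(pts, m - h, c)) / (2 * h)
dc_num = (cost(pts, m, c + h) - cost(pts, m, c - h)) / (2 * h)
print(abs(dm - dm_num), abs(dc - dc_num))  # both differences should be tiny
```

If the analytic and numerical gradients agree, the derivatives were taken correctly.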
We go on calculating the cost for each new value of m and c until the change in cost becomes very small. That point is taken as the minimum.
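This stopping criterion can be sketched directly as code. The loop below is a hypothetical variant (the article's own implementation, shown later, runs for a fixed number of iterations instead):

```python
import numpy as np

def gd_until_converged(points, learning_rate, tol=1e-9, max_iterations=10000):
    """Run gradient descent until the change in cost drops below tol."""
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    m, c = 0.0, 0.0
    prev_cost = np.mean((y - m * x - c) ** 2)
    for _ in range(max_iterations):
        m_slope = (-2 / M) * np.sum((y - m * x - c) * x)
        c_slope = (-2 / M) * np.sum(y - m * x - c)
        m -= learning_rate * m_slope
        c -= learning_rate * c_slope
        new_cost = np.mean((y - m * x - c) ** 2)
        if abs(prev_cost - new_cost) < tol:  # change in cost is very small
            break
        prev_cost = new_cost
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
pts = np.array([[x, 2 * x + 1] for x in np.linspace(0, 1, 20)])
m, c = gd_until_converged(pts, learning_rate=0.1)
print(m, c)  # should be close to 2 and 1
```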
Learning rate and its importance.
We must set the value neither too high nor too low. If it is too high, the update can overshoot the minimum and land on the other side of the curve. If it is too small, it will still reach the minimum, but it will take a long time.
Adaptive Learning Rate (α).
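One simple way to adapt α, sketched here as an illustrative assumption (the article does not specify a particular scheme), is to decay it over iterations so that early steps are large and later steps near the minimum are small:

```python
def decayed_learning_rate(initial_alpha, decay, iteration):
    """Simple 1/t-style decay: alpha shrinks as iterations progress.
    This schedule is an illustrative assumption, not from the article."""
    return initial_alpha / (1 + decay * iteration)

# Example: alpha starts at 0.1 and shrinks as training progresses
rates = [decayed_learning_rate(0.1, 0.01, i) for i in (0, 100, 1000)]
print(rates)  # each rate is smaller than the last
```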
Coding gradient descent for a single feature
We are using the data.csv file, which is provided in the link below.
import numpy as np
data = np.loadtxt(r"..\data.csv", delimiter=",")
X = data[:,0]
Y = data[:,1]
print(X)  # inspect the loaded feature values
Setting up training data
A part of the dataset is set aside for training and the rest for testing.
training_data = data[:70,:]
training_data.shape
Setting up testing data
testing_data = data[70:, :]
testing_data.shape
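A note on the split above: taking the first 70 rows is fine only if the file's rows are in no particular order. If the rows happen to be sorted, shuffling before splitting gives a fairer split. A sketch, using hypothetical stand-in data in place of data.csv:

```python
import numpy as np

# Hypothetical stand-in for data loaded from data.csv
rng = np.random.default_rng(0)
data = np.column_stack([np.arange(100.0), np.arange(100.0) * 2 + 1])

shuffled = data[rng.permutation(len(data))]  # shuffle rows, keep columns intact
training_data = shuffled[:70]
testing_data = shuffled[70:]
print(training_data.shape, testing_data.shape)  # (70, 2) (30, 2)
```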
Now, using gradient descent, we will find the best values of m and c.
# This function finds the new gradient at each step
def step_gradient(points, learning_rate, m, c):
    m_slope = 0
    c_slope = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        m_slope += (-2 / M) * (y - m * x - c) * x
        c_slope += (-2 / M) * (y - m * x - c)
    new_m = m - learning_rate * m_slope
    new_c = c - learning_rate * c_slope
    return new_m, new_c
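The per-row loop above can also be written in vectorized form with NumPy array operations. This variant is a sketch that should produce the same updates as the loop version:

```python
import numpy as np

def step_gradient_vectorized(points, learning_rate, m, c):
    """Same update as the loop version, computed with array operations."""
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    residual = y - m * x - c
    m_slope = (-2 / M) * np.sum(residual * x)
    c_slope = (-2 / M) * np.sum(residual)
    return m - learning_rate * m_slope, c - learning_rate * c_slope

# Quick check on small made-up points lying on y = 2x + 1
pts = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
new_m, new_c = step_gradient_vectorized(pts, 0.01, 0.0, 0.0)
print(new_m, new_c)
```

On large datasets the vectorized form is much faster, since the summation happens inside NumPy rather than in a Python loop.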
Defining the gradient descent function
We call step_gradient repeatedly to get new values for m and c.
def gd(points, learning_rate, num_iterations):
    m = 0  # Initial value, chosen as 0
    c = 0  # Initial value, chosen as 0
    for i in range(num_iterations):
        m, c = step_gradient(points, learning_rate, m, c)
        print(i, " Cost: ", cost(points, m, c))
    return m, c
Defining the cost function
More about the cost function can be found here.
# This function finds the new cost after each optimisation step
def cost(points, m, c):
    total_cost = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (1 / M) * ((y - m * x - c) ** 2)
    return total_cost
Finding the best-optimized value for m and c
def run():
    learning_rate = 0.0001
    num_iterations = 100
    m, c = gd(training_data, learning_rate, num_iterations)
    print("Final m :", m)
    print("Final c :", c)
    return m, c
Predicting the y values using the found values of m and c by defining the predict function.
def predict(final_m, final_c, testing_data):
    y_pred = []
    for i in range(len(testing_data)):
        # Use the parameters passed in, not the globals m and c
        ans = final_m * testing_data[i][0] + final_c
        y_pred.append(ans)
    return y_pred
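To gauge how good the fitted line is, the same cost (mean squared error) can be computed between the predictions and the actual y values of the testing data. A sketch, using hypothetical stand-in values where the real m and c would come from run():

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical testing points near y = 2x + 1
testing = np.array([[1.0, 3.1], [2.0, 4.9], [3.0, 7.2]])
final_m, final_c = 2.0, 1.0  # stand-in values; the real ones come from run()
y_pred = [final_m * x + final_c for x in testing[:, 0]]
print(mse(testing[:, 1], y_pred))  # → about 0.02
```

A small test error relative to the training error suggests the line generalizes well beyond the rows it was fitted on.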
The Jupyter notebook for the code is here. The dataset is here.
Follow me to learn ML from scratch in the next 100 days.