Gradient Descent in Machine Learning for Beginners
Gradient descent is an optimization algorithm used to find the parameters m and c such that the cost function is minimized.
Cost() function:
- The cost function measures how wrong the model is in finding a relationship between the input and the output.
- It tells us how badly the model is performing.
More on this can be found here.
When there are many data points and the value of m cannot be calculated analytically, we use this algorithm to find the minimum cost, where Cost_min and m_min are the minimum values of Cost and m respectively.
The idea is to select m randomly in the beginning and compute the cost. Then, we find the slope at that point.
- If the slope is positive, the selected m is to the right of m_min.
- If the slope is negative, the selected m is to the left of m_min.
Starting from this randomly chosen point, the update equation is

m = m - α * (∂Cost/∂m)

where α tells us how big a step to take at each update. It is a constant.
Similarly, the optimised intercept c can be calculated by:

c = c - α * (∂Cost/∂c)

where α is the learning rate.

It is easy to find the above two partial derivatives. Taking the derivative of Cost with respect to m gives us

∂Cost/∂m = (-2/M) * Σ (y_i - m*x_i - c) * x_i

Taking the derivative of Cost with respect to c gives us

∂Cost/∂c = (-2/M) * Σ (y_i - m*x_i - c)
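As a sanity check (not part of the original article), these two partial derivatives can be compared against numerical finite-difference gradients on a tiny made-up dataset:

```python
import numpy as np

def cost(points, m, c):
    # Mean squared error: (1/M) * sum((y - m*x - c)^2)
    x, y = points[:, 0], points[:, 1]
    return np.mean((y - m * x - c) ** 2)

def analytic_gradients(points, m, c):
    # The two partials derived above
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    dm = (-2 / M) * np.sum((y - m * x - c) * x)
    dc = (-2 / M) * np.sum(y - m * x - c)
    return dm, dc

# Tiny hypothetical dataset, used only for this check
pts = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 5.5]])
m, c, h = 0.5, 0.1, 1e-6
dm, dc = analytic_gradients(pts, m, c)
dm_num = (cost(pts, m + h, c) - cost(pts, m - h, c)) / (2 * h)
dc_num = (cost(pts, m, c + h) - cost(pts, m, c - h)) / (2 * h)
print(abs(dm - dm_num), abs(dc - dc_num))  # both differences should be tiny
```

If the analytic and numerical gradients agree, the derivatives were taken correctly.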
We go on calculating the cost for each new value of m and c until the change in cost becomes very small. That point is taken as the minimum.
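This stopping criterion can be sketched directly as code. The loop below is a hypothetical variant (the article's own implementation, shown later, runs for a fixed number of iterations instead):

```python
import numpy as np

def gd_until_converged(points, learning_rate, tol=1e-9, max_iterations=10000):
    """Run gradient descent until the change in cost drops below tol."""
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    m, c = 0.0, 0.0
    prev_cost = np.mean((y - m * x - c) ** 2)
    for _ in range(max_iterations):
        m_slope = (-2 / M) * np.sum((y - m * x - c) * x)
        c_slope = (-2 / M) * np.sum(y - m * x - c)
        m -= learning_rate * m_slope
        c -= learning_rate * c_slope
        new_cost = np.mean((y - m * x - c) ** 2)
        if abs(prev_cost - new_cost) < tol:  # change in cost is very small
            break
        prev_cost = new_cost
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
pts = np.array([[x, 2 * x + 1] for x in np.linspace(0, 1, 20)])
m, c = gd_until_converged(pts, learning_rate=0.1)
print(m, c)  # should be close to 2 and 1
```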
Learning rate and its importance.
We must set the value neither too high nor too low. If it is too high, the update can overshoot the minimum and land on the other side of the curve. If it is too small, it will still reach the minimum, but it will take a long time.
Adaptive Learning Rate (α).
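One simple way to adapt α, sketched here as an illustrative assumption (the article does not specify a particular scheme), is to decay it over iterations so that early steps are large and later steps near the minimum are small:

```python
def decayed_learning_rate(initial_alpha, decay, iteration):
    """Simple 1/t-style decay: alpha shrinks as iterations progress.
    This schedule is an illustrative assumption, not from the article."""
    return initial_alpha / (1 + decay * iteration)

# Example: alpha starts at 0.1 and shrinks as training progresses
rates = [decayed_learning_rate(0.1, 0.01, i) for i in (0, 100, 1000)]
print(rates)  # each rate is smaller than the last
```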
Coding gradient descent for a single feature
We are using the data.csv file, which is provided in the link below.
import numpy as np
data = np.loadtxt(r"..\data.csv", delimiter=",")
X = data[:,0]
Y = data[:,1]
print(X)  # inspect the loaded feature values
Setting up training data
A part of the dataset is set aside for training and the rest for testing.
training_data = data[:70,:]
training_data.shape
Setting up testing data
testing_data = data[70:, :]
testing_data.shape
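A note on the split above: taking the first 70 rows is fine only if the file's rows are in no particular order. If the rows happen to be sorted, shuffling before splitting gives a fairer split. A sketch, using hypothetical stand-in data in place of data.csv:

```python
import numpy as np

# Hypothetical stand-in for data loaded from data.csv
rng = np.random.default_rng(0)
data = np.column_stack([np.arange(100.0), np.arange(100.0) * 2 + 1])

shuffled = data[rng.permutation(len(data))]  # shuffle rows, keep columns intact
training_data = shuffled[:70]
testing_data = shuffled[70:]
print(training_data.shape, testing_data.shape)  # (70, 2) (30, 2)
```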
Now, using gradient descent, we will find the best values of m and c.
# This function finds the new gradient at each step
def step_gradient(points, learning_rate, m, c):
    m_slope = 0
    c_slope = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        m_slope += (-2 / M) * (y - m * x - c) * x
        c_slope += (-2 / M) * (y - m * x - c)
    new_m = m - learning_rate * m_slope
    new_c = c - learning_rate * c_slope
    return new_m, new_c
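The per-row loop above can also be written in vectorized form with NumPy array operations. This variant is a sketch that should produce the same updates as the loop version:

```python
import numpy as np

def step_gradient_vectorized(points, learning_rate, m, c):
    """Same update as the loop version, computed with array operations."""
    x, y = points[:, 0], points[:, 1]
    M = len(points)
    residual = y - m * x - c
    m_slope = (-2 / M) * np.sum(residual * x)
    c_slope = (-2 / M) * np.sum(residual)
    return m - learning_rate * m_slope, c - learning_rate * c_slope

# Quick check on small made-up points lying on y = 2x + 1
pts = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
new_m, new_c = step_gradient_vectorized(pts, 0.01, 0.0, 0.0)
print(new_m, new_c)
```

On large datasets the vectorized form is much faster, since the summation happens inside NumPy rather than in a Python loop.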
Defining the gradient descent function
We call step_gradient repeatedly to get new values for m and c.
def gd(points, learning_rate, num_iterations):
    m = 0  # Initial value, chosen as 0
    c = 0  # Initial value, chosen as 0
    for i in range(num_iterations):
        m, c = step_gradient(points, learning_rate, m, c)
        print(i, " Cost: ", cost(points, m, c))
    return m, c
Defining the cost function
More about the cost function can be found here.
# This function finds the new cost after each optimisation step
def cost(points, m, c):
    total_cost = 0
    M = len(points)
    for i in range(M):
        x = points[i, 0]
        y = points[i, 1]
        total_cost += (1 / M) * ((y - m * x - c) ** 2)
    return total_cost
Finding the best-optimized value for m and c
def run():
    learning_rate = 0.0001
    num_iterations = 100
    m, c = gd(training_data, learning_rate, num_iterations)
    print("Final m :", m)
    print("Final c :", c)
    return m, c
Predicting the y values using the found values of m and c by defining the predict function.
def predict(final_m, final_c, testing_data):
    y_pred = []
    for i in range(len(testing_data)):
        # Use the parameters passed in, not the globals m and c
        ans = final_m * testing_data[i][0] + final_c
        y_pred.append(ans)
    return y_pred
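To gauge how good the fitted line is, the same cost (mean squared error) can be computed between the predictions and the actual y values of the testing data. A sketch, using hypothetical stand-in values where the real m and c would come from run():

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical testing points near y = 2x + 1
testing = np.array([[1.0, 3.1], [2.0, 4.9], [3.0, 7.2]])
final_m, final_c = 2.0, 1.0  # stand-in values; the real ones come from run()
y_pred = [final_m * x + final_c for x in testing[:, 0]]
print(mse(testing[:, 1], y_pred))  # → about 0.02
```

A small test error relative to the training error suggests the line generalizes well beyond the rows it was fitted on.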
The Jupyter notebook for the code is here. The dataset is here.
Follow me to learn ML from scratch in the next 100 days.