When to perform feature scaling? — Machine Learning
Feature scaling is a technique used to bring all the features to the same scale.
Let's say we have 2 features:
- Money
- Quantity of Milk
The quantity of milk will be in litres which will mostly be in the range of 1–20.
For the same quantity of milk, the money will be in the 100s.
We, as humans, understand the difference in scale between these quantities, but how do we make the machine understand it? The model treats all values as if they were on the same scale and gives higher weightage to larger values; in the example above, the quantity-of-milk values might effectively be ignored because they are so small compared to the money values.
So, in machine learning, we bring both features to the same scale, and this process is called feature scaling.
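For instance, a quick look at such data makes the scale gap obvious. The column names and numbers below are hypothetical, made up purely for illustration:
import pandas as pd

# Hypothetical purchases: litres of milk bought and money paid for them.
# All numbers are invented for illustration only.
purchases = pd.DataFrame({
    "quantity_of_milk": [1, 2, 5, 10, 20],   # roughly 1-20 litres
    "money": [60, 120, 300, 600, 1200],      # hundreds to thousands
})

# The 'money' column spans a range dozens of times wider than 'quantity_of_milk',
# so a magnitude-sensitive algorithm will be dominated by it.
print(purchases.describe().loc[["min", "max"]])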
Why feature scaling?
ML algorithms like linear regression and logistic regression are typically trained with gradient descent, an optimization technique. When the features are on different scales, the gradient updates take very different step sizes for each feature.
To ensure the optimizer reaches the minimum smoothly, with the steps for all features updating at a comparable rate, we should scale the data.
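As a rough illustration (the exact accuracy numbers will vary with the random seed, but the pattern is typical), a gradient-descent-based model such as sklearn's SGDClassifier usually does much better on the wine dataset once the features are standardized with the StandardScaler covered below:
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-descent-based classifier on the raw, unscaled features
raw_model = SGDClassifier(max_iter=1000, random_state=0).fit(X_train, y_train)

# The same classifier, but with the features standardized first
scaled_model = make_pipeline(StandardScaler(),
                             SGDClassifier(max_iter=1000, random_state=0)).fit(X_train, y_train)

print("accuracy without scaling:", raw_model.score(X_test, y_test))
print("accuracy with scaling:   ", scaled_model.score(X_test, y_test))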
Using sklearn, feature scaling can be done in three ways:
- Normal Scaling
- Standard Scaler
- Min-Max Scaler
Normal Scaling
import pandas as pd
import numpy as np
from sklearn import datasets
wine = datasets.load_wine()
X = wine.data                      # feature matrix
columnNames = wine.feature_names   # list of feature names
Load the data into a DataFrame:
wine_df = pd.DataFrame(X, columns = columnNames)
Scaling using scale()
from sklearn import preprocessing
wine_scaled = preprocessing.scale(X)
scale() returns a NumPy array; convert it back to a DataFrame for easier inspection:
wine_df_scaled = pd.DataFrame(wine_scaled, columns = columnNames)
This is the simplest scaling solution in sklearn.
Properties:
- Each feature (column) is scaled so that its mean becomes 0.
- Each feature's standard deviation becomes 1 (verified in the quick check below).
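A quick sanity check on the wine_df_scaled frame created above confirms this; the tiny floating-point residue around zero is expected:
print(wine_df_scaled.mean().round(6))       # ~0 for every column
print(wine_df_scaled.std(ddof=0).round(6))  # ~1 for every column (scale() uses the population std, i.e. ddof=0)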
The problem with this method is that we would have to combine the training and testing data to perform the scaling. This biases the model evaluation, because information leaks from the test set into the training set.
Also, if we receive new data points in the future, they will be scaled according to their own statistics, irrespective of the old data. This can cause serious problems for the ML model. We need some way to store the parameters with which the scaling was done.
Thus, we usually do not use this.
Min-Max Scaler
Normalization requires that you know or are able to accurately estimate the minimum and maximum observable values.
Transform features by scaling each feature to a given range.
The MinMaxScaler estimator scales and translates each feature individually so that it lies within the given range on the training set, e.g. between zero and one (by default each value becomes (x - min) / (max - min)).
min_max_scaler_object = preprocessing.MinMaxScaler()
min_max_scaler_object.fit(X)
wine_min_max = min_max_scaler_object.transform(X)
Convert the result to a DataFrame for easier inspection:
wine1 = pd.DataFrame(wine_min_max, columns = columnNames)
You may now use the same fitted object to transform any new data in the same way.
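For example, a minimal sketch of the usual workflow with the X already loaded above: fit the scaler on the training data only, then reuse it on the test data, so no information leaks from the test set.
from sklearn.model_selection import train_test_split

X_train, X_test = train_test_split(X, random_state=0)

min_max_scaler_object = preprocessing.MinMaxScaler()
min_max_scaler_object.fit(X_train)   # learn the min and max from the training set only

X_train_scaled = min_max_scaler_object.transform(X_train)
X_test_scaled = min_max_scaler_object.transform(X_test)   # reuse the same min and max

# Training columns now lie exactly in [0, 1]; test columns lie only approximately there,
# because they are scaled with the training set's min and max.
print(X_train_scaled.min(axis=0).round(3), X_train_scaled.max(axis=0).round(3))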
Standard Scaler
StandardScaler standardizes each feature so that it has zero mean and unit variance. It applies the same transformation as scale(), but as a reusable estimator.
Standardize features by removing the mean and scaling to unit variance.
standard_scaler_object = preprocessing.StandardScaler()
standard_scaler_object.fit(X)
wine_standard = standard_scaler_object.transform(X)
Convert the result to a DataFrame for easier inspection:
wine2 = pd.DataFrame(wine_standard, columns = columnNames)
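Because the fitted estimator stores the per-feature statistics it learned in fit() (its mean_ and scale_ attributes), the exact same transformation can be reproduced and applied to new data later. A minimal sketch of the equivalence:
# The fitted scaler remembers the per-feature statistics learned from X.
print(standard_scaler_object.mean_[:3])    # per-feature means
print(standard_scaler_object.scale_[:3])   # per-feature standard deviations

# Standardization is simply z = (x - mean) / std, applied column by column.
manual = (X - standard_scaler_object.mean_) / standard_scaler_object.scale_
print(np.allclose(manual, wine_standard))  # True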
Most of the time, we use the StandardScaler.
However, both StandardScaler and MinMaxScaler are sensitive to outliers.
There is another scaler, the RobustScaler, which gives better results when the data has outliers.
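A minimal sketch of its usage: it follows the same fit/transform pattern, but centres each feature on its median and scales by the interquartile range, so extreme values have far less influence.
robust_scaler_object = preprocessing.RobustScaler()
robust_scaler_object.fit(X)
wine_robust = robust_scaler_object.transform(X)
wine3 = pd.DataFrame(wine_robust, columns = columnNames)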
The jupyter notebook for this code is here.
Follow me to learn machine learning from scratch.