Python Basics for Machine Learning Using Pandas (Part 1)

# Importing pandas library
import pandas as pd
# 'as pd' gives pandas a short alias, so its functions are called as pd.<name>. Any alias works, but 'pd' is the standard convention.

Reading Data

titanic = pd.read_csv(r"C:\Users\Name\Desktop\codenotes\ML\titanicdataset\titanic_train.csv")
# A Windows path contains backslashes ('\'). Adding 'r' before the string makes it a raw string, so Python does not treat the backslashes as escape sequences.
# Alternatively, replace each '\' with '/' or '\\'.
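A short sketch of why the 'r' prefix matters. The path from the example is used only as a string here, and no file is read. `pathlib.PureWindowsPath` is an extra standard-library helper, not part of the original code. It is included to show that forward slashes name the same Windows path.

```python
from pathlib import PureWindowsPath

# The path from the example, written three equivalent ways (no file is opened here)
raw_path = r"C:\Users\Name\Desktop\codenotes\ML\titanicdataset\titanic_train.csv"
escaped_path = "C:\\Users\\Name\\Desktop\\codenotes\\ML\\titanicdataset\\titanic_train.csv"
forward_path = "C:/Users/Name/Desktop/codenotes/ML/titanicdataset/titanic_train.csv"

# A raw string keeps each backslash literally, so it equals the doubled-backslash version
print(raw_path == escaped_path)  # True

# Forward slashes are also accepted as separators in a Windows path
print(PureWindowsPath(forward_path) == PureWindowsPath(raw_path))  # True

# Without 'r', the "\t" at the start of "\titanicdataset" becomes a TAB character
not_raw = "ML\titanicdataset"
print("\t" in not_raw)  # True
```

Writing the full path without the prefix fails even earlier. The `\U` in `C:\Users` starts a Unicode escape, so Python refuses to compile that string literal at all.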

Creating a copy of the data.

df = titanic.copy()
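`copy()` returns an independent DataFrame (a deep copy by default), so edits made to `df` leave `titanic` untouched. Below is a minimal sketch, with a small hypothetical DataFrame standing in for the Titanic data:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
titanic = pd.DataFrame({"Name": ["Braund", "Cumings"], "Age": [22, 38]})

df = titanic.copy()      # deep copy by default (deep=True)
df.loc[0, "Age"] = 99    # change a value in the copy only

print(titanic.loc[0, "Age"])  # 22, the original is unchanged
print(df.loc[0, "Age"])       # 99
```

Without `copy()`, `df = titanic` would only create a second name for the same object, so changes made through `df` would also show up in `titanic`.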

Dimension of the dataset:

# df.shape returns a tuple: the first value is the number of rows and the second is the number of columns.
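The dimensions come from the `shape` attribute. Here is a minimal sketch on a small hypothetical DataFrame, with 3 rows and 2 columns:

```python
import pandas as pd

# Small hypothetical DataFrame: 3 rows, 2 columns
df = pd.DataFrame({"Name": ["Braund", "Cumings", "Heikkinen"], "Age": [22, 38, 26]})

rows, cols = df.shape   # shape is an attribute (no parentheses) holding a (rows, columns) tuple
print(df.shape)         # (3, 2)
print(rows, cols)       # 3 2
```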

Understanding the dataset
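A common first look at a new dataset uses pandas' built-in summary methods. This is a hedged sketch that assumes those methods, with a hypothetical stand-in DataFrame. `info()` prints each column's dtype and non-null count. `describe()` returns count, mean, std, min, quartiles and max for the numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (one missing Age)
df = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

df.info()                # prints dtypes and non-null counts per column
summary = df.describe()  # DataFrame of statistics for the numeric columns only
print(summary)
print(df.dtypes)         # the dtype of each column
```

The missing Age value is excluded from the statistics, which is why its count in `summary` is 3 rather than 4.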


Viewing the data frame
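A quick way to view part of a DataFrame is `head()` and `tail()`. Below is a minimal sketch on a hypothetical 10-row DataFrame:

```python
import pandas as pd

# Hypothetical stand-in: 10 rows with a single column
df = pd.DataFrame({"PassengerId": range(1, 11)})

print(df.head())    # first 5 rows (the default is n=5)
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
```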


Select a column to view.

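There are two ways to select a column:
1 --> df.Name
2 --> df['Name']
The bracket form also works when the column name contains spaces or matches a DataFrame method name (for example, a column called 'count').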
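Both forms return the same Series, but only the bracket form handles every column name. The sketch below uses a hypothetical column name, 'Ticket No', that contains a space. It also shows that a list of names selects several columns at once.

```python
import pandas as pd

# Hypothetical stand-in with a column name that contains a space
df = pd.DataFrame({"Name": ["Braund", "Cumings"], "Ticket No": ["A/5 21171", "PC 17599"]})

by_attribute = df.Name     # option 1: attribute access
by_bracket = df["Name"]    # option 2: bracket access

print(by_attribute.equals(by_bracket))  # True, both return the same Series
print(df["Ticket No"])                  # only bracket access works when the name has a space
print(df[["Name", "Ticket No"]])        # a list of names returns a DataFrame with those columns
```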

Finding null values

# isnull() returns a DataFrame of booleans: True where a value is null (missing), False where it is not.
# Chaining .sum() counts the True values, giving the number of nulls in each column.
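Counting nulls works because `True` is summed as 1 and `False` as 0. Below is a minimal sketch on a hypothetical DataFrame with a few missing values. `isna()` is an alias of `isnull()` and behaves the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with some missing values
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, "C123"],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

mask = df.isnull()         # True where a value is missing, False otherwise
null_counts = mask.sum()   # True counts as 1, so this is the number of nulls per column
print(null_counts)         # Age 2, Cabin 2, Fare 0
```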

Accessing values in a DataFrame

  • iloc[] indexer (selects by integer position).
  • loc[] indexer (selects by label).

iloc[] indexer. Allowed inputs:

  • An integer, e.g. 5.
  • A list or array of integers, e.g. [4, 3, 0].
  • A slice object with ints, e.g. 1:7.
  • A boolean array.
df.iloc[1:4, 2:4]  # selects the rows at positions 1, 2, 3 and the columns at positions 2, 3 (the end of each slice is excluded)
df.iloc[1:4]  # selects the rows at positions 1, 2, 3 and all the columns
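The following sketch exercises the allowed `iloc[]` inputs on a hypothetical 5x5 DataFrame. It uses letter labels so that positions and labels are clearly different.

```python
import pandas as pd

# Hypothetical 5x5 DataFrame with letter labels, so positions and labels differ
df = pd.DataFrame(
    [[r * 10 + c for c in range(5)] for r in range(5)],
    index=list("abcde"),
    columns=["A", "B", "C", "D", "E"],
)

block = df.iloc[1:4, 2:4]   # rows at positions 1-3, columns at positions 2-3 (end excluded)
print(block.shape)          # (3, 2)

rows = df.iloc[1:4]         # rows at positions 1-3, all 5 columns
print(rows.shape)           # (3, 5)

print(df.iloc[[4, 3, 0]].index.tolist())                      # ['e', 'd', 'a'], a list of positions
print(df.iloc[[True, False, True, False, False]].index.tolist())  # ['a', 'c'], a boolean array
print(df.iloc[0, 0])                                          # 0, a single cell by position
```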

loc[] indexer. Allowed inputs:

  • A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
  • A list or array of labels, e.g. ['a', 'b', 'c'].
  • A slice object with labels, e.g. 'a':'f' (unlike a normal Python slice, both the start and the stop labels are included).
  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
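# df.loc[1:4] selects the rows with index labels 1 through 4 (both ends included) and all the columns. With the default index these are the 2nd to 5th rows, since labels start at 0.
# df.loc[1:4, 'Name':'Ticket'] selects the same rows and every column from 'Name' through 'Ticket'.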
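The following sketch uses a hypothetical DataFrame that mimics a few Titanic columns, with the default 0-based index. It shows that `loc[]` slices include both endpoints, and that a boolean array can be used to filter rows.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic columns, with the default 0-based integer index
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen", "Moran"],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", "330877"],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05, 8.46],
})

rows = df.loc[1:4]                          # labels 1 to 4, BOTH ends included
print(rows.index.tolist())                  # [1, 2, 3, 4]

block = df.loc[1:4, "Name":"Ticket"]        # same rows, columns 'Name' through 'Ticket'
print(block.columns.tolist())               # ['Name', 'Sex', 'Ticket']

women = df.loc[df["Sex"] == "female", "Name"]  # a boolean array selects the matching rows
print(women.tolist())                       # ['Cumings', 'Heikkinen', 'Futrelle']
```

Compare this with `iloc[1:4]`, which stops before position 4 and returns only three rows.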

Deletion of data

# df.drop(0) returns a new DataFrame without the row whose index label is 0.
# It does not change the original DataFrame, because inplace is False by default: if we look at 'df' again, nothing has changed.
df.drop(0, inplace=True)
# Passing inplace=True makes the change permanent: the row is removed from 'df' itself (and drop() returns None).
df.drop('PassengerId', axis=1)
# axis=1 drops a column instead of a row. Without inplace=True, this returns a new DataFrame and 'df' keeps the 'PassengerId' column.
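The following sketch runs the three `drop()` calls on a hypothetical three-row DataFrame, so you can see each effect. The `columns=` keyword at the end is an equivalent alternative to `axis=1` and is not used in the original code.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic columns
df = pd.DataFrame({"PassengerId": [1, 2, 3], "Name": ["Braund", "Cumings", "Heikkinen"]})

without_first = df.drop(0)            # returns a new DataFrame; df still has 3 rows
print(len(df), len(without_first))    # 3 2

result = df.drop(0, inplace=True)     # modifies df itself and returns None
print(result is None, len(df))        # True 2

no_id = df.drop("PassengerId", axis=1)   # axis=1 drops a column; df keeps it
print(no_id.columns.tolist())            # ['Name']
print(df.columns.tolist())               # ['PassengerId', 'Name']

same = df.drop(columns="PassengerId")    # keyword form, equivalent to axis=1
print(same.equals(no_id))                # True
```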



Anantha Kattani

