{
"cells": [
{
"cell_type": "markdown",
"id": "substantial-probability",
"metadata": {},
"source": [
"# Pandas Textbook for Digital Humanities"
]
},
{
"cell_type": "markdown",
"id": "appropriate-advertising",
"metadata": {},
"source": [
"## Chapter Three: Finding Data in DataFrame"
]
},
{
"cell_type": "markdown",
"id": "typical-vector",
"metadata": {},
"source": [
"## Covered in this Chapter"
]
},
{
"cell_type": "markdown",
"id": "constant-reward",
"metadata": {},
"source": [
"1) How to Find Column Names \n",
"2) How to Get a Quick Sense of the Dataset \n",
"3) How to Grab a Specific Range of Rows \n",
"4) How to Get a Quick Quantitative Understanding of the Dataset \n",
"5) How to Find Specific Information in the Dataset \n",
"6) How to use Or in a DataFrame query "
]
},
{
"cell_type": "markdown",
"id": "lightweight-following",
"metadata": {},
"source": [
"## Video"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "mobile-account",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
{
"cell_type": "markdown",
"id": "under-daily",
"metadata": {},
"source": [
"## About the Titanic Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "french-macedonia",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "accepted-hepatitis",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "stuffed-species",
"metadata": {},
"source": [
"In the next two notebooks, we will be using the same dataset, the infamous *Titanic Survivor* dataset that was put out by Kaggle a few years ago. It has since become a staple dataset in machine learning and data science communities because of the breadth of data that it offers on the Titanic passengers. The goal of most machine learning challenges is to use this dataset to train a model that can accurately predict if a passenger would have survived given certain gender, economic, and class conditions.\n",
"\n",
"In these notebooks, we will be using a cleaned version of the dataset available on the GitHub page below. We will be using it because of its breadth of its qualitative and quantitative data.\n",
"\n",
"In this notebook, we are strictly interested in finding key data in the DataFrame. By the end of the notebook, you should have a good understanding of why working with CSV data in Python via Pandas is far more powerful and easy than using Excel."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "manual-rendering",
"metadata": {},
"outputs": [],
"source": [
"#data obtained from => https://github.com/datasciencedojo/datasets , but originally from Kaggle\n",
"df = pd.read_csv(\"data/titanic.csv\")"
]
},
{
"cell_type": "markdown",
"id": "polish-calcium",
"metadata": {},
"source": [
"## How to Find Column Data"
]
},
{
"cell_type": "markdown",
"id": "responsible-tooth",
"metadata": {},
"source": [
"Let's presume that we just obtained this dataset. Let's also presume we know nothing about it. Our first job is to get a sense of the data. Maybe we want to know all the columns that the dataset contains. To find this information, we can use df.columns"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "parallel-plymouth",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n",
" 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n",
" dtype='object')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"id": "removed-bench",
"metadata": {},
"source": [
"With our knowledge from the last notebook, we now know how to convert this into a list!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "editorial-generator",
"metadata": {},
"outputs": [],
"source": [
"cols = df.columns.tolist()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "limited-webmaster",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['PassengerId',\n",
" 'Survived',\n",
" 'Pclass',\n",
" 'Name',\n",
" 'Sex',\n",
" 'Age',\n",
" 'SibSp',\n",
" 'Parch',\n",
" 'Ticket',\n",
" 'Fare',\n",
" 'Cabin',\n",
" 'Embarked']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cols"
]
},
{
"cell_type": "markdown",
"id": "informational-enough",
"metadata": {},
"source": [
"## How to Get a Quick Sense of the Dataset with df.head()"
]
},
{
"cell_type": "markdown",
"id": "norwegian-beijing",
"metadata": {},
"source": [
"Another way to get a quick sense of the data is to use df.head(). This allows us to return to top pieces of information from the dataset. By default, head returns the top 5 rows."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "molecular-trigger",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
3
\n",
"
Braund, Mr. Owen Harris
\n",
"
male
\n",
"
22.0
\n",
"
1
\n",
"
0
\n",
"
A/5 21171
\n",
"
7.2500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
1
\n",
"
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
"
female
\n",
"
38.0
\n",
"
1
\n",
"
0
\n",
"
PC 17599
\n",
"
71.2833
\n",
"
C85
\n",
"
C
\n",
"
\n",
"
\n",
"
2
\n",
"
3
\n",
"
1
\n",
"
3
\n",
"
Heikkinen, Miss. Laina
\n",
"
female
\n",
"
26.0
\n",
"
0
\n",
"
0
\n",
"
STON/O2. 3101282
\n",
"
7.9250
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
1
\n",
"
1
\n",
"
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
"
female
\n",
"
35.0
\n",
"
1
\n",
"
0
\n",
"
113803
\n",
"
53.1000
\n",
"
C123
\n",
"
S
\n",
"
\n",
"
\n",
"
4
\n",
"
5
\n",
"
0
\n",
"
3
\n",
"
Allen, Mr. William Henry
\n",
"
male
\n",
"
35.0
\n",
"
0
\n",
"
0
\n",
"
373450
\n",
"
8.0500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "exotic-porcelain",
"metadata": {},
"source": [
"We can pass in an argument to return more than 5. Let's try and display 20."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "operational-seating",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
0
\n",
"
3
\n",
"
Braund, Mr. Owen Harris
\n",
"
male
\n",
"
22.0
\n",
"
1
\n",
"
0
\n",
"
A/5 21171
\n",
"
7.2500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
1
\n",
"
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
"
female
\n",
"
38.0
\n",
"
1
\n",
"
0
\n",
"
PC 17599
\n",
"
71.2833
\n",
"
C85
\n",
"
C
\n",
"
\n",
"
\n",
"
2
\n",
"
3
\n",
"
1
\n",
"
3
\n",
"
Heikkinen, Miss. Laina
\n",
"
female
\n",
"
26.0
\n",
"
0
\n",
"
0
\n",
"
STON/O2. 3101282
\n",
"
7.9250
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
1
\n",
"
1
\n",
"
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
"
female
\n",
"
35.0
\n",
"
1
\n",
"
0
\n",
"
113803
\n",
"
53.1000
\n",
"
C123
\n",
"
S
\n",
"
\n",
"
\n",
"
4
\n",
"
5
\n",
"
0
\n",
"
3
\n",
"
Allen, Mr. William Henry
\n",
"
male
\n",
"
35.0
\n",
"
0
\n",
"
0
\n",
"
373450
\n",
"
8.0500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
5
\n",
"
6
\n",
"
0
\n",
"
3
\n",
"
Moran, Mr. James
\n",
"
male
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
330877
\n",
"
8.4583
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
6
\n",
"
7
\n",
"
0
\n",
"
1
\n",
"
McCarthy, Mr. Timothy J
\n",
"
male
\n",
"
54.0
\n",
"
0
\n",
"
0
\n",
"
17463
\n",
"
51.8625
\n",
"
E46
\n",
"
S
\n",
"
\n",
"
\n",
"
7
\n",
"
8
\n",
"
0
\n",
"
3
\n",
"
Palsson, Master. Gosta Leonard
\n",
"
male
\n",
"
2.0
\n",
"
3
\n",
"
1
\n",
"
349909
\n",
"
21.0750
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
8
\n",
"
9
\n",
"
1
\n",
"
3
\n",
"
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
\n",
"
female
\n",
"
27.0
\n",
"
0
\n",
"
2
\n",
"
347742
\n",
"
11.1333
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
2
\n",
"
Nasser, Mrs. Nicholas (Adele Achem)
\n",
"
female
\n",
"
14.0
\n",
"
1
\n",
"
0
\n",
"
237736
\n",
"
30.0708
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
10
\n",
"
11
\n",
"
1
\n",
"
3
\n",
"
Sandstrom, Miss. Marguerite Rut
\n",
"
female
\n",
"
4.0
\n",
"
1
\n",
"
1
\n",
"
PP 9549
\n",
"
16.7000
\n",
"
G6
\n",
"
S
\n",
"
\n",
"
\n",
"
11
\n",
"
12
\n",
"
1
\n",
"
1
\n",
"
Bonnell, Miss. Elizabeth
\n",
"
female
\n",
"
58.0
\n",
"
0
\n",
"
0
\n",
"
113783
\n",
"
26.5500
\n",
"
C103
\n",
"
S
\n",
"
\n",
"
\n",
"
12
\n",
"
13
\n",
"
0
\n",
"
3
\n",
"
Saundercock, Mr. William Henry
\n",
"
male
\n",
"
20.0
\n",
"
0
\n",
"
0
\n",
"
A/5. 2151
\n",
"
8.0500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
13
\n",
"
14
\n",
"
0
\n",
"
3
\n",
"
Andersson, Mr. Anders Johan
\n",
"
male
\n",
"
39.0
\n",
"
1
\n",
"
5
\n",
"
347082
\n",
"
31.2750
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
14
\n",
"
15
\n",
"
0
\n",
"
3
\n",
"
Vestrom, Miss. Hulda Amanda Adolfina
\n",
"
female
\n",
"
14.0
\n",
"
0
\n",
"
0
\n",
"
350406
\n",
"
7.8542
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
15
\n",
"
16
\n",
"
1
\n",
"
2
\n",
"
Hewlett, Mrs. (Mary D Kingcome)
\n",
"
female
\n",
"
55.0
\n",
"
0
\n",
"
0
\n",
"
248706
\n",
"
16.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
16
\n",
"
17
\n",
"
0
\n",
"
3
\n",
"
Rice, Master. Eugene
\n",
"
male
\n",
"
2.0
\n",
"
4
\n",
"
1
\n",
"
382652
\n",
"
29.1250
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
17
\n",
"
18
\n",
"
1
\n",
"
2
\n",
"
Williams, Mr. Charles Eugene
\n",
"
male
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
244373
\n",
"
13.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
18
\n",
"
19
\n",
"
0
\n",
"
3
\n",
"
Vander Planke, Mrs. Julius (Emelia Maria Vande...
\n",
"
female
\n",
"
31.0
\n",
"
1
\n",
"
0
\n",
"
345763
\n",
"
18.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
19
\n",
"
20
\n",
"
1
\n",
"
3
\n",
"
Masselmani, Mrs. Fatima
\n",
"
female
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
2649
\n",
"
7.2250
\n",
"
NaN
\n",
"
C
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"5 6 0 3 \n",
"6 7 0 1 \n",
"7 8 0 3 \n",
"8 9 1 3 \n",
"9 10 1 2 \n",
"10 11 1 3 \n",
"11 12 1 1 \n",
"12 13 0 3 \n",
"13 14 0 3 \n",
"14 15 0 3 \n",
"15 16 1 2 \n",
"16 17 0 3 \n",
"17 18 1 2 \n",
"18 19 0 3 \n",
"19 20 1 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"5 Moran, Mr. James male NaN 0 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"7 Palsson, Master. Gosta Leonard male 2.0 3 \n",
"8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
"10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
"12 Saundercock, Mr. William Henry male 20.0 0 \n",
"13 Andersson, Mr. Anders Johan male 39.0 1 \n",
"14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n",
"15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n",
"16 Rice, Master. Eugene male 2.0 4 \n",
"17 Williams, Mr. Charles Eugene male NaN 0 \n",
"18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n",
"19 Masselmani, Mrs. Fatima female NaN 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
"5 0 330877 8.4583 NaN Q \n",
"6 0 17463 51.8625 E46 S \n",
"7 1 349909 21.0750 NaN S \n",
"8 2 347742 11.1333 NaN S \n",
"9 0 237736 30.0708 NaN C \n",
"10 1 PP 9549 16.7000 G6 S \n",
"11 0 113783 26.5500 C103 S \n",
"12 0 A/5. 2151 8.0500 NaN S \n",
"13 5 347082 31.2750 NaN S \n",
"14 0 350406 7.8542 NaN S \n",
"15 0 248706 16.0000 NaN S \n",
"16 1 382652 29.1250 NaN Q \n",
"17 0 244373 13.0000 NaN S \n",
"18 0 345763 18.0000 NaN S \n",
"19 0 2649 7.2250 NaN C "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(20)"
]
},
{
"cell_type": "markdown",
"id": "broad-plaza",
"metadata": {},
"source": [
"## How to Grab a Specific Range of Rows with df.iloc[]"
]
},
{
"cell_type": "markdown",
"id": "timely-comparative",
"metadata": {},
"source": [
"Sometimes, however, you need to grab a specific range of rows. Let's say I am interested in rows 5-20. To grab this data, we can use the df.iloc[] command that we met in the last notebook. This will allow us to pass a specific index range like a list."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "unable-respondent",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
5
\n",
"
6
\n",
"
0
\n",
"
3
\n",
"
Moran, Mr. James
\n",
"
male
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
330877
\n",
"
8.4583
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
6
\n",
"
7
\n",
"
0
\n",
"
1
\n",
"
McCarthy, Mr. Timothy J
\n",
"
male
\n",
"
54.0
\n",
"
0
\n",
"
0
\n",
"
17463
\n",
"
51.8625
\n",
"
E46
\n",
"
S
\n",
"
\n",
"
\n",
"
7
\n",
"
8
\n",
"
0
\n",
"
3
\n",
"
Palsson, Master. Gosta Leonard
\n",
"
male
\n",
"
2.0
\n",
"
3
\n",
"
1
\n",
"
349909
\n",
"
21.0750
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
8
\n",
"
9
\n",
"
1
\n",
"
3
\n",
"
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
\n",
"
female
\n",
"
27.0
\n",
"
0
\n",
"
2
\n",
"
347742
\n",
"
11.1333
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
2
\n",
"
Nasser, Mrs. Nicholas (Adele Achem)
\n",
"
female
\n",
"
14.0
\n",
"
1
\n",
"
0
\n",
"
237736
\n",
"
30.0708
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
10
\n",
"
11
\n",
"
1
\n",
"
3
\n",
"
Sandstrom, Miss. Marguerite Rut
\n",
"
female
\n",
"
4.0
\n",
"
1
\n",
"
1
\n",
"
PP 9549
\n",
"
16.7000
\n",
"
G6
\n",
"
S
\n",
"
\n",
"
\n",
"
11
\n",
"
12
\n",
"
1
\n",
"
1
\n",
"
Bonnell, Miss. Elizabeth
\n",
"
female
\n",
"
58.0
\n",
"
0
\n",
"
0
\n",
"
113783
\n",
"
26.5500
\n",
"
C103
\n",
"
S
\n",
"
\n",
"
\n",
"
12
\n",
"
13
\n",
"
0
\n",
"
3
\n",
"
Saundercock, Mr. William Henry
\n",
"
male
\n",
"
20.0
\n",
"
0
\n",
"
0
\n",
"
A/5. 2151
\n",
"
8.0500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
13
\n",
"
14
\n",
"
0
\n",
"
3
\n",
"
Andersson, Mr. Anders Johan
\n",
"
male
\n",
"
39.0
\n",
"
1
\n",
"
5
\n",
"
347082
\n",
"
31.2750
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
14
\n",
"
15
\n",
"
0
\n",
"
3
\n",
"
Vestrom, Miss. Hulda Amanda Adolfina
\n",
"
female
\n",
"
14.0
\n",
"
0
\n",
"
0
\n",
"
350406
\n",
"
7.8542
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
15
\n",
"
16
\n",
"
1
\n",
"
2
\n",
"
Hewlett, Mrs. (Mary D Kingcome)
\n",
"
female
\n",
"
55.0
\n",
"
0
\n",
"
0
\n",
"
248706
\n",
"
16.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
16
\n",
"
17
\n",
"
0
\n",
"
3
\n",
"
Rice, Master. Eugene
\n",
"
male
\n",
"
2.0
\n",
"
4
\n",
"
1
\n",
"
382652
\n",
"
29.1250
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
17
\n",
"
18
\n",
"
1
\n",
"
2
\n",
"
Williams, Mr. Charles Eugene
\n",
"
male
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
244373
\n",
"
13.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
18
\n",
"
19
\n",
"
0
\n",
"
3
\n",
"
Vander Planke, Mrs. Julius (Emelia Maria Vande...
\n",
"
female
\n",
"
31.0
\n",
"
1
\n",
"
0
\n",
"
345763
\n",
"
18.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
19
\n",
"
20
\n",
"
1
\n",
"
3
\n",
"
Masselmani, Mrs. Fatima
\n",
"
female
\n",
"
NaN
\n",
"
0
\n",
"
0
\n",
"
2649
\n",
"
7.2250
\n",
"
NaN
\n",
"
C
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"5 6 0 3 \n",
"6 7 0 1 \n",
"7 8 0 3 \n",
"8 9 1 3 \n",
"9 10 1 2 \n",
"10 11 1 3 \n",
"11 12 1 1 \n",
"12 13 0 3 \n",
"13 14 0 3 \n",
"14 15 0 3 \n",
"15 16 1 2 \n",
"16 17 0 3 \n",
"17 18 1 2 \n",
"18 19 0 3 \n",
"19 20 1 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"5 Moran, Mr. James male NaN 0 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"7 Palsson, Master. Gosta Leonard male 2.0 3 \n",
"8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
"10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
"12 Saundercock, Mr. William Henry male 20.0 0 \n",
"13 Andersson, Mr. Anders Johan male 39.0 1 \n",
"14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n",
"15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n",
"16 Rice, Master. Eugene male 2.0 4 \n",
"17 Williams, Mr. Charles Eugene male NaN 0 \n",
"18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n",
"19 Masselmani, Mrs. Fatima female NaN 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"5 0 330877 8.4583 NaN Q \n",
"6 0 17463 51.8625 E46 S \n",
"7 1 349909 21.0750 NaN S \n",
"8 2 347742 11.1333 NaN S \n",
"9 0 237736 30.0708 NaN C \n",
"10 1 PP 9549 16.7000 G6 S \n",
"11 0 113783 26.5500 C103 S \n",
"12 0 A/5. 2151 8.0500 NaN S \n",
"13 5 347082 31.2750 NaN S \n",
"14 0 350406 7.8542 NaN S \n",
"15 0 248706 16.0000 NaN S \n",
"16 1 382652 29.1250 NaN Q \n",
"17 0 244373 13.0000 NaN S \n",
"18 0 345763 18.0000 NaN S \n",
"19 0 2649 7.2250 NaN C "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[5:20]"
]
},
{
"cell_type": "markdown",
"id": "happy-induction",
"metadata": {},
"source": [
"## How to Get a Quick Quantitative Understanding of the Dataset with Describe()"
]
},
{
"cell_type": "markdown",
"id": "visible-london",
"metadata": {},
"source": [
"When working with numerical, quantitative data, we can automatically grab all numerical rows and learn a lot about the data with df.describe(). This will return the count, mean, standard deviation, minimum, maximum, etc. of our quantitative data. Sometimes, this data is useful to compute in such a way, such as the column Survived. Here, we can see that roughly .38 or 38% of the passengers (in this dataset) survived. Other numerical data does not really lend itself well to this kind of analysis, e.g. PassengerId which a unique numerical number corresponding to each passenger."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "planned-alberta",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Fare
\n",
"
\n",
" \n",
" \n",
"
\n",
"
count
\n",
"
891.000000
\n",
"
891.000000
\n",
"
891.000000
\n",
"
714.000000
\n",
"
891.000000
\n",
"
891.000000
\n",
"
891.000000
\n",
"
\n",
"
\n",
"
mean
\n",
"
446.000000
\n",
"
0.383838
\n",
"
2.308642
\n",
"
29.699118
\n",
"
0.523008
\n",
"
0.381594
\n",
"
32.204208
\n",
"
\n",
"
\n",
"
std
\n",
"
257.353842
\n",
"
0.486592
\n",
"
0.836071
\n",
"
14.526497
\n",
"
1.102743
\n",
"
0.806057
\n",
"
49.693429
\n",
"
\n",
"
\n",
"
min
\n",
"
1.000000
\n",
"
0.000000
\n",
"
1.000000
\n",
"
0.420000
\n",
"
0.000000
\n",
"
0.000000
\n",
"
0.000000
\n",
"
\n",
"
\n",
"
25%
\n",
"
223.500000
\n",
"
0.000000
\n",
"
2.000000
\n",
"
20.125000
\n",
"
0.000000
\n",
"
0.000000
\n",
"
7.910400
\n",
"
\n",
"
\n",
"
50%
\n",
"
446.000000
\n",
"
0.000000
\n",
"
3.000000
\n",
"
28.000000
\n",
"
0.000000
\n",
"
0.000000
\n",
"
14.454200
\n",
"
\n",
"
\n",
"
75%
\n",
"
668.500000
\n",
"
1.000000
\n",
"
3.000000
\n",
"
38.000000
\n",
"
1.000000
\n",
"
0.000000
\n",
"
31.000000
\n",
"
\n",
"
\n",
"
max
\n",
"
891.000000
\n",
"
1.000000
\n",
"
3.000000
\n",
"
80.000000
\n",
"
8.000000
\n",
"
6.000000
\n",
"
512.329200
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp \\\n",
"count 891.000000 891.000000 891.000000 714.000000 891.000000 \n",
"mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n",
"std 257.353842 0.486592 0.836071 14.526497 1.102743 \n",
"min 1.000000 0.000000 1.000000 0.420000 0.000000 \n",
"25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n",
"50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n",
"75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n",
"max 891.000000 1.000000 3.000000 80.000000 8.000000 \n",
"\n",
" Parch Fare \n",
"count 891.000000 891.000000 \n",
"mean 0.381594 32.204208 \n",
"std 0.806057 49.693429 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 7.910400 \n",
"50% 0.000000 14.454200 \n",
"75% 0.000000 31.000000 \n",
"max 6.000000 512.329200 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"id": "dependent-disclaimer",
"metadata": {},
"source": [
"## How to Find Specific Information in the Dataset with df.loc"
]
},
{
"cell_type": "markdown",
"id": "rough-singapore",
"metadata": {},
"source": [
"As researchers, however, we often need to find targeted information in our dataset. Let's say I am researching female passengers on the Titanic. I can achieve this fairly easily in Excel and in Python with Pandas. To do this in Pandas, you can use df.loc[] to pass a specific argument. In the example below, we are stating that we are looking for the columns in the DataFrame of \"Sex\" that match the string \"female\". Try to recreate this and find all males in the dataset on your own. Note at the bottom of the DataFrame 314x12. This is the dimensions of the DataFrame. We can see that we have 314 results. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "sunset-wallace",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
1
\n",
"
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
"
female
\n",
"
38.0
\n",
"
1
\n",
"
0
\n",
"
PC 17599
\n",
"
71.2833
\n",
"
C85
\n",
"
C
\n",
"
\n",
"
\n",
"
2
\n",
"
3
\n",
"
1
\n",
"
3
\n",
"
Heikkinen, Miss. Laina
\n",
"
female
\n",
"
26.0
\n",
"
0
\n",
"
0
\n",
"
STON/O2. 3101282
\n",
"
7.9250
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
1
\n",
"
1
\n",
"
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
"
female
\n",
"
35.0
\n",
"
1
\n",
"
0
\n",
"
113803
\n",
"
53.1000
\n",
"
C123
\n",
"
S
\n",
"
\n",
"
\n",
"
8
\n",
"
9
\n",
"
1
\n",
"
3
\n",
"
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
\n",
"
female
\n",
"
27.0
\n",
"
0
\n",
"
2
\n",
"
347742
\n",
"
11.1333
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
2
\n",
"
Nasser, Mrs. Nicholas (Adele Achem)
\n",
"
female
\n",
"
14.0
\n",
"
1
\n",
"
0
\n",
"
237736
\n",
"
30.0708
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
880
\n",
"
881
\n",
"
1
\n",
"
2
\n",
"
Shelley, Mrs. William (Imanita Parrish Hall)
\n",
"
female
\n",
"
25.0
\n",
"
0
\n",
"
1
\n",
"
230433
\n",
"
26.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
882
\n",
"
883
\n",
"
0
\n",
"
3
\n",
"
Dahlberg, Miss. Gerda Ulrika
\n",
"
female
\n",
"
22.0
\n",
"
0
\n",
"
0
\n",
"
7552
\n",
"
10.5167
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
885
\n",
"
886
\n",
"
0
\n",
"
3
\n",
"
Rice, Mrs. William (Margaret Norton)
\n",
"
female
\n",
"
39.0
\n",
"
0
\n",
"
5
\n",
"
382652
\n",
"
29.1250
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
887
\n",
"
888
\n",
"
1
\n",
"
1
\n",
"
Graham, Miss. Margaret Edith
\n",
"
female
\n",
"
19.0
\n",
"
0
\n",
"
0
\n",
"
112053
\n",
"
30.0000
\n",
"
B42
\n",
"
S
\n",
"
\n",
"
\n",
"
888
\n",
"
889
\n",
"
0
\n",
"
3
\n",
"
Johnston, Miss. Catherine Helen \"Carrie\"
\n",
"
female
\n",
"
NaN
\n",
"
1
\n",
"
2
\n",
"
W./C. 6607
\n",
"
23.4500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
" \n",
"
\n",
"
314 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"8 9 1 3 \n",
"9 10 1 2 \n",
".. ... ... ... \n",
"880 881 1 2 \n",
"882 883 0 3 \n",
"885 886 0 3 \n",
"887 888 1 1 \n",
"888 889 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
".. ... ... ... ... \n",
"880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n",
"882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n",
"885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"8 2 347742 11.1333 NaN S \n",
"9 0 237736 30.0708 NaN C \n",
".. ... ... ... ... ... \n",
"880 1 230433 26.0000 NaN S \n",
"882 0 7552 10.5167 NaN S \n",
"885 5 382652 29.1250 NaN Q \n",
"887 0 112053 30.0000 B42 S \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"\n",
"[314 rows x 12 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df[\"Sex\"] == \"female\"]"
]
},
{
"cell_type": "markdown",
"id": "fitted-multimedia",
"metadata": {},
"source": [
"But that's not all df.loc[] can do. We can add multiple arguments to our search query here, rather like SQL. In the example below, note the addition of the parentheses around df\\[\"Sex\"\\] == \"female\" this denotes one parameter of our search. We then use the & to add a second parameter. In this case, we are interested in not only returning those that are female, but also those who are in Pclass, or Passenger Class 1, i.e. First Class. When we execute the command below, we see that we have found 94 results."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "interstate-compact",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
1
\n",
"
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
"
female
\n",
"
38.0
\n",
"
1
\n",
"
0
\n",
"
PC 17599
\n",
"
71.2833
\n",
"
C85
\n",
"
C
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
1
\n",
"
1
\n",
"
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
"
female
\n",
"
35.0
\n",
"
1
\n",
"
0
\n",
"
113803
\n",
"
53.1000
\n",
"
C123
\n",
"
S
\n",
"
\n",
"
\n",
"
11
\n",
"
12
\n",
"
1
\n",
"
1
\n",
"
Bonnell, Miss. Elizabeth
\n",
"
female
\n",
"
58.0
\n",
"
0
\n",
"
0
\n",
"
113783
\n",
"
26.5500
\n",
"
C103
\n",
"
S
\n",
"
\n",
"
\n",
"
31
\n",
"
32
\n",
"
1
\n",
"
1
\n",
"
Spencer, Mrs. William Augustus (Marie Eugenie)
\n",
"
female
\n",
"
NaN
\n",
"
1
\n",
"
0
\n",
"
PC 17569
\n",
"
146.5208
\n",
"
B78
\n",
"
C
\n",
"
\n",
"
\n",
"
52
\n",
"
53
\n",
"
1
\n",
"
1
\n",
"
Harper, Mrs. Henry Sleeper (Myna Haxtun)
\n",
"
female
\n",
"
49.0
\n",
"
1
\n",
"
0
\n",
"
PC 17572
\n",
"
76.7292
\n",
"
D33
\n",
"
C
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
856
\n",
"
857
\n",
"
1
\n",
"
1
\n",
"
Wick, Mrs. George Dennick (Mary Hitchcock)
\n",
"
female
\n",
"
45.0
\n",
"
1
\n",
"
1
\n",
"
36928
\n",
"
164.8667
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
862
\n",
"
863
\n",
"
1
\n",
"
1
\n",
"
Swift, Mrs. Frederick Joel (Margaret Welles Ba...
\n",
"
female
\n",
"
48.0
\n",
"
0
\n",
"
0
\n",
"
17466
\n",
"
25.9292
\n",
"
D17
\n",
"
S
\n",
"
\n",
"
\n",
"
871
\n",
"
872
\n",
"
1
\n",
"
1
\n",
"
Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
\n",
"
female
\n",
"
47.0
\n",
"
1
\n",
"
1
\n",
"
11751
\n",
"
52.5542
\n",
"
D35
\n",
"
S
\n",
"
\n",
"
\n",
"
879
\n",
"
880
\n",
"
1
\n",
"
1
\n",
"
Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
\n",
"
female
\n",
"
56.0
\n",
"
0
\n",
"
1
\n",
"
11767
\n",
"
83.1583
\n",
"
C50
\n",
"
C
\n",
"
\n",
"
\n",
"
887
\n",
"
888
\n",
"
1
\n",
"
1
\n",
"
Graham, Miss. Margaret Edith
\n",
"
female
\n",
"
19.0
\n",
"
0
\n",
"
0
\n",
"
112053
\n",
"
30.0000
\n",
"
B42
\n",
"
S
\n",
"
\n",
" \n",
"
\n",
"
94 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"1 2 1 1 \n",
"3 4 1 1 \n",
"11 12 1 1 \n",
"31 32 1 1 \n",
"52 53 1 1 \n",
".. ... ... ... \n",
"856 857 1 1 \n",
"862 863 1 1 \n",
"871 872 1 1 \n",
"879 880 1 1 \n",
"887 888 1 1 \n",
"\n",
" Name Sex Age SibSp \\\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
"31 Spencer, Mrs. William Augustus (Marie Eugenie) female NaN 1 \n",
"52 Harper, Mrs. Henry Sleeper (Myna Haxtun) female 49.0 1 \n",
".. ... ... ... ... \n",
"856 Wick, Mrs. George Dennick (Mary Hitchcock) female 45.0 1 \n",
"862 Swift, Mrs. Frederick Joel (Margaret Welles Ba... female 48.0 0 \n",
"871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n",
"879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"1 0 PC 17599 71.2833 C85 C \n",
"3 0 113803 53.1000 C123 S \n",
"11 0 113783 26.5500 C103 S \n",
"31 0 PC 17569 146.5208 B78 C \n",
"52 0 PC 17572 76.7292 D33 C \n",
".. ... ... ... ... ... \n",
"856 1 36928 164.8667 NaN S \n",
"862 0 17466 25.9292 D17 S \n",
"871 1 11751 52.5542 D35 S \n",
"879 1 11767 83.1583 C50 C \n",
"887 0 112053 30.0000 B42 S \n",
"\n",
"[94 rows x 12 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"Sex\"] == \"female\") & (df[\"Pclass\"] == 1)]"
]
},
{
"cell_type": "markdown",
"id": "center-gossip",
"metadata": {},
"source": [
"Let's say now we are interested in only finding those that did not survive of the 94. To do that, we would simply add a third argument to our query. We ask Pandas to only show those whose Survived value is 0, or False. In other words, we are seeking to find those who did not survive."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "continental-nowhere",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
177
\n",
"
178
\n",
"
0
\n",
"
1
\n",
"
Isham, Miss. Ann Elizabeth
\n",
"
female
\n",
"
50.0
\n",
"
0
\n",
"
0
\n",
"
PC 17595
\n",
"
28.7125
\n",
"
C49
\n",
"
C
\n",
"
\n",
"
\n",
"
297
\n",
"
298
\n",
"
0
\n",
"
1
\n",
"
Allison, Miss. Helen Loraine
\n",
"
female
\n",
"
2.0
\n",
"
1
\n",
"
2
\n",
"
113781
\n",
"
151.5500
\n",
"
C22 C26
\n",
"
S
\n",
"
\n",
"
\n",
"
498
\n",
"
499
\n",
"
0
\n",
"
1
\n",
"
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
\n",
"
female
\n",
"
25.0
\n",
"
1
\n",
"
2
\n",
"
113781
\n",
"
151.5500
\n",
"
C22 C26
\n",
"
S
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"177 178 0 1 \n",
"297 298 0 1 \n",
"498 499 0 1 \n",
"\n",
" Name Sex Age SibSp \\\n",
"177 Isham, Miss. Ann Elizabeth female 50.0 0 \n",
"297 Allison, Miss. Helen Loraine female 2.0 1 \n",
"498 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.0 1 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"177 0 PC 17595 28.7125 C49 C \n",
"297 2 113781 151.5500 C22 C26 S \n",
"498 2 113781 151.5500 C22 C26 S "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"Sex\"] == \"female\") & (df[\"Pclass\"] == 1) & (df[\"Survived\"] == 0)]"
]
},
{
"cell_type": "markdown",
"id": "third-operation",
"metadata": {},
"source": [
"With this data, I not only have qualitative information about each passenger, I also have some significant quantitative information as well. I know that of the 314 people in the dataset with the sex of female, 94 were in first class and of those 94, all survived except 3. Let's now pose a question and I think you may already know the answer. Did class in any way play a role in the chance off survival on the Titanic. We don't need a fancy machine learning algorithm to help us answer this question. We can simply examine the results from our query. Let's find the total number of passengers that are identified as female, but not in first class, and then find the number of those who did not survive."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "brilliant-genetics",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"14 15 0 3 \n",
"18 19 0 3 \n",
"24 25 0 3 \n",
"38 39 0 3 \n",
"40 41 0 3 \n",
".. ... ... ... \n",
"854 855 0 2 \n",
"863 864 0 3 \n",
"882 883 0 3 \n",
"885 886 0 3 \n",
"888 889 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n",
"18 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 31.0 1 \n",
"24 Palsson, Miss. Torborg Danira female 8.0 3 \n",
"38 Vander Planke, Miss. Augusta Maria female 18.0 2 \n",
"40 Ahlin, Mrs. Johan (Johanna Persdotter Larsson) female 40.0 1 \n",
".. ... ... ... ... \n",
"854 Carter, Mrs. Ernest Courtenay (Lilian Hughes) female 44.0 1 \n",
"863 Sage, Miss. Dorothy Edith \"Dolly\" female NaN 8 \n",
"882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n",
"885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"14 0 350406 7.8542 NaN S \n",
"18 0 345763 18.0000 NaN S \n",
"24 1 349909 21.0750 NaN S \n",
"38 0 345764 18.0000 NaN S \n",
"40 0 7546 9.4750 NaN S \n",
".. ... ... ... ... ... \n",
"854 0 244252 26.0000 NaN S \n",
"863 2 CA. 2343 69.5500 NaN S \n",
"882 0 7552 10.5167 NaN S \n",
"885 5 382652 29.1250 NaN Q \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"\n",
"[78 rows x 12 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"Sex\"] == \"female\") & (df[\"Pclass\"] > 1) & (df[\"Survived\"] == 0)]"
]
},
{
"cell_type": "markdown",
"id": "greatest-yeast",
"metadata": {},
"source": [
"As we can see, we have 220 total people identified as female with 78 not surviving. We can perform some quick calculations to understand better our results."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "charming-rescue",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.031914893617021274"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 3/94\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "focal-collapse",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.35454545454545455"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = 78/220\n",
"y"
]
},
{
"cell_type": "markdown",
"id": "adjustable-lightning",
"metadata": {},
"source": [
"In this case, x is the death rate for a first class female, whereas y is the death rate for those not in first class and female. The result is clear. The chance for survival was significantly greater for those identified as female if they were in first class, roughly 32.5% difference. Were there other factors at play, possibly. But that is beyond the point here. I am simply trying to illustrate how to obtain data to begin framing the research question at hand.\n",
"\n",
"Although we can achieve this same analysis in Excel, doing this in Python allows us to leverage the power of an entire programming language with the data at hand. This is a simple example of the power of Pandas over Excel. As we move forward in the next notebooks, you will see more examples of this."
]
},
{
"cell_type": "markdown",
"id": "dominican-knock",
"metadata": {},
"source": [
"## How to Query with \"OR\" (|) on a DataFrame"
]
},
{
"cell_type": "markdown",
"id": "moral-green",
"metadata": {},
"source": [
"Above we saw how to use df.loc to structure logical arguments for finding two specific conditions with \"&\". This indicated that both conditions must be true to be returned as a result. In case you don't remember it, it looked like this:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "capable-silly",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
2
\n",
"
3
\n",
"
1
\n",
"
3
\n",
"
Heikkinen, Miss. Laina
\n",
"
female
\n",
"
26.0
\n",
"
0
\n",
"
0
\n",
"
STON/O2. 3101282
\n",
"
7.9250
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
8
\n",
"
9
\n",
"
1
\n",
"
3
\n",
"
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
\n",
"
female
\n",
"
27.0
\n",
"
0
\n",
"
2
\n",
"
347742
\n",
"
11.1333
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
2
\n",
"
Nasser, Mrs. Nicholas (Adele Achem)
\n",
"
female
\n",
"
14.0
\n",
"
1
\n",
"
0
\n",
"
237736
\n",
"
30.0708
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
10
\n",
"
11
\n",
"
1
\n",
"
3
\n",
"
Sandstrom, Miss. Marguerite Rut
\n",
"
female
\n",
"
4.0
\n",
"
1
\n",
"
1
\n",
"
PP 9549
\n",
"
16.7000
\n",
"
G6
\n",
"
S
\n",
"
\n",
"
\n",
"
14
\n",
"
15
\n",
"
0
\n",
"
3
\n",
"
Vestrom, Miss. Hulda Amanda Adolfina
\n",
"
female
\n",
"
14.0
\n",
"
0
\n",
"
0
\n",
"
350406
\n",
"
7.8542
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
875
\n",
"
876
\n",
"
1
\n",
"
3
\n",
"
Najib, Miss. Adele Kiamie \"Jane\"
\n",
"
female
\n",
"
15.0
\n",
"
0
\n",
"
0
\n",
"
2667
\n",
"
7.2250
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
880
\n",
"
881
\n",
"
1
\n",
"
2
\n",
"
Shelley, Mrs. William (Imanita Parrish Hall)
\n",
"
female
\n",
"
25.0
\n",
"
0
\n",
"
1
\n",
"
230433
\n",
"
26.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
882
\n",
"
883
\n",
"
0
\n",
"
3
\n",
"
Dahlberg, Miss. Gerda Ulrika
\n",
"
female
\n",
"
22.0
\n",
"
0
\n",
"
0
\n",
"
7552
\n",
"
10.5167
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
885
\n",
"
886
\n",
"
0
\n",
"
3
\n",
"
Rice, Mrs. William (Margaret Norton)
\n",
"
female
\n",
"
39.0
\n",
"
0
\n",
"
5
\n",
"
382652
\n",
"
29.1250
\n",
"
NaN
\n",
"
Q
\n",
"
\n",
"
\n",
"
888
\n",
"
889
\n",
"
0
\n",
"
3
\n",
"
Johnston, Miss. Catherine Helen \"Carrie\"
\n",
"
female
\n",
"
NaN
\n",
"
1
\n",
"
2
\n",
"
W./C. 6607
\n",
"
23.4500
\n",
"
NaN
\n",
"
S
\n",
"
\n",
" \n",
"
\n",
"
220 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"2 3 1 3 \n",
"8 9 1 3 \n",
"9 10 1 2 \n",
"10 11 1 3 \n",
"14 15 0 3 \n",
".. ... ... ... \n",
"875 876 1 3 \n",
"880 881 1 2 \n",
"882 883 0 3 \n",
"885 886 0 3 \n",
"888 889 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
"10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n",
"14 Vestrom, Miss. Hulda Amanda Adolfina female 14.0 0 \n",
".. ... ... ... ... \n",
"875 Najib, Miss. Adele Kiamie \"Jane\" female 15.0 0 \n",
"880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n",
"882 Dahlberg, Miss. Gerda Ulrika female 22.0 0 \n",
"885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"8 2 347742 11.1333 NaN S \n",
"9 0 237736 30.0708 NaN C \n",
"10 1 PP 9549 16.7000 G6 S \n",
"14 0 350406 7.8542 NaN S \n",
".. ... ... ... ... ... \n",
"875 0 2667 7.2250 NaN C \n",
"880 1 230433 26.0000 NaN S \n",
"882 0 7552 10.5167 NaN S \n",
"885 5 382652 29.1250 NaN Q \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"\n",
"[220 rows x 12 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"Sex\"] == \"female\") & (df[\"Pclass\"] > 1)]"
]
},
{
"cell_type": "markdown",
"id": "temporal-boston",
"metadata": {},
"source": [
"We can do the same conditional approach, but rather than using and, we can also tell Pandas \"Or\" with the \"|\", that vertical character you probably never use located above the \"Enter\" button on a standard American keyboard. Let's try and grab all first and second class passengers. We can specify if the Pclass is either 1 OR 2."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "hearing-racing",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
PassengerId
\n",
"
Survived
\n",
"
Pclass
\n",
"
Name
\n",
"
Sex
\n",
"
Age
\n",
"
SibSp
\n",
"
Parch
\n",
"
Ticket
\n",
"
Fare
\n",
"
Cabin
\n",
"
Embarked
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1
\n",
"
2
\n",
"
1
\n",
"
1
\n",
"
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
"
female
\n",
"
38.0
\n",
"
1
\n",
"
0
\n",
"
PC 17599
\n",
"
71.2833
\n",
"
C85
\n",
"
C
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
1
\n",
"
1
\n",
"
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
"
female
\n",
"
35.0
\n",
"
1
\n",
"
0
\n",
"
113803
\n",
"
53.1000
\n",
"
C123
\n",
"
S
\n",
"
\n",
"
\n",
"
6
\n",
"
7
\n",
"
0
\n",
"
1
\n",
"
McCarthy, Mr. Timothy J
\n",
"
male
\n",
"
54.0
\n",
"
0
\n",
"
0
\n",
"
17463
\n",
"
51.8625
\n",
"
E46
\n",
"
S
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
2
\n",
"
Nasser, Mrs. Nicholas (Adele Achem)
\n",
"
female
\n",
"
14.0
\n",
"
1
\n",
"
0
\n",
"
237736
\n",
"
30.0708
\n",
"
NaN
\n",
"
C
\n",
"
\n",
"
\n",
"
11
\n",
"
12
\n",
"
1
\n",
"
1
\n",
"
Bonnell, Miss. Elizabeth
\n",
"
female
\n",
"
58.0
\n",
"
0
\n",
"
0
\n",
"
113783
\n",
"
26.5500
\n",
"
C103
\n",
"
S
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
880
\n",
"
881
\n",
"
1
\n",
"
2
\n",
"
Shelley, Mrs. William (Imanita Parrish Hall)
\n",
"
female
\n",
"
25.0
\n",
"
0
\n",
"
1
\n",
"
230433
\n",
"
26.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
883
\n",
"
884
\n",
"
0
\n",
"
2
\n",
"
Banfield, Mr. Frederick James
\n",
"
male
\n",
"
28.0
\n",
"
0
\n",
"
0
\n",
"
C.A./SOTON 34068
\n",
"
10.5000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
886
\n",
"
887
\n",
"
0
\n",
"
2
\n",
"
Montvila, Rev. Juozas
\n",
"
male
\n",
"
27.0
\n",
"
0
\n",
"
0
\n",
"
211536
\n",
"
13.0000
\n",
"
NaN
\n",
"
S
\n",
"
\n",
"
\n",
"
887
\n",
"
888
\n",
"
1
\n",
"
1
\n",
"
Graham, Miss. Margaret Edith
\n",
"
female
\n",
"
19.0
\n",
"
0
\n",
"
0
\n",
"
112053
\n",
"
30.0000
\n",
"
B42
\n",
"
S
\n",
"
\n",
"
\n",
"
889
\n",
"
890
\n",
"
1
\n",
"
1
\n",
"
Behr, Mr. Karl Howell
\n",
"
male
\n",
"
26.0
\n",
"
0
\n",
"
0
\n",
"
111369
\n",
"
30.0000
\n",
"
C148
\n",
"
C
\n",
"
\n",
" \n",
"
\n",
"
400 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"1 2 1 1 \n",
"3 4 1 1 \n",
"6 7 0 1 \n",
"9 10 1 2 \n",
"11 12 1 1 \n",
".. ... ... ... \n",
"880 881 1 2 \n",
"883 884 0 2 \n",
"886 887 0 2 \n",
"887 888 1 1 \n",
"889 890 1 1 \n",
"\n",
" Name Sex Age SibSp \\\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
".. ... ... ... ... \n",
"880 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 \n",
"883 Banfield, Mr. Frederick James male 28.0 0 \n",
"886 Montvila, Rev. Juozas male 27.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"889 Behr, Mr. Karl Howell male 26.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"1 0 PC 17599 71.2833 C85 C \n",
"3 0 113803 53.1000 C123 S \n",
"6 0 17463 51.8625 E46 S \n",
"9 0 237736 30.0708 NaN C \n",
"11 0 113783 26.5500 C103 S \n",
".. ... ... ... ... ... \n",
"880 1 230433 26.0000 NaN S \n",
"883 0 C.A./SOTON 34068 10.5000 NaN S \n",
"886 0 211536 13.0000 NaN S \n",
"887 0 112053 30.0000 B42 S \n",
"889 0 111369 30.0000 C148 C \n",
"\n",
"[400 rows x 12 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"Pclass\"] == 1) | (df[\"Pclass\"] == 2)]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}