{ "cells": [ { "cell_type": "markdown", "id": "quality-mumbai", "metadata": {}, "source": [ "#
Advanced Grouping with Groupby()
" ] }, { "cell_type": "markdown", "id": "brave-arrow", "metadata": {}, "source": [ "
Dr. W.J.B. Mattingly
\n", "\n", "
Smithsonian Data Science Lab and United States Holocaust Memorial Museum
\n", "\n", "
August 2021
" ] }, { "cell_type": "markdown", "id": "arctic-check", "metadata": {}, "source": [ "## Covered in this Chapter" ] }, { "cell_type": "markdown", "id": "bigger-adelaide", "metadata": {}, "source": [ "1) What is the Purpose of Groupby()
\n", "2) What are Pandas Group Objects
\n", "3) How to Analyze Group Objects Quantitatively
\n", "4) How to work with Multiple Groupings at Once
" ] }, { "cell_type": "markdown", "id": "decreased-sharp", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "eastern-zealand", "metadata": {}, "source": [ "When working with large quantities of data, it can sometimes be a bit difficult to understand broad patterns within your data. Often, you will need to group your data into small subsections based on some parameter, such as age, name, or some other feature. You can do this in Pandas using groupby(), which will be the main subject of this chapter. **Groupby** is a feature of Pandas that returns a special groupby object. This object can be called to perform different types of analyses on data, especially when leveraging the built-in quantitative features of Pandas, such as count() and sum(). In this chapter, we will explore these features and see how they can be used on a real-world dataset, the Titanic dataset." ] }, { "cell_type": "code", "execution_count": 1, "id": "colored-freeware", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"data/titanic.csv\")\n", "df" ] }, { "cell_type": "markdown", "id": "natural-gauge", "metadata": {}, "source": [ "## Groupby()" ] }, { "cell_type": "markdown", "id": "structural-arctic", "metadata": {}, "source": [ "The groupby() function allows us to easily group our data in the DataFrame. Once your data are grouped, there are a lot of quantitative questions you can begin to ask. Let's start simple. Let's group our DataFrame by Sex." ] }, { "cell_type": "code", "execution_count": 2, "id": "gorgeous-young", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Sex\")" ] }, { "cell_type": "markdown", "id": "promotional-fireplace", "metadata": {}, "source": [ "This output may not be quite what you expect. This is an object to which we can now pose targeted questions. Let's try and see a DataFrame that only has \"male\" in the Sex column. We can do that by using get_group(\"male\")" ] }, { "cell_type": "code", "execution_count": 3, "id": "pleased-shoot", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
.......................................
88388402Banfield, Mr. Frederick Jamesmale28.000C.A./SOTON 3406810.5000NaNS
88488503Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNS
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

577 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name Sex \\\n", "0 1 0 3 Braund, Mr. Owen Harris male \n", "4 5 0 3 Allen, Mr. William Henry male \n", "5 6 0 3 Moran, Mr. James male \n", "6 7 0 1 McCarthy, Mr. Timothy J male \n", "7 8 0 3 Palsson, Master. Gosta Leonard male \n", ".. ... ... ... ... ... \n", "883 884 0 2 Banfield, Mr. Frederick James male \n", "884 885 0 3 Sutehall, Mr. Henry Jr male \n", "886 887 0 2 Montvila, Rev. Juozas male \n", "889 890 1 1 Behr, Mr. Karl Howell male \n", "890 891 0 3 Dooley, Mr. Patrick male \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "0 22.0 1 0 A/5 21171 7.2500 NaN S \n", "4 35.0 0 0 373450 8.0500 NaN S \n", "5 NaN 0 0 330877 8.4583 NaN Q \n", "6 54.0 0 0 17463 51.8625 E46 S \n", "7 2.0 3 1 349909 21.0750 NaN S \n", ".. ... ... ... ... ... ... ... \n", "883 28.0 0 0 C.A./SOTON 34068 10.5000 NaN S \n", "884 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S \n", "886 27.0 0 0 211536 13.0000 NaN S \n", "889 26.0 0 0 111369 30.0000 C148 C \n", "890 32.0 0 0 370376 7.7500 NaN Q \n", "\n", "[577 rows x 12 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Sex\").get_group(\"male\")" ] }, { "cell_type": "markdown", "id": "floppy-grammar", "metadata": {}, "source": [ "This argument does not have to be a string. Let's say, we want to just get all the people who are aged 20. We can do the same thing by grouping the dataset by \"Age\" and then getting the group of 20 year olds." ] }, { "cell_type": "code", "execution_count": 4, "id": "ancient-solomon", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
121303Saundercock, Mr. William Henrymale20.000A/5. 21518.0500NaNS
919203Andreasson, Mr. Paul Edvinmale20.0003474667.8542NaNS
11311403Jussila, Miss. Katriinafemale20.01041369.8250NaNS
13113203Coelho, Mr. Domingos Fernandeomale20.000SOTON/O.Q. 31013077.0500NaNS
37837903Betros, Mr. Tannousmale20.00026484.0125NaNC
40440503Oreskovic, Miss. Marijafemale20.0003150968.6625NaNS
44144203Hampe, Mr. Leonmale20.0003457699.5000NaNS
62262313Nakid, Mr. Sahidmale20.011265315.7417NaNC
64064103Jensen, Mr. Hans Pedermale20.0003500507.8542NaNS
66466513Lindqvist, Mr. Eino Williammale20.010STON/O 2. 31012857.9250NaNS
68268303Olsvigen, Mr. Thor Andersonmale20.00065639.2250NaNS
72572603Oreskovic, Mr. Lukamale20.0003150948.6625NaNS
76276313Barah, Mr. Hanna Assimale20.00026637.2292NaNC
84084103Alhomaki, Mr. Ilmari Rudolfmale20.000SOTON/O2 31012877.9250NaNS
87687703Gustafsson, Mr. Alfred Ossianmale20.00075349.8458NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name Sex \\\n", "12 13 0 3 Saundercock, Mr. William Henry male \n", "91 92 0 3 Andreasson, Mr. Paul Edvin male \n", "113 114 0 3 Jussila, Miss. Katriina female \n", "131 132 0 3 Coelho, Mr. Domingos Fernandeo male \n", "378 379 0 3 Betros, Mr. Tannous male \n", "404 405 0 3 Oreskovic, Miss. Marija female \n", "441 442 0 3 Hampe, Mr. Leon male \n", "622 623 1 3 Nakid, Mr. Sahid male \n", "640 641 0 3 Jensen, Mr. Hans Peder male \n", "664 665 1 3 Lindqvist, Mr. Eino William male \n", "682 683 0 3 Olsvigen, Mr. Thor Anderson male \n", "725 726 0 3 Oreskovic, Mr. Luka male \n", "762 763 1 3 Barah, Mr. Hanna Assi male \n", "840 841 0 3 Alhomaki, Mr. Ilmari Rudolf male \n", "876 877 0 3 Gustafsson, Mr. Alfred Ossian male \n", "\n", " Age SibSp Parch Ticket Fare Cabin Embarked \n", "12 20.0 0 0 A/5. 2151 8.0500 NaN S \n", "91 20.0 0 0 347466 7.8542 NaN S \n", "113 20.0 1 0 4136 9.8250 NaN S \n", "131 20.0 0 0 SOTON/O.Q. 3101307 7.0500 NaN S \n", "378 20.0 0 0 2648 4.0125 NaN C \n", "404 20.0 0 0 315096 8.6625 NaN S \n", "441 20.0 0 0 345769 9.5000 NaN S \n", "622 20.0 1 1 2653 15.7417 NaN C \n", "640 20.0 0 0 350050 7.8542 NaN S \n", "664 20.0 1 0 STON/O 2. 3101285 7.9250 NaN S \n", "682 20.0 0 0 6563 9.2250 NaN S \n", "725 20.0 0 0 315094 8.6625 NaN S \n", "762 20.0 0 0 2663 7.2292 NaN C \n", "840 20.0 0 0 SOTON/O2 3101287 7.9250 NaN S \n", "876 20.0 0 0 7534 9.8458 NaN S " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Age\").get_group(20)" ] }, { "cell_type": "markdown", "id": "unsigned-attendance", "metadata": {}, "source": [ "## Quantitative Analysis with Count() and Sum()" ] }, { "cell_type": "markdown", "id": "turkish-interview", "metadata": {}, "source": [ "This is typically not how you would use the grouby function. It is far more powerful and often used for quantitative analysis on subsets of your data. Let's say that I want to examine my dataset by sex and I am interested in known the quantity of column based solely on the metric of sex. I could use groupby() and .count(). When chained together, our question then becomes, how many PassengerId, Survived, Pclass, Name, etc. do we see for each column based on sex. While this question is particularly useful for the qualitative rows (such as Name) or numerical strings (such as PassengerId) because they display the total number of passengers because each person has a unique PassengerId and Name." ] }, { "cell_type": "code", "execution_count": 5, "id": "continental-better", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameAgeSibSpParchTicketFareCabinEmbarked
Sex
female31431431431426131431431431497312
male577577577577453577577577577107577
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name Age SibSp Parch Ticket Fare \\\n", "Sex \n", "female 314 314 314 314 261 314 314 314 314 \n", "male 577 577 577 577 453 577 577 577 577 \n", "\n", " Cabin Embarked \n", "Sex \n", "female 97 312 \n", "male 107 577 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Sex\").count()" ] }, { "cell_type": "markdown", "id": "major-matter", "metadata": {}, "source": [ "For the quantitative rows, we can use sum() function. This will tell us the sum of all the columns that have floats or integers. Note that this is not a really good question to pose for the Age column. It is, however, very useful for the Fare column and the Survived Column. Remember, if a person survived, they have a 1; if they did not, they have a 0. We can use the sum to know how many male vs. female survivors there were." ] }, { "cell_type": "code", "execution_count": 6, "id": "ecological-fence", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassAgeSibSpParchFare
Sex
female1353432336787286.0021820413966.6628
male262043109137913919.1724813614727.2865
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Age SibSp Parch Fare\n", "Sex \n", "female 135343 233 678 7286.00 218 204 13966.6628\n", "male 262043 109 1379 13919.17 248 136 14727.2865" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Sex\").sum()" ] }, { "cell_type": "markdown", "id": "israeli-gilbert", "metadata": {}, "source": [ "Let's say, though, that we are only interested in the Fare column. Before we add sum to our chain, we can specify that we want specifically the Fare column." ] }, { "cell_type": "code", "execution_count": 7, "id": "nervous-sister", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sex\n", "female 13966.6628\n", "male 14727.2865\n", "Name: Fare, dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Sex\").Fare.sum()" ] }, { "cell_type": "markdown", "id": "western-michael", "metadata": {}, "source": [ "## Working with Multiple Groups" ] }, { "cell_type": "markdown", "id": "smaller-final", "metadata": {}, "source": [ "Now, we have just the data on a single column. We can see that the combined fare of male passengers was greater than the combined sum of female passengers. Let's say though that we are interested in how these sums divide over Pclass. We can pass a list to groupby, rather than just a string. This list will be a list of a strings that correspond to columns." ] }, { "cell_type": "code", "execution_count": 8, "id": "hydraulic-cambridge", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sex Pclass\n", "female 1 9975.8250\n", " 2 1669.7292\n", " 3 2321.1086\n", "male 1 8201.5875\n", " 2 2132.1125\n", " 3 4393.5865\n", "Name: Fare, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby([\"Sex\", \"Pclass\"]).Fare.sum()" ] }, { "cell_type": "markdown", "id": "confident-military", "metadata": {}, "source": [ "The result of this new question is more nuanced. We are not looking at the sum of all fares, rather the sum of fares divided on a Pclass-by-Pclass basis. This means that we can now understand that these sums varied by class. For example, while the total fare for male passengers was greater, the total fare for first class female passengers was greater than their first class male counterparts. The male fare, however, is greater for both the 2nd Class and 3rd Class groups." ] }, { "cell_type": "markdown", "id": "bronze-wildlife", "metadata": {}, "source": [ "## Groupings with Many Subsets" ] }, { "cell_type": "markdown", "id": "framed-onion", "metadata": {}, "source": [ "What if we were interested in something that would have more than just 6 neat subsections, such as 3 classes per sex. What if we also wanted to add another aspect to the groups, such as age. If we try and do that, our results are cutoff. We can try and use pd.set_option()" ] }, { "cell_type": "code", "execution_count": 9, "id": "southern-breeding", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sex Pclass Age \n", "female 1 2.0 151.5500\n", " 14.0 120.0000\n", " 15.0 211.3375\n", " 16.0 183.8792\n", " 17.0 165.9000\n", " ... \n", "male 3 59.0 7.2500\n", " 61.0 6.2375\n", " 65.0 7.7500\n", " 70.5 7.7500\n", " 74.0 7.7750\n", "Name: Fare, Length: 283, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby([\"Sex\", \"Pclass\", \"Age\"]).Fare.sum()" ] }, { "cell_type": "markdown", "id": "convinced-tyler", "metadata": {}, "source": [ "In order to print off this many rows, we need to increase the amount of data Pandas will display. We can use the pd.set_options() command and pass an argument to display 1000 rows." ] }, { "cell_type": "code", "execution_count": 10, "id": "absolute-cooling", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sex Pclass Age \n", "female 1 2.00 151.5500\n", " 14.00 120.0000\n", " 15.00 211.3375\n", " 16.00 183.8792\n", " 17.00 165.9000\n", " 18.00 569.5500\n", " 19.00 147.3625\n", " 21.00 340.3333\n", " 22.00 322.6500\n", " 23.00 376.2750\n", " 24.00 534.2625\n", " 25.00 151.5500\n", " 26.00 78.8500\n", " 29.00 211.3375\n", " 30.00 374.3542\n", " 31.00 278.1417\n", " 32.00 76.2917\n", " 33.00 229.6000\n", " 35.00 926.5375\n", " 36.00 326.6333\n", " 38.00 378.8083\n", " 39.00 329.5916\n", " 40.00 287.9625\n", " 41.00 134.5000\n", " 42.00 227.5250\n", " 43.00 211.3375\n", " 44.00 85.7000\n", " 45.00 164.8667\n", " 47.00 52.5542\n", " 48.00 65.5292\n", " 49.00 102.6584\n", " 50.00 276.2333\n", " 51.00 77.9583\n", " 52.00 171.7667\n", " 53.00 51.4792\n", " 54.00 137.6667\n", " 56.00 83.1583\n", " 58.00 326.5333\n", " 60.00 75.2500\n", " 62.00 80.0000\n", " 63.00 77.9583\n", " 2 2.00 26.0000\n", " 3.00 41.5792\n", " 4.00 62.0000\n", " 5.00 27.7500\n", " 6.00 33.0000\n", " 7.00 26.2500\n", " 8.00 26.2500\n", " 13.00 19.5000\n", " 14.00 30.0708\n", " 17.00 22.5000\n", " 18.00 36.0000\n", " 19.00 52.0000\n", " 21.00 10.5000\n", " 22.00 70.5792\n", " 23.00 13.7917\n", " 24.00 177.2500\n", " 25.00 56.0000\n", " 26.00 26.0000\n", " 27.00 45.3583\n", " 28.00 88.6500\n", " 29.00 62.5000\n", " 30.00 46.3500\n", " 31.00 26.2500\n", " 32.00 13.0000\n", " 32.50 13.0000\n", " 33.00 53.7500\n", " 34.00 79.0000\n", " 35.00 21.0000\n", " 36.00 52.0000\n", " 38.00 13.0000\n", " 40.00 67.7500\n", " 41.00 19.5000\n", " 42.00 39.0000\n", " 44.00 26.0000\n", " 45.00 39.7500\n", " 48.00 65.0000\n", " 50.00 47.0000\n", " 54.00 23.0000\n", " 55.00 16.0000\n", " 57.00 10.5000\n", " 3 0.75 38.5166\n", " 1.00 26.8750\n", " 2.00 81.9250\n", " 3.00 21.0750\n", " 4.00 52.1417\n", " 5.00 63.1208\n", " 6.00 31.2750\n", " 8.00 21.0750\n", " 9.00 108.7958\n", " 10.00 24.1500\n", " 11.00 31.2750\n", " 13.00 7.2292\n", " 14.00 19.0959\n", " 14.50 14.4542\n", " 15.00 29.7084\n", " 16.00 62.3833\n", " 17.00 22.3833\n", " 18.00 91.4667\n", " 19.00 15.7334\n", " 20.00 18.4875\n", " 21.00 59.6000\n", " 22.00 50.8792\n", " 23.00 15.4750\n", " 24.00 60.6583\n", " 25.00 15.7000\n", " 26.00 31.8792\n", " 27.00 31.5333\n", " 28.00 22.2958\n", " 29.00 46.7833\n", " 30.00 45.2875\n", " 30.50 7.7500\n", " 31.00 55.0625\n", " 32.00 15.5000\n", " 33.00 15.8500\n", " 35.00 20.2500\n", " 36.00 17.4000\n", " 37.00 9.5875\n", " 38.00 31.3875\n", " 39.00 60.4000\n", " 40.00 9.4750\n", " 41.00 59.9000\n", " 43.00 46.9000\n", " 45.00 50.1042\n", " 47.00 14.5000\n", " 48.00 34.3750\n", " 63.00 9.5875\n", "male 1 0.92 151.5500\n", " 4.00 81.8583\n", " 11.00 120.0000\n", " 17.00 110.8833\n", " 18.00 108.9000\n", " 19.00 316.1000\n", " 21.00 77.2875\n", " 22.00 135.6333\n", " 23.00 63.3583\n", " 24.00 326.7208\n", " 25.00 146.5209\n", " 26.00 30.0000\n", " 27.00 371.8292\n", " 28.00 191.3208\n", " 29.00 96.6000\n", " 30.00 27.7500\n", " 31.00 159.4958\n", " 32.00 30.5000\n", " 33.00 5.0000\n", " 34.00 26.5500\n", " 35.00 565.1667\n", " 36.00 803.9792\n", " 37.00 135.3542\n", " 38.00 243.4625\n", " 39.00 0.0000\n", " 40.00 58.7208\n", " 42.00 130.8417\n", " 44.00 90.0000\n", " 45.00 172.0750\n", " 45.50 28.5000\n", " 46.00 140.3750\n", " 47.00 150.1083\n", " 48.00 155.2792\n", " 49.00 256.9167\n", " 50.00 295.9750\n", " 51.00 87.9292\n", " 52.00 110.1500\n", " 54.00 129.1500\n", " 55.00 30.5000\n", " 56.00 92.7458\n", " 58.00 142.9750\n", " 60.00 105.7500\n", " 61.00 65.8208\n", " 62.00 53.1000\n", " 64.00 289.0000\n", " 65.00 88.5292\n", " 70.00 71.0000\n", " 71.00 84.1584\n", " 80.00 30.0000\n", " 2 0.67 14.5000\n", " 0.83 47.7500\n", " 1.00 76.0042\n", " 2.00 26.0000\n", " 3.00 44.7500\n", " 8.00 36.7500\n", " 16.00 36.5000\n", " 18.00 109.5000\n", " 19.00 70.7500\n", " 21.00 158.5000\n", " 23.00 76.0458\n", " 24.00 97.0000\n", " 25.00 106.5792\n", " 26.00 10.5000\n", " 27.00 52.0000\n", " 28.00 70.0000\n", " 29.00 59.2208\n", " 30.00 73.5000\n", " 31.00 86.7542\n", " 32.00 110.0000\n", " 32.50 30.0708\n", " 33.00 12.2750\n", " 34.00 107.0000\n", " 35.00 36.5000\n", " 36.00 64.1250\n", " 36.50 26.0000\n", " 37.00 26.0000\n", " 39.00 52.0000\n", " 42.00 53.0000\n", " 43.00 26.2500\n", " 44.00 26.0000\n", " 46.00 26.0000\n", " 47.00 15.0000\n", " 48.00 13.0000\n", " 50.00 13.0000\n", " 51.00 12.5250\n", " 52.00 26.5000\n", " 54.00 66.0000\n", " 57.00 12.3500\n", " 59.00 13.5000\n", " 60.00 39.0000\n", " 62.00 10.5000\n", " 66.00 10.5000\n", " 70.00 10.5000\n", " 3 0.42 8.5167\n", " 1.00 107.1625\n", " 2.00 89.8875\n", " 3.00 47.2875\n", " 4.00 99.4333\n", " 6.00 12.4750\n", " 7.00 68.8125\n", " 8.00 29.1250\n", " 9.00 114.7125\n", " 10.00 27.9000\n", " 11.00 65.6875\n", " 12.00 11.2417\n", " 14.00 86.5875\n", " 15.00 7.2292\n", " 16.00 154.9042\n", " 17.00 47.3959\n", " 18.00 74.2333\n", " 19.00 94.7915\n", " 20.00 110.8751\n", " 20.50 7.2500\n", " 21.00 111.3541\n", " 22.00 108.8874\n", " 23.00 24.9750\n", " 23.50 7.2292\n", " 24.00 95.1791\n", " 24.50 8.0500\n", " 25.00 85.2125\n", " 26.00 166.3333\n", " 27.00 45.7833\n", " 28.00 153.2374\n", " 28.50 23.3292\n", " 29.00 65.3749\n", " 30.00 71.3000\n", " 30.50 8.0500\n", " 31.00 23.4500\n", " 32.00 192.5291\n", " 33.00 70.9083\n", " 34.00 36.9958\n", " 34.50 6.4375\n", " 35.00 38.1708\n", " 36.00 55.0916\n", " 37.00 7.9250\n", " 38.00 23.6083\n", " 39.00 71.2750\n", " 40.00 58.5208\n", " 40.50 22.2500\n", " 41.00 21.2333\n", " 42.00 32.2667\n", " 43.00 14.5000\n", " 44.00 40.1250\n", " 45.00 15.0250\n", " 45.50 7.2250\n", " 47.00 16.2500\n", " 48.00 7.8542\n", " 49.00 0.0000\n", " 50.00 8.0500\n", " 51.00 22.8542\n", " 55.50 8.0500\n", " 59.00 7.2500\n", " 61.00 6.2375\n", " 65.00 7.7500\n", " 70.50 7.7500\n", " 74.00 7.7750\n", "Name: Fare, dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_rows', 1000)\n", "df.groupby([\"Sex\", \"Pclass\", \"Age\"]).Fare.sum()" ] }, { "cell_type": "markdown", "id": "graphic-plasma", "metadata": {}, "source": [ "What if we wanted to make this look a bit nicer, as a Pandas DataFrame? We can pass all our data back into a new DataFrame object." ] }, { "cell_type": "code", "execution_count": 11, "id": "emotional-parliament", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Fare
SexPclassAge
female12.00151.5500
14.00120.0000
15.00211.3375
16.00183.8792
17.00165.9000
18.00569.5500
19.00147.3625
21.00340.3333
22.00322.6500
23.00376.2750
24.00534.2625
25.00151.5500
26.0078.8500
29.00211.3375
30.00374.3542
31.00278.1417
32.0076.2917
33.00229.6000
35.00926.5375
36.00326.6333
38.00378.8083
39.00329.5916
40.00287.9625
41.00134.5000
42.00227.5250
43.00211.3375
44.0085.7000
45.00164.8667
47.0052.5542
48.0065.5292
49.00102.6584
50.00276.2333
51.0077.9583
52.00171.7667
53.0051.4792
54.00137.6667
56.0083.1583
58.00326.5333
60.0075.2500
62.0080.0000
63.0077.9583
22.0026.0000
3.0041.5792
4.0062.0000
5.0027.7500
6.0033.0000
7.0026.2500
8.0026.2500
13.0019.5000
14.0030.0708
17.0022.5000
18.0036.0000
19.0052.0000
21.0010.5000
22.0070.5792
23.0013.7917
24.00177.2500
25.0056.0000
26.0026.0000
27.0045.3583
28.0088.6500
29.0062.5000
30.0046.3500
31.0026.2500
32.0013.0000
32.5013.0000
33.0053.7500
34.0079.0000
35.0021.0000
36.0052.0000
38.0013.0000
40.0067.7500
41.0019.5000
42.0039.0000
44.0026.0000
45.0039.7500
48.0065.0000
50.0047.0000
54.0023.0000
55.0016.0000
57.0010.5000
30.7538.5166
1.0026.8750
2.0081.9250
3.0021.0750
4.0052.1417
5.0063.1208
6.0031.2750
8.0021.0750
9.00108.7958
10.0024.1500
11.0031.2750
13.007.2292
14.0019.0959
14.5014.4542
15.0029.7084
16.0062.3833
17.0022.3833
18.0091.4667
19.0015.7334
20.0018.4875
21.0059.6000
22.0050.8792
23.0015.4750
24.0060.6583
25.0015.7000
26.0031.8792
27.0031.5333
28.0022.2958
29.0046.7833
30.0045.2875
30.507.7500
31.0055.0625
32.0015.5000
33.0015.8500
35.0020.2500
36.0017.4000
37.009.5875
38.0031.3875
39.0060.4000
40.009.4750
41.0059.9000
43.0046.9000
45.0050.1042
47.0014.5000
48.0034.3750
63.009.5875
male10.92151.5500
4.0081.8583
11.00120.0000
17.00110.8833
18.00108.9000
19.00316.1000
21.0077.2875
22.00135.6333
23.0063.3583
24.00326.7208
25.00146.5209
26.0030.0000
27.00371.8292
28.00191.3208
29.0096.6000
30.0027.7500
31.00159.4958
32.0030.5000
33.005.0000
34.0026.5500
35.00565.1667
36.00803.9792
37.00135.3542
38.00243.4625
39.000.0000
40.0058.7208
42.00130.8417
44.0090.0000
45.00172.0750
45.5028.5000
46.00140.3750
47.00150.1083
48.00155.2792
49.00256.9167
50.00295.9750
51.0087.9292
52.00110.1500
54.00129.1500
55.0030.5000
56.0092.7458
58.00142.9750
60.00105.7500
61.0065.8208
62.0053.1000
64.00289.0000
65.0088.5292
70.0071.0000
71.0084.1584
80.0030.0000
20.6714.5000
0.8347.7500
1.0076.0042
2.0026.0000
3.0044.7500
8.0036.7500
16.0036.5000
18.00109.5000
19.0070.7500
21.00158.5000
23.0076.0458
24.0097.0000
25.00106.5792
26.0010.5000
27.0052.0000
28.0070.0000
29.0059.2208
30.0073.5000
31.0086.7542
32.00110.0000
32.5030.0708
33.0012.2750
34.00107.0000
35.0036.5000
36.0064.1250
36.5026.0000
37.0026.0000
39.0052.0000
42.0053.0000
43.0026.2500
44.0026.0000
46.0026.0000
47.0015.0000
48.0013.0000
50.0013.0000
51.0012.5250
52.0026.5000
54.0066.0000
57.0012.3500
59.0013.5000
60.0039.0000
62.0010.5000
66.0010.5000
70.0010.5000
30.428.5167
1.00107.1625
2.0089.8875
3.0047.2875
4.0099.4333
6.0012.4750
7.0068.8125
8.0029.1250
9.00114.7125
10.0027.9000
11.0065.6875
12.0011.2417
14.0086.5875
15.007.2292
16.00154.9042
17.0047.3959
18.0074.2333
19.0094.7915
20.00110.8751
20.507.2500
21.00111.3541
22.00108.8874
23.0024.9750
23.507.2292
24.0095.1791
24.508.0500
25.0085.2125
26.00166.3333
27.0045.7833
28.00153.2374
28.5023.3292
29.0065.3749
30.0071.3000
30.508.0500
31.0023.4500
32.00192.5291
33.0070.9083
34.0036.9958
34.506.4375
35.0038.1708
36.0055.0916
37.007.9250
38.0023.6083
39.0071.2750
40.0058.5208
40.5022.2500
41.0021.2333
42.0032.2667
43.0014.5000
44.0040.1250
45.0015.0250
45.507.2250
47.0016.2500
48.007.8542
49.000.0000
50.008.0500
51.0022.8542
55.508.0500
59.007.2500
61.006.2375
65.007.7500
70.507.7500
74.007.7750
\n", "
" ], "text/plain": [ " Fare\n", "Sex Pclass Age \n", "female 1 2.00 151.5500\n", " 14.00 120.0000\n", " 15.00 211.3375\n", " 16.00 183.8792\n", " 17.00 165.9000\n", " 18.00 569.5500\n", " 19.00 147.3625\n", " 21.00 340.3333\n", " 22.00 322.6500\n", " 23.00 376.2750\n", " 24.00 534.2625\n", " 25.00 151.5500\n", " 26.00 78.8500\n", " 29.00 211.3375\n", " 30.00 374.3542\n", " 31.00 278.1417\n", " 32.00 76.2917\n", " 33.00 229.6000\n", " 35.00 926.5375\n", " 36.00 326.6333\n", " 38.00 378.8083\n", " 39.00 329.5916\n", " 40.00 287.9625\n", " 41.00 134.5000\n", " 42.00 227.5250\n", " 43.00 211.3375\n", " 44.00 85.7000\n", " 45.00 164.8667\n", " 47.00 52.5542\n", " 48.00 65.5292\n", " 49.00 102.6584\n", " 50.00 276.2333\n", " 51.00 77.9583\n", " 52.00 171.7667\n", " 53.00 51.4792\n", " 54.00 137.6667\n", " 56.00 83.1583\n", " 58.00 326.5333\n", " 60.00 75.2500\n", " 62.00 80.0000\n", " 63.00 77.9583\n", " 2 2.00 26.0000\n", " 3.00 41.5792\n", " 4.00 62.0000\n", " 5.00 27.7500\n", " 6.00 33.0000\n", " 7.00 26.2500\n", " 8.00 26.2500\n", " 13.00 19.5000\n", " 14.00 30.0708\n", " 17.00 22.5000\n", " 18.00 36.0000\n", " 19.00 52.0000\n", " 21.00 10.5000\n", " 22.00 70.5792\n", " 23.00 13.7917\n", " 24.00 177.2500\n", " 25.00 56.0000\n", " 26.00 26.0000\n", " 27.00 45.3583\n", " 28.00 88.6500\n", " 29.00 62.5000\n", " 30.00 46.3500\n", " 31.00 26.2500\n", " 32.00 13.0000\n", " 32.50 13.0000\n", " 33.00 53.7500\n", " 34.00 79.0000\n", " 35.00 21.0000\n", " 36.00 52.0000\n", " 38.00 13.0000\n", " 40.00 67.7500\n", " 41.00 19.5000\n", " 42.00 39.0000\n", " 44.00 26.0000\n", " 45.00 39.7500\n", " 48.00 65.0000\n", " 50.00 47.0000\n", " 54.00 23.0000\n", " 55.00 16.0000\n", " 57.00 10.5000\n", " 3 0.75 38.5166\n", " 1.00 26.8750\n", " 2.00 81.9250\n", " 3.00 21.0750\n", " 4.00 52.1417\n", " 5.00 63.1208\n", " 6.00 31.2750\n", " 8.00 21.0750\n", " 9.00 108.7958\n", " 10.00 24.1500\n", " 11.00 31.2750\n", " 13.00 7.2292\n", " 14.00 19.0959\n", " 14.50 14.4542\n", " 15.00 29.7084\n", " 16.00 62.3833\n", " 17.00 22.3833\n", " 18.00 91.4667\n", " 19.00 15.7334\n", " 20.00 18.4875\n", " 21.00 59.6000\n", " 22.00 50.8792\n", " 23.00 15.4750\n", " 24.00 60.6583\n", " 25.00 15.7000\n", " 26.00 31.8792\n", " 27.00 31.5333\n", " 28.00 22.2958\n", " 29.00 46.7833\n", " 30.00 45.2875\n", " 30.50 7.7500\n", " 31.00 55.0625\n", " 32.00 15.5000\n", " 33.00 15.8500\n", " 35.00 20.2500\n", " 36.00 17.4000\n", " 37.00 9.5875\n", " 38.00 31.3875\n", " 39.00 60.4000\n", " 40.00 9.4750\n", " 41.00 59.9000\n", " 43.00 46.9000\n", " 45.00 50.1042\n", " 47.00 14.5000\n", " 48.00 34.3750\n", " 63.00 9.5875\n", "male 1 0.92 151.5500\n", " 4.00 81.8583\n", " 11.00 120.0000\n", " 17.00 110.8833\n", " 18.00 108.9000\n", " 19.00 316.1000\n", " 21.00 77.2875\n", " 22.00 135.6333\n", " 23.00 63.3583\n", " 24.00 326.7208\n", " 25.00 146.5209\n", " 26.00 30.0000\n", " 27.00 371.8292\n", " 28.00 191.3208\n", " 29.00 96.6000\n", " 30.00 27.7500\n", " 31.00 159.4958\n", " 32.00 30.5000\n", " 33.00 5.0000\n", " 34.00 26.5500\n", " 35.00 565.1667\n", " 36.00 803.9792\n", " 37.00 135.3542\n", " 38.00 243.4625\n", " 39.00 0.0000\n", " 40.00 58.7208\n", " 42.00 130.8417\n", " 44.00 90.0000\n", " 45.00 172.0750\n", " 45.50 28.5000\n", " 46.00 140.3750\n", " 47.00 150.1083\n", " 48.00 155.2792\n", " 49.00 256.9167\n", " 50.00 295.9750\n", " 51.00 87.9292\n", " 52.00 110.1500\n", " 54.00 129.1500\n", " 55.00 30.5000\n", " 56.00 92.7458\n", " 58.00 142.9750\n", " 60.00 105.7500\n", " 61.00 65.8208\n", " 62.00 53.1000\n", " 64.00 289.0000\n", " 65.00 88.5292\n", " 70.00 71.0000\n", " 71.00 84.1584\n", " 80.00 30.0000\n", " 2 0.67 14.5000\n", " 0.83 47.7500\n", " 1.00 76.0042\n", " 2.00 26.0000\n", " 3.00 44.7500\n", " 8.00 36.7500\n", " 16.00 36.5000\n", " 18.00 109.5000\n", " 19.00 70.7500\n", " 21.00 158.5000\n", " 23.00 76.0458\n", " 24.00 97.0000\n", " 25.00 106.5792\n", " 26.00 10.5000\n", " 27.00 52.0000\n", " 28.00 70.0000\n", " 29.00 59.2208\n", " 30.00 73.5000\n", " 31.00 86.7542\n", " 32.00 110.0000\n", " 32.50 30.0708\n", " 33.00 12.2750\n", " 34.00 107.0000\n", " 35.00 36.5000\n", " 36.00 64.1250\n", " 36.50 26.0000\n", " 37.00 26.0000\n", " 39.00 52.0000\n", " 42.00 53.0000\n", " 43.00 26.2500\n", " 44.00 26.0000\n", " 46.00 26.0000\n", " 47.00 15.0000\n", " 48.00 13.0000\n", " 50.00 13.0000\n", " 51.00 12.5250\n", " 52.00 26.5000\n", " 54.00 66.0000\n", " 57.00 12.3500\n", " 59.00 13.5000\n", " 60.00 39.0000\n", " 62.00 10.5000\n", " 66.00 10.5000\n", " 70.00 10.5000\n", " 3 0.42 8.5167\n", " 1.00 107.1625\n", " 2.00 89.8875\n", " 3.00 47.2875\n", " 4.00 99.4333\n", " 6.00 12.4750\n", " 7.00 68.8125\n", " 8.00 29.1250\n", " 9.00 114.7125\n", " 10.00 27.9000\n", " 11.00 65.6875\n", " 12.00 11.2417\n", " 14.00 86.5875\n", " 15.00 7.2292\n", " 16.00 154.9042\n", " 17.00 47.3959\n", " 18.00 74.2333\n", " 19.00 94.7915\n", " 20.00 110.8751\n", " 20.50 7.2500\n", " 21.00 111.3541\n", " 22.00 108.8874\n", " 23.00 24.9750\n", " 23.50 7.2292\n", " 24.00 95.1791\n", " 24.50 8.0500\n", " 25.00 85.2125\n", " 26.00 166.3333\n", " 27.00 45.7833\n", " 28.00 153.2374\n", " 28.50 23.3292\n", " 29.00 65.3749\n", " 30.00 71.3000\n", " 30.50 8.0500\n", " 31.00 23.4500\n", " 32.00 192.5291\n", " 33.00 70.9083\n", " 34.00 36.9958\n", " 34.50 6.4375\n", " 35.00 38.1708\n", " 36.00 55.0916\n", " 37.00 7.9250\n", " 38.00 23.6083\n", " 39.00 71.2750\n", " 40.00 58.5208\n", " 40.50 22.2500\n", " 41.00 21.2333\n", " 42.00 32.2667\n", " 43.00 14.5000\n", " 44.00 40.1250\n", " 45.00 15.0250\n", " 45.50 7.2250\n", " 47.00 16.2500\n", " 48.00 7.8542\n", " 49.00 0.0000\n", " 50.00 8.0500\n", " 51.00 22.8542\n", " 55.50 8.0500\n", " 59.00 7.2500\n", " 61.00 6.2375\n", " 65.00 7.7500\n", " 70.50 7.7500\n", " 74.00 7.7750" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(df.groupby([\"Sex\", \"Pclass\", \"Age\"]).Fare.sum())\n", "df" ] }, { "cell_type": "markdown", "id": "referenced-croatia", "metadata": {}, "source": [ "This is now a bit easier to read. You should now have a fairly good understanding of how to group data in Pandas using groupby() and some of the more powerful ways you can use groupby() to manipulate quantitative data." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 5 }