9. Plotting Data with Pandas

Dr. W.J.B. Mattingly
Smithsonian Data Science Lab and United States Holocaust Memorial Museum
August 2021

9.1. Covered in this Chapter

  1. Key Ways to Plot Data in Pandas

  2. How to Create a Bar or Barh Graph

  3. How to Create a Pie Chart

  4. How to Plot Data in a Scatter Plot

9.2. Importing the DataFrame

This notebook begins Part 3 of this textbook. Here, we will build upon our skills from Parts 1 and 2 and begin exploring how to visualize data in Pandas. Pandas sits on top of Matplotlib, one of the standard libraries used by data scientists for plotting data. As we will see in the next notebooks, you can also leverage other, more robust graphing libraries through Pandas. For now, though, let’s start with the basics. In this notebook, we will explore how to create three types of graphs: bar (and barh), pie, and scatter. I will also introduce you to some of the more recent features of Pandas 1.3.0 that give you a bit more control over the graph.

Before we do any of that, however, let’s import pandas and our data.

import pandas as pd
df = pd.read_csv("data/titanic.csv")
df
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns

9.3. Bar and Barh Charts with Pandas

With our data imported successfully, let’s jump right in with bar charts. Bar charts are a great way to visualize qualitative data quantitatively. To demonstrate what I mean by this, let’s consider how many male passengers were on the Titanic relative to female passengers. I could grab the counts for each value and look at the numbers by calling .value_counts(), as in the example below.

df['Sex'].value_counts()
male      577
female    314
Name: Sex, dtype: int64
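As a small aside, .value_counts() can also report proportions rather than raw counts by passing normalize=True. The sketch below uses a tiny made-up Series as a stand-in for df['Sex'] (the name sex and its five values are illustrative only), so it runs without the Titanic file.

```python
import pandas as pd

# Hypothetical stand-in for df['Sex']; the real column has 891 rows.
sex = pd.Series(["male", "female", "male", "male", "female"], name="Sex")

counts = sex.value_counts()                # absolute counts, largest first
shares = sex.value_counts(normalize=True)  # proportions that sum to 1.0

print(counts)
print(shares)
```

Because the result is sorted from the most to the least frequent value, the first entry is always the largest group.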

This kind of raw numerical data is useful, but it is often difficult to present to audiences. For this reason, it is quite common to keep the raw numerical data available, but to give the audience a quick sense of the numbers visually. We can take the df['Sex'].value_counts() call from above, append .plot.bar() to it, and get the following result.

df['Sex'].value_counts().plot.bar()
<AxesSubplot:>
_images/04_01_pandas_and_plots_13_1.png

Not bad, but this chart is quite staid. For one thing, we don’t even have a title! Let’s fix that. We can pass a keyword argument of title. This will take a string.

df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic")
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
_images/04_01_pandas_and_plots_15_1.png

We have another serious issue, though. Both sexes are represented with the same color. This can be difficult for audiences to decipher in some instances, so let’s change that. We can pass the keyword argument color, which takes a list of colors, one for each bar.

df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic", color=["blue", "red"])
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
_images/04_01_pandas_and_plots_17_1.png

We can do the same thing with a barh graph, or a bar-horizontal graph.

df['Sex'].value_counts().plot.barh(title="Passengers on the Titanic", color=["blue", "red"])
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
_images/04_01_pandas_and_plots_20_1.png
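One thing to keep in mind with the color list is that the colors are applied in the order of the bars, which is the order returned by .value_counts(). A minimal sketch of one way to make that explicit is to look colors up by label. The palette dictionary and the toy Series here are assumptions for illustration, not part of the Titanic data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for df['Sex'].
sex = pd.Series(["male", "female", "male", "male", "female"], name="Sex")
counts = sex.value_counts()

# Hypothetical lookup table: label -> color.
palette = {"male": "blue", "female": "red"}

# Build the color list in the same order as the bars.
colors = [palette[label] for label in counts.index]

ax = counts.plot.barh(title="Passengers (toy data)", color=colors)
```

With this approach, each label keeps its color even if the counts change and the bars swap places.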

9.4. Pie Charts with Pandas

df['Sex'].value_counts().plot.pie()
<AxesSubplot:ylabel='Sex'>
_images/04_01_pandas_and_plots_23_1.png
df['Sex'].value_counts().plot.pie(figsize=(6, 6))
<AxesSubplot:ylabel='Sex'>
_images/04_01_pandas_and_plots_24_1.png

Let’s say I did not want the labels for each sex to be lowercase. I can add some custom labels as a keyword argument, labels, which takes a list.

df['Sex'].value_counts().plot.pie(labels=["Male", "Female"])
<AxesSubplot:ylabel='Sex'>
_images/04_01_pandas_and_plots_26_1.png

Now that we have our labels as we want them, let’s give the audience a bit of a better experience. Let’s allow them to easily see the percentage of each sex, not just visually, but quantitatively. To do this, we can pass the keyword argument, autopct, which will take a string. In this case, we can pass in the argument “%.2f”, which is a format string. Each slice’s share of the whole is computed as a percentage, and this format string prints it with two decimal places.

df['Sex'].value_counts().plot.pie(labels=["Male", "Female"], autopct="%.2f")
<AxesSubplot:ylabel='Sex'>
_images/04_01_pandas_and_plots_28_1.png
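To see what the format string does on its own, here is a small sketch using Python’s % formatting, which is what Matplotlib applies to each slice’s percentage when autopct is a string. The counts are the ones from the value_counts() output earlier in the chapter; the variable names are illustrative.

```python
# Counts taken from the value_counts() output earlier in the chapter.
male, female = 577, 314
total = male + female

male_pct = 100 * male / total    # share of the whole, as a percentage

plain = "%.2f" % male_pct        # two decimal places, no percent sign
with_sign = "%.2f%%" % male_pct  # '%%' is an escaped, literal percent sign

print(plain)
print(with_sign)
```

Passing autopct="%.2f%%" instead of "%.2f" to .plot.pie() would therefore add a percent sign to each label.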

9.5. Scatter Plots with Pandas

Scatter plots allow us to plot qualitative data quantitatively in relation to two numerical attributes. Let’s imagine that we are interested in exploring all passengers, something qualitative. Now, we want to know how each passenger relates to other passengers on two numerical, or quantitative attributes, e.g. the age of the passenger and the fare that they paid. Both of these are quantitative. We can therefore represent each person as a point on the scatter plot and plot them in relation to their fare (vertical, or y axis) and age (horizontal, or x axis) on the graph.

In Pandas, we can do this by passing two keyword arguments, x and y, and setting each one equal to the DataFrame column we want, e.g. “Age” and “Fare”.

df.plot.scatter(x="Age", y="Fare")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_31_1.png
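Notice in the DataFrame preview at the start of the chapter that some passengers have NaN for Age. A point needs both an x and a y value to be drawn, so those passengers do not appear on the scatter plot. The sketch below, on a made-up four-row DataFrame (the name toy and its values are assumptions), counts how many rows have both values present.

```python
import pandas as pd

# Toy stand-in for the Titanic DataFrame; None marks a missing age.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 23.45, 53.10],
})

# Rows where both coordinates are present are the ones that can be drawn.
plottable = toy.dropna(subset=["Age", "Fare"])

print(len(toy), "rows,", len(plottable), "with both Age and Fare")
```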

That looks good, but we can do better. Let’s try to color coordinate this data. Let’s say we are interested in seeing not only the passenger’s age and fare, but also in color-coordinating the graph so that their Pclass affects the color of each point. We can do this by passing a few new keyword arguments.

  1. c="Pclass" => c is the column that affects the color

  2. cmap="viridis" => cmap is the color map we want to use (color maps come from Matplotlib, which Pandas uses for plotting)

df.plot.scatter(x="Age", y="Fare", c="Pclass",cmap="viridis")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_33_1.png

This is starting to look a lot better now. But let’s say we didn’t want to represent our data as a series of marginally changing numbers. When we pass a numerical DataFrame column to c, Pandas presumes that the numbers correspond to a gradient change in color. But passenger class is not a gradient change; it is an integral change, meaning no one will be in Pclass 1.2. They will be in 1, 2, or 3. In order to fix this graph, we can make a few changes. First, we can use df.loc, which we met in a previous notebook, to grab each class. We know there are three. We can convert these from numerical representations of the class into string representations, e.g. First, Second, and Third.

Next, we can convert that entire column from a string column into a Pandas Categorical Class.

df.loc[(df.Pclass == 1),'Pclass']="First"
df.loc[(df.Pclass == 2),'Pclass']="Second"
df.loc[(df.Pclass == 3),'Pclass']="Third"
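As an alternative sketch, the same relabeling can be written with .map() and a dictionary, storing the result in a new column so the original integers stay available. The column name PclassName and the class_names dictionary are hypothetical, and the toy DataFrame stands in for the Titanic data. Note also that recent Pandas releases may warn when strings are assigned into an integer column, as the df.loc lines above do; writing to a new column avoids that.

```python
import pandas as pd

# Toy stand-in for the Pclass column of the Titanic DataFrame.
toy = pd.DataFrame({"Pclass": [3, 1, 3, 2]})

# Hypothetical lookup table from numeric class to label.
class_names = {1: "First", 2: "Second", 3: "Third"}

# Work in a new column so the original integers are left untouched.
toy["PclassName"] = toy["Pclass"].map(class_names)

print(toy)
```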

We can now see that our data has been altered in the Pclass column.

df
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 Third Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 First Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 Third Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 First Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 Third Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 Second Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 First Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 Third Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 First Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 Third Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns

Now that our data is successfully converted into a string, you might be thinking that we can run the same code as before and we should see the data divided between strings, rather than a gradient shift between floats. If we execute the cell below, however, we get a rather large and scary looking error. (Scroll down to see the solution).

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in _parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4290             try:  # Is 'c' acceptable as PathCollection facecolors?
-> 4291                 colors = mcolors.to_rgba_array(c)
   4292             except (TypeError, ValueError) as err:

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in to_rgba_array(c, alpha)
    340     else:
--> 341         return np.array([to_rgba(cc, alpha) for cc in c])
    342 

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in <listcomp>(.0)
    340     else:
--> 341         return np.array([to_rgba(cc, alpha) for cc in c])
    342 

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in to_rgba(c, alpha)
    188     if rgba is None:  # Suppress exception chaining of cache lookup failure.
--> 189         rgba = _to_rgba_no_colorcycle(c, alpha)
    190         try:

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in _to_rgba_no_colorcycle(c, alpha)
    259             return c, c, c, alpha if alpha is not None else 1.
--> 260         raise ValueError(f"Invalid RGBA argument: {orig_c!r}")
    261     # tuple color.

ValueError: Invalid RGBA argument: 'Third'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-18-5db8b5e23a68> in <module>
----> 1 df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_core.py in scatter(self, x, y, s, c, **kwargs)
   1634             ...                       colormap='viridis')
   1635         """
-> 1636         return self(kind="scatter", x=x, y=y, s=s, c=c, **kwargs)
   1637 
   1638     def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs):

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_core.py in __call__(self, *args, **kwargs)
    915         if kind in self._dataframe_kinds:
    916             if isinstance(data, ABCDataFrame):
--> 917                 return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    918             else:
    919                 raise ValueError(f"plot kind {kind} can only be used for data frames")

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\__init__.py in plot(data, kind, **kwargs)
     69             kwargs["ax"] = getattr(ax, "left_ax", ax)
     70     plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71     plot_obj.generate()
     72     plot_obj.draw()
     73     return plot_obj.result

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\core.py in generate(self)
    286         self._compute_plot_data()
    287         self._setup_subplots()
--> 288         self._make_plot()
    289         self._add_table()
    290         self._make_legend()

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\core.py in _make_plot(self)
   1068         else:
   1069             label = None
-> 1070         scatter = ax.scatter(
   1071             data[x].values,
   1072             data[y].values,

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1445     def inner(ax, *args, data=None, **kwargs):
   1446         if data is None:
-> 1447             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1448 
   1449         bound = new_sig.bind(ax, *args, **kwargs)

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\cbook\deprecation.py in wrapper(*inner_args, **inner_kwargs)
    409                          else deprecation_addendum,
    410                 **kwargs)
--> 411         return func(*inner_args, **inner_kwargs)
    412 
    413     return wrapper

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4449 
   4450         c, colors, edgecolors = \
-> 4451             self._parse_scatter_color_args(
   4452                 c, edgecolors, kwargs, x.size,
   4453                 get_next_color_func=self._get_patches_for_fill.get_next_color)

c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in _parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4298                     # Both the mapping *and* the RGBA conversion failed: pretty
   4299                     # severe failure => one may appreciate a verbose feedback.
-> 4300                     raise ValueError(
   4301                         f"'c' argument must be a color, a sequence of colors, "
   4302                         f"or a sequence of numbers, not {c}") from err

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['Third' 'First' 'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third'
 'Second' 'Third' 'First' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third' 'Second'
 'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'First' 'Second' 'First' 'First' 'Second' 'Third' 'Second' 'Third'
 'Third' 'First' 'First' 'Third' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third' 'Third'
 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
 'First' 'First' 'Second' 'Second' 'Third' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Second' 'Third'
 'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Second' 'Second' 'Second' 'First' 'First' 'Third'
 'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third'
 'Second' 'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'First'
 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Third'
 'First' 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
 'Second' 'Third' 'Second' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
 'Second' 'First' 'Second' 'Third' 'Second' 'Third' 'Third' 'First'
 'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third' 'Second'
 'Third' 'Second' 'Third' 'Second' 'Second' 'Second' 'Second' 'Third'
 'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'Second' 'First'
 'Second' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'First'
 'First' 'Second' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third'
 'Third' 'First' 'First' 'First' 'Third' 'Second' 'First' 'Third' 'First'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'First'
 'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'First' 'Second' 'Third'
 'Third' 'First' 'Third' 'First' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'First' 'First' 'First' 'Second' 'First' 'First' 'First'
 'Second' 'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'First'
 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third' 'Second'
 'Third' 'First' 'Third' 'First' 'First' 'Third' 'First' 'Third' 'First'
 'First' 'Third' 'First' 'Second' 'First' 'Second' 'Second' 'Second'
 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third'
 'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'First'
 'Third' 'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third'
 'Second' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'Second'
 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third'
 'Third' 'Second' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third' 'First' 'Third'
 'Second' 'Third' 'First' 'First' 'Third' 'Second' 'First' 'Second'
 'Second' 'Third' 'Third' 'Second' 'Third' 'First' 'Second' 'First'
 'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third' 'First'
 'First' 'Second' 'Third' 'First' 'Third' 'First' 'Second' 'Third' 'Third'
 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second'
 'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third'
 'First' 'First' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'Third' 'First' 'First' 'Second' 'First' 'Third' 'Third'
 'Third' 'Third' 'First' 'First' 'Third' 'First' 'Second' 'Third' 'Second'
 'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'First'
 'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second'
 'First' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Second' 'First'
 'First' 'Second' 'Second' 'Third' 'Second' 'First' 'Second' 'Third'
 'Third' 'Third' 'First' 'First' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
 'First' 'First' 'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Third'
 'Second' 'First' 'Second' 'First' 'Third' 'First' 'Second' 'First'
 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'Third' 'Second'
 'Third' 'Third' 'First' 'Second' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'First' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'First' 'Third' 'Third' 'First' 'Third'
 'First' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third'
 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third'
 'Second' 'First' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
 'Second' 'Third' 'Third' 'First' 'Second' 'First' 'Second' 'Second'
 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'First' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'First' 'Third' 'First'
 'First' 'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'First' 'Third'
 'First' 'First' 'First' 'Third' 'Second' 'Third' 'First' 'Second' 'Third'
 'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'Third' 'Second'
 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Second' 'Second'
 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'First' 'Third'
 'Third' 'First' 'Third' 'Second' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Second' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third'
 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'First'
 'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'Third' 'Third' 'Third'
 'First' 'Third' 'Second' 'First' 'Third' 'Third' 'Second' 'Third' 'Third'
 'First' 'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Third'
 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Second'
 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third'
 'Third' 'Second' 'First' 'Third' 'Second' 'Second' 'Second' 'First'
 'Third' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third' 'Third'
 'Second' 'First' 'Third' 'First' 'Third']
_images/04_01_pandas_and_plots_39_1.png

Keeping this massive error in the textbook is essential, despite its rather annoying size. It tells us a lot about the problem. When we try to pass a keyword argument of c, Pandas is expecting a series of numbers (which will correspond to gradient shifts in the cmap), a list of colors, or a Pandas Categorical column. To change our data to a list of colors, let’s convert our three classes into three different colors.

df.loc[(df.Pclass == "First"),'Pclass']="red"
df.loc[(df.Pclass == "Second"),'Pclass']="blue"
df.loc[(df.Pclass == "Third"),'Pclass']="green"
df.plot.scatter(x="Age", y="Fare", c="Pclass")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_42_1.png
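As an aside, the same coloring can be sketched without overwriting the Pclass column, by building a separate list of colors and passing that list to c. This is only an illustration under assumptions: the toy DataFrame, the class_colors dictionary, and the point_colors name are all made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the Titanic DataFrame after relabeling Pclass.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
    "Pclass": ["Third", "First", "Third", "Second"],
})

# Hypothetical lookup table: class label -> color.
class_colors = {"First": "red", "Second": "blue", "Third": "green"}

# One color per row, in row order; the Pclass column itself is unchanged.
point_colors = toy["Pclass"].map(class_colors).tolist()

ax = toy.plot.scatter(x="Age", y="Fare", c=point_colors)
```

Keeping the labels in place means there is nothing to convert back afterwards.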

Now, our points are all color coordinated. But I don’t like this. It doesn’t have a nice legend to read. Instead, we should convert this data into a Categorical column. To do this, let’s first get our data back into First, Second, and Third class format.

df.loc[(df.Pclass == "red"),'Pclass']="First"
df.loc[(df.Pclass == "blue"),'Pclass']="Second"
df.loc[(df.Pclass == "green"),'Pclass']="Third"

Now, let’s try this again by first converting Pclass into a Categorical type.

df['Pclass'] = df.Pclass.astype('category')
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_47_1.png
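For a peek at what the conversion changes, the sketch below converts a small made-up Series of labels (a stand-in for the Pclass column) and inspects the result. A categorical column stores a fixed set of categories plus one integer code per row, so there are only as many distinct values as there are classes.

```python
import pandas as pd

# Toy stand-in for the relabeled Pclass column.
labels = pd.Series(["Third", "First", "Third", "Second"], name="Pclass")

as_cat = labels.astype("category")

categories = as_cat.cat.categories.tolist()  # the distinct labels, sorted
codes = as_cat.cat.codes.tolist()            # integer position of each row's label

print(categories)
print(codes)
```

By default the categories are sorted alphabetically, which happens to put First, Second, and Third in their natural order here.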

Now, like magic, we have precisely what we want to see. But we can do even better! Let’s say we don’t like the size of the nodes (points) on the graph. We want to see smaller nodes to distinguish better between the points. We can pass another keyword argument, s, which stands for size. This expects a number (or a sequence of numbers, one per point).

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5)
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_49_1.png

To make it a bit easier to read, let’s also adjust the size of the figure. We can do this by passing the keyword argument, figsize, that we saw above with pie charts.

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5, figsize=(15,5))
<AxesSubplot:xlabel='Age', ylabel='Fare'>
_images/04_01_pandas_and_plots_51_1.png
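The figsize tuple is (width, height) in inches. A minimal sketch to confirm this, using a made-up three-row DataFrame in place of the Titanic data, reads the size back from the figure that the returned axes belong to.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the Titanic DataFrame.
toy = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.93]})

ax = toy.plot.scatter(x="Age", y="Fare", s=5, figsize=(15, 5))

# The plot call returns the axes; its parent figure carries the size.
width, height = ax.figure.get_size_inches()

print(width, height)
```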

By now, you should have a good sense of how to create simple bar, pie, and scatter charts. In the next few notebooks, we will be looking at other ways of leveraging Pandas to produce visualizations, such as using plotly and social networks with networkx.