9. Plotting Data with Pandas
9.1. Covered in this Chapter
Key Ways to Plot Data in Pandas
How to Create a Bar or Barh Graph
How to Create a Pie Chart
How to Plot Data in a Scatter Plot
9.2. Importing the DataFrame
This notebook begins Part 3 of this textbook. Here, we will build upon our skills from Parts 1 and 2 and begin exploring how to visualize data in Pandas. By default, Pandas plotting sits on top of Matplotlib, one of the standard libraries used by data scientists for plotting data. As we will see in the next notebooks, you can also leverage other, more robust graphing libraries through Pandas. For now, though, let’s start with the basics. In this notebook, we will explore how to create three types of graphs: bar (and barh), pie, and scatter. I will also introduce you to some of the more recent features of Pandas 1.3.0 that allow you to control the graph a bit more.
Before we do any of that, however, let’s import pandas and our data.
import pandas as pd
df = pd.read_csv("data/titanic.csv")
df
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
9.3. Bar and Barh Charts with Pandas
With our data imported successfully, let’s jump right in with bar charts. Bar charts are a great way to visualize qualitative data quantitatively. To demonstrate what I mean by this, let’s consider if we wanted to know how many male passengers were on the Titanic relative to female passengers. I could grab all the value counts and look at the numbers by calling .value_counts(), as in the example below.
df['Sex'].value_counts()
male 577
female 314
Name: Sex, dtype: int64
This kind of raw numerical data is useful, but it is often difficult to present to audiences. For this reason, it is quite common to keep the raw numerical data available, but to give the audience a quick sense of the numbers visually. We can take the code we see above, chain .plot.bar() onto it, and get the following result.
df['Sex'].value_counts().plot.bar()
<AxesSubplot:>
Not bad, but this chart is quite staid. For one thing, we don’t even have a title! Let’s fix that. We can pass a keyword argument of title. This will take a string.
df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic")
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
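The `<AxesSubplot:...>` line printed under each chart is the Matplotlib Axes object that the plotting call returns. As a minimal sketch, assuming a small stand-in Series named `sex` (hypothetical, built here to match the counts above instead of reading the CSV), we can keep that object in a variable and use it to label the axes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a plain script; omit in a notebook
import pandas as pd

# Hypothetical stand-in for df['Sex'], built to match the counts shown above
sex = pd.Series(["male"] * 577 + ["female"] * 314, name="Sex")

# .plot.bar() returns a Matplotlib Axes object, which we keep as `ax`
ax = sex.value_counts().plot.bar(title="Passengers on the Titanic")

# The Axes object exposes the usual Matplotlib setters
ax.set_xlabel("Sex")
ax.set_ylabel("Number of passengers")

# Each bar is a rectangle patch on the Axes; its height is the count
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)
```

In a notebook, ending the cell with a semicolon or assigning the result to a variable, as above, also keeps the `<AxesSubplot:...>` text from being printed.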
We have another serious issue, though. Both sexes are represented with the same color. This can be difficult for audiences to decipher in some instances, so let’s change that. We can pass the keyword argument color, which takes a list of colors, one for each bar.
df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic", color=["blue", "red"])
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
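The list of colors is applied by position, so "blue" goes to whichever value .value_counts() happens to return first. A minimal sketch of one way to tie colors to labels instead, assuming a hypothetical lookup dictionary named `color_for` and the same stand-in Series as a substitute for df['Sex']:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a plain script; omit in a notebook
import pandas as pd

# Hypothetical stand-in for df['Sex'], built to match the counts shown above
sex = pd.Series(["male"] * 577 + ["female"] * 314, name="Sex")
counts = sex.value_counts()

# Hypothetical lookup: which color each label should always receive
color_for = {"male": "blue", "female": "red"}

# Build the color list in the same order as the bars
colors = [color_for[label] for label in counts.index]

ax = counts.plot.bar(title="Passengers on the Titanic", color=colors)
print(colors)
```

With this approach the colors stay attached to the right bars even if the counts change and the order of the bars flips.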
We can do the same thing with a barh graph, or a bar-horizontal graph.
df['Sex'].value_counts().plot.barh(title="Passengers on the Titanic", color=["blue", "red"])
<AxesSubplot:title={'center':'Passengers on the Titanic'}>
9.4. Pie Charts with Pandas
df['Sex'].value_counts().plot.pie()
<AxesSubplot:ylabel='Sex'>
df['Sex'].value_counts().plot.pie(figsize=(6, 6))
<AxesSubplot:ylabel='Sex'>
Let’s say I wanted the labels for each sex to be capitalized rather than lowercase. I can pass custom labels as a keyword argument, labels, which takes a list. The labels must be in the same order as the values returned by .value_counts().
df['Sex'].value_counts().plot.pie(labels=["Male", "Female"])
<AxesSubplot:ylabel='Sex'>
Now that we have our labels as we want them, let’s give the audience a bit of a better experience. Let’s allow them to easily see the percentage of each gender, not just visually, but quantitatively. To do this, we can pass the keyword argument, autopct, which takes a format string. In this case, we can pass in “%.2f”, which displays each wedge’s percentage to two decimal places. Matplotlib calculates the percentage for us; the format string only controls how it is displayed.
df['Sex'].value_counts().plot.pie(labels=["Male", "Female"], autopct="%.2f")
<AxesSubplot:ylabel='Sex'>
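To see what the format string does on its own, here is a minimal sketch in plain Python that applies it to the percentages by hand. The counts come from the .value_counts() output earlier in this notebook; the variable names are only illustrative.

```python
# Counts taken from the value_counts() output shown earlier
counts = {"male": 577, "female": 314}
total = sum(counts.values())

formatted = {}
for label, n in counts.items():
    pct = 100 * n / total           # the percentage each wedge represents
    two_decimals = "%.2f" % pct     # the format used in the example above
    with_sign = "%.1f%%" % pct      # %% produces a literal percent sign
    formatted[label] = (two_decimals, with_sign)

print(formatted)
```

Passing autopct="%.1f%%" instead of "%.2f" should therefore print one decimal place followed by a percent sign on each wedge.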
9.5. Scatter Plots with Pandas
Scatter plots allow us to plot qualitative data quantitatively in relation to two numerical attributes. Let’s imagine that we are interested in exploring all passengers, something qualitative. Now, we want to know how each passenger relates to other passengers on two numerical, or quantitative attributes, e.g. the age of the passenger and the fare that they paid. Both of these are quantitative. We can therefore represent each person as a point on the scatter plot and plot them in relation to their fare (vertical, or y axis) and age (horizontal, or x axis) on the graph.
In Pandas we can do this by passing two keyword arguments, x and y, and setting each equal to the DataFrame column we want, e.g. “Age” and “Fare”.
df.plot.scatter(x="Age", y="Fare")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
That looks good, but we can do better. Let’s try to color coordinate this data. Let’s say we are interested in seeing not only the passenger’s age and fare, but also in color-coordinating the graph so that their Pclass affects the color of each point. We can do this by passing a few new keyword arguments.
c=”Pclass” => c will be the column that affects the color
cmap=”viridis” => will be the color map we want to use (these come from Matplotlib, which Pandas uses for plotting)
df.plot.scatter(x="Age", y="Fare", c="Pclass",cmap="viridis")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
This is starting to look a lot better now. But let’s say we didn’t want to represent our data as a continuous range of numbers. When we pass a numeric DataFrame column to c, Pandas presumes that the numbers correspond to a gradient change in color. But passenger class is not a gradient; it is a discrete value, meaning no one will be Pclass 1.2. They will be 1, 2, or 3. In order to fix this graph, we can make a few changes. First, we can use df.loc, which we met in a previous notebook, to select the rows for each class. We know there are three classes. We can convert these from numerical representations of the class into string representations, e.g. First, Second, and Third.
Next, we can convert that entire column from a string column into a Pandas Categorical Class.
df.loc[(df.Pclass == 1),'Pclass']="First"
df.loc[(df.Pclass == 2),'Pclass']="Second"
df.loc[(df.Pclass == 3),'Pclass']="Third"
We can now see that the Pclass column has been altered.
df
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | Third | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | First | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | Third | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | First | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | Third | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
886 | 887 | 0 | Second | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | First | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | Third | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | First | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | Third | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
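The three df.loc assignments overwrite an integer column with strings in place, and recent versions of Pandas may warn about that change of type. A minimal sketch of an alternative, assuming a small hypothetical DataFrame named `df_small` and a hypothetical new column named `PclassName`, is to map the numbers to names with a dictionary and keep the original column untouched:

```python
import pandas as pd

# Hypothetical miniature of the Titanic data with only the column we need
df_small = pd.DataFrame({"Pclass": [3, 1, 3, 1, 2]})

# Lookup from the numeric class to the name we want to display
class_names = {1: "First", 2: "Second", 3: "Third"}

# .map() looks up every value in the dictionary and returns a new Series;
# storing it in a new column leaves the numeric Pclass column as it was
df_small["PclassName"] = df_small["Pclass"].map(class_names)
print(df_small)
```

This is not what the notebook does from here on (it keeps working with the overwritten Pclass column), but it makes it easy to return to the numeric values later.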
Now that our data is successfully converted into a string, you might be thinking that we can run the same code as before and we should see the data divided between strings, rather than a gradient shift between floats. If we execute the cell below, however, we get a rather large and scary looking error. (Scroll down to see the solution).
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in _parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
4290 try: # Is 'c' acceptable as PathCollection facecolors?
-> 4291 colors = mcolors.to_rgba_array(c)
4292 except (TypeError, ValueError) as err:
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in to_rgba_array(c, alpha)
340 else:
--> 341 return np.array([to_rgba(cc, alpha) for cc in c])
342
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in <listcomp>(.0)
340 else:
--> 341 return np.array([to_rgba(cc, alpha) for cc in c])
342
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in to_rgba(c, alpha)
188 if rgba is None: # Suppress exception chaining of cache lookup failure.
--> 189 rgba = _to_rgba_no_colorcycle(c, alpha)
190 try:
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\colors.py in _to_rgba_no_colorcycle(c, alpha)
259 return c, c, c, alpha if alpha is not None else 1.
--> 260 raise ValueError(f"Invalid RGBA argument: {orig_c!r}")
261 # tuple color.
ValueError: Invalid RGBA argument: 'Third'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-18-5db8b5e23a68> in <module>
----> 1 df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_core.py in scatter(self, x, y, s, c, **kwargs)
1634 ... colormap='viridis')
1635 """
-> 1636 return self(kind="scatter", x=x, y=y, s=s, c=c, **kwargs)
1637
1638 def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs):
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_core.py in __call__(self, *args, **kwargs)
915 if kind in self._dataframe_kinds:
916 if isinstance(data, ABCDataFrame):
--> 917 return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
918 else:
919 raise ValueError(f"plot kind {kind} can only be used for data frames")
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\__init__.py in plot(data, kind, **kwargs)
69 kwargs["ax"] = getattr(ax, "left_ax", ax)
70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
72 plot_obj.draw()
73 return plot_obj.result
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\core.py in generate(self)
286 self._compute_plot_data()
287 self._setup_subplots()
--> 288 self._make_plot()
289 self._add_table()
290 self._make_legend()
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\pandas\plotting\_matplotlib\core.py in _make_plot(self)
1068 else:
1069 label = None
-> 1070 scatter = ax.scatter(
1071 data[x].values,
1072 data[y].values,
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
1445 def inner(ax, *args, data=None, **kwargs):
1446 if data is None:
-> 1447 return func(ax, *map(sanitize_sequence, args), **kwargs)
1448
1449 bound = new_sig.bind(ax, *args, **kwargs)
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\cbook\deprecation.py in wrapper(*inner_args, **inner_kwargs)
409 else deprecation_addendum,
410 **kwargs)
--> 411 return func(*inner_args, **inner_kwargs)
412
413 return wrapper
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
4449
4450 c, colors, edgecolors = \
-> 4451 self._parse_scatter_color_args(
4452 c, edgecolors, kwargs, x.size,
4453 get_next_color_func=self._get_patches_for_fill.get_next_color)
c:\users\wma22\appdata\local\programs\python\python39\lib\site-packages\matplotlib\axes\_axes.py in _parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
4298 # Both the mapping *and* the RGBA conversion failed: pretty
4299 # severe failure => one may appreciate a verbose feedback.
-> 4300 raise ValueError(
4301 f"'c' argument must be a color, a sequence of colors, "
4302 f"or a sequence of numbers, not {c}") from err
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['Third' 'First' 'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third'
'Second' 'Third' 'First' 'Third' 'Third' 'Third' 'Second' 'Third'
'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third' 'Second'
'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
'First' 'Second' 'First' 'First' 'Second' 'Third' 'Second' 'Third'
'Third' 'First' 'First' 'Third' 'First' 'Third' 'Second' 'Third' 'Third'
'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third'
'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third' 'Third'
'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
'First' 'First' 'Second' 'Second' 'Third' 'Third' 'First' 'Third' 'Third'
'Third' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Second' 'Third'
'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
'Third' 'Third' 'Second' 'Second' 'Second' 'First' 'First' 'Third'
'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third'
'Second' 'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'First'
'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third' 'Third'
'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Third'
'First' 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third'
'Second' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
'Second' 'Third' 'Second' 'First' 'First' 'Third' 'Third' 'Third'
'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
'Second' 'First' 'Second' 'Third' 'Second' 'Third' 'Third' 'First'
'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third' 'Second'
'Third' 'Second' 'Third' 'Second' 'Second' 'Second' 'Second' 'Third'
'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'Second' 'First'
'Second' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'First'
'First' 'Second' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third'
'Third' 'First' 'First' 'First' 'Third' 'Second' 'First' 'Third' 'First'
'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'First'
'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'First' 'Second' 'Third'
'Third' 'First' 'Third' 'First' 'First' 'First' 'Third' 'Third' 'Third'
'Second' 'Third' 'First' 'First' 'First' 'Second' 'First' 'First' 'First'
'Second' 'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'First'
'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third' 'Second'
'Third' 'First' 'Third' 'First' 'First' 'Third' 'First' 'Third' 'First'
'First' 'Third' 'First' 'Second' 'First' 'Second' 'Second' 'Second'
'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third'
'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third'
'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third'
'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'First'
'Third' 'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third'
'Second' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'Second'
'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third'
'Third' 'Second' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third'
'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third' 'First' 'Third'
'Second' 'Third' 'First' 'First' 'Third' 'Second' 'First' 'Second'
'Second' 'Third' 'Third' 'Second' 'Third' 'First' 'Second' 'First'
'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third' 'First'
'First' 'Second' 'Third' 'First' 'Third' 'First' 'Second' 'Third' 'Third'
'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second'
'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third'
'First' 'First' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
'Third' 'Third' 'Third' 'First' 'First' 'Second' 'First' 'Third' 'Third'
'Third' 'Third' 'First' 'First' 'Third' 'First' 'Second' 'Third' 'Second'
'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'First'
'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second'
'First' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Second' 'First'
'First' 'Second' 'Second' 'Third' 'Second' 'First' 'Second' 'Third'
'Third' 'Third' 'First' 'First' 'First' 'First' 'Third' 'Third' 'Third'
'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
'First' 'First' 'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Third'
'Second' 'First' 'Second' 'First' 'Third' 'First' 'Second' 'First'
'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'Third' 'Second'
'Third' 'Third' 'First' 'Second' 'Third' 'First' 'Third' 'First' 'Third'
'Third' 'First' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
'Third' 'Third' 'First' 'Second' 'First' 'Third' 'Third' 'First' 'Third'
'First' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third'
'First' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
'Third' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third'
'Second' 'First' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
'Second' 'Third' 'Third' 'First' 'Second' 'First' 'Second' 'Second'
'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'First' 'First' 'Third'
'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'First' 'Third' 'First'
'First' 'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'First' 'Third'
'First' 'First' 'First' 'Third' 'Second' 'Third' 'First' 'Second' 'Third'
'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'Third' 'Second'
'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Second' 'Second'
'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'First' 'Third'
'Third' 'First' 'Third' 'Second' 'First' 'Third' 'Second' 'Third' 'Third'
'Third' 'Second' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
'Third' 'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third'
'Third' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third'
'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
'Third' 'First' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'First'
'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'Third' 'Third' 'Third'
'First' 'Third' 'Second' 'First' 'Third' 'Third' 'Second' 'Third' 'Third'
'First' 'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Third'
'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Second'
'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'First' 'Third'
'Third' 'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third'
'Third' 'Second' 'First' 'Third' 'Second' 'Second' 'Second' 'First'
'Third' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third' 'Third'
'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third' 'Third'
'Second' 'First' 'Third' 'First' 'Third']
Keeping this massive error in the textbook is essential, despite its rather annoying size. It tells us a lot about the problem. When we pass the keyword argument c, Pandas is expecting a series of numbers (which will correspond to gradient shifts in the cmap), a list of colors, or a Pandas Categorical column. Our strings are none of these. To change our data to a list of colors, let’s convert our three classes into three different colors.
df.loc[(df.Pclass == "First"),'Pclass']="red"
df.loc[(df.Pclass == "Second"),'Pclass']="blue"
df.loc[(df.Pclass == "Third"),'Pclass']="green"
df.plot.scatter(x="Age", y="Fare", c="Pclass")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
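Because c also accepts a sequence of colors, we do not strictly need to overwrite the Pclass column to color the points. A minimal sketch, assuming a hypothetical five-row DataFrame named `df_small` with values copied from the table above and a hypothetical lookup dictionary named `color_for`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a plain script; omit in a notebook
import pandas as pd

# Hypothetical miniature of the Titanic data, using rows shown in the table above
df_small = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 13.0],
    "Pclass": ["Third", "First", "Third", "First", "Second"],
})

# Hypothetical lookup from class name to color
color_for = {"First": "red", "Second": "blue", "Third": "green"}

# One color per row, in row order; the Pclass column itself is not modified
point_colors = df_small["Pclass"].map(color_for).tolist()

ax = df_small.plot.scatter(x="Age", y="Fare", c=point_colors)
print(point_colors)
```

The class names stay in the DataFrame, so there is nothing to convert back afterwards.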
Now, our points are all color coordinated. But I don’t like this. It doesn’t have a nice legend to read. Instead, we should convert this data into a Categorical column. To do this, let’s first get our data back into First, Second, and Third class format.
df.loc[(df.Pclass == "red"),'Pclass']="First"
df.loc[(df.Pclass == "blue"),'Pclass']="Second"
df.loc[(df.Pclass == "green"),'Pclass']="Third"
Now, let’s try this again by first converting Pclass into a Categorical type.
df['Pclass'] = df.Pclass.astype('category')
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis")
<AxesSubplot:xlabel='Age', ylabel='Fare'>
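It can help to look at what .astype('category') actually stores. The sketch below, which assumes a small hypothetical Series named `pclass`, shows that a Categorical column keeps a short list of categories plus an integer code for each row, which is what lets the plot treat the classes as discrete groups:

```python
import pandas as pd

# Hypothetical stand-in for the string-valued Pclass column
pclass = pd.Series(["Third", "First", "Third", "First", "Second"])

as_category = pclass.astype("category")

# The distinct labels, sorted alphabetically by default
categories = as_category.cat.categories.tolist()

# One integer per row, pointing into the list of categories
codes = as_category.cat.codes.tolist()

# If the order matters, it can be stated explicitly instead of relying on sorting
ordered = pd.Series(
    pd.Categorical(pclass, categories=["First", "Second", "Third"], ordered=True)
)

print(categories, codes, ordered.cat.ordered)
```

Here the alphabetical order happens to match the order of the classes. With other labels it may not, which is when spelling out the categories is worth the extra line.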
Now, like magic, we have precisely what we want to see. But we can do even better! Let’s say we don’t like the size of the nodes (points) on the graph. We want to see smaller nodes to distinguish better between the points. We can pass another keyword argument, s, which stands for size. This expects a number; the larger the number, the larger each point.
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5)
<AxesSubplot:xlabel='Age', ylabel='Fare'>
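The s argument is not limited to a single number. As a minimal sketch, assuming a hypothetical five-row DataFrame named `df_small` and an arbitrary scaling chosen only for illustration, we can pass one size per row so that larger fares are drawn as larger points:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a plain script; omit in a notebook
import pandas as pd

# Hypothetical miniature of the Titanic data, using rows shown in the table above
df_small = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 13.0],
})

# One size per row: scale fares so the most expensive ticket gets size 100
# (the factor of 100 is an arbitrary choice for this sketch)
sizes = (df_small["Fare"] / df_small["Fare"].max() * 100).tolist()

ax = df_small.plot.scatter(x="Age", y="Fare", s=sizes)
print(sizes)
```

For the rest of this notebook a single value such as s=5 is all we need.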
To make it a bit easier to read, let’s also adjust the size of the figure a bit. We can do this by passing the keyword argument, figsize, that we saw above with pie charts.
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5, figsize=(15,5))
<AxesSubplot:xlabel='Age', ylabel='Fare'>
By now, you should have a good sense of how to create simple bar, pie, and scatter charts. In the next few notebooks, we will be looking at other ways of leveraging Pandas to produce visualizations, such as using plotly and social networks with networkx.