10. Graphing Network Data with Pandas

Dr. W.J.B. Mattingly
Smithsonian Data Science Lab and United States Holocaust Memorial Museum
August 2021

10.1. Covered in this Chapter

  1. How to Get Network Data from Pandas to NetworkX

  2. How to Graph the Data

  3. How to Customize the Graph

10.2. Getting the Data from Pandas to NetworkX

Pandas on its own cannot plot network data. Instead, we must rely on two other libraries, NetworkX and Matplotlib. NetworkX is the standard Python library for working with networks. I have a forthcoming textbook, like this one, that walks users through NetworkX. Matplotlib is one of the standard plotting libraries. The purpose of this brief notebook is to provide the code needed to take a network stored in a Pandas DataFrame and, with NetworkX and Matplotlib, turn its relationships into a graph.

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

Let’s now load our data and see what it looks like.

df = pd.read_csv("data/network.csv")
df
   source  target
0     Tom    Rose
1    Rose  Rosita
2   Jerry    Jeff
3    Jeff   Larry
4  Carmen  Carmen
5  Rosita  Rosita
6   Larry  Carmen
7   Larry   Jerry
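
If the file data/network.csv is not at hand, a minimal sketch like the following rebuilds the same eight rows directly in Pandas so the rest of the chapter can still be followed. The values are copied from the output above; the CSV itself is not needed.

```python
import pandas as pd

# Stand-in for data/network.csv: the eight source/target pairs shown above,
# typed in by hand so the example runs without the file.
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)
print(df)
```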

This is a pretty standard format for network data: each row is one edge (a connection), described by a source column and a target column. Imagine drawing a line to represent a connection: the source is where you start drawing the line and the target is where the line ends. This is known as directionality in network theory (a network whose edges have a direction is called a directed graph), and it is important for understanding the relationships between nodes, the individual points in a network graph.
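
To see source and target as a direction rather than just a pair, here is a small sketch with NetworkX's DiGraph (directed graph) class, using one edge from the data above. In a directed graph the target is a successor of the source, and the reverse edge does not exist unless it is added separately.

```python
import networkx as nx

# A directed graph keeps the order of each pair: source first, target second
D = nx.DiGraph()
D.add_edge("Tom", "Rose")  # the line is drawn from Tom (source) to Rose (target)

print(list(D.successors("Tom")))     # ['Rose']
print(list(D.predecessors("Rose")))  # ['Tom']
print(D.has_edge("Rose", "Tom"))     # False: the reverse direction was never added
```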

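We can use NetworkX’s built-in function from_pandas_edgelist() to read those two columns as an edge list and turn them straight into a NetworkX graph. By default, it builds an undirected graph, so the direction of each edge is not kept.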

G = nx.from_pandas_edgelist(df, "source", "target")
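
A quick sanity check on what from_pandas_edgelist() produced is to count nodes and edges. The sketch below rebuilds the chapter's data inline and, for comparison, also builds a directed version by passing create_using=nx.DiGraph, which keeps each row's source-to-target direction instead of producing the default undirected nx.Graph. The variable name D is just an illustrative choice.

```python
import pandas as pd
import networkx as nx

# The chapter's eight edges, typed in so this sketch runs without the CSV
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)

# Default: an undirected nx.Graph, so source/target order is not kept
G = nx.from_pandas_edgelist(df, "source", "target")
# Directed alternative: each row becomes an arrow from source to target
D = nx.from_pandas_edgelist(df, "source", "target", create_using=nx.DiGraph)

print(G.is_directed(), D.is_directed())          # False True
print(G.number_of_nodes(), G.number_of_edges())  # 7 8
print(nx.number_of_selfloops(G))                 # 2 (Carmen and Rosita link to themselves)
print(G.has_edge("Jerry", "Larry"), D.has_edge("Jerry", "Larry"))  # True False
```

In the directed version only Larry → Jerry exists, because that is the order in which the pair appears in the DataFrame.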

10.3. Graphing the Data

With just two more lines of code, we can plot the graph.

nx.draw(G)
plt.show()
[Figure: the network drawn by nx.draw(G), without node labels]
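
When no positions are given, nx.draw() computes a spring layout from random starting positions, so the picture can change each time the cell runs. A minimal sketch, assuming you want a repeatable picture: compute the positions once with a fixed seed and pass them in. The seed value 42 is arbitrary.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# The chapter's eight edges, typed in so this sketch runs without the CSV
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)
G = nx.from_pandas_edgelist(df, "source", "target")

# Fixed seed: the same positions every run, so the drawing stays stable
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos)
plt.show()
```

The same pos dictionary can be reused in later calls so every version of the figure keeps the nodes in the same places.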

10.4. Customize the Graph

There is a problem with the image above, however: it is difficult to tell who the nodes represent. Let’s give them some labels.

nx.draw(G, with_labels=True)
plt.show()
[Figure: the same network with each node labeled by name]
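
The labels parameter offers finer control than with_labels=True alone: it takes a dictionary mapping nodes to the text shown, and only nodes that appear as keys are labeled. Here is a minimal sketch that labels a single node in a larger font. Labeling only Larry, and the font size of 14, are arbitrary choices for illustration.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# The chapter's eight edges, typed in so this sketch runs without the CSV
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)
G = nx.from_pandas_edgelist(df, "source", "target")

# Only nodes present as keys get a label; all other nodes stay unlabeled
labels = {"Larry": "Larry"}
fig, ax = plt.subplots()
nx.draw(G, ax=ax, labels=labels, with_labels=True, font_size=14)
plt.show()
```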

Now that we have labels, we need to make them a bit easier to read. We can do this by changing the font color to “whitesmoke” and setting the background to gray. First, we create a Matplotlib figure object, fig, whose properties we can then change. Next, we draw the network graph, passing the font_color we want. Finally, we set the figure’s facecolor to gray and display the result.

fig = plt.figure()
nx.draw(G, with_labels=True, font_color="whitesmoke")
fig.set_facecolor('gray')
plt.show()
[Figure: the labeled network with whitesmoke labels on a gray background]
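
Two details are worth knowing if you want to keep this figure. First, nx.draw() sets the figure background to white when it runs, which is why set_facecolor() is called after drawing rather than before. Second, depending on the Matplotlib version, savefig() may not reuse the figure's background color (older releases saved on white by default), so passing facecolor explicitly is the safer choice. A minimal sketch; the file name network_gray.png is hypothetical.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# The chapter's eight edges, typed in so this sketch runs without the CSV
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)
G = nx.from_pandas_edgelist(df, "source", "target")

fig = plt.figure()
nx.draw(G, with_labels=True, font_color="whitesmoke")
# Set the background after drawing, since nx.draw() resets it to white
fig.set_facecolor("gray")
# Pass the facecolor explicitly so the saved file keeps the gray background
fig.savefig("network_gray.png", facecolor=fig.get_facecolor())
```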

What if I wanted each node in our network to have its own color? We can do that too: node_color accepts a list with one number per node, and Matplotlib maps those numbers to colors through a colormap.

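# One number per node (0, 1, 2, ...), in the same order as G.nodes
val = list(range(len(G.nodes)))
# set_node_attributes needs a dict keyed by node; a plain list would be
# stored whole on every node
nx.set_node_attributes(G, dict(zip(G.nodes, val)), "val")
fig = plt.figure()
# node_color takes the numbers; Matplotlib maps them through its default colormap
nx.draw(G, with_labels=True, node_color=val, font_color="whitesmoke")
fig.set_facecolor("gray")
plt.show()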
[Figure: the labeled network with each node in a different color on a gray background]
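
Storing the values as a node attribute means they can be read back from the graph itself rather than kept in a separate list. The sketch below does that and chooses the palette explicitly through the cmap parameter. The "tab10" colormap is an arbitrary choice; any Matplotlib colormap would work.

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# The chapter's eight edges, typed in so this sketch runs without the CSV
df = pd.DataFrame(
    {
        "source": ["Tom", "Rose", "Jerry", "Jeff", "Carmen", "Rosita", "Larry", "Larry"],
        "target": ["Rose", "Rosita", "Jeff", "Larry", "Carmen", "Rosita", "Carmen", "Jerry"],
    }
)
G = nx.from_pandas_edgelist(df, "source", "target")

# Store one number per node (0, 1, 2, ...) as the "val" attribute
nx.set_node_attributes(G, {node: i for i, node in enumerate(G.nodes)}, "val")
# Read the values back in G.nodes order so each color lines up with its node
node_values = [G.nodes[node]["val"] for node in G.nodes]

fig = plt.figure()
nx.draw(G, with_labels=True, node_color=node_values, cmap=plt.cm.tab10, font_color="whitesmoke")
fig.set_facecolor("gray")
plt.show()
```

Reading the colors from an attribute keeps them attached to the nodes, which helps if the graph is later filtered or reordered.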