Data Visualization

Blake Tolman
6 min read · Mar 26, 2021

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed.

Python offers several great graphing libraries that come packed with features. Whether you want to create interactive, live or highly customized plots, Python has an excellent library for you.

Common Python plotting libraries:

Matplotlib: Low-level, provides a lot of freedom

Seaborn: High-level interface, great default styles

Plotly: Creates interactive plots

Matplotlib

Matplotlib is a visually simple graphing library that gives the user control over almost every detail of a plot. It works best for basic graphs during initial exploratory data analysis. The basics of Matplotlib carry over to other plotting libraries such as Seaborn, because it is the foundation those libraries are built on. To start exploring Matplotlib, we can create a sample data set of 100 equally spaced values.

After preparing the data, we can use Matplotlib's plot() function to draw the data, legend() to add context to the plot, and finally show() to display it.
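The steps above can be sketched roughly as follows. The range 0 to 10 and the sine curve are assumptions chosen only for illustration; any 100 equally spaced values and any function of them would do.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 equally spaced values; the 0-10 range is an arbitrary choice
x = np.linspace(0, 10, 100)
y = np.sin(x)  # placeholder data, not from the original post

plt.plot(x, y, label="sin(x)")  # draw the line and name it for the legend
plt.legend()                    # show the label as a legend entry
plt.show()                      # display the finished plot
```

The label passed to plot() is what legend() picks up, so a line without a label will not appear in the legend.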

In Jupyter notebooks, you can use the %matplotlib magic with inline to show plots inside the notebook, or with qt to open them in an external, interactive window. Inline is recommended for most needs.

With the sample plot made, labels can be added to give context about the data, using the following functions:

plt.xlabel("text") / plt.ylabel("text") : define labels for the x and y axes

plt.title("text") : define the plot title

These functions can be used together with the legend() function we just saw to add a legend to the plot. The legend function takes an optional keyword argument, loc, that specifies where in the figure the legend is drawn:

plt.legend(loc=1) : upper right corner

plt.legend(loc=2) : upper left corner

plt.legend(loc=3) : lower left corner

plt.legend(loc=4) : lower right corner

Now we can combine all of this into a new, more detailed plot:
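One way to combine the labels, title and legend placement is sketched below. The two curves and all label text are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Two illustrative curves, each with a label for the legend
plt.plot(x, x, label="linear")
plt.plot(x, x**2, label="quadratic")

plt.xlabel("x values")                    # label for the x axis
plt.ylabel("y values")                    # label for the y axis
plt.title("Linear vs. quadratic growth")  # plot title
plt.legend(loc=2)                         # 2 = upper left corner

ax = plt.gca()  # handle to the current axes, handy for inspecting the result
```

Calling plt.show() afterwards displays the figure, as before.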

While a single graph may be all that is needed, there are also scenarios where you want to overlay two plots in one image. This is done with Figures and Axes. Looking at the image above, a figure is the top-level component that refers to the overall image space. Axes are added to the figure to define the area where data is plotted with the plot() function seen above. A figure can have a number of components, such as titles and legends, that further explain and customize the plot. Axes have ticks and labels that give the plot its scale. The axes methods set_xlim(min, max) and set_ylim(min, max) define the limits of the axes in a plot.

Let's see all of the above in action with another plot. Here we declare a new figure space by calling plt.figure(), then use random data values to draw a line graph and a scatter plot on the same axes, i.e. one plot on top of the other. We also set the limits of the x and y dimensions and output the final plot.
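A minimal sketch of such a plot might look like this. The random values, the seed, the colors and the axis limits are all assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # seeded so the figure is reproducible
x = np.linspace(0, 10, 25)
y_line = rng.uniform(0, 10, size=25)     # random values for the line graph
y_scatter = rng.uniform(0, 10, size=25)  # random values for the scatter plot

fig = plt.figure(figsize=(8, 5))  # new figure space
ax = fig.add_subplot(111)         # one axes inside the figure

ax.plot(x, y_line, color="tab:blue", label="line")          # line graph
ax.scatter(x, y_scatter, color="tab:red", label="scatter")  # same axes, drawn on top

ax.set_xlim(0, 10)  # limits of the x dimension
ax.set_ylim(0, 12)  # limits of the y dimension
ax.legend()
```

Because both calls go through the same ax object, the two plots share one coordinate system instead of landing in separate panels.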

Note: the add_subplot(111) call above defines a new axes. It takes three arguments: the number of rows (1), the number of columns (1) and the plot number (1), i.e. a single plot.

To wrap up this overview of Matplotlib, we can finish by creating new line styles. The plotting functions take additional keyword arguments, such as color, linewidth, linestyle and marker, for customizing plots. To change the line width, we can use the linewidth or lw keyword argument. The line style can be selected with the linestyle or ls keyword argument. The following plot summarizes different types of lines you can draw in Matplotlib.
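As a rough sketch of these keyword arguments, the lines below vary width, style and marker one at a time. The particular colors and offsets are arbitrary choices for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 10)
fig, ax = plt.subplots(figsize=(8, 5))

# Line width: linewidth and lw are interchangeable
ax.plot(x, x + 1, color="red", linewidth=0.5, label="linewidth=0.5")
ax.plot(x, x + 2, color="red", lw=2.0, label="lw=2.0")

# Line style: linestyle and ls are interchangeable
ax.plot(x, x + 3, color="green", lw=2, linestyle="--", label='linestyle="--"')
ax.plot(x, x + 4, color="green", lw=2, ls=":", label='ls=":"')

# Markers drawn at each data point
ax.plot(x, x + 5, color="blue", lw=2, ls="-", marker="o", label='marker="o"')
ax.plot(x, x + 6, color="blue", lw=2, ls="-.", marker="s", label='marker="s"')

ax.legend(loc=2)
```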

In the plots above, notice the use of ax.scatter() to generate a scatter plot and ax.plot() to display a line plot. Data must be passed to these functions in the right format and dimensions to avoid errors or unexpected behaviour in the output. The following is a list of similar functions that can be readily used for visualizing data.

.plot() Line plot

.scatter() Scatter plot

.bar() Vertical bar graph

.barh() Horizontal bar graph

.axhline() Horizontal line across axes

.axvline() Vertical line across axes

.stackplot() Stack plot
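A few of the functions in this list can be tried side by side, as in the sketch below. The category names and values are made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # made-up category names
values = np.array([3, 7, 5, 2])    # made-up values

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Vertical bars with a horizontal line at the mean
axes[0].bar(categories, values)
axes[0].axhline(values.mean(), color="black", ls="--")
axes[0].set_title("bar() + axhline()")

# Horizontal bars with a vertical line at the mean
axes[1].barh(categories, values)
axes[1].axvline(values.mean(), color="black", ls="--")
axes[1].set_title("barh() + axvline()")

# Two series stacked on top of each other
x = np.arange(5)
axes[2].stackplot(x, [1, 2, 3, 4, 5], [2, 2, 2, 2, 2], labels=["first", "second"])
axes[2].legend(loc=2)
axes[2].set_title("stackplot()")

fig.tight_layout()
```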

Seaborn

Seaborn is a data visualization library that makes it easy to create professional-quality statistical visualizations with only one or two lines of code. Seaborn also makes it easy to modify the aesthetics of a plot, so that our visualizations are eye-catching and easy to interpret, which takes more effort in Matplotlib. As previously mentioned, Seaborn is built on top of Matplotlib. Whereas Matplotlib provides the basic functionality for creating plots and filling them with different kinds of shapes and colors, Seaborn takes this functionality a step further by providing a set of ready-made statistical visualizations.

Here is a side-by-side comparison of the two, with the Matplotlib plot on the left and the Seaborn plot on the right:

Seaborn is most useful in two particular areas: providing ready-made plots for statistical analysis, and making attractive plots for presenting to others. Seaborn can make the same plots as Matplotlib, although the syntax is slightly different.

Boxplot:

Boxplot Grouped by Categorical Variable:

Complex Boxplot with Nested Grouping:
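The three boxplot variants named above can be sketched with sns.boxplot() as follows. The DataFrame is randomly generated for illustration, and its column names (day, smoker, total_bill) are assumptions, not the data used in the original figures.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Randomly generated stand-in data; column names are hypothetical
rng = np.random.default_rng(seed=0)
n = 120
df = pd.DataFrame({
    "day": rng.choice(["Thu", "Fri", "Sat"], size=n),
    "smoker": rng.choice(["yes", "no"], size=n),
    "total_bill": rng.normal(loc=20, scale=5, size=n),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Single boxplot of one numeric variable
sns.boxplot(x=df["total_bill"], ax=axes[0])
axes[0].set_title("Boxplot")

# One box per category
sns.boxplot(data=df, x="day", y="total_bill", ax=axes[1])
axes[1].set_title("Grouped by categorical variable")

# Boxes split again by a second categorical variable via hue
sns.boxplot(data=df, x="day", y="total_bill", hue="smoker", ax=axes[2])
axes[2].set_title("Nested grouping")

fig.tight_layout()
```

Each variant is a single call, and passing ax= places the result on an ordinary Matplotlib axes, so the labeling and limit functions shown earlier still apply.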

Seaborn is a relatively user-friendly library and a common favorite of those performing data visualization. The same graph customizations seen in Matplotlib can also be applied in Seaborn.

Plotly

Plotly is a great interactive plotting library, but it requires more in-depth knowledge to be used effectively. For the purposes of this blog we can display a graph to show its features, but will leave out the intricacies of its formatting. In the example scenario below, we plot points on top of a live map. Each point on the map represents a well in Tanzania, positioned by longitude and latitude, with color showing its functionality (functional, functional needs repair, or non-functional) and marker size representing population size.

The map lets the user adjust the zoom and position, and click on any point to see its expanded details.

Plotly is great at making highly technical graphs with various levels of interactivity, but it requires more knowledge to use well. Between Matplotlib, Seaborn and Plotly, just about any graph can be made. There are, however, even more plotting libraries available, and the choice comes down to what fits the user's needs and what is most comfortable to use. Plotting data is the first step in identifying the trends within it.
