Introduction

Data visualization is at the heart of data science! It is an essential task in data exploration and analysis. Choosing the right visualization is vital for understanding the data, uncovering patterns, and communicating insights.

Matplotlib is a popular and widely used Python plotting library. It is possibly the easiest way to plot data in Python. It also provides some interactive features such as zoom, pan and update. The functionality of Matplotlib can also be extended with many third-party packages such as Cartopy and Seaborn. Matplotlib is very powerful for creating aesthetic, publication-quality plots, but the figures are usually static.

Plotly is a Python library for interactive plotting. The value of interactive data visualization is apparent when analyzing large datasets with numerous features. Another advantage of Plotly over Matplotlib is that aesthetically pleasing plots can be created with a few lines of code. With Plotly, over 40 beautiful interactive web-based visualizations can be displayed in a Jupyter notebook or saved to HTML files.

This notebook provides code-based examples of how to create interactive plots using Plotly.

The hilarious image below describes some fundamental types of plots for data visualization.

Dataset

Three datasets are employed in this notebook to demonstrate the different plot types.

We will be using the in situ snow depth data collected during the SnowEx 2020 Intensive Operation Period (IOP) in Grand Mesa, Colorado. Snow depth was measured using one of three instruments: Magnaprobe, Mesa 2, or pit ruler. Pit ruler data were collected from 150 snow pits identified for the Grand Mesa IOP. Check the SnowEx20 Depth Probe Landing Page and the User’s Guide for more info.

We will also use the gapminder dataset. The third dataset will be scraped from Wikipedia.

Prep Data for Analysis

The data has 37921 records of snow depth and 13 columns. Let's check the data types.

The pit ruler was used to measure snow depth at the 150 snow pits. The other two measurement tools were used to collect snow depth along spiral tracks moving outwards from each snow pit location. Let's check the number of records for each tool.

It appears there are 148 snow pits, not 150, although the NSIDC platform says there are 150. Let's select the Pit Ruler (PR) records.
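The checks described in this section (data types, records per tool, and selecting the pit ruler rows) can be sketched on a small toy table. The column names `tool`, `depth_cm` and `pit_id` and all values below are hypothetical stand-ins, not the actual SnowEx columns or measurements.

```python
import pandas as pd

# Toy stand-in for the snow depth table.
# Assumption: column names and values are made up for illustration only.
depths = pd.DataFrame({
    "tool": ["MP", "MP", "M2", "PR", "PR", "MP"],
    "depth_cm": [92, 105, 88, 97, 110, 101],
    "pit_id": ["A", "A", "B", "A", "B", "B"],
})

# Data type of each column
print(depths.dtypes)

# Number of records collected with each measurement tool
counts = depths["tool"].value_counts()
print(counts)

# Keep only the Pit Ruler (PR) records and count the distinct pits they cover
pit_ruler = depths[depths["tool"] == "PR"]
n_pits = pit_ruler["pit_id"].nunique()
print(n_pits)
```

With the real data, the same three steps would be applied to whatever the tool and pit identifier columns are actually called.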

Let's see what plotting with the DataFrame plot method and with Matplotlib looks like. Let's recall the types of data visualization charts.

As we can see in the above plots, there is no way to interactively check the value of each depth plotted. At least four lines of code are required to produce the plots. Let's see what we can do with Plotly.

Simple

Scatter Plots

A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed [1].

As we can see from the cell above, an aesthetic interactive plot is produced with just two lines of code. We can also change the hover name.

Bar Charts

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value [1].

Let's visualize the growth in population of Nigeria over time

Line Plots

A line plot is a type of graph that shows the relationship between two variables with a line that connects a series of successive data points. It is similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments. A line chart is often used to visualize a trend in data over intervals of time (a time series), so the line is often drawn chronologically [1].

There are two unique countries in Oceania: Australia and New Zealand! Let's compare the life expectancy of these countries over the years.

Pie Charts

A pie chart is a circular statistical chart, which is divided into sectors to illustrate numerical proportion. [1]

Colormaps

Distribution

Histograms

A histogram is an approximate representation of the distribution of numerical data.

As seen above, the snow depth distribution is symmetric. A histogram can also be used to show the counts of a categorical feature.

The MP tool was used to record most of the points outward from the 148 snow pits where PR measurements were recorded.

Boxplots

A box plot is a statistical representation of numerical data through their quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box.

If we hover over the above plot, we see that the maximum, median and minimum values of snow depth are 260 cm, 96 cm and 17 cm. We can plot the distribution of measurements for each measurement tool by passing the column of interest as the x argument.

Violin Plots

A violin plot is a statistical representation of numerical data. Violin plots are similar to box plots, except that they also show the probability density of the data at different values, usually smoothed by a kernel density estimator.

Density Heatmap

3D Maps

3D Scatter Plots

3D scatter plots are used to plot data points on three axes in an attempt to show the relationship between three variables. Here we will show the relationships between life expectancy, population and GDP per capita for a regional subset of countries in the gapminder dataset.

3D Line Plots

Map Plot

We will put the total population of each country on the map. Let's first sum the population for each country in the data.

Map Scatter Plot

References

  1. “Bar Chart.” Wikipedia: The Free Encyclopedia, Wikimedia Foundation, Inc., 3 June 2021, en.wikipedia.org/wiki/Bar_chart. Accessed 6 Aug. 2021.