Beginners guide to Matplotlib
Matplotlib is the most popular visualization package of python. It is very flexible and agile. It provides an interactive and comprehensive way to visualize our data. From scatterplot to histograms matplotlib provides a wide variety of ways to present our findings with a lot of customization. This library is very useful when you want to create really attractive and informative charts.
Histogram is another type of plot which is also one of the most used plots in data science. An example of histogram is shown below:
“Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.”
You can design all kinds of charts using matplotlib. We will be discussing some popular kinds of charts in this article. This article contains detailed information on starting with matplotlib.
Installing Matplotlib
python -m pip install -U pip python -m pip install -U matplotlib
Now to import matplotlib in your python program follow this:
import matplotlib.pyplot as plt import numpy as np
Now a simple example of a chart using matplotlib is:
import matplotlib.pyplot as pltyear = [1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005]car = [2, 4, 6, 8, 9, 10, 12, 13]plt.plot(year, car)plt.show()
Another main plot is scatter plot. you can draw scatter plot by simply doing this in the above code:
import matplotlib.pyplot as pltyear = [1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005]car = [2, 4, 6, 8, 9, 10, 12, 13]plt.scatter(year, car)plt.show()
import matplotlib.pyplot as plt
plt.plot([1,2,3,4,5], [1,2,3,4,10], 'go', label='GreenDots')
plt.plot([1,2,3,4,5], [2,3,4,5,11], 'b*', label='Bluestars')
plt.title('A Simple Scatterplot')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc='best') # legend text comes from the plot's label parameter.
plt.show()
import matplotlib.pyplot as plt
# Create Figure and Subplots
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,4), sharey=True, dpi=120)
# Plot
ax1.plot([1,2,3,4,5], [1,2,3,4,10], 'go') # greendots
ax2.plot([1,2,3,4,5], [2,3,4,5,11], 'b*') # bluestart
# Title, X and Y labels, X and Y Lim
ax1.set_title('Scatterplot Greendots'); ax2.set_title('Scatterplot Bluestars')
ax1.set_xlabel('X'); ax2.set_xlabel('X') # x label
ax1.set_ylabel('Y'); ax2.set_ylabel('Y') # y label
ax1.set_xlim(0, 6) ; ax2.set_xlim(0, 6) # x axis limits
ax1.set_ylim(0, 12); ax2.set_ylim(0, 12) # y axis limits
# ax2.yaxis.set_ticks_position('none')
plt.tight_layout()
plt.show()
A lined plot can also be created in similar ways.
Also, multi-colored and multi styles can be obtained like below:
import matplotlib.pyplot as plt
import numpy as np
# plt.style.use('seaborn-notebook')
plt.figure(figsize=(10,7), dpi=80)
X = np.linspace(0, 2*np.pi, 1000)
sine = plt.plot(X,np.sin(X)); cosine = plt.plot(X,np.cos(X))
sine_2 = plt.plot(X,np.sin(X+.5)); cosine_2 = plt.plot(X,np.cos(X+.5))
plt.gca().set(ylim=(-1.25, 1.5), xlim=(-.5, 7))
plt.title('Custom Legend Example', fontsize=18)
# Modify legend
plt.legend([sine[0], cosine[0], sine_2[0], cosine_2[0]], # plot items
['sine curve', 'cosine curve', 'sine curve 2', 'cosine curve 2'],
frameon=True, # legend border
framealpha=1, # transparency of border
ncol=2, # num columns
shadow=True, # shadow on
borderpad=1, # thickness of border
title='Sines and Cosines') # title
plt.show()
We can customize the graph by adding plots arrows texts as below:
import matplotlib.pyplot as plt
import numpy as np
# Texts, Arrows and Annotations Example
# ref: https://matplotlib.org/users/annotations_guide.html
plt.figure(figsize=(14,7), dpi=120)
X = np.linspace(0, 8*np.pi, 1000)
sine = plt.plot(X,np.sin(X), color='tab:blue');
# 1. Annotate with Arrow Props and bbox
plt.annotate('Peaks', xy=(90/57.2985, 1.0), xytext=(90/57.2985, 1.5),
bbox=dict(boxstyle='square', fc='green', linewidth=0.1),
arrowprops=dict(facecolor='green', shrink=0.01, width=0.1),
fontsize=12, color='white', horizontalalignment='center')
# 2. Texts at Peaks and Troughs
for angle in [440, 810, 1170]:
plt.text(angle/57.2985, 1.05, str(angle) + "\ndegrees", transform=plt.gca().transData, horizontalalignment='center', color='green')
for angle in [270, 630, 990, 1350]:
plt.text(angle/57.2985, -1.3, str(angle) + "\ndegrees", transform=plt.gca().transData, horizontalalignment='center', color='red')
plt.gca().set(ylim=(-2.0, 2.0), xlim=(-.5, 26))
plt.title('Annotations and Texts Example', fontsize=18)
plt.show()
It looks like below:
Subplots can also be added in the plots. It can be added in the following ways:
# Plot inside a plot
plt.style.use('seaborn-whitegrid')
fig, ax = plt.subplots(figsize=(10,6))
x = np.linspace(-0.50, 1., 1000)
# Outer Plot
ax.plot(x, x**2)
ax.plot(x, np.sin(x))
ax.set(xlim=(-0.5, 1.0), ylim=(-0.5,1.2))
fig.tight_layout()
# Inner Plot
inner_ax = fig.add_axes([0.2, 0.55, 0.35, 0.35]) # x, y, width, height
inner_ax.plot(x, x**2)
inner_ax.plot(x, np.sin(x))
inner_ax.set(title='Zoom In', xlim=(-.2, .2), ylim=(-.01, .02),
yticks = [-0.01, 0, 0.01, 0.02], xticks=[-0.1,0,.1])
ax.set_title("Plot inside a Plot", fontsize=20)
plt.show()
mpl.rcParams.update(mpl.rcParamsDefault) # reset to defaults
Bubble plot is also one of the most popular plots. The data in the plot below is taken from an online page. It looks like below:
# Import data
import pandas as pd
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")
# Plot
fig = plt.figure(figsize=(14, 9), dpi= 80, facecolor='w', edgecolor='k')
colors = plt.cm.tab20.colors
categories = np.unique(midwest['category'])
for i, category in enumerate(categories):
plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5)
# Legend for size of points
for dot_size in [100, 300, 1000]:
plt.scatter([], [], c='k', alpha=0.5, s=dot_size, label=str(dot_size) + ' TotalPop')
plt.legend(loc='upper right', scatterpoints=1, frameon=False, labelspacing=2, title='Saravana Stores', fontsize=8)
plt.title("Bubble Plot of PopTotal vs Area\n(color: 'category' - a categorical column in midwest)", fontsize=18)
plt.xlabel('Area', fontsize=16)
plt.ylabel('Poptotal', fontsize=16)
plt.show()
Histogram is another type of plot which is also one of the most used plots in data science. An example of histogram is shown below:
import pandas as pd
# Setup the subplot2grid Layout
fig = plt.figure(figsize=(10, 5))
ax1 = plt.subplot2grid((2,4), (0,0))
ax2 = plt.subplot2grid((2,4), (0,1))
ax3 = plt.subplot2grid((2,4), (0,2))
ax4 = plt.subplot2grid((2,4), (0,3))
ax5 = plt.subplot2grid((2,4), (1,0), colspan=2)
ax6 = plt.subplot2grid((2,4), (1,2))
ax7 = plt.subplot2grid((2,4), (1,3))
# Input Arrays
n = np.array([0,1,2,3,4,5])
x = np.linspace(0,5,10)
xx = np.linspace(-0.75, 1., 100)
# Scatterplot
ax1.scatter(xx, xx + np.random.randn(len(xx)))
ax1.set_title("Scatter Plot")
# Step Chart
ax2.step(n, n**2, lw=2)
ax2.set_title("Step Plot")
# Bar Chart
ax3.bar(n, n**2, align="center", width=0.5, alpha=0.5)
ax3.set_title("Bar Chart")
# Fill Between
ax4.fill_between(x, x**2, x**3, color="steelblue", alpha=0.5);
ax4.set_title("Fill Between");
# Time Series
dates = pd.date_range('2018-01-01', periods = len(xx))
ax5.plot(dates, xx + np.random.randn(len(xx)))
ax5.set_xticks(dates[::30])
ax5.set_xticklabels(dates.strftime('%Y-%m-%d')[::30])
ax5.set_title("Time Series")
# Box Plot
ax6.boxplot(np.random.randn(len(xx)))
ax6.set_title("Box Plot")
# Histogram
ax7.hist(xx + np.random.randn(len(xx)))
ax7.set_title("Histogram")
fig.tight_layout()
We can also draw plots with multiple axis. It can be done in the following ways. Here data is taken from a GitHub account of one popular articles writer:
# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")
x = df['date']; y1 = df['psavert']; y2 = df['unemploy']
# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1,1,figsize=(16,7), dpi= 80)
ax1.plot(x, y1, color='tab:red')
# Plot Line2 (Right Y Axis)
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2, color='tab:blue')
# Just Decorations!! -------------------
# ax1 (left y axis)
ax1.set_xlabel('Year', fontsize=20)
ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
# ax2 (right Y axis)
ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=20)
ax2.set_xticks(np.arange(0, len(x), 60))
ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10})
plt.show()
Post a Comment