To create a scatter plot, specify the kind argument as 'scatter'.
A scatter plot requires both an x-axis and a y-axis.
In the example below, we’ll use “Duration” for the x-axis and “Calories” for the y-axis by including the x and y arguments as follows: x = 'Duration', y = 'Calories'.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')

plt.show()
Note: In the previous example, we found a correlation of 0.922721 between “Duration” and “Calories,” which led to the conclusion that a longer duration results in more calories burned. The scatter plot supports this conclusion.
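To check a correlation value like the one quoted in the note, one option is to compute it directly for a pair of columns. The sketch below is only an illustration: it builds a small, made-up DataFrame as a stand-in for data.csv (the values are hypothetical and will not reproduce the figures above), and it assumes the Pearson correlation, which is the default for the pandas Series.corr method.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; these values are invented for illustration.
df = pd.DataFrame({
    "Duration": [30, 45, 60, 60, 90],
    "Calories": [200.0, 300.0, 410.0, 400.0, 610.0],
    "Maxpulse": [130, 120, 135, 118, 128],
})

# Series.corr computes the Pearson correlation by default.
strong = df["Duration"].corr(df["Calories"])
weak = df["Duration"].corr(df["Maxpulse"])

print(f"Duration vs Calories: {strong:.3f}")
print(f"Duration vs Maxpulse: {weak:.3f}")
```

A value close to 1 suggests the points in the scatter plot will line up along a rising trend, while a value close to 0 suggests a cloud of points with no clear direction.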
Let’s create another scatter plot, this time for columns with a weak relationship, such as “Duration” and “Maxpulse,” which have a correlation of 0.009403.
A scatter plot where there is no relationship between the columns:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')

plt.show()