Visual Example of a High R-Squared Value (0.79)

However, when we plot Duration against Calorie Burnage, the R-Squared value increases to 0.79. In this case, the data points lie much closer to the linear regression line.

[Figure: scatter plot of Duration against Calorie_Burnage with the linear regression line (high R-Squared)]

Here is the Python code:

Example

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the health data set; the first row holds the column names
full_health_data = pd.read_csv("data.csv", header=0, sep=",")

x = full_health_data["Duration"]
y = full_health_data["Calorie_Burnage"]

# Fit the linear regression line to the data
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted Calorie_Burnage for every Duration value
mymodel = list(map(myfunc, x))

print(mymodel)

# Plot the observations and the regression line
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.ylim(ymin=0, ymax=2000)
plt.xlim(xmin=0, xmax=200)
plt.xlabel("Duration")
plt.ylabel("Calorie_Burnage")

plt.show()
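
The script above draws the fitted line but does not print the R-Squared value itself. As a minimal sketch of what that number measures, the following computes R-Squared by hand in plain Python. The data points are made up for illustration (they are not taken from data.csv), and the helper names fit_line and r_squared are hypothetical.

```python
# Illustrative values only; NOT taken from data.csv.
duration = [30, 45, 60, 75, 90, 120]
calories = [180, 260, 340, 400, 520, 650]

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y):
    """Share of the variation in y that the fitted line explains."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Squared distances between the observations and the line
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    # Squared distances between the observations and their mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

slope, intercept = fit_line(duration, calories)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"R-Squared={r_squared(duration, calories):.3f}")
```

The closer the points lie to the line, the smaller ss_res becomes and the closer R-Squared gets to 1. For a regression with a single explanatory variable, the same quantity can be obtained as r ** 2 from the r value returned by stats.linregress(x, y).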

Summary – Predicting Calorie_Burnage with Average_Pulse

We can summarize the linear regression function with Average Pulse as the explanatory variable as follows:

  • The coefficient is 0.3296: each one-unit increase in Average Pulse raises the predicted Calorie Burnage by only about 0.33, which is a very small effect.
  • The high P-value (0.824) means that we cannot conclude that there is a relationship between Average Pulse and Calorie Burnage.
  • The R-Squared value of 0 means that the linear regression line does not fit the data well.
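
The reasoning in this summary can be sketched as a small helper that turns the three numbers into plain-language notes. This is only an illustration: the function name summarize_fit is hypothetical, and the significance level of 0.05 is an assumed convention that the lesson does not state.

```python
ALPHA = 0.05  # assumed significance level; the lesson does not state one

def summarize_fit(coefficient, p_value, r_squared, alpha=ALPHA):
    """Return plain-language notes on a simple linear regression result."""
    notes = [
        "A one-unit increase in the explanatory variable changes "
        f"the prediction by {coefficient}."
    ]
    if p_value < alpha:
        notes.append(f"P-value {p_value} < {alpha}: evidence of a relationship.")
    else:
        notes.append(f"P-value {p_value} >= {alpha}: no relationship can be concluded.")
    notes.append(f"The line explains about {r_squared:.0%} of the variation.")
    return notes

# Values reported above for Average_Pulse as the explanatory variable
for note in summarize_fit(coefficient=0.3296, p_value=0.824, r_squared=0.0):
    print(note)
```

Calling the same helper with the values from another fit, such as the Duration model, makes it easy to compare the two explanatory variables side by side.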