However, when we plot Duration against Calorie_Burnage, the R-squared value is higher. The data points lie much closer to the linear regression line.
Here is the Python code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the data set
full_health_data = pd.read_csv("data.csv", header=0, sep=",")

x = full_health_data["Duration"]
y = full_health_data["Calorie_Burnage"]

# Fit a least-squares line: Calorie_Burnage = slope * Duration + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted Calorie_Burnage for every observed Duration
mymodel = list(map(myfunc, x))
print(mymodel)

# Scatter plot of the data with the fitted regression line on top
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.ylim(bottom=0, top=2000)
plt.xlim(left=0, right=200)
plt.xlabel("Duration")
plt.ylabel("Calorie_Burnage")
plt.show()
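To put a number on "closer to the regression line", the R-squared value can be computed directly. For a simple linear regression it equals r ** 2, where r is the correlation returned by stats.linregress in the code above. The sketch below is a minimal pure-Python version under stated assumptions. The function name r_squared and the toy lists are hypothetical and are not taken from data.csv. It computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares, so points that hug the line give a value near 1.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y = slope * x + intercept (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) and total variation in y
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, NOT from data.csv
x = [30, 45, 60, 75, 90]
tight = [250, 380, 490, 620, 740]  # points close to a straight line
loose = [250, 600, 300, 700, 400]  # points scattered around the line

print(round(r_squared(x, tight), 3))
print(round(r_squared(x, loose), 3))
```

The "tight" series scores much higher than the "loose" one, which mirrors the comparison between Duration and Average_Pulse as explanatory variables.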
We can summarize the linear regression function with Average Pulse as the explanatory variable as follows: