Course: Data Science

Linear Regression Using One Explanatory Variable

In this example, we will use Linear Regression to predict Calorie Burnage based on Average Pulse.

Example

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Load the health data set; the first row holds the column names
full_health_data = pd.read_csv("data.csv", header=0, sep=",")

x = full_health_data["Average_Pulse"]
y = full_health_data["Calorie_Burnage"]

# Fit a straight line to the data by least squares
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    # Predicted Calorie_Burnage for a given Average_Pulse
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.ylim(0, 2000)
plt.xlim(0, 200)
plt.xlabel("Average_Pulse")
plt.ylabel("Calorie_Burnage")
plt.show()

Example Explained:

  1. Import the necessary modules: pandas, Matplotlib, and SciPy.
  2. Isolate Average_Pulse as x and Calorie_Burnage as y.
  3. Calculate the key values of the regression: slope, intercept, r, p, std_err = stats.linregress(x, y).
  4. Create a function that uses the slope and intercept to return a new value. This value indicates where the corresponding x value falls on the y-axis.
  5. Apply the function to each value in the x array. The result is a new list of predicted y-axis values: mymodel = list(map(myfunc, x)).
  6. Draw the original scatter plot: plt.scatter(x, y).
  7. Draw the linear regression line: plt.plot(x, mymodel).
  8. Set the minimum and maximum values of both axes.
  9. Label the axes as “Average_Pulse” and “Calorie_Burnage”.

Output:

[Figure: scatter plot of Average_Pulse against Calorie_Burnage with the fitted regression line]
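
The fitting and prediction steps can also be tried without data.csv. The sketch below is only an illustration: the pulse and calorie values are made up, and they are chosen to lie exactly on a straight line so that the fitted slope and intercept are easy to check by eye. The names average_pulse, calorie_burnage, vectorized, and predicted are assumptions introduced for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical values for illustration only (not taken from data.csv).
# They lie exactly on the line y = 2 * x + 80.
average_pulse = np.array([80.0, 90.0, 100.0, 110.0, 120.0, 130.0])
calorie_burnage = 2.0 * average_pulse + 80.0

# Fit the regression line, as in step 3
slope, intercept, r, p, std_err = stats.linregress(average_pulse, calorie_burnage)

def myfunc(x):
    # Predicted Calorie_Burnage for a given Average_Pulse
    return slope * x + intercept

# Step 5: apply the function to every x value
mymodel = list(map(myfunc, average_pulse))

# The same predictions computed in one array expression
vectorized = slope * average_pulse + intercept

# Use the fitted line to predict a value for a pulse that is not in the data
predicted = myfunc(135.0)

print("slope:", slope)
print("intercept:", intercept)
print("predicted Calorie_Burnage at pulse 135:", predicted)
```

With real measurements the points will not fall exactly on the line, so the slope and intercept will not be round numbers, and r will show how closely the line follows the data.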