Best fit line python numpy


You can use the following basic syntax to plot a line of best fit in Python:

#find line of best fit
a, b = np.polyfit(x, y, 1)

#add points to plot
plt.scatter(x, y)

#add line of best fit to plot
plt.plot(x, a*x+b)

The following examples show how to use this syntax in practice.

Example 1: Plot Basic Line of Best Fit in Python

The following code shows how to plot a basic line of best fit in Python:

import numpy as np
import matplotlib.pyplot as plt

#define data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 5, 6, 7, 9, 12, 16, 19])

#find line of best fit
a, b = np.polyfit(x, y, 1)

#add points to plot
plt.scatter(x, y)

#add line of best fit to plot
plt.plot(x, a*x+b)

#display the plot
plt.show()


Example 2: Plot Custom Line of Best Fit in Python

The following code shows how to create the same line of best fit as the previous example except with the following additions:

  • Customized colors for the points and the line of best fit
  • Customized style and width for the line of best fit
  • The equation of the fitted regression line displayed on the plot

import numpy as np
import matplotlib.pyplot as plt

#define data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 5, 6, 7, 9, 12, 16, 19])

#find line of best fit
a, b = np.polyfit(x, y, 1)

#add points to plot
plt.scatter(x, y, color='purple')

#add line of best fit to plot
plt.plot(x, a*x+b, color='steelblue', linestyle='--', linewidth=2)

#add fitted regression equation to plot
plt.text(1, 17, 'y = ' + '{:.2f}'.format(b) + ' + {:.2f}'.format(a) + 'x', size=14)

#display the plot
plt.show()


Feel free to place the fitted regression equation at whatever (x, y) coordinates you would like on the plot.

For this particular example, we chose (x, y) = (1, 17).
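
The label in Example 2 is assembled by hand and always prints a plus sign before the slope. As a minimal sketch, the string can be built by a small helper that also handles a negative slope. The name format_line_equation is hypothetical (it is not part of NumPy or Matplotlib), and the data is the same as in the examples above:

```python
import numpy as np

def format_line_equation(slope, intercept, digits=2):
    """Build a label such as 'y = 1.00 + 2.00x' (hypothetical helper)."""
    # choose the sign explicitly so a negative slope does not print '+ -'
    sign = '+' if slope >= 0 else '-'
    return f'y = {intercept:.{digits}f} {sign} {abs(slope):.{digits}f}x'

#define data (same as the examples above)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 5, 6, 7, 9, 12, 16, 19])

#find line of best fit (polyfit returns the slope first, then the intercept)
a, b = np.polyfit(x, y, 1)

#build the equation label
label = format_line_equation(a, b)
print(label)
```

The resulting string can then be passed to plt.text(1, 17, label, size=14) in place of the concatenated string.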


In this Python tutorial, we will discuss how to plot the best-fit line in Matplotlib, and we will also cover the following topics:

  • Best fit line
  • Matplotlib best fit line
  • Matplotlib best fit line using numpy.polyfit()
  • Matplotlib best fit line histogram
  • Matplotlib best fit curve
  • Matplotlib best fit line to scatter

The best fit line in a 2-dimensional graph is the line that best describes the relationship between the x-axis and y-axis coordinates of the data points plotted as a scatter plot on the graph.

The best fit line can be found by minimizing the distances of the data points from the proposed line. In the least squares method used in this post, the quantity minimized is the sum of the squared vertical distances.
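
To make the idea of minimizing distances concrete, here is a small sketch that compares the sum of squared vertical distances for the least squares line with the same quantity for two other lines. The data and the two alternative lines are arbitrary choices made for illustration, and sum_squared_error is a hypothetical helper name:

```python
import numpy as np

# Illustrative data (an assumption, chosen only for this sketch)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 5, 6, 7, 9, 12, 16, 19])

def sum_squared_error(slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

# Least squares line (slope first, then intercept)
m, c = np.polyfit(x, y, 1)
sse_best = sum_squared_error(m, c)

# Two arbitrary alternative lines: a steeper one and a shifted one
sse_steeper = sum_squared_error(m + 0.5, c)
sse_shifted = sum_squared_error(m, c + 1.0)

print(sse_best, sse_steeper, sse_shifted)
```

Any change to the slope or the intercept of the least squares line increases the sum of squared distances, which is what makes that line the best fit in this sense.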

A linear equation represents a line mathematically. The general form of the equation of a line is as follows:

(A * x) + (B * y) + C = 0

  • Here, x and y are the variables that represent the x-axis and y-axis values of data points.
  • A and B are the coefficients of the variables x and y, and C is the constant. Collectively, these are known as the parameters of the line, which determine the line’s orientation and position on the graph.

However, the most commonly used form of a line is the slope-intercept form, which is as follows:

y = (m * x) + c

  • Here, x and y are the variables that represent the x-axis and y-axis values of data points.
  • m is the coefficient of the variable x which represents the slope of the line on the graph. Slope is the parameter of the line that decides the angle of the line on the graph.
  • c is the constant value that represents the y-intercept of the line on the graph. Intercept is the parameter of the line that decides the position of the line on the graph.

We can convert the general form to the slope-intercept form as follows (assuming B is not 0):

(A * x) + (B * y) + C = 0

(B * y) = -C - (A * x)

y = (-(A * x) - C) / B

y = ((-A / B) * x) + (-C / B)

On comparing this equation with the slope-intercept form of a line, we get m = (-A / B) and c = (-C / B).
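
The conversion can be sketched as a short function. The name general_to_slope_intercept is hypothetical, and the check for B = 0 is an added assumption: a vertical line has no slope-intercept form.

```python
def general_to_slope_intercept(A, B, C):
    """Convert (A * x) + (B * y) + C = 0 into y = (m * x) + c.

    Hypothetical helper for illustration; returns the pair (m, c).
    """
    if B == 0:
        # x = -C / A is a vertical line, so the slope is undefined
        raise ValueError('B must be non-zero to use the slope-intercept form')
    m = -A / B
    c = -C / B
    return m, c

# Example: 2x - y + 3 = 0 is the same line as y = 2x + 3
m, c = general_to_slope_intercept(2, -1, 3)
print(m, c)
```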

We will be using the slope-intercept form of the line throughout this post.

The most commonly used method to find the parameters of the line that best fits the given data points is the least squares method of regression analysis.

Simple regression analysis is the method of specifying a relationship between a single numeric dependent variable (here, y) and a single numeric independent variable (here, x).


Matplotlib best fit line

We can plot the line that best fits the scatter data points in Matplotlib. First, we need to find the parameters of the line that make it the best fit.

We will be doing it by applying the vectorization concept of linear algebra.

First, let’s understand the algorithm that we will be using to find the parameters of the best fit line.

The equation of the line is: y = (m * x) + c

Let’s change this into y = theta0 + (theta1 * x); Here, theta0 and theta1 are the parameters representing the c (intercept) and m (slope) respectively of the line.

Now, let’s change this equation into the vector form:

  • Let N be the number of data points given.
  • Let y be the column vector of N rows, where each row holds the y-coordinate of one data point.
  • Let theta be the column vector of 2 rows, whose entries are the parameters of the line (theta0 and theta1).
  • Let X be the N x 2 matrix whose 1st column consists of the value 1 in every row and whose 2nd column consists of the x-coordinate values of the N data points.

Now, the equation in vector form will be like this: y = X . theta

We can calculate the optimal parameter values (theta0 and theta1) for the given data points by using the least squares method equation in vector form, which is as follows:

theta = (X^T . X)^(-1) . (X^T . y); Here, X^T is the transpose of the matrix X, and (X^T . X)^(-1) is the inverse of the matrix resulting from (X^T . X)

Now, let’s implement this algorithm using Python and plot the resulting line.

# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
          [0.05, 0.11],
          [0.13, 0.14],
          [0.19, 0.17],
          [0.24, 0.21],
          [0.27, 0.24],
          [0.29, 0.32],
          [0.32, 0.30],
          [0.36, 0.39],
          [0.37, 0.42],
          [0.40, 0.40],
          [0.07, 0.09],
          [0.02, 0.04],
          [0.15, 0.19],
          [0.39, 0.32],
          [0.43, 0.48],
          [0.44, 0.41],
          [0.47, 0.49],
          [0.50, 0.57],
          [0.53, 0.59],
          [0.57, 0.51],
          [0.58, 0.60]
])

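# Preparing X and y data from the given data
# (1st column of X is all ones, 2nd column holds the x values,
# so theta[0] is the intercept and theta[1] is the slope)
x = dt[:, 0].reshape(dt.shape[0], 1)
X = np.append(np.ones((dt.shape[0], 1)), x, axis=1)
y = dt[:, 1].reshape(dt.shape[0], 1)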

# Calculating the parameters using the least square method
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(f'The parameters of the line: {theta}')

# Now, calculating the y-axis values against x-values according to
# the parameters theta0 and theta1
y_line = X.dot(theta)

# Plotting the data points and the best fit line
plt.scatter(x, y)
plt.plot(x, y_line, 'r')
plt.title('Best fit line using regression method')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
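
Computing the inverse explicitly works for small, well-behaved data, but it is usually not needed. As a sketch of two alternatives, the same parameters can be obtained with np.linalg.solve() applied to the normal equation, or with the least squares solver np.linalg.lstsq(). The small data set below is an assumption made for illustration and is not the data used above:

```python
import numpy as np

# Small illustrative data set (an assumption, not the tutorial's data)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix: 1st column of ones (intercept), 2nd column of x values (slope)
X = np.column_stack([np.ones_like(x), x])

# Normal equation solved as a linear system, without forming the inverse
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares solver; only the first returned value (the solution) is used
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f'theta from the normal equation: {theta_normal}')
print(f'theta from np.linalg.lstsq():   {theta_lstsq}')
```

Both results hold the intercept first and the slope second, matching theta0 and theta1 in the text.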


Matplotlib best fit line using numpy.polyfit()

We can plot the best fit line to given data points using the numpy.polyfit() function.

This is a pre-defined function that takes 3 mandatory arguments: the x-coordinate values (as an iterable), the y-coordinate values (as an iterable), and the degree of the equation (1 for linear, 2 for quadratic, 3 for cubic, …). It returns the coefficients of the fitted polynomial, ordered from the highest degree to the lowest.

The syntax is as follows:

numpy.polyfit(x, y, degree)
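
The order of the returned coefficients is easy to get wrong, so here is a minimal sketch on points that lie exactly on a known line. The line y = 3x + 2 is an arbitrary choice for illustration:

```python
import numpy as np

# Points that lie exactly on the line y = 3x + 2 (chosen for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 2.0

# For degree 1, polyfit returns [slope, intercept] (highest degree first)
coeffs = np.polyfit(x, y, 1)
slope, intercept = coeffs

# np.polyval() evaluates a polynomial given in the same coefficient order
y_at_10 = np.polyval(coeffs, 10.0)

print(slope, intercept, y_at_10)
```

Using np.polyval() avoids writing out the polynomial by hand and works unchanged for any degree.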

Now, let’s take a look at the example and understand the implementation of the function.

# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
          [0.05, 0.11],
          [0.13, 0.14],
          [0.19, 0.17],
          [0.24, 0.21],
          [0.27, 0.24],
          [0.29, 0.32],
          [0.32, 0.30],
          [0.36, 0.39],
          [0.37, 0.42],
          [0.40, 0.40],
          [0.07, 0.09],
          [0.02, 0.04],
          [0.15, 0.19],
          [0.39, 0.32],
          [0.43, 0.48],
          [0.44, 0.41],
          [0.47, 0.49],
          [0.50, 0.57],
          [0.53, 0.59],
          [0.57, 0.51],
          [0.58, 0.60]
])

# Preparing X and y from the given data
X = dt[:, 0]
y = dt[:, 1]

# Calculating parameters of the line using the numpy.polyfit() function
# (Here, theta[0] is the slope and theta[1] is the intercept)
theta = np.polyfit(X, y, 1)

print(f'The parameters of the line: {theta}')

# Now, calculating the y-axis values against x-values according to
# the parameters theta[0] and theta[1]
y_line = theta[1] + theta[0] * X

# Plotting the data points and the best fit line
plt.scatter(X, y)
plt.plot(X, y_line, 'r')
plt.title('Best fit line using numpy.polyfit()')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()


Matplotlib best fit line histogram

We can fit a distribution to the data of a histogram and plot the resulting curve in Python.

We can use the scipy library in Python. The steps are given below:

  • First, call the function scipy.stats.norm.fit() with the data plotted in the histogram as its parameter, to get the statistics of the data: the mean and the standard deviation.
  • Then, call the function scipy.stats.norm.pdf() with the parameters x (the bin edges of the histogram), the mean of the data, and the standard deviation of the data, to get the y-values of the best fit curve.
  • Finally, plot the curve on top of the histogram, which should be drawn as a density so that both share the same scale.

Let’s follow the above steps:

# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np
import scipy.stats

dt = np.random.normal(0, 1, 1000)

# Plotting the sample data on histogram and getting the bins
_, bins, _ = plt.hist(dt, 25, density=1, alpha=0.5)


# Getting the mean and standard deviation of the sample data dt
mn, std = scipy.stats.norm.fit(dt)


# Getting the best fit curve y values against the x data, bins
y_curve = scipy.stats.norm.pdf(bins, mn, std)

# Plotting the best fit curve
plt.plot(bins, y_curve, 'k')

plt.title('Best fit curve for histogram')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
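
For the normal distribution, the values returned by scipy.stats.norm.fit() are the maximum likelihood estimates, which coincide with the sample mean and the population-style standard deviation (np.std() with its default ddof=0). The sketch below checks this on random data; the seed, the mean of 5 and the standard deviation of 2 are arbitrary assumptions:

```python
import numpy as np
import scipy.stats

# Reproducible sample from a normal distribution (parameters are arbitrary)
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Fitting a normal distribution returns (mean, standard deviation)
mu, sigma = scipy.stats.norm.fit(data)

# Comparing with the plain NumPy statistics of the same sample
print(mu, data.mean())
print(sigma, data.std())
```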


Matplotlib best fit curve

We can plot a curve that best fits the given data points in Python when the scatter plot of the data shows a higher-degree trend (quadratic, cubic, …).

We can use the numpy.polyfit() function, which returns the coefficients of the best fit polynomial of the requested degree. As we have already discussed this function in the earlier topic, let’s practice an example for better understanding:

# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Preparing the data to be computed and plotted
dt = np.array([
          [0.5, 0.28],
          [0.5, 0.29],
          [0.5, 0.33],
          [0.7, 0.21],
          [0.7, 0.23],
          [0.7, 0.26],
          [0.8, 0.24],
          [0.8, 0.25],
          [0.8, 0.29],
          [0.9, 0.28],
          [0.9, 0.30],
          [0.9, 0.31],
          [1.0, 0.30],
          [1.0, 0.33],
          [1.0, 0.35]
])

# Preparing X and y from the given data
X = dt[:, 0]
y = dt[:, 1]

# Calculating parameters (theta0, theta1 and theta2)
# of the 2nd degree curve using the numpy.polyfit() function
theta = np.polyfit(X, y, 2)

print(f'The parameters of the curve: {theta}')

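# Now, calculating the y-axis values of the fitted curve on evenly
# spaced x-values, so that the curve is drawn smoothly
# (theta[0] belongs to x^2, theta[1] to x, and theta[2] is the constant)
x_line = np.linspace(X.min(), X.max(), 100)
y_line = theta[2] + theta[1] * pow(x_line, 1) + theta[0] * pow(x_line, 2)

# Plotting the data points and the best fit 2nd degree curve
plt.scatter(X, y)
plt.plot(x_line, y_line, 'r')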
plt.title('2nd degree best fit curve using numpy.polyfit()')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.show()
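
One way to judge whether a higher degree is worth it is to compare the sum of squared residuals of the fits. The sketch below does this for degree 1 and degree 2 on the same data as above; sse_for_degree is a hypothetical helper name. A lower error alone does not prove that the higher degree is the better model, since adding degrees never increases the error on the fitted points.

```python
import numpy as np

# Same data as in the example above
x = np.array([0.5, 0.5, 0.5, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8,
              0.9, 0.9, 0.9, 1.0, 1.0, 1.0])
y = np.array([0.28, 0.29, 0.33, 0.21, 0.23, 0.26, 0.24, 0.25, 0.29,
              0.28, 0.30, 0.31, 0.30, 0.33, 0.35])

def sse_for_degree(degree):
    """Sum of squared residuals of the polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

sse_line = sse_for_degree(1)
sse_quad = sse_for_degree(2)

print(f'Sum of squared residuals, degree 1: {sse_line}')
print(f'Sum of squared residuals, degree 2: {sse_quad}')
```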


Matplotlib best fit line to scatter

We have already discussed two different methods for getting the best fit line to scatter data. So, let’s look at another method.

We can use the pre-defined linear regression model in the linear_model sub-module of the sklearn library to get the best fit line for the given data points. The steps to create a model and get the best fit line parameters are as follows:

  • First, import LinearRegression from the sklearn.linear_model sub-module.
  • Then, create a new model using LinearRegression(), let’s say model = LinearRegression().
  • Next, fit the given data to the created model using the model.fit() method, which takes 2 arguments, x and y.
  • Then, get the y values of the predicted best fit line by passing the x values to the model.predict() method.
  • Now, we can plot the resulting y values against the x values as a line plot, which gives the best fit line for the given data points.

# Importing the necessary libraries
from matplotlib import pyplot as plt
import numpy as np

# Importing the sklearn's linear_model,
# a pre-defined linear regression model
from sklearn.linear_model import LinearRegression


# Preparing the data to be computed and plotted
dt = np.array([
          [0.05, 0.11],
          [0.13, 0.14],
          [0.19, 0.17],
          [0.24, 0.21],
          [0.27, 0.24],
          [0.29, 0.32],
          [0.32, 0.30],
          [0.36, 0.39],
          [0.37, 0.42],
          [0.40, 0.40],
          [0.07, 0.09],
          [0.02, 0.04],
          [0.15, 0.19],
          [0.39, 0.32],
          [0.43, 0.48],
          [0.44, 0.41],
          [0.47, 0.49],
          [0.50, 0.57],
          [0.53, 0.59],
          [0.57, 0.51],
          [0.58, 0.60]
])

# Preparing X and y from the given data
X = dt[:, 0].reshape(len(dt), 1)
y = dt[:, 1].reshape(len(dt), 1)

# Creating a linear regression model and fitting the data to the model
model = LinearRegression()
model.fit(X, y)

# Now, predicting the y values according to the model
y_line = model.predict(X)

# Printing the coefficient (slope) and the intercept of the resulting line
print(f'The parameters of the line: {model.coef_}, {model.intercept_}')

# Plotting the data points and the best fit line
plt.scatter(X, y)
plt.plot(X, y_line, 'r')
plt.title('Best fit line using linear regression model from sklearn')
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()
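
The fitted model stores the slope in model.coef_ and the intercept in model.intercept_. As a quick sanity check, the sketch below compares them with the result of numpy.polyfit() on a small data set. The data is an assumption made for illustration, and y is kept 1-dimensional here so that coef_ is a flat array:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small illustrative data set (an assumption, not the tutorial's data)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# sklearn expects a 2-dimensional array of features: one column per feature
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

slope_sk = model.coef_[0]
intercept_sk = model.intercept_

# The same line fitted with numpy.polyfit() (slope first, then intercept)
slope_np, intercept_np = np.polyfit(x, y, 1)

print(slope_sk, intercept_sk)
print(slope_np, intercept_np)
```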


In this Python tutorial, we have discussed how to plot the best-fit line in Matplotlib, and we have also covered the following topics:

  • Best fit line
  • Matplotlib best fit line
  • Matplotlib best fit line using numpy.polyfit()
  • Matplotlib best fit line histogram
  • Matplotlib best fit curve
  • Matplotlib best fit line to scatter


How do you find the line of best fit in Python?

Use numpy.

x = np.array([1, 3, 5, 7])
y = np.array([6, 3, 9, 5])
m, b = np.polyfit(x, y, 1)   # m = slope, b = intercept
plt.plot(x, y, 'o')          # create scatter plot
plt.plot(x, m*x + b)         # add line of best fit

How do you fit a graph in Python?

data = dataframe.values ...
x, y = data[:, 4], data[:, -1]
# curve fit
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b = popt
print('y = %.5f * x + %.5f' % (a, b))
# plot input vs output
pyplot.scatter(x, y) ...
x_line = arange(min(x), max(x), 1) ...
y_line = objective(x_line, a, b)
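
The steps above are only fragments, so here is a self-contained sketch of the same idea using scipy.optimize.curve_fit(). The straight-line objective function and the four data points are assumptions made for illustration; the original snippet reads its data from a dataframe that is not shown.

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b):
    """Model to fit: a straight line (assumed here for illustration)."""
    return a * x + b

# Illustrative data (an assumption; replace with your own columns)
x = np.array([1.0, 3.0, 5.0, 7.0])
y = np.array([6.0, 3.0, 9.0, 5.0])

# curve fit: popt holds the fitted parameters in the order (a, b)
popt, _ = curve_fit(objective, x, y)
a, b = popt

# summarize the parameter values
print('y = %.5f * x + %.5f' % (a, b))

# evenly spaced inputs and the fitted outputs, ready for plotting
x_line = np.arange(min(x), max(x) + 1, 1)
y_line = objective(x_line, a, b)
```

The arrays x_line and y_line can then be drawn with pyplot.plot() on top of a scatter plot of x and y.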

How do I find the line of best fit?

The equation of a line of best fit can be represented as y = mx + b, where m is the slope and b is the y-intercept.
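
For a single independent variable, the least squares values of m and b have a well-known closed form: m is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and b is the mean of y minus m times the mean of x. A minimal sketch, using illustrative data:

```python
import numpy as np

# Illustrative data (an assumption made for this sketch)
x = np.array([1, 3, 5, 7])
y = np.array([6, 3, 9, 5])

x_mean, y_mean = x.mean(), y.mean()

# slope: co-variation of x and y divided by the variation of x
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# intercept: the line passes through the point (x_mean, y_mean)
b = y_mean - m * x_mean

print(m, b)
```

These values agree with what np.polyfit(x, y, 1) returns for the same data.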
