Guide: Implementation of Linear Regression in Python

This article discusses the basics of linear regression and its implementation in the Python programming language. Linear regression is a statistical method for modeling the relationship between a dependent variable and a given set of independent variables.

Note: In this article, we refer to dependent variables as responses and independent variables as features, for simplicity.
To provide a basic understanding of linear regression, we start with its most basic version, i.e. simple linear regression.

Simple Linear Regression

Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related. Hence, we try to find a linear function that predicts the response value (y) as accurately as possible as a function of the feature or independent variable (x).
Let us consider a dataset where we have a value of response y for every feature x:

For generality, we define:
x as the feature vector, i.e. x = [x_1, x_2, ..., x_n],
y as the response vector, i.e. y = [y_1, y_2, ..., y_n]
for n observations (in the above example, n = 10).
A scatter plot of the above dataset looks like:

Now, the task is to find the line that fits best in the above scatter plot so that we can predict the response for any new feature value (i.e. a value of x not present in the dataset).
This line is called a regression line.
The equation of the regression line is represented as:

    h(x_i) = b_0 + b_1 x_i

Here,

h(x_i) represents the predicted response value for the ith observation.
b_0 and b_1 are the regression coefficients and represent the y-intercept and the slope of the regression line, respectively.

To create our model, we must "learn" or estimate the values of the regression coefficients b_0 and b_1. Once we have estimated these coefficients, we can use the model to predict responses.
In this article, we use the principle of Least Squares.
Now consider:

    y_i = b_0 + b_1 x_i + e_i = h(x_i) + e_i,  so that  e_i = y_i - h(x_i)

Here, e_i is the residual error in the ith observation.
So, our aim is to minimize the total residual error.
We define the squared error or cost function, J, as:

    J(b_0, b_1) = \frac{1}{2n} \sum_{i=1}^{n} e_i^2

and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is minimum.
Without going into the mathematical details, we present the result here:

    b_1 = \frac{SS_{xy}}{SS_{xx}}
    b_0 = \bar{y} - b_1 \bar{x}

where \bar{x} and \bar{y} are the means of x and y, and SS_xy is the sum of cross-deviations of y and x:

    SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}

and SS_xx is the sum of squared deviations of x:

    SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2

Note: The complete derivation for finding least squares estimates in simple linear regression can be found here.
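As a brief sketch of where the stated result comes from, the steps below set the partial derivatives of the cost function to zero. The only assumption is that J is proportional to the sum of squared residuals; any positive scaling constant leaves the minimizer unchanged.

```latex
% Assumption: J is proportional to the sum of squared residuals.
\begin{align*}
J(b_0, b_1) &\propto \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial J}{\partial b_0} = 0
  &\;\Rightarrow\; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial J}{\partial b_1} = 0
  &\;\Rightarrow\; \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 \\
  &\;\Rightarrow\; \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}
     = b_1 \Bigl( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \Bigr)
  \;\Rightarrow\; b_1 = \frac{SS_{xy}}{SS_{xx}}
\end{align*}
```

The second line of the b_1 step substitutes b_0 = \bar{y} - b_1 \bar{x} and uses \sum x_i = n \bar{x}.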

Code: Python implementation of the above technique on our small dataset

Python

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations
    n = np.size(x)

    # means of the feature and response vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # sum of cross-deviations of y and x, and sum of squared deviations of x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # regression coefficients: slope first, then intercept
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x

    return (b_0, b_1)

Output:

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437

And the graph obtained looks like this:
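For readers who want something to run end to end, here is a minimal sketch of the same least-squares fit. The data is made up for illustration and is not the article's dataset, so its coefficients will differ from the output shown above. The names `fit_simple_ols` and `predict` are hypothetical helpers. The sketch uses the deviation form of SS_xy and SS_xx and cross-checks the result against `np.polyfit`.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares estimates (b_0, b_1) using the deviation form."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    ss_xy = np.sum(dx * dy)   # sum of cross-deviations of y and x
    ss_xx = np.sum(dx * dx)   # sum of squared deviations of x
    b_1 = ss_xy / ss_xx
    b_0 = y.mean() - b_1 * x.mean()
    return b_0, b_1

def predict(x_new, b):
    """Predicted response h(x) = b_0 + b_1 * x."""
    return b[0] + b[1] * np.asarray(x_new, dtype=float)

# Hypothetical data, NOT the article's dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = fit_simple_ols(x, y)

# Cross-check: np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

print("b_0 = {:.4f}, b_1 = {:.4f}".format(*b))
print("prediction at x = 6:", predict(6.0, b))
```

Calling `predict` with a value of x that is not in the data is exactly the use case described earlier: predicting the response for a new feature value.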

Multiple Linear Regression

Multiple linear regression attempts to model the relationship between two or more features and a response by fitting a linear equation to the observed data.
Clearly, it is nothing but an extension of simple linear regression.
Consider a dataset with p features (or independent variables) and one response (or dependent variable).
Also, the dataset contains n rows/observations.
We define:
X (feature matrix) = a matrix of size n x p where x_{ij} denotes the value of the jth feature for the ith observation.
So,

    X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}

and
y (response vector) = a vector of size n where y_i denotes the value of the response for the ith observation.

    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}

The regression line for p features is represented as:

    h(x_i) = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip}

where h(x_i) is the predicted response value for the ith observation and b_0, b_1, ..., b_p are the regression coefficients.
Also, we can write:

    y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip} + e_i = h(x_i) + e_i

where e_i represents the residual error in the ith observation.
We can generalize our linear model a little bit more by representing the feature matrix X with a leading column of ones:

    X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}

So now, the linear model can be expressed in terms of matrices as:

    y = X b + e

where,

    b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}

and

    e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}

Now, we determine an estimate of b, i.e. b', using the Least Squares method.
As already explained, the Least Squares method determines the b' for which the total residual error is minimized.
We present the result directly here:

    b' = (X'X)^{-1} X'y

where ' applied to a matrix represents its transpose, while -1 represents the matrix inverse.
Knowing the least squares estimate b', the multiple linear regression model can now be estimated as:

    y' = X b'

where y' is the estimated response vector.
Note: The complete derivation for obtaining least squares estimates in multiple linear regression can be found here.
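The matrix result can be tried out directly with NumPy. The following is a minimal sketch on synthetic data; the sizes, the "true" coefficients and the noise level are all made up for illustration. It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which is a common numerical choice rather than something the article prescribes, and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): n = 50 observations, p = 3 features.
n, p = 50, 3
features = rng.normal(size=(n, p))
true_b = np.array([1.0, 2.0, -1.0, 0.5])   # b_0, b_1, b_2, b_3 (made up)
noise = rng.normal(scale=0.1, size=n)

# Prepend a column of ones so that b_0 acts as the intercept.
X = np.column_stack([np.ones(n), features])
y = X @ true_b + noise

# Least-squares estimate: solve (X'X) b = X'y.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated response vector y' = X b'.
y_hat = X @ b_hat

print("b' =", b_hat)
print("max difference from lstsq:", np.max(np.abs(b_hat - b_lstsq)))
```

At the least-squares solution the residual vector y - y' is orthogonal to every column of X, which is a convenient sanity check.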

Code: Python implementation of multiple linear regression on the Boston house pricing dataset using Scikit-learn.

Python

import matplotlib.pyplot as plt
import numpy as np

Output:

Coefficients:
[ -8.80740828e-02   6.72507352e-02   5.10280463e-02   2.18879172e+00
-1.72283734e+01   3.62985243e+00   2.13933641e-03  -1.36531300e+00
2.88788067e-01  -1.22618657e-02  -8.36014969e-01   9.53058061e-03
-5.05036163e-01]
Variance score: 0.720898784611

and the Residual Error plot looks like this:

In the above example, we determine the accuracy score using the Explained Variance Score.
We define:
explained_variance_score = 1 - Var{y - y'}/Var{y}
where y' is the estimated target output, y the corresponding (correct) target output, and Var is variance, the square of the standard deviation.
The best possible score is 1.0; lower values are worse.
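The workflow described above (fit on a training split, report the coefficients, score on a test split) can be sketched with Scikit-learn as follows. This is not the article's original listing. Because `load_boston` was removed from Scikit-learn in version 1.2, the sketch substitutes synthetic data from `make_regression`. The sample size, noise level, split ratio and random seeds are arbitrary assumptions, so the numbers will not match the output shown above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 observations, 13 features.
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=1)

# Hold out part of the data for testing (split ratio chosen arbitrarily).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Explained variance from the definition: 1 - Var{y - y'} / Var{y}
manual_score = 1.0 - np.var(y_test - y_pred) / np.var(y_test)
library_score = explained_variance_score(y_test, y_pred)

print("Coefficients:", reg.coef_)
print("Explained variance (manual): ", manual_score)
print("Explained variance (sklearn):", library_score)
print("R^2 from reg.score:          ", reg.score(X_test, y_test))
```

Note that `reg.score` returns the coefficient of determination R^2, which coincides with the explained variance score only when the residuals have zero mean on the test set; the two are printed side by side for comparison.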

Assumptions:

Given below are the basic assumptions that a linear regression model makes regarding a dataset on which it is applied:

Linear relationship: The relationship between the response and the feature variables should be linear. The linearity assumption can be tested using scatter plots. As shown below, the 1st figure represents linearly related variables, whereas the variables in the 2nd and 3rd figures are most likely non-linear. So, the 1st figure will give better predictions using linear regression.

Little or no multicollinearity: It is assumed that there is little or no multicollinearity in the data. Multicollinearity occurs when the features (or independent variables) are not independent of each other.
Little or no autocorrelation: Another assumption is that there is little or no autocorrelation in the data. Autocorrelation occurs when the residual errors are not independent of each other. You can refer here for more insight into this topic.
Homoscedasticity: Homoscedasticity describes a situation in which the error term (that is, the "noise" or random disturbance in the relationship between the independent variables and the dependent variable) has the same variance across all values of the independent variables. As shown below, figure 1 has homoscedasticity while figure 2 has heteroscedasticity.
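Two of these assumptions can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the article's method. It uses the variance inflation factor (VIF) as one common gauge of multicollinearity and the Durbin-Watson statistic as one common gauge of first-order autocorrelation in the residuals. The helper names `vif` and `durbin_watson` and the synthetic data are hypothetical.

```python
import numpy as np

def vif(features):
    """Variance inflation factor of each column: 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    n, p = features.shape
    out = np.empty(p)
    for j in range(p):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little autocorrelation."""
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

# Synthetic data (assumption): x3 is nearly a copy of x1, on purpose.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)
features = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - x2 + rng.normal(scale=0.5, size=n)

# Fit by least squares and collect the residual errors.
A = np.column_stack([np.ones(n), features])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

vif_values = vif(features)
dw = durbin_watson(residuals)
print("VIF per feature:", vif_values)
print("Durbin-Watson:", dw)
```

A large VIF for a feature indicates that it is almost a linear combination of the others; here the deliberately duplicated pair x1 and x3 should stand out, while x2 should stay close to 1.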

As we reach the end of this article, we discuss some applications of linear regression below.

Applications:

1. Trend lines: A trend line represents the variation in quantitative data with the passage of time (like GDP, oil prices, etc.). These trends usually follow a linear relationship. Hence, linear regression can be applied to predict future values. However, this method suffers from a lack of scientific validity in cases where other potential changes can affect the data.
2. Economics: Linear regression is the predominant empirical tool in economics. For example, it is used to predict consumer spending, fixed investment spending, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand, and labor supply.
3. Finance: The capital asset pricing model uses linear regression to analyze and quantify the systematic risks of an investment.
4. Biology: Linear regression is used to model causal relationships between parameters in biological systems.
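As an illustration of the trend-line application, the following sketch fits a straight line to a short yearly series and extrapolates it two years ahead. The series is made up and does not represent real economic data; `trend` is a hypothetical helper. The caveat stated above still applies: extrapolation is only as good as the assumption that the linear trend continues.

```python
import numpy as np

# Hypothetical yearly series (made-up values, not real economic data).
years = np.arange(2010, 2020)
values = np.array([10.2, 10.9, 11.7, 12.1, 13.0, 13.4, 14.3, 14.8, 15.7, 16.1])

# Center the years so the fit is numerically well-conditioned.
t = years - years.mean()
slope, intercept = np.polyfit(t, values, deg=1)

def trend(year):
    """Value of the fitted trend line at the given year."""
    return intercept + slope * (year - years.mean())

forecast_2021 = trend(2021)
print("slope per year:", slope)
print("forecast for 2021:", forecast_2021)
```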

References:

//en.wikipedia.org/wiki/Linear_regression
//en.wikipedia.org/wiki/Simple_linear_regression
//scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
//www.statisticssolutions.com/assumptions-of-linear-regression/
