Entropy-based binning in Python

The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.

Two families of ensemble methods are usually distinguished:

In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced.
Examples: Bagging methods, Forests of randomized trees, …
By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.
Examples: AdaBoost, Gradient Tree Boosting, …

1.11.1. Bagging meta-estimator

In ensemble algorithms, bagging methods form a class of algorithms which build several instances of a black-box estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction. These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it. In many cases, bagging methods constitute a very simple way to improve with respect to a single model, without making it necessary to adapt the underlying base algorithm. As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees).

Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:

When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [B1999].
When samples are drawn with replacement, then the method is known as Bagging [B1996].
When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [H1998].
Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [LG2012].
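The four sampling strategies above can be sketched with scikit-learn's BaggingClassifier. This is an illustrative mapping rather than a definitive recipe: the decision-tree base estimator, the synthetic dataset, the dictionary keys, and the 0.5 subset fractions are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Each entry maps a strategy name to the arguments that select it.
# The 0.5 fractions are arbitrary illustrative choices.
strategies = {
    # Subsets of samples, drawn without replacement.
    "pasting": dict(max_samples=0.5, bootstrap=False),
    # Subsets of samples, drawn with replacement.
    "bagging": dict(max_samples=0.5, bootstrap=True),
    # All samples, random subsets of features.
    "random_subspaces": dict(max_features=0.5, bootstrap=False),
    # Subsets of both samples and features.
    "random_patches": dict(max_samples=0.5, max_features=0.5, bootstrap=False),
}

ensembles = {}
for name, params in strategies.items():
    # The base estimator is passed positionally so the call does not depend
    # on the keyword name used by a particular scikit-learn release.
    clf = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=10, random_state=0, **params
    )
    ensembles[name] = clf.fit(X, y)
```

After fitting, `ensembles["random_subspaces"].estimators_features_` lists the feature indices each base estimator was trained on, which is a convenient way to check which strategy is in effect.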

In scikit-learn, bagging methods are offered as a unified BaggingClassifier meta-estimator (resp. BaggingRegressor), taking as input a user-specified estimator along with parameters specifying the strategy to draw random subsets. In particular, max_samples and max_features control the size of the subsets (in terms of samples and features), while bootstrap and bootstrap_features control whether samples and features are drawn with or without replacement. When using a subset of the available samples, the generalization accuracy can be estimated with the out-of-bag samples by setting oob_score=True. As an example, a bagging ensemble of KNeighborsClassifier estimators can be instantiated so that each estimator is built on a random subset of 50% of the samples and 50% of the features.
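A minimal sketch of such an ensemble follows, assuming KNeighborsClassifier as the base estimator and the iris data as a stand-in dataset. The fit and score calls and the fixed random_state are additions for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Each base estimator sees a random 50% of the samples and 50% of the features.
bagging = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,
    max_features=0.5,
    random_state=0,  # fixed only to make the sketch reproducible
)
bagging.fit(X, y)

# Accuracy on the training data; it is not a generalization estimate.
train_accuracy = bagging.score(X, y)
```

Because the subset sizes are given as fractions, the same configuration scales with the dataset; integer values would instead request an absolute number of samples or features.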

Examples

Single estimator versus bagging: bias-variance decomposition

References

[B1999]

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999

[B1996]

L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996

[H1998]

T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998

[LG2012]

G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012

1.11.2. Forests of randomized trees

The sklearn.ensemble module includes two averaging algorithms based on randomized decision trees: the RandomForest algorithm and the Extra-Trees method. Both algorithms are perturb-and-combine techniques [B1998] specifically designed for trees. This means a diverse set of classifiers is created by introducing randomness in the classifier construction. The prediction of the ensemble is given as the averaged prediction of the individual classifiers.

As other classifiers, forest classifiers have to be fitted with two arrays: a sparse or dense array X of shape (n_samples, n_features) holding the training samples, and an array Y of shape (n_samples,) holding the target values (class labels) for the training samples.
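The fitting convention described above can be sketched as follows. The two-sample toy data and the query point are assumptions made for illustration; with so little data the predicted label is not meaningful, so no output is shown.

```python
from sklearn.ensemble import RandomForestClassifier

# X has shape (n_samples, n_features) = (2, 2); Y has shape (n_samples,) = (2,).
X = [[0, 0], [1, 1]]
Y = [0, 1]

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf = clf.fit(X, Y)

# Predict the class of a new, hypothetical sample.
prediction = clf.predict([[0.9, 0.9]])
```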

Like decision trees, forests of trees also extend to multi-output problems (if Y is an array of shape (n_samples, n_outputs)).
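A short sketch of the multi-output case, assuming a synthetic dataset with two binary targets derived from the features. The data-generating rules are hypothetical and serve only to produce a Y with two columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 4)

# Two target columns, so Y has shape (n_samples, n_outputs) = (100, 2).
Y = np.column_stack([X[:, 0] > 0.5, X[:, 1] + X[:, 2] > 1.0]).astype(int)

forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, Y)

# One predicted label per output for each of the first five samples.
predictions = forest.predict(X[:5])
```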

1.11.2.1. Random Forests

In random forests (see RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set.

Furthermore, when splitting each node during the construction of a tree, the best split is found either from all input features or a random subset of size max_features. (See the parameter tuning guidelines for more details.)

The purpose of these two sources of randomness is to decrease the variance of the forest estimator. Indeed, individual decision trees typically exhibit high variance and tend to overfit. The injected randomness in forests yields decision trees with somewhat decoupled prediction errors. By taking an average of those predictions, some errors can cancel out. Random forests achieve a reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias. In practice the variance reduction is often significant, hence yielding an overall better model.

In contrast to the original publication [B2001], the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class.
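The averaging behaviour can be checked with a short sketch: the forest's class probabilities should match the mean of the per-tree probabilities. The iris dataset and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Probabilities reported by the ensemble.
forest_proba = forest.predict_proba(X)

# Probabilities obtained by averaging the individual trees by hand.
mean_tree_proba = np.mean(
    [tree.predict_proba(X) for tree in forest.estimators_], axis=0
)

# The predicted class is the one with the highest averaged probability.
predicted = forest.classes_[np.argmax(mean_tree_proba, axis=1)]
```

Averaging probabilities lets a confident tree count for more than a hesitant one, whereas a hard vote would weight every tree equally.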

1.11.2.2. Extremely Randomized Trees

In extremely randomized trees (see ExtraTreesClassifier and ExtraTreesRegressor classes), randomness goes one step further in the way splits are computed. As in random forests, a random subset of candidate features is used, but instead of looking for the most discriminative thresholds, thresholds are drawn at random for each candidate feature and the best of these randomly-generated thresholds is picked as the splitting rule. This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias:

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import make_blobs
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.tree import DecisionTreeClassifier

>>> X, y = make_blobs(n_samples=10000, n_features=10, centers=100,
...     random_state=0)

>>> # Baseline: a single fully developed decision tree.
>>> clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
...     random_state=0)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores.mean()
0.98...

>>> # Random forest of 10 fully developed trees.
>>> clf = RandomForestClassifier(n_estimators=10, max_depth=None,
...     min_samples_split=2, random_state=0)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores.mean()
0.999...

>>> # Extremely randomized trees with the same settings.
>>> clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
...     min_samples_split=2, random_state=0)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores.mean() > 0.999
True

1.11.2.3. Parameters

The main parameters to adjust when using these methods are n_estimators and max_features. The former is the number of trees in the forest. The larger the better, but also the longer it will take to compute. In addition, note that results will stop getting significantly better beyond a critical number of trees. The latter is the size of the random subsets of features to consider when splitting a node. The lower the greater the reduction of variance, but also the greater the increase in bias. Empirical good default values are max_features=1.0 or equivalently max_features=None (always considering all features instead of a random subset) for regression problems, and max_features="sqrt" (using a random subset of size sqrt(n_features)) for classification tasks (where n_features is the number of features in the data). The default value of max_features=1.0 is equivalent to bagged trees and more randomness can be achieved by setting smaller values (e.g., 0.3 is a typical default in the literature). Good results are often achieved when setting max_depth=None in combination with min_samples_split=2 (i.e., when fully developing the trees). Bear in mind though that these values are usually not optimal, and might result in models that consume a lot of RAM. The best parameter values should always be cross-validated. In addition, note that in random forests, bootstrap samples are used by default (bootstrap=True) while the default strategy for extra-trees is to use the whole dataset (bootstrap=False). When using bootstrap sampling the generalization error can be estimated on the left out or out-of-bag samples. This can be enabled by setting oob_score=True.
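As a rough sketch of how these settings interact, the loop below fits forests with a few max_features values and records the out-of-bag accuracy of each. The synthetic dataset, the candidate values, and the use of the out-of-bag score instead of full cross-validation are assumptions made to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

oob_scores = {}
for max_features in ("sqrt", 0.3, 1.0):
    forest = RandomForestClassifier(
        n_estimators=100,
        max_features=max_features,
        bootstrap=True,   # required for out-of-bag estimation
        oob_score=True,   # score each sample with the trees that did not see it
        random_state=0,
    )
    forest.fit(X, y)
    oob_scores[max_features] = forest.oob_score_
```

The out-of-bag score comes at no extra fitting cost, which makes it a convenient first check; a cross-validated search remains the more thorough way to choose final values.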

Note

The size of the model with the default parameters is \(O(M \cdot N \cdot \log(N))\), where \(M\) is the number of trees and \(N\) is the number of samples. In order to reduce the size of the model, you can change these parameters: min_samples_split, max_leaf_nodes, max_depth and min_samples_leaf.
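The effect of these parameters on model size can be sketched by counting tree nodes. The helper total_nodes, the synthetic dataset, and the specific limits (max_depth=5, min_samples_leaf=5) are hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)


def total_nodes(forest):
    """Sum the node counts of all trees in a fitted forest."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)


# Default settings: trees are grown until the leaves are pure.
default_forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Constrained settings: shallower trees with larger leaves.
compact_forest = RandomForestClassifier(
    n_estimators=20, max_depth=5, min_samples_leaf=5, random_state=0
).fit(X, y)

default_size = total_nodes(default_forest)
compact_size = total_nodes(compact_forest)
```

Node count is only a proxy for memory use, but it makes the trade-off visible: the constrained forest stores far fewer nodes, usually at some cost in training accuracy.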

1.11.2.4. Parallelization

Finally, this module also features the parallel construction of the trees and the parallel computation of the predictions through the n_jobs parameter. If n_jobs=k then computations are partitioned into k jobs, and run on k cores of the machine. If n_jobs=-1 then all cores available on the machine are used. Note that because of inter-process communication overhead, the speedup might not be linear (i.e., using k jobs will unfortunately not be k times as fast). Significant speedup can still be achieved though when building a large number of trees, or when building a single tree requires a fair amount of time (e.g., on large datasets).
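A minimal sketch of parallel fitting and prediction, assuming a synthetic dataset and an extra-trees model. Timings are deliberately omitted because they depend on the machine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=-1 asks for all available cores, both for fitting and for predicting.
forest = ExtraTreesClassifier(n_estimators=50, n_jobs=-1, random_state=0)
forest.fit(X, y)
predictions = forest.predict(X)
```

Setting n_jobs to a positive integer instead caps the number of jobs, which can be useful on shared machines.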

Examples

Plot the decision surfaces of ensembles of trees on the iris dataset
Pixel importances with a parallel forest of trees
Face completion with a multi-output estimators

References

[B2001]

L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001

[B1998]

L. Breiman, “Arcing Classifiers”, Annals of Statistics, 1998

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006

1.11.2.5. Feature importance evaluation

The relative rank (i.e. depth) of a feature used as a decision node in a tree can be used to assess the relative importance of that feature with respect to the predictability of the target variable. Features used at the top of the tree contribute to the final prediction decision of a larger fraction of the input samples. The expected fraction of the samples they contribute to can thus be used as an estimate of the relative importance of the features. In scikit-learn, the fraction of samples a feature contributes to is combined with the decrease in impurity from splitting them to create a normalized estimate of the predictive power of that feature.

By averaging the estimates of predictive ability over several randomized trees one can reduce the variance of such an estimate and use it for feature selection. This is known as the mean decrease in impurity, or MDI. Refer to [L2014] for more information on MDI and feature importance evaluation with Random Forests.

Warning

The impurity-based feature importances computed on tree-based models suffer from two flaws that can lead to misleading conclusions. First, they are computed on statistics derived from the training dataset and therefore do not necessarily inform us on which features are most important to make good predictions on a held-out dataset. Secondly, they favor high cardinality features, that is, features with many unique values. Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws. These two methods of obtaining feature importance are explored in: Permutation Importance vs Random Forest Feature Importance (MDI).
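The two kinds of importance can be computed side by side, as in the sketch below. It assumes a synthetic dataset, a held-out split, and five permutation repeats; none of these choices come from the text above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances (MDI), derived from the training data.
mdi = forest.feature_importances_

# Permutation importances, measured on the held-out data.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=5, random_state=0
)
perm = result.importances_mean
```

Comparing the rankings given by mdi and perm is a quick way to spot features whose impurity-based importance does not carry over to held-out data.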

The following example shows a color-coded representation of the relative importances of each individual pixel for a face recognition task using an ExtraTreesClassifier model.

In practice those estimates are stored as an attribute named

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import AdaBoostClassifier

>>> X, y = load_iris[return_X_y=True]
>>> clf = AdaBoostClassifier[n_estimators=100]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.9...

50 on the fitted model. This is an array with shape

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import AdaBoostClassifier

>>> X, y = load_iris[return_X_y=True]
>>> clf = AdaBoostClassifier[n_estimators=100]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.9...

51 whose values are positive and sum to 1. 0. The higher the value, the more important is the contribution of the matching feature to the prediction function
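A minimal sketch of reading the importance array from a fitted forest is given here. The iris dataset and the variable names are assumptions chosen for brevity, not part of the original text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = forest.feature_importances_
print(importances.shape)   # one entry per input feature
print(importances.sum())   # normalized, so close to 1.0

# Rank feature indices from most to least important.
ranking = np.argsort(importances)[::-1]
print(ranking)
```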

Examples

Pixel importances with a parallel forest of trees
Feature importances with a forest of trees

References

[L2014]

G. Louppe, “Understanding Random Forests: From Theory to Practice”, PhD Thesis, U. of Liege, 2014.

1. 11. 2. 6. Totally Random Trees Embedding¶

RandomTreesEmbedding implements an unsupervised transformation of the data. Using a forest of completely random trees, RandomTreesEmbedding encodes the data by the indices of the leaves a data point ends up in. This index is then encoded in a one-of-K manner, leading to a high-dimensional, sparse binary coding. This coding can be computed very efficiently and can then be used as a basis for other learning tasks. The size and sparsity of the code can be influenced by choosing the number of trees and the maximum depth per tree. For each tree in the ensemble, the coding contains one entry of one. The size of the coding is at most n_estimators * 2 ** max_depth, the maximum number of leaves in the forest.

As neighboring data points are more likely to lie within the same leaf of a tree, the transformation performs an implicit, non-parametric density estimation.

Examples
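The properties of the leaf-index coding can be checked with a short sketch. The toy dataset (make_circles) and the chosen values of n_estimators and max_depth are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

X, _ = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

n_estimators, max_depth = 10, 3
embedder = RandomTreesEmbedding(n_estimators=n_estimators,
                                max_depth=max_depth, random_state=0)
X_sparse = embedder.fit_transform(X)   # SciPy sparse one-of-K coding

# Each sample falls into exactly one leaf per tree, so every row
# contains n_estimators ones.
ones_per_row = np.asarray(X_sparse.sum(axis=1)).ravel()

# The number of columns is bounded by n_estimators * 2 ** max_depth.
max_code_size = n_estimators * 2 ** max_depth
print(X_sparse.shape, max_code_size)
```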

Hashing feature transformation using Totally Random Trees
Manifold learning on handwritten digits. Locally Linear Embedding, Isomap… compares non-linear dimensionality reduction techniques on handwritten digits.
Feature transformations with ensembles of trees compares supervised and unsupervised tree based feature transformations.

1. 11. 3. AdaBoost¶

The module sklearn.ensemble includes the popular boosting algorithm AdaBoost, introduced in 1995 by Freund and Schapire [FS1995].

The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only slightly better than random guessing, such as small decision trees) on repeatedly modified versions of the data. The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction. The data modifications at each so-called boosting iteration consist of applying weights \(w_1\), \(w_2\), …, \(w_N\) to each of the training samples. Initially, those weights are all set to \(w_i = 1/N\), so that the first step simply trains a weak learner on the original data. For each successive iteration, the sample weights are individually modified and the learning algorithm is reapplied to the reweighted data. At a given step, those training examples that were incorrectly predicted by the boosted model induced at the previous step have their weights increased, whereas the weights are decreased for those that were predicted correctly. As iterations proceed, examples that are difficult to predict receive ever-increasing influence. Each subsequent weak learner is thereby forced to concentrate on the examples that are missed by the previous ones in the sequence [HTF].
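The reweighting loop described above can be sketched in a few lines. This is a simplified discrete AdaBoost for binary labels in {-1, +1}, written for illustration; it is not scikit-learn's own implementation, and the dataset, the number of rounds, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y01 - 1                            # map labels {0, 1} to {-1, +1}

n_samples = X.shape[0]
w = np.full(n_samples, 1.0 / n_samples)    # initial weights w_i = 1/N
stumps, alphas = [], []

for _ in range(20):
    # Fit a weak learner (a decision stump) on the reweighted data.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y

    # Weighted error of this learner (weights sum to 1) and its vote weight.
    err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase weights of misclassified samples, decrease the others.
    w = w * np.exp(np.where(miss, alpha, -alpha))
    w /= w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: weighted majority vote of all weak learners.
votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
train_accuracy = (np.sign(votes) == y).mean()
print(train_accuracy)
```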

AdaBoost can be used both for classification and regression problems:

For multi-class classification, AdaBoostClassifier implements AdaBoost-SAMME and AdaBoost-SAMME.R [ZZRH2009].

For regression, AdaBoostRegressor implements AdaBoost.R2 [D1997].

1. 11. 3. 1. Usage¶

The following example shows how to fit an AdaBoost classifier with 100 weak learners:

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import AdaBoostClassifier

>>> X, y = load_iris(return_X_y=True)
>>> clf = AdaBoostClassifier(n_estimators=100)
>>> scores = cross_val_score(clf, X, y, cv=5)
>>> scores.mean()
0.9...

The number of weak learners is controlled by the parameter n_estimators. The learning_rate parameter controls the contribution of the weak learners in the final combination. By default, weak learners are decision stumps. Different weak learners can be specified through the base_estimator parameter. The main parameters to tune to obtain good results are n_estimators and the complexity of the base estimators (e.g., its depth max_depth or minimum required number of samples to consider a split min_samples_split).
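A hedged sketch of setting these parameters together follows. The specific values (depth 2, 50 learners, learning rate 0.5) are arbitrary assumptions, not recommendations. The weak learner is passed positionally because the keyword name of that parameter differs between scikit-learn releases.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A slightly more complex weak learner than the default decision stump.
weak_learner = DecisionTreeClassifier(max_depth=2, min_samples_split=4)

# First positional argument: the weak learner (keyword name varies by
# release, so it is not spelled out here).
clf = AdaBoostClassifier(weak_learner, n_estimators=50, learning_rate=0.5,
                         random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```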

Examples

Discrete versus Real AdaBoost compares the classification error of a decision stump, decision tree, and a boosted decision stump using AdaBoost-SAMME and AdaBoost-SAMME.R.
Multi-class AdaBoosted Decision Trees shows the performance of AdaBoost-SAMME and AdaBoost-SAMME.R on a multi-class problem.
Two-class AdaBoost shows the decision boundary and decision function values for a non-linearly separable two-class problem using AdaBoost-SAMME.
Decision Tree Regression with AdaBoost demonstrates regression with the AdaBoost.R2 algorithm.

References

[FS1995]

Y. Freund, and R. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting”, 1997.

[ZZRH2009]

J. Zhu, H. Zou, S. Rosset, T. Hastie. “Multi-class AdaBoost”, 2009.

[D1997]

Drucker. “Improving Regressors using Boosting Techniques”, 1997.

[HTF]

T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning Ed. 2”, Springer, 2009.

1. 11. 4. Gradient Tree Boosting¶

Gradient Tree Boosting or Gradient Boosted Decision Trees (GBDT) is a generalization of boosting to arbitrary differentiable loss functions; see the seminal work of [Friedman2001]. GBDT is an accurate and effective off-the-shelf procedure that can be used for both regression and classification problems in a variety of areas including Web search ranking and ecology.

The module sklearn.ensemble provides methods for both classification and regression via gradient boosted decision trees.

Note

Scikit-learn 0.21 introduces two new implementations of gradient boosting trees, namely HistGradientBoostingClassifier and HistGradientBoostingRegressor, inspired by LightGBM (see [LightGBM]).

These histogram-based estimators can be orders of magnitude faster than GradientBoostingClassifier and GradientBoostingRegressor when the number of samples is larger than tens of thousands of samples.

They also have built-in support for missing values, which avoids the need for an imputer.

These estimators are described in more detail below in Histogram-Based Gradient Boosting.

The following guide focuses on GradientBoostingClassifier and GradientBoostingRegressor, which might be preferred for small sample sizes since binning may lead to split points that are too approximate in this setting.

The usage and the parameters of GradientBoostingClassifier and GradientBoostingRegressor are described below. The 2 most important parameters of these estimators are n_estimators and learning_rate.

1. 11. 4. 1. Classification¶

GradientBoostingClassifier supports both binary and multi-class classification. A typical configuration fits a gradient boosting classifier with 100 decision stumps as weak learners.

The number of weak learners (i.e. regression trees) is controlled by the parameter n_estimators. The size of each tree can be controlled either by setting the tree depth via max_depth or by setting the number of leaf nodes via max_leaf_nodes. The learning_rate is a hyper-parameter in the range (0.0, 1.0] that controls overfitting via shrinkage.
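A minimal sketch of a gradient boosting classifier with 100 decision stumps might look as follows. The synthetic dataset (make_hastie_10_2), the train/test split, and learning_rate=1.0 are assumptions for illustration.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=4000, random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

# max_depth=1 makes every weak learner a decision stump;
# n_estimators sets how many of them are fitted.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=1, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)
```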

Note

Classification with more than 2 classes requires the induction of n_classes regression trees at each iteration; thus, the total number of induced trees equals n_classes * n_estimators. For datasets with a large number of classes we strongly recommend using HistGradientBoostingClassifier as an alternative to GradientBoostingClassifier.
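The tree count stated in the note can be verified on a three-class problem. The iris dataset and n_estimators=20 are assumptions chosen to keep the sketch fast.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)      # 3 classes
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# estimators_ is a 2-D array holding one regression tree per class
# for every boosting iteration.
n_iterations, trees_per_iteration = clf.estimators_.shape
total_trees = clf.estimators_.size
print(n_iterations, trees_per_iteration, total_trees)
```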

1. 11. 4. 2. Regression¶

GradientBoostingRegressor supports a number of different loss functions for regression which can be specified via the argument loss.

The accompanying figure shows the results of applying GradientBoostingRegressor with least squares loss and 500 base learners to the diabetes dataset (sklearn.datasets.load_diabetes). The plot shows the train and test error at each iteration. The train error at each iteration is stored in the train_score_ attribute of the gradient boosting model. The test error at each iteration can be obtained via the staged_predict method, which returns a generator that yields the predictions at each stage. Plots like these can be used to determine the optimal number of trees (i.e. n_estimators) by early stopping.
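The per-iteration train and test errors described above can be collected with a short sketch. It relies on the estimator's default loss (least squares in the releases this sketch was written against, which is an assumption); learning_rate, max_depth, and the train/test split are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 500 base learners, as in the text; other values are illustrative.
reg = GradientBoostingRegressor(n_estimators=500, learning_rate=0.01,
                                max_depth=4, random_state=0)
reg.fit(X_train, y_train)

# Training loss at each boosting iteration.
train_error = reg.train_score_

# staged_predict yields the test predictions after each stage.
test_error = np.array([mean_squared_error(y_test, y_pred)
                       for y_pred in reg.staged_predict(X_test)])

# Iteration with the lowest test error: a candidate for n_estimators.
best_n_estimators = int(np.argmin(test_error)) + 1
print(best_n_estimators)
```

Plotting train_error and test_error against the iteration index gives curves of the kind the paragraph describes.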

Examples

Gradient Boosting regression
Gradient Boosting Out-of-Bag estimates

1. 11. 4. 3. Fitting additional weak-learners¶

Both GradientBoostingRegressor and GradientBoostingClassifier support warm_start=True, which allows you to add more estimators to an already fitted model.
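A small sketch of growing an already fitted model with warm_start is shown here. The synthetic dataset (make_friedman1) and the tree counts are assumptions for illustration.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

est = GradientBoostingRegressor(n_estimators=100, max_depth=1,
                                random_state=0).fit(X, y)
n_before = len(est.estimators_)

# Keep the existing 100 trees and fit 100 additional ones.
est.set_params(n_estimators=200, warm_start=True)
est.fit(X, y)
n_after = len(est.estimators_)
print(n_before, n_after)
```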

1. 11. 4. 4. Controlling the tree size¶

The size of the regression tree base learners defines the level of variable interactions that can be captured by the gradient boosting model. In general, a tree of depth h can capture interactions of order h. There are two ways in which the size of the individual regression trees can be controlled.

If you specify max_depth=h then complete binary trees of depth h will be grown. Such trees will have (at most)

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import make_blobs
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.tree import DecisionTreeClassifier

>>> X, y = make_blobs[n_samples=10000, n_features=10, centers=100,
..     random_state=0]

>>> clf = DecisionTreeClassifier[max_depth=None, min_samples_split=2,
..     random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.98...

>>> clf = RandomForestClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.999...

>>> clf = ExtraTreesClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[] > 0.999
True

219 nút lá và

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import make_blobs
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.tree import DecisionTreeClassifier

>>> X, y = make_blobs[n_samples=10000, n_features=10, centers=100,
..     random_state=0]

>>> clf = DecisionTreeClassifier[max_depth=None, min_samples_split=2,
..     random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.98...

>>> clf = RandomForestClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.999...

>>> clf = ExtraTreesClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[] > 0.999
True

220 nút tách

Alternatively, you can control the tree size by specifying the number of leaf nodes via the parameter max_leaf_nodes. In this case, trees are grown using best-first search, where the nodes with the highest improvement in impurity are expanded first. A tree with max_leaf_nodes=k has k - 1 split nodes and can therefore model interactions of up to order max_leaf_nodes - 1.

We found that max_leaf_nodes=k gives results comparable to max_depth=k-1 but is significantly faster to train, at the expense of a slightly higher training error. The parameter max_leaf_nodes corresponds to the variable J in the chapter on gradient boosting in [Friedman2001] and is related to the parameter interaction.depth in R's gbm package, where max_leaf_nodes == interaction.depth + 1.
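The two ways of bounding tree size described above can be checked on a fitted model. The following is a minimal sketch, assuming a synthetic regression dataset and the illustrative values h = 3 and k = 4 (neither comes from the original text); it only counts the leaves of each fitted tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, chosen only for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)

h, k = 3, 4  # illustrative depth and leaf budget

# Depth-limited trees: each tree has at most 2**h leaves.
by_depth = GradientBoostingRegressor(
    n_estimators=20, max_depth=h, random_state=0
).fit(X, y)

# Leaf-limited trees: each tree has at most k leaves. A tree with k leaves
# cannot be deeper than k - 1, so max_depth=k - 1 does not bind here.
by_leaves = GradientBoostingRegressor(
    n_estimators=20, max_depth=k - 1, max_leaf_nodes=k, random_state=0
).fit(X, y)

leaves_by_depth = [t.get_n_leaves() for t in by_depth.estimators_.ravel()]
leaves_by_leaves = [t.get_n_leaves() for t in by_leaves.estimators_.ravel()]

print("max leaves with max_depth=h:", max(leaves_by_depth), "bound:", 2 ** h)
print("max leaves with max_leaf_nodes=k:", max(leaves_by_leaves), "bound:", k)
```

Timing and accuracy differences between the two settings depend on the data, so they are best measured on your own problem rather than read off this sketch.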

1. 11. 4. 5. Mathematical formulation¶

We first present GBRT for regression, and then detail the classification case.

1. 11. 4. 5. 1. Regression¶

GBRT regressors are additive models whose prediction \(\hat{y}_i\) for a given input \(x_i\) is of the following form:

\[\hat{y}_i = F_M(x_i) = \sum_{m=1}^{M} h_m(x_i)\]

where the \(h_m\) are estimators called weak learners in the context of boosting. Gradient Tree Boosting uses decision tree regressors of fixed size as weak learners. The constant M corresponds to the n_estimators parameter.

Similar to other boosting algorithms, a GBRT is built in a greedy fashion:

\[F_m(x) = F_{m-1}(x) + h_m(x),\]

where the newly added tree \(h_m\) is fitted in order to minimize a sum of losses \(L_m\), given the previous ensemble \(F_{m-1}\):

\[h_m = \arg\min_{h} L_m = \arg\min_{h} \sum_{i=1}^{n} l(y_i, F_{m-1}(x_i) + h(x_i)),\]

where \(l(y_i, F(x_i))\) is defined by the loss parameter, detailed in the next section.

By default, the initial model \(F_{0}\) is chosen as the constant that minimizes the loss: for a least-squares loss, this is the empirical mean of the target values. The initial model can also be specified via the init argument.
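As a small check of the statement about the initial model, the sketch below fits a regressor with the default (squared error) loss and compares the prediction of its fitted init_ attribute with the mean of the training targets. The dataset is synthetic and chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, chosen only for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

est = GradientBoostingRegressor(n_estimators=5, random_state=0).fit(X, y)

# init_ is the fitted estimator that provides F_0. With the default squared
# error loss it is expected to predict the training mean for every input.
f0 = np.ravel(est.init_.predict(X))

print("F_0 prediction:", f0[0])
print("mean of targets:", y.mean())
```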

Using a first-order Taylor approximation, the value of \(l\) can be approximated as follows:

\[l(y_i, F_{m-1}(x_i) + h_m(x_i)) \approx l(y_i, F_{m-1}(x_i)) + h_m(x_i) \left[ \frac{\partial l(y_i, F(x_i))}{\partial F(x_i)} \right]_{F=F_{m - 1}}.\]

Note

Briefly, a first-order Taylor approximation says that \(l(z) \approx l(a) + (z - a) \frac{\partial l}{\partial z}(a)\). Here, \(z\) corresponds to \(F_{m - 1}(x_i) + h_m(x_i)\), and \(a\) corresponds to \(F_{m-1}(x_i)\).

The quantity \(\left[ \frac{\partial l(y_i, F(x_i))}{\partial F(x_i)} \right]_{F=F_{m - 1}}\) is the derivative of the loss with respect to its second parameter, evaluated at \(F_{m-1}(x)\). It is easy to compute for any given \(F_{m - 1}(x_i)\) in closed form since the loss is differentiable. We will denote it by \(g_i\).

Removing the constant terms, we have:

\[h_m \approx \arg\min_{h} \sum_{i=1}^{n} h(x_i) g_i\]

This is minimized if \(h(x_i)\) is fitted to predict a value that is proportional to the negative gradient \(-g_i\). Therefore, at each iteration, the estimator \(h_m\) is fitted to predict the negative gradients of the samples. The gradients are updated at each iteration. This can be seen as a form of gradient descent in a functional space.
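The iteration described above can be sketched from scratch for the squared error loss. This is an illustrative toy loop, not scikit-learn's implementation: it assumes the loss \(l = \frac{1}{2}(y - F)^2\), for which the negative gradient \(-g_i\) is simply the residual \(y_i - F_{m-1}(x_i)\), and it uses depth-3 trees and 20 stages as arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, chosen only for illustration.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

n_stages = 20
F = np.full(y.shape, y.mean())  # F_0: the constant minimizing squared loss
trees = []
train_mse = [np.mean((y - F) ** 2)]

for m in range(n_stages):
    # For l = 0.5 * (y - F)**2, g_i = F(x_i) - y_i, so -g_i is the residual.
    negative_gradient = y - F
    h_m = DecisionTreeRegressor(max_depth=3, random_state=0)
    h_m.fit(X, negative_gradient)
    F = F + h_m.predict(X)  # F_m = F_{m-1} + h_m
    trees.append(h_m)
    train_mse.append(np.mean((y - F) ** 2))

print("training MSE before boosting:", train_mse[0])
print("training MSE after boosting:", train_mse[-1])
```

Because each tree predicts the mean residual of the samples in its leaves, every stage can only lower (or keep) the training error for this particular loss; other losses need the leaf-value correction discussed in the following note.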

Note

For some losses, e.g. 'absolute_error' where the gradients are \(\pm 1\), the values predicted by a fitted \(h_m\) are not accurate enough: the tree can only output integer values. As a result, the leaf values of the tree \(h_m\) are modified once the tree is fitted, such that the leaf values minimize the loss \(L_m\). The update is loss-dependent: for the absolute error loss, the value of a leaf is updated to the median of the samples in that leaf.
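To see why the leaf values are replaced, consider a single leaf. The sketch below uses hypothetical residuals (made up for illustration) and compares the absolute error obtained with the value a tree fitted on \(\pm 1\) gradients would output for that leaf, namely the mean of the signs, against the median of the residuals.

```python
import numpy as np

# Hypothetical residuals y_i - F_{m-1}(x_i) of the samples in one leaf.
residuals = np.array([-3.0, -0.5, 1.5, 2.0, 6.0])


def absolute_loss(value):
    """Sum of absolute errors in the leaf if it predicts `value`."""
    return np.abs(residuals - value).sum()


# A regression tree fitted on the +/-1 negative gradients predicts their mean.
sign_mean = np.sign(residuals).mean()
# The corrected leaf value for the absolute error loss is the median.
median = np.median(residuals)

print("loss with mean of signs:", absolute_loss(sign_mean))
print("loss with median:", absolute_loss(median))
```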

1. 11. 4. 5. 2. Classification¶

Gradient boosting for classification is very similar to the regression case. However, the sum of the trees \(F_M(x_i) = \sum_m h_m(x_i)\) is not homogeneous to a prediction: it cannot be a class, since the trees predict continuous values.

The mapping from the value \(F_M(x_i)\) to a class or a probability is loss-dependent. For the log-loss, the probability that \(x_i\) belongs to the positive class is modeled as \(p(y_i = 1 | x_i) = \sigma(F_M(x_i))\), where \(\sigma\) is the sigmoid function.

For multiclass classification, K trees (for K classes) are built at each of the \(M\) iterations. The probability that \(x_i\) belongs to class k is modeled as a softmax of the \(F_{M,k}(x_i)\) values.

Note that even for a classification task, the \(h_m\) sub-estimator is still a regressor, not a classifier. This is because the sub-estimators are trained to predict (negative) gradients, which are always continuous quantities.
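The binary case can be checked numerically. The following sketch, which assumes a synthetic two-class dataset and the classifier's default loss, applies the sigmoid (scipy.special.expit) to the raw scores returned by decision_function and compares the result with predict_proba. It also prints the type of one sub-estimator.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary problem, chosen only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)

raw = clf.decision_function(X)          # F_M(x_i), a continuous score
proba_pos = clf.predict_proba(X)[:, 1]  # modeled p(y_i = 1 | x_i)

print("sigmoid(F_M) matches predict_proba:", np.allclose(expit(raw), proba_pos))
print("sub-estimator type:", type(clf.estimators_[0, 0]).__name__)
```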

1. 11. 4. 6. Loss Functions¶

The following loss functions are supported and can be specified using the parameter loss:

Regression

Squared error ('squared_error'). The natural choice for regression due to its superior computational properties. The initial model is given by the mean of the target values.

Absolute error ('absolute_error'). A robust loss function for regression. The initial model is given by the median of the target values.

Huber ('huber'). Another robust loss function that combines least squares and least absolute deviation; use alpha to control the sensitivity with regard to outliers (see [Friedman2001] for more details).

Quantile ('quantile'). A loss function for quantile regression. Use 0 < alpha < 1 to specify the quantile. This loss function can be used to create prediction intervals (see Prediction Intervals for Gradient Boosting Regression).

Classification

Binary log-loss ('log_loss'). The binomial negative log-likelihood loss function for binary classification. It provides probability estimates. The initial model is given by the log odds-ratio.

Multi-class log-loss ('log_loss'). The multinomial negative log-likelihood loss function for multi-class classification with n_classes mutually exclusive classes. It provides probability estimates. The initial model is given by the prior probability of each class. At each iteration, n_classes regression trees have to be constructed, which makes GBRT rather inefficient for data sets with a large number of classes.

Exponential loss ('exponential'). The same loss function as AdaBoostClassifier. Less robust to mislabeled examples than 'log_loss'.
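As an illustration of choosing a loss, the quantile loss mentioned above can be used to build a rough prediction band. This is a minimal sketch on a made-up noisy sine curve; the 5% and 95% quantiles and all other settings are arbitrary choices, and the coverage is measured on the training data only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up one-dimensional data: a sine curve with Gaussian noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.5, size=500)

# One model per quantile; alpha selects the quantile being estimated.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0)
lo = lower.fit(X, y).predict(X)
hi = upper.fit(X, y).predict(X)

# Fraction of training targets that fall inside the estimated band.
coverage = np.mean((y >= lo) & (y <= hi))
print(f"training coverage of the 5%-95% band: {coverage:.2f}")
```

Coverage on held-out data is the more meaningful quantity in practice and would normally be checked as well.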

1. 11. 4. 7. Shrinkage via learning rate¶

[Friedman2001] proposed a simple regularization strategy that scales the contribution of each weak learner by a constant factor \(\nu\):

\[F_m(x) = F_{m-1}(x) + \nu h_m(x)\]

The parameter \(\nu\) is also called the learning rate because it scales the step length of the gradient descent procedure; it can be set via the learning_rate parameter.

The parameter learning_rate strongly interacts with the parameter n_estimators, the number of weak learners to fit. Smaller values of learning_rate require larger numbers of weak learners to maintain a constant training error. Empirical evidence suggests that small values of learning_rate favor better test error. [HTF] recommend setting the learning rate to a small constant (e.g. learning_rate <= 0.1) and choosing n_estimators by early stopping. For a more detailed discussion of the interaction between learning_rate and n_estimators, see [R2007].

1. 11. 4. 8. Subsampling¶

[Friedman2002] proposed stochastic gradient boosting, which combines gradient boosting with bootstrap averaging (bagging). At each iteration the base classifier is trained on a fraction subsample of the available training data. The subsample is drawn without replacement. A typical value of subsample is 0.5.
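As a minimal sketch of the setting described above, the snippet below fits a stochastic gradient boosting model with subsample=0.5. The dataset (make_hastie_10_2), the train/test split and the other hyperparameter values are illustrative assumptions, not recommendations from the text.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic binary classification data (an illustrative choice).
X, y = make_hastie_10_2(n_samples=2000, random_state=0)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# subsample=0.5: each boosting stage is fit on half of the training rows,
# drawn without replacement. A small learning_rate adds shrinkage.
clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, subsample=0.5, random_state=0
)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
```

Setting subsample back to 1.0 recovers plain (deterministic) gradient boosting, which makes it easy to compare the two on the same split.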

The figure below illustrates the effect of shrinkage and subsampling on the goodness-of-fit of the model. We can clearly see that shrinkage outperforms no-shrinkage. Subsampling with shrinkage can further increase the accuracy of the model. Subsampling without shrinkage, on the other hand, does poorly.

Another strategy to reduce the variance is to subsample the features, analogous to the random splits in RandomForestClassifier. The number of subsampled features can be controlled via the max_features parameter.

Note

Using a small max_features value can significantly decrease the runtime.

Stochastic gradient boosting allows computing out-of-bag estimates of the test deviance by computing the improvement in deviance on the examples that are not included in the bootstrap sample (i.e. the out-of-bag examples). The improvements are stored in the attribute oob_improvement_. oob_improvement_[i] holds the improvement in terms of the loss on the OOB samples if you add the i-th stage to the current predictions. Out-of-bag estimates can be used for model selection, for example to determine the optimal number of iterations. OOB estimates are usually very pessimistic, so we recommend using cross-validation instead and only using OOB if cross-validation is too time consuming.
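One way to use the out-of-bag improvements for choosing the number of iterations is sketched below. The dataset and hyperparameters are assumptions for illustration, and the variable names cumulative_improvement and best_n_estimators are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=2000, random_state=0)

# OOB improvements are only recorded when subsample < 1.0.
clf = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, subsample=0.5, random_state=0
).fit(X, y)

# oob_improvement_[i] is the OOB loss improvement from adding stage i,
# so the running sum tracks the total estimated improvement so far.
cumulative_improvement = np.cumsum(clf.oob_improvement_)

# The stage count at which the cumulative OOB improvement peaks.
best_n_estimators = int(np.argmax(cumulative_improvement)) + 1
```

Because OOB estimates tend to be pessimistic, the value found this way is best treated as a rough lower-cost stand-in for a cross-validated choice.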

Examples

Gradient Boosting regularization
Gradient Boosting Out-of-Bag estimates
OOB Errors for Random Forests

1. 11. 4. 9. Interpretation with feature importance¶

Individual decision trees can be interpreted easily by visualizing the tree structure. Gradient boosting models, however, comprise hundreds of regression trees, so they cannot be interpreted easily by visual inspection of the individual trees. Fortunately, a number of techniques have been proposed to summarize and interpret gradient boosting models.

Often features do not contribute equally to predicting the target response. When interpreting a model, the first question is usually: what are the important features, and how do they contribute to predicting the target response?

Individual decision trees intrinsically perform feature selection by selecting appropriate split points. This information can be used to measure the importance of each feature: the more often a feature is used in the split points of a tree, the more important that feature is. This notion of importance can be extended to decision tree ensembles by simply averaging the impurity-based feature importance of each tree (see Feature importance evaluation for more details).

The feature importance scores of a fitted gradient boosting model can be accessed via the feature_importances_ property.
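A minimal sketch of reading the impurity-based importances from a fitted model follows. The dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# make_hastie_10_2 produces 10 features (illustrative dataset choice).
X, y = make_hastie_10_2(n_samples=2000, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0
).fit(X, y)

# One non-negative score per feature; the scores are normalized to sum to 1.
importances = clf.feature_importances_
```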

Note that this computation of feature importance is based on entropy, and it is distinct from sklearn.inspection.permutation_importance, which is based on permutation of the features.
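For comparison, permutation-based importances can be computed on held-out data as sketched here. The split, n_repeats=5 and the model settings are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

X, y = make_hastie_10_2(n_samples=2000, random_state=0)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and record the score drop.
result = permutation_importance(
    clf, X_test, y_test, n_repeats=5, random_state=0
)
perm_means = result.importances_mean  # mean score drop per feature
```

Unlike feature_importances_, these values are not normalized and can be negative when shuffling a feature happens to improve the score.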

Examples

Gradient Boosting regression

References

[Friedman2001]

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189-1232.

[Friedman2002]

Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38, 367-378.

[R2007]

G. Ridgeway (2006). Generalized Boosted Models: A guide to the gbm package.

1. 11. 5. Histogram-Based Gradient Boosting¶

Scikit-learn 0.21 introduced two new implementations of gradient boosting trees, namely HistGradientBoostingClassifier and HistGradientBoostingRegressor, inspired by LightGBM (see [LightGBM]).

These histogram-based estimators can be orders of magnitude faster than GradientBoostingClassifier and GradientBoostingRegressor when the number of samples is larger than tens of thousands.

They also have built-in support for missing values, which avoids the need for an imputer.

These fast estimators first bin the input samples X into integer-valued bins (typically 256 bins), which tremendously reduces the number of splitting points to consider and allows the algorithm to leverage integer-based data structures (histograms) instead of relying on sorted continuous values. The API of these estimators is slightly different, and some of the features from GradientBoostingClassifier and GradientBoostingRegressor are not yet supported, for instance some loss functions.
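To make the binning idea concrete, the sketch below maps one continuous feature to small integer bin indices using quantile thresholds. It is a simplified illustration written with NumPy, not scikit-learn's internal implementation; bin_feature and its n_bins argument are hypothetical names.

```python
import numpy as np


def bin_feature(values, n_bins=255):
    """Map one continuous feature to integer bin indices via quantiles."""
    # Interior quantiles give at most n_bins - 1 distinct thresholds.
    quantiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    thresholds = np.unique(np.percentile(values, quantiles))
    # searchsorted returns, for each value, the index of the bin it falls in.
    binned = np.searchsorted(thresholds, values, side="right")
    return binned.astype(np.uint8), thresholds


rng = np.random.RandomState(0)
x = rng.normal(size=10_000)
binned, thresholds = bin_feature(x)
```

After binning, a split search only has to consider the thresholds between bins (at most 254 here) rather than every distinct value among the 10,000 samples.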

Examples

Partial Dependence and Individual Conditional Expectation Plots

1. 11. 5. 1. Usage¶

Most of the parameters are unchanged from GradientBoostingClassifier and GradientBoostingRegressor. One exception is the max_iter parameter, which replaces n_estimators and controls the number of iterations of the boosting process.

Available losses for regression are 'squared_error', 'absolute_error', which is less sensitive to outliers, and 'poisson', which is well suited to modelling counts and frequencies. For classification, 'log_loss' is the only option. For binary classification it uses the binary log loss, also known as binomial deviance or binary cross-entropy. For n_classes >= 3, it uses the multi-class log loss function, with multinomial deviance and categorical cross-entropy as alternative names. The appropriate loss version is selected based on the y passed to fit.

The size of the trees can be controlled through the max_leaf_nodes, max_depth, and min_samples_leaf parameters.

The number of bins used to bin the data is controlled with the max_bins parameter. Using fewer bins acts as a form of regularization. It is generally recommended to use as many bins as possible, which is the default.

The l2_regularization parameter is a regularizer on the loss function and corresponds to \(\lambda\) in equation (2) of [XGBoost].

Note that early stopping is enabled by default if the number of samples is larger than 10,000. The early-stopping behaviour is controlled via the early_stopping, scoring, validation_fraction, n_iter_no_change, and tol parameters. It is possible to early-stop using an arbitrary scorer, or just the training or validation loss. Note that, for technical reasons, using a scorer is significantly slower than using the loss. By default, early stopping is performed if there are at least 10,000 samples in the training set, using the validation loss.

1. 11. 5. 2. Missing values support¶

HistGradientBoostingClassifier and HistGradientBoostingRegressor have built-in support for missing values (NaNs).

During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain. When predicting, samples with missing values are assigned to the left or right child accordingly.

When the missingness pattern is predictive, the splits can be made on whether the feature value is missing or not.

If no missing values were encountered for a given feature during training, then samples with missing values are mapped to whichever child has the most samples.

1. 11. 5. 3. Sample weight support¶

HistGradientBoostingClassifier and HistGradientBoostingRegressor support sample weights during fit.

The following toy example illustrates how the model ignores samples whose sample weight is zero.

As you can see, [1, 0] is comfortably classified as 1, since the first two samples are ignored due to their sample weights.

Implementation detail: taking sample weights into account amounts to multiplying the gradients (and the hessians) by the sample weights. Note that the binning stage (specifically the quantiles computation) does not take the weights into account.
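
The effect of weighting gradients and hessians can be sketched for a single leaf under a squared-error loss. This is only an illustration: the helper name leaf_value and the Newton-step formula -sum(g) / sum(h) are assumptions made for the sketch, not the library's code.

```python
def leaf_value(y_true, y_pred, sample_weight):
    """Newton-step value of one leaf for the loss 0.5 * (pred - y) ** 2.

    The per-sample gradient is (pred - y) and the hessian is 1; both are
    multiplied by the sample weight before being summed.
    """
    grad_sum = 0.0
    hess_sum = 0.0
    for y, pred, w in zip(y_true, y_pred, sample_weight):
        grad_sum += w * (pred - y)
        hess_sum += w * 1.0
    if hess_sum == 0.0:
        # Every sample in the leaf has zero weight: no update.
        return 0.0
    return -grad_sum / hess_sum


# Samples with zero weight contribute nothing to the update.
weighted = leaf_value([0, 0, 1], [0.5, 0.5, 0.5], [0, 0, 1])
unweighted = leaf_value([0, 0, 1], [0.5, 0.5, 0.5], [1, 1, 1])
```

With the first two weights set to zero, the leaf update is driven only by the third sample, which mirrors the behaviour of the toy example above.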

1.11.5.4. Categorical features support

HistGradientBoostingClassifier and HistGradientBoostingRegressor have native support for categorical features: they can consider splits on non-ordered, categorical data.

For datasets with categorical features, using the native categorical support is often better than relying on one-hot encoding (OneHotEncoder), because one-hot encoding requires more tree depth to achieve equivalent splits. It is also usually better to rely on the native categorical support than to treat categorical features as continuous (ordinal), which is what happens with ordinal-encoded categorical data, since categories are nominal quantities where order does not matter.

To enable categorical support, a boolean mask can be passed to the categorical_features parameter, indicating which feature is categorical. In the following, the first feature will be treated as categorical and the second feature as numerical:

>>> gbdt = HistGradientBoostingClassifier(categorical_features=[True, False])

Equivalently, one can pass a list of integers indicating the indices of the categorical features:

>>> gbdt = HistGradientBoostingClassifier(categorical_features=[0])

The cardinality of each categorical feature should be less than the max_bins parameter, and each categorical feature is expected to be encoded in [0, max_bins - 1]. To that end, it might be useful to pre-process the data with an OrdinalEncoder, as done in Categorical Feature Support in Gradient Boosting.

If there are missing values during training, the missing values will be treated as a proper category. If there are no missing values during training, then at prediction time, missing values are mapped to the child node that has the most samples (just like for continuous features). When predicting, categories that were not seen during fit time will be treated as missing values.

Split finding with categorical features: The canonical way of considering categorical splits in a tree is to consider all of the \(2^{K - 1} - 1\) partitions, where \(K\) is the number of categories. This can quickly become prohibitive when \(K\) is large. Fortunately, since gradient boosting trees are always regression trees (even for classification problems), there exists a faster strategy that can yield equivalent splits. First, the categories of a feature are sorted according to the variance of the target, for each category. Once the categories are sorted, one can consider continuous partitions, i.e. treat the categories as if they were ordered continuous values (see Fisher [Fisher1958] for a formal proof). As a result, only \(K - 1\) splits need to be considered instead of \(2^{K - 1} - 1\). The initial sorting is a \(\mathcal{O}(K \log(K))\) operation, leading to a total complexity of \(\mathcal{O}(K \log(K) + K)\), instead of \(\mathcal{O}(2^K)\).
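
The difference between the exhaustive search and the sorted scan can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, the split criterion is the sum of squared errors, and categories are ordered by their mean target, a classical ordering for squared-error splits.

```python
from itertools import combinations


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(groups, left):
    """Total SSE when the categories in `left` go left, the rest go right."""
    left_vals = [v for c, vals in groups.items() if c in left for v in vals]
    right_vals = [v for c, vals in groups.items() if c not in left for v in vals]
    return sse(left_vals) + sse(right_vals)


def best_split_exhaustive(groups):
    """Try all 2^(K-1) - 1 partitions of the K categories."""
    cats = sorted(groups)
    first, rest = cats[0], cats[1:]
    best_cost, n_tried = float("inf"), 0
    # Keeping `first` on the left avoids counting mirrored partitions twice.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            n_tried += 1
            best_cost = min(best_cost, split_cost(groups, {first, *extra}))
    return best_cost, n_tried


def best_split_sorted(groups):
    """Sort the categories once, then try only the K - 1 contiguous splits."""
    ordered = sorted(groups, key=lambda c: sum(groups[c]) / len(groups[c]))
    best_cost, n_tried = float("inf"), 0
    for i in range(1, len(ordered)):
        n_tried += 1
        best_cost = min(best_cost, split_cost(groups, set(ordered[:i])))
    return best_cost, n_tried


# Hypothetical target values grouped by category.
groups = {"a": [1.0, 1.2], "b": [5.0, 5.5, 4.8], "c": [0.9], "d": [3.0, 3.3]}
cost_all, tried_all = best_split_exhaustive(groups)
cost_sorted, tried_sorted = best_split_sorted(groups)
```

On this toy data both searches reach the same best cost, while the sorted scan evaluates 3 candidate splits instead of 7.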

Examples

Categorical Feature Support in Gradient Boosting

1.11.5.5. Monotonic constraints

Depending on the problem at hand, you may have prior knowledge indicating that a given feature should in general have a positive (or negative) effect on the target value. For example, all else being equal, a higher credit score should increase the probability of getting approved for a loan. Monotonic constraints allow you to incorporate such prior knowledge into the model.

For a predictor \(F\) with two features:

a monotonic increase constraint is a constraint of the form:
\[x_1 \leq x_1' \implies F(x_1, x_2) \leq F(x_1', x_2)\]
a monotonic decrease constraint is a constraint of the form:
\[x_1 \leq x_1' \implies F(x_1, x_2) \geq F(x_1', x_2)\]
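
The definitions above can be checked numerically on a grid of points. The helper below is a hypothetical sketch (its name and the two toy predictors are assumptions for illustration); it tests the increase constraint on the first feature while holding the second one fixed.

```python
def is_monotonic_increasing(F, x1_values, x2_values):
    """Check x1 <= x1' implies F(x1, x2) <= F(x1', x2) on the given grid."""
    xs = sorted(x1_values)
    for x2 in x2_values:
        outputs = [F(x1, x2) for x1 in xs]
        # Comparing consecutive pairs is enough because <= is transitive.
        if any(a > b for a, b in zip(outputs, outputs[1:])):
            return False
    return True


def increasing_in_x1(x1, x2):
    return 2 * x1 + x2 ** 2


def decreasing_in_x1(x1, x2):
    return -x1 + x2


grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
ok = is_monotonic_increasing(increasing_in_x1, grid, grid)
not_ok = is_monotonic_increasing(decreasing_in_x1, grid, grid)
```

A decrease constraint can be checked with the same helper by negating the predictor's output.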

You can specify a monotonic constraint on each feature using the monotonic_cst parameter. For each feature, a value of 0 indicates no constraint, while 1 and -1 indicate a monotonic increase and monotonic decrease constraint, respectively:

>>> from sklearn.ensemble import HistGradientBoostingRegressor

... # monotonic increase, monotonic decrease, and no constraint on the 3 features
>>> gbdt = HistGradientBoostingRegressor(monotonic_cst=[1, -1, 0])

In a binary classification context, imposing a monotonic increase (decrease) constraint means that higher values of the feature are supposed to have a positive (negative) effect on the probability of samples to belong to the positive class.

Nevertheless, monotonic constraints only marginally constrain feature effects on the output. For instance, monotonic increase and decrease constraints cannot be used to enforce the following modelling constraint:

\[x_1 \leq x_1' \implies F(x_1, x_2) \leq F(x_1', x_2')\]

Also, monotonic constraints are not supported for multiclass classification.

Note

Since categories are unordered quantities, it is not possible to enforce monotonic constraints on categorical features.

Examples

Monotonic Constraints

1.11.5.6. Interaction constraints

A priori, the histogram gradient boosting trees are allowed to use any feature to split a node into child nodes. This creates so-called interactions between features, i.e. the use of different features as splits along a branch. Sometimes, one wants to restrict the possible interactions, see [Mayer2022]. This can be done with the parameter interaction_cst, where one can specify the indices of features that are allowed to interact. For instance, with 3 features in total, interaction_cst=[{0}, {1}, {2}] forbids all interactions. The constraints [{0, 1}, {1, 2}] specify two groups of possibly interacting features. Features 0 and 1 may interact with each other, as well as features 1 and 2. But note that features 0 and 2 are forbidden to interact. The following depicts a tree and the possible splits of the tree:

   1      <- Both constraint groups could be applied from now on
  / \
 1   2    <- Left split still fulfills both constraint groups.
/ \ / \      Right split at feature 2 has only group {1, 2} from now on.

LightGBM uses the same logic for overlapping groups.

Note that features not listed in interaction_cst are automatically assigned an interaction group for themselves. With again 3 features, this means that [{0}] is equivalent to [{0}, {1, 2}].
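
The group logic described in this section can be sketched in plain Python. The two helpers below are hypothetical and only mirror the rules stated in the text (unlisted features form their own group; a branch may only keep using groups that contain every feature already split on). They are not the library's implementation.

```python
def normalize_groups(groups, n_features):
    """Put every feature that is not listed into one extra group."""
    groups = [set(g) for g in groups]
    listed = set().union(*groups)
    rest = set(range(n_features)) - listed
    if rest:
        groups.append(rest)
    return groups


def allowed_features(groups, path):
    """Features that may be split on after the splits in `path`.

    `path` lists the features already used from the root to this node.
    Only groups containing all of them remain applicable.
    """
    active = [g for g in groups if all(f in g for f in path)]
    return set().union(*active)


groups = [{0, 1}, {1, 2}]
at_root = allowed_features(groups, [])
after_1 = allowed_features(groups, [1])
after_1_then_2 = allowed_features(groups, [1, 2])
after_0 = allowed_features(groups, [0])
```

After a split on feature 1 both groups still apply, whereas a further split on feature 2 leaves only the group {1, 2}. A branch that starts with feature 0 can never reach feature 2.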

References

[Mayer2022]

M. Mayer, S.C. Bourassa, M. Hoesli, and D.F. Scognamiglio. 2022. Machine Learning Applications to Land and Structure Valuation. Journal of Risk and Financial Management 15, no. 5: 193

1.11.5.7. Low-level parallelism

HistGradientBoostingClassifier and HistGradientBoostingRegressor have implementations that use OpenMP for parallelization through Cython. For more details on how to control the number of threads, please refer to our Parallelism notes.

The following parts are parallelized:

mapping samples from real values to integer-valued bins (finding the bin thresholds is however sequential)
building histograms is parallelized over features
finding the best split point at a node is parallelized over features
during fit, mapping samples into the left and right children is parallelized over samples
gradient and hessians computations are parallelized over samples
predicting is parallelized over samples

1.11.5.8. Why it's faster

The bottleneck of a gradient boosting procedure is building the decision trees. Building a traditional decision tree (as in the other GBDTs, GradientBoostingClassifier and GradientBoostingRegressor) requires sorting the samples at each node (for each feature). Sorting is needed so that the potential gain of a split point can be computed efficiently. Splitting a single node thus has a complexity of \(\mathcal{O}(n_\text{features} \times n \log(n))\), where \(n\) is the number of samples at the node.

HistGradientBoostingClassifier and HistGradientBoostingRegressor, in contrast, do not require sorting the feature values and instead use a data structure called a histogram, where the samples are implicitly ordered. Building a histogram has a \(\mathcal{O}(n)\) complexity, so the node splitting procedure has a \(\mathcal{O}(n_\text{features} \times n)\) complexity, much smaller than the previous one. In addition, instead of considering \(n\) split points, we here consider only max_bins split points, which is much smaller.

In order to build histograms, the input data X needs to be binned into integer-valued bins. This binning procedure does require sorting the feature values, but it only happens once at the very beginning of the boosting process (not at each node, as in GradientBoostingClassifier and GradientBoostingRegressor).

Finally, many parts of the implementation of HistGradientBoostingClassifier and HistGradientBoostingRegressor are parallelized.
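
The histogram idea can be sketched for a single feature in plain Python. This is an illustrative sketch under stated assumptions, not the library's code: all helper names are hypothetical, the bin thresholds are simple evenly spaced order statistics, and the split score is the squared-error criterion (maximizing sum² / count on each side is equivalent to minimizing the sum of squared errors).

```python
from bisect import bisect_right


def make_thresholds(values, max_bins):
    """Sort the feature once and pick at most max_bins - 1 cut points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(cuts)


def to_bins(values, thresholds):
    """Map each real value to an integer bin index."""
    return [bisect_right(thresholds, v) for v in values]


def build_histogram(bins, targets, n_bins):
    """One O(n) pass: per-bin sum of targets and per-bin sample count."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, t in zip(bins, targets):
        sums[b] += t
        counts[b] += 1
    return sums, counts


def best_split_bin(sums, counts):
    """Scan the bins only; samples with bin index <= b go to the left child."""
    total_sum, total_count = sum(sums), sum(counts)
    left_sum, left_count = 0.0, 0
    best_bin, best_score = None, float("-inf")
    for b in range(len(sums) - 1):
        left_sum += sums[b]
        left_count += counts[b]
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        right_sum = total_sum - left_sum
        score = left_sum ** 2 / left_count + right_sum ** 2 / right_count
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin


values = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
targets = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = make_thresholds(values, max_bins=4)   # sorting happens once here
bins = to_bins(values, thresholds)
sums, counts = build_histogram(bins, targets, len(thresholds) + 1)
split = best_split_bin(sums, counts)
```

The sort is paid once when the thresholds are computed. Each node then needs only a linear pass to fill the histogram and a scan over the bins, rather than a scan over all sample values.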

References

[XGBoost]

Tianqi Chen, Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System”

[LightGBM]

Ke et al., “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”

[Fisher1958]

Fisher, W.D. (1958). “On Grouping for Maximum Homogeneity”, Journal of the American Statistical Association, 53, 789-798

1.11.6. Voting Classifier

The idea behind the VotingClassifier is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.

1.11.6.1. Majority Class Labels (Majority/Hard Voting)

In majority voting, the predicted class label for a particular sample is the class label that represents the majority (mode) of the class labels predicted by each individual classifier.

E.g., if the prediction for a given sample is

classifier 1 -> class 1
classifier 2 -> class 1
classifier 3 -> class 2

the VotingClassifier (with voting='hard') would classify the sample as “class 1” based on the majority class label.

In the case of a tie, the VotingClassifier will select the class based on the ascending sort order. E.g., in the following scenario

classifier 1 -> class 2
classifier 2 -> class 1

the class label 1 will be assigned to the sample.
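The majority rule and its tie-breaking behaviour can be reproduced with a few lines of NumPy. This is only an illustrative sketch: the helper name majority_vote is hypothetical and not part of scikit-learn, and it assumes the labels are non-negative integers. It relies on np.argmax returning the first index that holds the maximum, which yields the smallest class label on a tie.

```python
import numpy as np


def majority_vote(predictions):
    """Return the most frequent non-negative integer label (hypothetical helper).

    Ties are resolved in favour of the smallest label, because
    np.argmax returns the first index holding the maximum count.
    """
    counts = np.bincount(np.asarray(predictions))
    return int(np.argmax(counts))


# Three classifiers voting 1, 1, 2: class 1 holds the majority.
clear_winner = majority_vote([1, 1, 2])

# Two classifiers voting 2 and 1: a tie, resolved to the smaller label.
tie_winner = majority_vote([2, 1])

print(clear_winner, tie_winner)
```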

1.11.6.2. Usage

A majority rule classifier is fitted like any other scikit-learn estimator: a VotingClassifier is constructed with voting='hard' from a list of named classifiers, and fit is then called on the training data.
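A minimal sketch of fitting a majority rule classifier is given here. The choice of the iris dataset, the three base classifiers, their names ("lr", "rf", "gnb") and their hyperparameters are assumptions made for illustration; scores are printed rather than quoted because they depend on the installed scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

clf1 = LogisticRegression(max_iter=1000, random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

# voting="hard" selects the majority rule over predicted class labels.
eclf = VotingClassifier(
    estimators=[("lr", clf1), ("rf", clf2), ("gnb", clf3)],
    voting="hard",
)

# Compare each base classifier with the ensemble using 5-fold cross-validation.
for clf, label in zip(
    [clf1, clf2, clf3, eclf],
    ["Logistic Regression", "Random Forest", "naive Bayes", "Ensemble"],
):
    scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
    print(f"Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f}) [{label}]")
```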

1.11.6.3. Weighted Average Probabilities (Soft Voting)

In contrast to majority voting (hard voting), soft voting returns the class label as the argmax of the sum of predicted probabilities.

Specific weights can be assigned to each classifier via the weights parameter. When weights are provided, the predicted class probabilities for each classifier are collected, multiplied by the classifier weight, and averaged. The final class label is then derived from the class label with the highest average probability.

To illustrate this with a simple example, let's assume we have 3 classifiers and a 3-class classification problem where we assign equal weights to all classifiers: w1=1, w2=1, w3=1.

The weighted average probabilities for a sample would then be calculated as follows:

classifier          class 1     class 2     class 3
classifier 1        w1 * 0.2    w1 * 0.5    w1 * 0.3
classifier 2        w2 * 0.6    w2 * 0.3    w2 * 0.1
classifier 3        w3 * 0.3    w3 * 0.4    w3 * 0.3
weighted average    0.37        0.4         0.23

Here, the predicted class label is 2, since it has the highest average probability.
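The arithmetic of this worked example can be checked with NumPy. The sketch below only recomputes the numbers from the table; the variable names are chosen for illustration.

```python
import numpy as np

# Rows: classifiers 1-3; columns: predicted probabilities for classes 1-3.
probas = np.array([
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
])
weights = np.array([1, 1, 1])  # w1, w2, w3

# Weighted average over classifiers (axis 0), giving one value per class.
avg = np.average(probas, axis=0, weights=weights)

# argmax is zero-based, so add 1 to recover the class label used in the text.
predicted_class = int(np.argmax(avg)) + 1

print(np.round(avg, 2), predicted_class)
```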

The decision regions may change when a soft VotingClassifier is built from a linear Support Vector Machine, a Decision Tree, and a K-nearest neighbor classifier, because the ensemble averages the class probabilities of the three models.
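A soft-voting ensemble of a linear Support Vector Machine, a Decision Tree and a K-nearest neighbor classifier could be set up roughly as follows. The dataset, the two selected features, the hyperparameters and the weights [2, 1, 2] are assumptions for illustration, and the plotting of decision regions is left out.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, [0, 2]]  # two features, convenient for plotting regions
y = iris.target

clf1 = DecisionTreeClassifier(max_depth=4, random_state=0)
clf2 = KNeighborsClassifier(n_neighbors=7)
# probability=True is required so that SVC exposes predict_proba.
clf3 = SVC(kernel="linear", probability=True, random_state=0)

eclf = VotingClassifier(
    estimators=[("dt", clf1), ("knn", clf2), ("svc", clf3)],
    voting="soft",
    weights=[2, 1, 2],
)

# Fit the base models and the ensemble on the same data.
for clf in (clf1, clf2, clf3, eclf):
    clf.fit(X, y)

# Weighted average of the class probabilities for the first sample.
proba = eclf.predict_proba(X[:1])
print(proba)
```

Evaluating predict_proba of each fitted model on a grid of points is one way to compare the resulting decision regions.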

1.11.6.4. Using the VotingClassifier with GridSearchCV

The VotingClassifier can also be used together with GridSearchCV in order to tune the hyperparameters of the individual estimators.
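A possible way to tune the individual estimators through GridSearchCV is sketched below. The estimator names and the parameter grid are assumptions for illustration; nested parameters are addressed with the usual scikit-learn "name__parameter" convention.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

eclf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000, random_state=1)),
        ("rf", RandomForestClassifier(random_state=1)),
        ("gnb", GaussianNB()),
    ],
    voting="soft",
)

# Keys follow the "<estimator name>__<parameter>" convention.
params = {"lr__C": [1.0, 100.0], "rf__n_estimators": [10, 50]}

grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
grid = grid.fit(X, y)

print(grid.best_params_)
```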

1.11.6.5. Usage

In order to predict the class labels based on the predicted class probabilities, the scikit-learn estimators in the VotingClassifier must support the predict_proba method.
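The requirement on predict_proba can be checked before building a soft-voting ensemble. The following is a sketch under assumed choices of dataset and base classifiers; the supports_proba flag is a hypothetical convenience, not a scikit-learn feature.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

estimators = [
    ("lr", LogisticRegression(max_iter=1000, random_state=1)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
    ("gnb", GaussianNB()),
]

# Soft voting averages predict_proba outputs, so check support up front.
supports_proba = all(hasattr(est, "predict_proba") for _, est in estimators)

eclf = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)
labels = eclf.predict(X[:5])
print(supports_proba, labels)
```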

Optionally, weights can be provided for the individual classifiers.
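As a sketch of how weights enter the prediction, the snippet below passes weights=[2, 5, 1] (an arbitrary choice for illustration) and compares the ensemble probabilities with a manual weighted average over the fitted base estimators stored in estimators_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
weights = [2, 5, 1]  # one weight per classifier, in the same order

eclf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000, random_state=1)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("gnb", GaussianNB()),
    ],
    voting="soft",
    weights=weights,
)
eclf.fit(X, y)

proba = eclf.predict_proba(X[:3])

# Recompute the weighted average from the fitted base estimators.
manual = np.average(
    [est.predict_proba(X[:3]) for est in eclf.estimators_],
    axis=0,
    weights=weights,
)
print(np.allclose(proba, manual))
```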

1.11.7. Voting Regressor

The idea behind the VotingRegressor is to combine conceptually different machine learning regressors and return the average predicted values. Such a regressor can be useful for a set of equally well performing models in order to balance out their individual weaknesses.

1.11.7.1. Usage

A VotingRegressor is fitted like any other scikit-learn estimator: it is constructed from a list of named regressors, and fit is then called on the training data.
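A minimal sketch of fitting a VotingRegressor follows. The diabetes dataset and the three base regressors are assumptions for illustration; the last lines check that, without weights, the ensemble prediction equals the mean of the fitted base regressors' predictions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

reg1 = GradientBoostingRegressor(random_state=1)
reg2 = RandomForestRegressor(n_estimators=50, random_state=1)
reg3 = LinearRegression()

ereg = VotingRegressor(estimators=[("gb", reg1), ("rf", reg2), ("lr", reg3)])
ereg = ereg.fit(X, y)

# Ensemble prediction versus the plain mean of the base predictions.
pred = ereg.predict(X[:5])
manual = np.mean([est.predict(X[:5]) for est in ereg.estimators_], axis=0)
print(np.allclose(pred, manual))
```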

Examples

Plot individual and voting regression predictions

1.11.8. Stacked generalization

Stacked generalization is a method for combining estimators to reduce their biases [W1992] [HTF]. More precisely, the predictions of each individual estimator are stacked together and used as input to a final estimator to compute the prediction. This final estimator is trained through cross-validation.

The StackingClassifier and StackingRegressor provide such strategies, which can be applied to classification and regression problems.

The estimators parameter corresponds to the list of the estimators which are stacked together in parallel on the input data. It should be given as a list of names and estimators.
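The expected shape of the estimators argument can be sketched as a list of (name, estimator) pairs. The three regressors and their names below are assumptions chosen for illustration.

```python
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.neighbors import KNeighborsRegressor

# Each entry is a (name, estimator) pair; the names identify the estimators.
estimators = [
    ("ridge", RidgeCV()),
    ("lasso", LassoCV(random_state=42)),
    ("knr", KNeighborsRegressor(n_neighbors=20, metric="euclidean")),
]

names = [name for name, _ in estimators]
print(names)
```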

The final_estimator will use the predictions of the estimators as input. It needs to be a classifier or a regressor when using StackingClassifier or StackingRegressor, respectively.
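For the regression case, pairing the base estimators with a final estimator might look like the sketch below. The GradientBoostingRegressor and its hyperparameters are an assumed choice; any regressor could serve as final_estimator of a StackingRegressor.

```python
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.neighbors import KNeighborsRegressor

estimators = [
    ("ridge", RidgeCV()),
    ("lasso", LassoCV(random_state=42)),
    ("knr", KNeighborsRegressor(n_neighbors=20, metric="euclidean")),
]

# The final estimator is a regressor because StackingRegressor is used.
final_estimator = GradientBoostingRegressor(
    n_estimators=25,
    subsample=0.5,
    min_samples_leaf=25,
    max_features=1,
    random_state=42,
)

reg = StackingRegressor(estimators=estimators, final_estimator=final_estimator)
print(reg)
```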

To train the estimators and final_estimator, the fit method needs to be called on the training data.
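Training a stacked model then comes down to a single call to fit. The sketch below assumes the diabetes dataset, a train/test split, and the same illustrative base and final estimators; none of these choices come from the text.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("lasso", LassoCV(random_state=42)),
        ("knr", KNeighborsRegressor(n_neighbors=20, metric="euclidean")),
    ],
    final_estimator=GradientBoostingRegressor(n_estimators=25, random_state=42),
)

# fit trains both the base estimators and the final estimator.
reg.fit(X_train, y_train)

y_pred = reg.predict(X_test)
print(y_pred[:3])
```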

During training, the estimators are fitted on the whole training data X_train. They will be used when calling predict or predict_proba. To generalize and avoid over-fitting, the final_estimator is trained on out-samples using

>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.datasets import make_blobs
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.tree import DecisionTreeClassifier

>>> X, y = make_blobs[n_samples=10000, n_features=10, centers=100,
..     random_state=0]

>>> clf = DecisionTreeClassifier[max_depth=None, min_samples_split=2,
..     random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.98...

>>> clf = RandomForestClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[]
0.999...

>>> clf = ExtraTreesClassifier[n_estimators=10, max_depth=None,
..     min_samples_split=2, random_state=0]
>>> scores = cross_val_score[clf, X, y, cv=5]
>>> scores.mean[] > 0.999
True

48 trong nội bộ
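The out-of-sample training described above can be approximated by hand. The following is only a sketch of the idea, not the library's internal code: the choice of base estimators, the `LinearRegression` final estimator, and the variable names `meta_train` and `meta_test` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative base estimators (an assumption, any regressors would do).
base_estimators = [RidgeCV(), KNeighborsRegressor(n_neighbors=10)]

# Out-of-fold predictions: every training row is predicted by a model
# that was fitted on folds not containing that row.
meta_train = np.column_stack([
    cross_val_predict(est, X_train, y_train, cv=5) for est in base_estimators
])

# The final estimator only ever sees out-of-sample predictions as features.
final_estimator = LinearRegression().fit(meta_train, y_train)

# For prediction, the base estimators are refitted on the whole training set.
for est in base_estimators:
    est.fit(X_train, y_train)
meta_test = np.column_stack([est.predict(X_test) for est in base_estimators])
y_pred = final_estimator.predict(meta_test)
```

Training the final estimator on in-sample predictions instead would let it reward base estimators that merely memorize the training data.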

For `StackingClassifier`, note that the output of the `estimators` is controlled by the parameter `stack_method`, which is called on each estimator. This parameter is either a string naming an estimator method, or `'auto'`, which automatically picks an available method, tested in order of preference: `predict_proba`, `decision_function` and `predict`.
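A small sketch of how the `'auto'` setting resolves per estimator. The dataset and the two base classifiers are assumptions chosen for illustration: `LogisticRegression` offers `predict_proba`, whereas `RidgeClassifier` does not and only offers `decision_function`. The resolved choice can be inspected on the fitted `stack_method_` attribute.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier

X, y = load_iris(return_X_y=True)

estimators = [
    ("logreg", LogisticRegression(max_iter=1000)),  # has predict_proba
    ("ridge", RidgeClassifier()),                   # no predict_proba
]
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="auto",
)
clf.fit(X, y)

# One resolved method name per base estimator, in the order given above.
print(clf.stack_method_)
```

Passing an explicit string such as `stack_method="predict_proba"` instead requires every base estimator to implement that method.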

A `StackingRegressor` and a `StackingClassifier` can be used like any other regressor or classifier, exposing `predict`, `predict_proba`, and `decision_function` methods.
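As a minimal usage sketch, a stacked regressor is fitted and scored like any other estimator. The dataset, the two base estimators, and the `RidgeCV` final estimator are assumptions for illustration; the resulting score depends on the library version and is therefore not quoted here.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base layer: (name, estimator) pairs, as expected by the stacking API.
estimators = [
    ("ridge", RidgeCV()),
    ("knn", KNeighborsRegressor(n_neighbors=20)),
]
reg = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())

# Same fit / predict interface as a single regressor.
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
score = r2_score(y_test, y_pred)
print(f"R2 score: {score:.2f}")
```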

Note that it is also possible to get the output of the stacked `estimators` using the `transform` method.
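A short sketch of inspecting the stacked features, under the same assumed setup as a typical regression example (dataset and estimator choices are illustrative). For a regressor, `transform` returns one column per base estimator.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

estimators = [
    ("ridge", RidgeCV()),
    ("knn", KNeighborsRegressor(n_neighbors=20)),
]
reg = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())
reg.fit(X_train, y_train)

# Each row holds the base estimators' predictions for one sample;
# these are the features the final estimator receives.
stacked_features = reg.transform(X_test[:5])
print(stacked_features.shape)
```

Looking at these columns side by side is a quick way to see where the base estimators disagree.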

In practice, a stacking predictor predicts as well as the best predictor of the base layer, and sometimes even outperforms it by combining the different strengths of these predictors. However, training a stacking predictor is computationally expensive.

Note

For `StackingClassifier`, when using `stack_method_='predict_proba'`, the first column is dropped when the problem is a binary classification problem. Indeed, the two probability columns predicted by each estimator are perfectly collinear.
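The effect of dropping the redundant column can be seen in the width of the `transform` output. This is a sketch on synthetic data; the helper name `stacked_width`, the generated datasets, and the two base classifiers are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def stacked_width(n_classes):
    """Fit a two-estimator stack and return the number of stacked columns."""
    # n_informative=4 leaves enough room for three classes.
    X, y = make_classification(
        n_samples=300, n_informative=4, n_classes=n_classes, random_state=0
    )
    clf = StackingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
    )
    clf.fit(X, y)
    return clf.transform(X).shape[1]


# Binary: one probability column per estimator, since P(class 0) = 1 - P(class 1).
binary_width = stacked_width(2)
# Three classes: all three probability columns are kept for each estimator.
multiclass_width = stacked_width(3)
print(binary_width, multiclass_width)
```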

