AutoGluon Tabular - In-Depth


Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API. To learn how to add your own custom models to the set that AutoGluon trains, tunes, and ensembles, review Adding a custom model to AutoGluon.

This tutorial describes how to exert greater control when using AutoGluon's fit() or predict(). Recall that to maximize predictive performance, you should first try TabularPredictor() and fit() with all default arguments. Then consider the non-default arguments of TabularPredictor(eval_metric=...) and fit(presets=...). Afterwards, you can experiment with the other fit() arguments covered in this in-depth tutorial, such as hyperparameter_tune_kwargs, hyperparameters, num_stack_levels, num_bag_folds, num_bag_sets, etc.

Using the same census data table as in the Predicting Columns in a Table - Quick Start tutorial, we will now predict the occupation of an individual - a multiclass classification problem. Start by importing AutoGluon's TabularPredictor and TabularDataset, and loading the data.

from autogluon.tabular import TabularDataset, TabularPredictor

import numpy as np

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 1000  # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
print(train_data.head())

label = 'occupation'
print("Summary of occupation column: \n", train_data['occupation'].describe())

test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]
test_data_nolabel = test_data.drop(columns=[label])  # delete label column

metric = 'accuracy' # we specify eval-metric just for demo (unnecessary as it's the default)
       age workclass  fnlwgt      education  education-num  \
6118    51   Private   39264   Some-college             10   
23204   58   Private   51662           10th              6   
29590   40   Private  326310   Some-college             10   
18116   37   Private  222450        HS-grad              9   
33964   62   Private  109190      Bachelors             13   

            marital-status        occupation    relationship    race      sex  \
6118    Married-civ-spouse   Exec-managerial            Wife   White   Female   
23204   Married-civ-spouse     Other-service            Wife   White   Female   
29590   Married-civ-spouse      Craft-repair         Husband   White     Male   
18116        Never-married             Sales   Not-in-family   White     Male   
33964   Married-civ-spouse   Exec-managerial         Husband   White     Male   

       capital-gain  capital-loss  hours-per-week  native-country   class  
6118              0             0              40   United-States    >50K  
23204             0             0               8   United-States   <=50K  
29590             0             0              44   United-States   <=50K  
18116             0          2339              40     El-Salvador   <=50K  
33964         15024             0              40   United-States    >50K  
Summary of occupation column: 
 count              1000
unique               15
top        Craft-repair
freq                142
Name: occupation, dtype: object

Specifying hyperparameters and tuning them

Note: We generally don't recommend doing hyperparameter-tuning with AutoGluon. AutoGluon achieves its best performance without hyperparameter tuning, simply by specifying presets="best_quality".

We first demonstrate hyperparameter-tuning and how you can provide your own validation dataset, which AutoGluon internally relies on to: tune hyperparameters, early-stop iterative training, and construct model ensembles. One reason you may specify validation data is when future test data will stem from a different distribution than the training data (and your specified validation data is more representative of the data that may be encountered in the future).

If you don't have a strong reason to provide your own validation dataset, we recommend you omit the tuning_data argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the holdout_frac argument to tell AutoGluon what fraction of the provided training data to hold out for validation.
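
The snippet below is a rough sketch of both options (the split sizes, time limits, and predictor names are illustrative, not recommendations):

# Option 1: let AutoGluon split train_data itself, but hold out 20% for validation instead of the default
predictor_holdout = TabularPredictor(label=label, eval_metric=metric).fit(train_data, holdout_frac=0.2, time_limit=60)

# Option 2: pass an explicit validation set via tuning_data
my_tuning_data = train_data.sample(frac=0.2, random_state=0)
my_train_data = train_data.drop(my_tuning_data.index)
predictor_custom_val = TabularPredictor(label=label, eval_metric=metric).fit(
    my_train_data, tuning_data=my_tuning_data, time_limit=60,
)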

Caution: Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call predict() on a separate dataset (that was never passed to fit()), as we did in the previous Quick-Start tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for demonstration purposes, and you should select more reasonable values in order to obtain high-quality models.

fit() trains neural networks and various types of tree ensembles by default. You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value, or a search space of values to consider during hyperparameter optimization. Hyperparameters which you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

Refer to the Search Space documentation to learn more about AutoGluon search spaces.

from autogluon.common import space

nn_options = {  # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10,  # number of training epochs (controls training time of NN models)
    'learning_rate': space.Real(1e-4, 1e-2, default=5e-4, log=True),  # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': space.Categorical('relu', 'softrelu', 'tanh'),  # activation function used in NN (categorical hyperparameter, default = first entry)
    'dropout_prob': space.Real(0.0, 0.5, default=0.1),  # dropout probability (real-valued hyperparameter)
}

gbm_options = {  # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100,  # number of boosting rounds (controls training time of GBM models)
    'num_leaves': space.Int(lower=26, upper=66, default=36),  # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {  # hyperparameters of each model type
                   'GBM': gbm_options,
                   'NN_TORCH': nn_options,  # NOTE: comment this line out if you get errors on Mac OSX
                  }  # When these keys are missing from hyperparameters dict, no models of that type are trained

time_limit = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto'  # to tune hyperparameters using random search routine with a local scheduler

hyperparameter_tune_kwargs = {  # HPO is not performed unless hyperparameter_tune_kwargs is specified
    'num_trials': num_trials,
    'scheduler' : 'local',
    'searcher': search_strategy,
}  # Refer to TabularPredictor.fit docstring for all valid values

predictor = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    time_limit=time_limit,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
Fitted model: NeuralNetTorch/1c49f759 ...
	0.365	 = Validation score   (accuracy)
	3.32s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/b3b55be6 ...
	0.32	 = Validation score   (accuracy)
	3.63s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/dcd2520d ...
	0.335	 = Validation score   (accuracy)
	3.58s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/5730cde0 ...
	0.355	 = Validation score   (accuracy)
	3.68s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: NeuralNetTorch/c7da0e67 ...
	0.33	 = Validation score   (accuracy)
	3.51s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.91s of the 94.84s of remaining time.
	Ensemble Weights: {'LightGBM/T3': 1.0}
	0.375	 = Validation score   (accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 25.23s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 45336.5 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_205938")

We again demonstrate how to use the trained models to predict on the test data.

y_pred = predictor.predict(test_data_nolabel)
print("Predictions:  ", list(y_pred)[:5])
perf = predictor.evaluate(test_data, auxiliary_metrics=False)
Predictions:   [' Other-service', ' Craft-repair', ' Exec-managerial', ' Sales', ' Other-service']

Use the following to view a summary of what happened during fit(). This command will now show details of the hyperparameter-tuning process for each type of model:

results = predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                      model  score_val eval_metric  pred_time_val  fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0               LightGBM/T3      0.375    accuracy       0.003542  0.357088                0.003542           0.357088            1       True          3
1       WeightedEnsemble_L2      0.375    accuracy       0.004411  0.389364                0.000869           0.032276            2       True         11
2               LightGBM/T5      0.375    accuracy       0.004484  0.519957                0.004484           0.519957            1       True          5
3               LightGBM/T1      0.370    accuracy       0.003504  0.721791                0.003504           0.721791            1       True          1
4   NeuralNetTorch/1c49f759      0.365    accuracy       0.010307  3.324137                0.010307           3.324137            1       True          6
5               LightGBM/T4      0.360    accuracy       0.005873  0.599082                0.005873           0.599082            1       True          4
6               LightGBM/T2      0.355    accuracy       0.004143  0.601759                0.004143           0.601759            1       True          2
7   NeuralNetTorch/5730cde0      0.355    accuracy       0.012197  3.678790                0.012197           3.678790            1       True          9
8   NeuralNetTorch/dcd2520d      0.335    accuracy       0.012314  3.579366                0.012314           3.579366            1       True          8
9   NeuralNetTorch/c7da0e67      0.330    accuracy       0.012733  3.507454                0.012733           3.507454            1       True         10
10  NeuralNetTorch/b3b55be6      0.320    accuracy       0.008916  3.626802                0.008916           3.626802            1       True          7
Number of models trained: 11
Types of models trained:
{'TabularNeuralNetTorchModel', 'WeightedEnsembleModel', 'LGBModel'}
Bagging used: False 
Multi-layer stack-ensembling used: False 
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])  : 6 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...]
('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 2 | ['sex', 'class']
*** End of fit() summary ***
/home/ci/autogluon/core/src/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
  warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')

In the above example, the predictive performance may be poor because we specified very little training to ensure quick runtimes. You can call fit() multiple times while modifying the above settings to better understand how these choices affect performance outcomes. For example: you can increase subsample_size to train using a larger dataset, increase the num_epochs and num_boost_round hyperparameters, and increase the time_limit (which you should do for all code in these tutorials). To see more detailed output during the execution of fit(), you can also pass the argument: verbosity=3.
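
As a rough sketch (all values below are illustrative, not tuned recommendations), a longer run could look like this:

# Illustrative longer run: more data, larger budgets, and verbose logging
train_data_full = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # skip the subsampling step
predictor_long = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data_full,
    time_limit=20*60,  # e.g. ~20 minutes instead of 2
    hyperparameters=hyperparameters,  # consider also raising 'num_epochs' / 'num_boost_round' inside these dicts
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
    verbosity=3,  # print more detailed output during fit()
)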

Model ensembling with stacking/bagging

Beyond hyperparameter-tuning with a correctly-specified evaluation metric, two other methods to boost predictive performance are bagging and stack-ensembling. You'll often see performance improve if you specify num_bag_folds = 5-10, num_stack_levels = 1 in the call to fit(), but this will increase training times and memory/disk usage.

label = 'class'  # Now lets predict the "class" column (binary classification)
test_data_nolabel = test_data.drop(columns=[label])
y_test = test_data[label]
save_path = 'agModels-predictClass'  # folder where to store trained models

predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data,
    num_bag_folds=5, num_bag_sets=1, num_stack_levels=1,
    hyperparameters = {'NN_TORCH': {'num_epochs': 2}, 'GBM': {'num_boost_round': 20}},  # last  argument is just for quick demo here, omit it in real applications
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210003"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.02 GB / 30.95 GB (90.6%)
Disk Space Avail:   211.73 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210003"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28696.56 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{'num_epochs': 2}],
	'GBM': [{'num_boost_round': 20}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 2 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1 ...
	Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.01%)
	0.823	 = Validation score   (accuracy)
	1.84s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ...
	Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.00%)
	0.744	 = Validation score   (accuracy)
	8.11s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
	0.823	 = Validation score   (accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 2 L2 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L2 ...
	Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.01%)
	0.826	 = Validation score   (accuracy)
	1.91s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: NeuralNetTorch_BAG_L2 ...
	Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.00%)
	0.748	 = Validation score   (accuracy)
	8.58s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: WeightedEnsemble_L3 ...
	Ensemble Weights: {'LightGBM_BAG_L2': 0.889, 'LightGBM_BAG_L1': 0.111}
	0.827	 = Validation score   (accuracy)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 27.91s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 2102.4 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (1000 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210003")

You should not provide tuning_data when stacking/bagging, and instead provide all your available data as train_data (which AutoGluon will split in more intelligent ways). num_bag_sets controls how many times the k-fold bagging process is repeated to further reduce variance (increasing this may further boost accuracy but will substantially increase training times, inference latency, and memory/disk usage). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify auto_stack instead (used in the best_quality preset).

# Lets also specify the "balanced_accuracy" metric
predictor = TabularPredictor(label=label, eval_metric='balanced_accuracy', path=save_path).fit(
    train_data, auto_stack=True,
    calibrate_decision_threshold=False,  # Disabling for demonstration in next section
    hyperparameters={'FASTAI': {'num_epochs': 10}, 'GBM': {'num_boost_round': 200}}  # last 2 arguments are for quick demo, omit them in real applications
)
predictor.leaderboard(test_data)
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.67 GB / 30.95 GB (89.4%)
Disk Space Avail:   211.72 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=8, num_bag_sets=1
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28337.69 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'balanced_accuracy'
	To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
	'FASTAI': [{'num_epochs': 10}],
	'GBM': [{'num_boost_round': 200}],
}
Fitting 2 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1 ...
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.01%)
	0.7764	 = Validation score   (balanced_accuracy)
	3.93s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ...
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.00%)
	0.7414	 = Validation score   (balanced_accuracy)
	9.61s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
	0.7764	 = Validation score   (balanced_accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 18.48s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 3264.4 rows/s (125 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass")
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.743784 0.776399 balanced_accuracy 0.209645 0.038194 3.931167 0.209645 0.038194 3.931167 1 True 1
1 WeightedEnsemble_L2 0.743784 0.776399 balanced_accuracy 0.211094 0.038975 3.958286 0.001449 0.000782 0.027119 2 True 3
2 NeuralNetFastAI_BAG_L1 0.724629 0.741368 balanced_accuracy 1.709173 0.073276 9.611355 1.709173 0.073276 9.611355 1 True 2

Often stacking/bagging will produce superior accuracy compared to hyperparameter-tuning, but you may try combining both techniques (note: specifying presets='best_quality' in fit() simply sets auto_stack=True).
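
As a hedged sketch (the time limit is illustrative; hyperparameter_tune_kwargs='auto' requests AutoGluon's default HPO settings), combining the two might look like:

# Illustrative: stack-ensembling (presets='best_quality' sets auto_stack=True) plus hyperparameter tuning
predictor_combo = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data,
    presets='best_quality',
    hyperparameter_tune_kwargs='auto',
    time_limit=3600,
)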

Decision Threshold Calibration

When doing binary classification, adjusting the prediction decision threshold to a value other than 0.5 via calibrate_decision_threshold can dramatically improve scores for metrics such as "f1" and "balanced_accuracy".

Below is an example of the "balanced_accuracy" scores achieved on the test data with and without calibrating the decision threshold.

print(f'Prior to calibration (predictor.decision_threshold={predictor.decision_threshold}):')
scores = predictor.evaluate(test_data)

calibrated_decision_threshold = predictor.calibrate_decision_threshold()
predictor.set_decision_threshold(calibrated_decision_threshold)

print(f'After calibration (predictor.decision_threshold={predictor.decision_threshold}):')
scores_calibrated = predictor.evaluate(test_data)
Prior to calibration (predictor.decision_threshold=0.5):
After calibration (predictor.decision_threshold=0.25):
Calibrating decision threshold to optimize metric balanced_accuracy | Checking 51 thresholds...
Calibrating decision threshold via fine-grained search | Checking 38 thresholds...
	Base Threshold: 0.500	| val: 0.7764
	Best Threshold: 0.250	| val: 0.7926
Updating predictor.decision_threshold from 0.5 -> 0.25
	This will impact how prediction probabilities are converted to predictions in binary classification.
	Prediction probabilities of the positive class >0.25 will be predicted as the positive class ( >50K). This can significantly impact metric scores.
	You can update this value via `predictor.set_decision_threshold`.
	You can calculate an optimal decision threshold on the validation data via `predictor.calibrate_decision_threshold()`.
for metric_name in scores:
    metric_score = scores[metric_name]
    metric_score_calibrated = scores_calibrated[metric_name]
    decision_threshold = predictor.decision_threshold
    print(f'decision_threshold={decision_threshold:.3f}\t| metric="{metric_name}"'
          f'\n\ttest_score uncalibrated: {metric_score:.4f}'
          f'\n\ttest_score   calibrated: {metric_score_calibrated:.4f}'
          f'\n\ttest_score        delta: {metric_score_calibrated-metric_score:.4f}')
decision_threshold=0.250	| metric="balanced_accuracy"
	test_score uncalibrated: 0.7438
	test_score   calibrated: 0.8120
	test_score        delta: 0.0682
decision_threshold=0.250	| metric="accuracy"
	test_score uncalibrated: 0.8472
	test_score   calibrated: 0.8162
	test_score        delta: -0.0310
decision_threshold=0.250	| metric="mcc"
	test_score uncalibrated: 0.5457
	test_score   calibrated: 0.5654
	test_score        delta: 0.0197
decision_threshold=0.250	| metric="roc_auc"
	test_score uncalibrated: 0.8990
	test_score   calibrated: 0.8990
	test_score        delta: 0.0000
decision_threshold=0.250	| metric="f1"
	test_score uncalibrated: 0.6294
	test_score   calibrated: 0.6749
	test_score        delta: 0.0454
decision_threshold=0.250	| metric="precision"
	test_score uncalibrated: 0.7411
	test_score   calibrated: 0.5814
	test_score        delta: -0.1597
decision_threshold=0.250	| metric="recall"
	test_score uncalibrated: 0.5470
	test_score   calibrated: 0.8041
	test_score        delta: 0.2571

Notice that calibrating for "balanced_accuracy" dramatically improved the "balanced_accuracy" metric score, but it hurt the "accuracy" score. Threshold calibration generally results in a trade-off between the performance of different metrics, and the user should keep this in mind.

We can calibrate for any metric we want to maximize, not just "balanced_accuracy":

predictor.set_decision_threshold(0.5)  # Reset decision threshold
for metric_name in ['f1', 'balanced_accuracy', 'mcc']:
    metric_score = predictor.evaluate(test_data, silent=True)[metric_name]
    calibrated_decision_threshold = predictor.calibrate_decision_threshold(metric=metric_name, verbose=False)
    metric_score_calibrated = predictor.evaluate(
        test_data, decision_threshold=calibrated_decision_threshold, silent=True
    )[metric_name]
    print(f'decision_threshold={calibrated_decision_threshold:.3f}\t| metric="{metric_name}"'
          f'\n\ttest_score uncalibrated: {metric_score:.4f}'
          f'\n\ttest_score   calibrated: {metric_score_calibrated:.4f}'
          f'\n\ttest_score        delta: {metric_score_calibrated-metric_score:.4f}')
decision_threshold=0.500	| metric="f1"
	test_score uncalibrated: 0.6294
	test_score   calibrated: 0.6294
	test_score        delta: 0.0000
decision_threshold=0.250	| metric="balanced_accuracy"
	test_score uncalibrated: 0.7438
	test_score   calibrated: 0.8120
	test_score        delta: 0.0682
decision_threshold=0.500	| metric="mcc"
	test_score uncalibrated: 0.5457
	test_score   calibrated: 0.5457
	test_score        delta: 0.0000
Updating predictor.decision_threshold from 0.25 -> 0.5
	This will impact how prediction probabilities are converted to predictions in binary classification.
	Prediction probabilities of the positive class >0.5 will be predicted as the positive class ( >50K). This can significantly impact metric scores.
	You can update this value via `predictor.set_decision_threshold`.
	You can calculate an optimal decision threshold on the validation data via `predictor.calibrate_decision_threshold()`.

Rather than calibrating the decision threshold after fit, you can have it done automatically during fit by specifying the fit argument predictor.fit(..., calibrate_decision_threshold=True).

Luckily, AutoGluon automatically applies decision threshold calibration when it is beneficial, as the default value is calibrate_decision_threshold="auto". We recommend keeping this value as the default in most cases.
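
A minimal sketch (the predictor name and lack of a time limit are just for illustration) of requesting calibration during fit itself:

# Calibrate the decision threshold at the end of fit() instead of afterwards
predictor_cal = TabularPredictor(label=label, eval_metric='balanced_accuracy').fit(
    train_data,
    calibrate_decision_threshold=True,
)
print(predictor_cal.decision_threshold)  # no longer necessarily 0.5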

Additional usage examples are shown below:

# Will use the decision_threshold specified in `predictor.decision_threshold`, can be set via `predictor.set_decision_threshold`
# y_pred = predictor.predict(test_data)
# y_pred_08 = predictor.predict(test_data, decision_threshold=0.8)  # Specify a specific threshold to use only for this predict

# y_pred_proba = predictor.predict_proba(test_data)
# y_pred = predictor.predict_from_proba(y_pred_proba)  # Identical output to calling .predict(test_data)
# y_pred_08 = predictor.predict_from_proba(y_pred_proba, decision_threshold=0.8)  # Identical output to calling .predict(test_data, decision_threshold=0.8)

Prediction options (inference)

Even if you've started a new Python session since last calling fit(), you can still load a previously trained predictor from disk:

predictor = TabularPredictor.load(save_path)  # `predictor.path` is another way to get the relative path needed to later load predictor.

Above, save_path is the same folder previously passed to TabularPredictor, in which all the trained models are saved. You can train models on one machine and deploy them on another. Simply copy the save_path folder to the new machine and specify its new path in TabularPredictor.load().

To find out the required feature columns to make predictions, call predictor.features():

predictor.features()
['age',
 'workclass',
 'fnlwgt',
 'education',
 'education-num',
 'marital-status',
 'occupation',
 'relationship',
 'race',
 'sex',
 'capital-gain',
 'capital-loss',
 'hours-per-week',
 'native-country']

We can make a prediction on an individual example rather than a full dataset:

datapoint = test_data_nolabel.iloc[[0]]  # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
print(datapoint)
predictor.predict(datapoint)
   age workclass  fnlwgt education  education-num       marital-status  \
0   31   Private  169085      11th              7   Married-civ-spouse   

  occupation relationship    race      sex  capital-gain  capital-loss  \
0      Sales         Wife   White   Female             0             0   

   hours-per-week  native-country  
0              20   United-States
0     <=50K
Name: class, dtype: object

To output predicted class probabilities instead of predicted classes, you can use:

predictor.predict_proba(datapoint)  # returns a DataFrame that shows which probability corresponds to which class
<=50K >50K
0 0.951059 0.048941

By default, predict() and predict_proba() will use the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. Here's how to see which model this is:

predictor.model_best
'WeightedEnsemble_L2'

We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Note that a "model" in AutoGluon may refer to, for example: a single Neural Network, a bagged ensemble of many Neural Network copies trained on different training/validation splits, a weighted ensemble that aggregates the predictions of many other models, or a stacker model that operates on predictions output by other models. This is akin to viewing a Random Forest as one "model" when it is in fact an ensemble of many decision trees.

Before deciding which model to use, let's evaluate all of the models AutoGluon has previously trained on our test data:

predictor.leaderboard(test_data)
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.743784 0.776399 balanced_accuracy 0.148440 0.038194 3.931167 0.148440 0.038194 3.931167 1 True 1
1 WeightedEnsemble_L2 0.743784 0.776399 balanced_accuracy 0.149763 0.038975 3.958286 0.001322 0.000782 0.027119 2 True 3
2 NeuralNetFastAI_BAG_L1 0.724629 0.741368 balanced_accuracy 1.188020 0.073276 9.611355 1.188020 0.073276 9.611355 1 True 2

The leaderboard shows each model's predictive performance on the test data (score_test) and validation data (score_val), along with the time required to produce predictions for the test data (pred_time_test), the time required to produce predictions for the validation data (pred_time_val), and the time required to train only this model (fit_time). Below, we show how to produce a leaderboard without any new data (it just uses the data previously reserved for validation inside fit) and how to display extra information about each model:

predictor.leaderboard(extra_info=True)
model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order ... hyperparameters hyperparameters_fit ag_args_fit features compile_time child_hyperparameters child_hyperparameters_fit child_ag_args_fit ancestors descendants
0 LightGBM_BAG_L1 0.776399 balanced_accuracy 0.038194 3.931167 0.038194 3.931167 1 True 1 ... {'use_orig_features': True, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} {} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} [race, marital-status, capital-gain, hours-per-week, workclass, occupation, relationship, education-num, fnlwgt, capital-loss, sex, native-country, education, age] None {'learning_rate': 0.05, 'num_boost_round': 200} {'num_boost_round': 83} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': ['bool', 'int', 'float', 'category'], 'valid_special_types': None, 'ignored_type_group_special': ['text_ngram', 'text_as_category'], 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None} [] [WeightedEnsemble_L2]
1 WeightedEnsemble_L2 0.776399 balanced_accuracy 0.038975 3.958286 0.000782 0.027119 2 True 3 ... {'use_orig_features': False, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} {} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} [LightGBM_BAG_L1] None {'ensemble_size': 25, 'subsample_size': 1000000} {'ensemble_size': 1} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} [LightGBM_BAG_L1] []
2 NeuralNetFastAI_BAG_L1 0.741368 balanced_accuracy 0.073276 9.611355 0.073276 9.611355 1 True 2 ... {'use_orig_features': True, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} {} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} [race, marital-status, capital-gain, hours-per-week, workclass, occupation, relationship, education-num, fnlwgt, capital-loss, sex, native-country, education, age] None {'layers': None, 'emb_drop': 0.1, 'ps': 0.1, 'bs': 'auto', 'lr': 0.01, 'epochs': 'auto', 'early.stopping.min_delta': 0.0001, 'early.stopping.patience': 20, 'smoothing': 0.0, 'num_epochs': 10} {'epochs': 30, 'best_epoch': 9} {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': ['bool', 'int', 'float', 'category'], 'valid_special_types': None, 'ignored_type_group_special': ['text_ngram', 'text_as_category'], 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None} [] []

3 rows × 32 columns

The expanded leaderboard shows properties like how many features each model uses (num_features), which other models are ancestors whose predictions are required inputs for each model (ancestors), and how much memory each model and all its ancestors would occupy if simultaneously persisted (memory_size_w_ancestors). See the leaderboard documentation for full details.
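
As a quick sketch (assuming the expanded leaderboard exposes the columns named above), you could inspect just those properties:

# Pull a few of the extra_info columns described above into a compact view
lb_extra = predictor.leaderboard(extra_info=True)
print(lb_extra[['model', 'num_features', 'ancestors', 'memory_size_w_ancestors']])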

To show scores for other metrics, you can specify the extra_metrics argument when passing in test_data:

predictor.leaderboard(test_data, extra_metrics=['accuracy', 'balanced_accuracy', 'log_loss'])
model score_test accuracy balanced_accuracy log_loss score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.743784 0.847170 0.743784 -0.334022 0.776399 balanced_accuracy 0.140065 0.038194 3.931167 0.140065 0.038194 3.931167 1 True 1
1 WeightedEnsemble_L2 0.743784 0.847170 0.743784 -0.334022 0.776399 balanced_accuracy 0.141402 0.038975 3.958286 0.001337 0.000782 0.027119 2 True 3
2 NeuralNetFastAI_BAG_L1 0.724629 0.843792 0.724629 -0.343404 0.741368 balanced_accuracy 1.179801 0.073276 9.611355 1.179801 0.073276 9.611355 1 True 2

Notice that the log_loss score is negative. This is because metrics in AutoGluon are always shown in higher_is_better form. This means that metrics such as log_loss and root_mean_squared_error will have their signs flipped, and values will be negative. This is necessary so that the user does not need to know the metric itself to tell whether higher is better when looking at the leaderboard.

One additional caveat: it is possible that log_loss values can be -inf when computed via extra_metrics. This is because the models were not optimized with log_loss in mind during training and may output prediction probabilities of 0 for some class (particularly common with K-Nearest-Neighbors models). Because log_loss gives infinite error when the correct class was assigned 0 probability, this results in a score of -inf. It is therefore not recommended to use log_loss as a secondary metric to gauge model quality. Either use log_loss as the eval_metric, or avoid it altogether.
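
If log_loss is the quantity you actually care about, a minimal sketch (the time limit is illustrative) is to make it the eval_metric from the start rather than a secondary metric:

predictor_logloss = TabularPredictor(label=label, eval_metric='log_loss').fit(train_data, time_limit=60)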

Here's how to specify a particular model to use for prediction instead of AutoGluon's default model-selection:

i = 0  # index of model to use
model_to_use = predictor.model_names()[i]
model_pred = predictor.predict(datapoint, model=model_to_use)
print("Prediction from %s model: %s" % (model_to_use, model_pred.iloc[0]))
Prediction from LightGBM_BAG_L1 model:  <=50K

We can easily access various information about the trained predictor or a particular model:

all_models = predictor.model_names()
model_to_use = all_models[i]
specific_model = predictor._trainer.load_model(model_to_use)

# Objects defined below are dicts of various information (not printed here as they are quite large):
model_info = specific_model.get_info()
predictor_information = predictor.info()

The predictor also remembers which metric predictions should be evaluated with, which can be done with ground truth labels as follows:

y_pred_proba = predictor.predict_proba(test_data_nolabel)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred_proba)

Since the label column remains in the test_data DataFrame, we can instead use the shorthand:

perf = predictor.evaluate(test_data)

Interpretability (feature importance)

To better understand our trained predictor, we can estimate the overall importance of each feature:

predictor.feature_importance(test_data)
Computing feature importance via permutation shuffling for 14 features using 5000 rows with 5 shuffle sets...
	8.34s	= Expected runtime (1.67s per shuffle set)
	5.38s	= Actual runtime (Completed 5 of 5 shuffle sets)
                importance    stddev       p_value  n  p99_high   p99_low
marital-status    0.068704  0.004542  2.279366e-06  5  0.078057  0.059352
capital-gain      0.046431  0.002457  9.369035e-07  5  0.051489  0.041372
education-num     0.042721  0.003485  5.268617e-06  5  0.049898  0.035545
age               0.035115  0.005922  9.348413e-05  5  0.047308  0.022922
occupation        0.033699  0.007890  3.356604e-04  5  0.049945  0.017454
relationship      0.014965  0.003663  3.983866e-04  5  0.022507  0.007423
hours-per-week    0.012270  0.003750  9.287608e-04  5  0.019992  0.004548
capital-loss      0.002217  0.001260  8.531892e-03  5  0.004812 -0.000378
education         0.000319  0.000774  2.045314e-01  5  0.001912 -0.001274
native-country    0.000000  0.000000  5.000000e-01  5  0.000000  0.000000
race             -0.000256  0.000268  9.500382e-01  5  0.000296 -0.000807
sex              -0.000527  0.001303  7.914146e-01  5  0.002156 -0.003210
workclass        -0.001594  0.002576  8.807408e-01  5  0.003709 -0.006898
fnlwgt           -0.004992  0.001524  9.990763e-01  5 -0.001855 -0.008130

Computed via permutation-shuffling, these feature importance scores quantify the drop in predictive performance (of the already-trained predictor) when one column's values are randomly shuffled across rows. The top features in this list contribute the most to AutoGluon's accuracy when predicting the class column. Features with non-positive importance scores hardly contribute to the predictor's accuracy, or may even be actively harmful to include in the data (consider removing these features from your data and calling fit again). These scores facilitate a global understanding of the model's behavior (which features it relies on for all predictions). To get local explanations regarding which features influence a particular prediction, check out the example notebooks for explaining particular AutoGluon predictions using Shapley values.
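
A rough sketch (the threshold of 0 and the time limit are illustrative) of pruning the non-positively-important features and refitting:

# Drop features whose estimated importance is <= 0, then fit a new predictor on the pruned data
importance_df = predictor.feature_importance(test_data)
features_to_drop = importance_df[importance_df['importance'] <= 0].index.tolist()
train_data_pruned = train_data.drop(columns=features_to_drop)
predictor_pruned = TabularPredictor(label=label).fit(train_data_pruned, time_limit=60)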

Before judging whether AutoGluon is more interpretable than other solutions, we recommend reading The Mythos of Model Interpretability by Zachary Lipton, which covers why models that are commonly claimed to be interpretable, such as trees and linear models, are rarely meaningfully more interpretable in practice than more advanced models.

Accelerating inference

We describe multiple ways to reduce the time it takes for AutoGluon to produce predictions.

Before providing code examples, it is important to understand that there are several ways to accelerate inference in AutoGluon. The table below lists the options in order of priority.

Optimization | Inference Speedup | Cost | Notes
refit_full | At least 8x, up to 160x (requires bagging) | -quality, +fit time | Only provides a speedup when bagging is enabled.
persist | Up to 10x for online inference | ++memory usage | No speedup if there is not enough memory to persist the models. The speedup is most relevant for online inference, not batch inference.
infer_limit | Configurable, up to ~50x | -quality (relative to the speedup) | If bagging is enabled, always use refit_full when specifying infer_limit.
distill | Roughly the combined speedup of refit_full and infer_limit set to an extreme value | --quality, ++fit time | Not compatible with refit_full and infer_limit.
Feature pruning | Typically up to 1.5x. More if you are willing to give up a lot of quality. | -quality?, ++fit time | Depends on whether unimportant features exist in the data. Call predictor.feature_importance(test_data) to gauge which features could be removed.
Use faster hardware | Typically up to 3x. Depends on the hardware (ignoring GPUs). | +hardware | For example, an EC2 c6i.2xlarge is ~1.6x faster than an m5.2xlarge for a similar price. Laptops in particular may be slow compared to cloud instances.
Manually tune hyperparameters | Typically up to 2x, assuming infer_limit was already specified. | ---quality?, +++user ML expertise | Can be very complicated and is not recommended. One potential way to get a speedup here is to reduce the number of trees in LightGBM, XGBoost, CatBoost, RandomForest, and ExtraTrees.
Manual data preprocessing | Typically up to 1.2x, assuming all other optimizations were already specified and you are doing online inference. | ++++user ML expertise, ++++user code | Only relevant for online inference. Not recommended, as AutoGluon's default preprocessing is already highly optimized.

If bagging is enabled (num_bag_folds > 0 or num_stack_levels > 0 or using the 'best_quality' preset), the order of inference optimizations to try is:

  1. refit_full

  2. persist

  3. infer_limit

If bagging is not enabled (num_bag_folds = 0, num_stack_levels = 0), the order is:

  1. persist

  2. infer_limit

If following these recommendations still does not result in a fast enough model, you may consider the more advanced options in the table above; a combined sketch of the recommended sequence is given below.
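
As a combined sketch for the bagged predictor trained earlier (not meant to be run mid-tutorial, since it modifies the predictor; each call is covered in more detail in the following sections):

predictor.refit_full()   # collapse each bagged ensemble into a single model fit on all of the data
predictor.persist()      # keep the models required for prediction in memory
preds = predictor.predict(test_data_nolabel)  # subsequent predictions avoid repeated disk loading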

Keeping models in memory

By default, AutoGluon loads models into memory one at a time, and only when they are needed for prediction. This strategy is robust for large stacked/bagged ensembles, but leads to slower prediction times. If you plan to repeatedly make predictions (e.g. on new datapoints one at a time rather than on one large test dataset), you can first specify that all models required for inference should be loaded into memory as follows:

predictor.persist()

num_test = 20
preds = np.array(['']*num_test, dtype='object')
for i in range(num_test):
    datapoint = test_data_nolabel.iloc[[i]]
    pred_numpy = predictor.predict(datapoint, as_pandas=False)
    preds[i] = pred_numpy[0]

perf = predictor.evaluate_predictions(y_test[:num_test], preds, auxiliary_metrics=True)
print("Predictions: ", preds)

predictor.unpersist()  # free memory by clearing models, future predict() calls will load models from disk
Predictions:  [' <=50K' ' <=50K' ' >50K' ' <=50K' ' <=50K' ' >50K' ' >50K' ' >50K'
 ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K'
 ' <=50K' ' >50K' ' >50K' ' <=50K']
Persisting 2 models in memory. Models will require 0.01% of memory.
Unpersisted 2 models: ['WeightedEnsemble_L2', 'LightGBM_BAG_L1']
['WeightedEnsemble_L2', 'LightGBM_BAG_L1']

You can alternatively specify a particular set of models to persist via the models argument of persist(), or simply set models='all' to simultaneously load every single model that was trained during fit.
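
For example, a small sketch (the specific model name is illustrative and depends on what was trained):

predictor.persist(models='all')  # load every trained model into memory
predictor.unpersist()
predictor.persist(models=['WeightedEnsemble_L2'])  # or persist only the models you plan to predict with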

Inference speed as a fit constraint

If you know your latency constraint prior to fitting the predictor, you can specify it explicitly as a fit argument. AutoGluon will then automatically train models in a fashion that attempts to satisfy the constraint.

This constraint has two components: infer_limit and infer_limit_batch_size:

  • infer_limit is the time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, or a throughput of 20 rows per second.

  • infer_limit_batch_size is the number of rows passed at once to predict when calculating the per-row speed. This is very important because infer_limit_batch_size=1 (online inference) is highly suboptimal, as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000.

# At most 0.05 ms per row (20000 rows per second throughput)
infer_limit = 0.00005
# adhere to infer_limit with batches of size 10000 (batch-inference, easier to satisfy infer_limit)
infer_limit_batch_size = 10000
# adhere to infer_limit with batches of size 1 (online-inference, much harder to satisfy infer_limit)
# infer_limit_batch_size = 1  # Note that infer_limit<0.02 when infer_limit_batch_size=1 can be difficult to satisfy.
predictor_infer_limit = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data=train_data,
    time_limit=30,
    infer_limit=infer_limit,
    infer_limit_batch_size=infer_limit_batch_size,
)

# NOTE: If bagging was enabled, it is important to call refit_full at this stage.
#  infer_limit assumes that the user will call refit_full after fit.
# predictor_infer_limit.refit_full()

# NOTE: To align with inference speed calculated during fit, models must be persisted.
predictor_infer_limit.persist()
# Below is an optimized version that only persists the minimum required models for prediction.
# predictor_infer_limit.persist('best')

predictor_infer_limit.leaderboard()
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210115"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.02 GB / 30.95 GB (90.6%)
Disk Space Avail:   211.72 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210115"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    28696.39 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
	1.503μs	= Feature Preprocessing Time (1 row | 10000 batch size)
		Feature Preprocessing requires 3.01% of the overall inference constraint (0.05ms)
		0.048ms inference time budget remaining for models...
Data preprocessing and feature engineering runtime = 0.29s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{}],
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
	'CAT': [{}],
	'XGB': [{}],
	'FASTAI': [{}],
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models, fit_strategy="sequential" ...
Fitting model: KNeighborsUnif ... Training model for up to 29.71s of the 29.71s of remaining time.
	0.725	 = Validation score   (accuracy)
	0.04s	 = Training   runtime
	0.01s	 = Validation runtime
	3.713μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	3.713μs	 = Validation runtime (1 row | 10000 batch size)
	3.713μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	3.713μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: KNeighborsDist ... Training model for up to 29.65s of the 29.65s of remaining time.
	0.71	 = Validation score   (accuracy)
	0.04s	 = Training   runtime
	0.01s	 = Validation runtime
	3.201μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	3.201μs	 = Validation runtime (1 row | 10000 batch size)
	3.201μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	3.201μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBMXT ... Training model for up to 29.59s of the 29.59s of remaining time.
	0.85	 = Validation score   (accuracy)
	0.39s	 = Training   runtime
	0.0s	 = Validation runtime
	1.246μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	1.246μs	 = Validation runtime (1 row | 10000 batch size)
	1.246μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	1.246μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBM ... Training model for up to 29.19s of the 29.19s of remaining time.
	0.84	 = Validation score   (accuracy)
	0.48s	 = Training   runtime
	0.0s	 = Validation runtime
	1.231μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	1.231μs	 = Validation runtime (1 row | 10000 batch size)
	1.231μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	1.231μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: RandomForestGini ... Training model for up to 28.69s of the 28.69s of remaining time.
	0.84	 = Validation score   (accuracy)
	0.77s	 = Training   runtime
	0.06s	 = Validation runtime
	8.754μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	8.754μs	 = Validation runtime (1 row | 10000 batch size)
	8.754μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	8.754μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: RandomForestEntr ... Training model for up to 27.85s of the 27.84s of remaining time.
	0.835	 = Validation score   (accuracy)
	0.68s	 = Training   runtime
	0.06s	 = Validation runtime
	8.729μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	8.729μs	 = Validation runtime (1 row | 10000 batch size)
	8.729μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	8.729μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: CatBoost ... Training model for up to 27.09s of the 27.09s of remaining time.
	0.86	 = Validation score   (accuracy)
	1.95s	 = Training   runtime
	0.0s	 = Validation runtime
	0.815μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	0.815μs	 = Validation runtime (1 row | 10000 batch size)
	0.815μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	0.815μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: ExtraTreesGini ... Training model for up to 25.13s of the 25.13s of remaining time.
	0.815	 = Validation score   (accuracy)
	0.68s	 = Training   runtime
	0.06s	 = Validation runtime
	8.729μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	8.729μs	 = Validation runtime (1 row | 10000 batch size)
	8.729μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	8.729μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: ExtraTreesEntr ... Training model for up to 24.38s of the 24.38s of remaining time.
	0.82	 = Validation score   (accuracy)
	0.67s	 = Training   runtime
	0.06s	 = Validation runtime
	8.713μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	8.713μs	 = Validation runtime (1 row | 10000 batch size)
	8.713μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	8.713μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: NeuralNetFastAI ... Training model for up to 23.64s of the 23.63s of remaining time.
No improvement since epoch 7: early stopping
	0.84	 = Validation score   (accuracy)
	0.99s	 = Training   runtime
	0.01s	 = Validation runtime
	0.012ms	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	0.012ms	 = Validation runtime (1 row | 10000 batch size)
	0.012ms	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	0.012ms	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: XGBoost ... Training model for up to 22.62s of the 22.62s of remaining time.
	0.855	 = Validation score   (accuracy)
	0.4s	 = Training   runtime
	0.01s	 = Validation runtime
	2.198μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	2.198μs	 = Validation runtime (1 row | 10000 batch size)
	2.198μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	2.198μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: NeuralNetTorch ... Training model for up to 22.21s of the 22.21s of remaining time.
	0.855	 = Validation score   (accuracy)
	3.51s	 = Training   runtime
	0.01s	 = Validation runtime
	4.458μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	4.458μs	 = Validation runtime (1 row | 10000 batch size)
	4.458μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	4.458μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBMLarge ... Training model for up to 18.68s of the 18.68s of remaining time.
	0.795	 = Validation score   (accuracy)
	0.83s	 = Training   runtime
	0.0s	 = Validation runtime
	3.908μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	3.908μs	 = Validation runtime (1 row | 10000 batch size)
	3.908μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	3.908μs	 = Validation runtime (1 row | 10000 batch size | REFIT)
Removing 5/13 base models to satisfy inference constraint (constraint=0.046ms) ...
	0.068ms	-> 0.065ms	(KNeighborsDist)
	0.065ms	-> 0.061ms	(KNeighborsUnif)
	0.061ms	-> 0.057ms	(LightGBMLarge)
	0.057ms	-> 0.049ms	(ExtraTreesGini)
	0.049ms	-> 0.04ms	(ExtraTreesEntr)
Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.71s of the 17.79s of remaining time.
	Ensemble Weights: {'RandomForestGini': 0.333, 'CatBoost': 0.333, 'XGBoost': 0.333}
	0.875	 = Validation score   (accuracy)
	0.1s	 = Training   runtime
	0.0s	 = Validation runtime
	0.131μs	 = Validation runtime (1 row | 10000 batch size | MARGINAL)
	0.012ms	 = Validation runtime (1 row | 10000 batch size)
	0.131μs	 = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
	0.012ms	 = Validation runtime (1 row | 10000 batch size | REFIT)
AutoGluon training complete, total runtime = 12.34s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 2909.9 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210115")
Persisting 4 models in memory. Models will require 0.02% of memory.
model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.875 accuracy 0.068731 3.214255 0.000926 0.103268 2 True 14
1 CatBoost 0.860 accuracy 0.003737 1.947394 0.003737 1.947394 1 True 7
2 XGBoost 0.855 accuracy 0.005009 0.395715 0.005009 0.395715 1 True 11
3 NeuralNetTorch 0.855 accuracy 0.009604 3.513811 0.009604 3.513811 1 True 12
4 LightGBMXT 0.850 accuracy 0.003705 0.390207 0.003705 0.390207 1 True 3
5 LightGBM 0.840 accuracy 0.003618 0.483905 0.003618 0.483905 1 True 4
6 NeuralNetFastAI 0.840 accuracy 0.008258 0.992681 0.008258 0.992681 1 True 10
7 RandomForestGini 0.840 accuracy 0.059059 0.767877 0.059059 0.767877 1 True 5
8 RandomForestEntr 0.835 accuracy 0.060193 0.675278 0.060193 0.675278 1 True 6
9 ExtraTreesEntr 0.820 accuracy 0.058165 0.667979 0.058165 0.667979 1 True 9
10 ExtraTreesGini 0.815 accuracy 0.056565 0.676969 0.056565 0.676969 1 True 8
11 LightGBMLarge 0.795 accuracy 0.004443 0.831450 0.004443 0.831450 1 True 13
12 KNeighborsUnif 0.725 accuracy 0.013350 0.043004 0.013350 0.043004 1 True 1
13 KNeighborsDist 0.710 accuracy 0.013417 0.036869 0.013417 0.036869 1 True 2

现在我们可以测试最终模型的推理速度,并检查它是否满足推理约束。

test_data_batch = test_data.sample(infer_limit_batch_size, replace=True, ignore_index=True)

import time
time_start = time.time()
predictor_infer_limit.predict(test_data_batch)
time_end = time.time()

infer_time_per_row = (time_end - time_start) / len(test_data_batch)
rows_per_second = 1 / infer_time_per_row
infer_time_per_row_ratio = infer_time_per_row / infer_limit
is_constraint_satisfied = infer_time_per_row_ratio <= 1

print(f'Model is able to predict {round(rows_per_second, 1)} rows per second. (User-specified Throughput = {1 / infer_limit})')
print(f'Model uses {round(infer_time_per_row_ratio * 100, 1)}% of infer_limit time per row.')
print(f'Model satisfies inference constraint: {is_constraint_satisfied}')
Model is able to predict 73034.0 rows per second. (User-specified Throughput = 20000.0)
Model uses 27.4% of infer_limit time per row.
Model satisfies inference constraint: True

使用更小的集成模型或更快的模型进行预测

无需重新训练任何模型,就可以构建替代的集成模型,这些模型使用不同的加权方案聚合各个模型的预测结果。如果这些集成模型对较少的模型分配非零权重,它们会变得更小(因此预测速度更快)。您可以像这样生成各种权衡准确性与速度的集成模型。

additional_ensembles = predictor.fit_weighted_ensemble(expand_pareto_frontier=True)
print("Alternative ensembles you can use for prediction:", additional_ensembles)

predictor.leaderboard(only_pareto_frontier=True)
Alternative ensembles you can use for prediction: ['WeightedEnsemble_L2Best']
Fitting model: WeightedEnsemble_L2Best ...
	Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
	0.7764	 = Validation score   (balanced_accuracy)
	0.03s	 = Training   runtime
	0.0s	 = Validation runtime
model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1 0.776399 balanced_accuracy 0.038194 3.931167 0.038194 3.931167 1 True 1

生成的排行榜将包含在给定推理延迟下最准确的模型。您可以从排行榜中选择任何延迟可接受的模型进行预测。

model_for_prediction = additional_ensembles[0]
predictions = predictor.predict(test_data, model=model_for_prediction)
predictor.delete_models(models_to_delete=additional_ensembles, dry_run=False)  # delete these extra models so they don't affect rest of tutorial
Deleting model WeightedEnsemble_L2Best. All files under /home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass/models/WeightedEnsemble_L2Best will be removed.

通过 refit_full 折叠 bagged 集成模型

对于使用 bagging 训练的集成预测器(如上所述),回想一下,每个单独的模型在不同的训练/验证折叠上训练了大约 10 个 bagged 副本。我们可以将这大约 10 个 bagged 模型折叠成一个拟合到完整数据集的单个模型,这可以大大减少其内存/延迟要求(但也可能降低准确性)。下面我们为每个原始模型重新拟合了这样一个模型,但您也可以通过指定 refit_full()model 参数来仅针对特定模型进行此操作。

refit_model_map = predictor.refit_full()
print("Name of each refit-full model corresponding to a previous bagged ensemble:")
print(refit_model_map)
predictor.leaderboard(test_data)
Name of each refit-full model corresponding to a previous bagged ensemble:
{'LightGBM_BAG_L1': 'LightGBM_BAG_L1_FULL', 'NeuralNetFastAI_BAG_L1': 'NeuralNetFastAI_BAG_L1_FULL', 'WeightedEnsemble_L2': 'WeightedEnsemble_L2_FULL'}
Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
	Models trained in this way will have the suffix "_FULL" and have NaN validation score.
	This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
	To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1_FULL ...
	0.33s	 = Training   runtime
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: NeuralNetFastAI_BAG_L1_FULL ...
Metric balanced_accuracy is not supported by this model - using log_loss instead
	Stopping at the best epoch learned earlier - 9.
	0.44s	 = Training   runtime
Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
	Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
	0.03s	 = Training   runtime
Updated best model to "WeightedEnsemble_L2_FULL" (Previously "WeightedEnsemble_L2"). AutoGluon will default to using "WeightedEnsemble_L2_FULL" for predict() and predict_proba().
Refit complete, total runtime = 0.85s ... Best model: "WeightedEnsemble_L2_FULL"
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1_FULL 0.750092 NaN balanced_accuracy 0.025266 NaN 0.326699 0.025266 NaN 0.326699 1 True 4
1 WeightedEnsemble_L2_FULL 0.750092 NaN balanced_accuracy 0.026516 NaN 0.353818 0.001251 NaN 0.027119 2 True 6
2 LightGBM_BAG_L1 0.743784 0.776399 balanced_accuracy 0.137620 0.038194 3.931167 0.137620 0.038194 3.931167 1 True 1
3 WeightedEnsemble_L2 0.743784 0.776399 balanced_accuracy 0.138921 0.038975 3.958286 0.001301 0.000782 0.027119 2 True 3
4 NeuralNetFastAI_BAG_L1 0.724629 0.741368 balanced_accuracy 1.188221 0.073276 9.611355 1.188221 0.073276 9.611355 1 True 2
5 NeuralNetFastAI_BAG_L1_FULL 0.700878 NaN balanced_accuracy 0.282469 NaN 0.437760 0.282469 NaN 0.437760 1 True 5

这将 refit-full 模型添加到排行榜中,我们可以选择使用其中任何一个进行预测,就像使用任何其他模型一样。注意 pred_time_testpred_time_val 列出了使用每个模型在测试/验证数据上生成预测所需的时间(以秒为单位)。由于 refit-full 模型是使用所有数据进行训练的,因此它们没有可用的内部验证分数 (score_val)。您也可以使用非 bagged 模型调用 refit_full(),将相同的模型重新拟合到您的完整数据集(在这种情况下不会有内存/延迟增益,但测试准确性可能会提高)。
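
例如,您可以显式指定某个 refit-full 模型用于预测;以下仅为示意用法,模型名称请以上方排行榜中的实际名称为准:

# 示意:使用排行榜中的某个 refit-full 模型进行预测
preds_refit = predictor.predict(test_data, model='WeightedEnsemble_L2_FULL')

# 如前所述,也可以只针对特定模型执行 refit,例如:
# predictor.refit_full(model='LightGBM_BAG_L1')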

模型蒸馏

单个模型虽然计算开销更低,但其准确性通常不如加权/堆叠/bagged 集成模型。模型蒸馏提供了一种折中方案:在保留单个模型计算优势的同时,获得集成模型带来的部分准确性提升。其思想是训练一个单独的模型(我们称之为学生模型)来模仿完整堆叠集成(教师模型)的预测结果。与 refit_full() 一样,distill() 函数会生成额外的模型,我们可以选择用它们进行预测。

student_models = predictor.distill(time_limit=30)  # specify much longer time limit in real applications
print(student_models)
preds_student = predictor.predict(test_data_nolabel, model=student_models[0])
print(f"predictions from {student_models[0]}:", list(preds_student)[:5])
predictor.leaderboard(test_data)
['RandomForestMSE_DSTL', 'WeightedEnsemble_L2_DSTL']
predictions from RandomForestMSE_DSTL: [' <=50K', ' <=50K', ' >50K', ' <=50K', ' <=50K']
Distilling with teacher='WeightedEnsemble_L2_FULL', teacher_preds=soft, augment_method=spunge ...
SPUNGE: Augmenting training data with 4000 synthetic samples for distillation...
Distilling with each of these student models: ['LightGBM_DSTL', 'CatBoost_DSTL', 'RandomForestMSE_DSTL', 'NeuralNetTorch_DSTL']
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_DSTL ... Training model for up to 30.00s of the 30.00s of remaining time.
	Warning: Exception caused LightGBM_DSTL to fail during training... Skipping this model.
		pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: workclass: object, education: object, marital-status: object, occupation: object, relationship: object, race: object, native-country: object
Detailed Traceback:
Traceback (most recent call last):
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
    model = self._train_single(**model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
    out = self._fit(**kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/lgb/lgb_model.py", line 299, in _fit
    self.model = train_lgb_model(early_stopping_callback_kwargs=early_stopping_callback_kwargs, **train_params)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/lgb/lgb_utils.py", line 134, in train_lgb_model
    return lgb.train(**train_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/engine.py", line 297, in train
    booster = Booster(params=params, train_set=train_set)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 3656, in __init__
    train_set.construct()
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 2590, in construct
    self._lazy_init(
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 2123, in _lazy_init
    data, feature_name, categorical_feature, self.pandas_categorical = _data_from_pandas(
                                                                       ^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 868, in _data_from_pandas
    _pandas_to_numpy(data, target_dtype=target_dtype),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 814, in _pandas_to_numpy
    _check_for_bad_pandas_dtypes(data.dtypes)
  File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 805, in _check_for_bad_pandas_dtypes
    raise ValueError(
ValueError: pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: workclass: object, education: object, marital-status: object, occupation: object, relationship: object, race: object, native-country: object
Fitting model: CatBoost_DSTL ... Training model for up to 29.48s of the 29.48s of remaining time.
	Warning: Exception caused CatBoost_DSTL to fail during training... Skipping this model.
		features data: pandas.DataFrame column 'workclass' has dtype 'category' but is not in  cat_features list
Detailed Traceback:
Traceback (most recent call last):
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
    model = self._train_single(**model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
    out = self._fit(**kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/catboost/catboost_model.py", line 154, in _fit
    X_val = Pool(data=X_val, label=y_val, cat_features=cat_features, weight=sample_weight_val)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/catboost/core.py", line 855, in __init__
    self._init(data, label, cat_features, text_features, embedding_features, embedding_features_data, pairs, graph, weight,
  File "/home/ci/opt/venv/lib/python3.11/site-packages/catboost/core.py", line 1491, in _init
    self._init_pool(data, label, cat_features, text_features, embedding_features, embedding_features_data, pairs, graph, weight,
  File "_catboost.pyx", line 4329, in _catboost._PoolBase._init_pool
  File "_catboost.pyx", line 4381, in _catboost._PoolBase._init_pool
  File "_catboost.pyx", line 4190, in _catboost._PoolBase._init_features_order_layout_pool
  File "_catboost.pyx", line 3070, in _catboost._set_features_order_data_pd_data_frame
_catboost.CatBoostError: features data: pandas.DataFrame column 'workclass' has dtype 'category' but is not in  cat_features list
Fitting model: RandomForestMSE_DSTL ... Training model for up to 29.22s of the 29.22s of remaining time.
/home/ci/autogluon/tabular/src/autogluon/tabular/models/rf/rf_model.py:83: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  X = X.fillna(0).to_numpy(dtype=np.float32)
	Note: model has different eval_metric than default.
	-0.1103	 = Validation score   (-mean_squared_error)
	1.39s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: NeuralNetTorch_DSTL ... Training model for up to 27.65s of the 27.65s of remaining time.
	Warning: Exception caused NeuralNetTorch_DSTL to fail during training... Skipping this model.
		Found array with 0 feature(s) (shape=(4800, 0)) while a minimum of 1 is required.
Detailed Traceback:
Traceback (most recent call last):
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
    model = self._train_single(**model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
    out = self._fit(**kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 209, in _fit
    train_dataset = self._generate_dataset(X=X, y=y, train_params=processor_kwargs, is_train=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 687, in _generate_dataset
    dataset = self._process_train_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 759, in _process_train_data
    df = self.processor.fit_transform(df)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 319, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 1001, in fit_transform
    result = self._call_func_on_transformers(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 910, in _call_func_on_transformers
    return Parallel(n_jobs=self.n_jobs)(jobs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 77, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/joblib/parallel.py", line 1985, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/joblib/parallel.py", line 1913, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 139, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1551, in _fit_transform_one
    res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 730, in fit_transform
    return last_step.fit_transform(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 319, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 918, in fit_transform
    return self.fit(X, **fit_params).transform(X)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 767, in fit
    self._fit(X, handle_unknown="ignore")
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 202, in _fit
    X_list, n_samples, n_features = self._check_X(X)
                                    ^^^^^^^^^^^^^^^^
  File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 168, in _check_X
    X_temp = check_array(X, dtype=None, ensure_all_finite=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1139, in check_array
    raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(4800, 0)) while a minimum of 1 is required.
Distilling with each of these student models: ['WeightedEnsemble_L2_DSTL']
Fitting model: WeightedEnsemble_L2_DSTL ... Training model for up to 30.00s of the 27.37s of remaining time.
	Ensemble Weights: {'RandomForestMSE_DSTL': 1.0}
	Note: model has different eval_metric than default.
	-0.1103	 = Validation score   (-mean_squared_error)
	0.0s	 = Training   runtime
	0.0s	 = Validation runtime
Distilled model leaderboard:
                      model  score_val         eval_metric  pred_time_val  fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      RandomForestMSE_DSTL   0.718252  mean_squared_error       0.067172  1.385662                0.067172           1.385662            1       True          7
1  WeightedEnsemble_L2_DSTL   0.718252  mean_squared_error       0.067812  1.388410                0.000640           0.002748            2       True          8
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM_BAG_L1_FULL 0.750092 NaN balanced_accuracy 0.021516 NaN 0.326699 0.021516 NaN 0.326699 1 True 4
1 WeightedEnsemble_L2_FULL 0.750092 NaN balanced_accuracy 0.022803 NaN 0.353818 0.001287 NaN 0.027119 2 True 6
2 LightGBM_BAG_L1 0.743784 0.776399 balanced_accuracy 0.145854 0.038194 3.931167 0.145854 0.038194 3.931167 1 True 1
3 WeightedEnsemble_L2 0.743784 0.776399 balanced_accuracy 0.147171 0.038975 3.958286 0.001318 0.000782 0.027119 2 True 3
4 RandomForestMSE_DSTL 0.732074 0.718252 mean_squared_error 0.187299 0.067172 1.385662 0.187299 0.067172 1.385662 1 True 7
5 WeightedEnsemble_L2_DSTL 0.732074 0.718252 mean_squared_error 0.189372 0.067812 1.388410 0.002073 0.000640 0.002748 2 True 8
6 NeuralNetFastAI_BAG_L1 0.724629 0.741368 balanced_accuracy 1.158053 0.073276 9.611355 1.158053 0.073276 9.611355 1 True 2
7 NeuralNetFastAI_BAG_L1_FULL 0.700878 NaN balanced_accuracy 0.285300 NaN 0.437760 0.285300 NaN 0.437760 1 True 5

更快的预设或超参数

如果您从一开始就知道推理延迟或内存会成为问题,与其在预测时设法加速一个笨重的已训练模型,不如相应地调整训练过程,确保 fit() 一开始就不会产生难以处理的模型。

一种选择是指定更轻量的 presets

presets = ['good_quality', 'optimize_for_deployment']
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, presets=presets, time_limit=30)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210145"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       27.49 GB / 30.95 GB (88.8%)
Disk Space Avail:   211.59 GB / 255.99 GB (82.7%)
===================================================
Presets specified: ['good_quality', 'optimize_for_deployment']
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
Note: `save_bag_folds=False`! This will greatly reduce peak disk usage during fit (by ~8x), but runs the risk of an out-of-memory error during model refit if memory is small relative to the data size.
	You can avoid this risk by setting `save_bag_folds=True`.
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
	This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
	Running DyStack for up to 7s of the 30s of remaining time (25%).
		Context path: "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210145/ds_sub_fit/sub_fit_ho"
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[32], line 2
      1 presets = ['good_quality', 'optimize_for_deployment']
----> 2 predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, presets=presets, time_limit=30)

File ~/autogluon/core/src/autogluon/core/utils/decorators.py:31, in unpack.<locals>._unpack_inner.<locals>._call(*args, **kwargs)
     28 @functools.wraps(f)
     29 def _call(*args, **kwargs):
     30     gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 31     return f(*gargs, **gkwargs)

File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1282, in TabularPredictor.fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, feature_metadata, infer_limit, infer_limit_batch_size, fit_weighted_ensemble, fit_full_last_level_weighted_ensemble, full_weighted_ensemble_additionally, dynamic_stacking, calibrate_decision_threshold, num_cpus, num_gpus, fit_strategy, memory_limit, callbacks, **kwargs)
   1276 if dynamic_stacking:
   1277     logger.log(
   1278         20,
   1279         f"DyStack is enabled (dynamic_stacking={dynamic_stacking}). "
   1280         "AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.",
   1281     )
-> 1282     num_stack_levels, time_limit = self._dynamic_stacking(**ds_args, ag_fit_kwargs=ag_fit_kwargs, ag_post_fit_kwargs=ag_post_fit_kwargs)
   1283     logger.info(
   1284         f"Starting main fit with num_stack_levels={num_stack_levels}.\n"
   1285         f"\tFor future fit calls on this dataset, you can skip DyStack to save time: "
   1286         f"`predictor.fit(..., dynamic_stacking=False, num_stack_levels={num_stack_levels})`"
   1287     )
   1289     if (time_limit is not None) and (time_limit <= 0):

File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1382, in TabularPredictor._dynamic_stacking(self, ag_fit_kwargs, ag_post_fit_kwargs, validation_procedure, detection_time_frac, holdout_frac, n_folds, n_repeats, memory_safe_fits, clean_up_fits, enable_ray_logging, enable_callbacks, holdout_data)
   1379         _, holdout_data, _, _ = self._validate_fit_data(train_data=X, tuning_data=holdout_data)
   1380         ds_fit_kwargs["ds_fit_context"] = os.path.join(ds_fit_context, "sub_fit_custom_ho")
-> 1382     stacked_overfitting = self._sub_fit_memory_save_wrapper(
   1383         train_data=X,
   1384         time_limit=time_limit,
   1385         time_start=time_start,
   1386         ds_fit_kwargs=ds_fit_kwargs,
   1387         ag_fit_kwargs=inner_ag_fit_kwargs,
   1388         ag_post_fit_kwargs=inner_ag_post_fit_kwargs,
   1389         holdout_data=holdout_data,
   1390     )
   1391 else:
   1392     # Holdout is false, use (repeated) cross-validation
   1393     is_stratified = self.problem_type in [BINARY, MULTICLASS]

File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1574, in TabularPredictor._sub_fit_memory_save_wrapper(self, train_data, time_limit, time_start, ds_fit_kwargs, ag_fit_kwargs, ag_post_fit_kwargs, holdout_data)
   1560 # FIXME: For some reason ray does not treat `num_cpus` and `num_gpus` the same.
   1561 #  For `num_gpus`, the process will reserve the capacity and is unable to share it to child ray processes, causing a deadlock.
   1562 #  For `num_cpus`, the value is completely ignored by children, and they can even use more num_cpus than the parent.
   1563 #  Because of this, num_gpus is set to 0 here to avoid a deadlock, but num_cpus does not need to be changed.
   1564 #  For more info, refer to Ray documentation: https://docs.rayai.org.cn/en/latest/ray-core/tasks/nested-tasks.html#yielding-resources-while-blocked
   1565 ref = sub_fit_caller.options(num_cpus=num_cpus, num_gpus=0).remote(
   1566     predictor=predictor_ref,
   1567     train_data=train_data_ref,
   (...)
   1572     holdout_data=holdout_data_ref,
   1573 )
-> 1574 finished, unfinished = _ds_ray.wait([ref], num_returns=1)
   1575 stacked_overfitting, ho_leaderboard, exception = _ds_ray.get(finished[0])
   1577 # TODO: This is present to ensure worker logs are properly logged and don't get skipped / printed out of order.
   1578 #  Ideally find a faster way to do this that doesn't introduce a 100 ms overhead.

File ~/opt/venv/lib/python3.11/site-packages/ray/_private/auto_init_hook.py:21, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
     18 @wraps(fn)
     19 def auto_init_wrapper(*args, **kwargs):
     20     auto_init_ray()
---> 21     return fn(*args, **kwargs)

File ~/opt/venv/lib/python3.11/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    101     if func.__name__ != "init" or is_client_mode_enabled_by_default:
    102         return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)

File ~/opt/venv/lib/python3.11/site-packages/ray/_private/worker.py:3013, in wait(ray_waitables, num_returns, timeout, fetch_local)
   3011 timeout = timeout if timeout is not None else 10**6
   3012 timeout_milliseconds = int(timeout * 1000)
-> 3013 ready_ids, remaining_ids = worker.core_worker.wait(
   3014     ray_waitables,
   3015     num_returns,
   3016     timeout_milliseconds,
   3017     fetch_local,
   3018 )
   3019 return ready_ids, remaining_ids

File python/ray/_raylet.pyx:3529, in ray._raylet.CoreWorker.wait()

File python/ray/includes/common.pxi:83, in ray._raylet.check_status()

KeyboardInterrupt: 

另一种选择是指定更轻量的 hyperparameters

predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, hyperparameters='very_light', time_limit=30)

在这里,您可以将 hyperparameters 设置为 'light'、'very_light' 或 'toy',以获得越来越小(但准确性较低)的模型和预测器。高级用户可以尝试手动指定特定模型的超参数,以使其更快/更小。
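
例如,下面是一个手动指定较小模型配置的示意写法(其中的超参数取值仅为演示,并非推荐设置):

# 示意:手动为部分模型指定更小/更快的配置(取值仅供演示)
small_hyperparameters = {
    'GBM': {'num_boost_round': 100},  # 减少 LightGBM 的提升轮数
    'RF': {'n_estimators': 50},       # 减少随机森林中树的数量
}
predictor_small = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, hyperparameters=small_hyperparameters, time_limit=30,
)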

最后,您还可以完全排除训练某些笨重的模型。下面我们排除了通常较慢的模型(K 近邻、神经网络)。

excluded_model_types = ['KNN', 'NN_TORCH']
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, excluded_model_types=excluded_model_types, time_limit=30)

(高级)缓存预处理数据

如果您重复在相同的数据上进行预测,可以缓存预处理后的数据版本,并直接将预处理数据发送到 predictor.predict 以实现更快的推理。

test_data_preprocessed = predictor.transform_features(test_data)

# The following call will be faster than a normal predict call because we are skipping the preprocessing stage.
predictions = predictor.predict(test_data_preprocessed, transform_features=False)

请注意,这仅在您重复在相同数据上进行预测的情况下有用。如果这显著加快了您的用例,请考虑您当前的方法是否合理,或者对预测结果进行缓存是否是更好的解决方案。
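
如果您的场景确实需要反复对同一份数据进行预测,下面是一个缓存预测结果的简单示意(其中的缓存键名等均为假设):

# 示意:按数据集标识缓存预测结果,避免重复推理
prediction_cache = {}

def predict_with_cache(data, cache_key):
    if cache_key not in prediction_cache:
        prediction_cache[cache_key] = predictor.predict(data)
    return prediction_cache[cache_key]

cached_preds = predict_with_cache(test_data, cache_key='test_data_v1')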

(高级)禁用预处理

如果您宁愿在 TabularPredictor 之外进行数据预处理,可以通过以下方式完全禁用 TabularPredictor 的预处理功能:

predictor.fit(..., feature_generator=None, feature_metadata=YOUR_CUSTOM_FEATURE_METADATA)

请注意,这将移除数据清理的所有保护措施。除非您非常熟悉 AutoGluon,否则很可能会遇到错误。

这种做法的一个典型用例是:多个预测问题重复使用具有完全相同特征的同一份数据。例如,如果您有 30 个任务共用相同的特征,您可以在数据上一次性拟合一个 autogluon.features 特征生成器;之后需要对这 30 个任务进行预测时,只需预处理一次数据,再将预处理后的数据分发给全部 30 个预测器即可。
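
下面是一个禁用内部预处理并传入自定义 FeatureMetadata 的最小示意(假设 my_preprocessed_train_data 是您在 AutoGluon 之外预处理好的数据,该变量名仅为示例):

from autogluon.tabular import FeatureMetadata

# 示意:根据外部预处理好的数据推断特征元信息,并禁用内部预处理
custom_feature_metadata = FeatureMetadata.from_df(my_preprocessed_train_data)
predictor_raw = TabularPredictor(label=label).fit(
    my_preprocessed_train_data,
    feature_generator=None,
    feature_metadata=custom_feature_metadata,
)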

如果遇到内存问题

为了减少训练期间的内存使用,您可以单独尝试或组合使用以下每种策略(这些策略可能会损害准确性)。

  • fit() 中,设置 excluded_model_types = ['KNN', 'XT' ,'RF'](或这些模型的一些子集)。

  • fit() 中尝试不同的 presets

  • fit() 中,设置 hyperparameters = 'light'hyperparameters = 'very_light'

  • 表格中的文本字段需要大量内存用于 N-gram 特征化。为了在 fit() 中减轻这种情况,您可以选择:(1)将 'ignore_text' 添加到您的 presets 列表中(以忽略文本特征),或(2)指定参数:

from sklearn.feature_extraction.text import CountVectorizer
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

MAX_NGRAM = 1000  # 示例取值,可按下文说明调整
feature_generator = AutoMLPipelineFeatureGenerator(
    vectorizer=CountVectorizer(min_df=30, ngram_range=(1, 3), max_features=MAX_NGRAM, dtype=np.uint8)
)

例如使用 MAX_NGRAM = 1000(尝试小于 10000 的各种值以减少用于表示每个文本字段的 N-gram 特征数量)
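
然后在调用 fit() 时传入该特征生成器即可(示意用法,time_limit 仅为演示):

predictor_ngram = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, feature_generator=feature_generator, time_limit=30,
)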

除了减少内存使用外,上述许多策略也可用于减少训练时间。

为了减少推理期间的内存使用

  • 如果尝试为大型测试数据集生成预测,请将测试数据分成更小的块,如常见问题解答中所示(参见本列表之后的示意代码)。

  • 如果模型之前已持久化在内存中,但推理速度不是主要关注点,请调用 predictor.unpersist()

  • 如果模型之前已持久化在内存中,并且在 fit() 中使用了 bagging,并且推理速度是一个关注点:调用 predictor.refit_full() 并使用其中一个 refit-full 模型进行预测(确保这是内存中唯一持久化的模型)。
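
下面是将测试数据分块预测的简单示意(块大小仅为演示):

import pandas as pd

# 示意:分块预测以降低峰值内存占用
chunk_size = 1000
pred_chunks = []
for start in range(0, len(test_data_nolabel), chunk_size):
    chunk = test_data_nolabel.iloc[start:start + chunk_size]
    pred_chunks.append(predictor.predict(chunk))
predictions_chunked = pd.concat(pred_chunks)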

如果遇到磁盘空间问题

为了减少磁盘使用,您可以单独尝试或组合使用以下每种策略。

  • 请务必删除之前运行 fit() 生成的所有 predictor.path 文件夹!如果您多次调用 fit(),这些文件夹会占用您的可用空间。如果您没有指定 path,AutoGluon 仍然会自动将模型保存到名为“AutogluonModels/ag-[TIMESTAMP]”的文件夹中,其中 TIMESTAMP 记录了 fit() 调用发生的时间,因此如果可用空间不足,请务必也删除这些文件夹。

  • 调用 predictor.save_space() 删除在 fit() 期间产生的辅助文件。

  • 如果您只打算以后使用此预测器进行推理,请调用 predictor.delete_models(models_to_keep='best', dry_run=False)。这会删除那些仅被非预测功能(例如 fit_summary)使用的文件。

  • fit() 中,您可以将 'optimize_for_deployment' 添加到 presets 列表中,这将在训练后自动调用前两种策略。

  • 上述大多数减少内存使用的策略也会减少磁盘使用(但可能损害准确性)。

参考文献

以下论文描述了 AutoGluon 在内部如何处理表格数据。

Erickson 等人。AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data。arXiv,2020 年。

下一步

如果您对部署优化感兴趣,请参阅预测表格中的列 - 部署优化教程。