AutoGluon Tabular - In Depth¶
Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API. To learn how to add your own custom models to the set that AutoGluon trains, tunes, and ensembles, review Adding a custom model to AutoGluon.
This tutorial describes how you can exert greater control when using AutoGluon's fit() or predict(). Recall that to maximize predictive performance, you should first try TabularPredictor() and fit() with all of their default arguments. Then, consider non-default arguments for TabularPredictor(eval_metric=...) and fit(presets=...). Afterwards, you can experiment with the other fit() arguments covered in this in-depth tutorial, such as hyperparameter_tune_kwargs, hyperparameters, num_stack_levels, num_bag_folds, num_bag_sets, etc.
Using the same census-data table as in the Predicting Columns in a Table - Quick Start tutorial, we will now predict the occupation of an individual - this is a multiclass classification problem. Begin by importing AutoGluon's TabularPredictor and TabularDataset, and loading the data.
from autogluon.tabular import TabularDataset, TabularPredictor
import numpy as np
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 1000 # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
print(train_data.head())
label = 'occupation'
print("Summary of occupation column: \n", train_data['occupation'].describe())
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
y_test = test_data[label]
test_data_nolabel = test_data.drop(columns=[label]) # delete label column
metric = 'accuracy' # we specify eval-metric just for demo (unnecessary as it's the default)
age workclass fnlwgt education education-num \
6118 51 Private 39264 Some-college 10
23204 58 Private 51662 10th 6
29590 40 Private 326310 Some-college 10
18116 37 Private 222450 HS-grad 9
33964 62 Private 109190 Bachelors 13
marital-status occupation relationship race sex \
6118 Married-civ-spouse Exec-managerial Wife White Female
23204 Married-civ-spouse Other-service Wife White Female
29590 Married-civ-spouse Craft-repair Husband White Male
18116 Never-married Sales Not-in-family White Male
33964 Married-civ-spouse Exec-managerial Husband White Male
capital-gain capital-loss hours-per-week native-country class
6118 0 0 40 United-States >50K
23204 0 0 8 United-States <=50K
29590 0 0 44 United-States <=50K
18116 0 2339 40 El-Salvador <=50K
33964 15024 0 40 United-States >50K
Summary of occupation column:
count 1000
unique 15
top Craft-repair
freq 142
Name: occupation, dtype: object
Specifying hyperparameters and tuning them¶
Note: We do not recommend doing hyperparameter tuning with AutoGluon in most cases. AutoGluon achieves its best performance without hyperparameter tuning, simply by specifying presets="best_quality".
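For reference, here is a minimal sketch of that recommended usage (not executed in this tutorial; the time_limit value is purely illustrative):
# predictor = TabularPredictor(label=label, eval_metric=metric).fit(
#     train_data, presets='best_quality', time_limit=3600,  # illustrative time budget
# )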
We first demonstrate hyperparameter tuning and how you can provide your own validation dataset, which AutoGluon internally relies on to: tune hyperparameters, early-stop iterative training, and construct model ensembles. One reason you might specify validation data is that future test data will stem from a different distribution than the training data (and your specified validation data is more representative of the data likely to be encountered in the future).
If you don't have a strong reason to provide your own validation dataset, we recommend you omit the tuning_data argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the holdout_frac argument to tell AutoGluon what fraction of the provided training data to hold out for validation.
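As a rough sketch (not executed here), the two ways of controlling validation data described above would look like this; my_validation_data is a hypothetical DataFrame you would supply yourself:
# predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data, tuning_data=my_validation_data)  # bring your own validation set
# predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data, holdout_frac=0.2)  # or let AutoGluon hold out 20% of train_data for validation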
Caution: Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call predict() on a separate dataset (that was never passed to fit()), as we did in the previous Quick-Start tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for demonstration purposes; you should select more reasonable values in order to obtain high-quality models.
fit() trains neural networks and various types of tree ensembles by default. You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value, or a search space of values to consider during hyperparameter optimization. Hyperparameters which you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.
Refer to the Search Space documentation to learn more about AutoGluon search spaces.
from autogluon.common import space
nn_options = { # specifies non-default hyperparameter values for neural network models
'num_epochs': 10, # number of training epochs (controls training time of NN models)
'learning_rate': space.Real(1e-4, 1e-2, default=5e-4, log=True), # learning rate used in training (real-valued hyperparameter searched on log-scale)
'activation': space.Categorical('relu', 'softrelu', 'tanh'), # activation function used in NN (categorical hyperparameter, default = first entry)
'dropout_prob': space.Real(0.0, 0.5, default=0.1), # dropout probability (real-valued hyperparameter)
}
gbm_options = { # specifies non-default hyperparameter values for lightGBM gradient boosted trees
'num_boost_round': 100, # number of boosting rounds (controls training time of GBM models)
'num_leaves': space.Int(lower=26, upper=66, default=36), # number of leaves in trees (integer hyperparameter)
}
hyperparameters = { # hyperparameters of each model type
'GBM': gbm_options,
'NN_TORCH': nn_options, # NOTE: comment this line out if you get errors on Mac OSX
} # When these keys are missing from hyperparameters dict, no models of that type are trained
time_limit = 2*60 # train various models for ~2 min
num_trials = 5 # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'auto' # to tune hyperparameters using random search routine with a local scheduler
hyperparameter_tune_kwargs = { # HPO is not performed unless hyperparameter_tune_kwargs is specified
'num_trials': num_trials,
'scheduler' : 'local',
'searcher': search_strategy,
} # Refer to TabularPredictor.fit docstring for all valid values
predictor = TabularPredictor(label=label, eval_metric=metric).fit(
train_data,
time_limit=time_limit,
hyperparameters=hyperparameters,
hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
Fitted model: NeuralNetTorch/1c49f759 ...
0.365 = Validation score (accuracy)
3.32s = Training runtime
0.01s = Validation runtime
Fitted model: NeuralNetTorch/b3b55be6 ...
0.32 = Validation score (accuracy)
3.63s = Training runtime
0.01s = Validation runtime
Fitted model: NeuralNetTorch/dcd2520d ...
0.335 = Validation score (accuracy)
3.58s = Training runtime
0.01s = Validation runtime
Fitted model: NeuralNetTorch/5730cde0 ...
0.355 = Validation score (accuracy)
3.68s = Training runtime
0.01s = Validation runtime
Fitted model: NeuralNetTorch/c7da0e67 ...
0.33 = Validation score (accuracy)
3.51s = Training runtime
0.01s = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 119.91s of the 94.84s of remaining time.
Ensemble Weights: {'LightGBM/T3': 1.0}
0.375 = Validation score (accuracy)
0.03s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 25.23s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 45336.5 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_205938")
We again demonstrate how to use the trained models to predict on the test data.
y_pred = predictor.predict(test_data_nolabel)
print("Predictions: ", list(y_pred)[:5])
perf = predictor.evaluate(test_data, auxiliary_metrics=False)
Predictions: [' Other-service', ' Craft-repair', ' Exec-managerial', ' Sales', ' Other-service']
Use the following command to view a summary of what happened during fit(). This command will now show details of the hyperparameter-tuning process for each type of model.
results = predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 LightGBM/T3 0.375 accuracy 0.003542 0.357088 0.003542 0.357088 1 True 3
1 WeightedEnsemble_L2 0.375 accuracy 0.004411 0.389364 0.000869 0.032276 2 True 11
2 LightGBM/T5 0.375 accuracy 0.004484 0.519957 0.004484 0.519957 1 True 5
3 LightGBM/T1 0.370 accuracy 0.003504 0.721791 0.003504 0.721791 1 True 1
4 NeuralNetTorch/1c49f759 0.365 accuracy 0.010307 3.324137 0.010307 3.324137 1 True 6
5 LightGBM/T4 0.360 accuracy 0.005873 0.599082 0.005873 0.599082 1 True 4
6 LightGBM/T2 0.355 accuracy 0.004143 0.601759 0.004143 0.601759 1 True 2
7 NeuralNetTorch/5730cde0 0.355 accuracy 0.012197 3.678790 0.012197 3.678790 1 True 9
8 NeuralNetTorch/dcd2520d 0.335 accuracy 0.012314 3.579366 0.012314 3.579366 1 True 8
9 NeuralNetTorch/c7da0e67 0.330 accuracy 0.012733 3.507454 0.012733 3.507454 1 True 10
10 NeuralNetTorch/b3b55be6 0.320 accuracy 0.008916 3.626802 0.008916 3.626802 1 True 7
Number of models trained: 11
Types of models trained:
{'TabularNeuralNetTorchModel', 'WeightedEnsembleModel', 'LGBModel'}
Bagging used: False
Multi-layer stack-ensembling used: False
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 6 | ['workclass', 'education', 'marital-status', 'relationship', 'race', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 2 | ['sex', 'class']
*** End of fit() summary ***
/home/ci/autogluon/core/src/autogluon/core/utils/plots.py:169: UserWarning: AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"
warnings.warn('AutoGluon summary plots cannot be created because bokeh is not installed. To see plots, please do: "pip install bokeh==2.0.1"')
In the above example, the predictive performance may be poor because we specified very little training time to ensure quick runtimes. You can call fit() multiple times while modifying the above settings to better understand how these choices affect performance outcomes. For example: you can increase subsample_size to train on a larger dataset, increase the num_epochs and num_boost_round hyperparameters, and increase the time_limit (which you should do for all code in these tutorials). To see more detailed output during the execution of fit(), you can also pass the argument: verbosity=3.
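For example, here is a sketch of such a re-run (not executed here to keep the demo fast; the values are illustrative):
# predictor = TabularPredictor(label=label, eval_metric=metric).fit(
#     train_data,
#     time_limit=10*60,                                        # a larger time budget than the demo above
#     hyperparameters=hyperparameters,
#     hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
#     verbosity=3,                                             # print more detailed output during fit()
# )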
Model ensembling with stacking/bagging¶
Beyond hyperparameter tuning with a correctly-specified evaluation metric, two other methods to boost predictive performance are bagging and stack-ensembling. You'll often see performance improve if you specify num_bag_folds = 5-10, num_stack_levels = 1 in the call to fit(), but this will increase training times and memory/disk usage.
label = 'class' # Now lets predict the "class" column (binary classification)
test_data_nolabel = test_data.drop(columns=[label])
y_test = test_data[label]
save_path = 'agModels-predictClass' # folder where to store trained models
predictor = TabularPredictor(label=label, eval_metric=metric).fit(train_data,
num_bag_folds=5, num_bag_sets=1, num_stack_levels=1,
hyperparameters = {'NN_TORCH': {'num_epochs': 2}, 'GBM': {'num_boost_round': 20}}, # last argument is just for quick demo here, omit it in real applications
)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210003"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 28.02 GB / 30.95 GB (90.6%)
Disk Space Avail: 211.73 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210003"
Train Data Rows: 1000
Train Data Columns: 14
Label Column: class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: binary
Preprocessing data ...
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 28696.56 MB
Train Data (Original) Memory Usage: 0.56 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
'NN_TORCH': [{'num_epochs': 2}],
'GBM': [{'num_boost_round': 20}],
}
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 2 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1 ...
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.01%)
0.823 = Validation score (accuracy)
1.84s = Training runtime
0.02s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ...
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.00%)
0.744 = Validation score (accuracy)
8.11s = Training runtime
0.06s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
0.823 = Validation score (accuracy)
0.03s = Training runtime
0.0s = Validation runtime
Fitting 2 L2 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L2 ...
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.01%)
0.826 = Validation score (accuracy)
1.91s = Training runtime
0.02s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L2 ...
Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy (5 workers, per: cpus=1, gpus=0, memory=0.00%)
0.748 = Validation score (accuracy)
8.58s = Training runtime
0.07s = Validation runtime
Fitting model: WeightedEnsemble_L3 ...
Ensemble Weights: {'LightGBM_BAG_L2': 0.889, 'LightGBM_BAG_L1': 0.111}
0.827 = Validation score (accuracy)
0.05s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 27.91s ... Best model: WeightedEnsemble_L3 | Estimated inference throughput: 2102.4 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (1000 rows).
`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210003")
You should not provide tuning_data when stacking/bagging, and instead provide all of your available data as train_data (AutoGluon will split it in more intelligent ways). num_bag_sets controls how many times the k-fold bagging process is repeated to further reduce variance (increasing this may further boost accuracy but will substantially increase training times, inference latency, and memory/disk usage). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify auto_stack instead (used in the best_quality preset).
# Lets also specify the "balanced_accuracy" metric
predictor = TabularPredictor(label=label, eval_metric='balanced_accuracy', path=save_path).fit(
train_data, auto_stack=True,
calibrate_decision_threshold=False, # Disabling for demonstration in next section
hyperparameters={'FASTAI': {'num_epochs': 10}, 'GBM': {'num_boost_round': 200}} # last 2 arguments are for quick demo, omit them in real applications
)
predictor.leaderboard(test_data)
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 27.67 GB / 30.95 GB (89.4%)
Disk Space Avail: 211.72 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=8, num_bag_sets=1
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass"
Train Data Rows: 1000
Train Data Columns: 14
Label Column: class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: binary
Preprocessing data ...
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 28337.69 MB
Train Data (Original) Memory Usage: 0.56 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'balanced_accuracy'
To change this, specify the eval_metric parameter of Predictor()
User-specified model hyperparameters to be fit:
{
'FASTAI': [{'num_epochs': 10}],
'GBM': [{'num_boost_round': 200}],
}
Fitting 2 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1 ...
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.01%)
0.7764 = Validation score (balanced_accuracy)
3.93s = Training runtime
0.04s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ...
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (8 workers, per: cpus=1, gpus=0, memory=0.00%)
0.7414 = Validation score (balanced_accuracy)
9.61s = Training runtime
0.07s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
0.7764 = Validation score (balanced_accuracy)
0.03s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 18.48s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 3264.4 rows/s (125 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass")
 | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LightGBM_BAG_L1 | 0.743784 | 0.776399 | balanced_accuracy | 0.209645 | 0.038194 | 3.931167 | 0.209645 | 0.038194 | 3.931167 | 1 | True | 1 |
1 | WeightedEnsemble_L2 | 0.743784 | 0.776399 | balanced_accuracy | 0.211094 | 0.038975 | 3.958286 | 0.001449 | 0.000782 | 0.027119 | 2 | True | 3 |
2 | NeuralNetFastAI_BAG_L1 | 0.724629 | 0.741368 | balanced_accuracy | 1.709173 | 0.073276 | 9.611355 | 1.709173 | 0.073276 | 9.611355 | 1 | True | 2 |
Stacking/bagging will usually produce superior accuracy compared to hyperparameter tuning alone, but you can try combining both techniques (note: specifying presets='best_quality' in fit() simply sets auto_stack=True).
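As a hedged sketch (not executed here), combining the two techniques simply means passing both sets of arguments to fit; the specific values below are illustrative:
# predictor = TabularPredictor(label=label, eval_metric=metric).fit(
#     train_data,
#     presets='best_quality',                                  # enables auto_stack=True
#     hyperparameter_tune_kwargs={'num_trials': 5, 'scheduler': 'local', 'searcher': 'auto'},
#     time_limit=3600,                                         # illustrative time budget
# )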
Decision threshold calibration¶
In binary classification, adjusting the prediction decision threshold to a value other than 0.5 via calibrate_decision_threshold can significantly improve scores for metrics such as "f1" and "balanced_accuracy".
Below is an example of the "balanced_accuracy" scores obtained on the test data with and without calibrating the decision threshold.
print(f'Prior to calibration (predictor.decision_threshold={predictor.decision_threshold}):')
scores = predictor.evaluate(test_data)
calibrated_decision_threshold = predictor.calibrate_decision_threshold()
predictor.set_decision_threshold(calibrated_decision_threshold)
print(f'After calibration (predictor.decision_threshold={predictor.decision_threshold}):')
scores_calibrated = predictor.evaluate(test_data)
Prior to calibration (predictor.decision_threshold=0.5):
After calibration (predictor.decision_threshold=0.25):
Calibrating decision threshold to optimize metric balanced_accuracy | Checking 51 thresholds...
Calibrating decision threshold via fine-grained search | Checking 38 thresholds...
Base Threshold: 0.500 | val: 0.7764
Best Threshold: 0.250 | val: 0.7926
Updating predictor.decision_threshold from 0.5 -> 0.25
This will impact how prediction probabilities are converted to predictions in binary classification.
Prediction probabilities of the positive class >0.25 will be predicted as the positive class ( >50K). This can significantly impact metric scores.
You can update this value via `predictor.set_decision_threshold`.
You can calculate an optimal decision threshold on the validation data via `predictor.calibrate_decision_threshold()`.
for metric_name in scores:
metric_score = scores[metric_name]
metric_score_calibrated = scores_calibrated[metric_name]
decision_threshold = predictor.decision_threshold
print(f'decision_threshold={decision_threshold:.3f}\t| metric="{metric_name}"'
f'\n\ttest_score uncalibrated: {metric_score:.4f}'
f'\n\ttest_score calibrated: {metric_score_calibrated:.4f}'
f'\n\ttest_score delta: {metric_score_calibrated-metric_score:.4f}')
decision_threshold=0.250 | metric="balanced_accuracy"
test_score uncalibrated: 0.7438
test_score calibrated: 0.8120
test_score delta: 0.0682
decision_threshold=0.250 | metric="accuracy"
test_score uncalibrated: 0.8472
test_score calibrated: 0.8162
test_score delta: -0.0310
decision_threshold=0.250 | metric="mcc"
test_score uncalibrated: 0.5457
test_score calibrated: 0.5654
test_score delta: 0.0197
decision_threshold=0.250 | metric="roc_auc"
test_score uncalibrated: 0.8990
test_score calibrated: 0.8990
test_score delta: 0.0000
decision_threshold=0.250 | metric="f1"
test_score uncalibrated: 0.6294
test_score calibrated: 0.6749
test_score delta: 0.0454
decision_threshold=0.250 | metric="precision"
test_score uncalibrated: 0.7411
test_score calibrated: 0.5814
test_score delta: -0.1597
decision_threshold=0.250 | metric="recall"
test_score uncalibrated: 0.5470
test_score calibrated: 0.8041
test_score delta: 0.2571
Note that calibrating for "balanced_accuracy" substantially improved the "balanced_accuracy" metric score, but it hurt the "accuracy" score. Threshold calibration often results in a trade-off between the performance of different metrics, and the user should keep this in mind.
We can calibrate for any metric whose score we want to maximize, not just "balanced_accuracy".
predictor.set_decision_threshold(0.5) # Reset decision threshold
for metric_name in ['f1', 'balanced_accuracy', 'mcc']:
metric_score = predictor.evaluate(test_data, silent=True)[metric_name]
calibrated_decision_threshold = predictor.calibrate_decision_threshold(metric=metric_name, verbose=False)
metric_score_calibrated = predictor.evaluate(
test_data, decision_threshold=calibrated_decision_threshold, silent=True
)[metric_name]
print(f'decision_threshold={calibrated_decision_threshold:.3f}\t| metric="{metric_name}"'
f'\n\ttest_score uncalibrated: {metric_score:.4f}'
f'\n\ttest_score calibrated: {metric_score_calibrated:.4f}'
f'\n\ttest_score delta: {metric_score_calibrated-metric_score:.4f}')
decision_threshold=0.500 | metric="f1"
test_score uncalibrated: 0.6294
test_score calibrated: 0.6294
test_score delta: 0.0000
decision_threshold=0.250 | metric="balanced_accuracy"
test_score uncalibrated: 0.7438
test_score calibrated: 0.8120
test_score delta: 0.0682
decision_threshold=0.500 | metric="mcc"
test_score uncalibrated: 0.5457
test_score calibrated: 0.5457
test_score delta: 0.0000
Updating predictor.decision_threshold from 0.25 -> 0.5
This will impact how prediction probabilities are converted to predictions in binary classification.
Prediction probabilities of the positive class >0.5 will be predicted as the positive class ( >50K). This can significantly impact metric scores.
You can update this value via `predictor.set_decision_threshold`.
You can calculate an optimal decision threshold on the validation data via `predictor.calibrate_decision_threshold()`.
Instead of calibrating the decision threshold after fit, you can have it calibrated automatically during fit by specifying the fit argument predictor.fit(..., calibrate_decision_threshold=True).
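Here is a minimal sketch of that fit-time calibration (not executed here):
# predictor = TabularPredictor(label=label, eval_metric='balanced_accuracy').fit(
#     train_data, calibrate_decision_threshold=True,
# )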
Fortunately, AutoGluon automatically applies decision threshold calibration when it is beneficial, since the default value is calibrate_decision_threshold="auto". We recommend keeping this value at the default in most cases.
Further usage examples are shown below:
# Will use the decision_threshold specified in `predictor.decision_threshold`, can be set via `predictor.set_decision_threshold`
# y_pred = predictor.predict(test_data)
# y_pred_08 = predictor.predict(test_data, decision_threshold=0.8) # Specify a specific threshold to use only for this predict
# y_pred_proba = predictor.predict_proba(test_data)
# y_pred = predictor.predict_from_proba(y_pred_proba) # Identical output to calling .predict(test_data)
# y_pred_08 = predictor.predict_from_proba(y_pred_proba, decision_threshold=0.8) # Identical output to calling .predict(test_data, decision_threshold=0.8)
Prediction options (inference)¶
Even if you've started a new Python session since last calling fit(), you can still load a previously trained predictor from disk.
predictor = TabularPredictor.load(save_path) # `predictor.path` is another way to get the relative path needed to later load predictor.
Here, save_path is the same folder previously passed to TabularPredictor, in which all the trained models were saved. You can train models on one machine and deploy them on another. Simply copy the save_path folder to the new machine and specify its new path in TabularPredictor.load().
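For example, after copying the save_path folder to another machine, loading it there is a single call (the path below is hypothetical):
# predictor_deployed = TabularPredictor.load('/path/on/new/machine/agModels-predictClass')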
To find out which feature columns are required to make predictions, call predictor.features().
predictor.features()
['age',
'workclass',
'fnlwgt',
'education',
'education-num',
'marital-status',
'occupation',
'relationship',
'race',
'sex',
'capital-gain',
'capital-loss',
'hours-per-week',
'native-country']
We can make a prediction on an individual example rather than a full dataset.
datapoint = test_data_nolabel.iloc[[0]] # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
print(datapoint)
predictor.predict(datapoint)
age workclass fnlwgt education education-num marital-status \
0 31 Private 169085 11th 7 Married-civ-spouse
occupation relationship race sex capital-gain capital-loss \
0 Sales Wife White Female 0 0
hours-per-week native-country
0 20 United-States
0 <=50K
Name: class, dtype: object
To output predicted class probabilities instead of predicted classes, you can use:
predictor.predict_proba(datapoint) # returns a DataFrame that shows which probability corresponds to which class
 | <=50K | >50K |
---|---|---|
0 | 0.951059 | 0.048941 |
By default, predict() and predict_proba() will use the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. Here's how to see which model this is:
predictor.model_best
'WeightedEnsemble_L2'
We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Note that a "model" in AutoGluon may refer to a single neural network, a bagged ensemble of many neural network copies trained on different training/validation splits, a weighted ensemble that aggregates the predictions of many other models, or a stacker model that operates on predictions output by other models. This is akin to viewing a Random Forest as one "model" when it is in fact an ensemble of many decision trees.
Before deciding which model to use, let's evaluate all of the models AutoGluon previously trained on our test data.
predictor.leaderboard(test_data)
 | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LightGBM_BAG_L1 | 0.743784 | 0.776399 | balanced_accuracy | 0.148440 | 0.038194 | 3.931167 | 0.148440 | 0.038194 | 3.931167 | 1 | True | 1 |
1 | WeightedEnsemble_L2 | 0.743784 | 0.776399 | balanced_accuracy | 0.149763 | 0.038975 | 3.958286 | 0.001322 | 0.000782 | 0.027119 | 2 | True | 3 |
2 | NeuralNetFastAI_BAG_L1 | 0.724629 | 0.741368 | balanced_accuracy | 1.188020 | 0.073276 | 9.611355 | 1.188020 | 0.073276 | 9.611355 | 1 | True | 2 |
The leaderboard shows each model's predictive performance on the test data (score_test) and validation data (score_val), along with the time required to produce predictions on the test data (pred_time_test), the time required to produce predictions on the validation data (pred_time_val), and the time required to train only this model (fit_time). Below, we show that a leaderboard can be produced without new data (it just uses the data previously held out for validation inside fit) and can display extra information about each model.
predictor.leaderboard(extra_info=True)
 | model | score_val | eval_metric | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order | ... | hyperparameters | hyperparameters_fit | ag_args_fit | features | compile_time | child_hyperparameters | child_hyperparameters_fit | child_ag_args_fit | ancestors | descendants |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LightGBM_BAG_L1 | 0.776399 | balanced_accuracy | 0.038194 | 3.931167 | 0.038194 | 3.931167 | 1 | True | 1 | ... | {'use_orig_features': True, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} | {} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} | [race, marital-status, capital-gain, hours-per-week, workclass, occupation, relationship, education-num, fnlwgt, capital-loss, sex, native-country, education, age] | None | {'learning_rate': 0.05, 'num_boost_round': 200} | {'num_boost_round': 83} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': ['bool', 'int', 'float', 'category'], 'valid_special_types': None, 'ignored_type_group_special': ['text_ngram', 'text_as_category'], 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None} | [] | [WeightedEnsemble_L2] |
1 | WeightedEnsemble_L2 | 0.776399 | balanced_accuracy | 0.038975 | 3.958286 | 0.000782 | 0.027119 | 2 | True | 3 | ... | {'use_orig_features': False, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} | {} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} | [LightGBM_BAG_L1] | None | {'ensemble_size': 25, 'subsample_size': 1000000} | {'ensemble_size': 1} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} | [LightGBM_BAG_L1] | [] |
2 | NeuralNetFastAI_BAG_L1 | 0.741368 | balanced_accuracy | 0.073276 | 9.611355 | 0.073276 | 9.611355 | 1 | True | 2 | ... | {'use_orig_features': True, 'valid_stacker': True, 'max_base_models': 0, 'max_base_models_per_type': 'auto', 'save_bag_folds': True, 'stratify': 'auto', 'bin': 'auto', 'n_bins': None} | {} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': None, 'valid_special_types': None, 'ignored_type_group_special': None, 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None, 'drop_unique': False} | [race, marital-status, capital-gain, hours-per-week, workclass, occupation, relationship, education-num, fnlwgt, capital-loss, sex, native-country, education, age] | None | {'layers': None, 'emb_drop': 0.1, 'ps': 0.1, 'bs': 'auto', 'lr': 0.01, 'epochs': 'auto', 'early.stopping.min_delta': 0.0001, 'early.stopping.patience': 20, 'smoothing': 0.0, 'num_epochs': 10} | {'epochs': 30, 'best_epoch': 9} | {'max_memory_usage_ratio': 1.0, 'max_time_limit_ratio': 1.0, 'max_time_limit': None, 'min_time_limit': 0, 'valid_raw_types': ['bool', 'int', 'float', 'category'], 'valid_special_types': None, 'ignored_type_group_special': ['text_ngram', 'text_as_category'], 'ignored_type_group_raw': None, 'get_features_kwargs': None, 'get_features_kwargs_extra': None, 'predict_1_batch_size': None, 'temperature_scalar': None} | [] | [] |
3 rows × 32 columns
The expanded leaderboard shows properties like how many features are used by each model (num_features), which other models are ancestors whose predictions are required inputs for each model (ancestors), and how much memory each model and all of its ancestors would occupy if simultaneously persisted (memory_size_w_ancestors). See the leaderboard documentation for full details.
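For example, assuming the expanded leaderboard contains the columns described above, you could inspect just a few of them from the returned DataFrame:
lb = predictor.leaderboard(extra_info=True)
print(lb[['model', 'num_features', 'memory_size_w_ancestors', 'ancestors', 'descendants']])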
To show scores for other metrics, you can specify the extra_metrics argument when passing in test_data.
predictor.leaderboard(test_data, extra_metrics=['accuracy', 'balanced_accuracy', 'log_loss'])
 | model | score_test | accuracy | balanced_accuracy | log_loss | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LightGBM_BAG_L1 | 0.743784 | 0.847170 | 0.743784 | -0.334022 | 0.776399 | balanced_accuracy | 0.140065 | 0.038194 | 3.931167 | 0.140065 | 0.038194 | 3.931167 | 1 | True | 1 |
1 | WeightedEnsemble_L2 | 0.743784 | 0.847170 | 0.743784 | -0.334022 | 0.776399 | balanced_accuracy | 0.141402 | 0.038975 | 3.958286 | 0.001337 | 0.000782 | 0.027119 | 2 | True | 3 |
2 | NeuralNetFastAI_BAG_L1 | 0.724629 | 0.843792 | 0.724629 | -0.343404 | 0.741368 | balanced_accuracy | 1.179801 | 0.073276 | 9.611355 | 1.179801 | 0.073276 | 9.611355 | 1 | True | 2 |
Notice that the log_loss scores are negative. This is because evaluation metrics in AutoGluon are always shown in "higher is better" form. This means that metrics such as log_loss and root_mean_squared_error have their signs flipped and the values are negative. This is necessary to avoid the user needing to know the metric in order to tell whether a higher value is better when looking at the leaderboard.
One additional caveat: the log_loss values computed via extra_metrics can be -inf. This is because the models were not optimized for log_loss during training and may produce a predicted probability of 0 for some class (this is especially common with K-Nearest-Neighbors models). Since log_loss gives infinite error when the correct class was assigned 0 probability, this results in a score of -inf. It is therefore not recommended to use log_loss as a secondary metric to gauge model quality. Either use log_loss as the eval_metric or avoid it altogether.
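To make the sign convention concrete, here is a hedged sketch comparing scikit-learn's usual lower-is-better log loss with the negated value AutoGluon would report:
from sklearn.metrics import log_loss  # standard lower-is-better log loss
y_pred_proba = predictor.predict_proba(test_data_nolabel)
y_pred_proba = y_pred_proba[sorted(y_pred_proba.columns)]  # sklearn expects probability columns in sorted-label order
raw_log_loss = log_loss(y_test, y_pred_proba)  # positive value, lower is better
print(f'sklearn log_loss (lower is better): {raw_log_loss:.4f}')
print(f'AutoGluon-style score (higher is better): {-raw_log_loss:.4f}')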
Here's how to specify a particular model to use for prediction instead of AutoGluon's default model selection.
i = 0 # index of model to use
model_to_use = predictor.model_names()[i]
model_pred = predictor.predict(datapoint, model=model_to_use)
print("Prediction from %s model: %s" % (model_to_use, model_pred.iloc[0]))
Prediction from LightGBM_BAG_L1 model: <=50K
We can easily access various information about the trained predictor or a particular model.
all_models = predictor.model_names()
model_to_use = all_models[i]
specific_model = predictor._trainer.load_model(model_to_use)
# Objects defined below are dicts of various information (not printed here as they are quite large):
model_info = specific_model.get_info()
predictor_information = predictor.info()
The predictor also remembers which metric predictions should be evaluated with, and evaluation against ground-truth labels can be done as follows:
y_pred_proba = predictor.predict_proba(test_data_nolabel)
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred_proba)
Since the label column remains in the test_data DataFrame, we can instead use the shorthand:
perf = predictor.evaluate(test_data)
Interpretability (feature importance)¶
To better understand our trained predictor, we can estimate the overall importance of each feature.
predictor.feature_importance(test_data)
Computing feature importance via permutation shuffling for 14 features using 5000 rows with 5 shuffle sets...
8.34s = Expected runtime (1.67s per shuffle set)
5.38s = Actual runtime (Completed 5 of 5 shuffle sets)
 | importance | stddev | p_value | n | p99_high | p99_low |
---|---|---|---|---|---|---|
marital-status | 0.068704 | 0.004542 | 2.279366e-06 | 5 | 0.078057 | 0.059352 |
capital-gain | 0.046431 | 0.002457 | 9.369035e-07 | 5 | 0.051489 | 0.041372 |
education-num | 0.042721 | 0.003485 | 5.268617e-06 | 5 | 0.049898 | 0.035545 |
age | 0.035115 | 0.005922 | 9.348413e-05 | 5 | 0.047308 | 0.022922 |
occupation | 0.033699 | 0.007890 | 3.356604e-04 | 5 | 0.049945 | 0.017454 |
relationship | 0.014965 | 0.003663 | 3.983866e-04 | 5 | 0.022507 | 0.007423 |
hours-per-week | 0.012270 | 0.003750 | 9.287608e-04 | 5 | 0.019992 | 0.004548 |
capital-loss | 0.002217 | 0.001260 | 8.531892e-03 | 5 | 0.004812 | -0.000378 |
education | 0.000319 | 0.000774 | 2.045314e-01 | 5 | 0.001912 | -0.001274 |
native-country | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
race | -0.000256 | 0.000268 | 9.500382e-01 | 5 | 0.000296 | -0.000807 |
sex | -0.000527 | 0.001303 | 7.914146e-01 | 5 | 0.002156 | -0.003210 |
workclass | -0.001594 | 0.002576 | 8.807408e-01 | 5 | 0.003709 | -0.006898 |
fnlwgt | -0.004992 | 0.001524 | 9.990763e-01 | 5 | -0.001855 | -0.008130 |
Computed via permutation shuffling, these feature importance scores quantify the drop in predictive performance (of the already trained predictor) when the values of one column are randomly shuffled across rows. The top features in this list contribute most to AutoGluon's accuracy for predicting the class label. Features with a non-positive importance score hardly contribute to the predictor's accuracy, or may even be actively harmful to include in the data (consider removing these features from your data and calling fit again). These scores facilitate a global understanding of the predictor's behavior (which features it relies on for all of its predictions). To get local explanations of which features influence a particular prediction, check out the example notebooks for explaining specific AutoGluon predictions using Shapley values.
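As a hedged sketch of acting on that advice, you could drop the features whose estimated importance is non-positive and then refit (the refit call is commented out since it would retrain from scratch):
importance = predictor.feature_importance(test_data)
useless_features = importance[importance['importance'] <= 0].index.tolist()
print('Candidate features to drop:', useless_features)
# train_data_pruned = train_data.drop(columns=useless_features)
# predictor_pruned = TabularPredictor(label=label, eval_metric='balanced_accuracy').fit(train_data_pruned)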
Before deciding whether AutoGluon is more interpretable than another solution, we recommend reading The Mythos of Model Interpretability by Zachary Lipton, which discusses why models like trees and linear models that are commonly claimed to be interpretable rarely offer meaningfully more practical interpretability than more advanced models.
Accelerating inference¶
We describe several ways to reduce the time it takes AutoGluon to produce predictions.
Before providing code examples, it is important to understand that there are several ways to accelerate inference in AutoGluon. The table below lists these options in order of priority.
Optimization | Inference Speedup | Cost | Notes |
---|---|---|---|
refit_full | At least 8x+, up to 160x (requires bagging) | -Quality, +FitTime | Only provides a speedup when bagging is enabled. |
persist | Up to 10x for online inference | ++MemoryUsage | If memory is insufficient to persist the models, no speedup is gained. This speedup is most relevant for online inference, not batch inference. |
infer_limit | Configurable, up to ~50x | -Quality (relative to the speedup) | If bagging is enabled, use refit_full when specifying infer_limit. |
distill | Roughly equal to refit_full and infer_limit combined | --Quality, ++FitTime | Not compatible with refit_full and infer_limit. |
feature pruning | Typically at most 1.5x. Can be more if willing to lower quality significantly. | -Quality?, ++FitTime | Depends on whether unimportant features exist in the data. Call predictor.feature_importance(test_data) to gauge which features could be removed. |
use faster hardware | Usually at most 3x. Depends on the hardware (ignoring GPUs). | +Hardware | For example, an EC2 c6i.2xlarge is roughly 1.6x faster than an m5.2xlarge for a similar price. Laptops in particular may be slow compared to cloud instances. |
manually tune hyperparameters | Assumes infer_limit has already been specified. | ---Quality?, +++UserMLExpertise | Can be very complicated and is not recommended. A potential way to get speedups this way is to reduce the number of trees in LightGBM, XGBoost, CatBoost, RandomForest, and ExtraTrees. |
manual data preprocessing | Typically at most 1.2x, assuming all other optimizations are already specified and the setting is online inference. | ++++UserMLExpertise, ++++UserCode | Only relevant for online inference. Not recommended, as AutoGluon's default preprocessing is already highly optimized. |
If bagging is enabled (num_bag_folds > 0 or num_stack_levels > 0 or using the 'best_quality' preset), the order of inference optimizations should be:
1. refit_full
2. persist
3. infer_limit
If bagging is not enabled (num_bag_folds = 0, num_stack_levels = 0), the order of inference optimizations should be:
1. persist
2. infer_limit
If following these recommendations still does not yield a sufficiently fast model, you can consider the more advanced options in the table above.
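As a sketch, the bagging-enabled ordering above maps onto the following calls (commented out here, since refit_full is demonstrated at the end of this tutorial and infer_limit is a fit-time argument shown below):
# predictor.refit_full()   # 1) collapse each bagged ensemble into a single model refit on all of the data
# predictor.persist()      # 2) keep the models needed for prediction loaded in memory
# 3) infer_limit / infer_limit_batch_size are passed to fit(), as demonstrated in a later section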
Keeping models in memory¶
By default, AutoGluon loads models into memory one at a time, and only when they are needed for prediction. This strategy is robust for large stacked/bagged ensembles, but leads to slower prediction times. If you plan to make repeated predictions (e.g. on new datapoints one at a time rather than on one large test dataset), you can first specify that all models required for inference should be loaded into memory, as follows:
predictor.persist()
num_test = 20
preds = np.array(['']*num_test, dtype='object')
for i in range(num_test):
datapoint = test_data_nolabel.iloc[[i]]
pred_numpy = predictor.predict(datapoint, as_pandas=False)
preds[i] = pred_numpy[0]
perf = predictor.evaluate_predictions(y_test[:num_test], preds, auxiliary_metrics=True)
print("Predictions: ", preds)
predictor.unpersist() # free memory by clearing models, future predict() calls will load models from disk
Predictions: [' <=50K' ' <=50K' ' >50K' ' <=50K' ' <=50K' ' >50K' ' >50K' ' >50K'
' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K' ' <=50K'
' <=50K' ' >50K' ' >50K' ' <=50K']
Persisting 2 models in memory. Models will require 0.01% of memory.
Unpersisted 2 models: ['WeightedEnsemble_L2', 'LightGBM_BAG_L1']
['WeightedEnsemble_L2', 'LightGBM_BAG_L1']
You can alternatively specify particular models to persist via the models argument of persist(), or simply set models='all' to simultaneously load every model that was trained during fit.
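For example (assuming there is enough memory available to hold every trained model):
# predictor.persist(models='all')  # load every model trained during fit into memory
# ...make many repeated predictions here...
# predictor.unpersist()            # release the memory again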
Inference speed as a fit constraint¶
If you know your latency constraint before fitting the predictor, you can specify it explicitly as a fit argument. AutoGluon will then automatically train models in a fashion that attempts to satisfy the constraint.
This constraint has two components: infer_limit and infer_limit_batch_size:
- infer_limit is the time in seconds to predict 1 row of data. For example, infer_limit=0.05 means 50 ms per row of data, which is equivalent to a throughput of 20 rows per second.
- infer_limit_batch_size is the number of rows passed at once to be predicted when calculating the per-row speed. This is very important because infer_limit_batch_size=1 (online inference) is highly suboptimal, as various operations have a fixed cost overhead regardless of data size. If you can pass your test data in bulk, you should specify infer_limit_batch_size=10000.
# At most 0.05 ms per row (20000 rows per second throughput)
infer_limit = 0.00005
# adhere to infer_limit with batches of size 10000 (batch-inference, easier to satisfy infer_limit)
infer_limit_batch_size = 10000
# adhere to infer_limit with batches of size 1 (online-inference, much harder to satisfy infer_limit)
# infer_limit_batch_size = 1 # Note that infer_limit<0.02 when infer_limit_batch_size=1 can be difficult to satisfy.
predictor_infer_limit = TabularPredictor(label=label, eval_metric=metric).fit(
train_data=train_data,
time_limit=30,
infer_limit=infer_limit,
infer_limit_batch_size=infer_limit_batch_size,
)
# NOTE: If bagging was enabled, it is important to call refit_full at this stage.
# infer_limit assumes that the user will call refit_full after fit.
# predictor_infer_limit.refit_full()
# NOTE: To align with inference speed calculated during fit, models must be persisted.
predictor_infer_limit.persist()
# Below is an optimized version that only persists the minimum required models for prediction.
# predictor_infer_limit.persist('best')
predictor_infer_limit.leaderboard()
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210115"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 28.02 GB / 30.95 GB (90.6%)
Disk Space Avail: 211.72 GB / 255.99 GB (82.7%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ... Time limit = 30s
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210115"
Train Data Rows: 1000
Train Data Columns: 14
Label Column: class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: binary
Preprocessing data ...
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 28696.39 MB
Train Data (Original) Memory Usage: 0.56 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
1.503μs = Feature Preprocessing Time (1 row | 10000 batch size)
Feature Preprocessing requires 3.01% of the overall inference constraint (0.05ms)
0.048ms inference time budget remaining for models...
Data preprocessing and feature engineering runtime = 0.29s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
'NN_TORCH': [{}],
'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, {'learning_rate': 0.03, 'num_leaves': 128, 'feature_fraction': 0.9, 'min_data_in_leaf': 3, 'ag_args': {'name_suffix': 'Large', 'priority': 0, 'hyperparameter_tune_kwargs': None}}],
'CAT': [{}],
'XGB': [{}],
'FASTAI': [{}],
'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}
Fitting 13 L1 models, fit_strategy="sequential" ...
Fitting model: KNeighborsUnif ... Training model for up to 29.71s of the 29.71s of remaining time.
0.725 = Validation score (accuracy)
0.04s = Training runtime
0.01s = Validation runtime
3.713μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
3.713μs = Validation runtime (1 row | 10000 batch size)
3.713μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
3.713μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: KNeighborsDist ... Training model for up to 29.65s of the 29.65s of remaining time.
0.71 = Validation score (accuracy)
0.04s = Training runtime
0.01s = Validation runtime
3.201μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
3.201μs = Validation runtime (1 row | 10000 batch size)
3.201μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
3.201μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBMXT ... Training model for up to 29.59s of the 29.59s of remaining time.
0.85 = Validation score (accuracy)
0.39s = Training runtime
0.0s = Validation runtime
1.246μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
1.246μs = Validation runtime (1 row | 10000 batch size)
1.246μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
1.246μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBM ... Training model for up to 29.19s of the 29.19s of remaining time.
0.84 = Validation score (accuracy)
0.48s = Training runtime
0.0s = Validation runtime
1.231μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
1.231μs = Validation runtime (1 row | 10000 batch size)
1.231μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
1.231μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: RandomForestGini ... Training model for up to 28.69s of the 28.69s of remaining time.
0.84 = Validation score (accuracy)
0.77s = Training runtime
0.06s = Validation runtime
8.754μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
8.754μs = Validation runtime (1 row | 10000 batch size)
8.754μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
8.754μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: RandomForestEntr ... Training model for up to 27.85s of the 27.84s of remaining time.
0.835 = Validation score (accuracy)
0.68s = Training runtime
0.06s = Validation runtime
8.729μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
8.729μs = Validation runtime (1 row | 10000 batch size)
8.729μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
8.729μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: CatBoost ... Training model for up to 27.09s of the 27.09s of remaining time.
0.86 = Validation score (accuracy)
1.95s = Training runtime
0.0s = Validation runtime
0.815μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
0.815μs = Validation runtime (1 row | 10000 batch size)
0.815μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
0.815μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: ExtraTreesGini ... Training model for up to 25.13s of the 25.13s of remaining time.
0.815 = Validation score (accuracy)
0.68s = Training runtime
0.06s = Validation runtime
8.729μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
8.729μs = Validation runtime (1 row | 10000 batch size)
8.729μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
8.729μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: ExtraTreesEntr ... Training model for up to 24.38s of the 24.38s of remaining time.
0.82 = Validation score (accuracy)
0.67s = Training runtime
0.06s = Validation runtime
8.713μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
8.713μs = Validation runtime (1 row | 10000 batch size)
8.713μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
8.713μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: NeuralNetFastAI ... Training model for up to 23.64s of the 23.63s of remaining time.
No improvement since epoch 7: early stopping
0.84 = Validation score (accuracy)
0.99s = Training runtime
0.01s = Validation runtime
0.012ms = Validation runtime (1 row | 10000 batch size | MARGINAL)
0.012ms = Validation runtime (1 row | 10000 batch size)
0.012ms = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
0.012ms = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: XGBoost ... Training model for up to 22.62s of the 22.62s of remaining time.
0.855 = Validation score (accuracy)
0.4s = Training runtime
0.01s = Validation runtime
2.198μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
2.198μs = Validation runtime (1 row | 10000 batch size)
2.198μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
2.198μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: NeuralNetTorch ... Training model for up to 22.21s of the 22.21s of remaining time.
0.855 = Validation score (accuracy)
3.51s = Training runtime
0.01s = Validation runtime
4.458μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
4.458μs = Validation runtime (1 row | 10000 batch size)
4.458μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
4.458μs = Validation runtime (1 row | 10000 batch size | REFIT)
Fitting model: LightGBMLarge ... Training model for up to 18.68s of the 18.68s of remaining time.
0.795 = Validation score (accuracy)
0.83s = Training runtime
0.0s = Validation runtime
3.908μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
3.908μs = Validation runtime (1 row | 10000 batch size)
3.908μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
3.908μs = Validation runtime (1 row | 10000 batch size | REFIT)
Removing 5/13 base models to satisfy inference constraint (constraint=0.046ms) ...
0.068ms -> 0.065ms (KNeighborsDist)
0.065ms -> 0.061ms (KNeighborsUnif)
0.061ms -> 0.057ms (LightGBMLarge)
0.057ms -> 0.049ms (ExtraTreesGini)
0.049ms -> 0.04ms (ExtraTreesEntr)
Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.71s of the 17.79s of remaining time.
Ensemble Weights: {'RandomForestGini': 0.333, 'CatBoost': 0.333, 'XGBoost': 0.333}
0.875 = Validation score (accuracy)
0.1s = Training runtime
0.0s = Validation runtime
0.131μs = Validation runtime (1 row | 10000 batch size | MARGINAL)
0.012ms = Validation runtime (1 row | 10000 batch size)
0.131μs = Validation runtime (1 row | 10000 batch size | REFIT | MARGINAL)
0.012ms = Validation runtime (1 row | 10000 batch size | REFIT)
AutoGluon training complete, total runtime = 12.34s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 2909.9 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210115")
Persisting 4 models in memory. Models will require 0.02% of memory.
 | model | score_val | eval_metric | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|
0 | WeightedEnsemble_L2 | 0.875 | accuracy | 0.068731 | 3.214255 | 0.000926 | 0.103268 | 2 | True | 14 |
1 | CatBoost | 0.860 | accuracy | 0.003737 | 1.947394 | 0.003737 | 1.947394 | 1 | True | 7 |
2 | XGBoost | 0.855 | accuracy | 0.005009 | 0.395715 | 0.005009 | 0.395715 | 1 | True | 11 |
3 | NeuralNetTorch | 0.855 | accuracy | 0.009604 | 3.513811 | 0.009604 | 3.513811 | 1 | True | 12 |
4 | LightGBMXT | 0.850 | accuracy | 0.003705 | 0.390207 | 0.003705 | 0.390207 | 1 | True | 3 |
5 | LightGBM | 0.840 | accuracy | 0.003618 | 0.483905 | 0.003618 | 0.483905 | 1 | True | 4 |
6 | NeuralNetFastAI | 0.840 | accuracy | 0.008258 | 0.992681 | 0.008258 | 0.992681 | 1 | True | 10 |
7 | RandomForestGini | 0.840 | accuracy | 0.059059 | 0.767877 | 0.059059 | 0.767877 | 1 | True | 5 |
8 | RandomForestEntr | 0.835 | accuracy | 0.060193 | 0.675278 | 0.060193 | 0.675278 | 1 | True | 6 |
9 | ExtraTreesEntr | 0.820 | accuracy | 0.058165 | 0.667979 | 0.058165 | 0.667979 | 1 | True | 9 |
10 | ExtraTreesGini | 0.815 | accuracy | 0.056565 | 0.676969 | 0.056565 | 0.676969 | 1 | True | 8 |
11 | LightGBMLarge | 0.795 | accuracy | 0.004443 | 0.831450 | 0.004443 | 0.831450 | 1 | True | 13 |
12 | KNeighborsUnif | 0.725 | accuracy | 0.013350 | 0.043004 | 0.013350 | 0.043004 | 1 | True | 1 |
13 | KNeighborsDist | 0.710 | accuracy | 0.013417 | 0.036869 | 0.013417 | 0.036869 | 1 | True | 2 |
Now we can test the inference speed of the final model and check whether it satisfies the inference constraint.
test_data_batch = test_data.sample(infer_limit_batch_size, replace=True, ignore_index=True)
import time
time_start = time.time()
predictor_infer_limit.predict(test_data_batch)
time_end = time.time()
infer_time_per_row = (time_end - time_start) / len(test_data_batch)
rows_per_second = 1 / infer_time_per_row
infer_time_per_row_ratio = infer_time_per_row / infer_limit
is_constraint_satisfied = infer_time_per_row_ratio <= 1
print(f'Model is able to predict {round(rows_per_second, 1)} rows per second. (User-specified Throughput = {1 / infer_limit})')
print(f'Model uses {round(infer_time_per_row_ratio * 100, 1)}% of infer_limit time per row.')
print(f'Model satisfies inference constraint: {is_constraint_satisfied}')
Model is able to predict 73034.0 rows per second. (User-specified Throughput = 20000.0)
Model uses 27.4% of infer_limit time per row.
Model satisfies inference constraint: True
Use a smaller ensemble or faster model for prediction¶
Without retraining any models, it is possible to construct alternative ensembles that aggregate the individual models' predictions with different weighting schemes. These ensembles become smaller (and hence faster for prediction) if they assign non-zero weight to fewer models. You can produce a wide variety of ensembles with different accuracy-speed trade-offs like this:
additional_ensembles = predictor.fit_weighted_ensemble(expand_pareto_frontier=True)
print("Alternative ensembles you can use for prediction:", additional_ensembles)
predictor.leaderboard(only_pareto_frontier=True)
Alternative ensembles you can use for prediction: ['WeightedEnsemble_L2Best']
Fitting model: WeightedEnsemble_L2Best ...
Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
0.7764 = Validation score (balanced_accuracy)
0.03s = Training runtime
0.0s = Validation runtime
|    | model | score_val | eval_metric | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1 | 0.776399 | balanced_accuracy | 0.038194 | 3.931167 | 0.038194 | 3.931167 | 1 | True | 1 |
The resulting leaderboard will contain the most accurate models for a given inference latency. You can select whichever model from the leaderboard has acceptable latency and use it for prediction.
model_for_prediction = additional_ensembles[0]
predictions = predictor.predict(test_data, model=model_for_prediction)
predictor.delete_models(models_to_delete=additional_ensembles, dry_run=False) # delete these extra models so they don't affect rest of tutorial
Deleting model WeightedEnsemble_L2Best. All files under /home/ci/autogluon/docs/tutorials/tabular/agModels-predictClass/models/WeightedEnsemble_L2Best will be removed.
Collapsing bagged ensembles via refit_full¶
For an ensemble predictor trained with bagging (as done above), recall that each individual model was trained on roughly 10 bagged copies across different train/validation folds. We can collapse these ~10 bagged models into a single model fit to the full dataset, which can greatly reduce its memory/latency requirements (but may also reduce accuracy). Below we refit such a model for each original model, but you can alternatively do this for only a particular model by specifying the model argument of refit_full().
refit_model_map = predictor.refit_full()
print("Name of each refit-full model corresponding to a previous bagged ensemble:")
print(refit_model_map)
predictor.leaderboard(test_data)
Name of each refit-full model corresponding to a previous bagged ensemble:
{'LightGBM_BAG_L1': 'LightGBM_BAG_L1_FULL', 'NeuralNetFastAI_BAG_L1': 'NeuralNetFastAI_BAG_L1_FULL', 'WeightedEnsemble_L2': 'WeightedEnsemble_L2_FULL'}
Refitting models via `predictor.refit_full` using all of the data (combined train and validation)...
Models trained in this way will have the suffix "_FULL" and have NaN validation score.
This process is not bound by time_limit, but should take less time than the original `predictor.fit` call.
To learn more, refer to the `.refit_full` method docstring which explains how "_FULL" models differ from normal models.
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_BAG_L1_FULL ...
0.33s = Training runtime
Fitting 1 L1 models, fit_strategy="sequential" ...
Fitting model: NeuralNetFastAI_BAG_L1_FULL ...
Metric balanced_accuracy is not supported by this model - using log_loss instead
Stopping at the best epoch learned earlier - 9.
0.44s = Training runtime
Fitting model: WeightedEnsemble_L2_FULL | Skipping fit via cloning parent ...
Ensemble Weights: {'LightGBM_BAG_L1': 1.0}
0.03s = Training runtime
Updated best model to "WeightedEnsemble_L2_FULL" (Previously "WeightedEnsemble_L2"). AutoGluon will default to using "WeightedEnsemble_L2_FULL" for predict() and predict_proba().
Refit complete, total runtime = 0.85s ... Best model: "WeightedEnsemble_L2_FULL"
|    | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1_FULL | 0.750092 | NaN | balanced_accuracy | 0.025266 | NaN | 0.326699 | 0.025266 | NaN | 0.326699 | 1 | True | 4 |
| 1 | WeightedEnsemble_L2_FULL | 0.750092 | NaN | balanced_accuracy | 0.026516 | NaN | 0.353818 | 0.001251 | NaN | 0.027119 | 2 | True | 6 |
| 2 | LightGBM_BAG_L1 | 0.743784 | 0.776399 | balanced_accuracy | 0.137620 | 0.038194 | 3.931167 | 0.137620 | 0.038194 | 3.931167 | 1 | True | 1 |
| 3 | WeightedEnsemble_L2 | 0.743784 | 0.776399 | balanced_accuracy | 0.138921 | 0.038975 | 3.958286 | 0.001301 | 0.000782 | 0.027119 | 2 | True | 3 |
| 4 | NeuralNetFastAI_BAG_L1 | 0.724629 | 0.741368 | balanced_accuracy | 1.188221 | 0.073276 | 9.611355 | 1.188221 | 0.073276 | 9.611355 | 1 | True | 2 |
| 5 | NeuralNetFastAI_BAG_L1_FULL | 0.700878 | NaN | balanced_accuracy | 0.282469 | NaN | 0.437760 | 0.282469 | NaN | 0.437760 | 1 | True | 5 |
This adds the refit-full models to the leaderboard, and we can opt to use any of them for prediction just like any other model. Note that pred_time_test and pred_time_val list the time (in seconds) needed to produce predictions with each model on the test/validation data. Because the refit-full models were trained on all of the data, no internal validation score (score_val) is available for them. You can also call refit_full() with non-bagged models to refit the same kind of model to your full dataset (there is no memory/latency gain in this case, but test accuracy may improve).
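For instance, here is a minimal sketch of predicting with one of the refit-full models (the model names come from the refit_model_map printed above) and of refitting only a single model via the model argument:
refit_model = refit_model_map['WeightedEnsemble_L2']  # -> 'WeightedEnsemble_L2_FULL'
y_pred_refit = predictor.predict(test_data, model=refit_model)
# To refit only one particular model instead of all of them:
# predictor.refit_full(model='LightGBM_BAG_L1')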
Model distillation¶
While computationally cheaper, single individual models are usually less accurate than weighted/stacked/bagged ensembles. Model distillation offers a way to retain the computational benefits of a single model while enjoying some of the accuracy boost that comes with ensembling. The idea is to train a single model (the student) to mimic the predictions of the full stack ensemble (the teacher). Like refit_full(), the distill() function produces additional models that we can opt to use for prediction.
student_models = predictor.distill(time_limit=30) # specify much longer time limit in real applications
print(student_models)
preds_student = predictor.predict(test_data_nolabel, model=student_models[0])
print(f"predictions from {student_models[0]}:", list(preds_student)[:5])
predictor.leaderboard(test_data)
['RandomForestMSE_DSTL', 'WeightedEnsemble_L2_DSTL']
predictions from RandomForestMSE_DSTL: [' <=50K', ' <=50K', ' >50K', ' <=50K', ' <=50K']
Distilling with teacher='WeightedEnsemble_L2_FULL', teacher_preds=soft, augment_method=spunge ...
SPUNGE: Augmenting training data with 4000 synthetic samples for distillation...
Distilling with each of these student models: ['LightGBM_DSTL', 'CatBoost_DSTL', 'RandomForestMSE_DSTL', 'NeuralNetTorch_DSTL']
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM_DSTL ... Training model for up to 30.00s of the 30.00s of remaining time.
Warning: Exception caused LightGBM_DSTL to fail during training... Skipping this model.
pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: workclass: object, education: object, marital-status: object, occupation: object, relationship: object, race: object, native-country: object
Detailed Traceback:
Traceback (most recent call last):
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
model = self._train_single(**model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/lgb/lgb_model.py", line 299, in _fit
self.model = train_lgb_model(early_stopping_callback_kwargs=early_stopping_callback_kwargs, **train_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/lgb/lgb_utils.py", line 134, in train_lgb_model
return lgb.train(**train_params)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/engine.py", line 297, in train
booster = Booster(params=params, train_set=train_set)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 3656, in __init__
train_set.construct()
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 2590, in construct
self._lazy_init(
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 2123, in _lazy_init
data, feature_name, categorical_feature, self.pandas_categorical = _data_from_pandas(
^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 868, in _data_from_pandas
_pandas_to_numpy(data, target_dtype=target_dtype),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 814, in _pandas_to_numpy
_check_for_bad_pandas_dtypes(data.dtypes)
File "/home/ci/opt/venv/lib/python3.11/site-packages/lightgbm/basic.py", line 805, in _check_for_bad_pandas_dtypes
raise ValueError(
ValueError: pandas dtypes must be int, float or bool.
Fields with bad pandas dtypes: workclass: object, education: object, marital-status: object, occupation: object, relationship: object, race: object, native-country: object
Fitting model: CatBoost_DSTL ... Training model for up to 29.48s of the 29.48s of remaining time.
Warning: Exception caused CatBoost_DSTL to fail during training... Skipping this model.
features data: pandas.DataFrame column 'workclass' has dtype 'category' but is not in cat_features list
Detailed Traceback:
Traceback (most recent call last):
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
model = self._train_single(**model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/catboost/catboost_model.py", line 154, in _fit
X_val = Pool(data=X_val, label=y_val, cat_features=cat_features, weight=sample_weight_val)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/catboost/core.py", line 855, in __init__
self._init(data, label, cat_features, text_features, embedding_features, embedding_features_data, pairs, graph, weight,
File "/home/ci/opt/venv/lib/python3.11/site-packages/catboost/core.py", line 1491, in _init
self._init_pool(data, label, cat_features, text_features, embedding_features, embedding_features_data, pairs, graph, weight,
File "_catboost.pyx", line 4329, in _catboost._PoolBase._init_pool
File "_catboost.pyx", line 4381, in _catboost._PoolBase._init_pool
File "_catboost.pyx", line 4190, in _catboost._PoolBase._init_features_order_layout_pool
File "_catboost.pyx", line 3070, in _catboost._set_features_order_data_pd_data_frame
_catboost.CatBoostError: features data: pandas.DataFrame column 'workclass' has dtype 'category' but is not in cat_features list
Fitting model: RandomForestMSE_DSTL ... Training model for up to 29.22s of the 29.22s of remaining time.
/home/ci/autogluon/tabular/src/autogluon/tabular/models/rf/rf_model.py:83: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
X = X.fillna(0).to_numpy(dtype=np.float32)
Note: model has different eval_metric than default.
-0.1103 = Validation score (-mean_squared_error)
1.39s = Training runtime
0.07s = Validation runtime
Fitting model: NeuralNetTorch_DSTL ... Training model for up to 27.65s of the 27.65s of remaining time.
Warning: Exception caused NeuralNetTorch_DSTL to fail during training... Skipping this model.
Found array with 0 feature(s) (shape=(4800, 0)) while a minimum of 1 is required.
Detailed Traceback:
Traceback (most recent call last):
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2169, in _train_and_save
model = self._train_single(**model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/trainer/abstract_trainer.py", line 2055, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/core/src/autogluon/core/models/abstract/abstract_model.py", line 1051, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 209, in _fit
train_dataset = self._generate_dataset(X=X, y=y, train_params=processor_kwargs, is_train=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 687, in _generate_dataset
dataset = self._process_train_data(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/torch/tabular_nn_torch.py", line 759, in _process_train_data
df = self.processor.fit_transform(df)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 319, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 1389, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 1001, in fit_transform
result = self._call_func_on_transformers(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/compose/_column_transformer.py", line 910, in _call_func_on_transformers
return Parallel(n_jobs=self.n_jobs)(jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 77, in __call__
return super().__call__(iterable_with_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/joblib/parallel.py", line 1985, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/joblib/parallel.py", line 1913, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/parallel.py", line 139, in __call__
return self.function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 1551, in _fit_transform_one
res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 1389, in wrapper
return fit_method(estimator, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/pipeline.py", line 730, in fit_transform
return last_step.fit_transform(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 319, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/base.py", line 918, in fit_transform
return self.fit(X, **fit_params).transform(X)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 767, in fit
self._fit(X, handle_unknown="ignore")
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 202, in _fit
X_list, n_samples, n_features = self._check_X(X)
^^^^^^^^^^^^^^^^
File "/home/ci/autogluon/tabular/src/autogluon/tabular/models/tabular_nn/utils/categorical_encoders.py", line 168, in _check_X
X_temp = check_array(X, dtype=None, ensure_all_finite=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ci/opt/venv/lib/python3.11/site-packages/sklearn/utils/validation.py", line 1139, in check_array
raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(4800, 0)) while a minimum of 1 is required.
Distilling with each of these student models: ['WeightedEnsemble_L2_DSTL']
Fitting model: WeightedEnsemble_L2_DSTL ... Training model for up to 30.00s of the 27.37s of remaining time.
Ensemble Weights: {'RandomForestMSE_DSTL': 1.0}
Note: model has different eval_metric than default.
-0.1103 = Validation score (-mean_squared_error)
0.0s = Training runtime
0.0s = Validation runtime
Distilled model leaderboard:
model score_val eval_metric pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 RandomForestMSE_DSTL 0.718252 mean_squared_error 0.067172 1.385662 0.067172 1.385662 1 True 7
1 WeightedEnsemble_L2_DSTL 0.718252 mean_squared_error 0.067812 1.388410 0.000640 0.002748 2 True 8
|    | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LightGBM_BAG_L1_FULL | 0.750092 | NaN | balanced_accuracy | 0.021516 | NaN | 0.326699 | 0.021516 | NaN | 0.326699 | 1 | True | 4 |
| 1 | WeightedEnsemble_L2_FULL | 0.750092 | NaN | balanced_accuracy | 0.022803 | NaN | 0.353818 | 0.001287 | NaN | 0.027119 | 2 | True | 6 |
| 2 | LightGBM_BAG_L1 | 0.743784 | 0.776399 | balanced_accuracy | 0.145854 | 0.038194 | 3.931167 | 0.145854 | 0.038194 | 3.931167 | 1 | True | 1 |
| 3 | WeightedEnsemble_L2 | 0.743784 | 0.776399 | balanced_accuracy | 0.147171 | 0.038975 | 3.958286 | 0.001318 | 0.000782 | 0.027119 | 2 | True | 3 |
| 4 | RandomForestMSE_DSTL | 0.732074 | 0.718252 | mean_squared_error | 0.187299 | 0.067172 | 1.385662 | 0.187299 | 0.067172 | 1.385662 | 1 | True | 7 |
| 5 | WeightedEnsemble_L2_DSTL | 0.732074 | 0.718252 | mean_squared_error | 0.189372 | 0.067812 | 1.388410 | 0.002073 | 0.000640 | 0.002748 | 2 | True | 8 |
| 6 | NeuralNetFastAI_BAG_L1 | 0.724629 | 0.741368 | balanced_accuracy | 1.158053 | 0.073276 | 9.611355 | 1.158053 | 0.073276 | 9.611355 | 1 | True | 2 |
| 7 | NeuralNetFastAI_BAG_L1_FULL | 0.700878 | NaN | balanced_accuracy | 0.285300 | NaN | 0.437760 | 0.285300 | NaN | 0.437760 | 1 | True | 5 |
Faster presets or hyperparameters¶
Instead of trying to speed up a cumbersome trained model at prediction time, if you know from the start that inference latency or memory will be an issue, you can adjust the training process accordingly to ensure fit() does not produce unwieldy models.
One option is to specify more lightweight presets:
presets = ['good_quality', 'optimize_for_deployment']
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, presets=presets, time_limit=30)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_210145"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 27.49 GB / 30.95 GB (88.8%)
Disk Space Avail: 211.59 GB / 255.99 GB (82.7%)
===================================================
Presets specified: ['good_quality', 'optimize_for_deployment']
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
Note: `save_bag_folds=False`! This will greatly reduce peak disk usage during fit (by ~8x), but runs the risk of an out-of-memory error during model refit if memory is small relative to the data size.
You can avoid this risk by setting `save_bag_folds=True`.
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
This is used to identify the optimal `num_stack_levels` value. Copies of AutoGluon will be fit on subsets of the data. Then holdout validation data is used to detect stacked overfitting.
Running DyStack for up to 7s of the 30s of remaining time (25%).
Context path: "/home/ci/autogluon/docs/tutorials/tabular/AutogluonModels/ag-20250508_210145/ds_sub_fit/sub_fit_ho"
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[32], line 2
1 presets = ['good_quality', 'optimize_for_deployment']
----> 2 predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, presets=presets, time_limit=30)
File ~/autogluon/core/src/autogluon/core/utils/decorators.py:31, in unpack.<locals>._unpack_inner.<locals>._call(*args, **kwargs)
28 @functools.wraps(f)
29 def _call(*args, **kwargs):
30 gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 31 return f(*gargs, **gkwargs)
File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1282, in TabularPredictor.fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, feature_metadata, infer_limit, infer_limit_batch_size, fit_weighted_ensemble, fit_full_last_level_weighted_ensemble, full_weighted_ensemble_additionally, dynamic_stacking, calibrate_decision_threshold, num_cpus, num_gpus, fit_strategy, memory_limit, callbacks, **kwargs)
1276 if dynamic_stacking:
1277 logger.log(
1278 20,
1279 f"DyStack is enabled (dynamic_stacking={dynamic_stacking}). "
1280 "AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.",
1281 )
-> 1282 num_stack_levels, time_limit = self._dynamic_stacking(**ds_args, ag_fit_kwargs=ag_fit_kwargs, ag_post_fit_kwargs=ag_post_fit_kwargs)
1283 logger.info(
1284 f"Starting main fit with num_stack_levels={num_stack_levels}.\n"
1285 f"\tFor future fit calls on this dataset, you can skip DyStack to save time: "
1286 f"`predictor.fit(..., dynamic_stacking=False, num_stack_levels={num_stack_levels})`"
1287 )
1289 if (time_limit is not None) and (time_limit <= 0):
File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1382, in TabularPredictor._dynamic_stacking(self, ag_fit_kwargs, ag_post_fit_kwargs, validation_procedure, detection_time_frac, holdout_frac, n_folds, n_repeats, memory_safe_fits, clean_up_fits, enable_ray_logging, enable_callbacks, holdout_data)
1379 _, holdout_data, _, _ = self._validate_fit_data(train_data=X, tuning_data=holdout_data)
1380 ds_fit_kwargs["ds_fit_context"] = os.path.join(ds_fit_context, "sub_fit_custom_ho")
-> 1382 stacked_overfitting = self._sub_fit_memory_save_wrapper(
1383 train_data=X,
1384 time_limit=time_limit,
1385 time_start=time_start,
1386 ds_fit_kwargs=ds_fit_kwargs,
1387 ag_fit_kwargs=inner_ag_fit_kwargs,
1388 ag_post_fit_kwargs=inner_ag_post_fit_kwargs,
1389 holdout_data=holdout_data,
1390 )
1391 else:
1392 # Holdout is false, use (repeated) cross-validation
1393 is_stratified = self.problem_type in [BINARY, MULTICLASS]
File ~/autogluon/tabular/src/autogluon/tabular/predictor/predictor.py:1574, in TabularPredictor._sub_fit_memory_save_wrapper(self, train_data, time_limit, time_start, ds_fit_kwargs, ag_fit_kwargs, ag_post_fit_kwargs, holdout_data)
1560 # FIXME: For some reason ray does not treat `num_cpus` and `num_gpus` the same.
1561 # For `num_gpus`, the process will reserve the capacity and is unable to share it to child ray processes, causing a deadlock.
1562 # For `num_cpus`, the value is completely ignored by children, and they can even use more num_cpus than the parent.
1563 # Because of this, num_gpus is set to 0 here to avoid a deadlock, but num_cpus does not need to be changed.
1564 # For more info, refer to Ray documentation: https://docs.rayai.org.cn/en/latest/ray-core/tasks/nested-tasks.html#yielding-resources-while-blocked
1565 ref = sub_fit_caller.options(num_cpus=num_cpus, num_gpus=0).remote(
1566 predictor=predictor_ref,
1567 train_data=train_data_ref,
(...)
1572 holdout_data=holdout_data_ref,
1573 )
-> 1574 finished, unfinished = _ds_ray.wait([ref], num_returns=1)
1575 stacked_overfitting, ho_leaderboard, exception = _ds_ray.get(finished[0])
1577 # TODO: This is present to ensure worker logs are properly logged and don't get skipped / printed out of order.
1578 # Ideally find a faster way to do this that doesn't introduce a 100 ms overhead.
File ~/opt/venv/lib/python3.11/site-packages/ray/_private/auto_init_hook.py:21, in wrap_auto_init.<locals>.auto_init_wrapper(*args, **kwargs)
18 @wraps(fn)
19 def auto_init_wrapper(*args, **kwargs):
20 auto_init_ray()
---> 21 return fn(*args, **kwargs)
File ~/opt/venv/lib/python3.11/site-packages/ray/_private/client_mode_hook.py:103, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
101 if func.__name__ != "init" or is_client_mode_enabled_by_default:
102 return getattr(ray, func.__name__)(*args, **kwargs)
--> 103 return func(*args, **kwargs)
File ~/opt/venv/lib/python3.11/site-packages/ray/_private/worker.py:3013, in wait(ray_waitables, num_returns, timeout, fetch_local)
3011 timeout = timeout if timeout is not None else 10**6
3012 timeout_milliseconds = int(timeout * 1000)
-> 3013 ready_ids, remaining_ids = worker.core_worker.wait(
3014 ray_waitables,
3015 num_returns,
3016 timeout_milliseconds,
3017 fetch_local,
3018 )
3019 return ready_ids, remaining_ids
File python/ray/_raylet.pyx:3529, in ray._raylet.CoreWorker.wait()
File python/ray/includes/common.pxi:83, in ray._raylet.check_status()
KeyboardInterrupt:
Another option is to specify more lightweight hyperparameters:
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, hyperparameters='very_light', time_limit=30)
Here you can set hyperparameters to 'light', 'very_light', or 'toy' to obtain progressively smaller (but less accurate) models and predictors. Advanced users can instead try manually specifying particular models' hyperparameters to make them faster/smaller, as in the sketch below.
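For example, a minimal sketch of manually specified lightweight hyperparameters; the particular values are illustrative assumptions rather than tuned recommendations:
light_hyperparameters = {
    'GBM': {'num_boost_round': 100},  # fewer LightGBM boosting rounds -> smaller, faster model
    'XGB': {'n_estimators': 100},     # fewer XGBoost trees
}
predictor_custom_light = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, hyperparameters=light_hyperparameters, time_limit=30
)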
Finally, you can also entirely exclude certain unwieldy models from being trained. Below we exclude models that tend to be slower (K-nearest neighbors, neural networks):
excluded_model_types = ['KNN', 'NN_TORCH']
predictor_light = TabularPredictor(label=label, eval_metric=metric).fit(train_data, excluded_model_types=excluded_model_types, time_limit=30)
(Advanced) Cache preprocessed data¶
If you repeatedly predict on the same data, you can cache a preprocessed version of it and send the preprocessed data directly to predictor.predict for faster inference:
test_data_preprocessed = predictor.transform_features(test_data)
# The following call will be faster than a normal predict call because we are skipping the preprocessing stage.
predictions = predictor.predict(test_data_preprocessed, transform_features=False)
Note that this is only useful in scenarios where you repeatedly predict on the same data. If it gives your use case a significant speedup, consider whether your current approach makes sense, or whether caching the prediction results themselves would be a better solution.
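If the same rows really do come back repeatedly, a simple cache of the prediction results may serve better than re-running inference; a minimal sketch, where the cache file name is an arbitrary choice for this illustration:
import os
import pandas as pd

cache_path = 'cached_predictions.csv'  # arbitrary cache location
if os.path.exists(cache_path):
    predictions = pd.read_csv(cache_path, index_col=0).iloc[:, 0]
else:
    predictions = predictor.predict(test_data_nolabel)
    predictions.to_csv(cache_path)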
(Advanced) Disable preprocessing¶
If you would rather preprocess the data outside of TabularPredictor, you can disable TabularPredictor's preprocessing entirely via:
predictor.fit(..., feature_generator=None, feature_metadata=YOUR_CUSTOM_FEATURE_METADATA)
Note that this removes all guardrails around data cleaning; unless you are very familiar with AutoGluon, you are likely to run into errors.
One scenario where this can be useful is when you have many problems that reuse exactly the same data with exactly the same features. If 30 tasks reuse the same features, you can fit an autogluon.features feature generator once on the data, and when you need predictions for all 30 tasks you preprocess the data only once and send the preprocessed data to all 30 predictors.
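A minimal sketch of that pattern is shown below; it assumes the fitted generator exposes fit_transform/transform and a feature_metadata attribute, and reuses the feature_generator=None option from above:
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

# Fit the feature generator once on the shared raw features
feature_generator = AutoMLPipelineFeatureGenerator()
train_features = feature_generator.fit_transform(train_data.drop(columns=[label]))
train_features[label] = train_data[label]

# Train a predictor directly on the preprocessed data, with AutoGluon's own preprocessing disabled
predictor_noprep = TabularPredictor(label=label).fit(
    train_features,
    feature_generator=None,
    feature_metadata=feature_generator.feature_metadata,  # assumed attribute of the fitted generator
    time_limit=30,
)

# Preprocess the test features once and reuse the result for every predictor trained this way
test_features = feature_generator.transform(test_data_nolabel)
predictions = predictor_noprep.predict(test_features)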
If you encounter memory issues¶
To reduce memory usage during training, you can try each of the following strategies individually or in combination (they may harm accuracy):
- In fit(), set excluded_model_types = ['KNN', 'XT', 'RF'] (or some subset of these models).
- Try different presets in fit().
- In fit(), set hyperparameters = 'light' or hyperparameters = 'very_light'.
- Text fields in your table require substantial memory for N-gram featurization. To mitigate this in fit(), you can either: (1) add 'ignore_text' to your presets list (to ignore text features), or (2) specify the argument:
from sklearn.feature_extraction.text import CountVectorizer
from autogluon.features.generators import AutoMLPipelineFeatureGenerator
feature_generator = AutoMLPipelineFeatureGenerator(vectorizer=CountVectorizer(min_df=30, ngram_range=(1, 3), max_features=MAX_NGRAM, dtype=np.uint8))
For example, use MAX_NGRAM = 1000 (try various values under 10000 to reduce the number of N-gram features used to represent each text field); a usage sketch follows below.
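The customized generator is then handed to fit(); a minimal sketch assuming MAX_NGRAM = 1000 (the short time_limit is only to keep the illustration quick):
from sklearn.feature_extraction.text import CountVectorizer
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

MAX_NGRAM = 1000
feature_generator = AutoMLPipelineFeatureGenerator(vectorizer=CountVectorizer(min_df=30, ngram_range=(1, 3), max_features=MAX_NGRAM, dtype=np.uint8))
predictor_ngram = TabularPredictor(label=label, eval_metric=metric).fit(
    train_data, feature_generator=feature_generator, time_limit=30
)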
In addition to reducing memory usage, many of the above strategies can also be used to reduce training time.
To reduce memory usage during inference:
- If attempting to produce predictions for a large test dataset, break the test data into smaller chunks as demonstrated in the FAQ (see the sketch after this list).
- If models have previously been persisted in memory but inference speed is not a major concern, call predictor.unpersist().
- If models have previously been persisted in memory, bagging was used in fit(), and inference speed is a concern: call predictor.refit_full() and use one of the refit-full models for prediction (and make sure it is the only model persisted in memory).
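For the first item, a minimal sketch of chunked prediction; the chunk size is an arbitrary illustrative value that you should pick based on available memory:
import pandas as pd

chunk_size = 10000  # illustrative value
prediction_chunks = []
for start in range(0, len(test_data), chunk_size):
    chunk = test_data.iloc[start:start + chunk_size]
    prediction_chunks.append(predictor.predict(chunk))
predictions = pd.concat(prediction_chunks)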
If you encounter disk space issues¶
To reduce disk usage, you can try each of the following strategies individually or in combination:
- Make sure to delete all predictor.path folders from previous fit() runs! These folders eat up your free space if you call fit() multiple times (see the sketch after this list). If you didn't specify path, AutoGluon still automatically saves its models to a folder named "AutogluonModels/ag-[TIMESTAMP]", where TIMESTAMP records when fit() was called, so be sure to also delete these folders if you are running low on free space.
- Call predictor.save_space() to delete auxiliary files produced during fit().
- Call predictor.delete_models(models_to_keep='best', dry_run=False) if you only intend to use this predictor for inference going forward (this deletes files required for non-prediction-related functionality such as fit_summary).
- In fit(), you can add 'optimize_for_deployment' to the presets list, which will automatically invoke the previous two strategies after training.
- Most of the above strategies for reducing memory usage will also reduce disk usage (but may harm accuracy).
References¶
The following paper describes how AutoGluon internally operates on tabular data:
Erickson et al. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. Arxiv, 2020.
Next Steps¶
If you are interested in deployment optimization, refer to the Predicting Columns in a Table - Deployment Optimization tutorial.