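Adding a custom metric to AutoGluon¶
Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API.
This tutorial describes how to add a custom evaluation metric to AutoGluon. The metric is used to inform validation scores, model ensembling, hyperparameter tuning, and more.
In this example, we show a variety of evaluation metrics and how to convert them to an AutoGluon Scorer (Scorer source code), which can then be passed to AutoGluon models and predictors.
First, we randomly generate 10 ground truth labels and predictions, and show how to calculate metric scores from them.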
import numpy as np
rng = np.random.default_rng(seed=42)
y_true = rng.integers(low=0, high=2, size=10)
y_pred = rng.integers(low=0, high=2, size=10)
print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0 1 1 0 0 1 0 1 0 0]
y_pred: [1 1 1 1 1 1 1 0 1 0]
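Ensuring the Metric is Serializable¶
Custom metrics must be defined in a separate Python file and imported so that they can be pickled (pickle is Python's serialization protocol). If a custom metric is not pickleable, AutoGluon will crash during fit when it tries to train models in parallel with Ray. For the example below, you would define ag_accuracy_scorer in a new Python file such as my_metrics.py and then use it via from my_metrics import ag_accuracy_scorer.
If your metric is not serializable, you will get many errors similar to _pickle.PicklingError: Can't pickle. For an example, see https://github.com/autogluon/autogluon/issues/1637. For an example of how to specify a custom metric on Kaggle, refer to this Kaggle Notebook.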
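As a quick pre-flight check, the sketch below tests whether an object survives the standard-library pickle module. The helper name is_picklable is hypothetical and not part of AutoGluon. It also assumes that stdlib pickle is a reasonable proxy for the serializer used during parallel training; other serializers may be more permissive, so treat a failure here as a warning sign rather than proof of a crash.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if the standard-library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda has no importable name, so pickle cannot store a reference to it.
print(is_picklable(lambda y_true, y_pred: 0.0))  # False

# A function that lives in an importable module is pickled by reference.
print(is_picklable(len))  # True
```

Functions are pickled by module and name, which is why moving the metric into its own importable file makes it serializable.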
For ease of demonstration, the custom metrics in this tutorial are not serializable. If the best_quality preset were used, calling fit() would crash.
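Custom Accuracy Metric¶
We will start by creating a custom accuracy metric. A prediction is correct if the predicted value is the same as the true value, otherwise it is wrong.
First, let's use the default sklearn accuracy scorer: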
import sklearn.metrics
sklearn.metrics.accuracy_score(y_true, y_pred)
0.4
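The definition can be checked by hand. The sketch below is a plain-Python accuracy function (the name accuracy and the variables labels and preds are illustrative, not AutoGluon or sklearn APIs) applied to the same ten labels and predictions printed earlier.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Same values as the y_true / y_pred arrays printed above.
labels = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]
preds = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

print(accuracy(labels, preds))  # 0.4
```

Four of the ten positions match, which gives the same 0.4 that sklearn reports.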
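The logic above has several limitations. For example, without outside knowledge of the metric, the following is unknown:
What the optimal value is (1)
Whether higher values are better (True)
Whether the metric requires predictions, class predictions, or class probabilities (class predictions)
Now, let's convert this evaluation metric to an AutoGluon Scorer to address these limitations.
We do this by calling autogluon.core.metrics.make_scorer (source code: autogluon/core/metrics/__init__.py).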
from autogluon.core.metrics import make_scorer
ag_accuracy_scorer = make_scorer(name='accuracy',
                                 score_func=sklearn.metrics.accuracy_score,
                                 optimum=1,
                                 greater_is_better=True,
                                 needs_class=True)
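When creating the Scorer, we need to specify a name for it. The name does not need to be any particular value, but it is used when printing information about the Scorer during training.
Next, we specify score_func. This is the function we want to wrap, in this case sklearn's accuracy_score function.
We then need to specify the optimum value. This is necessary when calculating error (also known as regret) as opposed to score. error is defined as sign * optimum - score, where sign=1 if greater_is_better=True, else sign=-1. It also helps identify when a score is optimal and cannot be improved. Because the best possible value returned by sklearn.metrics.accuracy_score is 1, we specify optimum=1.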
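Next we need to specify greater_is_better. In this case, greater_is_better=True because the best value returned is 1, and the worst value returned is less than 1 (0). It is very important to set this value correctly, otherwise AutoGluon will try to optimize for the worst model instead of the best.
Finally, we specify a boolean needs_* option based on the type of metric we are using. The available options are: [needs_pred, needs_proba, needs_class, needs_threshold, needs_quantile]. All of them default to False except needs_pred, which is inferred from the other four, of which only one can be set to True. If none are specified, the metric is treated as a regression metric (needs_pred=True).
Below is a detailed description of each option: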
needs_pred : bool | str, default="auto"
Whether score_func requires the predict model method output as input to scoring.
If "auto", will be inferred based on the values of the other `needs_*` arguments.
Defaults to True if all other `needs_*` are False.
Examples: ["root_mean_squared_error", "mean_squared_error", "r2", "mean_absolute_error", "median_absolute_error", "spearmanr", "pearsonr"]
needs_proba : bool, default=False
Whether score_func requires predict_proba to get probability estimates out of a classifier.
These scorers can benefit from calibration methods such as temperature scaling.
Examples: ["log_loss", "roc_auc_ovo", "roc_auc_ovr", "pac"]
needs_class : bool, default=False
Whether score_func requires class predictions (classification only).
This is required to determine if the scorer is impacted by a decision threshold.
These scorers can benefit from decision threshold calibration methods such as via `predictor.calibrate_decision_threshold()`.
Examples: ["accuracy", "balanced_accuracy", "f1", "precision", "recall", "mcc", "quadratic_kappa", "f1_micro", "f1_macro", "f1_weighted"]
needs_threshold : bool, default=False
Whether score_func takes a continuous decision certainty.
This only works for binary classification.
These scorers care about the rank order of the prediction probabilities to calculate their scores, and are undefined if given a single sample to score.
Examples: ["roc_auc", "average_precision"]
needs_quantile : bool, default=False
Whether score_func is based on quantile predictions.
This only works for quantile regression.
Examples: ["pinball_loss"]
Because we are creating an accuracy scorer, we need class predictions, so we specify needs_class=True.
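To make the difference between class-based and probability-based metrics concrete, here is a small plain-Python illustration. It is not how AutoGluon computes anything internally; the function names and the toy data are hypothetical. Accuracy only sees the class obtained after thresholding, while log loss sees the probability itself.

```python
import math


def log_loss(y_true, y_pred_proba) -> float:
    """Binary log loss from positive-class probabilities."""
    eps = 1e-15  # clip to avoid log(0)
    total = 0.0
    for t, p in zip(y_true, y_pred_proba):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def accuracy_at_threshold(y_true, y_pred_proba, threshold=0.5) -> float:
    """Convert probabilities to classes at a threshold, then score accuracy."""
    y_pred = [1 if p >= threshold else 0 for p in y_pred_proba]
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # hypothetical model A
hesitant = [0.6, 0.4, 0.6, 0.4]   # hypothetical model B

# Both models yield the same classes at 0.5, so accuracy cannot separate them.
print(accuracy_at_threshold(labels, confident), accuracy_at_threshold(labels, hesitant))

# Log loss consumes the probabilities directly and prefers the confident model.
print(log_loss(labels, confident) < log_loss(labels, hesitant))  # True

# A class-based metric is sensitive to the decision threshold.
print(accuracy_at_threshold(labels, hesitant, threshold=0.7))  # 0.5
```

This is why the needs_* flag matters: it tells the framework which kind of model output the metric function expects.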
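Advanced note: optimum must correspond to the optimal value of the original metric callable (in this case sklearn.metrics.accuracy_score). Hypothetically, if a metric callable were greater_is_better=False with an optimal value of -2, you should specify optimum=-2, greater_is_better=False. In this case, if raw_metric_value=-0.5, the Scorer would return score=0.5 to enforce higher_is_better (score = sign * raw_metric_value). The Scorer's error would be error=1.5, because sign (-1) * optimum (-2) - score (0.5) = 1.5.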
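The sign convention can be written out as a few lines of arithmetic. This is a minimal sketch of the formulas stated in the text (score = sign * raw value, error = sign * optimum - score); the helper names to_score and to_error are hypothetical and are not AutoGluon functions.

```python
def to_score(raw_value: float, greater_is_better: bool) -> float:
    """Map a raw metric value to higher-is-better form."""
    sign = 1 if greater_is_better else -1
    return sign * raw_value


def to_error(raw_value: float, optimum: float, greater_is_better: bool) -> float:
    """Distance from the optimum in lower-is-better form (0 is perfect)."""
    sign = 1 if greater_is_better else -1
    return sign * optimum - to_score(raw_value, greater_is_better)


# Hypothetical metric from the note: lower is better, optimum is -2.
print(to_score(-0.5, greater_is_better=False))              # 0.5
print(to_error(-0.5, optimum=-2, greater_is_better=False))  # 1.5

# Accuracy example: raw value 0.4, optimum 1, higher is better.
print(to_error(0.4, optimum=1, greater_is_better=True))     # 0.6
```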
Once created, the AutoGluon Scorer can be called in the same way as the original metric to compute score.
# score
ag_accuracy_scorer(y_true, y_pred)
0.4
Alternatively, .score is an alias for the call above, provided for convenience.
ag_accuracy_scorer.score(y_true, y_pred)
0.4
To get the error instead of the score:
# error, error=sign*optimum-score -> error=1*1-score -> error=1-score
ag_accuracy_scorer.error(y_true, y_pred)
# Can also convert score to error and vice-versa:
# score = ag_accuracy_scorer(y_true, y_pred)
# error = ag_accuracy_scorer.convert_score_to_error(score)
# score = ag_accuracy_scorer.convert_error_to_score(error)
# Can also convert score to the original score that would be returned in `score_func`:
# score_orig = ag_accuracy_scorer.convert_score_to_original(score) # score_orig = sign * score
0.6
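Note that score is in higher_is_better format, while error is in lower_is_better format. An error of 0 corresponds to a perfect prediction.
Custom Mean Squared Error Metric¶
Next, let's show how to convert regression metrics into Scorers.
First, we generate random ground truth labels and their predictions, but this time they are floats instead of integers.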
y_true = rng.random(10)
y_pred = rng.random(10)
print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0.37079802 0.92676499 0.64386512 0.82276161 0.4434142 0.22723872
0.55458479 0.06381726 0.82763117 0.6316644 ]
y_pred: [0.75808774 0.35452597 0.97069802 0.89312112 0.7783835 0.19463871
0.466721 0.04380377 0.15428949 0.68304895]
A common regression metric is Mean Squared Error.
sklearn.metrics.mean_squared_error(y_true, y_pred)
0.11666381947652146
ag_mean_squared_error_scorer = make_scorer(name='mean_squared_error',
                                           score_func=sklearn.metrics.mean_squared_error,
                                           optimum=0,
                                           greater_is_better=False)
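In this case, optimum=0 because this is an error metric.
Additionally, greater_is_better=False because sklearn reports error as positive values, and the lower the value, the better.
A very important point about AutoGluon Scorers is that internally they always report scores in greater_is_better=True form. This means that if the original metric is greater_is_better=False, AutoGluon's Scorer flips the sign of the value. As a result, score is represented as a negative value.
This is done to ensure consistency between different metrics.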
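One practical consequence of this consistency is that model selection can always take the maximum score, regardless of the metric's original direction. The sketch below illustrates the idea with made-up raw values; the model names, numbers, and the helper best_model are hypothetical and not AutoGluon code.

```python
def best_model(raw_values: dict, greater_is_better: bool) -> str:
    """Pick the best model by converting raw values to higher-is-better scores."""
    sign = 1 if greater_is_better else -1
    scores = {name: sign * value for name, value in raw_values.items()}
    # Once flipped, "best" is always the maximum score.
    return max(scores, key=scores.get)


# Hypothetical validation results for three models.
raw_mse = {"model_a": 0.25, "model_b": 0.10, "model_c": 0.40}
raw_accuracy = {"model_a": 0.81, "model_b": 0.78, "model_c": 0.86}

print(best_model(raw_mse, greater_is_better=False))      # model_b
print(best_model(raw_accuracy, greater_is_better=True))  # model_c
```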
# score
ag_mean_squared_error_scorer(y_true, y_pred)
-0.11666381947652146
# error, error=sign*optimum-score -> error=-1*0-score -> error=-score
ag_mean_squared_error_scorer.error(y_true, y_pred)
0.11666381947652146
We can also specify metrics outside of sklearn. For example, below is a minimal implementation of mean squared error:
def mse_func(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return ((y_true - y_pred) ** 2).mean()
mse_func(y_true, y_pred)
np.float64(0.11666381947652146)
The only requirement is that the function takes two arguments, y_true and y_pred (or y_pred_proba), as numpy arrays, and returns a float value.
With the same code as before, we can create an AutoGluon Scorer.
ag_mean_squared_error_custom_scorer = make_scorer(name='mean_squared_error',
                                                  score_func=mse_func,
                                                  optimum=0,
                                                  greater_is_better=False)
ag_mean_squared_error_custom_scorer(y_true, y_pred)
np.float64(-0.11666381947652146)
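Any function with the same two-argument shape can be wrapped in the same way. As a further sketch, here is a mean absolute error written without sklearn. The name mae_func is illustrative; the implementation iterates element by element, so it accepts plain lists as well as 1-D numpy arrays.

```python
def mae_func(y_true, y_pred) -> float:
    """Mean absolute error over two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return float(sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n)


print(mae_func([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5

# Following the pattern used for mean squared error, this function could be
# wrapped as (requires AutoGluon, so it is left as a comment here):
# make_scorer(name='mean_absolute_error', score_func=mae_func,
#             optimum=0, greater_is_better=False)
```

Like mean squared error, this is an error metric, so the same optimum=0 and greater_is_better=False settings would apply.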
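Custom ROC AUC Metric¶
Here we show an example of a thresholding metric, roc_auc. A thresholding metric cares about the relative ordering of the predictions, not their absolute values.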
y_true = rng.integers(low=0, high=2, size=10)
y_pred_proba = rng.random(10)
print(f'y_true: {y_true}')
print(f'y_pred_proba: {y_pred_proba}')
y_true: [1 1 0 1 0 0 1 0 0 0]
y_pred_proba: [0.18947136 0.12992151 0.47570493 0.22690935 0.66981399 0.43715192
0.8326782 0.7002651 0.31236664 0.8322598 ]
sklearn.metrics.roc_auc_score(y_true, y_pred_proba)
np.float64(0.25)
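The value 0.25 can be reproduced by counting, over all (positive, negative) pairs, how often the positive example receives the higher score. The sketch below uses this pairwise definition of ROC AUC rather than sklearn's implementation; the function name roc_auc_pairwise is hypothetical, and the inputs are the same values printed above.

```python
def roc_auc_pairwise(y_true, y_score) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
probs = [0.18947136, 0.12992151, 0.47570493, 0.22690935, 0.66981399,
         0.43715192, 0.8326782, 0.7002651, 0.31236664, 0.8322598]

print(roc_auc_pairwise(labels, probs))  # 0.25

# Cubing changes every value but preserves their order, so the AUC is unchanged.
print(roc_auc_pairwise(labels, [p ** 3 for p in probs]))  # 0.25
```

Only one of the four positives outranks the negatives (6 winning pairs out of 24), and any order-preserving transformation of the scores leaves the result the same.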
We need to specify needs_threshold=True so that downstream models use the metric properly.
# Score functions that need decision values
ag_roc_auc_scorer = make_scorer(name='roc_auc',
                                score_func=sklearn.metrics.roc_auc_score,
                                optimum=1,
                                greater_is_better=True,
                                needs_threshold=True)
ag_roc_auc_scorer(y_true, y_pred_proba)
np.float64(0.25)
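Using Custom Metrics in TabularPredictor¶
Now that we have created several custom Scorers, let's use them for training and evaluating models.
For this tutorial, we will use the Adult Income dataset.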
from autogluon.tabular import TabularDataset
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv') # can be local CSV file as well, returns Pandas DataFrame
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv') # another Pandas DataFrame
label = 'class' # specifies which column we want to predict
train_data = train_data.sample(n=1000, random_state=0) # subsample dataset for faster demo
train_data.head(5)
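 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6118 | 51 | Private | 39264 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | >50K |
23204 | 58 | Private | 51662 | 10th | 6 | Married-civ-spouse | Other-service | Wife | White | Female | 0 | 0 | 8 | United-States | <=50K |
29590 | 40 | Private | 326310 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 44 | United-States | <=50K |
18116 | 37 | Private | 222450 | HS-grad | 9 | Never-married | Sales | Not-in-family | White | Male | 0 | 2339 | 40 | El-Salvador | <=50K |
33964 | 62 | Private | 109190 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 15024 | 0 | 40 | United-States | >50K |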
from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label=label).fit(train_data, hyperparameters='toy')
predictor.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205545"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 28.78 GB / 30.95 GB (93.0%)
Disk Space Avail: 212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545"
Train Data Rows: 1000
Train Data Columns: 14
Label Column: class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: binary
Preprocessing data ...
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 29465.08 MB
Train Data (Original) Memory Usage: 0.56 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
'NN_TORCH': [{'num_epochs': 5}],
'GBM': [{'num_boost_round': 10}],
'CAT': [{'iterations': 10}],
'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
0.77 = Validation score (accuracy)
0.25s = Training runtime
0.0s = Validation runtime
Fitting model: CatBoost ...
0.86 = Validation score (accuracy)
0.17s = Training runtime
0.03s = Validation runtime
Fitting model: XGBoost ...
0.84 = Validation score (accuracy)
0.39s = Training runtime
0.01s = Validation runtime
Fitting model: NeuralNetTorch ...
0.84 = Validation score (accuracy)
2.95s = Training runtime
0.01s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Ensemble Weights: {'CatBoost': 1.0}
0.86 = Validation score (accuracy)
0.05s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 4.0s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 7107.8 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545")
 | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CatBoost | 0.842768 | 0.86 | accuracy | 0.006248 | 0.027356 | 0.167093 | 0.006248 | 0.027356 | 0.167093 | 1 | True | 2 |
1 | WeightedEnsemble_L2 | 0.842768 | 0.86 | accuracy | 0.008153 | 0.028138 | 0.215383 | 0.001905 | 0.000782 | 0.048290 | 2 | True | 5 |
2 | XGBoost | 0.836831 | 0.84 | accuracy | 0.203856 | 0.005759 | 0.390637 | 0.203856 | 0.005759 | 0.390637 | 1 | True | 3 |
3 | NeuralNetTorch | 0.828027 | 0.84 | accuracy | 0.048275 | 0.011158 | 2.947366 | 0.048275 | 0.011158 | 2.947366 | 1 | True | 4 |
4 | LightGBM | 0.780940 | 0.77 | accuracy | 0.005517 | 0.004092 | 0.250402 | 0.005517 | 0.004092 | 0.250402 | 1 | True | 1 |
We can pass our custom metrics to predictor.leaderboard via the extra_metrics argument:
predictor.leaderboard(test_data, extra_metrics=[ag_roc_auc_scorer, ag_accuracy_scorer])
 | model | score_test | roc_auc | accuracy | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CatBoost | 0.842768 | 0.863760 | 0.842768 | 0.86 | accuracy | 0.005653 | 0.027356 | 0.167093 | 0.005653 | 0.027356 | 0.167093 | 1 | True | 2 |
1 | WeightedEnsemble_L2 | 0.842768 | 0.863760 | 0.842768 | 0.86 | accuracy | 0.007480 | 0.028138 | 0.215383 | 0.001827 | 0.000782 | 0.048290 | 2 | True | 5 |
2 | XGBoost | 0.836831 | 0.890173 | 0.836831 | 0.84 | accuracy | 0.048751 | 0.005759 | 0.390637 | 0.048751 | 0.005759 | 0.390637 | 1 | True | 3 |
3 | NeuralNetTorch | 0.828027 | 0.879181 | 0.828027 | 0.84 | accuracy | 0.047922 | 0.011158 | 2.947366 | 0.047922 | 0.011158 | 2.947366 | 1 | True | 4 |
4 | LightGBM | 0.780940 | 0.861131 | 0.780940 | 0.77 | accuracy | 0.005408 | 0.004092 | 0.250402 | 0.005408 | 0.004092 | 0.250402 | 1 | True | 1 |
We can also pass a custom metric to the predictor itself by specifying it during initialization via the eval_metric parameter:
predictor_custom = TabularPredictor(label=label, eval_metric=ag_roc_auc_scorer).fit(train_data, hyperparameters='toy')
predictor_custom.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205550"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Memory Avail: 28.36 GB / 30.95 GB (91.7%)
Disk Space Avail: 212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
presets='best' : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
presets='high' : Strong accuracy with fast inference speed.
presets='good' : Good accuracy with very fast inference speed.
presets='medium' : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550"
Train Data Rows: 1000
Train Data Columns: 14
Label Column: class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
2 unique label values: [' >50K', ' <=50K']
If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type: binary
Preprocessing data ...
Selected class <--> label mapping: class 1 = >50K, class 0 = <=50K
Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 29045.71 MB
Train Data (Original) Memory Usage: 0.56 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Stage 5 Generators:
Fitting DropDuplicatesFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
('int', []) : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
('int', ['bool']) : 1 | ['sex']
0.1s = Fit runtime
14 features in original data used to generate 14 features in processed data.
Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
'NN_TORCH': [{'num_epochs': 5}],
'GBM': [{'num_boost_round': 10}],
'CAT': [{'iterations': 10}],
'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
0.85 = Validation score (roc_auc)
0.19s = Training runtime
0.0s = Validation runtime
Fitting model: CatBoost ...
0.8693 = Validation score (roc_auc)
0.04s = Training runtime
0.0s = Validation runtime
Fitting model: XGBoost ...
0.8616 = Validation score (roc_auc)
0.04s = Training runtime
0.03s = Validation runtime
Fitting model: NeuralNetTorch ...
0.8537 = Validation score (roc_auc)
0.49s = Training runtime
0.01s = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
Ensemble Weights: {'XGBoost': 0.417, 'CatBoost': 0.375, 'LightGBM': 0.125, 'NeuralNetTorch': 0.083}
0.878 = Validation score (roc_auc)
0.16s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 1.15s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 3967.3 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550")
 | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | WeightedEnsemble_L2 | 0.900864 | 0.878010 | roc_auc | 0.112010 | 0.050412 | 0.910520 | 0.002755 | 0.001625 | 0.155592 | 2 | True | 5 |
1 | XGBoost | 0.890173 | 0.861627 | roc_auc | 0.047536 | 0.031453 | 0.037621 | 0.047536 | 0.031453 | 0.037621 | 1 | True | 3 |
2 | CatBoost | 0.887425 | 0.869325 | roc_auc | 0.007399 | 0.003745 | 0.035267 | 0.007399 | 0.003745 | 0.035267 | 1 | True | 2 |
3 | NeuralNetTorch | 0.879181 | 0.853665 | roc_auc | 0.046863 | 0.010511 | 0.488971 | 0.046863 | 0.010511 | 0.488971 | 1 | True | 4 |
4 | LightGBM | 0.870968 | 0.849980 | roc_auc | 0.007457 | 0.003078 | 0.193069 | 0.007457 | 0.003078 | 0.193069 | 1 | True | 1 |
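That's all it takes to create and use custom metrics in AutoGluon!
If you create a custom metric, consider submitting a pull request (PR) so that we can add it officially to AutoGluon!
For a tutorial on implementing custom models in AutoGluon, refer to Adding a custom model to AutoGluon.
For more tutorials, refer to Predicting Columns in a Table - Quick Start and Predicting Columns in a Table - In Depth.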