Adding a custom metric to AutoGluon


Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API.

This tutorial describes how to add a custom evaluation metric to AutoGluon, which is used to inform validation scores, model ensembling, hyperparameter tuning, and more.

In this example, we show a variety of evaluation metrics and how to convert them to an AutoGluon Scorer (Scorer source code), which can then be passed to AutoGluon models and predictors.

First, we will randomly generate 10 ground truth labels and predictions and show how to calculate metric scores from them.

import numpy as np

rng = np.random.default_rng(seed=42)
y_true = rng.integers(low=0, high=2, size=10)
y_pred = rng.integers(low=0, high=2, size=10)

print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0 1 1 0 0 1 0 1 0 0]
y_pred: [1 1 1 1 1 1 1 0 1 0]

Ensuring Metrics are Serializable

Custom metrics must be defined in a separate Python file and imported so that they can be pickled (Python's serialization protocol). If a custom metric is not picklable, AutoGluon will crash during fit when trying to train models in parallel with Ray. In the example below, you would want to create a new python file such as my_metrics.py with ag_accuracy_scorer defined in it, and then use it via from my_metrics import ag_accuracy_scorer.

If your metric is not serializable, you will get many errors similar to _pickle.PicklingError: Can't pickle. Refer to https://github.com/autogluon/autogluon/issues/1637 for an example. For an example of how to specify a custom metric on Kaggle, refer to this Kaggle Notebook.

For demonstration purposes, the custom metrics in this tutorial are not serializable. If the best_quality preset were used, calling fit() would crash.
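
For reference, here is a minimal sketch of how such a separate module could be organized (my_metrics.py is just an illustrative file name; make_scorer itself is introduced in the next section):

# my_metrics.py -- example module that can be pickled and imported
import sklearn.metrics
from autogluon.core.metrics import make_scorer

ag_accuracy_scorer = make_scorer(name='accuracy',
                                 score_func=sklearn.metrics.accuracy_score,
                                 optimum=1,
                                 greater_is_better=True,
                                 needs_class=True)

Then, in the training script:

# train.py -- import the scorer from the separate module so it can be pickled
from my_metrics import ag_accuracy_scorer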

Custom Accuracy Metric

We will start by creating a custom accuracy metric. A prediction is correct if it matches the ground truth value, otherwise it is wrong.

First, let's use the default sklearn accuracy scorer:

import sklearn.metrics

sklearn.metrics.accuracy_score(y_true, y_pred)
0.4

There are a variety of limitations with the above logic. For example, without outside knowledge of the metric, the following is unknown:

  1. What the optimal value is (1)

  2. Whether higher values are better (True)

  3. Whether the metric requires predictions, class predictions, or class probabilities (class predictions)

Now, let's convert this evaluation metric into an AutoGluon Scorer to address these limitations.

We do this by calling autogluon.core.metrics.make_scorer (source code: autogluon/core/metrics/__init__.py).

from autogluon.core.metrics import make_scorer

ag_accuracy_scorer = make_scorer(name='accuracy',
                                 score_func=sklearn.metrics.accuracy_score,
                                 optimum=1,
                                 greater_is_better=True,
                                 needs_class=True)

When creating the Scorer, we need to specify a name for the Scorer. This does not need to be any particular value, but it is used when printing information about the Scorer during training.

Next, we specify the score_func. This is the function we want to wrap, in this case, sklearn's accuracy_score function.

We then need to specify the optimum value. This is necessary when calculating error (also known as regret) as opposed to score. error is defined as sign * optimum - score, where sign=1 if greater_is_better=True, else sign=-1. It is also useful to identify when a score is optimal and cannot be improved. Because the best possible value returned by sklearn.metrics.accuracy_score is 1, we specify optimum=1.

Next we need to specify greater_is_better. In this case, greater_is_better=True because the best value returned is 1, and the worst value returned is less than 1 (0). It is very important to set this value correctly, otherwise AutoGluon will try to optimize for the worst model instead of the best.

Finally, we specify a bool needs_* based on the type of metric we are using. The available options are: [needs_pred, needs_proba, needs_class, needs_threshold, needs_quantile]. All of them default to False except needs_pred, which is inferred from the other four, of which only one can be set to True. If none are specified, the metric is treated as a regression metric (needs_pred=True).

Here is a detailed description of each option:

needs_pred : bool | str, default="auto"
    Whether score_func requires the predict model method output as input to scoring.
    If "auto", will be inferred based on the values of the other `needs_*` arguments.
    Defaults to True if all other `needs_*` are False.
    Examples: ["root_mean_squared_error", "mean_squared_error", "r2", "mean_absolute_error", "median_absolute_error", "spearmanr", "pearsonr"]

needs_proba : bool, default=False
    Whether score_func requires predict_proba to get probability estimates out of a classifier.
    These scorers can benefit from calibration methods such as temperature scaling.
    Examples: ["log_loss", "roc_auc_ovo", "roc_auc_ovr", "pac"]

needs_class : bool, default=False
    Whether score_func requires class predictions (classification only).
    This is required to determine if the scorer is impacted by a decision threshold.
    These scorers can benefit from decision threshold calibration methods such as via `predictor.calibrate_decision_threshold()`.
    Examples: ["accuracy", "balanced_accuracy", "f1", "precision", "recall", "mcc", "quadratic_kappa", "f1_micro", "f1_macro", "f1_weighted"]

needs_threshold : bool, default=False
    Whether score_func takes a continuous decision certainty.
    This only works for binary classification.
    These scorers care about the rank order of the prediction probabilities to calculate their scores, and are undefined if given a single sample to score.
    Examples: ["roc_auc", "average_precision"]

needs_quantile : bool, default=False
    Whether score_func is based on quantile predictions.
    This only works for quantile regression.
    Examples: ["pinball_loss"]

Because we are creating an accuracy scorer, we need the class prediction, and therefore we specify needs_class=True.
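
For comparison, a probability-based metric would instead set needs_proba=True. Below is a minimal sketch wrapping sklearn's log_loss (one of the example metrics listed above); since log_loss is an error metric whose best value is 0, it uses greater_is_better=False and optimum=0:

# Sketch: a probability-based metric wrapped with needs_proba=True
ag_log_loss_scorer = make_scorer(name='log_loss',
                                 score_func=sklearn.metrics.log_loss,
                                 optimum=0,
                                 greater_is_better=False,
                                 needs_proba=True)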

高级说明optimum 必须与原始指标可调用(在此示例中是 sklearn.metrics.accuracy_score)的最优值对应。假设,如果某个指标可调用是 greater_is_better=False,最优值为 -2,您应该指定 optimum=-2, greater_is_better=False。在这种情况下,如果 raw_metric_value=-0.5,则 Scorer 将返回 score=0.5 以强制执行 higher_is_better(score = sign * raw_metric_value)。Scorer 的误差将为 error=1.5,因为 sign (-1) * optimum (-2) - score (0.5) = 1.5

Once created, the AutoGluon Scorer can be called in the same fashion as the original metric to compute score.

# score
ag_accuracy_scorer(y_true, y_pred)
0.4

Alternatively, .score is an alias of the above callable, for convenience:

ag_accuracy_scorer.score(y_true, y_pred)
0.4

To get the error instead of the score:

# error, error=sign*optimum-score -> error=1*1-score -> error=1-score
ag_accuracy_scorer.error(y_true, y_pred)

# Can also convert score to error and vice-versa:
# score = ag_accuracy_scorer(y_true, y_pred)
# error = ag_accuracy_scorer.convert_score_to_error(score)
# score = ag_accuracy_scorer.convert_error_to_score(error)

# Can also convert score to the original score that would be returned in `score_func`:
# score_orig = ag_accuracy_scorer.convert_score_to_original(score)  # score_orig = sign * score
0.6

Note that score is in higher_is_better format, while error is in lower_is_better format. An error of 0 corresponds to a perfect prediction.

Custom Mean Squared Error Metric

Next, let's show examples of how to convert regression metrics into Scorers.

First we generate random ground truth labels and their predictions, however this time they are floats instead of integers.

y_true = rng.random(10)
y_pred = rng.random(10)

print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0.37079802 0.92676499 0.64386512 0.82276161 0.4434142  0.22723872
 0.55458479 0.06381726 0.82763117 0.6316644 ]
y_pred: [0.75808774 0.35452597 0.97069802 0.89312112 0.7783835  0.19463871
 0.466721   0.04380377 0.15428949 0.68304895]

A common regression metric is mean squared error:

sklearn.metrics.mean_squared_error(y_true, y_pred)
0.11666381947652146
ag_mean_squared_error_scorer = make_scorer(name='mean_squared_error',
                                           score_func=sklearn.metrics.mean_squared_error,
                                           optimum=0,
                                           greater_is_better=False)

In this case, optimum=0 because this is an error metric.

Additionally, greater_is_better=False because sklearn reports error as positive values, and the lower the value is, the better.

A very important point about AutoGluon Scorers is that internally, they always report scores in greater_is_better=True form. This means that if the original metric was greater_is_better=False, AutoGluon's Scorer will flip the value. Therefore, score will be represented as a negative value.

This is done to ensure consistency between different metrics.

# score
ag_mean_squared_error_scorer(y_true, y_pred)
-0.11666381947652146
# error, error=sign*optimum-score -> error=-1*0-score -> error=-score
ag_mean_squared_error_scorer.error(y_true, y_pred)
0.11666381947652146

We can also specify metrics outside of sklearn. For example, below is a minimal implementation of mean squared error:

def mse_func(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return ((y_true - y_pred) ** 2).mean()

mse_func(y_true, y_pred)
np.float64(0.11666381947652146)

The only requirement is that the function takes two arguments: y_true and y_pred (or y_pred_proba), as numpy arrays, and returns a float value.

With the same code as before, we can create an AutoGluon Scorer.

ag_mean_squared_error_custom_scorer = make_scorer(name='mean_squared_error',
                                                  score_func=mse_func,
                                                  optimum=0,
                                                  greater_is_better=False)
ag_mean_squared_error_custom_scorer(y_true, y_pred)
np.float64(-0.11666381947652146)

Custom ROC AUC Metric

Here we show an example of a thresholding metric, roc_auc. A thresholding metric cares about the relative ordering of predictions, but not their absolute values.

y_true = rng.integers(low=0, high=2, size=10)
y_pred_proba = rng.random(10)

print(f'y_true:       {y_true}')
print(f'y_pred_proba: {y_pred_proba}')
y_true:       [1 1 0 1 0 0 1 0 0 0]
y_pred_proba: [0.18947136 0.12992151 0.47570493 0.22690935 0.66981399 0.43715192
 0.8326782  0.7002651  0.31236664 0.8322598 ]
sklearn.metrics.roc_auc_score(y_true, y_pred_proba)
np.float64(0.25)

We need to specify needs_threshold=True in order for downstream models to properly use the metric.

# Score functions that need decision values
ag_roc_auc_scorer = make_scorer(name='roc_auc',
                                score_func=sklearn.metrics.roc_auc_score,
                                optimum=1,
                                greater_is_better=True,
                                needs_threshold=True)
ag_roc_auc_scorer(y_true, y_pred_proba)
np.float64(0.25)
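
As with the accuracy scorer, the error form is also available. Given the score of 0.25 above and optimum=1, the error would be 1 * 1 - 0.25 = 0.75:

# error = sign * optimum - score -> 1 * 1 - 0.25 = 0.75
ag_roc_auc_scorer.error(y_true, y_pred_proba)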

Using Custom Metrics in TabularPredictor

Now that we have created several custom Scorers, let's use them for training and evaluating models.

For this tutorial, we will be using the Adult Income dataset.

from autogluon.tabular import TabularDataset

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # can be local CSV file as well, returns Pandas DataFrame
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')  # another Pandas DataFrame
label = 'class'  # specifies which column we want to predict
train_data = train_data.sample(n=1000, random_state=0)  # subsample dataset for faster demo

train_data.head(5)
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country class
6118 51 Private 39264 Some-college 10 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States >50K
23204 58 Private 51662 10th 6 Married-civ-spouse Other-service Wife White Female 0 0 8 United-States <=50K
29590 40 Private 326310 Some-college 10 Married-civ-spouse Craft-repair Husband White Male 0 0 44 United-States <=50K
18116 37 Private 222450 HS-grad 9 Never-married Sales Not-in-family White Male 0 2339 40 El-Salvador <=50K
33964 62 Private 109190 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 15024 0 40 United-States >50K
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label=label).fit(train_data, hyperparameters='toy')

predictor.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205545"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.78 GB / 30.95 GB (93.0%)
Disk Space Avail:   212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29465.08 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{'num_epochs': 5}],
	'GBM': [{'num_boost_round': 10}],
	'CAT': [{'iterations': 10}],
	'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
	0.77	 = Validation score   (accuracy)
	0.25s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: CatBoost ...
	0.86	 = Validation score   (accuracy)
	0.17s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: XGBoost ...
	0.84	 = Validation score   (accuracy)
	0.39s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.84	 = Validation score   (accuracy)
	2.95s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'CatBoost': 1.0}
	0.86	 = Validation score   (accuracy)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 4.0s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 7107.8 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545")
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost 0.842768 0.86 accuracy 0.006248 0.027356 0.167093 0.006248 0.027356 0.167093 1 True 2
1 WeightedEnsemble_L2 0.842768 0.86 accuracy 0.008153 0.028138 0.215383 0.001905 0.000782 0.048290 2 True 5
2 XGBoost 0.836831 0.84 accuracy 0.203856 0.005759 0.390637 0.203856 0.005759 0.390637 1 True 3
3 NeuralNetTorch 0.828027 0.84 accuracy 0.048275 0.011158 2.947366 0.048275 0.011158 2.947366 1 True 4
4 LightGBM 0.780940 0.77 accuracy 0.005517 0.004092 0.250402 0.005517 0.004092 0.250402 1 True 1

We can pass our custom metrics into predictor.leaderboard via the extra_metrics argument:

predictor.leaderboard(test_data, extra_metrics=[ag_roc_auc_scorer, ag_accuracy_scorer])
model score_test roc_auc accuracy score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost 0.842768 0.863760 0.842768 0.86 accuracy 0.005653 0.027356 0.167093 0.005653 0.027356 0.167093 1 True 2
1 WeightedEnsemble_L2 0.842768 0.863760 0.842768 0.86 accuracy 0.007480 0.028138 0.215383 0.001827 0.000782 0.048290 2 True 5
2 XGBoost 0.836831 0.890173 0.836831 0.84 accuracy 0.048751 0.005759 0.390637 0.048751 0.005759 0.390637 1 True 3
3 NeuralNetTorch 0.828027 0.879181 0.828027 0.84 accuracy 0.047922 0.011158 2.947366 0.047922 0.011158 2.947366 1 True 4
4 LightGBM 0.780940 0.861131 0.780940 0.77 accuracy 0.005408 0.004092 0.250402 0.005408 0.004092 0.250402 1 True 1

We can also pass custom metrics into the Predictor itself at init time by specifying them via the eval_metric parameter:

predictor_custom = TabularPredictor(label=label, eval_metric=ag_roc_auc_scorer).fit(train_data, hyperparameters='toy')

predictor_custom.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205550"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.36 GB / 30.95 GB (91.7%)
Disk Space Avail:   212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29045.71 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{'num_epochs': 5}],
	'GBM': [{'num_boost_round': 10}],
	'CAT': [{'iterations': 10}],
	'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
	0.85	 = Validation score   (roc_auc)
	0.19s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: CatBoost ...
	0.8693	 = Validation score   (roc_auc)
	0.04s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: XGBoost ...
	0.8616	 = Validation score   (roc_auc)
	0.04s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.8537	 = Validation score   (roc_auc)
	0.49s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'XGBoost': 0.417, 'CatBoost': 0.375, 'LightGBM': 0.125, 'NeuralNetTorch': 0.083}
	0.878	 = Validation score   (roc_auc)
	0.16s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 1.15s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 3967.3 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550")
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.900864 0.878010 roc_auc 0.112010 0.050412 0.910520 0.002755 0.001625 0.155592 2 True 5
1 XGBoost 0.890173 0.861627 roc_auc 0.047536 0.031453 0.037621 0.047536 0.031453 0.037621 1 True 3
2 CatBoost 0.887425 0.869325 roc_auc 0.007399 0.003745 0.035267 0.007399 0.003745 0.035267 1 True 2
3 NeuralNetTorch 0.879181 0.853665 roc_auc 0.046863 0.010511 0.488971 0.046863 0.010511 0.488971 1 True 4
4 LightGBM 0.870968 0.849980 roc_auc 0.007457 0.003078 0.193069 0.007457 0.003078 0.193069 1 True 1

That's all it takes to create and use custom metrics in AutoGluon!

If you create a custom metric, consider submitting a PR so that we can officially add it to AutoGluon!

For a tutorial on implementing custom models in AutoGluon, refer to Adding a custom model to AutoGluon.

For more tutorials, refer to Predicting Columns in a Table - Quick Start and Predicting Columns in a Table - In Depth.