Adding a custom metric to AutoGluon


Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API.

This tutorial describes how to add a custom evaluation metric to AutoGluon, which is used to inform validation scores, model ensembling, hyperparameter tuning, and more.

In this example, we show a variety of evaluation metrics and how to convert them to an AutoGluon Scorer (Scorer source code), which can then be passed to AutoGluon models and predictors.

First, we will randomly generate 10 ground truth labels and predictions and show how to calculate metric scores from them.

import numpy as np

rng = np.random.default_rng(seed=42)
y_true = rng.integers(low=0, high=2, size=10)
y_pred = rng.integers(low=0, high=2, size=10)

print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0 1 1 0 0 1 0 1 0 0]
y_pred: [1 1 1 1 1 1 1 0 1 0]

Ensuring Metrics are Serializable

Custom metrics must be defined in a separate Python file and imported so that they can be pickled (Python's serialization protocol). If a custom metric is not picklable, AutoGluon will crash during fit when trying to train models in parallel with Ray. In the example below, you would want to create a new python file such as my_metrics.py with ag_accuracy_scorer defined in it, and then use it via from my_metrics import ag_accuracy_scorer.

If your metric is not serializable, you will get many errors similar to _pickle.PicklingError: Can't pickle. Refer to https://github.com/autogluon/autogluon/issues/1637 for an example. For an example of how to specify a custom metric on Kaggle, refer to this Kaggle Notebook.

For demonstration purposes, the custom metrics in this tutorial are not serializable. If the best_quality preset were used, calling fit() would crash.
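
For reference, here is a minimal sketch of how such a separate module could be organized (my_metrics.py is just an illustrative file name; make_scorer itself is introduced in the next section):

# my_metrics.py -- example module that can be pickled and imported
import sklearn.metrics
from autogluon.core.metrics import make_scorer

ag_accuracy_scorer = make_scorer(name='accuracy',
                                 score_func=sklearn.metrics.accuracy_score,
                                 optimum=1,
                                 greater_is_better=True,
                                 needs_class=True)

Then, in the training script:

# train.py -- import the scorer from the separate module so it can be pickled
from my_metrics import ag_accuracy_scorer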

Custom Accuracy Metric

We will start by creating a custom accuracy metric. A prediction is correct if it matches the ground truth value, otherwise it is wrong.

First, let's use the default sklearn accuracy scorer:

import sklearn.metrics

sklearn.metrics.accuracy_score(y_true, y_pred)
0.4

There are a variety of limitations with the above logic. For example, without outside knowledge of the metric, the following is unknown:

  1. What the optimal value is (1)

  2. Whether higher values are better (True)

  3. Whether the metric requires predictions, class predictions, or class probabilities (class predictions)

Now, let's convert this evaluation metric into an AutoGluon Scorer to address these limitations.

We do this by calling autogluon.core.metrics.make_scorer (source code: autogluon/core/metrics/__init__.py).

from autogluon.core.metrics import make_scorer

ag_accuracy_scorer = make_scorer(name='accuracy',
                                 score_func=sklearn.metrics.accuracy_score,
                                 optimum=1,
                                 greater_is_better=True,
                                 needs_class=True)

When creating the Scorer, we need to specify a name for the Scorer. This does not need to be any particular value, but it is used when printing information about the Scorer during training.

Next, we specify the score_func. This is the function we want to wrap, in this case, sklearn's accuracy_score function.

We then need to specify the optimum value. This is necessary when calculating error (also known as regret) as opposed to score. error is defined as sign * optimum - score, where sign=1 if greater_is_better=True, else sign=-1. It is also useful to identify when a score is optimal and cannot be improved. Because the best possible value returned by sklearn.metrics.accuracy_score is 1, we specify optimum=1.

Next we need to specify greater_is_better. In this case, greater_is_better=True because the best value returned is 1, and the worst value returned is less than 1 (0). It is very important to set this value correctly, otherwise AutoGluon will try to optimize for the worst model instead of the best.

Finally, we specify a bool needs_* based on the type of metric we are using. The available options are: [needs_pred, needs_proba, needs_class, needs_threshold, needs_quantile]. All of them default to False except needs_pred, which is inferred from the other four, of which only one can be set to True. If none are specified, the metric is treated as a regression metric (needs_pred=True).

Here is a detailed description of each option:

needs_pred : bool | str, default="auto"
    Whether score_func requires the predict model method output as input to scoring.
    If "auto", will be inferred based on the values of the other `needs_*` arguments.
    Defaults to True if all other `needs_*` are False.
    Examples: ["root_mean_squared_error", "mean_squared_error", "r2", "mean_absolute_error", "median_absolute_error", "spearmanr", "pearsonr"]

needs_proba : bool, default=False
    Whether score_func requires predict_proba to get probability estimates out of a classifier.
    These scorers can benefit from calibration methods such as temperature scaling.
    Examples: ["log_loss", "roc_auc_ovo", "roc_auc_ovr", "pac"]

needs_class : bool, default=False
    Whether score_func requires class predictions (classification only).
    This is required to determine if the scorer is impacted by a decision threshold.
    These scorers can benefit from decision threshold calibration methods such as via `predictor.calibrate_decision_threshold()`.
    Examples: ["accuracy", "balanced_accuracy", "f1", "precision", "recall", "mcc", "quadratic_kappa", "f1_micro", "f1_macro", "f1_weighted"]

needs_threshold : bool, default=False
    Whether score_func takes a continuous decision certainty.
    This only works for binary classification.
    These scorers care about the rank order of the prediction probabilities to calculate their scores, and are undefined if given a single sample to score.
    Examples: ["roc_auc", "average_precision"]

needs_quantile : bool, default=False
    Whether score_func is based on quantile predictions.
    This only works for quantile regression.
    Examples: ["pinball_loss"]

Because we are creating an accuracy scorer, we need the class prediction, and therefore we specify needs_class=True.
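
For comparison, a probability-based metric would instead set needs_proba=True. Below is a minimal sketch wrapping sklearn's log_loss (one of the example metrics listed above); since log_loss is an error metric whose best value is 0, it uses greater_is_better=False and optimum=0:

# Sketch: a probability-based metric wrapped with needs_proba=True
ag_log_loss_scorer = make_scorer(name='log_loss',
                                 score_func=sklearn.metrics.log_loss,
                                 optimum=0,
                                 greater_is_better=False,
                                 needs_proba=True)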

高级说明optimum 必须与原始指标可调用(在此示例中是 sklearn.metrics.accuracy_score)的最优值对应。假设,如果某个指标可调用是 greater_is_better=False,最优值为 -2,您应该指定 optimum=-2, greater_is_better=False。在这种情况下,如果 raw_metric_value=-0.5,则 Scorer 将返回 score=0.5 以强制执行 higher_is_better(score = sign * raw_metric_value)。Scorer 的误差将为 error=1.5,因为 sign (-1) * optimum (-2) - score (0.5) = 1.5

Once created, the AutoGluon Scorer can be called in the same fashion as the original metric to compute score.

# score
ag_accuracy_scorer(y_true, y_pred)
0.4

Alternatively, .score is an alias of the above callable, for convenience:

ag_accuracy_scorer.score(y_true, y_pred)
0.4

To get the error instead of the score:

# error, error=sign*optimum-score -> error=1*1-score -> error=1-score
ag_accuracy_scorer.error(y_true, y_pred)

# Can also convert score to error and vice-versa:
# score = ag_accuracy_scorer(y_true, y_pred)
# error = ag_accuracy_scorer.convert_score_to_error(score)
# score = ag_accuracy_scorer.convert_error_to_score(error)

# Can also convert score to the original score that would be returned in `score_func`:
# score_orig = ag_accuracy_scorer.convert_score_to_original(score)  # score_orig = sign * score
0.6

Note that score is in higher_is_better format, while error is in lower_is_better format. An error of 0 corresponds to a perfect prediction.

Custom Mean Squared Error Metric

Next, let's show examples of how to convert regression metrics into Scorers.

First we generate random ground truth labels and their predictions, however this time they are floats instead of integers.

y_true = rng.random(10)
y_pred = rng.random(10)

print(f'y_true: {y_true}')
print(f'y_pred: {y_pred}')
y_true: [0.37079802 0.92676499 0.64386512 0.82276161 0.4434142  0.22723872
 0.55458479 0.06381726 0.82763117 0.6316644 ]
y_pred: [0.75808774 0.35452597 0.97069802 0.89312112 0.7783835  0.19463871
 0.466721   0.04380377 0.15428949 0.68304895]

A common regression metric is mean squared error:

sklearn.metrics.mean_squared_error(y_true, y_pred)
0.11666381947652146
ag_mean_squared_error_scorer = make_scorer(name='mean_squared_error',
                                           score_func=sklearn.metrics.mean_squared_error,
                                           optimum=0,
                                           greater_is_better=False)

In this case, optimum=0 because this is an error metric.

Additionally, greater_is_better=False because sklearn reports error as positive values, and the lower the value is, the better.

A very important point about AutoGluon Scorers is that internally, they always report scores in greater_is_better=True form. This means that if the original metric was greater_is_better=False, AutoGluon's Scorer will flip the value. Therefore, score will be represented as a negative value.

This is done to ensure consistency between different metrics.

# score
ag_mean_squared_error_scorer(y_true, y_pred)
-0.11666381947652146
# error, error=sign*optimum-score -> error=-1*0-score -> error=-score
ag_mean_squared_error_scorer.error(y_true, y_pred)
0.11666381947652146

We can also specify metrics outside of sklearn. For example, below is a minimal implementation of mean squared error:

def mse_func(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return ((y_true - y_pred) ** 2).mean()

mse_func(y_true, y_pred)
np.float64(0.11666381947652146)

The only requirement is that the function takes two arguments: y_true and y_pred (or y_pred_proba), as numpy arrays, and returns a float value.

With the same code as before, we can create an AutoGluon Scorer.

ag_mean_squared_error_custom_scorer = make_scorer(name='mean_squared_error',
                                                  score_func=mse_func,
                                                  optimum=0,
                                                  greater_is_better=False)
ag_mean_squared_error_custom_scorer(y_true, y_pred)
np.float64(-0.11666381947652146)

Custom ROC AUC Metric

Here we show an example of a thresholding metric, roc_auc. A thresholding metric cares about the relative ordering of predictions, but not their absolute values.

y_true = rng.integers(low=0, high=2, size=10)
y_pred_proba = rng.random(10)

print(f'y_true:       {y_true}')
print(f'y_pred_proba: {y_pred_proba}')
y_true:       [1 1 0 1 0 0 1 0 0 0]
y_pred_proba: [0.18947136 0.12992151 0.47570493 0.22690935 0.66981399 0.43715192
 0.8326782  0.7002651  0.31236664 0.8322598 ]
sklearn.metrics.roc_auc_score(y_true, y_pred_proba)
np.float64(0.25)

We need to specify needs_threshold=True in order for downstream models to properly use the metric.

# Score functions that need decision values
ag_roc_auc_scorer = make_scorer(name='roc_auc',
                                score_func=sklearn.metrics.roc_auc_score,
                                optimum=1,
                                greater_is_better=True,
                                needs_threshold=True)
ag_roc_auc_scorer(y_true, y_pred_proba)
np.float64(0.25)
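
As with the accuracy scorer, the error form is also available. Given the score of 0.25 above and optimum=1, the error would be 1 * 1 - 0.25 = 0.75:

# error = sign * optimum - score -> 1 * 1 - 0.25 = 0.75
ag_roc_auc_scorer.error(y_true, y_pred_proba)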

Using Custom Metrics in TabularPredictor

Now that we have created several custom Scorers, let's use them for training and evaluating models.

For this tutorial, we will be using the Adult Income dataset.

from autogluon.tabular import TabularDataset

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')  # can be local CSV file as well, returns Pandas DataFrame
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')  # another Pandas DataFrame
label = 'class'  # specifies which column we want to predict
train_data = train_data.sample(n=1000, random_state=0)  # subsample dataset for faster demo

train_data.head(5)
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country class
6118 51 Private 39264 Some-college 10 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States >50K
23204 58 Private 51662 10th 6 Married-civ-spouse Other-service Wife White Female 0 0 8 United-States <=50K
29590 40 Private 326310 Some-college 10 Married-civ-spouse Craft-repair Husband White Male 0 0 44 United-States <=50K
18116 37 Private 222450 HS-grad 9 Never-married Sales Not-in-family White Male 0 2339 40 El-Salvador <=50K
33964 62 Private 109190 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 15024 0 40 United-States >50K
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label=label).fit(train_data, hyperparameters='toy')

predictor.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205545"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.78 GB / 30.95 GB (93.0%)
Disk Space Avail:   212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29465.08 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.1s ...
AutoGluon will gauge predictive performance using evaluation metric: 'accuracy'
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{'num_epochs': 5}],
	'GBM': [{'num_boost_round': 10}],
	'CAT': [{'iterations': 10}],
	'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
	0.77	 = Validation score   (accuracy)
	0.25s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: CatBoost ...
	0.86	 = Validation score   (accuracy)
	0.17s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: XGBoost ...
	0.84	 = Validation score   (accuracy)
	0.39s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.84	 = Validation score   (accuracy)
	2.95s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'CatBoost': 1.0}
	0.86	 = Validation score   (accuracy)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 4.0s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 7107.8 rows/s (200 batch size)
Disabling decision threshold calibration for metric `accuracy` due to having fewer than 10000 rows of validation data for calibration, to avoid overfitting (200 rows).
	`accuracy` is generally not improved through threshold calibration. Force calibration via specifying `calibrate_decision_threshold=True`.
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205545")
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost 0.842768 0.86 accuracy 0.006248 0.027356 0.167093 0.006248 0.027356 0.167093 1 True 2
1 WeightedEnsemble_L2 0.842768 0.86 accuracy 0.008153 0.028138 0.215383 0.001905 0.000782 0.048290 2 True 5
2 XGBoost 0.836831 0.84 accuracy 0.203856 0.005759 0.390637 0.203856 0.005759 0.390637 1 True 3
3 NeuralNetTorch 0.828027 0.84 accuracy 0.048275 0.011158 2.947366 0.048275 0.011158 2.947366 1 True 4
4 LightGBM 0.780940 0.77 accuracy 0.005517 0.004092 0.250402 0.005517 0.004092 0.250402 1 True 1

We can pass our custom metrics into predictor.leaderboard via the extra_metrics argument:

predictor.leaderboard(test_data, extra_metrics=[ag_roc_auc_scorer, ag_accuracy_scorer])
model score_test roc_auc accuracy score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 CatBoost 0.842768 0.863760 0.842768 0.86 accuracy 0.005653 0.027356 0.167093 0.005653 0.027356 0.167093 1 True 2
1 WeightedEnsemble_L2 0.842768 0.863760 0.842768 0.86 accuracy 0.007480 0.028138 0.215383 0.001827 0.000782 0.048290 2 True 5
2 XGBoost 0.836831 0.890173 0.836831 0.84 accuracy 0.048751 0.005759 0.390637 0.048751 0.005759 0.390637 1 True 3
3 NeuralNetTorch 0.828027 0.879181 0.828027 0.84 accuracy 0.047922 0.011158 2.947366 0.047922 0.011158 2.947366 1 True 4
4 LightGBM 0.780940 0.861131 0.780940 0.77 accuracy 0.005408 0.004092 0.250402 0.005408 0.004092 0.250402 1 True 1

We can also pass custom metrics into the Predictor itself at init time by specifying them via the eval_metric parameter:

predictor_custom = TabularPredictor(label=label, eval_metric=ag_roc_auc_scorer).fit(train_data, hyperparameters='toy')

predictor_custom.leaderboard(test_data)
No path specified. Models will be saved in: "AutogluonModels/ag-20250508_205550"
Verbosity: 2 (Standard Logging)
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Memory Avail:       28.36 GB / 30.95 GB (91.7%)
Disk Space Avail:   212.08 GB / 255.99 GB (82.8%)
===================================================
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://autogluon.cn/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets='good'         : Good accuracy with very fast inference speed.
	presets='medium'       : Fast training time, ideal for initial prototyping.
Beginning AutoGluon training ...
AutoGluon will save models to "/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550"
Train Data Rows:    1000
Train Data Columns: 14
Label Column:       class
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [' >50K', ' <=50K']
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    29045.71 MB
	Train Data (Original)  Memory Usage: 0.56 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.1s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.06 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
	To change this, specify the eval_metric parameter of Predictor()
Automatically generating train/validation split with holdout_frac=0.2, Train Rows: 800, Val Rows: 200
User-specified model hyperparameters to be fit:
{
	'NN_TORCH': [{'num_epochs': 5}],
	'GBM': [{'num_boost_round': 10}],
	'CAT': [{'iterations': 10}],
	'XGB': [{'n_estimators': 10}],
}
Fitting 4 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ...
	0.85	 = Validation score   (roc_auc)
	0.19s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: CatBoost ...
	0.8693	 = Validation score   (roc_auc)
	0.04s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: XGBoost ...
	0.8616	 = Validation score   (roc_auc)
	0.04s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: NeuralNetTorch ...
	0.8537	 = Validation score   (roc_auc)
	0.49s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ...
	Ensemble Weights: {'XGBoost': 0.417, 'CatBoost': 0.375, 'LightGBM': 0.125, 'NeuralNetTorch': 0.083}
	0.878	 = Validation score   (roc_auc)
	0.16s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 1.15s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 3967.3 rows/s (200 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/home/ci/autogluon/docs/tutorials/tabular/advanced/AutogluonModels/ag-20250508_205550")
model score_test score_val eval_metric pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L2 0.900864 0.878010 roc_auc 0.112010 0.050412 0.910520 0.002755 0.001625 0.155592 2 True 5
1 XGBoost 0.890173 0.861627 roc_auc 0.047536 0.031453 0.037621 0.047536 0.031453 0.037621 1 True 3
2 CatBoost 0.887425 0.869325 roc_auc 0.007399 0.003745 0.035267 0.007399 0.003745 0.035267 1 True 2
3 NeuralNetTorch 0.879181 0.853665 roc_auc 0.046863 0.010511 0.488971 0.046863 0.010511 0.488971 1 True 4
4 LightGBM 0.870968 0.849980 roc_auc 0.007457 0.003078 0.193069 0.007457 0.003078 0.193069 1 True 1

That's all it takes to create and use custom metrics in AutoGluon!

If you create a custom metric, consider submitting a PR so that we can officially add it to AutoGluon!

For a tutorial on implementing custom models in AutoGluon, refer to Adding a custom model to AutoGluon.

For more tutorials, refer to Predicting Columns in a Table - Quick Start and Predicting Columns in a Table - In Depth.