AutoMM Presets

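Hyperparameters usually have to be set before training begins. Deep learning models, such as pretrained foundation models, can have anywhere from a few to hundreds of hyperparameters, which affect training speed, final model performance, and inference latency. Choosing good values is hard for many users with limited expertise.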

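This tutorial introduces the easy-to-use presets in AutoMM. A preset condenses a complex hyperparameter configuration into a simple string. AutoMM supports three presets: medium_quality, high_quality, and best_quality.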

import warnings

warnings.filterwarnings('ignore')

Dataset

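For demonstration, we use a subsampled version of the Stanford Sentiment Treebank (SST) dataset, which consists of movie reviews and their associated sentiment. Given a new movie review, the goal is to predict the sentiment reflected in the text (here a **binary classification**: a review is labeled 1 if it expresses a positive opinion and 0 otherwise). To get started, let's download and prepare the dataset.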

from autogluon.core.utils.loaders import load_pd

train_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet')
test_data = load_pd.load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet')
subsample_size = 1000  # subsample data for faster demo, try setting this to larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head(10)

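	sentence	label
43787	very pleasing at its best moments	1
16159	, american chai is enough to make you put away...	0
59015	too much like an infomercial for ram dass...	0
5108	a stirring visual sequence	1
67052	cool visual backmasking	1
35938	hard ground	0
49879	the striking , quietly vulnerable personality...	1
51591	pan nalin 's exposition is beautiful and mysterious...	1
56780	very quirky	1
28518	most beautiful , most evocative	1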

Medium Quality

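In some situations, we prefer fast training and inference over prediction quality; medium_quality is designed for this purpose. Among the three presets, it has the smallest model size. Let's fit the predictor using the medium_quality preset, with a short time budget for a quick demo.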

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="medium_quality")
predictor.fit(
    train_data=train_data,
    time_limit=20, # seconds
)

No path specified. Models will be saved in: "AutogluonModels/ag-20250508_212516"
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
Memory Avail:       28.40 GB / 30.95 GB (91.8%)
Disk Space Avail:   166.07 GB / 255.99 GB (64.9%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 13.5 M | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
13.5 M    Trainable params
0         Non-trainable params
13.5 M    Total params
53.934    Total estimated model params size (MB)
230       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.47000 (best 0.47000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.58000 (best 0.58000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516/epoch=0-step=7.ckpt' as top 3
Epoch 1, global step 10: 'val_accuracy' reached 0.61000 (best 0.61000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516/epoch=1-step=10.ckpt' as top 3
Epoch 1, global step 14: 'val_accuracy' reached 0.64000 (best 0.64000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516/epoch=1-step=14.ckpt' as top 3
Epoch 2, global step 17: 'val_accuracy' reached 0.72500 (best 0.72500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516/epoch=2-step=17.ckpt' as top 3
Time limit reached. Elapsed time is 0:00:20. Signaling Trainer to stop.
Start to fuse 3 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212516")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://autogluon.cn/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).

<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f7daa350ed0>

We can then evaluate the predictor on the test data.

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.

{'roc_auc': np.float64(0.8515092194998738)}
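The predictor above is trained with eval_metric='acc' but evaluated here with roc_auc, which scores how well the predicted probabilities rank positive reviews above negative ones, independent of any decision threshold. The following is a minimal pure-Python sketch of that definition on made-up toy values; it only illustrates the metric and is not how AutoGluon computes it internally.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted probabilities of the positive class (made-up values).
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy ROC AUC: {toy_auc:.4f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the scores in this tutorial should be read.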

High Quality

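If you want to balance prediction quality against training/inference speed, try the high_quality preset, which uses a larger model than medium_quality. A larger model needs more time to train, so the time limit should normally be increased accordingly; the demo below keeps the same 20-second budget to stay fast.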

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="high_quality")
predictor.fit(
    train_data=train_data,
    time_limit=20, # seconds
)

No path specified. Models will be saved in: "AutogluonModels/ag-20250508_212541"
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
Memory Avail:       27.36 GB / 30.95 GB (88.4%)
Disk Space Avail:   165.97 GB / 255.99 GB (64.8%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212541
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 108 M  | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
108 M     Trainable params
0         Non-trainable params
108 M     Total params
435.573   Total estimated model params size (MB)
229       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.55500 (best 0.55500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212541/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.59500 (best 0.59500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212541/epoch=0-step=7.ckpt' as top 3
Time limit reached. Elapsed time is 0:00:22. Signaling Trainer to stop.
Start to fuse 2 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212541")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://autogluon.cn/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).

<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f7cae609a10>

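Given enough training time, high_quality is expected to outperform medium_quality. In this demo, however, the 20-second budget allows only a few training steps for the larger model, so the ROC AUC reported below (about 0.62) is lower than the medium_quality result (about 0.85).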

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.

{'roc_auc': np.float64(0.6236028668855772)}
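The two training logs so far make the size difference between the presets concrete. The sketch below backs out approximate parameter counts from the "Total estimated model params size (MB)" lines, assuming the estimate uses 4 bytes per parameter (32-bit precision) and 1 MB = 1e6 bytes; both are assumptions about how the summary is produced, not something the logs state.

```python
BYTES_PER_FP32_PARAM = 4  # assumption: size is estimated at 32-bit precision

# "Total estimated model params size (MB)" copied from the two training logs above.
reported_size_mb = {"medium_quality": 53.934, "high_quality": 435.573}


def params_from_size_mb(size_mb, bytes_per_param=BYTES_PER_FP32_PARAM):
    """Back out an approximate parameter count from a size in MB (1 MB = 1e6 bytes)."""
    return round(size_mb * 1e6 / bytes_per_param)


approx_params = {name: params_from_size_mb(mb) for name, mb in reported_size_mb.items()}
size_ratio = reported_size_mb["high_quality"] / reported_size_mb["medium_quality"]

for name, count in approx_params.items():
    print(f"{name}: ~{count / 1e6:.1f} M parameters")
print(f"high_quality is ~{size_ratio:.1f}x larger than medium_quality")
```

The recovered counts agree with the 13.5 M and 108 M figures in the log tables, and the roughly eightfold size gap helps explain why the larger model completes fewer training steps in the same 20 seconds.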

Best Quality

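If you want the best performance and do not mind the training/inference cost, try the best_quality preset. A high-end GPU with large memory is recommended in this case. It requires a longer training time than high_quality.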

from autogluon.multimodal import MultiModalPredictor

predictor = MultiModalPredictor(label='label', eval_metric='acc', presets="best_quality")
predictor.fit(train_data=train_data, time_limit=180)

No path specified. Models will be saved in: "AutogluonModels/ag-20250508_212615"
=================== System Info ===================
AutoGluon Version:  1.3.1b20250508
Python Version:     3.11.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count:          8
Pytorch Version:    2.6.0+cu124
CUDA Version:       12.4
Memory Avail:       25.86 GB / 30.95 GB (83.6%)
Disk Space Avail:   165.56 GB / 255.99 GB (64.7%)
===================================================
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(1), np.int64(0)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615
    ```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name              | Type                         | Params | Mode 
---------------------------------------------------------------------------
0 | model             | HFAutoModelForTextPrediction | 183 M  | train
1 | validation_metric | MulticlassAccuracy           | 0      | train
2 | loss_func         | CrossEntropyLoss             | 0      | train
---------------------------------------------------------------------------
183 M     Trainable params
0         Non-trainable params
183 M     Total params
735.332   Total estimated model params size (MB)
241       Modules in train mode
0         Modules in eval mode
Epoch 0, global step 3: 'val_accuracy' reached 0.43000 (best 0.43000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615/epoch=0-step=3.ckpt' as top 3
Epoch 0, global step 7: 'val_accuracy' reached 0.56500 (best 0.56500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615/epoch=0-step=7.ckpt' as top 3
Epoch 1, global step 10: 'val_accuracy' reached 0.57000 (best 0.57000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615/epoch=1-step=10.ckpt' as top 3
Epoch 1, global step 14: 'val_accuracy' reached 0.67500 (best 0.67500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615/epoch=1-step=14.ckpt' as top 3
Time limit reached. Elapsed time is 0:03:00. Signaling Trainer to stop.
Start to fuse 3 checkpoints via the greedy soup algorithm.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_212615")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://autogluon.cn/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).

<autogluon.multimodal.predictor.MultiModalPredictor at 0x7f7caf5ee250>

We can see that best_quality achieves better performance than high_quality in this demo (ROC AUC of about 0.80 versus about 0.62).

scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores

Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.

{'roc_auc': np.float64(0.8041461438073587)}
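To put the three runs side by side, the sketch below collects the parameter counts, time limits, and ROC AUC values printed in this tutorial and ranks them. The numbers come from these specific short runs on 1,000 training samples, so the ranking reflects the demo budgets rather than what each preset can reach with adequate training time.

```python
# Values copied from the runs shown in this tutorial (1,000 training samples, one GPU).
runs = [
    {"preset": "medium_quality", "params_m": 13.5, "time_limit_s": 20, "roc_auc": 0.8515092194998738},
    {"preset": "high_quality", "params_m": 108, "time_limit_s": 20, "roc_auc": 0.6236028668855772},
    {"preset": "best_quality", "params_m": 183, "time_limit_s": 180, "roc_auc": 0.8041461438073587},
]

# Rank the runs from highest to lowest test ROC AUC.
ranked = sorted(runs, key=lambda run: run["roc_auc"], reverse=True)
for run in ranked:
    print(
        f"{run['preset']:<15} {run['params_m']:>6} M params  "
        f"{run['time_limit_s']:>4} s  roc_auc={run['roc_auc']:.4f}"
    )

ranked_presets = [run["preset"] for run in ranked]
```

Under these budgets the smallest model scores highest, which suggests the larger presets are undertrained here; raising time_limit is the first thing to try before comparing presets on a real task.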

HPO Presets

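All three presets above use default hyperparameters, which may not be optimal. Fortunately, simple presets are also available for hyperparameter optimization (HPO). To run HPO, append the suffix _hpo to any of the three presets, giving medium_quality_hpo, high_quality_hpo, and best_quality_hpo.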
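The naming rule can be captured in a few lines. The helper below, to_hpo_preset, is hypothetical and not part of AutoGluon; it only builds and validates the preset strings described above. The commented usage assumes an HPO preset string is passed to MultiModalPredictor in the same way as the base presets in the earlier cells, and the time limit in it is an arbitrary placeholder.

```python
BASE_PRESETS = ("medium_quality", "high_quality", "best_quality")
HPO_SUFFIX = "_hpo"


def to_hpo_preset(preset):
    """Return the HPO variant of a preset name (hypothetical helper, not an AutoGluon API)."""
    base = preset[: -len(HPO_SUFFIX)] if preset.endswith(HPO_SUFFIX) else preset
    if base not in BASE_PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; expected one of {BASE_PRESETS}")
    return base + HPO_SUFFIX


hpo_presets = [to_hpo_preset(name) for name in BASE_PRESETS]
print(hpo_presets)

# Assumed usage, mirroring the earlier cells (not run here; HPO needs a far larger budget):
# predictor = MultiModalPredictor(label='label', eval_metric='acc',
#                                 presets=to_hpo_preset("medium_quality"))
# predictor.fit(train_data=train_data, time_limit=3600)  # placeholder budget
```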

Display Presets

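To inspect the details of a preset, use the utility function get_presets, which returns its hyperparameter settings. For example, here are the hyperparameters of the high_quality preset.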

import json
from autogluon.multimodal.utils.presets import get_presets

hyperparameters, hyperparameter_tune_kwargs = get_presets(problem_type="default", presets="high_quality")
print(f"hyperparameters: {json.dumps(hyperparameters, sort_keys=True, indent=4)}")
print(f"hyperparameter_tune_kwargs: {json.dumps(hyperparameter_tune_kwargs, sort_keys=True, indent=4)}")

hyperparameters: {
    "model.document_transformer.checkpoint_name": "microsoft/layoutlmv3-base",
    "model.hf_text.checkpoint_name": "google/electra-base-discriminator",
    "model.names": [
        "ft_transformer",
        "timm_image",
        "hf_text",
        "document_transformer",
        "fusion_mlp"
    ],
    "model.timm_image.checkpoint_name": "caformer_b36.sail_in22k_ft_in1k"
}
hyperparameter_tune_kwargs: {}
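The keys in this output are dotted paths such as model.hf_text.checkpoint_name. As a reading aid, the sketch below regroups the printed dictionary into a nested tree so that each model's settings sit together. The nest function is a hypothetical helper written for this illustration; the dictionary is copied from the output above rather than fetched from AutoGluon.

```python
import json

# Copied from the high_quality output above.
flat_hyperparameters = {
    "model.document_transformer.checkpoint_name": "microsoft/layoutlmv3-base",
    "model.hf_text.checkpoint_name": "google/electra-base-discriminator",
    "model.names": [
        "ft_transformer",
        "timm_image",
        "hf_text",
        "document_transformer",
        "fusion_mlp",
    ],
    "model.timm_image.checkpoint_name": "caformer_b36.sail_in22k_ft_in1k",
}


def nest(flat_dict):
    """Turn {'a.b.c': v} style keys into nested dictionaries (hypothetical helper)."""
    tree = {}
    for dotted_key, value in flat_dict.items():
        *parents, leaf = dotted_key.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


nested = nest(flat_hyperparameters)
print(json.dumps(nested, indent=4))
```

Read this way, the preset lists the candidate model names and assigns one checkpoint each to the text, image, and document backbones.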

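The HPO presets make several hyperparameters tunable, such as the model backbone, batch size, learning rate, maximum number of epochs, and optimizer type. Below are the details of the high_quality_hpo preset.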

import json
import yaml
from autogluon.multimodal.utils.presets import get_presets

hyperparameters, hyperparameter_tune_kwargs = get_presets(problem_type="default", presets="high_quality_hpo")
print(f"hyperparameters: {yaml.dump(hyperparameters, allow_unicode=True, default_flow_style=False)}")
print(f"hyperparameter_tune_kwargs: {json.dumps(hyperparameter_tune_kwargs, sort_keys=True, indent=4)}")

hyperparameters: env.batch_size: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - 16
  - 32
  - 64
  - 128
  - 256
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
env.per_gpu_batch_size: 2
model.document_transformer.checkpoint_name: microsoft/layoutlmv3-base
model.hf_text.checkpoint_name: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - google/electra-base-discriminator
  - google/flan-t5-base
  - microsoft/deberta-v3-small
  - roberta-base
  - albert-xlarge-v2
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
model.names:
- ft_transformer
- timm_image
- hf_text
- document_transformer
- fusion_mlp
model.timm_image.checkpoint_name: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - swin_base_patch4_window7_224
  - convnext_base_in22ft1k
  - vit_base_patch16_clip_224.laion2b_ft_in12k_in1k
  - caformer_b36.sail_in22k_ft_in1k
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
optim.lr: !!python/object:ray.tune.search.sample.Float
  lower: 1.0e-05
  sampler: !!python/object:ray.tune.search.sample._LogUniform
    base: 10
  upper: 0.01
optim.max_epochs: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - 5
  - 6
  - 7
  - 8
  - 9
  - 10
  - 11
  - 12
  - 13
  - 14
  - 15
  - 16
  - 17
  - 18
  - 19
  - 20
  - 21
  - 22
  - 23
  - 24
  - 25
  - 26
  - 27
  - 28
  - 29
  - 30
  sampler: !!python/object:ray.tune.search.sample._Uniform {}
optim.optim_type: !!python/object:ray.tune.search.sample.Categorical
  categories:
  - adamw
  - sgd
  sampler: !!python/object:ray.tune.search.sample._Uniform {}

hyperparameter_tune_kwargs: {
    "num_trials": 512,
    "scheduler": "ASHA",
    "searcher": "bayes"
}
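To make the search space above easier to read, the sketch below mimics it with the Python standard library only: categorical entries are drawn uniformly from their lists, and the learning rate is drawn uniformly in log10 space between 1e-5 and 1e-2. This is an illustration of what the space contains, under the assumption that Categorical with a uniform sampler and Float with a base-10 log-uniform sampler behave this way; it does not use Ray Tune, and the real search is driven by the "bayes" searcher and ASHA scheduler rather than plain random sampling.

```python
import math
import random

# Discrete choices copied from the high_quality_hpo output above.
search_space = {
    "env.batch_size": [16, 32, 64, 128, 256],
    "model.hf_text.checkpoint_name": [
        "google/electra-base-discriminator",
        "google/flan-t5-base",
        "microsoft/deberta-v3-small",
        "roberta-base",
        "albert-xlarge-v2",
    ],
    "model.timm_image.checkpoint_name": [
        "swin_base_patch4_window7_224",
        "convnext_base_in22ft1k",
        "vit_base_patch16_clip_224.laion2b_ft_in12k_in1k",
        "caformer_b36.sail_in22k_ft_in1k",
    ],
    "optim.max_epochs": list(range(5, 31)),
    "optim.optim_type": ["adamw", "sgd"],
}
LR_LOWER, LR_UPPER = 1e-5, 1e-2


def sample_config(rng):
    """Draw one configuration: uniform over categories, log-uniform for the learning rate."""
    config = {name: rng.choice(choices) for name, choices in search_space.items()}
    log_lr = rng.uniform(math.log10(LR_LOWER), math.log10(LR_UPPER))
    config["optim.lr"] = 10 ** log_lr
    return config


# Number of distinct combinations of the discrete hyperparameters (learning rate excluded).
discrete_combinations = math.prod(len(choices) for choices in search_space.values())
example_config = sample_config(random.Random(0))

print(f"discrete combinations: {discrete_combinations}")
print(example_config)
```

The discrete part alone has 5 × 5 × 4 × 26 × 2 = 5,200 combinations, so the 512 trials in hyperparameter_tune_kwargs cover only a fraction of the space, which is why a guided searcher and an early-stopping scheduler are used.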

Other Examples

For more examples of AutoMM, see AutoMM Examples.

Customization

To learn how to customize AutoMM, refer to Customize AutoMM.