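AutoMM for Entity Extraction with Text and Image - Quick Start¶
We have already shown how to train an entity extraction model on text data. Here we go one step further and bring in data from other modalities. In many real-world applications, text is accompanied by other modalities. For example, Twitter lets you compose tweets with text, photos, videos, and GIFs. Amazon.com uses text, images, and videos to describe its products. These auxiliary modalities can serve as additional context for resolving entities. With AutoMM, you can use multimodal data to improve entity extraction without worrying about the details.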
import os
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
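Get the Twitter Dataset¶
In the following example, we demonstrate how to build a multimodal named entity recognition model with a real-world Twitter dataset. The dataset consists of tweets scraped from 2016 to 2017, and each tweet is composed of one sentence and one image. Let's download the dataset.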
download_dir = './ag_automm_tutorial_ner'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Downloading ./ag_automm_tutorial_ner/file.zip from https://automl-mm-bench.s3.amazonaws.com/ner/multimodal_ner.zip...
100%|██████████| 423M/423M [00:08<00:00, 50.2MiB/s]
Next, we load the CSV files.
dataset_path = download_dir + '/multimodal_ner'
train_data = pd.read_csv(f'{dataset_path}/twitter17_train.csv')
test_data = pd.read_csv(f'{dataset_path}/twitter17_test.csv')
label_col = 'entity_annotations'
We need to expand the image paths so that they can be loaded during training.
image_col = 'image'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])
def path_expander(path, base_folder):
    # Expand each ';'-separated relative path into an absolute path
    path_l = path.split(';')
    return ';'.join([os.path.abspath(base_folder + p) for p in path_l])
train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
train_data[image_col].iloc[0]
'/home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial_ner/multimodal_ner/twitter2017_images/17_06_1818.jpg'
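Judging from the output above, the string concatenation in `path_expander` appears to rely on each relative path in the CSV starting with a path separator. The following is a minimal sketch of a more tolerant variant, assuming POSIX-style paths. The helper name `expand_image_paths`, the `./data` folder, and the sample file names are hypothetical and are not part of AutoGluon or the dataset.

```python
import os


def expand_image_paths(paths: str, base_folder: str, sep: str = ";") -> str:
    """Turn ';'-separated relative image paths into absolute paths."""
    expanded = []
    for rel_path in paths.split(sep):
        # Strip any leading slash so os.path.join does not discard base_folder.
        rel_path = rel_path.strip().lstrip("/")
        expanded.append(os.path.abspath(os.path.join(base_folder, rel_path)))
    return sep.join(expanded)


# Hypothetical file names, one with and one without a leading slash.
result = expand_image_paths("/images/a.jpg;images/b.jpg", base_folder="./data")
print(result)
```

Both inputs resolve under the same base folder, so the helper behaves the same whether or not the stored paths begin with a slash.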
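Each row contains the text and image of a tweet, along with entity_annotations, which holds the named entity annotations for the text column. Let's look at an example row and display the tweet's text and image.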
example_row = train_data.iloc[0]
example_row
text_snippet Uefa Super Cup : Real Madrid v Manchester United
image /home/ci/autogluon/docs/tutorials/multimodal/m...
entity_annotations [{"entity_group": "B-MISC", "start": 0, "end":...
Name: 0, dtype: object
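The label value shown above looks like a JSON-encoded list of character spans. Here is a minimal sketch of decoding such a value and recovering the annotated substrings, assuming each span carries `entity_group`, `start`, and `end` keys with an exclusive `end`. Only the first `B-MISC` span starting at 0 is visible in the truncated output; the other spans and their `B-ORG` tags are made up for illustration, and `make_span` is a hypothetical helper.

```python
import json


def make_span(text: str, word: str, entity_group: str) -> dict:
    """Build one annotation whose offsets are looked up in the text."""
    start = text.index(word)
    return {"entity_group": entity_group, "start": start, "end": start + len(word)}


text = "Uefa Super Cup : Real Madrid v Manchester United"

# Hypothetical label value: a JSON string shaped like the label column entry.
annotation_json = json.dumps([
    make_span(text, "Uefa", "B-MISC"),
    make_span(text, "Real", "B-ORG"),
    make_span(text, "Manchester", "B-ORG"),
])

# Decode the label and recover each annotated substring from its offsets.
decoded = [
    (span["entity_group"], text[span["start"]:span["end"]])
    for span in json.loads(annotation_json)
]
print(decoded)
```

Computing the offsets from the text rather than typing them by hand keeps the illustration consistent with the sentence it annotates.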
Below is the image of this tweet.
example_image = example_row[image_col]
from IPython.display import Image, display
pil_img = Image(filename=example_image, width=300)
display(pil_img)
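As you can see, the photo contains the logos of Real Madrid football club, Manchester United football club, and the UEFA Super Cup. Clearly, key information from the tweet's sentence is encoded here in a different modality.
Training¶
Now let's fit the predictor on the training data. First, we need to set problem_type to **ner**. Because the annotations apply to a text column, and the data may contain multiple text columns, we use the **column_types** parameter to set the type of the relevant column to text_ner so that the model finds the correct text column for entity extraction. Here we set a tight time budget for a quick demo.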
from autogluon.multimodal import MultiModalPredictor
import uuid
label_col = "entity_annotations"
model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner"
predictor = MultiModalPredictor(problem_type="ner", label=label_col, path=model_path)
predictor.fit(
    train_data=train_data,
    column_types={"text_snippet": "text_ner"},
    time_limit=300,  # seconds
)
=================== System Info ===================
AutoGluon Version: 1.3.1b20250508
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Wed Mar 12 14:53:59 UTC 2025
CPU Count: 8
Pytorch Version: 2.6.0+cu124
CUDA Version: 12.4
Memory Avail: 28.40 GB / 30.95 GB (91.8%)
Disk Space Avail: 180.79 GB / 255.99 GB (70.6%)
===================================================
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/multimodal_prediction/tmp/2bedc65a8cb2477ea10e14a9c46427eb-automm_multimodal_ner
```
INFO: Seed set to 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[8], line 7
5 model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner"
6 predictor = MultiModalPredictor(problem_type="ner", label=label_col, path=model_path)
----> 7 predictor.fit(
8 train_data=train_data,
9 column_types={"text_snippet":"text_ner"},
10 time_limit=300, #second
11 )
File ~/autogluon/multimodal/src/autogluon/multimodal/predictor.py:540, in MultiModalPredictor.fit(self, train_data, presets, tuning_data, max_num_tuning_data, id_mappings, time_limit, save_path, hyperparameters, column_types, holdout_frac, teacher_predictor, seed, standalone, hyperparameter_tune_kwargs, clean_ckpts, predictions, labels, predictors)
537 assert isinstance(predictors, list)
538 learners = [ele if isinstance(ele, str) else ele._learner for ele in predictors]
--> 540 self._learner.fit(
541 train_data=train_data,
542 presets=presets,
543 tuning_data=tuning_data,
544 max_num_tuning_data=max_num_tuning_data,
545 time_limit=time_limit,
546 save_path=save_path,
547 hyperparameters=hyperparameters,
548 column_types=column_types,
549 holdout_frac=holdout_frac,
550 teacher_learner=teacher_learner,
551 seed=seed,
552 standalone=standalone,
553 hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
554 clean_ckpts=clean_ckpts,
555 id_mappings=id_mappings,
556 predictions=predictions,
557 labels=labels,
558 learners=learners,
559 )
561 return self
File ~/autogluon/multimodal/src/autogluon/multimodal/learners/base.py:665, in BaseLearner.fit(self, train_data, presets, tuning_data, time_limit, save_path, hyperparameters, column_types, holdout_frac, teacher_learner, seed, standalone, hyperparameter_tune_kwargs, clean_ckpts, **kwargs)
658 self.fit_sanity_check()
659 self.prepare_fit_args(
660 time_limit=time_limit,
661 seed=seed,
662 standalone=standalone,
663 clean_ckpts=clean_ckpts,
664 )
--> 665 fit_returns = self.execute_fit()
666 self.on_fit_end(
667 training_start=training_start,
668 strategy=fit_returns.get("strategy", None),
(...)
671 clean_ckpts=clean_ckpts,
672 )
674 return self
File ~/autogluon/multimodal/src/autogluon/multimodal/learners/base.py:577, in BaseLearner.execute_fit(self)
575 return dict()
576 else:
--> 577 attributes = self.fit_per_run(**self._fit_args)
578 self.update_attributes(**attributes) # only update attributes for non-HPO mode
579 return attributes
File ~/autogluon/multimodal/src/autogluon/multimodal/learners/ner.py:203, in NERLearner.fit_per_run(self, max_time, save_path, ckpt_path, resume, enable_progress_bar, seed, hyperparameters, advanced_hyperparameters, config, df_preprocessor, data_processors, model, standalone, clean_ckpts)
201 config = self.update_config_by_data_per_run(config=config, df_preprocessor=df_preprocessor)
202 output_shape = self.get_output_shape_per_run(df_preprocessor=df_preprocessor)
--> 203 model = self.get_model_per_run(
204 model=model,
205 config=config,
206 df_preprocessor=df_preprocessor,
207 output_shape=output_shape,
208 )
209 model = self.compile_model_per_run(config=config, model=model)
210 peft_param_names = self.get_peft_param_names_per_run(model=model, config=config)
File ~/autogluon/multimodal/src/autogluon/multimodal/learners/ner.py:105, in NERLearner.get_model_per_run(self, model, config, df_preprocessor, output_shape)
97 def get_model_per_run(
98 self,
99 model: nn.Module,
(...)
102 output_shape: int,
103 ):
104 if model is None:
--> 105 model = create_fusion_model(
106 config=config,
107 num_classes=output_shape,
108 num_numerical_columns=len(df_preprocessor.numerical_feature_names),
109 num_categories=df_preprocessor.categorical_num_categories,
110 )
111 return model
File ~/autogluon/multimodal/src/autogluon/multimodal/models/utils.py:1649, in create_fusion_model(config, num_classes, classes, num_numerical_columns, num_categories, numerical_fill_values, pretrained)
1645 single_models.append(model)
1647 if len(single_models) > 1:
1648 # must have one fusion model if there are multiple independent models
-> 1649 model = fusion_model(models=single_models)
1650 elif len(single_models) == 1:
1651 model = single_models[0]
File ~/autogluon/multimodal/src/autogluon/multimodal/models/fusion/fusion_ner.py:67, in MultimodalFusionNER.__init__(self, prefix, models, hidden_features, num_classes, adapt_in_features, activation, dropout_prob, normalization, loss_weight)
23 def __init__(
24 self,
25 prefix: str,
(...)
33 loss_weight: Optional[float] = None,
34 ):
35 """
36 Parameters
37 ----------
(...)
65 The weight of individual models.
66 """
---> 67 super().__init__(
68 prefix=prefix,
69 models=models,
70 loss_weight=loss_weight,
71 )
72 logger.debug("initializing MultimodalFusionNER")
74 if loss_weight is not None:
TypeError: AbstractMultimodalFusionModel.__init__() got an unexpected keyword argument 'loss_weight'
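Under the hood, AutoMM automatically detects the data modalities, selects the relevant models from the multimodal model pool, and trains them. If there are multiple backbones, AutoMM appends a late-fusion model on top of them.
Evaluation¶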
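To make the idea of late fusion concrete, here is a conceptual toy sketch in plain Python. It assumes each backbone emits a fixed-length feature vector and fuses them by concatenation followed by a single linear layer, which is one common form of late fusion. It is not AutoMM's actual fusion model, and the function name, feature values, and weights are all hypothetical.

```python
from typing import List


def late_fusion(features_per_model: List[List[float]],
                weights: List[List[float]],
                bias: List[float]) -> List[float]:
    """Concatenate per-model features, then apply one linear layer."""
    fused = [value for features in features_per_model for value in features]
    return [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]


# Toy features from a hypothetical text backbone and image backbone.
text_features = [1.0, 2.0]
image_features = [3.0]

# One output unit per class; each weight row spans the concatenated features.
weights = [[1.0, 0.0, 1.0],
           [0.0, 1.0, -1.0]]
bias = [0.5, 0.0]

logits = late_fusion([text_features, image_features], weights, bias)
print(logits)  # [4.5, -1.0]
```

The point of the sketch is only that each backbone is run independently and the combination happens on their outputs, rather than on the raw inputs.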
predictor.evaluate(test_data, metrics=['overall_recall', "overall_precision", "overall_f1"])
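The metric names above suggest entity-level precision, recall, and F1. As a rough sketch of what such scores measure, the code below counts a predicted entity as correct only when its group and both offsets match a gold entity exactly. This exact-match rule is an assumption, AutoMM's own computation may differ, and the spans and the `entity_scores` helper are hypothetical.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (entity_group, start, end)


def entity_scores(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical spans: one prediction matches, one has the wrong group.
gold = {("ORG", 0, 11), ("ORG", 14, 31), ("MISC", 40, 50)}
predicted = {("ORG", 0, 11), ("PER", 14, 31)}

scores = entity_scores(gold, predicted)
print(scores)
```

Here one of two predictions is correct and one of three gold entities is found, so precision is 0.5, recall is about 0.33, and F1 is 0.4.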
Prediction¶
You can easily obtain predictions by calling predictor.predict().
prediction_input = test_data.drop(columns=label_col).head(1)
predictions = predictor.predict(prediction_input)
print('Tweet:', prediction_input.text_snippet[0])
print('Image path:', prediction_input.image[0])
print('Predicted entities:', predictions[0])
for entity in predictions[0]:
    print(f"Word '{prediction_input.text_snippet[0][entity['start']:entity['end']]}' belongs to group: {entity['entity_group']}")
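The label sample earlier uses BIO-style tags such as `B-MISC`, so word-level spans may need to be merged into whole entities before further use. Below is a minimal sketch of such a merge, assuming predictions are dicts with `entity_group`, `start`, and `end` keys and `B-`/`I-` prefixes. The prediction output is not shown in this tutorial, so that shape is an assumption, and `merge_bio_spans`, `tag`, and the sample spans are hypothetical.

```python
from typing import Dict, List


def merge_bio_spans(spans: List[Dict]) -> List[Dict]:
    """Merge consecutive B-/I- spans of the same type into one entity."""
    merged: List[Dict] = []
    for span in sorted(spans, key=lambda s: s["start"]):
        prefix, _, label = span["entity_group"].partition("-")
        if not label:
            # No BIO prefix: treat the whole tag as the label of a new entity.
            prefix, label = "B", span["entity_group"]
        continues = (
            prefix == "I" and bool(merged) and merged[-1]["entity_group"] == label
        )
        if continues:
            merged[-1]["end"] = span["end"]
        else:
            merged.append(
                {"entity_group": label, "start": span["start"], "end": span["end"]}
            )
    return merged


text = "Real Madrid v Manchester United"


def tag(word: str, entity_group: str) -> Dict:
    """Hypothetical helper: build a span by looking the word up in the text."""
    start = text.index(word)
    return {"entity_group": entity_group, "start": start, "end": start + len(word)}


predicted = [tag("Real", "B-ORG"), tag("Madrid", "I-ORG"),
             tag("Manchester", "B-ORG"), tag("United", "I-ORG")]
entities = [(e["entity_group"], text[e["start"]:e["end"]])
            for e in merge_bio_spans(predicted)]
print(entities)  # [('ORG', 'Real Madrid'), ('ORG', 'Manchester United')]
```

An `I-` span with no matching preceding entity simply starts a new entity, which keeps the helper tolerant of imperfect tag sequences.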
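Reloading and Continuous Training¶
The trained predictor is saved automatically, and you can reload it from its path. If you are not satisfied with the current model performance, you can continue training the loaded model with new data.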
new_predictor = MultiModalPredictor.load(model_path)
new_model_path = f"./tmp/{uuid.uuid4().hex}-automm_multimodal_ner_continue_train"
new_predictor.fit(train_data, time_limit=60, save_path=new_model_path)
test_score = new_predictor.evaluate(test_data, metrics=['overall_f1'])
print(test_score)
Other Examples¶
You can go to AutoMM Examples to explore other examples of AutoMM.
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.