Accelerating Prediction with TensorRT¶
TensorRT, built on the NVIDIA CUDA® parallel programming model, enables us to optimize inference by leveraging libraries, development tools, and technologies in NVIDIA AI, autonomous machines, high-performance computing, and graphics. AutoGluon-MultiModal is now integrated with TensorRT via the predictor.optimize_for_inference() interface. This tutorial demonstrates how to leverage TensorRT to boost inference speed, which can help improve efficiency in deployment environments.
import os
import numpy as np
import time
import warnings
from IPython.display import clear_output
warnings.filterwarnings('ignore')
np.random.seed(123)
Install Required Packages¶
Since the tensorrt/onnx/onnxruntime-gpu packages are currently optional dependencies of autogluon.multimodal, we need to make sure they are properly installed.
try:
import tensorrt, onnx, onnxruntime
print(f"tensorrt=={tensorrt.__version__}, onnx=={onnx.__version__}, onnxruntime=={onnxruntime.__version__}")
except ImportError:
!pip install autogluon.multimodal[tests]
!pip install -U "tensorrt>=10.0.0b0,<11.0"
clear_output()
Dataset¶
For demonstration, we use a simplified and subsampled version of the PetFinder dataset. The task is to predict animals' adoption rates based on their adoption profiles. In this simplified version, the adoption speed is grouped into two categories: 0 (slow) and 1 (fast).
To get started, let's download and prepare the dataset.
download_dir = './ag_automm_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)
Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip...
100%|██████████| 18.8M/18.8M [00:00<00:00, 35.3MiB/s]
Next, we will load the CSV files.
import pandas as pd
dataset_path = download_dir + '/petfinder_for_tutorial'
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
label_col = 'AdoptionSpeed'
We need to expand the image paths so the images can be loaded during training.
image_col = 'Images'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])
def path_expander(path, base_folder):
path_l = path.split(';')
return ';'.join([os.path.abspath(os.path.join(base_folder, path)) for path in path_l])
train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
Each animal's adoption profile includes pictures, a text description, and various tabular features such as age, breed, name, and color.
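To get a feel for the mixed modalities, we can optionally glance at a few rows before training:

# Peek at a few rows to see the image paths, text description, and tabular features together.
train_data.head(3)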
Training¶
Now let's fit the predictor on the training data. Here we set a tight time budget for a quick demo.
from autogluon.multimodal import MultiModalPredictor
hyperparameters = {
"optim.max_epochs": 2,
"model.names": ["numerical_mlp", "categorical_mlp", "timm_image", "hf_text", "fusion_mlp"],
"model.timm_image.checkpoint_name": "mobilenetv3_small_100",
"model.hf_text.checkpoint_name": "google/electra-small-discriminator",
}
predictor = MultiModalPredictor(label=label_col).fit(
train_data=train_data,
hyperparameters=hyperparameters,
time_limit=120, # seconds
)
clear_output()
Under the hood, AutoMM automatically infers the problem type (classification or regression), detects the data modalities, selects the relevant models from the multimodal model pool, and trains them. If multiple backbones are present, AutoMM appends a late-fusion model (MLP or transformer) on top of them.
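For instance, we can confirm what AutoMM inferred after fitting; a minimal check (this assumes problem_type is exposed as a public attribute, as in recent MultiModalPredictor versions):

# The inferred problem type; for this two-class task it should be binary classification.
print(predictor.problem_type)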
Prediction with Default PyTorch Module¶
Given a multimodal dataframe without the label column, we can predict the labels.
Note that we use a small sample of the test data here for benchmarking; later, we will evaluate on the whole test dataset to assess the accuracy loss.
batch_size = 2
n_trials = 10
sample = test_data.head(batch_size)
# Use first prediction for initialization (e.g., allocating memory)
y_pred = predictor.predict_proba(sample)
pred_time = []
for _ in range(n_trials):
tic = time.time()
y_pred = predictor.predict_proba(sample)
elapsed = time.time()-tic
pred_time.append(elapsed)
print(f"elapsed (pytorch): {elapsed*1000:.1f} ms (batch_size={batch_size})")
elapsed (pytorch): 387.8 ms (batch_size=2)
elapsed (pytorch): 395.8 ms (batch_size=2)
elapsed (pytorch): 395.9 ms (batch_size=2)
elapsed (pytorch): 395.5 ms (batch_size=2)
elapsed (pytorch): 391.1 ms (batch_size=2)
elapsed (pytorch): 394.5 ms (batch_size=2)
elapsed (pytorch): 393.3 ms (batch_size=2)
elapsed (pytorch): 395.3 ms (batch_size=2)
elapsed (pytorch): 388.7 ms (batch_size=2)
elapsed (pytorch): 401.1 ms (batch_size=2)
Prediction with TensorRT Module¶
First, let's load a new predictor and optimize it for inference.
model_path = predictor.path
trt_predictor = MultiModalPredictor.load(path=model_path)
trt_predictor.optimize_for_inference()
# Again, use first prediction for initialization (e.g., allocating memory)
y_pred_trt = trt_predictor.predict_proba(sample)
clear_output()
Load pretrained checkpoint: /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20250508_213159/model.ckpt
Under the hood, the optimize_for_inference() function generates an onnxruntime-based module that serves as a drop-in replacement for torch.nn.Module. It replaces the internal torch-based module predictor._model to deliver optimized inference.
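Note that, per its docstring, optimize_for_inference() also returns the generated onnx-based module, so the call above could equivalently have captured it for inspection; a minimal sketch:

# Equivalent to the call above, but keeping a reference to the returned
# onnxruntime-based module that now backs trt_predictor._model.
onnx_module = trt_predictor.optimize_for_inference()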
Warning
The function optimize_for_inference() modifies the internal model definition for inference only. Calling predictor.fit() afterwards will result in an error. To retrain the model, reload it with MultiModalPredictor.load.
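For example, if you need to continue training later, reload an unmodified copy from the saved path first; a minimal sketch (fit arguments elided):

# Reload a fresh predictor from disk; this one is safe to refit.
fresh_predictor = MultiModalPredictor.load(path=model_path)
# fresh_predictor.fit(train_data=train_data, ...)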
Then we can perform prediction or extract embeddings as usual. For a fair comparison of inference speed, we run the prediction multiple times here.
pred_time_trt = []
for _ in range(n_trials):
tic = time.time()
y_pred_trt = trt_predictor.predict_proba(sample)
elapsed = time.time()-tic
pred_time_trt.append(elapsed)
print(f"elapsed (tensorrt): {elapsed*1000:.1f} ms (batch_size={batch_size})")
To verify the correctness of the predictions, we can compare the results side by side.
Let's take a glance at the expected results and the TensorRT results.
y_pred, y_pred_trt
Since we use mixed precision (FP16) by default, there may be some loss of accuracy. We can see that the probabilities are very close, and in most cases we can safely assume the results are comparable. For more details, see the reduced precision section in the TensorRT Developer Guide.
np.testing.assert_allclose(y_pred, y_pred_trt, atol=0.01)
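As an optional diagnostic, we can also quantify the gap directly via the maximum absolute difference between the two probability arrays:

# Largest element-wise deviation between the PyTorch and TensorRT probabilities.
max_diff = np.abs(np.asarray(y_pred) - np.asarray(y_pred_trt)).max()
print(f"max absolute difference: {max_diff:.4f}")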
Visualize Inference Speed¶
We can compute the inference speed by dividing the batch size by the average prediction time.
infer_speed = batch_size/np.mean(pred_time)
infer_speed_trt = batch_size/np.mean(pred_time_trt)
Then, visualize the speedup.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
fig.set_figheight(1.5)
ax.barh(["PyTorch", "TensorRT"], [infer_speed, infer_speed_trt])
ax.annotate(f"{infer_speed:.1f} rows/s", xy=(infer_speed, 0))
ax.annotate(f"{infer_speed_trt:.1f} rows/s", xy=(infer_speed_trt, 1))
_ = plt.xlabel('Inference Speed (rows per second)')
Compare Evaluation Metrics¶
Now that optimize_for_inference() gives us better inference speed, what about the potential loss of accuracy?
Let's start with an evaluation on the whole test dataset.
metric = predictor.evaluate(test_data)
metric_trt = trt_predictor.evaluate(test_data)
clear_output()
metric_df = pd.DataFrame.from_dict({"PyTorch": metric, "TensorRT": metric_trt})
metric_df
The evaluation results are expected to be very close.
If there is any significant gap between the evaluation results, try disabling mixed precision by using the CUDA execution provider:
predictor.optimize_for_inference(providers=["CUDAExecutionProvider"])
See Execution Providers for a full list of providers.
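Putting this together, a minimal sketch that reloads a clean predictor (per the warning above) and optimizes it with the CUDA execution provider before re-evaluating:

# Reload a fresh predictor and optimize with full-precision CUDA execution,
# then check whether the evaluation gap disappears.
cuda_predictor = MultiModalPredictor.load(path=model_path)
cuda_predictor.optimize_for_inference(providers=["CUDAExecutionProvider"])
metric_cuda = cuda_predictor.evaluate(test_data)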
Other Examples¶
You may go to AutoMM Examples to explore other examples of AutoMM.
Customization¶
To learn how to customize AutoMM, refer to Customize AutoMM.