使用 ONNX Runtime generate() API 运行 Phi-3 vision 和 Phi-3.5 vision 模型

Phi-3 vision 和 Phi-3.5 vision 模型是小型但功能强大的多模态模型，允许您同时使用图像和文本来输出文本。它们可用于详细描述图像内容等场景。

onnxruntime-genai 0.5.1 及更高版本支持 Phi-3 vision 和 Phi-3.5 vision 模型。

您可以在此处下载模型

设置
选择您的平台
使用 DirectML 运行
使用 CUDA 运行
在 CPU 上运行

设置

安装 git 大文件系统扩展

HuggingFace 使用 git 进行版本控制。要下载 ONNX 模型，您需要安装 git lfs，如果尚未安装。
- Windows: winget install -e --id GitHub.GitLFS (如果您没有 winget，请从官方源下载并运行 exe 文件)
- Linux: apt-get install git-lfs
- MacOS: brew install git-lfs
然后运行 git lfs install
安装 HuggingFace CLI
```
pip install huggingface-hub[cli]
```

选择您的平台

注意：根据您的硬件，只需一个软件包和一个模型。也就是说，只需执行以下部分中的一个步骤。

使用 DirectML 运行

下载模型

huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include gpu/gpu-int4-rtn-block-32/* --local-dir .

此命令将模型下载到名为 gpu/gpu-int4-rtn-block-32 的文件夹中。

安装 generate() API
```
pip install onnxruntime-genai-directml
```

运行模型

使用 model-vision.py 运行模型。

curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
pip install pyreadline3
python model-vision.py -m gpu/gpu-int4-rtn-block-32 -e dml

输入图像文件路径和提示。模型将使用图像和提示来给出答案。

例如：牌子上写了什么？

coffee

The sign says 'DO NOT ENTER'.

使用 CUDA 运行

下载模型

huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include gpu/gpu-int4-rtn-block-32/* --local-dir .

此命令将模型下载到名为 gpu/gpu-int4-rtn-block-32 的文件夹中。

设置您的 CUDA 环境

安装 CUDA 工具包。

确保 CUDA_PATH 环境变量已设置为您的 CUDA 安装位置。
安装 generate() API

注意：此软件包使用 CUDA 12。要使用 CUDA 11，您需要从源代码构建和安装。
```
pip install onnxruntime-genai-cuda
```

运行模型

使用 model-vision.py 运行模型。

curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
pip install pyreadline3
python model-vision.py -m gpu/gpu-int4-rtn-block-32 -e cuda

输入图像文件路径和提示，模型将使用图像和提示来给出答案。

例如：描述这张图片

coffee

The image shows a cup of coffee with a latte art design on top. The coffee is a light brown color,
and the art is white with a leaf-like pattern. The cup is white and has a handle on one side.</s>

在 CPU 上运行

下载模型

huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .

此命令将模型下载到名为 cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 的文件夹中

为 CPU 安装 generate() API
```
pip install onnxruntime-genai
```

运行模型

使用 model-vision.py 运行模型。

curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/model-vision.py -o model-vision.py
pip install pyreadline3
python model-vision.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu

输入图像文件路径和提示，模型将使用图像和提示来给出答案。

例如：将此图片转换为 Markdown 格式

Excel table with cookie sales figures

| Product             | Qtr 1      | Qtr 2      | Grand Total |
|---------------------|------------|------------|-------------|
| Chocolade           | $744.60    | $162.56    | $907.16     |
| Gummibarchen        | $5,079.60  | $1,249.20  | $6,328.80   |
| Scottish Longbreads | $1,267.50  | $1,062.50  | $2,330.00   |
| Sir Rodney's Scones | $1,418.00  | $756.00    | $2,174.00   |
| Tarte au sucre      | $4,728.00  | $4,547.92  | $9,275.92   |
| Chocolate Biscuits  | $943.89    | $349.60    | $1,293.49   |
| Total               | $14,181.59 | $8,127.78  | $22,309.37  |

The table lists various products along with their sales figures for Qtr 1, Qtr 2, and the Grand Total.
The products include Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre,
and Chocolate Biscuits. The Grand Total column sums up the sales for each product across the two quarters.</s>