Azure Execution Provider (Preview)

The Azure Execution Provider enables ONNX Runtime to invoke a remote Azure endpoint for inference. The endpoint must be deployed or otherwise available beforehand.

Since version 1.16, pluggable operators (for example, the OpenAIAudioToText and AzureTritonInvoker operators referenced below) are available from onnxruntime-extensions.

With these operators, the Azure Execution Provider supports two usage patterns: running an edge model and an Azure model in parallel, or merging them into a single hybrid model. Both are described under Usage below.

The Azure Execution Provider is in preview; all APIs and usage are subject to change.

Install

Since version 1.16, the Azure Execution Provider is included in the Python and NuGet packages by default.

Requirements

Since version 1.16, all Azure Execution Provider operators ship with the onnxruntime-extensions (>=v0.9.0) Python and NuGet packages. Make sure the correct onnxruntime-extensions package is installed before using the Azure Execution Provider.
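
As a quick sanity check (a minimal sketch, not part of the original documentation), the installed versions and the available execution providers can be inspected from Python:

import onnxruntime as ort
import onnxruntime_extensions as ortx

print(ort.__version__)                 # expect 1.16 or newer
print(ortx.__version__)                # expect 0.9.0 or newer (version attribute assumed present)
print(ort.get_available_providers())   # 'AzureExecutionProvider' should be listed in builds that include it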

Build

For build instructions, please see the build page.

Usage

Run the edge and Azure models in parallel

In this mode, two models run at the same time. The Azure model runs asynchronously through the RunAsync API, which is also available in Python and C#.

import os
import onnx
from onnx import helper, TensorProto
from onnxruntime_extensions import get_library_path
from onnxruntime import SessionOptions, InferenceSession
import numpy as np
import threading


# Generate the local model by:
# https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/whisper_e2e.py
def get_whisper_tiny():
    return '/onnxruntime-extensions/tutorials/whisper_onnx_tiny_en_fp32_e2e.onnx'


# Generate the azure model
def get_openai_audio_azure_model():
    auth_token = helper.make_tensor_value_info('auth_token', TensorProto.STRING, [1])
    model = helper.make_tensor_value_info('model_name', TensorProto.STRING, [1])
    response_format = helper.make_tensor_value_info('response_format', TensorProto.STRING, [-1])
    file = helper.make_tensor_value_info('file', TensorProto.UINT8, [-1])

    transcriptions = helper.make_tensor_value_info('transcriptions', TensorProto.STRING, [-1])

    invoker = helper.make_node('OpenAIAudioToText',
                               ['auth_token', 'model_name', 'response_format', 'file'],
                               ['transcriptions'],
                               domain='com.microsoft.extensions',
                               name='audio_invoker',
                               model_uri='https://api.openai.com/v1/audio/transcriptions',
                               audio_format='wav',
                               verbose=False)

    graph = helper.make_graph([invoker], 'graph', [auth_token, model, response_format, file], [transcriptions])
    model = helper.make_model(graph, ir_version=8,
                              opset_imports=[helper.make_operatorsetid('com.microsoft.extensions', 1)])
    model_name = 'openai_whisper_azure.onnx'
    onnx.save(model, model_name)
    return model_name


if __name__ == '__main__':
    sess_opt = SessionOptions()
    sess_opt.register_custom_ops_library(get_library_path())

    azure_model_path = get_openai_audio_azure_model()
    azure_model_sess = InferenceSession(azure_model_path,
        sess_opt, providers=['CPUExecutionProvider', 'AzureExecutionProvider'])  # load AzureEP

    with open('test16.wav', "rb") as _f:  # read raw audio data from a local wav file
        audio_stream = np.asarray(list(_f.read()), dtype=np.uint8)

    azure_model_inputs = {
        "auth_token": np.array([os.getenv('AUDIO', '')]),  # read auth from env variable
        "model_name": np.array(['whisper-1']),
        "response_format":  np.array(['text']),
        "file": audio_stream
    }


    class RunAsyncState:
        def __init__(self):
            self.__event = threading.Event()
            self.__outputs = None
            self.__err = ''

        def fill_outputs(self, outputs, err):
            self.__outputs = outputs
            self.__err = err
            self.__event.set()

        def get_outputs(self):
            if self.__err != '':
                raise Exception(self.__err)
            return self.__outputs

        def wait(self, sec):
            self.__event.wait(sec)


    def azureRunCallback(outputs: np.ndarray, state: RunAsyncState, err: str) -> None:
        state.fill_outputs(outputs, err)


    run_async_state = RunAsyncState()
    # infer azure model asynchronously
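    # run_async(output_names, input_feed, callback, user_data): the callback is invoked on a
    # worker thread once the remote call completes, leaving this thread free to run the edge model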
    azure_model_sess.run_async(None, azure_model_inputs, azureRunCallback, run_async_state)

    # at the same time, run the edge model locally
    edge_model_path = get_whisper_tiny()
    edge_model_sess = InferenceSession(edge_model_path,
        sess_opt, providers=['CPUExecutionProvider'])

    edge_model_outputs = edge_model_sess.run(None, {
        'audio_stream': np.expand_dims(audio_stream, 0),
        'max_length': np.asarray([200], dtype=np.int32),
        'min_length': np.asarray([0], dtype=np.int32),
        'num_beams': np.asarray([2], dtype=np.int32),
        'num_return_sequences': np.asarray([1], dtype=np.int32),
        'length_penalty': np.asarray([1.0], dtype=np.float32),
        'repetition_penalty': np.asarray([1.0], dtype=np.float32)
    })

    print("\noutput from whisper tiny: ", edge_model_outputs)
    run_async_state.wait(10)
    print("\nresponse from openAI: ", run_async_state.get_outputs())
    # compare results and pick the better

Merge and run as a hybrid model

Alternatively, the local model and the Azure model can be merged into a single hybrid model, which can then be run like an ordinary ONNX model. A sample script can be found here.
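
As a rough illustration of this pattern (a sketch only, not the referenced sample script), a pre-merged hybrid model can be loaded and run synchronously like any other ONNX model. The file name hybrid_whisper.onnx and the input names below are placeholders; the actual interface depends on how the models were merged.

import os
import numpy as np
from onnxruntime import InferenceSession, SessionOptions
from onnxruntime_extensions import get_library_path

sess_opt = SessionOptions()
# the Azure operators (e.g. OpenAIAudioToText) are implemented in onnxruntime-extensions
sess_opt.register_custom_ops_library(get_library_path())

# 'hybrid_whisper.onnx' stands in for a model produced by the merge script
sess = InferenceSession('hybrid_whisper.onnx', sess_opt,
                        providers=['CPUExecutionProvider', 'AzureExecutionProvider'])

with open('test16.wav', 'rb') as f:  # read raw audio data from a local wav file
    audio_stream = np.asarray(list(f.read()), dtype=np.uint8)

# input names are hypothetical; a real hybrid model defines its own inputs and outputs
outputs = sess.run(None, {
    'auth_token': np.array([os.getenv('AUDIO', '')]),
    'audio_stream': audio_stream,
})
print(outputs)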

Current limitations

  • Builds and runs only on Windows, Linux, and Android.
  • On Android, the AzureTritonInvoker operator is not supported.