Get started with ORT for C#

Install the NuGet packages with the .NET CLI

dotnet add package Microsoft.ML.OnnxRuntime --version 1.16.0
dotnet add package System.Numerics.Tensors --version 0.1.0

Import the libraries

using Microsoft.ML.OnnxRuntime;
using System.Numerics.Tensors;

Create a method for inference

This is an Azure Function example that uses ORT with C# for inference on an NLP model created with SciKit Learn. The snippet shows only the function body; the usual Azure Functions usings (e.g. Microsoft.AspNetCore.Mvc, Microsoft.Extensions.Logging, and Newtonsoft.Json) are assumed in addition to the ORT usings above.

 public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
            ILogger log, ExecutionContext context)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");

            string review = req.Query["review"];

            string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
            dynamic data = JsonConvert.DeserializeObject(requestBody);
            review ??= data?.review;
            Debug.Assert(!string.IsNullOrEmpty(review), "Expecting a string with content");

            // Get path to model to create inference session.
            const string modelPath = "./model.onnx";

            // Create an InferenceSession from the Model Path.
            // Creating and loading a session is expensive per request,
            // so sessions should be cached where possible (see the sketch
            // after this example).
            using var session = new InferenceSession(modelPath);

            // create input tensor (nlp example)
            using var inputOrtValue = OrtValue.CreateTensorWithEmptyStrings(OrtAllocator.DefaultInstance, new long[] { 1, 1 });
            inputOrtValue.StringTensorSetElementAt(review, 0);

            // Create input data for session. Request all outputs in this case.
            var inputs = new Dictionary<string, OrtValue>
            {
                { "input", inputOrtValue }
            };

            using var runOptions = new RunOptions();

            // The model outputs a sequence of maps; we are only interested
            // in the first map of that sequence.
            using var outputs = session.Run(runOptions, inputs, session.OutputNames);
            Debug.Assert(outputs.Count > 0, "Expecting some output");

            // We want the last output, which is the sequence of maps
            var lastOutput = outputs[outputs.Count - 1];

            // Optional code to check the output type
            {
                var outputTypeInfo = lastOutput.GetTypeInfo();
                Debug.Assert(outputTypeInfo.OnnxType == OnnxValueType.ONNX_TYPE_SEQUENCE, "Expecting a sequence");

                var sequenceTypeInfo = outputTypeInfo.SequenceTypeInfo;
                Debug.Assert(sequenceTypeInfo.ElementType.OnnxType == OnnxValueType.ONNX_TYPE_MAP, "Expecting a sequence of maps");
            }

            var elementsNum = lastOutput.GetValueCount();
            Debug.Assert(elementsNum > 0, "Expecting a non-empty sequence");

            // Get the first map in sequence
            using var firstMap = lastOutput.GetValue(0, OrtAllocator.DefaultInstance);

            // Optional code just checking
            {
                // Maps always have two elements, keys and values
                // We are expecting this to be a map of strings to floats
                var mapTypeInfo = firstMap.GetTypeInfo().MapTypeInfo;
                Debug.Assert(mapTypeInfo.KeyType == TensorElementType.String, "Expecting keys to be strings");
                Debug.Assert(mapTypeInfo.ValueType.OnnxType == OnnxValueType.ONNX_TYPE_TENSOR, "Values are in the tensor");
                Debug.Assert(mapTypeInfo.ValueType.TensorTypeAndShapeInfo.ElementDataType == TensorElementType.Float, "Result map value is float");
            }

            var inferenceResult = new Dictionary<string, float>();
            // Let's use the visitor to read map keys and values
            // Here keys and values are represented with the same number of corresponding entries
            // string -> float
            firstMap.ProcessMap((keys, values) => {
                // Access native buffer directly
                var valuesSpan = values.GetTensorDataAsSpan<float>();

                var entryCount = (int)keys.GetTensorTypeAndShape().ElementCount;
                inferenceResult.EnsureCapacity(entryCount);
                for (int i = 0; i < entryCount; ++i)
                {
                    inferenceResult.Add(keys.GetStringElement(i), valuesSpan[i]);
                }
            }, OrtAllocator.DefaultInstance);


            // Return the inference result as json.
            return new JsonResult(inferenceResult);

        }
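
The comments above note that creating and loading the InferenceSession on every request is expensive. Below is a minimal sketch of one common way to cache it across invocations; the class name, the Lazy<T> wrapper, and the model path are illustrative assumptions rather than part of the sample above.

using System;
using Microsoft.ML.OnnxRuntime;

public static class InferencePipeline
{
    // Create one session lazily on first use and share it across requests.
    // InferenceSession.Run() is safe to call from multiple threads.
    private static readonly Lazy<InferenceSession> CachedSession =
        new Lazy<InferenceSession>(() => new InferenceSession("./model.onnx"));

    public static InferenceSession Session => CachedSession.Value;
}

The function body can then read InferencePipeline.Session instead of constructing a new InferenceSession per request; the using declaration on the session is dropped, since the cached session must outlive any single request.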

Reuse input/output tensor buffers

In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain two models (i.e. feed the output of one model as input to another) or want to speed up inference across multiple runs.

Chaining: Feed model A's output as model B's input

using Microsoft.ML.OnnxRuntime.Tensors;
using Microsoft.ML.OnnxRuntime;

namespace Samples
{
    class FeedModelAToModelB
    {
        static void Main()
        {
            const string modelAPath = "./modelA.onnx";
            const string modelBPath = "./modelB.onnx";
            using InferenceSession session1 = new InferenceSession(modelAPath);
            using InferenceSession session2 = new InferenceSession(modelBPath);

            // Illustration only
            float[] inputData = { 1, 2, 3, 4 };
            long[] inputShape = { 1, 4 };

            using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);

            // Create input data for session. Request all outputs in this case.
            var inputs1 = new Dictionary<string, OrtValue>
            {
                { "input", inputOrtValue }
            };

            using var runOptions = new RunOptions();

            // session1 inference
            using (var outputs1 = session1.Run(runOptions, inputs1, session1.OutputNames))
            {
                // get intermediate value
                var outputToFeed = outputs1.First();

                // Feed the intermediate value to session2 under the input
                // name that model B expects
                var inputs2 = new Dictionary<string, OrtValue>
                {
                    { "inputNameForModelB", outputToFeed }
                };

                // session2 inference
                using (var results = session2.Run(runOptions, inputs2, session2.OutputNames))
                {
                    // manipulate the results
                }
            }
        }
    }
}

Multiple inference runs with fixed-size inputs and outputs

If the model has fixed-sized numeric tensor inputs and outputs, use the preferred OrtValue class and its API to speed up inference and minimize data transfer. The OrtValue class makes it possible to reuse the underlying buffers for the input and output tensors: it pins the managed buffers and uses them directly for inference, and it gives direct access to the native buffers for outputs. You can also preallocate an OrtValue for the output, or create it on top of an existing buffer. This avoids some overhead, which can be beneficial for smaller models where such time is noticeable in the overall run time.
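
As an illustration, here is a minimal sketch of such a reuse loop. The model path, the input/output names "input"/"output", and the shapes are assumptions made for this example, not values taken from the samples above.

using System;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class ReuseBuffersSample
{
    static void Main()
    {
        // Assumed model: one float input "input" of shape [1, 4]
        // and one float output "output" of shape [1, 2].
        using var session = new InferenceSession("./model.onnx");
        using var runOptions = new RunOptions();

        float[] inputData = new float[4];
        long[] inputShape = { 1, 4 };

        // Pins inputData once; the same OrtValue is fed to every run.
        using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);

        // Preallocate the output once; Run() writes into it on each call.
        using var outputOrtValue = OrtValue.CreateAllocatedTensorValue(
            OrtAllocator.DefaultInstance, TensorElementType.Float, new long[] { 1, 2 });

        string[] inputNames = { "input" };
        string[] outputNames = { "output" };

        for (int run = 0; run < 10; ++run)
        {
            // Refill the pinned managed buffer in place;
            // nothing is reallocated between runs.
            inputData[0] = run;

            session.Run(runOptions, inputNames, new[] { inputOrtValue },
                        outputNames, new[] { outputOrtValue });

            // Read the result directly from the native output buffer.
            ReadOnlySpan<float> result = outputOrtValue.GetTensorDataAsSpan<float>();
        }
    }
}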

Keep in mind that the OrtValue class, like many other classes in the ONNX Runtime C# API, is IDisposable. It needs to be properly disposed of to either unpin the managed buffers or release the native buffers, in order to avoid memory leaks.

Running on GPU (Optional)

If using the GPU package, simply use the appropriate SessionOptions when creating the InferenceSession.

int gpuDeviceId = 0; // The GPU device ID to execute on
using var gpuSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId);
using var session = new InferenceSession("model.onnx", gpuSessionOptions);
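
Similarly, with the Microsoft.ML.OnnxRuntime.DirectML package, the DirectML execution provider can be appended to the session options instead. A sketch, assuming DirectML device 0:

using var dmlSessionOptions = new SessionOptions();
dmlSessionOptions.AppendExecutionProvider_DML(0); // The DirectML device ID to execute on
using var session = new InferenceSession("model.onnx", dmlSessionOptions);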

ONNX Runtime C# API

ONNX Runtime provides C# .NET bindings for running inference on ONNX models on any .NET Standard platform.

Supported Versions

.NET Standard 1.1

Builds

| Artifact | Description | Supported Platforms |
|----------|-------------|---------------------|
| Microsoft.ML.OnnxRuntime | CPU (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)... more details: compatibility |
| Microsoft.ML.OnnxRuntime.Gpu | GPU - CUDA (Release) | Windows, Linux, Mac, X64... more details: compatibility |
| Microsoft.ML.OnnxRuntime.DirectML | GPU - DirectML (Release) | Windows 10 1709+ |
| onnxruntime | CPU, GPU (Dev), CPU (On-Device Training) | Same as Release versions |
| Microsoft.ML.OnnxRuntime.Training | CPU On-Device Training (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)... more details: compatibility |

API Reference

C# API Reference

Samples

See Tutorials: Basics - C#

Learn More