I/O Binding

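When you use a non-CPU execution provider, it is most efficient to place the inputs (and/or outputs) on the target device (abstracted by the execution provider in use) before executing the graph, that is, before calling Run(). If an input has not been copied to the target device, ORT copies it from the CPU as part of the Run() call. Likewise, if an output has not been pre-allocated on the device, ORT assumes the output is requested on the CPU and copies it back from the device as the last step of the Run() call. These copies are counted in the measured execution time of the graph, which can mislead users into thinking ORT is slow when most of the time is actually spent copying data.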

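To address this, we introduced the notion of IOBinding. The key idea is to copy the inputs to the device and pre-allocate the outputs on the device before calling Run(). IOBinding is available in all of our language bindings.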
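To make the cost difference concrete, here is a minimal toy model in plain C++. It does not use ONNX Runtime: ToyDevice, ToyBinding, make_binding, run_unbound, and run_bound are hypothetical names invented for this sketch, and the "device" is only a counter of host/device transfers. It illustrates the idea, under those assumptions, that staging the input once and reusing a pre-allocated output removes the per-run copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an accelerator. It only counts transfers; the buffers
// are ordinary host vectors. None of these types are ONNX Runtime APIs.
struct ToyDevice {
    std::size_t host_to_device_copies = 0;
    std::size_t device_to_host_copies = 0;

    std::vector<float> upload(const std::vector<float>& host) {
        ++host_to_device_copies;
        return host;  // pretend this buffer now lives in device memory
    }
    std::vector<float> download(const std::vector<float>& device_buf) {
        ++device_to_host_copies;
        return device_buf;  // pretend this buffer is back on the host
    }
};

// Toy "graph": doubles every element, touching device buffers only.
void execute_on_device(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = 2.0f * in[i];
}

// Unbound path: every run uploads the input and downloads the output.
std::vector<float> run_unbound(ToyDevice& dev, const std::vector<float>& host_in) {
    std::vector<float> dev_in = dev.upload(host_in);
    std::vector<float> dev_out(dev_in.size());
    execute_on_device(dev_in, dev_out);
    return dev.download(dev_out);
}

// Bound path: the input is staged once and the output is allocated once.
struct ToyBinding {
    std::vector<float> dev_in;
    std::vector<float> dev_out;
};

ToyBinding make_binding(ToyDevice& dev, const std::vector<float>& host_in) {
    ToyBinding binding;
    binding.dev_in = dev.upload(host_in);      // single host-to-device copy
    binding.dev_out.resize(host_in.size());    // pre-allocated output buffer
    return binding;
}

// Each bound run performs no transfers at all.
void run_bound(ToyBinding& binding) {
    execute_on_device(binding.dev_in, binding.dev_out);
}
```

In this model, ten calls to run_unbound perform ten uploads and ten downloads, while one make_binding followed by ten calls to run_bound performs a single upload, plus one download only if and when the result is needed on the host.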

The following code snippets demonstrate how to use this feature in each language.

C++

  Ort::Env env;
  Ort::Session session(env, model_path, session_options);
  Ort::IoBinding io_binding{session};
  auto input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 4);
  io_binding.BindInput("input1", input_tensor);
  Ort::MemoryInfo output_mem_info{"Cuda", OrtDeviceAllocator, 0,
                                  OrtMemTypeDefault};
  // Use this to bind output to a device when the shape is not known in advance. If the shape is known you can use the other overload of this function that takes an Ort::Value as input (IoBinding::BindOutput(const char* name, const Value& value)).
  // This internally calls the BindOutputToDevice C API.

  io_binding.BindOutput("output1", output_mem_info);
  session.Run(run_options, io_binding);

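Note that in the code sample above, the output tensor is not allocated before it is bound; instead, an Ort::MemoryInfo is bound as the output. This is an effective way to let the session allocate the tensor according to the shape it needs, and it works especially well for data-dependent or dynamic shapes, where the correct size is not known up front. However, if the output shape is known and the output tensor should be reused across runs, it is beneficial to bind an Ort::Value to the output instead. That tensor can be allocated with the session allocator or can wrap external memory. For more details, see the device tensor documentation.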

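 // Allocate the output tensor on the device described by output_mem_info.
 Ort::Allocator gpu_allocator(session, output_mem_info);
 auto output_value = Ort::Value::CreateTensor(
       gpu_allocator, output_shape.data(), output_shape.size(),
       ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16);
 // Bind the pre-allocated tensor itself (not the MemoryInfo) so it is reused.
 io_binding.BindOutput("output1", output_value);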

Python (see the Python API documentation)
C# (see OrtIoBindingAllocationTest.cs)