ONNX Runtime Custom Excel Functions for BERT NLP Tasks in JavaScript

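In this tutorial, we'll look at how to create custom Excel functions (ORT.Sentiment() and ORT.Question()) that use ONNX Runtime Web to run BERT NLP models, bringing deep learning to spreadsheet tasks. Inference runs locally, right inside Excel!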

Prerequisites

Node.js
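Office connected to a Microsoft 365 subscription (including Office on the web). If you don't already have Office, you can join the Microsoft 365 Developer Program to get a free, 90-day renewable Microsoft 365 subscription to use during development.
For more information, see the Office Add-ins tutorials.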

What are custom functions?

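Excel has many built-in functions you're probably familiar with, such as SUM(). Custom functions let you create new functions and add them to Excel by defining them in JavaScript as part of an add-in. You can then call them in Excel just like any built-in function.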
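In the Yeoman custom functions template, a custom function is an ordinary exported function annotated with a JSDoc `@customfunction` tag, and the build generates the function metadata from that comment. The `add` function below is a hypothetical illustration, not part of this project:

```typescript
/**
 * Adds two numbers (hypothetical example, not part of this project).
 * @customfunction
 * @param first First number
 * @param second Second number
 * @returns The sum of first and second.
 */
export function add(first: number, second: number): number {
  return first + second;
}
```

With the ORT namespace configured in manifest.xml, a cell could then call it as `=ORT.ADD(1, 2)`.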

Creating the custom functions project

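Now that we know what custom functions are, let's look at how to create functions that run a model locally: one that returns the sentiment of the text in a cell, and one that extracts information from a cell by asking a question and returning the answer to a cell.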

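If you'd like to follow along, clone the project we discuss in this blog. The project was created from the Yeoman CLI template project. Learn more about the base project in this quickstart.
Run the following commands to install the packages and build the project.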

npm install
npm run build

The following command runs the add-in in Excel on the web and sideloads it into the spreadsheet specified in the command.

# Command to run on the web.
# Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

Use the following command to run in the Excel desktop client.

# Command to run on desktop (Windows or Mac)
npm run start:desktop

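The first time you run the project, you'll see two prompts:
- The first asks you to enable Developer Mode. This is required to sideload add-ins.
- Next, when prompted, accept the certificate for the add-in service.
To access the custom functions, type =ORT.Sentiment("TEXT") or =ORT.Question("QUESTION","CONTEXT") into an empty cell and pass in your arguments.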

Now we're ready to dive into the code!

`manifest.xml` 文件

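The manifest.xml file specifies that all custom functions belong to the ORT namespace. You use this namespace to access the custom functions in Excel. Update the following values in manifest.xml to ORT.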

<bt:String id="Functions.Namespace" DefaultValue="ORT"/>
<ProviderName>ORT</ProviderName>

Learn more about manifest file configuration here.

`functions.ts` 文件

In the functions.ts file, we define the function names, parameters, logic, and return types.

Import the inferenceQuestion and inferenceSentiment functions at the top of functions.ts. (We'll cover the logic in these functions later in this tutorial.)

/* global console */
import { inferenceQuestion } from "./bert/inferenceQuestion";
import { inferenceSentiment } from "./bert/inferenceSentiment";

Next, add the sentiment and question functions.

/**
 * Returns the sentiment of a string.
 * @customfunction
 * @param text Text string
 * @returns sentiment string.
 */
export async function sentiment(text: string): Promise<string> {
  const result = await inferenceSentiment(text);
  // result[0] is the header row; result[1][0] is the top-scoring emotion.
  console.log(result[1][0]);
  return result[1][0].toString();
}
/**
 * Returns the answer to a question, extracted from the given context.
 * @customfunction
 * @param question Question string
 * @param context Context string
 * @returns answer string.
 */
export async function question(question: string, context: string): Promise<string> {
  const result = await inferenceQuestion(question, context);
  if (result.length > 0) {
    console.log(result[0].text);
    return result[0].text.toString();
  }
  return "Unable to find answer";
}
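The fallback branch in `question` can be exercised without loading a model by passing the inference step in as a parameter. This is a hypothetical refactoring for testing only; `AnswerLike`, `answerOrFallback`, and the stub are illustrative names, not part of the project:

```typescript
// Minimal shape of an answer; the project's Answer type may carry more fields.
interface AnswerLike {
  text: string;
}

// Hypothetical helper: same logic as `question`, but the inference call is injected.
async function answerOrFallback(
  infer: (question: string, context: string) => Promise<AnswerLike[]>,
  question: string,
  context: string
): Promise<string> {
  const result = await infer(question, context);
  // The best-ranked answer comes first; otherwise return a fixed message.
  return result.length > 0 ? result[0].text : "Unable to find answer";
}

// Stub standing in for the model: it "answers" only when the context mentions Paris.
const stubInfer = async (_question: string, context: string): Promise<AnswerLike[]> =>
  context.includes("Paris") ? [{ text: "Paris" }] : [];
```

In the add-in itself, `inferenceQuestion` would be passed as `infer`.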

`inferenceQuestion.ts` 文件

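The inferenceQuestion.ts file contains the logic that handles the question-answering BERT model. The model was created using this tutorial. We then used the ORT quantization tool to reduce the model's size. Learn more about quantization here.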

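First, import onnxruntime-web and the helper functions from question_answer.ts. question_answer.ts is an edited version of the TensorFlow example found here. You can find the edited version in this project's source code here.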

/* eslint-disable no-undef */
import * as ort from "onnxruntime-web";
import { create_model_input, Feature, getBestAnswers, Answer } from "./utils/question_answer";

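The inferenceQuestion function takes a question and a context and returns answers based on the inference results. First we set the path to the model. The path is configured in webpack.config.js with CopyWebpackPlugin, which copies the required assets into the dist folder at build time.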

export async function inferenceQuestion(question: string, context: string): Promise<Answer[]> {
  const model: string = "./bert-large-uncased-int8.onnx";

Now let's create the ONNX Runtime inference session and set its options. Learn more about all the SessionOptions here.

  // create session, set options
  const options: ort.InferenceSession.SessionOptions = {
    executionProviders: ["wasm"],
    // executionProviders: ['webgl']
    graphOptimizationLevel: "all",
  };
  console.log("Creating session");
  const session = await ort.InferenceSession.create(model, options);

Next, we encode question and context with the create_model_input function from question_answer.ts. This returns a Feature.

  // Get encoded ids from text tokenizer.
  const encoded: Feature = await create_model_input(question, context);
  console.log("encoded", encoded);

// Defined in question_answer.ts (shown for reference; not part of inferenceQuestion):
export interface Feature {
  input_ids: Array<any>;
  input_mask: Array<any>;
  segment_ids: Array<any>;
  origTokens: Token[];
  tokenToOrigMap: { [key: number]: number };
}

Now that we have the encoded Feature, we need to create arrays of BigInt values (input_ids, attention_mask, and token_type_ids) to build the ort.Tensor inputs.

  // Allocate arrays for the ids plus the [CLS] and [SEP] tokens
  const length = encoded.input_ids.length;
  const input_ids = new Array(length + 2);
  const attention_mask = new Array(length + 2);
  const token_type_ids = new Array(length + 2);

  // Start with [CLS] (id 101)
  input_ids[0] = BigInt(101);
  attention_mask[0] = BigInt(1);
  token_type_ids[0] = BigInt(0);
  // Copy encoded.input_ids as BigInt
  let i = 0;
  for (; i < length; i++) {
    input_ids[i + 1] = BigInt(encoded.input_ids[i]);
    attention_mask[i + 1] = BigInt(1);
    token_type_ids[i + 1] = BigInt(0);
  }
  // End with [SEP] (id 102)
  input_ids[i + 1] = BigInt(102);
  attention_mask[i + 1] = BigInt(1);
  token_type_ids[i + 1] = BigInt(0);

  console.log("arrays", input_ids, attention_mask, token_type_ids);
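The loop above wraps the tokenizer ids with the BERT special tokens, 101 (`[CLS]`) at the start and 102 (`[SEP]`) at the end (their ids in the standard uncased BERT vocabulary), with an attention mask of 1 and a segment id of 0 at every position. A compact sketch of the same construction, assuming (as the code above does) that `encoded.input_ids` does not already contain those tokens; `buildBertInputs` is a hypothetical helper name:

```typescript
// Ids of [CLS] and [SEP] in the standard uncased BERT vocabulary.
const CLS_ID = 101;
const SEP_ID = 102;

interface BertInputs {
  input_ids: BigInt64Array;
  attention_mask: BigInt64Array;
  token_type_ids: BigInt64Array;
}

// Hypothetical helper mirroring the loop above: [CLS] ids... [SEP],
// every position attended, every position in segment 0.
function buildBertInputs(ids: number[]): BertInputs {
  const withSpecial = [CLS_ID, ...ids, SEP_ID];
  const length = withSpecial.length; // ids.length + 2
  return {
    input_ids: BigInt64Array.from(withSpecial, (id) => BigInt(id)),
    attention_mask: new BigInt64Array(length).fill(BigInt(1)),
    token_type_ids: new BigInt64Array(length), // zero-initialized, i.e. segment 0
  };
}
```

Each of these typed arrays can be passed directly to `new ort.Tensor("int64", ..., [1, length])`.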

Create the ort.Tensor objects from the arrays.

  const sequence_length = input_ids.length;
  const input_ids_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(input_ids), [1, sequence_length]);
  const attention_mask_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(attention_mask), [1, sequence_length]);
  const token_type_ids_tensor: ort.Tensor = new ort.Tensor("int64", BigInt64Array.from(token_type_ids), [1, sequence_length]);

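We're ready to run inference! Here we create an OnnxValueMapType (the input object, keyed by the model's input names) and a FetchesType (the names of the outputs to return). You can pass a plain object and a string array without declaring these types, but the type annotations are helpful.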

  const model_input: ort.InferenceSession.OnnxValueMapType = {
    input_ids: input_ids_tensor,
    input_mask: attention_mask_tensor,
    segment_ids: token_type_ids_tensor,
  };
  const output_names: ort.InferenceSession.FetchesType = ["start_logits", "end_logits"];
  const output = await session.run(model_input, output_names);
  const result_length = output["start_logits"].data.length;

Next, loop over the results and build number arrays from the returned start_logits and end_logits.

  const start_logits: number[] = [];
  const end_logits: number[] = [];
  // Valid indices are 0 .. result_length - 1
  for (let i = 0; i < result_length; i++) {
    start_logits.push(Number(output["start_logits"].data[i]));
  }
  for (let i = 0; i < result_length; i++) {
    end_logits.push(Number(output["end_logits"].data[i]));
  }
  console.log("start_logits", start_logits);
  console.log("end_logits", end_logits);
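`Tensor.data` is a typed array of length `result_length`, so valid indices run from 0 to `result_length - 1`. `Array.from` performs the same conversion without manual bounds; `logitsToNumbers` is a hypothetical helper, and the `number | bigint` element type is an assumption meant to cover both float and int64 outputs:

```typescript
// Hypothetical helper: copy a tensor's flat data into a plain number[].
// Handles Float32Array (the usual type for logits) as well as BigInt64Array.
function logitsToNumbers(data: ArrayLike<number | bigint>): number[] {
  return Array.from(data, (v) => Number(v));
}
```

Assuming the model emits float32 logits, the loops above could become `const start_logits = logitsToNumbers(output["start_logits"].data as Float32Array);`, and likewise for `end_logits`.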

Finally, we call getBestAnswers from question_answer.ts. It takes the results and post-processes them to extract the answers from the inference output.

  const answers: Answer[] = getBestAnswers(
    start_logits,
    end_logits,
    encoded.origTokens,
    encoded.tokenToOrigMap,
    context
  );
  console.log("answers", answers);
  return answers;
}
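The project's `getBestAnswers` lives in question_answer.ts and is not reproduced here. As a rough illustration of this kind of post-processing, a common approach for extractive question answering scores every candidate span (start, end) as `start_logits[start] + end_logits[end]`, requires `start <= end` and a maximum span length, and keeps the highest-scoring spans. The sketch below works in token positions only; mapping tokens back to the original text (which the project does with `origTokens` and `tokenToOrigMap`) is omitted, and `bestSpans` and `maxAnswerLength` are hypothetical names:

```typescript
interface Span {
  start: number; // token index where the answer starts
  end: number; // token index where the answer ends (inclusive)
  score: number; // startLogits[start] + endLogits[end]
}

// Hypothetical sketch of extractive-QA span selection, not the project's getBestAnswers.
function bestSpans(
  startLogits: number[],
  endLogits: number[],
  topK = 1,
  maxAnswerLength = 30
): Span[] {
  const spans: Span[] = [];
  for (let start = 0; start < startLogits.length; start++) {
    // Only consider ends at or after start, within the length limit.
    const lastEnd = Math.min(endLogits.length - 1, start + maxAnswerLength - 1);
    for (let end = start; end <= lastEnd; end++) {
      spans.push({ start, end, score: startLogits[start] + endLogits[end] });
    }
  }
  // Highest combined logit first.
  spans.sort((a, b) => b.score - a.score);
  return spans.slice(0, topK);
}
```

Enumerating spans is O(n × maxAnswerLength), which is small for a single BERT input.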

The answers are then returned to the question function in functions.ts, which returns the resulting string to populate the Excel cell.

export async function question(question: string, context: string): Promise<string> {
  const result = await inferenceQuestion(question, context);
  if (result.length > 0) {
    console.log(result[0].text);
    return result[0].text.toString();
  }
  return "Unable to find answer";
}

Now you can run the following command to build the add-in and sideload it into your Excel spreadsheet!

# Command to run on the web.
# Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

That covers the ORT.Question() custom function. Next, let's look at how ORT.Sentiment() is implemented.

`inferenceSentiment.ts` 文件

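inferenceSentiment.ts contains the logic for running inference and getting the sentiment of the text in an Excel cell. The code here is adapted from this example. Let's dig into how this part works.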

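First, let's import the required packages. As you'll see in this tutorial, the bertProcessing functions create our model input. bert_tokenizer is a JavaScript tokenizer for BERT models. onnxruntime-web enables JavaScript inference in the browser.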

/* eslint-disable no-undef */
import * as bertProcessing from "./bertProcessing";
import * as ort from "onnxruntime-web";
import { EMOJIS } from "./emoji";
import { loadTokenizer } from "./bert_tokenizer";

Now let's load the quantized BERT model that has been fine-tuned for sentiment analysis, then create the ort.InferenceSession and its ort.InferenceSession.SessionOptions.

export async function inferenceSentiment(text: string) {
  // Set model path.
  const model: string = "./xtremedistill-go-emotion-int8.onnx";
  const options: ort.InferenceSession.SessionOptions = {
    executionProviders: ["wasm"],
    // executionProviders: ['webgl']
    graphOptimizationLevel: "all",
  };
  console.log("Creating session");
  const session = await ort.InferenceSession.create(model, options);

Next, we tokenize the text to create model_input, then pass it to session.run along with the output name output_0 to get the inference result.

  // Get encoded ids from text tokenizer.
  const tokenizer = loadTokenizer();
  const encoded = await tokenizer.then((t) => {
    return t.tokenize(text);
  });
  console.log("encoded", encoded);
  const model_input = await bertProcessing.create_model_input(encoded);
  console.log("run session");
  const output = await session.run(model_input, ["output_0"]);
  const outputResult = output["output_0"].data;
  console.log("outputResult", outputResult);
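The project's `bert_tokenizer` is loaded asynchronously and its internals are not shown here. BERT tokenizers generally use WordPiece: after lower-casing and splitting text into words and punctuation, each word is split greedily into the longest pieces found in the vocabulary, with continuation pieces prefixed by `##`. A simplified sketch with a toy vocabulary (the `wordPiece` function is hypothetical; real tokenizers also normalize accents, handle CJK characters, and cap word length):

```typescript
// Simplified greedy longest-match-first WordPiece, for illustration only.
function wordPiece(text: string, vocab: Map<string, number>, unkId: number): number[] {
  const ids: number[] = [];
  // Lower-case, then split into alphanumeric runs and single non-alphanumeric characters.
  const words = text.toLowerCase().match(/[a-z0-9]+|[^\sa-z0-9]/g) ?? [];
  for (const word of words) {
    const pieces: number[] = [];
    let start = 0;
    let ok = true;
    while (start < word.length) {
      let end = word.length;
      let found: number | undefined = undefined;
      // Shrink the candidate from the right until it is in the vocabulary.
      while (start < end) {
        const piece = (start > 0 ? "##" : "") + word.slice(start, end);
        found = vocab.get(piece);
        if (found !== undefined) break;
        end--;
      }
      if (found === undefined) {
        ok = false; // no piece matched: the whole word becomes the unknown token
        break;
      }
      pieces.push(found);
      start = end;
    }
    ids.push(...(ok ? pieces : [unkId]));
  }
  return ids;
}
```

With a vocabulary containing `un`, `##aff`, and `##able`, the word "unaffable" is split into those three pieces.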

Next, we parse the output to get the top results and map them to labels, scores, and emojis.

  let probs = [];
  for (let i = 0; i < outputResult.length; i++) {
    let sig = bertProcessing.sigmoid(outputResult[i]);
    probs.push(Math.floor(sig * 100));
  }
  console.log("probs", probs);
  const result = [];
  for (let i = 0; i < EMOJIS.length; i++) {
    result[i] = [EMOJIS[i], probs[i]];
  }
  result.sort(bertProcessing.sortResult);
  console.log(result);
  // Header row followed by the six highest-scoring emotions
  const result_list = [];
  result_list[0] = ["Emotion", "Score"];
  for (let i = 0; i < 6; i++) {
    result_list[i + 1] = result[i];
  }
  console.log(result_list);
  return result_list;
}
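The code applies a sigmoid to each output independently (rather than a softmax across all outputs), so every emotion gets its own 0 to 100 score. Assuming `bertProcessing.sigmoid` is the logistic function and `sortResult` orders rows by descending score (neither is shown in this tutorial), the post-processing reduces to the following sketch; `topEmotions` and the sample labels are hypothetical:

```typescript
// Logistic sigmoid: maps a logit to a probability in (0, 1).
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

type Row = [string, number];

// Hypothetical sketch of the post-processing in inferenceSentiment.
function topEmotions(logits: ArrayLike<number>, labels: string[], k = 6): (string | number)[][] {
  // Pair each label with its integer percentage score.
  const rows: Row[] = labels.map((label, i): Row => [label, Math.floor(sigmoid(logits[i]) * 100)]);
  // Highest score first (assumed to match bertProcessing.sortResult).
  rows.sort((a, b) => b[1] - a[1]);
  return [["Emotion", "Score"], ...rows.slice(0, k)];
}
```

The `sentiment` custom function then returns `result[1][0]`, the label in the first row after the header.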

result_list is returned to functions.ts, where sentiment reads the top result and returns it to the Excel cell.

export async function sentiment(text: string): Promise<string> {
  const result = await inferenceSentiment(text);
  console.log(result[1][0]);
  return result[1][0].toString();
}

Now you can run the following command to build the add-in and sideload it into your Excel spreadsheet!

# Command to run on the web.
# Replace "{url}" with the URL of an Excel document.
npm run start:web -- --document {url}

Conclusion

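Here we walked through the logic needed to create custom functions in an Excel add-in with JavaScript, using ONNX Runtime Web and open-source models. From here, you can take this logic and adapt it to your own model or use case. Be sure to check out the full source code, which includes the tokenizers and the pre- and post-processing needed to complete the tasks above.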
