ONNX Runtime JavaScript

Introduction

ONNX Runtime provides a JavaScript API so that neural network inference can be performed on the user's front end, directly in the browser.

In this blog post, I would like to quickly discuss the ONNX Runtime JavaScript API using an MNIST classifier as an example.

MNIST Classifier

The MNIST classifier uses the pre-trained MNIST model from the ONNX Model Zoo.
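The model expects a 1 x 1 x 28 x 28 grayscale float tensor as input, while the demo below collects the digit as a drawing on an HTML canvas. The following is a minimal, hypothetical preprocessing sketch of how such a drawing could be converted to the model input; the function name canvasToMnistInput and the use of the alpha channel are my own assumptions and not part of the original demo.

// A hypothetical preprocessing sketch (not part of the original demo):
// downscale the drawing canvas to 28 x 28 and convert the RGBA pixels to a
// flat grayscale array of length 784 that can be fed to runMnistInference().
function canvasToMnistInput(sourceCanvas)
{
    const offscreenCanvas = document.createElement('canvas');
    offscreenCanvas.width = 28;
    offscreenCanvas.height = 28;
    const context = offscreenCanvas.getContext('2d');
    // Downscale the full drawing canvas onto the 28 x 28 offscreen canvas.
    context.drawImage(sourceCanvas, 0, 0, 28, 28);
    const imageData = context.getImageData(0, 0, 28, 28);
    const inputDataArray = new Array(28 * 28);
    for (let i = 0; i < 28 * 28; ++i) {
        // Use the alpha channel as the stroke intensity in [0, 255].
        // The exact value range and channel the model expects should be
        // verified against the model zoo preprocessing notes.
        inputDataArray[i] = imageData.data[i * 4 + 3];
    }
    return inputDataArray;
}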


[Interactive demo: draw a digit on the canvas, with adjustable line width and color; the table below the canvas reports the predicted digit, the confidence, and the latency in milliseconds.]

ONNX Runtime JavaScript API

The basic usage of the ONNX Runtime JavaScript API is as follows.

<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.12.1/dist/ort.min.js"></script>
<script>
    const onnxModelURL = '/javascript/mnist/mnist-8.onnx';
    // Profiling shows that wasm is faster than webgl for small neural networks such as the one for MNIST.
    const sessionOption = { executionProviders: ['wasm', 'webgl'] };
    let inferenceSession;
    async function createInferenceSession(onnxModelURL, sessionOption)
    {
        try {
            inferenceSession = await ort.InferenceSession.create(onnxModelURL, sessionOption);
        } catch (e) {
            console.log(`failed to load ONNX model: ${e}.`);
        }
    }
    // Load the model and create the inference session only once.
    createInferenceSession(onnxModelURL, sessionOption);
    async function runMnistInference(inputDataArray, inferenceSession)
    {
        try {
            // Prepare the input. A tensor needs its corresponding TypedArray as data.
            const inputData = Float32Array.from(inputDataArray);
            const inputTensor = new ort.Tensor('float32', inputData, [1, 1, 28, 28]);
            // Prepare the feeds. Use the model input names as keys.
            const feeds = { Input3: inputTensor };
            // Feed the inputs and run the inference.
            const results = await inferenceSession.run(feeds);
            // Read the outputs using the model output names as keys.
            const outputData = results.Plus214_Output_0.data;
            return outputData;
        } catch (e) {
            console.log(`failed to run inference on ONNX model: ${e}.`);
        }
    }
</script>
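To display the predicted digit and the confidence as in the demo above, the raw output still has to be post-processed. The following is a minimal sketch, assuming the output contains 10 un-normalized class scores; the helper name softmaxArgmax is my own and not part of the ONNX Runtime API.

// A hypothetical post-processing sketch (not part of ONNX Runtime): convert
// the 10 raw class scores into a predicted digit and a softmax confidence.
function softmaxArgmax(outputData)
{
    const scores = Array.from(outputData);
    // Softmax with the max-subtraction trick for numerical stability.
    const maxScore = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - maxScore));
    const sumExps = exps.reduce((a, b) => a + b, 0);
    const probabilities = exps.map((e) => e / sumExps);
    const predictedDigit = probabilities.indexOf(Math.max(...probabilities));
    return { predictedDigit: predictedDigit, confidence: probabilities[predictedDigit] };
}

// Example usage (hypothetical):
// const outputData = await runMnistInference(inputDataArray, inferenceSession);
// const { predictedDigit, confidence } = softmaxArgmax(outputData);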

It is recommended to load the ONNX model and create the inference session only once, rather than once per inference. Selecting the executionProviders is also critical for the inference latency; as the profiling comment in the example notes, wasm is faster than webgl for small neural networks such as the MNIST model.
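To decide which execution providers to list, and in which order, it helps to measure the end-to-end latency for each provider. The following is a minimal timing sketch using performance.now(); the helper name measureAverageLatency is my own assumption, and it reuses the runMnistInference() function from the example above.

// A hypothetical latency-measurement sketch: time repeated inference runs for
// a session created with a single execution provider.
async function measureAverageLatency(onnxModelURL, executionProvider, inputDataArray, numRuns)
{
    const session = await ort.InferenceSession.create(onnxModelURL, { executionProviders: [executionProvider] });
    // Warm-up run so that one-time initialization cost is excluded.
    await runMnistInference(inputDataArray, session);
    const startTime = performance.now();
    for (let i = 0; i < numRuns; ++i) {
        await runMnistInference(inputDataArray, session);
    }
    const endTime = performance.now();
    return (endTime - startTime) / numRuns;
}

// Example usage (hypothetical): compare wasm and webgl with an all-zero input.
// const dummyInput = new Array(28 * 28).fill(0);
// console.log(`wasm: ${await measureAverageLatency(onnxModelURL, 'wasm', dummyInput, 100)} ms`);
// console.log(`webgl: ${await measureAverageLatency(onnxModelURL, 'webgl', dummyInput, 100)} ms`);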

Author: Lei Mao
Posted on: 11-28-2022
Updated on: 11-28-2022
