The PyTorch to ONNX conversion and the subsequent ONNX inference often require saving and loading ONNX files on the hard drive. With streams, sometimes referred to as file-like objects, it is possible to save and load ONNX models entirely in memory, which is significantly faster than performing the same operations on the hard drive.
In this blog post, I would like to discuss ONNX IO with streams.
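As a quick illustration of what "file-like" means, io.BytesIO from the Python standard library supports the usual read, write, and seek file APIs while keeping all the data in memory. This small demo is my own, not from the ONNX tooling:

import io

# io.BytesIO behaves like a binary file opened on disk,
# but the data never leaves memory.
f = io.BytesIO()
f.write(b"hello onnx")
f.seek(0)
assert f.read() == b"hello onnx"
assert f.getvalue() == b"hello onnx"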
ONNX IO With Streams
The following example of PyTorch to ONNX export and ONNX Runtime inference does not require any interaction with the hard drive.
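The snippet below assumes some earlier setup that is not shown in this excerpt: a PyTorch model exported into an in-memory io.BytesIO buffer f, which is then parsed into model_proto. A minimal sketch of that setup, assuming a toy linear model, an input_shape of my choosing, and the conventional import aliases rt and np, could look like this:

import io

import numpy as np
import onnx
import onnxruntime as rt
import torch


class Model(torch.nn.Module):
    # A toy model; any exportable torch.nn.Module works here.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)


input_shape = (1, 8)
model = Model().eval()
dummy_input = torch.randn(*input_shape)

# Export the PyTorch model into an in-memory stream instead of a file.
f = io.BytesIO()
torch.onnx.export(model, dummy_input, f)

# Parse the ONNX model directly from the stream content.
model_proto = onnx.load_model_from_string(f.getvalue())
onnx.checker.check_model(model_proto)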
# Use ONNX _serialize to get binary string from ONNX model.
model_proto_bytes = onnx._serialize(model_proto)
assert model_proto_bytes == f.getvalue()

# Use ONNX _deserialize to get ONNX model from binary string.
model_proto_from_deserialization = onnx._deserialize(
    model_proto_bytes, onnx.ModelProto())
assert model_proto == model_proto_from_deserialization

# Run ONNX Runtime.
# InferenceSession could also take bytes.
inference_session = rt.InferenceSession(model_proto_bytes)
onnxruntime_random_input = np.random.randn(*input_shape).astype(np.float32)
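The excerpt stops just before the actual forward pass. A sketch of that last step follows; the input-name lookup is my addition, and note that recent onnxruntime releases may require an explicit providers argument when constructing the InferenceSession:

# Look up the graph's input name and run inference on the random input.
# With newer onnxruntime versions, the session may need to be built as
# rt.InferenceSession(model_proto_bytes, providers=["CPUExecutionProvider"]).
input_name = inference_session.get_inputs()[0].name
onnxruntime_outputs = inference_session.run(
    None, {input_name: onnxruntime_random_input})

From export to inference, no ONNX file ever touches the hard drive.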