Preparing Vision Models for Windows ML

Windows ML (WinML) provides a simple API for working with machine learning (ML) models from Windows apps. WinML can read a model's metadata and generate helpful wrappers around its inputs. This post covers how a model must be structured for WinML to create these simple bindings.

Input Bindings

As mentioned, WinML can read metadata from the model. Here we are interested in the inputs and outputs that the model graph expects. A simple binding for a vision model looks like this:

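using System;
using System.Linq;
using Windows.AI.MachineLearning.Preview;
using Windows.Media;
using Windows.Storage;

// Load the model packaged with the app (inside an async method) and create a binding for it
var modelUri = new Uri("ms-appx:///Assets/model.onnx");
var modelFile = await StorageFile.GetFileFromApplicationUriAsync(modelUri);
var model = await LearningModelPreview.LoadModelFromStorageFileAsync(modelFile);
var modelBinding = new LearningModelBindingPreview(model);

// Get the metadata of the first input that WinML exposes as an image
var inputImage = (ImageVariableDescriptorPreview)model.Description.InputFeatures
    .FirstOrDefault(f => f.ModelFeatureKind == LearningModelFeatureKindPreview.Image);
if (inputImage == null)
    throw new InvalidOperationException("The model has no image input.");

// Wrap softwareBitmap (an existing SoftwareBitmap) in a VideoFrame and bind it to the model
modelBinding.Bind(inputImage.Name, VideoFrame.CreateWithSoftwareBitmap(softwareBitmap));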

inputImage (an ImageVariableDescriptorPreview) exposes several useful properties, including:

Name - Internal name for the input in the model graph
BitmapPixelFormat - Image format and number of channels
Height - Expected pixel height of input image
Width - Expected pixel width of input image

WinML exposes model inputs through several descriptor types that can easily be bound to a model, including ImageVariableDescriptorPreview, MapVariableDescriptorPreview, SequenceVariableDescriptorPreview and TensorVariableDescriptorPreview. WinML picks the descriptor automatically from the model metadata; for a vision model, the size and ordering of the input tensor determine whether it is exposed as an image or as a plain tensor.

Input Tensor

For WinML to expose the input as an ImageVariableDescriptorPreview, the dimensions of the input tensor must be in a specific order:

The input tensor must be four-dimensional, with its dimensions ordered as batch size (1), channels, height and width:

float[1, <RGB Channels>, <Pixels Height>, <Pixels Width>]
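To make this layout concrete, here is a small hypothetical helper (NchwLayout is not a WinML type) that checks whether a shape matches float[1, 3, height, width] and computes where element (n, c, y, x) sits in the flattened buffer. It assumes a 3-channel RGB input as in the shape above, and that you have already read the shape from the model, for example in Netron.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the [N, C, H, W] layout; not part of WinML.
public static class NchwLayout
{
    // True if the shape is four-dimensional with a batch size of 1 and the
    // expected number of channels in the second position (3 for RGB is assumed).
    public static bool IsSingleImageShape(IReadOnlyList<long> shape, long expectedChannels = 3)
    {
        return shape != null
            && shape.Count == 4
            && shape[0] == 1
            && shape[1] == expectedChannels;
    }

    // Flat index of element (n, c, y, x) in a row-major [N, C, H, W] buffer.
    // Each channel is a contiguous plane of height * width values.
    public static int Offset(int n, int c, int y, int x, int channels, int height, int width)
    {
        return ((n * channels + c) * height + y) * width + x;
    }
}
```

A channels-last shape such as [1, 224, 224, 3] fails the check because the second dimension is not the channel count, which mirrors why WinML stops treating such an input as an image.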

Netron, written by Lutz Roeder, is a great tool for inspecting your ONNX model. Inspect the model graph and make sure that the input tensor is in the correct format.

When I tested other dimension orders for the input tensor, WinML no longer exposed it as an ImageVariableDescriptorPreview. Models in those layouts can still be used, but you either need to rebuild the graph to use the format above or bind the input as a TensorVariableDescriptorPreview. The second option requires manually extracting the pixel values for each channel and binding them to the model.
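The manual route for a TensorVariableDescriptorPreview amounts to splitting interleaved pixels into one plane per channel. The sketch below is a hypothetical helper (PlanarTensor.FromBgra8 is not part of WinML). It assumes tightly packed BGRA8 rows with no stride padding, keeps the raw 0 to 255 values without any normalization your model may need, and leaves out the final Bind call, since how a float buffer is wrapped for binding depends on the WinML version you target.

```csharp
using System;

// Hypothetical helper: converts interleaved BGRA8 pixels (4 bytes per pixel, as in a
// SoftwareBitmap with BitmapPixelFormat.Bgra8) into a planar float buffer laid out
// as [1, 3, height, width]. Alpha is dropped. Rows are assumed to be tightly packed.
public static class PlanarTensor
{
    // channelOrder lists, for each output plane, the byte offset within a pixel to read
    // (0 = B, 1 = G, 2 = R). The default keeps B, G, R order; pass {2, 1, 0} for R, G, B.
    public static float[] FromBgra8(byte[] bgra, int width, int height, int[] channelOrder = null)
    {
        if (bgra == null) throw new ArgumentNullException(nameof(bgra));
        if (channelOrder == null) channelOrder = new[] { 0, 1, 2 };
        if (channelOrder.Length != 3)
            throw new ArgumentException("Expected three channels.", nameof(channelOrder));
        if (bgra.Length < width * height * 4)
            throw new ArgumentException("Buffer is smaller than width * height * 4 bytes.", nameof(bgra));

        int planeSize = width * height;
        var tensor = new float[3 * planeSize];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int pixel = y * width + x; // position within one channel plane
                int src = pixel * 4;       // first byte of this pixel in the BGRA buffer
                for (int c = 0; c < 3; c++)
                {
                    // Plane c holds this channel's value for every pixel.
                    tensor[c * planeSize + pixel] = bgra[src + channelOrder[c]];
                }
            }
        }
        return tensor;
    }
}
```

Whether the model expects B, G, R or R, G, B order, and whether values should be scaled, has to be checked against how the model was trained.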

Jourdan