Windows ML (WinML) provides a simple managed API for interfacing with machine learning (ML) models. WinML can read the model metadata and add helpful wrappers around the inputs. This post covers how a model needs to be structured for WinML to enable these simple bindings.
As mentioned, WinML can read metadata from the model. We are interested in the expected inputs and outputs from the model graph. A simple binding for a vision model would look like the following:
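```csharp
using System;
using System.Linq;
using Windows.AI.MachineLearning.Preview;
using Windows.Media;
using Windows.Storage;

// Instantiate the model from a file packaged with the app
var modelUri = new Uri("ms-appx:///Assets/model.onnx");
var modelFile = await StorageFile.GetFileFromApplicationUriAsync(modelUri);
var model = await LearningModelPreview.LoadModelFromStorageFileAsync(modelFile);
var modelBinding = new LearningModelBindingPreview(model);

// Get the metadata of the first input that WinML recognized as an image
var inputImage = (ImageVariableDescriptorPreview)model.Description.InputFeatures
    .FirstOrDefault(f => f.ModelFeatureKind == LearningModelFeatureKindPreview.Image);
if (inputImage == null)
    throw new InvalidOperationException("The model has no image input.");

// Convert the SoftwareBitmap to a VideoFrame and bind it to the input by name
// (softwareBitmap is the SoftwareBitmap you want to evaluate)
modelBinding.Bind(inputImage.Name, VideoFrame.CreateWithSoftwareBitmap(softwareBitmap));
```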
The inputImage descriptor (an ImageVariableDescriptorPreview) exposes several useful properties, including:
- Name - Internal name for the input in the model graph
- BitmapPixelFormat - Image format and number of channels
- Height - Expected pixel height of input image
- Width - Expected pixel width of input image
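The Width and Height properties tell you what size the bound image is expected to have. As a minimal sketch, assuming you copy those two values into plain unsigned integers, a hypothetical helper (InputSizeCheck is an illustrative name, not part of WinML) could check a source image against them and compute the scale factors needed if you resize the image yourself:

```csharp
using System;

// Hypothetical helper: compares a source image's size with the expected Width and Height
// read from the input descriptor. Plain uint values keep the sketch free of WinML types.
public static class InputSizeCheck
{
    // True when the source image already has the expected dimensions.
    public static bool Matches(uint srcWidth, uint srcHeight, uint expectedWidth, uint expectedHeight)
        => srcWidth == expectedWidth && srcHeight == expectedHeight;

    // Horizontal and vertical scale factors (expected / source); 1.0 means no resize on that axis.
    public static (double X, double Y) ScaleFactors(uint srcWidth, uint srcHeight, uint expectedWidth, uint expectedHeight)
    {
        if (srcWidth == 0 || srcHeight == 0)
            throw new ArgumentException("Source dimensions must be non-zero.");
        return ((double)expectedWidth / srcWidth, (double)expectedHeight / srcHeight);
    }
}
```

Keeping the X and Y factors separate makes it visible when an aspect-ratio change would distort the image, so you can decide whether to crop before resizing.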
WinML exposes several input descriptor types that can be bound to a model, including ImageVariableDescriptorPreview, MapVariableDescriptorPreview, SequenceVariableDescriptorPreview and TensorVariableDescriptorPreview. WinML chooses the descriptor automatically when it loads the model; for image inputs, that choice depends on the size and ordering of the input tensor's dimensions.
For WinML to expose an input as an ImageVariableDescriptorPreview, the input tensor must follow a specific layout. It must be four-dimensional, with the dimensions ordered as batch size (1), channels (3 for RGB), height in pixels and width in pixels, a layout commonly called NCHW:
float[1, <RGB Channels>, <Pixels Height>, <Pixels Width>]
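To make the layout concrete, here is a minimal sketch that tests whether a shape has the [1, C, H, W] form and computes where element (c, y, x) sits in the flattened buffer. NchwLayout and its methods are hypothetical names; the check only restates the rule above and is not how WinML itself performs detection.

```csharp
using System;

// Hypothetical helpers illustrating the float[1, C, H, W] (NCHW) layout.
// This is a restatement of the rule, not WinML's actual detection logic.
public static class NchwLayout
{
    // True if the shape is four-dimensional, has a batch size of 1,
    // the expected channel count, and positive height and width.
    public static bool LooksLikeImageInput(long[] shape, long expectedChannels = 3)
    {
        return shape.Length == 4
            && shape[0] == 1
            && shape[1] == expectedChannels
            && shape[2] > 0
            && shape[3] > 0;
    }

    // Flat index of element (c, y, x) in a [1, C, H, W] buffer:
    // each channel is a contiguous H*W plane, and each row within a plane is W long.
    public static long Index(long c, long y, long x, long height, long width)
    {
        return c * height * width + y * width + x;
    }
}
```

Because each channel occupies its own contiguous plane, a channels-last shape such as [1, 224, 224, 3] fails the check, and interleaved pixel data cannot be copied into this layout byte for byte.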
I tested other dimension orderings of the input tensor, and WinML no longer detected the input as an ImageVariableDescriptorPreview. Models with those layouts can still be used, but you either need to rebuild the graph to use the format above, or bind the input as a TensorVariableDescriptorPreview. The second option requires manually extracting the pixel values for each channel and binding them to the model.
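The manual process for a TensorVariableDescriptorPreview input mostly comes down to reordering pixel data. The sketch below assumes the image is available as a tightly packed BGRA8 byte array (4 bytes per pixel, no row padding) and that the model wants RGB channel order with raw 0 to 255 values; check your model for its actual channel order and normalization. BgraToNchw is a hypothetical helper name. The resulting float array would then be wrapped in a tensor value and bound under the input's Name; that call is not shown because it depends on the WinML version you target.

```csharp
using System;

// Hypothetical helper: converts interleaved BGRA8 bytes into a planar
// float[1, 3, H, W] buffer in RGB order, dropping the alpha channel.
// Assumes stride == width * 4 and leaves values in the 0-255 range.
public static class BgraToNchw
{
    public static float[] Convert(byte[] bgra, int width, int height)
    {
        if (bgra.Length != width * height * 4)
            throw new ArgumentException("Expected width * height * 4 bytes of BGRA8 data.");

        int plane = width * height;          // elements per channel plane
        var tensor = new float[3 * plane];   // [1, 3, H, W], flattened

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int pixel = y * width + x;   // position within one plane
                int src = pixel * 4;         // first byte (B) of this pixel
                tensor[0 * plane + pixel] = bgra[src + 2]; // R plane
                tensor[1 * plane + pixel] = bgra[src + 1]; // G plane
                tensor[2 * plane + pixel] = bgra[src + 0]; // B plane
                // bgra[src + 3] is alpha and is not copied
            }
        }
        return tensor;
    }
}
```

If your model expects BGR order or normalized values (for example 0 to 1), only the three assignment lines need to change.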