Performing Model Inference
The Matrix framework provides the AIModelManager class to implement model loading and inference. For details, see the Matrix API Reference.
Model Inference Initialization
- Set the model path in the graph configuration file of the custom inference model engine (add items to ai_config and set the model path on the host).
- Use the Matrix framework to transfer the model file to the device.
- Parse the custom items in the custom engine to obtain the path of the model on the device side.
- Call AIModelManager::Init() to complete the initialization. The implementation is as follows:
/* Define member variables of the custom engine. */
std::shared_ptr<hiai::AIModelManager> modelManager;

/* Initialize AIModelManager in the Init function of the custom engine. */
std::vector<hiai::AIModelDescription> model_desc_vec;
hiai::AIModelDescription model_desc_;
......
/* Parse the model path from the ai_config structure in the graph configuration file. */
model_desc_.set_path(model_path); // Set the model path.
model_desc_vec.push_back(model_desc_);
// If no extra configuration is required, pass the config argument received by Engine::Init.
ret = modelManager->Init(config, model_desc_vec);
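The parsing step elided above can be implemented, for example, by iterating over the ai_config items received by Engine::Init. This is only a sketch: the item name "model_path" and the items_size()/items() accessors on AIConfig are assumptions; use the item name you defined in your graph configuration file.
/* Sketch: read the device-side model path from the AIConfig received by Engine::Init.
   Assumes the graph configuration defines an ai_config item named "model_path" and
   that AIConfig exposes its entries through items_size()/items(). */
std::string model_path;
for (int i = 0; i < config.items_size(); ++i) {
    const hiai::AIConfigItem& item = config.items(i);
    if (item.name() == "model_path") {
        model_path = item.value();  // Path of the model file on the device.
        break;
    }
}
model_desc_.set_path(model_path);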
Setting the Input and Output of Model Inference
- The Matrix framework defines the IAITensor class to manage the input and output tensors of model inference. For ease of use, it also derives AISimpleTensor and AINeuralNetworkBuffer from IAITensor.
- Memory for the model inference input and output is allocated by calling HIAI_DMalloc, which reduces memory copies.
- Although the Matrix framework can automatically free the memory managed by AISimpleTensor, you are advised to allocate and free the memory yourself to prevent memory leaks or double freeing (see the sketch after this list).
- If AIPP functions such as image cropping, format conversion, and image normalization are enabled during model conversion, the input data must be processed by the AIPP module before it is used for model inference.
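As a minimal sketch of the memory management advised above: the buffer is allocated with HIAI_DMalloc, attached to an AISimpleTensor, and released explicitly once the inference results have been consumed. The HIAI_DFree call and the inputSize variable are assumptions for illustration; check the Matrix API Reference for the exact allocation/free pair in your version.
/* Sketch: allocate the input buffer yourself and free it exactly once after the
   inference results have been consumed. HIAI_DFree is assumed to be the counterpart
   of HIAI_DMalloc; inputSize is the byte length of your preprocessed input. */
uint8_t* inputBuf = (uint8_t*)HIAI_DMalloc(inputSize);
if (inputBuf == nullptr) {
    return HIAI_ERROR;  // Allocation failed.
}
// ... fill inputBuf with the preprocessed input data ...

std::shared_ptr<hiai::AISimpleTensor> inputTensor = std::make_shared<hiai::AISimpleTensor>();
inputTensor->SetBuffer(inputBuf, inputSize);
inputTensorVec.push_back(inputTensor);

/* ... run inference and consume the outputs ... */

HIAI_DFree(inputBuf);  // Free exactly once, after the buffer is no longer needed.
inputBuf = nullptr;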
Input and Output Implementation
The code for input and output implementation is as follows:
/* Obtain the descriptions of the input and output tensors of the inference model. */
std::vector<hiai::TensorDimension> inputTensorDims;
std::vector<hiai::TensorDimension> outputTensorDims;
ret = modelManager->GetModelIOTensorDim(modelName, inputTensorDims, outputTensorDims);

/* Set the input. If there are multiple inputs, create and set them in sequence. */
std::shared_ptr<hiai::AISimpleTensor> inputTensor = std::shared_ptr<hiai::AISimpleTensor>(new hiai::AISimpleTensor());
inputTensor->SetBuffer(<memory address of the input data>, <length of the input data>);
inputTensorVec.push_back(inputTensor);

/* Set the output. */
for (uint32_t index = 0; index < outputTensorDims.size(); index++) {
    hiai::AITensorDescription outputTensorDesc = hiai::AINeuralNetworkBuffer::GetDescription();
    uint8_t* buf = (uint8_t*)HIAI_DMalloc(outputTensorDims[index].size);
    ......
    std::shared_ptr<hiai::IAITensor> outputTensor =
        hiai::AITensorFactory::GetInstance()->CreateTensor(outputTensorDesc, buf, outputTensorDims[index].size);
    outputTensorVec.push_back(outputTensor);
}
Model Inference
- The Matrix framework supports both synchronous and asynchronous inference. Synchronous inference is used by default. To implement asynchronous inference, set the corresponding AIContext configuration item and register a callback function.
/* Model inference */
hiai::AIContext aiContext;
HIAI_StatusT ret = modelManager->Process(aiContext, inputTensorVec, outputTensorVec, 0);
- If the AIModelManager object loads multiple models, set the AIContext configuration item to specify model parameters (such as the model name defined in the initialization phase), as sketched below. For details, see "Offline Model Manager" in the Matrix API Reference.
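As an illustration of selecting one of several loaded models, the following sketch sets a model name on the AIContext before calling Process(). The AddPara interface and the "model_name" key are assumptions; confirm both in "Offline Model Manager" in the Matrix API Reference.
/* Sketch: select one of several loaded models before inference.
   AIContext::AddPara and the "model_name" key are assumed; verify them against
   the Matrix API Reference for your version. */
hiai::AIContext aiContext;
aiContext.AddPara("model_name", modelName);  // modelName: the name given to the model at initialization.
HIAI_StatusT ret = modelManager->Process(aiContext, inputTensorVec, outputTensorVec, 0);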
Model Inference Post-Processing
The model inference result is stored in an IAITensor object as a memory buffer plus description information. You need to parse the buffer into valid output based on the actual output format (data type and data layout) of the model.
/* Parse the inference result. */
for (uint32_t index = 0; index < outputTensorVec.size(); index++) {
    std::shared_ptr<hiai::AINeuralNetworkBuffer> resultTensor =
        std::static_pointer_cast<hiai::AINeuralNetworkBuffer>(outputTensorVec[index]);
    // resultTensor->GetNumber()  -- N
    // resultTensor->GetChannel() -- C
    // resultTensor->GetHeight()  -- H
    // resultTensor->GetWidth()   -- W
    // resultTensor->GetSize()    -- memory size
    // resultTensor->GetBuffer()  -- memory address
}
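As a concrete instance of such parsing, the sketch below reads a single classification output and finds the class with the highest score. It assumes the first output tensor holds one batch of float32 scores stored contiguously; adapt the data type and layout to your model.
/* Sketch: top-1 parsing of a classification output, assuming one batch of
   contiguous float32 scores in the first output tensor. */
std::shared_ptr<hiai::AINeuralNetworkBuffer> resultTensor =
    std::static_pointer_cast<hiai::AINeuralNetworkBuffer>(outputTensorVec[0]);
const float* scores = reinterpret_cast<const float*>(resultTensor->GetBuffer());
uint32_t numClasses = resultTensor->GetSize() / sizeof(float);  // GetSize() returns the buffer size in bytes.

uint32_t bestClass = 0;
for (uint32_t i = 1; i < numClasses; i++) {
    if (scores[i] > scores[bestClass]) {
        bestClass = i;
    }
}
// bestClass now holds the index of the most likely class; scores[bestClass] is its score.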
For details about the post-processing of common classification models, see the sample code InferClassification. For details about the post-processing of the SSD object detection model, see the sample code InferObjectDetection.