Updated on 2022-03-13 GMT+08:00

Memory Management APIs Provided by the Matrix Module

The Matrix module provides a set of C/C++ APIs for allocating and freeing memory: HIAI_DMalloc/HIAI_DFree and HIAI_DVPP_DMalloc/HIAI_DVPP_DFree. HIAI_DMalloc and HIAI_DFree allocate and free memory that works with SendData to transfer data from the host to the device, whereas HIAI_DVPP_DMalloc and HIAI_DVPP_DFree allocate and free memory for the DVPP on the device. Allocating memory with these APIs reduces copy operations and saves time.

API Description

Table 1 describes the functions of the HIAI_DMalloc/HIAI_DFree and HIAI_DVPP_DMalloc/HIAI_DVPP_DFree APIs.

Table 1 API description

| API Name | Function |
| --- | --- |
| HIAIMemory::HIAI_DMalloc (for C++ only) | Allocates memory. The memory is similar to common memory but offers better performance in cross-side transmission (host-device/device-host) and model inference. |
| HIAIMemory::HIAI_DFree (for C++ only) | Frees the memory allocated by HIAIMemory::HIAI_DMalloc. This API is used together with HIAIMemory::HIAI_DMalloc. When calling the HIAIMemory::HIAI_DMalloc API, you can set flag to MEMORY_ATTR_AUTO_FREE. In this case, if data is sent to the peer end by calling the SendData API, the allocated memory is automatically freed without calling the HIAIMemory::HIAI_DFree API after the program is complete. However, if the SendData API is not called to send data to the peer end after memory is allocated, you need to call HIAIMemory::HIAI_DFree to free the memory. |
| HIAI_DMalloc (for C/C++) | Allocates memory. The memory is similar to common memory but offers better performance in cross-side transmission (host-device/device-host) and model inference. |
| HIAI_DFree (for C/C++) | Frees the memory allocated by HIAI_DMalloc. This API is used together with HIAI_DMalloc. When calling the HIAI_DMalloc API, you can set flag to MEMORY_ATTR_AUTO_FREE. In this case, if data is sent to the peer end by calling the SendData API, the allocated memory is automatically freed without calling the HIAI_DFree API after the program is complete. However, if the SendData API is not called to send data to the peer end after memory is allocated, you need to call HIAI_DFree to free the memory. |
| HIAIMemory::HIAI_DVPP_DMalloc (for C++ only) | Allocates memory for the DVPP on the device. |
| HIAIMemory::HIAI_DVPP_DFree (for C++ only) | Frees the memory allocated by the HIAIMemory::HIAI_DVPP_DMalloc API. |
| HIAI_DVPP_DMalloc (for C/C++) | Allocates memory for the DVPP on the device. |
| HIAI_DVPP_DFree (for C/C++) | Frees the memory allocated by the HIAI_DVPP_DMalloc API. |

API Calling Process

Figure 1 API Calling Process

The usage of APIs in Figure 1 is described as follows:

  • The memory allocated by using the HIAI_DMalloc or HIAIMemory::HIAI_DMalloc API can be used in end-to-end data transmission and model inference. Data transmission efficiency and performance improve when you allocate memory with the HIAI_DMalloc or HIAIMemory::HIAI_DMalloc API and use the HIAI_REGISTER_SERIALIZE_FUNC macro to register the functions that serialize and deserialize user-defined data types.

    Allocating memory by using the HIAI_DMalloc or HIAIMemory::HIAI_DMalloc API has the following advantages:

    • The allocated memory can be used directly by the host-device communication (HDC) module for data transmission, which avoids copying data between the Matrix module and HDC.
    • You can use the allocated memory for zero-copy inference to reduce data copy time.
  • The memory allocated by using the HIAI_DVPP_DMalloc or HIAIMemory::HIAI_DVPP_DMalloc API can be used by the DVPP. After being used by the DVPP, data in the memory can be transparently transmitted to the inference model. If model inference is not required, data in the memory allocated by using the HIAI_DVPP_DMalloc API can be directly sent back to the host.
  • The memory allocated by using the HIAI_DMalloc, HIAIMemory::HIAI_DMalloc, HIAI_DVPP_DMalloc, and HIAIMemory::HIAI_DVPP_DMalloc APIs is compatible with memory management APIs provided by the native language. It can be used as common memory, but cannot be freed by using APIs such as free and delete. Generally, memory allocated by using the HIAI_DMalloc, HIAIMemory::HIAI_DMalloc, HIAI_DVPP_DMalloc, and HIAIMemory::HIAI_DVPP_DMalloc APIs needs to be freed by calling HIAI_DFree, HIAIMemory::HIAI_DFree, HIAI_DVPP_DFree, and HIAIMemory::HIAI_DVPP_DFree, respectively.

    When calling the HIAI_DMalloc or HIAIMemory::HIAI_DMalloc API, you can set flag to MEMORY_ATTR_AUTO_FREE. In this case, if data is sent to the peer end by calling the SendData API, the allocated memory is automatically freed, and you do not need to call HIAI_DFree or HIAIMemory::HIAI_DFree after the program is complete. However, if the SendData API is not called to send data to the peer end after memory is allocated, you need to call HIAI_DFree or HIAIMemory::HIAI_DFree to free the memory.

  • The memory allocated by using the HIAI_DVPP_DMalloc or HIAIMemory::HIAI_DVPP_DMalloc API meets the requirements of the DVPP. Therefore, when the resources are limited, you are advised to use these APIs only for the DVPP.
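The freeing rules described in this section can be summarized as a small decision helper. The following is only a sketch under stated assumptions: it does not call the Matrix library, and the names Allocator, BufferState, CallerMustFree, and MatchingFreeName are hypothetical and introduced purely for illustration. It encodes two statements from this page: memory from HIAI_DMalloc is freed automatically only when MEMORY_ATTR_AUTO_FREE is set and the data is sent with SendData, and memory from HIAI_DVPP_DMalloc is never freed automatically.

```cpp
#include <string>

// Hypothetical model of which allocator family produced a buffer.
enum class Allocator { kDMalloc, kDvppDMalloc };

// Hypothetical bookkeeping record for one allocated buffer.
struct BufferState {
    Allocator allocator;       // which API allocated the memory
    bool auto_free_flag;       // true if flag was MEMORY_ATTR_AUTO_FREE
    bool sent_with_send_data;  // true if SendData sent the buffer to the peer
};

// Returns true when the caller is responsible for freeing the buffer.
bool CallerMustFree(const BufferState& s) {
    if (s.allocator == Allocator::kDvppDMalloc) {
        return true;  // DVPP memory is never freed automatically
    }
    // HIAI_DMalloc memory is auto-freed only if both conditions hold.
    return !(s.auto_free_flag && s.sent_with_send_data);
}

// Returns the name of the free API that pairs with the allocator.
std::string MatchingFreeName(Allocator a) {
    return a == Allocator::kDMalloc ? "HIAI_DFree" : "HIAI_DVPP_DFree";
}
```

For example, a buffer allocated with MEMORY_ATTR_AUTO_FREE that is never passed to SendData still yields true from CallerMustFree, which matches the rule that such memory must be released with HIAI_DFree.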

Precautions for API Usage

When allocating memory by using HIAI_DMalloc or HIAIMemory::HIAI_DMalloc, pay attention to the following issues about memory management:

  • When allocating memory to be automatically freed for host-device or device-host data transmission, the Matrix module frees the memory whether or not a smart pointer is used. If a smart pointer holds the memory, the deleter (destructor) specified for the smart pointer must therefore be empty, so that the memory is not freed a second time.
  • When allocating memory to be manually freed for host-device or device-host data transmission, if a smart pointer is used, you need to set its deleter (destructor) to HIAI_DFree or HIAIMemory::HIAI_DFree. If the pointer is not a smart pointer, you need to call HIAI_DFree or HIAIMemory::HIAI_DFree to free the memory after data transmission is complete.
  • When memory to be manually freed is allocated, the SendData API cannot be called repeatedly to send data in the memory.
  • When allocating memory to be manually freed, if the memory is used for data transmission between the host and device, do not reuse the data in the memory before the memory is freed. If the memory is used for host-host or device-device data transmission, the data in the memory can be reused before the memory is freed.
  • When allocating memory to be manually freed, if the SendData API is called to asynchronously send data, data in the memory cannot be modified after data is sent.

If the HIAI_DVPP_DMalloc or HIAIMemory::HIAI_DVPP_DMalloc API is called to allocate memory for device-host data transmission, you need to call the HIAI_DVPP_DFree or HIAIMemory::HIAI_DVPP_DFree API to manually free the memory, because memory allocated by HIAI_DVPP_DMalloc or HIAIMemory::HIAI_DVPP_DMalloc is not freed automatically. If a smart pointer is used to store the allocated memory address, its deleter (destructor) must be set to HIAI_DVPP_DFree or HIAIMemory::HIAI_DVPP_DFree.
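The smart-pointer precautions above come down to choosing the right deleter for std::shared_ptr. The sketch below illustrates the two cases without calling the Matrix library. FakeDMalloc and FakeDFree are hypothetical stand-ins built on std::malloc and std::free (they are not HIAI_DMalloc and HIAI_DFree), and g_live_blocks is an illustrative counter that makes the effect of each deleter visible.

```cpp
#include <cstdint>
#include <cstdlib>
#include <memory>

// Illustrative counter of blocks that are allocated but not yet freed.
int g_live_blocks = 0;

// Hypothetical stand-in for HIAI_DMalloc (NOT the Matrix API).
uint8_t* FakeDMalloc(uint32_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
        ++g_live_blocks;
    }
    return static_cast<uint8_t*>(p);
}

// Hypothetical stand-in for HIAI_DFree (NOT the Matrix API).
void FakeDFree(uint8_t* p) {
    if (p != nullptr) {
        std::free(p);
        --g_live_blocks;
    }
}

// Case 1: the framework frees the memory, so the deleter must do nothing.
std::shared_ptr<uint8_t> WrapAutoFreed(uint8_t* p) {
    return std::shared_ptr<uint8_t>(p, [](uint8_t*) {});
}

// Case 2: the caller frees the memory, so the deleter is the matching free.
std::shared_ptr<uint8_t> WrapManuallyFreed(uint8_t* p) {
    return std::shared_ptr<uint8_t>(p, FakeDFree);
}
```

In a real application the deleter in case 2 would be HIAI_DFree (or HIAI_DVPP_DFree for DVPP memory). With the default std::shared_ptr deleter, the memory would be released with delete, which this section states is not allowed.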

API Calling Example

(1) When the performance optimization solution is used to transmit data, you must manually implement and register the serialization and deserialization functions for the data to be transmitted.
// Note: The serialization function is used at the transmit end and the deserialization function is used at the receive end. Therefore, you are advised to register both functions at both the transmit and receive ends.
// Data structure
typedef struct
{
    uint32_t left_offset = 0;
    uint32_t right_offset = 0;
    uint32_t top_offset = 0;
    uint32_t bottom_offset = 0;
    // The serialize function is used to serialize a structure.
    template <class Archive>
    void serialize(Archive & ar)
    {
        ar(left_offset,right_offset,top_offset,bottom_offset);
    }
} crop_rect;



// Defines the structure to be transferred between engines (it is registered at the end of this example).
typedef struct EngineTransNew
{
    std::shared_ptr<uint8_t> trans_buff = nullptr;    // Transfer buffer
    uint32_t buffer_size = 0;                   // Transfer buffer size
    std::shared_ptr<uint8_t> trans_buff_extend = nullptr;
    uint32_t buffer_size_extend = 0;
    std::vector<crop_rect> crop_list;
    // The serialize function is used to serialize a structure.
    template <class Archive>
    void serialize(Archive & ar)
    {
        ar(buffer_size, buffer_size_extend, crop_list);
    }
}EngineTransNewT;
// Serialization function
/**
* @ingroup hiaiengine
* @brief GetTransSearPtr         // Serializes the Trans data.
* @param [in]: data_ptr          // Structure pointer
* @param [out]: struct_str       // Serialized structure
* @param [out]: buffer           // Pointer to the structure's data buffer
* @param [out]: buffer_size      // Size of the structure's data buffer
*/
void GetTransSearPtr(void* data_ptr, std::string& struct_str,
    uint8_t*& buffer, uint32_t& buffer_size)
{
    EngineTransNewT* engine_trans = (EngineTransNewT*)data_ptr;
    uint32_t dataLen = engine_trans->buffer_size;
    uint32_t dataLen_extend = engine_trans->buffer_size_extend;
    // Obtains the structure buffer and size.
    buffer_size = dataLen + dataLen_extend;
    buffer = (uint8_t*)engine_trans->trans_buff.get();

    // Serialization
    std::ostringstream outputStr;
    cereal::PortableBinaryOutputArchive archive(outputStr);
    archive((*engine_trans));
    struct_str = outputStr.str();
}
// Deserialization function
/**
* @ingroup hiaiengine
* @brief GetTransDearPtr              // Deserializes the Trans data.
* @param [in]: ctrlPtr               // Pointer to the serialized structure
* @param [in]: ctrlLen               // Size of the serialized structure
* @param [in]: dataPtr               // Structure data pointer
* @param [in]: dataLen               // Structure data size
* @return std::shared_ptr<void>      // Structure pointer assigned to the engine
*/
std::shared_ptr<void> GetTransDearPtr(
    const char* ctrlPtr, const uint32_t& ctrlLen,
    const uint8_t* dataPtr, const uint32_t& dataLen)
{
    if(ctrlPtr == nullptr) {
        return nullptr;
    }
    std::shared_ptr<EngineTransNewT> engine_trans_ptr = std::make_shared<EngineTransNewT>();
    // Assigns a value to engine_trans_ptr.
    std::istringstream inputStream(std::string(ctrlPtr, ctrlLen));
    cereal::PortableBinaryInputArchive archive(inputStream);
    archive((*engine_trans_ptr));
    uint32_t offsetLen = engine_trans_ptr->buffer_size;
    if(dataPtr != nullptr) {
        (engine_trans_ptr->trans_buff).reset((const_cast<uint8_t*>(dataPtr)), ReleaseDataBuffer);
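        // trans_buff and trans_buff_extend point into one contiguous memory space that starts at dataPtr.
        // Therefore, only trans_buff is bound to a deleter that frees the memory (ReleaseDataBuffer); that deleter frees the whole contiguous space after use.
        // trans_buff_extend is bound to a deleter that does nothing (SearDeleteNothing). The definitions of both deleters are not shown in this example.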
        (engine_trans_ptr->trans_buff_extend).reset((const_cast<uint8_t*>(dataPtr + offsetLen)), SearDeleteNothing);
    }
    return std::static_pointer_cast<void>(engine_trans_ptr);
}
// Registers EngineTransNewT
HIAI_REGISTER_SERIALIZE_FUNC("EngineTransNewT", EngineTransNewT, GetTransSearPtr, GetTransDearPtr);
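The deserialization function above binds two shared pointers to one contiguous block: the first owns the block, the second only points into it. The following stand-alone sketch shows that ownership pattern with standard C++ only. CountingRelease, DeleteNothing, SplitContiguous, and g_release_calls are hypothetical names for illustration. CountingRelease merely plays the role that ReleaseDataBuffer plays above and is not its implementation.

```cpp
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// Illustrative counter: how many times the owning deleter has run.
int g_release_calls = 0;

// Hypothetical owning deleter: frees the whole contiguous block.
void CountingRelease(uint8_t* p) {
    std::free(p);
    ++g_release_calls;
}

// Non-owning deleter: intentionally does nothing.
void DeleteNothing(uint8_t*) {}

using BufferPair =
    std::pair<std::shared_ptr<uint8_t>, std::shared_ptr<uint8_t>>;

// Splits one malloc'ed block into a head that owns the memory and a tail
// that points first_len bytes into the same block without owning it.
BufferPair SplitContiguous(uint8_t* block, uint32_t first_len) {
    if (block == nullptr) {
        return BufferPair{};
    }
    std::shared_ptr<uint8_t> head(block, CountingRelease);
    std::shared_ptr<uint8_t> tail(block + first_len, DeleteNothing);
    return BufferPair{head, tail};
}
```

With this pattern the tail pointer must not be used after the last copy of the head pointer is destroyed, because nothing keeps the block alive on the tail's behalf. The aliasing constructor of std::shared_ptr is a standard alternative that ties both lifetimes together, but it is a different technique from the one in the example above.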

(2) When sending data, you can use only the registered data types. Use HIAI_DMalloc to allocate memory to optimize performance.
     Note: When transferring data from the host to the device, you are advised to use HIAI_DMalloc to optimize transmission efficiency. The data size supported by the HIAI_DMalloc API ranges from 0 bytes to (256 MB – 96 bytes). If the data size exceeds this range, use the malloc API to allocate memory.
     // Allocates the data memory by calling the HIAI_DMalloc API. The value 10000 indicates the delay in milliseconds, that is, if the memory space is insufficient, the program waits 10000 ms.
     HIAI_StatusT get_ret = HIAIMemory::HIAI_DMalloc(width*align_height*3/2, (void*&)align_buffer, 10000);
     // Sends data. After the SendData API is called, the HIAI_DFree API does not need to be called. The value 10000 indicates the delay.
     graph->SendData(engine_id_0, "TEST_STR", std::static_pointer_cast<void>(align_buffer), 10000);