Updated: 2025-09-08 GMT+08:00

Weight Loading Is Slow on the First Load

Quantization requires loading the model weights first, and the method below can speed up that loading. Beyond this, loading speed is also affected by factors such as disk read throughput.

Add the set_initialized_submodules method to the quant_deepseek_w8a8.py script and apply it in the main block to speed up loading of the model weights.

# Add the following set_initialized_submodules method
def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its weights are in the loaded state
    dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking if the root module is loaded there's no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules

......

if __name__ == "__main__":
    # torch_npu will fork a new process to init;
    # its lazy_init will fail after we load a big model, so we need to init it here
    torch_npu.npu.init()

    # Patch transformers so it uses the set_initialized_submodules method defined above
    patch("transformers.modeling_utils.set_initialized_submodules", new=set_initialized_submodules).start()

    # Invoke main process
    main()
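
The "......" above stands for the rest of the script, which is expected to supply the imports this snippet relies on. As a minimal sketch, assuming the script uses unittest.mock for the patch and the standard torch_npu package, the import section could look like the following (this is illustrative and not taken from the original script):

# Assumption: minimal imports needed by the snippet above, not part of the original script.
from unittest.mock import patch   # provides patch(), used to swap in the helper

import torch_npu                   # Ascend NPU backend for PyTorch; supplies torch_npu.npu.init()

The speed-up comes from the patched helper setting the _is_hf_initialized flag: transformers then skips weight re-initialization for every submodule whose parameters are already fully present in the loaded state dict, so only the remaining submodules (returned as not_initialized_submodules) need to be initialized.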
