Weight loading is slow on the first run
Quantization requires the weights to be loaded first. The method below speeds up weight loading; note that loading speed is also affected by other factors such as disk read throughput.
Add the set_initialized_submodules function to the quant_deepseek_w8a8.py script and hook it in from the main block, as shown below, to speed up loading of the model weights.
# Add the set_initialized_submodules function shown below
from unittest.mock import patch  # needed for the patch() call in the main block


def set_initialized_submodules(model, state_dict_keys):
    """
    Sets the `_is_hf_initialized` flag in all submodules of a given model when all its
    weights are in the loaded state dict.
    """
    state_dict_keys = set(state_dict_keys)
    not_initialized_submodules = {}
    for module_name, module in model.named_modules():
        if module_name == "":
            # When checking whether the root module is loaded there is no need to prepend module_name.
            module_keys = set(module.state_dict())
        else:
            module_keys = {f"{module_name}.{k}" for k in module.state_dict()}
        if module_keys.issubset(state_dict_keys):
            module._is_hf_initialized = True
        else:
            not_initialized_submodules[module_name] = module
    return not_initialized_submodules

......

if __name__ == "__main__":
    # torch_npu forks a new process to initialize; its lazy init fails after a big
    # model has been loaded, so we need to init it here explicitly.
    torch_npu.npu.init()
    # Patch the set_initialized_submodules function into transformers here
    patch("transformers.modeling_utils.set_initialized_submodules", new=set_initialized_submodules).start()
    # Invoke the main process
    main()
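To illustrate the mechanism, the short sketch below (illustrative only; it uses a toy torch.nn model and assumes the set_initialized_submodules function defined above is in scope) shows how the function flags every submodule whose parameters all appear in the loaded state dict, which is what allows the loading path to skip re-initializing those weights.

# Minimal sketch of the mechanism, not the actual quant_deepseek_w8a8.py flow:
# set_initialized_submodules (defined above) marks submodules whose weights are
# all present in the checkpoint so they are not randomly re-initialized.
import torch.nn as nn

toy_model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))  # stand-in for the real model
loaded_keys = list(toy_model.state_dict().keys())            # pretend the checkpoint covers every weight

remaining = set_initialized_submodules(toy_model, loaded_keys)

print(remaining)                                              # {} -> nothing left to re-initialize
print(getattr(toy_model[0], "_is_hf_initialized", False))     # True

Once patch(...).start() has run, code inside transformers that calls modeling_utils.set_initialized_submodules during weight loading uses this implementation instead of the original (exact behavior depends on the installed transformers version); calling .stop() on the patcher object, or patch.stopall(), restores the original function if needed.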