Updated: 2024-10-28 GMT+08:00

How do I perform incremental training when using MoXing?

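When you build a model with MoXing and are not satisfied with the result of the previous training run, you can modify some of the data and annotation information and then run incremental training.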

Adding incremental training parameters to "mox.run"

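After modifying the labeled data or the dataset, change the "log_dir" parameter in "mox.run" and add a "checkpoint_path" parameter. It is recommended that you set "log_dir" to a new directory. Set "checkpoint_path" to the output path of the previous training run. If that path is an OBS directory, it is recommended that it start with "obs://".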

If the labels in the labeled data have changed, perform the label-change operation described later in this document before running "mox.run".

  mox.run(input_fn=input_fn,
          model_fn=model_fn,
          optimizer_fn=optimizer_fn,
          run_mode=flags.run_mode,
          inter_mode=mox.ModeKeys.EVAL if use_eval_data else None,
          log_dir=log_dir,
          batch_size=batch_size_per_device,
          auto_batch=False,
          max_number_of_steps=max_number_of_steps,
          log_every_n_steps=flags.log_every_n_steps,
          save_summary_steps=save_summary_steps,
          save_model_secs=save_model_secs,
          checkpoint_path=flags.checkpoint_url,
          export_model=mox.ExportKeys.TF_SERVING)
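
The path recommendations above (a new "log_dir", and a "checkpoint_path" that points at the previous output and starts with "obs://") can be sketched as a small helper. This is only an illustration, not part of MoXing: the function names `as_obs_url` and `incremental_run_paths`, the bucket name, and the timestamped directory layout are all hypothetical, and the sketch assumes both paths are OBS paths.

```python
from datetime import datetime, timezone


def as_obs_url(path):
    """Return an OBS path with an explicit obs:// prefix (hypothetical helper)."""
    if path.startswith("obs://"):
        return path
    return "obs://" + path.lstrip("/")


def incremental_run_paths(previous_output, log_root, run_tag=None):
    """Pick a fresh log_dir and reuse the previous output as checkpoint_path.

    Assumption: both arguments are OBS paths. run_tag defaults to a UTC
    timestamp so that every run writes to a new directory.
    """
    if run_tag is None:
        run_tag = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    log_dir = as_obs_url(log_root).rstrip("/") + "/" + run_tag
    checkpoint_path = as_obs_url(previous_output).rstrip("/")
    if log_dir == checkpoint_path:
        raise ValueError("log_dir should be a new directory, not the previous output")
    return {"log_dir": log_dir, "checkpoint_path": checkpoint_path}


# Hypothetical bucket and directory names, for illustration only.
paths = incremental_run_paths("/my-bucket/train/run1", "obs://my-bucket/train", run_tag="run2")
print(paths["log_dir"])          # obs://my-bucket/train/run2
print(paths["checkpoint_path"])  # obs://my-bucket/train/run1
```

The two values could then be passed as the "log_dir" and "checkpoint_path" arguments of "mox.run".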

If the labels have changed

When the labels in the dataset change, run the following statement. It must run before "mox.run".

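"logits" in the statement is the keyword that matches the variable names of the classification-layer weights. These names differ between networks, so specify the keyword that corresponds to the network you use.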

mox.set_flag('checkpoint_exclude_patterns', 'logits')
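
To make the effect of an exclude pattern concrete, the following minimal sketch filters a list of variable names the way such a pattern is commonly applied when restoring a checkpoint: variables whose names match are skipped and initialized from scratch, and the rest are restored. It does not call MoXing. The function `variables_to_restore`, the sample variable names, and the matching rule (comma-separated patterns searched as regular expressions) are assumptions for illustration; MoXing's actual matching rules may differ.

```python
import re


def variables_to_restore(variable_names, exclude_patterns):
    """Return the names that would still be restored from the checkpoint.

    Assumption: exclude_patterns is a comma-separated string and each
    pattern is searched anywhere in the variable name.
    """
    patterns = [p.strip() for p in exclude_patterns.split(",") if p.strip()]
    return [
        name
        for name in variable_names
        if not any(re.search(pattern, name) for pattern in patterns)
    ]


# Hypothetical variable names in the style of a ResNet checkpoint.
names = [
    "resnet_v1_50/conv1/weights",
    "resnet_v1_50/block1/unit_1/conv1/weights",
    "resnet_v1_50/logits/weights",
    "resnet_v1_50/logits/biases",
]
print(variables_to_restore(names, "logits"))
# ['resnet_v1_50/conv1/weights', 'resnet_v1_50/block1/unit_1/conv1/weights']
```

Skipping the classification layer is what allows the number of labels to change: its weight shape depends on the label count, so the old values can no longer be loaded.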

If you use a MoXing built-in network, obtain its keyword through the following API. This example prints the keyword for Resnet_v1_50, which is "logits".

import moxing.tensorflow as mox

model_meta = mox.get_model_meta(mox.NetworkKeys.RESNET_V1_50)
logits_pattern = model_meta.default_logits_pattern
print(logits_pattern)

You can also use the following interface to get the list of network names supported by MoXing.

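import moxing.tensorflow as mox
# help() prints the documentation itself and returns None,
# so wrapping it in print() would only add a trailing "None".
help(mox.NetworkKeys)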

Example output:

Help on class NetworkKeys in module moxing.tensorflow.nets.nets_factory:

class NetworkKeys(builtins.object)
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  ALEXNET_V2 = 'alexnet_v2'
 |  
 |  CIFARNET = 'cifarnet'
 |  
 |  INCEPTION_RESNET_V2 = 'inception_resnet_v2'
 |  
 |  INCEPTION_V1 = 'inception_v1'
 |  
 |  INCEPTION_V2 = 'inception_v2'
 |  
 |  INCEPTION_V3 = 'inception_v3'
 |  
 |  INCEPTION_V4 = 'inception_v4'
 |  
 |  LENET = 'lenet'
 |  
 |  MOBILENET_V1 = 'mobilenet_v1'
 |  
 |  MOBILENET_V1_025 = 'mobilenet_v1_025'
 |  
 |  MOBILENET_V1_050 = 'mobilenet_v1_050'
 |  
 |  MOBILENET_V1_075 = 'mobilenet_v1_075'
 |  
 |  MOBILENET_V2 = 'mobilenet_v2'
 |  
 |  MOBILENET_V2_035 = 'mobilenet_v2_035'
 |  
 |  MOBILENET_V2_140 = 'mobilenet_v2_140'
 |  
 |  NASNET_CIFAR = 'nasnet_cifar'
 |  
 |  NASNET_LARGE = 'nasnet_large'
 |  
 |  NASNET_MOBILE = 'nasnet_mobile'
 |  
 |  OVERFEAT = 'overfeat'
 |  
 |  PNASNET_LARGE = 'pnasnet_large'
 |  
 |  PNASNET_MOBILE = 'pnasnet_mobile'
 |  
 |  PVANET = 'pvanet'
 |  
 |  RESNET_V1_101 = 'resnet_v1_101'
 |  
 |  RESNET_V1_110 = 'resnet_v1_110'
 |  
 |  RESNET_V1_152 = 'resnet_v1_152'
 |  
 |  RESNET_V1_18 = 'resnet_v1_18'
 |  
 |  RESNET_V1_20 = 'resnet_v1_20'
 |  
 |  RESNET_V1_200 = 'resnet_v1_200'
 |  
 |  RESNET_V1_50 = 'resnet_v1_50'
 |  
 |  RESNET_V1_50_8K = 'resnet_v1_50_8k'
 |  
 |  RESNET_V1_50_MOX = 'resnet_v1_50_mox'
 |  
 |  RESNET_V1_50_OCT = 'resnet_v1_50_oct'
 |  
 |  RESNET_V2_101 = 'resnet_v2_101'
 |  
 |  RESNET_V2_152 = 'resnet_v2_152'
 |  
 |  RESNET_V2_200 = 'resnet_v2_200'
 |  
 |  RESNET_V2_50 = 'resnet_v2_50'
 |  
 |  RESNEXT_B_101 = 'resnext_b_101'
 |  
 |  RESNEXT_B_50 = 'resnext_b_50'
 |  
 |  RESNEXT_C_101 = 'resnext_c_101'
 |  
 |  RESNEXT_C_50 = 'resnext_c_50'
 |  
 |  VGG_16 = 'vgg_16'
 |  
 |  VGG_16_BN = 'vgg_16_bn'
 |  
 |  VGG_19 = 'vgg_19'
 |  
 |  VGG_19_BN = 'vgg_19_bn'
 |  
 |  VGG_A = 'vgg_a'
 |  
 |  VGG_A_BN = 'vgg_a_bn'
 |  
 |  XCEPTION_41 = 'xception_41'
 |  
 |  XCEPTION_65 = 'xception_65'
 |  
 |  XCEPTION_71 = 'xception_71'
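
If you need the network names as a Python list rather than as help text, the class attributes can be read directly. The sketch below uses a stand-in class, `DemoNetworkKeys`, with three of the constants shown above so that it runs without MoXing; `list_network_names` is a hypothetical helper. It assumes that `mox.NetworkKeys` is a plain class whose upper-case attributes hold the name strings, as the help output suggests.

```python
class DemoNetworkKeys:
    """Stand-in for mox.NetworkKeys with a few of the constants listed above."""
    RESNET_V1_50 = 'resnet_v1_50'
    VGG_16 = 'vgg_16'
    MOBILENET_V2 = 'mobilenet_v2'


def list_network_names(keys_cls):
    """Collect the string values of all upper-case class attributes."""
    return sorted(
        value
        for name, value in vars(keys_cls).items()
        if name.isupper() and isinstance(value, str)
    )


print(list_network_names(DemoNetworkKeys))
# ['mobilenet_v2', 'resnet_v1_50', 'vgg_16']
```

Under the same assumption, passing `mox.NetworkKeys` instead of the stand-in class would return the full list.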