Paper info

  • Paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
  • Authors: Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He (the FAIR team)
  • github link
  • arXiv link

Main contributions (data, model, loss)

  • Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k.
  • Constant warmup and Gradual warmup
  • Batch Normalization with Large Minibatches
  • Remark 1: Scaling the cross-entropy loss is not equivalent to scaling the learning rate.
  • Remark 2: Apply momentum correction after changing learning rate if using (10).
  • Remark 3: Normalize the per-worker loss by total minibatch size kn, not per-worker size n.
  • Remark 4: Use a single random shuffling of the training data (per epoch) that is divided amongst all k workers.
  • Showing off hardware: ResNet-50 trained on ImageNet in one hour using 256 GPUs with a minibatch size of 8192
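The linear scaling rule and gradual warmup combine into a simple schedule. A minimal sketch (the function names are illustrative, not from the paper's code; 0.1 and 256 are the paper's reference learning rate and minibatch size):

```python
def scaled_lr(base_lr, batch_size, base_batch_size=256):
    # Linear Scaling Rule: when the minibatch grows by k, the lr grows by k.
    return base_lr * batch_size / base_batch_size

def warmup_lr(target_lr, it, warmup_iters, start_lr):
    # Gradual warmup: ramp the lr linearly from start_lr up to target_lr
    # over the first warmup_iters iterations, then hold it at target_lr.
    if it >= warmup_iters:
        return target_lr
    return start_lr + (target_lr - start_lr) * it / warmup_iters

# Example: batch size 8192 is 32x the 256 baseline, so the peak lr is
# 0.1 * 32 = 3.2, approached gradually from 0.1 during warmup.
peak = scaled_lr(0.1, 8192)
```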

Paper details (lessons from others)

  • Constant warmup. Particularly helpful for prototyping object detection and segmentation methods that fine-tune pre-trained layers together with newly initialized layers.
  • Gradual warmup. Gradually ramps up the learning rate from a small to a large value (linear or exp).

Takeaways (applying them ourselves)

Linear Scaling Rule

  • Implemented in mmcv as a config switch:
    if args.autoscale_lr:
      # apply the linear scaling rule (https://arxiv.org/abs/1706.02677)
      cfg.optimizer['lr'] = cfg.optimizer['lr'] * cfg.gpus / 8
    

warmup

  • Implemented in mmcv; see the lr updater hook
  • params: base_lr, warmup_iter, warmup_ratio
  • constant: base_lr * warmup_ratio
  • linear: base_lr * [1 - (1 - iter_cnt / warmup_iter) * (1 - warmup_ratio)]
  • exp: base_lr * warmup_ratio ** (1 - iter_cnt / warmup_iter)
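The three warmup formulas above can be checked with a small sketch (a plain-function rendering of the bullet formulas, not mmcv's actual hook code):

```python
def warmup_lr(mode, base_lr, iter_cnt, warmup_iter, warmup_ratio):
    """Warmed-up lr for the current iteration, per the formulas above."""
    k = iter_cnt / warmup_iter  # fraction of warmup completed, in [0, 1]
    if mode == 'constant':
        # Flat lr during warmup: base_lr * warmup_ratio.
        return base_lr * warmup_ratio
    elif mode == 'linear':
        # Ramps from base_lr * warmup_ratio at k=0 to base_lr at k=1.
        return base_lr * (1 - (1 - k) * (1 - warmup_ratio))
    elif mode == 'exp':
        # Same endpoints as linear, but with an exponential ramp.
        return base_lr * warmup_ratio ** (1 - k)
    raise ValueError(f'unknown warmup mode: {mode}')
```

All three variants agree at the endpoints except constant: linear and exp both start at base_lr * warmup_ratio and finish at base_lr when iter_cnt reaches warmup_iter.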

lr_updater (from mmcv; all can be combined with warmup)

  • LrUpdaterHook: base class with the common functionality; all updater hooks below inherit from it
  • FixedLrUpdaterHook: lr = base_lr
  • StepLrUpdaterHook: lr = base_lr * gamma**exp, usually gamma=0.1; exp counts how many step milestones have been passed, so exp = 0, 1, 2, 3, ...
  • ExpLrUpdaterHook: lr = base_lr * gamma**progress, usually gamma=0.1, progress = epoch_cnt
  • PolyLrUpdaterHook: lr = (base_lr - min_lr) * coeff + min_lr, where coeff = (1 - progress / max_progress)**power, usually power=1, min_lr=0, progress=epoch_cnt, max_progress=max_epochs
  • InvLrUpdaterHook: lr = base_lr * (1 + gamma * progress)**(-power), usually power=1, gamma=0.1, progress=epoch_cnt
  • CosineLrUpdaterHook: lr = target_lr + 0.5 * (base_lr - target_lr) * (1 + cos(pi * (progress / max_progress)))
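A few of these schedules written out as plain functions (a sketch following the bullet formulas above, not mmcv's class-based hooks):

```python
import math

def step_lr(base_lr, progress, steps, gamma=0.1):
    # StepLrUpdaterHook: exp is the number of milestones already passed.
    exp = sum(1 for s in steps if progress >= s)
    return base_lr * gamma ** exp

def poly_lr(base_lr, progress, max_progress, power=1.0, min_lr=0.0):
    # PolyLrUpdaterHook: polynomial decay from base_lr down to min_lr.
    coeff = (1 - progress / max_progress) ** power
    return (base_lr - min_lr) * coeff + min_lr

def cosine_lr(base_lr, progress, max_progress, target_lr=0.0):
    # CosineLrUpdaterHook: half-cosine decay from base_lr to target_lr.
    return target_lr + 0.5 * (base_lr - target_lr) * (
        1 + math.cos(math.pi * progress / max_progress))
```

For example, with base_lr=0.1 and step milestones [30, 60], step_lr gives 0.1 before epoch 30, 0.01 from epoch 30, and 0.001 from epoch 60; cosine_lr decays smoothly from 0.1 to the target over max_progress epochs.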