【pytorch】torch.nn

torch.nn

Parameters

class torch.nn.Parameter

data (Tensor) – parameter tensor.
requires_grad (bool, optional) – if the parameter requires gradient. Default: True

A kind of Tensor that is to be considered a module parameter.
Parameters are Tensor subclasses with a very special property when used with Modules: when they are assigned as Module attributes, they are automatically added to the module's list of parameters and will appear, e.g., in the parameters() iterator. Assigning a plain Tensor has no such effect. This is because one might want to cache some temporary state in the model, like the last hidden state of an RNN. If there were no Parameter class, these temporaries would get registered too.

SUMMARY: Parameter is a class whose instances can be retrieved through torch.nn.Module methods, e.g. parameters().
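A minimal sketch of the registration behaviour described above, assuming a recent PyTorch (1.x or 2.x). `TinyRNNCell` is a hypothetical module made up for illustration: its `nn.Parameter` attribute shows up in `named_parameters()`, while the plain tensor used to cache the last hidden state does not.

```python
import torch
from torch import nn


class TinyRNNCell(nn.Module):
    """Hypothetical module, for illustration only."""

    def __init__(self, hidden_size):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.weight = nn.Parameter(torch.randn(hidden_size, hidden_size))
        # A plain Tensor attribute is NOT registered, so it can hold
        # temporary state such as the last hidden state.
        self.last_hidden = torch.zeros(hidden_size)

    def forward(self, h):
        h = torch.tanh(h @ self.weight)
        self.last_hidden = h.detach()  # cached, but never seen by an optimizer
        return h


cell = TinyRNNCell(4)
cell(torch.randn(4))
names = [name for name, _ in cell.named_parameters()]
print(names)  # ['weight']: the cached last_hidden is not listed

# requires_grad defaults to True, but can be switched off at construction.
frozen = nn.Parameter(torch.ones(3), requires_grad=False)
print(cell.weight.requires_grad, frozen.requires_grad)
```

Because only registered parameters are returned by `parameters()`, passing `cell.parameters()` to an optimizer will update `weight` and silently ignore `last_hidden`.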

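Everything is an object, a class, or a method. What you actually operate on are objects; they are obtained by instantiating a class, and instantiation involves initialization. A network has many settings that need configuring; for convenience, most of them have default values, so in most cases no special initialization is needed. Methods make the operations richer, more standardized, and more convenient.
class -> (initialize, defaults) -> object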

torch.optim

How to use an optimizer?

To use torch.optim, you have to construct an optimizer object that will hold the current state and update the parameters based on the computed gradients.

#Constructing it# To construct an Optimizer, you have to give it an iterable containing the parameters to optimize (all should be Variables; since PyTorch 0.4, Variable has been merged into Tensor, so these are simply tensors that require gradients). Then you can specify optimizer-specific options such as the learning rate, weight decay, etc.
【If you need to move a model to GPU via .cuda(), do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects from those before the call. In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.】
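A short sketch of the construction order recommended above, assuming a recent PyTorch: move the model first (here with `model.to(device)`, which falls back to the CPU when no GPU is present), then hand its parameters to the optimizer together with optimizer-specific options. The model and the hyperparameter values are arbitrary placeholders.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # placeholder model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1) Move the model to its final device first...
model.to(device)

# 2) ...then construct the optimizer from the parameters that now live there,
#    together with optimizer-specific options (lr, momentum, weight decay, ...).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# Every parameter the optimizer tracks sits on the same device as the model.
for group in optimizer.param_groups:
    for p in group["params"]:
        assert p.device.type == device.type
print(len(optimizer.param_groups[0]["params"]))  # 2: weight and bias
```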

#Per-parameter options# Optimizers also support specifying per-parameter options. To do this, instead of passing an iterable of Variables, pass in an iterable of dicts. Each of them will define a separate parameter group, and should contain a params key, containing a list of parameters belonging to it. Other keys should match the keyword arguments accepted by the optimizers, and will be used as optimization options for this group.
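The per-parameter syntax can be sketched as follows, modelled on the example in the PyTorch documentation. `Net`, with its `base` and `classifier` submodules, is a hypothetical model: the `base` group falls back to the default `lr=1e-2`, the `classifier` group overrides it with `1e-3`, and `momentum=0.9` applies to both groups.

```python
import torch
from torch import nn


class Net(nn.Module):
    """Hypothetical two-part model, for illustration only."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.base(x)))


model = Net()
optimizer = torch.optim.SGD(
    [
        {"params": model.base.parameters()},                    # uses the default lr below
        {"params": model.classifier.parameters(), "lr": 1e-3},  # overrides lr for this group
    ],
    lr=1e-2,       # default for groups that do not set their own lr
    momentum=0.9,  # shared by both groups
)

for group in optimizer.param_groups:
    print(group["lr"], group["momentum"])
```

This is the usual way to fine-tune a pretrained backbone more gently than a freshly initialized head.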

#Taking an optimization step# All optimizers implement a step() method, which updates the parameters. It can be used in two ways:

optimizer.step()
optimizer.step(closure)
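The two calling conventions can be sketched on a toy, noiseless linear-regression problem (the synthetic data is generated in the snippet itself). Plain `step()` is called once per iteration after `backward()`. Algorithms such as `torch.optim.LBFGS` may need to re-evaluate the loss several times per step, so they take a closure that clears the gradients, recomputes the loss, calls `backward()`, and returns the loss. The learning rates, iteration counts, and the `"strong_wolfe"` line search are illustrative choices, not recommendations.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(64, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y = x @ true_w  # synthetic, noiseless targets
loss_fn = nn.MSELoss()

# 1) optimizer.step(): call once, after gradients have been computed by backward().
model = nn.Linear(3, 1, bias=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
sgd_loss = loss_fn(model(x), y).item()

# 2) optimizer.step(closure): LBFGS may call the closure several times per step.
model2 = nn.Linear(3, 1, bias=False)
lbfgs = torch.optim.LBFGS(model2.parameters(), lr=1.0, line_search_fn="strong_wolfe")


def closure():
    lbfgs.zero_grad()              # clear old gradients
    loss = loss_fn(model2(x), y)   # re-evaluate the model
    loss.backward()                # recompute gradients
    return loss                    # LBFGS needs the loss value


for _ in range(10):
    lbfgs.step(closure)
lbfgs_loss = loss_fn(model2(x), y).item()
print(sgd_loss, lbfgs_loss)
```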

Algorithms

class torch.optim.Optimizer(params, defaults): Base class for all optimizers.
- add_param_group(param_group)
- load_state_dict(state_dict)
- state_dict()
- step(closure)
- zero_grad()
class torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
class torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)
class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
class torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)
class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)
…
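As a rough feel for how the algorithms listed above differ, here is a sketch that minimizes the toy function f(x) = (x - 3)^2 from x = 0 with several of them. The `minimize` helper, the step count, and the learning rates are illustrative assumptions; note that `SGD` must be given an `lr`, since it has no default. A 1-D quadratic says little about real training, so treat this only as a demonstration of the API.

```python
import torch


def minimize(make_optimizer, steps=1000):
    """Hypothetical helper: minimize (x - 3)^2 starting from x = 0."""
    x = torch.zeros(1, requires_grad=True)  # any leaf tensor with requires_grad works
    opt = make_optimizer([x])
    for _ in range(steps):
        opt.zero_grad()
        loss = ((x - 3) ** 2).sum()
        loss.backward()
        opt.step()
    return x.item()


results = {
    "SGD": minimize(lambda p: torch.optim.SGD(p, lr=0.1)),
    "SGD+momentum": minimize(lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9)),
    "Adagrad": minimize(lambda p: torch.optim.Adagrad(p, lr=0.5)),
    "RMSprop": minimize(lambda p: torch.optim.RMSprop(p, lr=0.01)),
    "Adam": minimize(lambda p: torch.optim.Adam(p, lr=0.1)),
}
for name, value in results.items():
    print(f"{name:13s} x = {value:.4f}")
```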

How to adjust the learning rate?

torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs.【the learning rate is a function of the epoch】 torch.optim.lr_scheduler.ReduceLROnPlateau allows dynamic learning rate reduction based on some validation measurements.【the learning rate is a function of some kind of measurement】
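A minimal sketch of epoch-based scheduling with the `LambdaLR`, `StepLR`, and `MultiStepLR` classes listed below, assuming PyTorch 1.1 or later, where `scheduler.step()` is called once per epoch after `optimizer.step()`. The `record_lrs` helper, the placeholder model, and the schedule values are made up for illustration; the training work for each epoch is left as a comment.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR, MultiStepLR, StepLR


def record_lrs(make_scheduler, epochs):
    """Hypothetical helper: return the lr used in each epoch."""
    model = nn.Linear(2, 1)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = make_scheduler(optimizer)
    lrs = []
    for _ in range(epochs):
        lrs.append(optimizer.param_groups[0]["lr"])  # lr in effect this epoch
        # ... one epoch of training (forward, backward) would go here ...
        optimizer.step()   # optimizer first,
        scheduler.step()   # then advance the schedule by one epoch
    return lrs


# lr = 0.1 * 0.95 ** epoch
lambda_lrs = record_lrs(lambda opt: LambdaLR(opt, lr_lambda=lambda epoch: 0.95 ** epoch), 5)
# lr = 0.1 for epochs 0-29, 0.01 for 30-59, 0.001 for 60-89, ...
step_lrs = record_lrs(lambda opt: StepLR(opt, step_size=30, gamma=0.1), 100)
# lr = 0.1 until epoch 30, 0.01 until epoch 80, then 0.001
multi_lrs = record_lrs(lambda opt: MultiStepLR(opt, milestones=[30, 80], gamma=0.1), 100)
print(step_lrs[29], step_lrs[30], multi_lrs[79], multi_lrs[80])
```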

class torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
- Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr.
class torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
- Sets the learning rate of each parameter group to the initial lr decayed by gamma every step_size epochs. When last_epoch=-1, sets initial lr as lr.
class torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
- Sets the learning rate of each parameter group to the initial lr decayed by gamma once the number of epochs reaches one of the milestones. When last_epoch=-1, sets initial lr as lr.
…
class torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
- Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.
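A sketch of `ReduceLROnPlateau`, assuming a recent PyTorch. Unlike the epoch-based schedulers, its `step()` takes the monitored metric. The validation losses here are hypothetical values chosen to improve for three epochs and then plateau; with `patience=2`, the lr is halved once the loss has failed to improve for more than two consecutive epochs.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(2, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

# Hypothetical validation losses: improving for three epochs, then flat.
val_losses = [1.0, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
lrs = []
for val_loss in val_losses:
    # ... train for one epoch and compute val_loss here ...
    scheduler.step(val_loss)  # pass the metric, not the epoch number
    lrs.append(optimizer.param_groups[0]["lr"])
print(lrs)
```

The first halving happens on the third consecutive non-improving epoch (the sixth call); the bad-epoch counter then resets, so the next halving comes three epochs later.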

SUMMARY: construct an optimizer object and pay attention mainly to three points: 1. which parameters need to be optimized and which do not; 2. which algorithm to choose; 3. how to adjust the learning rate in different periods of training.
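The three points can be combined in one small sketch, assuming a recent PyTorch. The `nn.Sequential` model stands in for a hypothetical "pretrained layer plus new head", the data is random, and Adam with a `StepLR` schedule is only one possible choice. The optimizer and scheduler state is checkpointed with the standard `state_dict()` / `load_state_dict()` methods of the base class.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical model: a "pretrained" first layer plus a new head.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# 1. Which parameters to optimize: freeze the first layer, pass only the rest.
for p in model[0].parameters():
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]

# 2. Which algorithm: Adam, purely as an example.
optimizer = torch.optim.Adam(trainable, lr=1e-2)

# 3. How to adjust the lr during training: divide it by 10 every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

x, y = torch.randn(32, 4), torch.randn(32, 1)  # random placeholder data
frozen_before = model[0].weight.clone()
for epoch in range(50):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()

frozen_unchanged = torch.equal(model[0].weight, frozen_before)
final_lr = optimizer.param_groups[0]["lr"]  # decayed at epochs 20 and 40
print(frozen_unchanged, final_lr)

# Optimizer and scheduler state can be checkpointed alongside the model...
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}
# ...and restored into a freshly constructed optimizer later.
restored = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-2)
restored.load_state_dict(checkpoint["optimizer"])
```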
