torch_brain.optim¶
Optimizers¶
SparseLamb: Sparse variant of the Lamb optimizer.
- class SparseLamb(params, lr=0.001, betas=(0.9, 0.999), eps=1e-06, weight_decay=0, clamp_value=10, adam=False, debias=False, sparse=False)[source]¶
Implements the sparse variant of the Lamb algorithm.
SparseLamb is a variant of the Lamb optimizer that only updates the parameters for which the gradient is non-zero. This is useful for models that do not always use all parameters during a single forward pass.
By default, SparseLamb is equivalent to Lamb. To activate the sparse variant, set the sparse argument to True.
- Parameters:
  - params – iterable of parameters to optimize, or dicts defining parameter groups
  - lr (float) – learning rate (default: 1e-3)
  - betas – coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999))
  - eps (float) – term added to the denominator to improve numerical stability (default: 1e-6)
  - weight_decay (float) – weight decay (L2 penalty) (default: 0)
  - clamp_value (float) – clamp weight_norm to (0, clamp_value) (default: 10); set to a high value (e.g. 10e3) to effectively disable clamping
  - adam (bool) – always use trust ratio = 1, which turns this into Adam; useful for comparison purposes (default: False)
  - debias (bool) – debias Adam by (1 - beta**step) (default: False)
  - sparse (bool) – only update the parameters that have non-zero gradients (default: False)
Note
Reference code: https://github.com/cybertronai/pytorch-lamb
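Example
A minimal usage sketch, assuming SparseLamb is imported from torch_brain.optim as documented above. The embedding model, token batch, and loss below are placeholders chosen only to illustrate a case where most parameter entries receive zero gradient in a given step, which is when sparse=True matters.
```python
import torch
import torch.nn as nn
from torch_brain.optim import SparseLamb

# Placeholder model: an embedding table where only the looked-up rows
# receive a non-zero gradient on any given step.
model = nn.Embedding(num_embeddings=1000, embedding_dim=64)

# sparse=True restricts updates to parameters whose gradients are
# non-zero in the current step; with sparse=False (the default) the
# optimizer behaves like standard Lamb.
optimizer = SparseLamb(model.parameters(), lr=1e-3, weight_decay=0.01, sparse=True)

tokens = torch.randint(0, 1000, (32,))  # only 32 of the 1000 rows are used
loss = model(tokens).pow(2).mean()      # dummy loss for illustration

optimizer.zero_grad()
loss.backward()
optimizer.step()
```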