torch_brain.optim

Optimizers

SparseLamb

Sparse variant of the Lamb optimizer.

class SparseLamb(params, lr=0.001, betas=(0.9, 0.999), eps=1e-06, weight_decay=0, clamp_value=10, adam=False, debias=False, sparse=False)[source]

Implements the sparse variant of the Lamb algorithm.

SparseLamb is a variant of the Lamb optimizer that updates only the parameters whose gradient is non-zero. This is useful for models that use only a subset of their parameters in any single forward pass.

By default, SparseLamb is equivalent to Lamb. To activate the sparse variant, set the sparse argument to True.
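
To illustrate the sparse update rule (a conceptual sketch, not the library's implementation): entries whose gradient is exactly zero are skipped, so their weights and running averages are left untouched.

    >>> import torch
    >>> param = torch.randn(4, 3)
    >>> grad = torch.zeros(4, 3)
    >>> grad[1] = torch.randn(3)          # only row 1 was used in the forward pass
    >>> mask = grad != 0                  # entries with a non-zero gradient
    >>> param[mask] -= 1e-3 * grad[mask]  # rows with zero gradient stay unchanged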

Parameters:
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float) – learning rate (default: 1e-3)

  • betas ((float, float)) – coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999))

  • eps (float) – term added to the denominator to improve numerical stability (default: 1e-6)

  • weight_decay (float) – weight decay (L2 penalty) (default: 0)

  • clamp_value (float) – clamp the weight norm to the range (0, clamp_value) (default: 10); set to a large value (e.g. 10e3) to effectively disable clamping

  • adam (bool) – always use a trust ratio of 1, which turns this optimizer into Adam; useful for comparison purposes (default: False)

  • debias (bool) – debias the Adam moment estimates by (1 - beta**step) (default: False)

  • sparse (bool) – only update the parameters that have non-zero gradients (default: False)
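
Example (a hedged sketch; the embedding model here is illustrative, not part of the library). With sparse=True, embedding rows that receive no gradient on a given step are not updated:

    >>> import torch
    >>> import torch.nn as nn
    >>> from torch_brain.optim import SparseLamb
    >>> model = nn.Embedding(1000, 64)  # only indexed rows receive gradients
    >>> optimizer = SparseLamb(model.parameters(), lr=1e-3, sparse=True)
    >>> tokens = torch.randint(0, 1000, (32,))
    >>> loss = model(tokens).pow(2).mean()
    >>> optimizer.zero_grad()
    >>> loss.backward()
    >>> optimizer.step()  # rows not indexed by `tokens` keep their weights and statistics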

step(closure=None)[source]

Performs a single optimization step.

Parameters:

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
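
A sketch of the standard PyTorch closure pattern (model, inputs, targets, and loss_fn are assumed to be defined by the surrounding training code):

    >>> def closure():
    ...     optimizer.zero_grad()
    ...     loss = loss_fn(model(inputs), targets)
    ...     loss.backward()
    ...     return loss
    >>> loss = optimizer.step(closure)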