tinygrad nn.optim

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org

tinygrad.nn.optim.Adam(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-08)[source]

Create an Adam optimizer.

This function creates an Adam optimizer for the given parameters, using the following defaults:

lr

Learning rate. Default is 0.001.

Type:

float

b1

First beta parameter. Default is 0.9.

Type:

float

b2

Second beta parameter. Default is 0.999.

Type:

float

eps

Epsilon value to prevent division by zero. Default is 1e-8.

Type:

float

Returns:

A LAMB optimizer configured as Adam (trust ratio fixed to 1.0, no weight decay).

Return type:

LAMB
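
A minimal usage sketch (the weight tensors, shapes, and loss below are hypothetical; the zero_grad/backward/step flow follows the Optimizer base class documented further down):

    from tinygrad.tensor import Tensor
    from tinygrad.nn.optim import Adam

    # Hypothetical parameters: two small weight tensors that require gradients.
    w1 = Tensor.randn(4, 4, requires_grad=True)
    w2 = Tensor.randn(4, 1, requires_grad=True)

    opt = Adam([w1, w2], lr=3e-4)  # under the hood, a LAMB instance configured as Adam

    x = Tensor.randn(8, 4)                       # hypothetical input batch
    loss = x.matmul(w1).relu().matmul(w2).sum()  # any scalar loss works

    opt.zero_grad()  # clear gradients from a previous iteration
    loss.backward()  # populate w1.grad and w2.grad
    opt.step()       # apply the Adam update in place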

tinygrad.nn.optim.AdamW(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-08, wd=0.01)[source]

Create an AdamW optimizer.

LAMB is essentially the trust-ratio part of LARS applied on top of Adam/W; fixing the trust ratio to 1.0 recovers plain Adam/W. This function is therefore implemented as a LAMB optimizer with the trust ratio disabled and decoupled weight decay wd.

params

A list of Tensors that will be updated by the optimizer.

Type:

List[Tensor]

lr

The learning rate. Default is 0.001.

Type:

float

b1

Exponential decay rate for the first moment estimates. Default is 0.9.

Type:

float

b2

Exponential decay rate for the second moment estimates. Default is 0.999.

Type:

float

eps

A small constant added to the denominator to prevent division by zero. Default is 1e-8.

Type:

float

wd

Weight decay parameter. Default is 0.01.

Type:

float
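
A construction sketch under the same assumptions (the weight matrix, input, and loss are hypothetical placeholders):

    from tinygrad.tensor import Tensor
    from tinygrad.nn.optim import AdamW

    w = Tensor.uniform(16, 16, requires_grad=True)  # hypothetical weight matrix

    # LAMB with the trust ratio fixed to 1.0 and decoupled weight decay wd.
    opt = AdamW([w], lr=1e-3, wd=0.01)

    x = Tensor.randn(2, 16)
    loss = (x.matmul(w) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()  # Adam-style update plus weight decay applied to w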

class tinygrad.nn.optim.LAMB(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-06, wd=0.0, adam=False)[source]

Bases: Optimizer

LAMB (layer-wise adaptive moments) optimizer class.

params

The list of parameters to optimize.

Type:

List[Tensor]

lr

The learning rate. Defaults to 0.001.

Type:

float

b1

The exponential decay rate for the first moment estimates. Defaults to 0.9.

Type:

float

b2

The exponential decay rate for the second moment estimates. Defaults to 0.999.

Type:

float

eps

A small constant for numerical stability. Defaults to 1e-6.

Type:

float

wd

The weight decay coefficient. Defaults to 0.0.

Type:

float

adam

If True, fix the trust ratio to 1.0 so the update reduces to plain Adam/W. Defaults to False.

Type:

bool

step() → None[source]

Perform one optimization step.

This method updates the parameters in-place according to the LAMB update rule, or plain Adam/W when adam is True.
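
A construction sketch contrasting full LAMB with the adam=True mode used to implement Adam and AdamW above (the parameter tensor is hypothetical):

    from tinygrad.tensor import Tensor
    from tinygrad.nn.optim import LAMB

    w = Tensor.randn(32, 32, requires_grad=True)  # hypothetical parameter

    # Full LAMB: a layer-wise trust ratio scales each parameter's update.
    opt_lamb = LAMB([w], lr=1e-3, wd=1e-2)

    # adam=True fixes the trust ratio to 1.0, reducing the update to plain Adam/W.
    opt_adam_like = LAMB([w], lr=1e-3, wd=1e-2, adam=True)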

class tinygrad.nn.optim.Optimizer(params: List[Tensor], lr: float)[source]

Bases: object

Base class for optimizers; updates parameters using their gradients.

params

List of parameters to optimize. These parameters are assumed to be differentiable, and their gradients are used during optimization.

Type:

List[Tensor]

lr

Learning rate for the optimizer.

Type:

float

device

Device where the parameters are located.

buffers

List of non-differentiable parameters (or buffers) that need to be realized along with the parameters. These typically include batch statistics in BatchNorm, running averages in Adam, etc.

Type:

List[Tensor]

realize(extra=None)[source]

Realizes all the parameters and buffers on the device. If extra parameters are provided, they will be realized as well.

Parameters:

extra (List[Tensor], optional) – Extra parameters that need to be realized. Defaults to None.

zero_grad()[source]

Sets the gradient of each optimized parameter to None. Call this at the start of each training iteration, before the backward pass.
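
All subclasses share the same per-iteration contract: zero_grad(), a backward pass to populate gradients, then step(), which realizes the updated parameters and buffers. A minimal sketch with a hypothetical parameter and loss:

    from tinygrad.tensor import Tensor
    from tinygrad.nn.optim import SGD  # any Optimizer subclass follows the same contract

    w = Tensor.randn(8, 1, requires_grad=True)  # hypothetical parameter
    opt = SGD([w], lr=0.01)

    for _ in range(3):
        x, y = Tensor.randn(4, 8), Tensor.randn(4, 1)  # hypothetical batch
        loss = ((x.matmul(w) - y) ** 2).mean()
        opt.zero_grad()  # drop gradients from the previous iteration
        loss.backward()  # populate w.grad
        opt.step()       # update w; params and buffers are realized here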

class tinygrad.nn.optim.SGD(params: List[Tensor], lr=0.001, momentum=0, weight_decay=0.0, nesterov=False)[source]

Bases: Optimizer

Implements stochastic gradient descent (optionally with momentum).

params

The parameters to optimize.

Type:

List[Tensor]

lr

Learning rate. Default value is 0.001.

Type:

float

momentum

Momentum factor. Default value is 0, which disables momentum.

Type:

float

weight_decay

Weight decay (L2 penalty). Default value is 0.

Type:

float

nesterov

Enables Nesterov momentum; it only takes effect when momentum is nonzero. Default value is False.

Type:

bool

step() → None[source]

Performs a single optimization step.

This method implements the weight update procedure of stochastic gradient descent, with optional momentum and weight decay.

The actual updates are done in-place, so it’s safe to discard the return value.
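
A construction sketch for the common variants (the parameter tensor is hypothetical; as noted above, nesterov only takes effect when momentum is nonzero):

    from tinygrad.tensor import Tensor
    from tinygrad.nn.optim import SGD

    w = Tensor.randn(10, 10, requires_grad=True)  # hypothetical parameter

    plain     = SGD([w], lr=0.01)                                   # vanilla SGD
    heavyball = SGD([w], lr=0.01, momentum=0.9, weight_decay=1e-4)  # momentum + L2 penalty
    nesterov  = SGD([w], lr=0.01, momentum=0.9, nesterov=True)      # Nesterov momentum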