tinygrad nn.optim
Note
You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org
- tinygrad.nn.optim.Adam(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-08)[source]
Create an Adam optimizer.
This function creates an Adam optimizer for the given parameters with the following default values (a usage sketch follows at the end of this entry):
- lr
Learning rate. Default is 0.001.
- Type:
float
- b1
First beta parameter. Default is 0.9.
- Type:
float
- b2
Second beta parameter. Default is 0.999.
- Type:
float
- eps
Epsilon value to prevent division by zero. Default is 1e-8.
- Type:
float
- Returns:
The LAMB optimizer with Adam settings.
- Return type:
LAMB
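A minimal usage sketch. The toy nn.Linear model, the random data, and the tinygrad.nn.state.get_parameters helper are assumptions for illustration; exact import paths may differ between snapshots.

```python
from tinygrad.tensor import Tensor
from tinygrad import nn
from tinygrad.nn.state import get_parameters
from tinygrad.nn.optim import Adam

# toy regression model and data (illustration only)
model = nn.Linear(4, 1)
x, y = Tensor.randn(8, 4), Tensor.randn(8, 1)

opt = Adam(get_parameters(model), lr=1e-3)  # b1, b2, eps keep their defaults

Tensor.training = True  # some tinygrad versions require training mode for step()
opt.zero_grad()
loss = ((model(x) - y) ** 2).mean()
loss.backward()
opt.step()
```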
- tinygrad.nn.optim.AdamW(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-08, wd=0.01)[source]
Create an AdamW optimizer.
LAMB is essentially just the trust ratio part of LARS applied to Adam/W, so fixing the trust ratio to 1.0 yields plain Adam/W; this function is that special case (a usage sketch follows the parameter list).
- params
A list of Tensors that will be updated by the optimizer.
- Type:
List[Tensor]
- lr
The learning rate. Default is 0.001.
- Type:
float
- b1
Exponential decay rate for the first moment estimates. Default is 0.9.
- Type:
float
- b2
Exponential decay rate for the second moment estimates. Default is 0.999.
- Type:
float
- eps
A small constant added to the denominator to prevent division by zero. Default is 1e-8.
- Type:
float
- wd
Weight decay parameter. Default is 0.01.
- Type:
float
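A sketch of the wd parameter and of the LAMB relationship noted above. The single weight tensor is an illustration; the LAMB construction is shown only to mirror that note, not as a required API.

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import AdamW, LAMB

w = Tensor.randn(16, 16, requires_grad=True)

# AdamW with decoupled weight decay
opt = AdamW([w], lr=3e-4, wd=0.01)

# per the note above: the same optimizer expressed through LAMB with the
# trust ratio disabled (adam=True); eps is passed explicitly because LAMB's
# own default is 1e-6 rather than 1e-8
opt_equiv = LAMB([w], lr=3e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.01, adam=True)
```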
- class tinygrad.nn.optim.LAMB(params: List[Tensor], lr=0.001, b1=0.9, b2=0.999, eps=1e-06, wd=0.0, adam=False)[source]
Bases:
Optimizer
LAMB optimizer class. Adam and AdamW above are built on top of this class (a usage sketch follows the attribute list).
- lr
The learning rate. Defaults to 0.001.
- Type:
float
- b1
The exponential decay rate for the first moment estimates. Defaults to 0.9.
- Type:
float
- b2
The exponential decay rate for the second moment estimates. Defaults to 0.999.
- Type:
float
- eps
A small constant for numerical stability. Defaults to 1e-6.
- Type:
float
- wd
The weight decay coefficient. Defaults to 0.0.
- Type:
float
- adam
Whether to skip the LAMB trust ratio and apply the plain Adam/W update. Defaults to False.
- Type:
bool
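A minimal sketch of using LAMB directly with its full trust ratio (adam=False) and its own eps default of 1e-6. The single-tensor "model" and loss are assumptions for illustration.

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import LAMB

w = Tensor.randn(32, 32, requires_grad=True)
opt = LAMB([w], lr=0.001, wd=0.01)  # adam=False: the per-tensor trust ratio is applied

Tensor.training = True  # some tinygrad versions require training mode for step()
opt.zero_grad()
loss = (w * w).sum()
loss.backward()
opt.step()
```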
- class tinygrad.nn.optim.Optimizer(params: List[Tensor], lr: float)[source]
Bases:
object
The base optimizer class for updating parameters with their gradients (a sketch of the params/buffers split follows the attribute list).
- params
List of parameters to optimize. These parameters are assumed to be differentiable, and their gradient will be used during optimization.
- Type:
List[Tensor]
- lr
Learning rate for the optimizer.
- Type:
float
- device
Device where the parameters are located.
- buffers
List of non-differentiable parameters (or buffers) that need to be realized along with the parameters. These typically include batch statistics in BatchNorm, running averages in Adam, etc.
- Type:
List[Tensor]
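A sketch of how the params/buffers split plays out. The nn.BatchNorm2d and nn.Conv2d layers and the get_parameters helper are assumptions about the surrounding library; only the params and buffers attributes documented here are relied on.

```python
from tinygrad import nn
from tinygrad.nn.state import get_parameters
from tinygrad.nn.optim import SGD

class Net:
    def __init__(self):
        self.bn = nn.BatchNorm2d(4)
        self.conv = nn.Conv2d(4, 8, 3)
    def __call__(self, x):
        return self.conv(self.bn(x)).relu()

net = Net()
opt = SGD(get_parameters(net), lr=0.01)
# trainable weights (requires_grad) end up in opt.params, while the BatchNorm
# running statistics (requires_grad=False) end up in opt.buffers
print(len(opt.params), len(opt.buffers))
```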
- class tinygrad.nn.optim.SGD(params: List[Tensor], lr=0.001, momentum=0, weight_decay=0.0, nesterov=False)[source]
Bases:
Optimizer
Implements stochastic gradient descent (optionally with momentum). A training-loop sketch follows the attribute list.
- lr
Learning rate. Default value is 0.001.
- Type:
float
- momentum
Momentum factor. Default value is 0, which disables momentum.
- Type:
float
- weight_decay
Weight decay (L2 penalty). Default value is 0.
- Type:
float
- nesterov
Enables Nesterov momentum. Default value is False.
- Type:
bool
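A short training-loop sketch with momentum, Nesterov, and weight decay enabled. The toy model, random data, and get_parameters helper are assumptions for illustration.

```python
from tinygrad.tensor import Tensor
from tinygrad import nn
from tinygrad.nn.state import get_parameters
from tinygrad.nn.optim import SGD

model = nn.Linear(10, 2)
opt = SGD(get_parameters(model), lr=0.01, momentum=0.9, weight_decay=1e-4, nesterov=True)

Tensor.training = True  # some tinygrad versions require training mode for step()
for _ in range(3):
    x, y = Tensor.randn(16, 10), Tensor.randn(16, 2)
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
```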