tinygrad tensor

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a snapshot of tinygrad. Upstream: https://tinygrad.org

class tinygrad.tensor.Function(device: str, *tensors: Tensor)[source]

Bases: object

Base class for differentiable operations in the autograd system. Each subclass implements forward and backward.

device

The device where the operation takes place (e.g., “CPU”, “CUDA”).

Type:

str

tensors

The input tensors of the operation, passed positionally to the constructor. They are only kept (as parents) when gradients are needed.

Type:

List[Tensor]

needs_input_grad

List indicating, for each input tensor, whether it requires gradient computation.

Type:

List[bool]

requires_grad

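Whether the output tensor requires gradient computation: True if any input requires a gradient, None if none does but at least one input's requires_grad is None, and False otherwise.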

Type:

Union[bool, None]

parents

The input tensors, kept only when requires_grad is True so that gradients can be propagated back to them.

Type:

List[Tensor]

classmethod apply(*x: Tensor, **kwargs) Tensor[source]

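Class method that applies the function to the input tensors. It instantiates the Function (which serves as the context for the operation), runs forward on the inputs' lazy data, and wraps the result in a new Tensor. When gradients are required and Tensor.no_grad is False, the context is stored on the result so that backward can use it later.

Parameters:
  • fxn (Type[Function]) – The Function subclass being applied. Because apply is a classmethod, it is bound automatically.

  • x (Tensor) – The input tensors, passed positionally.

  • kwargs (Dict[str, Any]) – Additional keyword arguments forwarded to forward.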

Returns:

The result of the operation.

Return type:

Tensor

backward(*args, **kwargs)[source]

Computes the gradients of the inputs from the gradient of the output. Subclasses must override it; it is called during backpropagation (see Tensor.backward).

Raises:

RuntimeError – If not overridden in a subclass.

forward(*args, **kwargs)[source]

Computes the output of the operation. Subclasses must override it; it is called by apply.

Raises:

NotImplementedError – If not overridden in a subclass.
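The forward/backward contract above can be illustrated with a minimal plain-Python sketch on scalars. Value and Mul are hypothetical stand-ins, not tinygrad code: the real Function works on lazy buffers and attaches its context to a Tensor, while this sketch only models the documented attributes (needs_input_grad, requires_grad, parents) and the apply flow.

```python
from typing import List, Optional


class Value:
    """Hypothetical scalar stand-in for Tensor."""

    def __init__(self, data: float, requires_grad: bool = False):
        self.data = data
        self.requires_grad = requires_grad
        self._ctx: Optional["Function"] = None  # the op that produced this value


class Function:
    def __init__(self, *tensors: Value):
        # One flag per input, mirroring needs_input_grad.
        self.needs_input_grad = [t.requires_grad for t in tensors]
        # Simplified: the documented attribute may also be None; here it is only True/False.
        self.requires_grad = any(self.needs_input_grad)
        # Keep the inputs only when a backward pass may be needed.
        self.parents: List[Value] = list(tensors) if self.requires_grad else []

    def forward(self, *args, **kwargs):
        raise NotImplementedError(f"forward not implemented for {type(self)}")

    def backward(self, *args, **kwargs):
        raise RuntimeError(f"backward not implemented for {type(self)}")

    @classmethod
    def apply(cls, *x: Value, **kwargs) -> Value:
        ctx = cls(*x)  # the Function instance doubles as the saved context
        ret = Value(ctx.forward(*[t.data for t in x], **kwargs), requires_grad=ctx.requires_grad)
        if ctx.requires_grad:
            ret._ctx = ctx  # link the output to the op that produced it
        return ret


class Mul(Function):
    def forward(self, a: float, b: float) -> float:
        self.a, self.b = a, b  # save the inputs for backward
        return a * b

    def backward(self, grad_output: float):
        # d(a*b)/da = b and d(a*b)/db = a; None for inputs that need no gradient.
        return (grad_output * self.b if self.needs_input_grad[0] else None,
                grad_output * self.a if self.needs_input_grad[1] else None)
```

For example, Mul.apply(Value(3.0, requires_grad=True), Value(4.0)) returns a Value holding 12.0 whose _ctx.backward(1.0) gives (4.0, None).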

class tinygrad.tensor.Tensor(data: None | int | float | list | LazyBuffer | ndarray | bytes, device: str | None = None, dtype: DType | None = None, requires_grad: bool | None = None)[source]

Bases: object

This class represents a tensor, which is the fundamental unit of data in tinygrad. It can be used for various mathematical operations and machine learning applications.

__slots__

Names of the instance attributes stored in slots.

Type:

tuple of str

__deletable__

Tuple of attributes that can be deleted.

Type:

tuple

training

Class-level flag indicating whether training mode is enabled. It applies to all tensors at once; for example, dropout only has an effect while it is True.

Type:

ClassVar[bool]

no_grad

Class-level flag indicating whether gradient tracking is disabled. When True, operations do not record the computation graph.

Type:

ClassVar[bool]

default_type

Default data type for tensors.

Type:

ClassVar[DType]

property T: Tensor

Returns the transpose of the tensor (transpose() with its default arguments, which swaps the first two axes). For a 2D tensor this swaps rows and columns.

Returns:

The transposed tensor.

Return type:

Tensor

abs()[source]

Compute the absolute value of each element in the tensor.

Returns:

A new tensor with the absolute value of each element.

Return type:

Tensor

acosh()[source]

Calculate the inverse hyperbolic cosine (acosh) of each element in self.

The acosh function is defined for x >= 1 as:

f(x) = log(x + sqrt((x - 1)(x + 1)))

Returns:

A new tensor with acosh applied element-wise.

Return type:

Tensor

It is implemented with the log, square and sqrt methods.

add(x: Tensor | float, reverse=False) Tensor[source]
static arange(start, stop=None, step=1, **kwargs)[source]

Create a tensor with evenly spaced values within a specified range.

Parameters:
  • start (int or float) – The start of the range (inclusive).

  • stop (int or float or None) – The end of the range (exclusive). If omitted, start is used as the end and the range begins at 0.

  • step (int or float) – The spacing between values. Default is 1.

  • kwargs – Additional keyword arguments for Tensor creation.

Returns:

A tensor with evenly spaced values within the specified range.

Return type:

Tensor
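The stop=None convention and the element count can be sketched in plain Python. The construction used here (n copies of step, a running sum, then a shift by start - step) is an assumption about how this snapshot builds arange from full and cumsum; for integer inputs the result matches Python's range.

```python
import math
from itertools import accumulate
from typing import List, Optional, Union

Number = Union[int, float]


def arange(start: Number, stop: Optional[Number] = None, step: Number = 1) -> List[Number]:
    # arange(5) means arange(0, 5): a lone argument is the (exclusive) end.
    if stop is None:
        start, stop = 0, start
    # Number of elements; max() guards against empty ranges in this sketch.
    n = max(0, math.ceil((stop - start) / step))
    # n copies of step, cumulatively summed: step, 2*step, ..., n*step
    running = list(accumulate([step] * n))
    # Shift so that the first element equals start.
    return [v + (start - step) for v in running]
```

For example, arange(2, 10, 3) gives [2, 5, 8], the same as list(range(2, 10, 3)).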

argmax(axis=None, keepdim=False)[source]

Returns the indices of the maximum value along a specified axis.

This method computes the index locations of the maximum values of a tensor’s elements along a given axis.

Parameters:
  • self (Tensor) – Tensor object on which operation is being performed.

  • axis (int, optional) – The axis along which the argmax will be computed. Default value is None, meaning the flattened input tensor is used.

  • keepdim (bool, optional) – If set to True, the output tensor will have the same number of dimensions as the input tensor. Default value is False.

Returns:

A new tensor containing the indices of the maximum values.

Return type:

Tensor

argmin(axis=None, keepdim=False)[source]

Returns the indices of the minimum value along a specified axis.

This method computes the index locations of the minimum values of a tensor’s elements along a given axis.

Parameters:
  • self (Tensor) – Tensor object on which operation is being performed.

  • axis (int, optional) – The axis along which the argmin will be computed. Default value is None, meaning the flattened input tensor is used.

  • keepdim (bool, optional) – If set to True, the output tensor will have the same number of dimensions as the input tensor. Default value is False.

Returns:

A new tensor containing the indices of the minimum values.

Return type:

Tensor
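Both argmax and argmin over a flattened input can be computed without branching: mark the positions equal to the maximum, weight them with a descending index, and take the largest weight, so ties resolve to the first occurrence; argmin then reuses argmax on the negated values. The plain-Python sketch below mirrors what this snapshot appears to do with tensor ops, but treat it as an illustration rather than the exact implementation.

```python
from typing import List


def argmax(values: List[float]) -> int:
    n = len(values)
    m = max(values)
    # Descending weights n-1, ..., 0 so that earlier positions get larger weights.
    weights = [n - 1 - i for i in range(n)]
    # Keep a weight only where the value equals the maximum (bool acts as 0/1).
    idx = max(w * (v == m) for w, v in zip(weights, values))
    # Undo the descending weighting to recover the first matching position.
    return n - idx - 1


def argmin(values: List[float]) -> int:
    # The minimum of x is the maximum of -x.
    return argmax([-v for v in values])
```

For example, argmax([1, 3, 3, 2]) returns 1, the first of the two maxima.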

asinh()[source]

Calculate the inverse hyperbolic sine (asinh) of each element in self.

The asinh function is defined for all real x as:

f(x) = log(x + sqrt(1 + x^2))

Returns:

A new tensor with asinh applied element-wise.

Return type:

Tensor

It is implemented with the log, square and sqrt methods.

assign(x) Tensor[source]

Assign a value to the tensor.

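This method assigns a value to the tensor in place. If x is not already a Tensor, it is first converted to one with this tensor's dtype. Tensors on a ‘DISK’ device are handled separately by copying the data in directly. Otherwise, x must have the same shape and device as this tensor, and an AssertionError is raised if it does not or if x requires a gradient.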

Parameters:

x (Any) – The value to be assigned to the tensor.

Returns:

The tensor with the assigned value.

Return type:

Tensor

atanh()[source]

Calculate the inverse hyperbolic tangent (atanh) of each element in self.

The atanh function is defined for -1 < x < 1 as:

f(x) = log((1 + x) / (1 - x)) / 2

Returns:

A new tensor with atanh applied element-wise.

Return type:

Tensor

It is implemented with the log method.
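The closed forms given for acosh, asinh and atanh can be sanity-checked numerically against Python's math module. This checks the formulas on plain floats only, not tinygrad code; note the domains: acosh needs x >= 1 and atanh needs -1 < x < 1.

```python
import math


def acosh(x: float) -> float:
    return math.log(x + math.sqrt((x - 1) * (x + 1)))  # valid for x >= 1


def asinh(x: float) -> float:
    return math.log(x + math.sqrt(1 + x * x))  # valid for all real x


def atanh(x: float) -> float:
    return math.log((1 + x) / (1 - x)) / 2  # valid for -1 < x < 1


for v in (1.0, 1.5, 10.0):
    assert math.isclose(acosh(v), math.acosh(v), abs_tol=1e-12)
for v in (-3.0, 0.0, 0.5, 2.0):
    assert math.isclose(asinh(v), math.asinh(v), abs_tol=1e-12)
for v in (-0.9, 0.0, 0.3):
    assert math.isclose(atanh(v), math.atanh(v), abs_tol=1e-12)
```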

avg_pool2d(kernel_size=(2, 2), stride=None, dilation=1)[source]

Perform an average pooling operation on the input tensor.

kernel_size

The size of the sliding window over the last two (spatial) dimensions of the input tensor. Default is (2, 2).

Type:

int or tuple

stride

The stride of the sliding window over the same dimensions. If not provided, it defaults to kernel_size.

Type:

int, tuple or None

dilation

The spacing between the kernel points. Default is 1.

Type:

int

Returns:

The average pooled tensor.

Return type:

Tensor
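The windowing that avg_pool2d performs over the last two dimensions (and that max_pool2d performs with max in place of the mean) can be sketched on a single 2D list in plain Python. pool2d and average are hypothetical helpers; batching and channels are ignored, and only windows that fit completely inside the input are kept.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

IntPair = Union[int, Tuple[int, int]]


def _pair(v: IntPair) -> Tuple[int, int]:
    # Accept either a single int or an (h, w) tuple, like kernel_size=(2, 2).
    return (v, v) if isinstance(v, int) else (v[0], v[1])


def average(window: Sequence[float]) -> float:
    return sum(window) / len(window)


def pool2d(x: List[List[float]], reduce: Callable[[Sequence[float]], float],
           kernel_size: IntPair = (2, 2), stride: Optional[IntPair] = None,
           dilation: IntPair = 1) -> List[List[float]]:
    kh, kw = _pair(kernel_size)
    sh, sw = _pair(stride if stride is not None else kernel_size)  # stride defaults to kernel_size
    dh, dw = _pair(dilation)
    h, w = len(x), len(x[0])
    # Extent of one dilated window along each axis.
    eh, ew = dh * (kh - 1) + 1, dw * (kw - 1) + 1
    out = []
    for top in range(0, h - eh + 1, sh):
        row = []
        for left in range(0, w - ew + 1, sw):
            window = [x[top + i * dh][left + j * dw] for i in range(kh) for j in range(kw)]
            row.append(reduce(window))
        out.append(row)
    return out


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(pool2d(img, max))      # [[6, 8], [14, 16]]
print(pool2d(img, average))  # [[3.5, 5.5], [11.5, 13.5]]
```

Passing max reproduces the reduction of max_pool2d and average that of avg_pool2d, on one 2D channel.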

backward() Tensor[source]

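Run backpropagation from this tensor, accumulating gradients into the grad attribute of every tensor in its computation graph that requires a gradient. The tensor must be a scalar; its own gradient is seeded with 1.

Returns:

This tensor.

Return type:

Tensor

Raises:

AssertionError – If this tensor is not a scalar (its shape is not ()).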
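The order in which backward visits the graph can be sketched with a scalar-only toy in plain Python: collect the nodes in topological order (inputs before outputs, as deepwalk does), seed the output gradient with 1, then walk the list in reverse and accumulate each node's contribution into its parents. Node, add, mul, deepwalk and backward here are hypothetical helpers for illustration only.

```python
from typing import Callable, List, Optional, Tuple


class Node:
    def __init__(self, value: float, parents: Tuple["Node", ...] = (),
                 backward_fn: Optional[Callable[[float], Tuple[float, ...]]] = None):
        self.value = value
        self.parents = parents
        self.backward_fn = backward_fn  # maps the output grad to one grad per parent
        self.grad = 0.0


def add(a: Node, b: Node) -> Node:
    return Node(a.value + b.value, (a, b), lambda g: (g, g))


def mul(a: Node, b: Node) -> Node:
    return Node(a.value * b.value, (a, b), lambda g: (g * b.value, g * a.value))


def deepwalk(root: Node) -> List[Node]:
    visited, order = set(), []

    def visit(n: Node) -> None:
        visited.add(n)
        for p in n.parents:
            if p not in visited:
                visit(p)
        order.append(n)  # post-order: parents are appended before n

    visit(root)
    return order


def backward(root: Node) -> None:
    root.grad = 1.0  # d(root)/d(root); root is expected to be a scalar
    for n in reversed(deepwalk(root)):
        if n.backward_fn is None:
            continue  # leaf: nothing to propagate
        for p, g in zip(n.parents, n.backward_fn(n.grad)):
            p.grad += g  # accumulate when a node feeds several consumers
```

With x = 3 and y = 4, backward on z = x * y + x leaves x.grad = 5.0 (y + 1) and y.grad = 3.0 (x).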

batchnorm(weight: Tensor | None, bias: Tensor | None, mean: Tensor, invstd: Tensor) Tensor[source]
binary_crossentropy(y: Tensor) Tensor[source]
binary_crossentropy_logits(y: Tensor) Tensor[source]
bitcast(dtype: DType) Tensor[source]
cast(dtype: DType) Tensor[source]
cat(*args: Tensor, dim: int = 0) Tensor[source]

Concatenate tensors along a given dimension.

This method concatenates the tensor self with other tensors in *args along the specified dimension dim. The tensors must have the same shape except for the dimension along which they are being concatenated. If dim is negative, it counts from the right.

*args

Variable length argument list of tensors to be concatenated.

Type:

Tensor

dim

The dimension along which the tensors will be concatenated. Default is 0.

Type:

int

Returns:

The result of the concatenation operation.

Return type:

Tensor
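One way to concatenate using only padding and addition, which this snapshot appears to use, is to zero-pad each input so that it occupies its own slice of the output and then sum the padded inputs. The plain-Python sketch below shows the idea for 1D lists; cat_by_padding is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def cat_by_padding(*parts: List[float]) -> List[float]:
    sizes = [len(p) for p in parts]
    offsets = [0, *accumulate(sizes)]  # start of each part in the output
    total = offsets[-1]
    out = [0.0] * total
    for p, start in zip(parts, offsets):
        # Pad p with zeros before and after so it spans the full output length ...
        padded = [0.0] * start + list(p) + [0.0] * (total - start - len(p))
        # ... then add it in; the non-zero regions of the padded parts never overlap.
        out = [a + b for a, b in zip(out, padded)]
    return out
```

For example, cat_by_padding([1, 2], [3], [4, 5, 6]) gives [1.0, 2.0, 3.0, 4.0, 5.0, 6.0].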

ceil() Tensor[source]

Round each element of the tensor up to the nearest integer.

The tensor is compared with its truncated version: where an element is greater than its truncation, 1 is added to the truncated value; otherwise the truncated value is returned unchanged.

Returns:

The rounded-up tensor.

Return type:

Tensor
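The truncation-based rounding described for ceil can be checked in plain Python, together with its mirror image for floor (subtract 1 where the truncated value is larger). int() truncates toward zero and plays the role of the truncated tensor; this is a scalar sketch, not the tensor code.

```python
import math


def ceil_via_trunc(x: float) -> float:
    t = float(int(x))  # truncate toward zero
    return t + (x > t)  # add 1 only when a positive fraction was cut off


def floor_via_trunc(x: float) -> float:
    t = float(int(x))
    return t - (x < t)  # subtract 1 only for negative non-integers


for v in (-2.5, -2.0, -0.3, 0.0, 0.3, 2.0, 2.5):
    assert ceil_via_trunc(v) == math.ceil(v)
    assert floor_via_trunc(v) == math.floor(v)
```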

celu(alpha=1.0)[source]

Calculate the Continuously Differentiable Exponential Linear Unit (CELU) activation function.

This method calculates the CELU function for each element in self. The CELU function is defined as:

f(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))

that is, f(x) = x for x > 0 and f(x) = alpha * (exp(x / alpha) - 1) for x <= 0.

Parameters:

alpha (float) – A scaling factor for the negative part of the function, default is 1.0.

Returns:

A new tensor with CELU applied element-wise.

Return type:

Tensor

It is implemented with the maximum, exp and minimum methods.
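A scalar sketch of CELU written as max(0, x) + min(0, alpha * (exp(x / alpha) - 1)), checked against the piecewise definition (x for x > 0, alpha * (exp(x / alpha) - 1) otherwise). Plain Python only; the tensor version applies the same maximum/exp/minimum combination element-wise.

```python
import math


def celu(x: float, alpha: float = 1.0) -> float:
    # max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return max(0.0, x) + min(0.0, alpha * (math.exp(x / alpha) - 1))


def celu_piecewise(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x / alpha) - 1)


for a in (0.5, 1.0, 2.0):
    for v in (-5.0, -1.0, 0.0, 0.5, 3.0):
        assert math.isclose(celu(v, a), celu_piecewise(v, a), abs_tol=1e-12)
```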

chunk(num: int, dim: int = 0) List[Tensor][source]

Splits this tensor into a specific number of chunks along the specified dimension.

Divides this tensor into parts along the specified dimension. Every part except the last has ceil(size / num) elements along dim, so the last part may be smaller and in some cases fewer than num parts are returned (for example, a size of 5 split into 4 chunks yields parts of 2, 2 and 1).

num

The number of chunks to split this tensor into.

Type:

int

dim

The dimension along which to split the tensor. Defaults to 0.

Type:

int, optional

Returns:

A list of tensors, resulting from splitting this tensor.

Return type:

List[Tensor]
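Assuming every chunk except the last has ceil(n / num) elements, the chunk sizes along the split dimension can be computed in plain Python as below; chunk_sizes is a hypothetical helper, and the rule should be confirmed against the source.

```python
import math
from typing import List


def chunk_sizes(n: int, num: int) -> List[int]:
    step = math.ceil(n / num)  # size of every chunk except possibly the last
    return [min(step, n - start) for start in range(0, n, step)]
```

For example, chunk_sizes(10, 3) gives [4, 4, 2], while chunk_sizes(5, 4) gives only three chunks, [2, 2, 1].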

clang() Tensor
clip(min_, max_)[source]

Clip (clamp) each element of the tensor to the range [min_, max_].

min_

The minimum value for clipping.

max_

The maximum value for clipping.

Returns:

A new tensor with values clipped between min_ and max_.

Return type:

Tensor

contiguous()[source]

Ensure that the storage of the tensor is contiguous in memory.

Returns:

A new tensor with contiguous storage.

Return type:

Tensor

contiguous_backward()[source]

Ensure that the gradient of the tensor is contiguous in memory.

Returns:

A new tensor with contiguous gradient storage.

Return type:

Tensor

conv2d(weight: Tensor, bias: Tensor | None = None, groups=1, stride=1, dilation=1, padding=0) Tensor[source]

Perform a 2D convolution operation on the input tensor.

This function convolves the input tensor with the given weight tensor and optionally adds the bias term. The convolution operation supports various optional parameters such as groups, stride, dilation, and padding.

Parameters:
  • self (Tensor) – The input tensor.

  • weight (Tensor) – The weight tensor for the convolution operation.

  • bias (Optional[Tensor]) – An optional bias term to be added after the convolution operation. Default is None.

  • groups (int) – Number of groups in which the input and output channels are divided. Default is 1.

  • stride (int or tuple) – Stride of the 2D convolution operation. Default is 1.

  • dilation (int or tuple) – Dilation factor of the 2D convolution operation. Default is 1.

  • padding (int or tuple) – Padding for the 2D convolution operation. Default is 0.

Returns:

The output tensor after performing the convolution operation, optionally adding the bias term if provided.

Return type:

Tensor

Raises:

AssertionError – If the number of input channels does not match the weight tensor (channels_in must equal groups times the weight's second dimension), or if the padding has the wrong length.

Note

This function assumes that the input tensor has a shape (batch_size, channels_in, height, width) and the weight tensor has a shape (channels_out, channels_in // groups, kernel_height, kernel_width).
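The spatial size of the conv2d output follows the usual convolution arithmetic, out = (in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1. The sketch assumes the same integer padding, stride and dilation on both spatial axes and only computes the output shape; conv2d_output_shape is a hypothetical helper, not part of tinygrad.

```python
from typing import Tuple


def conv2d_output_shape(x_shape: Tuple[int, int, int, int], w_shape: Tuple[int, int, int, int],
                        groups: int = 1, stride: int = 1, dilation: int = 1,
                        padding: int = 0) -> Tuple[int, int, int, int]:
    bs, cin, h, w = x_shape
    cout, cin_per_group, kh, kw = w_shape
    # Mirrors the shape requirement in the note above.
    assert cin == cin_per_group * groups, "input channels do not match weight shape"

    def out_size(size: int, k: int) -> int:
        return (size + 2 * padding - dilation * (k - 1) - 1) // stride + 1

    return (bs, cout, out_size(h, kh), out_size(w, kw))
```

For example, a (1, 3, 32, 32) input with a (16, 3, 3, 3) weight, stride=2 and padding=1 gives (1, 16, 16, 16).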

conv_transpose2d(weight: Tensor, bias: Tensor | None = None, groups=1, stride=1, dilation=1, padding=0, output_padding=0) Tensor[source]

Compute the 2D transposed convolution of input tensor with the specified weight tensor.

self

Input tensor.

Type:

Tensor

weight

Weight tensor.

Type:

Tensor

bias

Bias tensor, if used. Default is None.

Type:

Optional[Tensor]

groups

Number of groups for the convolution. Default is 1.

Type:

int

stride

Stride of the convolution. Default is 1.

Type:

int or tuple

dilation

Spacing between the kernel elements. Default is 1.

Type:

int or tuple

padding

Padding added to both sides of the input. Default is 0.

Type:

int or tuple

output_padding

Additional size added to one side of the output shape. Default is 0.

Type:

int or tuple

Returns:

Output tensor after transposed convolution operation.

Return type:

Tensor
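For reference, the output size of a 2D transposed convolution along each spatial axis is usually (in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1, which inverts the convolution formula. The sketch assumes this PyTorch-style convention holds here and that the weight is laid out as (channels_in, channels_out // groups, kernel_height, kernel_width); both are assumptions to verify against the source, and conv_transpose2d_output_shape is a hypothetical helper.

```python
from typing import Tuple


def conv_transpose2d_output_shape(x_shape: Tuple[int, int, int, int],
                                  w_shape: Tuple[int, int, int, int],
                                  groups: int = 1, stride: int = 1, dilation: int = 1,
                                  padding: int = 0, output_padding: int = 0) -> Tuple[int, int, int, int]:
    bs, cin, h, w = x_shape
    w_cin, cout_per_group, kh, kw = w_shape  # assumed (in, out // groups, kH, kW) layout
    assert cin == w_cin, "input channels do not match weight shape"

    def out_size(size: int, k: int) -> int:
        return (size - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

    return (bs, cout_per_group * groups, out_size(h, kh), out_size(w, kw))
```

With stride=2, padding=1 and output_padding=1, a 15x15 input becomes 30x30, and a stride-2, padding-1 conv2d with a 3x3 kernel maps 30x30 back to 15x15.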

static corealize(lst: Iterable[Tensor])[source]

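Realize several tensors together.

Rather than realizing each tensor independently, this method builds one combined schedule for all tensors in lst and runs it, so computation they share is performed only once.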

Parameters:

lst (Iterable[Tensor]) – An iterable collection of tensors to be realized.

cos()[source]

Calculate the cosine of each element of the tensor, interpreted as an angle in radians.

It is computed with the identity cos(x) = sin(pi/2 - x).

Returns:

A new tensor with the cosine of each element.

Return type:

Tensor

cosh()[source]

Calculate the hyperbolic cosine (cosh) of each element in self.

The cosh function is defined as:

f(x) = (exp(x) + exp(-x)) / 2

Returns:

A new tensor with cosh applied element-wise.

Return type:

Tensor

It is implemented with the exp and neg methods.

cpu() Tensor
cuda() Tensor
cumsum(axis: int = 0) Tensor[source]

Calculate the cumulative sum of this tensor along a specified axis.

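For long axes, the computation is split into two stages: the axis is divided into fixed-size blocks whose cumulative sums are computed independently, and the running total of all preceding blocks is then added to each block. Short axes are handled in a single pass.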

Parameters:
  • self (Tensor) – This tensor.

  • axis (int) – The axis along which to calculate the cumulative sum. Default is 0.

Returns:

The result of the cumulative sum operation.

Return type:

Tensor
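The two-stage idea can be sketched on a plain Python list: split the input into fixed-size blocks, take a running sum inside each block, then add the total of all preceding blocks to every element. The default block size of 4 is arbitrary for illustration; the snapshot is assumed to use a much larger split and to fall back to a single pass for short axes.

```python
from itertools import accumulate
from typing import List


def blocked_cumsum(xs: List[float], block: int = 4) -> List[float]:
    blocks = [xs[i:i + block] for i in range(0, len(xs), block)]
    # Stage 1: independent running sums inside each block.
    local = [list(accumulate(b)) for b in blocks]
    # Stage 2: exclusive running sum of the block totals (0 for the first block).
    offsets = [0, *accumulate(b[-1] for b in local)][:-1]
    return [v + off for b, off in zip(local, offsets) for v in b]
```

For xs = [1, 2, ..., 10] the result equals list(accumulate(xs)) for any block size.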

deepwalk()[source]

Collect the computation graph that produced this tensor, using a depth-first traversal.

Returns:

A list of tensors in topological order: every tensor appears after the tensors it was computed from, and this tensor comes last.

Return type:

List[Tensor]

default_type: ClassVar[DType] = dtypes.float32
detach() Tensor[source]

Returns a new tensor that shares this tensor's data but is detached from the computation graph and does not require a gradient.

Returns:

The detached tensor.

Return type:

Tensor

property device: str

Retrieve the device attribute from the lazydata of the tensor.

Returns:

The device where the tensor is stored (e.g., ‘CPU’, ‘CUDA’).

Return type:

str

disk() Tensor
div(x: Tensor | float, reverse=False) Tensor[source]
dot(w: Tensor) Tensor[source]

Perform a dot product operation between this tensor and another tensor w.

Both tensors must be at least 1D, and the last dimension of this tensor must match the second-to-last dimension of w (or its only dimension when w is 1D).

Parameters:
  • self (Tensor) – This tensor.

  • w (Tensor) – The other tensor.

Returns:

The result of the dot product operation.

Return type:

Tensor
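For 1D and 2D inputs, the contraction rule can be illustrated with a small pure-Python version that uses nested lists as stand-in tensors. Batched (3D and higher) inputs are left out; this illustrates the shape rule and is not tinygrad's implementation.

```python
from typing import List, Union

Vec = List[float]
Mat = List[List[float]]


def dot(a: Union[Vec, Mat], b: Union[Vec, Mat]) -> Union[float, Vec, Mat]:
    a_is_vec = not isinstance(a[0], list)
    b_is_vec = not isinstance(b[0], list)
    A = [a] if a_is_vec else a  # promote a vector to a single row
    B = [[v] for v in b] if b_is_vec else b  # promote a vector to a single column
    # Last dim of a must equal the second-to-last dim of b (its only dim if b is 1D).
    assert len(A[0]) == len(B), f"shape mismatch: {len(A[0])} vs {len(B)}"
    out = [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]
    # Drop the dimensions that were added by the promotion.
    if b_is_vec:
        out = [r[0] for r in out]
    if a_is_vec:
        out = out[0]
    return out
```

For example, dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]) gives [[19, 22], [43, 50]], and dot([1, 2, 3], [4, 5, 6]) gives 32.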

dropout(p=0.5) Tensor[source]
property dtype: DType

Retrieve the dtype attribute from the lazydata of the tensor.

Returns:

The data type of the tensor (e.g., float32, int64).

Return type:

DType

element_size() int[source]
elu(alpha=1.0)[source]

Calculate the Exponential Linear Unit (ELU) activation function.

This method calculates the ELU function for each element in self. The ELU function is defined as:

f(x) = x if x > 0, and f(x) = alpha * (exp(x) - 1) if x <= 0

Parameters:

alpha (float) – A scaling factor for the negative part of the function, default is 1.0.

Returns:

A new tensor with ELU applied element-wise.

Return type:

Tensor

It is implemented with the relu and exp methods.
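The ELU definition (x for x > 0, alpha * (exp(x) - 1) otherwise) can be checked against an equivalent ReLU-based form, relu(x) - alpha * relu(1 - exp(x)), which needs only element-wise primitives. Whether this snapshot uses exactly that form is an assumption; this is a plain-Python scalar sketch.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1)


def elu_via_relu(x: float, alpha: float = 1.0) -> float:
    # For x > 0, 1 - exp(x) < 0, so the second term vanishes;
    # for x <= 0 it equals alpha * (exp(x) - 1).
    return relu(x) - alpha * relu(1 - math.exp(x))


for a in (0.1, 1.0, 2.0):
    for v in (-4.0, -0.5, 0.0, 0.5, 4.0):
        assert math.isclose(elu(v, a), elu_via_relu(v, a), abs_tol=1e-12)
```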

static empty(*shape, **kwargs)[source]

Create an uninitialized tensor.

shape

Shape of the tensor.

Type:

tuple

Returns:

Constructed tensor.

Return type:

Tensor

exp()[source]

Calculate the exponential of each element in the tensor.

Returns:

A new tensor with the exponential of each element.

Return type:

Tensor

exp2()[source]

Calculate 2 raised to the power of each element in the tensor, using the identity 2^x = exp(x * ln 2).

Returns:

A new tensor with the base-2 exponential of each element.

Return type:

Tensor

expand(shape, *args) Tensor[source]

Expands (broadcasts) the tensor to a new shape. Only dimensions of size 1 can be expanded to a larger size.

Parameters:
  • self – The tensor to be expanded.

  • shape (Tuple[int, ...]) – The desired shape of the expanded tensor.

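  • *args – Further sizes, when the shape is given as separate integers (for example, expand(2, 3)). An entry of -1 keeps the size of that dimension unchanged.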

Returns:

The expanded tensor.

Return type:

Tensor
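How the target shape is resolved can be sketched in plain Python: the shape may arrive as one tuple or as separate integers, a -1 keeps the existing size, and only size-1 dimensions may change. resolve_expand_shape is a hypothetical helper, and the requirement that the number of dimensions stays the same is an assumption about this snapshot.

```python
from typing import List, Tuple, Union


def resolve_expand_shape(current: Tuple[int, ...],
                         shape: Union[int, Tuple[int, ...], List[int]], *args: int) -> Tuple[int, ...]:
    # Accept expand((4, 3)) as well as expand(4, 3).
    requested = tuple(shape) if isinstance(shape, (tuple, list)) else (shape, *args)
    # Assumption: expand keeps the number of dimensions.
    assert len(requested) == len(current), "expand keeps the number of dimensions"
    # -1 means "keep this dimension as it is".
    new = tuple(s if r == -1 else r for s, r in zip(current, requested))
    # Only size-1 dimensions can be broadcast to a larger size.
    for s, n in zip(current, new):
        assert s == n or s == 1, f"cannot expand size {s} to {n}"
    return new
```

For example, resolve_expand_shape((1, 3), 4, -1) resolves to (4, 3).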

static eye(dim: int, **kwargs)[source]

Create an identity matrix of the specified dimension.

Parameters:
  • dim (int) – The number of rows and columns in the identity matrix.

  • kwargs – Additional keyword arguments for Tensor creation.

Returns:

An identity matrix of the specified dimension.

Return type:

Tensor
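An identity matrix can be built without any per-element indexing: lay out dim rows of the pattern [1, 0, ..., 0] (length dim + 1), flatten, keep the first dim * dim values and reshape. This snapshot appears to use a pad/flatten/shrink/reshape chain of that kind; the plain-Python sketch below shows why the trick works.

```python
from typing import List


def eye(dim: int) -> List[List[int]]:
    # dim rows of [1, 0, ..., 0], each of length dim + 1, flattened.
    flat = [v for _ in range(dim) for v in [1] + [0] * dim]
    # Keep dim * dim values; with period dim + 1 and row width dim,
    # each 1 lands one column further right in every row.
    flat = flat[:dim * dim]
    return [flat[r * dim:(r + 1) * dim] for r in range(dim)]
```

For example, eye(3) gives [[1, 0, 0], [0, 1, 0], [0, 0, 1]].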

flatten(start_dim=0)[source]

Flattens all dimensions from start_dim onward into a single dimension; the dimensions before start_dim are kept.

start_dim

The starting dimension to flatten from. Default is 0.

Type:

int

Returns:

The flattened tensor.

Return type:

Tensor

flip(axis, *args) Tensor[source]

Flips the tensor along a given axis or a list of axes.

Parameters:
  • self – The tensor to be flipped.

  • axis (int or Tuple[int, ...]) – The axis or axes along which to flip the tensor. If negative, it counts from the last dimension.

  • *args – Further axes, when the axes are given as separate integers (for example, flip(0, 1)).

Returns:

The flipped tensor.

Return type:

Tensor

float() Tensor[source]
floor() Tensor[source]

Round each element of the tensor down to the nearest integer.

The tensor is compared with its truncated version: where an element is less than its truncation, 1 is subtracted from the truncated value; otherwise the truncated value is returned unchanged.

Returns:

The rounded-down tensor.

Return type:

Tensor

static full(shape: Tuple[Node | int, ...], fill_value, **kwargs)[source]

Create a tensor filled with a specified value.

Parameters:
  • shape (Tuple[int, ...]) – The shape of the desired tensor.

  • fill_value (int or float) – The value to fill the tensor with.

  • kwargs – Additional keyword arguments for Tensor creation.

Returns:

A tensor filled with the specified value.

Return type:

Tensor

full_like(fill_value, **kwargs)[source]

Creates a tensor filled with the specified fill_value. The shape of the new tensor is determined by the shape of the calling tensor. The data type and device can be optionally specified using keyword arguments. If not provided, they default to the data type and device of the calling tensor.

fill_value

Value to fill the new tensor with.

Type:

Any

**kwargs

Keyword arguments for specifying additional parameters such as data type (dtype) and device (device).

Returns:

A new tensor filled with fill_value.

Return type:

Tensor

gather(idx: Tensor, dim: int) Tensor[source]

Gather values along a dimension.

Parameters:
  • idx (Tensor) – Index tensor for gathering. Must have the same number of dimensions as self.

  • dim (int) – Dimension to gather along.

Returns:

Gathered tensor.

Return type:

Tensor

Note

AssertionError will be raised if idx.ndim != self.ndim, i.e., if the index tensor does not have the same number of dimensions as the input tensor.
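For a 2D input, gather is assumed to follow the same indexing rule as PyTorch's torch.gather: out[i][j] = self[idx[i][j]][j] for dim=0 and out[i][j] = self[i][idx[i][j]] for dim=1, so the output has the shape of idx. That this snapshot matches PyTorch exactly is an assumption; gather2d is a hypothetical plain-Python helper that only illustrates the rule.

```python
from typing import List


def gather2d(src: List[List[float]], idx: List[List[int]], dim: int) -> List[List[float]]:
    assert dim in (0, 1), "this sketch only handles 2D inputs"
    if dim == 0:
        # The looked-up index replaces the row index.
        return [[src[idx[i][j]][j] for j in range(len(idx[i]))] for i in range(len(idx))]
    # dim == 1: the looked-up index replaces the column index.
    return [[src[i][idx[i][j]] for j in range(len(idx[i]))] for i in range(len(idx))]
```

For example, gather2d([[1, 2], [3, 4]], [[0, 0], [1, 0]], 1) gives [[1, 1], [4, 3]].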

gelu()[source]

Apply the Gaussian Error Linear Unit (GELU) activation function.

This method applies the GELU function to each element in self, using the tanh approximation:

f(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

Returns:

A new tensor with GELU applied element-wise.

Return type:

Tensor

It is implemented with the tanh method.
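The tanh formula for GELU approximates the exact definition 0.5 * x * (1 + erf(x / sqrt(2))). How close the two are can be checked in plain Python with math.erf on a grid of inputs; this is a numerical comparison only, not tinygrad code.

```python
import math


def gelu_tanh(x: float) -> float:
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))


def gelu_exact(x: float) -> float:
    return 0.5 * x * (1 + math.erf(x / math.sqrt(2)))


# Largest absolute difference on the grid -6.00, -5.99, ..., 6.00.
worst = max(abs(gelu_tanh(v / 100) - gelu_exact(v / 100)) for v in range(-600, 601))
print(f"max abs difference on [-6, 6]: {worst:.1e}")
```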

static glorot_uniform(*shape, **kwargs) Tensor[source]

Generate a tensor with random values sampled from a uniform distribution according to the Glorot initialization method.

shape

The shape of the output tensor.

Type:

tuple

Returns:

A tensor with random values sampled from a uniform distribution according to the Glorot initialization method.

Return type:

Tensor
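Glorot (Xavier) uniform initialization draws from U(-b, b) with b = sqrt(6 / (fan_in + fan_out)). The sketch below assumes, as this snapshot appears to, that fan_out is shape[0] and fan_in is the product of the remaining dimensions; it uses Python's random module instead of Tensor.uniform, and glorot_bound and glorot_uniform_flat are hypothetical names.

```python
import math
import random
from typing import List, Tuple


def glorot_bound(shape: Tuple[int, ...]) -> float:
    fan_out, fan_in = shape[0], math.prod(shape[1:])  # assumed convention
    return math.sqrt(6 / (fan_in + fan_out))


def glorot_uniform_flat(shape: Tuple[int, ...], rng: random.Random) -> List[float]:
    b = glorot_bound(shape)
    # Flat list of prod(shape) samples; reshaping is left out of the sketch.
    return [rng.uniform(-b, b) for _ in range(math.prod(shape))]
```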

gpu() Tensor
grad: Tensor | None
half() Tensor[source]
hardswish()[source]

Calculate the Hard Swish activation function.

This method calculates the Hard Swish function for each element in self. The Hard Swish function is defined as:

f(x) = x * min(max(x + 3, 0), 6) / 6

Returns:

A new tensor with Hard Swish applied element-wise.

Return type:

Tensor

It is implemented with the relu6 method, where relu6(x) = min(max(0, x), 6).
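Hard Swish with explicit clamping, x * min(max(x + 3, 0), 6) / 6, agrees with the piecewise form (0 for x <= -3, x for x >= 3, x * (x + 3) / 6 in between). A plain-Python scalar check; relu6 here is a local helper matching the definition above.

```python
import math


def relu6(x: float) -> float:
    return min(max(0.0, x), 6.0)


def hardswish(x: float) -> float:
    return x * relu6(x + 3) / 6


def hardswish_piecewise(x: float) -> float:
    if x <= -3:
        return 0.0
    if x >= 3:
        return x
    return x * (x + 3) / 6


for v in (-5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0):
    assert math.isclose(hardswish(v), hardswish_piecewise(v), abs_tol=1e-12)
```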

hardtanh(min_val=-1, max_val=1)[source]

Apply the HardTanh activation function.

This method applies the HardTanh function to each element in self. The HardTanh function is defined as:

f(x) = max_val if x > max_val; min_val if x < min_val; x otherwise

Parameters:
  • min_val (float) – The minimum value of the output range. Defaults to -1.

  • max_val (float) – The maximum value of the output range. Defaults to 1.

Returns:

A new tensor with HardTanh applied element-wise.

Return type:

Tensor

It is equivalent to clip(min_val, max_val).

hip() Tensor
is_floating_point() bool[source]
item() float | int[source]

Returns the tensor as a Python scalar.

Raises:

AssertionError – If the number of elements in the tensor is not 1.

Returns:

The tensor’s value as a Python scalar.

Return type:

Union[float, int]

static kaiming_normal(*shape, a: float = 0.01, **kwargs) Tensor[source]

Generate a tensor with random values sampled from a normal distribution according to the Kaiming initialization method for weights.

shape

The shape of the output tensor.

Type:

tuple

a

The negative slope of the rectifier used after this layer. Default is 0.01.

Type:

float

Returns:

A tensor with random values sampled from a normal distribution according to the Kaiming initialization method for weights.

Return type:

Tensor

static kaiming_uniform(*shape, a: float = 0.01, **kwargs) Tensor[source]

Generate a tensor with random values sampled from a uniform distribution according to the Kaiming initialization method for weights.

shape

The shape of the output tensor.

Type:

tuple

a

The negative slope of the rectifier used after this layer. Default is 0.01.

Type:

float

Returns:

A tensor with random values sampled from a uniform distribution according to the Kaiming initialization method for weights.

Return type:

Tensor
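Both Kaiming initializers scale the gain sqrt(2 / (1 + a^2)) by 1 / sqrt(fan_in). The sketch assumes fan_in is the product of all dimensions after the first (as for an (out, in) weight) and that the uniform variant uses bound = sqrt(3) * std, which gives it the same variance as the normal variant; kaiming_std and kaiming_bound are hypothetical helpers.

```python
import math
from typing import Tuple


def kaiming_std(shape: Tuple[int, ...], a: float = 0.01) -> float:
    fan_in = math.prod(shape[1:])  # assumed: every dimension after the first
    gain = math.sqrt(2.0 / (1 + a ** 2))  # leaky-ReLU gain for negative slope a
    return gain / math.sqrt(fan_in)


def kaiming_bound(shape: Tuple[int, ...], a: float = 0.01) -> float:
    # U(-b, b) has variance b**2 / 3, so b = sqrt(3) * std matches the normal variant.
    return math.sqrt(3.0) * kaiming_std(shape, a)
```

For a (256, 128) weight with a = 0, kaiming_std gives sqrt(2 / 128) = 0.125.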

layernorm(axis=-1, eps: float = 1e-05) Tensor[source]
lazydata
leakyrelu(neg_slope=0.01)[source]

Apply the Leaky ReLU activation function.

This method applies the Leaky ReLU function to each element in self. For 0 <= neg_slope <= 1, the Leaky ReLU function is:

f(x) = max(x, neg_slope * x)

that is, f(x) = x for x >= 0 and f(x) = neg_slope * x for x < 0.

Parameters:

neg_slope (float) – The slope used for negative inputs. Default is 0.01.

Returns:

A new tensor with Leaky ReLU applied element-wise.

Return type:

Tensor

It is implemented with the relu method, where relu(x) = max(0, x).

linear(weight: Tensor, bias: Tensor | None = None)[source]
llvm() Tensor
log()[source]

Calculate the natural logarithm of each element in the tensor.

Returns:

A new tensor with the natural logarithm of each element.

Return type:

Tensor

log2()[source]

Calculate the base-2 logarithm of each element in the tensor.

Returns:

A new tensor with the base-2 logarithm of each element.

Return type:

Tensor

log_softmax(axis=-1)[source]

Calculate the log softmax of a tensor along a specified axis.

This method computes the logarithm of the softmax of a tensor's elements along a given axis. Softmax turns a vector of numbers into a probability distribution whose elements sum to 1; computing its logarithm directly is more numerically stable than taking log(softmax(x)).

Parameters:
  • self (Tensor) – Tensor object on which operation is being performed.

  • axis (int, optional) – The axis along which the log softmax will be computed. Default value is -1, meaning the last dimension.

m

The input shifted by its maximum along the specified axis (m = self - max(self)), which keeps the exponentials from overflowing.

Type:

Tensor

ss

The sum of exp(m) along the specified axis. The result is m - log(ss).

Type:

Tensor

Returns:

A new tensor containing the log softmax values.
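Subtracting the maximum before exponentiating keeps the computation stable: with m = x - max(x), log_softmax(x) = m - log(sum(exp(m))). A plain-Python sketch on one vector:

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    mx = max(xs)
    m = [x - mx for x in xs]  # shifted logits; the largest becomes 0
    ss = sum(math.exp(v) for v in m)  # sum of exp of the shifted logits, >= 1
    return [v - math.log(ss) for v in m]
```

Large logits such as [1000.0, 1001.0, 1002.0] do not overflow and give the same result as [0.0, 1.0, 2.0].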

static manual_seed(seed=0)[source]

Set the seed for generating random numbers.

seed

Seed value. Defaults to 0.

Type:

int, optional

matmul(x: Tensor, reverse=False) Tensor[source]
max(axis=None, keepdim=False)[source]

Compute the maximum value along a given axis.

This method computes the maximum value of the elements in the input tensor along the specified axis. By default, it computes the maximum value of the flattened tensor.

Parameters:
  • self (Tensor) – The input tensor.

  • axis (int, optional) – Axis along which to operate. Default is None, which means the function will compute the maximum value of the flattened tensor.

  • keepdim (bool, optional) – Whether to retain the original dimension. Default is False.

Returns:

The output tensor containing the maximum values.

Return type:

Tensor

Examples

>>> a = Tensor([[1, 2], [3, 4]])
>>> a.max()
Tensor(4)
>>> a.max(axis=0)
Tensor([3, 4])
>>> a.max(axis=1)
Tensor([2, 4])
max_pool2d(kernel_size=(2, 2), stride=None, dilation=1)[source]

Perform a max pooling operation on the input tensor.

kernel_size

The size of the sliding window over the last two (spatial) dimensions of the input tensor. Default is (2, 2).

Type:

int or tuple

stride

The stride of the sliding window over the same dimensions. If not provided, it defaults to kernel_size.

Type:

int, tuple or None

dilation

The spacing between the kernel points. Default is 1.

Type:

int

Returns:

The max pooled tensor.

Return type:

Tensor

maximum(x: Tensor | float) Tensor[source]
mean(axis=None, keepdim=False)[source]

Compute the mean value along a given axis.

This method computes the average of the elements in the input tensor along the specified axis. By default, it computes the average of the flattened tensor.

Parameters:
  • self (Tensor) – The input tensor.

  • axis (int, optional) – Axis along which to operate. Default is None, which means the function will compute the average of the flattened tensor.

  • keepdim (bool, optional) – Whether to retain the original dimension. Default is False.

Returns:

The output tensor containing the mean values.

Return type:

Tensor

Examples

>>> a = Tensor([[1, 2], [3, 4]])
>>> a.mean()
Tensor(2.5)
>>> a.mean(axis=0)
Tensor([2, 3])
>>> a.mean(axis=1)
Tensor([1.5, 3.5])
metal() Tensor
min(axis=None, keepdim=False)[source]

Compute the minimum value along a given axis.

This method computes the minimum value of the elements in the input tensor along the specified axis. By default, it computes the minimum value of the flattened tensor.

Parameters:
  • self (Tensor) – The input tensor.

  • axis (int, optional) – Axis along which to operate. Default is None, which means the function will compute the minimum value of the flattened tensor.

  • keepdim (bool, optional) – Whether to retain the original dimension. Default is False.

Returns:

The output tensor containing the minimum values.

Return type:

Tensor

Examples

>>> a = Tensor([[1, 2], [3, 4]])
>>> a.min()
Tensor(1)
>>> a.min(axis=0)
Tensor([1, 2])
>>> a.min(axis=1)
Tensor([1, 3])
minimum(x: Tensor | float) Tensor[source]
mish()[source]

Apply the Mish activation function.

This method applies the Mish function to each element in self. The Mish function is defined as:

f(x) = x * tanh(softplus(x)), where softplus(x) = log(1 + exp(x))

Returns:

A new tensor with Mish applied element-wise.

Return type:

Tensor

It is implemented with the softplus and tanh methods.

mul(x: Tensor | float, reverse=False) Tensor[source]
multinomial(num_samples: int = 1, replacement: bool = False) Tensor[source]

Draw samples from a multinomial distribution.

Parameters:
  • self (Tensor) – A 1D or 2D tensor of non-negative weights (they need not sum to 1). For a 2D tensor, each row is treated as a separate distribution.

  • num_samples (int) – Number of samples to draw, must be positive. Default is 1.

  • replacement (bool) – If True, sample with replacement. Default is False.

Returns:

The drawn samples.

Return type:

Tensor

Raises:
  • AssertionError – If the input tensor is not 1D or 2D, or num_samples is not positive.

  • AssertionError – If replacement is False and num_samples > 1; sampling without replacement supports only a single sample.

Internally, a 1D input is treated as a single row. The cumulative sum of each row is normalized into a cumulative distribution function (CDF), uniform samples in [0, 1) are drawn, and the index of each sample is the number of CDF entries that are less than or equal to it.
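The inverse-CDF procedure can be sketched in plain Python for one row of weights: normalize the cumulative sum so it ends at 1, draw u in [0, 1), and count how many CDF entries are less than or equal to u. The random source and the function name are illustrative assumptions; sampling is with replacement.

```python
import random
from itertools import accumulate
from typing import List


def multinomial(weights: List[float], num_samples: int, rng: random.Random) -> List[int]:
    assert num_samples > 0
    cw = list(accumulate(weights))
    cdf = [c / cw[-1] for c in cw]  # normalized so the last entry is 1.0
    samples = []
    for _ in range(num_samples):
        u = rng.random()  # uniform in [0, 1)
        # Index = number of CDF entries not exceeding u.
        samples.append(sum(u >= c for c in cdf))
    return samples
```

With weights [0, 1, 0, 3], indices 0 and 2 are never drawn and index 3 appears about three times as often as index 1.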

nbytes() int[source]
property ndim: int
neg()[source]

Negate the value of each element in the tensor.

Returns:

A new tensor with all elements negated.

Return type:

Tensor

no_grad: ClassVar[bool] = False
static normal(*shape, mean=0.0, std=1.0, **kwargs) Tensor[source]

Generate a tensor with random values sampled from a normal distribution.

shape

The shape of the output tensor.

Type:

tuple

mean

The mean of the normal distribution. Default is 0.0.

Type:

float

std

The standard deviation of the normal distribution. Default is 1.0.

Type:

float

Returns:

A tensor with random values sampled from a normal distribution.

Return type:

Tensor
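Normal samples can be produced from uniform ones with the Box-Muller transform and then shifted and scaled: z = cos(2 * pi * u1) * sqrt(-2 * ln(1 - u2)), sample = mean + std * z. That this snapshot's random normal generator uses this transform is an assumption; the plain-Python sketch below uses the random module as its uniform source.

```python
import math
import random
from typing import List


def normal(n: int, rng: random.Random, mean: float = 0.0, std: float = 1.0) -> List[float]:
    out = []
    for _ in range(n):
        u1, u2 = rng.random(), rng.random()  # both uniform in [0, 1)
        # 1 - u2 lies in (0, 1], so the logarithm is always defined.
        z = math.cos(2 * math.pi * u1) * math.sqrt(-2 * math.log(1 - u2))
        out.append(mean + std * z)
    return out
```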

numel() Node | int[source]
numpy() ndarray[source]

Converts the tensor to a NumPy array.

Raises:

AssertionError – If the shape is symbolic or the dtype cannot be represented in NumPy.

Returns:

The NumPy equivalent of this tensor.

Return type:

np.ndarray

static ones(*shape, **kwargs)[source]

Create a tensor filled with ones.

Parameters:
  • shape (Tuple[int, ...]) – The shape of the desired tensor.

  • kwargs – Additional keyword arguments for Tensor creation.

Returns:

A tensor filled with ones.

Return type:

Tensor

ones_like(**kwargs)[source]

Creates a tensor filled with ones. The shape of the new tensor is determined by the shape of the calling tensor. The data type and device can be optionally specified using keyword arguments. If not provided, they default to the data type and device of the calling tensor.

**kwargs

Keyword arguments for specifying additional parameters such as data type (dtype) and device (device).

Returns:

A new tensor filled with ones.

Return type:

Tensor

pad(arg: Tuple[Tuple[Node | int, Node | int] | None, ...], value: float = 0.0) Tensor[source]

Pad tensor with specified value.

Parameters:
  • arg (Tuple[Optional[Tuple[sint, sint]], ...]) – One entry per dimension: a (before, after) pair giving how many elements of value to add at the start and at the end of that dimension. If None or (0, 0) is provided for a dimension, no padding is added in that dimension.

  • value (float) – The value to fill the padded area with. Default is 0.0.

Returns:

The tensor after padding.

Return type:

Tensor

self

The input tensor to be padded.

arg

Tuple of padding sizes for each dimension.

value

Value used for padding.
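To make the (before, after) convention concrete, here is a pure-Python sketch of constant padding on nested lists. pad_nested is a hypothetical helper, not part of tinygrad, and it assumes every padded dimension is non-empty.

```python
from typing import Any, Optional, Sequence, Tuple


def _filled(template: Any, value: float) -> Any:
    """Return a structure shaped like template with every leaf replaced by value."""
    if isinstance(template, list):
        return [_filled(t, value) for t in template]
    return value


def pad_nested(data: Any, arg: Sequence[Optional[Tuple[int, int]]], value: float = 0.0) -> Any:
    """Hypothetical helper (not tinygrad): arg[i] is (before, after) for dimension i, or None."""
    if not arg:
        return data  # no more dimensions to pad
    before, after = arg[0] if arg[0] is not None else (0, 0)
    inner = [pad_nested(x, arg[1:], value) for x in data]  # pad the inner dimensions first
    return ([_filled(inner[0], value) for _ in range(before)] + inner
            + [_filled(inner[0], value) for _ in range(after)])


print(pad_nested([[1, 2], [3, 4]], ((1, 0), (0, 1))))
# [[0.0, 0.0, 0.0], [1, 2, 0.0], [3, 4, 0.0]]
```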

pad2d(padding: List[int] | Tuple[int, ...], value: float = 0) Tensor[source]

Pad the trailing dimensions of the tensor with a constant value (the last two dimensions when four padding values are given).

Parameters:
  • self (Tensor) – The input tensor.

  • padding (Union[List[int], Tuple[int, ...]]) – A sequence of integers representing the padding values for each side of the tensor. The order is (padding_left, padding_right, padding_top, padding_bottom).

  • value (float) – The padding value, defaults to 0.

Returns:

The output padded tensor.

Return type:

Tensor
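The padding order runs from the last dimension backwards, which is easy to get wrong. The sketch below is a hypothetical helper, not a tinygrad function. It translates pad2d-style padding into one (before, after) pair per dimension, in the form pad's arg uses. The usage line assumes a 4-D tensor, for example batch × channels × height × width.

```python
from typing import Optional, Sequence, Tuple


def pad2d_to_pad_arg(padding: Sequence[int], ndim: int) -> Tuple[Optional[Tuple[int, int]], ...]:
    """Hypothetical helper (not tinygrad): (left, right, top, bottom, ...) -> per-dimension pairs."""
    assert len(padding) % 2 == 0, "padding must come in (before, after) pairs"
    n = len(padding) // 2
    assert n <= ndim, "cannot pad more dimensions than the tensor has"
    # pairs start at the LAST dimension: (left, right) pads the last axis, (top, bottom) the one before
    pairs = [(padding[2 * i], padding[2 * i + 1]) for i in range(n)]
    return tuple([None] * (ndim - n) + pairs[::-1])


print(pad2d_to_pad_arg((1, 2, 3, 4), ndim=4))  # (None, None, (3, 4), (1, 2))
```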

permute(order, *args) Tensor[source]

Permutes the dimensions of the tensor according to a given order.

Parameters:
  • self – The tensor to be permuted.

  • order (Tuple[int, ...]) – The desired order of dimensions.

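  • *args – Further axis indices, so the order can also be passed as separate integers, e.g. permute(1, 0) instead of permute((1, 0)).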

Returns:

The permuted tensor.

Return type:

Tensor
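The two calling conventions, and the rule that output dimension i takes the size of input dimension order[i], can be sketched on shapes alone. Both helpers are hypothetical, not tinygrad's implementation.

```python
from typing import Sequence, Tuple, Union


def normalize_order(order: Union[int, Sequence[int]], *args: int) -> Tuple[int, ...]:
    """Hypothetical helper: accept either permute((2, 0, 1)) or permute(2, 0, 1)."""
    return tuple(order) if isinstance(order, (tuple, list)) else (order, *args)


def permuted_shape(shape: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: output dimension i takes its size from input dimension order[i]."""
    assert sorted(order) == list(range(len(shape))), "order must be a permutation of the axes"
    return tuple(shape[i] for i in order)


print(permuted_shape((4, 5, 6), normalize_order(2, 0, 1)))  # (6, 4, 5)
```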

pow(x: Tensor | float, reverse=False) Tensor[source]
quick_gelu()[source]

Apply a fast approximation of the Gaussian Error Linear Unit (GELU) activation function.

This method applies an approximate GELU function to each element in self. The approximate GELU function is defined as:

f(x) = x * sigmoid(x * 1.702)

Returns:

A new tensor with the approximate GELU function applied element-wise.

Return type:

Tensor

sigmoid[source]

A method that applies the Sigmoid function to the data in self. The Sigmoid function is defined as: f(x) = 1 / (1 + exp(-x))

Type:

method
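To see how close the approximation is, this pure-Python sketch compares quick_gelu with the exact GELU, x * Phi(x), computed with math.erf over [-3, 3]. It uses scalar helper functions, not tinygrad code. On that interval the gap stays small, on the order of a few hundredths at most.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def quick_gelu(x: float) -> float:
    # scalar version of the documented approximation: f(x) = x * sigmoid(1.702 * x)
    return x * sigmoid(1.702 * x)


def gelu_exact(x: float) -> float:
    # exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


xs = [i / 10 for i in range(-30, 31)]
print(max(abs(quick_gelu(x) - gelu_exact(x)) for x in xs))  # largest gap on [-3, 3]
```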

static rand(*shape, **kwargs)[source]

Create a tensor filled with random values drawn uniformly from [0, 1).

Parameters:
  • shape (Tuple[int, ...]) – The shape of the desired tensor.

  • kwargs – Additional keyword arguments (such as dtype and device) passed on to tensor creation.

Returns:

A tensor filled with random values.

Return type:

Tensor

static randint(*shape, low=0, high=10, **kwargs) Tensor[source]

Generates a tensor of the specified shape filled with random integers from a uniform distribution within the range [low, high). The data type can be optionally specified using a keyword argument. If not provided, it defaults to int32.

*shape

Shape of the new tensor.

Type:

int

low

Lower bound of the uniform distribution. Defaults to 0.

Type:

int

high

Exclusive upper bound of the uniform distribution. Defaults to 10.

Type:

int

**kwargs

Additional keyword arguments, such as dtype or device.

Returns:

A new tensor filled with random integers from a uniform distribution.

Return type:

Tensor
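The [low, high) range follows from scaling uniform [0, 1) samples. The plain-Python sketch below uses a hypothetical helper, randint_sketch. It floors the scaled value to get an integer. How tinygrad converts to integers is not described here, so treat the flooring as an assumption.

```python
import math
import random
from typing import List, Optional


def randint_sketch(n: int, low: int = 0, high: int = 10,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical helper (not tinygrad): map uniform [0, 1) floats onto low, ..., high - 1."""
    assert high > low, "the range [low, high) must be non-empty"
    rng = rng or random.Random()
    # u * (high - low) + low lies in [low, high); flooring keeps it there
    return [math.floor(rng.random() * (high - low) + low) for _ in range(n)]


values = randint_sketch(1000, low=-3, high=4, rng=random.Random(1))
print(min(values), max(values))
```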

static randn(*shape, dtype: DType | None = None, **kwargs) Tensor[source]

Generates a tensor of the specified shape filled with random numbers from a normal distribution (mean=0, standard deviation=1). The data type can be optionally specified using a keyword argument. If not provided, it defaults to the default data type.

*shape

Shape of the new tensor.

Type:

int

dtype

Optional data type for the new tensor. Defaults to the default data type if not specified.

Type:

Optional[DType]

**kwargs

Additional keyword arguments, such as device, passed on to tensor creation.

Returns:

A new tensor filled with random numbers from a normal distribution.

Return type:

Tensor
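A common way to turn uniform samples into standard-normal ones is the Box-Muller transform, sketched below in plain Python. This docstring does not say whether this snapshot's randn uses exactly this construction, so treat that as an assumption. box_muller is a hypothetical name. A normal sample with another mean and standard deviation is then mean + std * sample.

```python
import math
import random
from typing import List


def box_muller(n: int, rng: random.Random) -> List[float]:
    """Hypothetical helper (not tinygrad): turn pairs of uniform [0, 1) samples into N(0, 1) samples."""
    out = []
    for _ in range(n):
        u1, u2 = rng.random(), rng.random()
        # 1 - u2 lies in (0, 1], so the logarithm is always finite
        out.append(math.cos(2 * math.pi * u1) * math.sqrt(-2.0 * math.log(1.0 - u2)))
    return out


samples = box_muller(20_000, random.Random(0))
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
print(round(mean, 2), round(std, 2))  # close to 0 and 1
```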

realize() Tensor[source]

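Realize (evaluate) the tensor.

tinygrad tensors are lazy: operations only record pending work on the tensor's lazy data. This method builds a schedule for that work and runs it, so the tensor's data is actually computed.

Returns:

The same tensor (self), now realized.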

Return type:

Tensor

reciprocal()[source]

Calculate and return the element-wise reciprocal of the tensor.

For each element in the tensor, this function calculates its reciprocal (1 divided by the element value). The result is returned as a new tensor with the same shape as the original tensor.

Returns:

A tensor of the same shape as the input tensor, where all elements are replaced with their respective reciprocals.

Return type:

Tensor

relu()[source]

Apply the Rectified Linear Unit (ReLU) function element-wise.

The ReLU function is defined as f(x) = max(0, x): negative elements become 0 and all other elements are left unchanged.

self

The input tensor.

Type:

Tensor

Returns:

A new tensor with ReLU applied to each element.

Return type:

Tensor

relu6()[source]

Calculate the Rectified Linear Unit 6 (ReLU6) activation function.

This method calculates the ReLU6 function for each element in self. The ReLU6 function is defined as:

f(x) = min(max(0, x), 6)

Returns:

A new tensor with the ReLU6 function applied element-wise.

Return type:

Tensor

relu[source]

A method that applies the Rectified Linear Unit (ReLU) function to the data in self. The ReLU function is defined as: f(x) = max(0, x)

Type:

method
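The relu entry above suggests how ReLU6 can be built from ReLU: min(max(0, x), 6) equals relu(x) - relu(x - 6). The scalar check below compares the two forms in plain Python. The helper names are hypothetical, not tinygrad functions, and whether the library uses the second form is an assumption.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def relu6_clamp(x: float) -> float:
    # definition from the docstring: f(x) = min(max(0, x), 6)
    return min(max(0.0, x), 6.0)


def relu6_from_relu(x: float) -> float:
    # the same function built from two ReLUs: the second one cancels any growth above 6
    return relu(x) - relu(x - 6.0)


for x in [-2.0, 0.0, 3.5, 6.0, 9.0]:
    print(x, relu6_clamp(x), relu6_from_relu(x))
```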

repeat(repeats: Sequence[int]) Tensor[source]

Repeats this tensor along each dimension by the specified amounts.

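Generates a new tensor in which this whole tensor is tiled: along each dimension, the tensor is repeated the number of times given by the repeats argument (as with NumPy's tile, not element-wise repetition). If repeats has more entries than the tensor has dimensions, leading dimensions of size 1 are added first.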

repeats

The number of repetitions for each dimension.

Type:

Sequence[int]

Returns:

The repeated tensor.

Return type:

Tensor
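Because the whole tensor is tiled, the behavior can be pinned down with a short reference sketch on nested lists. tile_nested is a hypothetical helper, and for simplicity it assumes len(repeats) equals the number of dimensions.

```python
from typing import Any, Sequence


def tile_nested(data: Any, repeats: Sequence[int]) -> Any:
    """Hypothetical helper (not tinygrad): repeats[i] copies of the whole block along dimension i."""
    if not repeats:
        return data  # a scalar leaf
    # concatenate repeats[0] fresh copies of this level, each tiled along the inner dimensions
    return [tile_nested(x, repeats[1:]) for _ in range(repeats[0]) for x in data]


print(tile_nested([[1, 2], [3, 4]], (2, 1)))  # [[1, 2], [3, 4], [1, 2], [3, 4]]
print(tile_nested([[1, 2]], (1, 3)))          # [[1, 2, 1, 2, 1, 2]]
```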

requires_grad: bool | None
reshape(shape, *args) Tensor[source]

Reshapes the tensor.

Parameters:
  • self – The tensor to be reshaped.

  • shape (Tuple[int, ...]) – The desired shape of the tensor.

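  • *args – Further dimension sizes, so the shape can also be passed as separate integers, e.g. reshape(2, 3) instead of reshape((2, 3)).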

Returns:

The reshaped tensor.

Return type:

Tensor
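Reshaping keeps the elements in row-major (C) order and only changes how they are grouped. The pure-Python reference sketch below shows which element lands where. It uses nested lists instead of tensors, and flatten and reshape_nested are hypothetical helper names.

```python
from math import prod
from typing import Any, List, Sequence


def flatten(data: Any) -> List[Any]:
    """Hypothetical helper: row-major flattening of nested lists."""
    if isinstance(data, list):
        return [x for item in data for x in flatten(item)]
    return [data]


def reshape_nested(data: Any, shape: Sequence[int]) -> Any:
    """Hypothetical helper (not tinygrad): regroup the row-major elements into the new shape."""
    flat = flatten(data)
    assert prod(shape) == len(flat), "reshape must preserve the number of elements"

    def build(offset: int, dims: Sequence[int]) -> Any:
        if not dims:
            return flat[offset]
        step = prod(dims[1:])  # number of elements covered by one entry of this dimension
        return [build(offset + i * step, dims[1:]) for i in range(dims[0])]

    return build(0, list(shape))


print(reshape_nested([[1, 2, 3], [4, 5, 6]], (3, 2)))  # [[1, 2], [3, 4], [5, 6]]
```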

rsqrt()[source]

Calculate the element-wise reciprocal square root of the tensor, f(x) = 1 / sqrt(x).

Returns:

A new tensor with the reciprocal square root of each element.

scaled_dot_product_attention(key: Tensor, value: Tensor, attn_mask: Tensor | None = None, dropout_p: float = 0.0, is_causal: bool = False) Tensor[source]
static scaled_uniform(*shape, **kwargs) Tensor[source]

Generate a tensor with random values sampled uniformly from [-1, 1) and scaled by prod(shape)**-0.5, so every value lies between -prod(shape)**-0.5 and prod(shape)**-0.5.

shape

The shape of the output tensor.

Type:

tuple

Returns:

A tensor of uniform random values from [-1, 1), scaled by prod(shape)**-0.5.

Return type:

Tensor
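A pure-Python sketch of this initializer follows. It assumes the underlying uniform samples lie in [-1, 1), which is the range we understand tinygrad to use here. scaled_uniform_sketch is hypothetical, and it returns a flat list rather than a shaped tensor. Under that assumption, no value can exceed prod(shape)**-0.5 in absolute value.

```python
import math
import random
from typing import List, Sequence


def scaled_uniform_sketch(shape: Sequence[int], rng: random.Random) -> List[float]:
    """Hypothetical helper (not tinygrad): flat list of uniform [-1, 1) samples times prod(shape)**-0.5."""
    n = math.prod(shape)
    scale = n ** -0.5
    # 2u - 1 maps u in [0, 1) onto [-1, 1)
    return [(2.0 * rng.random() - 1.0) * scale for _ in range(n)]


vals = scaled_uniform_sketch((64, 128), random.Random(0))
bound = math.prod((64, 128)) ** -0.5
print(len(vals), max(abs(v) for v in vals) <= bound)  # 8192 True
```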

sequential(ll: List[Callable[[Tensor], Tensor]])[source]
property shape: Tuple[Node | int, ...]

Retrieve the shape attribute from the lazydata of the tensor.

Returns:

A tuple representing the dimensions of the tensor.

Return type:

tuple

shrink(arg: Tuple[Tuple[Node | int, Node | int] | None, ...]) Tensor[source]

Shrink the tensor to a sub-range along one or more dimensions.

Parameters:
  • self – The tensor to be shrunk.

  • arg (Tuple[Optional[Tuple[sint, sint]], ...]) – One entry per dimension: a (start, end) pair selecting the half-open range [start, end) to keep, or None to keep the whole dimension.

Returns:

The shrunken tensor.

Return type:

Tensor
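The following sketch works on nested lists. shrink_nested is a hypothetical helper, not tinygrad code. It reads each entry of arg as a (start, end) pair selecting the half-open range to keep, and as an assumption it rejects ranges that reach outside the data.

```python
from typing import Any, Optional, Sequence, Tuple


def shrink_nested(data: Any, arg: Sequence[Optional[Tuple[int, int]]]) -> Any:
    """Hypothetical helper (not tinygrad): keep [start, end) per dimension; None keeps everything."""
    if not arg:
        return data
    start, end = arg[0] if arg[0] is not None else (0, len(data))
    assert 0 <= start <= end <= len(data), "shrink ranges must stay inside the data"
    return [shrink_nested(x, arg[1:]) for x in data[start:end]]


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(shrink_nested(grid, ((1, 3), (0, 2))))  # [[4, 5], [7, 8]]
print(shrink_nested(grid, (None, (2, 3))))    # [[3], [6], [9]]
```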

sigmoid()[source]

Apply the Sigmoid function element-wise.

The Sigmoid function is defined as f(x) = 1 / (1 + exp(-x)), which maps any real input to a value between 0 and 1.

self

The input tensor.

Type:

Tensor

Returns:

A new tensor with the Sigmoid function applied to each element.

Return type:

Tensor

sign()[source]

Calculate and return the element-wise sign of the tensor.

For each element, this function returns 1 if it is positive, -1 if it is negative, and 0 if it is zero. The result is a new tensor with the same shape as the original tensor.

Returns:

A tensor of the same shape as the input tensor, with 1 for positive elements, -1 for negative elements, and 0 for zeros.

Return type:

Tensor

silu()[source]

Calculate the Sigmoid Linear Unit (SiLU) activation function, also known as the Swish function.

This method applies SiLU to each element in self by delegating to swish(). The SiLU function is defined as:

f(x) = x * sigmoid(x)

Returns:

A new tensor with the SiLU function applied element-wise.

Return type:

Tensor

swish[source]

A method that applies the Swish function to the data in self. The Swish function is defined as: f(x) = x * sigmoid(x)

Type:

method

sin()[source]

Calculate the element-wise sine of the tensor, with values interpreted as angles in radians.

self

The input tensor.

Type:

Tensor

Returns:

A new tensor with the sine of each element.

Return type:

Tensor

sinh()[source]

Calculate the hyperbolic sine (sinh) of each element.

This method calculates the sinh function for each element in self. The sinh function is defined as:

f(x) = (exp(x) - exp(-x)) / 2

Returns:

A new tensor with the sinh function applied element-wise.

Return type:

Tensor

exp[source]

A method that applies the Exponential function to the data in self. The Exponential function is defined as: f(x) = e^x

Type:

method

neg[source]

A method that applies the Negation operation to the data in self. The Negation operation returns an element-wise negative of self.

Type:

method

slice(arg: Sequence[Tuple[int, Node | int] | None], value: float = 0) Tensor[source]

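Slice the tensor, padding wherever the requested range extends past its bounds.

Parameters:
  • arg (Sequence) – One entry per dimension: a (start, end) pair selecting the half-open range [start, end), or None to keep the whole dimension. A start below 0 or an end past the dimension's size is allowed; the out-of-range positions are filled with value.

  • value (float) – Value used for the padded positions. Default is 0.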

Returns:

Sliced tensor.

Return type:

Tensor
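One way to read this behavior is as a pad followed by a shrink. The one-dimensional pure-Python sketch below illustrates that reading. First the data is padded wherever the requested range reaches past it, then the range is cut out. slice_1d is a hypothetical name, not tinygrad code.

```python
from typing import List, Tuple


def slice_1d(data: List[float], bounds: Tuple[int, int], value: float = 0) -> List[float]:
    """Hypothetical helper (not tinygrad): take indices start..end-1, filling out-of-range ones with value."""
    start, end = bounds
    pad_before = max(0, -start)          # how far the range reaches below index 0
    pad_after = max(0, end - len(data))  # how far it reaches past the last element
    padded = [value] * pad_before + list(data) + [value] * pad_after
    # shift the range into the padded list's coordinates, then shrink
    return padded[start + pad_before:end + pad_before]


print(slice_1d([1, 2, 3], (-1, 4)))  # [0, 1, 2, 3, 0]
print(slice_1d([1, 2, 3], (1, 2)))   # [2]
```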

softmax(axis=-1)[source]

Calculate the softmax of the tensor along a given axis.

The softmax function is often used in deep learning models such as neural networks. It computes f(x_i) = exp(x_i) / sum_j exp(x_j) along the given axis, turning the input into a probability distribution: every value lies between 0 and 1, and the values along that axis sum to 1.

Parameters:

axis (int, optional) – Axis along which the softmax is computed. Default is -1.

self

The input tensor.

Type:

Tensor

Returns:

The softmax values of the tensor along the given axis.

Return type:

Tensor
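Along a single axis, the computation reduces to exponentiate-and-normalize. The plain-Python sketch below is a 1-D stand-in, not tinygrad code. It also subtracts the maximum first, a standard trick that leaves the result unchanged but keeps exp() from overflowing.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """1-D softmax (illustration only), shifted by the maximum so exp() cannot overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1.0, 2.0, 3.0]))
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; math.exp(1000.0) on its own would overflow
```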

softplus(beta=1)[source]

Apply the Softplus function.

This method applies the Softplus function to each element in self. The Softplus function is defined as:

f(x) = (1/beta) * log(1 + exp(beta * x))

Parameters:

beta (float) – The beta parameter for the Softplus function. Default is 1.

Returns:

A new tensor with the Softplus function applied element-wise.

Return type:

Tensor
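The formula can be checked numerically. The scalar pure-Python sketch below is not tinygrad code. It evaluates the formula through the equivalent form max(z, 0) + log(1 + exp(-|z|)) with z = beta * x, which avoids overflow for large inputs. The docstring does not say whether the library guards against overflow this way. Larger beta makes the curve approach ReLU.

```python
import math


def softplus(x: float, beta: float = 1.0) -> float:
    """(1/beta) * log(1 + exp(beta * x)), rearranged so large inputs do not overflow."""
    z = beta * x
    # log(1 + e^z) == max(z, 0) + log1p(e^-|z|) for every real z
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


print(softplus(0.0))             # log(2), about 0.693
print(softplus(1000.0))          # 1000.0; the direct formula's exp(1000) would overflow
print(softplus(2.0, beta=50.0))  # very close to max(0, 2) = 2
```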

softsign()[source]

Apply the Softsign function.

This method applies the Softsign function to each element in self. The Softsign function is defined as:

f(x) = x / (1 + |x|)

Returns:

A new tensor with the Softsign function applied element-wise.

Return type:

Tensor

sparse_categorical_crossentropy(Y, ignore_index=-1) Tensor[source]
sqrt()[source]

Calculate the element-wise square root of the tensor.

self

The input tensor.

Type:

Tensor

Returns:

A new tensor with the square root of each element.

Return type:

Tensor

square()[source]

Square each element in the tensor.

self

The input tensor.

Type:

Tensor

Returns:

A new tensor with each element squared.

Return type:

Tensor

squeeze(dim: int | None = None) Tensor[source]

Removes a dimension of size 1 from this tensor.

If dim is given, removes that dimension if it has size 1; if it does not have size 1, the tensor is returned unchanged (matching PyTorch). If dim is not provided, removes all dimensions of size 1. A dim outside the valid range raises an error.

dim

The dimension to remove if it has size 1. Defaults to None.

Type:

Optional[int]

Returns:

The tensor with the removed dimensions of size 1.

Return type:

Tensor
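The sketch below works on shapes only and follows PyTorch's squeeze rules: a dim whose size is not 1 is left unchanged, and an out-of-range dim is an error. Treat those rules as assumptions to check against this snapshot. squeezed_shape is a hypothetical helper, and for simplicity it assumes a shape with at least one dimension.

```python
from typing import Optional, Tuple


def squeezed_shape(shape: Tuple[int, ...], dim: Optional[int] = None) -> Tuple[int, ...]:
    """Hypothetical helper (not tinygrad): shape after a PyTorch-style squeeze of a non-scalar."""
    if dim is None:
        return tuple(s for s in shape if s != 1)  # drop every size-1 dimension
    if not -len(shape) <= dim < len(shape):
        raise IndexError(f"dim {dim} is out of range for {len(shape)} dimensions")
    dim %= len(shape)  # turn a negative index into a positive one
    # a dimension that is not of size 1 is left alone
    return shape if shape[dim] != 1 else shape[:dim] + shape[dim + 1:]


print(squeezed_shape((1, 3, 1)))     # (3,)
print(squeezed_shape((1, 3, 1), 0))  # (3, 1)
print(squeezed_shape((2, 3), 0))     # (2, 3)
```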

static stack(tensors: Sequence[Tensor], dim: int = 0) Tensor[source]

Stacks a sequence of tensors along the specified dimension.

This method joins a sequence of tensors of the same shape along a new dimension: every tensor is unsqueezed at dim, and the results are concatenated along that dimension. The output therefore has one more dimension than the inputs, of size len(tensors) at position dim.

tensors

A sequence of tensors to stack.

Type:

Sequence[Tensor]

dim

The dimension along which to stack the tensors. Defaults to 0.

Type:

int, optional

Returns:

The stacked tensor.

Return type:

Tensor
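A pure-Python reference sketch of stacking same-shaped nested lists along a new dimension follows. stack_nested is a hypothetical helper, not tinygrad code, and it assumes a non-negative dim.

```python
from typing import Any, List, Sequence


def stack_nested(tensors: Sequence[Any], dim: int = 0) -> List[Any]:
    """Hypothetical helper (not tinygrad): stack same-shaped nested lists along a new dimension dim >= 0."""
    if dim == 0:
        return list(tensors)  # the new dimension is the outermost one
    # otherwise walk one level down in every input in lockstep
    return [stack_nested([t[i] for t in tensors], dim - 1) for i in range(len(tensors[0]))]


a, b = [1, 2], [3, 4]
print(stack_nested([a, b], dim=0))  # [[1, 2], [3, 4]]
print(stack_nested([a, b], dim=1))  # [[1, 3], [2, 4]]
```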

std(axis=None, keepdim=False, correction=1)[source]

Calculate the standard deviation of the tensor.

This method computes the standard deviation along a given axis as sqrt(sum((X - mean)**2) / (N - correction)), where X is the tensor, mean is the mean of X along the reduced axes, and N is the number of elements reduced into each output value (all elements when axis is None). With the default correction=1 the divisor is N - 1 (Bessel's correction, the sample standard deviation); with correction=0 it is N (the population standard deviation).

Parameters:
  • axis (int or None, optional) – Axis along which the standard deviation is calculated. If None (the default), compute the standard deviation of the flattened tensor.

  • keepdim (bool, optional) – If True, the reduced axes are kept in the result as dimensions of size one, so the result broadcasts correctly against the input tensor. Default is False.

  • correction (int, optional) – Value subtracted from N in the divisor: 1 applies Bessel's correction, 0 does not. Default is 1.

self

The input tensor.

Type:

Tensor

Returns:

The standard deviation of the tensor along the given axis.

Return type:

Tensor
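The role of correction is easiest to see on a small data set. The pure-Python sketch below reduces over all elements, like axis=None. It is an illustration, not tinygrad code.

```python
import math
from typing import Sequence


def std(xs: Sequence[float], correction: int = 1) -> float:
    """Illustration only: sqrt(sum((x - mean)**2) / (N - correction)) over all elements of xs."""
    n = len(xs)
    assert n > correction, "need more elements than the correction term"
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - correction))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data, correction=0))  # 2.0: population standard deviation (divide by N = 8)
print(std(data))                # sqrt(32 / 7): sample standard deviation (divide by N - 1 = 7)
```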

sub(x: Tensor | float, reverse=False) Tensor[source]
sum(axis=None, keepdim=False)[source]

Computes the sum of the tensor along the specified axes.

self

The input tensor.

Type:

Tensor

axis

The axis/axes along which to compute the sum. Default is None.

Type:

Optional[Union[int, Tuple[int, ...]]]

keepdim

Whether or not to retain the reduced dimensions. Default is False.

Type:

bool

Returns:

A tensor holding the sum over the given axes (over all elements when axis is None).

Return type:

Tensor
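The effect of axis and keepdim on the result's shape can be summarized with a shape-only sketch. reduced_shape is a hypothetical helper. It assumes axis values are in range and normalizes negative ones Python-style.

```python
from typing import Optional, Tuple, Union


def reduced_shape(shape: Tuple[int, ...], axis: Optional[Union[int, Tuple[int, ...]]] = None,
                  keepdim: bool = False) -> Tuple[int, ...]:
    """Hypothetical helper (not tinygrad): output shape of a sum over axis."""
    if axis is None:
        axes = set(range(len(shape)))  # reduce over every dimension
    else:
        requested = axis if isinstance(axis, tuple) else (axis,)
        axes = {a % len(shape) for a in requested}  # normalize negative axes
    if keepdim:
        return tuple(1 if i in axes else s for i, s in enumerate(shape))
    return tuple(s for i, s in enumerate(shape) if i not in axes)


print(reduced_shape((2, 3, 4), axis=1))                # (2, 4)
print(reduced_shape((2, 3, 4), axis=1, keepdim=True))  # (2, 1, 4)
print(reduced_shape((2, 3, 4)))                        # ()
```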

swish()[source]

Calculate the Swish activation function.

This method calculates the Swish function for each element in self. The Swish function is defined as:

f(x) = x * sigmoid(x)

Returns:

A new tensor with the Swish function applied element-wise.

Return type:

Tensor

sigmoid[source]

A method that applies the Sigmoid function to the data in self. The Sigmoid function is defined as: f(x) = 1 / (1 + exp(-x))

Type:

method

tan()[source]

Calculate the element-wise tangent of the tensor, with values interpreted as angles in radians.

The tangent is computed as sin(x) / cos(x) for each element.

Returns:

A new tensor with the tangent of each element.

tanh()[source]

Calculate the Hyperbolic Tangent (tanh) activation function.

This method calculates the tanh function for each element in self. The tanh function is defined as:

f(x) = 2 * sigmoid(2 * x) - 1

Returns:

A new tensor with the tanh function applied element-wise.

Return type:

Tensor

sigmoid[source]

A method that applies the Sigmoid function to the data in self. The Sigmoid function is defined as: f(x) = 1 / (1 + exp(-x))

Type:

method
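The sigmoid form is an exact identity, tanh(x) = 2 * sigmoid(2x) - 1, not an approximation. The pure-Python check below compares it against math.tanh using scalar helpers, not tinygrad code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    # f(x) = 2 * sigmoid(2x) - 1, the form given in the docstring
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(x, tanh_via_sigmoid(x), math.tanh(x))
```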

to(device: str | None) Tensor[source]

Moves the tensor to a specified device (if different from its current device).

Parameters:

device (Optional[str]) – The target device. If None or equal to the current device, does nothing.

Returns:

The tensor on the target device.

Return type:

Tensor

to_(device: str | None)[source]

Moves the tensor in-place to a specified device (if different from its current device).

Parameters:

device (Optional[str]) – The target device. If None or equal to the current device, does nothing.

Returns:

Nothing; the tensor is modified in place.

Return type:

None

torch() Tensor
class train(val=True)[source]

Bases: object

training: ClassVar[bool] = False
transpose(ax1=1, ax2=0) Tensor[source]

Transposes the tensor along the specified axes.

self

The input tensor.

Type:

Tensor

ax1

The first axis to be transposed. Default is 1.

Type:

int

ax2

The second axis to be transposed. Default is 0.

Type:

int

Returns:

The transposed tensor.

Return type:

Tensor
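Swapping two axes is a special case of permute. The sketch below computes the axis order that transpose(ax1, ax2) corresponds to, and that list could be passed to permute. transpose_order is a hypothetical helper, not tinygrad code.

```python
from typing import List


def transpose_order(ndim: int, ax1: int = 1, ax2: int = 0) -> List[int]:
    """Hypothetical helper (not tinygrad): permutation swapping ax1 and ax2, others unchanged."""
    order = list(range(ndim))
    order[ax1], order[ax2] = order[ax2], order[ax1]  # negative axes index from the end
    return order


print(transpose_order(3))          # [1, 0, 2]
print(transpose_order(3, -1, -2))  # [0, 2, 1]
```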

tril(k: int = 0) Tensor[source]

Return the lower triangular part of the tensor: all elements above the k-th diagonal are set to zero.

The operation applies to the last two dimensions. An element at row i and column j is kept when j - i <= k and set to zero otherwise, so k = 0 is the main diagonal, k > 0 keeps extra diagonals above it, and k < 0 also zeroes diagonals below it. A triangular mask is built with the _tri function and combined with the original tensor using where().

Parameters:

k (int, optional) – Diagonal offset (default=0).

Returns:

Tensor representing the lower triangular matrix.

Return type:

Tensor

triu(k: int = 0) Tensor[source]

Return the upper triangular part of the tensor: all elements below the k-th diagonal are set to zero.

The operation applies to the last two dimensions. An element at row i and column j is kept when j - i >= k and set to zero otherwise, so k = 0 is the main diagonal, k < 0 keeps extra diagonals below it, and k > 0 also zeroes diagonals above it. A triangular mask is built with the _tri function and combined with the original tensor using where().

Parameters:

k (int, optional) – Diagonal offset (default=0).

Returns:

Tensor representing the upper triangular matrix.

Return type:

Tensor
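For a 2-D input, the k-th diagonal is the set of positions where column minus row equals k. The plain-Python reference sketch below builds both masks on nested lists rather than tensors. tril and triu here are illustrative stand-ins, not the library methods.

```python
from typing import List


def tril(m: List[List[float]], k: int = 0) -> List[List[float]]:
    """Illustration only: keep m[i][j] where j - i <= k; zero everything above the k-th diagonal."""
    return [[x if j - i <= k else 0 for j, x in enumerate(row)] for i, row in enumerate(m)]


def triu(m: List[List[float]], k: int = 0) -> List[List[float]]:
    """Illustration only: keep m[i][j] where j - i >= k; zero everything below the k-th diagonal."""
    return [[x if j - i >= k else 0 for j, x in enumerate(row)] for i, row in enumerate(m)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tril(m))       # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
print(triu(m, k=1))  # [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
```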

trunc() Tensor[source]

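Truncate each element toward zero, discarding its fractional part.

This is done by casting the tensor to int32, making it contiguous, and casting it back to its original data type. Because of the int32 round trip, the results are only meaningful for values within the int32 range.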

self

The input tensor.

Type:

Tensor

Returns:

The truncated tensor.

Return type:

Tensor
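Casting to an integer drops the fractional part, which rounds toward zero rather than down. The two only differ for negative non-integers. Python's int() rounds toward zero in the same way, so comparing it with math.floor shows the difference. This is plain Python, not tinygrad code.

```python
import math

values = [2.7, -2.7, 0.5, -0.5]
truncated = [float(int(v)) for v in values]       # int() rounds toward zero, like a cast
floored = [float(math.floor(v)) for v in values]  # floor rounds toward negative infinity
print(truncated)  # [2.0, -2.0, 0.0, 0.0]
print(floored)    # [2.0, -3.0, 0.0, -1.0]
```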

static uniform(*shape, low=0.0, high=1.0, **kwargs) Tensor[source]

Generate a tensor with random values sampled from a uniform distribution.

shape

The shape of the output tensor.

Type:

tuple

low

The lower bound of the uniform distribution. Default is 0.0.

Type:

float

high

The upper bound of the uniform distribution. Default is 1.0.

Type:

float

Returns:

A tensor with random values sampled from a uniform distribution.

Return type:

Tensor

unsqueeze(dim: int) Tensor[source]

Add a dimension to the tensor at the specified index.

Parameters:
  • self (Tensor) – The input tensor.

  • dim (int) – The index where the new dimension will be added. Negative values count from the end of the output shape, so -1 adds a new last dimension.

Returns:

The output tensor with the additional dimension.

Return type:

Tensor

Raises:

ValueError – If dim is not a valid index for the new dimension.
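The shape-only sketch below shows where the new dimension goes. unsqueezed_shape is a hypothetical helper. It assumes PyTorch-style handling of a negative dim, counted against the output's dimensions, and raises ValueError for out-of-range values as the docstring describes.

```python
from typing import Tuple


def unsqueezed_shape(shape: Tuple[int, ...], dim: int) -> Tuple[int, ...]:
    """Hypothetical helper (not tinygrad): insert a size-1 dimension at index dim of the output."""
    ndim = len(shape) + 1  # the output has one more dimension
    if not -ndim <= dim < ndim:
        raise ValueError(f"dim {dim} is out of range for a {ndim}-D result")
    if dim < 0:
        dim += ndim  # -1 means "new last dimension"
    return shape[:dim] + (1,) + shape[dim:]


print(unsqueezed_shape((2, 3), 0))   # (1, 2, 3)
print(unsqueezed_shape((2, 3), -1))  # (2, 3, 1)
```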

webgpu() Tensor
where(input_: Tensor | float, other: Tensor | float)[source]
wino = 0
static zeros(*shape, **kwargs)[source]

Create a tensor filled with zeros.

Parameters:
  • shape (Tuple[int, ...]) – The shape of the desired tensor.

  • kwargs – Additional keyword arguments for Tensor creation.

Returns:

A tensor filled with zeros.

Return type:

Tensor

zeros_like(**kwargs)[source]

Creates a tensor filled with zeros. The shape of the new tensor is determined by the shape of the calling tensor. The data type and device can be optionally specified using keyword arguments. If not provided, they default to the data type and device of the calling tensor.

**kwargs

Keyword arguments for specifying additional parameters such as data type (dtype) and device (device).

Returns:

A new tensor filled with zeros.

Return type:

Tensor

tinygrad.tensor.custom_random(out: Buffer)[source]