tinygrad codegen.kernel

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org

class tinygrad.codegen.kernel.Kernel(ast: LazyOp, opts: LinearizerOptions | None = None)[source]

Bases: object

The Kernel class represents a single kernel in the linearizer. It contains information about the AST (Abstract Syntax Tree), options, and various buffers and shape trackers used during the linearization process. This class also provides methods for simplifying and optimizing the linearized code.

ast

The abstract syntax tree representing the kernel’s operations.

Type:

LazyOp

opts

The options used during the linearization process.

Type:

Optional[LinearizerOptions]

info

Information about the floating-point operations in the kernel.

Type:

FlopCounter

reduceop

The single allowed reduce operation in an AST, if it exists.

Type:

Optional[Any]

bufs

The list of unique buffers used by the kernel.

Type:

List[Union[MemBuffer, ConstBuffer, LocalBuffer]]

earlybufs

The list of buffers before the reduce operation, if any.

Type:

List[Any]

full_buf_index

The index of the buffer with all axes.

Type:

int

sts

The shape trackers for each buffer in the kernel.

Type:

List[ShapeTracker]

applied_opts

The list of optimization options that have been applied to the kernel.

Type:

List[Opt]

group_for_reduce

The list of grouped dimensions for reduction.

Type:

List[int]

upcasted

The number of upcasted dimensions.

Type:

int

local_dims

The number of local dimensions in the kernel.

Type:

int

local_alias

A dictionary mapping integers to local buffers.

Type:

Dict[int, LocalBuffer]

tensor_core

Information about the tensor core being used, if any.

Type:

Optional[TensorCore]

dont_use_locals

If True, local dimensions are disabled and the optimization operations that rely on them (see apply_opt) are not allowed.

Type:

bool

applied_opts_cache

A cache of optimization options that have been applied to the kernel.

Type:

Optional[List[Opt]]

acc_offsets(i: int) List[int][source]

Calculate accumulator offsets for a given buffer index.

i

The index of the buffer to calculate accumulator offsets for.

Type:

int

Returns:

A list of calculated accumulator offsets.

Return type:

List[int]

alias_buffer(i, pattern)[source]

Alias a buffer.

Parameters:
  • i (int) – Index of the buffer to be aliased.

  • pattern (List) – List representing the pattern for each shape.

apply_opt(opt: Opt)[source]

Apply an optimization to the kernel.

This method first asserts that the operation is permitted given the ‘dont_use_locals’ attribute and the type of the operation, then appends the optimization to applied_opts. The working axis is derived from opt.axis, relative to first_reduce and group_for_reduce where applicable, and defaults to -1 if no axis is given.

Parameters:

opt (Opt) – The optimization operation to apply.

Raises:

AssertionError – If ‘dont_use_locals’ attribute is True and the optimization operation is one of LOCAL, LASTLOCAL, GROUP, GROUPTOP, or UPCASTMID.

applied_opts

A list of previously applied optimization operations.

Type:

List[Opt]

dont_use_locals

If True, some optimization operations are not allowed.

Type:

bool

first_reduce

The index of the first reduction operation.

Type:

int

group_for_reduce

A list of groups for reduction operations.

Type:

list
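The guard described under Raises can be sketched in isolation. This is a minimal model, not tinygrad's code: OptOps below is a local stand-in that only mirrors the documented member names, and check_locals_allowed is a hypothetical helper introduced for illustration.

```python
from enum import Enum

# Stand-in enum (assumption): mirrors the documented OptOps member names, values start at 1.
OptOps = Enum("OptOps", "UPCAST UPCASTMID UNROLL LOCAL LASTLOCAL GROUP GROUPTOP NOLOCALS PADTO")

# Operations the documentation lists as forbidden when dont_use_locals is True.
LOCAL_DEPENDENT_OPS = {OptOps.LOCAL, OptOps.LASTLOCAL, OptOps.GROUP, OptOps.GROUPTOP, OptOps.UPCASTMID}

def check_locals_allowed(op: OptOps, dont_use_locals: bool) -> None:
    """Hypothetical helper: raise AssertionError for local-dependent ops when locals are disabled."""
    assert not (dont_use_locals and op in LOCAL_DEPENDENT_OPS), f"{op.name} is not allowed with dont_use_locals"

check_locals_allowed(OptOps.UPCAST, dont_use_locals=True)   # allowed: UPCAST does not need locals
check_locals_allowed(OptOps.LOCAL, dont_use_locals=False)   # allowed: locals are enabled
```

Calling check_locals_allowed(OptOps.GROUP, dont_use_locals=True) raises AssertionError in this sketch, matching the documented behaviour.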

apply_tensor_cores(use_tensor_cores=1, extra_opts: List[Opt] | None = None)[source]

Apply tensor cores to the computation.

use_tensor_cores

Flag indicating whether to apply tensor cores or not. Default is 1.

Type:

int

extra_opts

Optional list of extra options. Default is None.

Type:

Optional[List[Opt]]

This function checks that the following conditions are met before applying tensor cores:

  1) The use_tensor_cores flag is set.

  2) The current device has local memory support.

  3) A reduction operation exists and it is a summation (ReduceOps.SUM).

  4) The current device supports tensor cores.

If these conditions are met, the function iterates over all available tensor cores for the current device and checks each one against the following:

  1) The tensor core architecture is compatible with the current system.

  2) The reduction operation’s source is a LazyOp and its operation is UnaryOps.CAST with the correct dtype_out.

  3) The multiplication operation (LazyOp with BinaryOps.MUL) exists and its sources are two LazyOps with BufferOps.LOAD operations and dtypes compatible with the tensor core configuration.

  4) The strides of both source buffers for the multiplication operation are zero for the first reduction dimension.

  5) The shape of the buffers is compatible with the tensor core dimensions.

If all these conditions are met, it selects the axes for buffer 0 and buffer 1 and applies tensor cores.

colored_shape(pad: int | None = None, dense=False) str[source]

Generate a string representation of the shape with each dimension colored according to its position.

Parameters:
  • pad (Optional[int]) – The visible width to pad the resulting string to, not counting color codes. If not provided, no padding is added.

  • dense (bool) – If True, int dimensions are printed compactly; if False, each int dimension is formatted to a width of 4 characters. Defaults to False.

Returns:

A string representation of the shape with each dimension colored according to its position.

Return type:

str

self.full_shape

The full shape of the object.

Type:

List[Union[int, str]]

self.colors

A function that returns a list of colors for each dimension.

Type:

Callable[[], List[str]]

ansilen

A function that calculates the length of a string in terminal characters.

Type:

Callable[[str], int]
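To illustrate why a helper like ansilen is needed, here is a minimal sketch of coloring and padding a shape string. It is not tinygrad's implementation: the ANSI color table, the colored helper, and the reading of pad as a target visible width are assumptions made for this example, and the dense option is left out.

```python
import re
from typing import List, Optional, Sequence, Union

# Assumed subset of ANSI foreground color codes.
ANSI_CODES = {"blue": 34, "cyan": 36, "green": 32, "red": 31, "yellow": 33}

def colored(text: str, color: str) -> str:
    """Wrap text in an ANSI color escape sequence and a reset."""
    return f"\x1b[{ANSI_CODES[color]}m{text}\x1b[0m"

def ansilen(s: str) -> int:
    """Length of s as seen in a terminal: escape sequences take no columns."""
    return len(re.sub(r"\x1b\[[0-9;]*m", "", s))

def colored_shape(shape: Sequence[Union[int, str]], colors: List[str], pad: Optional[int] = None) -> str:
    ret = " ".join(colored(str(s), c) for s, c in zip(shape, colors))
    # len(ret) would count escape characters, so the padding is based on ansilen instead.
    if pad:
        ret += " " * (pad - ansilen(ret))
    return ret

print(colored_shape([8, 4, 16], ["blue", "cyan", "red"], pad=12) + "|")
```

The printed string occupies 12 terminal columns before the closing bar, even though the underlying Python string is longer.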

colors() List[str][source]

Generate a list of color codes based on the dimensions of the object.

global_dims

Number of global dimensions

Type:

int

local_dims

Number of local dimensions

Type:

int

first_reduce

Index of the first reduce dimension

Type:

int

group_for_reduce

List of grouped dimensions for reduction

Type:

list

upcast_in_mid_reduce_axes

List of axes where upcasting occurs during mid-reduction

Type:

List[int]

shape_len

Length of the shape vector

Type:

int

upcasted

Number of upcasted dimensions

Type:

int

full_shape

Full shape of the object

Type:

list

sts

The shape trackers for each buffer in the kernel

Type:

List[ShapeTracker]

Returns:

A list of color codes representing different types of dimensions.

Return type:

list
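The way the attributes above combine into one color per dimension can be sketched as follows. This is a simplified model under stated assumptions: the color names, the order of the chunks, and the rule that an upcasted dimension counts as a reduce dimension when the full shape and the output shape differ are illustrative choices, and the function takes plain values instead of a Kernel.

```python
from typing import List, Sequence

def colors(global_dims: int, local_dims: int, group_for_reduce: Sequence[int],
           upcast_in_mid_reduce_axes: Sequence[int], upcasted: int,
           full_shape: Sequence[int], out_shape: Sequence[int]) -> List[str]:
    shape_len = len(full_shape)
    first_reduce = global_dims + local_dims          # since global_dims = first_reduce - local_dims
    grouped_end = first_reduce + len(group_for_reduce)
    ret = ["blue"] * global_dims                     # global dimensions
    ret += ["cyan"] * local_dims                     # local dimensions
    ret += ["white" if i in upcast_in_mid_reduce_axes else "green"
            for i in range(first_reduce, grouped_end)]        # grouped reduce dimensions
    ret += ["red"] * ((shape_len - upcasted) - grouped_end)   # remaining reduce loops
    ret += ["magenta" if full_shape[i] != out_shape[i] else "yellow"
            for i in range(shape_len - upcasted, shape_len)]  # upcasted dimensions
    assert len(ret) == shape_len, "every dimension gets exactly one color"
    return ret

print(colors(1, 1, [], [], 1, full_shape=(8, 4, 16, 4), out_shape=(8, 4, 1, 4)))
# ['blue', 'cyan', 'red', 'yellow']
```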

copy()[source]

Creates a deep copy of the current Kernel object. This can be useful for creating new kernels based on existing ones without modifying the original kernel.

Returns:

A deep copy of the current Kernel object.

property first_reduce: int

Calculate the index of the first reduction axis.

self.sts

The shape trackers for each buffer in the kernel.

Type:

List[ShapeTracker]

self.shape_len

The number of dimensions in each shape in self.sts.

Type:

int

self.upcasted

The number of upcasted dimensions.

Type:

int

Returns:

The index of the first reduction axis.

Return type:

int
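One way to picture this computation, as a sketch rather than the library's code: assume the first reduction axis is the first non-upcasted axis at which the output shape and the full shape differ. The function below takes the two shapes directly; its name matches the property only for readability.

```python
from typing import Sequence

def first_reduce(out_shape: Sequence[int], full_shape: Sequence[int], upcasted: int = 0) -> int:
    n = len(full_shape) - upcasted  # only non-upcasted axes are considered
    # The trailing sentinels (0 vs 1) always differ, so a kernel without a
    # reduce axis yields n instead of raising ValueError.
    pairs = zip(tuple(out_shape[:n]) + (0,), tuple(full_shape[:n]) + (1,))
    return [x != y for x, y in pairs].index(True)

print(first_reduce((8, 4, 1), (8, 4, 16)))  # 2: axis 2 is reduced from 16 to 1
print(first_reduce((8, 4), (8, 4)))         # 2: no reduce axis, falls through to the sentinel
```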

float4_axis(i: int)[source]

Compute the float4 axis.

Parameters:

i (int) – The index for which to compute the float4 axis.

Returns:

A list of integers representing the float4 axis.

Return type:

List[int]

property full_shape: Tuple[Node | int, ...]

Get the shape of the object at index self.full_buf_index in self.sts.

self.sts

The shape trackers for each buffer in the kernel.

Type:

List[ShapeTracker]

self.full_buf_index

The index of the object to get the shape from.

Type:

int

Returns:

The shape of the object at self.full_buf_index in self.sts.

Return type:

Tuple[sint, …]

property full_unupcasted_shape: Tuple[Node | int, ...]

Get the unupcasted shape of the object at index self.full_buf_index in self.sts.

self.full_shape

The shape of the object at self.full_buf_index in self.sts.

Type:

Tuple[sint, …]

self.upcasted

The number of upcasted dimensions.

Type:

int

Returns:

The unupcasted shape of the object at self.full_buf_index in self.sts.

Return type:

Tuple[sint, …]
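A minimal sketch of the relationship between the two shapes, assuming the upcasted dimensions occupy the trailing positions of the full shape. The function is a stand-alone illustration that takes plain values rather than a Kernel.

```python
from typing import Sequence, Tuple

def full_unupcasted_shape(full_shape: Sequence[int], upcasted: int) -> Tuple[int, ...]:
    # Slice by an explicit end index: full_shape[:-upcasted] would return an
    # empty tuple when upcasted is 0.
    return tuple(full_shape[:len(full_shape) - upcasted])

print(full_unupcasted_shape((8, 4, 16, 4), upcasted=1))  # (8, 4, 16)
print(full_unupcasted_shape((8, 4, 16, 4), upcasted=0))  # (8, 4, 16, 4)
```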

get_upcast_dim(i: int) List[int][source]

Get dimensions that need to be upcasted.

i

The index to check for dimensions that need to be upcasted.

Type:

int

Returns:

A list of dimensions that need to be upcasted.

Return type:

List[int]

property global_dims: int

The number of global dimensions, computed as first_reduce minus local_dims.

self.first_reduce

The index of the first reduction axis.

Type:

int

self.local_dims

The number of local dimensions.

Type:

int

Returns:

The difference between self.first_reduce and self.local_dims.

Return type:

int

Notes

This is a property, so it is read like an attribute on an instance of the class rather than called. The kernel’s shape is divided into eight chunks, and the global dimensions form the first of them.

hand_coded_optimizations()[source]

This method handles the application of hand-coded optimizations.

MV_BLOCKSIZE

The block size for matrix-vector multiplication.

Type:

int

MV_THREADS_PER_ROW

The number of threads per row for matrix-vector multiplication.

Type:

int

MV_ROWS_PER_THREAD

The number of rows per thread for matrix-vector multiplication.

Type:

int

limit_dims_to_max(global_max: List[int], local_max: List[int])[source]

Limit dimensions to maximum allowed sizes.

Parameters:
  • global_max (List[int]) – List of maximum allowed global dimension sizes.

  • local_max (List[int]) – List of maximum allowed local dimension sizes.

property membufs: List[MemBuffer]

Membuffers attribute.

Returns:

A list of MemBuffer objects.

Return type:

List[MemBuffer]

property output_shape: Tuple[Node | int, ...]

Get the shape of the first object in self.sts.

self.sts

The shape trackers for each buffer in the kernel.

Type:

List[ShapeTracker]

Returns:

The shape of the first object in self.sts.

Return type:

Tuple[sint, …]

reshape_and_permute(new_shape_fxn, axis)[source]

Apply reshape and permute to all shapetrackers.

Parameters:
  • new_shape_fxn (function) – Function used for reshaping.

  • axis (int) – Axis for permutation.

Returns:

None

property shape_len: int

Get the length of the shape attribute of an object in self.sts.

self.sts

The shape trackers for each buffer in the kernel.

Type:

List[ShapeTracker]

Returns:

The length of the shape attribute of an object in self.sts.

Return type:

int

shape_offsets(i: int)[source]

Compute the offsets of the shape.

Parameters:

i (int) – The index for which to compute the offsets.

Returns:

An iterator that computes the cartesian product of input iterables.

Return type:

itertools.product
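The returned iterator can be pictured with a small stand-alone sketch. It assumes the offsets range over the trailing upcasted dimensions of one buffer's shape, taken in reverse order; both the reversal and the empty-tuple result for a kernel without upcasted dimensions are assumptions of this example.

```python
import itertools
from typing import Iterable, Sequence, Tuple

def shape_offsets(shape: Sequence[int], upcasted: int) -> Iterable[Tuple[int, ...]]:
    if upcasted == 0:
        return [tuple()]  # a single empty offset: nothing is upcasted
    upcasted_dims = shape[len(shape) - upcasted:][::-1]
    return itertools.product(*[range(s) for s in upcasted_dims])

print(list(shape_offsets((8, 2, 3), upcasted=2)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```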

shift_to(axis, amount, top=False, insert_before=None)[source]

Shift elements to a specified location.

Parameters:
  • axis (int) – The axis to pull from.

  • amount (int) – The amount to take.

  • top (bool) – If you want to pull that amount from the top. Default is False.

  • insert_before (int) – Place to insert the new stuff. Default is None, which means end of list.

Returns:

None
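The effect of the parameters is easiest to see on a bare shape. The sketch below is a simplified model operating on a tuple, not on shape trackers: it splits amount out of one axis and moves the new dimension to insert_before. Requiring amount to divide the axis size is an assumption of the sketch.

```python
from typing import Optional, Sequence, Tuple

def shift_to(shape: Sequence[int], axis: int, amount: int, top: bool = False,
             insert_before: Optional[int] = None) -> Tuple[int, ...]:
    shape = list(shape)
    assert shape[axis] % amount == 0, "sketch assumes amount divides the axis size"
    if insert_before is None:
        insert_before = len(shape)           # default: end of the list
    move_axis = axis if top else axis + 1    # position of the split-off dimension
    if move_axis < insert_before:
        insert_before += 1                   # account for the extra dimension created by the split
    rest = shape[axis] // amount
    split = [amount, rest] if top else [rest, amount]
    new_shape = shape[:axis] + split + shape[axis + 1:]
    # Permutation that places move_axis just before insert_before.
    order = ([i for i in range(insert_before) if i != move_axis] + [move_axis] +
             [i for i in range(insert_before, len(new_shape)) if i != move_axis])
    return tuple(new_shape[i] for i in order)

print(shift_to((16, 32), axis=0, amount=4))            # (4, 32, 4)
print(shift_to((16, 32), axis=0, amount=2, top=True))  # (8, 32, 2)
```

In both calls the total number of elements is unchanged; only the factorization and order of the dimensions differ.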

simplify_merge_adjacent()[source]

Simplify by merging adjacent dimensions. This function returns immediately if shape_len is 0; otherwise it merges adjacent dimensions wherever possible. It also handles special cases for image dtypes and updates the shapes and strides accordingly.

Returns:

None
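The merging rule can be sketched for a single buffer described by a shape and its strides. This is a simplified model: the documented method works on every shape tracker at once and has special handling for image dtypes, neither of which is reproduced here, and the merge condition below is an assumption about when two adjacent dimensions are contiguous.

```python
from typing import Sequence, Tuple

def merge_adjacent(shape: Sequence[int], strides: Sequence[int]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    if len(shape) == 0:
        return (), ()  # nothing to merge
    merged = [(shape[0], strides[0])]
    for size, stride in zip(shape[1:], strides[1:]):
        prev_size, prev_stride = merged[-1]
        # Two dimensions merge when stepping the outer one equals walking the
        # whole inner one, or when both are broadcast (stride 0).
        if (stride != 0 and prev_stride == size * stride) or (stride == 0 and prev_stride == 0):
            merged[-1] = (prev_size * size, stride)
        else:
            merged.append((size, stride))
    return tuple(m[0] for m in merged), tuple(m[1] for m in merged)

print(merge_adjacent((2, 3, 4), (12, 4, 1)))  # ((24,), (1,)): fully contiguous
print(merge_adjacent((2, 3, 4), (12, 1, 3)))  # ((2, 3, 4), (12, 1, 3)): permuted, nothing merges
```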

simplify_ones() bool[source]

Simplify by removing dimensions where the shape is all ones. This function returns False immediately if shape_len is 0. Otherwise it decrements local_dims and upcasted for each removed dimension they covered, then reshapes and permutes the shape trackers to drop those dimensions. It returns True if any value in all_ones is True, else False.

Returns:

bool
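A stand-alone sketch of this bookkeeping on plain values, assuming an empty shape is returned unchanged. Instead of mutating a Kernel, the hypothetical function below returns the new shape, the adjusted local_dims and upcasted counts, and the boolean result.

```python
from typing import Sequence, Tuple

def simplify_ones(full_shape: Sequence[int], first_reduce: int, local_dims: int,
                  upcasted: int) -> Tuple[Tuple[int, ...], int, int, bool]:
    shape_len = len(full_shape)
    if shape_len == 0:
        return tuple(full_shape), local_dims, upcasted, False
    all_ones = [s == 1 for s in full_shape]
    # Local dimensions sit directly before first_reduce; upcasted ones are at the end.
    local_dims -= sum(all_ones[first_reduce - local_dims:first_reduce])
    upcasted -= sum(all_ones[shape_len - upcasted:])
    new_shape = tuple(s for s, is_one in zip(full_shape, all_ones) if not is_one)
    return new_shape, local_dims, upcasted, any(all_ones)

print(simplify_ones((8, 1, 16, 1), first_reduce=2, local_dims=1, upcasted=1))
# ((8, 16), 0, 0, True)
```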

upcast()[source]

Mark the final dimension as upcasted by incrementing upcasted, which drops it from the looped dimensions.

Parameters:

None

Returns:

None

Raises:

AssertionError – If the final dimension size is 1, as it cannot be upcasted.

property upcast_in_mid_reduce_axes: List[int]

Get the indices of the grouped reduce axes, from self.first_reduce up to self.first_reduce + len(self.group_for_reduce), where the dimensions are equal in both self.full_shape and self.sts[0].shape.

self.first_reduce

The index of the first reduction axis.

Type:

int

self.group_for_reduce

A list of integers representing groups for reduction.

Type:

List[int]

self.full_shape

The shape of the object at self.full_buf_index in self.sts.

Type:

Tuple[sint, …]

self.sts[0].shape

The shape of the first object in self.sts.

Type:

Tuple[sint, …]

Returns:

A list of grouped reduce axis indices where the dimensions are equal in both self.full_shape and self.sts[0].shape.

Return type:

List[int]
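As a sketch of how the listed attributes might combine: the comparison is assumed to run only over the grouped reduce axes, which is inferred from first_reduce and group_for_reduce appearing among the inputs. The function takes plain values and is not the library's code.

```python
from typing import List, Sequence

def upcast_in_mid_reduce_axes(first_reduce: int, group_for_reduce: Sequence[int],
                              full_shape: Sequence[int], out_shape: Sequence[int]) -> List[int]:
    grouped_axes = range(first_reduce, first_reduce + len(group_for_reduce))
    # Keep the grouped axes whose size is the same in the full shape and the output shape.
    return [j for j in grouped_axes if full_shape[j] == out_shape[j]]

print(upcast_in_mid_reduce_axes(1, [4, 2], full_shape=(8, 4, 2, 16), out_shape=(8, 4, 1, 1)))
# [1]
```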

upcasted_axis(i: int)[source]

Compute the upcasted axis.

Parameters:

i (int) – The index for which to compute the upcasted axis.

Returns:

A list of tuples containing integers and a boolean value.

Return type:

List[Tuple[int, int, bool]]

class tinygrad.codegen.kernel.LinearizerOptions(device: str = '', supports_float4: bool = True, supports_float4_alu: bool = True, has_local: bool = True, has_shared: bool = True, global_max: List[int] | None = None, local_max: List[int] | None = None)[source]

Bases: NamedTuple

A named tuple of target device capabilities and limits used during linearization.

device

The target device for the linearization. Defaults to “”.

Type:

str, optional

supports_float4

Whether the target device supports float4 data type. Defaults to True.

Type:

bool, optional

supports_float4_alu

Whether the target device supports float4 ALU operations. Defaults to True.

Type:

bool, optional

has_local

Whether the target device has local memory. Defaults to True.

Type:

bool, optional

has_shared

Whether the target device has shared memory. Defaults to True.

Type:

bool, optional

global_max

The maximum global dimensions for linearization. Defaults to None.

Type:

Optional[List[int]], optional

local_max

The maximum local dimensions for linearization. Defaults to None.

Type:

Optional[List[int]], optional

device: str

Alias for field number 0

global_max: List[int] | None

Alias for field number 5

has_local: bool

Alias for field number 3

has_shared: bool

Alias for field number 4

local_max: List[int] | None

Alias for field number 6

supports_float4: bool

Alias for field number 1

supports_float4_alu: bool

Alias for field number 2
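The "Alias for field number" entries follow from NamedTuple semantics: each field is reachable by name or by position. The sketch below redeclares the class locally from the documented signature so that it runs without tinygrad; the device string "GPU" is a made-up example value.

```python
from typing import List, NamedTuple, Optional

class LinearizerOptions(NamedTuple):
    # Local stand-in following the documented signature; not imported from tinygrad.
    device: str = ""
    supports_float4: bool = True
    supports_float4_alu: bool = True
    has_local: bool = True
    has_shared: bool = True
    global_max: Optional[List[int]] = None
    local_max: Optional[List[int]] = None

opts = LinearizerOptions(device="GPU", has_local=False)
print(opts.device, opts[0])     # field number 0, by name and by index
print(opts.has_local, opts[3])  # field number 3

# NamedTuples are immutable; _replace returns a modified copy.
no_float4 = opts._replace(supports_float4=False)
print(no_float4.supports_float4, opts.supports_float4)
```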

class tinygrad.codegen.kernel.LocalBuffer(name: str, size: int, dtype: DType = dtypes.float32, realized: None = None)[source]

Bases: NamedTuple

A named tuple for representing a local buffer in memory.

name

The name of the local buffer.

Type:

str

size

The size of the local buffer.

Type:

int

dtype

The data type of the elements in the local buffer. Defaults to dtypes.float32.

Type:

DType, optional

realized

A placeholder for future functionality. Defaults to None.

Type:

None, optional

dtype: DType

Alias for field number 2

name: str

Alias for field number 0

realized: None

Alias for field number 3

size: int

Alias for field number 1

class tinygrad.codegen.kernel.Opt(op: OptOps, axis: int | None = None, amt: int | None = None)[source]

Bases: object

Data class describing a single optimization: the operation, the axis it applies to, and the amount.

op

The operation to perform.

Type:

OptOps

axis

The axis along which the operation is performed. Defaults to None.

Type:

Optional[int]

amt

The amount or value used in the operation. Defaults to None.

Type:

Optional[int]
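A short sketch of how such records might be built and compared. Both classes are local stand-ins written from the documented signatures so the example runs without tinygrad; making the dataclass frozen is an assumption, chosen here so instances are hashable.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class OptOps(Enum):
    # Stand-in enum; auto() numbers the members from 1 in declaration order.
    UPCAST = auto()
    UPCASTMID = auto()
    UNROLL = auto()
    LOCAL = auto()
    LASTLOCAL = auto()
    GROUP = auto()
    GROUPTOP = auto()
    NOLOCALS = auto()
    PADTO = auto()

@dataclass(frozen=True)  # frozen is an assumption of this sketch
class Opt:
    op: OptOps
    axis: Optional[int] = None
    amt: Optional[int] = None

# A hypothetical optimization list: upcast axis 0 by 4, then make 8 of axis 1 local.
schedule = [Opt(OptOps.UPCAST, axis=0, amt=4), Opt(OptOps.LOCAL, axis=1, amt=8)]
for opt in schedule:
    print(opt.op.name, opt.op.value, opt.axis, opt.amt)

print(Opt(OptOps.NOLOCALS))  # axis and amt default to None
```

With members declared in this order, auto() yields the same integer values that the OptOps listing reports, for example UPCAST = 1 and PADTO = 9.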

amt: int | None = None
axis: int | None = None
op: OptOps

class tinygrad.codegen.kernel.OptOps(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

This class represents an enumeration of optimization operations.

UPCAST

Represents the operation to upcast an axis, adding to the kernel’s upcasted dimensions. It is not a data type cast.

Type:

auto()

UPCASTMID

Represents the operation to upcast data types in the middle of a sequence.

Type:

auto()

UNROLL

Represents the operation to unroll a loop.

Type:

auto()

LOCAL

Represents the operation to move part of an axis into the kernel’s local dimensions.

Type:

auto()

LASTLOCAL

Represents the operation to make the last variable in a sequence local.

Type:

auto()

GROUP

Represents the operation to group part of a reduce axis, adding an entry to group_for_reduce.

Type:

auto()

GROUPTOP

Represents the operation to group variables at the top level.

Type:

auto()

NOLOCALS

Represents the operation to disable local dimensions for the kernel (see dont_use_locals).

Type:

auto()

PADTO

Represents the operation to pad an axis up to a multiple of a given amount.

Type:

auto()

GROUP = 6
GROUPTOP = 7
LASTLOCAL = 5
LOCAL = 4
NOLOCALS = 8
PADTO = 9
UNROLL = 3
UPCAST = 1
UPCASTMID = 2

class tinygrad.codegen.kernel.TensorCore(device: str, dims: List[int], dtype_in: DType, dtype_out: DType, threads: List[Tuple[int, int]], upcast_dim: int, thread_local_aliases: List[List[List[int]]], thread_local_sizes: List[int], arch: str | None = None)[source]

Bases: object

Data class describing a tensor core configuration.

device

The device on which the tensor core will be used.

Type:

str

dims

List of integers representing dimensions.

Type:

List[int]

dtype_in

Input data type.

Type:

DType

dtype_out

Output data type.

Type:

DType

threads

List of tuples where each tuple contains a TC dimension and an amount that constructs the warp thread structure.

Type:

List[Tuple[int, int]]

upcast_dim

The TC dimension to upcast.

Type:

int

thread_local_aliases

A list of lists of lists of integers defining the alias for each TC dimension. Each inner list has the form [threads_1, …, threads_n, upcast_1(unrolled), upcast_2(upcast)], where 1 through n refer to warp threads, -1 is upcast, and 0 is unrolled.

Type:

List[List[List[int]]]

thread_local_sizes

List of integers representing the number of elements stored in registers for each TC dimension in each thread.

Type:

List[int]

arch

Optional architecture parameter. Default is None.

Type:

Optional[str]

arch: str | None = None
device: str
dims: List[int]
dtype_in: DType
dtype_out: DType
thread_local_aliases: List[List[List[int]]]
thread_local_sizes: List[int]
threads: List[Tuple[int, int]]
upcast_dim: int