tinygrad codegen.kernel
Note
You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org
- class tinygrad.codegen.kernel.Kernel(ast: LazyOp, opts: LinearizerOptions | None = None)[source]
Bases:
object
The Kernel class represents a single kernel in the linearizer. It contains information about the AST (Abstract Syntax Tree), options, and various buffers and shape trackers used during the linearization process. This class also provides methods for simplifying and optimizing the linearized code.
- opts
The options used during the linearization process.
- Type:
Optional[LinearizerOptions]
- info
Information about the floating-point operations in the kernel.
- Type:
- reduceop
The single allowed reduce operation in an AST, if it exists.
- Type:
Optional[Any]
- bufs
The list of unique buffers used by the kernel.
- Type:
List[Union[MemBuffer, ConstBuffer, LocalBuffer]]
- earlybufs
The list of buffers before the reduce operation, if any.
- Type:
List[Any]
- full_buf_index
The index of the buffer with all axes.
- Type:
int
- sts
The shape trackers for each buffer in the kernel.
- Type:
List[ShapeTracker]
- applied_opts
The list of optimization options that have been applied to the kernel.
- Type:
List[Opt]
- group_for_reduce
The list of grouped dimensions for reduction.
- Type:
List[int]
- upcasted
The number of upcasted dimensions in the kernel.
- Type:
int
- local_dims
The number of local dimensions in the kernel.
- Type:
int
- local_alias
A dictionary mapping integers to local buffers.
- Type:
Dict[int, LocalBuffer]
- tensor_core
Information about the tensor core being used, if any.
- Type:
Optional[TensorCore]
- dont_use_locals
A flag indicating that local buffers must not be used in the kernel. If True, some optimization operations are not allowed.
- Type:
bool
- applied_opts_cache
A cache of optimization options that have been applied to the kernel.
- Type:
Optional[List[Opt]]
- acc_offsets(i: int) List[int] [source]
Calculate access offsets for a given index.
- i
The index to calculate access offsets for.
- Type:
int
- Returns:
A list of calculated access offsets.
- Return type:
List[int]
- alias_buffer(i, pattern)[source]
Alias a buffer.
- Parameters:
i (int) – Index of the buffer to be aliased.
pattern (List) – List representing the pattern for each shape.
- apply_opt(opt: Opt)[source]
Apply an optimization to the current object.
This method checks if the optimization operation is applicable based on the ‘dont_use_locals’ attribute and the type of the operation. It then appends the optimization to a list of applied optimizations. The axis for the optimization is calculated based on certain conditions and defaulted to -1 if no specific axis is given.
- Parameters:
opt (Opt) – The optimization operation to apply.
- Raises:
AssertionError – If ‘dont_use_locals’ attribute is True and the optimization operation is one of LOCAL, LASTLOCAL, GROUP, GROUPTOP, or UPCASTMID.
- dont_use_locals
If True, some optimization operations are not allowed.
- Type:
bool
- first_reduce
The index of the first reduction operation.
- Type:
int
- group_for_reduce
A list of groups for reduction operations.
- Type:
list
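The locals check and the axis default described for apply_opt can be sketched in isolation. This is a minimal standalone illustration, not the tinygrad implementation: the OptOps members and values mirror the enumeration documented on this page, the Opt fields mirror its documented signature, and check_opt_allowed and resolve_axis are hypothetical helper names introduced only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OptOps(Enum):
    # Member names and values copied from the OptOps listing on this page.
    UPCAST = 1
    UPCASTMID = 2
    UNROLL = 3
    LOCAL = 4
    LASTLOCAL = 5
    GROUP = 6
    GROUPTOP = 7
    NOLOCALS = 8
    PADTO = 9


@dataclass(frozen=True)
class Opt:
    # Stand-in for the documented Opt(op, axis=None, amt=None) signature.
    op: OptOps
    axis: Optional[int] = None
    amt: Optional[int] = None


# Operations the page lists as disallowed when dont_use_locals is True.
OPS_NEEDING_LOCALS = {
    OptOps.LOCAL, OptOps.LASTLOCAL, OptOps.GROUP, OptOps.GROUPTOP, OptOps.UPCASTMID,
}


def check_opt_allowed(opt: Opt, dont_use_locals: bool) -> None:
    """Hypothetical helper: raise AssertionError for a locals-based opt when locals are disabled."""
    assert not dont_use_locals or opt.op not in OPS_NEEDING_LOCALS, "not using locals"


def resolve_axis(opt: Opt) -> int:
    """Hypothetical helper: fall back to -1 when the opt carries no axis."""
    return -1 if opt.axis is None else opt.axis
```

With dont_use_locals set, check_opt_allowed(Opt(OptOps.UPCAST, 0, 4), True) passes, while the same call with OptOps.LOCAL raises AssertionError.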
- apply_tensor_cores(use_tensor_cores=1, extra_opts: List[Opt] | None = None)[source]
Apply tensor cores to the computation.
- use_tensor_cores
Flag indicating whether to apply tensor cores or not. Default is 1.
- Type:
int
This function checks if the following conditions are met for applying tensor cores:
1) The use_tensor_cores flag is True.
2) The current device has local memory support.
3) A reduction operation exists and it’s a summation (ReduceOps.SUM).
4) The current device supports tensor cores.
If these conditions are met, the function iterates over all available tensor cores for the current device. It then checks if the following conditions hold true to apply tensor cores:
1) The tensor core architecture is compatible with the current system.
2) The reduction operation’s source is a LazyOp and its operation is UnaryOps.CAST with the correct dtype_out.
3) The multiplication operation (LazyOp with BinaryOps.MUL) exists and its sources are two LazyOps with BufferOps.LOAD operations and dtypes compatible with the tensor core configuration.
4) The strides of both source buffers for the multiplication operation are zero for the first reduction dimension.
5) The shape of the buffers is compatible with the tensor core dimensions.
If all these conditions are met, it selects the axes for buffer 0 and buffer 1 and applies tensor cores.
- colored_shape(pad: int | None = None, dense=False) str [source]
Generate a string representation of the shape with each dimension colored according to its position.
- Parameters:
pad (Optional[int]) – The total visible width to which the resulting string is padded with trailing spaces. If not provided, no padding is added.
dense (bool) – If True, int dimensions are printed compactly; if False, each int dimension is right-aligned to a width of 4 characters. Defaults to False.
- Returns:
A string representation of the shape with each dimension colored according to its position.
- Return type:
str
- self.full_shape
The full shape of the object.
- Type:
List[Union[int, str]]
- self.colors
A function that returns a list of colors for each dimension.
- Type:
Callable[[], List[str]]
- ansilen
A function that calculates the length of a string in terminal characters.
- Type:
Callable[[str], int]
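The interaction of pad, dense and ansilen can be sketched as follows. This is a standalone approximation under stated assumptions: colored, ansilen and the ANSI_CODES table are simplified stand-ins written for this example (not the tinygrad helpers), and dense=True is assumed to skip the width-4 alignment of int dimensions.

```python
import re
from typing import List, Optional, Sequence, Union

# Hypothetical subset of ANSI foreground color codes, for illustration only.
ANSI_CODES = {"blue": 34, "cyan": 36, "red": 31, "yellow": 33}


def colored(text: str, color: str) -> str:
    """Wrap text in an ANSI color escape sequence."""
    return f"\x1b[{ANSI_CODES[color]}m{text}\x1b[0m"


def ansilen(text: str) -> int:
    """Length of text as displayed in a terminal, ignoring ANSI color escapes."""
    return len(re.sub(r"\x1b\[[0-9;]*m", "", text))


def colored_shape(full_shape: Sequence[Union[int, str]], colors: List[str],
                  pad: Optional[int] = None, dense: bool = False) -> str:
    # Int dimensions are right-aligned to width 4 unless dense is requested.
    parts = [f"{s:4d}" if isinstance(s, int) and not dense else str(s) for s in full_shape]
    ret = " ".join(colored(p, c) for p, c in zip(parts, colors))
    # Pad on the visible width, so escape sequences do not count toward it.
    if pad:
        ret += " " * (pad - ansilen(ret))
    return ret
```

Measuring with ansilen rather than len is what keeps columns aligned when several colored shapes are printed one per line.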
- colors() List[str] [source]
Generate a list of color codes based on the dimensions of the object.
- global_dims
Number of global dimensions
- Type:
int
- local_dims
Number of local dimensions
- Type:
int
- first_reduce
Index of the first reduce dimension
- Type:
int
- group_for_reduce
List of grouped dimensions for reduction
- Type:
list
- upcast_in_mid_reduce_axes
Set of axes where upcasting occurs during mid-reduction
- Type:
set
- shape_len
Length of the shape vector
- Type:
int
- upcasted
Number of upcasted dimensions
- Type:
int
- full_shape
Full shape of the object
- Type:
list
- sts
List of objects with shapes
- Type:
list
- Returns:
A list of color codes representing different types of dimensions.
- Return type:
list
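One way the attributes above could combine into a per-dimension color list is sketched below. The ordering of the chunks follows the attributes listed for colors(); the specific color names and the rule for telling reduce upcasts from normal upcasts are assumptions made for this example, and the function takes plain values instead of a Kernel instance.

```python
from typing import List, Sequence


def colors(global_dims: int, local_dims: int, first_reduce: int,
           group_for_reduce: Sequence[int], upcast_in_mid_reduce_axes: Sequence[int],
           upcasted: int, full_shape: Sequence[int], output_shape: Sequence[int]) -> List[str]:
    shape_len = len(full_shape)
    # Global dimensions come first, followed by local dimensions.
    out = ["blue"] * global_dims
    out += ["cyan"] * local_dims
    # Grouped reduce dimensions: mid-reduce upcasts get their own color (assumed names).
    grouped_end = first_reduce + len(group_for_reduce)
    out += ["white" if i in upcast_in_mid_reduce_axes else "green"
            for i in range(first_reduce, grouped_end)]
    # Remaining non-upcasted dimensions are plain reduce loops.
    out += ["red"] * ((shape_len - upcasted) - grouped_end)
    # Upcasted dimensions: reduced ones differ between full and output shape.
    out += ["magenta" if full_shape[i] != output_shape[i] else "yellow"
            for i in range(shape_len - upcasted, shape_len)]
    assert len(out) == shape_len, "colors size mismatch"
    return out
```

For a kernel with full shape (32, 8, 16, 4), output shape (32, 8, 1, 4), one global dimension, one local dimension, first_reduce of 2 and one upcasted dimension, this sketch yields one color per dimension: global, local, reduce, upcast.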
- copy()[source]
Creates a deep copy of the current Kernel object. This can be useful for creating new kernels based on existing ones without modifying the original kernel.
- Returns:
A deep copy of the current Kernel object.
- property first_reduce: int
Calculate the index of the first reduction axis.
- self.sts
The shape trackers for each buffer in the kernel.
- Type:
List[ShapeTracker]
- self.shape_len
The length of the shape attribute of an object in self.sts.
- Type:
int
- self.upcasted
The number of upcasted dimensions.
- Type:
int
- Returns:
The index of the first reduction axis.
- Return type:
int
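A minimal sketch of how such an index could be computed from plain shape tuples: the first non-upcasted axis where the output shape and the full shape disagree is taken to be the first reduction axis. The function signature and the sentinel trick are assumptions for this example, not a statement of the library's code.

```python
from typing import Sequence


def first_reduce(output_shape: Sequence[int], full_shape: Sequence[int], upcasted: int) -> int:
    # Only the non-upcasted prefix of the shape is searched.
    n = len(full_shape) - upcasted
    # Appending the sentinel pair (0, 1) guarantees a mismatch at index n,
    # so a kernel without any reduce axis returns n instead of raising.
    pairs = zip(tuple(output_shape[:n]) + (0,), tuple(full_shape[:n]) + (1,))
    return [o != f for o, f in pairs].index(True)
```

With output shape (32, 8, 1, 4), full shape (32, 8, 16, 4) and one upcasted dimension, the result is 2, the axis where 1 and 16 differ.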
- float4_axis(i: int)[source]
Compute the float4 axis.
- Parameters:
i (int) – The index for which to compute the float4 axis.
- Returns:
A list of integers representing the float4 axis.
- Return type:
List[int]
- property full_shape: Tuple[Node | int, ...]
Get the shape of the object at index self.full_buf_index in self.sts.
- self.sts
The shape trackers for each buffer in the kernel.
- Type:
List[ShapeTracker]
- self.full_buf_index
The index of the object to get the shape from.
- Type:
int
- Returns:
The shape of the object at self.full_buf_index in self.sts.
- Return type:
Tuple[sint, …]
- property full_unupcasted_shape: Tuple[Node | int, ...]
Get the unupcasted shape of the object at index self.full_buf_index in self.sts.
- self.full_shape
The shape of the object at self.full_buf_index in self.sts.
- Type:
Tuple[sint, …]
- self.upcasted
The number of upcasted dimensions.
- Type:
int
- Returns:
The unupcasted shape of the object at self.full_buf_index in self.sts.
- Return type:
Tuple[sint, …]
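As a small illustration, the unupcasted shape can be viewed as the full shape with its trailing upcasted dimensions removed. The helper below is a hypothetical standalone function operating on a plain tuple, and the assumption that upcasted dimensions sit at the end of the shape is taken from how upcasted is described elsewhere on this page.

```python
from typing import Sequence, Tuple


def full_unupcasted_shape(full_shape: Sequence[int], upcasted: int) -> Tuple[int, ...]:
    # Slice by an explicit end index: full_shape[:-upcasted] would wrongly
    # return an empty tuple when upcasted is 0.
    return tuple(full_shape[:len(full_shape) - upcasted])
```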
- get_upcast_dim(i: int) List[int] [source]
Get dimensions that need to be upcasted.
- i
The index to check for dimensions that need to be upcasted.
- Type:
int
- Returns:
A list of dimensions that need to be upcasted.
- Return type:
List[int]
- property global_dims: int
Get the number of global dimensions, calculated as first_reduce minus local_dims.
- self.first_reduce
The first reduced dimension.
- Type:
int
- self.local_dims
The number of local dimensions.
- Type:
int
- Returns:
The difference between self.first_reduce and self.local_dims.
- Return type:
int
Notes
global_dims is a property, so it is read like an attribute on a Kernel instance rather than called. The kernel shape is divided into eight chunks of dimensions, and the global dimensions form the first chunk.
- hand_coded_optimizations()[source]
This method handles the application of hand-coded optimizations.
- MV_BLOCKSIZE
The block size for matrix-vector multiplication.
- Type:
int
- MV_THREADS_PER_ROW
The number of threads per row for matrix-vector multiplication.
- Type:
int
- MV_ROWS_PER_THREAD
The number of rows per thread for matrix-vector multiplication.
- Type:
int
- limit_dims_to_max(global_max: List[int], local_max: List[int])[source]
Limit dimensions to maximum allowed sizes.
- Parameters:
global_max (List[int]) – List of maximum allowed global dimension sizes.
local_max (List[int]) – List of maximum allowed local dimension sizes.
- property membufs: List[MemBuffer]
Membuffers attribute.
- Returns:
A list of MemBuffer objects.
- Return type:
List[MemBuffer]
- property output_shape: Tuple[Node | int, ...]
Get the shape of the first object in self.sts.
- self.sts
The shape trackers for each buffer in the kernel.
- Type:
List[ShapeTracker]
- Returns:
The shape of the first object in self.sts.
- Return type:
Tuple[sint, …]
- reshape_and_permute(new_shape_fxn, axis)[source]
Apply reshape and permute to all shapetrackers.
- Parameters:
new_shape_fxn (function) – Function used for reshaping.
axis (int) – Axis for permutation.
- Returns:
None
- property shape_len: int
Get the length of the shape attribute of an object in self.sts.
- self.sts
The shape trackers for each buffer in the kernel.
- Type:
List[ShapeTracker]
- Returns:
The length of the shape attribute of an object in self.sts.
- Return type:
int
- shape_offsets(i: int)[source]
Compute the offsets of the shape.
- Parameters:
i (int) – The index for which to compute the offsets.
- Returns:
An iterator that computes the cartesian product of input iterables.
- Return type:
itertools.product
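The returned iterator can be pictured as every index combination over the upcasted dimensions. The sketch below works on a plain shape tuple instead of a buffer index; iterating the upcasted dimensions in reverse order and returning a single empty offset when nothing is upcasted are assumptions made for this example.

```python
import itertools
from typing import Iterable, Sequence, Tuple


def shape_offsets(shape: Sequence[int], upcasted: int) -> Iterable[Tuple[int, ...]]:
    # With no upcasted dimensions there is exactly one (empty) offset.
    if upcasted == 0:
        return [tuple()]
    # Take the trailing upcasted dimensions, last dimension first (assumed order).
    upcasted_dims = shape[len(shape) - upcasted:][::-1]
    # Cartesian product of the index ranges of those dimensions.
    return itertools.product(*[range(s) for s in upcasted_dims])
```

For the shape (8, 2, 3) with two upcasted dimensions, the product runs over range(3) and range(2), giving six offsets.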
- shift_to(axis, amount, top=False, insert_before=None)[source]
Shift elements to a specified location.
- Parameters:
axis (int) – The axis to pull from.
amount (int) – The amount to take.
top (bool) – If you want to pull that amount from the top. Default is False.
insert_before (int) – Place to insert the new stuff. Default is None, which means end of list.
- Returns:
None
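The effect of these parameters on a single shape can be sketched with plain tuples: the chosen axis is split into two dimensions, and the piece of size amount is moved to insert_before. This is an illustrative reconstruction under assumptions, not the library code; in particular it returns a new tuple instead of updating every shape tracker in place, and it requires amount to divide the axis size.

```python
from typing import Optional, Sequence, Tuple


def shift_to(shape: Sequence[int], axis: int, amount: int,
             top: bool = False, insert_before: Optional[int] = None) -> Tuple[int, ...]:
    shape_len = len(shape)
    if insert_before is None:
        insert_before = shape_len  # default: move the new dimension to the end
    assert shape[axis] % amount == 0, "amount must divide the axis size"
    # After the split, the piece of size `amount` sits at `axis` (top) or `axis + 1`.
    move_axis = axis if top else axis + 1
    if move_axis < insert_before:
        insert_before += 1  # account for the extra dimension created by the split
    rest = shape[axis] // amount
    pair = [amount, rest] if top else [rest, amount]
    split = list(shape[:axis]) + pair + list(shape[axis + 1:])
    # Permutation that pulls `move_axis` out and reinserts it at `insert_before`.
    order = ([i for i in range(insert_before) if i != move_axis] + [move_axis]
             + [i for i in range(insert_before, shape_len + 1) if i != move_axis])
    return tuple(split[i] for i in order)
```

For example, shift_to((8, 16), 0, 2) splits the first axis into 4 and 2 and moves the 2 to the end, and passing insert_before=1 with axis=1 places the new dimension directly after the first axis.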
- simplify_merge_adjacent()[source]
Simplify by merging adjacent dimensions. This function checks if the shape_len is 0 and then proceeds to merge dimensions when possible. It also handles special cases for image dtypes and updates the shapes and strides accordingly.
- Returns:
None
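A simplified picture of merging adjacent dimensions: two neighboring dimensions can be folded into one when, for every buffer, stepping the outer dimension once equals stepping through the whole inner dimension. The stride rule below is an assumption chosen for this sketch; it works on plain per-buffer shape and stride lists, requires at least one dimension, and ignores the image dtype special cases mentioned above.

```python
from typing import List, Sequence, Tuple


def merge_adjacent(shapes: Sequence[Sequence[int]],
                   strides: Sequence[Sequence[int]]) -> List[List[Tuple[int, int]]]:
    """Return, per buffer, a list of (size, stride) pairs after merging."""
    nbufs, ndims = len(shapes), len(shapes[0])
    merged = [[(shapes[j][0], strides[j][0])] for j in range(nbufs)]
    for i in range(1, ndims):
        # Dimension i merges into the previous one only if every buffer allows it:
        # either the outer stride spans the inner dimension, or both are broadcast (0).
        can_merge = all(
            (strides[j][i] != 0 and merged[j][-1][1] == shapes[j][i] * strides[j][i])
            or (strides[j][i] == 0 and merged[j][-1][1] == 0)
            for j in range(nbufs)
        )
        for j in range(nbufs):
            if can_merge:
                merged[j][-1] = (merged[j][-1][0] * shapes[j][i], strides[j][i])
            else:
                merged[j].append((shapes[j][i], strides[j][i]))
    return merged
```

A single contiguous buffer of shape (2, 3, 4) collapses to one dimension of size 24, while adding a second buffer that broadcasts over the last axis keeps that axis separate for both buffers.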
- simplify_ones() bool [source]
Simplify by removing places where the shape is all ones. This function checks if the shape_len is 0 and then updates local_dims and upcasted values accordingly. It also reshapes and permutes the given shapes. The function returns True if any value in all_ones is True, else False.
- Returns:
bool
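The core of this simplification can be sketched on plain tuples: any position where the full shape is 1 is dropped from every buffer's shape. The sketch is a hypothetical standalone helper; it returns the new shapes alongside the flag and leaves out the local_dims and upcasted bookkeeping described above.

```python
from typing import List, Sequence, Tuple


def simplify_ones(shapes: Sequence[Sequence[int]],
                  full_shape: Sequence[int]) -> Tuple[List[Tuple[int, ...]], bool]:
    # A position is removable when the full shape has size 1 there.
    all_ones = [s == 1 for s in full_shape]
    new_shapes = [tuple(x for i, x in enumerate(shape) if not all_ones[i]) for shape in shapes]
    # The flag reports whether anything was removed.
    return new_shapes, any(all_ones)
```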
- upcast()[source]
Mark the final dimension as upcasted, increasing the number of upcasted dimensions by one. This method takes no parameters.
- Returns:
None
- Raises:
AssertionError – If the final dimension size is 1, as it cannot be upcasted.
- property upcast_in_mid_reduce_axes: List[int]
Get a list of indices, among the grouped reduce axes starting at self.first_reduce, where the dimensions are equal in both self.full_shape and self.sts[0].shape.
- self.first_reduce
The index of the first reduction axis.
- Type:
int
- self.group_for_reduce
A list of integers representing groups for reduction.
- Type:
List[int]
- self.full_shape
The shape of the object at self.full_buf_index in self.sts.
- Type:
Tuple[sint, …]
- self.sts[0].shape
The shape of the first object in self.sts.
- Type:
Tuple[sint, …]
- Returns:
A list of indices, among the grouped reduce axes, where the dimensions are equal in both self.full_shape and self.sts[0].shape.
- Return type:
List[int]
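Using the four attributes listed above, the selection can be sketched as a single comprehension. Restricting the search to the len(group_for_reduce) axes that start at first_reduce is an assumption of this example, and the function takes plain values rather than a Kernel instance.

```python
from typing import List, Sequence


def upcast_in_mid_reduce_axes(first_reduce: int, group_for_reduce: Sequence[int],
                              full_shape: Sequence[int],
                              output_shape: Sequence[int]) -> List[int]:
    # Only the grouped reduce axes are candidates (assumed range).
    grouped = range(first_reduce, first_reduce + len(group_for_reduce))
    # Keep the axes whose size is the same in the full shape and the output shape.
    return [j for j in grouped if full_shape[j] == output_shape[j]]
```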
- class tinygrad.codegen.kernel.LinearizerOptions(device: str = '', supports_float4: bool = True, supports_float4_alu: bool = True, has_local: bool = True, has_shared: bool = True, global_max: List[int] | None = None, local_max: List[int] | None = None)[source]
Bases:
NamedTuple
A named tuple for representing options related to linearizing memory accesses.
- device
The target device for the linearization. Defaults to “”.
- Type:
str, optional
- supports_float4
Whether the target device supports float4 data type. Defaults to True.
- Type:
bool, optional
- supports_float4_alu
Whether the target device supports float4 ALU operations. Defaults to True.
- Type:
bool, optional
- has_local
Whether the target device has local memory. Defaults to True.
- Type:
bool, optional
- has_shared
Whether the target device has shared memory. Defaults to True.
- Type:
bool, optional
- global_max
The maximum global dimensions for linearization. Defaults to None.
- Type:
Optional[List[int]], optional
- local_max
The maximum local dimensions for linearization. Defaults to None.
- Type:
Optional[List[int]], optional
- device: str
Alias for field number 0
- has_local: bool
Alias for field number 3
- has_shared: bool
Alias for field number 4
- supports_float4: bool
Alias for field number 1
- supports_float4_alu: bool
Alias for field number 2
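The "Alias for field number N" entries reflect that a NamedTuple field can be read by name or by position. The stand-in class below reproduces the documented signature so that this can be tried without tinygrad installed; the device string "GPU" is only an example value.

```python
from typing import List, NamedTuple, Optional


class LinearizerOptions(NamedTuple):
    # Stand-in mirroring the signature documented on this page.
    device: str = ""
    supports_float4: bool = True
    supports_float4_alu: bool = True
    has_local: bool = True
    has_shared: bool = True
    global_max: Optional[List[int]] = None
    local_max: Optional[List[int]] = None


# Override only the fields that differ from the defaults.
opts = LinearizerOptions(device="GPU", has_local=False)

# Field access by name and by position refer to the same value.
assert opts.device == opts[0]
assert opts.has_local == opts[3]

# NamedTuples are immutable; _replace returns a modified copy.
limited = opts._replace(global_max=[65535, 65535, 65535])
```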
- class tinygrad.codegen.kernel.LocalBuffer(name: str, size: int, dtype: DType = (10, 4, 'float', <class 'numpy.float32'>, 1), realized: None = None)[source]
Bases:
NamedTuple
A named tuple for representing a local buffer in memory.
- name
The name of the local buffer.
- Type:
str
- size
The size of the local buffer.
- Type:
int
- dtype
The data type of the elements in the local buffer. Defaults to dtypes.float32.
- Type:
DType, optional
- realized
A placeholder for future functionality. Defaults to None.
- Type:
None, optional
- name: str
Alias for field number 0
- size: int
Alias for field number 1
- class tinygrad.codegen.kernel.Opt(op: OptOps, axis: int | None = None, amt: int | None = None)[source]
Bases:
object
Data class describing a single optimization: the operation to apply (op), with an optional axis and amount.
- axis
The axis along which the operation is performed. Defaults to None.
- Type:
Optional[int]
- amt
The amount or value used in the operation. Defaults to None.
- Type:
Optional[int]
- class tinygrad.codegen.kernel.OptOps(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
This class represents an enumeration of optimization operations.
- UPCAST
Represents the operation to upcast a dimension.
- Type:
auto()
- UPCASTMID
Represents the operation to upcast a dimension in the middle of a grouped reduce.
- Type:
auto()
- UNROLL
Represents the operation to unroll a loop.
- Type:
auto()
- LOCAL
Represents the operation to make a variable local.
- Type:
auto()
- LASTLOCAL
Represents the operation to make the last variable in a sequence local.
- Type:
auto()
- GROUP
Represents the operation to group variables.
- Type:
auto()
- GROUPTOP
Represents the operation to group variables at the top level.
- Type:
auto()
- NOLOCALS
Represents the operation to remove all local variables.
- Type:
auto()
- PADTO
Represents the operation to pad a sequence to a specific length.
- Type:
auto()
- GROUP = 6
- GROUPTOP = 7
- LASTLOCAL = 5
- LOCAL = 4
- NOLOCALS = 8
- PADTO = 9
- UNROLL = 3
- UPCAST = 1
- UPCASTMID = 2
- class tinygrad.codegen.kernel.TensorCore(device: str, dims: List[int], dtype_in: DType, dtype_out: DType, threads: List[Tuple[int, int]], upcast_dim: int, thread_local_aliases: List[List[List[int]]], thread_local_sizes: List[int], arch: str | None = None)[source]
Bases:
object
Data class for the Tensor Core.
- device
The device on which the tensor core will be used.
- Type:
str
- dims
List of integers representing dimensions.
- Type:
List[int]
- threads
List of tuples where each tuple contains a TC dimension and an amount that constructs the warp thread structure.
- Type:
List[Tuple[int, int]]
- upcast_dim
The TC dimension to upcast.
- Type:
int
- thread_local_aliases
A list of lists of lists containing integers defining alias for each TC dimension. For example: [threads_1, …, threads_n, upcast_1(unrolled), upcast_2(upcast)] where 1 is warp threads, -1 is upcast, and 0 is unrolled.
- Type:
List[List[List[int]]]
- thread_local_sizes
List of integers representing the number of elements stored in registers for each TC dimension in each thread.
- Type:
List[int]
- arch
Optional architecture parameter. Default is None.
- Type:
Optional[str]
- device: str
- dims: List[int]
- thread_local_aliases: List[List[List[int]]]
- thread_local_sizes: List[int]
- threads: List[Tuple[int, int]]
- upcast_dim: int