tinygrad renderer.cstyle

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org

class tinygrad.renderer.cstyle.CStyleLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: NamedTuple

Class representing a C-style programming language.

size_prefix

Prefix for size. Defaults to “int”.

Type:

str

generic_var_prefix

Prefix for generic variables. Defaults to empty string.

Type:

str

kernel_prefix

Prefix for kernels. Defaults to empty string.

Type:

str

buffer_prefix

Prefix for buffers. Defaults to empty string.

Type:

str

buffer_suffix

Suffix for buffers. Defaults to empty string.

Type:

str

smem_align

Alignment for shared memory. Defaults to empty string.

Type:

str

smem_prefix

Prefix for shared memory. Defaults to empty string.

Type:

str

smem_prefix_for_cast

Indicates whether prefix should be used in casts. Defaults to True.

Type:

bool

arg_int_prefix

Prefix for integer arguments. Defaults to empty string.

Type:

str

barrier

Barrier synchronization method. Defaults to empty string.

Type:

str

xid

List of x identifiers. Defaults to empty list.

Type:

List[str]

gid

List of g identifiers. Defaults to empty list.

Type:

List[str]

lid

List of l identifiers. Defaults to empty list.

Type:

List[str]

global_max

List of maximum global values. Defaults to empty list.

Type:

List[int]

local_max

List of maximum local values. Defaults to empty list.

Type:

List[int]

extra_args

List of extra arguments. Defaults to empty list.

Type:

List[str]

float4

Float4 value with None as default.

Type:

Optional[str]

half_prekernel

Half pre-kernel value with None as default.

Type:

Optional[str]

uses_vload

Indicates whether vload is used. Defaults to False.

Type:

bool

external_local_bufs

Indicates whether external local buffers are used. Defaults to False.

Type:

bool

uses_ptr_arithmetic

Indicates whether pointer arithmetic is used. Defaults to False.

Type:

bool

launch_bounds

Indicates whether launch bounds are used. Defaults to False.

Type:

bool

code_for_op

Dictionary containing operations for unary, binary, and ternary ops.

Type:

Dict

arg_int_prefix: str

Alias for field number 8

barrier: str

Alias for field number 9

buffer_prefix: str

Alias for field number 3

buffer_suffix: str

Alias for field number 4

code_for_op: Dict

Alias for field number 22

external_local_bufs: bool

Alias for field number 19

extra_args: List[str]

Alias for field number 15

float4: str | None

Alias for field number 16

generic_var_prefix: str

Alias for field number 1

gid: List[str]

Alias for field number 11

global_max: List[int]

Alias for field number 13

half_prekernel: str | None

Alias for field number 17

kernel_prefix: str

Alias for field number 2

launch_bounds: bool

Alias for field number 21

lid: List[str]

Alias for field number 12

local_max: List[int]

Alias for field number 14

render_cast(x: List[str], var_dtype: DType) str[source]

Returns a string expression of the casted xs with the given type.

self

The instance of the class.

Type:

Any

x

A list of strings to be casted.

Type:

List[str]

var_dtype

The datatype into which the elements of x are to be casted.

Type:

DType

Returns:

String expression of the casted xs with the given type.

Return type:

str

Raises:

AssertionError – If length of x is not equal to var_dtype.sz or if float4 attribute is None.

render_conditional(cond: str, x: str, y: str) str[source]

Render a conditional expression.

self

The object itself

Type:

Any

cond

The condition of the conditional expression

Type:

str

x

The value to return if the condition is true

Type:

str

y

The value to return if the condition is false

Type:

str

Returns:

The rendered conditional expression

Return type:

str

render_const(x: float | int | bool, var_dtype) str[source]

Returns a string expression of the constant with the given type.

x

The input value

Type:

Union[float, int, bool]

var_dtype

The data type of the variable

Returns:

A string representation of the constant

Return type:

str

render_for(expr: str, _min: int | str, _max: int | str) str[source]

Render a for loop.

self

The object itself

Type:

Any

expr

The loop variable expression

Type:

str

_min

The start value of the loop variable

Type:

Union[int, str]

_max

The end value of the loop variable

Type:

Union[int, str]

Returns:

The rendered for loop

Return type:

str

render_if(cond: str)[source]

Render an if statement.

self

The object itself

Type:

Any

cond

The condition of the if statement

Type:

str

Returns:

The rendered if statement

Return type:

str

render_kernel(function_name: str, kernel: List[str], bufs: List[Tuple[str, DType]], local_size: List[int], prekernel: List[str]) str[source]

Render the kernel with given parameters.

This function generates a complete OpenCL kernel program based on the provided arguments. It creates the necessary boilerplate code for the kernel, including sampler creation and buffer type definitions, then concatenates the actual kernel code and post-processing code if needed.

self

The instance of the OpenCLKernelBuilder class.

Type:

OpenCLKernelBuilder

function_name

The name of the kernel function.

Type:

str

kernel

The list of strings that form the body of the kernel function.

Type:

List[str]

bufs

A list of tuples where each tuple contains a buffer name and its data type.

Type:

List[Tuple[str, DType]]

local_size

The local size for the kernel execution.

Type:

List[int]

prekernel

Pre-kernel code that will be inserted before the actual kernel code.

Type:

List[str]

Returns:

The complete OpenCL kernel program as a string.

Return type:

str

render_load(output_dtype, buf_name, buf_dtype, idx, local=False) str[source]

Returns a string expression of the loaded value with the output type.

output_dtype

The output data type

buf_name

The buffer name

Type:

str

buf_dtype

The data type of the buffer

idx

The index in the buffer

Type:

int

local

Whether the buffer is local or not

Type:

bool

Returns:

A string representation of the loaded value

Return type:

str

render_local(name: str, size: int)[source]

Render the local memory variable declaration.

self

The object itself

Type:

Any

name

The name of the local memory variable

Type:

str

size

The size of the local memory variable array

Type:

int

Returns:

The rendered local memory variable declaration

Return type:

str

render_store(buf_name: str, buf_dtype: DType, var_name: str, var_dtype: DType, idx: str, local=False) str[source]

Returns a string statement that performs the store operation.

self

The instance of the class.

Type:

Any

buf_name

The buffer name.

Type:

str

buf_dtype

The buffer data type.

Type:

DType

var_name

The variable name.

Type:

str

var_dtype

The variable data type.

Type:

DType

idx

Index of the operation.

Type:

str

local

Whether the operation is local or not. Defaults to False.

Type:

bool, optional

Returns:

String statement that performs the store operation based on the given parameters.

Return type:

str

size_prefix: str

Alias for field number 0

smem_align: str

Alias for field number 5

smem_prefix: str

Alias for field number 6

smem_prefix_for_cast: bool

Alias for field number 7

uses_ptr_arithmetic: bool

Alias for field number 20

uses_vload: bool

Alias for field number 18

xid: List[str]

Alias for field number 10

class tinygrad.renderer.cstyle.CUDALanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: CStyleLanguage

The CUDALanguage class is a subclass of the CStyleLanguage class, specifically tailored for CUDA GPU programming.

kernel_prefix

Prefix for defining kernels in CUDA.

Type:

str

smem_prefix

Prefix for shared memory variables in CUDA.

Type:

str

smem_prefix_for_cast

Flag to indicate whether prefix should be used with casting for shared memory.

Type:

bool

arg_int_prefix

Prefix for integer arguments in CUDA kernel functions.

Type:

str

barrier

Code snippet for thread synchronization in CUDA.

Type:

str

float4

Function name for creating a 4-component floating point number in CUDA.

Type:

str

gid

List of strings representing the global index in each dimension (x, y, z).

Type:

list

lid

List of strings representing the local index in each dimension (x, y, z).

Type:

list

xid

List of strings representing the combined global and local indices for each dimension (x, y, z).

Type:

list

code_for_op

Dictionary mapping binary operations to their corresponding CUDA code.

Type:

dict

half_prekernel

Pre-kernel code needed when working with half-precision floating point numbers in CUDA.

Type:

str

arg_int_prefix: str = 'const int'
barrier: str = '__syncthreads();'
code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CUDALanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
float4: str | None = 'make_float4'
gid: List[str] = ['blockIdx.x', 'blockIdx.y', 'blockIdx.z']
half_prekernel: str | None = '\n    #include <cuda_fp16.h>\n    struct half4 { half x, y, z, w; };\n    __device__ half4 make_half4(half x, half y, half z, half w) { half4 ret; ret.x = x; ret.y = y; ret.z = z; ret.w = w; return ret; }\n  '
kernel_prefix: str = '#define INFINITY (__int_as_float(0x7f800000))\n#define NAN (__int_as_float(0x7fffffff))\nextern "C" __global__ '
lid: List[str] = ['threadIdx.x', 'threadIdx.y', 'threadIdx.z']
smem_prefix: str = '__shared__ '
smem_prefix_for_cast: bool = False
xid: List[str] = ['(blockIdx.x*blockDim.x+threadIdx.x)', '(blockIdx.y*blockDim.y+threadIdx.y)', '(blockIdx.z*blockDim.z+threadIdx.z)']
tinygrad.renderer.cstyle.CUDARenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.

class tinygrad.renderer.cstyle.HIPLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: CStyleLanguage

HIPLanguage class that inherits from CStyleLanguage.

kernel_prefix

Kernel prefix containing various function definitions and necessary includes.

Type:

str

launch_bounds

A boolean indicating if the language supports launch bounds.

Type:

bool

smem_prefix

The shared memory prefix for the language.

Type:

str

smem_prefix_for_cast

A boolean indicating whether a cast is required for the shared memory prefix.

Type:

bool

barrier

The barrier synchronization primitive for the language.

Type:

str

float4

The name of the float4 type for the language.

Type:

str

uses_vload

Whether the language uses vload.

Type:

bool

uses_ptr_arithmetic

Whether the language uses pointer arithmetic.

Type:

bool

arg_int_prefix

The integer prefix for function arguments.

Type:

str

gid

A list comprising strings representing the grid index in three dimensions.

Type:

list

lid

A list comprising strings representing the local thread index in three dimensions.

Type:

list

xid

A list comprising strings representing the extended thread index in three dimensions.

Type:

list

code_for_op

A dictionary containing lambda functions for different operations and their

Type:

dict

corresponding implementations based on data type.
arg_int_prefix: str = 'const int'
barrier: str = '__syncthreads();'
code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function HIPLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function HIPLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
float4: str | None = 'make_float4'
gid: List[str] = ['blockIdx.x', 'blockIdx.y', 'blockIdx.z']
half_prekernel: str | None = '#include <hip/hip_fp16.h>\n\ntypedef union { struct { half x, y, z, w; } __attribute__((aligned(8))); half data[4]; } half4; __device__ half4 make_half4(half x, half y, half z, half w) { return {x, y, z, w}; }\ntypedef union { struct { half x, y, z, w, a, b, c, d; } __attribute__((aligned(16))); half data[8]; } half8; __device__ half8 make_half8(half x, half y, half z, half w, half a, half b, half c, half d) { return {x, y, z, w, a, b, c, d}; }\n typedef _Float16 half16 __attribute__((ext_vector_type(16))); __device__ half16 make_half16(half x, half y, half z, half w, half a, half b, half c, half d, half e, half f, half g, half h, half i, half j, half k, half l) { return {x, y, z, w, a, b, c, d, e, f, g, h, i, j, k, l}; }\n__device__ float vload_half(size_t offset, const half *p) { return (float)*(p + offset); }\n__device__ float2 vload_half2(size_t offset, const half *p) { return make_float2((float)*(p + offset*2), (float)*(p + offset*2 + 1)); }\n__device__ float4 vload_half4(size_t offset, const half *p) { return make_float4((float)*(p + offset*4), (float)*(p + offset*4 + 1), (float)*(p + offset*4 + 2), (float)*(p + offset*4 + 3)); }\n__device__ void vstore_half(float data, size_t offset, half *p) { *(p + offset) = (half)data; }\n__device__ void vstore_half2(float2 data, size_t offset, half *p) { *(p + offset*2) = (half)data.x; *(p + offset*2 + 1) = (half)data.y; }\n__device__ void vstore_half4(float4 data, size_t offset, half *p) { *(p + offset*4) = (half)data.x; *(p + offset*4 + 1) = (half)data.y; *(p + offset*4 + 2) = (half)data.z; *(p + offset*4 + 3) = (half)data.w; }\n__device__ half exp2(half x) { return hexp2(x); }\n__device__ half log2(half x) { return hlog2(x); }\n__device__ half sin(half x) { return hsin(x); }\n__device__ half sqrt(half x) { return hsqrt(x); }\n__device__ half hmax(half a, half b) { return __hgt(a, b) ? a : b; }\n__device__ half operator%(const half &a, const half &b) { return __hsub(a, __hmul(b, __float2half(floorf(__half2float(a) / __half2float(b))))); }\n__device__ bool operator!=(const half &a, const int &b) { return (float)a != b; }\n\n// HACKS for ALU ops on half and result of half2 GEP\n__device__ half operator+(const half &a, const unsigned short &b) { return __hadd(a, (half)(b)); }\n__device__ half operator-(const half &a, const unsigned short &b) { return __hsub(a, (half)(b)); }\n__device__ half operator*(const half &a, const unsigned short &b) { return __hmul(a, (half)(b)); }\n__device__ half operator/(const half &a, const unsigned short &b) { return __hdiv(a, (half)(b)); }\n__device__ bool operator<(const half &a, const unsigned short &b) { return __hlt(a, (half)(b)); }\n// now the other way\n__device__ half operator+(const unsigned short &a, const half &b) { return __hadd((half)(a), b); }\n__device__ half operator-(const unsigned short &a, const half &b) { return __hsub((half)(a), b); }\n__device__ half operator*(const unsigned short &a, const half &b) { return __hmul((half)(a), b); }\n__device__ half operator/(const unsigned short &a, const half &b) { return __hdiv((half)(a), b); }\n__device__ bool operator<(const unsigned short &a, const half &b) { return __hlt((half)(a), b); }\n  '
kernel_prefix: str = '#include <hip/hip_common.h>\n#define INFINITY (__builtin_inff())\n#define NAN (__builtin_nanf(""))\n  __device__ float4 max(float4 x, float4 y) { return float4(max(x.x, y.x), max(x.y, y.y), max(x.z, y.z), max(x.w, y.w)); }\n  __device__ float4 pow(float x, float4 y) { return float4(pow(x, y.x), pow(x, y.y), pow(x, y.z), pow(x, y.w)); }\n  __device__ float4 pow(float4 x, float4 y) { return float4(pow(x.x, y.x), pow(x.y, y.y), pow(x.z, y.z), pow(x.w, y.w)); }\n  __device__ float4 log2(float4 x) { return float4(log2(x.x), log2(x.y), log2(x.z), log2(x.w)); }\n  __device__ float4 exp2(float4 x) { return float4(exp2(x.x), exp2(x.y), exp2(x.z), exp2(x.w)); }\n  __device__ float4 sin(float4 x) { return float4(sin(x.x), sin(x.y), sin(x.z), sin(x.w)); }\n  typedef float float8 __attribute__((ext_vector_type(8))); __device__ float8 make_float8(float x, float y, float z, float w, float a, float b, float c, float d) { return {x, y, z, w, a, b, c, d}; }\n  extern "C" __global__\n  '
launch_bounds: bool = True
lid: List[str] = ['threadIdx.x', 'threadIdx.y', 'threadIdx.z']
smem_prefix: str = '__shared__ '
smem_prefix_for_cast: bool = False
uses_ptr_arithmetic: bool = True
uses_vload: bool = True
xid: List[str] = ['(blockIdx.x*blockDim.x+threadIdx.x)', '(blockIdx.y*blockDim.y+threadIdx.y)', '(blockIdx.z*blockDim.z+threadIdx.z)']
tinygrad.renderer.cstyle.HIPRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.

class tinygrad.renderer.cstyle.MetalLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: CStyleLanguage

MetalLanguage Class: Inherits from CStyleLanguage. Defines Metal language specific attributes and methods.

Attributes:

kernel_prefix (str): Prefix for kernel functions. Default is “#include <metal_stdlib>

using namespace metal; kernel “.

buffer_prefix (str): Prefix for device buffers. Default is “device “. smem_prefix (str): Prefix for threadgroup shared memory. Default is “threadgroup “. arg_int_prefix (str): Prefix for constant integer arguments. Default is “constant int&”. barrier (str): Thread group barrier for synchronization. Default is “threadgroup_barrier(mem_flags::mem_threadgroup);”. float4 (str): Data type for four component floating point values. Default is “float4”. uses_ptr_arithmetic (bool): Indicates whether the language requires pointer arithmetic. Default is True. gid (list[str]): List of strings representing global thread IDs. Generated from range 3. lid (list[str]): List of strings representing local thread IDs. Generated from range 3. extra_args (list[str]): Additional arguments required by Metal language. Default is [“uint3 gid [[threadgroup_position_in_grid]]”, “uint3 lid [[thread_position_in_threadgroup]]”].

arg_int_prefix: str = 'constant int&'
barrier: str = 'threadgroup_barrier(mem_flags::mem_threadgroup);'
buffer_prefix: str = 'device '
extra_args: List[str] = ['uint3 gid [[threadgroup_position_in_grid]]', 'uint3 lid [[thread_position_in_threadgroup]]']
float4: str | None = 'float4'
gid: List[str] = ['gid.x', 'gid.y', 'gid.z']
kernel_prefix: str = '#include <metal_stdlib>\nusing namespace metal;\nkernel '
lid: List[str] = ['lid.x', 'lid.y', 'lid.z']
smem_prefix: str = 'threadgroup '
uses_ptr_arithmetic: bool = True
tinygrad.renderer.cstyle.MetalRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.

class tinygrad.renderer.cstyle.OpenCLLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: CStyleLanguage

OpenCLLanguage class. Inherits from CStyleLanguage.

kernel_prefix

Prefix for kernel functions. Default is “__kernel “.

Type:

str

buffer_prefix

Prefix for buffer variables. Default is “__global “.

Type:

str

smem_align

Alignment attribute for shared memory. Default is “__attribute__ ((aligned (16))) “.

Type:

str

smem_prefix

Prefix for shared memory variables. Default is “__local “.

Type:

str

arg_int_prefix

Prefix for integer function arguments. Default is “const int”.

Type:

str

half_prekernel

OpenCL extension pragma for enabling half precision. Default is “#pragma OPENCL EXTENSION cl_khr_fp16 : enable”.

Type:

str

barrier

Barrier code for synchronizing threads in a work group. Default is “barrier(CLK_LOCAL_MEM_FENCE);”.

Type:

str

float4

String conversion for type float4. Default is “(float4)”.

Type:

str

gid

List of get_group_id function calls for dimensions 0, 1, and 2.

Type:

list of str

lid

List of get_local_id function calls for dimensions 0, 1, and 2.

Type:

list of str

xid

List of get_global_id function calls for dimensions 0, 1, and 2.

Type:

list of str

uses_vload

Flag indicating if vload is used. Default is True.

Type:

bool

code_for_op

Dictionary mapping operation names to lambda functions that generate code for the operations. Inherits from CStyleLanguage and adds a new entry for TernaryOps.MULACC.

Type:

dict

arg_int_prefix: str = 'const int'
barrier: str = 'barrier(CLK_LOCAL_MEM_FENCE);'
buffer_prefix: str = '__global '
code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function OpenCLLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
float4: str | None = '(float4)'
gid: List[str] = ['get_group_id(0)', 'get_group_id(1)', 'get_group_id(2)']
half_prekernel: str | None = '#pragma OPENCL EXTENSION cl_khr_fp16 : enable'
kernel_prefix: str = '__kernel '
lid: List[str] = ['get_local_id(0)', 'get_local_id(1)', 'get_local_id(2)']
smem_align: str = '__attribute__ ((aligned (16))) '
smem_prefix: str = '__local '
uses_vload: bool = True
xid: List[str] = ['get_global_id(0)', 'get_global_id(1)', 'get_global_id(2)']
tinygrad.renderer.cstyle.OpenCLRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.

class tinygrad.renderer.cstyle.WGSLLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]

Bases: CStyleLanguage

The WGSLLanguage class, a subclass of CStyleLanguage. This class is used to represent the WebGPU Shading Language (WGSL).

gid

List containing gid indices for x, y, and z dimensions.

Type:

list

lid

List containing lid indices for x, y, and z dimensions.

Type:

list

size_prefix

Prefix used to declare a variable. Defaults to “let”.

Type:

str

barrier

Code snippet for workgroup barrier. Defaults to “workgroupBarrier();”.

Type:

str

generic_var_prefix

Prefix used for generic variables. Defaults to “var “.

Type:

str

external_local_bufs

Flag indicating if local buffers are external. Defaults to True.

Type:

bool

code_for_op

Dictionary containing custom op codes for various operations like BinaryOps and TernaryOps.

Type:

dict

type_map

Dictionary mapping dtypes to WGSL types.

Type:

dict

barrier: str = 'workgroupBarrier();'
code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function WGSLLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function WGSLLanguage.<lambda>>, TernaryOps.WHERE: <function WGSLLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
external_local_bufs: bool = True
generic_var_prefix: str = 'var '
gid: List[str] = ['i32(gindex.x)', 'i32(gindex.y)', 'i32(gindex.z)']
lid: List[str] = ['i32(lindex.x)', 'i32(lindex.y)', 'i32(lindex.z)']
render_cast(x: List[str], var_dtype: DType) str[source]

Render a type cast for the given value with the target data type.

x

The value to be casted.

Type:

List[str]

var_dtype

The target data type.

Type:

DType

Returns:

The rendered type cast expression as a string.

Return type:

str

Raises:

NotImplementedError – If no cast is available for the target data type.

render_conditional(cond: str, x: str, y: str) str[source]

Render a conditional expression that selects between two values based on a condition.

cond

The conditional expression to be checked.

Type:

str

x

The value to select if the condition is true.

Type:

str

y

The value to select if the condition is false.

Type:

str

Returns:

The rendered conditional expression as a string.

Return type:

str

render_const(x: float | int, var_dtype) str[source]

Render a constant value.

Parameters:
  • x (Union[float, int]) – The constant value to be rendered.

  • var_dtype – The data type of the variable.

Returns:

The WGSL code snippet for the constant value.

Return type:

str

render_for(expr: str, _min: int | str, _max: int | str) str[source]

Render a for loop with the given expression and range.

expr

The loop variable’s name.

Type:

str

_min

The starting value of the loop variable.

Type:

Union[int, str]

_max

The ending value of the loop variable.

Type:

Union[int, str]

Returns:

The rendered for loop as a string.

Return type:

str

render_if(cond: str)[source]

Render an if statement with the given condition.

cond

The conditional expression to be checked in the if statement.

Type:

str

Returns:

The rendered if statement as a string.

Return type:

str

render_kernel(function_name: str, kernel: List[str], bufs: List[Tuple[str, DType]], local_size: List[int], prekernel: List[str]) str[source]

Render the kernel for execution.

function_name

The name of the function to be rendered.

Type:

str

kernel

The list of kernel code lines.

Type:

List[str]

bufs

A list of tuples containing buffer names and their respective data types.

Type:

List[Tuple[str, DType]]

local_size

The local size for workgroup execution. If not provided, default is [1].

Type:

List[int]

prekernel

Code lines to be executed before the kernel code.

Type:

List[str]

Returns:

The rendered kernel code as a string.

Return type:

str

render_local(name: str, size: int)[source]

Render a local variable declaration.

Parameters:
  • name (str) – The name of the variable to be declared.

  • size (int) – The size of the array.

Returns:

The WGSL code snippet for declaring a local variable.

Return type:

str

render_store(buf_name: str, buf_dtype: DType, var_name: str, var_dtype: DType, idx, local=False) str[source]

Render a store operation that stores a value in a buffer at the given index.

buf_name

The name of the buffer.

Type:

str

buf_dtype

The data type of the buffer.

Type:

DType

var_name

The name of the variable to store.

Type:

str

var_dtype

The data type of the variable.

Type:

DType

idx

The index at which to store the value in the buffer.

local

Whether this is a local store operation. Defaults to False.

Type:

bool, optional

Returns:

The rendered store operation as a string.

Return type:

str

size_prefix: str = 'let'
type_map = {(0, 1, 'bool', <class 'numpy.bool_'>, 1): 'bool', (5, 4, 'int', <class 'numpy.int32'>, 1): 'i32', (6, 4, 'unsigned int', <class 'numpy.uint32'>, 1): 'u32', (9, 2, 'half', <class 'numpy.float16'>, 1): 'f16', (10, 4, 'float', <class 'numpy.float32'>, 1): 'f32'}
tinygrad.renderer.cstyle.WGSLRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.

tinygrad.renderer.cstyle.uops_to_cstyle(lang: CStyleLanguage, function_name: str, uops: List[UOp]) Tuple[str, Dict][source]

Converts a list of micro-operations (uops) to the specified C-style language.

Parameters:
  • lang (CStyleLanguage) – The target C-style programming language for conversion.

  • function_name (str) – The name of the function being converted.

  • uops (List[UOp]) – A list of micro-operations to convert.

Returns:

A tuple containing the converted C-style code and a dictionary.

Return type:

Tuple[str, Dict]

Attributes:
  • local_size (List[int]): Holds the size of local variables.

  • kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.

  • depth (int): The indentation level for the generated code. Default is 1.

  • kk: A helper function to append lines of code with proper indentation.

  • c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.

  • ssa: A helper function to generate single static assignment (SSA) form representations.

  • child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.