tinygrad renderer.cstyle
Note
You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org
- class tinygrad.renderer.cstyle.CStyleLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
NamedTuple
Class representing a C-style programming language.
- size_prefix
Prefix for size. Defaults to “int”.
- Type:
str
- generic_var_prefix
Prefix for generic variables. Defaults to empty string.
- Type:
str
- kernel_prefix
Prefix for kernels. Defaults to empty string.
- Type:
str
- buffer_prefix
Prefix for buffers. Defaults to empty string.
- Type:
str
- buffer_suffix
Suffix for buffers. Defaults to empty string.
- Type:
str
- smem_align
Alignment for shared memory. Defaults to empty string.
- Type:
str
- smem_prefix
Prefix for shared memory. Defaults to empty string.
- Type:
str
- smem_prefix_for_cast
Indicates whether prefix should be used in casts. Defaults to True.
- Type:
bool
- arg_int_prefix
Prefix for integer arguments. Defaults to empty string.
- Type:
str
- barrier
Barrier synchronization method. Defaults to empty string.
- Type:
str
- xid
Expressions for the global thread index in each dimension (for example get_global_id(0) in OpenCL). Defaults to empty list.
- Type:
List[str]
- gid
Expressions for the group (block) index in each dimension (for example blockIdx.x in CUDA). Defaults to empty list.
- Type:
List[str]
- lid
Expressions for the local thread index within a group in each dimension (for example threadIdx.x in CUDA). Defaults to empty list.
- Type:
List[str]
- global_max
List of maximum global values. Defaults to empty list.
- Type:
List[int]
- local_max
List of maximum local values. Defaults to empty list.
- Type:
List[int]
- extra_args
List of extra arguments. Defaults to empty list.
- Type:
List[str]
- float4
Expression used to construct a four-component float value (for example “(float4)” in OpenCL). Defaults to None.
- Type:
Optional[str]
- half_prekernel
Code placed before the kernel when half-precision types are used (for example an include or extension pragma). Defaults to None.
- Type:
Optional[str]
- uses_vload
Indicates whether vload is used. Defaults to False.
- Type:
bool
- external_local_bufs
Indicates whether external local buffers are used. Defaults to False.
- Type:
bool
- uses_ptr_arithmetic
Indicates whether pointer arithmetic is used. Defaults to False.
- Type:
bool
- launch_bounds
Indicates whether launch bounds are used. Defaults to False.
- Type:
bool
- code_for_op
Dictionary containing operations for unary, binary, and ternary ops.
- Type:
Dict
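The attributes above are fields of a NamedTuple, and the backend classes later on this page change individual values. The sketch below shows one way such overrides can work. MiniLanguage and MiniCUDA are hypothetical stand-ins with three of the documented fields, not the real classes.

```python
from typing import NamedTuple

class MiniLanguage(NamedTuple):
    # Hypothetical stand-in: three of the documented fields with their defaults.
    size_prefix: str = "int"
    kernel_prefix: str = ""
    barrier: str = ""

class MiniCUDA(MiniLanguage):
    # Plain class attributes shadow the inherited field accessors.
    kernel_prefix = 'extern "C" __global__ '
    barrier = "__syncthreads();"

base, cuda = MiniLanguage(), MiniCUDA()
print(repr(base.barrier))                     # default from the base class
print(cuda.barrier)                           # value supplied by the subclass
print(MiniLanguage._fields.index("barrier"))  # the position an "Alias for field number" entry reports
```

With this pattern the attribute lookup returns the subclass value, while the underlying tuple slot (cuda[2] here) still holds the base default.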
- arg_int_prefix: str
Alias for field number 8
- barrier: str
Alias for field number 9
- buffer_prefix: str
Alias for field number 3
- buffer_suffix: str
Alias for field number 4
- code_for_op: Dict
Alias for field number 22
- external_local_bufs: bool
Alias for field number 19
- extra_args: List[str]
Alias for field number 15
- generic_var_prefix: str
Alias for field number 1
- gid: List[str]
Alias for field number 11
- global_max: List[int]
Alias for field number 13
- kernel_prefix: str
Alias for field number 2
- launch_bounds: bool
Alias for field number 21
- lid: List[str]
Alias for field number 12
- local_max: List[int]
Alias for field number 14
- render_cast(x: List[str], var_dtype: DType) str [source]
Returns a string expression that casts the values in x to the given type.
- self
The instance of the class.
- Type:
Any
- x
A list of strings to be cast.
- Type:
List[str]
- Returns:
String expression that casts the values in x to the given type.
- Return type:
str
- Raises:
AssertionError – If length of x is not equal to var_dtype.sz or if float4 attribute is None.
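As a rough illustration of the two assertion conditions, here is a sketch of the vector case only. VecType is a hypothetical stand-in for DType, and the output format is an assumption rather than the method's actual output.

```python
from typing import List, NamedTuple, Optional

class VecType(NamedTuple):
    # Hypothetical stand-in for DType: a type name and an element count (sz).
    name: str
    sz: int

def render_cast_sketch(x: List[str], var_dtype: VecType, float4: Optional[str]) -> str:
    # The two documented failure conditions.
    assert len(x) == var_dtype.sz, f"cast is wrong size {len(x)} != {var_dtype.sz}"
    assert float4 is not None, "vector cast needs a float4 constructor"
    # Pack the values with the backend's float4 expression.
    return f"{float4}({','.join(x)})"

print(render_cast_sketch(["a", "b", "c", "d"], VecType("float4", 4), "(float4)"))
```

Passing a list whose length differs from sz, or float4=None, raises AssertionError in this sketch.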
- render_conditional(cond: str, x: str, y: str) str [source]
Render a conditional expression.
- self
The object itself
- Type:
Any
- cond
The condition of the conditional expression
- Type:
str
- x
The value to return if the condition is true
- Type:
str
- y
The value to return if the condition is false
- Type:
str
- Returns:
The rendered conditional expression
- Return type:
str
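A conditional in a C-style language is most naturally a ternary expression. The sketch below assumes that form; the exact parenthesisation and spacing of the real method's output are not given in this reference.

```python
def render_conditional_sketch(cond: str, x: str, y: str) -> str:
    # Parenthesise every operand so nested expressions keep their precedence.
    return f"(({cond}) ? ({x}) : ({y}))"

print(render_conditional_sketch("gidx0 < 4", "val0", "0.0f"))
```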
- render_const(x: float | int | bool, var_dtype) str [source]
Returns a string expression of the constant with the given type.
- x
The input value
- Type:
Union[float, int, bool]
- var_dtype
The data type of the variable
- Returns:
A string representation of the constant
- Return type:
str
- render_for(expr: str, _min: int | str, _max: int | str) str [source]
Render a for loop.
- self
The object itself
- Type:
Any
- expr
The loop variable expression
- Type:
str
- _min
The start value of the loop variable
- Type:
Union[int, str]
- _max
The end value of the loop variable
- Type:
Union[int, str]
- Returns:
The rendered for loop
- Return type:
str
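The following sketch shows how the three arguments could map onto a C for-loop header. Whether _max is inclusive or exclusive is not stated above; the sketch assumes an exclusive bound.

```python
from typing import Union

def render_for_sketch(expr: str, _min: Union[int, str], _max: Union[int, str]) -> str:
    # Assumption: _max is an exclusive upper bound.
    # The doubled braces emit a literal "{" that opens the loop body.
    return f"for (int {expr} = {_min}; {expr} < {_max}; ++{expr}) {{"

print(render_for_sketch("ridx0", 0, 16))
print(render_for_sketch("ridx1", "start", "n"))
```

Because the bounds may be strings, symbolic limits such as variable names can be passed through unchanged.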
- render_if(cond: str)[source]
Render an if statement.
- self
The object itself
- Type:
Any
- cond
The condition of the if statement
- Type:
str
- Returns:
The rendered if statement
- Return type:
str
- render_kernel(function_name: str, kernel: List[str], bufs: List[Tuple[str, DType]], local_size: List[int], prekernel: List[str]) str [source]
Render the kernel with given parameters.
This function generates a complete kernel program for the target C-style language from the provided arguments. It creates the necessary boilerplate code for the kernel, including sampler creation and buffer type definitions, then concatenates the actual kernel code and post-processing code if needed.
- self
The instance of the class.
- Type:
CStyleLanguage
- function_name
The name of the kernel function.
- Type:
str
- kernel
The list of strings that form the body of the kernel function.
- Type:
List[str]
- bufs
A list of tuples where each tuple contains a buffer name and its data type.
- Type:
List[Tuple[str, DType]]
- local_size
The local size for the kernel execution.
- Type:
List[int]
- prekernel
Pre-kernel code that will be inserted before the actual kernel code.
- Type:
List[str]
- Returns:
The complete kernel program as a string.
- Return type:
str
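To make the assembly step concrete, here is a simplified sketch that builds a kernel from a prefix, a signature, and body lines. It is not the real method: dtypes are plain strings instead of DType objects, local_size is omitted, and the default prefixes are the OpenCL values listed further down this page.

```python
from typing import List, Tuple

def render_kernel_sketch(function_name: str, kernel: List[str],
                         bufs: List[Tuple[str, str]], prekernel: List[str],
                         kernel_prefix: str = "__kernel ",
                         buffer_prefix: str = "__global ") -> str:
    # Each buffer becomes a pointer argument in the signature.
    args = ", ".join(f"{buffer_prefix}{dtype}* {name}" for name, dtype in bufs)
    header = f"{kernel_prefix}void {function_name}({args}) {{"
    # Pre-kernel code first, then the signature, the body lines, and the closing brace.
    return "\n".join(prekernel + [header] + kernel + ["}"])

src = render_kernel_sketch(
    "add_one",
    ["  int i = get_global_id(0);", "  data0[i] = data1[i] + 1.0f;"],
    [("data0", "float"), ("data1", "float")],
    prekernel=[],
)
print(src)
```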
- render_load(output_dtype, buf_name, buf_dtype, idx, local=False) str [source]
Returns a string expression of the loaded value with the output type.
- output_dtype
The output data type
- buf_name
The buffer name
- Type:
str
- buf_dtype
The data type of the buffer
- idx
The index in the buffer
- Type:
int
- local
Whether the buffer is local or not
- Type:
bool
- Returns:
A string representation of the loaded value
- Return type:
str
- render_local(name: str, size: int)[source]
Render the local memory variable declaration.
- self
The object itself
- Type:
Any
- name
The name of the local memory variable
- Type:
str
- size
The size of the local memory variable array
- Type:
int
- Returns:
The rendered local memory variable declaration
- Return type:
str
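A local memory declaration combines the smem_align and smem_prefix attributes with an array declaration. The sketch below assumes that ordering and a float element type, neither of which is confirmed by this reference.

```python
def render_local_sketch(name: str, size: int, smem_align: str = "",
                        smem_prefix: str = "", elem_type: str = "float") -> str:
    # Assumption: alignment attribute, then address-space prefix,
    # then a fixed-size array of elem_type.
    return f"{smem_align}{smem_prefix}{elem_type} {name}[{size}];"

print(render_local_sketch("temp", 64, smem_prefix="__shared__ "))
print(render_local_sketch("temp", 64, "__attribute__ ((aligned (16))) ", "__local "))
```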
- render_store(buf_name: str, buf_dtype: DType, var_name: str, var_dtype: DType, idx: str, local=False) str [source]
Returns a string statement that performs the store operation.
- self
The instance of the class.
- Type:
Any
- buf_name
The buffer name.
- Type:
str
- var_name
The variable name.
- Type:
str
- idx
Index of the operation.
- Type:
str
- local
Whether the operation is local or not. Defaults to False.
- Type:
bool, optional
- Returns:
String statement that performs the store operation based on the given parameters.
- Return type:
str
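For the scalar case, a load is a buffer access and a store is the same access used as an assignment target. The sketch below illustrates this with array indexing and with pointer arithmetic. How the real render_load and render_store use uses_ptr_arithmetic, and how they handle vector types, images, and the local flag, is not modelled here.

```python
def render_load_sketch(buf_name: str, idx: str, uses_ptr_arithmetic: bool = False) -> str:
    # Scalar case only: index the buffer, or dereference an offset pointer.
    return f"*({buf_name}+{idx})" if uses_ptr_arithmetic else f"{buf_name}[{idx}]"

def render_store_sketch(buf_name: str, var_name: str, idx: str,
                        uses_ptr_arithmetic: bool = False) -> str:
    # A store is the load expression used as an assignment target.
    target = render_load_sketch(buf_name, idx, uses_ptr_arithmetic)
    return f"{target} = {var_name};"

print(render_load_sketch("data1", "gidx0"))
print(render_store_sketch("data0", "val0", "gidx0", uses_ptr_arithmetic=True))
```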
- size_prefix: str
Alias for field number 0
- smem_align: str
Alias for field number 5
- smem_prefix: str
Alias for field number 6
- smem_prefix_for_cast: bool
Alias for field number 7
- uses_ptr_arithmetic: bool
Alias for field number 20
- uses_vload: bool
Alias for field number 18
- xid: List[str]
Alias for field number 10
- class tinygrad.renderer.cstyle.CUDALanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
CStyleLanguage
The CUDALanguage class is a subclass of the CStyleLanguage class, specifically tailored for CUDA GPU programming.
- kernel_prefix
Prefix for defining kernels in CUDA.
- Type:
str
- smem_prefix
Prefix for shared memory variables in CUDA.
- Type:
str
- smem_prefix_for_cast
Flag to indicate whether prefix should be used with casting for shared memory.
- Type:
bool
- arg_int_prefix
Prefix for integer arguments in CUDA kernel functions.
- Type:
str
- barrier
Code snippet for thread synchronization in CUDA.
- Type:
str
- float4
Function name for creating a 4-component floating point number in CUDA.
- Type:
str
- gid
List of strings representing the global index in each dimension (x, y, z).
- Type:
list
- lid
List of strings representing the local index in each dimension (x, y, z).
- Type:
list
- xid
List of strings representing the combined global and local indices for each dimension (x, y, z).
- Type:
list
- code_for_op
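Dictionary mapping unary, binary, and ternary operations to functions that generate the corresponding CUDA code. It differs from the base class only in the entry for BinaryOps.MAX.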
- Type:
dict
- half_prekernel
Pre-kernel code needed when working with half-precision floating point numbers in CUDA.
- Type:
str
- arg_int_prefix: str = 'const int'
- barrier: str = '__syncthreads();'
- code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CUDALanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
- gid: List[str] = ['blockIdx.x', 'blockIdx.y', 'blockIdx.z']
- half_prekernel: str | None = '\n #include <cuda_fp16.h>\n struct half4 { half x, y, z, w; };\n __device__ half4 make_half4(half x, half y, half z, half w) { half4 ret; ret.x = x; ret.y = y; ret.z = z; ret.w = w; return ret; }\n '
- kernel_prefix: str = '#define INFINITY (__int_as_float(0x7f800000))\n#define NAN (__int_as_float(0x7fffffff))\nextern "C" __global__ '
- lid: List[str] = ['threadIdx.x', 'threadIdx.y', 'threadIdx.z']
- smem_prefix: str = '__shared__ '
- smem_prefix_for_cast: bool = False
- xid: List[str] = ['(blockIdx.x*blockDim.x+threadIdx.x)', '(blockIdx.y*blockDim.y+threadIdx.y)', '(blockIdx.z*blockDim.z+threadIdx.z)']
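The xid expressions combine the block index (gid) and the thread index (lid) into one global index per dimension. The following sketch simulates the x-dimension formula in plain Python for a hypothetical launch of 3 blocks with 4 threads each, to show that every thread receives a distinct, contiguous index.

```python
def global_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    # Mirrors the xid expression: blockIdx.x*blockDim.x+threadIdx.x
    return block_idx * block_dim + thread_idx

block_dim, grid_dim = 4, 3  # hypothetical launch: 3 blocks of 4 threads
indices = [global_index(b, block_dim, t)
           for b in range(grid_dim) for t in range(block_dim)]
print(indices)
```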
- tinygrad.renderer.cstyle.CUDARenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]
Converts a list of micro-operations (uops) to the specified C-style language.
- Parameters:
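lang (CStyleLanguage) – The target C-style programming language for conversion. It does not appear in the signature above, which suggests it is already bound to a CUDALanguage instance.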
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the local (work-group) size in each dimension.
kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.
depth (int): The indentation level for the generated code. Default is 1.
kk: A helper function to append lines of code with proper indentation.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function to generate names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
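The signature above takes only function_name and uops, while the parameter list also names lang. One way to get that shape is to bind the language with functools.partial. The sketch below shows that idea together with toy versions of the kk and ssa helpers. render_sketch, its string-based lang, and the expression list are hypothetical stand-ins, not the real renderer or uops.

```python
import functools
from collections import defaultdict
from typing import DefaultDict, List, Tuple

def render_sketch(kernel_prefix: str, function_name: str,
                  exprs: List[str]) -> Tuple[str, dict]:
    kernel: List[str] = []
    depth = 1
    counters: DefaultDict[str, int] = defaultdict(int)

    def kk(line: str) -> None:
        # Append one line of code at the current indentation depth.
        kernel.append("  " * depth + line)

    def ssa(prefix: str) -> str:
        # Return a fresh name per prefix (val0, val1, ...), each assigned once.
        name = f"{prefix}{counters[prefix]}"
        counters[prefix] += 1
        return name

    for expr in exprs:
        kk(f"float {ssa('val')} = {expr};")
    src = f"{kernel_prefix}void {function_name}() {{\n" + "\n".join(kernel) + "\n}"
    return src, {}

# Bind the language once; callers then pass only (function_name, exprs).
CudaLikeRenderer = functools.partial(render_sketch, 'extern "C" __global__ ')

src, meta = CudaLikeRenderer("demo", ["1.0f", "2.0f"])
print(src)
```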
- class tinygrad.renderer.cstyle.HIPLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
CStyleLanguage
HIPLanguage class that inherits from CStyleLanguage.
- kernel_prefix
Kernel prefix containing various function definitions and necessary includes.
- Type:
str
- launch_bounds
A boolean indicating if the language supports launch bounds.
- Type:
bool
- smem_prefix
The shared memory prefix for the language.
- Type:
str
- smem_prefix_for_cast
A boolean indicating whether the shared memory prefix is included in casts.
- Type:
bool
- barrier
The barrier synchronization primitive for the language.
- Type:
str
- float4
The name of the float4 type for the language.
- Type:
str
- uses_vload
Whether the language uses vload.
- Type:
bool
- uses_ptr_arithmetic
Whether the language uses pointer arithmetic.
- Type:
bool
- arg_int_prefix
The integer prefix for function arguments.
- Type:
str
- gid
A list comprising strings representing the grid index in three dimensions.
- Type:
list
- lid
A list comprising strings representing the local thread index in three dimensions.
- Type:
list
- xid
A list comprising strings representing the extended thread index in three dimensions.
- Type:
list
- code_for_op
A dictionary containing lambda functions for different operations and their corresponding implementations based on data type.
- Type:
dict
- arg_int_prefix: str = 'const int'
- barrier: str = '__syncthreads();'
- code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function HIPLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function HIPLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
- gid: List[str] = ['blockIdx.x', 'blockIdx.y', 'blockIdx.z']
- half_prekernel: str | None = '#include <hip/hip_fp16.h>\n\ntypedef union { struct { half x, y, z, w; } __attribute__((aligned(8))); half data[4]; } half4; __device__ half4 make_half4(half x, half y, half z, half w) { return {x, y, z, w}; }\ntypedef union { struct { half x, y, z, w, a, b, c, d; } __attribute__((aligned(16))); half data[8]; } half8; __device__ half8 make_half8(half x, half y, half z, half w, half a, half b, half c, half d) { return {x, y, z, w, a, b, c, d}; }\n typedef _Float16 half16 __attribute__((ext_vector_type(16))); __device__ half16 make_half16(half x, half y, half z, half w, half a, half b, half c, half d, half e, half f, half g, half h, half i, half j, half k, half l) { return {x, y, z, w, a, b, c, d, e, f, g, h, i, j, k, l}; }\n__device__ float vload_half(size_t offset, const half *p) { return (float)*(p + offset); }\n__device__ float2 vload_half2(size_t offset, const half *p) { return make_float2((float)*(p + offset*2), (float)*(p + offset*2 + 1)); }\n__device__ float4 vload_half4(size_t offset, const half *p) { return make_float4((float)*(p + offset*4), (float)*(p + offset*4 + 1), (float)*(p + offset*4 + 2), (float)*(p + offset*4 + 3)); }\n__device__ void vstore_half(float data, size_t offset, half *p) { *(p + offset) = (half)data; }\n__device__ void vstore_half2(float2 data, size_t offset, half *p) { *(p + offset*2) = (half)data.x; *(p + offset*2 + 1) = (half)data.y; }\n__device__ void vstore_half4(float4 data, size_t offset, half *p) { *(p + offset*4) = (half)data.x; *(p + offset*4 + 1) = (half)data.y; *(p + offset*4 + 2) = (half)data.z; *(p + offset*4 + 3) = (half)data.w; }\n__device__ half exp2(half x) { return hexp2(x); }\n__device__ half log2(half x) { return hlog2(x); }\n__device__ half sin(half x) { return hsin(x); }\n__device__ half sqrt(half x) { return hsqrt(x); }\n__device__ half hmax(half a, half b) { return __hgt(a, b) ? a : b; }\n__device__ half operator%(const half &a, const half &b) { return __hsub(a, __hmul(b, __float2half(floorf(__half2float(a) / __half2float(b))))); }\n__device__ bool operator!=(const half &a, const int &b) { return (float)a != b; }\n\n// HACKS for ALU ops on half and result of half2 GEP\n__device__ half operator+(const half &a, const unsigned short &b) { return __hadd(a, (half)(b)); }\n__device__ half operator-(const half &a, const unsigned short &b) { return __hsub(a, (half)(b)); }\n__device__ half operator*(const half &a, const unsigned short &b) { return __hmul(a, (half)(b)); }\n__device__ half operator/(const half &a, const unsigned short &b) { return __hdiv(a, (half)(b)); }\n__device__ bool operator<(const half &a, const unsigned short &b) { return __hlt(a, (half)(b)); }\n// now the other way\n__device__ half operator+(const unsigned short &a, const half &b) { return __hadd((half)(a), b); }\n__device__ half operator-(const unsigned short &a, const half &b) { return __hsub((half)(a), b); }\n__device__ half operator*(const unsigned short &a, const half &b) { return __hmul((half)(a), b); }\n__device__ half operator/(const unsigned short &a, const half &b) { return __hdiv((half)(a), b); }\n__device__ bool operator<(const unsigned short &a, const half &b) { return __hlt((half)(a), b); }\n '
- kernel_prefix: str = '#include <hip/hip_common.h>\n#define INFINITY (__builtin_inff())\n#define NAN (__builtin_nanf(""))\n __device__ float4 max(float4 x, float4 y) { return float4(max(x.x, y.x), max(x.y, y.y), max(x.z, y.z), max(x.w, y.w)); }\n __device__ float4 pow(float x, float4 y) { return float4(pow(x, y.x), pow(x, y.y), pow(x, y.z), pow(x, y.w)); }\n __device__ float4 pow(float4 x, float4 y) { return float4(pow(x.x, y.x), pow(x.y, y.y), pow(x.z, y.z), pow(x.w, y.w)); }\n __device__ float4 log2(float4 x) { return float4(log2(x.x), log2(x.y), log2(x.z), log2(x.w)); }\n __device__ float4 exp2(float4 x) { return float4(exp2(x.x), exp2(x.y), exp2(x.z), exp2(x.w)); }\n __device__ float4 sin(float4 x) { return float4(sin(x.x), sin(x.y), sin(x.z), sin(x.w)); }\n typedef float float8 __attribute__((ext_vector_type(8))); __device__ float8 make_float8(float x, float y, float z, float w, float a, float b, float c, float d) { return {x, y, z, w, a, b, c, d}; }\n extern "C" __global__\n '
- launch_bounds: bool = True
- lid: List[str] = ['threadIdx.x', 'threadIdx.y', 'threadIdx.z']
- smem_prefix: str = '__shared__ '
- smem_prefix_for_cast: bool = False
- uses_ptr_arithmetic: bool = True
- uses_vload: bool = True
- xid: List[str] = ['(blockIdx.x*blockDim.x+threadIdx.x)', '(blockIdx.y*blockDim.y+threadIdx.y)', '(blockIdx.z*blockDim.z+threadIdx.z)']
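The operator% overload in half_prekernel above computes a - b*floor(a/b), which is a floored modulo rather than C's truncating fmod. The following Python sketch reproduces that arithmetic in ordinary floats (not half precision) to show where the two definitions differ.

```python
import math

def floored_mod(a: float, b: float) -> float:
    # Same arithmetic as the half operator% above: a - b * floor(a / b)
    return a - b * math.floor(a / b)

print(floored_mod(7.0, 3.0))
print(floored_mod(-7.0, 3.0))   # result takes the sign of the divisor
print(math.fmod(-7.0, 3.0))     # C-style fmod takes the sign of the dividend
```

The two definitions agree when both operands have the same sign and differ for mixed signs.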
- tinygrad.renderer.cstyle.HIPRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]
Converts a list of micro-operations (uops) to the specified C-style language.
- Parameters:
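lang (CStyleLanguage) – The target C-style programming language for conversion. It does not appear in the signature above, which suggests it is already bound to a HIPLanguage instance.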
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the local (work-group) size in each dimension.
kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.
depth (int): The indentation level for the generated code. Default is 1.
kk: A helper function to append lines of code with proper indentation.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function to generate names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
- class tinygrad.renderer.cstyle.MetalLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
CStyleLanguage
MetalLanguage Class: Inherits from CStyleLanguage. Defines Metal language specific attributes and methods.
- kernel_prefix
Prefix for kernel functions. Default is “#include <metal_stdlib>\nusing namespace metal;\nkernel “.
- Type:
str
- buffer_prefix
Prefix for device buffers. Default is “device “.
- Type:
str
- smem_prefix
Prefix for threadgroup shared memory. Default is “threadgroup “.
- Type:
str
- arg_int_prefix
Prefix for constant integer arguments. Default is “constant int&”.
- Type:
str
- barrier
Threadgroup barrier for synchronization. Default is “threadgroup_barrier(mem_flags::mem_threadgroup);”.
- Type:
str
- float4
Data type for four-component floating point values. Default is “float4”.
- Type:
str
- uses_ptr_arithmetic
Indicates whether the language requires pointer arithmetic. Default is True.
- Type:
bool
- gid
List of strings giving the threadgroup position in the grid for each of the three dimensions (“gid.x”, “gid.y”, “gid.z”).
- Type:
list[str]
- lid
List of strings giving the thread position within its threadgroup for each of the three dimensions (“lid.x”, “lid.y”, “lid.z”).
- Type:
list[str]
- extra_args
Additional arguments required by the Metal language. Default is [“uint3 gid [[threadgroup_position_in_grid]]”, “uint3 lid [[thread_position_in_threadgroup]]”].
- Type:
list[str]
- arg_int_prefix: str = 'constant int&'
- barrier: str = 'threadgroup_barrier(mem_flags::mem_threadgroup);'
- buffer_prefix: str = 'device '
- extra_args: List[str] = ['uint3 gid [[threadgroup_position_in_grid]]', 'uint3 lid [[thread_position_in_threadgroup]]']
- gid: List[str] = ['gid.x', 'gid.y', 'gid.z']
- kernel_prefix: str = '#include <metal_stdlib>\nusing namespace metal;\nkernel '
- lid: List[str] = ['lid.x', 'lid.y', 'lid.z']
- smem_prefix: str = 'threadgroup '
- uses_ptr_arithmetic: bool = True
- tinygrad.renderer.cstyle.MetalRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]
Converts a list of micro-operations (uops) to the specified C-style language.
- Parameters:
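lang (CStyleLanguage) – The target C-style programming language for conversion. It does not appear in the signature above, which suggests it is already bound to a MetalLanguage instance.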
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the local (work-group) size in each dimension.
kernel, prekernel, bufs (List[]): Lists for storing generated code and buffers.
depth (int): The indentation level for the generated code. Default is 1.
kk: A helper function to append lines of code with proper indentation.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function to generate names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
- class tinygrad.renderer.cstyle.OpenCLLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
CStyleLanguage
OpenCLLanguage class. Inherits from CStyleLanguage.
- kernel_prefix
Prefix for kernel functions. Default is “__kernel “.
- Type:
str
- buffer_prefix
Prefix for buffer variables. Default is “__global “.
- Type:
str
- smem_align
Alignment attribute for shared memory. Default is “__attribute__ ((aligned (16))) “.
- Type:
str
- smem_prefix
Prefix for shared memory variables. Default is “__local “.
- Type:
str
- arg_int_prefix
Prefix for integer function arguments. Default is “const int”.
- Type:
str
- half_prekernel
OpenCL extension pragma for enabling half precision. Default is “#pragma OPENCL EXTENSION cl_khr_fp16 : enable”.
- Type:
str
- barrier
Barrier code for synchronizing threads in a work group. Default is “barrier(CLK_LOCAL_MEM_FENCE);”.
- Type:
str
- float4
String conversion for type float4. Default is “(float4)”.
- Type:
str
- gid
List of get_group_id function calls for dimensions 0, 1, and 2.
- Type:
list of str
- lid
List of get_local_id function calls for dimensions 0, 1, and 2.
- Type:
list of str
- xid
List of get_global_id function calls for dimensions 0, 1, and 2.
- Type:
list of str
- uses_vload
Flag indicating if vload is used. Default is True.
- Type:
bool
- code_for_op
Dictionary mapping operations to lambda functions that generate code for them. Inherits the entries of CStyleLanguage and overrides the one for TernaryOps.MULACC.
- Type:
dict
- arg_int_prefix: str = 'const int'
- barrier: str = 'barrier(CLK_LOCAL_MEM_FENCE);'
- buffer_prefix: str = '__global '
- code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function OpenCLLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
- gid: List[str] = ['get_group_id(0)', 'get_group_id(1)', 'get_group_id(2)']
- kernel_prefix: str = '__kernel '
- lid: List[str] = ['get_local_id(0)', 'get_local_id(1)', 'get_local_id(2)']
- smem_align: str = '__attribute__ ((aligned (16))) '
- smem_prefix: str = '__local '
- uses_vload: bool = True
- xid: List[str] = ['get_global_id(0)', 'get_global_id(1)', 'get_global_id(2)']
- tinygrad.renderer.cstyle.OpenCLRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]
Converts a list of micro-operations (uops) to the specified C-style language.
- Parameters:
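lang (CStyleLanguage) – The target C-style programming language for conversion. It does not appear in the signature above, which suggests it is already bound to an OpenCLLanguage instance.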
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the local (work-group) size in each dimension.
kernel, prekernel, bufs (List): Lists that collect the generated kernel lines, pre-kernel lines, and buffers.
depth (int): The current indentation level of the generated code. Starts at 1.
kk: A helper function that appends a line of code at the current indentation level.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function that generates names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
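The signature above takes two arguments while the docstring describes three. One plausible explanation is that the renderer is the generic converter with its first argument pre-bound, for example through `functools.partial`. The sketch below shows that pattern with a stand-in function; `convert` and `opencl_like_renderer` are hypothetical names, and the binding mechanism is an assumption rather than something this page confirms.

```python
import functools
from typing import Dict, List, Tuple

def convert(lang: str, function_name: str, uops: List[str]) -> Tuple[str, Dict]:
    """Stand-in for a three-argument converter; not tinygrad's implementation."""
    body = "\n".join(f"  // {u}" for u in uops)
    return f"/* {lang} */ void {function_name}() {{\n{body}\n}}", {}

# Binding the first argument leaves a callable that takes (function_name, uops).
opencl_like_renderer = functools.partial(convert, "opencl")

code, info = opencl_like_renderer("E_4", ["LOAD", "ADD", "STORE"])
print(code)
```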
- class tinygrad.renderer.cstyle.WGSLLanguage(size_prefix: str = 'int', generic_var_prefix: str = '', kernel_prefix: str = '', buffer_prefix: str = '', buffer_suffix: str = '', smem_align: str = '', smem_prefix: str = '', smem_prefix_for_cast: bool = True, arg_int_prefix: str = '', barrier: str = '', xid: ~typing.List[str] = [], gid: ~typing.List[str] = [], lid: ~typing.List[str] = [], global_max: ~typing.List[int] = [], local_max: ~typing.List[int] = [], extra_args: ~typing.List[str] = [], float4: str | None = None, half_prekernel: str | None = None, uses_vload: bool = False, external_local_bufs: bool = False, uses_ptr_arithmetic: bool = False, launch_bounds: bool = False, code_for_op: ~typing.Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function CStyleLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function CStyleLanguage.<lambda>>, TernaryOps.WHERE: <function CStyleLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>})[source]
Bases:
CStyleLanguage
WGSLLanguage is a subclass of CStyleLanguage that represents the WebGPU Shading Language (WGSL).
- gid
List of WGSL expressions that read the gid index for the x, y, and z dimensions.
- Type:
list
- lid
List of WGSL expressions that read the lid index for the x, y, and z dimensions.
- Type:
list
- size_prefix
Prefix used to declare a variable. Defaults to “let”.
- Type:
str
- barrier
Code snippet for workgroup barrier. Defaults to “workgroupBarrier();”.
- Type:
str
- generic_var_prefix
Prefix used for generic variables. Defaults to “var “.
- Type:
str
- external_local_bufs
Flag indicating if local buffers are external. Defaults to True.
- Type:
bool
- code_for_op
Dictionary mapping operations to the functions that render them. WGSLLanguage overrides the entries for BinaryOps.CMPLT, TernaryOps.MULACC, and TernaryOps.WHERE.
- Type:
dict
- type_map
Dictionary mapping dtypes to WGSL types.
- Type:
dict
- barrier: str = 'workgroupBarrier();'
- code_for_op: Dict = {BinaryOps.ADD: <function CStyleLanguage.<lambda>>, BinaryOps.CMPLT: <function WGSLLanguage.<lambda>>, BinaryOps.DIV: <function CStyleLanguage.<lambda>>, BinaryOps.MAX: <function CStyleLanguage.<lambda>>, BinaryOps.MOD: <function CStyleLanguage.<lambda>>, BinaryOps.MUL: <function CStyleLanguage.<lambda>>, BinaryOps.SUB: <function CStyleLanguage.<lambda>>, TernaryOps.MULACC: <function WGSLLanguage.<lambda>>, TernaryOps.WHERE: <function WGSLLanguage.<lambda>>, UnaryOps.EXP2: <function CStyleLanguage.<lambda>>, UnaryOps.LOG2: <function CStyleLanguage.<lambda>>, UnaryOps.NEG: <function CStyleLanguage.<lambda>>, UnaryOps.SIN: <function CStyleLanguage.<lambda>>, UnaryOps.SQRT: <function CStyleLanguage.<lambda>>}
- external_local_bufs: bool = True
- generic_var_prefix: str = 'var '
- gid: List[str] = ['i32(gindex.x)', 'i32(gindex.y)', 'i32(gindex.z)']
- lid: List[str] = ['i32(lindex.x)', 'i32(lindex.y)', 'i32(lindex.z)']
- render_cast(x: List[str], var_dtype: DType) str [source]
Render a type cast for the given value with the target data type.
- x
The value to cast.
- Type:
List[str]
- var_dtype
The target data type.
- Type:
DType
- Returns:
The rendered type cast expression as a string.
- Return type:
str
- Raises:
NotImplementedError – If no cast is available for the target data type.
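A minimal sketch of the cast behaviour described above, assuming the cast is rendered as a WGSL value constructor such as `f32(...)`. The table `TYPE_MAP` is a hypothetical stand-in keyed by dtype name instead of `DType`, and taking only `x[0]` is also an assumption.

```python
from typing import Dict, List

# Hypothetical stand-in for type_map, keyed by dtype name rather than DType.
TYPE_MAP: Dict[str, str] = {
    "bool": "bool",
    "int": "i32",
    "unsigned int": "u32",
    "half": "f16",
    "float": "f32",
}

def sketch_render_cast(x: List[str], dtype_name: str) -> str:
    """Render a WGSL constructor-style cast, or raise if the type is unmapped."""
    if dtype_name not in TYPE_MAP:
        raise NotImplementedError(f"no cast for {dtype_name}")
    return f"{TYPE_MAP[dtype_name]}({x[0]})"

print(sketch_render_cast(["val0"], "float"))
```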
- render_conditional(cond: str, x: str, y: str) str [source]
Render a conditional expression that selects between two values based on a condition.
- cond
The conditional expression to be checked.
- Type:
str
- x
The value to select if the condition is true.
- Type:
str
- y
The value to select if the condition is false.
- Type:
str
- Returns:
The rendered conditional expression as a string.
- Return type:
str
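WGSL has no C-style `cond ? x : y` operator; its built-in `select(f, t, cond)` returns `t` when `cond` is true and `f` otherwise. The sketch below renders a conditional that way. Whether the real method also wraps its operands in casts is not shown on this page, so this stand-in omits them.

```python
def sketch_render_conditional(cond: str, x: str, y: str) -> str:
    """Render a WGSL select call that yields x when cond is true, else y."""
    # select takes the false value first, then the true value, then the condition.
    return f"select({y}, {x}, {cond})"

print(sketch_render_conditional("(i < 4)", "a", "b"))
```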
- render_const(x: float | int, var_dtype) str [source]
Render a constant value.
- Parameters:
x (Union[float, int]) – The constant value to be rendered.
var_dtype – The data type of the variable.
- Returns:
The WGSL code snippet for the constant value.
- Return type:
str
- render_for(expr: str, _min: int | str, _max: int | str) str [source]
Render a for loop with the given expression and range.
- expr
The loop variable’s name.
- Type:
str
- _min
The starting value of the loop variable.
- Type:
Union[int, str]
- _max
The ending value of the loop variable.
- Type:
Union[int, str]
- Returns:
The rendered for loop as a string.
- Return type:
str
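The loop header could be rendered along the following lines. The description of `_max` as the ending value suggests an inclusive bound, but that is an assumption, so the sketch exposes it as a hypothetical `inclusive` flag that the documented signature does not have.

```python
from typing import Union

def sketch_render_for(expr: str, _min: Union[int, str], _max: Union[int, str],
                      inclusive: bool = True) -> str:
    """Render the opening line of a WGSL for loop over [_min, _max] or [_min, _max)."""
    op = "<=" if inclusive else "<"
    return f"for(var {expr} = {_min}; {expr} {op} {_max}; {expr}++) {{"

print(sketch_render_for("ridx0", 0, 3))
print(sketch_render_for("ridx0", 0, "n", inclusive=False))
```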
- render_if(cond: str)[source]
Render an if statement with the given condition.
- cond
The conditional expression to be checked in the if statement.
- Type:
str
- Returns:
The rendered if statement as a string.
- Return type:
str
- render_kernel(function_name: str, kernel: List[str], bufs: List[Tuple[str, DType]], local_size: List[int], prekernel: List[str]) str [source]
Render the kernel for execution.
- function_name
The name of the function to be rendered.
- Type:
str
- kernel
The list of kernel code lines.
- Type:
List[str]
- bufs
A list of tuples containing buffer names and their respective data types.
- Type:
List[Tuple[str, DType]]
- local_size
The local workgroup size. If the list is empty, [1] is used.
- Type:
List[int]
- prekernel
Code lines to be executed before the kernel code.
- Type:
List[str]
- Returns:
The rendered kernel code as a string.
- Return type:
str
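To show how the pieces might fit together, the sketch below assembles a WGSL compute entry point from the same arguments. The binding layout, the storage access mode, and the pairing of `gindex` with `workgroup_id` and `lindex` with `local_invocation_id` are assumptions; only the names `gindex` and `lindex` come from the `gid` and `lid` attribute values. Buffers are given as WGSL type names here instead of `DType` objects.

```python
from typing import List, Tuple

def sketch_render_kernel(function_name: str, kernel: List[str],
                         bufs: List[Tuple[str, str]], local_size: List[int],
                         prekernel: List[str]) -> str:
    """Hypothetical assembly of a WGSL compute shader from rendered parts."""
    size = local_size if local_size else [1]  # fall back to [1] when empty
    bindings = [
        f"@group(0) @binding({i}) var<storage, read_write> {name}: array<{wgsl_type}>;"
        for i, (name, wgsl_type) in enumerate(bufs)
    ]
    workgroup = ", ".join(str(s) for s in size)
    header = (
        f"@compute @workgroup_size({workgroup}) fn {function_name}("
        "@builtin(workgroup_id) gindex: vec3<u32>, "
        "@builtin(local_invocation_id) lindex: vec3<u32>) {"
    )
    return "\n".join(prekernel + bindings + [header] + kernel + ["}"])

wgsl = sketch_render_kernel("E_4", ["  data0[i32(gindex.x)] = 1.0f;"],
                            [("data0", "f32")], [], [])
print(wgsl)
```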
- render_local(name: str, size: int)[source]
Render a local variable declaration.
- Parameters:
name (str) – The name of the variable to be declared.
size (int) – The size of the array.
- Returns:
The WGSL code snippet for declaring a local variable.
- Return type:
str
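In WGSL, memory shared within a workgroup is declared in the `workgroup` address space. A minimal sketch of such a declaration follows; the `f32` element type and the extra `elem_type` parameter are assumptions, since the documented signature takes only `name` and `size`.

```python
def sketch_render_local(name: str, size: int, elem_type: str = "f32") -> str:
    """Render a workgroup-scoped array declaration in WGSL."""
    return f"var<workgroup> {name}: array<{elem_type}, {size}>;"

print(sketch_render_local("temp", 16))
```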
- render_store(buf_name: str, buf_dtype: DType, var_name: str, var_dtype: DType, idx, local=False) str [source]
Render a store operation that stores a value in a buffer at the given index.
- buf_name
The name of the buffer.
- Type:
str
- var_name
The name of the variable to store.
- Type:
str
- idx
The index at which to store the value in the buffer.
- local
Whether this is a local store operation. Defaults to False.
- Type:
bool, optional
- Returns:
The rendered store operation as a string.
- Return type:
str
- size_prefix: str = 'let'
- type_map = {(0, 1, 'bool', <class 'numpy.bool_'>, 1): 'bool', (5, 4, 'int', <class 'numpy.int32'>, 1): 'i32', (6, 4, 'unsigned int', <class 'numpy.uint32'>, 1): 'u32', (9, 2, 'half', <class 'numpy.float16'>, 1): 'f16', (10, 4, 'float', <class 'numpy.float32'>, 1): 'f32'}
- tinygrad.renderer.cstyle.WGSLRenderer(function_name: str, uops: List[UOp]) Tuple[str, Dict]
Converts a list of micro-operations (uops) to WGSL code.
- Parameters:
lang (CStyleLanguage) – The target C-style programming language for conversion. It does not appear in this signature, which suggests it is already bound to WGSLLanguage.
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the size of local variables.
kernel, prekernel, bufs (List): Lists that collect the generated kernel lines, pre-kernel lines, and buffers.
depth (int): The current indentation level of the generated code. Starts at 1.
kk: A helper function that appends a line of code at the current indentation level.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function that generates names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
- tinygrad.renderer.cstyle.uops_to_cstyle(lang: CStyleLanguage, function_name: str, uops: List[UOp]) Tuple[str, Dict] [source]
Converts a list of micro-operations (uops) to the specified C-style language.
- Parameters:
lang (CStyleLanguage) – The target C-style programming language for conversion.
function_name (str) – The name of the function being converted.
uops (List[UOp]) – A list of micro-operations to convert.
- Returns:
A tuple containing the converted C-style code and a dictionary.
- Return type:
Tuple[str, Dict]
- Attributes:
local_size (List[int]): Holds the size of local variables.
kernel, prekernel, bufs (List): Lists that collect the generated kernel lines, pre-kernel lines, and buffers.
depth (int): The current indentation level of the generated code. Starts at 1.
kk: A helper function that appends a line of code at the current indentation level.
c, r (DefaultDict[str, int], Dict): Counters and mappings for temporary variables.
ssa: A helper function that generates names in static single assignment (SSA) form.
child_count (DefaultDict[UOp, int]): A counter for the number of children each uop has.
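The helpers listed under Attributes can be pictured with a small stand-in. The class `SketchEmitter` below is hypothetical and is not tinygrad's code: it keeps a `kernel` list, a `depth` counter, and a counter dictionary `c`, and shows how an indentation-aware `kk` and a name-generating `ssa` might cooperate. The two-space indent and the naming scheme are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List

class SketchEmitter:
    """Hypothetical emitter mirroring the kk / ssa / depth helpers."""

    def __init__(self) -> None:
        self.kernel: List[str] = []
        self.depth = 1  # indentation level, starting at 1
        self.c: DefaultDict[str, int] = defaultdict(int)

    def kk(self, line: str) -> None:
        # Append one line of code, indented by the current depth.
        self.kernel.append("  " * self.depth + line)

    def ssa(self, prefix: str = "t") -> str:
        # Hand out a name that has not been used before for this prefix.
        self.c[prefix] += 1
        return f"{prefix}{self.c[prefix] - 1}"

emitter = SketchEmitter()
first = emitter.ssa("val")
emitter.kk(f"float {first} = data1[0];")
emitter.kk("for (int ridx0 = 0; ridx0 < 4; ridx0++) {")
emitter.depth += 1
second = emitter.ssa("val")
emitter.kk(f"float {second} = {first} + 1.0f;")
emitter.depth -= 1
emitter.kk("}")
print("\n".join(emitter.kernel))
```

Because each call to `ssa` returns a new name, every temporary is assigned exactly once, which is the property the SSA form relies on.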