tinygrad runtime.ops_cuda

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org

class tinygrad.runtime.ops_cuda.CUDAAllocator(device: CUDADevice)[source]

Bases: LRUAllocator

CUDA Allocator Class.

This class is a subclass of LRUAllocator and provides functionality for allocating and deallocating memory on the CUDA device. It also handles copying data from host to device and vice versa.

device

The CUDA device object representing the GPU on which memory is allocated and manipulated.

Type:

CUDADevice
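The buffer-reuse idea behind the LRUAllocator base class can be sketched in plain Python. This is a hypothetical toy (the class name, cache layout, and `_do_alloc` stand-in are illustrative, not tinygrad's actual implementation): freed buffers are parked in a per-size cache and handed back on the next allocation of the same size instead of going to the driver again.

```python
# Hypothetical sketch of LRU-style buffer reuse; bytearray stands in
# for a real device allocation (e.g. a cuMemAlloc'd pointer).
class ToyLRUAllocator:
    def __init__(self):
        self.cache = {}  # size -> list of free buffers

    def _do_alloc(self, size):
        # Stand-in for a real device allocation.
        return bytearray(size)

    def alloc(self, size):
        free = self.cache.get(size)
        if free:
            return free.pop()  # reuse a cached buffer of this size
        return self._do_alloc(size)

    def free(self, buf):
        # Instead of releasing to the driver, park the buffer for reuse.
        self.cache.setdefault(len(buf), []).append(buf)

a = ToyLRUAllocator()
buf1 = a.alloc(1024)
a.free(buf1)
buf2 = a.alloc(1024)
print(buf2 is buf1)  # True: the cached buffer is reused
```

Caching freed buffers this way avoids the latency of repeated driver allocations, which is the motivation for the LRU scheme.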

copyin(dest, src: memoryview)[source]

Copies data from host (CPU) memory to device (GPU) memory.

Parameters:
  • dest (ctypes.c_void_p) – A pointer to the destination memory on the GPU.

  • src (memoryview) – A memoryview object representing the source memory on the CPU.

copyout(dest: memoryview, src)[source]

Copies data from device (GPU) memory to host (CPU) memory.

Parameters:
  • dest (memoryview) – A memoryview object representing the destination memory on the CPU.

  • src (ctypes.c_void_p) – A pointer to the source memory on the GPU.
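The copyin/copyout contract can be illustrated without a GPU. In this hypothetical sketch a plain `bytearray` stands in for device memory and the two helper functions mirror the direction of each copy; the real methods issue driver memcpy calls on a `ctypes.c_void_p` instead.

```python
# bytearray stands in for device (GPU) memory in this illustration.
device_mem = bytearray(8)

def copyin(dest: bytearray, src: memoryview):
    # Host -> "device": write the host bytes into device memory.
    dest[:len(src)] = src

def copyout(dest: memoryview, src: bytearray):
    # "Device" -> host: fill the host memoryview from device memory.
    dest[:] = src[:len(dest)]

host_src = memoryview(b"\x01\x02\x03\x04\x05\x06\x07\x08")
copyin(device_mem, host_src)

host_dst = memoryview(bytearray(8))
copyout(host_dst, device_mem)
print(bytes(host_dst))  # b'\x01\x02\x03\x04\x05\x06\x07\x08'
```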

class tinygrad.runtime.ops_cuda.CUDADevice(device: str)[source]

Bases: Compiled

This class represents a CUDA device for computation. It initializes the device, context, and allocator.

default_arch_name

The default architecture name for the device. Defaults to “sm_35”.

Type:

str

__init__(self, device: str)

Initializes the CUDADevice object with a specific device.

default_arch_name = 'sm_35'
synchronize()[source]

Synchronizes the computation on the device and waits for it to finish. This ensures that all queued operations have completed before continuing.

Parameters:

None

Returns:

None

class tinygrad.runtime.ops_cuda.CUDAProgram(device: CUDADevice, name: str, lib: bytes)[source]

Bases: object

This class represents a CUDA program. It handles the loading, execution and deletion of CUDA programs.

device

The CUDA device to which this program is associated.

Type:

CUDADevice

name

The name of the CUDA program.

Type:

str

lib

The compiled program data in bytes.

Type:

bytes

prg

The CUDA function object or the raw bytecode, depending on CUDACPU.

Type:

cuda.CUfunction or bytes

tinygrad.runtime.ops_cuda.check(status)[source]

Checks the status of an operation and raises a RuntimeError if it is not 0.

This function checks the ‘status’ argument and raises a RuntimeError with a descriptive error message if the status is non-zero. The error message includes the CUDA Error code and its corresponding string representation, which are retrieved by calling an external ctypes function.

tinygrad.runtime.ops_cuda.status

The status code to check. If it is not 0, a RuntimeError will be raised.

Type:

int
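The pattern can be sketched as follows. This is a hypothetical illustration: the toy `ERROR_STRINGS` table replaces the real decoding, which the docstring above says is done through an external ctypes call into the CUDA driver.

```python
# Toy lookup table standing in for the driver's error-string decoding
# (the real code resolves the message via a ctypes call).
ERROR_STRINGS = {0: "CUDA_SUCCESS", 1: "CUDA_ERROR_INVALID_VALUE"}

def check(status: int) -> None:
    # Non-zero status becomes a RuntimeError carrying code and message.
    if status != 0:
        msg = ERROR_STRINGS.get(status, "UNKNOWN")
        raise RuntimeError(f"CUDA Error {status}, {msg}")

check(0)  # success: returns None silently
try:
    check(1)
except RuntimeError as e:
    print(e)  # CUDA Error 1, CUDA_ERROR_INVALID_VALUE
```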

tinygrad.runtime.ops_cuda.cu_time_execution(cb, enable=False) → float | None[source]

This function measures the execution time of a provided callback function.

tinygrad.runtime.ops_cuda.cb

The callback function to measure the execution time for.

Type:

function

tinygrad.runtime.ops_cuda.enable

An optional boolean flag that enables or disables timing. Default is False.

Type:

bool

Returns:

If CUDACPU is not set, returns a float representing the elapsed time of the callback function’s execution.

Otherwise, it returns None.

Return type:

Optional[float]
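The contract above can be sketched with a host-side stand-in. This hypothetical version uses wall-clock time via `time.perf_counter`; the real function times the callback with CUDA events on the device, but the enable/return behavior is the same shape.

```python
import time

# Hypothetical sketch: run the callback, and return its elapsed time
# only when timing is enabled (the real code uses CUDA events).
def time_execution(cb, enable=False):
    if not enable:
        cb()
        return None
    start = time.perf_counter()
    cb()
    return time.perf_counter() - start

print(time_execution(lambda: sum(range(1000))))               # None
print(isinstance(time_execution(lambda: sum(range(1000)),
                                enable=True), float))         # True
```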