tinygrad runtime.ops_cuda

Note

You likely want the upstream tinygrad, not tinygrab. Tinygrab contains AI-generated docstrings for a tinygrad snapshot. Upstream: https://tinygrad.org

class tinygrad.runtime.ops_cuda.CUDAAllocator(device: CUDADevice)[source]

Bases: LRUAllocator

CUDA Allocator Class.

This class is a subclass of LRUAllocator and provides functionality for allocating and deallocating memory on the CUDA device. It also handles copying data from host to device and vice versa.

device

The CUDA device object representing the GPU on which memory is allocated and manipulated.

Type:

CUDADevice
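The buffer-reuse idea behind the LRUAllocator base class can be sketched in plain Python. This is a hypothetical toy (the class name, cache layout, and `_do_alloc` stand-in are illustrative, not tinygrad's actual implementation): freed buffers are parked in a per-size cache and handed back on the next allocation of the same size instead of going to the driver again.

```python
# Hypothetical sketch of LRU-style buffer reuse; bytearray stands in
# for a real device allocation (e.g. a cuMemAlloc'd pointer).
class ToyLRUAllocator:
    def __init__(self):
        self.cache = {}  # size -> list of free buffers

    def _do_alloc(self, size):
        # Stand-in for a real device allocation.
        return bytearray(size)

    def alloc(self, size):
        free = self.cache.get(size)
        if free:
            return free.pop()  # reuse a cached buffer of this size
        return self._do_alloc(size)

    def free(self, buf):
        # Instead of releasing to the driver, park the buffer for reuse.
        self.cache.setdefault(len(buf), []).append(buf)

a = ToyLRUAllocator()
buf1 = a.alloc(1024)
a.free(buf1)
buf2 = a.alloc(1024)
print(buf2 is buf1)  # True: the cached buffer is reused
```

Caching freed buffers this way avoids the latency of repeated driver allocations, which is the motivation for the LRU scheme.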

copyin(dest, src: memoryview)[source]

Copies data from host (CPU) memory to device (GPU) memory.

Parameters:
  • dest (ctypes.c_void_p) – A pointer to the destination memory on the GPU.

  • src (memoryview) – A memoryview object representing the source memory on the CPU.

copyout(dest: memoryview, src)[source]

Copies data from device (GPU) memory to host (CPU) memory.

Parameters:
  • dest (memoryview) – A memoryview object representing the destination memory on the CPU.

  • src (ctypes.c_void_p) – A pointer to the source memory on the GPU.
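The copyin/copyout contract can be illustrated without a GPU. In this hypothetical sketch a plain `bytearray` stands in for device memory and the two helper functions mirror the direction of each copy; the real methods issue driver memcpy calls on a `ctypes.c_void_p` instead.

```python
# bytearray stands in for device (GPU) memory in this illustration.
device_mem = bytearray(8)

def copyin(dest: bytearray, src: memoryview):
    # Host -> "device": write the host bytes into device memory.
    dest[:len(src)] = src

def copyout(dest: memoryview, src: bytearray):
    # "Device" -> host: fill the host memoryview from device memory.
    dest[:] = src[:len(dest)]

host_src = memoryview(b"\x01\x02\x03\x04\x05\x06\x07\x08")
copyin(device_mem, host_src)

host_dst = memoryview(bytearray(8))
copyout(host_dst, device_mem)
print(bytes(host_dst))  # b'\x01\x02\x03\x04\x05\x06\x07\x08'
```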

class tinygrad.runtime.ops_cuda.CUDADevice(device: str)[source]

Bases: Compiled

This class represents a CUDA device for computation. It initializes the device, context, and allocator.

default_arch_name

The default architecture name for the device. Defaults to “sm_35”.

Type:

str

__init__(self, device: str)

Initializes the CUDADevice object with a specific device.

default_arch_name = 'sm_35'
synchronize()[source]

Synchronizes the computation on the device and waits for it to finish. This ensures that all queued operations have completed before continuing.

Parameters:

None

Returns:

None

class tinygrad.runtime.ops_cuda.CUDAProgram(device: CUDADevice, name: str, lib: bytes)[source]

Bases: object

This class represents a CUDA program. It handles the loading, execution and deletion of CUDA programs.

device

The CUDA device to which this program is associated.

Type:

CUDADevice

name

The name of the CUDA program.

Type:

str

lib

The compiled program data in bytes.

Type:

bytes

prg

The CUDA function object or the raw bytecode, depending on CUDACPU.

Type:

cuda.CUfunction or bytes

tinygrad.runtime.ops_cuda.check(status)[source]

Checks the status of an operation and raises a RuntimeError if it is not 0.

This function checks the ‘status’ argument and raises a RuntimeError with a descriptive error message if the status is non-zero. The error message includes the CUDA Error code and its corresponding string representation, which are retrieved by calling an external ctypes function.

tinygrad.runtime.ops_cuda.status

The status code to check. If it is not 0, a RuntimeError will be raised.

Type:

int
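The pattern can be sketched as follows. This is a hypothetical illustration: the toy `ERROR_STRINGS` table replaces the real decoding, which the docstring above says is done through an external ctypes call into the CUDA driver.

```python
# Toy lookup table standing in for the driver's error-string decoding
# (the real code resolves the message via a ctypes call).
ERROR_STRINGS = {0: "CUDA_SUCCESS", 1: "CUDA_ERROR_INVALID_VALUE"}

def check(status: int) -> None:
    # Non-zero status becomes a RuntimeError carrying code and message.
    if status != 0:
        msg = ERROR_STRINGS.get(status, "UNKNOWN")
        raise RuntimeError(f"CUDA Error {status}, {msg}")

check(0)  # success: returns None silently
try:
    check(1)
except RuntimeError as e:
    print(e)  # CUDA Error 1, CUDA_ERROR_INVALID_VALUE
```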

tinygrad.runtime.ops_cuda.cu_time_execution(cb, enable=False) → float | None[source]

This function measures the execution time of a provided callback function.

tinygrad.runtime.ops_cuda.cb

The callback function to measure the execution time for.

Type:

function

tinygrad.runtime.ops_cuda.enable

An optional boolean flag that enables or disables timing. Default is False.

Type:

bool

Returns:

If CUDACPU is not set, returns a float representing the elapsed time of the callback function’s execution.

Otherwise, it returns None.

Return type:

Optional[float]
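The contract above can be sketched with a host-side stand-in. This hypothetical version uses wall-clock time via `time.perf_counter`; the real function times the callback with CUDA events on the device, but the enable/return behavior is the same shape.

```python
import time

# Hypothetical sketch: run the callback, and return its elapsed time
# only when timing is enabled (the real code uses CUDA events).
def time_execution(cb, enable=False):
    if not enable:
        cb()
        return None
    start = time.perf_counter()
    cb()
    return time.perf_counter() - start

print(time_execution(lambda: sum(range(1000))))               # None
print(isinstance(time_execution(lambda: sum(range(1000)),
                                enable=True), float))         # True
```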