One Figure to Summarize PyTorch Memory Management

![PyTorch Memory Management Illustrated](/pytorch-vram-usage/img-1.png)

(The text below is generated by ChatGPT 4 and verified by me. The figure above, however, is created by me.)

In PyTorch GPU memory management, the terms “allocated,” “unallocated,” “free,” “reserved,” etc., refer to different states of GPU memory usage. These terms help users understand how PyTorch is handling GPU memory under the hood, which is especially important for optimizing applications and troubleshooting out-of-memory errors.

Here’s what each term generally means (a short runnable sketch that prints each of these quantities follows the list):

  • Allocated Memory: This is the amount of memory currently being used by active tensors in PyTorch. When you create a tensor or a model parameter, PyTorch allocates GPU memory to store its values. This memory is considered allocated until the tensor is deleted or all references to it go out of scope.

  • Cached (or Reserved) Memory: PyTorch uses a caching memory allocator to manage GPU memory more efficiently. When memory is allocated for the first time, the allocator often reserves a larger block of memory than what is requested. This reserved memory is ready for future allocations without the need to ask the GPU driver for more memory, which is a relatively slow operation.

  • Unallocated (but Cached/Reserved) Memory: This refers to the memory that has been reserved by PyTorch’s allocator but is currently not allocated to any tensor. It’s a pool of memory that is immediately available for new tensors without the need to perform an expensive allocation operation with the GPU driver.

  • Free Memory: This term can be a bit misleading in the context of PyTorch. It usually refers to memory that is truly free: not reserved by PyTorch and available for other applications or processes on the GPU to use. In many cases, what GPU monitoring tools report as “free” will not match PyTorch’s view of unallocated memory, because PyTorch may be holding onto reserved memory that appears as used from the outside.

  • Max Memory Allocated: This metric tracks the peak amount of GPU memory allocated by tensors at any point during the program’s lifetime.

  • Max Memory Reserved: This metric tracks the peak amount of memory reserved by the caching allocator during the program’s lifetime.
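
To make these states concrete, here is a minimal sketch (assuming a single CUDA device is available; the tensor size and device index are arbitrary choices for illustration) that allocates a tensor, frees it, and empties the cache, printing the allocated, reserved, and driver-level free counters at each step using PyTorch’s `torch.cuda` inspection APIs:

```python
import torch

# A minimal sketch, assuming a CUDA device is available.
assert torch.cuda.is_available()
dev = torch.device("cuda:0")

def report(tag: str) -> None:
    allocated = torch.cuda.memory_allocated(dev)  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(dev)    # bytes held by the caching allocator
    free, total = torch.cuda.mem_get_info(dev)    # driver-level view of the whole device
    print(f"{tag:<18} allocated={allocated / 2**20:7.1f} MiB  "
          f"reserved={reserved / 2**20:7.1f} MiB  "
          f"driver-free={free / 2**20:7.1f} MiB")

report("start")

# Allocate ~256 MiB of float32 (the size is arbitrary).
x = torch.empty(64, 1024, 1024, device=dev)
report("after allocation")

# Deleting the tensor drops "allocated", but the block stays in the cache,
# so "reserved" is unchanged: this is the unallocated-but-reserved pool.
del x
report("after del")

# empty_cache() returns unallocated cached blocks to the driver,
# which is what makes the memory appear free from the outside.
torch.cuda.empty_cache()
report("after empty_cache")

# Peak counters since the start of the program (or the last reset).
print(f"max allocated: {torch.cuda.max_memory_allocated(dev) / 2**20:.1f} MiB")
print(f"max reserved:  {torch.cuda.max_memory_reserved(dev) / 2**20:.1f} MiB")
torch.cuda.reset_peak_memory_stats(dev)
```

Running this, you should see `reserved` stay flat after `del x` while `allocated` drops, which is exactly the unallocated-but-reserved state described above; only `empty_cache()` makes that memory show up as free in external tools like nvidia-smi.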