AI Chips: A100 GPU with Nvidia Ampere architecture

Generality provides flexibility at the cost of complexity

GPU execution model (optional)

A CUDA kernel (blockIdx, blockDim, and threadIdx are implicitly defined and initialized).
Original source code from CUDA documentation
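
To make the indexing concrete, here is a minimal sketch in plain Python (not CUDA) that emulates how a kernel turns blockIdx, blockDim, and threadIdx into a global element index. The function name `vector_add` and the block size of 4 are assumptions chosen for illustration. The two loops stand in for threads that a real GPU would run in parallel.

```python
# Plain-Python emulation of how a 1-D CUDA grid maps threads to array elements.
# This is not CUDA code: the loops stand in for threads the GPU runs in parallel.

def vector_add(a, b, block_dim=4):
    """Add two equal-length lists using CUDA-style index arithmetic."""
    n = len(a)
    out = [0.0] * n
    # Launch enough blocks to cover all n elements (ceiling division).
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            # Global index, as a kernel would compute it.
            i = block_idx * block_dim + thread_idx
            if i < n:                          # guard: the last block may overshoot
                out[i] = a[i] + b[i]
    return out


print(vector_add([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
```

With 5 elements and 4 threads per block, two blocks are launched and the bounds check keeps the last three threads of the second block idle.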
An SM: modified from source
Ampere architecture
Source (Fermi architecture)

Branch divergence (optional)


GPU Memory Hierarchy (Optional)

Source: Nvidia

Nvidia Ampere architecture with A100 GPU

  • 8 GPCs, 16 SMs/GPC, and 128 SMs per full GPU.
  • 6 HBM2 stacks and 12 512-bit memory controllers.
GA100 full GPU with 128 SMs (the A100 Tensor Core GPU has 108 SMs). GA100: 54.2 billion transistors with a die size of 826 mm².
  • 4 processing blocks/SM, 1 warp scheduler/processing block.
  • 64 FP32 CUDA Cores/SM and 8192 FP32 CUDA Cores per full GPU.
  • 64 INT32 CUDA Cores/SM, 32 FP64 CUDA Cores/SM.
  • 192 KB of combined shared memory and L1 data cache per SM.
  • 4 Tensor Cores/SM and 512 Tensor Cores per full GPU.
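
The per-GPU totals follow from the per-SM figures by multiplication. The short sketch below checks that arithmetic for the full GA100 (128 SMs) and for the shipping A100 (108 SMs). It assumes 4 Tensor Cores per SM (one per processing block), which is the value that yields 512 Tensor Cores on the full GPU. The constant names are made up for this example.

```python
# Sanity-check the GA100 / A100 unit counts from the per-SM figures.
GPCS_PER_GPU = 8
SMS_PER_GPC = 16
FP32_CORES_PER_SM = 64
TENSOR_CORES_PER_SM = 4   # assumption: one Tensor Core per processing block
A100_ENABLED_SMS = 108    # the A100 product enables 108 of the 128 SMs


def totals(num_sms):
    """Return (FP32 CUDA cores, Tensor Cores) for a GPU with num_sms SMs."""
    return num_sms * FP32_CORES_PER_SM, num_sms * TENSOR_CORES_PER_SM


full_sms = GPCS_PER_GPU * SMS_PER_GPC
print("Full GA100:", full_sms, "SMs,", totals(full_sms))
print("A100:", A100_ENABLED_SMS, "SMs,", totals(A100_ENABLED_SMS))
```

The same per-SM figures give 6912 FP32 CUDA Cores and 432 Tensor Cores for the 108-SM A100.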
NVIDIA A100 Tensor Core GPU Performance

Third-Generation NVIDIA Tensor Core

TensorFloat-32 (Right: Matrix Multiply-Accumulate (MMA) Operations)
TFLOPS
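
TensorFloat-32 keeps the 8-bit exponent of FP32 but only 10 bits of mantissa. As a rough illustration, the sketch below reduces an FP32 value to TF32 precision by clearing the 13 lowest mantissa bits. The helper name `to_tf32` is hypothetical, and plain truncation is an assumption made for simplicity; the rounding actually performed by the hardware may differ.

```python
import struct


def to_tf32(x: float) -> float:
    """Reduce x to TF32 precision (1 sign, 8 exponent, 10 mantissa bits).

    Assumption: low mantissa bits are truncated, not rounded.
    """
    # Reinterpret the FP32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 has 23 mantissa bits; keep the top 10 and clear the low 13.
    bits &= 0xFFFFE000
    # Reinterpret the masked bits as an FP32 value again.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_tf32(1.0 + 2**-10))  # smallest step above 1.0 that TF32 can represent
print(to_tf32(1.0 + 2**-11))  # finer than TF32 resolution, collapses to 1.0
```

The range is unchanged relative to FP32 because the exponent bits are untouched; only precision is lost.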

Matrix sparsity
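
The A100 Tensor Cores accelerate a fine-grained structured sparsity pattern in which 2 of every 4 consecutive weights are zero. The sketch below shows one simple way to produce that pattern: in each group of four values, keep the two with the largest magnitude. The function name `prune_2_of_4` and the magnitude-based selection rule are assumptions for illustration, not NVIDIA's tooling.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 3.0, 0.0, -1.0, 2.0]))
```

Every group of four in the result has at most two nonzero entries, which is the layout the sparse Tensor Core path expects.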


MIG (Multi-Instance GPU) Architecture


Memory Architectures

Xilinx

NVLink

NVIDIA DGX A100 System

PCIe Gen 4 support

Other A100 features

  • A 5-core hardware JPEG decode engine called NVJPG: it decodes JPEG images on the GPU, avoiding CPU-side decompression that can overload the PCIe link with decoded image data.
  • A GA100 Optical Flow Accelerator for optical flow and stereo disparity: optical flow measures the apparent motion of points between two images, and stereo disparity measures the depth of objects seen by a pair of cameras.

Credit & References
