Accelerating CNN Training with Concurrent Execution of GPU and Processing-In-Memory
The baseline system architecture in this paper consists of a host and a PIM. The host is a GPU similar in design to NVIDIA's V100, and the main memory is three-dimensional (3D) stacked DRAM, i.e., High Bandwidth Memory (HBM). The PIM consists of a single-instruction-multiple-data (SIMD) operator and temporary storage, and the four banks within a bank group share one PIM computing unit. CONV layers are executed on the host GPU, whereas non-CONV layers are executed on the PIM.
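The work split and the bank sharing described above can be illustrated with a minimal sketch. It assumes a network is a flat list of layers tagged with a kind string and that banks are numbered consecutively within each bank group. The names `Layer`, `assign_device`, `pim_unit_for_bank`, `partition`, and the layer-kind labels are hypothetical and only show the mapping rules; they are not the paper's implementation.

```python
from dataclasses import dataclass

# From the text: four banks within a bank group share one PIM computing unit.
BANKS_PER_GROUP = 4


@dataclass
class Layer:
    name: str
    kind: str  # hypothetical labels such as "conv", "bn", "relu", "pool"


def assign_device(layer: Layer) -> str:
    """CONV layers run on the host GPU; every other layer runs on the PIM."""
    return "GPU" if layer.kind == "conv" else "PIM"


def pim_unit_for_bank(bank_id: int) -> int:
    """Map a global bank index to the PIM unit shared by its bank group.

    Assumes (hypothetically) that banks 0-3 form group 0, banks 4-7 form
    group 1, and so on, with one PIM unit per bank group.
    """
    return bank_id // BANKS_PER_GROUP


def partition(layers):
    """Return (layer name, device) pairs for a whole network."""
    return [(layer.name, assign_device(layer)) for layer in layers]


if __name__ == "__main__":
    net = [
        Layer("conv1", "conv"),
        Layer("bn1", "bn"),
        Layer("relu1", "relu"),
        Layer("conv2", "conv"),
        Layer("pool2", "pool"),
    ]
    for name, device in partition(net):
        print(f"{name}: {device}")
    # Eight banks map onto two shared PIM units.
    print([pim_unit_for_bank(b) for b in range(8)])
```

Keeping the device decision to a single predicate on the layer kind mirrors the rule stated in the text. Any finer-grained scheduling, such as overlapping GPU and PIM work, would sit on top of this mapping.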
