2024.10 A paper on an acceleration method for CNN training has been published in IEEE Access.

Accelerating CNN Training with Concurrent Execution of GPU and Processing-In-Memory

The baseline system architecture in this paper consists of a host and a PIM. The host is a GPU similar in design to NVIDIA’s V100, and the main memory is three-dimensional (3D) stacked DRAM (HBM). The PIM consists of a single-instruction-multiple-data (SIMD) operator and temporary storage, and the four banks within a bank group share one PIM computing unit. CONV layers are executed on the host GPU, whereas non-CONV layers are executed on the PIM.
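As a rough illustration of the partitioning described above, the following Python sketch maps layer types to a device and maps banks to their shared PIM computing unit. It is not the paper's implementation: the function names, the flat bank numbering, and the example layer list are all assumptions made for illustration, and it does not model the concurrent scheduling of the two devices.

```python
# Minimal sketch of the GPU/PIM split; all names here are hypothetical.

# From the text: four banks within a bank group share one PIM computing unit.
BANKS_PER_PIM_UNIT = 4


def assign_device(layer_type: str) -> str:
    """Return "GPU" for CONV layers and "PIM" for every other layer type."""
    return "GPU" if layer_type.upper() == "CONV" else "PIM"


def pim_unit_for_bank(bank_index: int) -> int:
    """Map a bank index to the index of the PIM unit it shares.

    Assumes banks are numbered consecutively, so that each run of four
    consecutive indices forms one bank group.
    """
    if bank_index < 0:
        raise ValueError("bank_index must be non-negative")
    return bank_index // BANKS_PER_PIM_UNIT


# Example layer sequence (illustrative only, not taken from the paper).
layers = ["CONV", "BN", "ReLU", "CONV", "Pool"]
placement = [(name, assign_device(name)) for name in layers]

for name, device in placement:
    print(f"{name:5s} -> {device}")
```

Under these assumptions, banks 0 to 3 map to PIM unit 0 and banks 4 to 7 map to PIM unit 1.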