
Shared memory

admin, November 13, 2020

Title: Efficient Shared-Memory Implementation of HPCG and Its Application to Unstructured Matrices

HPL: bound by floating-point compute capability
HPCG: bound by memory bandwidth
The reason is that the HPCG kernels all operate on large sparse matrices, and these matrices do not fit in cache.
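
To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes CSR storage with 8-byte values and 4-byte column indices, and it ignores the traffic for the vectors and the row pointers. The function names and the 100 GB/s machine in the usage part are illustrative assumptions, not numbers from the paper.

```python
def spmv_bytes_per_flop(value_bytes=8, index_bytes=4):
    """Lower bound on memory traffic per flop for a CSR sparse matrix-vector product.

    Each stored nonzero costs one multiply and one add (2 flops) and needs
    its value and its column index read at least once. Vector and
    row-pointer traffic is ignored (assumption).
    """
    flops_per_nonzero = 2
    return (value_bytes + index_bytes) / flops_per_nonzero


def bandwidth_bound_gflops(bandwidth_gb_s, bytes_per_flop):
    """Upper bound on GFLOP/s when every byte has to come from main memory."""
    return bandwidth_gb_s / bytes_per_flop


if __name__ == "__main__":
    bpf = spmv_bytes_per_flop()
    # 100 GB/s is a hypothetical machine, chosen only for illustration.
    print(f"{bpf} bytes/flop, at most {bandwidth_bound_gflops(100.0, bpf):.1f} GFLOP/s")
```

Under these assumptions the kernel moves about 6 bytes per flop, so the achievable flop rate is set by memory bandwidth rather than by peak floating-point throughput.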

SYMGS (symmetric Gauss-Seidel smoother)

Task Scheduling with P2P Synchronization:
j → i: smoothing variable (row) i depends on smoothing variable (row) j.

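Data dependency: each nonzero (i, j) in the lower or upper triangular part obtained by splitting the matrix makes the corresponding rows dependent on each other.
Rows a and c can run in parallel even with a transitive dependency a → b → c, as long as a and c are not directly connected.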
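
One common way to turn such dependencies into a parallel schedule is level scheduling, sketched below in Python. Whether this matches the paper's exact task construction is an assumption. The input format (`lower_rows`) and both function names are made up for this illustration. The sketch keeps the original sweep order, so it also honors transitive dependencies: a chain a → b → c occupies three levels.

```python
def level_schedule(lower_rows):
    """Assign each row to a dependency level for a forward sweep.

    lower_rows[i] lists the column indices j < i where the strictly lower
    triangular part has a nonzero, i.e. the rows that row i depends on.
    Rows in the same level have no dependencies among themselves.
    """
    levels = []
    for i, deps in enumerate(lower_rows):
        if any(j >= i for j in deps):
            raise ValueError(f"row {i} has a dependency outside the lower triangle")
        # A row sits one level after its latest dependency; rows without
        # dependencies start at level 0.
        levels.append(1 + max((levels[j] for j in deps), default=-1))
    return levels


def group_by_level(levels):
    """Collect row indices per level; each group can be processed in parallel."""
    groups = [[] for _ in range(max(levels, default=-1) + 1)]
    for row, level in enumerate(levels):
        groups[level].append(row)
    return groups


if __name__ == "__main__":
    # Hypothetical 5-row pattern: row 1 needs row 0, row 3 needs rows 1 and 2, ...
    pattern = [[], [0], [], [1, 2], [0]]
    print(group_by_level(level_schedule(pattern)))
```

A barrier between levels is the simplest way to run this. With point-to-point synchronization, each row (or task of rows) would instead wait only on the rows it directly depends on.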
Block Multi-Color Reordering
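
As a minimal illustration of the coloring idea, the Python sketch below applies plain greedy coloring to individual rows. It is not the paper's block variant, and the adjacency-list input and function names are assumptions for this example. Rows that share a color are not directly connected, so in a chain a, b, c the rows a and c receive the same color and can be smoothed together.

```python
def greedy_coloring(neighbors):
    """Give each row the smallest color not used by an already colored neighbor.

    neighbors[v] lists the rows directly connected to row v through a
    nonzero (symmetric pattern assumed).
    """
    colors = [-1] * len(neighbors)
    for v in range(len(neighbors)):
        used = {colors[u] for u in neighbors[v] if colors[u] != -1}
        color = 0
        while color in used:
            color += 1
        colors[v] = color
    return colors


def rows_by_color(colors):
    """Group rows by color; rows of one color can be processed in parallel."""
    groups = [[] for _ in range(max(colors, default=-1) + 1)]
    for row, color in enumerate(colors):
        groups[color].append(row)
    return groups


if __name__ == "__main__":
    # Chain a - b - c as rows 0, 1, 2: a and c are not directly connected.
    chain = [[1], [0, 2], [1]]
    print(rows_by_color(greedy_coloring(chain)))
```

A block variant would presumably apply the same idea to groups of consecutive rows instead of single rows, processing the rows inside each block sequentially.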
Running Multiple MPI Ranks per Node