
Title: Efficient Shared-Memory Implementation of HPCG and Its Application to Unstructured Matrices
HPL: bound by floating-point compute capability
HPCG: bound by memory bandwidth
This is because the HPCG kernels all operate on large sparse matrices, and these matrices do not fit in cache.
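A rough worked example of why a sparse kernel ends up bandwidth-bound. This is only a sketch under assumptions that are not in the paper notes: the matrix is stored in a CSR-like format with 8-byte values and 4-byte column indices, only matrix traffic is counted, and the machine numbers (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are made up for illustration. The bound used is the simple roofline estimate min(peak, bandwidth x intensity).

```python
def spmv_arithmetic_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte for a sparse matrix-vector product.

    Each nonzero costs 2 flops (one multiply, one add) and streams one
    matrix value plus one column index from memory. Vector traffic is
    ignored, so the real intensity is even lower than this.
    """
    return 2.0 / (value_bytes + index_bytes)


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline estimate: limited by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine, numbers chosen for illustration only.
PEAK_GFLOPS = 500.0
BANDWIDTH_GBS = 100.0

ai = spmv_arithmetic_intensity()
bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
print(f"intensity = {ai:.3f} flop/byte, bound = {bound:.1f} GFLOP/s")
```

With these assumed numbers the memory term is far below the compute peak, which is the sense in which HPCG is bandwidth-bound while HPL is compute-bound.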
SYMGS
- Task Scheduling with P2P Synchronization:
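Data dependency j -> i: smoothing variable (row) i depends on smoothing variable (row) j. Every nonzero element (i, j) of the lower or upper triangular matrix obtained from the splitting creates a dependency between the corresponding rows. Rows a and c can run in parallel even with a transitive dependency a → b → c, as long as a and c are not directly connected.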
- Block Multi-Color Reordering
- Running Multiple MPI Ranks per Node
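The direct-dependency rule noted under Task Scheduling with P2P Synchronization can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a small dense-stored matrix with a symmetric nonzero pattern, uses one Python thread and one `threading.Event` per row, and all function names are hypothetical. Each row waits only on the rows j < i it is directly connected to, instead of on a global barrier, and the result is checked against a plain sequential sweep.

```python
import threading


def update_row(A, b, x, i):
    """One Gauss-Seidel update of x[i] using the current contents of x."""
    s = sum(A[i][j] * x[j] for j in range(len(A)) if j != i and A[i][j] != 0)
    return (b[i] - s) / A[i][i]


def gauss_seidel_sequential(A, b, x0):
    """Reference forward sweep: rows processed strictly in order."""
    x = list(x0)
    for i in range(len(A)):
        x[i] = update_row(A, b, x, i)
    return x


def direct_dependencies(A):
    """deps[i] = rows j < i with a nonzero A[i][j] (lower-triangular part)."""
    return [[j for j in range(i) if A[i][j] != 0] for i in range(len(A))]


def gauss_seidel_p2p(A, b, x0):
    """Forward sweep where row i waits only on its direct dependencies.

    Assumes a symmetric nonzero pattern: any later row that row i reads
    also depends on row i, so it cannot overwrite its entry too early.
    """
    n = len(A)
    x = list(x0)
    deps = direct_dependencies(A)
    done = [threading.Event() for _ in range(n)]  # one flag per row

    def smooth_row(i):
        for j in deps[i]:
            done[j].wait()          # point-to-point wait, no barrier
        x[i] = update_row(A, b, x, i)
        done[i].set()               # signal rows that depend on i

    threads = [threading.Thread(target=smooth_row, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x


# Small symmetric example: rows 1 and 3 both depend only on row 0.
A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, 0.0],
     [-1.0, 0.0, 0.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0]
x0 = [0.0, 0.0, 0.0, 0.0]

print(direct_dependencies(A))
print(gauss_seidel_p2p(A, b, x0) == gauss_seidel_sequential(A, b, x0))
```

In this example rows 1 and 3 are not directly connected and both wait only on row 0, so they are free to run concurrently; row 2 waits only on row 1 and never issues a wait on row 0.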



