preconditioned conjugate gradient

  • data:The data structure with all necessary CG vectors preallocated
  • max_iter:The maximum number of iterations to perform, even if tolerance is not met.
  • normr:The 2-norm of the residual vector after the last iteration.
  • normr0:The 2-norm of the residual vector before the first iteration.
  • times:The 7-element vector of the timing information accumulated during all of the iterations.
  • doPreconditioning:The flag to indicate whether the preconditioner should be invoked at each iteration.

ComputeWAXPBY

compute the update of a vector with the sum of two
scaled vectors where: w = alpha x + beta y

参数形式(n,alpha,x,beta,y,w)

对于alpha或者beta等于1的情况,可以特殊考虑

1
2
3
4
5
6
#ifndef HPCG_NO_OPENMP
#pragma omp parallel for
#endif
for (local_int_t i=0; i<n; i++)
wv[i] = alpha * xv[i] + beta * yv[i];
}

ComputeDotProduct

time_allreduce:the time it took to perform the communication between processes通信所花费时间

参数形式(n,x,y,result,time_allreduce)

若x,y相等,特殊处理
利用openMP中的reduction
#pragma omp parallel for reduction (+:local_result)
并且利用MPI’s reduce function来collect all partial sums
MPI_Allreduce(&local_result, &global_result, 1, MPI_DOUBLE, MPI_SUM,MPI_COMM_WORLD);

迭代过程

首先计算r0=b-Ax0

k=1; k<=max_iter && normr/normr0 > tolerance; k++

preconditioning:
ComputeMG_ref(A, r, z)
Z <- M^-1 * r

no preconditioning:
copy r to z

1
2
3
4
5
6
7
8
9
if(k==1)
p=z
rtz = r * z (s)

else
older rtz = rtz
new rtz = r * z
beta = new rtz / older rtz
p = beta * p + z

SPMV:Ap = A p
Dot: pAp = p
Ap
alpha = rtz/ pAp
WAXPBY: x = x + alpha p
WAXPBY: r = r - alpha
Ap
Dot: normr = sqrt(r * r)

break条件:normr/normr0 <= tolerance || k > max_iter

time record

  • times[1] += t1; // dot product time
  • times[2] += t2; // WAXPBY time
  • times[3] += t3; // SPMV time
  • times[4] += t4; // AllReduce time
  • times[5] += t5; // preconditioner apply time
  • times[6] += t6; // exchange halo time